DPDK patches and discussions
* [PATCH] usertools: rewrite pmdinfo
@ 2022-09-13 10:58 Robin Jarry
  2022-09-13 11:29 ` Ferruh Yigit
                   ` (7 more replies)
  0 siblings, 8 replies; 42+ messages in thread
From: Robin Jarry @ 2022-09-13 10:58 UTC
  To: dev; +Cc: Robin Jarry, Olivier Matz

dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
merely prints multiple independent JSON lines which cannot be fed
directly to any JSON parser. Moreover, the script complexity is rather
high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
sections. Rewrite it so that it can produce valid JSON.
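
To illustrate the problem with the legacy output, here is a minimal Python
sketch. The two sample records are made up for illustration (they are not
real driver output). It shows that concatenated JSON lines are rejected by
a standard parser, while a single JSON array loads in one call.

```python
import json

# Hypothetical sample data, for illustration only.
# Shape of the legacy -r/--raw output: one JSON object per line.
legacy_output = (
    '{"name": "drv_a", "pci_ids": []}\n'
    '{"name": "drv_b", "pci_ids": []}\n'
)
# Shape of the new output: a single JSON array.
array_output = '[{"name": "drv_a", "devices": []}, {"name": "drv_b", "devices": []}]'


def parses_as_json(text: str) -> bool:
    """Return True if text is one valid JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses_as_json(legacy_output))
print(parses_as_json(array_output))
```

The legacy text can still be parsed line by line, but every consumer has to
know about that convention, which is what the rewrite avoids.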

Remove the PCI database parsing for PCI-ID to Vendor-Device names
conversion. This should be done by external scripts (if really needed).

Here are some examples of use with jq:

Get the complete info for a given driver:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.name == "dmadev_idxd_pci")'
 {
   "name": "dmadev_idxd_pci",
   "params": "max_queues=0",
   "kmod": "vfio-pci",
   "devices": [
     {
       "vendor_id": "8086",
       "device_id": "0b25",
       "subsystem_device_id": "ffff",
       "subsystem_system_id": "ffff"
     }
   ]
 }

Get only the required kernel modules for a given driver:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.name == "net_i40e").kmod'
 "* igb_uio | uio_pci_generic | vfio-pci"

Get only the required kernel modules for a given device:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.devices[] | .vendor_id == "15b3" and .device_id == "1013").kmod'
 "* ib_uverbs & mlx5_core & mlx5_ib"

Print the list of drivers which define multiple parameters without
space separators:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
 ...
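
The same lookups can also be done without jq. The following is a rough
Python sketch, assuming the output layout shown in the examples above. The
helper names (find_driver, kmods_for_device) and the "example_vdev" entry
are hypothetical and only serve the illustration.

```python
import json
from typing import List, Optional

# Excerpt shaped like the script output shown above. The second entry is a
# made-up driver without any device or kmod, for illustration only.
SAMPLE = """
[
  {
    "name": "dmadev_idxd_pci",
    "params": "max_queues=0",
    "kmod": "vfio-pci",
    "devices": [{"vendor_id": "8086", "device_id": "0b25"}]
  },
  {"name": "example_vdev", "devices": []}
]
"""


def find_driver(drivers: List[dict], name: str) -> Optional[dict]:
    """Return the first driver entry with the given name, or None."""
    return next((d for d in drivers if d.get("name") == name), None)


def kmods_for_device(
    drivers: List[dict], vendor_id: str, device_id: str
) -> List[Optional[str]]:
    """Return the kmod field of every driver that lists the given device."""
    return [
        d.get("kmod")
        for d in drivers
        if any(
            dev.get("vendor_id") == vendor_id and dev.get("device_id") == device_id
            for dev in d.get("devices", [])
        )
    ]


drivers = json.loads(SAMPLE)
print(find_driver(drivers, "dmadev_idxd_pci"))
print(kmods_for_device(drivers, "8086", "0b25"))
```

In real use, the list would come from json.load(sys.stdin) with the script
output piped in, instead of the embedded sample.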

The script passes flake8, black, isort and pylint checks.

I have tested this with a matrix of Python and pyelftools versions:

                             pyelftools
               0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
         3.6     ok   ok   ok   ok   ok   ok   ok   ok
         3.7     ok   ok   ok   ok   ok   ok   ok   ok
  Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
         3.9     ok   ok   ok   ok   ok   ok   ok   ok
         3.10  fail fail fail fail   ok   ok   ok   ok

All failures with Python 3.10 share the same root cause:

  File "elftools/construct/lib/container.py", line 5, in <module>
    from collections import MutableMapping
  ImportError: cannot import name 'MutableMapping' from 'collections'

pyelftools only supports Python 3.10 as of version 0.26.
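
For context, the traceback above comes from the standard library: the
abstract base class aliases such as collections.MutableMapping were removed
in Python 3.10 and are only available from collections.abc. The sketch
below shows the portable import. It illustrates the general pattern and is
not meant to reproduce the actual pyelftools change.

```python
import collections
import warnings
from collections.abc import MutableMapping  # works on every Python 3 release


def legacy_alias_available() -> bool:
    """Return True if the old collections.MutableMapping alias still exists."""
    with warnings.catch_warnings():
        # Some releases before 3.10 warn when the deprecated alias is accessed.
        warnings.simplefilter("ignore")
        return hasattr(collections, "MutableMapping")


print(issubclass(dict, MutableMapping))
print(legacy_alias_available())
```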

NB: The output produced by the legacy -r/--raw flag can be obtained with
the following command:

  strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'
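
For readers without binutils at hand, a rough Python approximation of this
pipeline is sketched below. It assumes the default behaviour of strings
(runs of at least 4 printable ASCII characters, tab included) and scans a
whole buffer instead of specific ELF sections. The function name and the
sample buffer are hypothetical.

```python
import re
from typing import Iterator

PREFIX = "PMD_INFO_STRING= "

# Runs of at least 4 printable ASCII bytes (tab, space to tilde), which is
# an approximation of what `strings` reports by default.
PRINTABLE_RUN = re.compile(rb"[\t\x20-\x7e]{4,}")


def legacy_raw_lines(buf: bytes) -> Iterator[str]:
    """Yield the text following PMD_INFO_STRING= for each matching run."""
    for match in PRINTABLE_RUN.finditer(buf):
        text = match.group().decode("ascii")
        # Same anchoring as the sed expression: prefix at start of the string.
        if text.startswith(PREFIX):
            yield text[len(PREFIX):]


# Made-up buffer standing in for the contents of a binary.
sample = (
    b"\x01\x02junk\x00"
    b'PMD_INFO_STRING= {"name": "drv_a", "pci_ids": []}\x00'
    b"more\x00"
)

for line in legacy_raw_lines(sample):
    print(line)
```

With a real binary, the buffer would be the file contents read in binary
mode, for example Path("build/app/dpdk-testpmd").read_bytes().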

Cc: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Robin Jarry <robin@jarry.cc>
---
This script has had multiple compatibility issues over the past years.
Its style and complexity may also be off-putting for Python developers.
After this patch, maintenance should be much easier.

 doc/guides/rel_notes/release_22_11.rst |   5 +
 usertools/dpdk-pmdinfo.py              | 856 ++++++++-----------------
 2 files changed, 261 insertions(+), 600 deletions(-)

diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf0505e..67054f5acdc9 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -84,6 +84,11 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
+  PCI-IDs parsing has been removed.
+  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command::
+
+     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
 
 ABI Changes
 -----------
diff --git a/usertools/dpdk-pmdinfo.py b/usertools/dpdk-pmdinfo.py
index 40ef5cec6cba..cc72e5ce27a2 100755
--- a/usertools/dpdk-pmdinfo.py
+++ b/usertools/dpdk-pmdinfo.py
@@ -1,626 +1,282 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2016  Neil Horman <nhorman@tuxdriver.com>
+# Copyright(c) 2022  Robin Jarry
+# pylint: disable=invalid-name
 
-# -------------------------------------------------------------------------
-#
-# Utility to dump PMD_INFO_STRING support from an object file
-#
-# -------------------------------------------------------------------------
+r"""
+Utility to dump PMD_INFO_STRING support from DPDK binaries.
+
+This script prints JSON output to be interpreted by other tools. Here are some
+examples with jq:
+
+Get the complete info for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "cnxk_nix_inl")'
+
+Get only the required kernel modules for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "net_i40e").kmod'
+
+Get only the required kernel modules for a given device:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.devices[] | .vendor_id == "15b3" and .device_id == "1013").kmod'
+"""
+
+import argparse
 import json
 import os
-import platform
+import re
+import string
 import sys
-import argparse
-from elftools.common.exceptions import ELFError
-from elftools.common.py3compat import byte2int
-from elftools.elf.elffile import ELFFile
+from pathlib import Path
+from typing import Iterable, Iterator, List, Union
 
+import elftools
+from elftools.elf.elffile import ELFError, ELFFile
 
-# For running from development directory. It should take precedence over the
-# installed pyelftools.
-sys.path.insert(0, '.')
-
-raw_output = False
-pcidb = None
-
-# ===========================================
-
-class Vendor:
-    """
-    Class for vendors. This is the top level class
-    for the devices belong to a specific vendor.
-    self.devices is the device dictionary
-    subdevices are in each device.
-    """
-
-    def __init__(self, vendorStr):
-        """
-        Class initializes with the raw line from pci.ids
-        Parsing takes place inside __init__
-        """
-        self.ID = vendorStr.split()[0]
-        self.name = vendorStr.replace("%s " % self.ID, "").rstrip()
-        self.devices = {}
-
-    def addDevice(self, deviceStr):
-        """
-        Adds a device to self.devices
-        takes the raw line from pci.ids
-        """
-        s = deviceStr.strip()
-        devID = s.split()[0]
-        if devID in self.devices:
-            pass
-        else:
-            self.devices[devID] = Device(deviceStr)
-
-    def report(self):
-        print(self.ID, self.name)
-        for id, dev in self.devices.items():
-            dev.report()
-
-    def find_device(self, devid):
-        # convert to a hex string and remove 0x
-        devid = hex(devid)[2:]
-        try:
-            return self.devices[devid]
-        except:
-            return Device("%s  Unknown Device" % devid)
-
-
-class Device:
-
-    def __init__(self, deviceStr):
-        """
-        Class for each device.
-        Each vendor has its own devices dictionary.
-        """
-        s = deviceStr.strip()
-        self.ID = s.split()[0]
-        self.name = s.replace("%s  " % self.ID, "")
-        self.subdevices = {}
-
-    def report(self):
-        print("\t%s\t%s" % (self.ID, self.name))
-        for subID, subdev in self.subdevices.items():
-            subdev.report()
-
-    def addSubDevice(self, subDeviceStr):
-        """
-        Adds a subvendor, subdevice to device.
-        Uses raw line from pci.ids
-        """
-        s = subDeviceStr.strip()
-        spl = s.split()
-        subVendorID = spl[0]
-        subDeviceID = spl[1]
-        subDeviceName = s.split("  ")[-1]
-        devID = "%s:%s" % (subVendorID, subDeviceID)
-        self.subdevices[devID] = SubDevice(
-            subVendorID, subDeviceID, subDeviceName)
-
-    def find_subid(self, subven, subdev):
-        subven = hex(subven)[2:]
-        subdev = hex(subdev)[2:]
-        devid = "%s:%s" % (subven, subdev)
-
-        try:
-            return self.subdevices[devid]
-        except:
-            if (subven == "ffff" and subdev == "ffff"):
-                return SubDevice("ffff", "ffff", "(All Subdevices)")
-            return SubDevice(subven, subdev, "(Unknown Subdevice)")
-
-
-class SubDevice:
-    """
-    Class for subdevices.
-    """
-
-    def __init__(self, vendor, device, name):
-        """
-        Class initializes with vendorid, deviceid and name
-        """
-        self.vendorID = vendor
-        self.deviceID = device
-        self.name = name
-
-    def report(self):
-        print("\t\t%s\t%s\t%s" % (self.vendorID, self.deviceID, self.name))
-
-
-class PCIIds:
-    """
-    Top class for all pci.ids entries.
-    All queries will be asked to this class.
-    PCIIds.vendors["0e11"].devices["0046"].\
-    subdevices["0e11:4091"].name  =  "Smart Array 6i"
-    """
-
-    def __init__(self, filename):
-        """
-        Prepares the directories.
-        Checks local data file.
-        Tries to load from local, if not found, downloads from web
-        """
-        self.version = ""
-        self.date = ""
-        self.vendors = {}
-        self.contents = None
-        self.readLocal(filename)
-        self.parse()
-
-    def reportVendors(self):
-        """Reports the vendors
-        """
-        for vid, v in self.vendors.items():
-            print(v.ID, v.name)
-
-    def report(self, vendor=None):
-        """
-        Reports everything for all vendors or a specific vendor
-        PCIIds.report()  reports everything
-        PCIIDs.report("0e11") reports only "Compaq Computer Corporation"
-        """
-        if vendor is not None:
-            self.vendors[vendor].report()
-        else:
-            for vID, v in self.vendors.items():
-                v.report()
-
-    def find_vendor(self, vid):
-        # convert vid to a hex string and remove the 0x
-        vid = hex(vid)[2:]
-
-        try:
-            return self.vendors[vid]
-        except:
-            return Vendor("%s Unknown Vendor" % (vid))
-
-    def findDate(self, content):
-        for l in content:
-            if l.find("Date:") > -1:
-                return l.split()[-2].replace("-", "")
-        return None
-
-    def parse(self):
-        if not self.contents:
-            print("data/%s-pci.ids not found" % self.date)
-        else:
-            vendorID = ""
-            deviceID = ""
-            for l in self.contents:
-                if l[0] == "#":
-                    continue
-                elif not l.strip():
-                    continue
-                else:
-                    if l.find("\t\t") == 0:
-                        self.vendors[vendorID].devices[
-                            deviceID].addSubDevice(l)
-                    elif l.find("\t") == 0:
-                        deviceID = l.strip().split()[0]
-                        self.vendors[vendorID].addDevice(l)
-                    else:
-                        vendorID = l.split()[0]
-                        self.vendors[vendorID] = Vendor(l)
-
-    def readLocal(self, filename):
-        """
-        Reads the local file
-        """
-        with open(filename, 'r', encoding='utf-8') as f:
-            self.contents = f.readlines()
-        self.date = self.findDate(self.contents)
-
-    def loadLocal(self):
-        """
-        Loads database from local. If there is no file,
-        it creates a new one from web
-        """
-        self.date = idsfile[0].split("/")[1].split("-")[0]
-        self.readLocal()
-
-
-# =======================================
-
-def search_file(filename, search_path):
-    """ Given a search path, find file with requested name """
-    for path in search_path.split(':'):
-        candidate = os.path.join(path, filename)
-        if os.path.exists(candidate):
-            return os.path.abspath(candidate)
-    return None
-
-
-class ReadElf(object):
-    """ display_* methods are used to emit output into the output stream
-    """
-
-    def __init__(self, file, output):
-        """ file:
-                stream object with the ELF file to read
-
-            output:
-                output stream to write to
-        """
-        self.elffile = ELFFile(file)
-        self.output = output
-
-        # Lazily initialized if a debug dump is requested
-        self._dwarfinfo = None
-
-        self._versioninfo = None
-
-    def _section_from_spec(self, spec):
-        """ Retrieve a section given a "spec" (either number or name).
-            Return None if no such section exists in the file.
-        """
-        try:
-            num = int(spec)
-            if num < self.elffile.num_sections():
-                return self.elffile.get_section(num)
-            return None
-        except ValueError:
-            # Not a number. Must be a name then
-            section = self.elffile.get_section_by_name(force_unicode(spec))
-            if section is None:
-                # No match with a unicode name.
-                # Some versions of pyelftools (<= 0.23) store internal strings
-                # as bytes. Try again with the name encoded as bytes.
-                section = self.elffile.get_section_by_name(force_bytes(spec))
-            return section
-
-    def pretty_print_pmdinfo(self, pmdinfo):
-        global pcidb
-
-        for i in pmdinfo["pci_ids"]:
-            vendor = pcidb.find_vendor(i[0])
-            device = vendor.find_device(i[1])
-            subdev = device.find_subid(i[2], i[3])
-            print("%s (%s) : %s (%s) %s" %
-                  (vendor.name, vendor.ID, device.name,
-                   device.ID, subdev.name))
-
-    def parse_pmd_info_string(self, mystring):
-        global raw_output
-        global pcidb
-
-        optional_pmd_info = [
-            {'id': 'params', 'tag': 'PMD PARAMETERS'},
-            {'id': 'kmod', 'tag': 'PMD KMOD DEPENDENCIES'}
-        ]
-
-        i = mystring.index("=")
-        mystring = mystring[i + 2:]
-        pmdinfo = json.loads(mystring)
-
-        if raw_output:
-            print(json.dumps(pmdinfo))
-            return
-
-        print("PMD NAME: " + pmdinfo["name"])
-        for i in optional_pmd_info:
-            try:
-                print("%s: %s" % (i['tag'], pmdinfo[i['id']]))
-            except KeyError:
-                continue
-
-        if pmdinfo["pci_ids"]:
-            print("PMD HW SUPPORT:")
-            if pcidb is not None:
-                self.pretty_print_pmdinfo(pmdinfo)
-            else:
-                print("VENDOR\t DEVICE\t SUBVENDOR\t SUBDEVICE")
-                for i in pmdinfo["pci_ids"]:
-                    print("0x%04x\t 0x%04x\t 0x%04x\t\t 0x%04x" %
-                          (i[0], i[1], i[2], i[3]))
-
-        print("")
-
-    def display_pmd_info_strings(self, section_spec):
-        """ Display a strings dump of a section. section_spec is either a
-            section number or a name.
-        """
-        section = self._section_from_spec(section_spec)
-        if section is None:
-            return
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("PMD_INFO_STRING")
-            if rc != -1:
-                self.parse_pmd_info_string(mystring[rc:])
-
-            dataptr = endptr
-
-    def find_librte_eal(self, section):
-        for tag in section.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if "librte_eal" in force_unicode(tag.needed):
-                    return force_unicode(tag.needed)
-        return None
-
-    def search_for_autoload_path(self):
-        scanelf = self
-        scanfile = None
-        library = None
-
-        section = self._section_from_spec(".dynamic")
-        try:
-            eallib = self.find_librte_eal(section)
-            if eallib is not None:
-                ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-                if ldlibpath is None:
-                    ldlibpath = ""
-                dtr = self.get_dt_runpath(section)
-                library = search_file(eallib,
-                                      dtr + ":" + ldlibpath +
-                                      ":/usr/lib64:/lib64:/usr/lib:/lib")
-                if library is None:
-                    return (None, None)
-                if not raw_output:
-                    print("Scanning for autoload path in %s" % library)
-                scanfile = open(library, 'rb')
-                scanelf = ReadElf(scanfile, sys.stdout)
-        except AttributeError:
-            # Not a dynamic binary
-            pass
-        except ELFError:
-            scanfile.close()
-            return (None, None)
-
-        section = scanelf._section_from_spec(".rodata")
-        if section is None:
-            if scanfile is not None:
-                scanfile.close()
-            return (None, None)
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("DPDK_PLUGIN_PATH")
-            if rc != -1:
-                rc = mystring.find("=")
-                return (mystring[rc + 1:], library)
-
-            dataptr = endptr
-        if scanfile is not None:
-            scanfile.close()
-        return (None, None)
-
-    def get_dt_runpath(self, dynsec):
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_RUNPATH':
-                return force_unicode(tag.runpath)
-        return ""
-
-    def process_dt_needed_entries(self):
-        """ Look to see if there are any DT_NEEDED entries in the binary
-            And process those if there are
-        """
-        runpath = ""
-        ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-        if ldlibpath is None:
-            ldlibpath = ""
-
-        dynsec = self._section_from_spec(".dynamic")
-        try:
-            runpath = self.get_dt_runpath(dynsec)
-        except AttributeError:
-            # dynsec is None, just return
-            return
-
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if 'librte_' in force_unicode(tag.needed):
-                    library = search_file(force_unicode(tag.needed),
-                                          runpath + ":" + ldlibpath +
-                                          ":/usr/lib64:/lib64:/usr/lib:/lib")
-                    if library is not None:
-                        with open(library, 'rb') as file:
-                            try:
-                                libelf = ReadElf(file, sys.stdout)
-                            except ELFError:
-                                print("%s is no an ELF file" % library)
-                                continue
-                            libelf.process_dt_needed_entries()
-                            libelf.display_pmd_info_strings(".rodata")
-                            file.close()
-
-
-# compat: remove force_unicode & force_bytes when pyelftools<=0.23 support is
-# dropped.
-def force_unicode(s):
-    if hasattr(s, 'decode') and callable(s.decode):
-        s = s.decode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def force_bytes(s):
-    if hasattr(s, 'encode') and callable(s.encode):
-        s = s.encode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def scan_autoload_path(autoload_path):
-    global raw_output
-
-    if not os.path.exists(autoload_path):
-        return
 
+# ----------------------------------------------------------------------------
+def main() -> int:  # pylint: disable=missing-docstring
     try:
-        dirs = os.listdir(autoload_path)
-    except OSError:
-        # Couldn't read the directory, give up
-        return
+        args = parse_args()
+        info = parse_pmdinfo(args.elf_files, args.search_plugins)
+        json.dump(info, sys.stdout, indent=2)
+        sys.stdout.write("\n")
+    except BrokenPipeError:
+        pass
+    except KeyboardInterrupt:
+        return 1
+    except Exception as e:  # pylint: disable=broad-except
+        print(f"error: {e}", file=sys.stderr)
+        return 1
 
-    for d in dirs:
-        dpath = os.path.join(autoload_path, d)
-        if os.path.isdir(dpath):
-            scan_autoload_path(dpath)
-        if os.path.isfile(dpath):
-            try:
-                file = open(dpath, 'rb')
-                readelf = ReadElf(file, sys.stdout)
-            except ELFError:
-                # this is likely not an elf file, skip it
-                continue
-            except IOError:
-                # No permission to read the file, skip it
-                continue
-
-            if not raw_output:
-                print("Hw Support for library %s" % d)
-            readelf.display_pmd_info_strings(".rodata")
-            file.close()
+    return 0
 
 
-def scan_for_autoload_pmds(dpdk_path):
+# ----------------------------------------------------------------------------
+def parse_args() -> argparse.Namespace:
     """
-    search the specified application or path for a pmd autoload path
-    then scan said path for pmds and report hw support
+    Parse command line arguments.
     """
-    global raw_output
-
-    if not os.path.isfile(dpdk_path):
-        if not raw_output:
-            print("Must specify a file name")
-        return
-
-    file = open(dpdk_path, 'rb')
-    try:
-        readelf = ReadElf(file, sys.stdout)
-    except ElfError:
-        if not raw_output:
-            print("Unable to parse %s" % file)
-        return
-
-    (autoload_path, scannedfile) = readelf.search_for_autoload_path()
-    if not autoload_path:
-        if not raw_output:
-            print("No autoload path configured in %s" % dpdk_path)
-        return
-    if not raw_output:
-        if scannedfile is None:
-            scannedfile = dpdk_path
-        print("Found autoload path %s in %s" % (autoload_path, scannedfile))
-
-    file.close()
-    if not raw_output:
-        print("Discovered Autoload HW Support:")
-    scan_autoload_path(autoload_path)
-    return
-
-
-def main(stream=None):
-    global raw_output
-    global pcidb
-
-    pcifile_default = "./pci.ids"  # For unknown OS's assume local file
-    if platform.system() == 'Linux':
-        # hwdata is the legacy location, misc is supported going forward
-        pcifile_default = "/usr/share/misc/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/hwdata/pci.ids"
-    elif platform.system() == 'FreeBSD':
-        pcifile_default = "/usr/local/share/pciids/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/misc/pci_vendors"
-
     parser = argparse.ArgumentParser(
-        usage='usage: %(prog)s [-hrtp] [-d <pci id file>] elf_file',
-        description="Dump pmd hardware support info")
-    group = parser.add_mutually_exclusive_group()
-    group.add_argument('-r', '--raw',
-                       action='store_true', dest='raw_output',
-                       help='dump raw json strings')
-    group.add_argument("-t", "--table", dest="tblout",
-                       help="output information on hw support as a hex table",
-                       action='store_true')
-    parser.add_argument("-d", "--pcidb", dest="pcifile",
-                        help="specify a pci database to get vendor names from",
-                        default=pcifile_default, metavar="FILE")
-    parser.add_argument("-p", "--plugindir", dest="pdir",
-                        help="scan dpdk for autoload plugins",
-                        action='store_true')
-    parser.add_argument("elf_file", help="driver shared object file")
-    args = parser.parse_args()
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "-p",
+        "--search-plugins",
+        action="store_true",
+        help="""
+        In addition to ELF_FILEs and their linked dynamic libraries, also scan
+        the DPDK plugins path.
+        """,
+    )
+    parser.add_argument(
+        "elf_files",
+        metavar="ELF_FILE",
+        nargs="+",
+        type=existing_file,
+        help="""
+        DPDK application binary or dynamic library.
+        """,
+    )
+    return parser.parse_args()
 
-    if args.raw_output:
-        raw_output = True
 
-    if args.tblout:
-        args.pcifile = None
+# ----------------------------------------------------------------------------
+def parse_pmdinfo(paths: Iterable[Path], search_plugins: bool) -> List[dict]:
+    """
+    Extract DPDK PMD info JSON strings from the given ELF files.
 
-    if args.pcifile:
-        pcidb = PCIIds(args.pcifile)
-        if pcidb is None:
-            print("Pci DB file not found")
-            exit(1)
+    :returns:
+        A list of DPDK drivers info dictionaries.
+    """
+    binaries = set(paths)
+    for p in paths:
+        binaries.update(get_needed_libs(p))
+    if search_plugins:
+        # cast to list to avoid errors with update while iterating
+        binaries.update(list(get_plugin_libs(binaries)))
 
-    if args.pdir:
-        exit(scan_for_autoload_pmds(args.elf_file))
+    drivers = []
 
-    ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-    if ldlibpath is None:
-        ldlibpath = ""
-
-    if os.path.exists(args.elf_file):
-        myelffile = args.elf_file
-    else:
-        myelffile = search_file(args.elf_file,
-                                ldlibpath + ":/usr/lib64:/lib64:/usr/lib:/lib")
-
-    if myelffile is None:
-        print("File not found")
-        sys.exit(1)
-
-    with open(myelffile, 'rb') as file:
+    for b in binaries:
         try:
-            readelf = ReadElf(file, sys.stdout)
-            readelf.process_dt_needed_entries()
-            readelf.display_pmd_info_strings(".rodata")
-            sys.exit(0)
+            for s in get_elf_strings(b, ".rodata", "PMD_INFO_STRING="):
+                try:
+                    info = json.loads(s)
+                    # convert numerical ids to hex strings
+                    info["devices"] = []
+                    for vendor, device, subdev, subsys in info.pop("pci_ids"):
+                        info["devices"].append(
+                            {
+                                "vendor_id": f"{vendor:04x}",
+                                "device_id": f"{device:04x}",
+                                "subsystem_device_id": f"{subdev:04x}",
+                                "subsystem_system_id": f"{subsys:04x}",
+                            }
+                        )
+                    drivers.append(info)
+                except ValueError as e:
+                    print(f"warning: {b}: {e}", file=sys.stderr)
+        except FileNotFoundError as e:
+            print(f"warning: {b}: {e}", file=sys.stderr)
+        except ELFError as e:
+            print(f"warning: {b}: elf error: {e}", file=sys.stderr)
 
-        except ELFError as ex:
-            sys.stderr.write('ELF error: %s\n' % ex)
-            sys.exit(1)
+    return drivers
 
 
-# -------------------------------------------------------------------------
-if __name__ == '__main__':
-    main()
+# ----------------------------------------------------------------------------
+def get_plugin_libs(binaries: Iterable[Path]) -> Iterator[Path]:
+    """
+    Look into the provided binaries for DPDK_PLUGIN_PATH and scan the path
+    for files.
+    """
+    for b in binaries:
+        for p in get_elf_strings(b, ".rodata", "DPDK_PLUGIN_PATH="):
+            plugin_path = p.strip()
+            for root, _, files in os.walk(plugin_path):
+                for f in files:
+                    yield Path(root) / f
+            # no need to search in other binaries.
+            return
+
+
+# ----------------------------------------------------------------------------
+def existing_file(value: str) -> Path:
+    """
+    Argparse type= callback to ensure an argument points to a valid file path.
+    """
+    path = Path(value)
+    if not path.is_file():
+        raise argparse.ArgumentTypeError(f"{value}: No such file")
+    return path
+
+
+# ----------------------------------------------------------------------------
+def search_ld_library_path(name: str) -> Path:
+    """
+    Search for a file in LD_LIBRARY_PATH and the standard folders where
+    libraries are usually located.
+
+    :raises FileNotFoundError:
+    """
+    folders = []
+    if "LD_LIBRARY_PATH" in os.environ:
+        folders += os.environ["LD_LIBRARY_PATH"].split(":")
+    folders += ["/usr/lib64", "/lib64", "/usr/lib", "/lib"]
+    for d in folders:
+        filepath = Path(d) / name
+        if filepath.is_file():
+            return filepath
+    raise FileNotFoundError(name)
+
+
+# ----------------------------------------------------------------------------
+PRINTABLE_BYTES = frozenset(string.printable.encode("ascii"))
+
+
+def find_strings(buf: bytes, prefix: str) -> Iterator[str]:
+    """
+    Extract strings of printable ASCII characters from a bytes buffer.
+    """
+    view = memoryview(buf)
+    start = None
+
+    for i, b in enumerate(view):
+        if start is None and b in PRINTABLE_BYTES:
+            # mark beginning of string
+            start = i
+            continue
+
+        if start is not None:
+            if b in PRINTABLE_BYTES:
+                # string not finished
+                continue
+            if b == 0:
+                # end of string
+                s = view[start:i].tobytes().decode("ascii")
+                if s.startswith(prefix):
+                    yield s[len(prefix) :]
+            # There can be byte sequences where a non-printable character
+            # follows a printable one. Ignore that.
+            start = None
+
+
+# ----------------------------------------------------------------------------
+def elftools_version():
+    """
+    Extract pyelftools version as a tuple of integers for easy comparison.
+    """
+    version = getattr(elftools, "__version__", "")
+    match = re.match(r"^(\d+)\.(\d+).*$", str(version))
+    if not match:
+        # cannot determine version, hope for the best
+        return (0, 24)
+    return (int(match[1]), int(match[2]))
+
+
+ELFTOOLS_VERSION = elftools_version()
+
+
+def from_elftools(s: Union[bytes, str]) -> str:
+    """
+    Earlier versions of pyelftools (< 0.24) return bytes encoded with "latin-1"
+    instead of python strings.
+    """
+    if isinstance(s, bytes):
+        return s.decode("latin-1")
+    return s
+
+
+def to_elftools(s: str) -> Union[bytes, str]:
+    """
+    Earlier versions of pyelftools (< 0.24) assume that ELF section and tags
+    are bytes encoded with "latin-1" instead of python strings.
+    """
+    if ELFTOOLS_VERSION < (0, 24):
+        return s.encode("latin-1")
+    return s
+
+
+# ----------------------------------------------------------------------------
+def get_elf_strings(path: Path, section: str, prefix: str) -> Iterator[str]:
+    """
+    Extract strings from a named ELF section in a file.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        sec = elf.get_section_by_name(to_elftools(section))
+        if not sec:
+            return
+        yield from find_strings(sec.data(), prefix)
+
+
+# ----------------------------------------------------------------------------
+def get_needed_libs(path: Path) -> Iterator[Path]:
+    """
+    Extract the dynamic library dependencies from an ELF file.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        dyn = elf.get_section_by_name(to_elftools(".dynamic"))
+        if not dyn:
+            return
+        for tag in dyn.iter_tags(to_elftools("DT_NEEDED")):
+            needed = from_elftools(tag.needed)
+            if not needed.startswith("librte_"):
+                continue
+            try:
+                yield search_ld_library_path(needed)
+            except FileNotFoundError:
+                print(f"warning: cannot find {needed}", file=sys.stderr)
+
+
+# ----------------------------------------------------------------------------
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.37.3


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] usertools: rewrite pmdinfo
  2022-09-13 10:58 [PATCH] usertools: rewrite pmdinfo Robin Jarry
@ 2022-09-13 11:29 ` Ferruh Yigit
  2022-09-13 11:49   ` Robin Jarry
  2022-09-13 14:17 ` Bruce Richardson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-13 11:29 UTC (permalink / raw)
  To: Robin Jarry; +Cc: Olivier Matz, dev

On 9/13/2022 11:58 AM, Robin Jarry wrote:
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> Here are some examples of use with jq:
> 
> Get the complete info for a given driver:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.name == "dmadev_idxd_pci")'
>   {
>     "name": "dmadev_idxd_pci",
>     "params": "max_queues=0",
>     "kmod": "vfio-pci",
>     "devices": [
>       {
>         "vendor_id": "8086",
>         "device_id": "0b25",
>         "subsystem_device_id": "ffff",
>         "subsystem_system_id": "ffff"
>       }
>     ]
>   }
> 
> Get only the required kernel modules for a given driver:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.name == "net_i40e").kmod'
>   "* igb_uio | uio_pci_generic | vfio-pci"
> 
> Get only the required kernel modules for a given device:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.devices[] | .vendor_id == "15b3" and .device_id == "1013").kmod'
>   "* ib_uverbs & mlx5_core & mlx5_ib"
> 
> Print the list of drivers which define multiple parameters without
> string separators:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
>   ...
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                               pyelftools
>                 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
>           3.6     ok   ok   ok   ok   ok   ok   ok   ok
>           3.7     ok   ok   ok   ok   ok   ok   ok   ok
>    Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
>           3.9     ok   ok   ok   ok   ok   ok   ok   ok
>           3.10  fail fail fail fail   ok   ok   ok   ok
> 
> All failures with python 3.10 are related to the same issue:
> 
>    File "elftools/construct/lib/container.py", line 5, in <module>
>      from collections import MutableMapping
>    ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26.
> 
> NB: The output produced by the legacy -r/--raw flag can be obtained with
> the following command:
> 
>    strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Signed-off-by: Robin Jarry <robin@jarry.cc>

Hi Robin,

Thanks for the work.

One of the major use cases of the script is to get information from binary
drivers, so the script is intended to be run on drivers more than on
applications (dpdk-testpmd).

When I run it on one of the .so drivers, it generates some warnings
[1]. Is this expected?

[1]
$ ./usertools/dpdk-pmdinfo.py ./build/drivers/librte_net_ixgbe.so

warning: cannot find librte_ethdev.so.23
warning: cannot find librte_eal.so.23
warning: cannot find librte_kvargs.so.23
warning: cannot find librte_telemetry.so.23
warning: cannot find librte_net.so.23
warning: cannot find librte_mbuf.so.23
warning: cannot find librte_mempool.so.23
warning: cannot find librte_ring.so.23
warning: cannot find librte_meter.so.23
warning: cannot find librte_bus_pci.so.23
warning: cannot find librte_pci.so.23
warning: cannot find librte_bus_vdev.so.23
warning: cannot find librte_hash.so.23
warning: cannot find librte_rcu.so.23
warning: cannot find librte_security.so.23
warning: cannot find librte_cryptodev.so.23
[
   {
     "name": "net_ixgbe_vf",
     "params": "pflink_fullchk=<0|1>",
     "kmod": "* igb_uio | vfio-pci",
     "devices": [
...
...



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] usertools: rewrite pmdinfo
  2022-09-13 11:29 ` Ferruh Yigit
@ 2022-09-13 11:49   ` Robin Jarry
  2022-09-13 13:50     ` Ferruh Yigit
  0 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-09-13 11:49 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Olivier Matz, dev

Ferruh Yigit, Sep 13, 2022 at 13:29:
> Hi Robin,
>
> Thanks for the work.
>
> One of the major use cases of the script is to get information from binary
> drivers, so the script is intended to be run on drivers more than on
> applications (dpdk-testpmd).
>
> When I run it on one of the .so drivers, it generates some warnings
> [1]. Is this expected?
>
> [1]
> $ ./usertools/dpdk-pmdinfo.py ./build/drivers/librte_net_ixgbe.so 
>  
>
> warning: cannot find librte_ethdev.so.23
> warning: cannot find librte_eal.so.23
> warning: cannot find librte_kvargs.so.23
> warning: cannot find librte_telemetry.so.23
> warning: cannot find librte_net.so.23
> warning: cannot find librte_mbuf.so.23
> warning: cannot find librte_mempool.so.23
> warning: cannot find librte_ring.so.23
> warning: cannot find librte_meter.so.23
> warning: cannot find librte_bus_pci.so.23
> warning: cannot find librte_pci.so.23
> warning: cannot find librte_bus_vdev.so.23
> warning: cannot find librte_hash.so.23
> warning: cannot find librte_rcu.so.23
> warning: cannot find librte_security.so.23
> warning: cannot find librte_cryptodev.so.23
> [
>    {
>      "name": "net_ixgbe_vf",
>      "params": "pflink_fullchk=<0|1>",
>      "kmod": "* igb_uio | vfio-pci",
>      "devices": [
> ...
> ...

Hi Ferruh,

Yes, it tries to parse all required (DT_NEEDED) dynamic libraries, as the
previous version of the script did. The warnings are displayed when
a needed lib is not found.

You can fix that by exporting LD_LIBRARY_PATH:

$ LD_LIBRARY_PATH=build/lib/:build/drivers/ usertools/dpdk-pmdinfo.py build/drivers/librte_net_ixgbe.so | head
[
  {
    "name": "net_ixgbe_vf",
    "params": "pflink_fullchk=<0|1>",
    "kmod": "* igb_uio | vfio-pci",
    "devices": [
      {
        "vendor_id": "8086",
        "device_id": "10ed",
        "subsystem_device_id": "ffff",
...
...

If the libraries are installed in a standard path, it should not be
necessary to export LD_LIBRARY_PATH.
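
A minimal sketch of the lookup order described above, assuming the same
default directories as the patch. The names find_library, env and
default_dirs are hypothetical and only serve the illustration; the sketch
does not consult ld.so.cache or DT_RUNPATH.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping

# Default directories mirror the list used in the patch. Other distributions
# may install libraries elsewhere, so this is an assumption of the sketch.
DEFAULT_DIRS = ("/usr/lib64", "/lib64", "/usr/lib", "/lib")


def find_library(
    name: str,
    env: Mapping[str, str] = os.environ,
    default_dirs: Iterable[str] = DEFAULT_DIRS,
) -> Path:
    """Return the first matching file, trying LD_LIBRARY_PATH entries first."""
    # Empty entries (e.g. from a trailing ":") are skipped.
    folders = [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]
    folders += list(default_dirs)
    for d in folders:
        candidate = Path(d) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(name)
```

With LD_LIBRARY_PATH=build/lib/:build/drivers/, those two folders are
searched before the default ones, which is why the warnings disappear.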

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] usertools: rewrite pmdinfo
  2022-09-13 11:49   ` Robin Jarry
@ 2022-09-13 13:50     ` Ferruh Yigit
  2022-09-13 13:59       ` Robin Jarry
  0 siblings, 1 reply; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-13 13:50 UTC (permalink / raw)
  To: Robin Jarry; +Cc: Olivier Matz, dev

On 9/13/2022 12:49 PM, Robin Jarry wrote:
> Ferruh Yigit, Sep 13, 2022 at 13:29:
>> Hi Robin,
>>
>> Thanks for the work.
>>
>> One of the major use cases of the script is to get information from binary
>> drivers, so the script is intended to be run on drivers more than on
>> applications (dpdk-testpmd).
>>
>> When I run it on one of the .so drivers, it generates some warnings
>> [1]. Is this expected?
>>
>> [1]
>> $ ./usertools/dpdk-pmdinfo.py ./build/drivers/librte_net_ixgbe.so
>>
>>
>> warning: cannot find librte_ethdev.so.23
>> warning: cannot find librte_eal.so.23
>> warning: cannot find librte_kvargs.so.23
>> warning: cannot find librte_telemetry.so.23
>> warning: cannot find librte_net.so.23
>> warning: cannot find librte_mbuf.so.23
>> warning: cannot find librte_mempool.so.23
>> warning: cannot find librte_ring.so.23
>> warning: cannot find librte_meter.so.23
>> warning: cannot find librte_bus_pci.so.23
>> warning: cannot find librte_pci.so.23
>> warning: cannot find librte_bus_vdev.so.23
>> warning: cannot find librte_hash.so.23
>> warning: cannot find librte_rcu.so.23
>> warning: cannot find librte_security.so.23
>> warning: cannot find librte_cryptodev.so.23
>> [
>>     {
>>       "name": "net_ixgbe_vf",
>>       "params": "pflink_fullchk=<0|1>",
>>       "kmod": "* igb_uio | vfio-pci",
>>       "devices": [
>> ...
>> ...
> 
> Hi Ferruh,
> 
> Yes, it tries to parse all required (DT_NEEDED) dynamic libraries, as the
> previous version of the script did. The warnings are displayed when
> a needed lib is not found.
> 
> You can fix that by exporting LD_LIBRARY_PATH:
> 
> $ LD_LIBRARY_PATH=build/lib/:build/drivers/ usertools/dpdk-pmdinfo.py build/drivers/librte_net_ixgbe.so | head
> [
>    {
>      "name": "net_ixgbe_vf",
>      "params": "pflink_fullchk=<0|1>",
>      "kmod": "* igb_uio | vfio-pci",
>      "devices": [
>        {
>          "vendor_id": "8086",
>          "device_id": "10ed",
>          "subsystem_device_id": "ffff",
> ...
> ...
> 
> If the libraries are installed in a standard path, it should not be
> necessary to export LD_LIBRARY_PATH.

I confirm the warnings are gone when `LD_LIBRARY_PATH` is provided, but why
doesn't the current version require `LD_LIBRARY_PATH`?

Also, would it be possible to parse 'DT_NEEDED' for applications but
ignore it for dynamic libraries?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] usertools: rewrite pmdinfo
  2022-09-13 13:50     ` Ferruh Yigit
@ 2022-09-13 13:59       ` Robin Jarry
  2022-09-13 14:17         ` Ferruh Yigit
  0 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-09-13 13:59 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Olivier Matz, dev

Ferruh Yigit, Sep 13, 2022 at 15:50:
> I confirm the warnings are gone when `LD_LIBRARY_PATH` is provided, but why
> doesn't the current version require `LD_LIBRARY_PATH`?

It does, but I assume no warning is displayed when the required libs are
not found. I could silence the warnings unless a -v/--verbose flag is
specified.

> Also, would it be possible to parse 'DT_NEEDED' for applications but
> ignore it for dynamic libraries?

Yes it is possible. I'll include that in v2.
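
The -v/--verbose gating proposed above could be sketched as follows. This
is an illustration only: the helper names make_parser and warn are
hypothetical, and using a counted flag (so that -v can be repeated) is an
assumption rather than something stated in this message.

```python
import argparse
import sys


def make_parser() -> argparse.ArgumentParser:
    """Build a parser whose -v/--verbose flag counts its occurrences."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-v",
        "--verbose",
        action="count",
        default=0,
        help="show warnings (may be repeated for more detail)",
    )
    return parser


def warn(verbosity: int, message: str) -> bool:
    """Print a warning to stderr only when verbosity is at least 1."""
    if verbosity < 1:
        return False
    print(f"warning: {message}", file=sys.stderr)
    return True
```

Without any flag, warn(args.verbose, ...) stays silent, so the JSON on
stdout is not accompanied by "cannot find" messages on stderr.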

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] usertools: rewrite pmdinfo
  2022-09-13 10:58 [PATCH] usertools: rewrite pmdinfo Robin Jarry
  2022-09-13 11:29 ` Ferruh Yigit
@ 2022-09-13 14:17 ` Bruce Richardson
  2022-09-13 19:42 ` [PATCH v2] " Robin Jarry
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Bruce Richardson @ 2022-09-13 14:17 UTC (permalink / raw)
  To: Robin Jarry; +Cc: dev, Olivier Matz

On Tue, Sep 13, 2022 at 12:58:11PM +0200, Robin Jarry wrote:
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 

Thanks for the rework. Comment inline below.

/Bruce

<snip>

> +    :raises FileNotFoundError:
> +    """
> +    folders = []
> +    if "LD_LIBRARY_PATH" in os.environ:
> +        folders += os.environ["LD_LIBRARY_PATH"].split(":")
> +    folders += ["/usr/lib64", "/lib64", "/usr/lib", "/lib"]

This is a standard set of folders for Red Hat and similar distributions,
but not for Ubuntu and Debian ones, which use e.g.
"/lib/x86_64-linux-gnu". It's also missing path options from /usr/local,
which is a likely location for a compiled and installed DPDK.

Ideally, the script would parse the entries from /etc/ld.so.conf, or
/etc/ld.so.conf.d/* files to get a full list of directories to search on
the system. Frankly, though, that seems like too much work. :-) 
Two ideas I'd suggest, though both involve shelling out to other commands:

* Call "ldd" to get the paths to the relevant libraries for the current
  object, rather than parsing the DT_NEEDED entries from the elf file in
  the script
* Use "ldconfig -p" to get the list of libs and paths for the script and
  use that for matching the DT_NEEDED entries.
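
As a rough sketch of the second idea, the output of "ldconfig -p" can be
turned into a name-to-path mapping and used to resolve DT_NEEDED entries.
The function names below are hypothetical, and the line format assumed in
the regular expression is the one printed by glibc's ldconfig.

```python
import re
import subprocess
from typing import Dict

# Typical "ldconfig -p" entry as printed by glibc (assumed format):
#     libfoo.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libfoo.so.1
LDCONFIG_LINE = re.compile(r"^\s+(\S+)\s+\(.*\)\s+=>\s+(\S+)$")


def parse_ldconfig(output: str) -> Dict[str, str]:
    """Map library names to paths, keeping the first entry for each name."""
    libs: Dict[str, str] = {}
    for line in output.splitlines():
        match = LDCONFIG_LINE.match(line)
        if match:
            libs.setdefault(match[1], match[2])
    return libs


def ldconfig_cache() -> Dict[str, str]:
    """Run "ldconfig -p" and parse its output (needs ldconfig in PATH)."""
    proc = subprocess.run(
        ["ldconfig", "-p"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_ldconfig(proc.stdout)
```

A DT_NEEDED name such as librte_eal.so.23 would then be looked up with
ldconfig_cache().get(name). Libraries that are not in the linker cache
(for instance an uninstalled build tree) would still need LD_LIBRARY_PATH.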


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH] usertools: rewrite pmdinfo
  2022-09-13 13:59       ` Robin Jarry
@ 2022-09-13 14:17         ` Ferruh Yigit
  0 siblings, 0 replies; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-13 14:17 UTC (permalink / raw)
  To: Robin Jarry; +Cc: Olivier Matz, dev

On 9/13/2022 2:59 PM, Robin Jarry wrote:
> 
> Ferruh Yigit, Sep 13, 2022 at 15:50:
>> I confirm the warnings are gone when `LD_LIBRARY_PATH` is provided, but why
>> doesn't the current version require `LD_LIBRARY_PATH`?
> 
> It does, but I assume no warning is displayed when the required libs are
> not found. I could silence the warnings unless a -v/--verbose flag is
> specified.
> 

+1 to silence the warnings by default.

>> Also, would it be possible to parse 'DT_NEEDED' for applications but
>> ignore it for dynamic libraries?
> 
> Yes it is possible. I'll include that in v2.

Cool, thanks.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2] usertools: rewrite pmdinfo
  2022-09-13 10:58 [PATCH] usertools: rewrite pmdinfo Robin Jarry
  2022-09-13 11:29 ` Ferruh Yigit
  2022-09-13 14:17 ` Bruce Richardson
@ 2022-09-13 19:42 ` Robin Jarry
  2022-09-13 20:54   ` Ferruh Yigit
  2022-09-20  9:08 ` [PATCH v3] " Robin Jarry
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-09-13 19:42 UTC (permalink / raw)
  To: dev; +Cc: Robin Jarry, Olivier Matz, Ferruh Yigit, Bruce Richardson

dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
merely prints multiple independent JSON lines which cannot be fed
directly to any JSON parser. Moreover, the script complexity is rather
high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
sections. Rewrite it so that it can produce valid JSON.

Remove the PCI database parsing for PCI-ID to Vendor-Device names
conversion. This should be done by external scripts (if really needed).

Here are some examples of use with jq:

Get the complete info for a given driver:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.name == "dmadev_idxd_pci")'
 {
   "name": "dmadev_idxd_pci",
   "params": "max_queues=0",
   "kmod": "vfio-pci",
   "pci_ids": [
     {
       "vendor": "8086",
       "device": "0b25",
       "subsystem_vendor": "ffff",
       "subsystem_device": "ffff"
     }
   ]
 }

Get only the required kernel modules for a given driver:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.name == "net_i40e").kmod'
 "* igb_uio | uio_pci_generic | vfio-pci"

Get only the required kernel modules for a given device:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
 "* ib_uverbs & mlx5_core & mlx5_ib"

Print the list of drivers which define multiple parameters without
string separators:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
 ...

The script passes flake8, black, isort and pylint checks.

I have tested this with a matrix of python/pyelftools versions:

                             pyelftools
               0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
         3.6     ok   ok   ok   ok   ok   ok   ok   ok
         3.7     ok   ok   ok   ok   ok   ok   ok   ok
  Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
         3.9     ok   ok   ok   ok   ok   ok   ok   ok
         3.10  fail fail fail fail   ok   ok   ok   ok

All failures with python 3.10 are related to the same issue:

  File "elftools/construct/lib/container.py", line 5, in <module>
    from collections import MutableMapping
  ImportError: cannot import name 'MutableMapping' from 'collections'

Python 3.10 support is only available since pyelftools 0.26. The script
will only work with Python 3.6 and later. Update the minimal system
requirements and release notes.

NB: The output produced by the legacy -r/--raw flag can be obtained with
the following command:

  strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'

Cc: Olivier Matz <olivier.matz@6wind.com>
Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
Cc: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
v1 -> v2:

* update release notes and minimal python version requirement
* hide warnings by default (-v/--verbose to show them)
* show debug messages with -vv
* also search libs in folders listed in /etc/ld.so.conf.d/*.conf
* only search for DT_NEEDED on executables, not on dynamic libraries
* take DT_RUNPATH into account for searching libraries
* fix weird broken pipe error
* fix some typos:
    s/begining/beginning/
    s/subsystem_device/subsystem_vendor/
    s/subsystem_system/subsystem_device/
* change field names for pci_ids elements (remove _id suffixes)
* DT_NEEDED of all files are analyzed. There is no way to differentiate
  between dynamically linked executables and dynamic libraries.

 doc/guides/linux_gsg/sys_reqs.rst      |   2 +-
 doc/guides/rel_notes/release_22_11.rst |   5 +
 usertools/dpdk-pmdinfo.py              | 913 +++++++++----------------
 3 files changed, 313 insertions(+), 607 deletions(-)

diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 08d45898f025..f842105eeda7 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -41,7 +41,7 @@ Compilation of the DPDK
    resulting in statically linked applications not being linked properly.
    Use an updated version of ``pkg-config`` or ``pkgconf`` instead when building applications
 
-*   Python 3.5 or later.
+*   Python 3.6 or later.
 
 *   Meson (version 0.49.2+) and ninja
 
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf0505e..67054f5acdc9 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -84,6 +84,11 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
+  PCI-IDs parsing has been removed.
+  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command::
+
+     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
 
 ABI Changes
 -----------
diff --git a/usertools/dpdk-pmdinfo.py b/usertools/dpdk-pmdinfo.py
index 40ef5cec6cba..ac043f5e4ed8 100755
--- a/usertools/dpdk-pmdinfo.py
+++ b/usertools/dpdk-pmdinfo.py
@@ -1,626 +1,327 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2016  Neil Horman <nhorman@tuxdriver.com>
+# Copyright(c) 2022  Robin Jarry
+# pylint: disable=invalid-name
+
+r"""
+Utility to dump PMD_INFO_STRING support from DPDK binaries.
+
+This script prints JSON output to be interpreted by other tools. Here are some
+examples with jq:
+
+Get the complete info for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "cnxk_nix_inl")'
+
+Get only the required kernel modules for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "net_i40e").kmod'
+
+Get only the required kernel modules for a given device:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
+"""
 
-# -------------------------------------------------------------------------
-#
-# Utility to dump PMD_INFO_STRING support from an object file
-#
-# -------------------------------------------------------------------------
-import json
-import os
-import platform
-import sys
 import argparse
-from elftools.common.exceptions import ELFError
-from elftools.common.py3compat import byte2int
-from elftools.elf.elffile import ELFFile
-
-
-# For running from development directory. It should take precedence over the
-# installed pyelftools.
-sys.path.insert(0, '.')
-
-raw_output = False
-pcidb = None
-
-# ===========================================
-
-class Vendor:
-    """
-    Class for vendors. This is the top level class
-    for the devices belong to a specific vendor.
-    self.devices is the device dictionary
-    subdevices are in each device.
-    """
-
-    def __init__(self, vendorStr):
-        """
-        Class initializes with the raw line from pci.ids
-        Parsing takes place inside __init__
-        """
-        self.ID = vendorStr.split()[0]
-        self.name = vendorStr.replace("%s " % self.ID, "").rstrip()
-        self.devices = {}
-
-    def addDevice(self, deviceStr):
-        """
-        Adds a device to self.devices
-        takes the raw line from pci.ids
-        """
-        s = deviceStr.strip()
-        devID = s.split()[0]
-        if devID in self.devices:
-            pass
-        else:
-            self.devices[devID] = Device(deviceStr)
-
-    def report(self):
-        print(self.ID, self.name)
-        for id, dev in self.devices.items():
-            dev.report()
-
-    def find_device(self, devid):
-        # convert to a hex string and remove 0x
-        devid = hex(devid)[2:]
-        try:
-            return self.devices[devid]
-        except:
-            return Device("%s  Unknown Device" % devid)
-
-
-class Device:
-
-    def __init__(self, deviceStr):
-        """
-        Class for each device.
-        Each vendor has its own devices dictionary.
-        """
-        s = deviceStr.strip()
-        self.ID = s.split()[0]
-        self.name = s.replace("%s  " % self.ID, "")
-        self.subdevices = {}
-
-    def report(self):
-        print("\t%s\t%s" % (self.ID, self.name))
-        for subID, subdev in self.subdevices.items():
-            subdev.report()
-
-    def addSubDevice(self, subDeviceStr):
-        """
-        Adds a subvendor, subdevice to device.
-        Uses raw line from pci.ids
-        """
-        s = subDeviceStr.strip()
-        spl = s.split()
-        subVendorID = spl[0]
-        subDeviceID = spl[1]
-        subDeviceName = s.split("  ")[-1]
-        devID = "%s:%s" % (subVendorID, subDeviceID)
-        self.subdevices[devID] = SubDevice(
-            subVendorID, subDeviceID, subDeviceName)
-
-    def find_subid(self, subven, subdev):
-        subven = hex(subven)[2:]
-        subdev = hex(subdev)[2:]
-        devid = "%s:%s" % (subven, subdev)
-
-        try:
-            return self.subdevices[devid]
-        except:
-            if (subven == "ffff" and subdev == "ffff"):
-                return SubDevice("ffff", "ffff", "(All Subdevices)")
-            return SubDevice(subven, subdev, "(Unknown Subdevice)")
-
-
-class SubDevice:
-    """
-    Class for subdevices.
-    """
-
-    def __init__(self, vendor, device, name):
-        """
-        Class initializes with vendorid, deviceid and name
-        """
-        self.vendorID = vendor
-        self.deviceID = device
-        self.name = name
-
-    def report(self):
-        print("\t\t%s\t%s\t%s" % (self.vendorID, self.deviceID, self.name))
-
-
-class PCIIds:
-    """
-    Top class for all pci.ids entries.
-    All queries will be asked to this class.
-    PCIIds.vendors["0e11"].devices["0046"].\
-    subdevices["0e11:4091"].name  =  "Smart Array 6i"
-    """
-
-    def __init__(self, filename):
-        """
-        Prepares the directories.
-        Checks local data file.
-        Tries to load from local, if not found, downloads from web
-        """
-        self.version = ""
-        self.date = ""
-        self.vendors = {}
-        self.contents = None
-        self.readLocal(filename)
-        self.parse()
-
-    def reportVendors(self):
-        """Reports the vendors
-        """
-        for vid, v in self.vendors.items():
-            print(v.ID, v.name)
-
-    def report(self, vendor=None):
-        """
-        Reports everything for all vendors or a specific vendor
-        PCIIds.report()  reports everything
-        PCIIDs.report("0e11") reports only "Compaq Computer Corporation"
-        """
-        if vendor is not None:
-            self.vendors[vendor].report()
-        else:
-            for vID, v in self.vendors.items():
-                v.report()
-
-    def find_vendor(self, vid):
-        # convert vid to a hex string and remove the 0x
-        vid = hex(vid)[2:]
-
-        try:
-            return self.vendors[vid]
-        except:
-            return Vendor("%s Unknown Vendor" % (vid))
-
-    def findDate(self, content):
-        for l in content:
-            if l.find("Date:") > -1:
-                return l.split()[-2].replace("-", "")
-        return None
-
-    def parse(self):
-        if not self.contents:
-            print("data/%s-pci.ids not found" % self.date)
-        else:
-            vendorID = ""
-            deviceID = ""
-            for l in self.contents:
-                if l[0] == "#":
-                    continue
-                elif not l.strip():
-                    continue
-                else:
-                    if l.find("\t\t") == 0:
-                        self.vendors[vendorID].devices[
-                            deviceID].addSubDevice(l)
-                    elif l.find("\t") == 0:
-                        deviceID = l.strip().split()[0]
-                        self.vendors[vendorID].addDevice(l)
-                    else:
-                        vendorID = l.split()[0]
-                        self.vendors[vendorID] = Vendor(l)
-
-    def readLocal(self, filename):
-        """
-        Reads the local file
-        """
-        with open(filename, 'r', encoding='utf-8') as f:
-            self.contents = f.readlines()
-        self.date = self.findDate(self.contents)
-
-    def loadLocal(self):
-        """
-        Loads database from local. If there is no file,
-        it creates a new one from web
-        """
-        self.date = idsfile[0].split("/")[1].split("-")[0]
-        self.readLocal()
-
-
-# =======================================
-
-def search_file(filename, search_path):
-    """ Given a search path, find file with requested name """
-    for path in search_path.split(':'):
-        candidate = os.path.join(path, filename)
-        if os.path.exists(candidate):
-            return os.path.abspath(candidate)
-    return None
-
-
-class ReadElf(object):
-    """ display_* methods are used to emit output into the output stream
-    """
-
-    def __init__(self, file, output):
-        """ file:
-                stream object with the ELF file to read
-
-            output:
-                output stream to write to
-        """
-        self.elffile = ELFFile(file)
-        self.output = output
-
-        # Lazily initialized if a debug dump is requested
-        self._dwarfinfo = None
-
-        self._versioninfo = None
-
-    def _section_from_spec(self, spec):
-        """ Retrieve a section given a "spec" (either number or name).
-            Return None if no such section exists in the file.
-        """
-        try:
-            num = int(spec)
-            if num < self.elffile.num_sections():
-                return self.elffile.get_section(num)
-            return None
-        except ValueError:
-            # Not a number. Must be a name then
-            section = self.elffile.get_section_by_name(force_unicode(spec))
-            if section is None:
-                # No match with a unicode name.
-                # Some versions of pyelftools (<= 0.23) store internal strings
-                # as bytes. Try again with the name encoded as bytes.
-                section = self.elffile.get_section_by_name(force_bytes(spec))
-            return section
-
-    def pretty_print_pmdinfo(self, pmdinfo):
-        global pcidb
-
-        for i in pmdinfo["pci_ids"]:
-            vendor = pcidb.find_vendor(i[0])
-            device = vendor.find_device(i[1])
-            subdev = device.find_subid(i[2], i[3])
-            print("%s (%s) : %s (%s) %s" %
-                  (vendor.name, vendor.ID, device.name,
-                   device.ID, subdev.name))
-
-    def parse_pmd_info_string(self, mystring):
-        global raw_output
-        global pcidb
-
-        optional_pmd_info = [
-            {'id': 'params', 'tag': 'PMD PARAMETERS'},
-            {'id': 'kmod', 'tag': 'PMD KMOD DEPENDENCIES'}
-        ]
-
-        i = mystring.index("=")
-        mystring = mystring[i + 2:]
-        pmdinfo = json.loads(mystring)
-
-        if raw_output:
-            print(json.dumps(pmdinfo))
-            return
-
-        print("PMD NAME: " + pmdinfo["name"])
-        for i in optional_pmd_info:
-            try:
-                print("%s: %s" % (i['tag'], pmdinfo[i['id']]))
-            except KeyError:
-                continue
-
-        if pmdinfo["pci_ids"]:
-            print("PMD HW SUPPORT:")
-            if pcidb is not None:
-                self.pretty_print_pmdinfo(pmdinfo)
-            else:
-                print("VENDOR\t DEVICE\t SUBVENDOR\t SUBDEVICE")
-                for i in pmdinfo["pci_ids"]:
-                    print("0x%04x\t 0x%04x\t 0x%04x\t\t 0x%04x" %
-                          (i[0], i[1], i[2], i[3]))
-
-        print("")
-
-    def display_pmd_info_strings(self, section_spec):
-        """ Display a strings dump of a section. section_spec is either a
-            section number or a name.
-        """
-        section = self._section_from_spec(section_spec)
-        if section is None:
-            return
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("PMD_INFO_STRING")
-            if rc != -1:
-                self.parse_pmd_info_string(mystring[rc:])
-
-            dataptr = endptr
-
-    def find_librte_eal(self, section):
-        for tag in section.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if "librte_eal" in force_unicode(tag.needed):
-                    return force_unicode(tag.needed)
-        return None
-
-    def search_for_autoload_path(self):
-        scanelf = self
-        scanfile = None
-        library = None
-
-        section = self._section_from_spec(".dynamic")
-        try:
-            eallib = self.find_librte_eal(section)
-            if eallib is not None:
-                ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-                if ldlibpath is None:
-                    ldlibpath = ""
-                dtr = self.get_dt_runpath(section)
-                library = search_file(eallib,
-                                      dtr + ":" + ldlibpath +
-                                      ":/usr/lib64:/lib64:/usr/lib:/lib")
-                if library is None:
-                    return (None, None)
-                if not raw_output:
-                    print("Scanning for autoload path in %s" % library)
-                scanfile = open(library, 'rb')
-                scanelf = ReadElf(scanfile, sys.stdout)
-        except AttributeError:
-            # Not a dynamic binary
-            pass
-        except ELFError:
-            scanfile.close()
-            return (None, None)
-
-        section = scanelf._section_from_spec(".rodata")
-        if section is None:
-            if scanfile is not None:
-                scanfile.close()
-            return (None, None)
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("DPDK_PLUGIN_PATH")
-            if rc != -1:
-                rc = mystring.find("=")
-                return (mystring[rc + 1:], library)
-
-            dataptr = endptr
-        if scanfile is not None:
-            scanfile.close()
-        return (None, None)
-
-    def get_dt_runpath(self, dynsec):
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_RUNPATH':
-                return force_unicode(tag.runpath)
-        return ""
-
-    def process_dt_needed_entries(self):
-        """ Look to see if there are any DT_NEEDED entries in the binary
-            And process those if there are
-        """
-        runpath = ""
-        ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-        if ldlibpath is None:
-            ldlibpath = ""
-
-        dynsec = self._section_from_spec(".dynamic")
-        try:
-            runpath = self.get_dt_runpath(dynsec)
-        except AttributeError:
-            # dynsec is None, just return
-            return
-
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if 'librte_' in force_unicode(tag.needed):
-                    library = search_file(force_unicode(tag.needed),
-                                          runpath + ":" + ldlibpath +
-                                          ":/usr/lib64:/lib64:/usr/lib:/lib")
-                    if library is not None:
-                        with open(library, 'rb') as file:
-                            try:
-                                libelf = ReadElf(file, sys.stdout)
-                            except ELFError:
-                                print("%s is no an ELF file" % library)
-                                continue
-                            libelf.process_dt_needed_entries()
-                            libelf.display_pmd_info_strings(".rodata")
-                            file.close()
-
-
-# compat: remove force_unicode & force_bytes when pyelftools<=0.23 support is
-# dropped.
-def force_unicode(s):
-    if hasattr(s, 'decode') and callable(s.decode):
-        s = s.decode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def force_bytes(s):
-    if hasattr(s, 'encode') and callable(s.encode):
-        s = s.encode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def scan_autoload_path(autoload_path):
-    global raw_output
-
-    if not os.path.exists(autoload_path):
-        return
-
+import glob
+import json
+import logging
+import os
+import re
+import string
+import sys
+from pathlib import Path
+from typing import Iterable, Iterator, List, Union
+
+import elftools
+from elftools.elf.elffile import ELFError, ELFFile
+
+
+# ----------------------------------------------------------------------------
+def main() -> int:  # pylint: disable=missing-docstring
     try:
-        dirs = os.listdir(autoload_path)
-    except OSError:
-        # Couldn't read the directory, give up
-        return
-
-    for d in dirs:
-        dpath = os.path.join(autoload_path, d)
-        if os.path.isdir(dpath):
-            scan_autoload_path(dpath)
-        if os.path.isfile(dpath):
-            try:
-                file = open(dpath, 'rb')
-                readelf = ReadElf(file, sys.stdout)
-            except ELFError:
-                # this is likely not an elf file, skip it
-                continue
-            except IOError:
-                # No permission to read the file, skip it
-                continue
-
-            if not raw_output:
-                print("Hw Support for library %s" % d)
-            readelf.display_pmd_info_strings(".rodata")
-            file.close()
+        args = parse_args()
+        logging.basicConfig(
+            stream=sys.stderr,
+            format="%(levelname)s: %(message)s",
+            level={
+                0: logging.ERROR,
+                1: logging.WARNING,
+            }.get(args.verbose, logging.DEBUG),
+        )
+        info = parse_pmdinfo(args.elf_files, args.search_plugins)
+        print(json.dumps(info, indent=2))
+    except BrokenPipeError:
+        pass
+    except KeyboardInterrupt:
+        return 1
+    except Exception as e:  # pylint: disable=broad-except
+        logging.error("%s", e)
+        return 1
+    return 0
 
 
-def scan_for_autoload_pmds(dpdk_path):
+# ----------------------------------------------------------------------------
+def parse_args() -> argparse.Namespace:
     """
-    search the specified application or path for a pmd autoload path
-    then scan said path for pmds and report hw support
+    Parse command line arguments.
     """
-    global raw_output
-
-    if not os.path.isfile(dpdk_path):
-        if not raw_output:
-            print("Must specify a file name")
-        return
-
-    file = open(dpdk_path, 'rb')
-    try:
-        readelf = ReadElf(file, sys.stdout)
-    except ElfError:
-        if not raw_output:
-            print("Unable to parse %s" % file)
-        return
-
-    (autoload_path, scannedfile) = readelf.search_for_autoload_path()
-    if not autoload_path:
-        if not raw_output:
-            print("No autoload path configured in %s" % dpdk_path)
-        return
-    if not raw_output:
-        if scannedfile is None:
-            scannedfile = dpdk_path
-        print("Found autoload path %s in %s" % (autoload_path, scannedfile))
-
-    file.close()
-    if not raw_output:
-        print("Discovered Autoload HW Support:")
-    scan_autoload_path(autoload_path)
-    return
-
-
-def main(stream=None):
-    global raw_output
-    global pcidb
-
-    pcifile_default = "./pci.ids"  # For unknown OS's assume local file
-    if platform.system() == 'Linux':
-        # hwdata is the legacy location, misc is supported going forward
-        pcifile_default = "/usr/share/misc/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/hwdata/pci.ids"
-    elif platform.system() == 'FreeBSD':
-        pcifile_default = "/usr/local/share/pciids/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/misc/pci_vendors"
-
     parser = argparse.ArgumentParser(
-        usage='usage: %(prog)s [-hrtp] [-d <pci id file>] elf_file',
-        description="Dump pmd hardware support info")
-    group = parser.add_mutually_exclusive_group()
-    group.add_argument('-r', '--raw',
-                       action='store_true', dest='raw_output',
-                       help='dump raw json strings')
-    group.add_argument("-t", "--table", dest="tblout",
-                       help="output information on hw support as a hex table",
-                       action='store_true')
-    parser.add_argument("-d", "--pcidb", dest="pcifile",
-                        help="specify a pci database to get vendor names from",
-                        default=pcifile_default, metavar="FILE")
-    parser.add_argument("-p", "--plugindir", dest="pdir",
-                        help="scan dpdk for autoload plugins",
-                        action='store_true')
-    parser.add_argument("elf_file", help="driver shared object file")
-    args = parser.parse_args()
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "-p",
+        "--search-plugins",
+        action="store_true",
+        help="""
+        In addition to ELF_FILEs and their linked dynamic libraries, also scan
+        the DPDK plugins path.
+        """,
+    )
+    parser.add_argument(
+        "-v",
+        "--verbose",
+        action="count",
+        default=0,
+        help="""
+        Display warnings due to linked libraries not found or ELF/JSON parsing
+        errors in these libraries. Use twice to show debug messages.
+        """,
+    )
+    parser.add_argument(
+        "elf_files",
+        metavar="ELF_FILE",
+        nargs="+",
+        type=existing_file,
+        help="""
+        DPDK application binary or dynamic library.
+        """,
+    )
+    return parser.parse_args()
 
-    if args.raw_output:
-        raw_output = True
 
-    if args.tblout:
-        args.pcifile = None
+# ----------------------------------------------------------------------------
+def parse_pmdinfo(paths: Iterable[Path], search_plugins: bool) -> List[dict]:
+    """
+    Extract DPDK PMD info JSON strings from an ELF file.
 
-    if args.pcifile:
-        pcidb = PCIIds(args.pcifile)
-        if pcidb is None:
-            print("Pci DB file not found")
-            exit(1)
+    :returns:
+        A list of DPDK drivers info dictionaries.
+    """
+    binaries = set(paths)
+    for p in paths:
+        binaries.update(get_needed_libs(p))
+    if search_plugins:
+        # cast to list to avoid errors with update while iterating
+        binaries.update(list(get_plugin_libs(binaries)))
 
-    if args.pdir:
-        exit(scan_for_autoload_pmds(args.elf_file))
+    drivers = []
 
-    ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-    if ldlibpath is None:
-        ldlibpath = ""
-
-    if os.path.exists(args.elf_file):
-        myelffile = args.elf_file
-    else:
-        myelffile = search_file(args.elf_file,
-                                ldlibpath + ":/usr/lib64:/lib64:/usr/lib:/lib")
-
-    if myelffile is None:
-        print("File not found")
-        sys.exit(1)
-
-    with open(myelffile, 'rb') as file:
+    for b in binaries:
+        logging.debug("analyzing %s", b)
         try:
-            readelf = ReadElf(file, sys.stdout)
-            readelf.process_dt_needed_entries()
-            readelf.display_pmd_info_strings(".rodata")
-            sys.exit(0)
+            for s in get_elf_strings(b, ".rodata", "PMD_INFO_STRING="):
+                try:
+                    info = json.loads(s)
+                    # convert numerical ids to hex strings
+                    pci_ids = []
+                    for vendor, device, subven, subdev in info.pop("pci_ids"):
+                        pci_ids.append(
+                            {
+                                "vendor": f"{vendor:04x}",
+                                "device": f"{device:04x}",
+                                "subsystem_vendor": f"{subven:04x}",
+                                "subsystem_device": f"{subdev:04x}",
+                            }
+                        )
+                    info["pci_ids"] = pci_ids
+                    drivers.append(info)
+                except ValueError as e:
+                    # invalid JSON, should never happen
+                    logging.warning("%s: %s", b, e)
+        except ELFError as e:
+            # only happens for discovered plugins that are not ELF
+            logging.debug("%s: cannot parse ELF: %s", b, e)
 
-        except ELFError as ex:
-            sys.stderr.write('ELF error: %s\n' % ex)
-            sys.exit(1)
+    return drivers
 
 
-# -------------------------------------------------------------------------
-if __name__ == '__main__':
-    main()
+# ----------------------------------------------------------------------------
+def get_plugin_libs(binaries: Iterable[Path]) -> Iterator[Path]:
+    """
+    Look into the provided binaries for DPDK_PLUGIN_PATH and scan the path
+    for files.
+    """
+    for b in binaries:
+        for p in get_elf_strings(b, ".rodata", "DPDK_PLUGIN_PATH="):
+            plugin_path = p.strip()
+            logging.debug("discovering plugins in %s", plugin_path)
+            for root, _, files in os.walk(plugin_path):
+                for f in files:
+                    yield Path(root) / f
+            # no need to search in other binaries.
+            return
+
+
+# ----------------------------------------------------------------------------
+def existing_file(value: str) -> Path:
+    """
+    Argparse type= callback to ensure an argument points to a valid file path.
+    """
+    path = Path(value)
+    if not path.is_file():
+        raise argparse.ArgumentTypeError(f"{value}: No such file")
+    return path
+
+
+# ----------------------------------------------------------------------------
+PRINTABLE_BYTES = frozenset(string.printable.encode("ascii"))
+
+
+def find_strings(buf: bytes, prefix: str) -> Iterator[str]:
+    """
+    Extract strings of printable ASCII characters from a bytes buffer.
+    """
+    view = memoryview(buf)
+    start = None
+
+    for i, b in enumerate(view):
+        if start is None and b in PRINTABLE_BYTES:
+            # mark beginning of string
+            start = i
+            continue
+        if start is not None:
+            if b in PRINTABLE_BYTES:
+                # string not finished
+                continue
+            if b == 0:
+                # end of string
+                s = view[start:i].tobytes().decode("ascii")
+                if s.startswith(prefix):
+                    yield s[len(prefix) :]
+            # There can be byte sequences where a non-printable byte
+            # follows a printable one. Ignore that.
+            start = None
+
+
+# ----------------------------------------------------------------------------
+def elftools_version():
+    """
+    Extract pyelftools version as a tuple of integers for easy comparison.
+    """
+    version = getattr(elftools, "__version__", "")
+    match = re.match(r"^(\d+)\.(\d+).*$", str(version))
+    if not match:
+        # cannot determine version, hope for the best
+        return (0, 24)
+    return (int(match[1]), int(match[2]))
+
+
+ELFTOOLS_VERSION = elftools_version()
+
+
+def from_elftools(s: Union[bytes, str]) -> str:
+    """
+    Earlier versions of pyelftools (< 0.24) return bytes encoded with "latin-1"
+    instead of python strings.
+    """
+    if isinstance(s, bytes):
+        return s.decode("latin-1")
+    return s
+
+
+def to_elftools(s: str) -> Union[bytes, str]:
+    """
+    Earlier versions of pyelftools (< 0.24) assume that ELF section and tags
+    are bytes encoded with "latin-1" instead of python strings.
+    """
+    if ELFTOOLS_VERSION < (0, 24):
+        return s.encode("latin-1")
+    return s
+
+
+# ----------------------------------------------------------------------------
+def get_elf_strings(path: Path, section: str, prefix: str) -> Iterator[str]:
+    """
+    Extract strings from a named ELF section in a file.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        sec = elf.get_section_by_name(to_elftools(section))
+        if not sec:
+            return
+        yield from find_strings(sec.data(), prefix)
+
+
+# ----------------------------------------------------------------------------
+def ld_so_path() -> Iterator[str]:
+    """
+    Return the list of directories where dynamic libraries are loaded based
+    on the contents of /etc/ld.so.conf/*.conf.
+    """
+    for conf in glob.iglob("/etc/ld.so.conf/*.conf"):
+        try:
+            with open(conf, "r", encoding="utf-8") as f:
+                for line in f:
+                    line = line.strip()
+                    if os.path.isdir(line):
+                        yield line
+        except OSError:
+            pass
+
+
+LD_SO_CONF_PATH = list(ld_so_path())
+
+
+def search_dt_needed(origin: Path, needed: str, runpath: List[str]) -> Path:
+    """
+    Search for a file in LD_LIBRARY_PATH (if defined), runpath (if set) and in
+    all folders declared in /etc/ld.so.conf/*.conf. Finally, look in the
+    standard folders (/lib followed by /usr/lib).
+    """
+    folders = []
+    if "LD_LIBRARY_PATH" in os.environ:
+        folders += os.environ["LD_LIBRARY_PATH"].split(":")
+    folders += runpath
+    folders += LD_SO_CONF_PATH
+    folders += ["/lib", "/usr/lib"]
+    for d in folders:
+        d = d.replace("$ORIGIN", str(origin.parent.absolute()))
+        filepath = Path(d) / needed
+        if filepath.is_file():
+            return filepath
+    raise FileNotFoundError(needed)
+
+
+# ----------------------------------------------------------------------------
+def get_needed_libs(path: Path) -> Iterator[Path]:
+    """
+    Extract the dynamic library dependencies from an ELF executable.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        dyn = elf.get_section_by_name(to_elftools(".dynamic"))
+        if not dyn:
+            return
+        runpath = []
+        for tag in dyn.iter_tags(to_elftools("DT_RUNPATH")):
+            runpath += from_elftools(tag.runpath).split(":")
+        for tag in dyn.iter_tags(to_elftools("DT_NEEDED")):
+            needed = from_elftools(tag.needed)
+            if not needed.startswith("librte_"):
+                continue
+            logging.debug("%s: DT_NEEDED %s", path, needed)
+            try:
+                yield search_dt_needed(path, needed, runpath)
+            except FileNotFoundError:
+                logging.warning("%s: DT_NEEDED not found: %s", path, needed)
+
+
+# ----------------------------------------------------------------------------
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.37.3



* Re: [PATCH v2] usertools: rewrite pmdinfo
  2022-09-13 19:42 ` [PATCH v2] " Robin Jarry
@ 2022-09-13 20:54   ` Ferruh Yigit
  2022-09-13 21:22     ` Robin Jarry
  0 siblings, 1 reply; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-13 20:54 UTC (permalink / raw)
  To: Robin Jarry; +Cc: Olivier Matz, Bruce Richardson, dev

On 9/13/2022 8:42 PM, Robin Jarry wrote:

> 
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> Here are some examples of use with jq:
> 
> Get the complete info for a given driver:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.name == "dmadev_idxd_pci")'
>   {
>     "name": "dmadev_idxd_pci",
>     "params": "max_queues=0",
>     "kmod": "vfio-pci",
>     "pci_ids": [
>       {
>         "vendor": "8086",
>         "device": "0b25",
>         "subsystem_vendor": "ffff",
>         "subsystem_device": "ffff"
>       }
>     ]
>   }
> 
> Get only the required kernel modules for a given driver:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.name == "net_i40e").kmod'
>   "* igb_uio | uio_pci_generic | vfio-pci"
> 
> Get only the required kernel modules for a given device:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
>   "* ib_uverbs & mlx5_core & mlx5_ib"
> 
> Print the list of drivers which define multiple parameters without
> string separators:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
>   ...
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                               pyelftools
>                 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
>           3.6     ok   ok   ok   ok   ok   ok   ok   ok
>           3.7     ok   ok   ok   ok   ok   ok   ok   ok
>    Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
>           3.9     ok   ok   ok   ok   ok   ok   ok   ok
>           3.10  fail fail fail fail   ok   ok   ok   ok
> 
> All failures with python 3.10 are related to the same issue:
> 
>    File "elftools/construct/lib/container.py", line 5, in <module>
>      from collections import MutableMapping
>    ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26. The script
> will only work with Python 3.6 and later. Update the minimal system
> requirements and release notes.
> 
> NB: The output produced by the legacy -r/--raw flag can be obtained with
> the following command:
> 
>    strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>

Some of the drivers don't provide PCI IDs, but the script lists them as
empty, like [1]. Would it be better to omit the output in that case, as
is done for 'params' & 'kmod'?

Apart from the above note,
Tested-by: Ferruh Yigit <ferruh.yigit@xilinx.com>


[1]
   {
     "name": "net_enetfec",
     "pci_ids": []
   },




* Re: [PATCH v2] usertools: rewrite pmdinfo
  2022-09-13 20:54   ` Ferruh Yigit
@ 2022-09-13 21:22     ` Robin Jarry
  2022-09-14 11:46       ` Ferruh Yigit
  0 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-09-13 21:22 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Olivier Matz, Bruce Richardson, dev

Ferruh Yigit, Sep 13, 2022 at 22:54:
> Some of the drivers don't provide PCI IDs, but the script lists them
> as empty, like [1]. Would it be better to omit the output in that
> case, as is done for 'params' & 'kmod'?
[snip]
> [1]
>    {
>      "name": "net_enetfec",
>      "pci_ids": []
>    },

I could indeed omit the drivers that only report their name.

However, this raises another question: why do these drivers report
a PMD_INFO_STRING in the first place? Should buildtools/pmdinfogen.py be
modified as well?



* Re: [PATCH v2] usertools: rewrite pmdinfo
  2022-09-13 21:22     ` Robin Jarry
@ 2022-09-14 11:46       ` Ferruh Yigit
  2022-09-15  9:18         ` Robin Jarry
  0 siblings, 1 reply; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-14 11:46 UTC (permalink / raw)
  To: Robin Jarry; +Cc: Olivier Matz, Bruce Richardson, dev

On 9/13/2022 10:22 PM, Robin Jarry wrote:
> Ferruh Yigit, Sep 13, 2022 at 22:54:
>> Some of the drivers don't provide PCI IDs, but the script lists them
>> as empty, like [1]. Would it be better to omit the output in that
>> case, as is done for 'params' & 'kmod'?
> [snip]
>> [1]
>>     {
>>       "name": "net_enetfec",
>>       "pci_ids": []
>>     },
> 
> I could indeed omit the drivers that only report their name.
> 
> However, this raises another question: why do these drivers report
> a PMD_INFO_STRING in the first place? Should buildtools/pmdinfogen.py be
> modified as well?
> 

I think it is better to display the name when there is nothing else to
display than to display nothing at all.
The command can be run on a driver object (.so), so the user expects to
see some output.

For the enetfec driver above, it is a virtual driver and doesn't support
pci_ids, and it seems it doesn't have parameters or kmod requirements
either. So in this case it is OK to have only the name.
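
For illustration, the behaviour discussed here (keep the driver name, drop
"pci_ids" when it is empty) could look roughly like the sketch below. It
assumes the raw PMD_INFO_STRING carries "pci_ids" as a list of four-integer
lists, as in the patch; normalize_driver is a hypothetical helper name, not
something the script defines.

```python
import json
from typing import Any, Dict


def normalize_driver(info: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: hex-format PCI IDs, omit the key when empty."""
    out = dict(info)
    pci_ids = [
        {
            "vendor": f"{vendor:04x}",
            "device": f"{device:04x}",
            "subsystem_vendor": f"{subven:04x}",
            "subsystem_device": f"{subdev:04x}",
        }
        for vendor, device, subven, subdev in out.pop("pci_ids", [])
    ]
    if pci_ids:
        # only report the key when the driver actually lists devices
        out["pci_ids"] = pci_ids
    return out


# a virtual driver without PCI IDs keeps only its name
virtual = normalize_driver(json.loads('{"name": "net_enetfec", "pci_ids": []}'))
# a PCI driver gets its numeric IDs converted to hex strings
physical = normalize_driver(
    {
        "name": "dmadev_idxd_pci",
        "kmod": "vfio-pci",
        "pci_ids": [[0x8086, 0x0B25, 0xFFFF, 0xFFFF]],
    }
)
print(json.dumps([virtual, physical], indent=2))
```

With this shape, a jq filter on .pci_ids[] simply skips drivers that have no
such key, while the driver name still shows up in the output.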



* Re: [PATCH v2] usertools: rewrite pmdinfo
  2022-09-14 11:46       ` Ferruh Yigit
@ 2022-09-15  9:18         ` Robin Jarry
  0 siblings, 0 replies; 42+ messages in thread
From: Robin Jarry @ 2022-09-15  9:18 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Olivier Matz, Bruce Richardson, dev

Ferruh Yigit, Sep 14, 2022 at 13:46:
> I think it is better to display the name when there is nothing else to
> display than to display nothing at all.
> The command can be run on a driver object (.so), so the user expects to
> see some output.
>
> For the enetfec driver above, it is a virtual driver and doesn't support
> pci_ids, and it seems it doesn't have parameters or kmod requirements
> either. So in this case it is OK to have only the name.

Ok, was there something else to change before I send a v3?



* [PATCH v3] usertools: rewrite pmdinfo
  2022-09-13 10:58 [PATCH] usertools: rewrite pmdinfo Robin Jarry
                   ` (2 preceding siblings ...)
  2022-09-13 19:42 ` [PATCH v2] " Robin Jarry
@ 2022-09-20  9:08 ` Robin Jarry
  2022-09-20 10:10   ` Ferruh Yigit
  2022-09-20 10:42 ` [PATCH v4] " Robin Jarry
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-09-20  9:08 UTC (permalink / raw)
  To: dev; +Cc: Robin Jarry, Olivier Matz, Ferruh Yigit, Bruce Richardson

dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
merely prints multiple independent JSON lines which cannot be fed
directly to any JSON parser. Moreover, the script complexity is rather
high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
sections. Rewrite it so that it can produce valid JSON.

Remove the PCI database parsing for PCI-ID to Vendor-Device names
conversion. This should be done by external scripts (if really needed).

Here are some examples of use with jq:

Get the complete info for a given driver:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.name == "dmadev_idxd_pci")'
 {
   "name": "dmadev_idxd_pci",
   "params": "max_queues=0",
   "kmod": "vfio-pci",
   "pci_ids": [
     {
       "vendor": "8086",
       "device": "0b25",
       "subsystem_vendor": "ffff",
       "subsystem_device": "ffff"
     }
   ]
 }

Get only the required kernel modules for a given driver:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.name == "net_i40e").kmod'
 "* igb_uio | uio_pci_generic | vfio-pci"

Get only the required kernel modules for a given device:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
 "* ib_uverbs & mlx5_core & mlx5_ib"

Print the list of drivers which define multiple parameters without
string separators:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
 ...

The script passes flake8, black, isort and pylint checks.

I have tested this with a matrix of python/pyelftools versions:

                             pyelftools
               0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
         3.6     ok   ok   ok   ok   ok   ok   ok   ok
         3.7     ok   ok   ok   ok   ok   ok   ok   ok
  Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
         3.9     ok   ok   ok   ok   ok   ok   ok   ok
         3.10  fail fail fail fail   ok   ok   ok   ok

All failures with python 3.10 are related to the same issue:

  File "elftools/construct/lib/container.py", line 5, in <module>
    from collections import MutableMapping
  ImportError: cannot import name 'MutableMapping' from 'collections'

Python 3.10 support is only available since pyelftools 0.26. The script
will only work with Python 3.6 and later. Update the minimal system
requirements and release notes.
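
As a small illustration of the failure above: the abstract base classes live
in collections.abc, and the old aliases in collections were removed in
Python 3.10, which is why the older pyelftools releases cannot be imported
there. The predicate below only restates the test matrix; the name
combination_supported is made up for this sketch and does not exist in the
script.

```python
from collections.abc import MutableMapping  # valid on every Python 3 release
from typing import Tuple


def combination_supported(python: Tuple[int, int], pyelftools: Tuple[int, int]) -> bool:
    """Hypothetical check mirroring the matrix: only 3.10 with < 0.26 fails."""
    return python < (3, 10) or pyelftools >= (0, 26)


# dict is registered as a MutableMapping, whichever module exports the ABC
print(issubclass(dict, MutableMapping))
print(combination_supported((3, 9), (0, 22)))
print(combination_supported((3, 10), (0, 25)))
```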

NB: The output produced by the legacy -r/--raw flag can be obtained with
the following command:

  strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'

Cc: Olivier Matz <olivier.matz@6wind.com>
Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
Cc: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
v2 -> v3:

* strip "pci_ids" when it is empty (some drivers do not support any pci
  devices)

v1 -> v2:

* update release notes and minimal python version requirement
* hide warnings by default (-v/--verbose to show them)
* show debug messages with -vv
* also search libs in folders listed in /etc/ld.so.conf/*.conf
* only search for DT_NEEDED on executables, not on dynamic libraries
* take DT_RUNPATH into account for searching libraries
* fix weird broken pipe error
* fix some typos:
    s/begining/beginning/
    s/subsystem_device/subsystem_vendor/
    s/subsystem_system/subsystem_device/
* change field names for pci_ids elements (remove _id suffixes)
* DT_NEEDED of files are analyzed. There is no way to differentiate
  between dynamically linked executables and dynamic libraries.
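
The library lookup order from the v1 -> v2 notes (LD_LIBRARY_PATH first, then
DT_RUNPATH with $ORIGIN expanded, then extra configured folders, then /lib
and /usr/lib) can be sketched as follows. This is an illustration under
assumptions, not the script itself: find_library, its extra parameter and
librte_hypothetical.so are made-up names.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


def find_library(
    origin: Path, needed: str, runpath: List[str], extra: List[str]
) -> Optional[Path]:
    """Hypothetical helper: return the first folder entry providing `needed`."""
    folders = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    folders += runpath
    folders += extra
    folders += ["/lib", "/usr/lib"]
    for d in folders:
        # $ORIGIN stands for the directory holding the binary being analyzed
        d = d.replace("$ORIGIN", str(origin.parent.absolute()))
        candidate = Path(d) / needed
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp) / "bin" / "app"
    lib = Path(tmp) / "lib" / "librte_hypothetical.so"
    app.parent.mkdir()
    lib.parent.mkdir()
    app.touch()
    lib.touch()
    found = find_library(app, lib.name, ["$ORIGIN/../lib"], [])
    same_file = found is not None and found.resolve() == lib.resolve()
    print(found, same_file)
```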

 doc/guides/linux_gsg/sys_reqs.rst      |   2 +-
 doc/guides/rel_notes/release_22_11.rst |   5 +
 usertools/dpdk-pmdinfo.py              | 914 +++++++++----------------
 3 files changed, 314 insertions(+), 607 deletions(-)

diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 08d45898f025..f842105eeda7 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -41,7 +41,7 @@ Compilation of the DPDK
    resulting in statically linked applications not being linked properly.
    Use an updated version of ``pkg-config`` or ``pkgconf`` instead when building applications
 
-*   Python 3.5 or later.
+*   Python 3.6 or later.
 
 *   Meson (version 0.49.2+) and ninja
 
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf0505e..67054f5acdc9 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -84,6 +84,11 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
+  PCI-IDs parsing has been removed.
+  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command::
+
+     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
 
 ABI Changes
 -----------
diff --git a/usertools/dpdk-pmdinfo.py b/usertools/dpdk-pmdinfo.py
index 40ef5cec6cba..068fdba2a603 100755
--- a/usertools/dpdk-pmdinfo.py
+++ b/usertools/dpdk-pmdinfo.py
@@ -1,626 +1,328 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2016  Neil Horman <nhorman@tuxdriver.com>
+# Copyright(c) 2022  Robin Jarry
+# pylint: disable=invalid-name
+
+r"""
+Utility to dump PMD_INFO_STRING support from DPDK binaries.
+
+This script prints JSON output to be interpreted by other tools. Here are some
+examples with jq:
+
+Get the complete info for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "cnxk_nix_inl")'
+
+Get only the required kernel modules for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "net_i40e").kmod'
+
+Get only the required kernel modules for a given device:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.pci_ids[]? | .vendor == "15b3" and .device == "1013").kmod'
+"""
 
-# -------------------------------------------------------------------------
-#
-# Utility to dump PMD_INFO_STRING support from an object file
-#
-# -------------------------------------------------------------------------
-import json
-import os
-import platform
-import sys
 import argparse
-from elftools.common.exceptions import ELFError
-from elftools.common.py3compat import byte2int
-from elftools.elf.elffile import ELFFile
-
-
-# For running from development directory. It should take precedence over the
-# installed pyelftools.
-sys.path.insert(0, '.')
-
-raw_output = False
-pcidb = None
-
-# ===========================================
-
-class Vendor:
-    """
-    Class for vendors. This is the top level class
-    for the devices belong to a specific vendor.
-    self.devices is the device dictionary
-    subdevices are in each device.
-    """
-
-    def __init__(self, vendorStr):
-        """
-        Class initializes with the raw line from pci.ids
-        Parsing takes place inside __init__
-        """
-        self.ID = vendorStr.split()[0]
-        self.name = vendorStr.replace("%s " % self.ID, "").rstrip()
-        self.devices = {}
-
-    def addDevice(self, deviceStr):
-        """
-        Adds a device to self.devices
-        takes the raw line from pci.ids
-        """
-        s = deviceStr.strip()
-        devID = s.split()[0]
-        if devID in self.devices:
-            pass
-        else:
-            self.devices[devID] = Device(deviceStr)
-
-    def report(self):
-        print(self.ID, self.name)
-        for id, dev in self.devices.items():
-            dev.report()
-
-    def find_device(self, devid):
-        # convert to a hex string and remove 0x
-        devid = hex(devid)[2:]
-        try:
-            return self.devices[devid]
-        except:
-            return Device("%s  Unknown Device" % devid)
-
-
-class Device:
-
-    def __init__(self, deviceStr):
-        """
-        Class for each device.
-        Each vendor has its own devices dictionary.
-        """
-        s = deviceStr.strip()
-        self.ID = s.split()[0]
-        self.name = s.replace("%s  " % self.ID, "")
-        self.subdevices = {}
-
-    def report(self):
-        print("\t%s\t%s" % (self.ID, self.name))
-        for subID, subdev in self.subdevices.items():
-            subdev.report()
-
-    def addSubDevice(self, subDeviceStr):
-        """
-        Adds a subvendor, subdevice to device.
-        Uses raw line from pci.ids
-        """
-        s = subDeviceStr.strip()
-        spl = s.split()
-        subVendorID = spl[0]
-        subDeviceID = spl[1]
-        subDeviceName = s.split("  ")[-1]
-        devID = "%s:%s" % (subVendorID, subDeviceID)
-        self.subdevices[devID] = SubDevice(
-            subVendorID, subDeviceID, subDeviceName)
-
-    def find_subid(self, subven, subdev):
-        subven = hex(subven)[2:]
-        subdev = hex(subdev)[2:]
-        devid = "%s:%s" % (subven, subdev)
-
-        try:
-            return self.subdevices[devid]
-        except:
-            if (subven == "ffff" and subdev == "ffff"):
-                return SubDevice("ffff", "ffff", "(All Subdevices)")
-            return SubDevice(subven, subdev, "(Unknown Subdevice)")
-
-
-class SubDevice:
-    """
-    Class for subdevices.
-    """
-
-    def __init__(self, vendor, device, name):
-        """
-        Class initializes with vendorid, deviceid and name
-        """
-        self.vendorID = vendor
-        self.deviceID = device
-        self.name = name
-
-    def report(self):
-        print("\t\t%s\t%s\t%s" % (self.vendorID, self.deviceID, self.name))
-
-
-class PCIIds:
-    """
-    Top class for all pci.ids entries.
-    All queries will be asked to this class.
-    PCIIds.vendors["0e11"].devices["0046"].\
-    subdevices["0e11:4091"].name  =  "Smart Array 6i"
-    """
-
-    def __init__(self, filename):
-        """
-        Prepares the directories.
-        Checks local data file.
-        Tries to load from local, if not found, downloads from web
-        """
-        self.version = ""
-        self.date = ""
-        self.vendors = {}
-        self.contents = None
-        self.readLocal(filename)
-        self.parse()
-
-    def reportVendors(self):
-        """Reports the vendors
-        """
-        for vid, v in self.vendors.items():
-            print(v.ID, v.name)
-
-    def report(self, vendor=None):
-        """
-        Reports everything for all vendors or a specific vendor
-        PCIIds.report()  reports everything
-        PCIIDs.report("0e11") reports only "Compaq Computer Corporation"
-        """
-        if vendor is not None:
-            self.vendors[vendor].report()
-        else:
-            for vID, v in self.vendors.items():
-                v.report()
-
-    def find_vendor(self, vid):
-        # convert vid to a hex string and remove the 0x
-        vid = hex(vid)[2:]
-
-        try:
-            return self.vendors[vid]
-        except:
-            return Vendor("%s Unknown Vendor" % (vid))
-
-    def findDate(self, content):
-        for l in content:
-            if l.find("Date:") > -1:
-                return l.split()[-2].replace("-", "")
-        return None
-
-    def parse(self):
-        if not self.contents:
-            print("data/%s-pci.ids not found" % self.date)
-        else:
-            vendorID = ""
-            deviceID = ""
-            for l in self.contents:
-                if l[0] == "#":
-                    continue
-                elif not l.strip():
-                    continue
-                else:
-                    if l.find("\t\t") == 0:
-                        self.vendors[vendorID].devices[
-                            deviceID].addSubDevice(l)
-                    elif l.find("\t") == 0:
-                        deviceID = l.strip().split()[0]
-                        self.vendors[vendorID].addDevice(l)
-                    else:
-                        vendorID = l.split()[0]
-                        self.vendors[vendorID] = Vendor(l)
-
-    def readLocal(self, filename):
-        """
-        Reads the local file
-        """
-        with open(filename, 'r', encoding='utf-8') as f:
-            self.contents = f.readlines()
-        self.date = self.findDate(self.contents)
-
-    def loadLocal(self):
-        """
-        Loads database from local. If there is no file,
-        it creates a new one from web
-        """
-        self.date = idsfile[0].split("/")[1].split("-")[0]
-        self.readLocal()
-
-
-# =======================================
-
-def search_file(filename, search_path):
-    """ Given a search path, find file with requested name """
-    for path in search_path.split(':'):
-        candidate = os.path.join(path, filename)
-        if os.path.exists(candidate):
-            return os.path.abspath(candidate)
-    return None
-
-
-class ReadElf(object):
-    """ display_* methods are used to emit output into the output stream
-    """
-
-    def __init__(self, file, output):
-        """ file:
-                stream object with the ELF file to read
-
-            output:
-                output stream to write to
-        """
-        self.elffile = ELFFile(file)
-        self.output = output
-
-        # Lazily initialized if a debug dump is requested
-        self._dwarfinfo = None
-
-        self._versioninfo = None
-
-    def _section_from_spec(self, spec):
-        """ Retrieve a section given a "spec" (either number or name).
-            Return None if no such section exists in the file.
-        """
-        try:
-            num = int(spec)
-            if num < self.elffile.num_sections():
-                return self.elffile.get_section(num)
-            return None
-        except ValueError:
-            # Not a number. Must be a name then
-            section = self.elffile.get_section_by_name(force_unicode(spec))
-            if section is None:
-                # No match with a unicode name.
-                # Some versions of pyelftools (<= 0.23) store internal strings
-                # as bytes. Try again with the name encoded as bytes.
-                section = self.elffile.get_section_by_name(force_bytes(spec))
-            return section
-
-    def pretty_print_pmdinfo(self, pmdinfo):
-        global pcidb
-
-        for i in pmdinfo["pci_ids"]:
-            vendor = pcidb.find_vendor(i[0])
-            device = vendor.find_device(i[1])
-            subdev = device.find_subid(i[2], i[3])
-            print("%s (%s) : %s (%s) %s" %
-                  (vendor.name, vendor.ID, device.name,
-                   device.ID, subdev.name))
-
-    def parse_pmd_info_string(self, mystring):
-        global raw_output
-        global pcidb
-
-        optional_pmd_info = [
-            {'id': 'params', 'tag': 'PMD PARAMETERS'},
-            {'id': 'kmod', 'tag': 'PMD KMOD DEPENDENCIES'}
-        ]
-
-        i = mystring.index("=")
-        mystring = mystring[i + 2:]
-        pmdinfo = json.loads(mystring)
-
-        if raw_output:
-            print(json.dumps(pmdinfo))
-            return
-
-        print("PMD NAME: " + pmdinfo["name"])
-        for i in optional_pmd_info:
-            try:
-                print("%s: %s" % (i['tag'], pmdinfo[i['id']]))
-            except KeyError:
-                continue
-
-        if pmdinfo["pci_ids"]:
-            print("PMD HW SUPPORT:")
-            if pcidb is not None:
-                self.pretty_print_pmdinfo(pmdinfo)
-            else:
-                print("VENDOR\t DEVICE\t SUBVENDOR\t SUBDEVICE")
-                for i in pmdinfo["pci_ids"]:
-                    print("0x%04x\t 0x%04x\t 0x%04x\t\t 0x%04x" %
-                          (i[0], i[1], i[2], i[3]))
-
-        print("")
-
-    def display_pmd_info_strings(self, section_spec):
-        """ Display a strings dump of a section. section_spec is either a
-            section number or a name.
-        """
-        section = self._section_from_spec(section_spec)
-        if section is None:
-            return
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("PMD_INFO_STRING")
-            if rc != -1:
-                self.parse_pmd_info_string(mystring[rc:])
-
-            dataptr = endptr
-
-    def find_librte_eal(self, section):
-        for tag in section.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if "librte_eal" in force_unicode(tag.needed):
-                    return force_unicode(tag.needed)
-        return None
-
-    def search_for_autoload_path(self):
-        scanelf = self
-        scanfile = None
-        library = None
-
-        section = self._section_from_spec(".dynamic")
-        try:
-            eallib = self.find_librte_eal(section)
-            if eallib is not None:
-                ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-                if ldlibpath is None:
-                    ldlibpath = ""
-                dtr = self.get_dt_runpath(section)
-                library = search_file(eallib,
-                                      dtr + ":" + ldlibpath +
-                                      ":/usr/lib64:/lib64:/usr/lib:/lib")
-                if library is None:
-                    return (None, None)
-                if not raw_output:
-                    print("Scanning for autoload path in %s" % library)
-                scanfile = open(library, 'rb')
-                scanelf = ReadElf(scanfile, sys.stdout)
-        except AttributeError:
-            # Not a dynamic binary
-            pass
-        except ELFError:
-            scanfile.close()
-            return (None, None)
-
-        section = scanelf._section_from_spec(".rodata")
-        if section is None:
-            if scanfile is not None:
-                scanfile.close()
-            return (None, None)
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("DPDK_PLUGIN_PATH")
-            if rc != -1:
-                rc = mystring.find("=")
-                return (mystring[rc + 1:], library)
-
-            dataptr = endptr
-        if scanfile is not None:
-            scanfile.close()
-        return (None, None)
-
-    def get_dt_runpath(self, dynsec):
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_RUNPATH':
-                return force_unicode(tag.runpath)
-        return ""
-
-    def process_dt_needed_entries(self):
-        """ Look to see if there are any DT_NEEDED entries in the binary
-            And process those if there are
-        """
-        runpath = ""
-        ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-        if ldlibpath is None:
-            ldlibpath = ""
-
-        dynsec = self._section_from_spec(".dynamic")
-        try:
-            runpath = self.get_dt_runpath(dynsec)
-        except AttributeError:
-            # dynsec is None, just return
-            return
-
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if 'librte_' in force_unicode(tag.needed):
-                    library = search_file(force_unicode(tag.needed),
-                                          runpath + ":" + ldlibpath +
-                                          ":/usr/lib64:/lib64:/usr/lib:/lib")
-                    if library is not None:
-                        with open(library, 'rb') as file:
-                            try:
-                                libelf = ReadElf(file, sys.stdout)
-                            except ELFError:
-                                print("%s is no an ELF file" % library)
-                                continue
-                            libelf.process_dt_needed_entries()
-                            libelf.display_pmd_info_strings(".rodata")
-                            file.close()
-
-
-# compat: remove force_unicode & force_bytes when pyelftools<=0.23 support is
-# dropped.
-def force_unicode(s):
-    if hasattr(s, 'decode') and callable(s.decode):
-        s = s.decode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def force_bytes(s):
-    if hasattr(s, 'encode') and callable(s.encode):
-        s = s.encode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def scan_autoload_path(autoload_path):
-    global raw_output
-
-    if not os.path.exists(autoload_path):
-        return
-
+import glob
+import json
+import logging
+import os
+import re
+import string
+import sys
+from pathlib import Path
+from typing import Iterable, Iterator, List, Union
+
+import elftools
+from elftools.elf.elffile import ELFError, ELFFile
+
+
+# ----------------------------------------------------------------------------
+def main() -> int:  # pylint: disable=missing-docstring
     try:
-        dirs = os.listdir(autoload_path)
-    except OSError:
-        # Couldn't read the directory, give up
-        return
-
-    for d in dirs:
-        dpath = os.path.join(autoload_path, d)
-        if os.path.isdir(dpath):
-            scan_autoload_path(dpath)
-        if os.path.isfile(dpath):
-            try:
-                file = open(dpath, 'rb')
-                readelf = ReadElf(file, sys.stdout)
-            except ELFError:
-                # this is likely not an elf file, skip it
-                continue
-            except IOError:
-                # No permission to read the file, skip it
-                continue
-
-            if not raw_output:
-                print("Hw Support for library %s" % d)
-            readelf.display_pmd_info_strings(".rodata")
-            file.close()
+        args = parse_args()
+        logging.basicConfig(
+            stream=sys.stderr,
+            format="%(levelname)s: %(message)s",
+            level={
+                0: logging.ERROR,
+                1: logging.WARNING,
+            }.get(args.verbose, logging.DEBUG),
+        )
+        info = parse_pmdinfo(args.elf_files, args.search_plugins)
+        print(json.dumps(info, indent=2))
+    except BrokenPipeError:
+        pass
+    except KeyboardInterrupt:
+        return 1
+    except Exception as e:  # pylint: disable=broad-except
+        logging.error("%s", e)
+        return 1
+    return 0
 
 
-def scan_for_autoload_pmds(dpdk_path):
+# ----------------------------------------------------------------------------
+def parse_args() -> argparse.Namespace:
     """
-    search the specified application or path for a pmd autoload path
-    then scan said path for pmds and report hw support
+    Parse command line arguments.
     """
-    global raw_output
-
-    if not os.path.isfile(dpdk_path):
-        if not raw_output:
-            print("Must specify a file name")
-        return
-
-    file = open(dpdk_path, 'rb')
-    try:
-        readelf = ReadElf(file, sys.stdout)
-    except ElfError:
-        if not raw_output:
-            print("Unable to parse %s" % file)
-        return
-
-    (autoload_path, scannedfile) = readelf.search_for_autoload_path()
-    if not autoload_path:
-        if not raw_output:
-            print("No autoload path configured in %s" % dpdk_path)
-        return
-    if not raw_output:
-        if scannedfile is None:
-            scannedfile = dpdk_path
-        print("Found autoload path %s in %s" % (autoload_path, scannedfile))
-
-    file.close()
-    if not raw_output:
-        print("Discovered Autoload HW Support:")
-    scan_autoload_path(autoload_path)
-    return
-
-
-def main(stream=None):
-    global raw_output
-    global pcidb
-
-    pcifile_default = "./pci.ids"  # For unknown OS's assume local file
-    if platform.system() == 'Linux':
-        # hwdata is the legacy location, misc is supported going forward
-        pcifile_default = "/usr/share/misc/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/hwdata/pci.ids"
-    elif platform.system() == 'FreeBSD':
-        pcifile_default = "/usr/local/share/pciids/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/misc/pci_vendors"
-
     parser = argparse.ArgumentParser(
-        usage='usage: %(prog)s [-hrtp] [-d <pci id file>] elf_file',
-        description="Dump pmd hardware support info")
-    group = parser.add_mutually_exclusive_group()
-    group.add_argument('-r', '--raw',
-                       action='store_true', dest='raw_output',
-                       help='dump raw json strings')
-    group.add_argument("-t", "--table", dest="tblout",
-                       help="output information on hw support as a hex table",
-                       action='store_true')
-    parser.add_argument("-d", "--pcidb", dest="pcifile",
-                        help="specify a pci database to get vendor names from",
-                        default=pcifile_default, metavar="FILE")
-    parser.add_argument("-p", "--plugindir", dest="pdir",
-                        help="scan dpdk for autoload plugins",
-                        action='store_true')
-    parser.add_argument("elf_file", help="driver shared object file")
-    args = parser.parse_args()
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "-p",
+        "--search-plugins",
+        action="store_true",
+        help="""
+        In addition to ELF_FILEs and their linked dynamic libraries, also scan
+        the DPDK plugins path.
+        """,
+    )
+    parser.add_argument(
+        "-v",
+        "--verbose",
+        action="count",
+        default=0,
+        help="""
+        Display warnings due to linked libraries not found or ELF/JSON parsing
+        errors in these libraries. Use twice to show debug messages.
+        """,
+    )
+    parser.add_argument(
+        "elf_files",
+        metavar="ELF_FILE",
+        nargs="+",
+        type=existing_file,
+        help="""
+        DPDK application binary or dynamic library.
+        """,
+    )
+    return parser.parse_args()
 
-    if args.raw_output:
-        raw_output = True
 
-    if args.tblout:
-        args.pcifile = None
+# ----------------------------------------------------------------------------
+def parse_pmdinfo(paths: Iterable[Path], search_plugins: bool) -> List[dict]:
+    """
+    Extract DPDK PMD info JSON strings from an ELF file.
 
-    if args.pcifile:
-        pcidb = PCIIds(args.pcifile)
-        if pcidb is None:
-            print("Pci DB file not found")
-            exit(1)
+    :returns:
+        A list of DPDK drivers info dictionaries.
+    """
+    binaries = set(paths)
+    for p in paths:
+        binaries.update(get_needed_libs(p))
+    if search_plugins:
+        # cast to list to avoid errors with update while iterating
+        binaries.update(list(get_plugin_libs(binaries)))
 
-    if args.pdir:
-        exit(scan_for_autoload_pmds(args.elf_file))
+    drivers = []
 
-    ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-    if ldlibpath is None:
-        ldlibpath = ""
-
-    if os.path.exists(args.elf_file):
-        myelffile = args.elf_file
-    else:
-        myelffile = search_file(args.elf_file,
-                                ldlibpath + ":/usr/lib64:/lib64:/usr/lib:/lib")
-
-    if myelffile is None:
-        print("File not found")
-        sys.exit(1)
-
-    with open(myelffile, 'rb') as file:
+    for b in binaries:
+        logging.debug("analyzing %s", b)
         try:
-            readelf = ReadElf(file, sys.stdout)
-            readelf.process_dt_needed_entries()
-            readelf.display_pmd_info_strings(".rodata")
-            sys.exit(0)
+            for s in get_elf_strings(b, ".rodata", "PMD_INFO_STRING="):
+                try:
+                    info = json.loads(s)
+                    # convert numerical ids to hex strings
+                    pci_ids = []
+                    for vendor, device, subven, subdev in info.pop("pci_ids"):
+                        pci_ids.append(
+                            {
+                                "vendor": f"{vendor:04x}",
+                                "device": f"{device:04x}",
+                                "subsystem_vendor": f"{subven:04x}",
+                                "subsystem_device": f"{subdev:04x}",
+                            }
+                        )
+                    if pci_ids:
+                        info["pci_ids"] = pci_ids
+                    drivers.append(info)
+                except ValueError as e:
+                    # invalid JSON, should never happen
+                    logging.warning("%s: %s", b, e)
+        except ELFError as e:
+            # only happens for discovered plugins that are not ELF
+            logging.debug("%s: cannot parse ELF: %s", b, e)
 
-        except ELFError as ex:
-            sys.stderr.write('ELF error: %s\n' % ex)
-            sys.exit(1)
+    return drivers
 
 
-# -------------------------------------------------------------------------
-if __name__ == '__main__':
-    main()
+# ----------------------------------------------------------------------------
+def get_plugin_libs(binaries: Iterable[Path]) -> Iterator[Path]:
+    """
+    Look into the provided binaries for DPDK_PLUGIN_PATH and scan the path
+    for files.
+    """
+    for b in binaries:
+        for p in get_elf_strings(b, ".rodata", "DPDK_PLUGIN_PATH="):
+            plugin_path = p.strip()
+            logging.debug("discovering plugins in %s", plugin_path)
+            for root, _, files in os.walk(plugin_path):
+                for f in files:
+                    yield Path(root) / f
+            # no need to search in other binaries.
+            return
+
+
+# ----------------------------------------------------------------------------
+def existing_file(value: str) -> Path:
+    """
+    Argparse type= callback to ensure an argument points to a valid file path.
+    """
+    path = Path(value)
+    if not path.is_file():
+        raise argparse.ArgumentTypeError(f"{value}: No such file")
+    return path
+
+
+# ----------------------------------------------------------------------------
+PRINTABLE_BYTES = frozenset(string.printable.encode("ascii"))
+
+
+def find_strings(buf: bytes, prefix: str) -> Iterator[str]:
+    """
+    Extract strings of printable ASCII characters from a bytes buffer.
+    """
+    view = memoryview(buf)
+    start = None
+
+    for i, b in enumerate(view):
+        if start is None and b in PRINTABLE_BYTES:
+            # mark beginning of string
+            start = i
+            continue
+        if start is not None:
+            if b in PRINTABLE_BYTES:
+                # string not finished
+                continue
+            if b == 0:
+                # end of string
+                s = view[start:i].tobytes().decode("ascii")
+                if s.startswith(prefix):
+                    yield s[len(prefix) :]
+            # There can be byte sequences where a non-printable byte
+            # follows a printable one. Ignore that.
+            start = None
+
+
+# ----------------------------------------------------------------------------
+def elftools_version():
+    """
+    Extract pyelftools version as a tuple of integers for easy comparison.
+    """
+    version = getattr(elftools, "__version__", "")
+    match = re.match(r"^(\d+)\.(\d+).*$", str(version))
+    if not match:
+        # cannot determine version, hope for the best
+        return (0, 24)
+    return (int(match[1]), int(match[2]))
+
+
+ELFTOOLS_VERSION = elftools_version()
+
+
+def from_elftools(s: Union[bytes, str]) -> str:
+    """
+    Earlier versions of pyelftools (< 0.24) return bytes encoded with "latin-1"
+    instead of python strings.
+    """
+    if isinstance(s, bytes):
+        return s.decode("latin-1")
+    return s
+
+
+def to_elftools(s: str) -> Union[bytes, str]:
+    """
+    Earlier versions of pyelftools (< 0.24) assume that ELF section and tags
+    are bytes encoded with "latin-1" instead of python strings.
+    """
+    if ELFTOOLS_VERSION < (0, 24):
+        return s.encode("latin-1")
+    return s
+
+
+# ----------------------------------------------------------------------------
+def get_elf_strings(path: Path, section: str, prefix: str) -> Iterator[str]:
+    """
+    Extract strings from a named ELF section in a file.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        sec = elf.get_section_by_name(to_elftools(section))
+        if not sec:
+            return
+        yield from find_strings(sec.data(), prefix)
+
+
+# ----------------------------------------------------------------------------
+def ld_so_path() -> Iterator[str]:
+    """
+    Return the list of directories where dynamic libraries are loaded based
+    on the contents of /etc/ld.so.conf.d/*.conf.
+    """
+    for conf in glob.iglob("/etc/ld.so.conf.d/*.conf"):
+        try:
+            with open(conf, "r", encoding="utf-8") as f:
+                for line in f:
+                    line = line.strip()
+                    if os.path.isdir(line):
+                        yield line
+        except OSError:
+            pass
+
+
+LD_SO_CONF_PATH = list(ld_so_path())  # a bare generator is exhausted after one use
+
+
+def search_dt_needed(origin: Path, needed: str, runpath: List[str]) -> Path:
+    """
+    Search for a file in LD_LIBRARY_PATH (if defined), runpath (if set) and in
+    all folders declared in /etc/ld.so.conf.d/*.conf. Finally, look in the
+    standard folders (/lib followed by /usr/lib).
+    """
+    folders = []
+    if "LD_LIBRARY_PATH" in os.environ:
+        folders += os.environ["LD_LIBRARY_PATH"].split(":")
+    folders += runpath
+    folders += LD_SO_CONF_PATH
+    folders += ["/lib", "/usr/lib"]
+    for d in folders:
+        d = d.replace("$ORIGIN", str(origin.parent.absolute()))
+        filepath = Path(d) / needed
+        if filepath.is_file():
+            return filepath
+    raise FileNotFoundError(needed)
+
+
+# ----------------------------------------------------------------------------
+def get_needed_libs(path: Path) -> Iterator[Path]:
+    """
+    Extract the dynamic library dependencies from an ELF executable.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        dyn = elf.get_section_by_name(to_elftools(".dynamic"))
+        if not dyn:
+            return
+        runpath = []
+        for tag in dyn.iter_tags(to_elftools("DT_RUNPATH")):
+            runpath += from_elftools(tag.runpath).split(":")
+        for tag in dyn.iter_tags(to_elftools("DT_NEEDED")):
+            needed = from_elftools(tag.needed)
+            if not needed.startswith("librte_"):
+                continue
+            logging.debug("%s: DT_NEEDED %s", path, needed)
+            try:
+                yield search_dt_needed(path, needed, runpath)
+            except FileNotFoundError:
+                logging.warning("%s: DT_NEEDED not found: %s", path, needed)
+
+
+# ----------------------------------------------------------------------------
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.37.3



* Re: [PATCH v3] usertools: rewrite pmdinfo
  2022-09-20  9:08 ` [PATCH v3] " Robin Jarry
@ 2022-09-20 10:10   ` Ferruh Yigit
  2022-09-20 10:12     ` Robin Jarry
  0 siblings, 1 reply; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-20 10:10 UTC (permalink / raw)
  To: Robin Jarry, dev, Jerin Jacob Kollanukkaran
  Cc: Olivier Matz, Bruce Richardson, Rasesh Mody,
	Devendra Singh Rawat, Ashwin Sekhar T K, Pavan Nikhilesh,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao

On 9/20/2022 10:08 AM, Robin Jarry wrote:
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> Here are some examples of use with jq:
> 
> Get the complete info for a given driver:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.name == "dmadev_idxd_pci")'
>   {
>     "name": "dmadev_idxd_pci",
>     "params": "max_queues=0",
>     "kmod": "vfio-pci",
>     "pci_ids": [
>       {
>         "vendor": "8086",
>         "device": "0b25",
>         "subsystem_vendor": "ffff",
>         "subsystem_device": "ffff"
>       }
>     ]
>   }
> 
> Get only the required kernel modules for a given driver:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.name == "net_i40e").kmod'
>   "* igb_uio | uio_pci_generic | vfio-pci"
> 
> Get only the required kernel modules for a given device:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
>   "* ib_uverbs & mlx5_core & mlx5_ib"
> 
> Print the list of drivers which define multiple parameters without
> string separators:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
>   ...
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                               pyelftools
>                 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
>           3.6     ok   ok   ok   ok   ok   ok   ok   ok
>           3.7     ok   ok   ok   ok   ok   ok   ok   ok
>    Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
>           3.9     ok   ok   ok   ok   ok   ok   ok   ok
>           3.10  fail fail fail fail   ok   ok   ok   ok
> 
> All failures with python 3.10 are related to the same issue:
> 
>    File "elftools/construct/lib/container.py", line 5, in <module>
>      from collections import MutableMapping
>    ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26. The script
> will only work with Python 3.6 and later. Update the minimal system
> requirements and release notes.
> 
> NB: The output produced by the legacy -r/--raw flag can be obtained with
> the following command:
> 
>    strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>


For 'subsystem_vendor' & 'subsystem_device', the value "ffff" means the
field is not explicitly defined, so it gets the default value.
What do you think about omitting those as well when the value is "ffff",
to reduce noise in the output?
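
To make the suggestion concrete, here is a rough sketch of the filtering on a
single "pci_ids" entry. It is only an illustration and not code from the
patch: the helper name strip_wildcards and the WILDCARD constant are made up
here, and it assumes the IDs are lowercase 4-digit hex strings as in the
output quoted above.

```python
import json

# Assumption: the "not explicitly defined" wildcard is always spelled "ffff".
WILDCARD = "ffff"


def strip_wildcards(pci_id: dict) -> dict:
    """Return a copy of one pci_ids entry without the wildcard fields."""
    return {key: value for key, value in pci_id.items() if value != WILDCARD}


# Sample entry copied from the mempool_cnxk output in this thread.
entry = {
    "vendor": "177d",
    "device": "a0fb",
    "subsystem_vendor": "ffff",
    "subsystem_device": "b900",
}
print(json.dumps(strip_wildcards(entry), indent=2))
```

Only the fields that carry real information remain, so an entry with both
subsystem fields left at the default shrinks to vendor and device.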



BTW, I have detected some duplicates in the output, like [1], [2] & [3].
They seem to come from duplicates in the code; I have cc'ed the maintainers.

[1]:
   {
     "name": "net_qede",
     "kmod": "* igb_uio | uio_pci_generic | vfio-pci",
     "pci_ids": [
       {
         "vendor": "1077",
         "device": "1634",
         "subsystem_vendor": "ffff",
         "subsystem_device": "ffff"
       },
       {
         "vendor": "1077",
         "device": "1629",
         "subsystem_vendor": "ffff",
         "subsystem_device": "ffff"
       },
       {
         "vendor": "1077",
         "device": "1634",
         "subsystem_vendor": "ffff",
         "subsystem_device": "ffff"
       },
...

[2]:
   {
     "name": "mempool_cnxk",
     "params": "max_pools=<128-1048576>",
     "kmod": "vfio-pci",
     "pci_ids": [
       {
         "vendor": "177d",
         "device": "a0fb",
         "subsystem_vendor": "ffff",
         "subsystem_device": "b900"
       },
       {
         "vendor": "177d",
         "device": "a0fb",
         "subsystem_vendor": "ffff",
         "subsystem_device": "b900"
       },
...

[3]:
   {
     "name": "net_cn10k",
     "kmod": "vfio-pci",
     "pci_ids": [
       {
         "vendor": "177d",
         "device": "a063",
         "subsystem_vendor": "ffff",
         "subsystem_device": "b900"
       },
       {
         "vendor": "177d",
         "device": "a063",
         "subsystem_vendor": "ffff",
         "subsystem_device": "b900"
       },
...
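
Duplicates like these can be spotted mechanically from the JSON output. The
following is a minimal sketch, not part of the patch: the function name
duplicate_pci_ids is hypothetical, and the sample data is a shortened copy of
the net_qede entries in [1].

```python
from collections import Counter
from typing import List


def duplicate_pci_ids(driver: dict) -> List[tuple]:
    """List the pci_ids entries that occur more than once for one driver."""
    counts = Counter(
        # Dicts are not hashable, so count a sorted tuple of their items.
        tuple(sorted(entry.items()))
        for entry in driver.get("pci_ids", [])
    )
    return [key for key, count in counts.items() if count > 1]


driver = {
    "name": "net_qede",
    "pci_ids": [
        {"vendor": "1077", "device": "1634"},
        {"vendor": "1077", "device": "1629"},
        {"vendor": "1077", "device": "1634"},
    ],
}
for dup in duplicate_pci_ids(driver):
    print(driver["name"], dict(dup))
```

Running this over every element of the script output gives a per-driver list
of repeated IDs to hand to the respective maintainers.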


* Re: [PATCH v3] usertools: rewrite pmdinfo
  2022-09-20 10:10   ` Ferruh Yigit
@ 2022-09-20 10:12     ` Robin Jarry
  0 siblings, 0 replies; 42+ messages in thread
From: Robin Jarry @ 2022-09-20 10:12 UTC (permalink / raw)
  To: Ferruh Yigit, dev, Jerin Jacob Kollanukkaran
  Cc: Olivier Matz, Bruce Richardson, Rasesh Mody,
	Devendra Singh Rawat, Ashwin Sekhar T K, Pavan Nikhilesh,
	Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori, Satha Rao

Ferruh Yigit, Sep 20, 2022 at 12:10:
> For 'subsystem_vendor' & 'subsystem_device', the value "ffff" means the
> field is not explicitly defined, so it gets the default value.
> What do you think about omitting those as well when the value is "ffff",
> to reduce noise in the output?

Sure, I could strip those as well.



* [PATCH v4] usertools: rewrite pmdinfo
  2022-09-13 10:58 [PATCH] usertools: rewrite pmdinfo Robin Jarry
                   ` (3 preceding siblings ...)
  2022-09-20  9:08 ` [PATCH v3] " Robin Jarry
@ 2022-09-20 10:42 ` Robin Jarry
  2022-09-20 14:08   ` Olivier Matz
  2022-09-20 17:48   ` Ferruh Yigit
  2022-09-22 11:58 ` [PATCH v5] " Robin Jarry
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 42+ messages in thread
From: Robin Jarry @ 2022-09-20 10:42 UTC (permalink / raw)
  To: dev; +Cc: Robin Jarry, Olivier Matz, Ferruh Yigit, Bruce Richardson

dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
merely prints multiple independent JSON lines which cannot be fed
directly to any JSON parser. Moreover, the script complexity is rather
high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
sections. Rewrite it so that it can produce valid JSON.

Remove the PCI database parsing for PCI-ID to Vendor-Device names
conversion. This should be done by external scripts (if really needed).

Here are some examples of use with jq:

Get the complete info for a given driver:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.name == "dmadev_idxd_pci")'
 {
   "name": "dmadev_idxd_pci",
   "params": "max_queues=0",
   "kmod": "vfio-pci",
   "pci_ids": [
     {
       "vendor": "8086",
       "device": "0b25"
     }
   ]
 }

Get only the required kernel modules for a given driver:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.name == "net_i40e").kmod'
 "* igb_uio | uio_pci_generic | vfio-pci"

Get only the required kernel modules for a given device:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
 "* ib_uverbs & mlx5_core & mlx5_ib"
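
For consumers that prefer Python over jq, the same lookup could be sketched
as below. This is an illustration only: kmods_for_device is a hypothetical
helper, the sample JSON is made up to match the shape shown above, and the
driver names in it are placeholders. Unlike the jq filter, it skips drivers
that have no "pci_ids" key instead of failing on them.

```python
import json
from typing import Iterator, List

# Made-up sample shaped like the script output; driver names are hypothetical.
SAMPLE = """
[
  {
    "name": "example_mlx5",
    "kmod": "* ib_uverbs & mlx5_core & mlx5_ib",
    "pci_ids": [{"vendor": "15b3", "device": "1013"}]
  },
  {"name": "example_vdev"}
]
"""


def kmods_for_device(drivers: List[dict], vendor: str, device: str) -> Iterator[str]:
    """Yield the kmod string of every driver that lists the given PCI ID."""
    for drv in drivers:
        # "pci_ids" is absent for drivers that do not support any PCI device.
        if any(
            pci.get("vendor") == vendor and pci.get("device") == device
            for pci in drv.get("pci_ids", [])
        ):
            yield drv.get("kmod", "")


drivers = json.loads(SAMPLE)
print(list(kmods_for_device(drivers, "15b3", "1013")))
```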

Print the list of drivers which define multiple parameters without
space separators:

 ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
   jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
 ...
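
The regular expression in that last filter looks for an "=" followed by
non-space characters and another "=", i.e. two assignments with no space in
between. A small Python sketch of the same check follows; the function name
and the sample parameter strings other than "max_queues=0" are made up for
illustration.

```python
import re

# Same pattern as in the jq filter: "=", then non-space characters, then "=".
MULTI_PARAM_RE = re.compile(r"=[^ ]+=")


def has_unseparated_params(driver: dict) -> bool:
    """Tell whether "params" holds several assignments not split by spaces."""
    params = driver.get("params")
    return params is not None and MULTI_PARAM_RE.search(params) is not None


samples = [
    {"name": "dmadev_idxd_pci", "params": "max_queues=0"},
    {"name": "example_spaced", "params": "a=<int> b=<int>"},
    {"name": "example_glued", "params": "a=<int>b=<int>"},
    {"name": "example_no_params"},
]
for drv in samples:
    print(drv["name"], has_unseparated_params(drv))
```

Note that any separator other than a space (a comma, for instance) is also
reported by this pattern.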

The script passes flake8, black, isort and pylint checks.

I have tested this with a matrix of python/pyelftools versions:

                             pyelftools
               0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
         3.6     ok   ok   ok   ok   ok   ok   ok   ok
         3.7     ok   ok   ok   ok   ok   ok   ok   ok
  Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
         3.9     ok   ok   ok   ok   ok   ok   ok   ok
         3.10  fail fail fail fail   ok   ok   ok   ok

All failures with python 3.10 are related to the same issue:

  File "elftools/construct/lib/container.py", line 5, in <module>
    from collections import MutableMapping
  ImportError: cannot import name 'MutableMapping' from 'collections'

Python 3.10 support is only available since pyelftools 0.26. The script
will only work with Python 3.6 and later. Update the minimal system
requirements and release notes.
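
For background, the abstract base classes were moved to collections.abc long
ago and the old aliases in the top-level collections module were removed in
Python 3.10, which is what the traceback above runs into. The snippet below
only illustrates that behaviour on the running interpreter; it is not taken
from pyelftools or from this patch.

```python
import collections

# This import location works on every supported Python 3 version.
from collections.abc import MutableMapping

# dict is registered as a MutableMapping, so this prints True.
print(issubclass(dict, MutableMapping))

# The legacy alias is only present on interpreters older than 3.10.
print(hasattr(collections, "MutableMapping"))
```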

NB: The output produced by the legacy -r/--raw flag can be obtained with
the following command:

  strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'
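
An approximate Python counterpart of that pipeline, for illustration only:
the regular expression, the function name iter_pmd_info and the sample bytes
are assumptions made for this sketch, and the match is not anchored to the
start of a string the way the sed expression is.

```python
import json
import re
from typing import Iterator

# Assumption: a record is "PMD_INFO_STRING= " followed by printable ASCII
# (the JSON payload) up to the first non-printable byte.
PMD_INFO_RE = re.compile(rb"PMD_INFO_STRING= ([\x20-\x7e]+)")


def iter_pmd_info(data: bytes) -> Iterator[dict]:
    """Yield the decoded JSON object of each PMD_INFO_STRING record."""
    for match in PMD_INFO_RE.finditer(data):
        yield json.loads(match.group(1).decode("ascii"))


# Made-up bytes standing in for the contents of a .rodata section.
sample = (
    b"\x00\x01junk\x00"
    b'PMD_INFO_STRING= {"name": "dmadev_idxd_pci", "kmod": "vfio-pci", '
    b'"pci_ids": [[32902, 2853, 65535, 65535]]}\x00'
    b"more\x00"
)
for info in iter_pmd_info(sample):
    print(json.dumps(info))
```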

Cc: Olivier Matz <olivier.matz@6wind.com>
Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
Cc: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
v3 -> v4:

* also strip pci_id fields when they have the wildcard 0xFFFF value.

v2 -> v3:

* strip "pci_ids" when it is empty (some drivers do not support any
  pci devices)

v1 -> v2:

* update release notes and minimal python version requirement
* hide warnings by default (-v/--verbose to show them)
* show debug messages with -vv
* also search libs in folders listed in /etc/ld.so.conf.d/*.conf
* only search for DT_NEEDED on executables, not on dynamic libraries
* take DT_RUNPATH into account for searching libraries
* fix weird broken pipe error
* fix some typos:
    s/begining/beginning/
    s/subsystem_device/subsystem_vendor/
    s/subsystem_system/subsystem_device/
* change field names for pci_ids elements (remove _id suffixes)
* DT_NEEDED of files are analyzed. There is no way to differentiate
  between dynamically linked executables and dynamic libraries.

 doc/guides/linux_gsg/sys_reqs.rst      |   2 +-
 doc/guides/rel_notes/release_22_11.rst |   5 +
 usertools/dpdk-pmdinfo.py              | 924 +++++++++----------------
 3 files changed, 324 insertions(+), 607 deletions(-)

diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 08d45898f025..f842105eeda7 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -41,7 +41,7 @@ Compilation of the DPDK
    resulting in statically linked applications not being linked properly.
    Use an updated version of ``pkg-config`` or ``pkgconf`` instead when building applications
 
-*   Python 3.5 or later.
+*   Python 3.6 or later.
 
 *   Meson (version 0.49.2+) and ninja
 
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf0505e..67054f5acdc9 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -84,6 +84,11 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
+  PCI-IDs parsing has been removed.
+  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command::
+
+     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
 
 ABI Changes
 -----------
diff --git a/usertools/dpdk-pmdinfo.py b/usertools/dpdk-pmdinfo.py
index 40ef5cec6cba..a68921296609 100755
--- a/usertools/dpdk-pmdinfo.py
+++ b/usertools/dpdk-pmdinfo.py
@@ -1,626 +1,338 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2016  Neil Horman <nhorman@tuxdriver.com>
+# Copyright(c) 2022  Robin Jarry
+# pylint: disable=invalid-name
+
+r"""
+Utility to dump PMD_INFO_STRING support from DPDK binaries.
+
+This script prints JSON output to be interpreted by other tools. Here are some
+examples with jq:
+
+Get the complete info for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "cnxk_nix_inl")'
+
+Get only the required kernel modules for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "net_i40e").kmod'
+
+Get only the required kernel modules for a given device:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
+"""
 
-# -------------------------------------------------------------------------
-#
-# Utility to dump PMD_INFO_STRING support from an object file
-#
-# -------------------------------------------------------------------------
-import json
-import os
-import platform
-import sys
 import argparse
-from elftools.common.exceptions import ELFError
-from elftools.common.py3compat import byte2int
-from elftools.elf.elffile import ELFFile
-
-
-# For running from development directory. It should take precedence over the
-# installed pyelftools.
-sys.path.insert(0, '.')
-
-raw_output = False
-pcidb = None
-
-# ===========================================
-
-class Vendor:
-    """
-    Class for vendors. This is the top level class
-    for the devices belong to a specific vendor.
-    self.devices is the device dictionary
-    subdevices are in each device.
-    """
-
-    def __init__(self, vendorStr):
-        """
-        Class initializes with the raw line from pci.ids
-        Parsing takes place inside __init__
-        """
-        self.ID = vendorStr.split()[0]
-        self.name = vendorStr.replace("%s " % self.ID, "").rstrip()
-        self.devices = {}
-
-    def addDevice(self, deviceStr):
-        """
-        Adds a device to self.devices
-        takes the raw line from pci.ids
-        """
-        s = deviceStr.strip()
-        devID = s.split()[0]
-        if devID in self.devices:
-            pass
-        else:
-            self.devices[devID] = Device(deviceStr)
-
-    def report(self):
-        print(self.ID, self.name)
-        for id, dev in self.devices.items():
-            dev.report()
-
-    def find_device(self, devid):
-        # convert to a hex string and remove 0x
-        devid = hex(devid)[2:]
-        try:
-            return self.devices[devid]
-        except:
-            return Device("%s  Unknown Device" % devid)
-
-
-class Device:
-
-    def __init__(self, deviceStr):
-        """
-        Class for each device.
-        Each vendor has its own devices dictionary.
-        """
-        s = deviceStr.strip()
-        self.ID = s.split()[0]
-        self.name = s.replace("%s  " % self.ID, "")
-        self.subdevices = {}
-
-    def report(self):
-        print("\t%s\t%s" % (self.ID, self.name))
-        for subID, subdev in self.subdevices.items():
-            subdev.report()
-
-    def addSubDevice(self, subDeviceStr):
-        """
-        Adds a subvendor, subdevice to device.
-        Uses raw line from pci.ids
-        """
-        s = subDeviceStr.strip()
-        spl = s.split()
-        subVendorID = spl[0]
-        subDeviceID = spl[1]
-        subDeviceName = s.split("  ")[-1]
-        devID = "%s:%s" % (subVendorID, subDeviceID)
-        self.subdevices[devID] = SubDevice(
-            subVendorID, subDeviceID, subDeviceName)
-
-    def find_subid(self, subven, subdev):
-        subven = hex(subven)[2:]
-        subdev = hex(subdev)[2:]
-        devid = "%s:%s" % (subven, subdev)
-
-        try:
-            return self.subdevices[devid]
-        except:
-            if (subven == "ffff" and subdev == "ffff"):
-                return SubDevice("ffff", "ffff", "(All Subdevices)")
-            return SubDevice(subven, subdev, "(Unknown Subdevice)")
-
-
-class SubDevice:
-    """
-    Class for subdevices.
-    """
-
-    def __init__(self, vendor, device, name):
-        """
-        Class initializes with vendorid, deviceid and name
-        """
-        self.vendorID = vendor
-        self.deviceID = device
-        self.name = name
-
-    def report(self):
-        print("\t\t%s\t%s\t%s" % (self.vendorID, self.deviceID, self.name))
-
-
-class PCIIds:
-    """
-    Top class for all pci.ids entries.
-    All queries will be asked to this class.
-    PCIIds.vendors["0e11"].devices["0046"].\
-    subdevices["0e11:4091"].name  =  "Smart Array 6i"
-    """
-
-    def __init__(self, filename):
-        """
-        Prepares the directories.
-        Checks local data file.
-        Tries to load from local, if not found, downloads from web
-        """
-        self.version = ""
-        self.date = ""
-        self.vendors = {}
-        self.contents = None
-        self.readLocal(filename)
-        self.parse()
-
-    def reportVendors(self):
-        """Reports the vendors
-        """
-        for vid, v in self.vendors.items():
-            print(v.ID, v.name)
-
-    def report(self, vendor=None):
-        """
-        Reports everything for all vendors or a specific vendor
-        PCIIds.report()  reports everything
-        PCIIDs.report("0e11") reports only "Compaq Computer Corporation"
-        """
-        if vendor is not None:
-            self.vendors[vendor].report()
-        else:
-            for vID, v in self.vendors.items():
-                v.report()
-
-    def find_vendor(self, vid):
-        # convert vid to a hex string and remove the 0x
-        vid = hex(vid)[2:]
-
-        try:
-            return self.vendors[vid]
-        except:
-            return Vendor("%s Unknown Vendor" % (vid))
-
-    def findDate(self, content):
-        for l in content:
-            if l.find("Date:") > -1:
-                return l.split()[-2].replace("-", "")
-        return None
-
-    def parse(self):
-        if not self.contents:
-            print("data/%s-pci.ids not found" % self.date)
-        else:
-            vendorID = ""
-            deviceID = ""
-            for l in self.contents:
-                if l[0] == "#":
-                    continue
-                elif not l.strip():
-                    continue
-                else:
-                    if l.find("\t\t") == 0:
-                        self.vendors[vendorID].devices[
-                            deviceID].addSubDevice(l)
-                    elif l.find("\t") == 0:
-                        deviceID = l.strip().split()[0]
-                        self.vendors[vendorID].addDevice(l)
-                    else:
-                        vendorID = l.split()[0]
-                        self.vendors[vendorID] = Vendor(l)
-
-    def readLocal(self, filename):
-        """
-        Reads the local file
-        """
-        with open(filename, 'r', encoding='utf-8') as f:
-            self.contents = f.readlines()
-        self.date = self.findDate(self.contents)
-
-    def loadLocal(self):
-        """
-        Loads database from local. If there is no file,
-        it creates a new one from web
-        """
-        self.date = idsfile[0].split("/")[1].split("-")[0]
-        self.readLocal()
-
-
-# =======================================
-
-def search_file(filename, search_path):
-    """ Given a search path, find file with requested name """
-    for path in search_path.split(':'):
-        candidate = os.path.join(path, filename)
-        if os.path.exists(candidate):
-            return os.path.abspath(candidate)
-    return None
-
-
-class ReadElf(object):
-    """ display_* methods are used to emit output into the output stream
-    """
-
-    def __init__(self, file, output):
-        """ file:
-                stream object with the ELF file to read
-
-            output:
-                output stream to write to
-        """
-        self.elffile = ELFFile(file)
-        self.output = output
-
-        # Lazily initialized if a debug dump is requested
-        self._dwarfinfo = None
-
-        self._versioninfo = None
-
-    def _section_from_spec(self, spec):
-        """ Retrieve a section given a "spec" (either number or name).
-            Return None if no such section exists in the file.
-        """
-        try:
-            num = int(spec)
-            if num < self.elffile.num_sections():
-                return self.elffile.get_section(num)
-            return None
-        except ValueError:
-            # Not a number. Must be a name then
-            section = self.elffile.get_section_by_name(force_unicode(spec))
-            if section is None:
-                # No match with a unicode name.
-                # Some versions of pyelftools (<= 0.23) store internal strings
-                # as bytes. Try again with the name encoded as bytes.
-                section = self.elffile.get_section_by_name(force_bytes(spec))
-            return section
-
-    def pretty_print_pmdinfo(self, pmdinfo):
-        global pcidb
-
-        for i in pmdinfo["pci_ids"]:
-            vendor = pcidb.find_vendor(i[0])
-            device = vendor.find_device(i[1])
-            subdev = device.find_subid(i[2], i[3])
-            print("%s (%s) : %s (%s) %s" %
-                  (vendor.name, vendor.ID, device.name,
-                   device.ID, subdev.name))
-
-    def parse_pmd_info_string(self, mystring):
-        global raw_output
-        global pcidb
-
-        optional_pmd_info = [
-            {'id': 'params', 'tag': 'PMD PARAMETERS'},
-            {'id': 'kmod', 'tag': 'PMD KMOD DEPENDENCIES'}
-        ]
-
-        i = mystring.index("=")
-        mystring = mystring[i + 2:]
-        pmdinfo = json.loads(mystring)
-
-        if raw_output:
-            print(json.dumps(pmdinfo))
-            return
-
-        print("PMD NAME: " + pmdinfo["name"])
-        for i in optional_pmd_info:
-            try:
-                print("%s: %s" % (i['tag'], pmdinfo[i['id']]))
-            except KeyError:
-                continue
-
-        if pmdinfo["pci_ids"]:
-            print("PMD HW SUPPORT:")
-            if pcidb is not None:
-                self.pretty_print_pmdinfo(pmdinfo)
-            else:
-                print("VENDOR\t DEVICE\t SUBVENDOR\t SUBDEVICE")
-                for i in pmdinfo["pci_ids"]:
-                    print("0x%04x\t 0x%04x\t 0x%04x\t\t 0x%04x" %
-                          (i[0], i[1], i[2], i[3]))
-
-        print("")
-
-    def display_pmd_info_strings(self, section_spec):
-        """ Display a strings dump of a section. section_spec is either a
-            section number or a name.
-        """
-        section = self._section_from_spec(section_spec)
-        if section is None:
-            return
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("PMD_INFO_STRING")
-            if rc != -1:
-                self.parse_pmd_info_string(mystring[rc:])
-
-            dataptr = endptr
-
-    def find_librte_eal(self, section):
-        for tag in section.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if "librte_eal" in force_unicode(tag.needed):
-                    return force_unicode(tag.needed)
-        return None
-
-    def search_for_autoload_path(self):
-        scanelf = self
-        scanfile = None
-        library = None
-
-        section = self._section_from_spec(".dynamic")
-        try:
-            eallib = self.find_librte_eal(section)
-            if eallib is not None:
-                ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-                if ldlibpath is None:
-                    ldlibpath = ""
-                dtr = self.get_dt_runpath(section)
-                library = search_file(eallib,
-                                      dtr + ":" + ldlibpath +
-                                      ":/usr/lib64:/lib64:/usr/lib:/lib")
-                if library is None:
-                    return (None, None)
-                if not raw_output:
-                    print("Scanning for autoload path in %s" % library)
-                scanfile = open(library, 'rb')
-                scanelf = ReadElf(scanfile, sys.stdout)
-        except AttributeError:
-            # Not a dynamic binary
-            pass
-        except ELFError:
-            scanfile.close()
-            return (None, None)
-
-        section = scanelf._section_from_spec(".rodata")
-        if section is None:
-            if scanfile is not None:
-                scanfile.close()
-            return (None, None)
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("DPDK_PLUGIN_PATH")
-            if rc != -1:
-                rc = mystring.find("=")
-                return (mystring[rc + 1:], library)
-
-            dataptr = endptr
-        if scanfile is not None:
-            scanfile.close()
-        return (None, None)
-
-    def get_dt_runpath(self, dynsec):
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_RUNPATH':
-                return force_unicode(tag.runpath)
-        return ""
-
-    def process_dt_needed_entries(self):
-        """ Look to see if there are any DT_NEEDED entries in the binary
-            And process those if there are
-        """
-        runpath = ""
-        ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-        if ldlibpath is None:
-            ldlibpath = ""
-
-        dynsec = self._section_from_spec(".dynamic")
-        try:
-            runpath = self.get_dt_runpath(dynsec)
-        except AttributeError:
-            # dynsec is None, just return
-            return
-
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if 'librte_' in force_unicode(tag.needed):
-                    library = search_file(force_unicode(tag.needed),
-                                          runpath + ":" + ldlibpath +
-                                          ":/usr/lib64:/lib64:/usr/lib:/lib")
-                    if library is not None:
-                        with open(library, 'rb') as file:
-                            try:
-                                libelf = ReadElf(file, sys.stdout)
-                            except ELFError:
-                                print("%s is no an ELF file" % library)
-                                continue
-                            libelf.process_dt_needed_entries()
-                            libelf.display_pmd_info_strings(".rodata")
-                            file.close()
-
-
-# compat: remove force_unicode & force_bytes when pyelftools<=0.23 support is
-# dropped.
-def force_unicode(s):
-    if hasattr(s, 'decode') and callable(s.decode):
-        s = s.decode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def force_bytes(s):
-    if hasattr(s, 'encode') and callable(s.encode):
-        s = s.encode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def scan_autoload_path(autoload_path):
-    global raw_output
-
-    if not os.path.exists(autoload_path):
-        return
-
+import glob
+import json
+import logging
+import os
+import re
+import string
+import sys
+from pathlib import Path
+from typing import Iterable, Iterator, List, Union
+
+import elftools
+from elftools.elf.elffile import ELFError, ELFFile
+
+
+# ----------------------------------------------------------------------------
+def main() -> int:  # pylint: disable=missing-docstring
     try:
-        dirs = os.listdir(autoload_path)
-    except OSError:
-        # Couldn't read the directory, give up
-        return
-
-    for d in dirs:
-        dpath = os.path.join(autoload_path, d)
-        if os.path.isdir(dpath):
-            scan_autoload_path(dpath)
-        if os.path.isfile(dpath):
-            try:
-                file = open(dpath, 'rb')
-                readelf = ReadElf(file, sys.stdout)
-            except ELFError:
-                # this is likely not an elf file, skip it
-                continue
-            except IOError:
-                # No permission to read the file, skip it
-                continue
-
-            if not raw_output:
-                print("Hw Support for library %s" % d)
-            readelf.display_pmd_info_strings(".rodata")
-            file.close()
+        args = parse_args()
+        logging.basicConfig(
+            stream=sys.stderr,
+            format="%(levelname)s: %(message)s",
+            level={
+                0: logging.ERROR,
+                1: logging.WARNING,
+            }.get(args.verbose, logging.DEBUG),
+        )
+        info = parse_pmdinfo(args.elf_files, args.search_plugins)
+        print(json.dumps(info, indent=2))
+    except BrokenPipeError:
+        pass
+    except KeyboardInterrupt:
+        return 1
+    except Exception as e:  # pylint: disable=broad-except
+        logging.error("%s", e)
+        return 1
+    return 0
 
 
-def scan_for_autoload_pmds(dpdk_path):
+# ----------------------------------------------------------------------------
+def parse_args() -> argparse.Namespace:
     """
-    search the specified application or path for a pmd autoload path
-    then scan said path for pmds and report hw support
+    Parse command line arguments.
     """
-    global raw_output
-
-    if not os.path.isfile(dpdk_path):
-        if not raw_output:
-            print("Must specify a file name")
-        return
-
-    file = open(dpdk_path, 'rb')
-    try:
-        readelf = ReadElf(file, sys.stdout)
-    except ElfError:
-        if not raw_output:
-            print("Unable to parse %s" % file)
-        return
-
-    (autoload_path, scannedfile) = readelf.search_for_autoload_path()
-    if not autoload_path:
-        if not raw_output:
-            print("No autoload path configured in %s" % dpdk_path)
-        return
-    if not raw_output:
-        if scannedfile is None:
-            scannedfile = dpdk_path
-        print("Found autoload path %s in %s" % (autoload_path, scannedfile))
-
-    file.close()
-    if not raw_output:
-        print("Discovered Autoload HW Support:")
-    scan_autoload_path(autoload_path)
-    return
-
-
-def main(stream=None):
-    global raw_output
-    global pcidb
-
-    pcifile_default = "./pci.ids"  # For unknown OS's assume local file
-    if platform.system() == 'Linux':
-        # hwdata is the legacy location, misc is supported going forward
-        pcifile_default = "/usr/share/misc/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/hwdata/pci.ids"
-    elif platform.system() == 'FreeBSD':
-        pcifile_default = "/usr/local/share/pciids/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/misc/pci_vendors"
-
     parser = argparse.ArgumentParser(
-        usage='usage: %(prog)s [-hrtp] [-d <pci id file>] elf_file',
-        description="Dump pmd hardware support info")
-    group = parser.add_mutually_exclusive_group()
-    group.add_argument('-r', '--raw',
-                       action='store_true', dest='raw_output',
-                       help='dump raw json strings')
-    group.add_argument("-t", "--table", dest="tblout",
-                       help="output information on hw support as a hex table",
-                       action='store_true')
-    parser.add_argument("-d", "--pcidb", dest="pcifile",
-                        help="specify a pci database to get vendor names from",
-                        default=pcifile_default, metavar="FILE")
-    parser.add_argument("-p", "--plugindir", dest="pdir",
-                        help="scan dpdk for autoload plugins",
-                        action='store_true')
-    parser.add_argument("elf_file", help="driver shared object file")
-    args = parser.parse_args()
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "-p",
+        "--search-plugins",
+        action="store_true",
+        help="""
+        In addition to ELF_FILEs and their linked dynamic libraries, also scan
+        the DPDK plugins path.
+        """,
+    )
+    parser.add_argument(
+        "-v",
+        "--verbose",
+        action="count",
+        default=0,
+        help="""
+        Display warnings due to linked libraries not found or ELF/JSON parsing
+        errors in these libraries. Use twice to show debug messages.
+        """,
+    )
+    parser.add_argument(
+        "elf_files",
+        metavar="ELF_FILE",
+        nargs="+",
+        type=existing_file,
+        help="""
+        DPDK application binary or dynamic library.
+        """,
+    )
+    return parser.parse_args()
 
-    if args.raw_output:
-        raw_output = True
 
-    if args.tblout:
-        args.pcifile = None
+# ----------------------------------------------------------------------------
+def parse_pmdinfo(paths: Iterable[Path], search_plugins: bool) -> List[dict]:
+    """
+    Extract DPDK PMD info JSON strings from an ELF file.
 
-    if args.pcifile:
-        pcidb = PCIIds(args.pcifile)
-        if pcidb is None:
-            print("Pci DB file not found")
-            exit(1)
+    :returns:
+        A list of DPDK drivers info dictionaries.
+    """
+    binaries = set(paths)
+    for p in paths:
+        binaries.update(get_needed_libs(p))
+    if search_plugins:
+        # cast to list to avoid errors with update while iterating
+        binaries.update(list(get_plugin_libs(binaries)))
 
-    if args.pdir:
-        exit(scan_for_autoload_pmds(args.elf_file))
+    drivers = []
 
-    ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-    if ldlibpath is None:
-        ldlibpath = ""
-
-    if os.path.exists(args.elf_file):
-        myelffile = args.elf_file
-    else:
-        myelffile = search_file(args.elf_file,
-                                ldlibpath + ":/usr/lib64:/lib64:/usr/lib:/lib")
-
-    if myelffile is None:
-        print("File not found")
-        sys.exit(1)
-
-    with open(myelffile, 'rb') as file:
+    for b in binaries:
+        logging.debug("analyzing %s", b)
         try:
-            readelf = ReadElf(file, sys.stdout)
-            readelf.process_dt_needed_entries()
-            readelf.display_pmd_info_strings(".rodata")
-            sys.exit(0)
+            for s in get_elf_strings(b, ".rodata", "PMD_INFO_STRING="):
+                try:
+                    info = json.loads(s)
+                    scrub_pci_ids(info)
+                    drivers.append(info)
+                except ValueError as e:
+                    # invalid JSON, should never happen
+                    logging.warning("%s: %s", b, e)
+        except ELFError as e:
+            # only happens for discovered plugins that are not ELF
+            logging.debug("%s: cannot parse ELF: %s", b, e)
 
-        except ELFError as ex:
-            sys.stderr.write('ELF error: %s\n' % ex)
-            sys.exit(1)
+    return drivers
 
 
-# -------------------------------------------------------------------------
-if __name__ == '__main__':
-    main()
+# ----------------------------------------------------------------------------
+PCI_FIELDS = ("vendor", "device", "subsystem_vendor", "subsystem_device")
+
+
+def scrub_pci_ids(info: dict):
+    """
+    Convert numerical ids to hex strings.
+    Strip empty pci_ids lists.
+    Strip wildcard 0xFFFF ids.
+    """
+    pci_ids = []
+    for pci_fields in info.pop("pci_ids"):
+        pci = {}
+        for name, value in zip(PCI_FIELDS, pci_fields):
+            if value != 0xFFFF:
+                pci[name] = f"{value:04x}"
+        if pci:
+            pci_ids.append(pci)
+    if pci_ids:
+        info["pci_ids"] = pci_ids
+
+
+# ----------------------------------------------------------------------------
+def get_plugin_libs(binaries: Iterable[Path]) -> Iterator[Path]:
+    """
+    Look into the provided binaries for DPDK_PLUGIN_PATH and scan the path
+    for files.
+    """
+    for b in binaries:
+        for p in get_elf_strings(b, ".rodata", "DPDK_PLUGIN_PATH="):
+            plugin_path = p.strip()
+            logging.debug("discovering plugins in %s", plugin_path)
+            for root, _, files in os.walk(plugin_path):
+                for f in files:
+                    yield Path(root) / f
+            # no need to search in other binaries.
+            return
+
+
+# ----------------------------------------------------------------------------
+def existing_file(value: str) -> Path:
+    """
+    Argparse type= callback to ensure an argument points to a valid file path.
+    """
+    path = Path(value)
+    if not path.is_file():
+        raise argparse.ArgumentTypeError(f"{value}: No such file")
+    return path
+
+
+# ----------------------------------------------------------------------------
+PRINTABLE_BYTES = frozenset(string.printable.encode("ascii"))
+
+
+def find_strings(buf: bytes, prefix: str) -> Iterator[str]:
+    """
+    Extract strings of printable ASCII characters from a bytes buffer.
+    """
+    view = memoryview(buf)
+    start = None
+
+    for i, b in enumerate(view):
+        if start is None and b in PRINTABLE_BYTES:
+            # mark beginning of string
+            start = i
+            continue
+        if start is not None:
+            if b in PRINTABLE_BYTES:
+                # string not finished
+                continue
+            if b == 0:
+                # end of string
+                s = view[start:i].tobytes().decode("ascii")
+                if s.startswith(prefix):
+                    yield s[len(prefix) :]
+            # There can be byte sequences where a non-printable byte
+            # follows a printable one. Ignore that.
+            start = None
+
+
+# ----------------------------------------------------------------------------
+def elftools_version():
+    """
+    Extract pyelftools version as a tuple of integers for easy comparison.
+    """
+    version = getattr(elftools, "__version__", "")
+    match = re.match(r"^(\d+)\.(\d+).*$", str(version))
+    if not match:
+        # cannot determine version, hope for the best
+        return (0, 24)
+    return (int(match[1]), int(match[2]))
+
+
+ELFTOOLS_VERSION = elftools_version()
+
+
+def from_elftools(s: Union[bytes, str]) -> str:
+    """
+    Earlier versions of pyelftools (< 0.24) return bytes encoded with "latin-1"
+    instead of python strings.
+    """
+    if isinstance(s, bytes):
+        return s.decode("latin-1")
+    return s
+
+
+def to_elftools(s: str) -> Union[bytes, str]:
+    """
+    Earlier versions of pyelftools (< 0.24) assume that ELF section and tags
+    are bytes encoded with "latin-1" instead of python strings.
+    """
+    if ELFTOOLS_VERSION < (0, 24):
+        return s.encode("latin-1")
+    return s
+
+
+# ----------------------------------------------------------------------------
+def get_elf_strings(path: Path, section: str, prefix: str) -> Iterator[str]:
+    """
+    Extract strings from a named ELF section in a file.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        sec = elf.get_section_by_name(to_elftools(section))
+        if not sec:
+            return
+        yield from find_strings(sec.data(), prefix)
+
+
+# ----------------------------------------------------------------------------
+def ld_so_path() -> Iterator[str]:
+    """
+    Return the list of directories where dynamic libraries are loaded based
+    on the contents of /etc/ld.so.conf.d/*.conf.
+    """
+    for conf in glob.iglob("/etc/ld.so.conf.d/*.conf"):
+        try:
+            with open(conf, "r", encoding="utf-8") as f:
+                for line in f:
+                    line = line.strip()
+                    if os.path.isdir(line):
+                        yield line
+        except OSError:
+            pass
+
+
+LD_SO_CONF_PATH = list(ld_so_path())
+
+
+def search_dt_needed(origin: Path, needed: str, runpath: List[str]) -> Path:
+    """
+    Search for a file in LD_LIBRARY_PATH (if defined), runpath (if set) and in
+    all folders declared in /etc/ld.so.conf.d/*.conf. Finally, look in the
+    standard folders (/lib followed by /usr/lib).
+    """
+    folders = []
+    if "LD_LIBRARY_PATH" in os.environ:
+        folders += os.environ["LD_LIBRARY_PATH"].split(":")
+    folders += runpath
+    folders += LD_SO_CONF_PATH
+    folders += ["/lib", "/usr/lib"]
+    for d in folders:
+        d = d.replace("$ORIGIN", str(origin.parent.absolute()))
+        filepath = Path(d) / needed
+        if filepath.is_file():
+            return filepath
+    raise FileNotFoundError(needed)
+
+
+# ----------------------------------------------------------------------------
+def get_needed_libs(path: Path) -> Iterator[Path]:
+    """
+    Extract the dynamic library dependencies from an ELF executable.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        dyn = elf.get_section_by_name(to_elftools(".dynamic"))
+        if not dyn:
+            return
+        runpath = []
+        for tag in dyn.iter_tags(to_elftools("DT_RUNPATH")):
+            runpath += from_elftools(tag.runpath).split(":")
+        for tag in dyn.iter_tags(to_elftools("DT_NEEDED")):
+            needed = from_elftools(tag.needed)
+            if not needed.startswith("librte_"):
+                continue
+            logging.debug("%s: DT_NEEDED %s", path, needed)
+            try:
+                yield search_dt_needed(path, needed, runpath)
+            except FileNotFoundError:
+                logging.warning("%s: DT_NEEDED not found: %s", path, needed)
+
+
+# ----------------------------------------------------------------------------
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.37.3



* Re: [PATCH v4] usertools: rewrite pmdinfo
  2022-09-20 10:42 ` [PATCH v4] " Robin Jarry
@ 2022-09-20 14:08   ` Olivier Matz
  2022-09-20 17:48   ` Ferruh Yigit
  1 sibling, 0 replies; 42+ messages in thread
From: Olivier Matz @ 2022-09-20 14:08 UTC (permalink / raw)
  To: Robin Jarry; +Cc: dev, Ferruh Yigit, Bruce Richardson

On Tue, Sep 20, 2022 at 12:42:12PM +0200, Robin Jarry wrote:
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> Here are some examples of use with jq:
> 
> Get the complete info for a given driver:
> 
>  ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>    jq '.[] | select(.name == "dmadev_idxd_pci")'
>  {
>    "name": "dmadev_idxd_pci",
>    "params": "max_queues=0",
>    "kmod": "vfio-pci",
>    "pci_ids": [
>      {
>        "vendor": "8086",
>        "device": "0b25"
>      }
>    ]
>  }
> 
> Get only the required kernel modules for a given driver:
> 
>  ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>    jq '.[] | select(.name == "net_i40e").kmod'
>  "* igb_uio | uio_pci_generic | vfio-pci"
> 
> Get only the required kernel modules for a given device:
> 
>  ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>    jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
>  "* ib_uverbs & mlx5_core & mlx5_ib"
> 
> Print the list of drivers which define multiple parameters without
> space separators:
> 
>  ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>    jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
>  ...
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                              pyelftools
>                0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
>          3.6     ok   ok   ok   ok   ok   ok   ok   ok
>          3.7     ok   ok   ok   ok   ok   ok   ok   ok
>   Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
>          3.9     ok   ok   ok   ok   ok   ok   ok   ok
>          3.10  fail fail fail fail   ok   ok   ok   ok
> 
> All failures with python 3.10 are related to the same issue:
> 
>   File "elftools/construct/lib/container.py", line 5, in <module>
>     from collections import MutableMapping
>   ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26. The script
> will only work with Python 3.6 and later. Update the minimal system
> requirements and release notes.
> 
> NB: The output produced by the legacy -r/--raw flag can be obtained with
> the following command:
> 
>   strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>

Tested-by: Olivier Matz <olivier.matz@6wind.com>


* Re: [PATCH v4] usertools: rewrite pmdinfo
  2022-09-20 10:42 ` [PATCH v4] " Robin Jarry
  2022-09-20 14:08   ` Olivier Matz
@ 2022-09-20 17:48   ` Ferruh Yigit
  2022-09-20 17:50     ` Ferruh Yigit
  2022-09-20 19:15     ` Robin Jarry
  1 sibling, 2 replies; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-20 17:48 UTC (permalink / raw)
  To: Robin Jarry, dev; +Cc: Olivier Matz, Bruce Richardson

On 9/20/2022 11:42 AM, Robin Jarry wrote:

> 
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> Here are some examples of use with jq:
> 
> Get the complete info for a given driver:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.name == "dmadev_idxd_pci")'
>   {
>     "name": "dmadev_idxd_pci",
>     "params": "max_queues=0",
>     "kmod": "vfio-pci",
>     "pci_ids": [
>       {
>         "vendor": "8086",
>         "device": "0b25"
>       }
>     ]
>   }
> 
> Get only the required kernel modules for a given driver:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.name == "net_i40e").kmod'
>   "* igb_uio | uio_pci_generic | vfio-pci"
> 
> Get only the required kernel modules for a given device:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
>   "* ib_uverbs & mlx5_core & mlx5_ib"
> 
> Print the list of drivers which define multiple parameters without
> space separators:
> 
>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>     jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
>   ...
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                               pyelftools
>                 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
>           3.6     ok   ok   ok   ok   ok   ok   ok   ok
>           3.7     ok   ok   ok   ok   ok   ok   ok   ok
>    Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
>           3.9     ok   ok   ok   ok   ok   ok   ok   ok
>           3.10  fail fail fail fail   ok   ok   ok   ok
> 
> All failures with python 3.10 are related to the same issue:
> 
>    File "elftools/construct/lib/container.py", line 5, in <module>
>      from collections import MutableMapping
>    ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26. The script
> will only work with Python 3.6 and later. Update the minimal system
> requirements and release notes.
> 
> NB: The output produced by the legacy -r/--raw flag can be obtained with
> the following command:
> 
>    strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>

<...>

> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 8c021cf0505e..67054f5acdc9 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -84,6 +84,11 @@ API Changes
>      Also, make sure to start the actual text at the margin.
>      =======================================================
> 
> +* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
> +  PCI-IDs parsing has been removed.
> +  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command::
> +
> +     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
> 

Empty line is missing (in case there will be a new version for some 
other reason).


Thanks for the update,
Tested-by: Ferruh Yigit <ferruh.yigit@xilinx.com>



* Re: [PATCH v4] usertools: rewrite pmdinfo
  2022-09-20 17:48   ` Ferruh Yigit
@ 2022-09-20 17:50     ` Ferruh Yigit
  2022-09-21  7:27       ` Thomas Monjalon
  2022-09-20 19:15     ` Robin Jarry
  1 sibling, 1 reply; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-20 17:50 UTC (permalink / raw)
  To: Mcnamara, John, Thomas Monjalon
  Cc: Olivier Matz, Bruce Richardson, Robin Jarry, dev

On 9/20/2022 6:48 PM, Ferruh Yigit wrote:
> On 9/20/2022 11:42 AM, Robin Jarry wrote:
> 
>>
>> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
>> merely prints multiple independent JSON lines which cannot be fed
>> directly to any JSON parser. Moreover, the script complexity is rather
>> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
>> sections. Rewrite it so that it can produce valid JSON.
>>
>> Remove the PCI database parsing for PCI-ID to Vendor-Device names
>> conversion. This should be done by external scripts (if really needed).
>>
>> Here are some examples of use with jq:
>>
>> Get the complete info for a given driver:
>>
>>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>>     jq '.[] | select(.name == "dmadev_idxd_pci")'
>>   {
>>     "name": "dmadev_idxd_pci",
>>     "params": "max_queues=0",
>>     "kmod": "vfio-pci",
>>     "pci_ids": [
>>       {
>>         "vendor": "8086",
>>         "device": "0b25"
>>       }
>>     ]
>>   }
>>
>> Get only the required kernel modules for a given driver:
>>
>>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>>     jq '.[] | select(.name == "net_i40e").kmod'
>>   "* igb_uio | uio_pci_generic | vfio-pci"
>>
>> Get only the required kernel modules for a given device:
>>
>>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>>     jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
>>   "* ib_uverbs & mlx5_core & mlx5_ib"
>>
>> Print the list of drivers which define multiple parameters without
>> space separators:
>>
>>   ~$ usertools/dpdk-pmdinfo.py build/app/dpdk-testpmd | \
>>     jq '.[] | select(.params!=null and (.params|test("=[^ ]+="))) | {name, params}'
>>   ...
>>
>> The script passes flake8, black, isort and pylint checks.
>>
>> I have tested this with a matrix of python/pyelftools versions:
>>
>>                               pyelftools
>>                 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
>>           3.6     ok   ok   ok   ok   ok   ok   ok   ok
>>           3.7     ok   ok   ok   ok   ok   ok   ok   ok
>>    Python 3.8     ok   ok   ok   ok   ok   ok   ok   ok
>>           3.9     ok   ok   ok   ok   ok   ok   ok   ok
>>           3.10  fail fail fail fail   ok   ok   ok   ok
>>
>> All failures with python 3.10 are related to the same issue:
>>
>>    File "elftools/construct/lib/container.py", line 5, in <module>
>>      from collections import MutableMapping
>>    ImportError: cannot import name 'MutableMapping' from 'collections'
>>
>> Python 3.10 support is only available since pyelftools 0.26. The script
>> will only work with Python 3.6 and later. Update the minimal system
>> requirements and release notes.
>>
>> NB: The output produced by the legacy -r/--raw flag can be obtained with
>> the following command:
>>
>>    strings build/app/dpdk-testpmd | sed -n 's/^PMD_INFO_STRING= //p'
>>
>> Cc: Olivier Matz <olivier.matz@6wind.com>
>> Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
>> Cc: Bruce Richardson <bruce.richardson@intel.com>
>> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> 
> <...>
> 
>> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
>> index 8c021cf0505e..67054f5acdc9 100644
>> --- a/doc/guides/rel_notes/release_22_11.rst
>> +++ b/doc/guides/rel_notes/release_22_11.rst
>> @@ -84,6 +84,11 @@ API Changes
>>      Also, make sure to start the actual text at the margin.
>>      =======================================================
>>
>> +* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
>> +  PCI-IDs parsing has been removed.
>> +  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command::
>> +
>> +     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
>>
> 
> Empty line is missing (in case there will be a new version for some 
> other reason).
> 
> 
> Thanks for the update,
> Tested-by: Ferruh Yigit <ferruh.yigit@xilinx.com>
> 

Thomas, John,

Should we have documentation for usertools, since they are user-facing?
What do you think?
Would it be possible to find resources for it?

Thanks,
ferruh


* Re: [PATCH v4] usertools: rewrite pmdinfo
  2022-09-20 17:48   ` Ferruh Yigit
  2022-09-20 17:50     ` Ferruh Yigit
@ 2022-09-20 19:15     ` Robin Jarry
  2022-09-21  7:58       ` Ferruh Yigit
  1 sibling, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-09-20 19:15 UTC (permalink / raw)
  To: Ferruh Yigit, dev; +Cc: Olivier Matz, Bruce Richardson

Ferruh Yigit, Sep 20, 2022 at 19:48:
> > +* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
> > +  PCI-IDs parsing has been removed.
> > +  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command::
> > +
> > +     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
> > 
>
> Empty line is missing (in case there will be a new version for some 
> other reason).

What do you mean?



* Re: [PATCH v4] usertools: rewrite pmdinfo
  2022-09-20 17:50     ` Ferruh Yigit
@ 2022-09-21  7:27       ` Thomas Monjalon
  2022-09-21  8:02         ` Ferruh Yigit
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Monjalon @ 2022-09-21  7:27 UTC (permalink / raw)
  To: Mcnamara, John, Ferruh Yigit
  Cc: Olivier Matz, Bruce Richardson, Robin Jarry, dev

20/09/2022 19:50, Ferruh Yigit:
> Thomas, John,
> 
> Should we have documentation for usertools, since they are user facing, 
> what do you think?

We have doc/guides/tools/

> Can it be possible to find resource for it?




* Re: [PATCH v4] usertools: rewrite pmdinfo
  2022-09-20 19:15     ` Robin Jarry
@ 2022-09-21  7:58       ` Ferruh Yigit
  2022-09-21  9:57         ` Ferruh Yigit
  0 siblings, 1 reply; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-21  7:58 UTC (permalink / raw)
  To: Robin Jarry, dev; +Cc: Olivier Matz, Bruce Richardson

On 9/20/2022 8:15 PM, Robin Jarry wrote:

> 
> Ferruh Yigit, Sep 20, 2022 at 19:48:
>>> +* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
>>> +  PCI-IDs parsing has been removed.
>>> +  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command::
>>> +
>>> +     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
>>>
>>
>> Empty line is missing (in case there will be a new version for some
>> other reason).
> 
> What do you mean?
> 

Just nitpicking: there need to be two empty lines before "ABI
Changes". I assume the committer can fix it while merging if there is no
new version.


* Re: [PATCH v4] usertools: rewrite pmdinfo
  2022-09-21  7:27       ` Thomas Monjalon
@ 2022-09-21  8:02         ` Ferruh Yigit
  0 siblings, 0 replies; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-21  8:02 UTC (permalink / raw)
  To: Thomas Monjalon, Mcnamara, John, Robin Jarry
  Cc: Olivier Matz, Bruce Richardson, dev

On 9/21/2022 8:27 AM, Thomas Monjalon wrote:
> 20/09/2022 19:50, Ferruh Yigit:
>> Thomas, John,
>>
>> Should we have documentation for usertools, since they are user facing,
>> what do you think?
> 
> We have doc/guides/tools/
> 

Indeed, and there is even 'doc/guides/tools/pmdinfo.rst', so
@Robin that documentation also needs to be updated with this patch.

I wonder if that documentation should have some pmdinfo technical 
details (not just dpdk-pmdinfo.py but overall feature)?

>> Can it be possible to find resource for it?
> 
> 



* Re: [PATCH v4] usertools: rewrite pmdinfo
  2022-09-21  7:58       ` Ferruh Yigit
@ 2022-09-21  9:57         ` Ferruh Yigit
  0 siblings, 0 replies; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-21  9:57 UTC (permalink / raw)
  To: Robin Jarry, dev; +Cc: Olivier Matz, Bruce Richardson

On 9/21/2022 8:58 AM, Ferruh Yigit wrote:
> On 9/20/2022 8:15 PM, Robin Jarry wrote:
> 
>>
>> Ferruh Yigit, Sep 20, 2022 at 19:48:
>>>> +* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
>>>> +  PCI-IDs parsing has been removed.
>>>> +  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command::
>>>> +
>>>> +     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
>>>>
>>>
>>> Empty line is missing (in case there will be a new version for some
>>> other reason).
>>
>> What do you mean?
>>
> 
> just nit picking, there needs to be two empty lines before "ABI 
> Changes", I assume committer can fix it while merging if there won't be 
> any new version.

Also, on a second look, this release note update is in the "API Changes"
section. I think that is not the correct place; the "New Features" section
would be a better one.


* [PATCH v5] usertools: rewrite pmdinfo
  2022-09-13 10:58 [PATCH] usertools: rewrite pmdinfo Robin Jarry
                   ` (4 preceding siblings ...)
  2022-09-20 10:42 ` [PATCH v4] " Robin Jarry
@ 2022-09-22 11:58 ` Robin Jarry
  2022-09-22 12:03   ` Bruce Richardson
                     ` (3 more replies)
  2022-09-26 13:44 ` [PATCH v6] " Robin Jarry
  2022-10-04 19:29 ` [PATCH v7] " Robin Jarry
  7 siblings, 4 replies; 42+ messages in thread
From: Robin Jarry @ 2022-09-22 11:58 UTC (permalink / raw)
  To: dev; +Cc: Robin Jarry, Olivier Matz, Ferruh Yigit, Bruce Richardson

dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
merely prints multiple independent JSON lines which cannot be fed
directly to any JSON parser. Moreover, the script complexity is rather
high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
sections. Rewrite it so that it can produce valid JSON.

Remove the PCI database parsing for PCI-ID to Vendor-Device names
conversion. This should be done by external scripts (if really needed).

The script passes flake8, black, isort and pylint checks.

I have tested this with a matrix of python/pyelftools versions:

                                 pyelftools
               0.22  0.23  0.24  0.25  0.26  0.27  0.28  0.29
        3.6      ok    ok    ok    ok    ok    ok    ok    ok
        3.7      ok    ok    ok    ok    ok    ok    ok    ok
 Python 3.8      ok    ok    ok    ok    ok    ok    ok    ok
        3.9      ok    ok    ok    ok    ok    ok    ok    ok
        3.10   fail  fail  fail  fail    ok    ok    ok    ok

All failures with python 3.10 are related to the same issue:

  File "elftools/construct/lib/container.py", line 5, in <module>
    from collections import MutableMapping
  ImportError: cannot import name 'MutableMapping' from 'collections'

Python 3.10 support is only available since pyelftools 0.26. The script
will only work with Python 3.6 and later. Update the minimal system
requirements, docs and release notes.
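
For reference, here is a minimal sketch (not part of the patch) of why that
import fails. The abstract base classes have lived in collections.abc since
Python 3.3, and the legacy alias in the top-level collections module was
removed in Python 3.10. The traceback above suggests that the "construct"
copy bundled with older pyelftools still relies on the legacy alias:

```python
import collections
import sys

# Supported location of the abstract base classes since Python 3.3.
from collections.abc import MutableMapping

# Legacy alias: still resolvable (with a DeprecationWarning) up to
# Python 3.9, gone from Python 3.10 onwards.
has_legacy_alias = hasattr(collections, "MutableMapping")

print(
    f"python {sys.version_info.major}.{sys.version_info.minor}: "
    f"collections.MutableMapping available = {has_legacy_alias}"
)
```

On an interpreter where the last line reports False, the legacy import can
only fail, which is consistent with the "fail" cells of the matrix.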

Cc: Olivier Matz <olivier.matz@6wind.com>
Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
Cc: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
v4 -> v5:

* fixed doc/guides/rel_notes/release_22_11.rst
* updated doc/guides/tools/pmdinfo.rst with examples that were in the
  commit message

v3 -> v4:

* also strip pci_id fields when they have the wildcard 0xFFFF value.

v2 -> v3:

* strip "pci_ids" when it is empty (some drivers do not support any
  pci devices)

v1 -> v2:

* update release notes and minimal python version requirement
* hide warnings by default (-v/--verbose to show them)
* show debug messages with -vv
* also search libs in folders listed in /etc/ld.so.conf.d/*.conf
* only search for DT_NEEDED on executables, not on dynamic libraries
* take DT_RUNPATH into account for searching libraries
* fix weird broken pipe error
* fix some typos:
    s/begining/beginning/
    s/subsystem_device/subsystem_vendor/
    s/subsystem_system/subsystem_device/
* change field names for pci_ids elements (remove _id suffixes)
* DT_NEEDED of files are analyzed. There is no way to differentiate
  between dynamically linked executables and dynamic libraries.
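
To illustrate the v2 -> v3 and v3 -> v4 items, here is a standalone sketch
of the pci_ids scrubbing that mirrors scrub_pci_ids() from this patch. The
sample input strings are made up for the example (the first one uses the
8086:0b25 IDs of dmadev_idxd_pci, the second driver name is hypothetical),
and it assumes that pci_ids is emitted as a list of four integers per
device:

```python
import json

PCI_FIELDS = ("vendor", "device", "subsystem_vendor", "subsystem_device")


def scrub_pci_ids(info: dict) -> None:
    """Convert numerical ids to hex strings, drop wildcards and empty lists."""
    pci_ids = []
    for pci_fields in info.pop("pci_ids"):
        pci = {}
        for name, value in zip(PCI_FIELDS, pci_fields):
            # 0xFFFF is the "match anything" wildcard, not a real id.
            if value != 0xFFFF:
                pci[name] = f"{value:04x}"
        if pci:
            pci_ids.append(pci)
    # Only put the key back when at least one device remains.
    if pci_ids:
        info["pci_ids"] = pci_ids


# Assumed sample payload: 32902 == 0x8086, 2853 == 0x0b25, 65535 == 0xFFFF.
raw = (
    '{"name": "dmadev_idxd_pci", "params": "max_queues=0", '
    '"kmod": "vfio-pci", "pci_ids": [[32902, 2853, 65535, 65535]]}'
)
info = json.loads(raw)
scrub_pci_ids(info)
print(json.dumps(info, indent=2))

# A driver without any PCI device ends up with no "pci_ids" key at all.
no_pci = json.loads('{"name": "hypothetical_vdev", "pci_ids": []}')
scrub_pci_ids(no_pci)
print(json.dumps(no_pci, indent=2))
```

With this input, the first dictionary keeps only the "vendor" and "device"
fields as 4-digit hex strings, and the second one loses its "pci_ids" key.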

 doc/guides/linux_gsg/sys_reqs.rst      |   2 +-
 doc/guides/rel_notes/release_22_11.rst |   8 +
 doc/guides/tools/pmdinfo.rst           |  86 ++-
 usertools/dpdk-pmdinfo.py              | 924 +++++++++----------------
 4 files changed, 400 insertions(+), 620 deletions(-)

diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 08d45898f025..f842105eeda7 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -41,7 +41,7 @@ Compilation of the DPDK
    resulting in statically linked applications not being linked properly.
    Use an updated version of ``pkg-config`` or ``pkgconf`` instead when building applications
 
-*   Python 3.5 or later.
+*   Python 3.6 or later.
 
 *   Meson (version 0.49.2+) and ninja
 
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf0505e..d10e856ada74 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,14 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
+  PCI-IDs parsing has been removed.
+  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command:
+
+  .. code-block:: sh
+
+     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
+
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/pmdinfo.rst b/doc/guides/tools/pmdinfo.rst
index af6d24b0f63b..3354cbba4c8a 100644
--- a/doc/guides/tools/pmdinfo.rst
+++ b/doc/guides/tools/pmdinfo.rst
@@ -5,25 +5,85 @@
 dpdk-pmdinfo Application
 ========================
 
-The ``dpdk-pmdinfo`` tool is a Data Plane Development Kit (DPDK) utility that
-can dump a PMDs hardware support info.
+The ``dpdk-pmdinfo.py`` tool is a Data Plane Development Kit (DPDK) utility that
+can dump a PMD's hardware support info in JSON format.
 
+Synopsis
+--------
 
-Running the Application
------------------------
+::
 
-The tool has a number of command line options:
+   dpdk-pmdinfo.py [-h] [-p] [-v] ELF_FILE [ELF_FILE ...]
+
+Arguments
+---------
+
+.. program:: dpdk-pmdinfo.py
+
+.. option:: -h, --help
+
+   Show the inline help.
+
+.. option:: -p, --search-plugins
+
+   In addition to ``ELF_FILE``\s and their linked dynamic libraries, also scan
+   the DPDK plugins path.
+
+.. option:: -v, --verbose
+
+   Display warnings due to linked libraries not found or ELF/JSON parsing errors
+   in these libraries. Use twice to show debug messages.
+
+.. option:: ELF_FILE
+
+   DPDK application binary or dynamic library.
+   Can be specified multiple times.
+
+Environment Variables
+---------------------
+
+.. envvar:: LD_LIBRARY_PATH
+
+   ``dpdk-pmdinfo.py`` will also parse ``librte_*.so`` dynamic libraries that
+   are linked to the specified ``ELF_FILE``\s arguments. The dynamic library
+   files will be looked up based on the ``DT_RUNPATH`` set at link time, the
+   ``/etc/ld.so.conf.d/*.conf`` files and also in the standard ``/lib`` and
+   ``/usr/lib`` folders. Any colon-separated folder defined in
+   ``LD_LIBRARY_PATH`` will be looked up first.
+
+Examples
+--------
+
+Get the complete info for a given driver:
 
 .. code-block:: console
 
-   dpdk-pmdinfo [-hrtp] [-d <pci id file] <elf-file>
+   $ dpdk-pmdinfo.py /usr/bin/dpdk-testpmd | \
+       jq '.[] | select(.name == "net_ice_dcf")'
+   {
+     "name": "net_ice_dcf",
+     "params": "cap=dcf",
+     "kmod": "* igb_uio | vfio-pci",
+     "pci_ids": [
+       {
+         "vendor": "8086",
+         "device": "1889"
+       }
+     ]
+   }
 
-   -h, --help            Show a short help message and exit
-   -r, --raw             Dump as raw json strings
-   -d FILE, --pcidb=FILE Specify a pci database to get vendor names from
-   -t, --table           Output information on hw support as a hex table
-   -p, --plugindir       Scan dpdk for autoload plugins
+Get only the required kernel modules for a given driver:
 
-.. Note::
+.. code-block:: console
 
-   * Parameters inside the square brackets represents optional parameters.
+   $ dpdk-pmdinfo.py /usr/bin/dpdk-testpmd | \
+       jq '.[] | select(.name == "net_cn10k").kmod'
+   "vfio-pci"
+
+Get only the required kernel modules for a given device:
+
+.. code-block:: console
+
+   $ dpdk-pmdinfo.py /usr/bin/dpdk-testpmd | \
+       jq '.[] | select(.pci_ids[]? | .vendor == "15b3" and .device == "1013").kmod'
+   "* ib_uverbs & mlx5_core & mlx5_ib"
diff --git a/usertools/dpdk-pmdinfo.py b/usertools/dpdk-pmdinfo.py
index 40ef5cec6cba..a68921296609 100755
--- a/usertools/dpdk-pmdinfo.py
+++ b/usertools/dpdk-pmdinfo.py
@@ -1,626 +1,338 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2016  Neil Horman <nhorman@tuxdriver.com>
+# Copyright(c) 2022  Robin Jarry
+# pylint: disable=invalid-name
+
+r"""
+Utility to dump PMD_INFO_STRING support from DPDK binaries.
+
+This script prints JSON output to be interpreted by other tools. Here are some
+examples with jq:
+
+Get the complete info for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "cnxk_nix_inl")'
+
+Get only the required kernel modules for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "net_i40e").kmod'
+
+Get only the required kernel modules for a given device:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.pci_ids[]? | .vendor == "15b3" and .device == "1013").kmod'
+"""
 
-# -------------------------------------------------------------------------
-#
-# Utility to dump PMD_INFO_STRING support from an object file
-#
-# -------------------------------------------------------------------------
-import json
-import os
-import platform
-import sys
 import argparse
-from elftools.common.exceptions import ELFError
-from elftools.common.py3compat import byte2int
-from elftools.elf.elffile import ELFFile
-
-
-# For running from development directory. It should take precedence over the
-# installed pyelftools.
-sys.path.insert(0, '.')
-
-raw_output = False
-pcidb = None
-
-# ===========================================
-
-class Vendor:
-    """
-    Class for vendors. This is the top level class
-    for the devices belong to a specific vendor.
-    self.devices is the device dictionary
-    subdevices are in each device.
-    """
-
-    def __init__(self, vendorStr):
-        """
-        Class initializes with the raw line from pci.ids
-        Parsing takes place inside __init__
-        """
-        self.ID = vendorStr.split()[0]
-        self.name = vendorStr.replace("%s " % self.ID, "").rstrip()
-        self.devices = {}
-
-    def addDevice(self, deviceStr):
-        """
-        Adds a device to self.devices
-        takes the raw line from pci.ids
-        """
-        s = deviceStr.strip()
-        devID = s.split()[0]
-        if devID in self.devices:
-            pass
-        else:
-            self.devices[devID] = Device(deviceStr)
-
-    def report(self):
-        print(self.ID, self.name)
-        for id, dev in self.devices.items():
-            dev.report()
-
-    def find_device(self, devid):
-        # convert to a hex string and remove 0x
-        devid = hex(devid)[2:]
-        try:
-            return self.devices[devid]
-        except:
-            return Device("%s  Unknown Device" % devid)
-
-
-class Device:
-
-    def __init__(self, deviceStr):
-        """
-        Class for each device.
-        Each vendor has its own devices dictionary.
-        """
-        s = deviceStr.strip()
-        self.ID = s.split()[0]
-        self.name = s.replace("%s  " % self.ID, "")
-        self.subdevices = {}
-
-    def report(self):
-        print("\t%s\t%s" % (self.ID, self.name))
-        for subID, subdev in self.subdevices.items():
-            subdev.report()
-
-    def addSubDevice(self, subDeviceStr):
-        """
-        Adds a subvendor, subdevice to device.
-        Uses raw line from pci.ids
-        """
-        s = subDeviceStr.strip()
-        spl = s.split()
-        subVendorID = spl[0]
-        subDeviceID = spl[1]
-        subDeviceName = s.split("  ")[-1]
-        devID = "%s:%s" % (subVendorID, subDeviceID)
-        self.subdevices[devID] = SubDevice(
-            subVendorID, subDeviceID, subDeviceName)
-
-    def find_subid(self, subven, subdev):
-        subven = hex(subven)[2:]
-        subdev = hex(subdev)[2:]
-        devid = "%s:%s" % (subven, subdev)
-
-        try:
-            return self.subdevices[devid]
-        except:
-            if (subven == "ffff" and subdev == "ffff"):
-                return SubDevice("ffff", "ffff", "(All Subdevices)")
-            return SubDevice(subven, subdev, "(Unknown Subdevice)")
-
-
-class SubDevice:
-    """
-    Class for subdevices.
-    """
-
-    def __init__(self, vendor, device, name):
-        """
-        Class initializes with vendorid, deviceid and name
-        """
-        self.vendorID = vendor
-        self.deviceID = device
-        self.name = name
-
-    def report(self):
-        print("\t\t%s\t%s\t%s" % (self.vendorID, self.deviceID, self.name))
-
-
-class PCIIds:
-    """
-    Top class for all pci.ids entries.
-    All queries will be asked to this class.
-    PCIIds.vendors["0e11"].devices["0046"].\
-    subdevices["0e11:4091"].name  =  "Smart Array 6i"
-    """
-
-    def __init__(self, filename):
-        """
-        Prepares the directories.
-        Checks local data file.
-        Tries to load from local, if not found, downloads from web
-        """
-        self.version = ""
-        self.date = ""
-        self.vendors = {}
-        self.contents = None
-        self.readLocal(filename)
-        self.parse()
-
-    def reportVendors(self):
-        """Reports the vendors
-        """
-        for vid, v in self.vendors.items():
-            print(v.ID, v.name)
-
-    def report(self, vendor=None):
-        """
-        Reports everything for all vendors or a specific vendor
-        PCIIds.report()  reports everything
-        PCIIDs.report("0e11") reports only "Compaq Computer Corporation"
-        """
-        if vendor is not None:
-            self.vendors[vendor].report()
-        else:
-            for vID, v in self.vendors.items():
-                v.report()
-
-    def find_vendor(self, vid):
-        # convert vid to a hex string and remove the 0x
-        vid = hex(vid)[2:]
-
-        try:
-            return self.vendors[vid]
-        except:
-            return Vendor("%s Unknown Vendor" % (vid))
-
-    def findDate(self, content):
-        for l in content:
-            if l.find("Date:") > -1:
-                return l.split()[-2].replace("-", "")
-        return None
-
-    def parse(self):
-        if not self.contents:
-            print("data/%s-pci.ids not found" % self.date)
-        else:
-            vendorID = ""
-            deviceID = ""
-            for l in self.contents:
-                if l[0] == "#":
-                    continue
-                elif not l.strip():
-                    continue
-                else:
-                    if l.find("\t\t") == 0:
-                        self.vendors[vendorID].devices[
-                            deviceID].addSubDevice(l)
-                    elif l.find("\t") == 0:
-                        deviceID = l.strip().split()[0]
-                        self.vendors[vendorID].addDevice(l)
-                    else:
-                        vendorID = l.split()[0]
-                        self.vendors[vendorID] = Vendor(l)
-
-    def readLocal(self, filename):
-        """
-        Reads the local file
-        """
-        with open(filename, 'r', encoding='utf-8') as f:
-            self.contents = f.readlines()
-        self.date = self.findDate(self.contents)
-
-    def loadLocal(self):
-        """
-        Loads database from local. If there is no file,
-        it creates a new one from web
-        """
-        self.date = idsfile[0].split("/")[1].split("-")[0]
-        self.readLocal()
-
-
-# =======================================
-
-def search_file(filename, search_path):
-    """ Given a search path, find file with requested name """
-    for path in search_path.split(':'):
-        candidate = os.path.join(path, filename)
-        if os.path.exists(candidate):
-            return os.path.abspath(candidate)
-    return None
-
-
-class ReadElf(object):
-    """ display_* methods are used to emit output into the output stream
-    """
-
-    def __init__(self, file, output):
-        """ file:
-                stream object with the ELF file to read
-
-            output:
-                output stream to write to
-        """
-        self.elffile = ELFFile(file)
-        self.output = output
-
-        # Lazily initialized if a debug dump is requested
-        self._dwarfinfo = None
-
-        self._versioninfo = None
-
-    def _section_from_spec(self, spec):
-        """ Retrieve a section given a "spec" (either number or name).
-            Return None if no such section exists in the file.
-        """
-        try:
-            num = int(spec)
-            if num < self.elffile.num_sections():
-                return self.elffile.get_section(num)
-            return None
-        except ValueError:
-            # Not a number. Must be a name then
-            section = self.elffile.get_section_by_name(force_unicode(spec))
-            if section is None:
-                # No match with a unicode name.
-                # Some versions of pyelftools (<= 0.23) store internal strings
-                # as bytes. Try again with the name encoded as bytes.
-                section = self.elffile.get_section_by_name(force_bytes(spec))
-            return section
-
-    def pretty_print_pmdinfo(self, pmdinfo):
-        global pcidb
-
-        for i in pmdinfo["pci_ids"]:
-            vendor = pcidb.find_vendor(i[0])
-            device = vendor.find_device(i[1])
-            subdev = device.find_subid(i[2], i[3])
-            print("%s (%s) : %s (%s) %s" %
-                  (vendor.name, vendor.ID, device.name,
-                   device.ID, subdev.name))
-
-    def parse_pmd_info_string(self, mystring):
-        global raw_output
-        global pcidb
-
-        optional_pmd_info = [
-            {'id': 'params', 'tag': 'PMD PARAMETERS'},
-            {'id': 'kmod', 'tag': 'PMD KMOD DEPENDENCIES'}
-        ]
-
-        i = mystring.index("=")
-        mystring = mystring[i + 2:]
-        pmdinfo = json.loads(mystring)
-
-        if raw_output:
-            print(json.dumps(pmdinfo))
-            return
-
-        print("PMD NAME: " + pmdinfo["name"])
-        for i in optional_pmd_info:
-            try:
-                print("%s: %s" % (i['tag'], pmdinfo[i['id']]))
-            except KeyError:
-                continue
-
-        if pmdinfo["pci_ids"]:
-            print("PMD HW SUPPORT:")
-            if pcidb is not None:
-                self.pretty_print_pmdinfo(pmdinfo)
-            else:
-                print("VENDOR\t DEVICE\t SUBVENDOR\t SUBDEVICE")
-                for i in pmdinfo["pci_ids"]:
-                    print("0x%04x\t 0x%04x\t 0x%04x\t\t 0x%04x" %
-                          (i[0], i[1], i[2], i[3]))
-
-        print("")
-
-    def display_pmd_info_strings(self, section_spec):
-        """ Display a strings dump of a section. section_spec is either a
-            section number or a name.
-        """
-        section = self._section_from_spec(section_spec)
-        if section is None:
-            return
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("PMD_INFO_STRING")
-            if rc != -1:
-                self.parse_pmd_info_string(mystring[rc:])
-
-            dataptr = endptr
-
-    def find_librte_eal(self, section):
-        for tag in section.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if "librte_eal" in force_unicode(tag.needed):
-                    return force_unicode(tag.needed)
-        return None
-
-    def search_for_autoload_path(self):
-        scanelf = self
-        scanfile = None
-        library = None
-
-        section = self._section_from_spec(".dynamic")
-        try:
-            eallib = self.find_librte_eal(section)
-            if eallib is not None:
-                ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-                if ldlibpath is None:
-                    ldlibpath = ""
-                dtr = self.get_dt_runpath(section)
-                library = search_file(eallib,
-                                      dtr + ":" + ldlibpath +
-                                      ":/usr/lib64:/lib64:/usr/lib:/lib")
-                if library is None:
-                    return (None, None)
-                if not raw_output:
-                    print("Scanning for autoload path in %s" % library)
-                scanfile = open(library, 'rb')
-                scanelf = ReadElf(scanfile, sys.stdout)
-        except AttributeError:
-            # Not a dynamic binary
-            pass
-        except ELFError:
-            scanfile.close()
-            return (None, None)
-
-        section = scanelf._section_from_spec(".rodata")
-        if section is None:
-            if scanfile is not None:
-                scanfile.close()
-            return (None, None)
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("DPDK_PLUGIN_PATH")
-            if rc != -1:
-                rc = mystring.find("=")
-                return (mystring[rc + 1:], library)
-
-            dataptr = endptr
-        if scanfile is not None:
-            scanfile.close()
-        return (None, None)
-
-    def get_dt_runpath(self, dynsec):
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_RUNPATH':
-                return force_unicode(tag.runpath)
-        return ""
-
-    def process_dt_needed_entries(self):
-        """ Look to see if there are any DT_NEEDED entries in the binary
-            And process those if there are
-        """
-        runpath = ""
-        ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-        if ldlibpath is None:
-            ldlibpath = ""
-
-        dynsec = self._section_from_spec(".dynamic")
-        try:
-            runpath = self.get_dt_runpath(dynsec)
-        except AttributeError:
-            # dynsec is None, just return
-            return
-
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if 'librte_' in force_unicode(tag.needed):
-                    library = search_file(force_unicode(tag.needed),
-                                          runpath + ":" + ldlibpath +
-                                          ":/usr/lib64:/lib64:/usr/lib:/lib")
-                    if library is not None:
-                        with open(library, 'rb') as file:
-                            try:
-                                libelf = ReadElf(file, sys.stdout)
-                            except ELFError:
-                                print("%s is no an ELF file" % library)
-                                continue
-                            libelf.process_dt_needed_entries()
-                            libelf.display_pmd_info_strings(".rodata")
-                            file.close()
-
-
-# compat: remove force_unicode & force_bytes when pyelftools<=0.23 support is
-# dropped.
-def force_unicode(s):
-    if hasattr(s, 'decode') and callable(s.decode):
-        s = s.decode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def force_bytes(s):
-    if hasattr(s, 'encode') and callable(s.encode):
-        s = s.encode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def scan_autoload_path(autoload_path):
-    global raw_output
-
-    if not os.path.exists(autoload_path):
-        return
-
+import glob
+import json
+import logging
+import os
+import re
+import string
+import sys
+from pathlib import Path
+from typing import Iterable, Iterator, List, Union
+
+import elftools
+from elftools.elf.elffile import ELFError, ELFFile
+
+
+# ----------------------------------------------------------------------------
+def main() -> int:  # pylint: disable=missing-docstring
     try:
-        dirs = os.listdir(autoload_path)
-    except OSError:
-        # Couldn't read the directory, give up
-        return
-
-    for d in dirs:
-        dpath = os.path.join(autoload_path, d)
-        if os.path.isdir(dpath):
-            scan_autoload_path(dpath)
-        if os.path.isfile(dpath):
-            try:
-                file = open(dpath, 'rb')
-                readelf = ReadElf(file, sys.stdout)
-            except ELFError:
-                # this is likely not an elf file, skip it
-                continue
-            except IOError:
-                # No permission to read the file, skip it
-                continue
-
-            if not raw_output:
-                print("Hw Support for library %s" % d)
-            readelf.display_pmd_info_strings(".rodata")
-            file.close()
+        args = parse_args()
+        logging.basicConfig(
+            stream=sys.stderr,
+            format="%(levelname)s: %(message)s",
+            level={
+                0: logging.ERROR,
+                1: logging.WARNING,
+            }.get(args.verbose, logging.DEBUG),
+        )
+        info = parse_pmdinfo(args.elf_files, args.search_plugins)
+        print(json.dumps(info, indent=2))
+    except BrokenPipeError:
+        pass
+    except KeyboardInterrupt:
+        return 1
+    except Exception as e:  # pylint: disable=broad-except
+        logging.error("%s", e)
+        return 1
+    return 0
 
 
-def scan_for_autoload_pmds(dpdk_path):
+# ----------------------------------------------------------------------------
+def parse_args() -> argparse.Namespace:
     """
-    search the specified application or path for a pmd autoload path
-    then scan said path for pmds and report hw support
+    Parse command line arguments.
     """
-    global raw_output
-
-    if not os.path.isfile(dpdk_path):
-        if not raw_output:
-            print("Must specify a file name")
-        return
-
-    file = open(dpdk_path, 'rb')
-    try:
-        readelf = ReadElf(file, sys.stdout)
-    except ElfError:
-        if not raw_output:
-            print("Unable to parse %s" % file)
-        return
-
-    (autoload_path, scannedfile) = readelf.search_for_autoload_path()
-    if not autoload_path:
-        if not raw_output:
-            print("No autoload path configured in %s" % dpdk_path)
-        return
-    if not raw_output:
-        if scannedfile is None:
-            scannedfile = dpdk_path
-        print("Found autoload path %s in %s" % (autoload_path, scannedfile))
-
-    file.close()
-    if not raw_output:
-        print("Discovered Autoload HW Support:")
-    scan_autoload_path(autoload_path)
-    return
-
-
-def main(stream=None):
-    global raw_output
-    global pcidb
-
-    pcifile_default = "./pci.ids"  # For unknown OS's assume local file
-    if platform.system() == 'Linux':
-        # hwdata is the legacy location, misc is supported going forward
-        pcifile_default = "/usr/share/misc/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/hwdata/pci.ids"
-    elif platform.system() == 'FreeBSD':
-        pcifile_default = "/usr/local/share/pciids/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/misc/pci_vendors"
-
     parser = argparse.ArgumentParser(
-        usage='usage: %(prog)s [-hrtp] [-d <pci id file>] elf_file',
-        description="Dump pmd hardware support info")
-    group = parser.add_mutually_exclusive_group()
-    group.add_argument('-r', '--raw',
-                       action='store_true', dest='raw_output',
-                       help='dump raw json strings')
-    group.add_argument("-t", "--table", dest="tblout",
-                       help="output information on hw support as a hex table",
-                       action='store_true')
-    parser.add_argument("-d", "--pcidb", dest="pcifile",
-                        help="specify a pci database to get vendor names from",
-                        default=pcifile_default, metavar="FILE")
-    parser.add_argument("-p", "--plugindir", dest="pdir",
-                        help="scan dpdk for autoload plugins",
-                        action='store_true')
-    parser.add_argument("elf_file", help="driver shared object file")
-    args = parser.parse_args()
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "-p",
+        "--search-plugins",
+        action="store_true",
+        help="""
+        In addition to ELF_FILEs and their linked dynamic libraries, also scan
+        the DPDK plugins path.
+        """,
+    )
+    parser.add_argument(
+        "-v",
+        "--verbose",
+        action="count",
+        default=0,
+        help="""
+        Display warnings due to linked libraries not found or ELF/JSON parsing
+        errors in these libraries. Use twice to show debug messages.
+        """,
+    )
+    parser.add_argument(
+        "elf_files",
+        metavar="ELF_FILE",
+        nargs="+",
+        type=existing_file,
+        help="""
+        DPDK application binary or dynamic library.
+        """,
+    )
+    return parser.parse_args()
 
-    if args.raw_output:
-        raw_output = True
 
-    if args.tblout:
-        args.pcifile = None
+# ----------------------------------------------------------------------------
+def parse_pmdinfo(paths: Iterable[Path], search_plugins: bool) -> List[dict]:
+    """
+    Extract DPDK PMD info JSON strings from a list of ELF files.
 
-    if args.pcifile:
-        pcidb = PCIIds(args.pcifile)
-        if pcidb is None:
-            print("Pci DB file not found")
-            exit(1)
+    :returns:
+        A list of DPDK drivers info dictionaries.
+    """
+    binaries = set(paths)
+    for p in paths:
+        binaries.update(get_needed_libs(p))
+    if search_plugins:
+        # cast to list to avoid errors with update while iterating
+        binaries.update(list(get_plugin_libs(binaries)))
 
-    if args.pdir:
-        exit(scan_for_autoload_pmds(args.elf_file))
+    drivers = []
 
-    ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-    if ldlibpath is None:
-        ldlibpath = ""
-
-    if os.path.exists(args.elf_file):
-        myelffile = args.elf_file
-    else:
-        myelffile = search_file(args.elf_file,
-                                ldlibpath + ":/usr/lib64:/lib64:/usr/lib:/lib")
-
-    if myelffile is None:
-        print("File not found")
-        sys.exit(1)
-
-    with open(myelffile, 'rb') as file:
+    for b in binaries:
+        logging.debug("analyzing %s", b)
         try:
-            readelf = ReadElf(file, sys.stdout)
-            readelf.process_dt_needed_entries()
-            readelf.display_pmd_info_strings(".rodata")
-            sys.exit(0)
+            for s in get_elf_strings(b, ".rodata", "PMD_INFO_STRING="):
+                try:
+                    info = json.loads(s)
+                    scrub_pci_ids(info)
+                    drivers.append(info)
+                except ValueError as e:
+                    # invalid JSON, should never happen
+                    logging.warning("%s: %s", b, e)
+        except ELFError as e:
+            # only happens for discovered plugins that are not ELF
+            logging.debug("%s: cannot parse ELF: %s", b, e)
 
-        except ELFError as ex:
-            sys.stderr.write('ELF error: %s\n' % ex)
-            sys.exit(1)
+    return drivers
 
 
-# -------------------------------------------------------------------------
-if __name__ == '__main__':
-    main()
+# ----------------------------------------------------------------------------
+PCI_FIELDS = ("vendor", "device", "subsystem_vendor", "subsystem_device")
+
+
+def scrub_pci_ids(info: dict):
+    """
+    Convert numerical ids to hex strings.
+    Strip empty pci_ids lists.
+    Strip wildcard 0xFFFF ids.
+    """
+    pci_ids = []
+    for pci_fields in info.pop("pci_ids"):
+        pci = {}
+        for name, value in zip(PCI_FIELDS, pci_fields):
+            if value != 0xFFFF:
+                pci[name] = f"{value:04x}"
+        if pci:
+            pci_ids.append(pci)
+    if pci_ids:
+        info["pci_ids"] = pci_ids
+
+
+# ----------------------------------------------------------------------------
+def get_plugin_libs(binaries: Iterable[Path]) -> Iterator[Path]:
+    """
+    Look into the provided binaries for DPDK_PLUGIN_PATH and scan the path
+    for files.
+    """
+    for b in binaries:
+        for p in get_elf_strings(b, ".rodata", "DPDK_PLUGIN_PATH="):
+            plugin_path = p.strip()
+            logging.debug("discovering plugins in %s", plugin_path)
+            for root, _, files in os.walk(plugin_path):
+                for f in files:
+                    yield Path(root) / f
+            # no need to search in other binaries.
+            return
+
+
+# ----------------------------------------------------------------------------
+def existing_file(value: str) -> Path:
+    """
+    Argparse type= callback to ensure an argument points to a valid file path.
+    """
+    path = Path(value)
+    if not path.is_file():
+        raise argparse.ArgumentTypeError(f"{value}: No such file")
+    return path
+
+
+# ----------------------------------------------------------------------------
+PRINTABLE_BYTES = frozenset(string.printable.encode("ascii"))
+
+
+def find_strings(buf: bytes, prefix: str) -> Iterator[str]:
+    """
+    Extract strings of printable ASCII characters from a bytes buffer.
+    """
+    view = memoryview(buf)
+    start = None
+
+    for i, b in enumerate(view):
+        if start is None and b in PRINTABLE_BYTES:
+            # mark beginning of string
+            start = i
+            continue
+        if start is not None:
+            if b in PRINTABLE_BYTES:
+                # string not finished
+                continue
+            if b == 0:
+                # end of string
+                s = view[start:i].tobytes().decode("ascii")
+                if s.startswith(prefix):
+                    yield s[len(prefix) :]
+            # There can be byte sequences where a non-printable byte
+            # follows a printable one. Ignore that.
+            start = None
+
+
+# ----------------------------------------------------------------------------
+def elftools_version():
+    """
+    Extract pyelftools version as a tuple of integers for easy comparison.
+    """
+    version = getattr(elftools, "__version__", "")
+    match = re.match(r"^(\d+)\.(\d+).*$", str(version))
+    if not match:
+        # cannot determine version, hope for the best
+        return (0, 24)
+    return (int(match[1]), int(match[2]))
+
+
+ELFTOOLS_VERSION = elftools_version()
+
+
+def from_elftools(s: Union[bytes, str]) -> str:
+    """
+    Earlier versions of pyelftools (< 0.24) return bytes encoded with "latin-1"
+    instead of python strings.
+    """
+    if isinstance(s, bytes):
+        return s.decode("latin-1")
+    return s
+
+
+def to_elftools(s: str) -> Union[bytes, str]:
+    """
+    Earlier versions of pyelftools (< 0.24) assume that ELF section and tags
+    are bytes encoded with "latin-1" instead of python strings.
+    """
+    if ELFTOOLS_VERSION < (0, 24):
+        return s.encode("latin-1")
+    return s
+
+
+# ----------------------------------------------------------------------------
+def get_elf_strings(path: Path, section: str, prefix: str) -> Iterator[str]:
+    """
+    Extract strings from a named ELF section in a file.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        sec = elf.get_section_by_name(to_elftools(section))
+        if not sec:
+            return
+        yield from find_strings(sec.data(), prefix)
+
+
+# ----------------------------------------------------------------------------
+def ld_so_path() -> Iterator[str]:
+    """
+    Return the list of directories where dynamic libraries are loaded based
+    on the contents of /etc/ld.so.conf/*.conf.
+    """
+    for conf in glob.iglob("/etc/ld.so.conf/*.conf"):
+        try:
+            with open(conf, "r", encoding="utf-8") as f:
+                for line in f:
+                    line = line.strip()
+                    if os.path.isdir(line):
+                        yield line
+        except OSError:
+            pass
+
+
+LD_SO_CONF_PATH = ld_so_path()
+
+
+def search_dt_needed(origin: Path, needed: str, runpath: List[str]) -> Path:
+    """
+    Search for a file in LD_LIBRARY_PATH (if defined), runpath (if set) and in
+    all folders declared in /etc/ld.so.conf/*.conf. Finally, look in the
+    standard folders (/lib followed by /usr/lib).
+    """
+    folders = []
+    if "LD_LIBRARY_PATH" in os.environ:
+        folders += os.environ["LD_LIBRARY_PATH"].split(":")
+    folders += runpath
+    folders += LD_SO_CONF_PATH
+    folders += ["/lib", "/usr/lib"]
+    for d in folders:
+        d = d.replace("$ORIGIN", str(origin.parent.absolute()))
+        filepath = Path(d) / needed
+        if filepath.is_file():
+            return filepath
+    raise FileNotFoundError(needed)
+
+
+# ----------------------------------------------------------------------------
+def get_needed_libs(path: Path) -> Iterator[Path]:
+    """
+    Extract the dynamic library dependencies from an ELF executable.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        dyn = elf.get_section_by_name(to_elftools(".dynamic"))
+        if not dyn:
+            return
+        runpath = []
+        for tag in dyn.iter_tags(to_elftools("DT_RUNPATH")):
+            runpath += from_elftools(tag.runpath).split(":")
+        for tag in dyn.iter_tags(to_elftools("DT_NEEDED")):
+            needed = from_elftools(tag.needed)
+            if not needed.startswith("librte_"):
+                continue
+            logging.debug("%s: DT_NEEDED %s", path, needed)
+            try:
+                yield search_dt_needed(path, needed, runpath)
+            except FileNotFoundError:
+                logging.warning("%s: DT_NEEDED not found: %s", path, needed)
+
+
+# ----------------------------------------------------------------------------
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.37.3



* Re: [PATCH v5] usertools: rewrite pmdinfo
  2022-09-22 11:58 ` [PATCH v5] " Robin Jarry
@ 2022-09-22 12:03   ` Bruce Richardson
  2022-09-22 15:12   ` Ferruh Yigit
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: Bruce Richardson @ 2022-09-22 12:03 UTC (permalink / raw)
  To: Robin Jarry; +Cc: dev, Olivier Matz, Ferruh Yigit

On Thu, Sep 22, 2022 at 01:58:02PM +0200, Robin Jarry wrote:
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                                  pyelftools
>                0.22  0.23  0.24  0.25  0.26  0.27  0.28  0.29
>         3.6      ok    ok    ok    ok    ok    ok    ok    ok
>         3.7      ok    ok    ok    ok    ok    ok    ok    ok
>  Python 3.8      ok    ok    ok    ok    ok    ok    ok    ok
>         3.9      ok    ok    ok    ok    ok    ok    ok    ok
>         3.10   fail  fail  fail  fail    ok    ok    ok    ok
> 
> All failures with python 3.10 are related to the same issue:
> 
>   File "elftools/construct/lib/container.py", line 5, in <module>
>     from collections import MutableMapping
>   ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26. The script
> will only work with Python 3.6 and later. Update the minimal system
> requirements, docs and release notes.
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>

Acked-by: Bruce Richardson <bruce.richardson@intel.com>


* Re: [PATCH v5] usertools: rewrite pmdinfo
  2022-09-22 11:58 ` [PATCH v5] " Robin Jarry
  2022-09-22 12:03   ` Bruce Richardson
@ 2022-09-22 15:12   ` Ferruh Yigit
  2022-09-26 11:55   ` Olivier Matz
  2022-09-26 12:52   ` Robin Jarry
  3 siblings, 0 replies; 42+ messages in thread
From: Ferruh Yigit @ 2022-09-22 15:12 UTC (permalink / raw)
  To: Robin Jarry, dev; +Cc: Olivier Matz, Ferruh Yigit, Bruce Richardson

On 9/22/2022 12:58 PM, Robin Jarry wrote:
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                                   pyelftools
>                 0.22  0.23  0.24  0.25  0.26  0.27  0.28  0.29
>          3.6      ok    ok    ok    ok    ok    ok    ok    ok
>          3.7      ok    ok    ok    ok    ok    ok    ok    ok
>   Python 3.8      ok    ok    ok    ok    ok    ok    ok    ok
>          3.9      ok    ok    ok    ok    ok    ok    ok    ok
>          3.10   fail  fail  fail  fail    ok    ok    ok    ok
> 
> All failures with python 3.10 are related to the same issue:
> 
>    File "elftools/construct/lib/container.py", line 5, in <module>
>      from collections import MutableMapping
>    ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26. The script
> will only work with Python 3.6 and later. Update the minimal system
> requirements, docs and release notes.
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>

Tested-by: Ferruh Yigit <ferruh.yigit@amd.com>


* Re: [PATCH v5] usertools: rewrite pmdinfo
  2022-09-22 11:58 ` [PATCH v5] " Robin Jarry
  2022-09-22 12:03   ` Bruce Richardson
  2022-09-22 15:12   ` Ferruh Yigit
@ 2022-09-26 11:55   ` Olivier Matz
  2022-09-26 12:52   ` Robin Jarry
  3 siblings, 0 replies; 42+ messages in thread
From: Olivier Matz @ 2022-09-26 11:55 UTC (permalink / raw)
  To: Robin Jarry; +Cc: dev, Ferruh Yigit, Bruce Richardson

On Thu, Sep 22, 2022 at 01:58:02PM +0200, Robin Jarry wrote:
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                                  pyelftools
>                0.22  0.23  0.24  0.25  0.26  0.27  0.28  0.29
>         3.6      ok    ok    ok    ok    ok    ok    ok    ok
>         3.7      ok    ok    ok    ok    ok    ok    ok    ok
>  Python 3.8      ok    ok    ok    ok    ok    ok    ok    ok
>         3.9      ok    ok    ok    ok    ok    ok    ok    ok
>         3.10   fail  fail  fail  fail    ok    ok    ok    ok
> 
> All failures with python 3.10 are related to the same issue:
> 
>   File "elftools/construct/lib/container.py", line 5, in <module>
>     from collections import MutableMapping
>   ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26. The script
> will only work with Python 3.6 and later. Update the minimal system
> requirements, docs and release notes.
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>

Tested-by: Olivier Matz <olivier.matz@6wind.com>


* Re: [PATCH v5] usertools: rewrite pmdinfo
  2022-09-22 11:58 ` [PATCH v5] " Robin Jarry
                     ` (2 preceding siblings ...)
  2022-09-26 11:55   ` Olivier Matz
@ 2022-09-26 12:52   ` Robin Jarry
  3 siblings, 0 replies; 42+ messages in thread
From: Robin Jarry @ 2022-09-26 12:52 UTC (permalink / raw)
  To: dev; +Cc: Olivier Matz, Ferruh Yigit, Bruce Richardson

Robin Jarry, Sep 22, 2022 at 13:58:
> +# ----------------------------------------------------------------------------
> +def ld_so_path() -> Iterator[str]:
> +    """
> +    Return the list of directories where dynamic libraries are loaded based
> +    on the contents of /etc/ld.so.conf/*.conf.
> +    """
> +    for conf in glob.iglob("/etc/ld.so.conf/*.conf"):

I just noticed that this folder path is invalid. I did not encounter any
errors since I did not test with librte_*.so libs installed in /usr/*.

> +        try:
> +            with open(conf, "r", encoding="utf-8") as f:
> +                for line in f:
> +                    line = line.strip()
> +                    if os.path.isdir(line):
> +                        yield line
> +        except OSError:
> +            pass
> +
> +
> +LD_SO_CONF_PATH = ld_so_path()

Also, this is stupid. The iterator will be exhausted after iterating
once over it and the SO path folders will be empty on subsequent
lookups. I'll submit a v6 with a fix for these two bugs.
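
The exhaustion problem can be reproduced with a minimal sketch. Here
conf_dirs() is only a stand-in for ld_so_path(): it yields hard-coded,
hypothetical folders instead of reading any configuration file, and
build_folders() mimics the "folders += ..." step of search_dt_needed().

```python
from typing import Iterable, Iterator, List


def conf_dirs() -> Iterator[str]:
    """Stand-in generator yielding hypothetical library folders."""
    yield "/opt/example/lib"
    yield "/opt/example/lib64"


def build_folders(extra: Iterable[str]) -> List[str]:
    """Build a search list; extending a list consumes a generator."""
    folders = ["/hypothetical/runpath"]
    folders += extra
    return folders


# Module-level generator object, as in v5: it can be walked only once.
shared_gen = conf_dirs()
first = build_folders(shared_gen)
second = build_folders(shared_gen)  # generator is already exhausted here

# Materializing the result once makes it safe to reuse.
shared_list = list(conf_dirs())
third = build_folders(shared_list)
fourth = build_folders(shared_list)
```

With the generator, only the first lookup sees the configured folders.
With the list, every lookup sees them.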



* [PATCH v6] usertools: rewrite pmdinfo
  2022-09-13 10:58 [PATCH] usertools: rewrite pmdinfo Robin Jarry
                   ` (5 preceding siblings ...)
  2022-09-22 11:58 ` [PATCH v5] " Robin Jarry
@ 2022-09-26 13:44 ` Robin Jarry
  2022-09-26 15:17   ` Bruce Richardson
  2022-10-04 19:29 ` [PATCH v7] " Robin Jarry
  7 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-09-26 13:44 UTC (permalink / raw)
  To: dev; +Cc: Robin Jarry, Olivier Matz, Ferruh Yigit, Bruce Richardson

dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
merely prints multiple independent JSON lines which cannot be fed
directly to any JSON parser. Moreover, the script complexity is rather
high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
sections. Rewrite it so that it can produce valid JSON.
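
To illustrate the parsing problem, here is a minimal sketch. The two
records and their driver names are made up for the example and are not
actual script output. Independent JSON documents printed one per line are
rejected by a standard parser, whereas a single array holding the same
objects is accepted.

```python
import json

# Hypothetical sample records, only for illustration.
records = [
    {"name": "net_foo", "kmod": "vfio-pci"},
    {"name": "net_bar", "kmod": "vfio-pci"},
]


def parses_as_json(text: str) -> bool:
    """Return True if text is one single valid JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Old --raw style: one independent JSON document per line.
json_lines = "\n".join(json.dumps(r) for r in records)

# New style: a single JSON array holding every record.
json_array = json.dumps(records, indent=2)

lines_ok = parses_as_json(json_lines)  # parser stops at the second object
array_ok = parses_as_json(json_array)
```

Only the array form can be piped directly into tools such as jq with a
'.[]' filter.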

Remove the PCI database parsing for PCI-ID to Vendor-Device names
conversion. This should be done by external scripts (if really needed).

The script passes flake8, black, isort and pylint checks.

I have tested this with a matrix of python/pyelftools versions:

                                 pyelftools
               0.22  0.23  0.24  0.25  0.26  0.27  0.28  0.29
        3.6      ok    ok    ok    ok    ok    ok    ok    ok
        3.7      ok    ok    ok    ok    ok    ok    ok    ok
 Python 3.8      ok    ok    ok    ok    ok    ok    ok    ok
        3.9      ok    ok    ok    ok    ok    ok    ok    ok
        3.10   fail  fail  fail  fail    ok    ok    ok    ok

All failures with python 3.10 are related to the same issue:

  File "elftools/construct/lib/container.py", line 5, in <module>
    from collections import MutableMapping
  ImportError: cannot import name 'MutableMapping' from 'collections'

Python 3.10 support is only available since pyelftools 0.26. The script
will only work with Python 3.6 and later. Update the minimal system
requirements, docs and release notes.
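
For reference, the failure comes from the abstract base classes having
moved: they have lived in collections.abc since Python 3.3, and the legacy
aliases in the collections module were removed in Python 3.10. A minimal
sketch of a portable import follows. It is only an illustration of the
technique, not the change that was actually made in pyelftools.

```python
import collections
import sys

try:
    # Location of the ABCs since Python 3.3, and the only one in 3.10+.
    from collections.abc import MutableMapping
except ImportError:
    # Fallback for very old interpreters; never reached on Python 3.6+.
    from collections import MutableMapping

on_310_or_later = sys.version_info >= (3, 10)

# The legacy alias may still exist (with a deprecation warning) before 3.10.
legacy_alias_present = hasattr(collections, "MutableMapping")
```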

Cc: Olivier Matz <olivier.matz@6wind.com>
Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
Cc: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
---
v5 -> v6:

* fixed typo: /etc/ld.so.conf/*.conf -> /etc/ld.so.conf.d/*.conf
* changed ld_so_path() to return an actual list to allow iterating
  multiple times
* include standard /lib & /usr/lib folders in LD_SO_PATH

v4 -> v5:

* fixed doc/guides/rel_notes/release_22_11.rst
* updated doc/guides/tools/pmdinfo.rst with examples that were in the
  commit message

v3 -> v4:

* also strip pci_id fields when they have the wildcard 0xFFFF value.

v2 -> v3:

* strip "pci_ids" when it is empty (some drivers do not support any
  pci devices)

v1 -> v2:

* update release notes and minimal python version requirement
* hide warnings by default (-v/--verbose to show them)
* show debug messages with -vv
* also search libs in folders listed in /etc/ld.so.conf/*.conf
* only search for DT_NEEDED on executables, not on dynamic libraries
* take DT_RUNPATH into account for searching libraries
* fix weird broken pipe error
* fix some typos:
    s/begining/beginning/
    s/subsystem_device/subsystem_vendor/
    s/subsystem_system/subsystem_device/
* change field names for pci_ids elements (remove _id suffixes)
* DT_NEEDED of files are analyzed. There is no way to differentiate
  between dynamically linked executables and dynamic libraries.
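
The pci_ids clean-up described in the v2 -> v3 and v3 -> v4 notes can be
sketched as follows. This is not the code from the patch: the function
names are hypothetical, the raw input is assumed to be lists of four
integers (vendor, device, subsystem vendor, subsystem device), and the
subsystem field names are inferred from the v1 -> v2 notes.

```python
from typing import Dict, List

WILDCARD = 0xFFFF
# Assumed output field names, in the same order as the raw integers.
FIELDS = ("vendor", "device", "subsystem_vendor", "subsystem_device")


def clean_pci_ids(raw: List[List[int]]) -> List[Dict[str, str]]:
    """Convert raw integer lists into dicts, dropping wildcard fields."""
    cleaned = []
    for ids in raw:
        entry = {
            name: f"{value:04x}"
            for name, value in zip(FIELDS, ids)
            if value != WILDCARD
        }
        cleaned.append(entry)
    return cleaned


def clean_driver(info: dict) -> dict:
    """Return a copy of a driver record without an empty "pci_ids" list."""
    info = dict(info)
    pci_ids = clean_pci_ids(info.pop("pci_ids", []))
    if pci_ids:
        info["pci_ids"] = pci_ids
    return info


# Hypothetical driver records, only for illustration.
with_pci = clean_driver(
    {"name": "net_example", "pci_ids": [[0x8086, 0x1889, 0xFFFF, 0xFFFF]]}
)
without_pci = clean_driver({"name": "net_virtual_example", "pci_ids": []})
```

Dropping the wildcard fields keeps the JSON short, and omitting an empty
"pci_ids" key lets consumers tell non-PCI drivers apart without checking
for an empty list.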

 doc/guides/linux_gsg/sys_reqs.rst      |   2 +-
 doc/guides/rel_notes/release_22_11.rst |   8 +
 doc/guides/tools/pmdinfo.rst           |  86 ++-
 usertools/dpdk-pmdinfo.py              | 927 +++++++++----------------
 4 files changed, 403 insertions(+), 620 deletions(-)

diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 08d45898f025..f842105eeda7 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -41,7 +41,7 @@ Compilation of the DPDK
    resulting in statically linked applications not being linked properly.
    Use an updated version of ``pkg-config`` or ``pkgconf`` instead when building applications
 
-*   Python 3.5 or later.
+*   Python 3.6 or later.
 
 *   Meson (version 0.49.2+) and ninja
 
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 8c021cf0505e..d10e856ada74 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -55,6 +55,14 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
+  PCI-IDs parsing has been removed.
+  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command:
+
+  .. code-block:: sh
+
+     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
+
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/pmdinfo.rst b/doc/guides/tools/pmdinfo.rst
index af6d24b0f63b..3354cbba4c8a 100644
--- a/doc/guides/tools/pmdinfo.rst
+++ b/doc/guides/tools/pmdinfo.rst
@@ -5,25 +5,85 @@
 dpdk-pmdinfo Application
 ========================
 
-The ``dpdk-pmdinfo`` tool is a Data Plane Development Kit (DPDK) utility that
-can dump a PMDs hardware support info.
+The ``dpdk-pmdinfo.py`` tool is a Data Plane Development Kit (DPDK) utility that
+can dump a PMD's hardware support info in JSON format.
 
+Synopsis
+--------
 
-Running the Application
------------------------
+::
 
-The tool has a number of command line options:
+   dpdk-pmdinfo.py [-h] [-p] [-v] ELF_FILE [ELF_FILE ...]
+
+Arguments
+---------
+
+.. program:: dpdk-pmdinfo.py
+
+.. option:: -h, --help
+
+   Show the inline help.
+
+.. option:: -p, --search-plugins
+
+   In addition to ``ELF_FILE``\s and their linked dynamic libraries, also scan
+   the DPDK plugins path.
+
+.. option:: -v, --verbose
+
+   Display warnings due to linked libraries not found or ELF/JSON parsing errors
+   in these libraries. Use twice to show debug messages.
+
+.. option:: ELF_FILE
+
+   DPDK application binary or dynamic library.
+   Can be specified multiple times.
+
+Environment Variables
+---------------------
+
+.. envvar:: LD_LIBRARY_PATH
+
+   ``dpdk-pmdinfo.py`` will also parse ``librte_*.so`` dynamic libraries that
+   are linked to the specified ``ELF_FILE``\s arguments. The dynamic library
+   files will be looked up based on the ``DT_RUNPATH`` set at link time, the
+   ``/etc/ld.so.conf.d/*.conf`` files and also in the standard ``/lib`` and
+   ``/usr/lib`` folders. Any colon separated folder defined in
+   ``LD_LIBRARY_PATH`` will be looked up first.
+
+Examples
+--------
+
+Get the complete info for a given driver:
 
 .. code-block:: console
 
-   dpdk-pmdinfo [-hrtp] [-d <pci id file] <elf-file>
+   $ dpdk-pmdinfo.py /usr/bin/dpdk-testpmd | \
+       jq '.[] | select(.name == "net_ice_dcf")'
+   {
+     "name": "net_ice_dcf",
+     "params": "cap=dcf",
+     "kmod": "* igb_uio | vfio-pci",
+     "pci_ids": [
+       {
+         "vendor": "8086",
+         "device": "1889"
+       }
+     ]
+   }
 
-   -h, --help            Show a short help message and exit
-   -r, --raw             Dump as raw json strings
-   -d FILE, --pcidb=FILE Specify a pci database to get vendor names from
-   -t, --table           Output information on hw support as a hex table
-   -p, --plugindir       Scan dpdk for autoload plugins
+Get only the required kernel modules for a given driver:
 
-.. Note::
+.. code-block:: console
 
-   * Parameters inside the square brackets represents optional parameters.
+   $ dpdk-pmdinfo.py /usr/bin/dpdk-testpmd | \
+       jq '.[] | select(.name == "net_cn10k").kmod'
+   "vfio-pci"
+
+Get only the required kernel modules for a given device:
+
+.. code-block:: console
+
+   $ dpdk-pmdinfo.py /usr/bin/dpdk-testpmd | \
+       jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
+   "* ib_uverbs & mlx5_core & mlx5_ib"
diff --git a/usertools/dpdk-pmdinfo.py b/usertools/dpdk-pmdinfo.py
index 40ef5cec6cba..4021ab357cc9 100755
--- a/usertools/dpdk-pmdinfo.py
+++ b/usertools/dpdk-pmdinfo.py
@@ -1,626 +1,341 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2016  Neil Horman <nhorman@tuxdriver.com>
+# Copyright(c) 2022  Robin Jarry
+# pylint: disable=invalid-name
+
+r"""
+Utility to dump PMD_INFO_STRING support from DPDK binaries.
+
+This script prints JSON output to be interpreted by other tools. Here are some
+examples with jq:
+
+Get the complete info for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "cnxk_nix_inl")'
+
+Get only the required kernel modules for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "net_i40e").kmod'
+
+Get only the required kernel modules for a given device:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.pci_ids[] | .vendor == "15b3" and .device == "1013").kmod'
+"""
 
-# -------------------------------------------------------------------------
-#
-# Utility to dump PMD_INFO_STRING support from an object file
-#
-# -------------------------------------------------------------------------
-import json
-import os
-import platform
-import sys
 import argparse
-from elftools.common.exceptions import ELFError
-from elftools.common.py3compat import byte2int
-from elftools.elf.elffile import ELFFile
-
-
-# For running from development directory. It should take precedence over the
-# installed pyelftools.
-sys.path.insert(0, '.')
-
-raw_output = False
-pcidb = None
-
-# ===========================================
-
-class Vendor:
-    """
-    Class for vendors. This is the top level class
-    for the devices belong to a specific vendor.
-    self.devices is the device dictionary
-    subdevices are in each device.
-    """
-
-    def __init__(self, vendorStr):
-        """
-        Class initializes with the raw line from pci.ids
-        Parsing takes place inside __init__
-        """
-        self.ID = vendorStr.split()[0]
-        self.name = vendorStr.replace("%s " % self.ID, "").rstrip()
-        self.devices = {}
-
-    def addDevice(self, deviceStr):
-        """
-        Adds a device to self.devices
-        takes the raw line from pci.ids
-        """
-        s = deviceStr.strip()
-        devID = s.split()[0]
-        if devID in self.devices:
-            pass
-        else:
-            self.devices[devID] = Device(deviceStr)
-
-    def report(self):
-        print(self.ID, self.name)
-        for id, dev in self.devices.items():
-            dev.report()
-
-    def find_device(self, devid):
-        # convert to a hex string and remove 0x
-        devid = hex(devid)[2:]
-        try:
-            return self.devices[devid]
-        except:
-            return Device("%s  Unknown Device" % devid)
-
-
-class Device:
-
-    def __init__(self, deviceStr):
-        """
-        Class for each device.
-        Each vendor has its own devices dictionary.
-        """
-        s = deviceStr.strip()
-        self.ID = s.split()[0]
-        self.name = s.replace("%s  " % self.ID, "")
-        self.subdevices = {}
-
-    def report(self):
-        print("\t%s\t%s" % (self.ID, self.name))
-        for subID, subdev in self.subdevices.items():
-            subdev.report()
-
-    def addSubDevice(self, subDeviceStr):
-        """
-        Adds a subvendor, subdevice to device.
-        Uses raw line from pci.ids
-        """
-        s = subDeviceStr.strip()
-        spl = s.split()
-        subVendorID = spl[0]
-        subDeviceID = spl[1]
-        subDeviceName = s.split("  ")[-1]
-        devID = "%s:%s" % (subVendorID, subDeviceID)
-        self.subdevices[devID] = SubDevice(
-            subVendorID, subDeviceID, subDeviceName)
-
-    def find_subid(self, subven, subdev):
-        subven = hex(subven)[2:]
-        subdev = hex(subdev)[2:]
-        devid = "%s:%s" % (subven, subdev)
-
-        try:
-            return self.subdevices[devid]
-        except:
-            if (subven == "ffff" and subdev == "ffff"):
-                return SubDevice("ffff", "ffff", "(All Subdevices)")
-            return SubDevice(subven, subdev, "(Unknown Subdevice)")
-
-
-class SubDevice:
-    """
-    Class for subdevices.
-    """
-
-    def __init__(self, vendor, device, name):
-        """
-        Class initializes with vendorid, deviceid and name
-        """
-        self.vendorID = vendor
-        self.deviceID = device
-        self.name = name
-
-    def report(self):
-        print("\t\t%s\t%s\t%s" % (self.vendorID, self.deviceID, self.name))
-
-
-class PCIIds:
-    """
-    Top class for all pci.ids entries.
-    All queries will be asked to this class.
-    PCIIds.vendors["0e11"].devices["0046"].\
-    subdevices["0e11:4091"].name  =  "Smart Array 6i"
-    """
-
-    def __init__(self, filename):
-        """
-        Prepares the directories.
-        Checks local data file.
-        Tries to load from local, if not found, downloads from web
-        """
-        self.version = ""
-        self.date = ""
-        self.vendors = {}
-        self.contents = None
-        self.readLocal(filename)
-        self.parse()
-
-    def reportVendors(self):
-        """Reports the vendors
-        """
-        for vid, v in self.vendors.items():
-            print(v.ID, v.name)
-
-    def report(self, vendor=None):
-        """
-        Reports everything for all vendors or a specific vendor
-        PCIIds.report()  reports everything
-        PCIIDs.report("0e11") reports only "Compaq Computer Corporation"
-        """
-        if vendor is not None:
-            self.vendors[vendor].report()
-        else:
-            for vID, v in self.vendors.items():
-                v.report()
-
-    def find_vendor(self, vid):
-        # convert vid to a hex string and remove the 0x
-        vid = hex(vid)[2:]
-
-        try:
-            return self.vendors[vid]
-        except:
-            return Vendor("%s Unknown Vendor" % (vid))
-
-    def findDate(self, content):
-        for l in content:
-            if l.find("Date:") > -1:
-                return l.split()[-2].replace("-", "")
-        return None
-
-    def parse(self):
-        if not self.contents:
-            print("data/%s-pci.ids not found" % self.date)
-        else:
-            vendorID = ""
-            deviceID = ""
-            for l in self.contents:
-                if l[0] == "#":
-                    continue
-                elif not l.strip():
-                    continue
-                else:
-                    if l.find("\t\t") == 0:
-                        self.vendors[vendorID].devices[
-                            deviceID].addSubDevice(l)
-                    elif l.find("\t") == 0:
-                        deviceID = l.strip().split()[0]
-                        self.vendors[vendorID].addDevice(l)
-                    else:
-                        vendorID = l.split()[0]
-                        self.vendors[vendorID] = Vendor(l)
-
-    def readLocal(self, filename):
-        """
-        Reads the local file
-        """
-        with open(filename, 'r', encoding='utf-8') as f:
-            self.contents = f.readlines()
-        self.date = self.findDate(self.contents)
-
-    def loadLocal(self):
-        """
-        Loads database from local. If there is no file,
-        it creates a new one from web
-        """
-        self.date = idsfile[0].split("/")[1].split("-")[0]
-        self.readLocal()
-
-
-# =======================================
-
-def search_file(filename, search_path):
-    """ Given a search path, find file with requested name """
-    for path in search_path.split(':'):
-        candidate = os.path.join(path, filename)
-        if os.path.exists(candidate):
-            return os.path.abspath(candidate)
-    return None
-
-
-class ReadElf(object):
-    """ display_* methods are used to emit output into the output stream
-    """
-
-    def __init__(self, file, output):
-        """ file:
-                stream object with the ELF file to read
-
-            output:
-                output stream to write to
-        """
-        self.elffile = ELFFile(file)
-        self.output = output
-
-        # Lazily initialized if a debug dump is requested
-        self._dwarfinfo = None
-
-        self._versioninfo = None
-
-    def _section_from_spec(self, spec):
-        """ Retrieve a section given a "spec" (either number or name).
-            Return None if no such section exists in the file.
-        """
-        try:
-            num = int(spec)
-            if num < self.elffile.num_sections():
-                return self.elffile.get_section(num)
-            return None
-        except ValueError:
-            # Not a number. Must be a name then
-            section = self.elffile.get_section_by_name(force_unicode(spec))
-            if section is None:
-                # No match with a unicode name.
-                # Some versions of pyelftools (<= 0.23) store internal strings
-                # as bytes. Try again with the name encoded as bytes.
-                section = self.elffile.get_section_by_name(force_bytes(spec))
-            return section
-
-    def pretty_print_pmdinfo(self, pmdinfo):
-        global pcidb
-
-        for i in pmdinfo["pci_ids"]:
-            vendor = pcidb.find_vendor(i[0])
-            device = vendor.find_device(i[1])
-            subdev = device.find_subid(i[2], i[3])
-            print("%s (%s) : %s (%s) %s" %
-                  (vendor.name, vendor.ID, device.name,
-                   device.ID, subdev.name))
-
-    def parse_pmd_info_string(self, mystring):
-        global raw_output
-        global pcidb
-
-        optional_pmd_info = [
-            {'id': 'params', 'tag': 'PMD PARAMETERS'},
-            {'id': 'kmod', 'tag': 'PMD KMOD DEPENDENCIES'}
-        ]
-
-        i = mystring.index("=")
-        mystring = mystring[i + 2:]
-        pmdinfo = json.loads(mystring)
-
-        if raw_output:
-            print(json.dumps(pmdinfo))
-            return
-
-        print("PMD NAME: " + pmdinfo["name"])
-        for i in optional_pmd_info:
-            try:
-                print("%s: %s" % (i['tag'], pmdinfo[i['id']]))
-            except KeyError:
-                continue
-
-        if pmdinfo["pci_ids"]:
-            print("PMD HW SUPPORT:")
-            if pcidb is not None:
-                self.pretty_print_pmdinfo(pmdinfo)
-            else:
-                print("VENDOR\t DEVICE\t SUBVENDOR\t SUBDEVICE")
-                for i in pmdinfo["pci_ids"]:
-                    print("0x%04x\t 0x%04x\t 0x%04x\t\t 0x%04x" %
-                          (i[0], i[1], i[2], i[3]))
-
-        print("")
-
-    def display_pmd_info_strings(self, section_spec):
-        """ Display a strings dump of a section. section_spec is either a
-            section number or a name.
-        """
-        section = self._section_from_spec(section_spec)
-        if section is None:
-            return
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("PMD_INFO_STRING")
-            if rc != -1:
-                self.parse_pmd_info_string(mystring[rc:])
-
-            dataptr = endptr
-
-    def find_librte_eal(self, section):
-        for tag in section.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if "librte_eal" in force_unicode(tag.needed):
-                    return force_unicode(tag.needed)
-        return None
-
-    def search_for_autoload_path(self):
-        scanelf = self
-        scanfile = None
-        library = None
-
-        section = self._section_from_spec(".dynamic")
-        try:
-            eallib = self.find_librte_eal(section)
-            if eallib is not None:
-                ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-                if ldlibpath is None:
-                    ldlibpath = ""
-                dtr = self.get_dt_runpath(section)
-                library = search_file(eallib,
-                                      dtr + ":" + ldlibpath +
-                                      ":/usr/lib64:/lib64:/usr/lib:/lib")
-                if library is None:
-                    return (None, None)
-                if not raw_output:
-                    print("Scanning for autoload path in %s" % library)
-                scanfile = open(library, 'rb')
-                scanelf = ReadElf(scanfile, sys.stdout)
-        except AttributeError:
-            # Not a dynamic binary
-            pass
-        except ELFError:
-            scanfile.close()
-            return (None, None)
-
-        section = scanelf._section_from_spec(".rodata")
-        if section is None:
-            if scanfile is not None:
-                scanfile.close()
-            return (None, None)
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("DPDK_PLUGIN_PATH")
-            if rc != -1:
-                rc = mystring.find("=")
-                return (mystring[rc + 1:], library)
-
-            dataptr = endptr
-        if scanfile is not None:
-            scanfile.close()
-        return (None, None)
-
-    def get_dt_runpath(self, dynsec):
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_RUNPATH':
-                return force_unicode(tag.runpath)
-        return ""
-
-    def process_dt_needed_entries(self):
-        """ Look to see if there are any DT_NEEDED entries in the binary
-            And process those if there are
-        """
-        runpath = ""
-        ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-        if ldlibpath is None:
-            ldlibpath = ""
-
-        dynsec = self._section_from_spec(".dynamic")
-        try:
-            runpath = self.get_dt_runpath(dynsec)
-        except AttributeError:
-            # dynsec is None, just return
-            return
-
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if 'librte_' in force_unicode(tag.needed):
-                    library = search_file(force_unicode(tag.needed),
-                                          runpath + ":" + ldlibpath +
-                                          ":/usr/lib64:/lib64:/usr/lib:/lib")
-                    if library is not None:
-                        with open(library, 'rb') as file:
-                            try:
-                                libelf = ReadElf(file, sys.stdout)
-                            except ELFError:
-                                print("%s is no an ELF file" % library)
-                                continue
-                            libelf.process_dt_needed_entries()
-                            libelf.display_pmd_info_strings(".rodata")
-                            file.close()
-
-
-# compat: remove force_unicode & force_bytes when pyelftools<=0.23 support is
-# dropped.
-def force_unicode(s):
-    if hasattr(s, 'decode') and callable(s.decode):
-        s = s.decode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def force_bytes(s):
-    if hasattr(s, 'encode') and callable(s.encode):
-        s = s.encode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def scan_autoload_path(autoload_path):
-    global raw_output
-
-    if not os.path.exists(autoload_path):
-        return
-
+import glob
+import json
+import logging
+import os
+import re
+import string
+import sys
+from pathlib import Path
+from typing import Iterable, Iterator, List, Union
+
+import elftools
+from elftools.elf.elffile import ELFError, ELFFile
+
+
+# ----------------------------------------------------------------------------
+def main() -> int:  # pylint: disable=missing-docstring
     try:
-        dirs = os.listdir(autoload_path)
-    except OSError:
-        # Couldn't read the directory, give up
-        return
-
-    for d in dirs:
-        dpath = os.path.join(autoload_path, d)
-        if os.path.isdir(dpath):
-            scan_autoload_path(dpath)
-        if os.path.isfile(dpath):
-            try:
-                file = open(dpath, 'rb')
-                readelf = ReadElf(file, sys.stdout)
-            except ELFError:
-                # this is likely not an elf file, skip it
-                continue
-            except IOError:
-                # No permission to read the file, skip it
-                continue
-
-            if not raw_output:
-                print("Hw Support for library %s" % d)
-            readelf.display_pmd_info_strings(".rodata")
-            file.close()
+        args = parse_args()
+        logging.basicConfig(
+            stream=sys.stderr,
+            format="%(levelname)s: %(message)s",
+            level={
+                0: logging.ERROR,
+                1: logging.WARNING,
+            }.get(args.verbose, logging.DEBUG),
+        )
+        info = parse_pmdinfo(args.elf_files, args.search_plugins)
+        print(json.dumps(info, indent=2))
+    except BrokenPipeError:
+        pass
+    except KeyboardInterrupt:
+        return 1
+    except Exception as e:  # pylint: disable=broad-except
+        logging.error("%s", e)
+        return 1
+    return 0
 
 
-def scan_for_autoload_pmds(dpdk_path):
+# ----------------------------------------------------------------------------
+def parse_args() -> argparse.Namespace:
     """
-    search the specified application or path for a pmd autoload path
-    then scan said path for pmds and report hw support
+    Parse command line arguments.
     """
-    global raw_output
-
-    if not os.path.isfile(dpdk_path):
-        if not raw_output:
-            print("Must specify a file name")
-        return
-
-    file = open(dpdk_path, 'rb')
-    try:
-        readelf = ReadElf(file, sys.stdout)
-    except ElfError:
-        if not raw_output:
-            print("Unable to parse %s" % file)
-        return
-
-    (autoload_path, scannedfile) = readelf.search_for_autoload_path()
-    if not autoload_path:
-        if not raw_output:
-            print("No autoload path configured in %s" % dpdk_path)
-        return
-    if not raw_output:
-        if scannedfile is None:
-            scannedfile = dpdk_path
-        print("Found autoload path %s in %s" % (autoload_path, scannedfile))
-
-    file.close()
-    if not raw_output:
-        print("Discovered Autoload HW Support:")
-    scan_autoload_path(autoload_path)
-    return
-
-
-def main(stream=None):
-    global raw_output
-    global pcidb
-
-    pcifile_default = "./pci.ids"  # For unknown OS's assume local file
-    if platform.system() == 'Linux':
-        # hwdata is the legacy location, misc is supported going forward
-        pcifile_default = "/usr/share/misc/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/hwdata/pci.ids"
-    elif platform.system() == 'FreeBSD':
-        pcifile_default = "/usr/local/share/pciids/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/misc/pci_vendors"
-
     parser = argparse.ArgumentParser(
-        usage='usage: %(prog)s [-hrtp] [-d <pci id file>] elf_file',
-        description="Dump pmd hardware support info")
-    group = parser.add_mutually_exclusive_group()
-    group.add_argument('-r', '--raw',
-                       action='store_true', dest='raw_output',
-                       help='dump raw json strings')
-    group.add_argument("-t", "--table", dest="tblout",
-                       help="output information on hw support as a hex table",
-                       action='store_true')
-    parser.add_argument("-d", "--pcidb", dest="pcifile",
-                        help="specify a pci database to get vendor names from",
-                        default=pcifile_default, metavar="FILE")
-    parser.add_argument("-p", "--plugindir", dest="pdir",
-                        help="scan dpdk for autoload plugins",
-                        action='store_true')
-    parser.add_argument("elf_file", help="driver shared object file")
-    args = parser.parse_args()
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "-p",
+        "--search-plugins",
+        action="store_true",
+        help="""
+        In addition to ELF_FILEs and their linked dynamic libraries, also scan
+        the DPDK plugins path.
+        """,
+    )
+    parser.add_argument(
+        "-v",
+        "--verbose",
+        action="count",
+        default=0,
+        help="""
+        Display warnings due to linked libraries not found or ELF/JSON parsing
+        errors in these libraries. Use twice to show debug messages.
+        """,
+    )
+    parser.add_argument(
+        "elf_files",
+        metavar="ELF_FILE",
+        nargs="+",
+        type=existing_file,
+        help="""
+        DPDK application binary or dynamic library.
+        """,
+    )
+    return parser.parse_args()
 
-    if args.raw_output:
-        raw_output = True
 
-    if args.tblout:
-        args.pcifile = None
+# ----------------------------------------------------------------------------
+def parse_pmdinfo(paths: Iterable[Path], search_plugins: bool) -> List[dict]:
+    """
+    Extract DPDK PMD info JSON strings from an ELF file.
 
-    if args.pcifile:
-        pcidb = PCIIds(args.pcifile)
-        if pcidb is None:
-            print("Pci DB file not found")
-            exit(1)
+    :returns:
+        A list of DPDK drivers info dictionaries.
+    """
+    binaries = set(paths)
+    for p in paths:
+        binaries.update(get_needed_libs(p))
+    if search_plugins:
+        # cast to list to avoid errors with update while iterating
+        binaries.update(list(get_plugin_libs(binaries)))
 
-    if args.pdir:
-        exit(scan_for_autoload_pmds(args.elf_file))
+    drivers = []
 
-    ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-    if ldlibpath is None:
-        ldlibpath = ""
-
-    if os.path.exists(args.elf_file):
-        myelffile = args.elf_file
-    else:
-        myelffile = search_file(args.elf_file,
-                                ldlibpath + ":/usr/lib64:/lib64:/usr/lib:/lib")
-
-    if myelffile is None:
-        print("File not found")
-        sys.exit(1)
-
-    with open(myelffile, 'rb') as file:
+    for b in binaries:
+        logging.debug("analyzing %s", b)
         try:
-            readelf = ReadElf(file, sys.stdout)
-            readelf.process_dt_needed_entries()
-            readelf.display_pmd_info_strings(".rodata")
-            sys.exit(0)
+            for s in get_elf_strings(b, ".rodata", "PMD_INFO_STRING="):
+                try:
+                    info = json.loads(s)
+                    scrub_pci_ids(info)
+                    drivers.append(info)
+                except ValueError as e:
+                    # invalid JSON, should never happen
+                    logging.warning("%s: %s", b, e)
+        except ELFError as e:
+            # only happens for discovered plugins that are not ELF
+            logging.debug("%s: cannot parse ELF: %s", b, e)
 
-        except ELFError as ex:
-            sys.stderr.write('ELF error: %s\n' % ex)
-            sys.exit(1)
+    return drivers
 
 
-# -------------------------------------------------------------------------
-if __name__ == '__main__':
-    main()
+# ----------------------------------------------------------------------------
+PCI_FIELDS = ("vendor", "device", "subsystem_vendor", "subsystem_device")
+
+
+def scrub_pci_ids(info: dict):
+    """
+    Convert numerical ids to hex strings.
+    Strip empty pci_ids lists.
+    Strip wildcard 0xFFFF ids.
+    """
+    pci_ids = []
+    for pci_fields in info.pop("pci_ids"):
+        pci = {}
+        for name, value in zip(PCI_FIELDS, pci_fields):
+            if value != 0xFFFF:
+                pci[name] = f"{value:04x}"
+        if pci:
+            pci_ids.append(pci)
+    if pci_ids:
+        info["pci_ids"] = pci_ids
+
+
+# ----------------------------------------------------------------------------
+def get_plugin_libs(binaries: Iterable[Path]) -> Iterator[Path]:
+    """
+    Look into the provided binaries for DPDK_PLUGIN_PATH and scan the path
+    for files.
+    """
+    for b in binaries:
+        for p in get_elf_strings(b, ".rodata", "DPDK_PLUGIN_PATH="):
+            plugin_path = p.strip()
+            logging.debug("discovering plugins in %s", plugin_path)
+            for root, _, files in os.walk(plugin_path):
+                for f in files:
+                    yield Path(root) / f
+            # no need to search in other binaries.
+            return
+
+
+# ----------------------------------------------------------------------------
+def existing_file(value: str) -> Path:
+    """
+    Argparse type= callback to ensure an argument points to a valid file path.
+    """
+    path = Path(value)
+    if not path.is_file():
+        raise argparse.ArgumentTypeError(f"{value}: No such file")
+    return path
+
+
+# ----------------------------------------------------------------------------
+PRINTABLE_BYTES = frozenset(string.printable.encode("ascii"))
+
+
+def find_strings(buf: bytes, prefix: str) -> Iterator[str]:
+    """
+    Extract strings of printable ASCII characters from a bytes buffer.
+    """
+    view = memoryview(buf)
+    start = None
+
+    for i, b in enumerate(view):
+        if start is None and b in PRINTABLE_BYTES:
+            # mark beginning of string
+            start = i
+            continue
+        if start is not None:
+            if b in PRINTABLE_BYTES:
+                # string not finished
+                continue
+            if b == 0:
+                # end of string
+                s = view[start:i].tobytes().decode("ascii")
+                if s.startswith(prefix):
+                    yield s[len(prefix) :]
+            # There can be byte sequences where a non-printable byte
+            # follows a printable one. Ignore that.
+            start = None
+
+
+# ----------------------------------------------------------------------------
+def elftools_version():
+    """
+    Extract pyelftools version as a tuple of integers for easy comparison.
+    """
+    version = getattr(elftools, "__version__", "")
+    match = re.match(r"^(\d+)\.(\d+).*$", str(version))
+    if not match:
+        # cannot determine version, hope for the best
+        return (0, 24)
+    return (int(match[1]), int(match[2]))
+
+
+ELFTOOLS_VERSION = elftools_version()
+
+
+def from_elftools(s: Union[bytes, str]) -> str:
+    """
+    Earlier versions of pyelftools (< 0.24) return bytes encoded with "latin-1"
+    instead of python strings.
+    """
+    if isinstance(s, bytes):
+        return s.decode("latin-1")
+    return s
+
+
+def to_elftools(s: str) -> Union[bytes, str]:
+    """
+    Earlier versions of pyelftools (< 0.24) assume that ELF section and tags
+    are bytes encoded with "latin-1" instead of python strings.
+    """
+    if ELFTOOLS_VERSION < (0, 24):
+        return s.encode("latin-1")
+    return s
+
+
+# ----------------------------------------------------------------------------
+def get_elf_strings(path: Path, section: str, prefix: str) -> Iterator[str]:
+    """
+    Extract strings from a named ELF section in a file.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        sec = elf.get_section_by_name(to_elftools(section))
+        if not sec:
+            return
+        yield from find_strings(sec.data(), prefix)
+
+
+# ----------------------------------------------------------------------------
+def ld_so_path() -> List[str]:
+    """
+    Return the list of directories where dynamic libraries are loaded based on
+    the contents of /etc/ld.so.conf.d/*.conf and the standard fallback folders.
+    """
+    sopath = []
+    for conf in glob.iglob("/etc/ld.so.conf.d/*.conf"):
+        try:
+            with open(conf, "r", encoding="utf-8") as f:
+                for line in f:
+                    line = line.strip()
+                    if os.path.isdir(line):
+                        sopath.append(line)
+        except OSError:
+            pass
+    # these two folders are always searched at the end, in that order
+    sopath += ["/lib", "/usr/lib"]
+    return sopath
+
+
+LD_SO_PATH = ld_so_path()
+
+
+def search_dt_needed(origin: Path, needed: str, runpath: List[str]) -> Path:
+    """
+    Search for a file in LD_LIBRARY_PATH (if defined), runpath (if set) and in
+    all folders declared in /etc/ld.so.conf.d/*.conf. Finally, look in the
+    standard folders (/lib followed by /usr/lib).
+    """
+    folders = []
+    if "LD_LIBRARY_PATH" in os.environ:
+        folders += os.environ["LD_LIBRARY_PATH"].split(":")
+    folders += runpath
+    folders += LD_SO_PATH
+    for d in folders:
+        d = d.replace("$ORIGIN", str(origin.parent.absolute()))
+        filepath = Path(d) / needed
+        if filepath.is_file():
+            return filepath
+    raise FileNotFoundError(needed)
+
+
+# ----------------------------------------------------------------------------
+def get_needed_libs(path: Path) -> Iterator[Path]:
+    """
+    Extract the dynamic library dependencies from an ELF executable.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        dyn = elf.get_section_by_name(to_elftools(".dynamic"))
+        if not dyn:
+            return
+        runpath = []
+        for tag in dyn.iter_tags(to_elftools("DT_RUNPATH")):
+            runpath += from_elftools(tag.runpath).split(":")
+        for tag in dyn.iter_tags(to_elftools("DT_NEEDED")):
+            needed = from_elftools(tag.needed)
+            if not needed.startswith("librte_"):
+                continue
+            logging.debug("%s: DT_NEEDED %s", path, needed)
+            try:
+                yield search_dt_needed(path, needed, runpath)
+            except FileNotFoundError:
+                logging.warning("%s: DT_NEEDED not found: %s", path, needed)
+
+
+# ----------------------------------------------------------------------------
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.37.3


* Re: [PATCH v6] usertools: rewrite pmdinfo
  2022-09-26 13:44 ` [PATCH v6] " Robin Jarry
@ 2022-09-26 15:17   ` Bruce Richardson
  2022-09-28  6:51     ` Robin Jarry
  0 siblings, 1 reply; 42+ messages in thread
From: Bruce Richardson @ 2022-09-26 15:17 UTC (permalink / raw)
  To: Robin Jarry; +Cc: dev, Olivier Matz, Ferruh Yigit

On Mon, Sep 26, 2022 at 03:44:38PM +0200, Robin Jarry wrote:
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                                  pyelftools
>                0.22  0.23  0.24  0.25  0.26  0.27  0.28  0.29
>         3.6      ok    ok    ok    ok    ok    ok    ok    ok
>         3.7      ok    ok    ok    ok    ok    ok    ok    ok
>  Python 3.8      ok    ok    ok    ok    ok    ok    ok    ok
>         3.9      ok    ok    ok    ok    ok    ok    ok    ok
>         3.10   fail  fail  fail  fail    ok    ok    ok    ok
> 
> All failures with python 3.10 are related to the same issue:
> 
>   File "elftools/construct/lib/container.py", line 5, in <module>
>     from collections import MutableMapping
>   ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26. The script
> will only work with Python 3.6 and later. Update the minimal system
> requirements, docs and release notes.
> 
> Cc: Olivier Matz <olivier.matz@6wind.com>
> Cc: Ferruh Yigit <ferruh.yigit@xilinx.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> ---
> v5 -> v6:
> 
> * fixed typo: /etc/ld.so.conf/*.conf -> /etc/ld.so.conf.d/*.conf

I am a little uncertain about doing this parsing, and worried it may be a
bit fragile. The main file for ld.so still is ld.so.conf, which, on my
system anyway, does indeed just have an include for *.conf in the .d
directory. However, is it possible that there are systems out there that
still have entries in ld.so.conf and possibly elsewhere?

I think my preference would still be to shell out to ldconfig and query its
database, or to shell out to ldd to get the dependencies of a .so from
there. I just think it may be more robust, but at the cost of running some
shell commands.

However, I don't feel strongly about this, so if others prefer the
pure-python ld.so.conf parsing approach better, I'm ok with that.

/Bruce
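
The ldd option mentioned above could look roughly like the sketch below.
It is only an illustration, not code from the patch: the function names
parse_ldd_output and get_needed_libs_ldd are hypothetical, and the regular
expression assumes the usual "name => /path (address)" line format printed
by ldd on Linux. Entries reported as "not found" are skipped. Note that ldd
may execute code from the inspected binary on some systems, so this is best
kept to trusted files.

```python
import re
import subprocess
from pathlib import Path
from typing import Iterator

# Assumed ldd line format: "\tlibrte_eal.so.23 => /usr/lib/librte_eal.so.23 (0x...)"
LDD_LINE = re.compile(r"^\s*(librte_\S+)\s+=>\s+(/\S+)", re.MULTILINE)


def parse_ldd_output(output: str) -> Iterator[Path]:
    """Yield the resolved paths of librte_* libraries found in ldd output."""
    for match in LDD_LINE.finditer(output):
        yield Path(match.group(2))


def get_needed_libs_ldd(path: Path) -> Iterator[Path]:
    """Run ldd on path and yield its resolved librte_* dependencies."""
    result = subprocess.run(
        ["ldd", str(path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # Python 3.6 compatible spelling of text=True
        check=False,
    )
    yield from parse_ldd_output(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the text
handling testable without a real DPDK binary.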

* Re: [PATCH v6] usertools: rewrite pmdinfo
  2022-09-26 15:17   ` Bruce Richardson
@ 2022-09-28  6:51     ` Robin Jarry
  2022-09-28 10:53       ` Bruce Richardson
  0 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-09-28  6:51 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Olivier Matz, Ferruh Yigit

Bruce Richardson, Sep 26, 2022 at 17:17:
> > * fixed typo: /etc/ld.so.conf/*.conf -> /etc/ld.so.conf.d/*.conf
>
> I am a little uncertain about doing this parsing, and worried it may be a
> bit fragile. The main file for ld.so still is ld.so.conf, which, on my
> system anyway, does indeed just have an include for *.conf in the .d
> directory. However, is it possible that there are systems out there that
> still have entries in ld.so.conf and possibly elsewhere?
>
> I think my preference would still be to shell out to ldconfig and query its
> database, or to shell out to ldd to get the dependencies of a .so from
> there. I just think it may be more robust, but at the cost of running some
> shell commands.
>
> However, I don't feel strongly about this, so if others prefer the
> pure-python ld.so.conf parsing approach better, I'm ok with that.

I was also concerned with parsing ld.so.conf files. However, I did not
find a way to get ldconfig simply to print the folders that are to be
analyzed. This would require some regexp parsing of ldconfig output:

  ldconfig -vNX 2>/dev/null | sed -nre 's,^(/.*): \(from .*\)$,\1,p'

I don't know which way is the least hacky.
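
For comparison, the sed expression above translates to Python roughly as
follows. This is a sketch only: parse_ldconfig_dirs and ldconfig_dirs are
hypothetical names, and the pattern assumes that ldconfig prints directory
lines in the "<dir>: (from <file>:<line>)" form that the sed expression
relies on. ldconfig is often installed in /sbin, which may not be in
a regular user's PATH.

```python
import re
import subprocess
from typing import List

# Same pattern as the sed expression: keep <dir> from "<dir>: (from ...)"
LDCONFIG_DIR = re.compile(r"^(/.*): \(from .*\)$", re.MULTILINE)


def parse_ldconfig_dirs(output: str) -> List[str]:
    """Return the directories listed in `ldconfig -vNX` output, in order."""
    return LDCONFIG_DIR.findall(output)


def ldconfig_dirs() -> List[str]:
    """Run `ldconfig -vNX` and return the directories it would scan."""
    result = subprocess.run(
        ["ldconfig", "-vNX"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,  # Python 3.6 compatible spelling of text=True
        check=False,
    )
    return parse_ldconfig_dirs(result.stdout)
```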


* Re: [PATCH v6] usertools: rewrite pmdinfo
  2022-09-28  6:51     ` Robin Jarry
@ 2022-09-28 10:53       ` Bruce Richardson
  2022-09-28 11:12         ` Robin Jarry
  0 siblings, 1 reply; 42+ messages in thread
From: Bruce Richardson @ 2022-09-28 10:53 UTC (permalink / raw)
  To: Robin Jarry; +Cc: dev, Olivier Matz, Ferruh Yigit

On Wed, Sep 28, 2022 at 08:51:39AM +0200, Robin Jarry wrote:
> Bruce Richardson, Sep 26, 2022 at 17:17:
> > > * fixed typo: /etc/ld.so.conf/*.conf -> /etc/ld.so.conf.d/*.conf
> >
> > I am a little uncertain about doing this parsing, and worried it may be a
> > bit fragile. The main file for ld.so still is ld.so.conf, which, on my
> > system anyway, does indeed just have an include for *.conf in the .d
> > directory. However, is it possible that there are systems out there that
> > still have entries in ld.so.conf and possibly elsewhere?
> >
> > I think my preference would still be to shell out to ldconfig and query its
> > database, or to shell out to ldd to get the dependencies of a .so from
> > there. I just think it may be more robust, but at the cost of running some
> > shell commands.
> >
> > However, I don't feel strongly about this, so if others prefer the
> > pure-python ld.so.conf parsing approach better, I'm ok with that.
> 
> I was also concerned with parsing ld.so.conf files. However, I did not
> find a way to get ldconfig simply to print the folders that are to be
> analyzed. This would require some regexp parsing of ldconfig output:
> 
>   ldconfig -vNX 2>/dev/null | sed -nre 's,^(/.*): \(from .*\)$,\1,p'
> 
> I don't know which way is the least hacky.
>
How about "ldconfig -p" and just using the list of libraries given to match
against those requested in the elf file, rather than worrying about
directories at all?
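
A rough sketch of that idea, for illustration only: build a name-to-path
map from "ldconfig -p" output and look up each DT_NEEDED entry in it. The
function name parse_ldconfig_cache is hypothetical, and the pattern assumes
the usual "name (flags) => /path" line format. When a library name appears
more than once (for example for several ABIs), the first entry is kept.

```python
import re
from typing import Dict

# Assumed line format: "\tlibz.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libz.so.1"
LDCONFIG_P_LINE = re.compile(r"^[ \t]+(\S+) \(.*\) => (/.+)$", re.MULTILINE)


def parse_ldconfig_cache(output: str) -> Dict[str, str]:
    """Map library names to paths from `ldconfig -p` output."""
    libs: Dict[str, str] = {}
    for name, path in LDCONFIG_P_LINE.findall(output):
        # keep the first match, which is the first entry listed in the cache
        libs.setdefault(name, path)
    return libs
```

A DT_NEEDED value such as "librte_eal.so.23" would then be resolved with
a plain dictionary lookup, without walking any directory.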

* Re: [PATCH v6] usertools: rewrite pmdinfo
  2022-09-28 10:53       ` Bruce Richardson
@ 2022-09-28 11:12         ` Robin Jarry
  0 siblings, 0 replies; 42+ messages in thread
From: Robin Jarry @ 2022-09-28 11:12 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Olivier Matz, Ferruh Yigit

Bruce Richardson, Sep 28, 2022 at 12:53:
> How about "ldconfig -p" and just using the list of libraries given to
> match against those requested in the elf file, rather than worrying
> about directories at all?

I could do that but then, the DT_RUNPATH and LD_LIBRARY_PATH directories
must be searched at runtime anyways. It seems more consistent to use
a single method to resolve dynamic libraries.
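
To make the point concrete, here is a minimal sketch of a single lookup
that covers all three sources in the same order as search_dt_needed() in
the patch: LD_LIBRARY_PATH first, then DT_RUNPATH, then the system folders.
The name resolve_library and its environ and fallback parameters are
hypothetical and only exist so that the sketch can be exercised without
touching the real environment.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping, Optional


def resolve_library(
    needed: str,
    origin: Path,
    runpath: Iterable[str],
    fallback: Iterable[str] = ("/lib", "/usr/lib"),
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[Path]:
    """Return the first existing file named needed, or None if not found."""
    env = os.environ if environ is None else environ
    folders: List[str] = []
    ld_library_path = env.get("LD_LIBRARY_PATH", "")
    if ld_library_path:
        folders += ld_library_path.split(":")
    folders += list(runpath)
    folders += list(fallback)
    for d in folders:
        # $ORIGIN expands to the folder containing the inspected binary
        d = d.replace("$ORIGIN", str(origin.parent.absolute()))
        candidate = Path(d) / needed
        if candidate.is_file():
            return candidate
    return None
```

With an "ldconfig -p" lookup, only the last step would change, while the
first two would still need this kind of directory walk.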


* [PATCH v7] usertools: rewrite pmdinfo
  2022-09-13 10:58 [PATCH] usertools: rewrite pmdinfo Robin Jarry
                   ` (6 preceding siblings ...)
  2022-09-26 13:44 ` [PATCH v6] " Robin Jarry
@ 2022-10-04 19:29 ` Robin Jarry
  2022-10-10 22:44   ` Thomas Monjalon
  7 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-10-04 19:29 UTC (permalink / raw)
  To: dev; +Cc: Robin Jarry, Ferruh Yigit, Olivier Matz, Bruce Richardson

dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
merely prints multiple independent JSON lines which cannot be fed
directly to any JSON parser. Moreover, the script complexity is rather
high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
sections. Rewrite it so that it can produce valid JSON.

Remove the PCI database parsing for PCI-ID to Vendor-Device names
conversion. This should be done by external scripts (if really needed).

The script passes flake8, black, isort and pylint checks.

I have tested this with a matrix of python/pyelftools versions:

                                 pyelftools
               0.22  0.23  0.24  0.25  0.26  0.27  0.28  0.29
        3.6      ok    ok    ok    ok    ok    ok    ok    ok
        3.7      ok    ok    ok    ok    ok    ok    ok    ok
 Python 3.8      ok    ok    ok    ok    ok    ok    ok    ok
        3.9      ok    ok    ok    ok    ok   *ok    ok    ok
        3.10   fail  fail  fail  fail    ok    ok    ok    ok

                                     * Also tested on FreeBSD

All failures with python 3.10 are related to the same issue:

  File "elftools/construct/lib/container.py", line 5, in <module>
    from collections import MutableMapping
  ImportError: cannot import name 'MutableMapping' from 'collections'

Python 3.10 support is only available since pyelftools 0.26. The script
will only work with Python 3.6 and later.
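
The matrix above can be summarized by a small check such as the following
sketch. The helper name is_supported is hypothetical and is not part of the
script. It only encodes the tested combinations: Python 3.6 or later, and
pyelftools 0.26 or later when running Python 3.10 or later. Versions of
pyelftools older than 0.22 were not tested, so the result for them is an
assumption.

```python
from typing import Tuple

Version = Tuple[int, int]


def is_supported(python: Version, pyelftools: Version) -> bool:
    """Return True if the python/pyelftools pair is expected to work."""
    if python < (3, 6):
        # the script relies on features introduced in Python 3.6 (f-strings)
        return False
    if python >= (3, 10) and pyelftools < (0, 26):
        # older pyelftools imports MutableMapping from collections,
        # which no longer works on Python 3.10
        return False
    return True
```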

Update the minimal system requirements, docs and release notes.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Tested-by: Ferruh Yigit <ferruh.yigit@amd.com>
Tested-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
v6 -> v7:

* replaced hacky /etc/ld.so.conf.d/*.conf parsing with ldd invocation
  (makes code actually shorter)
* tested that it works on FreeBSD and Linux
* rebased on latest main (v22.07-486-gd5262b521d09)

 doc/guides/linux_gsg/sys_reqs.rst      |   2 +-
 doc/guides/rel_notes/release_22_11.rst |  10 +
 doc/guides/tools/pmdinfo.rst           |  82 ++-
 usertools/dpdk-pmdinfo.py              | 896 ++++++++-----------------
 4 files changed, 370 insertions(+), 620 deletions(-)

diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 08d45898f025..f842105eeda7 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -41,7 +41,7 @@ Compilation of the DPDK
    resulting in statically linked applications not being linked properly.
    Use an updated version of ``pkg-config`` or ``pkgconf`` instead when building applications
 
-*   Python 3.5 or later.
+*   Python 3.6 or later.
 
 *   Meson (version 0.49.2+) and ninja
 
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index 5d8ef669b829..8106fb41e09b 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -123,6 +123,16 @@ New Features
   into single event containing ``rte_event_vector``
   whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
 
+* **Rewritten dpdk-pmdinfo.py script.**
+
+  The ``dpdk-pmdinfo.py`` script was rewritten to produce valid JSON only.
+  PCI-IDs parsing has been removed.
+  To get a similar output to the (now removed) ``-r/--raw`` flag, you may use the following command:
+
+  .. code-block:: sh
+
+     strings $dpdk_binary_or_driver | sed -n 's/^PMD_INFO_STRING= //p'
+
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/pmdinfo.rst b/doc/guides/tools/pmdinfo.rst
index af6d24b0f63b..5b1e352f5634 100644
--- a/doc/guides/tools/pmdinfo.rst
+++ b/doc/guides/tools/pmdinfo.rst
@@ -5,25 +5,81 @@
 dpdk-pmdinfo Application
 ========================
 
-The ``dpdk-pmdinfo`` tool is a Data Plane Development Kit (DPDK) utility that
-can dump a PMDs hardware support info.
+The ``dpdk-pmdinfo.py`` tool is a Data Plane Development Kit (DPDK) utility that
+can dump a PMD's hardware support info in JSON format.
 
+Synopsis
+--------
 
-Running the Application
------------------------
+::
 
-The tool has a number of command line options:
+   dpdk-pmdinfo.py [-h] [-p] [-v] ELF_FILE [ELF_FILE ...]
+
+Arguments
+---------
+
+.. program:: dpdk-pmdinfo.py
+
+.. option:: -h, --help
+
+   Show the inline help.
+
+.. option:: -p, --search-plugins
+
+   In addition to ``ELF_FILE``\s and their linked dynamic libraries, also scan
+   the DPDK plugins path.
+
+.. option:: -v, --verbose
+
+   Display warnings due to linked libraries not found or ELF/JSON parsing errors
+   in these libraries. Use twice to show debug messages.
+
+.. option:: ELF_FILE
+
+   DPDK application binary or dynamic library.
+   Any linked ``librte_*.so`` library (as reported by ``ldd``) will also be analyzed.
+   Can be specified multiple times.
+
+Environment Variables
+---------------------
+
+.. envvar:: LD_LIBRARY_PATH
+
+   If specified, the linked ``librte_*.so`` libraries will be looked up here first.
+
+Examples
+--------
+
+Get the complete info for a given driver:
 
 .. code-block:: console
 
-   dpdk-pmdinfo [-hrtp] [-d <pci id file] <elf-file>
+   $ dpdk-pmdinfo.py /usr/bin/dpdk-testpmd | \
+       jq '.[] | select(.name == "net_ice_dcf")'
+   {
+     "name": "net_ice_dcf",
+     "params": "cap=dcf",
+     "kmod": "* igb_uio | vfio-pci",
+     "pci_ids": [
+       {
+         "vendor": "8086",
+         "device": "1889"
+       }
+     ]
+   }
 
-   -h, --help            Show a short help message and exit
-   -r, --raw             Dump as raw json strings
-   -d FILE, --pcidb=FILE Specify a pci database to get vendor names from
-   -t, --table           Output information on hw support as a hex table
-   -p, --plugindir       Scan dpdk for autoload plugins
+Get only the required kernel modules for a given driver:
 
-.. Note::
+.. code-block:: console
 
-   * Parameters inside the square brackets represents optional parameters.
+   $ dpdk-pmdinfo.py /usr/bin/dpdk-testpmd | \
+       jq '.[] | select(.name == "net_cn10k").kmod'
+   "vfio-pci"
+
+Get only the required kernel modules for a given device:
+
+.. code-block:: console
+
+   $ dpdk-pmdinfo.py /usr/bin/dpdk-testpmd | \
+       jq '.[] | select(.pci_ids[]? | .vendor == "15b3" and .device == "1013").kmod'
+   "* ib_uverbs & mlx5_core & mlx5_ib"
diff --git a/usertools/dpdk-pmdinfo.py b/usertools/dpdk-pmdinfo.py
index 40ef5cec6cba..67d023a04711 100755
--- a/usertools/dpdk-pmdinfo.py
+++ b/usertools/dpdk-pmdinfo.py
@@ -1,626 +1,310 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2016  Neil Horman <nhorman@tuxdriver.com>
+# Copyright(c) 2022  Robin Jarry
+# pylint: disable=invalid-name
+
+r"""
+Utility to dump PMD_INFO_STRING support from DPDK binaries.
+
+This script prints JSON output to be interpreted by other tools. Here are some
+examples with jq:
+
+Get the complete info for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "cnxk_nix_inl")'
+
+Get only the required kernel modules for a given driver:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.name == "net_i40e").kmod'
+
+Get only the required kernel modules for a given device:
+
+  %(prog)s dpdk-testpmd | \
+  jq '.[] | select(.pci_ids[]? | .vendor == "15b3" and .device == "1013").kmod'
+"""
 
-# -------------------------------------------------------------------------
-#
-# Utility to dump PMD_INFO_STRING support from an object file
-#
-# -------------------------------------------------------------------------
-import json
-import os
-import platform
-import sys
 import argparse
-from elftools.common.exceptions import ELFError
-from elftools.common.py3compat import byte2int
-from elftools.elf.elffile import ELFFile
-
-
-# For running from development directory. It should take precedence over the
-# installed pyelftools.
-sys.path.insert(0, '.')
-
-raw_output = False
-pcidb = None
-
-# ===========================================
-
-class Vendor:
-    """
-    Class for vendors. This is the top level class
-    for the devices belong to a specific vendor.
-    self.devices is the device dictionary
-    subdevices are in each device.
-    """
-
-    def __init__(self, vendorStr):
-        """
-        Class initializes with the raw line from pci.ids
-        Parsing takes place inside __init__
-        """
-        self.ID = vendorStr.split()[0]
-        self.name = vendorStr.replace("%s " % self.ID, "").rstrip()
-        self.devices = {}
-
-    def addDevice(self, deviceStr):
-        """
-        Adds a device to self.devices
-        takes the raw line from pci.ids
-        """
-        s = deviceStr.strip()
-        devID = s.split()[0]
-        if devID in self.devices:
-            pass
-        else:
-            self.devices[devID] = Device(deviceStr)
-
-    def report(self):
-        print(self.ID, self.name)
-        for id, dev in self.devices.items():
-            dev.report()
-
-    def find_device(self, devid):
-        # convert to a hex string and remove 0x
-        devid = hex(devid)[2:]
-        try:
-            return self.devices[devid]
-        except:
-            return Device("%s  Unknown Device" % devid)
-
-
-class Device:
-
-    def __init__(self, deviceStr):
-        """
-        Class for each device.
-        Each vendor has its own devices dictionary.
-        """
-        s = deviceStr.strip()
-        self.ID = s.split()[0]
-        self.name = s.replace("%s  " % self.ID, "")
-        self.subdevices = {}
-
-    def report(self):
-        print("\t%s\t%s" % (self.ID, self.name))
-        for subID, subdev in self.subdevices.items():
-            subdev.report()
-
-    def addSubDevice(self, subDeviceStr):
-        """
-        Adds a subvendor, subdevice to device.
-        Uses raw line from pci.ids
-        """
-        s = subDeviceStr.strip()
-        spl = s.split()
-        subVendorID = spl[0]
-        subDeviceID = spl[1]
-        subDeviceName = s.split("  ")[-1]
-        devID = "%s:%s" % (subVendorID, subDeviceID)
-        self.subdevices[devID] = SubDevice(
-            subVendorID, subDeviceID, subDeviceName)
-
-    def find_subid(self, subven, subdev):
-        subven = hex(subven)[2:]
-        subdev = hex(subdev)[2:]
-        devid = "%s:%s" % (subven, subdev)
-
-        try:
-            return self.subdevices[devid]
-        except:
-            if (subven == "ffff" and subdev == "ffff"):
-                return SubDevice("ffff", "ffff", "(All Subdevices)")
-            return SubDevice(subven, subdev, "(Unknown Subdevice)")
-
-
-class SubDevice:
-    """
-    Class for subdevices.
-    """
-
-    def __init__(self, vendor, device, name):
-        """
-        Class initializes with vendorid, deviceid and name
-        """
-        self.vendorID = vendor
-        self.deviceID = device
-        self.name = name
-
-    def report(self):
-        print("\t\t%s\t%s\t%s" % (self.vendorID, self.deviceID, self.name))
-
-
-class PCIIds:
-    """
-    Top class for all pci.ids entries.
-    All queries will be asked to this class.
-    PCIIds.vendors["0e11"].devices["0046"].\
-    subdevices["0e11:4091"].name  =  "Smart Array 6i"
-    """
-
-    def __init__(self, filename):
-        """
-        Prepares the directories.
-        Checks local data file.
-        Tries to load from local, if not found, downloads from web
-        """
-        self.version = ""
-        self.date = ""
-        self.vendors = {}
-        self.contents = None
-        self.readLocal(filename)
-        self.parse()
-
-    def reportVendors(self):
-        """Reports the vendors
-        """
-        for vid, v in self.vendors.items():
-            print(v.ID, v.name)
-
-    def report(self, vendor=None):
-        """
-        Reports everything for all vendors or a specific vendor
-        PCIIds.report()  reports everything
-        PCIIDs.report("0e11") reports only "Compaq Computer Corporation"
-        """
-        if vendor is not None:
-            self.vendors[vendor].report()
-        else:
-            for vID, v in self.vendors.items():
-                v.report()
-
-    def find_vendor(self, vid):
-        # convert vid to a hex string and remove the 0x
-        vid = hex(vid)[2:]
-
-        try:
-            return self.vendors[vid]
-        except:
-            return Vendor("%s Unknown Vendor" % (vid))
-
-    def findDate(self, content):
-        for l in content:
-            if l.find("Date:") > -1:
-                return l.split()[-2].replace("-", "")
-        return None
-
-    def parse(self):
-        if not self.contents:
-            print("data/%s-pci.ids not found" % self.date)
-        else:
-            vendorID = ""
-            deviceID = ""
-            for l in self.contents:
-                if l[0] == "#":
-                    continue
-                elif not l.strip():
-                    continue
-                else:
-                    if l.find("\t\t") == 0:
-                        self.vendors[vendorID].devices[
-                            deviceID].addSubDevice(l)
-                    elif l.find("\t") == 0:
-                        deviceID = l.strip().split()[0]
-                        self.vendors[vendorID].addDevice(l)
-                    else:
-                        vendorID = l.split()[0]
-                        self.vendors[vendorID] = Vendor(l)
-
-    def readLocal(self, filename):
-        """
-        Reads the local file
-        """
-        with open(filename, 'r', encoding='utf-8') as f:
-            self.contents = f.readlines()
-        self.date = self.findDate(self.contents)
-
-    def loadLocal(self):
-        """
-        Loads database from local. If there is no file,
-        it creates a new one from web
-        """
-        self.date = idsfile[0].split("/")[1].split("-")[0]
-        self.readLocal()
-
-
-# =======================================
-
-def search_file(filename, search_path):
-    """ Given a search path, find file with requested name """
-    for path in search_path.split(':'):
-        candidate = os.path.join(path, filename)
-        if os.path.exists(candidate):
-            return os.path.abspath(candidate)
-    return None
-
-
-class ReadElf(object):
-    """ display_* methods are used to emit output into the output stream
-    """
-
-    def __init__(self, file, output):
-        """ file:
-                stream object with the ELF file to read
-
-            output:
-                output stream to write to
-        """
-        self.elffile = ELFFile(file)
-        self.output = output
-
-        # Lazily initialized if a debug dump is requested
-        self._dwarfinfo = None
-
-        self._versioninfo = None
-
-    def _section_from_spec(self, spec):
-        """ Retrieve a section given a "spec" (either number or name).
-            Return None if no such section exists in the file.
-        """
-        try:
-            num = int(spec)
-            if num < self.elffile.num_sections():
-                return self.elffile.get_section(num)
-            return None
-        except ValueError:
-            # Not a number. Must be a name then
-            section = self.elffile.get_section_by_name(force_unicode(spec))
-            if section is None:
-                # No match with a unicode name.
-                # Some versions of pyelftools (<= 0.23) store internal strings
-                # as bytes. Try again with the name encoded as bytes.
-                section = self.elffile.get_section_by_name(force_bytes(spec))
-            return section
-
-    def pretty_print_pmdinfo(self, pmdinfo):
-        global pcidb
-
-        for i in pmdinfo["pci_ids"]:
-            vendor = pcidb.find_vendor(i[0])
-            device = vendor.find_device(i[1])
-            subdev = device.find_subid(i[2], i[3])
-            print("%s (%s) : %s (%s) %s" %
-                  (vendor.name, vendor.ID, device.name,
-                   device.ID, subdev.name))
-
-    def parse_pmd_info_string(self, mystring):
-        global raw_output
-        global pcidb
-
-        optional_pmd_info = [
-            {'id': 'params', 'tag': 'PMD PARAMETERS'},
-            {'id': 'kmod', 'tag': 'PMD KMOD DEPENDENCIES'}
-        ]
-
-        i = mystring.index("=")
-        mystring = mystring[i + 2:]
-        pmdinfo = json.loads(mystring)
-
-        if raw_output:
-            print(json.dumps(pmdinfo))
-            return
-
-        print("PMD NAME: " + pmdinfo["name"])
-        for i in optional_pmd_info:
-            try:
-                print("%s: %s" % (i['tag'], pmdinfo[i['id']]))
-            except KeyError:
-                continue
-
-        if pmdinfo["pci_ids"]:
-            print("PMD HW SUPPORT:")
-            if pcidb is not None:
-                self.pretty_print_pmdinfo(pmdinfo)
-            else:
-                print("VENDOR\t DEVICE\t SUBVENDOR\t SUBDEVICE")
-                for i in pmdinfo["pci_ids"]:
-                    print("0x%04x\t 0x%04x\t 0x%04x\t\t 0x%04x" %
-                          (i[0], i[1], i[2], i[3]))
-
-        print("")
-
-    def display_pmd_info_strings(self, section_spec):
-        """ Display a strings dump of a section. section_spec is either a
-            section number or a name.
-        """
-        section = self._section_from_spec(section_spec)
-        if section is None:
-            return
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("PMD_INFO_STRING")
-            if rc != -1:
-                self.parse_pmd_info_string(mystring[rc:])
-
-            dataptr = endptr
-
-    def find_librte_eal(self, section):
-        for tag in section.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if "librte_eal" in force_unicode(tag.needed):
-                    return force_unicode(tag.needed)
-        return None
-
-    def search_for_autoload_path(self):
-        scanelf = self
-        scanfile = None
-        library = None
-
-        section = self._section_from_spec(".dynamic")
-        try:
-            eallib = self.find_librte_eal(section)
-            if eallib is not None:
-                ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-                if ldlibpath is None:
-                    ldlibpath = ""
-                dtr = self.get_dt_runpath(section)
-                library = search_file(eallib,
-                                      dtr + ":" + ldlibpath +
-                                      ":/usr/lib64:/lib64:/usr/lib:/lib")
-                if library is None:
-                    return (None, None)
-                if not raw_output:
-                    print("Scanning for autoload path in %s" % library)
-                scanfile = open(library, 'rb')
-                scanelf = ReadElf(scanfile, sys.stdout)
-        except AttributeError:
-            # Not a dynamic binary
-            pass
-        except ELFError:
-            scanfile.close()
-            return (None, None)
-
-        section = scanelf._section_from_spec(".rodata")
-        if section is None:
-            if scanfile is not None:
-                scanfile.close()
-            return (None, None)
-
-        data = section.data()
-        dataptr = 0
-
-        while dataptr < len(data):
-            while (dataptr < len(data) and
-                   not 32 <= byte2int(data[dataptr]) <= 127):
-                dataptr += 1
-
-            if dataptr >= len(data):
-                break
-
-            endptr = dataptr
-            while endptr < len(data) and byte2int(data[endptr]) != 0:
-                endptr += 1
-
-            # pyelftools may return byte-strings, force decode them
-            mystring = force_unicode(data[dataptr:endptr])
-            rc = mystring.find("DPDK_PLUGIN_PATH")
-            if rc != -1:
-                rc = mystring.find("=")
-                return (mystring[rc + 1:], library)
-
-            dataptr = endptr
-        if scanfile is not None:
-            scanfile.close()
-        return (None, None)
-
-    def get_dt_runpath(self, dynsec):
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_RUNPATH':
-                return force_unicode(tag.runpath)
-        return ""
-
-    def process_dt_needed_entries(self):
-        """ Look to see if there are any DT_NEEDED entries in the binary
-            And process those if there are
-        """
-        runpath = ""
-        ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-        if ldlibpath is None:
-            ldlibpath = ""
-
-        dynsec = self._section_from_spec(".dynamic")
-        try:
-            runpath = self.get_dt_runpath(dynsec)
-        except AttributeError:
-            # dynsec is None, just return
-            return
-
-        for tag in dynsec.iter_tags():
-            # pyelftools may return byte-strings, force decode them
-            if force_unicode(tag.entry.d_tag) == 'DT_NEEDED':
-                if 'librte_' in force_unicode(tag.needed):
-                    library = search_file(force_unicode(tag.needed),
-                                          runpath + ":" + ldlibpath +
-                                          ":/usr/lib64:/lib64:/usr/lib:/lib")
-                    if library is not None:
-                        with open(library, 'rb') as file:
-                            try:
-                                libelf = ReadElf(file, sys.stdout)
-                            except ELFError:
-                                print("%s is no an ELF file" % library)
-                                continue
-                            libelf.process_dt_needed_entries()
-                            libelf.display_pmd_info_strings(".rodata")
-                            file.close()
-
-
-# compat: remove force_unicode & force_bytes when pyelftools<=0.23 support is
-# dropped.
-def force_unicode(s):
-    if hasattr(s, 'decode') and callable(s.decode):
-        s = s.decode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def force_bytes(s):
-    if hasattr(s, 'encode') and callable(s.encode):
-        s = s.encode('latin-1')  # same encoding used in pyelftools py3compat
-    return s
-
-
-def scan_autoload_path(autoload_path):
-    global raw_output
-
-    if not os.path.exists(autoload_path):
-        return
-
+import json
+import logging
+import os
+import re
+import string
+import subprocess
+import sys
+from pathlib import Path
+from typing import Iterable, Iterator, List, Union
+
+import elftools
+from elftools.elf.elffile import ELFError, ELFFile
+
+
+# ----------------------------------------------------------------------------
+def main() -> int:  # pylint: disable=missing-docstring
     try:
-        dirs = os.listdir(autoload_path)
-    except OSError:
-        # Couldn't read the directory, give up
-        return
-
-    for d in dirs:
-        dpath = os.path.join(autoload_path, d)
-        if os.path.isdir(dpath):
-            scan_autoload_path(dpath)
-        if os.path.isfile(dpath):
-            try:
-                file = open(dpath, 'rb')
-                readelf = ReadElf(file, sys.stdout)
-            except ELFError:
-                # this is likely not an elf file, skip it
-                continue
-            except IOError:
-                # No permission to read the file, skip it
-                continue
-
-            if not raw_output:
-                print("Hw Support for library %s" % d)
-            readelf.display_pmd_info_strings(".rodata")
-            file.close()
+        args = parse_args()
+        logging.basicConfig(
+            stream=sys.stderr,
+            format="%(levelname)s: %(message)s",
+            level={
+                0: logging.ERROR,
+                1: logging.WARNING,
+            }.get(args.verbose, logging.DEBUG),
+        )
+        info = parse_pmdinfo(args.elf_files, args.search_plugins)
+        print(json.dumps(info, indent=2))
+    except BrokenPipeError:
+        pass
+    except KeyboardInterrupt:
+        return 1
+    except Exception as e:  # pylint: disable=broad-except
+        logging.error("%s", e)
+        return 1
+    return 0
 
 
-def scan_for_autoload_pmds(dpdk_path):
+# ----------------------------------------------------------------------------
+def parse_args() -> argparse.Namespace:
     """
-    search the specified application or path for a pmd autoload path
-    then scan said path for pmds and report hw support
+    Parse command line arguments.
     """
-    global raw_output
-
-    if not os.path.isfile(dpdk_path):
-        if not raw_output:
-            print("Must specify a file name")
-        return
-
-    file = open(dpdk_path, 'rb')
-    try:
-        readelf = ReadElf(file, sys.stdout)
-    except ElfError:
-        if not raw_output:
-            print("Unable to parse %s" % file)
-        return
-
-    (autoload_path, scannedfile) = readelf.search_for_autoload_path()
-    if not autoload_path:
-        if not raw_output:
-            print("No autoload path configured in %s" % dpdk_path)
-        return
-    if not raw_output:
-        if scannedfile is None:
-            scannedfile = dpdk_path
-        print("Found autoload path %s in %s" % (autoload_path, scannedfile))
-
-    file.close()
-    if not raw_output:
-        print("Discovered Autoload HW Support:")
-    scan_autoload_path(autoload_path)
-    return
-
-
-def main(stream=None):
-    global raw_output
-    global pcidb
-
-    pcifile_default = "./pci.ids"  # For unknown OS's assume local file
-    if platform.system() == 'Linux':
-        # hwdata is the legacy location, misc is supported going forward
-        pcifile_default = "/usr/share/misc/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/hwdata/pci.ids"
-    elif platform.system() == 'FreeBSD':
-        pcifile_default = "/usr/local/share/pciids/pci.ids"
-        if not os.path.exists(pcifile_default):
-            pcifile_default = "/usr/share/misc/pci_vendors"
-
     parser = argparse.ArgumentParser(
-        usage='usage: %(prog)s [-hrtp] [-d <pci id file>] elf_file',
-        description="Dump pmd hardware support info")
-    group = parser.add_mutually_exclusive_group()
-    group.add_argument('-r', '--raw',
-                       action='store_true', dest='raw_output',
-                       help='dump raw json strings')
-    group.add_argument("-t", "--table", dest="tblout",
-                       help="output information on hw support as a hex table",
-                       action='store_true')
-    parser.add_argument("-d", "--pcidb", dest="pcifile",
-                        help="specify a pci database to get vendor names from",
-                        default=pcifile_default, metavar="FILE")
-    parser.add_argument("-p", "--plugindir", dest="pdir",
-                        help="scan dpdk for autoload plugins",
-                        action='store_true')
-    parser.add_argument("elf_file", help="driver shared object file")
-    args = parser.parse_args()
+        description=__doc__,
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )
+    parser.add_argument(
+        "-p",
+        "--search-plugins",
+        action="store_true",
+        help="""
+        In addition to ELF_FILEs and their linked dynamic libraries, also scan
+        the DPDK plugins path.
+        """,
+    )
+    parser.add_argument(
+        "-v",
+        "--verbose",
+        action="count",
+        default=0,
+        help="""
+        Display warnings due to linked libraries not found or ELF/JSON parsing
+        errors in these libraries. Use twice to show debug messages.
+        """,
+    )
+    parser.add_argument(
+        "elf_files",
+        metavar="ELF_FILE",
+        nargs="+",
+        type=existing_file,
+        help="""
+        DPDK application binary or dynamic library.
+        """,
+    )
+    return parser.parse_args()
 
-    if args.raw_output:
-        raw_output = True
 
-    if args.tblout:
-        args.pcifile = None
+# ----------------------------------------------------------------------------
+def parse_pmdinfo(paths: Iterable[Path], search_plugins: bool) -> List[dict]:
+    """
+    Extract DPDK PMD info JSON strings from an ELF file.
 
-    if args.pcifile:
-        pcidb = PCIIds(args.pcifile)
-        if pcidb is None:
-            print("Pci DB file not found")
-            exit(1)
+    :returns:
+        A list of DPDK drivers info dictionaries.
+    """
+    binaries = set(paths)
+    for p in paths:
+        binaries.update(get_needed_libs(p))
+    if search_plugins:
+        # cast to list to avoid errors with update while iterating
+        binaries.update(list(get_plugin_libs(binaries)))
 
-    if args.pdir:
-        exit(scan_for_autoload_pmds(args.elf_file))
+    drivers = []
 
-    ldlibpath = os.environ.get('LD_LIBRARY_PATH')
-    if ldlibpath is None:
-        ldlibpath = ""
-
-    if os.path.exists(args.elf_file):
-        myelffile = args.elf_file
-    else:
-        myelffile = search_file(args.elf_file,
-                                ldlibpath + ":/usr/lib64:/lib64:/usr/lib:/lib")
-
-    if myelffile is None:
-        print("File not found")
-        sys.exit(1)
-
-    with open(myelffile, 'rb') as file:
+    for b in binaries:
+        logging.debug("analyzing %s", b)
         try:
-            readelf = ReadElf(file, sys.stdout)
-            readelf.process_dt_needed_entries()
-            readelf.display_pmd_info_strings(".rodata")
-            sys.exit(0)
+            for s in get_elf_strings(b, ".rodata", "PMD_INFO_STRING="):
+                try:
+                    info = json.loads(s)
+                    scrub_pci_ids(info)
+                    drivers.append(info)
+                except ValueError as e:
+                    # invalid JSON, should never happen
+                    logging.warning("%s: %s", b, e)
+        except ELFError as e:
+            # only happens for discovered plugins that are not ELF
+            logging.debug("%s: cannot parse ELF: %s", b, e)
 
-        except ELFError as ex:
-            sys.stderr.write('ELF error: %s\n' % ex)
-            sys.exit(1)
+    return drivers
 
 
-# -------------------------------------------------------------------------
-if __name__ == '__main__':
-    main()
+# ----------------------------------------------------------------------------
+PCI_FIELDS = ("vendor", "device", "subsystem_vendor", "subsystem_device")
+
+
+def scrub_pci_ids(info: dict):
+    """
+    Convert numerical ids to hex strings.
+    Strip empty pci_ids lists.
+    Strip wildcard 0xFFFF ids.
+    """
+    pci_ids = []
+    for pci_fields in info.pop("pci_ids"):
+        pci = {}
+        for name, value in zip(PCI_FIELDS, pci_fields):
+            if value != 0xFFFF:
+                pci[name] = f"{value:04x}"
+        if pci:
+            pci_ids.append(pci)
+    if pci_ids:
+        info["pci_ids"] = pci_ids
+
+
+# ----------------------------------------------------------------------------
+def get_plugin_libs(binaries: Iterable[Path]) -> Iterator[Path]:
+    """
+    Look into the provided binaries for DPDK_PLUGIN_PATH and scan the path
+    for files.
+    """
+    for b in binaries:
+        for p in get_elf_strings(b, ".rodata", "DPDK_PLUGIN_PATH="):
+            plugin_path = p.strip()
+            logging.debug("discovering plugins in %s", plugin_path)
+            for root, _, files in os.walk(plugin_path):
+                for f in files:
+                    yield Path(root) / f
+            # no need to search in other binaries.
+            return
+
+
+# ----------------------------------------------------------------------------
+def existing_file(value: str) -> Path:
+    """
+    Argparse type= callback to ensure an argument points to a valid file path.
+    """
+    path = Path(value)
+    if not path.is_file():
+        raise argparse.ArgumentTypeError(f"{value}: No such file")
+    return path
+
+
+# ----------------------------------------------------------------------------
+PRINTABLE_BYTES = frozenset(string.printable.encode("ascii"))
+
+
+def find_strings(buf: bytes, prefix: str) -> Iterator[str]:
+    """
+    Extract strings of printable ASCII characters from a bytes buffer.
+    """
+    view = memoryview(buf)
+    start = None
+
+    for i, b in enumerate(view):
+        if start is None and b in PRINTABLE_BYTES:
+            # mark beginning of string
+            start = i
+            continue
+        if start is not None:
+            if b in PRINTABLE_BYTES:
+                # string not finished
+                continue
+            if b == 0:
+                # end of string
+                s = view[start:i].tobytes().decode("ascii")
+                if s.startswith(prefix):
+                    yield s[len(prefix) :]
+            # There can be byte sequences where a non-printable byte
+            # follows a printable one. Ignore that.
+            start = None
+
+
+# ----------------------------------------------------------------------------
+def elftools_version():
+    """
+    Extract pyelftools version as a tuple of integers for easy comparison.
+    """
+    version = getattr(elftools, "__version__", "")
+    match = re.match(r"^(\d+)\.(\d+).*$", str(version))
+    if not match:
+        # cannot determine version, hope for the best
+        return (0, 24)
+    return (int(match[1]), int(match[2]))
+
+
+ELFTOOLS_VERSION = elftools_version()
+
+
+def from_elftools(s: Union[bytes, str]) -> str:
+    """
+    Earlier versions of pyelftools (< 0.24) return bytes encoded with "latin-1"
+    instead of python strings.
+    """
+    if isinstance(s, bytes):
+        return s.decode("latin-1")
+    return s
+
+
+def to_elftools(s: str) -> Union[bytes, str]:
+    """
+    Earlier versions of pyelftools (< 0.24) assume that ELF section and tags
+    are bytes encoded with "latin-1" instead of python strings.
+    """
+    if ELFTOOLS_VERSION < (0, 24):
+        return s.encode("latin-1")
+    return s
+
+
+# ----------------------------------------------------------------------------
+def get_elf_strings(path: Path, section: str, prefix: str) -> Iterator[str]:
+    """
+    Extract strings from a named ELF section in a file.
+    """
+    with path.open("rb") as f:
+        elf = ELFFile(f)
+        sec = elf.get_section_by_name(to_elftools(section))
+        if not sec:
+            return
+        yield from find_strings(sec.data(), prefix)
+
+
+# ----------------------------------------------------------------------------
+LDD_LIB_RE = re.compile(
+    r"""
+    ^                  # beginning of line
+    \t                 # tab
+    (\S+)              # lib name
+    \s+=>\s+
+    (/\S+)             # lib path
+    \s+
+    \(0x[0-9A-Fa-f]+\) # address
+    \s*
+    $                  # end of line
+    """,
+    re.MULTILINE | re.VERBOSE,
+)
+
+
+def get_needed_libs(path: Path) -> Iterator[Path]:
+    """
+    Extract the dynamic library dependencies from an ELF executable.
+    """
+    with subprocess.Popen(
+        ["ldd", str(path)], stdout=subprocess.PIPE, stderr=subprocess.PIPE
+    ) as proc:
+        out, err = proc.communicate()
+        if proc.returncode != 0:
+            err = err.decode("utf-8").splitlines()[-1].strip()
+            raise Exception(f"cannot read ELF file: {err}")
+        for match in LDD_LIB_RE.finditer(out.decode("utf-8")):
+            libname, libpath = match.groups()
+            if libname.startswith("librte_"):
+                libpath = Path(libpath)
+                if libpath.is_file():
+                    yield libpath.resolve()
+
+
+# ----------------------------------------------------------------------------
+if __name__ == "__main__":
+    sys.exit(main())
-- 
2.37.3
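
For readers who would rather consume the JSON from Python than from jq, here
is a minimal sketch of the "kernel modules for a given device" lookup shown in
the examples. It relies only on the field names emitted by the patch ("name",
"kmod", "pci_ids", "vendor", "device"). The helper name kmods_for_device and
the second sample entry are hypothetical and only serve the illustration; in
practice the list would come from json.load() on the script's standard output.

```python
import json
from typing import Iterator, List


def kmods_for_device(drivers: List[dict], vendor: str, device: str) -> Iterator[str]:
    """Yield the kmod string of every driver matching a PCI vendor/device pair."""
    for drv in drivers:
        # scrub_pci_ids() drops empty "pci_ids" lists, so the key may be absent.
        for pci in drv.get("pci_ids", []):
            # Wildcard (0xFFFF) fields are stripped as well; with .get() a
            # missing field simply never matches in this sketch.
            if pci.get("vendor") == vendor and pci.get("device") == device:
                if "kmod" in drv:
                    yield drv["kmod"]
                # One match per driver is enough.
                break


# Sample shaped like the documented output. The first entry is taken from the
# pmdinfo.rst example; the second one is a made-up driver without PCI ids.
SAMPLE = json.loads(
    """
[
  {"name": "net_ice_dcf", "params": "cap=dcf", "kmod": "* igb_uio | vfio-pci",
   "pci_ids": [{"vendor": "8086", "device": "1889"}]},
  {"name": "hypothetical_vdev"}
]
"""
)

print(list(kmods_for_device(SAMPLE, "8086", "1889")))
```

Using .get() for "pci_ids" mirrors the fact that drivers without PCI ids have
the key removed entirely, which is also why a jq filter iterating over
.pci_ids needs to tolerate null.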



* Re: [PATCH v7] usertools: rewrite pmdinfo
  2022-10-04 19:29 ` [PATCH v7] " Robin Jarry
@ 2022-10-10 22:44   ` Thomas Monjalon
  2022-10-12 15:16     ` Olivier Matz
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Monjalon @ 2022-10-10 22:44 UTC (permalink / raw)
  To: Robin Jarry
  Cc: dev, Robin Jarry, Ferruh Yigit, Olivier Matz, Bruce Richardson

04/10/2022 21:29, Robin Jarry:
> dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> merely prints multiple independent JSON lines which cannot be fed
> directly to any JSON parser. Moreover, the script complexity is rather
> high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> sections. Rewrite it so that it can produce valid JSON.
> 
> Remove the PCI database parsing for PCI-ID to Vendor-Device names
> conversion. This should be done by external scripts (if really needed).
> 
> The script passes flake8, black, isort and pylint checks.
> 
> I have tested this with a matrix of python/pyelftools versions:
> 
>                                  pyelftools
>                0.22  0.23  0.24  0.25  0.26  0.27  0.28  0.29
>         3.6      ok    ok    ok    ok    ok    ok    ok    ok
>         3.7      ok    ok    ok    ok    ok    ok    ok    ok
>  Python 3.8      ok    ok    ok    ok    ok    ok    ok    ok
>         3.9      ok    ok    ok    ok    ok   *ok    ok    ok
>         3.10   fail  fail  fail  fail    ok    ok    ok    ok
> 
>                                      * Also tested on FreeBSD
> 
> All failures with python 3.10 are related to the same issue:
> 
>   File "elftools/construct/lib/container.py", line 5, in <module>
>     from collections import MutableMapping
>   ImportError: cannot import name 'MutableMapping' from 'collections'
> 
> Python 3.10 support is only available since pyelftools 0.26. The script
> will only work with Python 3.6 and later.
> 
> Update the minimal system requirements, docs and release notes.
> 
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> Tested-by: Ferruh Yigit <ferruh.yigit@amd.com>
> Tested-by: Olivier Matz <olivier.matz@6wind.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied, thanks.





* Re: [PATCH v7] usertools: rewrite pmdinfo
  2022-10-10 22:44   ` Thomas Monjalon
@ 2022-10-12 15:16     ` Olivier Matz
  2022-10-12 16:16       ` Thomas Monjalon
  0 siblings, 1 reply; 42+ messages in thread
From: Olivier Matz @ 2022-10-12 15:16 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Robin Jarry, dev, Ferruh Yigit, Bruce Richardson

Hi,

On Tue, Oct 11, 2022 at 12:44:56AM +0200, Thomas Monjalon wrote:
> 04/10/2022 21:29, Robin Jarry:
> > dpdk-pmdinfo.py does not produce any parseable output. The -r/--raw flag
> > merely prints multiple independent JSON lines which cannot be fed
> > directly to any JSON parser. Moreover, the script complexity is rather
> > high for such a simple task: extracting PMD_INFO_STRING from .rodata ELF
> > sections. Rewrite it so that it can produce valid JSON.
> > 
> > Remove the PCI database parsing for PCI-ID to Vendor-Device names
> > conversion. This should be done by external scripts (if really needed).
> > 
> > The script passes flake8, black, isort and pylint checks.
> > 
> > I have tested this with a matrix of python/pyelftools versions:
> > 
> >                                  pyelftools
> >                0.22  0.23  0.24  0.25  0.26  0.27  0.28  0.29
> >         3.6      ok    ok    ok    ok    ok    ok    ok    ok
> >         3.7      ok    ok    ok    ok    ok    ok    ok    ok
> >  Python 3.8      ok    ok    ok    ok    ok    ok    ok    ok
> >         3.9      ok    ok    ok    ok    ok   *ok    ok    ok
> >         3.10   fail  fail  fail  fail    ok    ok    ok    ok
> > 
> >                                      * Also tested on FreeBSD
> > 
> > All failures with python 3.10 are related to the same issue:
> > 
> >   File "elftools/construct/lib/container.py", line 5, in <module>
> >     from collections import MutableMapping
> >   ImportError: cannot import name 'MutableMapping' from 'collections'
> > 
> > Python 3.10 support is only available since pyelftools 0.26. The script
> > will only work with Python 3.6 and later.
> > 
> > Update the minimal system requirements, docs and release notes.
> > 
> > Signed-off-by: Robin Jarry <rjarry@redhat.com>
> > Tested-by: Ferruh Yigit <ferruh.yigit@amd.com>
> > Tested-by: Olivier Matz <olivier.matz@6wind.com>
> > Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> Applied, thanks.

As discussed off-list with Robin, it appears that "ldd" is not available
on buildroot-based images. See:
http://lists.busybox.net/pipermail/buildroot/2013-July/074927.html

The link is quite old but it seems it's still true today if we don't
build the toolchain.

Robin suggested this patch:

--- a/usertools/dpdk-pmdinfo.py
+++ b/usertools/dpdk-pmdinfo.py
@@ -290,8 +290,10 @@ def get_needed_libs(path: Path) -> Iterator[Path]:
     """
     Extract the dynamic library dependencies from an ELF executable.
     """
+    env = os.environ.copy()
+    env["LD_TRACE_LOADED_OBJECTS"] = "1"
     with subprocess.Popen(
-        ["ldd", str(path)], stdout=subprocess.PIPE, stderr=subprocess.PIPE
+        [str(path)], stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env
     ) as proc:
         out, err = proc.communicate()
         if proc.returncode != 0:

One subtle difference is that the patched version won't work on
non-executable files, but I don't think it can happen in real-life.

An alternative for us is to provide a simple "ldd" shell script in our
buildroot-based images.

I don't have a strong opinion, but I tend to think that the patch is
the better option. Any comments?

Thanks,
Olivier


* Re: [PATCH v7] usertools: rewrite pmdinfo
  2022-10-12 15:16     ` Olivier Matz
@ 2022-10-12 16:16       ` Thomas Monjalon
  2022-10-12 16:30         ` Robin Jarry
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Monjalon @ 2022-10-12 16:16 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Robin Jarry, dev, Ferruh Yigit, Bruce Richardson

12/10/2022 17:16, Olivier Matz:
> On Tue, Oct 11, 2022 at 12:44:56AM +0200, Thomas Monjalon wrote:
> As discussed off-list with Robin, it appears that "ldd" is not available
> on buildroot-based images. See:
> http://lists.busybox.net/pipermail/buildroot/2013-July/074927.html
> 
> The link is quite old but it seems it's still true today if we don't
> build the toolchain.
> 
> Robin suggested this patch:
> 
> --- a/usertools/dpdk-pmdinfo.py
> +++ b/usertools/dpdk-pmdinfo.py
> @@ -290,8 +290,10 @@ def get_needed_libs(path: Path) -> Iterator[Path]:
>      """
>      Extract the dynamic library dependencies from an ELF executable.
>      """
> +    env = os.environ.copy()
> +    env["LD_TRACE_LOADED_OBJECTS"] = "1"
>      with subprocess.Popen(
> -        ["ldd", str(path)], stdout=subprocess.PIPE, stderr=subprocess.PIPE
> +        [str(path)], stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env
>      ) as proc:
>          out, err = proc.communicate()
>          if proc.returncode != 0:
> 
> One subtle difference is that the patched version won't work on
> non-executable files, but I don't think it can happen in real-life.
> 
> An alternative for us is to provide a simple "ldd" shell script in our
> buildroot-based images.
> 
> I don't have a strong opinion, but I tend to think that the patch is
> the better option. Any comments?

What about implementing both?
If ldd is available, use it,
otherwise use LD_TRACE_LOADED_OBJECTS variable.
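
A minimal sketch of what "implementing both" could look like, assuming
the choice is made once per file by probing for ldd in PATH. The helper
name ldd_command is hypothetical and not part of the applied patch; it
only builds the command line and environment that get_needed_libs()
would pass to subprocess.Popen.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, List, Tuple


def ldd_command(path: Path) -> Tuple[List[str], Dict[str, str]]:
    """
    Return the (argv, env) pair to use for listing the dynamic
    dependencies of path. Hypothetical helper, for illustration only.
    """
    env = os.environ.copy()
    if shutil.which("ldd") is not None:
        # ldd is installed: let it inspect the file.
        return ["ldd", str(path)], env
    # No ldd in PATH: run the file itself and ask the dynamic loader to
    # print its dependencies. This requires the file to be executable.
    env["LD_TRACE_LOADED_OBJECTS"] = "1"
    return [str(path)], env
```

The trade-off is that the same file may then be handled differently from
one system to another, depending on whether ldd is installed.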




* Re: [PATCH v7] usertools: rewrite pmdinfo
  2022-10-12 16:16       ` Thomas Monjalon
@ 2022-10-12 16:30         ` Robin Jarry
  2022-10-12 16:44           ` Thomas Monjalon
  0 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-10-12 16:30 UTC (permalink / raw)
  To: Thomas Monjalon, Olivier Matz; +Cc: dev, Ferruh Yigit, Bruce Richardson

Thomas Monjalon, Oct 12, 2022 at 18:16:
> What about implementing both?
> If ldd is available, use it,
> otherwise use LD_TRACE_LOADED_OBJECTS variable.

This is a bit overkill in my opinion. Also it would make the behaviour
differ depending on whether ldd is available and/or the analyzed
binaries are executable.

I think it is an OK limitation to require that the analyzed ELF files
are executable.

By the way, the following command:

    LD_TRACE_LOADED_OBJECTS=1 build/app/dpdk-testpmd

Works both on Linux and FreeBSD and produces similar outputs.
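
To illustrate why the output format matters, here is a small sketch
that runs the LDD_LIB_RE expression from the patch over a sample of
loader output. The sample text is made up for illustration (it follows
the usual "name => path (address)" layout but was not captured from a
real run), and rte_libs is a hypothetical helper. Unlike the script,
the sketch does not check that the paths exist.

```python
import re

# Same expression as LDD_LIB_RE in the patch.
LDD_LIB_RE = re.compile(
    r"""
    ^                  # beginning of line
    \t                 # tab
    (\S+)              # lib name
    \s+=>\s+
    (/\S+)             # lib path
    \s+
    \(0x[0-9A-Fa-f]+\) # address
    \s*
    $                  # end of line
    """,
    re.MULTILINE | re.VERBOSE,
)

# Made-up sample output, NOT captured from a real binary.
SAMPLE = (
    "\tlinux-vdso.so.1 (0x00007ffc0a5f2000)\n"
    "\tlibrte_eal.so.23 => /usr/lib64/librte_eal.so.23 (0x00007f1c2a400000)\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f1c2a000000)\n"
)


def rte_libs(output):
    """Return (name, path) pairs for the librte_* entries in output."""
    return [
        (name, path)
        for name, path in LDD_LIB_RE.findall(output)
        if name.startswith("librte_")
    ]


print(rte_libs(SAMPLE))
```

Lines without a "=> /path" part, such as the vdso entry, do not match
the expression and are skipped.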



* Re: [PATCH v7] usertools: rewrite pmdinfo
  2022-10-12 16:30         ` Robin Jarry
@ 2022-10-12 16:44           ` Thomas Monjalon
  2022-10-12 16:48             ` Robin Jarry
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Monjalon @ 2022-10-12 16:44 UTC (permalink / raw)
  To: Olivier Matz, Robin Jarry; +Cc: dev, Ferruh Yigit, Bruce Richardson

12/10/2022 18:30, Robin Jarry:
> Thomas Monjalon, Oct 12, 2022 at 18:16:
> > What about implementing both?
> > If ldd is available, use it,
> > otherwise use LD_TRACE_LOADED_OBJECTS variable.
> 
> This is a bit overkill in my opinion. Also it would make the behaviour
> differ depending on whether ldd is available and/or the analyzed
> binaries are executable.

Yes, different behaviour is not desirable.

> I think it is an OK limitation to require that the analyzed ELF files
> are executable.
> 
> By the way, the following command:
> 
>     LD_TRACE_LOADED_OBJECTS=1 build/app/dpdk-testpmd
> 
> Works both on Linux and FreeBSD and produces similar outputs.

OK for executable,
but can we expect .so to be always executable?




* Re: [PATCH v7] usertools: rewrite pmdinfo
  2022-10-12 16:44           ` Thomas Monjalon
@ 2022-10-12 16:48             ` Robin Jarry
  2022-10-12 20:40               ` Thomas Monjalon
  0 siblings, 1 reply; 42+ messages in thread
From: Robin Jarry @ 2022-10-12 16:48 UTC (permalink / raw)
  To: Thomas Monjalon, Olivier Matz; +Cc: dev, Ferruh Yigit, Bruce Richardson

Thomas Monjalon, Oct 12, 2022 at 18:44:
> OK for executable,
> but can we expect .so to be always executable?

I think this is the default when linking, whether for dynamic libraries
or executable programs.

Also, the "must be executable" limitation only applies to the files
specified on the command line. Perhaps this is acceptable? We can make
it obvious in the docs and return an explicit error so that users are
not confused.
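
A minimal sketch of such an explicit check, assuming it runs on each
file given on the command line before the loader is invoked. The
function name check_executable and the error wording are hypothetical.

```python
import os
from pathlib import Path


def check_executable(path: Path) -> None:
    """
    Raise ValueError with an explicit message if path cannot be run to
    list its dynamic dependencies. Hypothetical helper.
    """
    if not path.is_file():
        raise ValueError(f"{path}: not a regular file")
    if not os.access(path, os.X_OK):
        raise ValueError(
            f"{path}: not executable, cannot list its dynamic dependencies"
        )
```

The caller could also catch the exception and print a warning instead
of aborting.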



* Re: [PATCH v7] usertools: rewrite pmdinfo
  2022-10-12 16:48             ` Robin Jarry
@ 2022-10-12 20:40               ` Thomas Monjalon
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Monjalon @ 2022-10-12 20:40 UTC (permalink / raw)
  To: Olivier Matz, Robin Jarry; +Cc: dev, Ferruh Yigit, Bruce Richardson

12/10/2022 18:48, Robin Jarry:
> Thomas Monjalon, Oct 12, 2022 at 18:44:
> > OK for executable,
> > but can we expect .so to be always executable?
> 
> I think this is the default when linking, whether for dynamic libraries
> or executable programs.
> 
> Also, the "must be executable" limitation only applies to the files
> specified on the command line. Perhaps this is acceptable? We can make
> it obvious in the docs and return an explicit error so that users are
> not confused.

Yes, good idea to add a check with a warning.




Thread overview: 42+ messages
2022-09-13 10:58 [PATCH] usertools: rewrite pmdinfo Robin Jarry
2022-09-13 11:29 ` Ferruh Yigit
2022-09-13 11:49   ` Robin Jarry
2022-09-13 13:50     ` Ferruh Yigit
2022-09-13 13:59       ` Robin Jarry
2022-09-13 14:17         ` Ferruh Yigit
2022-09-13 14:17 ` Bruce Richardson
2022-09-13 19:42 ` [PATCH v2] " Robin Jarry
2022-09-13 20:54   ` Ferruh Yigit
2022-09-13 21:22     ` Robin Jarry
2022-09-14 11:46       ` Ferruh Yigit
2022-09-15  9:18         ` Robin Jarry
2022-09-20  9:08 ` [PATCH v3] " Robin Jarry
2022-09-20 10:10   ` Ferruh Yigit
2022-09-20 10:12     ` Robin Jarry
2022-09-20 10:42 ` [PATCH v4] " Robin Jarry
2022-09-20 14:08   ` Olivier Matz
2022-09-20 17:48   ` Ferruh Yigit
2022-09-20 17:50     ` Ferruh Yigit
2022-09-21  7:27       ` Thomas Monjalon
2022-09-21  8:02         ` Ferruh Yigit
2022-09-20 19:15     ` Robin Jarry
2022-09-21  7:58       ` Ferruh Yigit
2022-09-21  9:57         ` Ferruh Yigit
2022-09-22 11:58 ` [PATCH v5] " Robin Jarry
2022-09-22 12:03   ` Bruce Richardson
2022-09-22 15:12   ` Ferruh Yigit
2022-09-26 11:55   ` Olivier Matz
2022-09-26 12:52   ` Robin Jarry
2022-09-26 13:44 ` [PATCH v6] " Robin Jarry
2022-09-26 15:17   ` Bruce Richardson
2022-09-28  6:51     ` Robin Jarry
2022-09-28 10:53       ` Bruce Richardson
2022-09-28 11:12         ` Robin Jarry
2022-10-04 19:29 ` [PATCH v7] " Robin Jarry
2022-10-10 22:44   ` Thomas Monjalon
2022-10-12 15:16     ` Olivier Matz
2022-10-12 16:16       ` Thomas Monjalon
2022-10-12 16:30         ` Robin Jarry
2022-10-12 16:44           ` Thomas Monjalon
2022-10-12 16:48             ` Robin Jarry
2022-10-12 20:40               ` Thomas Monjalon
