DPDK patches and discussions
* [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements
@ 2021-09-03  0:47 Stephen Hemminger
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 1/5] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                   ` (18 more replies)
  0 siblings, 19 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03  0:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc.)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps

And it keeps the old pdump command as is for those people
who never want to change.

The one missing piece is that the dumpcap utility does not
yet have the necessary converter to take the classic
BPF produced by pcap_compile and convert it to eBPF for DPDK.
(It is not hard, just not working right yet.)
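
On the timestamp point in the list above: pcapng stores a 64-bit
timestamp split across two 32-bit words, and the library derives it
from TSC cycles. The following is only a rough sketch in plain C of
that conversion, not code from the series. The helper names
cycles_to_ns() and ns_to_epb() are made up for illustration, and the
clock rate is passed in rather than read from rte_get_tsc_hz().

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: convert a cycle delta to nanoseconds.
 * Whole seconds and the remainder are scaled separately so the
 * intermediate product stays below 2^64 (holds while hz < ~1.8e10).
 */
static uint64_t cycles_to_ns(uint64_t delta, uint64_t hz)
{
	uint64_t secs = delta / hz;
	uint64_t rem = delta % hz;

	return secs * NSEC_PER_SEC + (rem * NSEC_PER_SEC) / hz;
}

/* Hypothetical helper: split a 64-bit ns timestamp into hi/lo words. */
static void ns_to_epb(uint64_t ns, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(ns >> 32);
	*lo = (uint32_t)ns;
}
```

Multiplying the whole delta by NSEC_PER_SEC in one step would wrap
after a few seconds of cycles on a GHz-class clock, which is why the
sketch splits the division.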

Stephen Hemminger (5):
  librte_pcapng: add new library for writing pcapng files
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new pcapng and dumper

 MAINTAINERS                                   |   6 +
 app/dumpcap/main.c                            | 829 ++++++++++++++++++
 app/dumpcap/meson.build                       |  11 +
 app/meson.build                               |   1 +
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  80 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/meson.build                               |   5 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 543 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 175 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 386 +++++---
 lib/pdump/rte_pdump.h                         | 117 ++-
 lib/pdump/version.map                         |   8 +
 23 files changed, 2283 insertions(+), 162 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2



* [dpdk-dev] [PATCH 1/5] librte_pcapng: add new library for writing pcapng files
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-03  0:47 ` Stephen Hemminger
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 2/5] pdump: support pcapng and filtering Stephen Hemminger
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03  0:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/
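
For readers new to the format, here is a rough sketch in plain C
(no DPDK dependency) of how a minimal Section Header Block is laid
out: fixed header, TLV options padded to 32 bits, an end-of-options
marker, and a trailing copy of the block length. The function names
opt_padded_len(), put_opt() and build_shb() are invented for this
illustration and are not part of the library; only the "user
application" option (code 4) is written.

```c
#include <stdint.h>
#include <string.h>

/* Option header (code + length, 4 bytes) plus value, padded to 32 bits. */
static size_t opt_padded_len(uint16_t len)
{
	return ((size_t)4 + len + 3) & ~(size_t)3;
}

/* Append one option to a zero-filled buffer; return the next free byte. */
static uint8_t *put_opt(uint8_t *p, uint16_t code, const void *val,
			uint16_t len)
{
	memcpy(p, &code, sizeof(code));
	memcpy(p + 2, &len, sizeof(len));
	if (len > 0)
		memcpy(p + 4, val, len);
	return p + opt_padded_len(len);
}

/* Write a minimal section header in host byte order; return its length. */
static size_t build_shb(uint8_t *buf, size_t bufsz, const char *app)
{
	const uint32_t type = 0x0A0D0D0A, magic = 0x1A2B3C4D;
	const uint16_t major = 1, minor = 0;
	const uint64_t section_len = UINT64_MAX;	/* not specified */
	uint16_t applen = (uint16_t)strlen(app);
	uint32_t total = (uint32_t)(24 + opt_padded_len(applen)
				    + opt_padded_len(0) + 4);
	uint8_t *p;

	if (total > bufsz)
		return 0;

	memset(buf, 0, total);
	memcpy(buf, &type, 4);
	memcpy(buf + 4, &total, 4);
	memcpy(buf + 8, &magic, 4);
	memcpy(buf + 12, &major, 2);
	memcpy(buf + 14, &minor, 2);
	memcpy(buf + 16, &section_len, 8);

	p = put_opt(buf + 24, 4 /* user application */, app, applen);
	p = put_opt(p, 0 /* end of options */, NULL, 0);
	memcpy(p, &total, 4);	/* trailing copy of the block length */
	return total;
}
```

The byte-order magic lets a reader detect the writer's endianness,
so the sketch writes every field in host order.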

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 543 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 175 ++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 868 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..514be90b09ec 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+	'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0x1
+#define PCAPNG_IFB_OUTBOUND  0x2
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..4bfc1a5240f0
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,543 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <net/if.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+static struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+int rte_pcapng_init(void)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	if (clock_gettime(CLOCK_REALTIME, &ts) < 0)
+		return -1;
+
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+	return 0;
+}
+
+/* PCAPNG timestamps are in nanoseconds; scale in two parts to avoid overflow */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta = cycles - pcapng_time.cycles, hz = rte_get_tsc_hz();
+
+	return pcapng_time.ns + (delta / hz) * NSEC_PER_SEC
+		+ ((delta % hz) * NSEC_PER_SEC) / hz;
+}
+
+/* length of option including padding */
+static size_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	size_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write a PCAPNG interface description block for one port */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	size_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+int
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       uint64_t ifrecv, uint64_t ifdrop,
+		       uint64_t filteraccept, uint64_t dropped)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint64_t ns;
+	size_t len;
+	uint8_t buf[512];
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	opt = pcapng_add_option(opt, PCAPNG_ISB_FILTERACCEPT,
+				&filteraccept, sizeof(filteraccept));
+	opt = pcapng_add_option(opt, PCAPNG_ISB_OSDROP,
+				&dropped, sizeof(dropped));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	len = sizeof(*hdr)
+		+ 4 * pcapng_optlen(sizeof(uint64_t))
+		+ pcapng_optlen(0)
+		+ sizeof(uint32_t);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+/*
+ *   The mbufs created use the pcapng standard Enhanced Packet Block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x002     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint16_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	struct pcapng_option *opt;
+	size_t optlen;
+	uint32_t orig_len, data_len, padding, flags;
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32-bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN_CEIL(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2fa56ef9bcbc
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Initialize the pcapng library.
+ *
+ * Computes the time offset for timestamps.
+ *
+ * @return
+ *    0 on success, -1 on error
+ */
+__rte_experimental
+int rte_pcapng_init(void);
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for pcapng_write
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint16_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For any statistic that is unknown or not worth reporting, pass 0.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param received
+ *  The number of packets received since the start of capturing.
+ * @param missed
+ *  The number of packets missed by the device.
+ * @param accepted
+ *  The number of packets accepted by the filter.
+ * @param dropped
+ *  The number of packets dropped by the application (OS)
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+int
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       uint64_t received, uint64_t missed,
+		       uint64_t accepted, uint64_t dropped);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..e6405f3bee01
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_init;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH 2/5] pdump: support pcapng and filtering
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 1/5] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-03  0:47 ` Stephen Hemminger
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 3/5] app/dumpcap: add new packet capture application Stephen Hemminger
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03  0:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of exposed API or ABI)
is intentionally increased to cause any attempt to use
mismatched primary/secondary processes to fail.

Add new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).
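
As a rough illustration of the accounting described above, the
per-packet decision could look something like the sketch below. This
is plain C with made-up names (struct capture_stats and
capture_account() are hypothetical, not the patch's
rte_pdump_stats), and it assumes the convention that a filter return
value of zero means "do not capture".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters; field names are illustrative only. */
struct capture_stats {
	uint64_t accepted;	/* copied and queued to the ring */
	uint64_t filtered;	/* rejected by the filter program */
	uint64_t ringfull;	/* missed because the ring had no room */
};

/*
 * Account for one packet given the filter verdict and ring state.
 * Returns true if the packet would be handed to the capture process.
 */
static bool
capture_account(struct capture_stats *st, uint64_t filter_rc,
		bool ring_has_room)
{
	if (filter_rc == 0) {
		st->filtered++;
		return false;
	}
	if (!ring_has_room) {
		st->ringfull++;
		return false;
	}
	st->accepted++;
	return true;
}
```

Checking the filter verdict first means rejected packets are never
copied, so they cost no mbuf and no ring slot.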

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 386 +++++++++++++++++++++++++++++-------------
 lib/pdump/rte_pdump.h | 117 ++++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 394 insertions(+), 123 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index 514be90b09ec..b59bec494275 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
 	'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+	'pdump', # pdump lib depends on bpf pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..3237c54af69e 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -9,6 +9,7 @@
 #include <rte_log.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +28,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK using an incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf *filter;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,36 +60,67 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
+	struct rte_pdump_stats stats;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
 
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	struct pdump_rxtx_cbs *cbs = user_params;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t bpf_rc[nb_pkts];
+
+	if (cbs->filter &&
+	    !rte_bpf_exec_burst(cbs->filter, (void **)pkts, bpf_rc, nb_pkts))
+		return;	/* our work here is done */
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to rte_bpf_eth callback.
+		 * If the BPF program returns zero for a given packet,
+		 * then that packet is ignored.
+		 */
+		if (cbs->filter && bpf_rc[i] == 0)
+			continue;
+
+		/*
+		 * If using pcapng then want to wrap packets
+		 * otherwise a simple copy.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (likely(p != NULL))
 			dup_bufs[d_pkts++] = p;
 	}
 
+	cbs->stats.accepted += d_pkts;
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
+		cbs->stats.missed += drops;
 		PDUMP_LOG(DEBUG,
 			"only %d of packets enqueued to ring\n", ring_enq);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
@@ -100,43 +128,50 @@ pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
 	uint16_t max_pkts __rte_unused,
 	void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, user_params);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, user_params);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			memset(&cbs->stats, 0, sizeof(cbs->stats));
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +180,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +204,30 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			memset(&cbs->stats, 0, sizeof(cbs->stats));
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +236,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -233,32 +270,25 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
 	flags = p->flags;
 	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
-			return -EINVAL;
-		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
-		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +326,8 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue, ring, mp,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +335,8 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue, ring, mp,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -349,6 +379,8 @@ rte_pdump_init(void)
 {
 	int ret;
 
+	rte_pcapng_init();
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +424,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +465,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf *filter)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,23 +479,23 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	strlcpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->filter = filter;
+		req->snaplen = snaplen;
 	}
 
 	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
@@ -477,11 +516,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API cannot depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf *filter)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +541,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, filter);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf *filter)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, filter);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf *filter)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +585,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, filter);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf *filter)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, filter);
 }
 
 int
@@ -537,8 +624,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +640,73 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(struct rte_pdump_stats *total,
+		const struct pdump_rxtx_cbs *cbs,
+		uint16_t nq)
+{
+	uint16_t qid;
+
+	memset(total, 0, sizeof(*total));
+
+	for (qid = 0; qid < nq; qid++) {
+		total->received += cbs[qid].stats.received;
+		total->missed += cbs[qid].stats.missed;
+		total->accepted += cbs[qid].stats.accepted;
+	}
+}
+
+int
+rte_pdump_get_stats(uint16_t port, uint16_t queue,
+		    struct rte_pdump_stats *rx_stats,
+		    struct rte_pdump_stats *tx_stats)
+{
+	uint16_t nb_rx_q = 0, nb_tx_q = 0;
+
+	if (port >= RTE_MAX_ETHPORTS) {
+		PDUMP_LOG(ERR, "Invalid port id %u\n", port);
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	if (queue == RTE_PDUMP_ALL_QUEUES) {
+		struct rte_eth_dev_info dev_info;
+		int ret;
+
+		ret = rte_eth_dev_info_get(port, &dev_info);
+		if (ret != 0) {
+			PDUMP_LOG(ERR,
+				"Error during getting device (port %u) info: %s\n",
+				port, strerror(-ret));
+			return ret;
+		}
+		nb_rx_q = dev_info.nb_rx_queues;
+		nb_tx_q = dev_info.nb_tx_queues;
+	} else if (queue >= RTE_MAX_QUEUES_PER_PORT) {
+		PDUMP_LOG(ERR, "Invalid queue id %u\n", queue);
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	if (rx_stats) {
+		if (queue == RTE_PDUMP_ALL_QUEUES)
+			pdump_sum_stats(rx_stats, &rx_cbs[port][0], nb_rx_q);
+		else
+			*rx_stats = rx_cbs[port][queue].stats;
+	}
+
+	if (tx_stats) {
+		if (queue == RTE_PDUMP_ALL_QUEUES)
+			pdump_sum_stats(tx_stats, &tx_cbs[port][0], nb_tx_q);
+		else
+			*tx_stats = tx_cbs[port][queue].stats;
+	}
+
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..992331fddffb 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port
+ *  port on which packet capturing should be enabled.
+ * @param queue
+ *  queue of a given port on which packet capturing should be enabled.
+ *  users should pass on value UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  flags specifies RTE_PDUMP_FLAG_RX/RTE_PDUMP_FLAG_TX/RTE_PDUMP_FLAG_RXTX
+ *  on which packet capturing should be enabled for a given port and queue.
+ * @param snaplen
+ *  snapshot length. No more than snaplen bytes of the network packet
+ *  will be saved.  Use 0 or 262144 to capture all of the packet.
+ * @param ring
+ *  ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF filter to run to filter packets (can be NULL)
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf *filter);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +157,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +170,44 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  queue of a given device id on which packet capturing should be enabled.
+ *  users should pass on value UINT16_MAX to enable packet capturing on all
+ *  queues of a given device id.
+ * @param flags
+ *  flags specifies RTE_PDUMP_FLAG_RX/RTE_PDUMP_FLAG_TX/RTE_PDUMP_FLAG_RXTX
+ *  on which packet capturing should be enabled for a given port and queue.
+ * @param snaplen
+ *  snapshot length. No more than snaplen bytes of the network packet
+ *  will be saved.  Use 0 or 262144 to capture all of the packet.
+ * @param ring
+ *  ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF filter to run to filter packets (can be NULL)
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +230,40 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+struct rte_pdump_stats {
+	uint64_t received;	/**< callback called */
+	uint64_t accepted;	/**< allowed by filter */
+	uint64_t missed;	/**< ring full */
+};
+
+/**
+ * Query packet capture statistics.
+ *
+ * @param port
+ *  port for which capture statistics are queried.
+ * @param queue
+ *  queue of the given port for which capture statistics are queried.
+ *  users should pass on value UINT16_MAX to sum the statistics over all
+ *  queues of the given port.
+ * @param rx_stats
+ *   A pointer to a structure of type *rte_pdump_stats* to be filled with
+ *   the values of the capture statistics:
+ *   - *received* with the total of received packets.
+ *   - *accepted* with the total of packets matched by the filter.
+ *   - *missed*   with the total of packets missed because of ring full.
+ * @param tx_stats
+ *   - *received* with the total of transmitted packets.
+ *   - *accepted* with the total of packets matched by the filter.
+ *   - *missed*   with the total of packets missed because of ring full.
+ * @return
+ *   Zero if successful. Non-zero otherwise.
+ */
+__rte_experimental
+int
+rte_pdump_get_stats(uint16_t port, uint16_t queue,
+		    struct rte_pdump_stats *rx_stats,
+		    struct rte_pdump_stats *tx_stats);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..b4c20b56f237 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_get_stats;
+};
-- 
2.30.2


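Reviewer's note (not part of the patch): a minimal standalone sketch of the per-burst accounting done by pdump_copy() above, with the DPDK types replaced by plain integers. It assumes every packet copy succeeds (the mempool never runs dry) and models the ring as a number of free slots. The names sketch_stats and sketch_account() are hypothetical. In the hunks shown above, pdump_copy() updates accepted and missed; received does not appear to be incremented there, so the sketch leaves it out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_stats {
	uint64_t accepted;	/* packets that passed the filter and were copied */
	uint64_t missed;	/* copies dropped because the ring was full */
};

/*
 * bpf_rc holds one filter result per packet, or is NULL when no filter
 * is installed. A zero result means "do not capture this packet".
 */
static void
sketch_account(struct sketch_stats *st, const uint64_t *bpf_rc,
	       uint16_t nb_pkts, unsigned int ring_free)
{
	unsigned int copied = 0, enq;
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		if (bpf_rc != NULL && bpf_rc[i] == 0)
			continue;	/* rejected by the filter */
		copied++;
	}

	st->accepted += copied;

	/* A burst enqueue stores as many copies as fit in the ring. */
	enq = copied < ring_free ? copied : ring_free;
	st->missed += copied - enq;
}
```

For a burst of four packets where the filter rejects one and the ring has two free slots, this counts three accepted and one missed.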

* [dpdk-dev] [PATCH 3/5] app/dumpcap: add new packet capture application
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 1/5] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 2/5] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-03  0:47 ` Stephen Hemminger
  2021-09-03  0:59   ` Stephen Hemminger
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 4/5] doc: changes for new pcapng and dumpcap Stephen Hemminger
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03  0:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features such as filtering are not
implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 829 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  11 +
 app/meson.build         |   1 +
 3 files changed, 841 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

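Reviewer's note (not part of the patch): parse_opts() below handles long options that have no short equivalent by switching on option_index. getopt_long() sets that variable to the position of the matched entry in the long_options table, so a numeric case label has to track the table order. A minimal standalone sketch of an alternative that matches on the option name instead is given here. The function sketch_parse_comment() and its shortened option table are hypothetical.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Return the argument of --capture-comment, or NULL if it was not given. */
static const char *
sketch_parse_comment(int argc, char **argv)
{
	static const struct option long_options[] = {
		{ "autostop",        required_argument, NULL, 'a' },
		{ "capture-comment", required_argument, NULL, 0 },
		{ "help",            no_argument,       NULL, 'h' },
		{ NULL, 0, NULL, 0 },
	};
	const char *comment = NULL;
	int option_index = 0;
	int c;

	optind = 1;	/* start scanning at the first argument */
	for (;;) {
		c = getopt_long(argc, argv, "a:h", long_options,
				&option_index);
		if (c == -1)
			break;

		/*
		 * Long-only options return 0 (their val field); look at
		 * the matched entry's name rather than its table position.
		 */
		if (c == 0 &&
		    strcmp(long_options[option_index].name,
			   "capture-comment") == 0)
			comment = optarg;
	}
	return comment;
}
```

Matching on the name keeps working if entries are later added to or reordered in the table.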
diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..09fc7dcb5a21
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,829 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_config.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static bool show_statistics;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static uint64_t packets_received;
+static size_t file_size;
+static const char *capture_comment;
+static uint32_t snaplen = 65535;
+static bool dump_bpf;
+static struct timespec start_time;
+
+static struct {
+	double duration;
+	unsigned long packets;
+	size_t size;
+} stop;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint64_t missed;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n"
+	       "  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: appropriate maximum)\n"
+	       "  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "  -S                       print statistics for each interface once per second\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void autostop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		stop.duration = strtod(value, &endp);
+		if (*value == '\0' || *endp != '\0' || stop.duration < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = malloc(sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	memset(intf, 0, sizeof(*intf));
+	strlcpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port, this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static struct rte_bpf *compile_filter(void)
+{
+	struct bpf_program fcode;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &fcode, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	rte_exit(EXIT_FAILURE, "filter not implemented yet\n");
+
+	/*
+	 * Need to convert classic BPF to eBPF and put it in shared memory
+	 * to be read by the primary process.
+	 */
+	pcap_freecode(&fcode);
+	pcap_close(pcap);
+
+	rte_exit(EXIT_FAILURE, "not implemented\n");
+	return NULL;
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:Svw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			autostop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'S':
+			show_statistics = true;
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+static double elapsed(void)
+{
+	struct timespec now;
+	double secs;
+
+	clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
+	secs = now.tv_sec - start_time.tv_sec;
+	secs += (now.tv_nsec - start_time.tv_nsec) / 1.e9;
+	return secs;
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive. */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+		return;
+	}
+
+	fprintf(stderr, "Primary process is no longer active, exiting...\n");
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will dumpcap. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(rte_pcapng_t *out)
+{
+	struct interface *intf;
+	struct rte_pdump_stats rxstats, txstats;
+	struct rte_eth_stats stats;
+	uint64_t received, accepted, dropped, ifdrop;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_get_stats(intf->port, UINT16_MAX,
+					&rxstats, &txstats) < 0)
+			continue;
+
+		if (rte_eth_stats_get(intf->port, &stats) < 0)
+			ifdrop = 0;
+		else
+			ifdrop = stats.imissed - intf->missed;
+
+		received = rxstats.received + txstats.received;
+		accepted = rxstats.accepted + txstats.accepted;
+		dropped = rxstats.missed + txstats.missed;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out, intf->port,
+					       received, ifdrop, accepted, dropped);
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, received, dropped,
+			received ? 100. * received / (received + dropped) : 0.);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	const char *args[] = {
+		progname, "--proc-type", "secondary", "--log-level", "error"
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	for (i = 0; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_panic("EAL init failed\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *pring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	pring = rte_ring_lookup(RING_NAME);
+	if (pring == NULL) {
+		pring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (pring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return pring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+	uint16_t data_size = RTE_MBUF_DEFAULT_BUF_SIZE;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	if (snaplen < data_size)
+		data_size = snaplen;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    data_size,
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+static void *create_output(void)
+{
+	struct utsname uts;
+	char os[256];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+		if (asprintf(&output_name, "/tmp/%s_%u_%s_%s.%s",
+			     progname, intf->port, intf->name, ts,
+			     use_pcapng ? "pcapng" : "pcap") < 0)
+			rte_panic("asprintf failed\n");
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		rte_pcapng_t *pcapng;
+
+		if (uname(&uts) < 0)
+			strcpy(os, "unknown");
+		else
+			snprintf(os, sizeof(os), "%s %s",
+				 uts.sysname, uts.release);
+
+		pcapng = rte_pcapng_fdopen(fd, os, NULL, version(), capture_comment);
+		if (pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		return pcapng;
+	} else {
+		pcap_dumper_t *dumper;
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+		return dumper;
+	}
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp,
+			 struct rte_bpf *filter)
+{
+	struct rte_eth_stats stats;
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		if (rte_eth_stats_get(intf->port, &stats) < 0)
+			intf->missed = 0;
+		else
+			intf->missed = stats.imissed;
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, filter);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(rte_errno));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(void *out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out, pkts, n);
+	else
+		written = pcap_write_packets(out, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_bpf *bpf_filter = NULL;
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	void *out;
+
+	progname = basename(argv[0]);
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		bpf_filter = compile_filter();
+
+	if (dump_bpf)
+		fprintf(stderr, "dump filter not implemented yet\n");
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+	if (out == NULL)
+		rte_exit(EXIT_FAILURE, "can not open output file: %s\n",
+			 rte_strerror(rte_errno));
+
+	if (clock_gettime(CLOCK_MONOTONIC_COARSE, &start_time) < 0)
+		rte_exit(EXIT_FAILURE, "clock_gettime() failed: %s\n",
+			 strerror(errno));
+
+	enable_pdump(r, mp, bpf_filter);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 && elapsed() > stop.duration)
+			break;
+	}
+
+	disable_primary_monitor();
+
+	report_packet_stats(out);
+	if (use_pcapng)
+		rte_pcapng_close(out);
+	else
+		pcap_dump_close(out);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_filter);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..7a98994d3ce4
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if is_windows
+	build = false
+	reason = 'not supported on Windows'
+	subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2
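
A note on create_ring() in the patch above: it rounds the requested ring
size up to a power of two using __builtin_clzl(size - 1), and the result of
that builtin is undefined when its argument is 0 (that is, when ring_size
is 1). The rounding step can be sketched without the builtin as shown
below. This is only an illustration: round_up_pow2() is a hypothetical
name that is not part of this patch, and DPDK's existing rte_align32pow2()
helper may be a better fit in the real code.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: round v up to the next
 * power of two. Returns v unchanged if it is already a power of two,
 * 1 for v == 0, and 0 if the result does not fit in 32 bits.
 */
static uint32_t round_up_pow2(uint32_t v)
{
	if (v == 0)
		return 1;

	v--;		/* so that exact powers of two map to themselves */
	v |= v >> 1;	/* smear the highest set bit downwards ... */
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;	/* ... until every lower bit is set */
	return v + 1;	/* unsigned wrap gives 0 when v was above 2^31 */
}
```

A caller would compare the returned value with the requested size and
report the adjustment, as create_ring() already does, treating a return
of 0 as "requested size too large".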


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH 4/5] doc: changes for new pcapng and dumpcap
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (2 preceding siblings ...)
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 3/5] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-03  0:47 ` Stephen Hemminger
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 5/5] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03  0:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++++--------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 ++++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 +++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 +++
 doc/guides/tools/dumpcap.rst                  | 80 +++++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 9 files changed, 174 insertions(+), 39 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The framework consists of two libraries:
+``librte_pdump`` for collecting packets and ``librte_pcapng`` for writing
+packets to a file. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` tool captures packets in the same way
+that Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the
+  ``librte_pdump`` library. It provides more limited functionality
+  and depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the ``dpdk-dumpcap`` tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and
+trailer to match the format of the Pcapng enhanced packet block. The enhanced packet block has meta-data such
+as the timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b5738348b..ee24cbfdb99d 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,16 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Enhanced packet capture.**
+
+  * Added new dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility, including capture of multiple interfaces
+    and stopping after a number of bytes or packets.
+  * Added new library for writing pcapng packet capture files.
+  * Enhanced the pdump library to support packet filtering with BPF.
+  * Enhanced the pdump library to support the Pcapng format with
+    timestamps and meta-data.
+  * Fixed packet capture with stripped VLAN tags in the pdump library.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..90993bd5be5e
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,80 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool.  The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes its capture files in the
+Pcapng packet format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In dpdk, only the ``testpmd`` is modified to initialize packet capture
+        framework, other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than the testpmd, user
+        needs to explicitly modify that application to call packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0': 6/0 (100.0)
+
+
+Limitations
+-----------
+The following options of Wireshark ``dumpcap`` are not yet implemented:
+
+   * ``-f <capture filter>`` -- needs translation from classic BPF to eBPF.
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+   * ``-C <byte_limit>`` -- doesn't make sense in DPDK model.
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2
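
For readers unfamiliar with the Pcapng format that pcapng_lib.rst refers
to: every Pcapng file starts with a Section Header Block (SHB). The sketch
below builds the smallest valid SHB (no options) as laid out in the draft
specification linked in the References section. It is an illustration
only: shb_write_minimal() is a hypothetical name and is not the
librte_pcapng API, and the library in this series may build the block
differently.

```c
#include <stdint.h>
#include <string.h>

#define PCAPNG_SHB_TYPE    0x0A0D0D0Au	/* Section Header Block type */
#define PCAPNG_BYTE_MAGIC  0x1A2B3C4Du	/* lets readers detect endianness */

/*
 * Hypothetical helper, not part of librte_pcapng: write a minimal SHB
 * (no options) into buf using host byte order.
 * Returns the number of bytes written (28), or 0 if buf is too small.
 */
static size_t shb_write_minimal(uint8_t *buf, size_t len)
{
	const uint32_t type = PCAPNG_SHB_TYPE;
	const uint32_t total_len = 28;		/* whole block, both length fields included */
	const uint32_t magic = PCAPNG_BYTE_MAGIC;
	const uint16_t major = 1, minor = 0;	/* format version 1.0 */
	const uint64_t section_len = UINT64_MAX; /* -1: section length not specified */
	size_t off = 0;

	if (len < total_len)
		return 0;

	memcpy(buf + off, &type, sizeof(type));			off += sizeof(type);
	memcpy(buf + off, &total_len, sizeof(total_len));	off += sizeof(total_len);
	memcpy(buf + off, &magic, sizeof(magic));		off += sizeof(magic);
	memcpy(buf + off, &major, sizeof(major));		off += sizeof(major);
	memcpy(buf + off, &minor, sizeof(minor));		off += sizeof(minor);
	memcpy(buf + off, &section_len, sizeof(section_len));	off += sizeof(section_len);
	/* The block length is repeated at the end to allow backward reads. */
	memcpy(buf + off, &total_len, sizeof(total_len));	off += sizeof(total_len);

	return off;
}
```

Interface Description Blocks and Enhanced Packet Blocks follow the same
"type, length, body, length" framing, which is why the pdump library can
prepare an enhanced packet block by adding a header and trailer around
the packet data.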


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH 5/5] MAINTAINERS: add entry for new pcapng and dumper
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (3 preceding siblings ...)
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 4/5] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-03  0:47 ` Stephen Hemminger
  2021-09-03 22:06 ` [dpdk-dev] [PATCH v2 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03  0:47 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Claim responsibility for the new code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1dae8..06384ac2702d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1429,6 +1429,12 @@ F: app/test/test_pdump.*
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Packet dump
+M: Stephen Hemminger <stephen@networkplumber.org>
+F: lib/pcapng/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: app/dumpcap/
+F: doc/guides/tools/dumpcap.rst
 
 Packet Framework
 ----------------
-- 
2.30.2



* Re: [dpdk-dev] [PATCH 3/5] app/dumpcap: add new packet capture application
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 3/5] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-03  0:59   ` Stephen Hemminger
  0 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03  0:59 UTC (permalink / raw)
  To: dev

Ken Thompson lives on...

WARNING:TYPO_SPELLING: 'CREAT' may be misspelled - perhaps 'CREATE'?
#705: FILE: app/dumpcap/main.c:605:
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);


* [dpdk-dev] [PATCH v2 0/5] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (4 preceding siblings ...)
  2021-09-03  0:47 ` [dpdk-dev] [PATCH 5/5] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
@ 2021-09-03 22:06 ` Stephen Hemminger
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 1/5] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (4 more replies)
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
                   ` (12 subsequent siblings)
  18 siblings, 5 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03 22:06 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc.)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps

And it keeps the old pdump command as is for those people
who never want to change.

The one missing piece is that the dumpcap utility does not
yet have the necessary converter to take the classic
BPF for pcap_compile and convert it to eBPF for DPDK.
(It is not hard, just not working right yet.)

v2:
   fix formatting of packet blocks
   fix the new packet capture statistics
   fix crash when primary process exits
   record start/end time
   various whitespace/checkpatch warnings

Stephen Hemminger (5):
  librte_pcapng: add new library for writing pcapng files
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new pcapng and dumper

 MAINTAINERS                                   |   6 +
 app/dumpcap/main.c                            | 831 ++++++++++++++++++
 app/dumpcap/meson.build                       |  18 +
 app/meson.build                               |   1 +
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  80 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/meson.build                               |   5 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 577 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 205 +++++
 lib/pcapng/version.map                        |  13 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 419 ++++++---
 lib/pdump/rte_pdump.h                         | 110 ++-
 lib/pdump/version.map                         |   8 +
 24 files changed, 2426 insertions(+), 215 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2



* [dpdk-dev] [PATCH v2 1/5] librte_pcapng: add new library for writing pcapng files
  2021-09-03 22:06 ` [dpdk-dev] [PATCH v2 0/5] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-03 22:06   ` Stephen Hemminger
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 2/5] pdump: support pcapng and filtering Stephen Hemminger
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03 22:06 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/
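
For readers who have not looked at the format, the following is a minimal
sketch of what the first block in such a file looks like. It is not code from
this patch: the helper names opt_size(), put_opt() and build_shb() are
hypothetical, it writes into a caller-supplied buffer instead of a file
descriptor, and it assumes the option value fits in 16 bits. Fields are
written in host byte order; the byte-order magic lets a reader detect which
order was used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes used by one option: 4-byte code/length header plus value padded to 32 bits. */
static size_t opt_size(size_t value_len)
{
	return 4 + ((value_len + 3) & ~(size_t)3);
}

/* Append one option (host byte order) and return the new write offset. */
static size_t put_opt(uint8_t *buf, size_t off, uint16_t code,
		      const void *val, uint16_t len)
{
	memset(buf + off, 0, opt_size(len));	/* keep padding bytes zero */
	memcpy(buf + off, &code, 2);
	memcpy(buf + off + 2, &len, 2);
	if (len > 0)
		memcpy(buf + off + 4, val, len);
	return off + opt_size(len);
}

/*
 * Build a Section Header Block, optionally carrying an shb_userappl option.
 * Returns the block length in bytes, or 0 if the buffer is too small.
 */
static size_t build_shb(uint8_t *buf, size_t cap, const char *app)
{
	const uint32_t type = 0x0A0D0D0A, magic = 0x1A2B3C4D;
	const uint16_t major = 1, minor = 0;
	const uint64_t section_len = UINT64_MAX;	/* section length unknown */
	uint32_t total = 24 + 4;	/* fixed header + trailing block length */
	size_t off;

	if (app != NULL)	/* the option itself plus opt_endofopt */
		total += opt_size(strlen(app)) + opt_size(0);
	if (cap < total)
		return 0;

	memcpy(buf, &type, 4);
	memcpy(buf + 4, &total, 4);
	memcpy(buf + 8, &magic, 4);
	memcpy(buf + 12, &major, 2);
	memcpy(buf + 14, &minor, 2);
	memcpy(buf + 16, &section_len, 8);
	off = 24;
	if (app != NULL) {
		off = put_opt(buf, off, 4, app, (uint16_t)strlen(app)); /* shb_userappl */
		off = put_opt(buf, off, 0, NULL, 0);			 /* opt_endofopt */
	}
	memcpy(buf + off, &total, 4);	/* block length is repeated as a trailer */
	assert(off + 4 == total);
	return total;
}
```

Every other block type follows the same pattern: type, total length,
fixed body, options padded to 32 bits, then the total length again so a
reader can walk the file backwards.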

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 577 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 205 ++++++++++++++
 lib/pcapng/version.map    |  13 +
 6 files changed, 933 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..2c4a6d765de6
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,577 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+int rte_pcapng_init(void)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	if (clock_gettime(CLOCK_REALTIME, &ts) < 0)
+		return -1;
+
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+	return 0;
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
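+	const uint64_t hz = rte_get_tsc_hz();
+	const uint64_t delta = cycles - pcapng_time.cycles;
+	/* delta * NSEC_PER_SEC would overflow 64 bits after about 1.8e10 cycles */
+	return pcapng_time.ns + (delta / hz) * NSEC_PER_SEC + (delta % hz) * NSEC_PER_SEC / hz;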
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface description block for one capture interface */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(6);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	/* alloca() never returns NULL; zero the block so option padding is defined */
+	memset(buf, 0, len);
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet  block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x0004    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x0004    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..a574631def98
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Initialize the pcapng library.
+ *
+ * Computes the time offset for timestamps.
+ *
+ * @return
+ *    0 on success, -1 on error
+ */
+__rte_experimental
+int rte_pcapng_init(void);
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for rte_pcapng_write_packets()
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The maximum number of packet bytes that will be captured
+ *   (the snapshot length).
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbufs in *pkts* are not freed; the caller still owns them.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * Pass 0 for an unknown time and UINT64_MAX for an unknown counter.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..7418fc5dcac9
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,13 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_init;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2


* [dpdk-dev] [PATCH v2 2/5] pdump: support pcapng and filtering
  2021-09-03 22:06 ` [dpdk-dev] [PATCH v2 0/5] Packet capture framework enhancements Stephen Hemminger
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 1/5] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-03 22:06   ` Stephen Hemminger
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 3/5] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03 22:06 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair a
mismatched primary and secondary process fails.
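
As a rough sketch of that check (not the library code itself): the
enum values below are the ones this patch introduces, while
check_client_version() is a hypothetical helper name used only for
illustration. It assumes a C11 compiler for static_assert.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values from this patch; the previous request format used version 1. */
enum pdump_version {
	PDUMP_CLIENT_LEGACY = 3,
	PDUMP_CLIENT_PCAPNG = 4,
};

/* New version numbers must never collide with the old value of 1. */
static_assert(PDUMP_CLIENT_LEGACY > 1, "version must differ from old V1");

/*
 * Hypothetical helper: returns 0 when the server should accept the
 * request, -EINVAL when the client speaks an unknown request format.
 */
static int
check_client_version(uint16_t ver)
{
	if (ver == PDUMP_CLIENT_LEGACY || ver == PDUMP_CLIENT_PCAPNG)
		return 0;
	return -EINVAL;
}
```

A secondary process built against an older DPDK would send version 1
and be rejected, rather than having its request misparsed.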

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).
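
The per-burst accounting can be modelled without DPDK roughly as
follows. This is only a sketch under stated assumptions: struct
capture_stats and account_burst() are hypothetical names, copy_ok[]
stands in for mbuf allocation success, and ring_space stands in for
the free slots in the ring. The __atomic builtins assume GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the four counters of struct rte_pdump_stats in this patch. */
struct capture_stats {
	uint64_t accepted;
	uint64_t filtered;
	uint64_t nombuf;
	uint64_t ringfull;
};

/*
 * Hypothetical stand-in for one capture burst.
 * bpf_rc[i] == 0 means the filter rejected packet i (NULL: no filter).
 * copy_ok[i] == 0 models an mbuf allocation failure for packet i.
 */
static void
account_burst(struct capture_stats *st, const uint64_t *bpf_rc,
	      const int *copy_ok, unsigned int nb_pkts,
	      unsigned int ring_space)
{
	unsigned int i, copied = 0, enqueued;

	for (i = 0; i < nb_pkts; i++) {
		if (bpf_rc != NULL && bpf_rc[i] == 0) {
			__atomic_fetch_add(&st->filtered, 1, __ATOMIC_RELAXED);
			continue;
		}
		if (!copy_ok[i]) {
			__atomic_fetch_add(&st->nombuf, 1, __ATOMIC_RELAXED);
			continue;
		}
		copied++;
	}

	/* As in the patch, "accepted" counts copies made before enqueue. */
	__atomic_fetch_add(&st->accepted, copied, __ATOMIC_RELAXED);

	/* Copies that do not fit in the ring are counted as ringfull. */
	enqueued = copied < ring_space ? copied : ring_space;
	assert(enqueued <= copied);
	__atomic_fetch_add(&st->ringfull, copied - enqueued, __ATOMIC_RELAXED);
}
```

Note that in this model, as in the patch, a packet dropped because the
ring is full is counted in both accepted and ringfull.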

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 419 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 415 insertions(+), 128 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..4b01949fe715 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf and pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..dde536c2e047 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK using an incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf *filter;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,137 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t bpf_rc[nb_pkts];
+
+	if (cbs->filter &&
+	    !rte_bpf_exec_burst(cbs->filter, (void **)pkts, bpf_rc, nb_pkts)) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts, __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to rte_bpf_eth callback.
+		 * If the BPF program returns zero for a given packet,
+		 * then it will be ignored.
+		 */
+		if (cbs->filter && bpf_rc[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng, wrap each packet in pcapng framing;
+		 * otherwise make a simple copy.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +200,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +224,30 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +256,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -233,32 +290,25 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
 	flags = p->flags;
 	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
-			return -EINVAL;
-		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
-		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +346,8 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue, ring, mp,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +355,8 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue, ring, mp,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -347,8 +397,20 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
+	rte_pcapng_init();
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +454,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +495,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf *filter)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,23 +509,23 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	strlcpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->filter = filter;
+		req->snaplen = snaplen;
 	}
 
 	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
@@ -477,11 +546,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API can't depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf *filter)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +571,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, filter);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf *filter)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, filter);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf *filter)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +615,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, filter);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf *filter)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, filter);
 }
 
 int
@@ -537,8 +654,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +670,66 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		const struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "cannot find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..03320f5ac2ab 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf *filter);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum over both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a port, summed over its queues.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2


* [dpdk-dev] [PATCH v2 3/5] app/dumpcap: add new packet capture application
  2021-09-03 22:06 ` [dpdk-dev] [PATCH v2 0/5] Packet capture framework enhancements Stephen Hemminger
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 1/5] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 2/5] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-03 22:06   ` Stephen Hemminger
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 4/5] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 5/5] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  4 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03 22:06 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features, such as filtering, are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 831 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  18 +
 app/meson.build         |   1 +
 3 files changed, 850 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..4bf41b10106f
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,831 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_config.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static bool show_statistics;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	double duration;
+	unsigned long packets;
+	size_t size;
+} stop;
+
+/* Running state */
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "  -S                       print statistics for each interface once per second\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		stop.duration = strtod(value, &endp);
+		if (*value == '\0' || *endp != '\0' || stop.duration < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = calloc(1, sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	intf->port = port;
+	strlcpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port, this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static struct rte_bpf *compile_filter(void)
+{
+	struct bpf_program fcode;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &fcode, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	rte_exit(EXIT_FAILURE, "filter not implemented yet\n");
+
+	/*
+	 * Need to convert classic BPF to eBPF and put it in shared memory
+	 * to be read by the primary process.
+	 */
+	pcap_freecode(&fcode);
+	pcap_close(pcap);
+
+	rte_exit(EXIT_FAILURE, "not implemented\n");
+	return NULL;
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:Svw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options[] */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'S':
+			show_statistics = true;
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current CLOCK_MONOTONIC time in nanoseconds */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+/* Return the seconds elapsed since start_time */
+static double elapsed(void)
+{
+	struct timespec now;
+	int64_t delta;
+
+	clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
+	delta = rte_timespec_to_ns(&now) - start_time;
+	return delta / 1.e9;
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm signal handler, used to check that primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will dumpcap. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to enable monitor: %d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to disable monitor: %d\n", ret);
+}
+
+static void
+report_packet_stats(rte_pcapng_t *out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	const char *args[] = {
+		progname, "--proc-type", "secondary", "--log-level", "error"
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	for (i = 0; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_panic("EAL init failed\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring: %s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+static void *create_output(void)
+{
+	struct utsname uts;
+	char os[256];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+		if (asprintf(&output_name, "/tmp/%s_%u_%s_%s.%s",
+			     progname, intf->port, intf->name, ts,
+			     use_pcapng ? "pcapng" : "pcap") < 0)
+			rte_panic("asprintf failed\n");
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		rte_pcapng_t *pcapng;
+
+		if (uname(&uts) < 0)
+			strcpy(os, "unknown");
+		else
+			snprintf(os, sizeof(os), "%s %s",
+				 uts.sysname, uts.release);
+
+		pcapng = rte_pcapng_fdopen(fd, os, NULL, version(), capture_comment);
+		if (pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		return pcapng;
+	} else {
+		pcap_dumper_t *dumper;
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+		return dumper;
+	}
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp,
+			 struct rte_bpf *filter)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, filter);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(rte_errno));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(void *out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out, pkts, n);
+	else
+		written = pcap_write_packets(out, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_bpf *bpf_filter = NULL;
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	void *out;
+
+	progname = basename(argv[0]);
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		bpf_filter = compile_filter();
+
+	if (dump_bpf)
+		fprintf(stderr, "dump filter not implemented yet\n");
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+	if (out == NULL)
+		rte_exit(EXIT_FAILURE, "can not open output file: %s\n",
+			 rte_strerror(rte_errno));
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp, bpf_filter);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 && elapsed() > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out);
+	else
+		pcap_dump_close(out);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_filter);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..206f496fd6b0
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+ext_deps += pcap_dep
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2
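
Note on the ring size rounding in create_ring() above: the expression
based on __builtin_clzl(size - 1) is undefined for a ring size of 1
(the builtin is called with 0) and shifts too far for a size of 0.
Below is a minimal standalone sketch of the same rounding that handles
those inputs. The helper name round_up_pow2() is hypothetical and not
part of this patch; it assumes a 32-bit ring size and a GCC/Clang
compiler. DPDK's rte_common.h also provides rte_align32pow2(), which
may be the simpler choice inside the application itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): round v up to the next
 * power of two. Returns 1 for v <= 1 and 0 if the result would not
 * fit in 32 bits.
 */
static uint32_t round_up_pow2(uint32_t v)
{
	unsigned int bits;

	if (v <= 1)
		return 1;

	/* v >= 2 here, so v - 1 is non-zero and __builtin_clz is defined */
	bits = 32 - (unsigned int)__builtin_clz(v - 1);
	if (bits >= 32)
		return 0;	/* next power of two overflows uint32_t */

	return (uint32_t)1 << bits;
}
```

A caller would treat a return value of 0 as "requested ring size too
large" and reject the -N argument rather than passing it on to
rte_ring_create().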


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v2 4/5] doc: changes for new pcapng and dumpcap
  2021-09-03 22:06 ` [dpdk-dev] [PATCH v2 0/5] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 3/5] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-03 22:06   ` Stephen Hemminger
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 5/5] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  4 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03 22:06 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 80 ++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 222 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The DPDK packet capture framework consists of the
+libraries for collecting packets ``librte_pdump`` and writing packets
+to a file ``librte_pcapng``. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` is a tool that captures packets
+like Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the
+  ``librte_pdump`` library. The ``dpdk-pdump`` tool provides more limited
+  functionality and depends on the Pcap PMD. It is retained only for
+  compatibility reasons; users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the dpdk-dumpcap tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` is set the mbuf data is extended with header and trailer
+to match the format of Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b5738348b..ee24cbfdb99d 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,16 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Enhance Packet capture.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility including capture of multiple interfaces
+    and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancement to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Fixes packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..90993bd5be5e
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,80 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool. The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes capture files in the Pcapng packet
+format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In dpdk, only the ``testpmd`` is modified to initialize packet capture
+        framework, other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than the testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+
+Limitations
+-----------
+The following options of Wireshark ``dumpcap`` are not yet implemented:
+
+   * ``-f <capture filter>`` -- needs translation from classic BPF to eBPF.
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+   * ``-C <byte_limit>`` -- doesn't make sense in DPDK model.
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2



* [dpdk-dev] [PATCH v2 5/5] MAINTAINERS: add entry for new pcapng and dumper
  2021-09-03 22:06 ` [dpdk-dev] [PATCH v2 0/5] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 4/5] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-03 22:06   ` Stephen Hemminger
  4 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-03 22:06 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Claim responsibility for the new code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1dae8..06384ac2702d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1429,6 +1429,12 @@ F: app/test/test_pdump.*
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Packet dump
+M: Stephen Hemminger <stephen@networkplumber.org>
+F: lib/pcapng/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: app/dumpcap/
+F: doc/guides/tools/dumpcap.rst
 
 Packet Framework
 ----------------
-- 
2.30.2



* [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (5 preceding siblings ...)
  2021-09-03 22:06 ` [dpdk-dev] [PATCH v2 0/5] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-08  4:50 ` Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 1/8] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (7 more replies)
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
                   ` (11 subsequent siblings)
  18 siblings, 8 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08  4:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility:
  - are faster: they avoid lots of extra I/O, do bursting, etc.
  - give more information (multiple ports, queues, etc.)
  - have a better user interface (same as Wireshark dumpcap)
  - fix structural problems with VLANs and timestamps

It preserves the old pdump command as is for those people
who never want to change.

v3 changes:
  - introduce packet filters using a classic BPF to eBPF converter;
    this required a small fix to the DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup

Stephen Hemminger (8):
  librte_pcapng: add new library for writing pcapng files
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new pcapng and dumper

 MAINTAINERS                                   |   6 +
 app/dumpcap/main.c                            | 832 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 580 ++++++++++++
 lib/bpf/bpf_dump.c                            | 118 +++
 lib/bpf/bpf_validate.c                        |   8 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   5 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 574 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 437 ++++++---
 lib/pdump/rte_pdump.h                         | 110 ++-
 lib/pdump/version.map                         |   8 +
 30 files changed, 3192 insertions(+), 215 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2



* [dpdk-dev] [PATCH v3 1/8] librte_pcapng: add new library for writing pcapng files
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-08  4:50   ` Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 2/8] bpf: allow self-xor operation Stephen Hemminger
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08  4:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/
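
For readers new to the format, the way options are laid out inside a
block can be sketched as below. This is only an illustration and not
code from this patch: the names opt_hdr, sketch_optlen,
sketch_add_option and sketch_block_len are made up here. The layout
assumed is the one the library follows: a 4-byte code/length header,
the value padded to a 32-bit boundary, an end-of-options marker, and
the block length repeated at the end of the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the option header (code + length). */
struct opt_hdr {
	uint16_t code;
	uint16_t length;
};

/* On-disk size of one option: header plus value, padded to 32 bits. */
static size_t sketch_optlen(size_t value_len)
{
	return (sizeof(struct opt_hdr) + value_len + 3) & ~(size_t)3;
}

/*
 * Append one option at buf + off and return the offset of the next one.
 * The caller is assumed to have zeroed buf, so padding bytes stay 0.
 */
static size_t sketch_add_option(uint8_t *buf, size_t off, uint16_t code,
				const void *data, uint16_t len)
{
	struct opt_hdr hdr = { .code = code, .length = len };

	assert(buf != NULL);
	memcpy(buf + off, &hdr, sizeof(hdr));
	if (len > 0)
		memcpy(buf + off + sizeof(hdr), data, len);
	return off + sketch_optlen(len);
}

/*
 * Total block size: fixed header, options, the end-of-options marker
 * and the trailing copy of the block length.
 */
static size_t sketch_block_len(size_t fixed_hdr, size_t options)
{
	return fixed_hdr + options + sketch_optlen(0) + sizeof(uint32_t);
}
```

With these helpers a 5-byte comment occupies 12 bytes in the file, so
a 24-byte section header carrying only that comment gives a block of
24 + 12 + 4 + 4 = 44 bytes.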

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 574 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 +++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 918 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..7bde8d9b75fd
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,574 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface description block for one interface */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the pcapng standard Enhanced Packet Block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that this is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..3b826c4291d7
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies, namely the lack of extensibility and the inability to
+ * store additional information.
+ *
+ * For details about the file format see the IETF draft:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param m
+ *   The mbuf to copy
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data.
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for rte_pcapng_write_packets()
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param snaplen
+ *   The upper limit on bytes to copy, i.e. the snapshot length
+ *   passed as *length* to rte_pcapng_copy().
+ * @return
+ *   The minimum size of mbuf data to handle a packet of snaplen bytes,
+ *   accounting for the required header and trailer fields.
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t snaplen);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbufs in *pkts* are not freed here; the caller must free them.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * Times may be 0 and counters UINT64_MAX if unknown or not reported.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2
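
Note on block sizing: rte_pcapng_mbuf_size() and rte_pcapng_copy() in the
patch above both rely on pcapng padding packet data and option values to
32 bits. What follows is a minimal standalone sketch of that arithmetic,
not the library code. The names pad32(), opt_size() and epb_total_length()
are hypothetical, and the 4-byte option header (16-bit code plus 16-bit
length) and the 28-byte fixed block header are assumptions read off the
block layout comment in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the DPDK implementation. */

/* Round n up to the next multiple of 4 (pcapng pads to 32 bits). */
static uint32_t pad32(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* Size of one option: 2-byte code + 2-byte length + padded value. */
static uint32_t opt_size(uint32_t value_len)
{
	return 4u + pad32(value_len);
}

/*
 * Total length of an Enhanced Packet Block holding cap_len bytes of
 * packet data followed by a flags option and a queue option.
 * Fixed header: block type, block length, interface id, timestamp
 * high/low, captured length, original length = 7 * 4 = 28 bytes.
 */
static uint32_t epb_total_length(uint32_t cap_len)
{
	assert(cap_len <= UINT32_MAX - 64u); /* keep the sum from wrapping */

	return 28u + pad32(cap_len)
		+ opt_size(sizeof(uint32_t))	/* flags option */
		+ opt_size(sizeof(uint32_t))	/* queue option */
		+ 4u;				/* trailing block length */
}
```

For example, a 61 byte capture is padded to 64 bytes, so the whole block
takes 28 + 64 + 8 + 8 + 4 = 112 bytes.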



* [dpdk-dev] [PATCH v3 2/8] bpf: allow self-xor operation
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 1/8] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-08  4:50   ` Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 3/8] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08  4:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

When doing BPF filter program conversion, a common way
to zero a register in a single instruction is:
     xor r7,r7

The BPF validator would not allow this because the value of
r7 was undefined. But after this operation it is always zero, so
allow it as a special case.

Cc: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
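
Illustration only, not part of the commit: a minimal standalone sketch of
the self-xor test described in the commit message. struct insn is a
simplified stand-in for DPDK's struct ebpf_insn, and is_self_xor() is a
hypothetical name. The masks follow the classic BPF opcode encoding
(operation in the high nibble, source selector in bit 3). As an assumption
of this sketch, it also requires the register-source form, since an
immediate-form xor leaves src_reg at 0 and would otherwise look like a
self-xor of r0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct ebpf_insn (illustration only). */
struct insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
};

#define OP_MASK		0xf0	/* bits extracted by BPF_OP() */
#define OP_XOR		0xa0	/* BPF_XOR */
#define SRC_MASK	0x08	/* bit extracted by BPF_SRC() */
#define SRC_REG		0x08	/* BPF_X: operand is a register */

/* True if the instruction is "xor rN, rN", which always yields zero. */
static bool is_self_xor(const struct insn *ins)
{
	assert(ins != NULL);

	return (ins->code & OP_MASK) == OP_XOR &&
	       (ins->code & SRC_MASK) == SRC_REG &&
	       ins->src_reg == ins->dst_reg;
}
```

A validator can use such a predicate to skip the "operands must be
defined" check, because the result does not depend on the prior value of
the register.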
 lib/bpf/bpf_validate.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..7647a7454dc2 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,12 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
-	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+	/* Allow self-xor as a way to zero a register */
+	if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X && ins->src_reg == ins->dst_reg)
+		err = NULL;
+	else
+		err = eval_defined((op != EBPF_MOV) ? rd : NULL,
+				   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v3 3/8] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 1/8] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 2/8] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-08  4:50   ` Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 4/8] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08  4:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The conversion code was originally written as part of the Linux
kernel implementation and later converted to a userspace program.
Both authors have agreed that it may be licensed under the BSD
license in DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
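
Illustration only, not part of the commit: because one classic BPF
instruction may expand into several eBPF instructions, relative jump
offsets have to be recomputed during conversion. The sketch below shows
that fix-up in isolation, assuming a table addrs[] that records, for each
classic instruction, the index of the first eBPF instruction emitted for
it. remap_jump() and its parameters are hypothetical names; the formula
mirrors the BPF_EMIT_JMP macro in the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compute the eBPF pc-relative offset for a jump that belongs to
 * classic instruction i.
 *
 *   addrs - addrs[n] is the eBPF index of the first instruction
 *           emitted for classic instruction n
 *   len   - number of classic instructions
 *   skip  - classic relative distance (k, jt or jf)
 *   pos   - position of the jump within the instructions emitted
 *           for i (0 if the jump is emitted first)
 *
 * Returns 0 and stores the offset in *off, or -1 if the classic
 * target lies outside the program.
 */
static int remap_jump(const int *addrs, size_t len, size_t i,
		      unsigned int skip, int pos, int *off)
{
	size_t target = i + skip + 1;

	assert(addrs != NULL && off != NULL && i < len);

	if (target >= len)
		return -1;

	/* eBPF offsets are relative to the instruction after the jump */
	*off = addrs[target] - addrs[i] - 1 - pos;
	return 0;
}
```

With addrs = { 3, 4, 10 } (a three instruction prologue, then a classic
instruction that expands to six eBPF instructions), a classic jump at
instruction 0 that skips one instruction becomes an eBPF jump with
offset 10 - 3 - 1 = 6.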
 lib/bpf/bpf_convert.c | 580 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 616 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..e3acdec60190
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,580 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ *
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * if and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* TODO: Not implemented yet */
+		RTE_BPF_LOG(ERR, "BPF extension LOAD ABS %u not supported\n",
+			    fp->k);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the address map used to fix up jumps */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = skb->len or X = skb->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+#if 0
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct sk_buff, len));
+#endif
+			RTE_BPF_LOG(ERR, "%s: length not implemented\n", __func__);
+			goto err;
+
+			/* Access seccomp_data fields. */
+		case BPF_LDX | BPF_ABS | BPF_W:
+#if 0
+			/* A = *(u32 *) (ctx + K) */
+			*insn = BPF_LDX_MEM(BPF_W, BPF_REG_A, BPF_REG_CTX, fp->k);
+#endif
+			RTE_BPF_LOG(ERR, "%s: data fields not implemented\n", __func__);
+			goto err;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert (*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a classic BPF program from libpcap into DPDK BPF (eBPF) code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ *  It is converted into an Extended BPF program,
+ *  which is returned; *prog* itself is not modified.
+ * @return
+ *   Pointer to BPF program (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2


* [dpdk-dev] [PATCH v3 4/8] bpf: add function to dump eBPF instructions
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 3/8] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-08  4:50   ` Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 5/8] pdump: support pcapng and filtering Stephen Hemminger
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08  4:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
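Note for reviewers (not part of the commit message): below is a minimal
standalone sketch of how the dump labels jump targets. It does not use
DPDK; struct demo_insn, demo_jump_target() and demo_format_ja() are
hypothetical names made up for illustration. They only mirror the label
arithmetic of the L(pc, off) macro and the "ja" branch of ebpf_dump()
in this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct ebpf_insn; only the fields used here. */
struct demo_insn {
	uint8_t code;	/* opcode, e.g. BPF_JMP | BPF_JA */
	int16_t off;	/* signed jump offset */
};

/* Jump offsets are relative to the instruction after the jump. */
int
demo_jump_target(size_t pc, int16_t off)
{
	return (int)pc + 1 + off;
}

/* Format an unconditional jump as " L<pc>:\tja L<target>\n". */
int
demo_format_ja(char *buf, size_t len, const struct demo_insn *insn, size_t pc)
{
	return snprintf(buf, len, " L%zu:\tja L%d\n", pc,
			demo_jump_target(pc, insn->off));
}
```

With pc = 3 and off = 2 the label printed for the target is L6, because
the offset is counted from the instruction that follows the jump.
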
 lib/bpf/bpf_dump.c  | 118 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 ++++++
 lib/bpf/version.map |   1 +
 4 files changed, 134 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..a6a431e64903
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+static void ebpf_dump(FILE *f, const struct ebpf_insn insn, size_t n)
+{
+	const char *op, *postfix = "";
+	uint8_t cls = BPF_CLASS(insn.code);
+
+	fprintf(f, " L%zu:\t", n);
+
+	switch (cls) {
+	default:
+		fprintf(f, "unimp 0x%x // class: %s\n", insn.code,
+			class_tbl[cls]);
+		break;
+	case BPF_ALU:
+		postfix = "32";
+		/* fall through */
+	case EBPF_ALU64:
+		op = alu_op_tbl[BPF_OP_INDEX(insn.code)];
+		if (BPF_SRC(insn.code) == BPF_X)
+			fprintf(f, "%s%s r%u, r%u\n", op, postfix, insn.dst_reg,
+				insn.src_reg);
+		else
+			fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		break;
+	case BPF_LD:
+		op = "ld";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		if (BPF_MODE(insn.code) == BPF_IMM)
+			fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_ABS)
+			fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_IND)
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+				insn.dst_reg, insn.src_reg, insn.imm);
+		else
+			fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+				insn.code);
+		break;
+	case BPF_LDX:
+		op = "ldx";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, insn.dst_reg,
+			insn.src_reg, insn.off);
+		break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+	case BPF_JMP:
+		op = jump_tbl[BPF_OP_INDEX(insn.code)];
+		if (op == NULL)
+			fprintf(f, "invalid jump opcode: %#x\n", insn.code);
+		else if (BPF_OP(insn.code) == BPF_JA)
+			fprintf(f, "%s L%d\n", op, L(n, insn.off));
+		else if (BPF_OP(insn.code) == EBPF_EXIT)
+			fprintf(f, "%s\n", op);
+		else
+			fprintf(f, "%s r%u, #0x%x, L%d\n", op, insn.dst_reg,
+				insn.imm, L(n, insn.off));
+		break;
+	case BPF_RET:
+		fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+			insn.code);
+		break;
+	}
+}
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i)
+		ebpf_dump(f, buf[i], i);
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+        'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..8bfb8399ca43 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2


* [dpdk-dev] [PATCH v3 5/8] pdump: support pcapng and filtering
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 4/8] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-08  4:50   ` Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 6/8] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08  4:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair a
mismatched primary and secondary process fails.

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. The library keeps statistics
on packets captured, filtered, and missed (because the ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
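Note for reviewers (not part of the commit message): a minimal standalone
sketch of the per-burst accounting described above. It is not the library
code. struct demo_stats and demo_account() are hypothetical; the counter
names follow the fields updated in pdump_copy(), but plain increments
stand in for the relaxed atomic adds, and the copy_ok[] and ring_space
inputs are assumptions that replace the real mbuf copy and ring enqueue.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters, named after the fields used in pdump_copy(). */
struct demo_stats {
	uint64_t accepted;	/* passed the filter and were copied */
	uint64_t filtered;	/* filter returned zero */
	uint64_t nombuf;	/* copy could not be allocated */
	uint64_t ringfull;	/* copied but did not fit in the ring */
};

/*
 * rcs[i]     - filter result for packet i (zero means "ignore")
 * copy_ok[i] - nonzero if a copy of packet i could be allocated
 * ring_space - free slots left in the capture ring
 *
 * Returns the number of packets that reached the ring.
 */
unsigned int
demo_account(const uint64_t *rcs, const int *copy_ok, unsigned int nb_pkts,
	     unsigned int ring_space, struct demo_stats *stats)
{
	unsigned int i, copied = 0, enq;

	for (i = 0; i < nb_pkts; i++) {
		if (rcs[i] == 0) {
			stats->filtered++;
			continue;
		}
		if (!copy_ok[i]) {
			stats->nombuf++;
			continue;
		}
		copied++;
	}

	/* As in the patch, "accepted" is counted before the enqueue. */
	stats->accepted += copied;

	enq = copied < ring_space ? copied : ring_space;
	stats->ringfull += copied - enq;
	return enq;
}
```

Note that a packet dropped because the ring was full is counted in both
"accepted" and "ringfull" in this sketch, which matches the order of the
updates in pdump_copy().
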
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 437 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 435 insertions(+), 126 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..4b01949fe715 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf and pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..9cb31d6b5008 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK that uses the incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter &&
+	    rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts) == 0) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts,
+				   __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to the rte_bpf_eth callback:
+		 * if the BPF program returns zero for a given packet,
+		 * then the packet is ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng, the packet needs to be wrapped;
+		 * otherwise a simple copy is enough.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +203,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +227,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +261,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +290,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +368,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +378,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -347,8 +421,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +476,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +517,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,23 +531,23 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	strlcpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
 	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
@@ -477,11 +568,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API can't depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +593,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +637,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +676,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +692,66 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		const struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..e2fbd78c6273 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program to run to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program to run to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum over both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a port (all queues summed).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2
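
For readers following the statistics code in this patch: the per-port
totals are built by walking each per-queue counter block word by word
and adding it into a single result. Below is a minimal standalone
sketch of that idea. It assumes a stand-in struct capture_stats and a
helper sum_queue_stats() (both hypothetical names, not part of the
pdump API), and it relies on the struct holding nothing but uint64_t
counters, as struct rte_pdump_stats does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_pdump_stats: uint64_t fields only. */
struct capture_stats {
	uint64_t accepted;
	uint64_t filtered;
	uint64_t nombuf;
	uint64_t ringfull;
	uint64_t reserved[4];
};

/*
 * Add the counters of nq per-queue entries into *total.
 * The caller is expected to zero *total before the first call.
 */
static void
sum_queue_stats(const struct capture_stats *perq, uint16_t nq,
		struct capture_stats *total)
{
	uint64_t *sum = (uint64_t *)total;
	const size_t nwords = sizeof(*total) / sizeof(uint64_t);
	uint16_t qid;
	size_t i;

	assert(perq != NULL && total != NULL);

	for (qid = 0; qid < nq; qid++) {
		const uint64_t *words = (const uint64_t *)&perq[qid];

		/* Relaxed loads: the counters may be updated concurrently. */
		for (i = 0; i < nwords; i++)
			sum[i] += __atomic_load_n(&words[i], __ATOMIC_RELAXED);
	}
}
```

Calling it once with the Rx array and once with the Tx array, without
clearing the total in between, gives the combined Rx plus Tx figure
that the rte_pdump_stats() documentation describes.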



* [dpdk-dev] [PATCH v3 6/8] app/dumpcap: add new packet capture application
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 5/8] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-08  4:50   ` Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 7/8] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 8/8] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08  4:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features such as filtering are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 832 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 849 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
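
The -a/--autostop option in this application takes arguments of the
form name:value (duration in seconds, filesize in kB, or packets).
Here is a rough standalone sketch of that parsing, returning an error
code instead of calling rte_exit(). The names struct stop_cond,
parse_ulong() and parse_autostop() are hypothetical and used only for
illustration. The upper bound on the duration is an assumption added
here to keep the floating-point conversion in range.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NS_PER_SEC 1000000000ULL

/* Hypothetical container for the three stop conditions. */
struct stop_cond {
	uint64_t duration_ns;
	unsigned long packets;
	size_t size_bytes;
};

/* Parse an unsigned number; return 0 on success, -1 on malformed input. */
static int
parse_ulong(const char *s, unsigned long *out)
{
	char *endp;

	errno = 0;
	*out = strtoul(s, &endp, 0);
	if (*s == '\0' || *endp != '\0' || errno == ERANGE)
		return -1;
	return 0;
}

/*
 * Parse "name:value" in place (the colon is overwritten with NUL).
 * Returns 0 on success, -1 on a syntax error or an unknown name.
 */
static int
parse_autostop(char *opt, struct stop_cond *stop)
{
	unsigned long n;
	char *value, *endp;

	assert(opt != NULL && stop != NULL);

	value = strchr(opt, ':');
	if (value == NULL)
		return -1;
	*value++ = '\0';

	if (strcmp(opt, "duration") == 0) {
		double secs = strtod(value, &endp);

		/* 1e9 seconds is an arbitrary cap chosen for this sketch */
		if (*value == '\0' || *endp != '\0' ||
		    !(secs > 0) || secs > 1e9)
			return -1;
		stop->duration_ns = (uint64_t)(secs * NS_PER_SEC);
	} else if (strcmp(opt, "filesize") == 0) {
		if (parse_ulong(value, &n) < 0)
			return -1;
		stop->size_bytes = (size_t)n * 1024;	/* value is in kB */
	} else if (strcmp(opt, "packets") == 0) {
		if (parse_ulong(value, &n) < 0)
			return -1;
		stop->packets = n;
	} else {
		return -1;
	}
	return 0;
}
```

Because the string is modified in place, the argument must be writable
(optarg is; a string literal is not).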

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..e6456550a0be
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,832 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_config.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static unsigned dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = calloc(1, sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	intf->port = port;
+	strlcpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port; this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program fcode;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &fcode, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&fcode);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("BPF code\n");
+		bpf_dump(&fcode, dump_bpf);
+
+		if (dump_bpf > 1) {
+			unsigned int i;
+
+			printf("\nEBPF code\n");
+			for (i = 0; i < bpf_prm->nb_ins; ++i) {
+				const struct ebpf_insn *ins = &bpf_prm->ins[i];
+
+				printf("{ %#04x, %2u, %2u, %4u, %#10x },\n",
+				    ins->code, ins->dst_reg, ins->src_reg,
+				    ins->off, ins->imm);
+			}
+		}
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&fcode);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			++dump_bpf;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the monotonic clock time in nanoseconds (not wall-clock time) */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will pdump. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(rte_pcapng_t *out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	const char *args[] = {
+		progname, "--proc-type", "secondary",
+		"--log-level", "bpf:debug"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	for (i = 0; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+static void *create_output(void)
+{
+	struct utsname uts;
+	char os[256];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+		if (asprintf(&output_name, "/tmp/%s_%u_%s_%s.%s",
+			     progname, intf->port, intf->name, ts,
+			     use_pcapng ? "pcapng" : "pcap") < 0)
+			rte_panic("asprintf failed\n");
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		rte_pcapng_t *pcapng;
+
+		if (uname(&uts) < 0)
+			strcpy(os, "unknown");
+		else
+			snprintf(os, sizeof(os), "%s %s",
+				 uts.sysname, uts.release);
+
+		pcapng = rte_pcapng_fdopen(fd, os, NULL, version(), capture_comment);
+		if (pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		return pcapng;
+	} else {
+		pcap_dumper_t *dumper;
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+		return dumper;
+	}
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(rte_errno));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(void *out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out, pkts, n);
+	else
+		written = pcap_write_packets(out, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	void *out;
+
+	progname = basename(argv[0]);
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+	if (out == NULL)
+		rte_exit(EXIT_FAILURE, "can not open output file: %s\n",
+			 rte_strerror(rte_errno));
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out);
+	else
+		pcap_dump_close(out);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_prm);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2
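
A note on the ring-size rounding in create_ring(): the expression
built on __builtin_clzl(size - 1) is undefined for a size of 1 (the
builtin's argument is then 0) and shifts out of range for a size of 0.
One portable alternative is the usual bit-smearing trick, sketched
below under the assumption that a 32-bit ring size is enough.
round_up_pow2() is a hypothetical helper name. DPDK's rte_common.h
offers rte_align32pow2() for the same job, which may be the better
choice inside the tree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round v up to the next power of two (v itself if it already is one).
 * Returns 0 if v is 0 or if the result does not fit in 32 bits.
 */
static uint32_t
round_up_pow2(uint32_t v)
{
	uint32_t r;

	if (v == 0)
		return 0;

	/* Smear the highest set bit of v - 1 into every lower position. */
	r = v - 1;
	r |= r >> 1;
	r |= r >> 2;
	r |= r >> 4;
	r |= r >> 8;
	r |= r >> 16;
	r += 1;		/* wraps to 0 when v > 2^31 */

	/* Result is either 0 (error) or has exactly one bit set. */
	assert((r & (r - 1)) == 0);
	return r;
}
```

A caller would treat a return value of 0 as an invalid ring size and
reject it before creating the ring.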



* [dpdk-dev] [PATCH v3 7/8] doc: changes for new pcapng and dumpcap
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 6/8] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-08  4:50   ` Stephen Hemminger
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 8/8] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08  4:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in v21.11. The framework consists of two libraries:
+``librte_pdump`` for collecting packets and ``librte_pcapng`` for writing
+packets to a file. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` tool captures packets much as
+the Wireshark dumpcap does on Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the
+  ``librte_pdump`` library. The ``dpdk-pdump`` tool provides more limited
+  functionality and depends on the Pcap PMD. It is retained only for
+  compatibility reasons; users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the ``dpdk-dumpcap`` tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and
+trailer to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such
+as the timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b5738348b..ee24cbfdb99d 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,16 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Enhanced packet capture.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility, including capture of multiple interfaces
+    and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhanced the pdump library to support packet filtering
+    with BPF.
+  * Added Pcapng format output with timestamps and meta-data to pdump.
+  * Fixed pdump capture of packets with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool. The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes files in the Pcapng packet
+capture file format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In DPDK, only ``testpmd`` is modified to initialize the packet capture
+        framework; other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in the style of *tshark*, use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options: ``-I|--monitor-mode`` and ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2



* [dpdk-dev] [PATCH v3 8/8] MAINTAINERS: add entry for new pcapng and dumper
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 7/8] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-08  4:50   ` Stephen Hemminger
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08  4:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Claim responsibility for the new code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1dae8..06384ac2702d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1429,6 +1429,12 @@ F: app/test/test_pdump.*
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Packet dump
+M: Stephen Hemminger <stephen@networkplumber.org>
+F: lib/pcapng/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: app/dumpcap/
+F: doc/guides/tools/dumpcap.rst
 
 Packet Framework
 ----------------
-- 
2.30.2



* [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (6 preceding siblings ...)
  2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-08 17:16 ` Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 1/8] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (7 more replies)
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
                   ` (10 subsequent siblings)
  18 siblings, 8 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 17:16 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc.)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps

It preserves the old pdump command as is for those people
who never want to change.

v4 changes:
  - minor checkpatch fixes.
    Note: some of the checkpatch warnings are bogus and won't be fixed.
  - fix build of dumpcap on FreeBSD

v3 changes:
  - introduce packet filters using classic BPF to eBPF converter
    required small fix to DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup
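
For readers unfamiliar with the input side of the converter, here is a
minimal sketch of what a classic BPF program looks like. It is not the
converter in this series and does not use the DPDK BPF library: the
struct name, macro names and the cbpf_run() helper are hypothetical,
and the tiny interpreter understands only the three opcodes that the
filter "ip" needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One classic BPF instruction; same field layout as libpcap's struct bpf_insn. */
struct cbpf_insn {
	uint16_t code;
	uint8_t  jt;   /* instructions to skip when the test is true */
	uint8_t  jf;   /* instructions to skip when the test is false */
	uint32_t k;
};

/* Opcode values from the classic BPF encoding (macro names are illustrative). */
#define CBPF_LDH_ABS 0x28 /* A = 16-bit big-endian load at absolute offset k */
#define CBPF_JEQ_K   0x15 /* compare A with constant k */
#define CBPF_RET_K   0x06 /* return constant k (bytes to keep, 0 = drop) */

/* Classic BPF for the filter "ip": accept IPv4 over Ethernet. */
static const struct cbpf_insn ip_filter[] = {
	{ CBPF_LDH_ABS, 0, 0, 12 },     /* load EtherType */
	{ CBPF_JEQ_K,   0, 1, 0x0800 }, /* IPv4? fall through : skip one */
	{ CBPF_RET_K,   0, 0, 262144 }, /* accept */
	{ CBPF_RET_K,   0, 0, 0 },      /* drop */
};

/* Hypothetical interpreter for just the three opcodes above. */
static uint32_t
cbpf_run(const struct cbpf_insn *prog, size_t n,
	 const uint8_t *pkt, size_t len)
{
	uint32_t a = 0;
	size_t pc = 0;

	assert(prog != NULL && pkt != NULL);
	while (pc < n) {
		const struct cbpf_insn *in = &prog[pc++];

		switch (in->code) {
		case CBPF_LDH_ABS:
			if ((size_t)in->k + 2 > len)
				return 0; /* load past end of packet: drop */
			a = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
			break;
		case CBPF_JEQ_K:
			pc += (a == in->k) ? in->jt : in->jf;
			break;
		case CBPF_RET_K:
			return in->k;
		default:
			return 0; /* unsupported opcode in this sketch */
		}
	}
	return 0;
}
```

Classic BPF has a single accumulator and relative forward jumps, while
eBPF has numbered registers; rewriting one into the other is the job of
the converter.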


Stephen Hemminger (8):
  librte_pcapng: add new library for writing pcapng files
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new pcapng and dumper

 MAINTAINERS                                   |   6 +
 app/dumpcap/main.c                            | 827 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 570 ++++++++++++
 lib/bpf/bpf_dump.c                            | 118 +++
 lib/bpf/bpf_validate.c                        |   8 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   5 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 574 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 437 ++++++---
 lib/pdump/rte_pdump.h                         | 110 ++-
 lib/pdump/version.map                         |   8 +
 30 files changed, 3177 insertions(+), 215 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v4 1/8] librte_pcapng: add new library for writing pcapng files
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-08 17:16   ` Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 2/8] bpf: allow self-xor operation Stephen Hemminger
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 17:16 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 574 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 +++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 918 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..7bde8d9b75fd
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,574 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface description block for one capture interface */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2f1bb073df08
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for rte_pcapng_write_packets()
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The maximum number of packet bytes that will be captured,
+ *   that is, the snapshot length.
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbufs in *pkts* are not freed; the caller remains responsible for them.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * Pass UINT64_MAX for a counter, or 0 for a time, that is unknown or not reported.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread
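
Note on pcapng_tsc_to_ns() in the patch above: the product
delta * NSEC_PER_SEC wraps a 64-bit integer once delta exceeds
2^64 / 10^9, about 1.8e10 cycles, which is roughly six seconds
after startup with a 3 GHz TSC. One possible way around this,
shown only as a sketch, is to split the delta into whole seconds
and a remainder. The name cycles_to_ns and the hz parameter are
assumptions for illustration; in the library the frequency would
come from rte_get_tsc_hz().

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ull

/*
 * Convert a TSC cycle delta to nanoseconds without overflowing 64 bits.
 * hz is the TSC frequency; it is assumed to be non-zero and below
 * about 18 GHz so that rem * NS_PER_SEC still fits in a uint64_t.
 */
static uint64_t cycles_to_ns(uint64_t delta, uint64_t hz)
{
	uint64_t secs = delta / hz;	/* whole seconds elapsed */
	uint64_t rem = delta % hz;	/* leftover cycles, always < hz */

	return secs * NS_PER_SEC + rem * NS_PER_SEC / hz;
}
```

The result is rounded down to a whole nanosecond, the same as the
single-expression version, but it stays correct for captures that
run for hours.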

* [dpdk-dev] [PATCH v4 2/8] bpf: allow self-xor operation
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 1/8] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-08 17:16   ` Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 3/8] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 17:16 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When doing BPF filter program conversion, a common way
to zero a register in a single instruction is:
     xor r7,r7
The BPF validator would not allow this because the value of
r7 was undefined. But after this operation it is always zero.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..7647a7454dc2 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,12 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
-	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && ins->src_reg == ins->dst_reg)
+		err = NULL;
+	else
+		err = eval_defined((op != EBPF_MOV) ? rd : NULL,
+				   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread
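
Note on the self-xor check in the patch above: the condition
compares ins->src_reg with ins->dst_reg but does not look at the
source-operand bit, so an immediate-form xor whose src_reg field is
zero and whose destination is r0 would also match. Whether that
matters depends on how the rest of eval_alu() treats immediates.
A stricter predicate could look like the sketch below. The struct
and macro names are local stand-ins, not the DPDK definitions; the
numeric values follow the usual BPF opcode encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Opcode bits, following the classic BPF / eBPF encoding */
#define OP_MASK	0xf0	/* bits extracted by BPF_OP() */
#define OP_XOR	0xa0	/* BPF_XOR */
#define SRC_X	0x08	/* BPF_X: source operand is a register */

/* Cut-down stand-in for struct ebpf_insn */
struct insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
};

/* True only for "xor rN, rN" with a register source operand */
static bool is_self_xor(const struct insn *ins)
{
	return (ins->code & OP_MASK) == OP_XOR &&
	       (ins->code & SRC_X) != 0 &&
	       ins->dst_reg == ins->src_reg;
}
```

With this form, "xor r7, r7" is accepted as a register clear while
"xor r0, imm" still goes through the normal defined-value check.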

* [dpdk-dev] [PATCH v4 3/8] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 1/8] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 2/8] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-08 17:16   ` Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 4/8] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 17:16 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The code to convert was originally done as part of the Linux
kernel implementation then converted to a userspace program.
Both authors have agreed that it is allowable to license this
as BSD licensed in DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_convert.c | 570 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 606 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..198e6d359042
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,570 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ *
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* TODO: Not implemented yet */
+		RTE_BPF_LOG(ERR, "BPF extension LOAD ABS %u not supported\n",
+			    fp->k);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the jump address table */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -EINVAL;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a Classic BPF program from libpcap into DPDK BPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @return
+ *   Pointer to the parameters of the resulting Extended BPF program
+ *   (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error,
+ *   with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v4 4/8] bpf: add function to dump eBPF instructions
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 3/8] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-08 17:16   ` Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 5/8] pdump: support pcapng and filtering Stephen Hemminger
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 17:16 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
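Note for readers of the disassembler below: it works by splitting the
8-bit opcode into class, operation and source fields and using the
operation field as a table index. The following is a minimal,
self-contained sketch of that lookup. The names OP_CLASS, OP_OP, OP_SRC,
alu_names and alu_mnemonic() are hypothetical stand-ins for the
BPF_CLASS/BPF_OP/BPF_SRC macros and alu_op_tbl used in the patch, and
the NULL check for unassigned slots is an addition made here for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field masks of the 8-bit opcode, following the usual BPF encoding. */
#define OP_CLASS(code)	((code) & 0x07)	/* ld, ldx, st, stx, alu, jmp, ... */
#define OP_OP(code)	((code) & 0xf0)	/* operation within the class */
#define OP_SRC(code)	((code) & 0x08)	/* 0 = immediate, 0x08 = register */

/* ALU mnemonics indexed by (operation >> 4). */
static const char *const alu_names[16] = {
	"add", "sub", "mul", "div", "or", "and", "lsh", "rsh",
	"neg", "mod", "xor", "mov", "arsh", "endian",
	/* slots 14 and 15 are unassigned and stay NULL */
};

/* Return a printable mnemonic, never NULL. */
static const char *alu_mnemonic(uint8_t code)
{
	const char *name = alu_names[OP_OP(code) >> 4];

	return name != NULL ? name : "unknown";
}
```

Opcode 0xb7 (64-bit ALU class, operation 0xb0, immediate source)
decodes to "mov" with this table, and an opcode using one of the two
unassigned operation values decodes to "unknown" rather than passing
NULL to a "%s" conversion.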
 lib/bpf/bpf_dump.c  | 118 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 ++++++
 lib/bpf/version.map |   1 +
 4 files changed, 134 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..a6a431e64903
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+static void ebpf_dump(FILE *f, const struct ebpf_insn insn, size_t n)
+{
+	const char *op, *postfix = "";
+	uint8_t cls = BPF_CLASS(insn.code);
+
+	fprintf(f, " L%zu:\t", n);
+
+	switch (cls) {
+	default:
+		fprintf(f, "unimp 0x%x // class: %s\n", insn.code,
+			class_tbl[cls]);
+		break;
+	case BPF_ALU:
+		postfix = "32";
+		/* fall through */
+	case EBPF_ALU64:
+		op = alu_op_tbl[BPF_OP_INDEX(insn.code)];
+		if (BPF_SRC(insn.code) == BPF_X)
+			fprintf(f, "%s%s r%u, r%u\n", op, postfix, insn.dst_reg,
+				insn.src_reg);
+		else
+			fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		break;
+	case BPF_LD:
+		op = "ld";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		if (BPF_MODE(insn.code) == BPF_IMM)
+			fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_ABS)
+			fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_IND)
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+				insn.dst_reg, insn.src_reg, insn.imm);
+		else
+			fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+				insn.code);
+		break;
+	case BPF_LDX:
+		op = "ldx";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, insn.dst_reg,
+			insn.src_reg, insn.off);
+		break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+	case BPF_JMP:
+		op = jump_tbl[BPF_OP_INDEX(insn.code)];
+		if (op == NULL)
+			fprintf(f, "invalid jump opcode: %#x\n", insn.code);
+		else if (BPF_OP(insn.code) == BPF_JA)
+			fprintf(f, "%s L%d\n", op, L(n, insn.off));
+		else if (BPF_OP(insn.code) == EBPF_EXIT)
+			fprintf(f, "%s\n", op);
+		else
+			fprintf(f, "%s r%u, #0x%x, L%d\n", op, insn.dst_reg,
+				insn.imm, L(n, insn.off));
+		break;
+	case BPF_RET:
+		fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+			insn.code);
+		break;
+	}
+}
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i)
+		ebpf_dump(f, buf[i], i);
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+        'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2



* [dpdk-dev] [PATCH v4 5/8] pdump: support pcapng and filtering
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 4/8] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-08 17:16   ` Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 6/8] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 17:16 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair a
mismatched primary and secondary process fails.

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
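Note for readers of the pdump_copy() changes below: each burst is
split into packets rejected by the filter (return code 0), packets
copied, and copies dropped because the ring had no room, and each
outcome bumps a relaxed atomic counter. The following is a minimal,
self-contained sketch of that accounting. struct capture_stats and
account_burst() are hypothetical names; the patch itself uses
struct rte_pdump_stats and the GCC __atomic builtins, and it also
counts failed mbuf copies (nombuf), which this sketch does not
simulate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counterpart of the per-queue statistics in the patch. */
struct capture_stats {
	_Atomic uint64_t accepted;	/* packets that passed the filter and were copied */
	_Atomic uint64_t filtered;	/* packets rejected by the filter */
	_Atomic uint64_t nombuf;	/* copy failures (not simulated here) */
	_Atomic uint64_t ringfull;	/* copies dropped because the ring was full */
};

/*
 * Account for one burst. rcs[i] is the filter result for packet i
 * (0 means drop), ring_free is how many entries the ring can still
 * take. Returns the number of packets that would be enqueued.
 */
static unsigned int account_burst(struct capture_stats *st,
				  const uint64_t *rcs, unsigned int nb_pkts,
				  unsigned int ring_free)
{
	unsigned int i, copied = 0, enq;

	for (i = 0; i < nb_pkts; i++) {
		if (rcs[i] == 0) {
			atomic_fetch_add_explicit(&st->filtered, 1,
						  memory_order_relaxed);
			continue;
		}
		copied++;
	}

	/* As in the patch, accepted is counted before the enqueue attempt. */
	atomic_fetch_add_explicit(&st->accepted, copied, memory_order_relaxed);

	enq = copied < ring_free ? copied : ring_free;
	if (enq < copied)
		atomic_fetch_add_explicit(&st->ringfull, copied - enq,
					  memory_order_relaxed);
	return enq;
}
```

Relaxed ordering is enough in this sketch because the counters are
independent statistics and nothing else is synchronized through them.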
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 437 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 435 insertions(+), 126 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..4b01949fe715 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf and pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..9cb31d6b5008 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK using an incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter &&
+	    rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts) == 0) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts,
+				   __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to rte_bpf_eth callback.
+		 * if BPF program returns zero value for a given packet,
+		 * then it will be ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng then want to wrap packets
+		 * otherwise a simple copy.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +203,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +227,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +261,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +290,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +368,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +378,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -347,8 +421,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +476,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +517,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,23 +531,23 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	strlcpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
 	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
@@ -477,11 +568,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API can't depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +593,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +637,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +676,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +692,66 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		const struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..e2fbd78c6273 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum of both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a port, summed over its queues.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2
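
Note on the flag check in pdump_validate_flags() in the patch above: it is a
two-step bitmask test. At least one direction bit must be set, and no bit
outside the known set may be present. The following is a minimal standalone
sketch of that logic, not the library code. The enum values are copied from
rte_pdump.h as modified by this patch; the FLAG_* names, check_flags(), and
its negative-errno return convention are illustrative assumptions (the patch
itself sets rte_errno and returns -1).

```c
#include <errno.h>
#include <stdint.h>

/* Values copied from rte_pdump.h as modified by this patch. */
enum {
	FLAG_RX = 1,
	FLAG_TX = 2,
	FLAG_RXTX = FLAG_RX | FLAG_TX,
	FLAG_PCAPNG = 4,
};

/*
 * Hypothetical helper mirroring the two checks in pdump_validate_flags().
 * Returns 0 if the flags are valid, -EINVAL if no direction is selected,
 * and -ENOTSUP if a bit outside the known set is present.
 */
static int check_flags(uint32_t flags)
{
	/* Step 1: RX, TX, or both must be requested. */
	if ((flags & FLAG_RXTX) == 0)
		return -EINVAL;

	/* Step 2: reject any bit that is not a known flag. */
	if (flags & ~(uint32_t)(FLAG_RXTX | FLAG_PCAPNG))
		return -ENOTSUP;

	return 0;
}
```

With this ordering, a request carrying only the pcapng format bit is reported
as an invalid direction rather than as an unknown flag.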



* [dpdk-dev] [PATCH v4 6/8] app/dumpcap: add new packet capture application
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 5/8] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-08 17:16   ` Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 7/8] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 8/8] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 17:16 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace existing pdump.
The new application works like Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features, such as filtering, are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
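
Note on long options in parse_opts(): getopt_long() returns 0 for a long
option whose table entry has a NULL flag and val 0, and stores that entry's
position in the table through the index pointer. Matching on the entry's name
keeps the dispatch independent of table order. A minimal sketch, assuming a
reduced option table; find_capture_comment() is a hypothetical helper and is
not part of this patch.

```c
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the --capture-comment argument, or NULL. */
static const char *find_capture_comment(int argc, char **argv)
{
	/* Reduced copy of the table; only the second entry has val == 0. */
	static const struct option long_options[] = {
		{ "autostop",        required_argument, NULL, 'a' },
		{ "capture-comment", required_argument, NULL, 0 },
		{ NULL, 0, NULL, 0 },
	};
	const char *comment = NULL;
	int option_index = 0;
	int c;

	optind = 1; /* start scanning at the first argument */
	while ((c = getopt_long(argc, argv, "a:",
				long_options, &option_index)) != -1) {
		if (c != 0)
			continue; /* short option or error: not handled here */

		/* Identify the long option by name, not by table position. */
		if (strcmp(long_options[option_index].name,
			   "capture-comment") == 0)
			comment = optarg;
	}
	return comment;
}
```

Dispatching by name costs one strcmp() per flag-less long option, which is
negligible at start-up, and it keeps working if entries are added or reordered.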
 app/dumpcap/main.c      | 827 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 844 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
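
Note on ring sizing: create_ring() in main.c rounds the requested ring size up
to a power of two using __builtin_clzl(size - 1). GCC documents the result of
__builtin_clzl(0) as undefined, so a requested size of 1 would reach that
case. The sketch below shows one portable way to do the rounding without a
count-leading-zeros builtin. round_up_pow2() is a hypothetical helper, not
code from this patch; DPDK's rte_common.h also provides rte_align32pow2() for
the same purpose.

```c
#include <stdint.h>

/*
 * Hypothetical helper: round v up to the next power of two.
 * Returns 1 for v <= 1. For v greater than 2^31 the result wraps to 0,
 * so a caller would need to reject such sizes beforehand.
 */
static uint32_t round_up_pow2(uint32_t v)
{
	if (v <= 1)
		return 1;

	/* Smear the highest set bit of (v - 1) into every lower bit ... */
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;

	/* ... then add one to reach the next power of two. */
	return v + 1;
}
```

A value that is already a power of two, such as the default ring size of 2048,
is returned unchanged.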

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..66e25708b4d3
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,827 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = malloc(sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	memset(intf, 0, sizeof(*intf));
+	intf->port = port;
+	strlcpy(intf->name, name, sizeof(intf->name));
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port, this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return a monotonic (CLOCK_MONOTONIC) timestamp in nanoseconds */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will pdump. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(rte_pcapng_t *out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f%%)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	const char *args[] = {
+		progname, "--proc-type", "secondary",
+		"--log-level", "bpf:debug"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	for (i = 0; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size;
+
+	/*
+	 * Find next power of 2 >= ring_size.
+	 */
+	size = rte_align32pow2(ring_size);
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring: %s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+static void *create_output(void)
+{
+	static char tmp_path[PATH_MAX];
+	struct utsname uts;
+	char os[256];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		rte_pcapng_t *pcapng;
+
+		if (uname(&uts) < 0)
+			strcpy(os, "unknown");
+		else
+			snprintf(os, sizeof(os), "%s %s",
+				 uts.sysname, uts.release);
+
+		pcapng = rte_pcapng_fdopen(fd, os, NULL, version(), capture_comment);
+		if (pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		return pcapng;
+	} else {
+		pcap_dumper_t *dumper;
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+		return dumper;
+	}
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(-ret));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(void *out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out, pkts, n);
+	else
+		written = pcap_write_packets(out, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	void *out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+	if (out == NULL)
+		rte_exit(EXIT_FAILURE, "can not open output file: %s\n",
+			 rte_strerror(rte_errno));
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "capture file write failed: %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out);
+	else
+		pcap_dump_close(out);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_filter);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2



* [dpdk-dev] [PATCH v4 7/8] doc: changes for new pcapng and dumpcap
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 6/8] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-08 17:16   ` Stephen Hemminger
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 8/8] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 17:16 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
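The pdump_lib.rst text below says that with RTE_PDUMP_FLAG_PCAPNG the
mbuf data is framed as a Pcapng enhanced packet block. As a rough
illustration of what that framing costs per packet, here is a minimal
sketch based on the block layout in the pcapng draft. It assumes a block
with no options, so blocks produced by the library (which may carry
options, for example for the queue) can be larger. The names epb_pad4 and
epb_total_len are hypothetical helpers for this note, not part of
rte_pcapng.h.

```c
#include <stdint.h>

/*
 * Fixed part of an enhanced packet block (EPB) in the pcapng draft:
 * block type, block total length, interface ID, timestamp (high),
 * timestamp (low), captured length, original length = 7 x 32 bits.
 */
#define EPB_FIXED_LEN   28u

/* The block total length is repeated as a 32-bit trailer. */
#define EPB_TRAILER_LEN 4u

/* Hypothetical helper: round len up to a multiple of 4 bytes. */
static inline uint32_t epb_pad4(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/*
 * Hypothetical helper: size of an EPB holding caplen bytes of packet
 * data and no options. Packet data is padded to a 32-bit boundary.
 */
static inline uint32_t epb_total_len(uint32_t caplen)
{
	return EPB_FIXED_LEN + epb_pad4(caplen) + EPB_TRAILER_LEN;
}
```

Under these assumptions a minimum-size 60 byte Ethernet frame occupies a
92 byte block, which gives a feel for the per-packet overhead relative to
the default format that enqueues plain packet copies.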
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The framework consists of two libraries:
+``librte_pdump`` for collecting packets and ``librte_pcapng`` for writing
+packets to a file. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` tool captures packets in the same way
+that the Wireshark ``dumpcap`` utility does for Linux. It runs as a DPDK secondary process,
+captures packets from one or more interfaces, and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the
+  ``librte_pdump`` library. The ``dpdk-pdump`` tool provides more limited
+  functionality and depends on the Pcap PMD. It is retained only for
+  compatibility reasons; users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the dpdk-dump as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
-     reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
+     reading from file /tmp/capture.pcapng, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in the Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b5738348b..ee24cbfdb99d 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,16 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Enhance Packet capture.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    wireshark dumpcap utility including capture of multiple interfaces,
+    stopping after number of bytes, packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancement to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Fixes packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool. The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes capture files in the pcapng
+packet format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In DPDK, only ``testpmd`` is modified to initialize the packet capture
+        framework; other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in the style of *tshark*, use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it is a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2



* [dpdk-dev] [PATCH v4 8/8] MAINTAINERS: add entry for new pcapng and dumper
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 7/8] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-08 17:16   ` Stephen Hemminger
  7 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 17:16 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Claim responsibility for the new code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1dae8..06384ac2702d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1429,6 +1429,12 @@ F: app/test/test_pdump.*
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Packet dump
+M: Stephen Hemminger <stephen@networkplumber.org>
+F: lib/pcapng/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: app/dumpcap/
+F: doc/guides/tools/dumpcap.rst
 
 Packet Framework
 ----------------
-- 
2.30.2



* [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (7 preceding siblings ...)
  2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-08 21:50 ` Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 1/9] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (8 more replies)
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
                   ` (9 subsequent siblings)
  18 siblings, 9 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps

It preserves the old pdump command as is for those people
who never want to change.

v5 changes:
  - minor build and checkpatch fixes for RHEL/FreeBSD
  - disable lib/pdump on Windows. It was not useful before
    and now pdump depends on bpf.

v4 changes:
  - minor checkpatch fixes.
    Note: some of the checkpatch warnings are bogus and won't be fixed.
  - fix build of dumpcap on FreeBSD

v3 changes:
  - introduce packet filters using classic BPF to eBPF converter
    required small fix to DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup


Stephen Hemminger (9):
  librte_pcapng: add new library for writing pcapng files
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  lib: pdump is not supported on Windows
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new pcapng and dumper

 MAINTAINERS                                   |   6 +
 app/dumpcap/main.c                            | 827 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 570 ++++++++++++
 lib/bpf/bpf_dump.c                            | 118 +++
 lib/bpf/bpf_validate.c                        |   8 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 574 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 437 ++++++---
 lib/pdump/rte_pdump.h                         | 110 ++-
 lib/pdump/version.map                         |   8 +
 30 files changed, 3177 insertions(+), 216 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2



* [dpdk-dev] [PATCH v5 1/9] librte_pcapng: add new library for writing pcapng files
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-08 21:50   ` Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 2/9] bpf: allow self-xor operation Stephen Hemminger
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 574 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 +++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 918 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..7bde8d9b75fd
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,574 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
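+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t hz = rte_get_tsc_hz();
+	uint64_t delta = cycles - pcapng_time.cycles;
+	/* split so that (delta % hz) * NSEC_PER_SEC cannot overflow 64 bits */
+	return pcapng_time.ns + (delta / hz) * NSEC_PER_SEC
+		+ ((delta % hz) * NSEC_PER_SEC) / hz;
+}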
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface description block for one interface */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(6);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	/* alloca() cannot report failure; zero the buffer so padding is clean */
+	buf = alloca(len);
+	memset(buf, 0, len);
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x0004    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x0004    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that this really is a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2f1bb073df08
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution of the pcap format, created to address some of
+ * its deficiencies, namely the lack of extensibility and the inability to
+ * store additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for rte_pcapng_write_packets()
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For any value that is not known, pass the placeholder documented below.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2
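
For readers following the Enhanced Packet Block code above, here is a
small standalone sketch of the two bits of arithmetic it relies on:
padding the packet data to a 32-bit boundary and splitting the 64-bit
timestamp into two 32-bit words. This is only an illustration, not
library code; toy_pad32() and toy_split_ts() are made-up names.

```c
#include <stdint.h>

/* Hypothetical helper: round len up to the next multiple of 4. */
static uint32_t
toy_pad32(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/* Hypothetical helper: split a 64-bit timestamp into high/low words. */
static void
toy_split_ts(uint64_t ns, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(ns >> 32);
	*lo = (uint32_t)ns;
}
```

The number of zero bytes appended after the data is then
toy_pad32(data_len) - data_len, which is always between 0 and 3.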



* [dpdk-dev] [PATCH v5 2/9] bpf: allow self-xor operation
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 1/9] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-08 21:50   ` Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 3/9] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When converting BPF filter programs, a common way
to zero a register in a single instruction is:
     xor r7,r7
The BPF validator would not allow this because the value of
r7 was undefined. But after this operation it is always zero.
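
As a rough illustration (a standalone sketch, not the validator code;
struct toy_reg and toy_eval_xor() are made-up names), the rule amounts
to treating a self-xor as a write that never depends on the old value:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a verifier register: a value and a "defined" flag. */
struct toy_reg {
	uint64_t val;
	bool defined;
};

/*
 * Evaluate "xor dst, src" on the toy model.
 * Returns 0 on success, -1 if an undefined register would be read.
 */
static int
toy_eval_xor(struct toy_reg *regs, unsigned int dst, unsigned int src)
{
	if (dst == src) {
		/* x ^ x == 0 for every x, so the old contents do not matter */
		regs[dst].val = 0;
		regs[dst].defined = true;
		return 0;
	}

	if (!regs[dst].defined || !regs[src].defined)
		return -1;

	regs[dst].val ^= regs[src].val;
	return 0;
}
```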

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..7647a7454dc2 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,12 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
-	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && ins->src_reg == ins->dst_reg)
+		err = NULL;
+	else
+		err = eval_defined((op != EBPF_MOV) ? rd : NULL,
+				   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v5 3/9] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 1/9] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 2/9] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-08 21:50   ` Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 4/9] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from the old
format to the new one.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The conversion code was originally written as part of the Linux
kernel and later adapted into a userspace program (filter2xdp).
Both authors have agreed that it may be licensed under the BSD
license in DPDK.
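
The trickiest part of the conversion is fixing up jump offsets, since
one classic instruction may expand to several eBPF instructions. The
sketch below models that step in isolation. It is a simplified
illustration, not the code in bpf_convert.c; toy_layout() and
toy_jump_off() are hypothetical names.

```c
#include <stddef.h>

/*
 * Hypothetical helper: expand[i] is the number of eBPF instructions
 * that classic instruction i turns into, and prologue is the number of
 * eBPF instructions emitted before the first classic one.
 * Fills addrs[i] with the eBPF index where classic instruction i
 * starts and returns the total eBPF program length.
 */
static size_t
toy_layout(const unsigned int *expand, size_t len, size_t prologue,
	   int *addrs)
{
	size_t i, pos = prologue;

	for (i = 0; i < len; i++) {
		addrs[i] = (int)pos;
		pos += expand[i];
	}
	return pos;
}

/*
 * eBPF jump offsets are relative to the instruction after the jump.
 * slot is the position of the jump inside the expansion of classic
 * instruction i (0 when the jump is the first emitted instruction).
 */
static int
toy_jump_off(const int *addrs, size_t i, size_t target, int slot)
{
	return addrs[target] - addrs[i] - 1 - slot;
}
```

With a 3 instruction prologue and expansions of 1, 2, 6 and 2, the
classic instructions start at eBPF indexes 3, 4, 6 and 12, so a jump
emitted at index 3 that targets the last classic instruction needs an
offset of 8.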

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_convert.c | 570 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 606 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..198e6d359042
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,570 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ *
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* TODO: Not implemented yet */
+		RTE_BPF_LOG(ERR, "BPF extension LOAD ABS %u not supported\n",
+			    fp->k);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the cBPF to eBPF address map */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -EINVAL;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a Classic BPF program from libpcap into DPDK BPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @param prm
+ *  Result Extended BPF program.
+ * @return
+ *   Pointer to BPF program (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v5 4/9] bpf: add function to dump eBPF instructions
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 3/9] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-08 21:50   ` Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 5/9] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.
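
To give an idea of the decoding involved, here is a minimal standalone
sketch for ALU instructions only. It is not the code in bpf_dump.c;
toy_format_alu() and its table are made-up names, and the numeric
masks are the standard BPF opcode encoding (class in the low 3 bits,
source flag in bit 3, operation in the high 4 bits).

```c
#include <stdint.h>
#include <stdio.h>

/* Operation names indexed by the high nibble of the opcode. */
static const char *const toy_alu_names[16] = {
	"add", "sub", "mul", "div", "or", "and", "lsh", "rsh",
	"neg", "mod", "xor", "mov", "arsh", "endian",
};

/*
 * Format one ALU instruction into buf.
 * Returns the snprintf() result, or -1 for an unknown operation.
 */
static int
toy_format_alu(char *buf, size_t len, uint8_t code,
	       unsigned int dst, unsigned int src, int32_t imm)
{
	const char *op = toy_alu_names[(code & 0xf0) >> 4];
	/* class 0x04 is 32-bit ALU, class 0x07 is 64-bit ALU */
	const char *postfix = (code & 0x07) == 0x04 ? "32" : "";

	if (op == NULL)
		return -1;

	if (code & 0x08)	/* source operand is a register */
		return snprintf(buf, len, "%s%s r%u, r%u",
				op, postfix, dst, src);

	return snprintf(buf, len, "%s%s r%u, #0x%x",
			op, postfix, dst, (unsigned int)imm);
}
```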

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_dump.c  | 118 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 ++++++
 lib/bpf/version.map |   1 +
 4 files changed, 134 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..a6a431e64903
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+static void ebpf_dump(FILE *f, const struct ebpf_insn insn, size_t n)
+{
+	const char *op, *postfix = "";
+	uint8_t cls = BPF_CLASS(insn.code);
+
+	fprintf(f, " L%zu:\t", n);
+
+	switch (cls) {
+	default:
+		fprintf(f, "unimp 0x%x // class: %s\n", insn.code,
+			class_tbl[cls]);
+		break;
+	case BPF_ALU:
+		postfix = "32";
+		/* fall through */
+	case EBPF_ALU64:
+		op = alu_op_tbl[BPF_OP_INDEX(insn.code)];
+		if (BPF_SRC(insn.code) == BPF_X)
+			fprintf(f, "%s%s r%u, r%u\n", op, postfix, insn.dst_reg,
+				insn.src_reg);
+		else
+			fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		break;
+	case BPF_LD:
+		op = "ld";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		if (BPF_MODE(insn.code) == BPF_IMM)
+			fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_ABS)
+			fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_IND)
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+				insn.dst_reg, insn.src_reg, insn.imm);
+		else
+			fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+				insn.code);
+		break;
+	case BPF_LDX:
+		op = "ldx";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, insn.dst_reg,
+			insn.src_reg, insn.off);
+		break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+	case BPF_JMP:
+		op = jump_tbl[BPF_OP_INDEX(insn.code)];
+		if (op == NULL)
+			fprintf(f, "invalid jump opcode: %#x\n", insn.code);
+		else if (BPF_OP(insn.code) == BPF_JA)
+			fprintf(f, "%s L%d\n", op, L(n, insn.off));
+		else if (BPF_OP(insn.code) == EBPF_EXIT)
+			fprintf(f, "%s\n", op);
+		else
+			fprintf(f, "%s r%u, #0x%x, L%d\n", op, insn.dst_reg,
+				insn.imm, L(n, insn.off));
+		break;
+	case BPF_RET:
+		fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+			insn.code);
+		break;
+	}
+}
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i)
+		ebpf_dump(f, buf[i], i);
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+	'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2



* [dpdk-dev] [PATCH v5 5/9] lib: pdump is not supported on Windows
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 4/9] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-08 21:50   ` Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 6/9] pdump: support pcapng and filtering Stephen Hemminger
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The original version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built.

The new version of pdump with filtering now has a dependency
on bpf, but the bpf library is not available on Windows.

For now, just stop trying to build pdump on Windows.
Eventually, the bpf library, pdump library, dumpcap tool,
and pdump tool can be converted to work on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..ba88e9eabc58 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -85,7 +85,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
     ] # only supported libraries for windows
 endif
 
-- 
2.30.2



* [dpdk-dev] [PATCH v5 6/9] pdump: support pcapng and filtering
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 5/9] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-09-08 21:50   ` Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 7/9] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair a
mismatched primary and secondary process fails.

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. The library keeps statistics
on packets captured, filtered, and missed (because the ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 437 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 435 insertions(+), 126 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index ba88e9eabc58..aacc33a0f0c0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf and pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..f2047ad9f001 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK using an incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter &&
+	    rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts) == 0) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts,
+				   __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to the rte_bpf_eth callback:
+		 * if the BPF program returns zero for a given packet,
+		 * then the packet is ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng, the packets need to be wrapped;
+		 * otherwise a simple copy is made.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +203,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +227,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +261,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +290,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +368,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +378,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -347,8 +421,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +476,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +517,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,23 +531,23 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	strlcpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
 	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
@@ -477,11 +568,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original API
+ * left a placeholder for a future filter, it never checked the value.
+ * Therefore the API can't depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +593,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +637,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +676,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +692,66 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..be3fd14c4bd3 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum of both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v5 7/9] app/dumpcap: add new packet capture application
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 6/9] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-08 21:50   ` Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 8/9] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 9/9] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  8 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features, such as filtering, are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
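Note on the -a/--autostop option: each condition has the form name:value,
where name is duration (seconds), filesize (kB) or packets. The sketch
below shows one way to parse such a string without modifying the input or
exiting on error. It is an illustration under assumptions, not the code in
this patch: struct stop_cond, parse_ulong(), key_is() and parse_autostop()
are hypothetical names, and the 1e9 second upper bound is an arbitrary
guard against overflow.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed stop conditions (0 = no limit). */
struct stop_cond {
	uint64_t duration_ns;
	unsigned long packets;
	size_t size_bytes;
};

/* Parse an unsigned number (base auto-detected); reject signs and junk. */
static bool parse_ulong(const char *s, unsigned long *out)
{
	char *endp;

	if (!isdigit((unsigned char)s[0]))
		return false;
	*out = strtoul(s, &endp, 0);
	return *endp == '\0';
}

/* Return true if the first klen bytes of opt spell exactly name. */
static bool key_is(const char *opt, size_t klen, const char *name)
{
	return strlen(name) == klen && strncmp(opt, name, klen) == 0;
}

/* Parse one "name:value" condition into *sc; false if malformed. */
static bool parse_autostop(const char *opt, struct stop_cond *sc)
{
	const char *colon = strchr(opt, ':');
	const char *value;
	unsigned long num;
	size_t klen;

	if (colon == NULL)
		return false;
	klen = (size_t)(colon - opt);
	value = colon + 1;

	if (key_is(opt, klen, "duration")) {
		char *endp;
		double secs = strtod(value, &endp);

		/* Reject empty, trailing junk, non-positive, NaN, huge. */
		if (endp == value || *endp != '\0' ||
		    !(secs > 0) || secs > 1e9)
			return false;
		sc->duration_ns = (uint64_t)(secs * 1e9);
		return true;
	}
	if (key_is(opt, klen, "filesize")) {
		if (!parse_ulong(value, &num))
			return false;
		sc->size_bytes = (size_t)num * 1024;	/* value is in kB */
		return true;
	}
	if (key_is(opt, klen, "packets")) {
		if (!parse_ulong(value, &num))
			return false;
		sc->packets = num;
		return true;
	}
	return false;
}
```

Returning false instead of calling rte_exit() keeps the parser usable
outside the application; the caller decides how to report the error.
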
 app/dumpcap/main.c      | 827 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 844 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..66e25708b4d3
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,827 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = malloc(sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	memset(intf, 0, sizeof(*intf));
+	intf->port = port;
+	strlcpy(intf->name, name, sizeof(intf->name));
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port, this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current CLOCK_MONOTONIC time in nanoseconds (not wall-clock time) */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will pdump. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to enable monitor: %d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to disable monitor: %d\n", ret);
+}
+
+static void
+report_packet_stats(rte_pcapng_t *out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f%%)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	const char *args[] = {
+		progname, "--proc-type", "secondary",
+		"--log-level", "bpf:debug"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	for (i = 0; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+static void *create_output(void)
+{
+	static char tmp_path[PATH_MAX];
+	struct utsname uts;
+	char os[256];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		rte_pcapng_t *pcapng;
+
+		if (uname(&uts) < 0)
+			strcpy(os, "unknown");
+		else
+			snprintf(os, sizeof(os), "%s %s",
+				 uts.sysname, uts.release);
+
+		pcapng = rte_pcapng_fdopen(fd, os, NULL, version(), capture_comment);
+		if (pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		return pcapng;
+	} else {
+		pcap_dumper_t *dumper;
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+		return dumper;
+	}
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(rte_errno));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(void *out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out, pkts, n);
+	else
+		written = pcap_write_packets(out, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	void *out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+	if (out == NULL)
+		rte_exit(EXIT_FAILURE, "can not open output file: %s\n",
+			 rte_strerror(rte_errno));
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out);
+	else
+		pcap_dump_close(out);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_prm);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+	build = false
+	reason = 'not supported on Windows'
+	subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v5 8/9] doc: changes for new pcapng and dumpcap
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 7/9] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-08 21:50   ` Stephen Hemminger
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 9/9] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  8 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The DPDK packet capture framework consists of the
+libraries for collecting packets ``librte_pdump`` and writing packets
+to a file ``librte_pcapng``. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` is a tool that captures packets
+like Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the ``librte_pdump``
+  library. The ``dpdk-pdump`` tool provides more limited functionality
+  and depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the dpdk-dumpcap tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b5738348b..ee24cbfdb99d 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,16 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Enhance Packet capture.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility, including capture of multiple interfaces
+    and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancement to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Fixes packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool.  The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes files in the Pcapng
+capture file format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In DPDK, only ``testpmd`` is modified to initialize the packet capture
+        framework; other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in style of *tshark* use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 000:00:03.0
+   1. 000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it is a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2
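
For readers unfamiliar with the enhanced packet block framing that the
pdump_lib.rst text in this patch refers to, here is a minimal sketch of the
block layout as the pcapng draft describes it: a fixed header, the packet
data padded to 32 bits, and a trailing copy of the block length. The helper
name epb_wrap() and its parameters are hypothetical and are not part of this
patch set; the real library works on mbufs and may also append options such
as the queue number.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fixed part of a Pcapng enhanced packet block (block type 6). */
struct epb_header {
	uint32_t block_type;
	uint32_t block_length;
	uint32_t interface_id;
	uint32_t timestamp_hi;
	uint32_t timestamp_lo;
	uint32_t capture_length;
	uint32_t original_length;
};

#define EPB_BLOCK_TYPE 6u

/* Round a length up to the 32-bit boundary that Pcapng requires. */
static uint32_t pad4(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/*
 * Hypothetical helper (not from the patch): wrap caplen bytes of packet
 * data in an enhanced packet block without options.
 * Returns a malloc'ed buffer that the caller frees and stores the total
 * block size in *out_len, or returns NULL if allocation fails.
 */
static uint8_t *
epb_wrap(const uint8_t *pkt, uint32_t caplen, uint32_t origlen,
	 uint32_t ifindex, uint64_t ts_ns, uint32_t *out_len)
{
	uint32_t total = (uint32_t)sizeof(struct epb_header)
		+ pad4(caplen) + (uint32_t)sizeof(uint32_t);
	struct epb_header hdr;
	uint8_t *buf;

	buf = calloc(1, total);	/* zero fill also clears the padding */
	if (buf == NULL)
		return NULL;

	hdr = (struct epb_header) {
		.block_type = EPB_BLOCK_TYPE,
		.block_length = total,
		.interface_id = ifindex,
		.timestamp_hi = (uint32_t)(ts_ns >> 32),
		.timestamp_lo = (uint32_t)ts_ns,
		.capture_length = caplen,
		.original_length = origlen,
	};
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), pkt, caplen);

	/* the block length is repeated after the data so readers can walk backwards */
	memcpy(buf + total - sizeof(uint32_t), &total, sizeof(total));

	*out_len = total;
	return buf;
}
```

A 5 byte capture of a 60 byte packet produces a 40 byte block with this
sketch: 28 bytes of header, 8 bytes of padded data and the 4 byte trailer.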



* [dpdk-dev] [PATCH v5 9/9] MAINTAINERS: add entry for new pcapng and dumper
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 8/9] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-08 21:50   ` Stephen Hemminger
  8 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-08 21:50 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Claim responsibility for the new code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1dae8..06384ac2702d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1429,6 +1429,12 @@ F: app/test/test_pdump.*
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Packet dump
+M: Stephen Hemminger <stephen@networkplumber.org>
+F: lib/pcapng/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: app/dumpcap/
+F: doc/guides/tools/dumpcap.rst
 
 Packet Framework
 ----------------
-- 
2.30.2



* [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (8 preceding siblings ...)
  2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-09 23:33 ` Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 01/10] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (9 more replies)
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
                   ` (8 subsequent siblings)
  18 siblings, 10 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps

It preserves the old pdump command as is for those people
who never want to change.

v6 changes:
  - add a test for bpf converter (and fix one bug it found).

v5 changes:
  - minor build and checkpatch fixes for RHEL/FreeBSD
  - disable lib/pdump on Windows. It was not useful before
    and now pdump depends on bpf.

v4 changes:
  - minor checkpatch fixes.
    Note: some of the checkpatch warnings are bogus and won't be fixed.
  - fix build of dumpcap on FreeBSD

v3 changes:
  - introduce packet filters using classic BPF to eBPF converter
    required small fix to DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup

Stephen Hemminger (10):
  librte_pcapng: add new library for writing pcapng files
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  test: add test for bpf_convert
  lib: pdump is not supported on Windows
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new pcapng and dumper

 MAINTAINERS                                   |   6 +
 app/dumpcap/main.c                            | 827 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/test_bpf.c                           | 148 ++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 118 +++
 lib/bpf/bpf_validate.c                        |   8 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 574 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 437 ++++++---
 lib/pdump/rte_pdump.h                         | 110 ++-
 lib/pdump/version.map                         |   8 +
 31 files changed, 3330 insertions(+), 216 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2
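
Since timestamps are one of the items listed in this cover letter, here is a
small sketch of converting a cycle counter reading into nanoseconds since the
epoch from a base sample taken at start-up. The names clock_base and
cycles_to_ns() are hypothetical and are not taken from the series. The
elapsed time is split into whole seconds and a remainder because a single
64-bit product of elapsed cycles and 10^9 wraps once the elapsed count
exceeds roughly 1.8 * 10^10 cycles, which is only a few seconds at GHz
rates.

```c
#include <stdint.h>

#define NS_PER_S 1000000000ull

/* Wall-clock time and cycle counter sampled together at start-up. */
struct clock_base {
	uint64_t ns;		/* nanoseconds since the epoch */
	uint64_t cycles;	/* counter value at the same instant */
};

/*
 * Hypothetical helper (not from the patch): convert a later counter
 * reading to nanoseconds since the epoch. hz is the counter frequency.
 * Splitting into seconds and remainder keeps each product within 64 bits
 * for any frequency below about 18 GHz.
 */
static uint64_t
cycles_to_ns(const struct clock_base *base, uint64_t cycles, uint64_t hz)
{
	uint64_t delta = cycles - base->cycles;
	uint64_t secs = delta / hz;
	uint64_t rem = delta % hz;

	return base->ns + secs * NS_PER_S + (rem * NS_PER_S) / hz;
}
```

With a 2 GHz counter, a reading taken one hour of cycles after the base
sample maps to the base time plus 3600 * 10^9 ns, a case where the
single-product form would already have wrapped.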



* [dpdk-dev] [PATCH v6 01/10] librte_pcapng: add new library for writing pcapng files
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 02/10] bpf: allow self-xor operation Stephen Hemminger
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/
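
As background on the file format referenced above: block options are stored
as code/length/value records, each padded to a 32-bit boundary, and a list of
options is closed by an end-of-options record with code 0 and length 0. The
sketch below shows that encoding on a plain byte buffer. The helper names
opt_size() and opt_put() are hypothetical and only illustrate the layout;
they are not the functions added by this patch.

```c
#include <stdint.h>
#include <string.h>

/* On-disk size of one option: 4 byte code/length header plus padded value. */
static size_t opt_size(uint16_t len)
{
	return 4 + (((size_t)len + 3u) & ~(size_t)3u);
}

/*
 * Hypothetical helper (not from the patch): append one option at buf and
 * return the position just past it. Fields are written in host byte
 * order, which a reader detects from the section header byte-order magic.
 * The caller must provide at least opt_size(len) bytes of space.
 */
static uint8_t *
opt_put(uint8_t *buf, uint16_t code, const void *data, uint16_t len)
{
	memcpy(buf, &code, sizeof(code));
	memcpy(buf + 2, &len, sizeof(len));
	if (len > 0)
		memcpy(buf + 4, data, len);

	/* zero the pad bytes that follow the value */
	memset(buf + 4 + len, 0, opt_size(len) - 4 - len);

	return buf + opt_size(len);
}
```

A 5 byte comment therefore occupies 12 bytes in the file, and the closing
end-of-options record occupies 4.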

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 574 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 +++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 918 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..7bde8d9b75fd
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,574 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface description block for one capture interface */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(6);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2f1bb073df08
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for rte_pcapng_write_packets()
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbufs in *pkts* are not freed; the caller must free them.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For optional fields, pass 0 (times) or UINT64_MAX (counters) if not known.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2
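
A note on pcapng_tsc_to_ns() in this patch: the product
delta * NSEC_PER_SEC is computed in 64 bits, so it wraps once delta
exceeds UINT64_MAX / 10^9 (about 1.8e10 cycles, under ten seconds at
2 GHz). The following is a minimal sketch of one overflow-safe
alternative, not part of the patch. The names toy_cycles_to_ns and
TOY_NSEC_PER_SEC are hypothetical, and hz is assumed to be non-zero
and below about 1.8e10.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ull

/*
 * Convert a cycle count to nanoseconds without overflowing 64 bits.
 * Assumption: 0 < hz < UINT64_MAX / TOY_NSEC_PER_SEC, so that
 * rem * TOY_NSEC_PER_SEC always fits (rem is smaller than hz).
 */
static uint64_t
toy_cycles_to_ns(uint64_t delta, uint64_t hz)
{
	uint64_t secs = delta / hz;	/* whole seconds */
	uint64_t rem = delta % hz;	/* leftover cycles, less than hz */

	return secs * TOY_NSEC_PER_SEC + (rem * TOY_NSEC_PER_SEC) / hz;
}
```

With hz = 3000000000, an hour of cycles (10800000000000) converts to
3600000000000 ns, whereas the single multiply would have wrapped.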



* [dpdk-dev] [PATCH v6 02/10] bpf: allow self-xor operation
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 01/10] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 03/10] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When converting BPF filter programs, a common way
to zero a register in a single instruction is:
     xor r7,r7
The BPF validator would not allow this because the value of
r7 was undefined. But after this operation it is always zero.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
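For readers unfamiliar with the validator, the rule can be shown with
a simplified model. This is only a sketch under stated assumptions:
the toy_* types and function below are hypothetical and are not the
structures used in lib/bpf/bpf_validate.c. It tracks only whether a
register holds a defined value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not the DPDK verifier's types. */
enum toy_op { TOY_ADD, TOY_XOR, TOY_MOV };

struct toy_insn {
	enum toy_op op;
	uint8_t dst_reg;
	uint8_t src_reg;
};

struct toy_state {
	bool defined[11];	/* r0..r10 */
};

/* Returns NULL if the instruction is accepted, else an error string. */
static const char *
toy_eval_alu(struct toy_state *st, const struct toy_insn *ins)
{
	bool self_xor = ins->op == TOY_XOR && ins->src_reg == ins->dst_reg;

	if (!self_xor) {
		/* MOV overwrites dst, so only src must be defined */
		if (ins->op != TOY_MOV && !st->defined[ins->dst_reg])
			return "destination register undefined";
		if (!st->defined[ins->src_reg])
			return "source register undefined";
	}

	/* The result is defined; after a self-xor it is zero. */
	st->defined[ins->dst_reg] = true;
	return NULL;
}
```

In this model an add on an undefined r7 is rejected, while
"xor r7,r7" is accepted and leaves r7 defined.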
 lib/bpf/bpf_validate.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..7647a7454dc2 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,12 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
-	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && ins->src_reg == ins->dst_reg)
+		err = NULL;
+	else
+		err = eval_defined((op != EBPF_MOV) ? rd : NULL,
+				   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v6 03/10] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 01/10] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 02/10] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  2021-09-10  7:59     ` Dmitry Kozlyuk
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 04/10] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (6 subsequent siblings)
  9 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The code to convert was originally done as part of the Linux
kernel implementation then converted to a userspace program.
Both authors have agreed that it is allowable to license this
as BSD licensed in DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
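One detail worth illustrating is the jump fix-up. A classic
instruction may expand to several eBPF instructions, so the converter
records where each classic instruction lands (addrs[]) and rewrites a
jump as addrs[target] - addrs[i] - 1. The sketch below is a
simplified model, not the code in bpf_convert.c. The toy_* helpers
and the expand[] array are hypothetical, and the jump is assumed to
be the only instruction emitted for classic instruction i.

```c
#include <assert.h>
#include <stddef.h>

/*
 * expand[i] is the number of eBPF instructions emitted for classic
 * instruction i; prologue is the number emitted before the first one.
 */
static void
toy_build_addrs(const int *expand, size_t len, int prologue, int *addrs)
{
	int pos = prologue;
	size_t i;

	for (i = 0; i < len; i++) {
		addrs[i] = pos;		/* index of first emitted insn */
		pos += expand[i];
	}
}

/* eBPF offsets are relative to the instruction after the jump. */
static int
toy_jump_off(const int *addrs, size_t i, size_t target)
{
	return addrs[target] - addrs[i] - 1;
}
```

With a three instruction prologue and expand = { 1, 5, 1, 1 }, addrs
becomes { 3, 4, 9, 10 }, and a jump from classic instruction 0 to
classic instruction 2 gets offset 5.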
 lib/bpf/bpf_convert.c | 570 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 606 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..198e6d359042
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,570 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ *
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* TODO: Not implemented yet */
+		RTE_BPF_LOG(ERR, "BPF extension LOAD ABS %u not supported\n",
+			    fp->k);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the classic-to-eBPF address map */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a Classic BPF program from libpcap into DPDK BPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @return
+ *   Pointer to the parameters of the resulting extended BPF program
+ *   (struct rte_bpf_prm, allocated with *rte_malloc*),
+ *   for use in future BPF operations such as rte_bpf_load();
+ *   the caller releases it with rte_free().
+ *   Returns NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2
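
Note on the jump fixup in bpf_convert_filter() above: one cBPF
instruction may expand into several eBPF instructions, so relative jump
offsets have to be recomputed from a table of output addresses. The
block below is a minimal, self-contained sketch of that idea only. The
names toy_insn, toy_layout and toy_relocate are hypothetical and not
part of the patch, and the sketch assumes the jump is always the last
instruction emitted for its source instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction, not a DPDK or libpcap type. */
struct toy_insn {
	int expand; /* number of output insns this source insn becomes */
	int k;      /* jump: number of source insns to skip */
};

/* First pass: record the output index of every source instruction
 * and return the total output length.
 */
static int
toy_layout(const struct toy_insn *prog, size_t len, int *addrs)
{
	int out = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		assert(prog[i].expand >= 1);
		addrs[i] = out;
		out += prog[i].expand;
	}
	return out;
}

/* Second pass: compute the relocated offset of the jump emitted for
 * source instruction i. The offset is relative to the instruction
 * following the jump. Returns -1 if the target is out of range.
 */
static int
toy_relocate(const struct toy_insn *prog, size_t len, size_t i,
	     const int *addrs, int *off)
{
	size_t target = i + (size_t)prog[i].k + 1;

	if (target >= len)
		return -1;

	/* addrs[target] - addrs[i] - 1, less the position of the
	 * jump inside its own expansion (expand - 1).
	 */
	*off = addrs[target] - addrs[i] - prog[i].expand;
	return 0;
}
```

The address table is only complete after a full pass over the source
program, which is why forward jumps cannot be resolved in a single pass.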



* [dpdk-dev] [PATCH v6 04/10] bpf: add function to dump eBPF instructions
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 03/10] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 05/10] test: add test for bpf_convert Stephen Hemminger
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_convert.c |   7 ++-
 lib/bpf/bpf_dump.c    | 118 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   1 +
 lib/bpf/rte_bpf.h     |  14 +++++
 lib/bpf/version.map   |   1 +
 5 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
index 198e6d359042..f649fa663edf 100644
--- a/lib/bpf/bpf_convert.c
+++ b/lib/bpf/bpf_convert.c
@@ -331,7 +331,12 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
 		case BPF_LD | BPF_IND | BPF_H:
 		case BPF_LD | BPF_IND | BPF_B:
 			/* All arithmetic insns map as-is. */
-			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
 			break;
 
 			/* Jump transformation cannot use BPF block macros
diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..a6a431e64903
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+static void ebpf_dump(FILE *f, const struct ebpf_insn insn, size_t n)
+{
+	const char *op, *postfix = "";
+	uint8_t cls = BPF_CLASS(insn.code);
+
+	fprintf(f, " L%zu:\t", n);
+
+	switch (cls) {
+	default:
+		fprintf(f, "unimp 0x%x // class: %s\n", insn.code,
+			class_tbl[cls]);
+		break;
+	case BPF_ALU:
+		postfix = "32";
+		/* fall through */
+	case EBPF_ALU64:
+		op = alu_op_tbl[BPF_OP_INDEX(insn.code)];
+		if (BPF_SRC(insn.code) == BPF_X)
+			fprintf(f, "%s%s r%u, r%u\n", op, postfix, insn.dst_reg,
+				insn.src_reg);
+		else
+			fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		break;
+	case BPF_LD:
+		op = "ld";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		if (BPF_MODE(insn.code) == BPF_IMM)
+			fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_ABS)
+			fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_IND)
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+				insn.dst_reg, insn.src_reg, insn.imm);
+		else
+			fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+				insn.code);
+		break;
+	case BPF_LDX:
+		op = "ldx";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, insn.dst_reg,
+			insn.src_reg, insn.off);
+		break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+	case BPF_JMP:
+		op = jump_tbl[BPF_OP_INDEX(insn.code)];
+		if (op == NULL)
+			fprintf(f, "invalid jump opcode: %#x\n", insn.code);
+		else if (BPF_OP(insn.code) == BPF_JA)
+			fprintf(f, "%s L%d\n", op, L(n, insn.off));
+		else if (BPF_OP(insn.code) == EBPF_EXIT)
+			fprintf(f, "%s\n", op);
+		else
+			fprintf(f, "%s r%u, #0x%x, L%d\n", op, insn.dst_reg,
+				insn.imm, L(n, insn.off));
+		break;
+	case BPF_RET:
+		fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+			insn.code);
+		break;
+	}
+}
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i)
+		ebpf_dump(f, buf[i], i);
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+        'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2
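
Note on the table lookups in bpf_dump.c above: the dumper splits each
opcode byte into class, size and operation fields and uses the shifted
field as a table index. The block below is a small, self-contained
sketch of that decoding. The SK_* macros and sk_* helpers are
hypothetical names local to this sketch, and the masks are assumed to
follow the usual BPF opcode layout (class in the low 3 bits, size in
bits 3-4, operation in the high 4 bits).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Field masks, local to this sketch. */
#define SK_CLASS(code) ((code) & 0x07)
#define SK_SIZE(code)  ((code) & 0x18)
#define SK_OP(code)    ((code) & 0xf0)
#define SK_SRC(code)   ((code) & 0x08)

/* Index into a 16-entry ALU or jump name table. */
static unsigned int
sk_op_index(uint8_t code)
{
	return SK_OP(code) >> 4;
}

/* Index into a 4-entry load/store size table. */
static unsigned int
sk_size_index(uint8_t code)
{
	return SK_SIZE(code) >> 3;
}

/* Jump offsets are relative to the instruction after the jump. */
static int
sk_jump_label(size_t pc, int16_t off)
{
	return (int)pc + 1 + off;
}

/* Format an unconditional jump in the style "ja L<target>". */
static int
sk_format_ja(char *buf, size_t len, size_t pc, int16_t off)
{
	assert(buf != NULL);
	return snprintf(buf, len, "ja L%d", sk_jump_label(pc, off));
}
```

With these helpers, opcode 0xb7 decodes to class 7 with operation index
11, which is the slot the patch's alu_op_tbl assigns to "mov".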



* [dpdk-dev] [PATCH v6 05/10] test: add test for bpf_convert
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 04/10] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 06/10] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Add some functional tests for the Classic BPF to DPDK BPF converter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 148 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 527c06b80708..68b09067bf56 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3233,3 +3234,150 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str, bool expected)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	uint8_t tbuf[sizeof(struct dummy_mbuf)];
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	test_ld_mbuf1_prepare(tbuf);
+	rc = rte_bpf_exec(bpf, tbuf);
+	if ((rc == 0) == expected)
+		ret = 0;
+	else
+		printf("%s@%d: failed match: expect %s 0 got %"PRIu64"\n",
+		       __func__, __LINE__, expected ? "==" : "<>",  rc);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet? */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	int ret;
+
+	ret = test_bpf_match(pcap, "ip", true);
+	ret |= test_bpf_match(pcap, "not ip", false);
+
+	return ret;
+}
+
+/* Some sample pcap filter strings from tcpdump man page */
+static const char *sample_filters[] = {
+	"host 192.168.1.100",
+	"src net 10",
+	"not stp",
+	"len = 128",
+	"ip host 1.1.1.1 and not 1.1.1.2",
+	"ip and not net 127.0.0.1",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else if (prm != NULL) {
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++) {
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+	}
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2



* [dpdk-dev] [PATCH v6 06/10] lib: pdump is not supported on Windows
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 05/10] test: add test for bpf_convert Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  2021-09-10  8:17     ` Dmitry Kozlyuk
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 07/10] pdump: support pcapng and filtering Stephen Hemminger
                     ` (3 subsequent siblings)
  9 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The original version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built.

The new version of pdump with filtering now depends on bpf,
but the bpf library is not available on Windows.

For now, just stop trying to build pdump on Windows.
Eventually, the bpf library, pdump library, dumpcap tool,
and pdump tool can be converted to work on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..ba88e9eabc58 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -85,7 +85,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
     ] # only supported libraries for windows
 endif
 
-- 
2.30.2



* [dpdk-dev] [PATCH v6 07/10] pdump: support pcapng and filtering
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 06/10] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 08/10] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of exposed API or ABI)
is intentionally increased so that any attempt to use a
mismatched primary/secondary process pair fails.

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because ring was full).
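
The accounting described above can be sketched roughly as follows.
This is a self-contained model only, not the code in rte_pdump.c: the
struct and function names are hypothetical, C11 atomics stand in for
the __atomic builtins, and mbuf copying and the ring are reduced to a
per-packet flag and a count of free slots. The counter names follow
the ones used in the diff (accepted, filtered, nombuf, ringfull).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters, named after the fields used in the patch. */
struct sketch_stats {
	_Atomic uint64_t accepted; /* passed the filter and were copied */
	_Atomic uint64_t filtered; /* rejected by the filter */
	_Atomic uint64_t nombuf;   /* copy could not be allocated */
	_Atomic uint64_t ringfull; /* copied but dropped, ring was full */
};

/*
 * rcs[i] is the filter verdict for packet i (0 means drop); pass NULL
 * when no filter is installed. copy_ok[i] models whether the copy of
 * packet i succeeded, ring_room models the free slots in the ring.
 * Returns the number of packets that would be enqueued.
 */
static unsigned int
sketch_account(struct sketch_stats *st, const uint64_t *rcs,
	       const int *copy_ok, unsigned int nb_pkts,
	       unsigned int ring_room)
{
	unsigned int i, copied = 0, enq;

	assert(st != NULL);
	for (i = 0; i < nb_pkts; i++) {
		if (rcs != NULL && rcs[i] == 0) {
			atomic_fetch_add_explicit(&st->filtered, 1,
						  memory_order_relaxed);
			continue;
		}
		if (!copy_ok[i]) {
			atomic_fetch_add_explicit(&st->nombuf, 1,
						  memory_order_relaxed);
			continue;
		}
		copied++;
	}

	atomic_fetch_add_explicit(&st->accepted, copied,
				  memory_order_relaxed);

	enq = copied < ring_room ? copied : ring_room;
	if (enq < copied)
		atomic_fetch_add_explicit(&st->ringfull, copied - enq,
					  memory_order_relaxed);
	return enq;
}
```

As in the diff, a packet dropped because the ring is full is counted in
both accepted and ringfull, so the two counters are not disjoint.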

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 437 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 435 insertions(+), 126 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index ba88e9eabc58..1da521ea6185 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf and pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..f2047ad9f001 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK using an incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter &&
+	    rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts) == 0) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts,
+				   __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to rte_bpf_eth callback.
+		 * if BPF program returns zero value for a given packet,
+		 * then it will be ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng then want to wrap packets
+		 * otherwise a simple copy.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +203,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +227,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +261,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +290,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +368,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +378,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -347,8 +421,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +476,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +517,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,23 +531,23 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	strlcpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
 	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
@@ -477,11 +568,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API cannot depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +593,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +637,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +676,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +692,66 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..be3fd14c4bd3 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are sum of both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2
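
Reviewer note on the flag and snaplen handling in the pdump patch above.
The sketch below is a stand-alone mirror of the checks added to
pdump_validate_flags() and of the snaplen default applied in pdump_enable().
It does not link against DPDK: the FLAG_* constants copy the values of the
RTE_PDUMP_FLAG_* enum from the patch, and validate_flags() and
effective_snaplen() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the RTE_PDUMP_FLAG_* enum in the patched rte_pdump.h. */
enum {
	FLAG_RX = 1,				/* receive direction */
	FLAG_TX = 2,				/* transmit direction */
	FLAG_RXTX = FLAG_RX | FLAG_TX,		/* both directions */
	FLAG_PCAPNG = 4,			/* pcapng output format */
};

/*
 * Hypothetical mirror of pdump_validate_flags():
 * at least one direction bit must be set, and no unknown bits are allowed.
 * Returns 0 if the flags are acceptable, -1 otherwise.
 */
static int validate_flags(uint32_t flags)
{
	if ((flags & FLAG_RXTX) == 0)
		return -1;	/* neither rx nor tx requested */

	if (flags & ~(uint32_t)(FLAG_RXTX | FLAG_PCAPNG))
		return -1;	/* a bit the library does not know about */

	return 0;
}

/*
 * Hypothetical mirror of the default in pdump_enable():
 * a snaplen of 0 is treated as "no limit".
 */
static uint32_t effective_snaplen(uint32_t snaplen)
{
	return snaplen == 0 ? UINT32_MAX : snaplen;
}
```

With these rules a caller may combine a direction with the pcapng format
bit, but the format bit alone is rejected, as is any bit above 4.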



* [dpdk-dev] [PATCH v6 08/10] app/dumpcap: add new packet capture application
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 07/10] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 09/10] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 10/10] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  9 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features, such as filtering, are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 827 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 844 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..66e25708b4d3
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,827 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = malloc(sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	memset(intf, 0, sizeof(*intf));
+	intf->port = port;
+	strlcpy(intf->name, name, sizeof(intf->name));
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port, this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* --capture-comment */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current monotonic clock time in nanoseconds */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will pdump. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(rte_pcapng_t *out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	const char *args[] = {
+		progname, "--proc-type", "secondary",
+		"--log-level", "bpf:debug"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	for (i = 0; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+static void *create_output(void)
+{
+	static char tmp_path[PATH_MAX];
+	struct utsname uts;
+	char os[256];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		rte_pcapng_t *pcapng;
+
+		if (uname(&uts) < 0)
+			strcpy(os, "unknown");
+		else
+			snprintf(os, sizeof(os), "%s %s",
+				 uts.sysname, uts.release);
+
+		pcapng = rte_pcapng_fdopen(fd, os, NULL, version(), capture_comment);
+		if (pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		return pcapng;
+	} else {
+		pcap_dumper_t *dumper;
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+		return dumper;
+	}
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(-ret));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(void *out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out, pkts, n);
+	else
+		written = pcap_write_packets(out, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	void *out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+	if (out == NULL)
+		rte_exit(EXIT_FAILURE, "can not open output file: %s\n",
+			 rte_strerror(rte_errno));
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out);
+	else
+		pcap_dump_close(out);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_filter);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2



* [dpdk-dev] [PATCH v6 09/10] doc: changes for new pcapng and dumpcap
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 08/10] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 10/10] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  9 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
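Note (not part of the commit): the pcapng_lib.rst text added here says
the library writes files in Pcapng format but does not show what such a
file starts with. The sketch below is only an illustration of the file
format, not the librte_pcapng implementation. It formats the smallest
valid Section Header Block (no options) in host byte order. The helper
names put_u32() and format_minimal_shb() and the PCAPNG_* macros are
hypothetical and do not exist in DPDK; the field layout and constants
are taken from the Pcapng draft referenced in the new document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAPNG_SHB_TYPE   0x0A0D0D0AU /* Section Header Block type */
#define PCAPNG_BYTE_MAGIC 0x1A2B3C4DU /* byte-order magic */
#define PCAPNG_SHB_MINLEN 28U         /* size of an SHB without options */

/* Hypothetical helper: store a 32-bit value in host byte order. */
static size_t put_u32(uint8_t *buf, size_t off, uint32_t v)
{
	memcpy(buf + off, &v, sizeof(v));
	return off + sizeof(v);
}

/*
 * Hypothetical helper: format a minimal Section Header Block into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t format_minimal_shb(uint8_t *buf, size_t cap)
{
	const uint16_t major = 1, minor = 0;     /* format version 1.0 */
	const uint64_t section_len = UINT64_MAX; /* -1: length not specified */
	size_t off = 0;

	if (cap < PCAPNG_SHB_MINLEN)
		return 0;

	off = put_u32(buf, off, PCAPNG_SHB_TYPE);
	off = put_u32(buf, off, PCAPNG_SHB_MINLEN); /* block total length */
	off = put_u32(buf, off, PCAPNG_BYTE_MAGIC);
	memcpy(buf + off, &major, sizeof(major));
	off += sizeof(major);
	memcpy(buf + off, &minor, sizeof(minor));
	off += sizeof(minor);
	memcpy(buf + off, &section_len, sizeof(section_len));
	off += sizeof(section_len);
	off = put_u32(buf, off, PCAPNG_SHB_MINLEN); /* length is repeated */

	assert(off == PCAPNG_SHB_MINLEN);
	return off;
}
```

A reader determines the byte order of the section from the magic value,
which is why the sketch can write every field in host order. A real
writer would follow this block with at least one Interface Description
Block before any packet blocks.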
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The DPDK packet capture framework consists of the
+libraries for collecting packets, ``librte_pdump``, and writing packets
+to a file, ``librte_pcapng``. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` tool captures packets in the same way
+that Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the
+  ``librte_pdump`` library. The ``dpdk-pdump`` tool provides more limited
+  functionality and depends on the Pcap PMD. It is retained only for
+  compatibility reasons; users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the ``dpdk-dumpcap`` tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used, the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` is set the mbuf data is extended with header and trailer
+to match the format of Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b5738348b..ee24cbfdb99d 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,16 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Enhance Packet capture.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    wireshark dumpcap utility including capture of multiple interfaces,
+    stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancement to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Fixes packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool.  The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes capture files in the
+Pcapng packet format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In dpdk, only the ``testpmd`` is modified to initialize packet capture
+        framework, other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than the testpmd, user
+        needs to explicitly modify that application to call packet capture
+        framework initialization code. Refer to ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in style of *tshark* use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-I`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -I 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v6 10/10] MAINTAINERS: add entry for new pcapng and dumper
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 09/10] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-09 23:33   ` Stephen Hemminger
  9 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-09 23:33 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Claim responsibility for the new code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1dae8..06384ac2702d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1429,6 +1429,12 @@ F: app/test/test_pdump.*
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Packet dump
+M: Stephen Hemminger <stephen@networkplumber.org>
+F: lib/pcapng/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: app/dumpcap/
+F: doc/guides/tools/dumpcap.rst
 
 Packet Framework
 ----------------
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v6 03/10] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 03/10] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-10  7:59     ` Dmitry Kozlyuk
  0 siblings, 0 replies; 220+ messages in thread
From: Dmitry Kozlyuk @ 2021-09-10  7:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

2021-09-09 16:33 (UTC-0700), Stephen Hemminger:
[...]
> +	prm = rte_zmalloc("bpf_filter",
> +			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
> +	if (prm == NULL) {
> +		rte_errno = ENOMEM;
> +		return NULL;
> +	}
> +
> +	/* The eBPF instructions in this case are right after the header */
> +	ebpf = (void *)(prm + 1);
> +
> +	/* 2nd pass: remap cBPF to eBPF instructions  */
> +	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
> +	if (ret < 0) {
> +		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
> +		free(prm);

free -> rte_free

> +		rte_errno = -ret;
> +		return NULL;
> +	}
[...]
> diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
> index 69116f36ba8b..2f23e272a376 100644
> --- a/lib/bpf/rte_bpf.h
> +++ b/lib/bpf/rte_bpf.h
> @@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
>  int
>  rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
>  
> +#ifdef RTE_PORT_PCAP

In librte_bpf the function for ELF loading is always declared, and defined as a
stub when libelf is unavailable. The app using it can link to DPDK with or
without ELF support. No strong opinion here, but using different approaches
is a bit messy.
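
For illustration only, a minimal sketch of the always-declared approach.
All names here (convert_filter, HAVE_PCAP, the struct types) are
hypothetical stand-ins, not the librte_bpf API, and the choice of ENOTSUP
for the stub is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical opaque types standing in for the cBPF input and eBPF result. */
struct cbpf_prog;
struct ebpf_prm;

/* Declared unconditionally, so callers compile with or without libpcap. */
struct ebpf_prm *convert_filter(const struct cbpf_prog *prog, int *err);

#ifndef HAVE_PCAP
/*
 * Stub used when the optional dependency is missing: the symbol still
 * exists, and the caller learns at run time that conversion is unavailable.
 * When HAVE_PCAP is defined the real implementation would live in another
 * source file (not shown here).
 */
struct ebpf_prm *convert_filter(const struct cbpf_prog *prog, int *err)
{
	assert(err != NULL);
	(void)prog;
	*err = ENOTSUP;
	return NULL;
}
#endif
```

With this layout the application needs no #ifdef of its own; it calls the
function and handles the "not supported" error like any other failure.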

> +
> +struct bpf_program;
> +
> +/**
> + * Convert a Classic BPF program from libpcap into a DPDK BPF code.
> + *
> + * @param prog
> + *  Classic BPF program from pcap_compile().
> + * @param prm
> + *  Result Extended BPF program.
> + * @return
> + *   Pointer to BPF program (allocated with *rte_malloc*)
> + *   that is used in future BPF operations,
> + *   or NULL on error, with error code set in rte_errno.
> + *   Possible rte_errno errors include:
> + *   - EINVAL - invalid parameter passed to function
> + *   - ENOMEM - can't reserve enough memory
> + */
> +__rte_experimental
> +struct rte_bpf_prm *
> +rte_bpf_convert(const struct bpf_program *prog);
> +
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v6 06/10] lib: pdump is not supported on Windows
  2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 06/10] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-09-10  8:17     ` Dmitry Kozlyuk
  0 siblings, 0 replies; 220+ messages in thread
From: Dmitry Kozlyuk @ 2021-09-10  8:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Narcisa Ana Maria Vasile, Dmitry Malloy, Pallavi Kadam

2021-09-09 16:33 (UTC-0700), Stephen Hemminger:
> The original version of the pdump library was building on
> Windows, but it was useless since the pdump utility was not being
> built.
> 
> The new version of pdump with filtering now has dependency
> on bpf. But bpf library is not available on Windows.
> 
> For now, just stop trying to build pdump on Windows.
> Eventually, bpf library, pdump library, dumpcap tool,
> and pdump tool can be converted to work on Windows.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
> Cc: Dmitry Malloy <dmitrym@microsoft.com>
> Cc: Pallavi Kadam <pallavi.kadam@intel.com>
> ---
>  lib/meson.build | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/lib/meson.build b/lib/meson.build
> index 51bf9c2d11f0..ba88e9eabc58 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -85,7 +85,6 @@ if is_windows
>              'gro',
>              'gso',
>              'latencystats',
> -            'pdump',
>      ] # only supported libraries for windows
>  endif
>  

Anyway pdump relies on multiprocess not supported on Windows.

Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (9 preceding siblings ...)
  2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-10 18:18 ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 01/11] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (10 more replies)
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                   ` (7 subsequent siblings)
  18 siblings, 11 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLAN's and timestamps

v7 changes:
  - add functional tests for pcapng lib
  - bug fix for error returns in pcapng lib
  - handle long osname on FreeBSD
  - resolve almost all checkpatch issues

Stephen Hemminger (11):
  librte_pcapng: add new library for writing pcapng files
  lib: pdump is not supported on Windows
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  test: add test for bpf_convert
  test: add a test for pcapng library
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new pcapng and dumper

 MAINTAINERS                                   |   6 +
 app/dumpcap/main.c                            | 844 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/meson.build                          |   1 +
 app/test/test_bpf.c                           | 148 +++
 app/test/test_pcapng.c                        | 190 ++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 118 +++
 lib/bpf/bpf_validate.c                        |   8 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 574 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 437 ++++++---
 lib/pdump/rte_pdump.h                         | 110 ++-
 lib/pdump/version.map                         |   8 +
 33 files changed, 3538 insertions(+), 216 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 app/test/test_pcapng.c
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v7 01/11] librte_pcapng: add new library for writing pcapng files
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 02/11] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 574 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 +++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 918 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map
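
Every pcapng block written by this library carries its options as
code/length/value records padded to a 32-bit boundary. The following is a
hedged, standalone sketch of that encoding using only libc; opt_hdr,
opt_len and opt_put are illustrative names, not the functions in the diff
below, and the caller is assumed to pass a zeroed buffer so that the pad
bytes are zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Option header as laid out in the file: code, length, then the data. */
struct opt_hdr {
	uint16_t code;
	uint16_t length;	/* data length, excluding padding */
};

/* Size of one option in the file: header plus data, padded to 32 bits. */
static uint32_t
opt_len(uint16_t data_len)
{
	uint32_t raw = (uint32_t)sizeof(struct opt_hdr) + data_len;

	return (raw + 3u) & ~3u;
}

/* Write one option at offset off and return the offset just past it. */
static uint32_t
opt_put(uint8_t *buf, uint32_t off, uint16_t code,
	const void *data, uint16_t data_len)
{
	struct opt_hdr h = { .code = code, .length = data_len };

	assert(off % 4 == 0);	/* options always start 32-bit aligned */
	memcpy(buf + off, &h, sizeof(h));
	if (data_len > 0)
		memcpy(buf + off + sizeof(h), data, data_len);

	return off + opt_len(data_len);
}
```

A five-byte comment therefore occupies 12 bytes, and the end-of-options
marker (code 0, length 0) occupies 4; summing opt_len() over all options
gives the figure that goes into the block length fields.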

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..f8280a8b01f4
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,574 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write a PCAPNG interface description block for one interface */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(6);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet  block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x0004    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during rte_pcapng_copy().
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2f1bb073df08
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param m
+ *   The mbuf to copy
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data.
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for rte_pcapng_write_packets()
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The upper limit on packet bytes to be captured
+ *   (the same value passed as length to rte_pcapng_copy()).
+ * @return
+ *   The minimum mbuf data size needed to hold a packet of length bytes,
+ *   accounting for the required header and trailer fields.
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For statistics, use 0 if the value is not known or not worth reporting.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2
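
For readers following the Enhanced Packet Block layout used by rte_pcapng_copy() and rte_pcapng_mbuf_size(), the arithmetic can be sketched outside DPDK. This is only an illustration under stated assumptions: it uses a flat byte buffer instead of an mbuf, writes fields in host byte order, and the names epb_header, align4, put_option, epb_length and epb_build are hypothetical and not part of the patch. The option codes (2 for flags, 6 for queue) and the 28-byte header come from the diagram in the patch.

```c
#include <stdint.h>
#include <string.h>

#define EPB_TYPE      0x00000006u
#define EPB_OPT_FLAGS 2u	/* option code for flags (direction) */
#define EPB_OPT_QUEUE 6u	/* option code for queue id */

/* Fixed part of the block: seven 32-bit words, 28 bytes (hypothetical name) */
struct epb_header {
	uint32_t block_type;
	uint32_t block_length;
	uint32_t interface_id;
	uint32_t timestamp_hi;
	uint32_t timestamp_lo;
	uint32_t capture_length;
	uint32_t original_length;
};

/* Round up to the next multiple of four bytes */
static uint32_t align4(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* Write one option with a 32-bit value: 4 byte header + 4 byte value */
static uint8_t *put_option(uint8_t *p, uint16_t code, uint32_t value)
{
	uint16_t len = sizeof(value);

	memcpy(p, &code, sizeof(code));
	memcpy(p + 2, &len, sizeof(len));
	memcpy(p + 4, &value, sizeof(value));
	return p + 8;
}

/* Total block size: header, padded data, two options, trailing length */
static uint32_t epb_length(uint32_t caplen)
{
	return sizeof(struct epb_header) + align4(caplen)
		+ 8 + 8 + sizeof(uint32_t);
}

/* Build a complete block in 'out'; returns the number of bytes written */
static uint32_t epb_build(uint8_t *out, const uint8_t *pkt,
			  uint32_t caplen, uint32_t origlen, uint64_t ns,
			  uint32_t flags, uint32_t queue)
{
	uint32_t total = epb_length(caplen);
	struct epb_header hdr = {
		.block_type = EPB_TYPE,
		.block_length = total,
		.interface_id = 0,
		.timestamp_hi = (uint32_t)(ns >> 32),
		.timestamp_lo = (uint32_t)ns,
		.capture_length = caplen,
		.original_length = origlen,
	};
	uint8_t *p = out;

	memcpy(p, &hdr, sizeof(hdr));
	p += sizeof(hdr);

	/* packet data, zero padded to a 32-bit boundary */
	memcpy(p, pkt, caplen);
	memset(p + caplen, 0, align4(caplen) - caplen);
	p += align4(caplen);

	p = put_option(p, EPB_OPT_FLAGS, flags);
	p = put_option(p, EPB_OPT_QUEUE, queue);

	/* the block length is repeated at the end of the block */
	memcpy(p, &total, sizeof(total));
	return total;
}
```

The caller has to provide at least epb_length(caplen) bytes in the output buffer. The patch does the same thing in place by prepending the header into the mbuf headroom and appending the padding, options and trailer to the tailroom.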



* [dpdk-dev] [PATCH v7 02/11] lib: pdump is not supported on Windows
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 01/11] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 03/11] bpf: allow self-xor operation Stephen Hemminger
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.

The new version of pdump with filtering now depends on the
bpf library, which is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..ba88e9eabc58 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -85,7 +85,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
     ] # only supported libraries for windows
 endif
 
-- 
2.30.2



* [dpdk-dev] [PATCH v7 03/11] bpf: allow self-xor operation
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 01/11] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 02/11] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 04/11] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When doing BPF filter program conversion, a common way
to zero a register in a single instruction is:
     xor r7,r7
The BPF validator would not allow this because the value of
r7 was undefined. But after this operation it is always zero.
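
The rule can be illustrated with a small stand-alone model of a validator that tracks which registers hold a defined value. This is a sketch only, not the code in bpf_validate.c: the names alu_op, reg_state, reg_is_defined and check_alu are hypothetical, and the real validator tracks value ranges rather than a single bit per register.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few ALU operations, enough to show the rule (hypothetical enum) */
enum alu_op { OP_ADD, OP_XOR, OP_MOV, OP_NEG };

/* Bit n is set when register n holds a defined value */
struct reg_state {
	uint16_t defined;
};

static bool reg_is_defined(const struct reg_state *st, unsigned int r)
{
	return (st->defined >> r) & 1u;
}

/*
 * Check one "dst = dst op src" instruction.
 * Returns false if it would read an undefined register.
 * On success the destination is marked as defined.
 */
static bool check_alu(struct reg_state *st, enum alu_op op,
		      unsigned int dst, unsigned int src)
{
	/* xor of a register with itself is zero whatever it held before */
	bool self_xor = (op == OP_XOR && dst == src);

	if (!self_xor) {
		/* MOV overwrites dst, so dst need not be defined */
		if (op != OP_MOV && !reg_is_defined(st, dst))
			return false;
		/* NEG has no source operand */
		if (op != OP_NEG && !reg_is_defined(st, src))
			return false;
	}

	st->defined |= (uint16_t)(1u << dst);
	return true;
}
```

With an empty state, an add on r7 is rejected, while xor r7,r7 is accepted and leaves r7 defined for the instructions that follow.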

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..7647a7454dc2 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,12 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
-	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && ins->src_reg == ins->dst_reg)
+		err = NULL;
+	else
+		err = eval_defined((op != EBPF_MOV) ? rd : NULL,
+				   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v7 04/11] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 03/11] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 05/11] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.
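
One part of the conversion that is easy to get wrong is conditional jumps: a classic BPF jump carries both a "true" and a "false" offset, while an eBPF jump has a single target and otherwise falls through. The sketch below models only the decision made for each such jump. The names jmp_form, jmp_plan and plan_cond_jump are hypothetical, targets are expressed as classic BPF instruction indexes, and the extra move the converter emits for negative immediates is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

enum jmp_form {
	JMP_SINGLE,	/* one conditional jump to the 'true' target */
	JMP_INVERTED,	/* JEQ rewritten as JNE to the 'false' target */
	JMP_PAIR,	/* conditional jump followed by unconditional JA */
};

struct jmp_plan {
	enum jmp_form form;
	unsigned int n_insns;		/* eBPF instructions emitted */
	unsigned int cond_target;	/* target of the conditional jump */
	unsigned int ja_target;		/* target of the JA, JMP_PAIR only */
};

/*
 * Plan the eBPF form of the classic jump at index i
 * with true offset jt and false offset jf.
 */
static struct jmp_plan
plan_cond_jump(unsigned int i, uint8_t jt, uint8_t jf, bool is_jeq)
{
	/* general case: Jxx to the true target, then JA to the false one */
	struct jmp_plan p = { JMP_PAIR, 2, i + jt + 1, i + jf + 1 };

	if (jf == 0) {
		/* false branch is the next instruction: just fall through */
		p.form = JMP_SINGLE;
		p.n_insns = 1;
		p.ja_target = 0;
	} else if (jt == 0 && is_jeq) {
		/* true branch is the next instruction: invert the test */
		p.form = JMP_INVERTED;
		p.n_insns = 1;
		p.cond_target = i + jf + 1;
		p.ja_target = 0;
	}
	return p;
}
```

Because some classic instructions expand to more than one eBPF instruction, the targets computed here still have to be translated through a table of new instruction offsets before they can be stored in the output.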

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.
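
The allocation pattern behind this is a sizing pass followed by a single contiguous block that holds the header and the instructions together. A minimal sketch of that pattern follows, with assumptions stated plainly: calloc() stands in for rte_zmalloc(), the converter is a toy that expands odd input words to two output words, and the names prog_hdr, convert and convert_alloc are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Header followed directly by the instruction array (hypothetical) */
struct prog_hdr {
	uint32_t nb_ins;
	uint64_t *ins;
};

/*
 * Toy converter: every input word yields one output word and
 * odd words yield an extra zero word. With out == NULL only the
 * output length is computed.
 */
static uint32_t convert(const uint32_t *in, uint32_t n, uint64_t *out)
{
	uint32_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (out != NULL)
			out[count] = in[i];
		count++;
		if (in[i] & 1u) {
			if (out != NULL)
				out[count] = 0;
			count++;
		}
	}
	return count;
}

/* Size first, then allocate header and instructions as one block */
static struct prog_hdr *convert_alloc(const uint32_t *in, uint32_t n)
{
	uint32_t len = convert(in, n, NULL);
	struct prog_hdr *p;

	p = calloc(1, sizeof(*p) + len * sizeof(uint64_t));
	if (p == NULL)
		return NULL;

	/* the instructions start right after the header */
	p->ins = (uint64_t *)(p + 1);
	p->nb_ins = convert(in, n, p->ins);
	return p;
}
```

A single block means one call releases everything, and that call has to match the allocator: free() for this sketch, rte_free() for memory obtained from rte_zmalloc().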

The conversion code was originally written as part of the Linux
kernel implementation and later converted to a userspace program.
Both authors have agreed that it may be licensed under the
BSD license in DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_convert.c | 570 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 606 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..198e6d359042
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,570 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ *
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* TODO: Not implemented yet */
+		RTE_BPF_LOG(ERR, "BPF extension LOAD ABS %u not supported\n",
+			    fp->k);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the table of new instruction offsets */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourself. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a classic BPF program from libpcap into DPDK eBPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ *  It is only read; the caller keeps ownership and releases it
+ *  as usual (for example with pcap_freecode()).
+ * @return
+ *   Pointer to BPF load parameters (allocated with *rte_malloc*)
+ *   that can be passed to rte_bpf_load(),
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2
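
Reviewer note on the two-pass pattern in rte_bpf_convert(): the
converter is first called with a NULL output buffer to learn the
length, then a single allocation holds the header followed by the
instruction array, and a second call fills it. Below is a minimal
standalone sketch of that pattern only. All toy_* names are
hypothetical stand-ins and not DPDK types; calloc()/free() take the
place of rte_zmalloc()/rte_free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for ebpf_insn and rte_bpf_prm. */
struct toy_insn { uint8_t code; int32_t imm; };

struct toy_prm {
	const struct toy_insn *ins;	/* points just past this header */
	uint32_t nb_ins;
};

/*
 * Toy converter: every input value yields one instruction, and a
 * negative value yields an extra one in front of it.
 * With out == NULL nothing is written, only the length is computed.
 */
static int
toy_convert(const int32_t *in, size_t len, struct toy_insn *out,
	    uint32_t *out_len)
{
	uint32_t n = 0;
	size_t i;

	assert(out_len != NULL);
	for (i = 0; i < len; i++) {
		if (in[i] < 0) {
			if (out != NULL)
				out[n] = (struct toy_insn){ .code = 1, .imm = 0 };
			n++;
		}
		if (out != NULL)
			out[n] = (struct toy_insn){ .code = 2, .imm = in[i] };
		n++;
	}
	*out_len = n;
	return 0;
}

static struct toy_prm *
toy_build(const int32_t *in, size_t len)
{
	struct toy_prm *prm;
	struct toy_insn *ins;
	uint32_t n = 0;

	/* 1st pass: length only */
	if (toy_convert(in, len, NULL, &n) < 0)
		return NULL;

	/* one allocation: header followed by the instruction array */
	prm = calloc(1, sizeof(*prm) + n * sizeof(*ins));
	if (prm == NULL)
		return NULL;
	ins = (struct toy_insn *)(prm + 1);

	/* 2nd pass: fill the array */
	if (toy_convert(in, len, ins, &n) < 0) {
		free(prm);	/* release with the allocator that created it */
		return NULL;
	}

	prm->ins = ins;
	prm->nb_ins = n;
	return prm;
}
```

Because the array lives in the same allocation as the header, one
free of the returned pointer releases everything.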


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v7 05/11] bpf: add function to dump eBPF instructions
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 04/11] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 06/11] pdump: support pcapng and filtering Stephen Hemminger
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
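Not part of the commit: a small standalone sketch of how the dump
routine decodes a jump, for readers who want to try the table lookup
and label arithmetic without building DPDK. The SK_* macros and sk_*
functions are hypothetical names; the masks and opcode values are the
usual BPF encodings (class in the low 3 bits, operation in the high 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_CLASS(code)	((code) & 0x07)
#define SK_OP(code)	((code) & 0xf0)
#define SK_JMP		0x05

/* Indexed by the operation nibble, same idea as jump_tbl[] in the patch. */
static const char *const sk_jump_tbl[16] = {
	[0x00 >> 4] = "ja",	[0x10 >> 4] = "jeq",
	[0x20 >> 4] = "jgt",	[0x30 >> 4] = "jge",
	[0x40 >> 4] = "jset",	[0x50 >> 4] = "jne",
	[0x90 >> 4] = "exit",
};

/* Name of a jump opcode, or NULL when the table has no entry for it. */
static const char *
sk_jump_name(uint8_t code)
{
	assert(SK_CLASS(code) == SK_JMP);
	return sk_jump_tbl[SK_OP(code) >> 4];
}

/* Jump offsets are relative to the instruction following the jump. */
static int
sk_jump_target(size_t pc, int16_t off)
{
	return (int)pc + 1 + off;
}
```

A NULL result from the lookup is what the patch reports as an invalid
jump opcode; a negative offset simply produces a backward label.
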
 lib/bpf/bpf_convert.c |   7 ++-
 lib/bpf/bpf_dump.c    | 118 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   1 +
 lib/bpf/rte_bpf.h     |  14 +++++
 lib/bpf/version.map   |   1 +
 5 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
index 198e6d359042..f649fa663edf 100644
--- a/lib/bpf/bpf_convert.c
+++ b/lib/bpf/bpf_convert.c
@@ -331,7 +331,12 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
 		case BPF_LD | BPF_IND | BPF_H:
 		case BPF_LD | BPF_IND | BPF_B:
 			/* All arithmetic insns map as-is. */
-			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
 			break;
 
 			/* Jump transformation cannot use BPF block macros
diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..a6a431e64903
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+static void ebpf_dump(FILE *f, const struct ebpf_insn insn, size_t n)
+{
+	const char *op, *postfix = "";
+	uint8_t cls = BPF_CLASS(insn.code);
+
+	fprintf(f, " L%zu:\t", n);
+
+	switch (cls) {
+	default:
+		fprintf(f, "unimp 0x%x // class: %s\n", insn.code,
+			class_tbl[cls]);
+		break;
+	case BPF_ALU:
+		postfix = "32";
+		/* fall through */
+	case EBPF_ALU64:
+		op = alu_op_tbl[BPF_OP_INDEX(insn.code)];
+		if (BPF_SRC(insn.code) == BPF_X)
+			fprintf(f, "%s%s r%u, r%u\n", op, postfix, insn.dst_reg,
+				insn.src_reg);
+		else
+			fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		break;
+	case BPF_LD:
+		op = "ld";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		if (BPF_MODE(insn.code) == BPF_IMM)
+			fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_ABS)
+			fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_IND)
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+				insn.dst_reg, insn.src_reg, insn.imm);
+		else
+			fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+				insn.code);
+		break;
+	case BPF_LDX:
+		op = "ldx";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, insn.dst_reg,
+			insn.src_reg, insn.off);
+		break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+	case BPF_JMP:
+		op = jump_tbl[BPF_OP_INDEX(insn.code)];
+		if (op == NULL)
+			fprintf(f, "invalid jump opcode: %#x\n", insn.code);
+		else if (BPF_OP(insn.code) == BPF_JA)
+			fprintf(f, "%s L%d\n", op, L(n, insn.off));
+		else if (BPF_OP(insn.code) == EBPF_EXIT)
+			fprintf(f, "%s\n", op);
+		else
+			fprintf(f, "%s r%u, #0x%x, L%d\n", op, insn.dst_reg,
+				insn.imm, L(n, insn.off));
+		break;
+	case BPF_RET:
+		fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+			insn.code);
+		break;
+	}
+}
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i)
+		ebpf_dump(f, buf[i], i);
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+	'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2
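
Reviewer note on the bpf_convert.c hunk in this patch: the arithmetic
mapping now fills the instruction field by field, and the source
register is X only for the register form of the opcode. A standalone
sketch of that rule follows. The sk_* names are hypothetical, and the
register numbers chosen for A and X (0 and 7) are an assumption made
for illustration, not taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define SK_SRC(code)	((code) & 0x08)	/* source operand selector */
#define SK_X		0x08		/* operand is register X */
#define SK_REG_A	0		/* assumed register for A */
#define SK_REG_X	7		/* assumed register for X */

/* Same field layout idea as an eBPF instruction. */
struct sk_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Map a classic ALU opcode as-is, choosing src_reg from the opcode. */
static struct sk_insn
sk_map_alu(uint8_t code, uint32_t k)
{
	struct sk_insn insn = { 0 };

	insn.code = code;
	insn.dst_reg = SK_REG_A;
	/* immediate form: leave the source register at zero */
	insn.src_reg = SK_SRC(code) == SK_X ? SK_REG_X : 0;
	insn.off = 0;
	insn.imm = (int32_t)k;
	return insn;
}
```

With this rule the immediate form no longer carries a stray register
number, which also keeps the dump output for such instructions clean.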


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v7 06/11] pdump: support pcapng and filtering
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 05/11] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 07/11] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of exposed API or ABI)
is intentionally increased so that any attempt to pair a
mismatched primary and secondary process fails.

Add a new API to allow filtering of captured packets with
a DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring
was full or no mbuf was available).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 437 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 435 insertions(+), 126 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index ba88e9eabc58..1da521ea6185 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf and pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..f2047ad9f001 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK using the incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter &&
+	    rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts) == 0) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts,
+				   __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to rte_bpf_eth callback.
+		 * if BPF program returns zero value for a given packet,
+		 * then it will be ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng then want to wrap packets
+		 * otherwise a simple copy.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +203,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +227,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +261,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +290,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +368,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +378,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -347,8 +421,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +476,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +517,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,23 +531,23 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	strlcpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
 	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
@@ -477,11 +568,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API cannot rely on the application passing a valid
+ * (NULL) value, and the legacy entry points ignore the argument.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +593,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +637,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, UINT32_MAX,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +676,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +692,66 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..be3fd14c4bd3 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum over both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2
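
Note for readers of the stats API in this patch: rte_pdump_stats() fills in
one set of counters per port, summed over the receive and transmit queues.
The sketch below shows that kind of aggregation in plain C. It is illustrative
only: struct capture_stats and the helper names are hypothetical stand-ins,
not the DPDK definitions, and the received/dropped split is an assumption
based on how the dumpcap application in this series reports its counters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue counters. The field names follow the public
 * members of struct rte_pdump_stats, but this is NOT the DPDK type. */
struct capture_stats {
	uint64_t accepted; /* packets accepted by the filter */
	uint64_t filtered; /* packets rejected by the filter */
	uint64_t nombuf;   /* mbuf allocation failures */
	uint64_t ringfull; /* packets missed because the ring was full */
};

/* Add the counters of nq queues into *total. The caller zeroes *total
 * first, so the same total can accumulate both RX and TX queues. */
static void
sum_queue_stats(const struct capture_stats *queues, size_t nq,
		struct capture_stats *total)
{
	size_t i;

	for (i = 0; i < nq; i++) {
		total->accepted += queues[i].accepted;
		total->filtered += queues[i].filtered;
		total->nombuf += queues[i].nombuf;
		total->ringfull += queues[i].ringfull;
	}
}

/* Packets seen by the capture hook, whether or not the filter kept them. */
static uint64_t
stats_received(const struct capture_stats *s)
{
	return s->accepted + s->filtered;
}

/* Packets lost after the filter accepted them. */
static uint64_t
stats_dropped(const struct capture_stats *s)
{
	return s->nombuf + s->ringfull;
}
```

A caller would zero one struct capture_stats, call sum_queue_stats() once for
the RX array and once for the TX array, and then derive the two totals.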



* [dpdk-dev] [PATCH v7 07/11] app/dumpcap: add new packet capture application
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 06/11] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 08/11] test: add test for bpf_convert Stephen Hemminger
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features such as filtering are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
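Note: the -a/--autostop option takes a "name:value" argument (duration,
filesize in kB, or packets). The standalone sketch below shows one way such
an argument could be parsed. It is not the code in this patch: struct
stop_cond and parse_autostop() are hypothetical names, and the sketch returns
-1 on bad input where the application calls rte_exit().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the stop conditions; zero means "not set". */
struct stop_cond {
	uint64_t duration_ns;  /* stop after this many nanoseconds */
	unsigned long packets; /* stop after this many packets */
	size_t size;           /* stop when the file reaches this many bytes */
};

/* Parse "name:value" in place (the ':' is overwritten with '\0').
 * Returns 0 on success, -1 if the argument is malformed or unknown. */
static int
parse_autostop(char *opt, struct stop_cond *stop)
{
	char *value, *endp;

	value = strchr(opt, ':');
	if (value == NULL)
		return -1;
	*value++ = '\0';
	if (*value == '\0')
		return -1;

	if (strcmp(opt, "duration") == 0) {
		double secs = strtod(value, &endp);

		if (*endp != '\0' || secs <= 0)
			return -1;
		stop->duration_ns = (uint64_t)(secs * 1e9);
	} else if (strcmp(opt, "filesize") == 0) {
		unsigned long kb = strtoul(value, &endp, 0);

		if (*endp != '\0')
			return -1;
		stop->size = (size_t)kb * 1024;
	} else if (strcmp(opt, "packets") == 0) {
		unsigned long n = strtoul(value, &endp, 0);

		if (*endp != '\0')
			return -1;
		stop->packets = n;
	} else {
		return -1;
	}
	return 0;
}
```

The argument must be writable because the separator is replaced in place;
optarg from getopt_long() satisfies that.
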
 app/dumpcap/main.c      | 844 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 861 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..91a508e7af12
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,844 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+/* Can do either pcap or pcapng format output */
+typedef union {
+	rte_pcapng_t  *pcapng;
+	pcap_dumper_t *dumper;
+} dumpcap_out_t;
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = calloc(1, sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	intf->port = port;
+	strlcpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port; this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options[] */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return monotonic clock time in nanoseconds (not wall-clock time) */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm callback, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will this process. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(dumpcap_out_t out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out.pcapng, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	static const char * const args[] = {
+		"dumpcap", "--proc-type", "secondary",
+		"--log-level", "notice"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	eal_argv[0] = strdup(progname);
+	for (i = 1; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+/*
+ * Get Operating System information.
+ * Returns a string allocated via malloc().
+ */
+static char *get_os_info(void)
+{
+	struct utsname uts;
+	char *osname = NULL;
+
+	if (uname(&uts) < 0)
+		return NULL;
+
+	if (asprintf(&osname, "%s %s",
+		     uts.sysname, uts.release) == -1)
+		return NULL;
+
+	return osname;
+}
+
+static dumpcap_out_t create_output(void)
+{
+	dumpcap_out_t ret;
+	static char tmp_path[PATH_MAX];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		char *os = get_os_info();
+
+		ret.pcapng = rte_pcapng_fdopen(fd, os, NULL,
+					   version(), capture_comment);
+		if (ret.pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		free(os);
+	} else {
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		ret.dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (ret.dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+	}
+
+	return ret;
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(rte_errno));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out.pcapng, pkts, n);
+	else
+		written = pcap_write_packets(out.dumper, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	dumpcap_out_t out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out.pcapng);
+	else
+		pcap_dump_close(out.dumper);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_prm);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2
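
Note on create_ring() in this patch: it rounds the requested ring size up to
the next power of two using a count-leading-zeros builtin. The sketch below
shows the same technique for a 32-bit value with explicit guards, since the
builtin is undefined for an argument of zero. round_up_pow2() is a
hypothetical helper, not a DPDK function, and it assumes a GCC or Clang
compiler with a 32-bit unsigned int.

```c
#include <stdint.h>

/* Round n up to the next power of two.
 * 0 and 1 both map to 1. Returns 0 if the result would not fit
 * in 32 bits, so the caller can reject the request. */
static uint32_t
round_up_pow2(uint32_t n)
{
	if (n <= 1)
		return 1;
	if (n > (UINT32_C(1) << 31))
		return 0;

	/* n - 1 is at least 1 here, so __builtin_clz() is well defined.
	 * 32 - clz(n - 1) is the number of bits needed to hold n - 1. */
	return UINT32_C(1) << (32 - __builtin_clz(n - 1));
}
```

A value that is already a power of two is returned unchanged, so a caller can
compare the result with the request to decide whether to print a "rounded up"
message.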



* [dpdk-dev] [PATCH v7 08/11] test: add test for bpf_convert
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 07/11] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 09/11] test: add a test for pcapng library Stephen Hemminger
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Add some functional tests for the Classic BPF to DPDK BPF converter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 148 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 148 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 527c06b80708..565b19653a77 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3233,3 +3234,150 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str, bool expected)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	uint8_t tbuf[sizeof(struct dummy_mbuf)];
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	test_ld_mbuf1_prepare(tbuf);
+	rc = rte_bpf_exec(bpf, tbuf);
+	if ((rc == 0) == expected)
+		ret = 0;
+	else
+		printf("%s@%d: failed match: expect %s 0 got %"PRIu64"\n",
+		       __func__, __LINE__, expected ? "==" : "<>",  rc);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet? */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	int ret;
+
+	ret = test_bpf_match(pcap, "ip", true);
+	ret |= test_bpf_match(pcap, "not ip", false);
+
+	return ret;
+}
+
+/* Some sample pcap filter strings from tcpdump man page */
+static const char * const sample_filters[] = {
+	"host 192.168.1.100",
+	"src net 10",
+	"not stp",
+	"len = 128",
+	"ip host 1.1.1.1 and not 1.1.1.2",
+	"ip and not net 127.0.0.1",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else if (prm != NULL) {
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++)
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2



* [dpdk-dev] [PATCH v7 09/11] test: add a test for pcapng library
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 08/11] test: add test for bpf_convert Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 10/11] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 11/11] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Simple unit test that creates a pcapng file using the API.

To run this test you need to have at least one device.
For example:

DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
    --no-huge -m 2048 --vdev=net_tap,iface=dummy

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
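Note, not part of the patch: the file this test produces starts with a
pcapng section header block. As a rough illustration of what a pcapng
writer emits first, here is a minimal standalone sketch of that block,
assuming the field layout from the pcapng draft and no options. The
helper names put_u32() and build_min_shb() are made up for this note
and are not rte_pcapng APIs; the block written by the library may
differ, for example by carrying options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field values taken from the pcapng draft specification. */
#define PCAPNG_SHB_TYPE		0x0A0D0D0AU
#define PCAPNG_BYTE_MAGIC	0x1A2B3C4DU

/* Hypothetical helper: store a 32-bit value in host byte order. */
static size_t
put_u32(uint8_t *buf, size_t off, uint32_t v)
{
	memcpy(buf + off, &v, sizeof(v));
	return off + sizeof(v);
}

/*
 * Hypothetical helper: build a section header block without options.
 * Returns the block length (28 bytes), or 0 if the buffer is too small.
 */
static size_t
build_min_shb(uint8_t *buf, size_t cap)
{
	const uint32_t total = 28;
	const uint16_t major = 1, minor = 0;
	const uint64_t section_len = UINT64_MAX; /* length not specified */
	size_t off = 0;

	if (cap < total)
		return 0;

	off = put_u32(buf, off, PCAPNG_SHB_TYPE);
	off = put_u32(buf, off, total);
	off = put_u32(buf, off, PCAPNG_BYTE_MAGIC);
	memcpy(buf + off, &major, sizeof(major));
	off += sizeof(major);
	memcpy(buf + off, &minor, sizeof(minor));
	off += sizeof(minor);
	memcpy(buf + off, &section_len, sizeof(section_len));
	off += sizeof(section_len);
	/* the total length is repeated at the end of every block */
	off = put_u32(buf, off, total);

	assert(off == total);
	return off;
}
```

Writing in host byte order is intentional in this sketch: a reader
works out the byte order of the section from the byte-order magic.
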
 app/test/meson.build   |   1 +
 app/test/test_pcapng.c | 190 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 191 insertions(+)
 create mode 100644 app/test/test_pcapng.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686adcb..0d551ac9c2b2 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -97,6 +97,7 @@ test_sources = files(
         'test_metrics.c',
         'test_mcslock.c',
         'test_mp_secondary.c',
+        'test_pcapng.c',
         'test_per_lcore.c',
         'test_pflock.c',
         'test_pmd_perf.c',
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
new file mode 100644
index 000000000000..6bf993ad30f6
--- /dev/null
+++ b/app/test/test_pcapng.c
@@ -0,0 +1,190 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_net.h>
+#include <rte_pcapng.h>
+
+#include "test.h"
+
+#define DUMMY_MBUF_NUM 3
+
+static rte_pcapng_t *pcapng;
+static struct rte_mempool *mp;
+static const uint32_t pkt_len = 200;
+static uint64_t ifrecv, ifdrop;
+static uint16_t port_id;
+
+/* first mbuf in the packet, should always be at offset 0 */
+struct dummy_mbuf {
+	struct rte_mbuf mb[DUMMY_MBUF_NUM];
+	uint8_t buf[DUMMY_MBUF_NUM][RTE_MBUF_DEFAULT_BUF_SIZE];
+};
+
+static void
+dummy_mbuf_prep(struct rte_mbuf *mb, uint8_t buf[], uint32_t buf_len,
+	uint32_t data_len)
+{
+	uint32_t i;
+	uint8_t *db;
+
+	mb->buf_addr = buf;
+	mb->buf_iova = (uintptr_t)buf;
+	mb->buf_len = buf_len;
+	rte_mbuf_refcnt_set(mb, 1);
+
+	/* set pool pointer to dummy value, test doesn't use it */
+	mb->pool = (void *)buf;
+
+	rte_pktmbuf_reset(mb);
+	db = (uint8_t *)rte_pktmbuf_append(mb, data_len);
+
+	for (i = 0; i != data_len; i++)
+		db[i] = i;
+}
+
+/* Make an IP packet in a single dummy mbuf */
+static void
+mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen)
+{
+	struct rte_ipv4_hdr *ph;
+	const struct rte_ipv4_hdr iph = {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_RAW,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+
+	memset(dm, 0, sizeof(*dm));
+	dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen);
+
+	ph = rte_pktmbuf_mtod(dm->mb, typeof(ph));
+	memcpy(ph, &iph, sizeof(iph));
+}
+
+static int
+test_setup(void)
+{
+	char file_template[] = "/tmp/pcapng_test_XXXXXX.pcapng";
+	int tmp_fd;
+
+	port_id = rte_eth_find_next(0);
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		fprintf(stderr, "No valid Ether port\n");
+		return -1;
+	}
+
+	tmp_fd = mkstemps(file_template, strlen(".pcapng"));
+	if (tmp_fd == -1) {
+		perror("mkstemps() failure");
+		return -1;
+	}
+
+	/* open a test capture file */
+	pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL);
+	if (pcapng == NULL) {
+		fprintf(stderr, "rte_pcapng_fdopen failed\n");
+		close(tmp_fd);
+		return -1;
+	}
+
+	/* Make a pool for cloned packets */
+	mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", DUMMY_MBUF_NUM,
+					    0, 0,
+					    rte_pcapng_mbuf_size(pkt_len),
+					    SOCKET_ID_ANY, "ring_mp_sc");
+	if (mp == NULL) {
+		fprintf(stderr, "Cannot create mempool\n");
+		return -1;
+	}
+	return 0;
+
+}
+
+static int
+test_basic_packets(void)
+{
+	struct rte_mbuf *orig, *clone = NULL;
+	struct dummy_mbuf mbfs;
+	ssize_t len;
+
+	/* make a dummy packet */
+	mbuf1_prepare(&mbfs, pkt_len);
+
+	/* clone them */
+	orig  = &mbfs.mb[0];
+	clone = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+				rte_get_tsc_cycles(), 0);
+	if (clone == NULL) {
+		fprintf(stderr, "Cannot copy packet\n");
+		return -1;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng, &clone, 1);
+	rte_pktmbuf_free(clone);
+
+	if (len <= 0) {
+		fprintf(stderr, "Write of packets failed\n");
+		return -1;
+	}
+
+	++ifrecv;
+	return 0;
+}
+
+static int
+test_write_stats(void)
+{
+	ssize_t len;
+
+	/* write a statistics block */
+	len = rte_pcapng_write_stats(pcapng, port_id,
+				     NULL, 0, 0,
+				     ifrecv, ifdrop);
+	if (len <= 0) {
+		fprintf(stderr, "Write of statistics failed\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+test_cleanup(void)
+{
+	if (mp)
+		rte_mempool_free(mp);
+
+	if (pcapng)
+		rte_pcapng_close(pcapng);
+
+}
+
+static struct
+unit_test_suite test_pcapng_suite  = {
+	.setup = test_setup,
+	.teardown = test_cleanup,
+	.suite_name = "Test Pcapng Unit Test Suite",
+	.unit_test_cases = {
+		TEST_CASE(test_basic_packets),
+		TEST_CASE(test_write_stats),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_pcapng(void)
+{
+	return unit_test_suite_runner(&test_pcapng_suite);
+}
+
+REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng);
-- 
2.30.2



* [dpdk-dev] [PATCH v7 10/11] doc: changes for new pcapng and dumpcap
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 09/11] test: add a test for pcapng library Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 11/11] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
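Note, not part of the patch: the pdump documentation below describes
mbuf data being extended with a header and trailer to match a Pcapng
enhanced packet block. Here is a rough sketch of the resulting block
size, assuming the fixed field layout from the pcapng draft and no
options. epb_block_len() is a hypothetical helper, not a DPDK
function; blocks produced by the library may be larger if they carry
options.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fixed part of an enhanced packet block in the pcapng draft:
 * block type, total length, interface id, timestamp high and low,
 * captured length, original length (7 x 4 bytes), plus the
 * trailing copy of the total length (4 bytes).
 */
#define EPB_FIXED_LEN 32U

/* Hypothetical helper, not a DPDK API. */
static uint32_t
epb_block_len(uint32_t caplen)
{
	/* guard against wrap-around when rounding up */
	assert(caplen <= UINT32_MAX - EPB_FIXED_LEN - 3U);

	/* packet data is padded to a 32-bit boundary */
	return EPB_FIXED_LEN + ((caplen + 3U) & ~3U);
}
```

For the 200-byte packet used in test_pcapng.c this gives 232 bytes
before any options are added.
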
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The framework consists of two libraries:
+``librte_pdump`` for collecting packets and ``librte_pcapng`` for
+writing packets to a file. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` is a tool that captures packets
+like Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the ``librte_pdump``
+  library. The ``dpdk-pdump`` tool provides more limited functionality
+  and depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the dpdk-dumpcap tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 675b5738348b..ee24cbfdb99d 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -62,6 +62,16 @@ New Features
   * Added bus-level parsing of the devargs syntax.
   * Kept compatibility with the legacy syntax as parsing fallback.
 
+* **Enhanced packet capture.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility, including capture of multiple interfaces
+    and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhanced the pdump library to support packet filtering with BPF.
+  * Enhanced the pdump library to support the Pcapng format
+    with timestamps and meta-data.
+  * Fixed packet capture with stripped VLAN tags in the pdump library.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool.  The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes capture files in the
+Pcapng packet format.
+
+Without any options set, it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps, into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In DPDK, only ``testpmd`` is modified to initialize the packet capture
+        framework; other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in the style of *tshark*, use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2



* [dpdk-dev] [PATCH v7 11/11] MAINTAINERS: add entry for new pcapng and dumper
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
                     ` (9 preceding siblings ...)
  2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 10/11] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-10 18:18   ` Stephen Hemminger
  10 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-10 18:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Claim responsibility for the new code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1dae8..06384ac2702d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1429,6 +1429,12 @@ F: app/test/test_pdump.*
 F: app/pdump/
 F: doc/guides/tools/pdump.rst
 
+Packet dump
+M: Stephen Hemminger <stephen@networkplumber.org>
+F: lib/pcapng/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: app/dumpcap/
+F: doc/guides/tools/dumpcap.rst
 
 Packet Framework
 ----------------
-- 
2.30.2



* [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (10 preceding siblings ...)
  2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-13 18:14 ` Stephen Hemminger
  2021-09-13 18:14   ` [dpdk-dev] [PATCH v8 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (11 more replies)
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                   ` (6 subsequent siblings)
  18 siblings, 12 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps
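
On the timestamp point: the pcapng library in this series converts TSC
cycle counts into nanoseconds for the file. As a rough illustration only,
here is one overflow-safe way to do that conversion. The helper name
cycles_to_ns and its parameters are hypothetical and are not part of the
patches; the sketch assumes a TSC frequency below roughly 18 GHz.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper (not part of this series): convert a TSC cycle
 * delta to nanoseconds. Splitting the delta into whole seconds and a
 * remainder keeps every intermediate product below 2^64 for any hz
 * below roughly 18 GHz.
 */
static uint64_t cycles_to_ns(uint64_t delta, uint64_t hz)
{
	assert(hz != 0);
	return (delta / hz) * NSEC_PER_SEC
		+ ((delta % hz) * NSEC_PER_SEC) / hz;
}
```

For comparison, a single multiply of the form delta * NSEC_PER_SEC wraps
once delta exceeds about 1.8e10 cycles, which is roughly nine seconds
at 2 GHz.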

v8 changes:
  - enable BPF tests in autotest
  - add more BPF test strings
  - use rte_strscpy to satisfy checkpatch
  - merge MAINTAINERS (put this in with existing pdump)

v7 changes:
  - add functional tests for pcapng lib
  - bug fix for error returns in pcapng lib
  - handle long osname on FreeBSD

Stephen Hemminger (12):
  librte_pcapng: add new library for writing pcapng files
  lib: pdump is not supported on Windows
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  test: add test for bpf_convert
  test: add a test for pcapng library
  test: enable bpf autotest
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new packet capture features

 MAINTAINERS                                   |  11 +-
 app/dumpcap/main.c                            | 844 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/meson.build                          |   3 +
 app/test/test_bpf.c                           | 173 ++++
 app/test/test_pcapng.c                        | 190 ++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 118 +++
 lib/bpf/bpf_validate.c                        |   8 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 574 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 441 ++++++---
 lib/pdump/rte_pdump.h                         | 110 ++-
 lib/pdump/version.map                         |   8 +
 33 files changed, 3569 insertions(+), 221 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 app/test/test_pcapng.c
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2



* [dpdk-dev] [PATCH v8 01/12] librte_pcapng: add new library for writing pcapng files
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-13 18:14   ` Stephen Hemminger
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 02/12] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/
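
For readers unfamiliar with the format, the block framing can be sketched
in plain C as below. This is only an illustration under stated
assumptions, not the code in this patch: pad4, put_option and build_shb
are hypothetical names, the buffer is assumed to be zero-filled by the
caller, the comment is assumed to be shorter than 64 KiB, and values are
written in host byte order. The constants are the ones used in
pcapng_proto.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHB_TYPE        0x0A0D0D0Au	/* section header block type */
#define SHB_BYTE_ORDER  0x1A2B3C4Du	/* byte order magic */
#define OPT_END         0
#define OPT_COMMENT     1

/* Round n up to the next multiple of 4 (pcapng pads to 32 bits). */
static uint32_t pad4(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* Append one TLV option at buf + off; returns the offset after it. */
static uint32_t put_option(uint8_t *buf, uint32_t off, uint16_t code,
			   const void *data, uint16_t len)
{
	memcpy(buf + off, &code, sizeof(code));
	memcpy(buf + off + 2, &len, sizeof(len));
	if (len > 0)
		memcpy(buf + off + 4, data, len);
	return off + 4 + pad4(len);
}

/*
 * Build a Section Header Block with an optional comment into buf,
 * which must be zero-filled and hold at least cap bytes.
 * Returns the total block length in bytes.
 */
static uint32_t build_shb(uint8_t *buf, uint32_t cap, const char *comment)
{
	const uint32_t type = SHB_TYPE, magic = SHB_BYTE_ORDER;
	const uint16_t major = 1, minor = 0;
	const uint64_t section_len = UINT64_MAX;	/* length unknown */
	uint32_t clen = comment ? (uint32_t)strlen(comment) : 0;
	/* fixed header (24) + comment option + end option (4) + trailer (4) */
	uint32_t total = 24 + (comment ? 4 + pad4(clen) : 0) + 4 + 4;
	uint32_t off = 24;

	assert(total <= cap);
	memcpy(buf, &type, 4);
	memcpy(buf + 4, &total, 4);
	memcpy(buf + 8, &magic, 4);
	memcpy(buf + 12, &major, 2);
	memcpy(buf + 14, &minor, 2);
	memcpy(buf + 16, &section_len, 8);

	if (comment)
		off = put_option(buf, off, OPT_COMMENT, comment, (uint16_t)clen);
	off = put_option(buf, off, OPT_END, NULL, 0);

	/* the block length is repeated after the options */
	memcpy(buf + off, &total, 4);
	return total;
}
```

Every block type follows the same pattern: a type, a total length,
a fixed body, 32-bit padded options, and the total length again at the
end so that a reader can walk the file in either direction.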

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 574 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 +++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 918 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..f8280a8b01f4
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,574 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds; split the divide to avoid overflow */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	const uint64_t hz = rte_get_tsc_hz();
+	const uint64_t delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta / hz) * NSEC_PER_SEC
+		+ ((delta % hz) * NSEC_PER_SEC) / hz;
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write a PCAPNG interface description block for one interface */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(6);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that this is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2f1bb073df08
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for pcapng_write
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbufs in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For statistics, use 0 if don't know or care to report it.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v8 02/12] lib: pdump is not supported on Windows
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
  2021-09-13 18:14   ` [dpdk-dev] [PATCH v8 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 03/12] bpf: allow self-xor operation Stephen Hemminger
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.

The new version of pdump with filtering now has a dependency
on bpf, but the bpf library is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..ba88e9eabc58 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -85,7 +85,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
     ] # only supported libraries for windows
 endif
 
-- 
2.30.2



* [dpdk-dev] [PATCH v8 03/12] bpf: allow self-xor operation
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
  2021-09-13 18:14   ` [dpdk-dev] [PATCH v8 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 02/12] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-15 10:55     ` Ananyev, Konstantin
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (8 subsequent siblings)
  11 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When doing BPF filter program conversion, a common way
to zero a register in a single instruction is:
     xor r7,r7
The BPF validator would not allow this because the value of
r7 was undefined. But after this operation it is always zero.
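
A minimal sketch of the rule described above, in plain C with no DPDK
headers. The names demo_insn, demo_is_self_xor and demo_self_xor are
hypothetical and only model the idea; the opcode field masks follow the
standard BPF encoding. The sketch also requires the source operand to be a
register, which is an assumption of this example rather than something the
patch checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard BPF opcode fields: operation in the high nibble, source bit 0x08. */
#define DEMO_OP(code)  ((code) & 0xf0)
#define DEMO_SRC(code) ((code) & 0x08)
#define DEMO_XOR 0xa0
#define DEMO_X   0x08	/* source operand is a register */

/* Hypothetical, reduced stand-in for struct ebpf_insn. */
struct demo_insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
};

/* True when the instruction is "xor rN, rN", i.e. a register zeroing idiom. */
static bool
demo_is_self_xor(const struct demo_insn *ins)
{
	return DEMO_OP(ins->code) == DEMO_XOR &&
	       DEMO_SRC(ins->code) == DEMO_X &&
	       ins->src_reg == ins->dst_reg;
}

/* Whatever the register held before, the result of a self-xor is zero. */
static uint64_t
demo_self_xor(uint64_t reg)
{
	return reg ^ reg;
}
```

Because the result does not depend on the previous register contents, a
validator can treat the destination as defined after such an instruction
even if it was undefined before.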

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..7647a7454dc2 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,12 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
-	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && ins->src_reg == ins->dst_reg)
+		err = NULL;
+	else
+		err = eval_defined((op != EBPF_MOV) ? rd : NULL,
+				   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v8 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-15 11:02     ` Ananyev, Konstantin
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (7 subsequent siblings)
  11 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.
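
The single-allocation layout this relies on can be sketched as follows.
This is an illustration only: demo_insn, demo_prm and demo_prm_alloc are
hypothetical stand-ins for struct ebpf_insn, struct rte_bpf_prm and the
allocation done inside rte_bpf_convert, and calloc() stands in for
rte_zmalloc() so the example builds without DPDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ebpf_insn (8 bytes per instruction). */
struct demo_insn {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

/* Hypothetical stand-in for the parameter header returned to the caller. */
struct demo_prm {
	const struct demo_insn *ins;	/* points just past this header */
	uint32_t nb_ins;
};

/* Allocate header and instruction array as one zeroed block. */
static struct demo_prm *
demo_prm_alloc(uint32_t nb_ins)
{
	struct demo_prm *prm;

	/* calloc() is used here in place of rte_zmalloc(). */
	prm = calloc(1, sizeof(*prm) + nb_ins * sizeof(struct demo_insn));
	if (prm == NULL)
		return NULL;

	/* The instructions start right after the header. */
	prm->ins = (const struct demo_insn *)(prm + 1);
	prm->nb_ins = nb_ins;
	return prm;
}
```

Keeping the header and the instructions in one block means a single free
releases both, and the whole program can be handed over as one object.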

The conversion code was originally written as part of the Linux
kernel implementation and later converted to a userspace program.
Both authors have agreed that it may be licensed under the BSD
license in DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_convert.c | 570 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 606 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..a46ffeb067dd
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,570 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* Linux has special negative offsets to access meta-data. */
+		RTE_BPF_LOG(ERR,
+			    "rte_bpf_convert: socket offset %d not supported\n",
+			    fp->k - SKF_AD_OFF);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the address map used for jump fixups */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourself. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a Classic BPF program from libpcap into DPDK BPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @param prm
+ *  Result Extended BPF program.
+ * @return
+ *   Pointer to BPF program (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v8 05/12] bpf: add function to dump eBPF instructions
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-15 11:04     ` Ananyev, Konstantin
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 06/12] pdump: support pcapng and filtering Stephen Hemminger
                     ` (6 subsequent siblings)
  11 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.
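
As a rough illustration of the decoding involved, the sketch below formats
a single ALU instruction from its opcode fields. It is self-contained and
does not use the DPDK headers; demo_alu_op and demo_format_alu are
hypothetical names, and the field masks and operation order follow the
standard BPF encoding. It omits the trailing newline that a dump routine
would normally print.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Standard BPF opcode fields: class (low 3 bits), source bit, operation. */
#define DEMO_CLASS(code) ((code) & 0x07)
#define DEMO_SRC(code)   ((code) & 0x08)
#define DEMO_OP(code)    ((code) & 0xf0)
#define DEMO_ALU   0x04	/* 32-bit ALU class */
#define DEMO_ALU64 0x07	/* 64-bit ALU class */
#define DEMO_X     0x08	/* source operand is a register */

/* Mnemonics indexed by the operation nibble; unused slots stay NULL. */
static const char *const demo_alu_op[16] = {
	"add", "sub", "mul", "div", "or", "and", "lsh", "rsh",
	"neg", "mod", "xor", "mov", "arsh", "endian",
};

/*
 * Format one ALU instruction into buf.
 * Returns the snprintf() result, or -1 if code is not a known ALU opcode.
 */
static int
demo_format_alu(char *buf, size_t len, uint8_t code,
		unsigned int dst, unsigned int src, int32_t imm)
{
	uint8_t cls = DEMO_CLASS(code);
	const char *op = demo_alu_op[DEMO_OP(code) >> 4];
	const char *postfix = (cls == DEMO_ALU) ? "32" : "";

	if ((cls != DEMO_ALU && cls != DEMO_ALU64) || op == NULL)
		return -1;

	if (DEMO_SRC(code) == DEMO_X)
		return snprintf(buf, len, "%s%s r%u, r%u",
				op, postfix, dst, src);

	return snprintf(buf, len, "%s%s r%u, #0x%x",
			op, postfix, dst, (unsigned int)imm);
}
```

With this sketch, opcode 0xaf (64-bit ALU, xor, register source) with both
registers set to 7 formats as "xor r7, r7", the zeroing idiom used by the
converter prologue.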

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_convert.c |   7 ++-
 lib/bpf/bpf_dump.c    | 118 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   1 +
 lib/bpf/rte_bpf.h     |  14 +++++
 lib/bpf/version.map   |   1 +
 5 files changed, 140 insertions(+), 1 deletion(-)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
index a46ffeb067dd..db84add7dcce 100644
--- a/lib/bpf/bpf_convert.c
+++ b/lib/bpf/bpf_convert.c
@@ -331,7 +331,12 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
 		case BPF_LD | BPF_IND | BPF_H:
 		case BPF_LD | BPF_IND | BPF_B:
 			/* All arithmetic insns map as-is. */
-			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
 			break;
 
 			/* Jump transformation cannot use BPF block macros
diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..a6a431e64903
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+static void ebpf_dump(FILE *f, const struct ebpf_insn insn, size_t n)
+{
+	const char *op, *postfix = "";
+	uint8_t cls = BPF_CLASS(insn.code);
+
+	fprintf(f, " L%zu:\t", n);
+
+	switch (cls) {
+	default:
+		fprintf(f, "unimp 0x%x // class: %s\n", insn.code,
+			class_tbl[cls]);
+		break;
+	case BPF_ALU:
+		postfix = "32";
+		/* fall through */
+	case EBPF_ALU64:
+		op = alu_op_tbl[BPF_OP_INDEX(insn.code)];
+		if (BPF_SRC(insn.code) == BPF_X)
+			fprintf(f, "%s%s r%u, r%u\n", op, postfix, insn.dst_reg,
+				insn.src_reg);
+		else
+			fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		break;
+	case BPF_LD:
+		op = "ld";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		if (BPF_MODE(insn.code) == BPF_IMM)
+			fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_ABS)
+			fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_IND)
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+				insn.dst_reg, insn.src_reg, insn.imm);
+		else
+			fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+				insn.code);
+		break;
+	case BPF_LDX:
+		op = "ldx";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, insn.dst_reg,
+			insn.src_reg, insn.off);
+		break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+	case BPF_JMP:
+		op = jump_tbl[BPF_OP_INDEX(insn.code)];
+		if (op == NULL)
+			fprintf(f, "invalid jump opcode: %#x\n", insn.code);
+		else if (BPF_OP(insn.code) == BPF_JA)
+			fprintf(f, "%s L%d\n", op, L(n, insn.off));
+		else if (BPF_OP(insn.code) == EBPF_EXIT)
+			fprintf(f, "%s\n", op);
+		else
+			fprintf(f, "%s r%u, #0x%x, L%d\n", op, insn.dst_reg,
+				insn.imm, L(n, insn.off));
+		break;
+	case BPF_RET:
+		fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+			insn.code);
+		break;
+	}
+}
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i)
+		ebpf_dump(f, buf[i], i);
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+	'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2
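
For readers checking the disassembler output by hand, here is a minimal
standalone sketch of the two calculations ebpf_dump() relies on: indexing
a sparse mnemonic table with the upper four bits of the opcode, and turning
a relative jump offset into an absolute label with pc + 1 + off. The SK_
constants and sk_ functions are hypothetical stand-ins so that the sketch
builds without DPDK headers; the numeric values are assumed to follow the
classic BPF opcode encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical local stand-ins for the BPF opcode fields. */
#define SK_BPF_CLASS(code)    ((code) & 0x07)
#define SK_BPF_OP_INDEX(code) (((code) & 0xf0) >> 4)
#define SK_BPF_JMP 0x05
#define SK_BPF_JA  0x00
#define SK_BPF_JEQ 0x10
#define SK_BPF_JGT 0x20

/* Sparse mnemonic table indexed by the upper four bits of the opcode. */
static const char *const sk_jump_tbl[16] = {
	[SK_BPF_JA >> 4] = "ja",
	[SK_BPF_JEQ >> 4] = "jeq",
	[SK_BPF_JGT >> 4] = "jgt",
};

/* Return the jump mnemonic, or NULL if the opcode is not a known jump. */
static const char *sk_jump_name(uint8_t code)
{
	if (SK_BPF_CLASS(code) != SK_BPF_JMP)
		return NULL;
	return sk_jump_tbl[SK_BPF_OP_INDEX(code)];
}

/* Offsets are relative to the instruction that follows the jump. */
static int sk_jump_target(size_t pc, int16_t off)
{
	return (int)pc + 1 + off;
}

/* Format "<mnemonic> L<target>"; returns the length, or -1 for non-jumps. */
static int sk_format_jump(char *buf, size_t len, uint8_t code,
			  size_t pc, int16_t off)
{
	const char *op = sk_jump_name(code);

	assert(buf != NULL && len > 0);
	if (op == NULL)
		return -1;
	return snprintf(buf, len, "%s L%d", op, sk_jump_target(pc, off));
}
```

With these definitions, an unconditional jump at instruction 3 with
offset 2 is labelled L6, which matches the L(pc, off) macro in the patch.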



* [dpdk-dev] [PATCH v8 06/12] pdump: support pcapng and filtering
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of exposed API or ABI)
is intentionally increased so that any attempt to use a
mismatched primary/secondary process fails.

Add new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 441 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 437 insertions(+), 128 deletions(-)
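
As a reading aid for the flag handling in this patch, the following is a
minimal standalone sketch of how the flags argument is validated and then
split into a client version and direction bits. The SK_ names are
hypothetical stand-ins (the values mirror RTE_PDUMP_FLAG_* and the
pdump_version enum from the diff), and unlike the library, which returns
-1 and sets rte_errno, the sketch returns a negative errno value directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the flag values declared in rte_pdump.h. */
enum {
	SK_FLAG_RX = 1,
	SK_FLAG_TX = 2,
	SK_FLAG_RXTX = SK_FLAG_RX | SK_FLAG_TX,
	SK_FLAG_PCAPNG = 4,
};

/* Client protocol versions; numbering starts at 3 as in the patch. */
enum sk_version {
	SK_CLIENT_LEGACY = 3,
	SK_CLIENT_PCAPNG = 4,
};

/* Return 0 if the flags are usable, otherwise a negative errno value. */
static int sk_validate_flags(uint32_t flags)
{
	if ((flags & SK_FLAG_RXTX) == 0)
		return -EINVAL;		/* no direction requested */
	if (flags & ~(uint32_t)(SK_FLAG_RXTX | SK_FLAG_PCAPNG))
		return -ENOTSUP;	/* unknown bits are set */
	return 0;
}

/* Split validated flags into the version and direction sent in the request. */
static void sk_split_flags(uint32_t flags, enum sk_version *ver, uint16_t *dir)
{
	assert(ver != NULL && dir != NULL);
	*ver = (flags & SK_FLAG_PCAPNG) ? SK_CLIENT_PCAPNG : SK_CLIENT_LEGACY;
	*dir = (uint16_t)(flags & SK_FLAG_RXTX);
}
```

The point of the split is that the pcapng bit never reaches the server as
a flag; it only selects which client version is placed in the request.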

diff --git a/lib/meson.build b/lib/meson.build
index ba88e9eabc58..1da521ea6185 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf and pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..abc28fcee0ad 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK using an incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter &&
+	    rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts) == 0) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts,
+				   __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to rte_bpf_eth callback.
+		 * if BPF program returns zero value for a given packet,
+		 * then it will be ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng then want to wrap packets
+		 * otherwise a simple copy.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +203,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +227,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +261,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +290,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +368,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +378,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -332,7 +406,7 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 		resp->err_value = set_pdump_rxtx_cbs(cli_req);
 	}
 
-	strlcpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_resp.len_param = sizeof(*resp);
 	mp_resp.num_fds = 0;
 	if (rte_mp_reply(&mp_resp, peer) < 0) {
@@ -347,8 +421,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +476,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +517,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,26 +531,26 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	rte_strscpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
-	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_req.len_param = sizeof(*req);
 	mp_req.num_fds = 0;
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
@@ -477,11 +568,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function, because although original API
+ * left place holder for future filter, it never checked the value.
+ * Therefore the API can't depend on application passing a non
+ * bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +593,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +637,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +676,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +692,66 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..be3fd14c4bd3 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL)
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL)
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum of both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2
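
To make the meaning of the four counters concrete, here is a minimal
standalone sketch of the accounting that pdump_copy() and pdump_sum_stats()
perform. Everything named sk_ is a hypothetical stand-in: the copy_ok array
models whether the mbuf copy succeeded, ring_free models the space left in
the ring, and plain additions are used where the patch uses relaxed atomics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the four counters in struct rte_pdump_stats. */
struct sk_stats {
	uint64_t accepted;
	uint64_t filtered;
	uint64_t nombuf;
	uint64_t ringfull;
};

/*
 * Account for one burst of n packets. rcs[i] == 0 means the filter
 * rejected packet i; rcs == NULL means no filter is attached.
 */
static void sk_account(struct sk_stats *st, const uint64_t *rcs,
		       const int *copy_ok, unsigned int n,
		       unsigned int ring_free)
{
	unsigned int i, copied = 0;

	assert(st != NULL && copy_ok != NULL);
	for (i = 0; i < n; i++) {
		if (rcs != NULL && rcs[i] == 0)
			st->filtered++;
		else if (!copy_ok[i])
			st->nombuf++;
		else
			copied++;
	}

	/* As in the patch, packets count as accepted before the enqueue. */
	st->accepted += copied;
	if (copied > ring_free)
		st->ringfull += copied - ring_free;
}

/* Add the per-queue counters of one direction into a running total. */
static void sk_sum(const struct sk_stats *perq, unsigned int nq,
		   struct sk_stats *total)
{
	unsigned int q;

	assert(perq != NULL && total != NULL);
	for (q = 0; q < nq; q++) {
		total->accepted += perq[q].accepted;
		total->filtered += perq[q].filtered;
		total->nombuf += perq[q].nombuf;
		total->ringfull += perq[q].ringfull;
	}
}
```

One consequence visible in the sketch: a packet dropped because the ring
was full is counted in both accepted and ringfull, so the number actually
delivered to the consumer is accepted minus ringfull.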



* [dpdk-dev] [PATCH v8 07/12] app/dumpcap: add new packet capture application
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 08/12] test: add test for bpf_convert Stephen Hemminger
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features such as filtering are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 844 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 861 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..baf9eee46666
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,844 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+/* Can do either pcap or pcapng format output */
+typedef union {
+	rte_pcapng_t  *pcapng;
+	pcap_dumper_t *dumper;
+} dumpcap_out_t;
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = calloc(1, sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	intf->port = port;
+	rte_strscpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port; this matches what Wireshark's dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return a monotonic clock reading in nanoseconds (not time since 1/1/1970) */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm callback, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once the primary process exits, so will dumpcap. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(dumpcap_out_t out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out.pcapng, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	static const char * const args[] = {
+		"dumpcap", "--proc-type", "secondary",
+		"--log-level", "notice"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	eal_argv[0] = strdup(progname);
+	for (i = 1; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+/*
+ * Get Operating System information.
+ * Returns a string allocated via malloc().
+ */
+static char *get_os_info(void)
+{
+	struct utsname uts;
+	char *osname = NULL;
+
+	if (uname(&uts) < 0)
+		return NULL;
+
+	if (asprintf(&osname, "%s %s",
+		     uts.sysname, uts.release) == -1)
+		return NULL;
+
+	return osname;
+}
+
+static dumpcap_out_t create_output(void)
+{
+	dumpcap_out_t ret;
+	static char tmp_path[PATH_MAX];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		char *os = get_os_info();
+
+		ret.pcapng = rte_pcapng_fdopen(fd, os, NULL,
+					   version(), capture_comment);
+		if (ret.pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		free(os);
+	} else {
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		ret.dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (ret.dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+	}
+
+	return ret;
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(-ret));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out.pcapng, pkts, n);
+	else
+		written = pcap_write_packets(out.dumper, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	dumpcap_out_t out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out.pcapng);
+	else
+		pcap_dump_close(out.dumper);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_prm);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v8 08/12] test: add test for bpf_convert
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-15 11:34     ` Ananyev, Konstantin
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 09/12] test: add a test for pcapng library Stephen Hemminger
                     ` (3 subsequent siblings)
  11 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Add some functional tests for the Classic BPF to DPDK BPF converter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 173 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 173 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 527c06b80708..1b5a178241d8 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3233,3 +3234,175 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str, bool expected)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	uint8_t tbuf[sizeof(struct dummy_mbuf)];
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	test_ld_mbuf1_prepare(tbuf);
+	rc = rte_bpf_exec(bpf, tbuf);
+	if ((rc == 0) == expected)
+		ret = 0;
+	else
+		printf("%s@%d: failed match: expect %s 0 got %"PRIu64"\n",
+		       __func__, __LINE__, expected ? "==" : "<>",  rc);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet? */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	int ret;
+
+	ret = test_bpf_match(pcap, "ip", true);
+	ret |= test_bpf_match(pcap, "not ip", false);
+
+	return ret;
+}
+
+/*
+ * Some sample pcap filter strings from
+ * https://wiki.wireshark.org/CaptureFilters
+ */
+static const char * const sample_filters[] = {
+	"host 172.18.5.4",
+	"net 192.168.0.0/24",
+	"src net 192.168.0.0/24",
+	"src net 192.168.0.0 mask 255.255.255.0",
+	"dst net 192.168.0.0/24",
+	"dst net 192.168.0.0 mask 255.255.255.0",
+	"port 53",
+	"host www.example.com and not (port 80 or port 25)",
+	"host www.example.com and not port 80 and not port 25",
+	"port not 53 and not arp",
+	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
+	"ether proto 0x888e",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"not ether dst 01:80:c2:00:00:0e",
+	"not broadcast and not multicast",
+	"dst host ff02::1",
+	"port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420",
+	/* Worms */
+	"dst port 135 and tcp port 135 and ip[2:2]==48",
+	"icmp[icmptype]==icmp-echo and ip[2:2]==92 and icmp[8:4]==0xAAAAAAAA",
+	"dst port 135 or dst port 445 or dst port 1433"
+	" and tcp[tcpflags] & (tcp-syn) != 0"
+	" and tcp[tcpflags] & (tcp-ack) = 0 and src net 192.168.0.0/24",
+	"tcp src port 443 and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4] = 0x18)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 1] = 0x03)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 2] < 0x04)"
+	" and ((ip[2:2] - 4 * (ip[0] & 0x0F) - 4 * ((tcp[12] & 0xF0) >> 4) > 69))",
+	/* Other */
+	"len = 128",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else if (prm != NULL) {
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++)
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v8 09/12] test: add a test for pcapng library
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 08/12] test: add test for bpf_convert Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 10/12] test: enable bpf autotest Stephen Hemminger
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Simple unit test that creates a pcapng file using the API.

To run this test you need to have at least one device.
For example:

DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
    --no-huge -m 2048 --vdev=net_tap,iface=dummy

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/meson.build   |   1 +
 app/test/test_pcapng.c | 190 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 191 insertions(+)
 create mode 100644 app/test/test_pcapng.c
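
Reviewer note (not part of the commit): the test only checks return
values from the write calls. One possible extra sanity check on the
resulting file is that it begins with a pcapng Section Header Block,
which per the pcapng format starts with block type 0x0A0D0D0A followed,
after the 4-byte block length, by the byte-order magic 0x1A2B3C4D. The
block below is a sketch of such a check; looks_like_pcapng() is a
hypothetical helper, not part of this patch or of the rte_pcapng API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return true if buf starts with a plausible pcapng Section Header Block.
 * Hypothetical helper; only the first 12 bytes are examined.
 */
static bool looks_like_pcapng(const uint8_t *buf, size_t len)
{
	/* The block type reads the same in either byte order. */
	static const uint8_t shb_type[4] = { 0x0A, 0x0D, 0x0D, 0x0A };
	/* Byte-order magic 0x1A2B3C4D as written by little/big endian hosts. */
	static const uint8_t magic_le[4] = { 0x4D, 0x3C, 0x2B, 0x1A };
	static const uint8_t magic_be[4] = { 0x1A, 0x2B, 0x3C, 0x4D };

	if (buf == NULL || len < 12)
		return false;
	if (memcmp(buf, shb_type, sizeof(shb_type)) != 0)
		return false;

	/* Bytes 4..7 hold the block total length and are skipped here. */
	return memcmp(buf + 8, magic_le, sizeof(magic_le)) == 0 ||
	       memcmp(buf + 8, magic_be, sizeof(magic_be)) == 0;
}
```

A test using this would read the first 12 bytes of the temporary file
back after rte_pcapng_close() and pass them to the helper.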

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686adcb..0d551ac9c2b2 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -97,6 +97,7 @@ test_sources = files(
         'test_metrics.c',
         'test_mcslock.c',
         'test_mp_secondary.c',
+        'test_pcapng.c',
         'test_per_lcore.c',
         'test_pflock.c',
         'test_pmd_perf.c',
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
new file mode 100644
index 000000000000..6bf993ad30f6
--- /dev/null
+++ b/app/test/test_pcapng.c
@@ -0,0 +1,190 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_net.h>
+#include <rte_pcapng.h>
+
+#include "test.h"
+
+#define DUMMY_MBUF_NUM 3
+
+static rte_pcapng_t *pcapng;
+static struct rte_mempool *mp;
+static const uint32_t pkt_len = 200;
+static uint64_t ifrecv, ifdrop;
+static uint16_t port_id;
+
+/* first mbuf in the packet, should always be at offset 0 */
+struct dummy_mbuf {
+	struct rte_mbuf mb[DUMMY_MBUF_NUM];
+	uint8_t buf[DUMMY_MBUF_NUM][RTE_MBUF_DEFAULT_BUF_SIZE];
+};
+
+static void
+dummy_mbuf_prep(struct rte_mbuf *mb, uint8_t buf[], uint32_t buf_len,
+	uint32_t data_len)
+{
+	uint32_t i;
+	uint8_t *db;
+
+	mb->buf_addr = buf;
+	mb->buf_iova = (uintptr_t)buf;
+	mb->buf_len = buf_len;
+	rte_mbuf_refcnt_set(mb, 1);
+
+	/* set pool pointer to dummy value, test doesn't use it */
+	mb->pool = (void *)buf;
+
+	rte_pktmbuf_reset(mb);
+	db = (uint8_t *)rte_pktmbuf_append(mb, data_len);
+
+	for (i = 0; i != data_len; i++)
+		db[i] = i;
+}
+
+/* Make an IP packet held in the first dummy mbuf */
+static void
+mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen)
+{
+	struct rte_ipv4_hdr *ph;
+	const struct rte_ipv4_hdr iph = {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_RAW,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+
+	memset(dm, 0, sizeof(*dm));
+	dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen);
+
+	ph = rte_pktmbuf_mtod(dm->mb, typeof(ph));
+	memcpy(ph, &iph, sizeof(iph));
+}
+
+static int
+test_setup(void)
+{
+	char file_template[] = "/tmp/pcapng_test_XXXXXX.pcapng";
+	int tmp_fd;
+
+	port_id = rte_eth_find_next(0);
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		fprintf(stderr, "No valid Ether port\n");
+		return -1;
+	}
+
+	tmp_fd = mkstemps(file_template, strlen(".pcapng"));
+	if (tmp_fd == -1) {
+		perror("mkstemps() failure");
+		return -1;
+	}
+
+	/* open a test capture file */
+	pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL);
+	if (pcapng == NULL) {
+		fprintf(stderr, "rte_pcapng_fdopen failed\n");
+		close(tmp_fd);
+		return -1;
+	}
+
+	/* Make a pool for cloned packets */
+	mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", DUMMY_MBUF_NUM,
+					    0, 0,
+					    rte_pcapng_mbuf_size(pkt_len),
+					    SOCKET_ID_ANY, "ring_mp_sc");
+	if (mp == NULL) {
+		fprintf(stderr, "Cannot create mempool\n");
+		return -1;
+	}
+	return 0;
+
+}
+
+static int
+test_basic_packets(void)
+{
+	struct rte_mbuf *orig, *clone = NULL;
+	struct dummy_mbuf mbfs;
+	ssize_t len;
+
+	/* make a dummy packet */
+	mbuf1_prepare(&mbfs, pkt_len);
+
+	/* clone it */
+	orig  = &mbfs.mb[0];
+	clone = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+				rte_get_tsc_cycles(), 0);
+	if (clone == NULL) {
+		fprintf(stderr, "Cannot copy packet\n");
+		return -1;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng, &clone, 1);
+	rte_pktmbuf_free(clone);
+
+	if (len <= 0) {
+		fprintf(stderr, "Write of packets failed\n");
+		return -1;
+	}
+
+	++ifrecv;
+	return 0;
+}
+
+static int
+test_write_stats(void)
+{
+	ssize_t len;
+
+	/* write a statistics block */
+	len = rte_pcapng_write_stats(pcapng, port_id,
+				     NULL, 0, 0,
+				     ifrecv, ifdrop);
+	if (len <= 0) {
+		fprintf(stderr, "Write of statistics failed\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+test_cleanup(void)
+{
+	if (mp)
+		rte_mempool_free(mp);
+
+	if (pcapng)
+		rte_pcapng_close(pcapng);
+
+}
+
+static struct
+unit_test_suite test_pcapng_suite  = {
+	.setup = test_setup,
+	.teardown = test_cleanup,
+	.suite_name = "Test Pcapng Unit Test Suite",
+	.unit_test_cases = {
+		TEST_CASE(test_basic_packets),
+		TEST_CASE(test_write_stats),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_pcapng(void)
+{
+	return unit_test_suite_runner(&test_pcapng_suite);
+}
+
+REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng);
-- 
2.30.2



* [dpdk-dev] [PATCH v8 10/12] test: enable bpf autotest
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 09/12] test: add a test for pcapng library Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-15 11:27     ` Ananyev, Konstantin
  2021-09-16  3:09     ` Stephen Hemminger
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 2 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The BPF autotest is defined but not run automatically.
Since it is short, it should be added to the autotest suite.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index 0d551ac9c2b2..cd18484bb73a 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -194,6 +194,8 @@ test_deps = [
 fast_tests = [
         ['acl_autotest', true],
         ['atomic_autotest', false],
+        ['bpf_autotest', true],
+        ['bpf_convert_autotest', true],
         ['bitops_autotest', true],
         ['byteorder_autotest', true],
         ['cksum_autotest', true],
-- 
2.30.2



* [dpdk-dev] [PATCH v8 11/12] doc: changes for new pcapng and dumpcap
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (9 preceding siblings ...)
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 10/12] test: enable bpf autotest Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
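
The pdump_lib.rst text in this patch says that with RTE_PDUMP_FLAG_PCAPNG
the mbuf data is extended with a header and trailer to match a Pcapng
enhanced packet block. As a rough illustration of that layout only, here is
a minimal self-contained sketch. It assumes a block with no options, host
byte order, and the field order given in the pcapng draft. The helper names
(pad32, epb_total_len, epb_encode) are hypothetical and are not part of
librte_pcapng.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EPB_TYPE   6u   /* Enhanced Packet Block type code */
#define EPB_HEADER 28u  /* type, length, interface, ts high/low, caplen, origlen */
#define EPB_FIXED  32u  /* header plus the 4-byte trailing length */

/* Round n up to the next multiple of 4 (packet data is padded to 32 bits). */
static uint32_t pad32(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* Size of an option-less block that carries caplen bytes of packet data. */
static uint32_t epb_total_len(uint32_t caplen)
{
	return EPB_FIXED + pad32(caplen);
}

/* Store a 32-bit field in the writer's (host) byte order. */
static void put32(uint8_t *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}

/*
 * Encode one block into out. Returns the number of bytes written,
 * or 0 if the output buffer is too small.
 */
static uint32_t epb_encode(uint8_t *out, uint32_t out_size, uint32_t ifindex,
			   uint64_t timestamp, const uint8_t *pkt,
			   uint32_t caplen, uint32_t origlen)
{
	uint32_t total;

	assert(out != NULL && pkt != NULL);
	if (caplen > out_size)
		return 0;
	total = epb_total_len(caplen);
	if (total > out_size)
		return 0;

	memset(out, 0, total);                         /* zero the padding bytes */
	put32(out + 0, EPB_TYPE);
	put32(out + 4, total);                         /* leading block length */
	put32(out + 8, ifindex);
	put32(out + 12, (uint32_t)(timestamp >> 32));  /* timestamp, upper half */
	put32(out + 16, (uint32_t)timestamp);          /* timestamp, lower half */
	put32(out + 20, caplen);
	put32(out + 24, origlen);
	memcpy(out + EPB_HEADER, pkt, caplen);
	put32(out + total - 4, total);                 /* trailing block length */
	return total;
}
```

For a 5-byte capture, epb_total_len(5) is 40: 28 bytes of header, 8 bytes
of padded packet data, and the 4-byte trailing length.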

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The DPDK packet capture framework consists of the
+libraries for collecting packets (``librte_pdump``) and writing packets
+to a file (``librte_pcapng``). There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` is a tool that captures packets
+like Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the ``librte_pdump``
+  library. The ``dpdk-pdump`` tool provides more limited functionality
+  and depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch ``dpdk-dumpcap`` as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The ``dpdk-dumpcap`` utility (``app/dumpcap``) uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 3fa80186957a..43464e999aaa 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -82,6 +82,16 @@ New Features
 
   * Added PDCP short MAC-I support.
 
+* **Enhanced Packet capture.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility including: capture of multiple interfaces,
+    filtering, and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancements to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Correct capture of packets with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool.  The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes captured packets to files in the
+Pcapng capture file format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps, into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In dpdk, only the ``testpmd`` is modified to initialize packet capture
+        framework, other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than the testpmd, user
+        needs to explicitly modify that application to call packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in the style of *tshark* use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK:
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2



* [dpdk-dev] [PATCH v8 12/12] MAINTAINERS: add entry for new packet capture features
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (10 preceding siblings ...)
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-13 18:15   ` Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-13 18:15 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Since the new packet capture is just an extension of the existing pdump,
add myself as a maintainer of that.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1dae8..437cba73de0b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1423,12 +1423,17 @@ F: doc/guides/sample_app_ug/qos_scheduler.rst
 
 Packet capture
 M: Reshma Pattan <reshma.pattan@intel.com>
+M: Stephen Hemminger <stephen@networkplumber.org>
 F: lib/pdump/
+F: lib/pcapng/
 F: doc/guides/prog_guide/pdump_lib.rst
-F: app/test/test_pdump.*
-F: app/pdump/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: doc/guides/tools/dumpcap.rst
 F: doc/guides/tools/pdump.rst
-
+F: app/test/test_pdump.c
+F: app/test/test_pcapng.c
+F: app/pdump/
+F: app/dumpcap/
 
 Packet Framework
 ----------------
-- 
2.30.2



* Re: [dpdk-dev] [PATCH v8 03/12] bpf: allow self-xor operation
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-15 10:55     ` Ananyev, Konstantin
  0 siblings, 0 replies; 220+ messages in thread
From: Ananyev, Konstantin @ 2021-09-15 10:55 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> When doing BPF filter program conversion, a common way
> to zero a register in a single instruction is:
>      xor r7,r7
> The BPF validator would not allow this because the value of
> r7 was undefined. But after this operation it is always zero.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_validate.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
> index 7b1291b382e9..7647a7454dc2 100644
> --- a/lib/bpf/bpf_validate.c
> +++ b/lib/bpf/bpf_validate.c
> @@ -661,8 +661,12 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
> 
>  	op = BPF_OP(ins->code);
> 
> -	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
> -			(op != BPF_NEG) ? &rs : NULL);
> +	/* Allow self-xor as way to zero register */
> +	if (op == BPF_XOR && ins->src_reg == ins->dst_reg)
> +		err = NULL;
> +	else
> +		err = eval_defined((op != EBPF_MOV) ? rd : NULL,
> +				   (op != BPF_NEG) ? &rs : NULL);

Two things:
- We probably need to check that this is an instruction with a source register
  (not an imm value).
- The rd value is not evaluated to zero, while it probably should be
  (that will help the evaluator better predict further values).

So might be better to do something like:

        /* Allow self-xor as way to zero register */
        if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X &&
                        ins->src_reg == ins->dst_reg) {
                eval_fill_imm(&rs, UINT64_MAX, 0);
                eval_fill_imm(rd, UINT64_MAX, 0);
        }

        err = eval_defined((op != EBPF_MOV) ? rd : NULL,
                           (op != BPF_NEG) ? &rs : NULL);
        if (err != NULL)
                return err;

...
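
The reason the destination can be filled in as a known zero, as suggested above, is that x ^ x is 0 for any starting bit pattern. A minimal sketch in plain C, independent of the DPDK validator; self_xor() and check_self_xor() are hypothetical names used only for illustration and are not part of lib/bpf:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "xor rX, rX": the register is xor-ed with itself. */
static uint64_t
self_xor(uint64_t reg)
{
	return reg ^ reg;
}

/* Check a few arbitrary starting values; returns the number checked. */
static size_t
check_self_xor(void)
{
	static const uint64_t samples[] = {
		0, 1, 0xdeadbeefULL, UINT64_MAX,
	};
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		assert(self_xor(samples[i]) == 0);
	return i;
}
```

Since the result does not depend on the input, a verifier can presumably treat the register as defined after such an instruction even when it was undefined before.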

Another thing - shouldn't that patch be treated like a fix (cc to stable, etc.)?

>  	if (err != NULL)
>  		return err;
> 
> --
> 2.30.2



* Re: [dpdk-dev] [PATCH v8 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-15 11:02     ` Ananyev, Konstantin
  2021-09-15 16:25       ` Stephen Hemminger
  0 siblings, 1 reply; 220+ messages in thread
From: Ananyev, Konstantin @ 2021-09-15 11:02 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> 
> The pcap library emits classic BPF (32 bit) and is useful for
> creating filter programs.  The DPDK BPF library only implements
> extended BPF (eBPF).  Add a function to convert from old to
> new.
> 
> The rte_bpf_convert function uses rte_malloc to put the resulting
> program in hugepage shared memory so it can be passed from a
> secondary process to a primary process.
> 
> The code to convert was originally done as part of the Linux
> kernel implementation then converted to a userspace program.
> Both authors have agreed that it is allowable to license this
> as BSD licensed in DPDK.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_convert.c | 570 ++++++++++++++++++++++++++++++++++++++++++
>  lib/bpf/meson.build   |   5 +
>  lib/bpf/rte_bpf.h     |  25 ++
>  lib/bpf/version.map   |   6 +
>  4 files changed, 606 insertions(+)
>  create mode 100644 lib/bpf/bpf_convert.c
> 
> diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
> index 63cbd60185e0..54f7610ae990 100644
> --- a/lib/bpf/meson.build
> +++ b/lib/bpf/meson.build
> @@ -25,3 +25,8 @@ if dep.found()
>      sources += files('bpf_load_elf.c')
>      ext_deps += dep
>  endif
> +
> +if dpdk_conf.has('RTE_PORT_PCAP')

Do we really need that 'if' above?
Why not always have it enabled?

> +    sources += files('bpf_convert.c')
> +    ext_deps += pcap_dep
> +endif
> diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
> index 69116f36ba8b..2f23e272a376 100644
> --- a/lib/bpf/rte_bpf.h
> +++ b/lib/bpf/rte_bpf.h
> @@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
>  int
>  rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
> 
> +#ifdef RTE_PORT_PCAP
> +
> +struct bpf_program;
> +
> +/**
> + * Convert a Classic BPF program from libpcap into a DPDK BPF code.
> + *
> + * @param prog
> + *  Classic BPF program from pcap_compile().
> + * @return
> + *   Pointer to BPF program (allocated with *rte_malloc*)
> + *   that is used in future BPF operations,
> + *   or NULL on error, with error code set in rte_errno.
> + *   Possible rte_errno errors include:
> + *   - EINVAL - invalid parameter passed to function
> + *   - ENOMEM - can't reserve enough memory
> + */
> +__rte_experimental
> +struct rte_bpf_prm *
> +rte_bpf_convert(const struct bpf_program *prog);
> +
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/bpf/version.map b/lib/bpf/version.map
> index 0bf35f487666..47082d5003ef 100644
> --- a/lib/bpf/version.map
> +++ b/lib/bpf/version.map
> @@ -14,3 +14,9 @@ DPDK_22 {
> 
>  	local: *;
>  };
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_bpf_convert;
> +};
> --

Cool feature, thanks for contributing.
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.30.2



* Re: [dpdk-dev] [PATCH v8 05/12] bpf: add function to dump eBPF instructions
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-15 11:04     ` Ananyev, Konstantin
  2021-09-15 16:26       ` Stephen Hemminger
  0 siblings, 1 reply; 220+ messages in thread
From: Ananyev, Konstantin @ 2021-09-15 11:04 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> When debugging converted (and other) programs it is useful
> to see disassembled eBPF output.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_convert.c |   7 ++-
>  lib/bpf/bpf_dump.c    | 118 ++++++++++++++++++++++++++++++++++++++++++
>  lib/bpf/meson.build   |   1 +
>  lib/bpf/rte_bpf.h     |  14 +++++
>  lib/bpf/version.map   |   1 +
>  5 files changed, 140 insertions(+), 1 deletion(-)
>  create mode 100644 lib/bpf/bpf_dump.c
> 
> diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
> index a46ffeb067dd..db84add7dcce 100644
> --- a/lib/bpf/bpf_convert.c
> +++ b/lib/bpf/bpf_convert.c
> @@ -331,7 +331,12 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
>  		case BPF_LD | BPF_IND | BPF_H:
>  		case BPF_LD | BPF_IND | BPF_B:
>  			/* All arithmetic insns map as-is. */
> -			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
> +			insn->code = fp->code;
> +			insn->dst_reg = BPF_REG_A;
> +			bpf_src = BPF_SRC(fp->code);
> +			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
> +			insn->off = 0;
> +			insn->imm = fp->k;
>  			break;

Should this be part of this patch?
It looks like it belongs to the previous one, no?

> 
>  			/* Jump transformation cannot use BPF block macros
> diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
> new file mode 100644
> index 000000000000..a6a431e64903
> --- /dev/null
> +++ b/lib/bpf/bpf_dump.c
> @@ -0,0 +1,118 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2021 Stephen Hemminger
> + * Based on filter2xdp
> + * Copyright (C) 2017 Tobias Klauser
> + */
> +
> +#include <stdio.h>
> +#include <stdint.h>
> +
> +#include "rte_bpf.h"
> +
> +#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
> +#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
> +
> +static const char *const class_tbl[] = {
> +	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
> +	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
> +	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
> +};
> +
> +static const char *const alu_op_tbl[16] = {
> +	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
> +	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
> +	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
> +	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
> +	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
> +	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
> +	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
> +};
> +
> +static const char *const size_tbl[] = {
> +	[BPF_W >> 3] = "w",
> +	[BPF_H >> 3] = "h",
> +	[BPF_B >> 3] = "b",
> +	[EBPF_DW >> 3] = "dw",
> +};
> +
> +static const char *const jump_tbl[16] = {
> +	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
> +	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
> +	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
> +	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
> +	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
> +};
> +
> +static void ebpf_dump(FILE *f, const struct ebpf_insn insn, size_t n)
> +{
> +	const char *op, *postfix = "";
> +	uint8_t cls = BPF_CLASS(insn.code);
> +
> +	fprintf(f, " L%zu:\t", n);
> +
> +	switch (cls) {
> +	default:
> +		fprintf(f, "unimp 0x%x // class: %s\n", insn.code,
> +			class_tbl[cls]);
> +		break;
> +	case BPF_ALU:
> +		postfix = "32";
> +		/* fall through */
> +	case EBPF_ALU64:
> +		op = alu_op_tbl[BPF_OP_INDEX(insn.code)];
> +		if (BPF_SRC(insn.code) == BPF_X)
> +			fprintf(f, "%s%s r%u, r%u\n", op, postfix, insn.dst_reg,
> +				insn.src_reg);
> +		else
> +			fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
> +				insn.dst_reg, insn.imm);
> +		break;
> +	case BPF_LD:
> +		op = "ld";
> +		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
> +		if (BPF_MODE(insn.code) == BPF_IMM)
> +			fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
> +				insn.dst_reg, insn.imm);
> +		else if (BPF_MODE(insn.code) == BPF_ABS)
> +			fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
> +				insn.dst_reg, insn.imm);
> +		else if (BPF_MODE(insn.code) == BPF_IND)
> +			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
> +				insn.dst_reg, insn.src_reg, insn.imm);
> +		else
> +			fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
> +				insn.code);
> +		break;
> +	case BPF_LDX:
> +		op = "ldx";
> +		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
> +		fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, insn.dst_reg,
> +			insn.src_reg, insn.off);
> +		break;
> +#define L(pc, off) ((int)(pc) + 1 + (off))
> +	case BPF_JMP:
> +		op = jump_tbl[BPF_OP_INDEX(insn.code)];
> +		if (op == NULL)
> +			fprintf(f, "invalid jump opcode: %#x\n", insn.code);
> +		else if (BPF_OP(insn.code) == BPF_JA)
> +			fprintf(f, "%s L%d\n", op, L(n, insn.off));
> +		else if (BPF_OP(insn.code) == EBPF_EXIT)
> +			fprintf(f, "%s\n", op);
> +		else
> +			fprintf(f, "%s r%u, #0x%x, L%d\n", op, insn.dst_reg,
> +				insn.imm, L(n, insn.off));
> +		break;
> +	case BPF_RET:
> +		fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
> +			insn.code);
> +		break;
> +	}
> +}
> +
> +void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
> +{
> +	uint32_t i;
> +
> +	for (i = 0; i < len; ++i)
> +		ebpf_dump(f, buf[i], i);
> +}
> diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
> index 54f7610ae990..5b5585173aeb 100644
> --- a/lib/bpf/meson.build
> +++ b/lib/bpf/meson.build
> @@ -2,6 +2,7 @@
>  # Copyright(c) 2018 Intel Corporation
> 
>  sources = files('bpf.c',
> +	'bpf_dump.c',
>          'bpf_exec.c',
>          'bpf_load.c',
>          'bpf_pkt.c',
> diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
> index 2f23e272a376..0d0a84b130a0 100644
> --- a/lib/bpf/rte_bpf.h
> +++ b/lib/bpf/rte_bpf.h
> @@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
>  int
>  rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
> 
> +/**
> + * Dump eBPF instructions to a file.
> + *
> + * @param f
> + *   A pointer to a file for output
> + * @param buf
> + *   A pointer to BPF instructions
> + * @param len
> + *   Number of BPF instructions to dump.
> + */
> +__rte_experimental
> +void
> +rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
> +
>  #ifdef RTE_PORT_PCAP
> 
>  struct bpf_program;
> diff --git a/lib/bpf/version.map b/lib/bpf/version.map
> index 47082d5003ef..3b953f2f4592 100644
> --- a/lib/bpf/version.map
> +++ b/lib/bpf/version.map
> @@ -19,4 +19,5 @@ EXPERIMENTAL {
>  	global:
> 
>  	rte_bpf_convert;
> +	rte_bpf_dump;
>  };
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.30.2



* Re: [dpdk-dev] [PATCH v8 10/12] test: enable bpf autotest
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 10/12] test: enable bpf autotest Stephen Hemminger
@ 2021-09-15 11:27     ` Ananyev, Konstantin
  2021-09-15 23:36       ` Stephen Hemminger
  2021-09-16  3:09     ` Stephen Hemminger
  1 sibling, 1 reply; 220+ messages in thread
From: Ananyev, Konstantin @ 2021-09-15 11:27 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> The BPF autotest is defined but not run automatically.
> Since it is short, it should be added to the autotest suite.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/meson.build | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/app/test/meson.build b/app/test/meson.build
> index 0d551ac9c2b2..cd18484bb73a 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -194,6 +194,8 @@ test_deps = [
>  fast_tests = [
>          ['acl_autotest', true],
>          ['atomic_autotest', false],
> +        ['bpf_autotest', true],
> +        ['bpf_convert_autotest', true],
>          ['bitops_autotest', true],
>          ['byteorder_autotest', true],
>          ['cksum_autotest', true],
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.30.2



* Re: [dpdk-dev] [PATCH v8 08/12] test: add test for bpf_convert
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 08/12] test: add test for bpf_convert Stephen Hemminger
@ 2021-09-15 11:34     ` Ananyev, Konstantin
  0 siblings, 0 replies; 220+ messages in thread
From: Ananyev, Konstantin @ 2021-09-15 11:34 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> 
> Add some functional tests for the Classic BPF to DPDK BPF converter.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_bpf.c | 173 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 173 insertions(+)
> 
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index 527c06b80708..1b5a178241d8 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -10,6 +10,7 @@
>  #include <rte_memory.h>
>  #include <rte_debug.h>
>  #include <rte_hexdump.h>
> +#include <rte_malloc.h>
>  #include <rte_random.h>
>  #include <rte_byteorder.h>
>  #include <rte_errno.h>
> @@ -3233,3 +3234,175 @@ test_bpf(void)
>  }
> 
>  REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
> +
> +#ifdef RTE_PORT_PCAP
> +#include <pcap/pcap.h>
> +
> +static void
> +test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
> +{
> +	printf("cBPF program (%u insns)\n", cbf->bf_len);
> +	bpf_dump(cbf, 1);
> +
> +	printf("\neBPF program (%u insns)\n", prm->nb_ins);
> +	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
> +}
> +
> +static int
> +test_bpf_match(pcap_t *pcap, const char *str, bool expected)
> +{
> +	struct bpf_program fcode;
> +	struct rte_bpf_prm *prm = NULL;
> +	struct rte_bpf *bpf = NULL;
> +	uint8_t tbuf[sizeof(struct dummy_mbuf)];
> +	int ret = -1;
> +	uint64_t rc;
> +
> +	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
> +		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
> +		       __func__, __LINE__,  str, pcap_geterr(pcap));
> +		return -1;
> +	}
> +
> +	prm = rte_bpf_convert(&fcode);
> +	if (prm == NULL) {
> +		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
> +		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
> +		goto error;
> +	}
> +
> +	bpf = rte_bpf_load(prm);
> +	if (bpf == NULL) {
> +		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
> +			__func__, __LINE__, rte_errno, strerror(rte_errno));
> +		goto error;
> +	}
> +
> +	test_ld_mbuf1_prepare(tbuf);
> +	rc = rte_bpf_exec(bpf, tbuf);
> +	if ((rc == 0) == expected)
> +		ret = 0;
> +	else
> +		printf("%s@%d: failed match: expect %s 0 got %"PRIu64"\n",
> +		       __func__, __LINE__, expected ? "==" : "<>",  rc);
> +error:
> +	if (bpf)
> +		rte_bpf_destroy(bpf);
> +	rte_free(prm);
> +	pcap_freecode(&fcode);
> +	return ret;
> +}
> +
> +/* Basic sanity test: can we match an IP packet? */
> +static int
> +test_bpf_filter_sanity(pcap_t *pcap)
> +{
> +	int ret;
> +
> +	ret = test_bpf_match(pcap, "ip", true);
> +	ret |= test_bpf_match(pcap, "not ip", false);
> +
> +	return ret;
> +}
> +
> +/*
> + * Some sample pcap filter strings from
> + * https://wiki.wireshark.org/CaptureFilters
> + */
> +static const char * const sample_filters[] = {
> +	"host 172.18.5.4",
> +	"net 192.168.0.0/24",
> +	"src net 192.168.0.0/24",
> +	"src net 192.168.0.0 mask 255.255.255.0",
> +	"dst net 192.168.0.0/24",
> +	"dst net 192.168.0.0 mask 255.255.255.0",
> +	"port 53",
> +	"host www.example.com and not (port 80 or port 25)",
> +	"host www.example.com and not port 80 and not port 25",
> +	"port not 53 and not arp",
> +	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
> +	"ether proto 0x888e",
> +	"ether[0] & 1 = 0 and ip[16] >= 224",
> +	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
> +	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
> +	"not ether dst 01:80:c2:00:00:0e",
> +	"not broadcast and not multicast",
> +	"dst host ff02::1",
> +	"port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420",
> +	/* Worms */
> +	"dst port 135 and tcp port 135 and ip[2:2]==48",
> +	"icmp[icmptype]==icmp-echo and ip[2:2]==92 and icmp[8:4]==0xAAAAAAAA",
> +	"dst port 135 or dst port 445 or dst port 1433"
> +	" and tcp[tcpflags] & (tcp-syn) != 0"
> +	" and tcp[tcpflags] & (tcp-ack) = 0 and src net 192.168.0.0/24",
> +	"tcp src port 443 and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4] = 0x18)"
> +	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 1] = 0x03)"
> +	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 2] < 0x04)"
> +	" and ((ip[2:2] - 4 * (ip[0] & 0x0F) - 4 * ((tcp[12] & 0xF0) >> 4) > 69))",
> +	/* Other */
> +	"len = 128",
> +};
> +
> +static int
> +test_bpf_filter(pcap_t *pcap, const char *s)
> +{
> +	struct bpf_program fcode;
> +	struct rte_bpf_prm *prm = NULL;
> +	struct rte_bpf *bpf = NULL;
> +
> +	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
> +		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
> +		       __func__, __LINE__, s, pcap_geterr(pcap));
> +		return -1;
> +	}
> +
> +	prm = rte_bpf_convert(&fcode);
> +	if (prm == NULL) {
> +		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
> +		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
> +		goto error;
> +	}
> +
> +	bpf = rte_bpf_load(prm);
> +	if (bpf == NULL) {
> +		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
> +			__func__, __LINE__, rte_errno, strerror(rte_errno));
> +		goto error;
> +	}
> +
> +error:
> +	if (bpf)
> +		rte_bpf_destroy(bpf);
> +	else {
> +		printf("%s \"%s\"\n", __func__, s);
> +		test_bpf_dump(&fcode, prm);
> +	}
> +
> +	rte_free(prm);
> +	pcap_freecode(&fcode);
> +	return (bpf == NULL) ? -1 : 0;
> +}
> +
> +static int
> +test_bpf_convert(void)
> +{
> +	unsigned int i;
> +	pcap_t *pcap;
> +	int rc;
> +
> +	pcap = pcap_open_dead(DLT_EN10MB, 262144);
> +	if (!pcap) {
> +		printf("pcap_open_dead failed\n");
> +		return -1;
> +	}
> +
> +	rc = test_bpf_filter_sanity(pcap);
> +	for (i = 0; i < RTE_DIM(sample_filters); i++)
> +		rc |= test_bpf_filter(pcap, sample_filters[i]);
> +
> +	pcap_close(pcap);
> +	return rc;
> +}
> +
> +REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
> +#endif /* RTE_PORT_PCAP */
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.30.2



* Re: [dpdk-dev] [PATCH v8 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-15 11:02     ` Ananyev, Konstantin
@ 2021-09-15 16:25       ` Stephen Hemminger
  0 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-15 16:25 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Wed, 15 Sep 2021 11:02:20 +0000
"Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:

> > +if dpdk_conf.has('RTE_PORT_PCAP')  
> 
> Do we really need that 'if' above?
> Why not to always have it enabled?

The converter code needs libpcap for the pcap header files that define
the encoding of classic BPF (struct bpf_insn).


* Re: [dpdk-dev] [PATCH v8 05/12] bpf: add function to dump eBPF instructions
  2021-09-15 11:04     ` Ananyev, Konstantin
@ 2021-09-15 16:26       ` Stephen Hemminger
  0 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-15 16:26 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Wed, 15 Sep 2021 11:04:41 +0000
"Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:

> > diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
> > index a46ffeb067dd..db84add7dcce 100644
> > --- a/lib/bpf/bpf_convert.c
> > +++ b/lib/bpf/bpf_convert.c
> > @@ -331,7 +331,12 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
> >  		case BPF_LD | BPF_IND | BPF_H:
> >  		case BPF_LD | BPF_IND | BPF_B:
> >  			/* All arithmetic insns map as-is. */
> > -			*insn = BPF_RAW_INSN(fp->code, BPF_REG_A, BPF_REG_X, 0, fp->k);
> > +			insn->code = fp->code;
> > +			insn->dst_reg = BPF_REG_A;
> > +			bpf_src = BPF_SRC(fp->code);
> > +			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
> > +			insn->off = 0;
> > +			insn->imm = fp->k;
> >  			break;  
> 
> Should it be part of that patch?
> Looks like belongs to previous one, no?

Yes, moved it in next bundle.


* Re: [dpdk-dev] [PATCH v8 10/12] test: enable bpf autotest
  2021-09-15 11:27     ` Ananyev, Konstantin
@ 2021-09-15 23:36       ` Stephen Hemminger
  0 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-15 23:36 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

On Wed, 15 Sep 2021 11:27:15 +0000
"Ananyev, Konstantin" <konstantin.ananyev@intel.com> wrote:

> > The BPF autotest is defined but not run automatically.
> > Since it is short, it should be added to the autotest suite.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  app/test/meson.build | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/app/test/meson.build b/app/test/meson.build
> > index 0d551ac9c2b2..cd18484bb73a 100644
> > --- a/app/test/meson.build
> > +++ b/app/test/meson.build
> > @@ -194,6 +194,8 @@ test_deps = [
> >  fast_tests = [
> >          ['acl_autotest', true],
> >          ['atomic_autotest', false],
> > +        ['bpf_autotest', true],
> > +        ['bpf_convert_autotest', true],
> >          ['bitops_autotest', true],
> >          ['byteorder_autotest', true],
> >          ['cksum_autotest', true],
> > --  
> 
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 
> > 2.30.2  
> 

One oddity of the original BPF test is that it constructs an mbuf with
no Ethernet header.  I didn't want to change that, since not only the
constructor but also the hand-written BPF programs would have to be
changed.

It is not necessarily a bug, but the test does not do what an application
using BPF on incoming mbufs would expect.


* [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (11 preceding siblings ...)
  2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-16  0:14 ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (11 more replies)
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                   ` (5 subsequent siblings)
  18 siblings, 12 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility:
  - are faster: avoid lots of extra I/O, do bursting, etc.
  - give more information (multiple ports, queues, etc.)
  - have a better user interface (same as Wireshark dumpcap)
  - fix structural problems with VLANs and timestamps

v9 changes:
  - incorporate suggested change to BPF XOR
  - make autotest for pcapng more complete by reading the
    resulting file with libpcap

v8 changes:
  - enable BPF tests in autotest
  - add more BPF test strings
  - use rte_strscpy to satisfy checkpatch
  - merge MAINTAINERS (put this in with existing pdump)

Stephen Hemminger (12):
  librte_pcapng: add new library for writing pcapng files
  lib: pdump is not supported on Windows
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  test: add test for bpf_convert
  test: add a test for pcapng library
  test: enable bpf autotest
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new packet capture features

 MAINTAINERS                                   |  11 +-
 app/dumpcap/main.c                            | 844 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/meson.build                          |   6 +
 app/test/test_bpf.c                           | 200 +++++
 app/test/test_pcapng.c                        | 272 ++++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 118 +++
 lib/bpf/bpf_validate.c                        |   9 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 574 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 441 ++++++---
 lib/pdump/rte_pdump.h                         | 110 ++-
 lib/pdump/version.map                         |   8 +
 33 files changed, 3683 insertions(+), 220 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 app/test/test_pcapng.c
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2



* [dpdk-dev] [PATCH v9 01/12] librte_pcapng: add new library for writing pcapng files
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 02/12] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/
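
To give a feel for the format, the smallest valid block is a section header with no options. The sketch below builds one in a caller-supplied buffer. The constants and field layout follow pcapng_proto.h in this patch; shb_build() and the SK_ names are hypothetical and are not the rte_pcapng API. It assumes the file is written in host byte order, which the byte-order magic lets readers detect:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_SECTION_BLOCK	0x0A0D0D0A
#define SK_BYTE_ORDER_MAGIC	0x1A2B3C4D

/* Same field layout as struct pcapng_section_header in the patch. */
struct shb_sketch {
	uint32_t block_type;
	uint32_t block_length;
	uint32_t byte_order_magic;
	uint16_t major_version;
	uint16_t minor_version;
	uint64_t section_length;
};

/* Build an option-less section header; returns bytes used, 0 if too small. */
static size_t
shb_build(uint8_t *buf, size_t buflen)
{
	struct shb_sketch hdr;
	/* Every pcapng block ends with a repeat of its total length. */
	uint32_t total = sizeof(hdr) + sizeof(uint32_t);

	assert(buf != NULL);
	if (buflen < total)
		return 0;

	hdr.block_type = SK_SECTION_BLOCK;
	hdr.block_length = total;
	hdr.byte_order_magic = SK_BYTE_ORDER_MAGIC;
	hdr.major_version = 1;
	hdr.minor_version = 0;
	hdr.section_length = UINT64_MAX;	/* -1: length not specified */

	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), &total, sizeof(total));
	return total;
}
```

A real writer would follow this with an interface description block before any packet blocks.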

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 574 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 +++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 918 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..f8280a8b01f4
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,574 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface description block for one port */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(6);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that this is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2f1bb073df08
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for pcapng_write
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For optional values, pass the "not known" value listed for each parameter.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v9 02/12] lib: pdump is not supported on Windows
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 03/12] bpf: allow self-xor operation Stephen Hemminger
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.

The new version of pdump with filtering now has dependency
on bpf. But bpf library is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..ba88e9eabc58 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -85,7 +85,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
     ] # only supported libraries for windows
 endif
 
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v9 03/12] bpf: allow self-xor operation
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 02/12] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16 15:23     ` Ananyev, Konstantin
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (8 subsequent siblings)
  11 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, konstantin.ananyev

Some BPF programs may use XOR of a register with itself
as a way to zero the register in one instruction.
The BPF filter converter generates this in the prolog
to the generated code.

The BPF validator would not allow this because the value of
the register was undefined. But after this operation it is always zero.

Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
Cc: konstantin.ananyev@intel.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..853279fee557 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,15 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X &&
+	    ins->src_reg == ins->dst_reg) {
+		eval_fill_imm(&rs, UINT64_MAX, 0);
+		eval_fill_imm(rd, UINT64_MAX, 0);
+	}
+
 	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+			   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v9 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The code to convert was originally done as part of the Linux
kernel implementation then converted to a userspace program.
See https://github.com/tklauser/filter2xdp

Both authors have agreed that it is allowable to create a modified
version of this code and license it with BSD license used by DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_convert.c | 575 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 611 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..db84add7dcce
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,575 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* Linux has special negative offsets to access meta-data. */
+		RTE_BPF_LOG(ERR,
+			    "rte_bpf_convert: socket offset %d not supported\n",
+			    fp->k - SKF_AD_OFF);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the table of converted insn offsets */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a classic BPF program from libpcap into DPDK BPF (eBPF) code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @param prm
+ *  Result Extended BPF program.
+ * @return
+ *   Pointer to BPF program (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v9 05/12] bpf: add function to dump eBPF instructions
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 06/12] pdump: support pcapng and filtering Stephen Hemminger
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.
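
For reading such output, note that eBPF jump offsets are relative to
the instruction following the jump. The sketch below shows how a
listing label could be derived from that; jump_target and format_ja are
hypothetical helper names used only for this illustration, and the
output format is modeled on the " L<n>:" prefix used by the dump code.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: index of the instruction that a jump located at
 * index pc lands on. Offsets count from the next instruction.
 */
static int
jump_target(size_t pc, int16_t off)
{
	return (int)pc + 1 + off;
}

/*
 * Hypothetical helper: format an unconditional jump as one listing
 * line. Returns the value of snprintf().
 */
static int
format_ja(char *buf, size_t len, size_t pc, int16_t off)
{
	return snprintf(buf, len, " L%zu:\tja L%d", pc,
			jump_target(pc, off));
}
```

A jump at index 3 with offset 2 therefore refers to the instruction
labeled L6, and a negative offset refers to an earlier label.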

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_dump.c  | 118 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 ++++++
 lib/bpf/version.map |   1 +
 4 files changed, 134 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..a6a431e64903
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+static void ebpf_dump(FILE *f, const struct ebpf_insn insn, size_t n)
+{
+	const char *op, *postfix = "";
+	uint8_t cls = BPF_CLASS(insn.code);
+
+	fprintf(f, " L%zu:\t", n);
+
+	switch (cls) {
+	default:
+		fprintf(f, "unimp 0x%x // class: %s\n", insn.code,
+			class_tbl[cls]);
+		break;
+	case BPF_ALU:
+		postfix = "32";
+		/* fall through */
+	case EBPF_ALU64:
+		op = alu_op_tbl[BPF_OP_INDEX(insn.code)];
+		if (BPF_SRC(insn.code) == BPF_X)
+			fprintf(f, "%s%s r%u, r%u\n", op, postfix, insn.dst_reg,
+				insn.src_reg);
+		else
+			fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		break;
+	case BPF_LD:
+		op = "ld";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		if (BPF_MODE(insn.code) == BPF_IMM)
+			fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_ABS)
+			fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+				insn.dst_reg, insn.imm);
+		else if (BPF_MODE(insn.code) == BPF_IND)
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+				insn.dst_reg, insn.src_reg, insn.imm);
+		else
+			fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+				insn.code);
+		break;
+	case BPF_LDX:
+		op = "ldx";
+		postfix = size_tbl[BPF_SIZE_INDEX(insn.code)];
+		fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, insn.dst_reg,
+			insn.src_reg, insn.off);
+		break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+	case BPF_JMP:
+		op = jump_tbl[BPF_OP_INDEX(insn.code)];
+		if (op == NULL)
+			fprintf(f, "invalid jump opcode: %#x\n", insn.code);
+		else if (BPF_OP(insn.code) == BPF_JA)
+			fprintf(f, "%s L%d\n", op, L(n, insn.off));
+		else if (BPF_OP(insn.code) == EBPF_EXIT)
+			fprintf(f, "%s\n", op);
+		else
+			fprintf(f, "%s r%u, #0x%x, L%d\n", op, insn.dst_reg,
+				insn.imm, L(n, insn.off));
+		break;
+	case BPF_RET:
+		fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+			insn.code);
+		break;
+	}
+}
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i)
+		ebpf_dump(f, buf[i], i);
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+        'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2



* [dpdk-dev] [PATCH v9 06/12] pdump: support pcapng and filtering
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair mismatched
primary and secondary processes fails.

Add new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because ring was full).
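
The accounting can be pictured with the following simplified sketch.
It is not the pdump code: capture_stats and account_burst are
hypothetical names, the per-packet filter results are passed in as a
plain array, the ring is reduced to a count of free slots, and C11
atomics stand in for the compiler builtins used in the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical subset of the per-queue capture counters. */
struct capture_stats {
	_Atomic uint64_t accepted;	/* passed the filter */
	_Atomic uint64_t filtered;	/* rejected by the filter */
	_Atomic uint64_t ringfull;	/* passed, but no room in ring */
};

/*
 * Account for one burst. rcs[i] is the filter result for packet i,
 * where zero means "drop". ring_free is the number of free ring slots.
 * Returns how many packets would actually be queued.
 */
static unsigned int
account_burst(struct capture_stats *st, const uint64_t *rcs,
	      unsigned int nb_pkts, unsigned int ring_free)
{
	unsigned int i, passed = 0, queued;

	for (i = 0; i < nb_pkts; i++) {
		if (rcs[i] == 0)
			atomic_fetch_add_explicit(&st->filtered, 1,
						  memory_order_relaxed);
		else
			passed++;
	}
	atomic_fetch_add_explicit(&st->accepted, passed,
				  memory_order_relaxed);

	/* Packets that do not fit in the ring are counted as missed. */
	queued = passed < ring_free ? passed : ring_free;
	if (queued < passed)
		atomic_fetch_add_explicit(&st->ringfull, passed - queued,
					  memory_order_relaxed);
	return queued;
}
```

Relaxed ordering is enough here because the counters are independent
statistics and do not guard any other data.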

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 441 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 437 insertions(+), 128 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index ba88e9eabc58..1da521ea6185 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf and pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..abc28fcee0ad 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK using an incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter &&
+	    rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts) == 0) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts,
+				   __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to rte_bpf_eth callback.
+		 * if BPF program returns zero value for a given packet,
+		 * then it will be ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng then want to wrap packets
+		 * otherwise a simple copy.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +203,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +227,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +261,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +290,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +368,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +378,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -332,7 +406,7 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 		resp->err_value = set_pdump_rxtx_cbs(cli_req);
 	}
 
-	strlcpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_resp.len_param = sizeof(*resp);
 	mp_resp.num_fds = 0;
 	if (rte_mp_reply(&mp_resp, peer) < 0) {
@@ -347,8 +421,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +476,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +517,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,26 +531,26 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	rte_strscpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
-	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_req.len_param = sizeof(*req);
 	mp_req.num_fds = 0;
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
@@ -477,11 +568,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API cannot depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +593,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +637,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +676,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +692,66 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..be3fd14c4bd3 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum over both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a port, summed over its queues.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2
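
For readers following the statistics code above: pdump_sum_stats() walks
the per-queue counters and adds them into one total using relaxed atomic
loads. Below is a minimal standalone sketch of that idea, not the library
code. The names capture_stats and sum_queue_stats are made up for
illustration, the reserved padding is left out, and the fields are added
one by one instead of through a uint64_t pointer cast.

```c
#include <stdint.h>

/* Same four counters as struct rte_pdump_stats, without the reserved pad. */
struct capture_stats {
	uint64_t accepted;
	uint64_t filtered;
	uint64_t nombuf;
	uint64_t ringfull;
};

/*
 * Add the counters of nq queues into *total (hypothetical helper).
 * Relaxed atomic loads (GCC/Clang builtin) are used because each counter
 * is read on its own and no ordering between counters is required.
 * The caller is expected to zero *total first if a fresh sum is wanted.
 */
static void
sum_queue_stats(const struct capture_stats *perq, unsigned int nq,
		struct capture_stats *total)
{
	unsigned int q;

	for (q = 0; q < nq; q++) {
		total->accepted +=
			__atomic_load_n(&perq[q].accepted, __ATOMIC_RELAXED);
		total->filtered +=
			__atomic_load_n(&perq[q].filtered, __ATOMIC_RELAXED);
		total->nombuf +=
			__atomic_load_n(&perq[q].nombuf, __ATOMIC_RELAXED);
		total->ringfull +=
			__atomic_load_n(&perq[q].ringfull, __ATOMIC_RELAXED);
	}
}
```

As in rte_pdump_stats(), calling the helper once for the receive array and
once for the transmit array with the same total gives the combined figure.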


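The flag handling in this patch does two things: pdump_validate_flags()
rejects requests with no direction or with unknown bits, and
pdump_prepare_client_request() splits the accepted flags into a direction
mask and a client version. A rough standalone sketch of the combined
logic follows. The FLAG_* values mirror the RTE_PDUMP_FLAG_* enum shown in
rte_pdump.h; split_flags and the client_ver values are hypothetical
stand-ins, and the sketch returns a negative errno instead of setting
rte_errno.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror the RTE_PDUMP_FLAG_* enum from rte_pdump.h in this patch. */
enum {
	FLAG_RX = 1,
	FLAG_TX = 2,
	FLAG_RXTX = FLAG_RX | FLAG_TX,
	FLAG_PCAPNG = 4,
};

/* Hypothetical stand-ins for the library's internal client versions. */
enum client_ver { CLIENT_LEGACY = 1, CLIENT_PCAPNG = 2 };

/*
 * Return 0 and fill in *dir and *ver when the flags are acceptable,
 * or a negative errno value otherwise.
 */
static int
split_flags(uint32_t flags, uint32_t *dir, enum client_ver *ver)
{
	if ((flags & FLAG_RXTX) == 0)
		return -EINVAL;		/* no direction requested */

	if (flags & ~(uint32_t)(FLAG_RXTX | FLAG_PCAPNG))
		return -ENOTSUP;	/* bits this version does not know */

	*dir = flags & FLAG_RXTX;
	*ver = (flags & FLAG_PCAPNG) ? CLIENT_PCAPNG : CLIENT_LEGACY;
	return 0;
}
```

Only the direction bits travel in the request flags; the output format is
carried separately as the client version.
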

* [dpdk-dev] [PATCH v9 07/12] app/dumpcap: add new packet capture application
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 08/12] test: add test for bpf_convert Stephen Hemminger
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace existing pdump.
The new application works like Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features, such as filtering, are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
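The -a/--autostop option takes a "key:value" argument (duration, filesize
or packets). The parsing can be tried outside DPDK; the sketch below is a
standalone approximation of auto_stop() from this patch. The names
parse_autostop and stop_cond are made up for illustration, and it returns
-1 on bad input where the application calls rte_exit().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counterpart of the application's "stop" structure. */
struct stop_cond {
	uint64_t duration_ns;	/* nanoseconds */
	unsigned long packets;	/* number of packets in file */
	size_t size_bytes;	/* file size (bytes) */
};

/* Parse an unsigned number in any base strtoul() accepts; 0 on success. */
static int
parse_ulong(const char *s, unsigned long *out)
{
	char *endp;

	*out = strtoul(s, &endp, 0);
	return (*s == '\0' || *endp != '\0') ? -1 : 0;
}

/* Parse "key:value" in place (opt is modified); 0 on success, -1 on error. */
static int
parse_autostop(char *opt, struct stop_cond *stop)
{
	unsigned long num;
	char *value, *endp;

	value = strchr(opt, ':');
	if (value == NULL)
		return -1;		/* missing colon */
	*value++ = '\0';		/* split key from value */

	if (strcmp(opt, "duration") == 0) {
		double secs = strtod(value, &endp);

		if (*value == '\0' || *endp != '\0' || secs <= 0)
			return -1;
		stop->duration_ns = (uint64_t)(secs * 1e9);
	} else if (strcmp(opt, "filesize") == 0) {
		if (parse_ulong(value, &num) < 0)
			return -1;
		stop->size_bytes = (size_t)num * 1024;	/* given in kB */
	} else if (strcmp(opt, "packets") == 0) {
		if (parse_ulong(value, &num) < 0)
			return -1;
		stop->packets = num;
	} else {
		return -1;		/* unknown key */
	}
	return 0;
}
```

Because the argument is split in place, it has to be a writable string
such as optarg, not a string literal.
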
 app/dumpcap/main.c      | 844 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 861 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..baf9eee46666
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,844 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+/* Can do either pcap or pcapng format output */
+typedef union {
+	rte_pcapng_t  *pcapng;
+	pcap_dumper_t *dumper;
+} dumpcap_out_t;
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = malloc(sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	memset(intf, 0, sizeof(*intf));
+	intf->port = port;
+	rte_strscpy(intf->name, name, sizeof(intf->name));
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port; this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current monotonic clock time in nanoseconds */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will pdump. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to enable monitor: %d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to disable monitor: %d\n", ret);
+}
+
+static void
+report_packet_stats(dumpcap_out_t out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out.pcapng, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f%%)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	static const char * const args[] = {
+		"dumpcap", "--proc-type", "secondary",
+		"--log-level", "notice"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	eal_argv[0] = strdup(progname);
+	for (i = 1; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size (0 and 1 both round to 1). */
+	size = ring_size;
+	log2 = size > 1 ? sizeof(size) * 8 - __builtin_clzl(size - 1) : 0;
+	size = (size_t)1 << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+/*
+ * Get Operating System information.
+ * Returns a string allocated via malloc().
+ */
+static char *get_os_info(void)
+{
+	struct utsname uts;
+	char *osname = NULL;
+
+	if (uname(&uts) < 0)
+		return NULL;
+
+	if (asprintf(&osname, "%s %s",
+		     uts.sysname, uts.release) == -1)
+		return NULL;
+
+	return osname;
+}
+
+static dumpcap_out_t create_output(void)
+{
+	dumpcap_out_t ret;
+	static char tmp_path[PATH_MAX];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT | O_TRUNC, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		char *os = get_os_info();
+
+		ret.pcapng = rte_pcapng_fdopen(fd, os, NULL,
+					   version(), capture_comment);
+		if (ret.pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		free(os);
+	} else {
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_MICRO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		ret.dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (ret.dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+	}
+
+	return ret;
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(-ret));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out.pcapng, pkts, n);
+	else
+		written = pcap_write_packets(out.dumper, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	dumpcap_out_t out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "capture file write failed: %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out.pcapng);
+	else
+		pcap_dump_close(out.dumper);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_filter);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2
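
Note on create_ring() above: the ring size is rounded up to the next
power of two with __builtin_clzl(). The following is only a minimal
standalone sketch of that rounding, not code from the patch. The helper
name next_pow2() is hypothetical, and it assumes a GCC or Clang
compiler. It treats 0 and 1 as special cases because __builtin_clzl(0)
has an undefined result.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper (not part of the patch): round v up to the next
 * power of two. 0 and 1 both map to 1, so __builtin_clzl() is never
 * called with a zero argument, for which its result is undefined.
 */
static unsigned long
next_pow2(unsigned long v)
{
	unsigned int bits;

	if (v <= 1)
		return 1;

	/* The result must be representable in an unsigned long. */
	assert(v <= ULONG_MAX / 2 + 1);

	/* Bit length of (v - 1) is the exponent of the next power of two. */
	bits = sizeof(v) * CHAR_BIT - __builtin_clzl(v - 1);
	return 1UL << bits;
}
```

With this helper a requested size of 1000 becomes 1024, and a size that
is already a power of two is returned unchanged.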



* [dpdk-dev] [PATCH v9 08/12] test: add test for bpf_convert
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 09/12] test: add a test for pcapng library Stephen Hemminger
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Add some functional tests for the Classic BPF to DPDK BPF converter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
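Note on the return convention used by test_bpf_match(): a capture
filter returns non-zero when the packet matches, and the helper maps
that to 0 (matched) or 1 (not matched), with -1 reserved for errors.
The sketch below illustrates this with a hand-written stand-in for the
"ip" filter on an Ethernet link, which only has to compare the
EtherType field. It does not use libpcap or rte_bpf; the names
sketch_filter_ip() and sketch_match_result() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETHERTYPE_OFFSET 12	/* after two 6-byte MAC addresses */
#define SKETCH_ETHERTYPE_IPV4	0x0800
#define SKETCH_SNAPLEN		262144	/* value the test passes to pcap_open_dead() */

/*
 * Hand-written stand-in for the "ip" filter on Ethernet:
 * returns the snap length (non-zero) on a match, 0 otherwise.
 */
static uint32_t
sketch_filter_ip(const uint8_t *pkt, size_t len)
{
	uint16_t ether_type;

	assert(pkt != NULL);

	/* A packet too short to hold the EtherType never matches. */
	if (len < SKETCH_ETHERTYPE_OFFSET + 2)
		return 0;

	/* EtherType is big-endian on the wire. */
	ether_type = (uint16_t)(pkt[SKETCH_ETHERTYPE_OFFSET] << 8 |
				pkt[SKETCH_ETHERTYPE_OFFSET + 1]);
	return ether_type == SKETCH_ETHERTYPE_IPV4 ? SKETCH_SNAPLEN : 0;
}

/* Same mapping as test_bpf_match(): 0 = matched, 1 = not matched. */
static int
sketch_match_result(uint32_t rc)
{
	return rc == 0;
}
```

This is why test_bpf_filter_sanity() treats a non-zero result for "ip"
and a zero result for "not ip" as failures.
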
 app/test/test_bpf.c | 200 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 527c06b80708..543a5fd615b2 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3233,3 +3234,202 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str,
+	       struct rte_mbuf *mb)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	rc = rte_bpf_exec(bpf, mb);
+	/* The return code from bpf capture filter is non-zero if matched */
+	ret = (rc == 0);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet? */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	const uint32_t plen = 100;
+	struct rte_mbuf mb, *m;
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+	} *hdr;
+
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	m = &mb;
+
+	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.d_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_RAW,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+
+	if (test_bpf_match(pcap, "ip", m) != 0) {
+		printf("%s@%d: filter \"ip\" doesn't match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+	if (test_bpf_match(pcap, "not ip", m) == 0) {
+		printf("%s@%d: filter \"not ip\" does match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Some sample pcap filter strings from
+ * https://wiki.wireshark.org/CaptureFilters
+ */
+static const char * const sample_filters[] = {
+	"host 172.18.5.4",
+	"net 192.168.0.0/24",
+	"src net 192.168.0.0/24",
+	"src net 192.168.0.0 mask 255.255.255.0",
+	"dst net 192.168.0.0/24",
+	"dst net 192.168.0.0 mask 255.255.255.0",
+	"port 53",
+	"host dpdk.org and not (port 80 or port 25)",
+	"host dpdk.org and not port 80 and not port 25",
+	"port not 53 and not arp",
+	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
+	"ether proto 0x888e",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"not ether dst 01:80:c2:00:00:0e",
+	"not broadcast and not multicast",
+	"dst host ff02::1",
+	"port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420",
+	/* Worms */
+	"dst port 135 and tcp port 135 and ip[2:2]==48",
+	"icmp[icmptype]==icmp-echo and ip[2:2]==92 and icmp[8:4]==0xAAAAAAAA",
+	"dst port 135 or dst port 445 or dst port 1433"
+	" and tcp[tcpflags] & (tcp-syn) != 0"
+	" and tcp[tcpflags] & (tcp-ack) = 0 and src net 192.168.0.0/24",
+	"tcp src port 443 and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4] = 0x18)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 1] = 0x03)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 2] < 0x04)"
+	" and ((ip[2:2] - 4 * (ip[0] & 0x0F) - 4 * ((tcp[12] & 0xF0) >> 4) > 69))",
+	/* Other */
+	"len = 128",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else {
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++)
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2



* [dpdk-dev] [PATCH v9 09/12] test: add a test for pcapng library
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 08/12] test: add test for bpf_convert Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 10/12] test: enable bpf autotest Stephen Hemminger
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Simple unit test that creates a pcapng file using the API.

To run this test you need to have at least one device.
For example:

DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
    --no-huge -m 2048 --vdev=net_tap,iface=dummy

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
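Note on the temporary file: test_setup() uses mkstemps() so that the
generated name keeps its ".pcapng" suffix. The following is a minimal
standalone sketch of that call under the assumption of a glibc or BSD
libc that provides mkstemps(). The helper name make_capture_tempfile()
and the SKETCH_SUFFIX macro are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* exposes mkstemps() on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_SUFFIX ".pcapng"

/*
 * Create a unique temporary file whose name ends in SKETCH_SUFFIX.
 * tmpl must be a writable string ending in "XXXXXX" followed by the
 * suffix; the X characters are replaced in place.
 * Returns an open file descriptor, or -1 on error.
 */
static int
make_capture_tempfile(char *tmpl)
{
	assert(tmpl != NULL);

	/* The second argument is the number of trailing characters to keep. */
	return mkstemps(tmpl, (int)strlen(SKETCH_SUFFIX));
}
```

The caller owns both the descriptor and the file, so a test that uses
this pattern would normally close() the descriptor and unlink() the
name when it is done.
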
 app/test/meson.build   |   4 +
 app/test/test_pcapng.c | 272 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 276 insertions(+)
 create mode 100644 app/test/test_pcapng.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686adcb..8cf41021deb4 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -395,6 +395,10 @@ if dpdk_conf.has('RTE_NET_RING')
     fast_tests += [['pdump_autotest', true]]
 endif
 
+if dpdk_conf.has('RTE_PORT_PCAP')
+    test_sources += 'test_pcapng.c'
+endif
+
 if dpdk_conf.has('RTE_LIB_POWER')
     test_deps += 'power'
 endif
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
new file mode 100644
index 000000000000..df837ffe0d51
--- /dev/null
+++ b/app/test/test_pcapng.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_net.h>
+#include <rte_pcapng.h>
+
+#include <pcap/pcap.h>
+
+#include "test.h"
+
+#define NUM_PACKETS    10
+#define DUMMY_MBUF_NUM 3
+
+static rte_pcapng_t *pcapng;
+static struct rte_mempool *mp;
+static const uint32_t pkt_len = 200;
+static uint16_t port_id;
+static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng";
+
+/* first mbuf in the packet, should always be at offset 0 */
+struct dummy_mbuf {
+	struct rte_mbuf mb[DUMMY_MBUF_NUM];
+	uint8_t buf[DUMMY_MBUF_NUM][RTE_MBUF_DEFAULT_BUF_SIZE];
+};
+
+static void
+dummy_mbuf_prep(struct rte_mbuf *mb, uint8_t buf[], uint32_t buf_len,
+	uint32_t data_len)
+{
+	uint32_t i;
+	uint8_t *db;
+
+	mb->buf_addr = buf;
+	mb->buf_iova = (uintptr_t)buf;
+	mb->buf_len = buf_len;
+	rte_mbuf_refcnt_set(mb, 1);
+
+	/* set pool pointer to dummy value, test doesn't use it */
+	mb->pool = (void *)buf;
+
+	rte_pktmbuf_reset(mb);
+	db = (uint8_t *)rte_pktmbuf_append(mb, data_len);
+
+	for (i = 0; i != data_len; i++)
+		db[i] = i;
+}
+
+/* Make an IP packet consisting of a chain of one mbuf */
+static void
+mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth;
+		struct rte_ipv4_hdr ip;
+	} pkt = {
+		.eth = {
+			.d_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+			.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+		},
+		.ip = {
+			.version_ihl = RTE_IPV4_VHL_DEF,
+			.total_length = rte_cpu_to_be_16(plen),
+			.time_to_live = IPDEFTTL,
+			.next_proto_id = IPPROTO_RAW,
+			.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+			.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+		}
+	};
+
+	memset(dm, 0, sizeof(*dm));
+	dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen);
+
+	rte_eth_random_addr(pkt.eth.s_addr.addr_bytes);
+	memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen));
+}
+
+static int
+test_setup(void)
+{
+	int tmp_fd;
+
+	port_id = rte_eth_find_next(0);
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		fprintf(stderr, "No valid Ether port\n");
+		return -1;
+	}
+
+	tmp_fd = mkstemps(file_name, strlen(".pcapng"));
+	if (tmp_fd == -1) {
+		perror("mkstemps() failure");
+		return -1;
+	}
+	printf("pcapng: output file %s\n", file_name);
+
+	/* open a test capture file */
+	pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL);
+	if (pcapng == NULL) {
+		fprintf(stderr, "rte_pcapng_fdopen failed\n");
+		close(tmp_fd);
+		return -1;
+	}
+
+
+	/* Make a pool for cloned packets */
+	mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+					    0, 0,
+					    rte_pcapng_mbuf_size(pkt_len),
+					    SOCKET_ID_ANY, "ring_mp_sc");
+	if (mp == NULL) {
+		fprintf(stderr, "Cannot create mempool\n");
+		return -1;
+	}
+	return 0;
+
+}
+
+static int
+test_write_packets(void)
+{
+	struct rte_mbuf *orig;
+	struct rte_mbuf *clones[NUM_PACKETS] = { };
+	struct dummy_mbuf mbfs;
+	unsigned int i;
+	ssize_t len;
+
+	/* make a dummy packet */
+	mbuf1_prepare(&mbfs, pkt_len);
+
+	/* clone them */
+	orig  = &mbfs.mb[0];
+	for (i = 0; i < NUM_PACKETS; i++) {
+		struct rte_mbuf *mc;
+
+		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+				rte_get_tsc_cycles(), 0);
+		if (mc == NULL) {
+			fprintf(stderr, "Cannot copy packet\n");
+			return -1;
+		}
+		clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS);
+
+	rte_pktmbuf_free_bulk(clones, NUM_PACKETS);
+
+	if (len <= 0) {
+		fprintf(stderr, "Write of packets failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+test_write_stats(void)
+{
+	ssize_t len;
+
+	/* write a statistics block */
+	len = rte_pcapng_write_stats(pcapng, port_id,
+				     NULL, 0, 0,
+				     NUM_PACKETS, 0);
+	if (len <= 0) {
+		fprintf(stderr, "Write of statistics failed\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+pkt_print(u_char *user, const struct pcap_pkthdr *h,
+	  const u_char *bytes)
+{
+	unsigned int *countp = (unsigned int *)user;
+	const struct rte_ether_hdr *eh;
+	struct tm *tm;
+	char tbuf[128], src[64], dst[64];
+
+	tm = localtime(&h->ts.tv_sec);
+	if (tm == NULL) {
+		perror("localtime");
+		return;
+	}
+
+	if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) {
+		fprintf(stderr, "strftime returned 0!\n");
+		return;
+	}
+
+	eh = (const struct rte_ether_hdr *)bytes;
+	rte_ether_format_addr(dst, sizeof(dst), &eh->d_addr);
+	rte_ether_format_addr(src, sizeof(src), &eh->s_addr);
+	printf("%s.%06lu: %s -> %s type %x length %u\n",
+	       tbuf, (unsigned long)h->ts.tv_usec,
+	       src, dst, rte_be_to_cpu_16(eh->ether_type), h->len);
+
+	*countp += 1;
+}
+
+/*
+ * Open the resulting pcapng file with libpcap
+ * Would be better to use capinfos from wireshark
+ * but that creates an unwanted dependency.
+ */
+static int
+test_validate(void)
+{
+	char errbuf[PCAP_ERRBUF_SIZE];
+	unsigned int count = 0;
+	pcap_t *pcap;
+	int ret;
+
+	pcap = pcap_open_offline(file_name, errbuf);
+	if (pcap == NULL) {
+		fprintf(stderr, "pcap_open_offline('%s') failed: %s\n",
+			file_name, errbuf);
+		return -1;
+	}
+
+	ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count);
+	if (ret == 0)
+		printf("Saw %u packets\n", count);
+	else
+		fprintf(stderr, "pcap_loop failed: %s\n",
+			pcap_geterr(pcap));
+	pcap_close(pcap);
+
+	return ret;
+}
+
+static void
+test_cleanup(void)
+{
+	if (mp)
+		rte_mempool_free(mp);
+
+	if (pcapng)
+		rte_pcapng_close(pcapng);
+
+}
+
+static struct
+unit_test_suite test_pcapng_suite  = {
+	.setup = test_setup,
+	.teardown = test_cleanup,
+	.suite_name = "Test Pcapng Unit Test Suite",
+	.unit_test_cases = {
+		TEST_CASE(test_write_packets),
+		TEST_CASE(test_write_stats),
+		TEST_CASE(test_validate),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_pcapng(void)
+{
+	return unit_test_suite_runner(&test_pcapng_suite);
+}
+
+REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng);
-- 
2.30.2



* [dpdk-dev] [PATCH v9 10/12] test: enable bpf autotest
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 09/12] test: add a test for pcapng library Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The BPF autotest is defined but not run automatically.
Since it is short, it should be added to the autotest suite.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index 8cf41021deb4..ba7d568bf330 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -193,6 +193,8 @@ test_deps = [
 fast_tests = [
         ['acl_autotest', true],
         ['atomic_autotest', false],
+        ['bpf_autotest', true],
+        ['bpf_convert_autotest', true],
         ['bitops_autotest', true],
         ['byteorder_autotest', true],
         ['cksum_autotest', true],
-- 
2.30.2



* [dpdk-dev] [PATCH v9 11/12] doc: changes for new pcapng and dumpcap
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (9 preceding siblings ...)
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 10/12] test: enable bpf autotest Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The DPDK packet capture framework consists of the
+libraries for collecting packets ``librte_pdump`` and writing packets
+to a file ``librte_pcapng``. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` is a tool that captures packets
+much as the Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the ``librte_pdump``
+  library. The ``dpdk-pdump`` tool provides more limited functionality
+  and depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the dpdk-dumpcap tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 43d367bcada2..fedbd64e9777 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -87,6 +87,16 @@ New Features
   Added command-line options to specify total number of processes and
   current process ID. Each process owns subset of Rx and Tx queues.
 
+* **Revised packet capture framework.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility including: capture of multiple interfaces,
+    filtering, and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancements to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Fixes packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool.  The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes captured packets to files in the
+Pcapng capture file format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In dpdk, only the ``testpmd`` is modified to initialize packet capture
+        framework, other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in the style of *tshark* use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options: ``-I|--monitor-mode`` and ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2

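A note for readers of the pdump_lib.rst text in this patch: the Pcapng
enhanced packet block it mentions wraps each packet in a fixed header and a
trailing length field. The sketch below only illustrates that framing as laid
out in the pcapng draft; it is not code from rte_pcapng.c. The demo_* names
are hypothetical, and it assumes a block with no options.

```c
#include <stdint.h>

/* Block type value of an Enhanced Packet Block in the pcapng draft. */
#define DEMO_EPB_TYPE 0x00000006u

/*
 * Fixed 32-bit fields around the packet data: block type, total length,
 * interface id, timestamp high, timestamp low, captured length and
 * original length (7 words), plus the trailing copy of total length.
 */
#define DEMO_EPB_OVERHEAD (8u * (uint32_t)sizeof(uint32_t))

/* Packet data is padded to a 32-bit boundary inside the block. */
static uint32_t
demo_pad4(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/* Total size of a block carrying caplen bytes, assuming no options. */
static uint32_t
demo_epb_total_len(uint32_t caplen)
{
	return DEMO_EPB_OVERHEAD + demo_pad4(caplen);
}
```

Under these assumptions a 60 byte frame needs a 92 byte block. Any options
carrying extra meta-data would add to that size.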


* [dpdk-dev] [PATCH v9 12/12] MAINTAINERS: add entry for new packet capture features
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (10 preceding siblings ...)
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-16  0:14   ` Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  0:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Since the packet capture is just an extension of the existing pdump,
add myself as maintainer of that.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1e0d3033946d..081f90c9a0ba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1422,12 +1422,17 @@ F: doc/guides/sample_app_ug/qos_scheduler.rst
 
 Packet capture
 M: Reshma Pattan <reshma.pattan@intel.com>
+M: Stephen Hemminger <stephen@networkplumber.org>
 F: lib/pdump/
+F: lib/pcapng/
 F: doc/guides/prog_guide/pdump_lib.rst
-F: app/test/test_pdump.*
-F: app/pdump/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: doc/guides/tools/dumpcap.rst
 F: doc/guides/tools/pdump.rst
-
+F: app/test/test_pdump.c
+F: app/test/test_pcapng.c
+F: app/pdump/
+F: app/dumpcap/
 
 Packet Framework
 ----------------
-- 
2.30.2



* Re: [dpdk-dev] [PATCH v8 10/12] test: enable bpf autotest
  2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 10/12] test: enable bpf autotest Stephen Hemminger
  2021-09-15 11:27     ` Ananyev, Konstantin
@ 2021-09-16  3:09     ` Stephen Hemminger
  1 sibling, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16  3:09 UTC (permalink / raw)
  To: dev

On Mon, 13 Sep 2021 11:15:08 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> The BPF autotest is defined but not run automatically.
> Since it is short, it should be added to the autotest suite.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/meson.build | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/app/test/meson.build b/app/test/meson.build
> index 0d551ac9c2b2..cd18484bb73a 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -194,6 +194,8 @@ test_deps = [
>  fast_tests = [
>          ['acl_autotest', true],
>          ['atomic_autotest', false],
> +        ['bpf_autotest', true],
> +        ['bpf_convert_autotest', true],
>          ['bitops_autotest', true],
>          ['byteorder_autotest', true],
>          ['cksum_autotest', true],

Note: this patch exposes a pre-existing bug in DPDK:
https://bugs.dpdk.org/show_bug.cgi?id=811

The BPF code does not work if built with Clang.
The test was just being ignored by the CI before!


* Re: [dpdk-dev] [PATCH v9 03/12] bpf: allow self-xor operation
  2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-16 15:23     ` Ananyev, Konstantin
  0 siblings, 0 replies; 220+ messages in thread
From: Ananyev, Konstantin @ 2021-09-16 15:23 UTC (permalink / raw)
  To: Stephen Hemminger, dev


> Some BPF programs may use XOR of a register with itself
> as a way to zero a register in one instruction.
> The BPF filter converter generates this in the prolog
> to the generated code.
> 
> The BPF validator would not allow this because the value of
> the register was undefined. But after this operation it is always zero.
> 
> Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
> Cc: konstantin.ananyev@intel.com
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  lib/bpf/bpf_validate.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
> index 7b1291b382e9..853279fee557 100644
> --- a/lib/bpf/bpf_validate.c
> +++ b/lib/bpf/bpf_validate.c
> @@ -661,8 +661,15 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
> 
>  	op = BPF_OP(ins->code);
> 
> +	/* Allow self-xor as way to zero register */
> +	if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X &&
> +	    ins->src_reg == ins->dst_reg) {
> +		eval_fill_imm(&rs, UINT64_MAX, 0);
> +		eval_fill_imm(rd, UINT64_MAX, 0);
> +	}
> +
>  	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
> -			(op != BPF_NEG) ? &rs : NULL);
> +			   (op != BPF_NEG) ? &rs : NULL);
>  	if (err != NULL)
>  		return err;
> 
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 2.30.2
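
For readers following the thread, the condition added in the quoted hunk can
be tried in isolation. This is only a sketch: the DEMO_* macros copy the
classic BPF opcode encodings, and struct demo_insn is a hypothetical,
simplified stand-in for the instruction type, not DPDK's struct ebpf_insn.

```c
#include <stdbool.h>
#include <stdint.h>

/* Classic BPF opcode field encodings, renamed to avoid header clashes. */
#define DEMO_BPF_OP(code)  ((code) & 0xf0)  /* operation bits */
#define DEMO_BPF_SRC(code) ((code) & 0x08)  /* source operand selector */
#define DEMO_BPF_ALU 0x04                   /* ALU instruction class */
#define DEMO_BPF_XOR 0xa0                   /* exclusive-or operation */
#define DEMO_BPF_K   0x00                   /* source is an immediate */
#define DEMO_BPF_X   0x08                   /* source is a register */

/* Hypothetical instruction layout holding only the fields tested here. */
struct demo_insn {
	uint8_t code;    /* class | source | operation */
	uint8_t dst_reg; /* destination register number */
	uint8_t src_reg; /* source register number */
};

/*
 * True when the instruction XORs a register with itself. The result is
 * zero whatever the register held before, which is why a validator can
 * accept it even if the register was previously undefined.
 */
static bool
demo_is_self_xor(const struct demo_insn *ins)
{
	return DEMO_BPF_OP(ins->code) == DEMO_BPF_XOR &&
	       DEMO_BPF_SRC(ins->code) == DEMO_BPF_X &&
	       ins->src_reg == ins->dst_reg;
}
```

An XOR with an immediate, or between two different registers, does not match,
so such instructions would still go through the usual definedness check.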



* [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (12 preceding siblings ...)
  2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-16 22:26 ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (11 more replies)
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                   ` (4 subsequent siblings)
  18 siblings, 12 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps

There are no blocker items. The following are worth noting:
  * some of the patches get bogus checkpatch warnings
  * enabling BPF tests causes CI to see a pre-existing bug
  * filtering for stripped VLAN tags requires changes to
    libpcap (to be addressed in future)

v10:
  - fix to rte_bpf_dump to handle more instructions
    make sure all bpf_test cases are decoded

v9:
  - incorporate suggested change to BPF XOR
  - make autotest for pcapng more complete by reading the
    resulting file with libpcap

v8:
  - enable BPF tests in autotest
  - add more BPF test strings
  - use rte_strscpy to satisfy checkpatch
  - merge MAINTAINERS (put this in with existing pdump)

v7:
  - add functional tests for pcapng lib
  - bug fix for error returns in pcapng lib
  - handle long osname on FreeBSD
  - resolve almost all checkpatch issues

v5:
  - minor build and checkpatch fixes for RHEL/FreeBSD
  - disable lib/pdump on Windows. It was not useful before
    and now pdump depends on bpf.

v4:
  - minor checkpatch fixes.
    Note: some of the checkpatch warnings are bogus and won't be fixed.
  - fix build of dumpcap on FreeBSD

v3:
  - introduce packet filters using classic BPF to eBPF converter
    required small fix to DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup

v2:
   fix formatting of packet blocks
   fix the new packet capture statistics
   fix crash when primary process exits
   record start/end time
   various whitespace/checkpatch warnings

Stephen Hemminger (12):
  librte_pcapng: add new library for writing pcapng files
  lib: pdump is not supported on Windows
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  test: add test for bpf_convert
  test: add a test for pcapng library
  test: enable bpf autotest
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new packet capture features

 MAINTAINERS                                   |  11 +-
 app/dumpcap/main.c                            | 844 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/meson.build                          |   6 +
 app/test/test_bpf.c                           | 200 +++++
 app/test/test_pcapng.c                        | 272 ++++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 139 +++
 lib/bpf/bpf_validate.c                        |   9 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 574 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 441 ++++++---
 lib/pdump/rte_pdump.h                         | 110 ++-
 lib/pdump/version.map                         |   8 +
 33 files changed, 3704 insertions(+), 220 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 app/test/test_pcapng.c
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v10 01/12] librte_pcapng: add new library for writing pcapng files
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 02/12] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older versions of
tcpdump can also read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
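Note on the option encoding used throughout this library: every pcapng
option is a 16-bit code and a 16-bit length followed by the value,
zero-padded to a 32-bit boundary. A minimal standalone sketch of that
layout, assuming host byte order; opt_padded_len() and opt_put() are
hypothetical names used for illustration only and are not part of this
patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed option header: 16-bit code followed by 16-bit length. */
#define OPT_HDR_LEN 4u

/*
 * Hypothetical helper: bytes an option occupies in the file, counting
 * the header and the zero padding up to the next 32-bit boundary.
 */
static uint32_t
opt_padded_len(uint16_t value_len)
{
	uint32_t raw = OPT_HDR_LEN + value_len;

	return (raw + 3u) & ~3u;
}

/*
 * Hypothetical helper: encode one option at buf and return the number
 * of bytes consumed. The length field holds the unpadded value length.
 */
static size_t
opt_put(uint8_t *buf, size_t room, uint16_t code,
	const void *val, uint16_t len)
{
	size_t total = opt_padded_len(len);

	assert(total <= room);
	memset(buf, 0, total);		/* also zeroes the padding */
	memcpy(buf, &code, sizeof(code));
	memcpy(buf + sizeof(code), &len, sizeof(len));
	if (len != 0)			/* skip memcpy for empty values */
		memcpy(buf + OPT_HDR_LEN, val, len);

	return total;
}
```

With this layout a 5-byte value takes 12 bytes in the file, and the
end-of-options marker (code 0, length 0) takes 4.
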
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 574 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 +++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 918 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map
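
Note on timestamp conversion: the product delta * NSEC_PER_SEC in
pcapng_tsc_to_ns() no longer fits in 64 bits once delta passes roughly
1.8e10 cycles (about nine seconds at 2 GHz). One way to avoid that,
sketched here under the assumption that the TSC frequency is below
about 18 GHz, is to convert whole seconds and the sub-second remainder
separately. cycles_to_ns() is a hypothetical standalone helper, not
code from this series:

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_S 1000000000ull

/*
 * Hypothetical helper: convert a cycle count to nanoseconds without
 * overflowing the intermediate product. Truncates toward zero.
 */
static uint64_t
cycles_to_ns(uint64_t delta, uint64_t hz)
{
	uint64_t secs, rem;

	/* rem < hz, so rem * NS_PER_S fits as long as hz does */
	assert(hz != 0 && hz <= UINT64_MAX / NS_PER_S);

	secs = delta / hz;	/* whole seconds */
	rem = delta % hz;	/* leftover cycles, less than one second */

	return secs * NS_PER_S + (rem * NS_PER_S) / hz;
}
```

The result would then be added to the wall-clock base recorded at
init time, as the existing function does.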

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..f8280a8b01f4
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,574 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write a PCAPNG interface description block */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that this is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2f1bb073df08
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for pcapng_write
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * Optional values may be omitted; see each parameter for the value to pass.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v10 02/12] lib: pdump is not supported on Windows
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 03/12] bpf: allow self-xor operation Stephen Hemminger
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.

The new version of pdump with filtering now depends on the bpf
library, which is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..ba88e9eabc58 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -85,7 +85,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
     ] # only supported libraries for windows
 endif
 
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v10 03/12] bpf: allow self-xor operation
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 02/12] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

Some BPF programs may use XOR of a register with itself
as a way to zero a register in one instruction.
The BPF filter converter generates this in the prolog
to the generated code.

The BPF validator would not allow this because the value of the
register was undefined. But after this operation it is always zero.

Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..853279fee557 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,15 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X &&
+	    ins->src_reg == ins->dst_reg) {
+		eval_fill_imm(&rs, UINT64_MAX, 0);
+		eval_fill_imm(rd, UINT64_MAX, 0);
+	}
+
 	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+			   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v10 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.
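
One classic instruction can expand into several eBPF instructions, so
jump offsets have to be recomputed. As a rough standalone sketch of
that bookkeeping (sk_build_addrs and sk_jump_off are hypothetical
names, not part of this patch), assuming the expansion length of each
classic instruction is already known:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill addrs[i] with the index of the first eBPF instruction emitted
 * for classic instruction i. expand[i] is the number of eBPF
 * instructions that classic instruction i turns into, and prologue
 * is the number of instructions emitted before the first one.
 */
static void
sk_build_addrs(const int expand[], size_t len, int prologue, int addrs[])
{
	int pos = prologue;
	size_t i;

	for (i = 0; i < len; i++) {
		addrs[i] = pos;
		pos += expand[i];
	}
}

/*
 * eBPF jump offsets are relative to the instruction after the jump.
 * slot is the position of the jump inside the expansion of classic
 * instruction i (0 when the jump is the first emitted instruction).
 */
static int
sk_jump_off(const int addrs[], size_t len, size_t i, size_t target, int slot)
{
	assert(i < len && target < len);
	return addrs[target] - addrs[i] - 1 - slot;
}
```

With a three instruction prologue and expansions of 1, 1, 2 and 2,
the classic instructions start at eBPF indexes 3, 4, 5 and 7, so a
jump from classic instruction 1 to classic instruction 3 gets an
offset of 2.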

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The code to convert was originally done as part of the Linux
kernel implementation then converted to a userspace program.
See https://github.com/tklauser/filter2xdp

Both authors have agreed that it is allowable to create a modified
version of this code and license it with BSD license used by DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_convert.c | 575 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 611 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..db84add7dcce
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,575 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag,
+ * if and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* Linux has special negative offsets to access meta-data. */
+		RTE_BPF_LOG(ERR,
+			    "rte_bpf_convert: socket offset %d not supported\n",
+			    fp->k - SKF_AD_OFF);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the table of jump target addresses */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a classic BPF program from libpcap into DPDK eBPF code.
+ *
+ * @param prog
+ *   Classic BPF program from pcap_compile().
+ *
+ * @return
+ *   Pointer to the resulting extended BPF program parameters
+ *   (allocated with *rte_malloc*; release with rte_free())
+ *   that are used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v10 05/12] bpf: add function to dump eBPF instructions
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering Stephen Hemminger
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.
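
As a rough standalone illustration of the decoding involved (not the
code in this patch), an opcode byte splits into a class in the low
three bits, a source flag in bit 3 and an operation in the high
nibble. The SK_* macros, sk_alu_names and sk_format_alu below are
hypothetical names; the numeric values are the standard BPF encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SK_CLASS(code)	((code) & 0x07)	/* 0x04 = ALU (32 bit), 0x07 = ALU64 */
#define SK_OP(code)	((code) & 0xf0)
#define SK_SRC(code)	((code) & 0x08)	/* 0x08 = register source, 0 = immediate */

/* Operation names indexed by the high nibble of the opcode. */
static const char *const sk_alu_names[16] = {
	[0x0] = "add", [0x1] = "sub", [0x2] = "mul", [0x3] = "div",
	[0x4] = "or",  [0x5] = "and", [0x6] = "lsh", [0x7] = "rsh",
	[0x8] = "neg", [0x9] = "mod", [0xa] = "xor", [0xb] = "mov",
	[0xc] = "arsh", [0xd] = "endian",
};

/* Format one ALU instruction into buf; returns what snprintf returns. */
static int
sk_format_alu(char *buf, size_t size, uint8_t code,
	      unsigned int dst, unsigned int src, int32_t imm)
{
	const char *op = sk_alu_names[SK_OP(code) >> 4];
	const char *width = SK_CLASS(code) == 0x04 ? "32" : "";

	assert(SK_CLASS(code) == 0x04 || SK_CLASS(code) == 0x07);

	/* Two of the sixteen slots have no name; report them as invalid. */
	if (op == NULL)
		return snprintf(buf, size, "invalid alu opcode: %#x",
				(unsigned int)code);
	if (SK_SRC(code) == 0x08)
		return snprintf(buf, size, "%s%s r%u, r%u",
				op, width, dst, src);
	return snprintf(buf, size, "%s%s r%u, #0x%x",
			op, width, dst, (unsigned int)imm);
}
```

For example, opcode 0xaf (ALU64, xor, register source) with both
registers set to 0 formats as "xor r0, r0".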

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_dump.c  | 139 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 +++++
 lib/bpf/version.map |   1 +
 4 files changed, 155 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..b86977b96d08
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <inttypes.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i) {
+		const struct ebpf_insn *ins = buf + i;
+		uint8_t cls = BPF_CLASS(ins->code);
+		const char *op, *postfix = "";
+
+		fprintf(f, " L%u:\t", i);
+
+		switch (cls) {
+		default:
+			fprintf(f, "unimp 0x%x // class: %s\n",
+				ins->code, class_tbl[cls]);
+			break;
+		case BPF_ALU:
+			postfix = "32";
+			/* fall through */
+		case EBPF_ALU64:
+			op = alu_op_tbl[BPF_OP_INDEX(ins->code)];
+			if (BPF_SRC(ins->code) == BPF_X)
+				fprintf(f, "%s%s r%u, r%u\n", op, postfix, ins->dst_reg,
+					ins->src_reg);
+			else
+				fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			break;
+		case BPF_LD:
+			op = "ld";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (ins->code == (BPF_LD | BPF_IMM | EBPF_DW)) {
+				uint64_t val;
+
+				val = (uint32_t)ins[0].imm |
+					(uint64_t)(uint32_t)ins[1].imm << 32;
+				fprintf(f, "%s%s r%d, #0x%"PRIx64"\n",
+					op, postfix, ins->dst_reg, val);
+				i++;
+			} else if (BPF_MODE(ins->code) == BPF_IMM)
+				fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_ABS)
+				fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_IND)
+				fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+					ins->dst_reg, ins->src_reg, ins->imm);
+			else
+				fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_LDX:
+			op = "ldx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, ins->dst_reg,
+				ins->src_reg, ins->off);
+			break;
+		case BPF_ST:
+			op = "st";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (BPF_MODE(ins->code) == BPF_MEM)
+				fprintf(f, "%s%s [r%d + %d], #0x%x\n", op, postfix,
+					ins->dst_reg, ins->off, ins->imm);
+			else
+				fprintf(f, "// BUG: ST opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_STX:
+			op = "stx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s [r%d + %d], r%u\n", op, postfix,
+				ins->dst_reg, ins->off, ins->src_reg);
+			break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+		case BPF_JMP:
+			op = jump_tbl[BPF_OP_INDEX(ins->code)];
+			if (op == NULL)
+				fprintf(f, "invalid jump opcode: %#x\n", ins->code);
+			else if (BPF_OP(ins->code) == BPF_JA)
+				fprintf(f, "%s L%d\n", op, L(i, ins->off));
+			else if (BPF_OP(ins->code) == EBPF_EXIT)
+				fprintf(f, "%s\n", op);
+			else
+				fprintf(f, "%s r%u, #0x%x, L%d\n", op, ins->dst_reg,
+					ins->imm, L(i, ins->off));
+			break;
+		case BPF_RET:
+			fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+				ins->code);
+			break;
+		}
+	}
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+	'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2



* [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-23 16:11     ` Pattan, Reshma
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (5 subsequent siblings)
  11 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair a
mismatched primary and secondary process fails.

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).
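
A simplified standalone sketch of how such counters could be updated
for one burst. struct sk_stats, its field names and sk_account_burst
are hypothetical and do not mirror the real rte_pdump_stats layout;
the only assumption taken from the patch is that a filter return
value of zero means the packet is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters: one set per port and queue in a real capture. */
struct sk_stats {
	uint64_t accepted;	/* passed the filter */
	uint64_t filtered;	/* rejected by the filter */
	uint64_t missed;	/* passed, but the ring had no room */
};

/*
 * Account for one burst of n packets. rcs[i] is the filter return
 * value for packet i, and ring_free is the number of free ring slots.
 */
static void
sk_account_burst(struct sk_stats *st, const uint64_t rcs[],
		 unsigned int n, unsigned int ring_free)
{
	unsigned int i, passed = 0, queued;

	assert(st != NULL && rcs != NULL);

	for (i = 0; i < n; i++) {
		if (rcs[i] == 0)
			__atomic_fetch_add(&st->filtered, 1, __ATOMIC_RELAXED);
		else
			passed++;
	}

	/* Only as many packets as fit in the ring are delivered. */
	queued = passed < ring_free ? passed : ring_free;

	__atomic_fetch_add(&st->accepted, passed, __ATOMIC_RELAXED);
	__atomic_fetch_add(&st->missed, passed - queued, __ATOMIC_RELAXED);
}
```

Relaxed atomics are enough here because the counters are independent
totals and no other data is published through them.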

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 441 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 110 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 437 insertions(+), 128 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index ba88e9eabc58..1da521ea6185 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -26,6 +26,7 @@ libraries = [
         'timer',   # eventdev depends on this
         'acl',
         'bbdev',
+        'bpf',
         'bitratestats',
         'cfgfile',
         'compressdev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf and pcapng
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..abc28fcee0ad 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,26 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/*
+ * Note: version numbers intentionally start at 3
+ * in order to catch any application built with an older
+ * version of DPDK using an incompatible client request format.
+ */
 enum pdump_version {
-	V1 = 1
+	PDUMP_CLIENT_LEGACY = 3,
+	PDUMP_CLIENT_PCAPNG = 4,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
-	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	uint16_t flags;
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
+	char device[RTE_DEV_NAME_MAX_LEN];
 };
 
 struct pdump_response {
@@ -63,80 +61,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter &&
+	    rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts) == 0) {
+		/* All packets were filtered out */
+		__atomic_fetch_add(&stats->filtered, nb_pkts,
+				   __ATOMIC_RELAXED);
+		return;
+	}
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * Similar behavior to rte_bpf_eth callback.
+		 * if BPF program returns zero value for a given packet,
+		 * then it will be ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng then the packet must be wrapped,
+		 * otherwise a simple copy is enough.
+		 */
+		if (cbs->ver == PDUMP_CLIENT_PCAPNG)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +203,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +227,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +261,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +290,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
+	      p->ver == PDUMP_CLIENT_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +368,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +378,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -332,7 +406,7 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 		resp->err_value = set_pdump_rxtx_cbs(cli_req);
 	}
 
-	strlcpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_resp.len_param = sizeof(*resp);
 	mp_resp.num_fds = 0;
 	if (rte_mp_reply(&mp_resp, peer) < 0) {
@@ -347,8 +421,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +476,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +517,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,26 +531,26 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+	if (flags & RTE_PDUMP_FLAG_PCAPNG)
+		req->ver = PDUMP_CLIENT_PCAPNG;
+	else
+		req->ver = PDUMP_CLIENT_LEGACY;
+
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	rte_strscpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->queue = queue;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
-	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_req.len_param = sizeof(*req);
 	mp_req.num_fds = 0;
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
@@ -477,11 +568,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API can't depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +593,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +637,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +676,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +692,66 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR,
+				  "pdump not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..be3fd14c4bd3 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused; should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused; should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,35 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are sum of both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * Retrieve the packet capture statistics for a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v10 07/12] app/dumpcap: add new packet capture application
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 08/12] test: add test for bpf_convert Stephen Hemminger
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features, such as filtering, are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 844 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 861 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..baf9eee46666
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,844 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+/* Can do either pcap or pcapng format output */
+typedef union {
+	rte_pcapng_t  *pcapng;
+	pcap_dumper_t *dumper;
+} dumpcap_out_t;
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = malloc(sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	memset(intf, 0, sizeof(*intf));
+	rte_strscpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port; this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 0:
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current CLOCK_MONOTONIC time in nanoseconds (not time since 1/1/1970) */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm callback, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will dumpcap. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to enable monitor: %d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to disable monitor: %d\n", ret);
+}
+
+static void
+report_packet_stats(dumpcap_out_t out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out.pcapng, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	static const char * const args[] = {
+		"dumpcap", "--proc-type", "secondary",
+		"--log-level", "notice"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	eal_argv[0] = strdup(progname);
+	for (i = 1; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+/*
+ * Get Operating System information.
+ * Returns a string allocated via malloc().
+ */
+static char *get_os_info(void)
+{
+	struct utsname uts;
+	char *osname = NULL;
+
+	if (uname(&uts) < 0)
+		return NULL;
+
+	if (asprintf(&osname, "%s %s",
+		     uts.sysname, uts.release) == -1)
+		return NULL;
+
+	return osname;
+}
+
+static dumpcap_out_t create_output(void)
+{
+	dumpcap_out_t ret;
+	static char tmp_path[PATH_MAX];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		char *os = get_os_info();
+
+		ret.pcapng = rte_pcapng_fdopen(fd, os, NULL,
+					   version(), capture_comment);
+		if (ret.pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		free(os);
+	} else {
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		ret.dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (ret.dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+	}
+
+	return ret;
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(-ret));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out.pcapng, pkts, n);
+	else
+		written = pcap_write_packets(out.dumper, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	dumpcap_out_t out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out.pcapng);
+	else
+		pcap_dump_close(out.dumper);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_filter);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+	build = false
+	reason = 'not supported on Windows'
+	subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2
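
Note on create_ring() in the patch above: the requested ring size is
rounded up to the next power of two using __builtin_clzl(). A minimal
standalone sketch of that rounding follows. The helper name
round_up_pow2() is hypothetical and not part of the patch. It handles
0 and 1 explicitly because __builtin_clzll(0) is undefined, and it
returns 0 when the result would not fit in 64 bits.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: smallest power of two >= v. */
static uint64_t round_up_pow2(uint64_t v)
{
	/* __builtin_clzll(0) is undefined, so handle v - 1 == 0 explicitly. */
	if (v <= 1)
		return 1;

	/* The result would be 2^64, which does not fit: report overflow as 0. */
	if (v > (UINT64_C(1) << 63))
		return 0;

	/* 64 - clz(v - 1) is the number of bits needed to represent v - 1. */
	return UINT64_C(1) << (64 - __builtin_clzll(v - 1));
}
```

With this helper a requested size of 1000 becomes 1024, while 4096 is
left unchanged.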


* [dpdk-dev] [PATCH v10 08/12] test: add test for bpf_convert
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 09/12] test: add a test for pcapng library Stephen Hemminger
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Add some functional tests for the Classic BPF to DPDK BPF converter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 200 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 527c06b80708..543a5fd615b2 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3233,3 +3234,202 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str,
+	       struct rte_mbuf *mb)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	rc = rte_bpf_exec(bpf, mb);
+	/* The return code from bpf capture filter is non-zero if matched */
+	ret = (rc == 0);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet? */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	const uint32_t plen = 100;
+	struct rte_mbuf mb, *m;
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+	} *hdr;
+
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	m = &mb;
+
+	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.d_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_RAW,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+
+	if (test_bpf_match(pcap, "ip", m) != 0) {
+		printf("%s@%d: filter \"ip\" doesn't match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+	if (test_bpf_match(pcap, "not ip", m) == 0) {
+		printf("%s@%d: filter \"not ip\" does match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Some sample pcap filter strings from
+ * https://wiki.wireshark.org/CaptureFilters
+ */
+static const char * const sample_filters[] = {
+	"host 172.18.5.4",
+	"net 192.168.0.0/24",
+	"src net 192.168.0.0/24",
+	"src net 192.168.0.0 mask 255.255.255.0",
+	"dst net 192.168.0.0/24",
+	"dst net 192.168.0.0 mask 255.255.255.0",
+	"port 53",
+	"host dpdk.org and not (port 80 or port 25)",
+	"host dpdk.org and not port 80 and not port 25",
+	"port not 53 and not arp",
+	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
+	"ether proto 0x888e",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"not ether dst 01:80:c2:00:00:0e",
+	"not broadcast and not multicast",
+	"dst host ff02::1",
+	"port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420",
+	/* Worms */
+	"dst port 135 and tcp port 135 and ip[2:2]==48",
+	"icmp[icmptype]==icmp-echo and ip[2:2]==92 and icmp[8:4]==0xAAAAAAAA",
+	"dst port 135 or dst port 445 or dst port 1433"
+	" and tcp[tcpflags] & (tcp-syn) != 0"
+	" and tcp[tcpflags] & (tcp-ack) = 0 and src net 192.168.0.0/24",
+	"tcp src port 443 and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4] = 0x18)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 1] = 0x03)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 2] < 0x04)"
+	" and ((ip[2:2] - 4 * (ip[0] & 0x0F) - 4 * ((tcp[12] & 0xF0) >> 4) > 69))",
+	/* Other */
+	"len = 128",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else {
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++)
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2
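
Note on the return convention checked in test_bpf_match() above: a
capture filter returns non-zero (the number of bytes to keep) when the
packet matches and zero otherwise. As an illustration only, the sketch
below hand-codes what libpcap typically emits for the filter "ip" on
DLT_EN10MB: load the 16-bit EtherType at offset 12, compare it with
0x0800, and return the snap length or 0. The function name filter_ip()
and the SKETCH_* constants are assumptions made for this example; the
test itself runs the converted eBPF program through rte_bpf_exec().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETHERTYPE_OFFSET 12     /* after two 6-byte MAC addresses */
#define SKETCH_ETHERTYPE_IPV4   0x0800

/*
 * Hypothetical stand-in for the classic BPF program of the filter "ip".
 * Returns snaplen (non-zero) when the frame matches, 0 when it does not.
 */
static uint32_t filter_ip(const uint8_t *frame, size_t len, uint32_t snaplen)
{
	uint16_t ether_type;

	assert(frame != NULL);

	/* A load past the end of the packet makes a BPF filter reject it. */
	if (len < SKETCH_ETHERTYPE_OFFSET + 2)
		return 0;

	/* ldh [12]: 16-bit load in network (big-endian) byte order. */
	ether_type = (uint16_t)((frame[SKETCH_ETHERTYPE_OFFSET] << 8) |
				frame[SKETCH_ETHERTYPE_OFFSET + 1]);

	/* jeq #0x800, then ret #snaplen on match or ret #0 otherwise. */
	return ether_type == SKETCH_ETHERTYPE_IPV4 ? snaplen : 0;
}
```

This is why the test treats rc == 0 from rte_bpf_exec() as "no match".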


* [dpdk-dev] [PATCH v10 09/12] test: add a test for pcapng library
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 08/12] test: add test for bpf_convert Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 10/12] test: enable bpf autotest Stephen Hemminger
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Simple unit test that creates a pcapng file using the API.

To run this test you need to have at least one device.
For example:

DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
    --no-huge -m 2048 --vdev=net_tap,iface=dummy

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/meson.build   |   4 +
 app/test/test_pcapng.c | 272 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 276 insertions(+)
 create mode 100644 app/test/test_pcapng.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686adcb..8cf41021deb4 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -395,6 +395,10 @@ if dpdk_conf.has('RTE_NET_RING')
     fast_tests += [['pdump_autotest', true]]
 endif
 
+if dpdk_conf.has('RTE_PORT_PCAP')
+    test_sources += 'test_pcapng.c'
+endif
+
 if dpdk_conf.has('RTE_LIB_POWER')
     test_deps += 'power'
 endif
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
new file mode 100644
index 000000000000..df837ffe0d51
--- /dev/null
+++ b/app/test/test_pcapng.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_net.h>
+#include <rte_pcapng.h>
+
+#include <pcap/pcap.h>
+
+#include "test.h"
+
+#define NUM_PACKETS    10
+#define DUMMY_MBUF_NUM 3
+
+static rte_pcapng_t *pcapng;
+static struct rte_mempool *mp;
+static const uint32_t pkt_len = 200;
+static uint16_t port_id;
+static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng";
+
+/* first mbuf in the packet, should always be at offset 0 */
+struct dummy_mbuf {
+	struct rte_mbuf mb[DUMMY_MBUF_NUM];
+	uint8_t buf[DUMMY_MBUF_NUM][RTE_MBUF_DEFAULT_BUF_SIZE];
+};
+
+static void
+dummy_mbuf_prep(struct rte_mbuf *mb, uint8_t buf[], uint32_t buf_len,
+	uint32_t data_len)
+{
+	uint32_t i;
+	uint8_t *db;
+
+	mb->buf_addr = buf;
+	mb->buf_iova = (uintptr_t)buf;
+	mb->buf_len = buf_len;
+	rte_mbuf_refcnt_set(mb, 1);
+
+	/* set pool pointer to dummy value, test doesn't use it */
+	mb->pool = (void *)buf;
+
+	rte_pktmbuf_reset(mb);
+	db = (uint8_t *)rte_pktmbuf_append(mb, data_len);
+
+	for (i = 0; i != data_len; i++)
+		db[i] = i;
+}
+
+/* Make an IP packet consisting of a chain of one mbuf */
+static void
+mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth;
+		struct rte_ipv4_hdr ip;
+	} pkt = {
+		.eth = {
+			.d_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+			.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+		},
+		.ip = {
+			.version_ihl = RTE_IPV4_VHL_DEF,
+			.total_length = rte_cpu_to_be_16(plen),
+			.time_to_live = IPDEFTTL,
+			.next_proto_id = IPPROTO_RAW,
+			.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+			.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+		}
+	};
+
+	memset(dm, 0, sizeof(*dm));
+	dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen);
+
+	rte_eth_random_addr(pkt.eth.s_addr.addr_bytes);
+	memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen));
+}
+
+static int
+test_setup(void)
+{
+	int tmp_fd;
+
+	port_id = rte_eth_find_next(0);
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		fprintf(stderr, "No valid Ether port\n");
+		return -1;
+	}
+
+	tmp_fd = mkstemps(file_name, strlen(".pcapng"));
+	if (tmp_fd == -1) {
+		perror("mkstemps() failure");
+		return -1;
+	}
+	printf("pcapng: output file %s\n", file_name);
+
+	/* open a test capture file */
+	pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL);
+	if (pcapng == NULL) {
+		fprintf(stderr, "rte_pcapng_fdopen failed\n");
+		close(tmp_fd);
+		return -1;
+	}
+
+
+	/* Make a pool for cloned packets */
+	mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+					    0, 0,
+					    rte_pcapng_mbuf_size(pkt_len),
+					    SOCKET_ID_ANY, "ring_mp_sc");
+	if (mp == NULL) {
+		fprintf(stderr, "Cannot create mempool\n");
+		return -1;
+	}
+	return 0;
+
+}
+
+static int
+test_write_packets(void)
+{
+	struct rte_mbuf *orig;
+	struct rte_mbuf *clones[NUM_PACKETS] = { };
+	struct dummy_mbuf mbfs;
+	unsigned int i;
+	ssize_t len;
+
+	/* make a dummy packet */
+	mbuf1_prepare(&mbfs, pkt_len);
+
+	/* clone them */
+	orig  = &mbfs.mb[0];
+	for (i = 0; i < NUM_PACKETS; i++) {
+		struct rte_mbuf *mc;
+
+		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+				rte_get_tsc_cycles(), 0);
+		if (mc == NULL) {
+			fprintf(stderr, "Cannot copy packet\n");
+			return -1;
+		}
+		clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS);
+
+	rte_pktmbuf_free_bulk(clones, NUM_PACKETS);
+
+	if (len <= 0) {
+		fprintf(stderr, "Write of packets failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+test_write_stats(void)
+{
+	ssize_t len;
+
+	/* write a statistics block */
+	len = rte_pcapng_write_stats(pcapng, port_id,
+				     NULL, 0, 0,
+				     NUM_PACKETS, 0);
+	if (len <= 0) {
+		fprintf(stderr, "Write of statistics failed\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+pkt_print(u_char *user, const struct pcap_pkthdr *h,
+	  const u_char *bytes)
+{
+	unsigned int *countp = (unsigned int *)user;
+	const struct rte_ether_hdr *eh;
+	struct tm *tm;
+	char tbuf[128], src[64], dst[64];
+
+	tm = localtime(&h->ts.tv_sec);
+	if (tm == NULL) {
+		perror("localtime");
+		return;
+	}
+
+	if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) {
+		fprintf(stderr, "strftime returned 0!\n");
+		return;
+	}
+
+	eh = (const struct rte_ether_hdr *)bytes;
+	rte_ether_format_addr(dst, sizeof(dst), &eh->d_addr);
+	rte_ether_format_addr(src, sizeof(src), &eh->s_addr);
+	printf("%s.%06lu: %s -> %s type %x length %u\n",
+	       tbuf, (unsigned long)h->ts.tv_usec,
+	       src, dst, rte_be_to_cpu_16(eh->ether_type), h->len);
+
+	*countp += 1;
+}
+
+/*
+ * Open the resulting pcapng file with libpcap
+ * Would be better to use capinfos from wireshark
+ * but that creates an unwanted dependency.
+ */
+static int
+test_validate(void)
+{
+	char errbuf[PCAP_ERRBUF_SIZE];
+	unsigned int count = 0;
+	pcap_t *pcap;
+	int ret;
+
+	pcap = pcap_open_offline(file_name, errbuf);
+	if (pcap == NULL) {
+		fprintf(stderr, "pcap_open_offline('%s') failed: %s\n",
+			file_name, errbuf);
+		return -1;
+	}
+
+	ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count);
+	if (ret == 0)
+		printf("Saw %u packets\n", count);
+	else
+		fprintf(stderr, "pcap_loop: failed: %s\n",
+			pcap_geterr(pcap));
+	pcap_close(pcap);
+
+	return ret;
+}
+
+static void
+test_cleanup(void)
+{
+	if (mp)
+		rte_mempool_free(mp);
+
+	if (pcapng)
+		rte_pcapng_close(pcapng);
+
+}
+
+static struct
+unit_test_suite test_pcapng_suite  = {
+	.setup = test_setup,
+	.teardown = test_cleanup,
+	.suite_name = "Test Pcapng Unit Test Suite",
+	.unit_test_cases = {
+		TEST_CASE(test_write_packets),
+		TEST_CASE(test_write_stats),
+		TEST_CASE(test_validate),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_pcapng(void)
+{
+	return unit_test_suite_runner(&test_pcapng_suite);
+}
+
+REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng);
-- 
2.30.2
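
Note on test_setup() above: the output file is created with mkstemps(),
whose second argument is the length of the suffix that follows the
XXXXXX placeholder in the template. A small standalone sketch of that
call follows. The helper name make_capture_file() and the template
path used in the example are assumptions for illustration; mkstemps()
is a glibc/BSD extension rather than POSIX.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes mkstemps() on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h> /* close() and unlink() for callers */

/*
 * Hypothetical helper: create a unique file from a template such as
 * "/tmp/sketch_XXXXXX.pcapng". The template is modified in place.
 * Returns an open file descriptor, or -1 on failure.
 */
static int make_capture_file(char *tmpl, const char *suffix)
{
	assert(tmpl != NULL && suffix != NULL);

	/* The suffix length tells mkstemps() where the XXXXXX ends. */
	return mkstemps(tmpl, (int)strlen(suffix));
}
```

The caller owns the descriptor and the file, so it should close() the
descriptor and unlink() the path once the capture is no longer needed.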


* [dpdk-dev] [PATCH v10 10/12] test: enable bpf autotest
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 09/12] test: add a test for pcapng library Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The BPF autotest is defined but not run automatically.
Since it is short, it should be added to the autotest suite.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index 8cf41021deb4..ba7d568bf330 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -193,6 +193,8 @@ test_deps = [
 fast_tests = [
         ['acl_autotest', true],
         ['atomic_autotest', false],
+        ['bpf_autotest', true],
+        ['bpf_convert_autotest', true],
         ['bitops_autotest', true],
         ['byteorder_autotest', true],
         ['cksum_autotest', true],
-- 
2.30.2


* [dpdk-dev] [PATCH v10 11/12] doc: changes for new pcapng and dumpcap
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (9 preceding siblings ...)
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 10/12] test: enable bpf autotest Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The DPDK packet capture framework consists of the
+libraries for collecting packets (``librte_pdump``) and writing packets
+to a file (``librte_pcapng``). There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` tool captures packets much
+like Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the ``librte_pdump``
+  library. The ``dpdk-pdump`` tool provides more limited functionality
+  and depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the dpdk-dumpcap tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 1d56fa9bf2f1..32ebdb97c96d 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -87,6 +87,16 @@ New Features
   Added command-line options to specify total number of processes and
   current process ID. Each process owns subset of Rx and Tx queues.
 
+* **Revised packet capture framework.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility including: capture of multiple interfaces,
+    filtering, and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancements to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Fixes packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool.  The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes capture files in the
+Pcapng packet format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In dpdk, only the ``testpmd`` is modified to initialize packet capture
+        framework, other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in style of *tshark* use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2
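
A note on the Pcapng framing described in the pdump_lib.rst hunk above: with
RTE_PDUMP_FLAG_PCAPNG set, each captured mbuf is extended with a header and
trailer so that it looks like an enhanced packet block. The sketch below is
not code from this series. It only illustrates the block layout as described
in the pcapng draft referenced in pcapng_lib.rst, and it assumes that no
options are attached to the block. The names epb_hdr, pad32 and
epb_total_length are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Block type value of an enhanced packet block in the pcapng draft. */
#define EPB_BLOCK_TYPE 0x00000006u

/* Fixed fields that precede the packet data (illustrative layout only). */
struct epb_hdr {
	uint32_t block_type;      /* EPB_BLOCK_TYPE */
	uint32_t block_length;    /* total length, repeated after the data */
	uint32_t interface_id;    /* index of the capture interface */
	uint32_t timestamp_hi;    /* upper 32 bits of the timestamp */
	uint32_t timestamp_lo;    /* lower 32 bits of the timestamp */
	uint32_t capture_length;  /* bytes of packet data stored */
	uint32_t original_length; /* bytes of the packet on the wire */
};

/* Packet data is padded to a 32-bit boundary. */
static uint32_t
pad32(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/* Header + padded data + trailing copy of the block length (no options). */
static uint32_t
epb_total_length(uint32_t caplen)
{
	return (uint32_t)sizeof(struct epb_hdr) + pad32(caplen) +
	       (uint32_t)sizeof(uint32_t);
}
```

Under these assumptions a 60 byte frame occupies 92 bytes in the file, which
is why the consumer of the ring has to know which of the two formats it asked
for.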



* [dpdk-dev] [PATCH v10 12/12] MAINTAINERS: add entry for new packet capture features
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (10 preceding siblings ...)
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-16 22:26   ` Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-16 22:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Since the new packet capture support is just an extension of the existing pdump,
add myself as a maintainer of it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 77a549a5e8c2..df160477a217 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1424,12 +1424,17 @@ F: doc/guides/sample_app_ug/qos_scheduler.rst
 
 Packet capture
 M: Reshma Pattan <reshma.pattan@intel.com>
+M: Stephen Hemminger <stephen@networkplumber.org>
 F: lib/pdump/
+F: lib/pcapng/
 F: doc/guides/prog_guide/pdump_lib.rst
-F: app/test/test_pdump.*
-F: app/pdump/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: doc/guides/tools/dumpcap.rst
 F: doc/guides/tools/pdump.rst
-
+F: app/test/test_pdump.c
+F: app/test/test_pcapng.c
+F: app/pdump/
+F: app/dumpcap/
 
 Packet Framework
 ----------------
-- 
2.30.2



* Re: [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering
  2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-23 16:11     ` Pattan, Reshma
  2021-09-23 16:58       ` Stephen Hemminger
                         ` (2 more replies)
  0 siblings, 3 replies; 220+ messages in thread
From: Pattan, Reshma @ 2021-09-23 16:11 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger
> 
> +++ b/lib/meson.build
> +        'bpf',
>          'bitratestats',

If this list is in alphabetical order, should bpf come after bitratestats?

> +/*
> + * Note: version numbers intentionally start at 3
> + * in order to catch any application built with older out
> + * version of DPDK using incompatible client request format.
> + */
>  enum pdump_version {
> -	V1 = 1
> +	PDUMP_CLIENT_LEGACY = 3,
> +	PDUMP_CLIENT_PCAPNG = 4,
The version numbering was internal to library,  applications do not have control over  it, can't we start  enumeration from 1?

>  struct pdump_request {
> +	uint16_t flags;
Why is the flags type changed from uint32_t to uint16_t?

> 
> +		 * Similar behavior to rte_bpf_eth callback.
> +		 * if BPF program returns zero value for a given packet,
> +		 * then it will be ignored.
> +		 */
Looks like wrong callback name referred in the comment, should be corrected?

> +		if (cbs->filter && rcs[i] == 0) {
Why do we need to do this again if some packets already filtered.


> +	if (!(p->ver == PDUMP_CLIENT_LEGACY ||
> +	      p->ver == PDUMP_CLIENT_PCAPNG)) {
> +		PDUMP_LOG(ERR,
> +			  "incorrect client version %u\n", p->ver);
> +		return -EINVAL;
> +	}
This check is not useful here I guess, as we are setting the version in the library itself below.

> 
> +pdump_prepare_client_request(const char *device, uint16_t queue,
> +	req->queue = queue;
This assignment is done below as well, so here it is redundant I guess?

> -	} else {
> +		req->queue = queue;
>  	}
> 


> +	if (pdump_stats == NULL) {
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +			PDUMP_LOG(ERR,
> +				  "pdump not initialized\n");
Might be good to say "pdump stats not initialized" instead of just saying "pdump not initialized"?

> 
> +/**
> + * Retrieve the packet capture statistics for a queue.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param stats
> + *   A pointer to structure of type *rte_pdump_stats* to be filled in.
> + * @return
> + *   Zero if successful. -1 on error and rte_errno is set.
> + */
The experimental warning below is missing from the above comment.

> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice



^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering
  2021-09-23 16:11     ` Pattan, Reshma
@ 2021-09-23 16:58       ` Stephen Hemminger
  2021-09-23 18:14       ` Stephen Hemminger
  2021-09-23 18:23       ` Stephen Hemminger
  2 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-23 16:58 UTC (permalink / raw)
  To: Pattan, Reshma; +Cc: dev

On Thu, 23 Sep 2021 16:11:42 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> > +/*
> > + * Note: version numbers intentionally start at 3
> > + * in order to catch any application built with older out
> > + * version of DPDK using incompatible client request format.
> > + */
> >  enum pdump_version {
> > -	V1 = 1
> > +	PDUMP_CLIENT_LEGACY = 3,
> > +	PDUMP_CLIENT_PCAPNG = 4,  
> The version numbering is internal to the library and applications have no control over it, so can't we start the enumeration from 1?

Although DPDK does not support mixing versions between primary and
secondary processes, someone is sure to try.

I wanted to make sure that if a user did something invalid, like using
an old pdump (built with DPDK 20.11) with a new application, it would
fail in a direct manner.
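
A minimal sketch of that fail-fast check, written as standalone C rather
than the actual library code. The enum values come from the patch; the
helper name check_client_version() is hypothetical and exists only for
illustration. A request from an old client carries version 1, which is
neither of the accepted values, so it is rejected immediately.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Values from the patch: numbering starts at 3 so the old value (1) never matches. */
enum pdump_version {
	PDUMP_CLIENT_LEGACY = 3,
	PDUMP_CLIENT_PCAPNG = 4,
};

/*
 * Hypothetical helper (not part of the patch): return 0 for a known
 * client version, -EINVAL for anything else, including the old V1 = 1.
 */
static int
check_client_version(uint16_t ver)
{
	if (ver != PDUMP_CLIENT_LEGACY && ver != PDUMP_CLIENT_PCAPNG) {
		fprintf(stderr, "incorrect client version %u\n",
			(unsigned int)ver);
		return -EINVAL;
	}
	return 0;
}
```

With this shape, a request built against the old format (version 1) fails
with -EINVAL instead of being misinterpreted.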

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering
  2021-09-23 16:11     ` Pattan, Reshma
  2021-09-23 16:58       ` Stephen Hemminger
@ 2021-09-23 18:14       ` Stephen Hemminger
  2021-09-23 18:23       ` Stephen Hemminger
  2 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-23 18:14 UTC (permalink / raw)
  To: Pattan, Reshma; +Cc: dev

On Thu, 23 Sep 2021 16:11:42 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> >  struct pdump_request {
> > +	uint16_t flags;  
> Why is the flags type changed from uint32_t to uint16_t?

Only to pack the structure. The flags were unused.
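
To illustrate the packing effect, here is a small sketch with two
hypothetical layouts. These are not the real struct pdump_request; the
field names and order are assumptions chosen only to show how a 32-bit
member between 16-bit members can force padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a 32-bit flags field (illustration only). */
struct req_wide_flags {
	uint16_t ver;
	uint16_t op;
	uint32_t flags;
	uint16_t queue;	/* followed by tail padding on common ABIs */
};

/* Hypothetical layout with a 16-bit flags field: four 16-bit members pack tightly. */
struct req_packed_flags {
	uint16_t ver;
	uint16_t op;
	uint16_t flags;
	uint16_t queue;
};
```

On common ABIs the first layout occupies more bytes than the second,
because the 32-bit member raises the structure alignment and adds tail
padding after the last 16-bit field.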

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering
  2021-09-23 16:11     ` Pattan, Reshma
  2021-09-23 16:58       ` Stephen Hemminger
  2021-09-23 18:14       ` Stephen Hemminger
@ 2021-09-23 18:23       ` Stephen Hemminger
  2021-09-24 15:33         ` Pattan, Reshma
  2 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-23 18:23 UTC (permalink / raw)
  To: Pattan, Reshma; +Cc: dev

On Thu, 23 Sep 2021 16:11:42 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> > 
> > +		 * Similar behavior to rte_bpf_eth callback.
> > +		 * if BPF program returns zero value for a given packet,
> > +		 * then it will be ignored.
> > +		 */  
> It looks like the wrong callback name is referenced in the comment; should it be corrected?

The behavior really is that of pcap_offline_filter() and the Linux kernel socket filter.

> > +		if (cbs->filter && rcs[i] == 0) {  
> Why do we need to do this again if some packets were already filtered?

The earlier call (rte_bpf_exec_burst) returns the number of packets that were
processed; in practice that return value is always equal to n.
So this check is what does the filtering. There was an issue with checking the
return value of exec_burst.
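
A minimal sketch of that per-packet check, as standalone C without any
DPDK types. The function select_matching() and its parameters are
hypothetical; rcs[] stands in for the array of return codes that the BPF
program produced for each packet in the burst, and a zero return code
means the packet is not captured.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the capture loop: record the indexes of the
 * packets that pass the filter. When no filter is installed, every
 * packet is kept. Returns the number of indexes written to keep[].
 */
static unsigned int
select_matching(const uint64_t rcs[], unsigned int n, int have_filter,
		unsigned int keep[])
{
	unsigned int i, kept = 0;

	for (i = 0; i < n; i++) {
		/* A zero result from the filter program means "ignore". */
		if (have_filter && rcs[i] == 0)
			continue;
		keep[kept++] = i;
	}
	return kept;
}
```

The burst call only says how many packets were run through the program;
whether each packet matched is known only from its entry in rcs[].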

^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (13 preceding siblings ...)
  2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-24 15:21 ` Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (11 more replies)
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                   ` (3 subsequent siblings)
  18 siblings, 12 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps

There are no blocker items. The following are worth noting:
  * bogus checkpatch warnings
    - the correct flag to open is O_CREAT
    - intentionally keeping macro with goto since that
      was in original code and is clearer
    - the tempfile name can not be const since it is
      overwritten by tmpfile() call

  * enabling BPF tests causes CI to see a pre-existing bug
    https://bugs.dpdk.org/show_bug.cgi?id=811

  * filtering for stripped VLAN tags requires changes to
    libpcap (to be addressed in future)

v11:
  - address review comments for pdump (patch 6)

v10:
  - fix to rte_bpf_dump to handle more instructions
    make sure all bpf_test cases are decoded

v9:
  - incorporate suggested change to BPF XOR
  - make autotest for pcapng more complete by reading the
    resulting file with libpcap

v8:
  - enable BPF tests in autotest
  - add more BPF test strings
  - use rte_strscpy to satisfy checkpatch
  - merge MAINTAINERS (put this in with existing pdump)

v7:
  - add functional tests for pcapng lib
  - bug fix for error returns in pcapng lib
  - handle long osname on FreeBSD
  - resolve almost all checkpatch issues

v5:
  - minor build and checkpatch fixes for RHEL/FreeBSD
  - disable lib/pdump on Windows. It was not useful before
    and now pdump depends on bpf.

v4:
  - minor checkpatch fixes.
    Note: some of the checkpatch warnings are bogus and won't be fixed.
  - fix build of dumpcap on FreeBSD

v3:
  - introduce packet filters using classic BPF to eBPF converter
    required small fix to DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup

v2:
  - fix formatting of packet blocks
  - fix the new packet capture statistics
  - fix crash when primary process exits
  - record start/end time
  - various whitespace/checkpatch warnings


Stephen Hemminger (12):
  librte_pcapng: add new library for writing pcapng files
  lib: pdump is not supported on Windows
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  test: add test for bpf_convert
  test: add a test for pcapng library
  test: enable bpf autotest
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new packet capture features

 MAINTAINERS                                   |  11 +-
 app/dumpcap/main.c                            | 844 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/meson.build                          |   6 +
 app/test/test_bpf.c                           | 200 +++++
 app/test/test_pcapng.c                        | 272 ++++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 139 +++
 lib/bpf/bpf_validate.c                        |   9 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 574 ++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 427 ++++++---
 lib/pdump/rte_pdump.h                         | 113 ++-
 lib/pdump/version.map                         |   8 +
 33 files changed, 3694 insertions(+), 219 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 app/test/test_pcapng.c
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v11 01/12] librte_pcapng: add new library for writing pcapng files
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
@ 2021-09-24 15:21   ` Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 02/12] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older versions of tcpdump
also know how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 +++++++++
 lib/pcapng/rte_pcapng.c   | 574 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 +++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 918 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4323c0..51bf9c2d11f0 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..f8280a8b01f4
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,574 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface description block for one port */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(6);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet  block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* If packet had offloaded VLAN, expand it */
+	if (md->ol_flags & (PKT_RX_VLAN_STRIPPED | PKT_TX_VLAN)) {
+		if (rte_vlan_insert(&mc) != 0)
+			goto fail;
+
+		orig_len += sizeof(struct rte_vlan_hdr);
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2f1bb073df08
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for pcapng_write
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * Values that are not known can be omitted as noted for each parameter.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2
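
Note on rte_pcapng_mbuf_size(): the header says the result accounts
for the required header and trailer fields. As a rough illustration
only, the sketch below computes the smallest Enhanced Packet Block
that can hold a given number of captured bytes, based on the block
layout in the pcapng specification. The names pad32() and
epb_min_size() are hypothetical and are not part of this patch; the
real function may also reserve room for options and mbuf headroom.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fixed part of a pcapng Enhanced Packet Block: block type, block total
 * length, interface id, timestamp high, timestamp low, captured length
 * and original length (seven 32-bit words), followed after the data by
 * a trailing copy of the block total length.
 */
#define EPB_FIXED_BYTES   (7u * 4u)
#define EPB_TRAILER_BYTES 4u

/* Round up to the next multiple of 4; pcapng pads packet data to 32 bits. */
static uint32_t pad32(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/* Hypothetical helper: smallest block holding snaplen bytes, no options. */
static uint32_t epb_min_size(uint32_t snaplen)
{
	/* Illustration only: refuse lengths that would wrap around. */
	assert(snaplen <= UINT32_MAX - 64u);
	return EPB_FIXED_BYTES + pad32(snaplen) + EPB_TRAILER_BYTES;
}
```

For a 60 byte frame this gives 28 + 60 + 4 = 92 bytes; a 61 byte
frame is padded to 64 and needs 96.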



* [dpdk-dev] [PATCH v11 02/12] lib: pdump is not supported on Windows
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-09-24 15:21   ` Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 03/12] bpf: allow self-xor operation Stephen Hemminger
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The pdump library has been building on Windows, but it was of no
use there: the pdump utility is not built on Windows, and Windows
does not have multi-process support.

The new version of pdump with filtering also depends on the bpf
library, which is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index 51bf9c2d11f0..ba88e9eabc58 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -85,7 +85,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
     ] # only supported libraries for windows
 endif
 
-- 
2.30.2



* [dpdk-dev] [PATCH v11 03/12] bpf: allow self-xor operation
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 02/12] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-09-24 15:21   ` Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

Some BPF programs XOR a register with itself to zero it in a
single instruction. The BPF filter converter emits this in the
prolog of the generated code.

The BPF validator rejected this because the value of the register
was undefined on entry. After the operation, however, the register
is always zero.

Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
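
Note: the idea behind the change can be sketched with a toy model of
a verifier that tracks whether each register holds a defined value.
This is not the DPDK validator; struct reg_state, enum toy_op and
toy_eval_alu() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register state: is the value known, and if so what is it. */
struct reg_state {
	bool defined;
	uint64_t value;
};

enum toy_op { TOY_XOR, TOY_ADD };

/* Returns false when the instruction reads an undefined register. */
static bool toy_eval_alu(enum toy_op op, struct reg_state *regs,
			 unsigned int dst, unsigned int src)
{
	assert(regs != 0);

	/* x ^ x == 0 for every x, so the old contents do not matter. */
	if (op == TOY_XOR && dst == src) {
		regs[dst].defined = true;
		regs[dst].value = 0;
		return true;
	}

	if (!regs[dst].defined || !regs[src].defined)
		return false;

	if (op == TOY_XOR)
		regs[dst].value ^= regs[src].value;
	else
		regs[dst].value += regs[src].value;
	return true;
}
```

With both registers initially undefined, an add is rejected while
"xor r0, r0" is accepted and leaves r0 defined as zero.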

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..853279fee557 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,15 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
+	/* Allow self-xor as a way to zero a register */
+	if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X &&
+	    ins->src_reg == ins->dst_reg) {
+		eval_fill_imm(&rs, UINT64_MAX, 0);
+		eval_fill_imm(rd, UINT64_MAX, 0);
+	}
+
 	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+			   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v11 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-09-24 15:21   ` Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The conversion code was originally written as part of the Linux
kernel implementation and later adapted into a userspace program.
See https://github.com/tklauser/filter2xdp

Both authors have agreed that a modified version of this code
may be created and licensed under the BSD license used by DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_convert.c | 575 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 611 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c
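
Note: one cBPF instruction can expand into several eBPF instructions,
so jump offsets have to be recomputed. The converter records, for
each cBPF index, where its eBPF output starts, and derives the new
relative offset from that table. The sketch below shows only that
arithmetic for a jump emitted as the first eBPF instruction of its
group; remap_jump() is a hypothetical name and the code is a
simplified model, not the function added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * addrs[i] is the index of the first eBPF instruction emitted for cBPF
 * instruction i. A cBPF jump at index i with relative offset k targets
 * cBPF instruction i + k + 1. eBPF offsets are likewise relative to the
 * instruction following the jump, hence the "- 1".
 */
static int remap_jump(const int *addrs, size_t len, size_t i,
		      unsigned int k, int *new_off)
{
	size_t target = i + k + 1;

	assert(addrs != NULL && new_off != NULL);

	if (target >= len)
		return -1;	/* jump past the end of the program */

	*new_off = addrs[target] - addrs[i] - 1;
	return 0;
}
```

Example: with a three instruction prologue and addrs = {3, 4, 10, 11}
(cBPF instruction 1 expanded to six eBPF instructions), a jump at
cBPF index 0 with k = 2 targets cBPF index 3 and gets the eBPF
offset 11 - 3 - 1 = 7.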

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..db84add7dcce
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,575 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* Linux has special negative offsets to access meta-data. */
+		RTE_BPF_LOG(ERR,
+			    "rte_bpf_convert: socket offset %d not supported\n",
+			    fp->k - SKF_AD_OFF);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the cBPF to eBPF address map */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourself. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a Classic BPF program from libpcap into DPDK BPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @param prm
+ *  Result Extended BPF program.
+ * @return
+ *   Pointer to BPF program (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2
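
Note on the BPF_LDX | BPF_MSH | BPF_B case: the classic instruction
"ldxb 4 * ([k] & 0xf)" loads one byte, keeps the low nibble and
multiplies it by four, which is how filters obtain the IPv4 header
length. The converter expands it into six eBPF instructions; the
sketch below shows the same computation in plain C. msh_load() is a
hypothetical name, and the bounds check is an assumption added for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* X = 4 * (pkt[k] & 0xf); returns -1 if k is outside the packet. */
static int msh_load(const uint8_t *pkt, size_t pkt_len, size_t k,
		    uint32_t *x)
{
	assert(pkt != NULL && x != NULL);

	if (k >= pkt_len)
		return -1;

	/* Mask first, then shift left by 2 (multiply by 4). */
	*x = (uint32_t)(pkt[k] & 0xf) << 2;
	return 0;
}
```

For an Ethernet frame whose byte 14 is 0x45 (IPv4, header length 5
words) the result is 20.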



* [dpdk-dev] [PATCH v11 05/12] bpf: add function to dump eBPF instructions
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-09-24 15:21   ` Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 06/12] pdump: support pcapng and filtering Stephen Hemminger
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_dump.c  | 139 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 +++++
 lib/bpf/version.map |   1 +
 4 files changed, 155 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c
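
Note: the dump routine works by splitting the 8-bit opcode into its
class, operation and source fields and using those to index name
tables. The following is a minimal standalone sketch of that
decoding for ALU opcodes. The SK_* macros, sk_alu_names[] and
sk_alu_name() are hypothetical names for this example; the mask
values are the ones used by the classic BPF opcode encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field masks of the 8-bit BPF opcode. */
#define SK_CLASS(code) ((code) & 0x07)
#define SK_OP(code)    ((code) & 0xf0)
#define SK_SRC(code)   ((code) & 0x08)

/* ALU operation names indexed by the high nibble of the opcode. */
static const char *const sk_alu_names[16] = {
	[0x0] = "add",  [0x1] = "sub", [0x2] = "mul", [0x3] = "div",
	[0x4] = "or",   [0x5] = "and", [0x6] = "lsh", [0x7] = "rsh",
	[0x8] = "neg",  [0x9] = "mod", [0xa] = "xor", [0xb] = "mov",
	[0xc] = "arsh", [0xd] = "endian",
};

/* Returns the mnemonic for an ALU opcode, or "?" for an unused slot. */
static const char *sk_alu_name(uint8_t code)
{
	unsigned int idx = SK_OP(code) >> 4;

	assert(idx < 16);
	return sk_alu_names[idx] != NULL ? sk_alu_names[idx] : "?";
}
```

Opcode 0xaf, for example, decodes as class 0x07 (64-bit ALU), source
0x08 (register) and operation 0xa0, which maps to "xor". Checking for
empty table slots avoids passing a NULL string to fprintf().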

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..b86977b96d08
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <inttypes.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i) {
+		const struct ebpf_insn *ins = buf + i;
+		uint8_t cls = BPF_CLASS(ins->code);
+		const char *op, *postfix = "";
+
+		fprintf(f, " L%u:\t", i);
+
+		switch (cls) {
+		default:
+			fprintf(f, "unimp 0x%x // class: %s\n",
+				ins->code, class_tbl[cls]);
+			break;
+		case BPF_ALU:
+			postfix = "32";
+			/* fall through */
+		case EBPF_ALU64:
+			op = alu_op_tbl[BPF_OP_INDEX(ins->code)];
+			if (BPF_SRC(ins->code) == BPF_X)
+				fprintf(f, "%s%s r%u, r%u\n", op, postfix, ins->dst_reg,
+					ins->src_reg);
+			else
+				fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			break;
+		case BPF_LD:
+			op = "ld";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (ins->code == (BPF_LD | BPF_IMM | EBPF_DW)) {
+				uint64_t val;
+
+				val = (uint32_t)ins[0].imm |
+					(uint64_t)(uint32_t)ins[1].imm << 32;
+				fprintf(f, "%s%s r%d, #0x%"PRIx64"\n",
+					op, postfix, ins->dst_reg, val);
+				i++;
+			} else if (BPF_MODE(ins->code) == BPF_IMM)
+				fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_ABS)
+				fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_IND)
+				fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+					ins->dst_reg, ins->src_reg, ins->imm);
+			else
+				fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_LDX:
+			op = "ldx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, ins->dst_reg,
+				ins->src_reg, ins->off);
+			break;
+		case BPF_ST:
+			op = "st";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (BPF_MODE(ins->code) == BPF_MEM)
+				fprintf(f, "%s%s [r%d + %d], #0x%x\n", op, postfix,
+					ins->dst_reg, ins->off, ins->imm);
+			else
+				fprintf(f, "// BUG: ST opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_STX:
+			op = "stx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s [r%d + %d], r%u\n", op, postfix,
+				ins->dst_reg, ins->off, ins->src_reg);
+			break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+		case BPF_JMP:
+			op = jump_tbl[BPF_OP_INDEX(ins->code)];
+			if (op == NULL)
+				fprintf(f, "invalid jump opcode: %#x\n", ins->code);
+			else if (BPF_OP(ins->code) == BPF_JA)
+				fprintf(f, "%s L%d\n", op, L(i, ins->off));
+			else if (BPF_OP(ins->code) == EBPF_EXIT)
+				fprintf(f, "%s\n", op);
+			else
+				fprintf(f, "%s r%u, #0x%x, L%d\n", op, ins->dst_reg,
+					ins->imm, L(i, ins->off));
+			break;
+		case BPF_RET:
+			fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+				ins->code);
+			break;
+		}
+	}
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+	'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2
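
Reader's note on rte_bpf_dump() above: two details of the dump are easy to
misread, namely how the 64-bit load immediate is rebuilt from two instruction
slots and how jump labels are computed. The following is only a minimal
sketch. struct insn_sketch and both helper names are hypothetical stand-ins
(the real type is struct ebpf_insn), chosen so that the fragment builds
without DPDK headers.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct ebpf_insn; only the fields used here. */
struct insn_sketch {
	uint8_t code;
	int16_t off;
	int32_t imm;
};

/*
 * A 64-bit load immediate occupies two instruction slots: the low 32 bits
 * are in the first imm, the high 32 bits in the second. Each imm is cast to
 * uint32_t first so that a negative low half is not sign-extended.
 */
static uint64_t
imm64_from_pair(const struct insn_sketch *ins)
{
	return (uint32_t)ins[0].imm | ((uint64_t)(uint32_t)ins[1].imm << 32);
}

/*
 * Jump offsets are relative to the instruction after the jump, so the label
 * for a jump at index pc is pc + 1 + off (same arithmetic as the L() macro).
 */
static int
jump_label(uint32_t pc, int16_t off)
{
	return (int)pc + 1 + off;
}
```

With a low half of -1 and a high half of 2 the first helper yields
0x2ffffffff rather than a sign-extended value, which is the reason for the
double cast in the patch.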



* [dpdk-dev] [PATCH v11 06/12] pdump: support pcapng and filtering
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-09-24 15:21   ` Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair a
mismatched primary/secondary process fails.

Add new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
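
Reader's note: the commit message describes two protocol versions selected by
the capture flags. A minimal sketch of that selection and of the flag
validation, assuming the flag values shown in rte_pdump.h in this patch. The
names below (FLAG_*, VER_*, validate_flags, request_version) are hypothetical
local copies so that the fragment builds without DPDK; unlike the patch, the
sketch returns the errno value instead of setting rte_errno.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror the patch; redefined locally so the sketch builds alone. */
enum {
	FLAG_RX = 1,
	FLAG_TX = 2,
	FLAG_RXTX = FLAG_RX | FLAG_TX,
	FLAG_PCAPNG = 4,
};

enum { VER_V1 = 1, VER_V2 = 2 };

/* Returns 0 if the flags are usable, otherwise a positive errno value. */
static int
validate_flags(uint32_t flags)
{
	/* At least one direction must be requested. */
	if ((flags & FLAG_RXTX) == 0)
		return EINVAL;

	/* Reject any bit that is not a known flag. */
	if (flags & ~(uint32_t)(FLAG_RXTX | FLAG_PCAPNG))
		return ENOTSUP;

	return 0;
}

/* The pcapng flag selects the V2 request; everything else stays V1. */
static uint16_t
request_version(uint32_t flags)
{
	return (flags & FLAG_PCAPNG) ? VER_V2 : VER_V1;
}
```

A request with only the pcapng bit set is rejected because it names no
direction, and an unknown bit is reported separately from a missing direction.
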
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 427 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 113 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 427 insertions(+), 127 deletions(-)
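
Reader's note: the statistics named in the commit message (accepted,
filtered, nombuf, ringfull) are easier to follow with a small model of one
burst. This is only a sketch of the accounting in pdump_copy(), not the
library code: struct stats_sketch, account_burst(), pool_avail and ring_free
are hypothetical, and the mempool and ring are reduced to plain counters.

```c
#include <stdint.h>

/* Hypothetical stand-in for the four counters kept per queue. */
struct stats_sketch {
	uint64_t accepted;
	uint64_t filtered;
	uint64_t nombuf;
	uint64_t ringfull;
};

/*
 * rcs[i] is the filter result for packet i; zero means "no match".
 * pool_avail models how many copies the mempool can still provide,
 * ring_free models the free slots in the capture ring.
 */
static void
account_burst(const uint64_t *rcs, unsigned int nb_pkts, int have_filter,
	      unsigned int pool_avail, unsigned int ring_free,
	      struct stats_sketch *st)
{
	unsigned int i, copied = 0, enq;

	for (i = 0; i < nb_pkts; i++) {
		/* Same convention as a socket filter: zero rejects. */
		if (have_filter && rcs[i] == 0) {
			st->filtered++;
			continue;
		}
		/* A failed copy is counted, the packet is skipped. */
		if (pool_avail == 0) {
			st->nombuf++;
			continue;
		}
		pool_avail--;
		copied++;
	}

	/* Accepted is counted before the enqueue, as in the patch. */
	st->accepted += copied;

	/* Copies that do not fit in the ring are counted as ringfull. */
	enq = copied < ring_free ? copied : ring_free;
	st->ringfull += copied - enq;
}
```

Note that a packet dropped because the ring was full is still included in
accepted, so accepted minus ringfull is the number actually delivered.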

diff --git a/lib/meson.build b/lib/meson.build
index ba88e9eabc58..9812e54f1a12 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -27,6 +27,7 @@ libraries = [
         'acl',
         'bbdev',
         'bitratestats',
+        'bpf',
         'cfgfile',
         'compressdev',
         'cryptodev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..82b4f622ca37 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,23 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/* Internal version number in request */
 enum pdump_version {
-	V1 = 1
+	V1 = 1,		    /* no filtering or snap */
+	V2 = 2,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
 	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	char device[RTE_DEV_NAME_MAX_LEN];
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
 };
 
 struct pdump_response {
@@ -63,80 +58,136 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter)
+		rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts);
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * This uses the same BPF return value convention as a socket
+		 * filter and pcap_offline_filter.
+		 * If the program returns zero, the packet does not match
+		 * the filter and is ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng, wrap the packet for pcapng output;
+		 * otherwise make a simple copy.
+		 */
+		if (cbs->ver == V2)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +196,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +220,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +254,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +283,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	/* Check for possible DPDK version mismatch */
+	if (!(p->ver == V1 || p->ver == V2)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +361,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +371,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -332,7 +399,7 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 		resp->err_value = set_pdump_rxtx_cbs(cli_req);
 	}
 
-	strlcpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_resp.len_param = sizeof(*resp);
 	mp_resp.num_fds = 0;
 	if (rte_mp_reply(&mp_resp, peer) < 0) {
@@ -347,8 +414,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +469,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +510,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,26 +524,22 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+
+	req->ver = (flags & RTE_PDUMP_FLAG_PCAPNG) ? V2 : V1;
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	rte_strscpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
-	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_req.len_param = sizeof(*req);
 	mp_req.num_fds = 0;
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
@@ -477,11 +557,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API cannot depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +582,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +626,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +665,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +681,65 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR, "pdump stats not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..6efa0274f2ce 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,38 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum over both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Retrieve the packet capture statistics for a port (summed over its queues).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2
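
Reader's note on rte_pdump_stats() above: the per-port totals are built by
treating each statistics structure as an array of 64-bit words and adding the
receive queues and then the transmit queues into the same result. A minimal
sketch of that summation follows. struct counters_sketch and
sum_queue_counters() are hypothetical stand-ins so that the fragment builds
without DPDK, and __atomic_load_n is the GCC/Clang builtin the patch itself
uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_pdump_stats (same field order). */
struct counters_sketch {
	uint64_t accepted;
	uint64_t filtered;
	uint64_t nombuf;
	uint64_t ringfull;
	uint64_t reserved[4];
};

/* Add the counters of nq queues into *total, one 64-bit word at a time. */
static void
sum_queue_counters(const struct counters_sketch *perq, uint16_t nq,
		   struct counters_sketch *total)
{
	uint64_t *sum = (uint64_t *)total;
	size_t i, words = sizeof(*total) / sizeof(uint64_t);
	uint16_t qid;

	for (qid = 0; qid < nq; qid++) {
		const uint64_t *src = (const uint64_t *)&perq[qid];

		/* Relaxed loads: counters are updated by other threads. */
		for (i = 0; i < words; i++)
			sum[i] += __atomic_load_n(&src[i], __ATOMIC_RELAXED);
	}
}
```

Calling the helper once for the receive counters and once for the transmit
counters with the same total gives the combined figures that the structure
comment describes. The word-wise loop only works while every member of the
structure is a uint64_t.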



* [dpdk-dev] [PATCH v11 07/12] app/dumpcap: add new packet capture application
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-09-24 15:21   ` Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 08/12] test: add test for bpf_convert Stephen Hemminger
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features, such as filtering, are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 844 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 861 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..baf9eee46666
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,844 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+/* Can do either pcap or pcapng format output */
+typedef union {
+	rte_pcapng_t  *pcapng;
+	pcap_dumper_t *dumper;
+} dumpcap_out_t;
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = calloc(1, sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	intf->port = port;
+	rte_strscpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port, this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current monotonic clock time in nanoseconds */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will pdump. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(dumpcap_out_t out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out.pcapng, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	static const char * const args[] = {
+		"dumpcap", "--proc-type", "secondary",
+		"--log-level", "notice"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	eal_argv[0] = strdup(progname);
+	for (i = 1; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+/*
+ * Get Operating System information.
+ * Returns a string allocated via malloc().
+ */
+static char *get_os_info(void)
+{
+	struct utsname uts;
+	char *osname = NULL;
+
+	if (uname(&uts) < 0)
+		return NULL;
+
+	if (asprintf(&osname, "%s %s",
+		     uts.sysname, uts.release) == -1)
+		return NULL;
+
+	return osname;
+}
+
+static dumpcap_out_t create_output(void)
+{
+	dumpcap_out_t ret;
+	static char tmp_path[PATH_MAX];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		char *os = get_os_info();
+
+		ret.pcapng = rte_pcapng_fdopen(fd, os, NULL,
+					   version(), capture_comment);
+		if (ret.pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		free(os);
+	} else {
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		ret.dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (ret.dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+	}
+
+	return ret;
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(-ret));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out.pcapng, pkts, n);
+	else
+		written = pcap_write_packets(out.dumper, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	dumpcap_out_t out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out.pcapng);
+	else
+		pcap_dump_close(out.dumper);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_prm);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2



* [dpdk-dev] [PATCH v11 08/12] test: add test for bpf_convert
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-09-24 15:21   ` Stephen Hemminger
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 09/12] test: add a test for pcapng library Stephen Hemminger
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Add some functional tests for the Classic BPF to DPDK BPF converter.
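
For readers unfamiliar with classic BPF, the sanity test relies on the
filter returning non-zero when a packet matches. The sketch below is a
standalone illustration of that convention, not code from this patch: it
hand-encodes roughly what pcap_compile() emits for the filter "ip" on
Ethernet and runs it with a toy interpreter. The names cbpf_insn,
ip_filter and cbpf_run are hypothetical, and only the three opcodes used
here are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative classic BPF instruction, same field order as struct bpf_insn. */
struct cbpf_insn {
	uint16_t code;
	uint8_t jt, jf;
	uint32_t k;
};

/* Opcode values from the classic BPF encoding. */
#define OP_LDH_ABS 0x28	/* A = 16-bit big-endian load at offset k */
#define OP_JEQ_K   0x15	/* if (A == k) skip jt insns, else skip jf */
#define OP_RET_K   0x06	/* return k */

/* Hand-written equivalent of the filter "ip" on Ethernet. */
static const struct cbpf_insn ip_filter[] = {
	{ OP_LDH_ABS, 0, 0, 12 },	/* load EtherType */
	{ OP_JEQ_K,   0, 1, 0x0800 },	/* is it IPv4? */
	{ OP_RET_K,   0, 0, 262144 },	/* match: accept up to snaplen */
	{ OP_RET_K,   0, 0, 0 },	/* no match */
};

/* Run the small cBPF subset above. Returns 0 when the packet does not match. */
static uint32_t cbpf_run(const struct cbpf_insn *prog, size_t n,
			 const uint8_t *pkt, size_t len)
{
	uint32_t a = 0;
	size_t pc = 0;

	assert(prog != NULL && pkt != NULL);
	while (pc < n) {
		const struct cbpf_insn *in = &prog[pc++];

		switch (in->code) {
		case OP_LDH_ABS:
			if ((size_t)in->k + 2 > len)
				return 0;	/* out of bounds: reject */
			a = (uint32_t)pkt[in->k] << 8 | pkt[in->k + 1];
			break;
		case OP_JEQ_K:
			pc += (a == in->k) ? in->jt : in->jf;
			break;
		case OP_RET_K:
			return in->k;
		default:
			return 0;	/* opcode outside this sketch */
		}
	}
	return 0;
}
```

Feeding cbpf_run() a 14-byte Ethernet header with EtherType 0x0800
should give a non-zero result, and an ARP header (0x0806) should give 0,
which is the same accept/reject convention the converted eBPF program is
expected to keep.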

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 200 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 527c06b80708..543a5fd615b2 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3233,3 +3234,202 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str,
+	       struct rte_mbuf *mb)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	rc = rte_bpf_exec(bpf, mb);
+	/* The return code from bpf capture filter is non-zero if matched */
+	ret = (rc == 0);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet? */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	const uint32_t plen = 100;
+	struct rte_mbuf mb, *m;
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+	} *hdr;
+
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	m = &mb;
+
+	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.d_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_RAW,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+
+	if (test_bpf_match(pcap, "ip", m) != 0) {
+		printf("%s@%d: filter \"ip\" doesn't match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+	if (test_bpf_match(pcap, "not ip", m) == 0) {
+		printf("%s@%d: filter \"not ip\" does match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Some sample pcap filter strings from
+ * https://wiki.wireshark.org/CaptureFilters
+ */
+static const char * const sample_filters[] = {
+	"host 172.18.5.4",
+	"net 192.168.0.0/24",
+	"src net 192.168.0.0/24",
+	"src net 192.168.0.0 mask 255.255.255.0",
+	"dst net 192.168.0.0/24",
+	"dst net 192.168.0.0 mask 255.255.255.0",
+	"port 53",
+	"host dpdk.org and not (port 80 or port 25)",
+	"host dpdk.org and not port 80 and not port 25",
+	"port not 53 and not arp",
+	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
+	"ether proto 0x888e",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"not ether dst 01:80:c2:00:00:0e",
+	"not broadcast and not multicast",
+	"dst host ff02::1",
+	"port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420",
+	/* Worms */
+	"dst port 135 and tcp port 135 and ip[2:2]==48",
+	"icmp[icmptype]==icmp-echo and ip[2:2]==92 and icmp[8:4]==0xAAAAAAAA",
+	"dst port 135 or dst port 445 or dst port 1433"
+	" and tcp[tcpflags] & (tcp-syn) != 0"
+	" and tcp[tcpflags] & (tcp-ack) = 0 and src net 192.168.0.0/24",
+	"tcp src port 443 and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4] = 0x18)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 1] = 0x03)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 2] < 0x04)"
+	" and ((ip[2:2] - 4 * (ip[0] & 0x0F) - 4 * ((tcp[12] & 0xF0) >> 4) > 69))",
+	/* Other */
+	"len = 128",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else if (prm != NULL) {	/* nothing to dump if conversion failed */
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++)
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2



* [dpdk-dev] [PATCH v11 09/12] test: add a test for pcapng library
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 08/12] test: add test for bpf_convert Stephen Hemminger
@ 2021-09-24 15:21   ` Stephen Hemminger
  2021-09-24 15:22   ` [dpdk-dev] [PATCH v11 10/12] test: enable bpf autotest Stephen Hemminger
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Simple unit test that creates a pcapng file using the API.

To run this test you need to have at least one device.
For example:

DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
    --no-huge -m 2048 --vdev=net_tap,iface=dummy
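
The test leaves its output under /tmp. One way to sanity-check such a
file by hand is to look at the leading Section Header Block. The sketch
below is only an illustration under stated assumptions: the helper name
pcapng_looks_valid is hypothetical, and it assumes the file was written
in the byte order of the host doing the check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAPNG_SHB_TYPE   0x0A0D0D0AU	/* Section Header Block type */
#define PCAPNG_BYTE_MAGIC 0x1A2B3C4DU	/* byte-order magic */

/*
 * Return 1 if buf starts with a Section Header Block written in the
 * host byte order, 0 otherwise.
 */
static int pcapng_looks_valid(const uint8_t *buf, size_t len)
{
	uint32_t type, total_len, magic;

	assert(buf != NULL);
	if (len < 12)
		return 0;

	memcpy(&type, buf, sizeof(type));
	memcpy(&total_len, buf + 4, sizeof(total_len));
	memcpy(&magic, buf + 8, sizeof(magic));

	/* the block type is a byte palindrome, so it reads the same either way */
	if (type != PCAPNG_SHB_TYPE || magic != PCAPNG_BYTE_MAGIC)
		return 0;

	/* a minimal SHB is 28 bytes and blocks are padded to 32 bits */
	return total_len >= 28 && total_len % 4 == 0;
}
```

Reading the first 12 bytes of the generated file into a buffer and
passing them to this helper is enough to catch a truncated or
non-pcapng file before opening it in Wireshark.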

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/meson.build   |   4 +
 app/test/test_pcapng.c | 272 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 276 insertions(+)
 create mode 100644 app/test/test_pcapng.c

diff --git a/app/test/meson.build b/app/test/meson.build
index a7611686adcb..8cf41021deb4 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -395,6 +395,10 @@ if dpdk_conf.has('RTE_NET_RING')
     fast_tests += [['pdump_autotest', true]]
 endif
 
+if dpdk_conf.has('RTE_PORT_PCAP')
+    test_sources += 'test_pcapng.c'
+endif
+
 if dpdk_conf.has('RTE_LIB_POWER')
     test_deps += 'power'
 endif
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
new file mode 100644
index 000000000000..df837ffe0d51
--- /dev/null
+++ b/app/test/test_pcapng.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_net.h>
+#include <rte_pcapng.h>
+
+#include <pcap/pcap.h>
+
+#include "test.h"
+
+#define NUM_PACKETS    10
+#define DUMMY_MBUF_NUM 3
+
+static rte_pcapng_t *pcapng;
+static struct rte_mempool *mp;
+static const uint32_t pkt_len = 200;
+static uint16_t port_id;
+static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng";
+
+/* first mbuf in the packet, should always be at offset 0 */
+struct dummy_mbuf {
+	struct rte_mbuf mb[DUMMY_MBUF_NUM];
+	uint8_t buf[DUMMY_MBUF_NUM][RTE_MBUF_DEFAULT_BUF_SIZE];
+};
+
+static void
+dummy_mbuf_prep(struct rte_mbuf *mb, uint8_t buf[], uint32_t buf_len,
+	uint32_t data_len)
+{
+	uint32_t i;
+	uint8_t *db;
+
+	mb->buf_addr = buf;
+	mb->buf_iova = (uintptr_t)buf;
+	mb->buf_len = buf_len;
+	rte_mbuf_refcnt_set(mb, 1);
+
+	/* set pool pointer to dummy value, test doesn't use it */
+	mb->pool = (void *)buf;
+
+	rte_pktmbuf_reset(mb);
+	db = (uint8_t *)rte_pktmbuf_append(mb, data_len);
+
+	for (i = 0; i != data_len; i++)
+		db[i] = i;
+}
+
+/* Make an IP packet consisting of a single mbuf */
+static void
+mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth;
+		struct rte_ipv4_hdr ip;
+	} pkt = {
+		.eth = {
+			.d_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+			.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+		},
+		.ip = {
+			.version_ihl = RTE_IPV4_VHL_DEF,
+			.total_length = rte_cpu_to_be_16(plen),
+			.time_to_live = IPDEFTTL,
+			.next_proto_id = IPPROTO_RAW,
+			.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+			.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+		}
+	};
+
+	memset(dm, 0, sizeof(*dm));
+	dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen);
+
+	rte_eth_random_addr(pkt.eth.s_addr.addr_bytes);
+	memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen));
+}
+
+static int
+test_setup(void)
+{
+	int tmp_fd;
+
+	port_id = rte_eth_find_next(0);
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		fprintf(stderr, "No valid Ether port\n");
+		return -1;
+	}
+
+	tmp_fd = mkstemps(file_name, strlen(".pcapng"));
+	if (tmp_fd == -1) {
+		perror("mkstemps() failure");
+		return -1;
+	}
+	printf("pcapng: output file %s\n", file_name);
+
+	/* open a test capture file */
+	pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL);
+	if (pcapng == NULL) {
+		fprintf(stderr, "rte_pcapng_fdopen failed\n");
+		close(tmp_fd);
+		return -1;
+	}
+
+
+	/* Make a pool for cloned packets */
+	mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+					    0, 0,
+					    rte_pcapng_mbuf_size(pkt_len),
+					    SOCKET_ID_ANY, "ring_mp_sc");
+	if (mp == NULL) {
+		fprintf(stderr, "Cannot create mempool\n");
+		return -1;
+	}
+	return 0;
+
+}
+
+static int
+test_write_packets(void)
+{
+	struct rte_mbuf *orig;
+	struct rte_mbuf *clones[NUM_PACKETS] = { };
+	struct dummy_mbuf mbfs;
+	unsigned int i;
+	ssize_t len;
+
+	/* make a dummy packet */
+	mbuf1_prepare(&mbfs, pkt_len);
+
+	/* clone them */
+	orig  = &mbfs.mb[0];
+	for (i = 0; i < NUM_PACKETS; i++) {
+		struct rte_mbuf *mc;
+
+		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+				rte_get_tsc_cycles(), 0);
+		if (mc == NULL) {
+			fprintf(stderr, "Cannot copy packet\n");
+			return -1;
+		}
+		clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS);
+
+	rte_pktmbuf_free_bulk(clones, NUM_PACKETS);
+
+	if (len <= 0) {
+		fprintf(stderr, "Write of packets failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+test_write_stats(void)
+{
+	ssize_t len;
+
+	/* write a statistics block */
+	len = rte_pcapng_write_stats(pcapng, port_id,
+				     NULL, 0, 0,
+				     NUM_PACKETS, 0);
+	if (len <= 0) {
+		fprintf(stderr, "Write of statistics failed\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+pkt_print(u_char *user, const struct pcap_pkthdr *h,
+	  const u_char *bytes)
+{
+	unsigned int *countp = (unsigned int *)user;
+	const struct rte_ether_hdr *eh;
+	struct tm *tm;
+	char tbuf[128], src[64], dst[64];
+
+	tm = localtime(&h->ts.tv_sec);
+	if (tm == NULL) {
+		perror("localtime");
+		return;
+	}
+
+	if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) {
+		fprintf(stderr, "strftime returned 0!\n");
+		return;
+	}
+
+	eh = (const struct rte_ether_hdr *)bytes;
+	rte_ether_format_addr(dst, sizeof(dst), &eh->d_addr);
+	rte_ether_format_addr(src, sizeof(src), &eh->s_addr);
+	printf("%s.%06lu: %s -> %s type %x length %u\n",
+	       tbuf, (unsigned long)h->ts.tv_usec,
+	       src, dst, rte_be_to_cpu_16(eh->ether_type), h->len);
+
+	*countp += 1;
+}
+
+/*
+ * Open the resulting pcapng file with libpcap
+ * Would be better to use capinfos from wireshark
+ * but that creates an unwanted dependency.
+ */
+static int
+test_validate(void)
+{
+	char errbuf[PCAP_ERRBUF_SIZE];
+	unsigned int count = 0;
+	pcap_t *pcap;
+	int ret;
+
+	pcap = pcap_open_offline(file_name, errbuf);
+	if (pcap == NULL) {
+		fprintf(stderr, "pcap_open_offline('%s') failed: %s\n",
+			file_name, errbuf);
+		return -1;
+	}
+
+	ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count);
+	if (ret == 0)
+		printf("Saw %u packets\n", count);
+	else
+		fprintf(stderr, "pcap_loop failed: %s\n",
+			pcap_geterr(pcap));
+	pcap_close(pcap);
+
+	return ret;
+}
+
+static void
+test_cleanup(void)
+{
+	if (mp)
+		rte_mempool_free(mp);
+
+	if (pcapng)
+		rte_pcapng_close(pcapng);
+
+}
+
+static struct
+unit_test_suite test_pcapng_suite  = {
+	.setup = test_setup,
+	.teardown = test_cleanup,
+	.suite_name = "Test Pcapng Unit Test Suite",
+	.unit_test_cases = {
+		TEST_CASE(test_write_packets),
+		TEST_CASE(test_write_stats),
+		TEST_CASE(test_validate),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_pcapng(void)
+{
+	return unit_test_suite_runner(&test_pcapng_suite);
+}
+
+REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng);
-- 
2.30.2



* [dpdk-dev] [PATCH v11 10/12] test: enable bpf autotest
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 09/12] test: add a test for pcapng library Stephen Hemminger
@ 2021-09-24 15:22   ` Stephen Hemminger
  2021-09-24 15:22   ` [dpdk-dev] [PATCH v11 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
  2021-09-24 15:22   ` [dpdk-dev] [PATCH v11 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:22 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The BPF autotest is defined but not run automatically.
Since it is short, it should be added to the autotest suite.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index 8cf41021deb4..ba7d568bf330 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -193,6 +193,8 @@ test_deps = [
 fast_tests = [
         ['acl_autotest', true],
         ['atomic_autotest', false],
+        ['bpf_autotest', true],
+        ['bpf_convert_autotest', true],
         ['bitops_autotest', true],
         ['byteorder_autotest', true],
         ['cksum_autotest', true],
-- 
2.30.2



* [dpdk-dev] [PATCH v11 11/12] doc: changes for new pcapng and dumpcap
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (9 preceding siblings ...)
  2021-09-24 15:22   ` [dpdk-dev] [PATCH v11 10/12] test: enable bpf autotest Stephen Hemminger
@ 2021-09-24 15:22   ` Stephen Hemminger
  2021-09-24 15:22   ` [dpdk-dev] [PATCH v11 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:22 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
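
Note on the ring format described in pdump_lib.rst below: when
RTE_PDUMP_FLAG_PCAPNG is set, the mbuf data is wrapped in the header and
trailer of a Pcapng Enhanced Packet Block (EPB). As a rough illustration of
that framing, here is a minimal sketch in plain C with no DPDK dependency.
It is not the code in librte_pcapng: the names epb_build, pad4 and put32 are
hypothetical, the block carries no options, and fields are stored in host
byte order on the assumption that the file's Section Header Block advertises
that order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EPB_TYPE 0x00000006u	/* Enhanced Packet Block type code */
#define EPB_HDR_LEN 28u		/* fixed fields that precede the packet data */

/* Hypothetical helper: round a length up to a multiple of 4 bytes. */
static uint32_t
pad4(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/* Hypothetical helper: store a 32-bit value in host byte order. */
static void
put32(uint8_t *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}

/*
 * Hypothetical helper: build an EPB without options around 'data'.
 * Returns the total block length, or 0 if 'out' is too small.
 */
static uint32_t
epb_build(uint8_t *out, size_t out_len, uint32_t ifindex, uint64_t ts,
	  const uint8_t *data, uint32_t caplen, uint32_t origlen)
{
	uint32_t total;

	/* reject lengths that would overflow the 32-bit block length */
	if (caplen > UINT32_MAX - 35u)
		return 0;

	/* header + padded data + trailing copy of the block length */
	total = EPB_HDR_LEN + pad4(caplen) + 4u;
	if (out_len < total)
		return 0;

	memset(out, 0, total);			/* also zeroes the padding */
	put32(out + 0, EPB_TYPE);		/* block type */
	put32(out + 4, total);			/* block total length */
	put32(out + 8, ifindex);		/* interface ID */
	put32(out + 12, (uint32_t)(ts >> 32));	/* timestamp, upper 32 bits */
	put32(out + 16, (uint32_t)ts);		/* timestamp, lower 32 bits */
	put32(out + 20, caplen);		/* captured packet length */
	put32(out + 24, origlen);		/* original packet length */
	memcpy(out + EPB_HDR_LEN, data, caplen);
	put32(out + total - 4u, total);		/* trailer repeats the length */

	return total;
}
```

For a 5-byte capture the block is 28 + 8 + 4 = 40 bytes: the 28-byte header,
the data padded to 8 bytes, and the 4-byte trailing length.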

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The framework consists of two libraries:
+``librte_pdump`` for collecting packets and ``librte_pcapng`` for writing
+packets to a file. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` tool captures packets much
+like Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the
+  ``librte_pdump`` library. It provides more limited functionality
+  and depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the ``dpdk-dumpcap`` tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+The pcapng library creates files in the Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index ad7c1afec0f7..075e9e544c54 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -91,6 +91,16 @@ New Features
   Added command-line options to specify total number of processes and
   current process ID. Each process owns subset of Rx and Tx queues.
 
+* **Revised packet capture framework.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility including: capture of multiple interfaces,
+    filtering, and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancements to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Correct capture of packets with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool.  The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes capture files in the
+pcapng packet format.
+
+Without any options set, it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps, into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In DPDK, only ``testpmd`` is modified to initialize the packet capture
+        framework; other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in the style of *tshark*, use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2



* [dpdk-dev] [PATCH v11 12/12] MAINTAINERS: add entry for new packet capture features
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
                     ` (10 preceding siblings ...)
  2021-09-24 15:22   ` [dpdk-dev] [PATCH v11 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-09-24 15:22   ` Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-09-24 15:22 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Since the packet capture is just an extension of the existing pdump,
add myself as maintainer of that.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 30bf77b79a75..eae60d2d7a74 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1425,12 +1425,17 @@ F: doc/guides/sample_app_ug/qos_scheduler.rst
 
 Packet capture
 M: Reshma Pattan <reshma.pattan@intel.com>
+M: Stephen Hemminger <stephen@networkplumber.org>
 F: lib/pdump/
+F: lib/pcapng/
 F: doc/guides/prog_guide/pdump_lib.rst
-F: app/test/test_pdump.*
-F: app/pdump/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: doc/guides/tools/dumpcap.rst
 F: doc/guides/tools/pdump.rst
-
+F: app/test/test_pdump.c
+F: app/test/test_pcapng.c
+F: app/pdump/
+F: app/dumpcap/
 
 Packet Framework
 ----------------
-- 
2.30.2



* Re: [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering
  2021-09-23 18:23       ` Stephen Hemminger
@ 2021-09-24 15:33         ` Pattan, Reshma
  0 siblings, 0 replies; 220+ messages in thread
From: Pattan, Reshma @ 2021-09-24 15:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev



> -----Original Message-----
> > >
> > > +		 * Similar behavior to rte_bpf_eth callback.
> > > +		 * if BPF program returns zero value for a given packet,
> > > +		 * then it will be ignored.
> > > +		 */
> > Looks like wrong callback name referred in the comment, should be
> corrected?
> 
> It really is pcap_offline_filter() and Linux kernel socket filter.

Oh ok, rcs[i] is basically "return value of the filter program. This will be zero if the packet doesn't match the filter and non-zero if the packet matches the filter." 
got it now on why is the below if check.

> 
> > > +		if (cbs->filter && rcs[i] == 0) {
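
The check quoted above follows the pcap_offline_filter() convention: a
result of zero means the packet did not match and is skipped, and any
non-zero result means it is captured. A minimal standalone sketch of
that rule, assuming one 64-bit filter result per packet;
select_matching() and its parameters are hypothetical names used only
for illustration and are not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: given one filter result
 * per packet, store the indexes of the accepted packets in keep[] and
 * return how many were accepted.
 */
static size_t
select_matching(const uint64_t *rcs, size_t n, size_t *keep)
{
	size_t i, kept = 0;

	assert(rcs != NULL && keep != NULL);

	for (i = 0; i < n; i++) {
		/* zero means "no match": skip the packet */
		if (rcs[i] == 0)
			continue;
		keep[kept++] = i;
	}
	return kept;
}
```

When no filter is installed the check is bypassed, so every packet is
treated as accepted.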


* [dpdk-dev] [PATCH v12 00/12] Packet capture framework update
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (14 preceding siblings ...)
  2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
@ 2021-10-01 16:26 ` Stephen Hemminger
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 01/12] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (12 more replies)
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                   ` (2 subsequent siblings)
  18 siblings, 13 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility:
  - are faster: avoid lots of extra I/O, do bursting, etc.
  - give more information (multiple ports, queues, etc)
  - have a better user interface (same as Wireshark dumpcap)
  - fix structural problems with VLANs and timestamps
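
On the timestamp point: pcapng stores each timestamp as two 32-bit
words, and a cycle counter has to be scaled to nanoseconds first. The
sketch below is a standalone illustration and is not code from this
series; cycles_to_ns() and ts_split() are hypothetical names. It
assumes the counter frequency is below about 1.8e10 Hz and splits the
division so the intermediate product cannot wrap, which the direct
form (delta * NS_PER_SEC) / hz does once delta exceeds roughly
1.8e10 cycles.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ull

/*
 * Convert a cycle delta to nanoseconds without overflowing 64 bits.
 * Whole seconds and the sub-second remainder are scaled separately.
 */
static uint64_t
cycles_to_ns(uint64_t delta, uint64_t hz)
{
	uint64_t secs, rem;

	assert(hz != 0);
	secs = delta / hz;
	rem = delta % hz;
	/* rem < hz, so rem * NS_PER_SEC fits while hz stays below ~1.8e10 */
	return secs * NS_PER_SEC + (rem * NS_PER_SEC) / hz;
}

/* Split a 64-bit timestamp into the two 32-bit words stored in a block. */
static void
ts_split(uint64_t ns, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(ns >> 32);
	*lo = (uint32_t)ns;
}
```

The result is truncated to whole nanoseconds, which matches a
timestamp resolution of 10^-9 seconds.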

There are no blocker items.
The following are worth noting:
  * bogus checkpatch warnings
        - the correct flag to open is O_CREAT
        - intentionally keeping macro with goto since that
          was in original code and is clearer
        - the tempfile name can not be const since it is
          overwritten by tmpfile() call

  * enabling BPF tests causes CI to see a pre-existing bug
    https://bugs.dpdk.org/show_bug.cgi?id=811

  * future filtering for stripped VLAN tags needs collaboration
    with libpcap project to fix pcap_compile_filter().

v12
  - fixes for capture of offloaded VLAN tags.
    look at direction flag and handle QinQ offload.

v11
  - address review comments for pdump (patch 6)

v10:
  - fix to rte_bpf_dump to handle more instructions
    make sure all bpf_test cases are decoded

v9:
  - incorporate suggested change to BPF XOR
  - make autotest for pcapng more complete by reading the
    resulting file with libpcap

v8:
  - enable BPF tests in autotest
  - add more BPF test strings
  - use rte_strscpy to satisfy checkpatch
  - merge MAINTAINERS (put this in with existing pdump)

v7:
  - add functional tests for pcapng lib
  - bug fix for error returns in pcapng lib
  - handle long osname on FreeBSD
  - resolve almost all checkpatch issues

v5:
  - minor build and checkpatch fixes for RHEL/FreeBSD
  - disable lib/pdump on Windows. It was not useful before
    and now pdump depends on bpf.

v4:
  - minor checkpatch fixes.
    Note: some of the checkpatch warnings are bogus and won't be fixed.
  - fix build of dumpcap on FreeBSD

v3:
  - introduce packet filters using classic BPF to eBPF converter
    required small fix to DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup

v2:
   fix formatting of packet blocks
   fix the new packet capture statistics
   fix crash when primary process exits
   record start/end time
   various whitespace/checkpatch warnings



Stephen Hemminger (12):
  lib: pdump is not supported on Windows
  librte_pcapng: add new library for writing pcapng files
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  test: add test for bpf_convert
  test: add a test for pcapng library
  test: enable bpf autotest
  doc: changes for new pcapng and dumpcap
  MAINTAINERS: add entry for new packet capture features

 MAINTAINERS                                   |  11 +-
 app/dumpcap/main.c                            | 844 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/meson.build                          |   6 +
 app/test/test_bpf.c                           | 200 +++++
 app/test/test_pcapng.c                        | 272 ++++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  67 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  24 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 139 +++
 lib/bpf/bpf_validate.c                        |   9 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 613 +++++++++++++
 lib/pcapng/rte_pcapng.h                       | 194 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 427 ++++++---
 lib/pdump/rte_pdump.h                         | 113 ++-
 lib/pdump/version.map                         |   8 +
 33 files changed, 3733 insertions(+), 219 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 app/test/test_pcapng.c
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2



* [dpdk-dev] [PATCH v12 01/12] lib: pdump is not supported on Windows
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
@ 2021-10-01 16:26   ` Stephen Hemminger
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:26 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.

The new version of pdump with filtering now has a dependency
on bpf, but the bpf library is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index 9c4841fe404b..708f4a133cdc 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -84,7 +84,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
             'stack',
     ] # only supported libraries for windows
 endif
-- 
2.30.2



* [dpdk-dev] [PATCH v12 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 01/12] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-10-01 16:26   ` Stephen Hemminger
  2021-10-15  9:36     ` Pattan, Reshma
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 03/12] bpf: allow self-xor operation Stephen Hemminger
                     ` (10 subsequent siblings)
  12 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See draft RFC
  https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
and
  https://github.com/pcapng/pcapng/
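
Every pcapng block carries its metadata as type-length-value options,
each padded to a 32-bit boundary. A minimal standalone sketch of that
encoding follows; it is an illustration rather than the library code,
and opt_hdr, opt_padded_len() and opt_put() are hypothetical names. It
assumes the caller supplies a buffer large enough for the padded
option.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Option header as in the pcapng draft: 16-bit code, 16-bit length. */
struct opt_hdr {
	uint16_t code;
	uint16_t length;
};

/* Bytes an option occupies in the file: header plus value, rounded up to 4. */
static uint32_t
opt_padded_len(uint16_t value_len)
{
	return ((uint32_t)sizeof(struct opt_hdr) + value_len + 3u) & ~3u;
}

/*
 * Write one option at buf and return the number of bytes consumed.
 * Fields are written in host byte order; readers learn the order from
 * the byte-order magic in the section header block.
 */
static uint32_t
opt_put(uint8_t *buf, uint16_t code, const void *value, uint16_t value_len)
{
	struct opt_hdr hdr = { .code = code, .length = value_len };
	uint32_t total = opt_padded_len(value_len);

	assert(buf != NULL);
	memset(buf, 0, total);		/* zero the padding bytes */
	memcpy(buf, &hdr, sizeof(hdr));
	if (value_len > 0)		/* never memcpy() from a NULL value */
		memcpy(buf + sizeof(hdr), value, value_len);
	return total;
}
```

The length field records the unpadded value length, so a 5-byte value
occupies 12 bytes in the file while its header still says 5. An option
list is closed by an end-of-options entry with code 0 and length 0.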

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 ++++++++
 lib/pcapng/rte_pcapng.c   | 613 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 194 ++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 957 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index 708f4a133cdc..fe53024e2dcd 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..ea936d0444db
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,613 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write a PCAPNG interface description block */
+static ssize_t
+pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
+		      uint64_t if_speed, const uint8_t *mac_addr,
+		      const char *if_hw, const char *comment)
+{
+	struct pcapng_interface_block *hdr;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len = sizeof(*hdr);
+	ssize_t cc;
+	void *buf;
+
+	len += pcapng_optlen(sizeof(tsresol));
+	if (if_name)
+		len += pcapng_optlen(strlen(if_name));
+	if (mac_addr)
+		len += pcapng_optlen(6);
+	if (if_speed)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (if_hw)
+		len += pcapng_optlen(strlen(if_hw));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+	buf = calloc(1, len);
+	if (!buf)
+		return -ENOMEM;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	hdr->block_type = PCAPNG_INTERFACE_BLOCK;
+	hdr->link_type = 1;	/* Ethernet */
+	hdr->block_length = len;
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (if_name)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+					 if_name, strlen(if_name));
+	if (mac_addr)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					mac_addr, RTE_ETHER_ADDR_LEN);
+	if (if_speed)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &if_speed, sizeof(uint64_t));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	if (if_hw)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 if_hw, strlen(if_hw));
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	rte_eth_macaddr_get(port, &macaddr);
+
+	return pcapng_interface_block(self, ifname, speed,
+				      macaddr.addr_bytes,
+				      dev ? ifhw : NULL, NULL);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/* More generalized version of rte_vlan_insert() */
+static int
+pcapng_vlan_insert(struct rte_mbuf *m, uint16_t ether_type, uint16_t tci)
+{
+	struct rte_ether_hdr *nh, *oh;
+	struct rte_vlan_hdr *vh;
+
+	if (!RTE_MBUF_DIRECT(m) || rte_mbuf_refcnt_read(m) > 1)
+		return -EINVAL;
+
+	if (rte_pktmbuf_data_len(m) < sizeof(*oh))
+		return -EINVAL;
+
+	oh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
+	nh = (struct rte_ether_hdr *)
+		rte_pktmbuf_prepend(m, sizeof(struct rte_vlan_hdr));
+	if (nh == NULL)
+		return -ENOSPC;
+
+	memmove(nh, oh, 2 * RTE_ETHER_ADDR_LEN);
+	nh->ether_type = rte_cpu_to_be_16(ether_type);
+
+	vh = (struct rte_vlan_hdr *) (nh + 1);
+	vh->vlan_tci = rte_cpu_to_be_16(tci);
+
+	return 0;
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* Expand any offloaded VLAN information */
+	if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+	     (md->ol_flags & PKT_RX_VLAN_STRIPPED)) ||
+	    (direction == RTE_PCAPNG_DIRECTION_OUT &&
+	     (md->ol_flags & PKT_TX_VLAN))) {
+		if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_VLAN,
+				       md->vlan_tci) != 0)
+			goto fail;
+	}
+
+	if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+	     (md->ol_flags & PKT_RX_QINQ_STRIPPED)) ||
+	    (direction == RTE_PCAPNG_DIRECTION_OUT &&
+	     (md->ol_flags & PKT_TX_QINQ))) {
+		if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_QINQ,
+				       md->vlan_tci_outer) != 0)
+			goto fail;
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that this is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..2f1bb073df08
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for rte_pcapng_write_packets()
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ * @warning
+ * Do not pass original mbufs
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For statistics, use 0 if don't know or care to report it.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v12 03/12] bpf: allow self-xor operation
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 01/12] lib: pdump is not supported on Windows Stephen Hemminger
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-10-01 16:26   ` Stephen Hemminger
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

Some BPF programs XOR a register with itself to zero it
in one instruction. The BPF filter converter emits this in the
prologue of the generated code.

The BPF validator rejected this because the value of the
register was undefined before the operation. After the operation
the value is always zero, so the validator now accepts it.
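
As an illustration of the condition described above, here is a minimal
standalone sketch. It is not the validator code from this patch: the
struct and the SK_* names are hypothetical stand-ins for struct ebpf_insn
and the BPF_OP/BPF_SRC macros, and the numeric opcode values are assumed
to follow the usual BPF encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the opcode helpers (assumed BPF encoding). */
#define SK_BPF_OP(code)   ((code) & 0xf0)
#define SK_BPF_SRC(code)  ((code) & 0x08)
#define SK_BPF_XOR        0xa0
#define SK_BPF_X          0x08	/* source operand is a register */
#define SK_BPF_K          0x00	/* source operand is an immediate */
#define SK_EBPF_ALU64     0x07

/* Simplified instruction; the real struct ebpf_insn packs both registers into one byte. */
struct sketch_insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;
	int32_t imm;
};

/* True when the instruction is "reg ^= reg", which always yields zero. */
static bool
sketch_is_self_xor(const struct sketch_insn *ins)
{
	return SK_BPF_OP(ins->code) == SK_BPF_XOR &&
	       SK_BPF_SRC(ins->code) == SK_BPF_X &&
	       ins->src_reg == ins->dst_reg;
}
```

Only the register form with identical source and destination qualifies;
an XOR with an immediate or with a different register still depends on
the previous register contents.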

Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..853279fee557 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,15 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X &&
+	    ins->src_reg == ins->dst_reg) {
+		eval_fill_imm(&rs, UINT64_MAX, 0);
+		eval_fill_imm(rd, UINT64_MAX, 0);
+	}
+
 	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+			   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v12 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-10-01 16:26   ` Stephen Hemminger
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.
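
The single-allocation layout this relies on (a parameter header followed
directly by the instruction array) can be sketched as below. This is only
an illustration under stated assumptions: the sketch_* types are
hypothetical stand-ins for struct rte_bpf_prm and struct ebpf_insn, and
plain calloc() is swapped in for rte_zmalloc() so the example builds
without DPDK.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ebpf_insn. */
struct sketch_insn {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

/* Hypothetical stand-in for struct rte_bpf_prm. */
struct sketch_prm {
	const struct sketch_insn *ins;	/* points just past this header */
	uint32_t nb_ins;
};

/*
 * Allocate the header and the instruction array as one zeroed block,
 * so that a single free releases both.
 */
static struct sketch_prm *
sketch_prm_alloc(uint32_t nb_ins)
{
	struct sketch_prm *prm;

	prm = calloc(1, sizeof(*prm) + nb_ins * sizeof(struct sketch_insn));
	if (prm == NULL)
		return NULL;

	prm->ins = (const struct sketch_insn *)(prm + 1);
	prm->nb_ins = nb_ins;
	return prm;
}
```

Because the block comes from one allocator call, it has to be released
with the matching call: free() in this sketch, and the rte_malloc
counterpart for memory obtained from rte_zmalloc().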

The code to convert was originally done as part of the Linux
kernel implementation then converted to a userspace program.
See https://github.com/tklauser/filter2xdp

Both authors have agreed that it is allowable to create a modified
version of this code and license it with BSD license used by DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_convert.c | 575 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 611 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..db84add7dcce
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,575 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* Linux has special negative offsets to access meta-data. */
+		RTE_BPF_LOG(ERR,
+			    "rte_bpf_convert: socket offset %d not supported\n",
+			    fp->k - SKF_AD_OFF);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the table of converted instruction offsets */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a classic BPF program from libpcap into DPDK BPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @return
+ *   Pointer to BPF program (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v12 05/12] bpf: add function to dump eBPF instructions
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-10-01 16:26   ` Stephen Hemminger
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 06/12] pdump: support pcapng and filtering Stephen Hemminger
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_dump.c  | 139 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 +++++
 lib/bpf/version.map |   1 +
 4 files changed, 155 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..b86977b96d08
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <inttypes.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i) {
+		const struct ebpf_insn *ins = buf + i;
+		uint8_t cls = BPF_CLASS(ins->code);
+		const char *op, *postfix = "";
+
+		fprintf(f, " L%u:\t", i);
+
+		switch (cls) {
+		default:
+			fprintf(f, "unimp 0x%x // class: %s\n",
+				ins->code, class_tbl[cls]);
+			break;
+		case BPF_ALU:
+			postfix = "32";
+			/* fall through */
+		case EBPF_ALU64:
+			op = alu_op_tbl[BPF_OP_INDEX(ins->code)];
+			if (BPF_SRC(ins->code) == BPF_X)
+				fprintf(f, "%s%s r%u, r%u\n", op, postfix, ins->dst_reg,
+					ins->src_reg);
+			else
+				fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			break;
+		case BPF_LD:
+			op = "ld";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (ins->code == (BPF_LD | BPF_IMM | EBPF_DW)) {
+				uint64_t val;
+
+				val = (uint32_t)ins[0].imm |
+					(uint64_t)(uint32_t)ins[1].imm << 32;
+				fprintf(f, "%s%s r%d, #0x%"PRIx64"\n",
+					op, postfix, ins->dst_reg, val);
+				i++;
+			} else if (BPF_MODE(ins->code) == BPF_IMM)
+				fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_ABS)
+				fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_IND)
+				fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+					ins->dst_reg, ins->src_reg, ins->imm);
+			else
+				fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_LDX:
+			op = "ldx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, ins->dst_reg,
+				ins->src_reg, ins->off);
+			break;
+		case BPF_ST:
+			op = "st";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (BPF_MODE(ins->code) == BPF_MEM)
+				fprintf(f, "%s%s [r%d + %d], #0x%x\n", op, postfix,
+					ins->dst_reg, ins->off, ins->imm);
+			else
+				fprintf(f, "// BUG: ST opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_STX:
+			op = "stx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s [r%d + %d], r%u\n", op, postfix,
+				ins->dst_reg, ins->off, ins->src_reg);
+			break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+		case BPF_JMP:
+			op = jump_tbl[BPF_OP_INDEX(ins->code)];
+			if (op == NULL)
+				fprintf(f, "invalid jump opcode: %#x\n", ins->code);
+			else if (BPF_OP(ins->code) == BPF_JA)
+				fprintf(f, "%s L%d\n", op, L(i, ins->off));
+			else if (BPF_OP(ins->code) == EBPF_EXIT)
+				fprintf(f, "%s\n", op);
+			else
+				fprintf(f, "%s r%u, #0x%x, L%d\n", op, ins->dst_reg,
+					ins->imm, L(i, ins->off));
+			break;
+		case BPF_RET:
+			fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+				ins->code);
+			break;
+		}
+	}
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+	'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2



* [dpdk-dev] [PATCH v12 06/12] pdump: support pcapng and filtering
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-10-01 16:26   ` Stephen Hemminger
  2021-10-12 16:31     ` Pattan, Reshma
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (6 subsequent siblings)
  12 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:26 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair
mismatched primary and secondary processes fails.

Add new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 427 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 113 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 427 insertions(+), 127 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index fe53024e2dcd..a0bafa08b559 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -27,6 +27,7 @@ libraries = [
         'acl',
         'bbdev',
         'bitratestats',
+        'bpf',
         'cfgfile',
         'compressdev',
         'cryptodev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..82b4f622ca37 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,23 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/* Internal version number in request */
 enum pdump_version {
-	V1 = 1
+	V1 = 1,		    /* no filtering or snap */
+	V2 = 2,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
 	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	char device[RTE_DEV_NAME_MAX_LEN];
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
 };
 
 struct pdump_response {
@@ -63,80 +58,136 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
-
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+static const char *MZ_RTE_PDUMP_STATS = "rte_pdump_stats";
+
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter)
+		rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts);
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * This uses the same BPF return value convention as
+		 * socket filters and pcap_offline_filter():
+		 * if the program returns zero, the packet does not
+		 * match the filter and is skipped.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng, the packet must be wrapped;
+		 * otherwise a simple copy is enough.
+		 */
+		if (cbs->ver == V2)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +196,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +220,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +254,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +283,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	/* Check for possible DPDK version mismatch */
+	if (!(p->ver == V1 || p->ver == V2)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +361,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +371,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -332,7 +399,7 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 		resp->err_value = set_pdump_rxtx_cbs(cli_req);
 	}
 
-	strlcpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_resp.len_param = sizeof(*resp);
 	mp_resp.num_fds = 0;
 	if (rte_mp_reply(&mp_resp, peer) < 0) {
@@ -347,8 +414,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +469,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +510,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,26 +524,22 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+
+	req->ver = (flags & RTE_PDUMP_FLAG_PCAPNG) ? V2 : V1;
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	rte_strscpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
-	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_req.len_param = sizeof(*req);
 	mp_req.num_fds = 0;
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
@@ -477,11 +557,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the
+ * original API left a placeholder for a future filter, it never
+ * checked the value. Therefore the API cannot rely on the
+ * application passing a valid (non-bogus) value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +582,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +626,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +665,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +681,65 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			PDUMP_LOG(ERR, "pdump stats not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..6efa0274f2ce 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port on which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,38 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum of both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Retrieve the packet capture statistics for a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v12 07/12] app/dumpcap: add new packet capture application
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-10-01 16:27   ` Stephen Hemminger
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 08/12] test: add test for bpf_convert Stephen Hemminger
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:27 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features such as filtering are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
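The -a/--autostop conditions take the form name:value (duration,
filesize, packets). A minimal standalone sketch of that parsing follows.
It returns an error to the caller instead of calling rte_exit(), and it
rejects non-numeric values in all three branches. The names
struct stop_cond and parse_autostop are illustrative and not part of the
patch; NSEC_PER_SEC is defined locally here rather than taken from
rte_time.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical container for the three stop conditions (0 = unlimited). */
struct stop_cond {
	uint64_t duration;      /* nanoseconds */
	unsigned long packets;  /* number of packets */
	size_t size;            /* file size in bytes */
};

/*
 * Parse one "name:value" condition into *sc.
 * The string is modified in place (the colon is overwritten).
 * Returns 0 on success, -1 on malformed input.
 */
int
parse_autostop(char *opt, struct stop_cond *sc)
{
	char *value, *endp;

	value = strchr(opt, ':');
	if (value == NULL)
		return -1;

	*value++ = '\0';
	if (*value == '\0')
		return -1;

	if (strcmp(opt, "duration") == 0) {
		double secs = strtod(value, &endp);

		if (*endp != '\0' || secs <= 0)
			return -1;
		sc->duration = (uint64_t)(secs * NSEC_PER_SEC);
	} else if (strcmp(opt, "filesize") == 0) {
		unsigned long kb = strtoul(value, &endp, 0);

		if (*endp != '\0')
			return -1;
		sc->size = (size_t)kb * 1024;   /* value is given in kB */
	} else if (strcmp(opt, "packets") == 0) {
		unsigned long n = strtoul(value, &endp, 0);

		if (*endp != '\0')
			return -1;
		sc->packets = n;
	} else {
		return -1;
	}

	return 0;
}
```

A caller would pass a writable copy of the option argument, for example
a char array holding "duration:2.5", and check the return value before
using the fields.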
 app/dumpcap/main.c      | 844 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 861 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
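
The ring size given with -N is rounded up to a power of two before the
ring is created. A portable sketch of that rounding without compiler
builtins is shown here; round_up_pow2 is an illustrative name, and
mapping an input of 0 to 1 is a choice made for this sketch only
(__builtin_clzl() has an undefined result for a zero argument).

```c
#include <stdint.h>

/*
 * Round v up to the next power of two; a value that already is a
 * power of two is returned unchanged.
 * Precondition for a meaningful result: v <= 2^31 (larger inputs
 * wrap around to 0 in 32-bit arithmetic).
 */
uint32_t
round_up_pow2(uint32_t v)
{
	if (v == 0)
		return 1;

	/* Smear the highest set bit of (v - 1) into all lower bits. */
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;

	/* All bits below the top bit are now set; adding one carries. */
	return v + 1;
}
```

With the default of 2048 the value is unchanged, while a request for
1000 entries would become 1024.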

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..baf9eee46666
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,844 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+/* Can do either pcap or pcapng format output */
+typedef union {
+	rte_pcapng_t  *pcapng;
+	pcap_dumper_t *dumper;
+} dumpcap_out_t;
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = calloc(1, sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	intf->port = port;
+	rte_strscpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port, this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current monotonic clock time in nanoseconds */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm callback, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will pdump. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(dumpcap_out_t out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out.pcapng, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	static const char * const args[] = {
+		"dumpcap", "--proc-type", "secondary",
+		"--log-level", "notice"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	eal_argv[0] = strdup(progname);
+	for (i = 1; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+/*
+ * Get Operating System information.
+ * Returns a string allocated via malloc().
+ */
+static char *get_os_info(void)
+{
+	struct utsname uts;
+	char *osname = NULL;
+
+	if (uname(&uts) < 0)
+		return NULL;
+
+	if (asprintf(&osname, "%s %s",
+		     uts.sysname, uts.release) == -1)
+		return NULL;
+
+	return osname;
+}
+
+static dumpcap_out_t create_output(void)
+{
+	dumpcap_out_t ret;
+	static char tmp_path[PATH_MAX];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		char *os = get_os_info();
+
+		ret.pcapng = rte_pcapng_fdopen(fd, os, NULL,
+					   version(), capture_comment);
+		if (ret.pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		free(os);
+	} else {
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		ret.dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (ret.dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+	}
+
+	return ret;
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(rte_errno));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out.pcapng, pkts, n);
+	else
+		written = pcap_write_packets(out.dumper, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	dumpcap_out_t out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out.pcapng);
+	else
+		pcap_dump_close(out.dumper);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_prm);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+	build = false
+	reason = 'not supported on Windows'
+	subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2



* [dpdk-dev] [PATCH v12 08/12] test: add test for bpf_convert
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-10-01 16:27   ` Stephen Hemminger
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 09/12] test: add a test for pcapng library Stephen Hemminger
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:27 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Add some functional tests for the Classic BPF to DPDK BPF converter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
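The sanity test builds an untagged Ethernet frame carrying IPv4 and
expects the filter "ip" to match it. For DLT_EN10MB that filter amounts
to comparing the 16-bit EtherType at byte offset 12 with 0x0800. The
sketch below restates the check in plain C as a reference for what the
converted program should decide; frame_is_ipv4 and the two macros are
illustrative names, not part of libpcap or DPDK. VLAN-tagged frames are
not handled, which mirrors the behaviour of a bare "ip" filter.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHER_TYPE_OFFSET 12      /* follows two 6-byte MAC addresses */
#define ETHER_TYPE_IPV4   0x0800

/* Return nonzero if an untagged Ethernet frame carries IPv4. */
int
frame_is_ipv4(const uint8_t *frame, size_t len)
{
	uint16_t ether_type;

	/* Too short to hold an EtherType field: no match. */
	if (len < ETHER_TYPE_OFFSET + 2)
		return 0;

	/* The EtherType is stored in network (big-endian) byte order. */
	ether_type = (uint16_t)((frame[ETHER_TYPE_OFFSET] << 8) |
				frame[ETHER_TYPE_OFFSET + 1]);

	return ether_type == ETHER_TYPE_IPV4;
}
```

Note the opposite return convention in test_bpf_match(): it returns 0
when the filter matched, because the BPF return code is non-zero on a
match.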
 app/test/test_bpf.c | 200 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 527c06b80708..543a5fd615b2 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3233,3 +3234,202 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str,
+	       struct rte_mbuf *mb)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	rc = rte_bpf_exec(bpf, mb);
+	/* The return code from bpf capture filter is non-zero if matched */
+	ret = (rc == 0);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet? */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	const uint32_t plen = 100;
+	struct rte_mbuf mb, *m;
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+	} *hdr;
+
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	m = &mb;
+
+	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.d_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_RAW,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+
+	if (test_bpf_match(pcap, "ip", m) != 0) {
+		printf("%s@%d: filter \"ip\" doesn't match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+	if (test_bpf_match(pcap, "not ip", m) == 0) {
+		printf("%s@%d: filter \"not ip\" does match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Some sample pcap filter strings from
+ * https://wiki.wireshark.org/CaptureFilters
+ */
+static const char * const sample_filters[] = {
+	"host 172.18.5.4",
+	"net 192.168.0.0/24",
+	"src net 192.168.0.0/24",
+	"src net 192.168.0.0 mask 255.255.255.0",
+	"dst net 192.168.0.0/24",
+	"dst net 192.168.0.0 mask 255.255.255.0",
+	"port 53",
+	"host dpdk.org and not (port 80 or port 25)",
+	"host dpdk.org and not port 80 and not port 25",
+	"port not 53 and not arp",
+	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
+	"ether proto 0x888e",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"not ether dst 01:80:c2:00:00:0e",
+	"not broadcast and not multicast",
+	"dst host ff02::1",
+	"port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420",
+	/* Worms */
+	"dst port 135 and tcp port 135 and ip[2:2]==48",
+	"icmp[icmptype]==icmp-echo and ip[2:2]==92 and icmp[8:4]==0xAAAAAAAA",
+	"dst port 135 or dst port 445 or dst port 1433"
+	" and tcp[tcpflags] & (tcp-syn) != 0"
+	" and tcp[tcpflags] & (tcp-ack) = 0 and src net 192.168.0.0/24",
+	"tcp src port 443 and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4] = 0x18)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 1] = 0x03)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 2] < 0x04)"
+	" and ((ip[2:2] - 4 * (ip[0] & 0x0F) - 4 * ((tcp[12] & 0xF0) >> 4) > 69))",
+	/* Other */
+	"len = 128",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else {
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++)
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2



* [dpdk-dev] [PATCH v12 09/12] test: add a test for pcapng library
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 08/12] test: add test for bpf_convert Stephen Hemminger
@ 2021-10-01 16:27   ` Stephen Hemminger
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 10/12] test: enable bpf autotest Stephen Hemminger
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:27 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Simple unit test that creates a pcapng file using the API.

To run this test you need to have at least one device.
For example:

DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
    --no-huge -m 2048 --vdev=net_tap,iface=dummy

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
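For readers unfamiliar with the on-disk format this test exercises, here is a
minimal sketch of the first block of any pcapng file, the Section Header
Block. It is an illustration only and is not taken from this patch:
shb_build_minimal() and put_bytes() are hypothetical helpers, not rte_pcapng
APIs, and the sketch omits all options. The block written by
rte_pcapng_fdopen() presumably carries options such as the application name
and is therefore larger.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PCAPNG_SHB_TYPE   0x0A0D0D0AU /* Section Header Block type */
#define PCAPNG_BYTE_MAGIC 0x1A2B3C4DU /* byte-order magic */
#define PCAPNG_SHB_MIN    28U         /* SHB size with no options */

/* Copy n bytes from src to buf + off and return the new offset. */
static size_t
put_bytes(uint8_t *buf, size_t off, const void *src, size_t n)
{
	memcpy(buf + off, src, n);
	return off + n;
}

/*
 * Hypothetical helper (not an rte_pcapng API): build the smallest valid
 * Section Header Block, without options, in host byte order.
 * Returns the block size, or 0 if buf is missing or too small.
 */
static size_t
shb_build_minimal(uint8_t *buf, size_t buf_len)
{
	const uint32_t type = PCAPNG_SHB_TYPE;
	const uint32_t total_len = PCAPNG_SHB_MIN;
	const uint32_t magic = PCAPNG_BYTE_MAGIC;
	const uint16_t major = 1, minor = 0;
	const int64_t section_len = -1; /* section length not specified */
	size_t off = 0;

	if (buf == NULL || buf_len < PCAPNG_SHB_MIN)
		return 0;

	off = put_bytes(buf, off, &type, sizeof(type));
	off = put_bytes(buf, off, &total_len, sizeof(total_len));
	off = put_bytes(buf, off, &magic, sizeof(magic));
	off = put_bytes(buf, off, &major, sizeof(major));
	off = put_bytes(buf, off, &minor, sizeof(minor));
	off = put_bytes(buf, off, &section_len, sizeof(section_len));
	/* the total length is repeated at the end of every block */
	off = put_bytes(buf, off, &total_len, sizeof(total_len));

	assert(off == PCAPNG_SHB_MIN);
	return off;
}
```

The fields are written in host byte order on purpose: a reader uses the
byte-order magic to detect the endianness of the section.
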
 app/test/meson.build   |   4 +
 app/test/test_pcapng.c | 272 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 276 insertions(+)
 create mode 100644 app/test/test_pcapng.c

diff --git a/app/test/meson.build b/app/test/meson.build
index f144d8b8ed68..56863f97acbb 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -396,6 +396,10 @@ if dpdk_conf.has('RTE_NET_RING')
     fast_tests += [['pdump_autotest', true]]
 endif
 
+if dpdk_conf.has('RTE_PORT_PCAP')
+    test_sources += 'test_pcapng.c'
+endif
+
 if dpdk_conf.has('RTE_LIB_POWER')
     test_deps += 'power'
 endif
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
new file mode 100644
index 000000000000..df837ffe0d51
--- /dev/null
+++ b/app/test/test_pcapng.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_net.h>
+#include <rte_pcapng.h>
+
+#include <pcap/pcap.h>
+
+#include "test.h"
+
+#define NUM_PACKETS    10
+#define DUMMY_MBUF_NUM 3
+
+static rte_pcapng_t *pcapng;
+static struct rte_mempool *mp;
+static const uint32_t pkt_len = 200;
+static uint16_t port_id;
+static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng";
+
+/* first mbuf in the packet, should always be at offset 0 */
+struct dummy_mbuf {
+	struct rte_mbuf mb[DUMMY_MBUF_NUM];
+	uint8_t buf[DUMMY_MBUF_NUM][RTE_MBUF_DEFAULT_BUF_SIZE];
+};
+
+static void
+dummy_mbuf_prep(struct rte_mbuf *mb, uint8_t buf[], uint32_t buf_len,
+	uint32_t data_len)
+{
+	uint32_t i;
+	uint8_t *db;
+
+	mb->buf_addr = buf;
+	mb->buf_iova = (uintptr_t)buf;
+	mb->buf_len = buf_len;
+	rte_mbuf_refcnt_set(mb, 1);
+
+	/* set pool pointer to dummy value, test doesn't use it */
+	mb->pool = (void *)buf;
+
+	rte_pktmbuf_reset(mb);
+	db = (uint8_t *)rte_pktmbuf_append(mb, data_len);
+
+	for (i = 0; i != data_len; i++)
+		db[i] = i;
+}
+
+/* Make an IP packet consisting of a chain of one mbuf */
+static void
+mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth;
+		struct rte_ipv4_hdr ip;
+	} pkt = {
+		.eth = {
+			.d_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+			.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+		},
+		.ip = {
+			.version_ihl = RTE_IPV4_VHL_DEF,
+			.total_length = rte_cpu_to_be_16(plen),
+			.time_to_live = IPDEFTTL,
+			.next_proto_id = IPPROTO_RAW,
+			.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+			.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+		}
+	};
+
+	memset(dm, 0, sizeof(*dm));
+	dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen);
+
+	rte_eth_random_addr(pkt.eth.s_addr.addr_bytes);
+	memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen));
+}
+
+static int
+test_setup(void)
+{
+	int tmp_fd;
+
+	port_id = rte_eth_find_next(0);
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		fprintf(stderr, "No valid Ether port\n");
+		return -1;
+	}
+
+	tmp_fd = mkstemps(file_name, strlen(".pcapng"));
+	if (tmp_fd == -1) {
+		perror("mkstemps() failure");
+		return -1;
+	}
+	printf("pcapng: output file %s\n", file_name);
+
+	/* open a test capture file */
+	pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL);
+	if (pcapng == NULL) {
+		fprintf(stderr, "rte_pcapng_fdopen failed\n");
+		close(tmp_fd);
+		return -1;
+	}
+
+
+	/* Make a pool for cloned packets */
+	mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+					    0, 0,
+					    rte_pcapng_mbuf_size(pkt_len),
+					    SOCKET_ID_ANY, "ring_mp_sc");
+	if (mp == NULL) {
+		fprintf(stderr, "Cannot create mempool\n");
+		return -1;
+	}
+	return 0;
+
+}
+
+static int
+test_write_packets(void)
+{
+	struct rte_mbuf *orig;
+	struct rte_mbuf *clones[NUM_PACKETS] = { };
+	struct dummy_mbuf mbfs;
+	unsigned int i;
+	ssize_t len;
+
+	/* make a dummy packet */
+	mbuf1_prepare(&mbfs, pkt_len);
+
+	/* clone them */
+	orig  = &mbfs.mb[0];
+	for (i = 0; i < NUM_PACKETS; i++) {
+		struct rte_mbuf *mc;
+
+		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+				rte_get_tsc_cycles(), 0);
+		if (mc == NULL) {
+			fprintf(stderr, "Cannot copy packet\n");
+			return -1;
+		}
+		clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS);
+
+	rte_pktmbuf_free_bulk(clones, NUM_PACKETS);
+
+	if (len <= 0) {
+		fprintf(stderr, "Write of packets failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+test_write_stats(void)
+{
+	ssize_t len;
+
+	/* write a statistics block */
+	len = rte_pcapng_write_stats(pcapng, port_id,
+				     NULL, 0, 0,
+				     NUM_PACKETS, 0);
+	if (len <= 0) {
+		fprintf(stderr, "Write of statistics failed\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+pkt_print(u_char *user, const struct pcap_pkthdr *h,
+	  const u_char *bytes)
+{
+	unsigned int *countp = (unsigned int *)user;
+	const struct rte_ether_hdr *eh;
+	struct tm *tm;
+	char tbuf[128], src[64], dst[64];
+
+	tm = localtime(&h->ts.tv_sec);
+	if (tm == NULL) {
+		perror("localtime");
+		return;
+	}
+
+	if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) {
+		fprintf(stderr, "strftime returned 0!\n");
+		return;
+	}
+
+	eh = (const struct rte_ether_hdr *)bytes;
+	rte_ether_format_addr(dst, sizeof(dst), &eh->d_addr);
+	rte_ether_format_addr(src, sizeof(src), &eh->s_addr);
+	printf("%s.%06lu: %s -> %s type %x length %u\n",
+	       tbuf, (unsigned long)h->ts.tv_usec,
+	       src, dst, rte_be_to_cpu_16(eh->ether_type), h->len);
+
+	*countp += 1;
+}
+
+/*
+ * Open the resulting pcapng file with libpcap
+ * Would be better to use capinfos from wireshark
+ * but that creates an unwanted dependency.
+ */
+static int
+test_validate(void)
+{
+	char errbuf[PCAP_ERRBUF_SIZE];
+	unsigned int count = 0;
+	pcap_t *pcap;
+	int ret;
+
+	pcap = pcap_open_offline(file_name, errbuf);
+	if (pcap == NULL) {
+		fprintf(stderr, "pcap_open_offline('%s') failed: %s\n",
+			file_name, errbuf);
+		return -1;
+	}
+
+	ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count);
+	if (ret == 0)
+		printf("Saw %u packets\n", count);
+	else
+		fprintf(stderr, "pcap_loop: failed: %s\n",
+			pcap_geterr(pcap));
+	pcap_close(pcap);
+
+	return ret;
+}
+
+static void
+test_cleanup(void)
+{
+	if (mp)
+		rte_mempool_free(mp);
+
+	if (pcapng)
+		rte_pcapng_close(pcapng);
+
+}
+
+static struct
+unit_test_suite test_pcapng_suite  = {
+	.setup = test_setup,
+	.teardown = test_cleanup,
+	.suite_name = "Test Pcapng Unit Test Suite",
+	.unit_test_cases = {
+		TEST_CASE(test_write_packets),
+		TEST_CASE(test_write_stats),
+		TEST_CASE(test_validate),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_pcapng(void)
+{
+	return unit_test_suite_runner(&test_pcapng_suite);
+}
+
+REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng);
-- 
2.30.2



* [dpdk-dev] [PATCH v12 10/12] test: enable bpf autotest
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 09/12] test: add a test for pcapng library Stephen Hemminger
@ 2021-10-01 16:27   ` Stephen Hemminger
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:27 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The BPF autotest is defined but not run automatically.
Since it is short, it should be added to the autotest suite.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index 56863f97acbb..aa298e971401 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -194,6 +194,8 @@ test_deps = [
 fast_tests = [
         ['acl_autotest', true],
         ['atomic_autotest', false],
+        ['bpf_autotest', true],
+        ['bpf_convert_autotest', true],
         ['bitops_autotest', true],
         ['byteorder_autotest', true],
         ['cksum_autotest', true],
-- 
2.30.2



* [dpdk-dev] [PATCH v12 11/12] doc: changes for new pcapng and dumpcap
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (9 preceding siblings ...)
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 10/12] test: enable bpf autotest Stephen Hemminger
@ 2021-10-01 16:27   ` Stephen Hemminger
  2021-10-15 16:42     ` Pattan, Reshma
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  2021-10-12  2:31   ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
  12 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:27 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Describe the new packet capture library and utilities

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
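The pdump_lib.rst hunk in this patch says that with RTE_PDUMP_FLAG_PCAPNG the
mbuf data is extended with a header and trailer. Assuming those correspond to
the framing of a pcapng Enhanced Packet Block, the sketch below shows how the
on-disk size of such a block follows from the captured length. It is an
illustration only: pcapng_pad4() and epb_total_length() are hypothetical
helpers, not rte_pcapng APIs, and the options length is whatever the writer
chooses to add.

```c
#include <stdint.h>

/*
 * Fixed part of an Enhanced Packet Block: block type, total length,
 * interface id, timestamp high, timestamp low, captured length,
 * original length, and the trailing copy of the total length.
 */
#define PCAPNG_EPB_FIXED 32U

/* Round len up to the next multiple of 4; pcapng pads to 32 bits. */
static uint32_t
pcapng_pad4(uint32_t len)
{
	return (len + 3U) & ~3U;
}

/*
 * Hypothetical helper: size in bytes of an Enhanced Packet Block that
 * carries caplen bytes of packet data and options_len bytes of options.
 */
static uint32_t
epb_total_length(uint32_t caplen, uint32_t options_len)
{
	return PCAPNG_EPB_FIXED + pcapng_pad4(caplen) +
		pcapng_pad4(options_len);
}
```

For the 200 byte packets used by the pcapng unit test in this series, a block
without options would occupy 232 bytes; any options the library adds make
the real blocks larger.
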
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 67 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 24 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 228 insertions(+), 87 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..78baa609a021 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
     Copyright(c) 2017 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. It consists of two libraries: ``librte_pdump`` for
+collecting packets and ``librte_pcapng`` for writing them to a file.
+There are two sample applications: ``dpdk-dumpcap`` and the older
+``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` tool captures packets much
+like Wireshark ``dumpcap`` does on Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the
+  ``librte_pdump`` library. It provides more limited functionality
+  and depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the ``dpdk-dumpcap`` tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..36379b530a57
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,24 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+
+* Project repository  https://github.com/pcapng/pcapng/
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..9af91415e5ea 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows seating an optional filter using DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` is set the mbuf data is extended with header and trailer
+to match the format of Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 37dc1a77866a..eb93bd6b9157 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -125,6 +125,16 @@ New Features
   * Added tests to validate packets hard expiry.
   * Added tests to verify tunnel header verification in IPsec inbound.
 
+* **Revised packet capture framework.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    wireshark dumpcap utility including: capture of multiple interfaces,
+    filtering, and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancements to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Fixes packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool. The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes captured packets to files in the
+Pcapng capture file format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In dpdk, only the ``testpmd`` is modified to initialize packet capture
+        framework, other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than the testpmd, user
+        needs to explicitly modify that application to call packet capture
+        framework initialization code. Refer to ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in style of *tshark* use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-I`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -I 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v12 12/12] MAINTAINERS: add entry for new packet capture features
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (10 preceding siblings ...)
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-10-01 16:27   ` Stephen Hemminger
  2021-10-12  2:31   ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-01 16:27 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Since the new packet capture is just an extension of the existing pdump,
add myself as a maintainer of that.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1d437ca29d0e..f316d1ac9972 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1425,12 +1425,17 @@ F: doc/guides/sample_app_ug/qos_scheduler.rst
 
 Packet capture
 M: Reshma Pattan <reshma.pattan@intel.com>
+M: Stephen Hemminger <stephen@networkplumber.org>
 F: lib/pdump/
+F: lib/pcapng/
 F: doc/guides/prog_guide/pdump_lib.rst
-F: app/test/test_pdump.*
-F: app/pdump/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: doc/guides/tools/dumpcap.rst
 F: doc/guides/tools/pdump.rst
-
+F: app/test/test_pdump.c
+F: app/test/test_pcapng.c
+F: app/pdump/
+F: app/dumpcap/
 
 Packet Framework
 ----------------
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 00/12] Packet capture framework update
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
                     ` (11 preceding siblings ...)
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
@ 2021-10-12  2:31   ` Stephen Hemminger
  2021-10-12  7:09     ` Thomas Monjalon
  12 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-12  2:31 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Fri,  1 Oct 2021 09:26:53 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> This patch set is a more complete version of the the enhanced
> packet capture support described last year.
> 
> The new capture library and utility are:
>   - faster avoids lots of extra I/O, does bursting, etc.
>   - gives more information (multiple ports, queues, etc)
>   - has a better user interface (same as Wireshark dumpcap)
>   - fixes structural problems with VLAN's and timestamps
> 
> There are no blocker items.
> The following are worth noting:
>   * bogus checkpatch warnings
> 	- the correct flag to open is O_CREAT
>         - intentionally keeping macro with goto since that
>           was in original code and is clearer
>         - the tempfile name can not be const since it is
>           overwritten by tmpfile() call
> 
>   * enabling BPF tests causes CI to see a pre-existing bug
>     https://bugs.dpdk.org/show_bug.cgi?id=811
> 
>   * future filtering for stripped VLAN tags needs collabration
>     with libpcap project to fix pcap_compile_filter().
> 
> v12
>   - fixes for capture offloaded VLAN tags.
>     look at direction flag and handle QinQ offload.
> 
> v11
>   - address review comments for pdump (patch 6)
> 
> v10:
>   - fix to rte_bpf_dump to handle more instructions
>     make sure all bpf_test cases are decoded
> 
> v9:
>   - incorporate suggested change to BPF XOR
>   - make autotest for pcapng more complete by reading the
>     resulting file with libpcap
> 
> v8:
>   - enable BPF tests in autotest
>   - add more BPF test strings
>   - use rte_strscpy to satisfy checkpatch
>   - merge MAINTAINERS (put this in with existing pdump)
> 
> v7:
>   - add functional tests for pcapng lib
>   - bug fix for error returns in pcapng lib
>   - handle long osname on FreeBSD
>   - resolve almost all checkpatch issues
> 
> v5:
>   - minor build and checkpatch fixes for RHEL/FreeBSD
>   - disable lib/pdump on Windows. It was not useful before
>     and now pdump depends on bpf.
> 
> v4:
>   - minor checkpatch fixes.
>     Note: some of the checkpatch warnings are bogus and won't be fixed.
>   - fix build of dumpcap on FreeBSD
> 
> v3:
>   - introduce packet filters using classic BPF to eBPF converter
>     required small fix to DPDK BPF interpreter
>   - introduce function to decode eBPF instructions
>   - add option to dumpcap to show both classic BPF and eBPF result
>   - drop some un-useful stubs
>   - minor checkpatch warning cleanup
> 
> v2:
>    fix formatting of packet blocks
>    fix the new packet capture statistics
>    fix crash when primary process exits
>    record start/end time
>    various whitespace/checkpatch warnings
> 
> 
> 
> Stephen Hemminger (12):
>   lib: pdump is not supported on Windows
>   librte_pcapng: add new library for writing pcapng files
>   bpf: allow self-xor operation
>   bpf: add function to convert classic BPF to DPDK BPF
>   bpf: add function to dump eBPF instructions
>   pdump: support pcapng and filtering
>   app/dumpcap: add new packet capture application
>   test: add test for bpf_convert
>   test: add a test for pcapng library
>   test: enable bpf autotest
>   doc: changes for new pcapng and dumpcap
>   MAINTAINERS: add entry for new packet capture features
> 
>  MAINTAINERS                                   |  11 +-
>  app/dumpcap/main.c                            | 844 ++++++++++++++++++
>  app/dumpcap/meson.build                       |  16 +
>  app/meson.build                               |   1 +
>  app/test/meson.build                          |   6 +
>  app/test/test_bpf.c                           | 200 +++++
>  app/test/test_pcapng.c                        | 272 ++++++
>  doc/api/doxy-api-index.md                     |   1 +
>  doc/api/doxy-api.conf.in                      |   1 +
>  .../howto/img/packet_capture_framework.svg    |  96 +-
>  doc/guides/howto/packet_capture_framework.rst |  67 +-
>  doc/guides/prog_guide/index.rst               |   1 +
>  doc/guides/prog_guide/pcapng_lib.rst          |  24 +
>  doc/guides/prog_guide/pdump_lib.rst           |  28 +-
>  doc/guides/rel_notes/release_21_11.rst        |  10 +
>  doc/guides/tools/dumpcap.rst                  |  86 ++
>  doc/guides/tools/index.rst                    |   1 +
>  lib/bpf/bpf_convert.c                         | 575 ++++++++++++
>  lib/bpf/bpf_dump.c                            | 139 +++
>  lib/bpf/bpf_validate.c                        |   9 +-
>  lib/bpf/meson.build                           |   6 +
>  lib/bpf/rte_bpf.h                             |  39 +
>  lib/bpf/version.map                           |   7 +
>  lib/meson.build                               |   6 +-
>  lib/pcapng/meson.build                        |   8 +
>  lib/pcapng/pcapng_proto.h                     | 129 +++
>  lib/pcapng/rte_pcapng.c                       | 613 +++++++++++++
>  lib/pcapng/rte_pcapng.h                       | 194 ++++
>  lib/pcapng/version.map                        |  12 +
>  lib/pdump/meson.build                         |   2 +-
>  lib/pdump/rte_pdump.c                         | 427 ++++++---
>  lib/pdump/rte_pdump.h                         | 113 ++-
>  lib/pdump/version.map                         |   8 +
>  33 files changed, 3733 insertions(+), 219 deletions(-)
>  create mode 100644 app/dumpcap/main.c
>  create mode 100644 app/dumpcap/meson.build
>  create mode 100644 app/test/test_pcapng.c
>  create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
>  create mode 100644 doc/guides/tools/dumpcap.rst
>  create mode 100644 lib/bpf/bpf_convert.c
>  create mode 100644 lib/bpf/bpf_dump.c
>  create mode 100644 lib/pcapng/meson.build
>  create mode 100644 lib/pcapng/pcapng_proto.h
>  create mode 100644 lib/pcapng/rte_pcapng.c
>  create mode 100644 lib/pcapng/rte_pcapng.h
>  create mode 100644 lib/pcapng/version.map
> 

Anything outstanding on this patchset? Would like this to make 20.11

The only failures in CI are false positives, i.e. bogus checkpatch warnings and the ice RHEL7 build.

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 00/12] Packet capture framework update
  2021-10-12  2:31   ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
@ 2021-10-12  7:09     ` Thomas Monjalon
  2021-10-12 10:21       ` Pattan, Reshma
  0 siblings, 1 reply; 220+ messages in thread
From: Thomas Monjalon @ 2021-10-12  7:09 UTC (permalink / raw)
  To: Stephen Hemminger, Reshma Pattan; +Cc: dev, bruce.richardson

12/10/2021 04:31, Stephen Hemminger:
> On Fri,  1 Oct 2021 09:26:53 -0700
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> > This patch set is a more complete version of the the enhanced
> > packet capture support described last year.
> 
> Anything outstanding on this patchset? Would like this to make 20.11

Too late for 20.11, but we can try to merge it in 21.11 timeframe ;)

I was hoping to see feedback from the current maintainer,
but it seems you didn't Cc her... Reshma, are you aware of these patches?




^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 00/12] Packet capture framework update
  2021-10-12  7:09     ` Thomas Monjalon
@ 2021-10-12 10:21       ` Pattan, Reshma
  2021-10-12 15:44         ` Stephen Hemminger
  0 siblings, 1 reply; 220+ messages in thread
From: Pattan, Reshma @ 2021-10-12 10:21 UTC (permalink / raw)
  To: Thomas Monjalon, Stephen Hemminger; +Cc: dev, Richardson, Bruce



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>

> I was hoping to see a feedback from the current maintainer, but it seems you
> didn't Cc her... Reshma, are you aware of these patches?
> 

I was aware of v10, where I had comments; I will take a look at v12.
Yes, please add me to Cc on future patch sets; that would help me not miss them.


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 00/12] Packet capture framework update
  2021-10-12 10:21       ` Pattan, Reshma
@ 2021-10-12 15:44         ` Stephen Hemminger
  2021-10-12 15:48           ` Thomas Monjalon
  0 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-12 15:44 UTC (permalink / raw)
  To: Pattan, Reshma; +Cc: Thomas Monjalon, dev, Richardson, Bruce

On Tue, 12 Oct 2021 10:21:41 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>  
> 
> > I was hoping to see a feedback from the current maintainer, but it seems you
> > didn't Cc her... Reshma, are you aware of these patches?
> >   
> 
> I was aware of v10 where I had comments, I will take a look at V12.  
> Yes,  please add me to CC for  future patch sets, that would help me to not miss them.
> 

This means we have a flawed process if patches submitted
a month ahead of the release can't get reviewed.


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 00/12] Packet capture framework update
  2021-10-12 15:44         ` Stephen Hemminger
@ 2021-10-12 15:48           ` Thomas Monjalon
  2021-10-12 18:00             ` Stephen Hemminger
  0 siblings, 1 reply; 220+ messages in thread
From: Thomas Monjalon @ 2021-10-12 15:48 UTC (permalink / raw)
  To: Pattan, Reshma, Stephen Hemminger; +Cc: dev, Richardson, Bruce, david.marchand

12/10/2021 17:44, Stephen Hemminger:
> On Tue, 12 Oct 2021 10:21:41 +0000
> "Pattan, Reshma" <reshma.pattan@intel.com> wrote:
> > From: Thomas Monjalon <thomas@monjalon.net>  
> > 
> > > I was hoping to see a feedback from the current maintainer, but it seems you
> > > didn't Cc her... Reshma, are you aware of these patches?
> > 
> > I was aware of v10 where I had comments, I will take a look at V12.  
> > Yes,  please add me to CC for  future patch sets, that would help me to not miss them.
> 
> This means we have a flawed process if patches can't get
> reviewed that have been submitted a month ahead of release.

As part of the process, you are supposed to use "--cc-cmd devtools/get-maintainer.sh"
so that maintainers are Cc'ed.




^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 06/12] pdump: support pcapng and filtering
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-10-12 16:31     ` Pattan, Reshma
  0 siblings, 0 replies; 220+ messages in thread
From: Pattan, Reshma @ 2021-10-12 16:31 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger


I see all comments from v10 have been addressed. One small nitpick:

> +	if (pdump_stats == NULL) {
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +			PDUMP_LOG(ERR, "pdump stats initialized\n");

:s/initialized/not initialized ?


Acked-by: Reshma Pattan <reshma.pattan@intel.com>

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 00/12] Packet capture framework update
  2021-10-12 15:48           ` Thomas Monjalon
@ 2021-10-12 18:00             ` Stephen Hemminger
  2021-10-12 18:22               ` Thomas Monjalon
  0 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-12 18:00 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Pattan, Reshma, dev, Richardson, Bruce, david.marchand

On Tue, 12 Oct 2021 17:48:47 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 12/10/2021 17:44, Stephen Hemminger:
> > On Tue, 12 Oct 2021 10:21:41 +0000
> > "Pattan, Reshma" <reshma.pattan@intel.com> wrote:  
> > > From: Thomas Monjalon <thomas@monjalon.net>  
> > >   
> > > > I was hoping to see a feedback from the current maintainer, but it seems you
> > > > didn't Cc her... Reshma, are you aware of these patches?  
> > > 
> > > I was aware of v10 where I had comments, I will take a look at V12.  
> > > Yes,  please add me to CC for  future patch sets, that would help me to not miss them.  
> > 
> > This means we have a flawed process if patches can't get
> > reviewed that have been submitted a month ahead of release.  
> 
> Part of the process, you are supposed to use "--cc-cmd devtools/get-maintainer.sh"
> so maintainers are Cc'ed.

Thought they were; looks like Reshma got missed.

No worries about doing it in 22.02, since there is no API/ABI breakage in the
patchset. No pre-release note is needed either.

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 00/12] Packet capture framework update
  2021-10-12 18:00             ` Stephen Hemminger
@ 2021-10-12 18:22               ` Thomas Monjalon
  2021-10-13  8:44                 ` Pattan, Reshma
  0 siblings, 1 reply; 220+ messages in thread
From: Thomas Monjalon @ 2021-10-12 18:22 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Pattan, Reshma, dev, Richardson, Bruce, david.marchand

12/10/2021 20:00, Stephen Hemminger:
> On Tue, 12 Oct 2021 17:48:47 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> > 12/10/2021 17:44, Stephen Hemminger:
> > > On Tue, 12 Oct 2021 10:21:41 +0000
> > > "Pattan, Reshma" <reshma.pattan@intel.com> wrote:  
> > > > From: Thomas Monjalon <thomas@monjalon.net>  
> > > >   
> > > > > I was hoping to see a feedback from the current maintainer, but it seems you
> > > > > didn't Cc her... Reshma, are you aware of these patches?  
> > > > 
> > > > I was aware of v10 where I had comments, I will take a look at V12.  
> > > > Yes,  please add me to CC for  future patch sets, that would help me to not miss them.  
> > > 
> > > This means we have a flawed process if patches can't get
> > > reviewed that have been submitted a month ahead of release.  
> > 
> > Part of the process, you are supposed to use "--cc-cmd devtools/get-maintainer.sh"
> > so maintainers are Cc'ed.
> 
> Thought they were, look like Reshma got missed.
> 
> No worries about doing in 22.02 since there is no API/ABI breakage in the
> patchset. No pre-release note needed either.

We can still merge it for 21.11 if Reshma is OK with the v12.



^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 00/12] Packet capture framework update
  2021-10-12 18:22               ` Thomas Monjalon
@ 2021-10-13  8:44                 ` Pattan, Reshma
  0 siblings, 0 replies; 220+ messages in thread
From: Pattan, Reshma @ 2021-10-13  8:44 UTC (permalink / raw)
  To: Thomas Monjalon, Stephen Hemminger; +Cc: dev, Richardson, Bruce, david.marchand



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> > Thought they were, look like Reshma got missed.
> >
> > No worries about doing in 22.02 since there is no API/ABI breakage in
> > the patchset. No pre-release note needed either.
> 
> We can still merge it for 21.11 if Reshma is OK with the v12.
> 

FYI, I have reviewed and acked the pdump library patch below.
[v12,06/12] pdump: support pcapng and filtering

Some other patches, mainly the new librte_pcapng library, still need a review and an Ack. Can someone help review and provide an Ack?
I will also take a look.

Thanks,
Reshma

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-10-15  9:36     ` Pattan, Reshma
  2021-10-15 17:40       ` Stephen Hemminger
  2021-10-15 18:14       ` Stephen Hemminger
  0 siblings, 2 replies; 220+ messages in thread
From: Pattan, Reshma @ 2021-10-15  9:36 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger
> See draft RFC
>   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html

The page is not found. The link probably needs updating.

> +enum pcapng_interface_options {
> +	PCAPNG_IFB_NAME	 = 2,
> +	PCAPNG_IFB_DESCRIPTION,

Can IFB (interface block) be shortened to IF (interface)? Either way is OK; it's up to you.


> +	buf = calloc(1, len);
> +	if (!buf)
> +		return -1;

How about returning -ENOMEM?

> +
> +	hdr = (struct pcapng_section_header *)buf;
> +	*hdr = (struct pcapng_section_header) {
> +		.block_type = PCAPNG_SECTION_BLOCK,
> +		.block_length = len,
> +		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
> +		.major_version = PCAPNG_MAJOR_VERS,
> +		.minor_version = PCAPNG_MINOR_VERS,
> +		.section_length = UINT64_MAX,
> +	};
> +	hdr->block_length = len;

Why assign block_length with len again? That was already done a few lines above.

> +	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);

A comment here explaining that the end-of-options marker terminates the option list would be helpful.

> +
> +/* Write the PCAPNG section header at start of file */ static ssize_t

:s/section header/ interface header?

> +pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
> +	if (mac_addr)
> +		len += pcapng_optlen(6);

How about using RTE_ETHER_ADDR_LEN instead of 6?

> +struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue,
<snip>
> +fail:
> +	rte_pktmbuf_free(mc);


Would freeing mc also free the additional bytes prepended after mc was created?

> +	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
> +				&queue, sizeof(queue));

Don't we need to add the end-of-options marker at the end of the option list, as is done for the interface block and the section header block?

> diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h new file mode
> + *
> + * Packets to be captured are copied by rte_pcapng_mbuf()

Do you mean by rte_pcapng_copy()?
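
For reference when reading the section header comments above: in the pcapng format every block starts with a block type and a total length, and repeats the total length as its final 32-bit word. A minimal sketch of an option-less Section Header Block follows. The names shb_fixed, SHB_MIN_LEN and shb_build are hypothetical and are not taken from the patch; the type and magic values are the ones given in the pcapng draft.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHB_TYPE	0x0A0D0D0A	/* Section Header Block type */
#define SHB_MAGIC	0x1A2B3C4D	/* byte-order magic */

/* Fixed part of a Section Header Block, before any options (hypothetical name). */
struct shb_fixed {
	uint32_t block_type;
	uint32_t block_length;
	uint32_t byte_order_magic;
	uint16_t major_version;
	uint16_t minor_version;
	uint64_t section_length;
};

/* Size of an option-less SHB: fixed part plus the trailing length word. */
#define SHB_MIN_LEN (sizeof(struct shb_fixed) + sizeof(uint32_t))

/* Fill buf (at least SHB_MIN_LEN bytes) and return the number of bytes written. */
static size_t
shb_build(uint8_t *buf)
{
	uint32_t len = SHB_MIN_LEN;
	struct shb_fixed hdr = {
		.block_type = SHB_TYPE,
		.block_length = len,
		.byte_order_magic = SHB_MAGIC,
		.major_version = 1,
		.minor_version = 0,
		.section_length = UINT64_MAX,	/* section length not specified */
	};

	memcpy(buf, &hdr, sizeof(hdr));
	/* The total length is repeated as the last word of the block. */
	memcpy(buf + sizeof(hdr), &len, sizeof(len));
	return len;
}
```

With options present, the trailing length word moves to after the option list and both length fields grow by the padded option sizes.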



^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 11/12] doc: changes for new pcapng and dumpcap
  2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
@ 2021-10-15 16:42     ` Pattan, Reshma
  2021-10-15 17:29       ` Stephen Hemminger
  0 siblings, 1 reply; 220+ messages in thread
From: Pattan, Reshma @ 2021-10-15 16:42 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger

> +The DPDK packet capture framework was introduced in DPDK v16.07 and
>+enhanced in 21.1. 

Need to edit the version.

> +#. Launch the dpdk-dump as follows::
:s/dpdk-dump/dpdk-dumpcap

> +   Inspect packets captured in the file capture.pcap using a tool such as
:s/ capture.pcap/ capture.pcapng

> @@ -0,0 +1,24 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2016 Intel Corporation.

Need to edit the licence.

> +  It also allows seating an optional filter using DPDK BPF interpreter
:s/seating/setting

> +* Draft RFC https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html

The link needs updating.


^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 11/12] doc: changes for new pcapng and dumpcap
  2021-10-15 16:42     ` Pattan, Reshma
@ 2021-10-15 17:29       ` Stephen Hemminger
  2021-10-18  9:23         ` Pattan, Reshma
  0 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 17:29 UTC (permalink / raw)
  To: Pattan, Reshma; +Cc: dev

On Fri, 15 Oct 2021 16:42:19 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> > @@ -0,0 +1,24 @@
> > +..  SPDX-License-Identifier: BSD-3-Clause
> > +    Copyright(c) 2016 Intel Corporation.  
> 
> need to edit the licence 

Do you want me to change the date on the existing doc as well?

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-15  9:36     ` Pattan, Reshma
@ 2021-10-15 17:40       ` Stephen Hemminger
  2021-10-15 18:14       ` Stephen Hemminger
  1 sibling, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 17:40 UTC (permalink / raw)
  To: Pattan, Reshma; +Cc: dev

On Fri, 15 Oct 2021 09:36:00 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> > +	buf = calloc(1, len);
> > +	if (!buf)
> > +		return -1;  
> 
> How about returning -ENOMEM

It could, but it is not necessary.
The other code returns the result of write() and would therefore be -1 on a
write error.

This is an internal local function, and its only caller just checks for < 0.

PS: In reality, malloc can't fail on Linux. Process gets oom killed instead
(unless someone has gone and tweaked the memory allocator to not overcommit).

^ permalink raw reply	[flat|nested] 220+ messages in thread

* Re: [dpdk-dev] [PATCH v12 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-15  9:36     ` Pattan, Reshma
  2021-10-15 17:40       ` Stephen Hemminger
@ 2021-10-15 18:14       ` Stephen Hemminger
  1 sibling, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:14 UTC (permalink / raw)
  To: Pattan, Reshma; +Cc: dev

On Fri, 15 Oct 2021 09:36:00 +0000
"Pattan, Reshma" <reshma.pattan@intel.com> wrote:

> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger
> > See draft RFC
> >   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html  
> 
> The page is not found.  Might need to add new link I guess
> 
> > +enum pcapng_interface_options {
> > +	PCAPNG_IFB_NAME	 = 2,
> > +	PCAPNG_IFB_DESCRIPTION,  
> 
> Can IFB(interface block) be replaced with IF(interface) only?  But that's ok, upto u.
> 
> 
> > +	buf = calloc(1, len);
> > +	if (!buf)
> > +		return -1;  
> 
> How about returning -ENOMEM
> 
> > +
> > +	hdr = (struct pcapng_section_header *)buf;
> > +	*hdr = (struct pcapng_section_header) {
> > +		.block_type = PCAPNG_SECTION_BLOCK,
> > +		.block_length = len,
> > +		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
> > +		.major_version = PCAPNG_MAJOR_VERS,
> > +		.minor_version = PCAPNG_MINOR_VERS,
> > +		.section_length = UINT64_MAX,
> > +	};
> > +	hdr->block_length = len;  
> 
> Why to assign block_len with len again? as it is already done few lines above.
> 
> > +	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);  
> 
> Some comments around this code, about adding end of options at the end of options list would be helpful.

Ok, but someone looking at this code should really look at the standard
to see what the data format is.

> > +
> > +/* Write the PCAPNG section header at start of file */ static ssize_t  
> 
> :s/section header/ interface header?

Good catch, copy/paste of comment.

> 
> > +pcapng_interface_block(rte_pcapng_t *self, const char *if_name,
> > +	if (mac_addr)
> > +		len += pcapng_optlen(6);  
> 
> How about using  RTE_ETHER_ADDR_LEN instead of 6

Fixing now, also merging pcapng_interface_block since only called one place.


> 
> > +struct rte_mbuf * rte_pcapng_copy(uint16_t port_id, uint32_t queue,  
> <snip>
> > +fail:
> > +	rte_pktmbuf_free(mc);  
> 
> 
> Freeing mc , would that take care of freeing  up the additional byte prepended after mc creation?

An mbuf is the unit of allocation, so freeing it releases the whole buffer,
including any bytes prepended after the copy.

> 
> > +	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
> > +				&queue, sizeof(queue));  
> 
> Don't we need to add end of options to the end of option list, like did in Interface block and section header block?

It turns out that the reference (wireshark) does not. So did not do
that to save space on the output file.
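
As a rough illustration of the option encoding being discussed (a sketch under assumptions, not the library code): each option is a 16-bit code, a 16-bit length, and the value padded to a 32-bit boundary, and an end-of-options marker is simply code 0 with length 0. The names opt_hdr, opt_padded_len and opt_put are hypothetical stand-ins for pcapng_option, pcapng_optlen() and pcapng_add_option(); unlike the patch, this version skips the copy when the length is zero so a NULL data pointer is never handed to memcpy().

```c
#include <stdint.h>
#include <string.h>

/* Fixed part of every option (hypothetical name). */
struct opt_hdr {
	uint16_t code;
	uint16_t length;	/* value length, excluding padding */
};

/* Size an option occupies in the file: header + value, rounded up to 4. */
static uint16_t
opt_padded_len(uint16_t len)
{
	return (uint16_t)((sizeof(struct opt_hdr) + len + 3u) & ~3u);
}

/* Write one option at p and return the location of the next one. */
static uint8_t *
opt_put(uint8_t *p, uint16_t code, const void *data, uint16_t len)
{
	struct opt_hdr h = { .code = code, .length = len };

	memset(p, 0, opt_padded_len(len));	/* padding bytes are zero */
	memcpy(p, &h, sizeof(h));
	if (len > 0)
		memcpy(p + sizeof(h), data, len);

	return p + opt_padded_len(len);
}
```

With this layout the end-of-options marker costs 4 bytes per block, which is the per-packet saving referred to above when it is left off the enhanced packet block.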

> 
> > diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h new file mode
> > + *
> > + * Packets to be captured are copied by rte_pcapng_mbuf()  
> 
> Do you mean by rte_pcapng_copy()?

Good catch, function got renamed and comment not updated.


* [dpdk-dev] [PATCH v13 00/12] Packet capture framework update
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (15 preceding siblings ...)
  2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
@ 2021-10-15 18:28 ` Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 01/12] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (11 more replies)
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
  18 siblings, 12 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps

There are no blocker items.
The following are worth noting:
  * bogus checkpatch warnings
        - the correct flag to open is O_CREAT
        - intentionally keeping macro with goto since that
          was in original code and is clearer
        - the tempfile name can not be const since it is
          overwritten by tmpfile() call

  * enabling BPF tests causes CI to see a pre-existing bug
    https://bugs.dpdk.org/show_bug.cgi?id=811

  * future filtering for stripped VLAN tags needs collaboration
    with libpcap project to fix pcap_compile_filter().

v13
  - integrate feedback in documentation and pcapng library

v12
  - fixes for capture offloaded VLAN tags.
    look at direction flag and handle QinQ offload.

v11
  - address review comments for pdump (patch 6)

v10:
  - fix to rte_bpf_dump to handle more instructions
    make sure all bpf_test cases are decoded

v9:
  - incorporate suggested change to BPF XOR
  - make autotest for pcapng more complete by reading the
    resulting file with libpcap

v8:
  - enable BPF tests in autotest
  - add more BPF test strings
  - use rte_strscpy to satisfy checkpatch
  - merge MAINTAINERS (put this in with existing pdump)

v7:
  - add functional tests for pcapng lib
  - bug fix for error returns in pcapng lib
  - handle long osname on FreeBSD
  - resolve almost all checkpatch issues

v5:
  - minor build and checkpatch fixes for RHEL/FreeBSD
  - disable lib/pdump on Windows. It was not useful before
    and now pdump depends on bpf.

v4:
  - minor checkpatch fixes.
    Note: some of the checkpatch warnings are bogus and won't be fixed.
  - fix build of dumpcap on FreeBSD

v3:
  - introduce packet filters using classic BPF to eBPF converter
    required small fix to DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup

v2:
   fix formatting of packet blocks
   fix the new packet capture statistics
   fix crash when primary process exits
   record start/end time
   various whitespace/checkpatch warnings

Stephen Hemminger (12):
  lib: pdump is not supported on Windows
  librte_pcapng: add new library for writing pcapng files
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  test: add test for bpf_convert
  test: add a test for pcapng library
  test: enable bpf autotest
  doc: changes for new pcapng and dumpcap utility
  MAINTAINERS: add entry for new packet capture features

 MAINTAINERS                                   |  11 +-
 app/dumpcap/main.c                            | 844 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/meson.build                          |   6 +
 app/test/test_bpf.c                           | 200 +++++
 app/test/test_pcapng.c                        | 272 ++++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  69 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  25 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 139 +++
 lib/bpf/bpf_validate.c                        |   9 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 607 +++++++++++++
 lib/pcapng/rte_pcapng.h                       | 196 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 432 ++++++---
 lib/pdump/rte_pdump.h                         | 113 ++-
 lib/pdump/version.map                         |   8 +
 33 files changed, 3737 insertions(+), 219 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 app/test/test_pcapng.c
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2



* [dpdk-dev] [PATCH v13 01/12] lib: pdump is not supported on Windows
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
@ 2021-10-15 18:28   ` Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:28 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.

The new version of pdump with filtering now has a dependency
on bpf, but the bpf library is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index b2ba7258d8ba..ef5ff522aeaf 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -85,7 +85,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
             'stack',
             'security',
     ] # only supported libraries for windows
-- 
2.30.2



* [dpdk-dev] [PATCH v13 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 01/12] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-10-15 18:28   ` Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 03/12] bpf: allow self-xor operation Stephen Hemminger
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan, Ray Kinsella

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See
  https://github.com/pcapng/pcapng/

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 ++++++++
 lib/pcapng/rte_pcapng.c   | 607 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 196 ++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 953 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map
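
A note on the timestamp conversion in rte_pcapng.c below: pcapng_tsc_to_ns() computes (delta * NSEC_PER_SEC) / rte_get_tsc_hz() in 64 bits, and that product wraps once delta exceeds about 1.8e10 cycles, which is several seconds at GHz TSC rates. One possible way around it, shown here as a standalone sketch rather than as the library code, is to split the delta into whole seconds and a remainder. The names cycles_to_ns and ns_split are made up for the example, and it assumes the TSC frequency is below roughly 1.8e10 Hz so that the remainder term cannot overflow.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ull

/*
 * Convert a cycle count to nanoseconds relative to a (base_ns, base_cycles)
 * reference pair sampled at startup. Hypothetical helper, for illustration.
 */
static uint64_t
cycles_to_ns(uint64_t base_ns, uint64_t base_cycles,
	     uint64_t cycles, uint64_t hz)
{
	uint64_t delta = cycles - base_cycles;
	uint64_t secs = delta / hz;	/* whole seconds */
	uint64_t rem = delta % hz;	/* leftover cycles, always < hz */

	return base_ns + secs * NS_PER_SEC + (rem * NS_PER_SEC) / hz;
}

/* Split a 64-bit timestamp into the two 32-bit fields used by pcapng blocks. */
static void
ns_split(uint64_t ns, uint32_t *hi, uint32_t *lo)
{
	*hi = (uint32_t)(ns >> 32);
	*lo = (uint32_t)ns;
}
```

The result is then stored the same way the patch does it, as timestamp_hi and timestamp_lo in the enhanced packet and statistics blocks.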

diff --git a/lib/meson.build b/lib/meson.build
index ef5ff522aeaf..15150efa19a7 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..6900d1d2c595
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,607 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	/* reserve space for OPT_END */
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+
+	/* After the section header insert variable length options. */
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	/* The standard requires last option to be OPT_END */
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface block for a DPDK port */
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct pcapng_interface_block *hdr;
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr *ea, macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len;
+	void *buf;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	if (rte_eth_macaddr_get(port, &macaddr) < 0)
+		ea = NULL;
+	else
+		ea = &macaddr;
+
+	/* Compute length of interface block options */
+	len = sizeof(*hdr);
+
+	len += pcapng_optlen(sizeof(tsresol));	/* timestamp */
+	len += pcapng_optlen(strlen(ifname));	/* ifname */
+
+	if (ea)
+		len += pcapng_optlen(RTE_ETHER_ADDR_LEN); /* macaddr */
+	if (speed != 0)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (dev)
+		len += pcapng_optlen(strlen(ifhw));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = alloca(len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	*hdr = (struct pcapng_interface_block) {
+		.block_type = PCAPNG_INTERFACE_BLOCK,
+		.link_type = 1, 	/* Ethernet */
+		.block_length = len,
+	};
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+				ifname, strlen(ifname));
+	if (ea)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					ea, RTE_ETHER_ADDR_LEN);
+	if (speed != 0)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &speed, sizeof(uint64_t));
+	if (dev)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 ifhw, strlen(ifhw));
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	/* clone block_length after options */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/* More generalized version of rte_vlan_insert() */
+static int
+pcapng_vlan_insert(struct rte_mbuf *m, uint16_t ether_type, uint16_t tci)
+{
+	struct rte_ether_hdr *nh, *oh;
+	struct rte_vlan_hdr *vh;
+
+	if (!RTE_MBUF_DIRECT(m) || rte_mbuf_refcnt_read(m) > 1)
+		return -EINVAL;
+
+	if (rte_pktmbuf_data_len(m) < sizeof(*oh))
+		return -EINVAL;
+
+	oh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
+	nh = (struct rte_ether_hdr *)
+		rte_pktmbuf_prepend(m, sizeof(struct rte_vlan_hdr));
+	if (nh == NULL)
+		return -ENOSPC;
+
+	memmove(nh, oh, 2 * RTE_ETHER_ADDR_LEN);
+	nh->ether_type = rte_cpu_to_be_16(ether_type);
+
+	vh = (struct rte_vlan_hdr *) (nh + 1);
+	vh->vlan_tci = rte_cpu_to_be_16(tci);
+
+	return 0;
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* Expand any offloaded VLAN information */
+	if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+	     (md->ol_flags & PKT_RX_VLAN_STRIPPED)) ||
+	    (direction == RTE_PCAPNG_DIRECTION_OUT &&
+	     (md->ol_flags & PKT_TX_VLAN))) {
+		if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_VLAN,
+				       md->vlan_tci) != 0)
+			goto fail;
+	}
+
+	if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+	     (md->ol_flags & PKT_RX_QINQ_STRIPPED)) ||
+	    (direction == RTE_PCAPNG_DIRECTION_OUT &&
+	     (md->ol_flags & PKT_TX_QINQ))) {
+		if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_QINQ,
+				       md->vlan_tci_outer) != 0)
+			goto fail;
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Note: END_OPT is not written here; Wireshark doesn't do it either. */
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that this is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..31d2f0210f3f
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,196 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for pcapng_write
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ *
+ * @warning
+ * Do not pass original mbufs from transmit or receive
+ * or file will be invalid pcapng format.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For statistics, use 0 if don't know or care to report it.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2


* [dpdk-dev] [PATCH v13 03/12] bpf: allow self-xor operation
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 01/12] lib: pdump is not supported on Windows Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-10-15 18:28   ` Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

Some BPF programs XOR a register with itself as a way to zero
the register in one instruction.
The BPF filter converter generates this in the prolog
to the generated code.

The BPF validator would not allow this because the value of the
register was undefined. But after this operation it is always zero.
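
For readers who want to see what the validator has to recognize, here is a
minimal standalone sketch of the self-xor test. It does not use the DPDK
headers: struct demo_insn and the DEMO_* macros are hypothetical stand-ins
whose values mirror the usual BPF opcode encoding, and demo_is_self_xor()
is an illustration, not the code in bpf_validate.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode field values mirroring the BPF encoding (names are local). */
#define DEMO_BPF_ALU	0x04
#define DEMO_EBPF_ALU64	0x07
#define DEMO_BPF_X	0x08
#define DEMO_BPF_XOR	0xa0

#define DEMO_CLASS(code)	((code) & 0x07)
#define DEMO_OP(code)		((code) & 0xf0)
#define DEMO_SRC(code)		((code) & 0x08)

/* Hypothetical stand-in for struct ebpf_insn, for illustration only. */
struct demo_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* True when the instruction is "xor rN, rN", whose result is always zero. */
static bool
demo_is_self_xor(const struct demo_insn *ins)
{
	uint8_t cls;

	assert(ins != NULL);
	cls = DEMO_CLASS(ins->code);
	if (cls != DEMO_BPF_ALU && cls != DEMO_EBPF_ALU64)
		return false;

	/* Register source (not immediate) and both operands identical. */
	return DEMO_OP(ins->code) == DEMO_BPF_XOR &&
	       DEMO_SRC(ins->code) == DEMO_BPF_X &&
	       ins->src_reg == ins->dst_reg;
}
```

Because the result does not depend on the prior contents of the register,
a validator can treat both operands as the known constant zero before it
checks for undefined inputs, which is the approach taken in this patch.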

Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..853279fee557 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,15 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X &&
+	    ins->src_reg == ins->dst_reg) {
+		eval_fill_imm(&rs, UINT64_MAX, 0);
+		eval_fill_imm(rd, UINT64_MAX, 0);
+	}
+
 	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+			   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2


* [dpdk-dev] [PATCH v13 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-10-15 18:28   ` Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Ray Kinsella

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.
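
Since one classic instruction can expand into several eBPF instructions,
jump offsets have to be recomputed during conversion. The sketch below
shows that arithmetic in isolation, assuming a table addrs[] that records
where the expansion of each classic instruction starts. The function name
demo_remap_jump() and its parameters are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * addrs[i] is the index of the first eBPF instruction emitted for
 * classic instruction i. "skip" is the classic jump offset (jt, jf or k).
 * "pos" is the position of the jump inside the expansion of instruction i
 * (0 when the jump is the first instruction emitted for it).
 */
static bool
demo_remap_jump(const int *addrs, size_t len, size_t i, size_t skip,
		int pos, int *off)
{
	size_t target = i + skip + 1;	/* classic offsets count from i + 1 */

	assert(addrs != NULL && off != NULL);
	assert(i < len);
	if (target >= len)
		return false;		/* jump past the end of the program */

	/* eBPF offsets are relative to the instruction after the jump. */
	*off = addrs[target] - addrs[i] - 1 - pos;
	return true;
}
```

For example, with a three instruction prologue and addrs = { 3, 4, 5, 7 },
a conditional jump at classic index 1 that skips one instruction lands on
classic index 3, so the new offset is 7 - 4 - 1 = 2.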

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The code to convert was originally done as part of the Linux
kernel implementation then converted to a userspace program.
See https://github.com/tklauser/filter2xdp

Both authors have agreed that it is allowable to create a modified
version of this code and license it with BSD license used by DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_convert.c | 575 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 611 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..db84add7dcce
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,575 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* Linux has special negative offsets to access meta-data. */
+		RTE_BPF_LOG(ERR,
+			    "rte_bpf_convert: socket offset %d not supported\n",
+			    fp->k - SKF_AD_OFF);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the table of new instruction offsets */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a Classic BPF program from libpcap into DPDK BPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @param prm
+ *  Result Extended BPF program.
+ * @return
+ *   Pointer to BPF program (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2


* [dpdk-dev] [PATCH v13 05/12] bpf: add function to dump eBPF instructions
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-10-15 18:28   ` Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 06/12] pdump: support pcapng and filtering Stephen Hemminger
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Ray Kinsella

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.
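
As a rough idea of the output style, the standalone sketch below formats a
single ALU instruction the way the dump prints it: the mnemonic, a "32"
suffix for 32-bit operations, then either two registers or a register and
an immediate. The helper demo_format_alu() and the DEMO_* names are
hypothetical and exist only for this example; the opcode values follow the
usual BPF encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_BPF_ALU	0x04
#define DEMO_EBPF_ALU64	0x07
#define DEMO_BPF_X	0x08

/* Mnemonics indexed by the upper four bits of the opcode. */
static const char *const demo_alu_ops[16] = {
	[0x0] = "add",  [0x1] = "sub", [0x2] = "mul", [0x3] = "div",
	[0x4] = "or",   [0x5] = "and", [0x6] = "lsh", [0x7] = "rsh",
	[0x8] = "neg",  [0x9] = "mod", [0xa] = "xor", [0xb] = "mov",
	[0xc] = "arsh", [0xd] = "endian",
};

/* Returns the formatted length, or -1 if this is not a known ALU opcode. */
static int
demo_format_alu(char *buf, size_t len, uint8_t code,
		unsigned int dst, unsigned int src, int32_t imm)
{
	uint8_t cls = code & 0x07;
	const char *op = demo_alu_ops[(code & 0xf0) >> 4];
	const char *postfix;

	assert(buf != NULL);
	if (op == NULL || (cls != DEMO_BPF_ALU && cls != DEMO_EBPF_ALU64))
		return -1;

	/* 32-bit ALU operations get a "32" suffix, 64-bit ones none. */
	postfix = (cls == DEMO_BPF_ALU) ? "32" : "";
	if ((code & 0x08) == DEMO_BPF_X)
		return snprintf(buf, len, "%s%s r%u, r%u",
				op, postfix, dst, src);

	return snprintf(buf, len, "%s%s r%u, #0x%x",
			op, postfix, dst, (unsigned int)imm);
}
```

With this helper, opcode 0xaf (64-bit xor, register source) on r0 formats
as "xor r0, r0", and opcode 0x54 (32-bit and, immediate 0xf) on r0 formats
as "and32 r0, #0xf".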

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_dump.c  | 139 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 +++++
 lib/bpf/version.map |   1 +
 4 files changed, 155 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..b86977b96d08
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <inttypes.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i) {
+		const struct ebpf_insn *ins = buf + i;
+		uint8_t cls = BPF_CLASS(ins->code);
+		const char *op, *postfix = "";
+
+		fprintf(f, " L%u:\t", i);
+
+		switch (cls) {
+		default:
+			fprintf(f, "unimp 0x%x // class: %s\n",
+				ins->code, class_tbl[cls]);
+			break;
+		case BPF_ALU:
+			postfix = "32";
+			/* fall through */
+		case EBPF_ALU64:
+			op = alu_op_tbl[BPF_OP_INDEX(ins->code)];
+			if (BPF_SRC(ins->code) == BPF_X)
+				fprintf(f, "%s%s r%u, r%u\n", op, postfix, ins->dst_reg,
+					ins->src_reg);
+			else
+				fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			break;
+		case BPF_LD:
+			op = "ld";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (ins->code == (BPF_LD | BPF_IMM | EBPF_DW)) {
+				uint64_t val;
+
+				val = (uint32_t)ins[0].imm |
+					(uint64_t)(uint32_t)ins[1].imm << 32;
+				fprintf(f, "%s%s r%d, #0x%"PRIx64"\n",
+					op, postfix, ins->dst_reg, val);
+				i++;
+			} else if (BPF_MODE(ins->code) == BPF_IMM)
+				fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_ABS)
+				fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_IND)
+				fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+					ins->dst_reg, ins->src_reg, ins->imm);
+			else
+				fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_LDX:
+			op = "ldx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, ins->dst_reg,
+				ins->src_reg, ins->off);
+			break;
+		case BPF_ST:
+			op = "st";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (BPF_MODE(ins->code) == BPF_MEM)
+				fprintf(f, "%s%s [r%d + %d], #0x%x\n", op, postfix,
+					ins->dst_reg, ins->off, ins->imm);
+			else
+				fprintf(f, "// BUG: ST opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_STX:
+			op = "stx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s [r%d + %d], r%u\n", op, postfix,
+				ins->dst_reg, ins->off, ins->src_reg);
+			break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+		case BPF_JMP:
+			op = jump_tbl[BPF_OP_INDEX(ins->code)];
+			if (op == NULL)
+				fprintf(f, "invalid jump opcode: %#x\n", ins->code);
+			else if (BPF_OP(ins->code) == BPF_JA)
+				fprintf(f, "%s L%d\n", op, L(i, ins->off));
+			else if (BPF_OP(ins->code) == EBPF_EXIT)
+				fprintf(f, "%s\n", op);
+			else
+				fprintf(f, "%s r%u, #0x%x, L%d\n", op, ins->dst_reg,
+					ins->imm, L(i, ins->off));
+			break;
+		case BPF_RET:
+			fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+				ins->code);
+			break;
+		}
+	}
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+	'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v13 06/12] pdump: support pcapng and filtering
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-10-15 18:28   ` Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan, Ray Kinsella, Anatoly Burakov

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair a
mismatched primary and secondary process fails.
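
As a rough sketch of the version handling described above (the names
below are illustrative stand-ins, not the library's own symbols, and
the flag values are assumed to match the header in this patch): the
client picks the protocol version from the pcapng flag, and the server
rejects any version it does not recognize.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative values assumed to mirror rte_pdump.h in this patch. */
enum { SK_FLAG_RX = 1, SK_FLAG_TX = 2, SK_FLAG_RXTX = 3, SK_FLAG_PCAPNG = 4 };
enum sk_version { SK_V1 = 1, SK_V2 = 2 };

/* Client side: the pcapng flag selects the newer protocol version. */
static uint16_t
sk_request_version(uint32_t flags)
{
	return (flags & SK_FLAG_PCAPNG) ? SK_V2 : SK_V1;
}

/* Server side: refuse requests carrying an unknown version number. */
static int
sk_check_version(uint16_t ver)
{
	return (ver == SK_V1 || ver == SK_V2) ? 0 : -EINVAL;
}
```

A request built by a process using some other version number would be
refused with -EINVAL instead of being misinterpreted.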

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).
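
The accounting can be pictured with the following standalone sketch.
It is a simplification under stated assumptions: the struct and
function names are hypothetical, the copy results and free ring slots
are passed in as plain inputs, and the relaxed atomic updates used by
the real code are replaced with ordinary additions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-queue counters. */
struct sk_stats {
	uint64_t accepted;	/* copies made after the filter matched */
	uint64_t filtered;	/* packets the filter rejected */
	uint64_t nombuf;	/* packets that could not be copied */
	uint64_t ringfull;	/* copies dropped because the ring was full */
};

/*
 * rcs:       filter result per packet, zero means "no match";
 *            NULL means no filter is installed.
 * copy_ok:   nonzero if duplicating packet i succeeded (assumed input).
 * ring_free: number of free slots in the ring (assumed input).
 */
static void
sk_account(const uint64_t *rcs, const int *copy_ok, size_t n,
	   size_t ring_free, struct sk_stats *st)
{
	size_t i, copied = 0;

	for (i = 0; i < n; i++) {
		if (rcs != NULL && rcs[i] == 0) {
			st->filtered++;
			continue;
		}
		if (!copy_ok[i]) {
			st->nombuf++;
			continue;
		}
		copied++;
	}

	/* Every successful copy counts as accepted, even if dropped later. */
	st->accepted += copied;
	if (copied > ring_free)
		st->ringfull += copied - ring_free;
}
```

Note that in this scheme a packet dropped at the ring is counted both
as accepted and as ringfull, which matches the order of the updates in
pdump_copy() below.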

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 432 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 113 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 433 insertions(+), 126 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index 15150efa19a7..c71c6917dbb7 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -27,6 +27,7 @@ libraries = [
         'acl',
         'bbdev',
         'bitratestats',
+        'bpf',
         'cfgfile',
         'compressdev',
         'cryptodev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..2636a216994b 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,23 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/* Internal version number in request */
 enum pdump_version {
-	V1 = 1
+	V1 = 1,		    /* no filtering or snap */
+	V2 = 2,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
 	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	char device[RTE_DEV_NAME_MAX_LEN];
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
 };
 
 struct pdump_response {
@@ -63,80 +58,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
 
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+/*
+ * The packet capture statistics keep track of packets
+ * accepted, filtered and dropped. These are per-queue
+ * and kept in memory shared between primary and secondary processes.
+ */
+static const char MZ_RTE_PDUMP_STATS[] = "rte_pdump_stats";
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter)
+		rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts);
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * This uses the same BPF return value convention as socket
+		 * filters and pcap_offline_filter: if the program returns zero,
+		 * the packet does not match the filter and is ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng, wrap the packet in pcapng format;
+		 * otherwise make a simple copy.
+		 */
+		if (cbs->ver == V2)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +200,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +224,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +258,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +287,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	/* Check for possible DPDK version mismatch */
+	if (!(p->ver == V1 || p->ver == V2)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +365,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +375,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -332,7 +403,7 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 		resp->err_value = set_pdump_rxtx_cbs(cli_req);
 	}
 
-	strlcpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_resp.len_param = sizeof(*resp);
 	mp_resp.num_fds = 0;
 	if (rte_mp_reply(&mp_resp, peer) < 0) {
@@ -347,8 +418,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +473,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +514,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,26 +528,22 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+
+	req->ver = (flags & RTE_PDUMP_FLAG_PCAPNG) ? V2 : V1;
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	rte_strscpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
-	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_req.len_param = sizeof(*req);
 	mp_req.num_fds = 0;
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
@@ -477,11 +561,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API cannot depend on the application passing a valid
+ * (non-bogus) value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +586,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +630,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +669,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +685,68 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			/* rte_pdump_init was not called */
+			PDUMP_LOG(ERR, "pdump stats not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/* secondary process looks up the memzone */
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			/* rte_pdump_init was not called in primary process?? */
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..6efa0274f2ce 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,38 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum over both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Retrieve the packet capture statistics for a port (summed over its queues).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v13 07/12] app/dumpcap: add new packet capture application
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-10-15 18:28   ` Stephen Hemminger
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 08/12] test: add test for bpf_convert Stephen Hemminger
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

This is a new packet capture application to replace the existing pdump.
The new application works like the Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features such as filtering are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 844 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 861 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..baf9eee46666
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,844 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+/* Can do either pcap or pcapng format output */
+typedef union {
+	rte_pcapng_t  *pcapng;
+	pcap_dumper_t *dumper;
+} dumpcap_out_t;
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = calloc(1, sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	intf->port = port;
+	rte_strscpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port; this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* "capture-comment" in long_options */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current CLOCK_MONOTONIC time in nanoseconds */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will dumpcap. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to enable monitor: %d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Failed to disable monitor: %d\n", ret);
+}
+
+static void
+report_packet_stats(dumpcap_out_t out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out.pcapng, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	static const char * const args[] = {
+		"dumpcap", "--proc-type", "secondary",
+		"--log-level", "notice"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	eal_argv[0] = strdup(progname);
+	for (i = 1; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+/*
+ * Get Operating System information.
+ * Returns a string allocated via malloc().
+ */
+static char *get_os_info(void)
+{
+	struct utsname uts;
+	char *osname = NULL;
+
+	if (uname(&uts) < 0)
+		return NULL;
+
+	if (asprintf(&osname, "%s %s",
+		     uts.sysname, uts.release) == -1)
+		return NULL;
+
+	return osname;
+}
+
+static dumpcap_out_t create_output(void)
+{
+	dumpcap_out_t ret;
+	static char tmp_path[PATH_MAX];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		char *os = get_os_info();
+
+		ret.pcapng = rte_pcapng_fdopen(fd, os, NULL,
+					   version(), capture_comment);
+		if (ret.pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		free(os);
+	} else {
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		ret.dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (ret.dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+	}
+
+	return ret;
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(-ret));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.len;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out.pcapng, pkts, n);
+	else
+		written = pcap_write_packets(out.dumper, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	dumpcap_out_t out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out.pcapng);
+	else
+		pcap_dump_close(out.dumper);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_prm);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v13 08/12] test: add test for bpf_convert
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-10-15 18:28   ` Stephen Hemminger
  2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 09/12] test: add a test for pcapng library Stephen Hemminger
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:28 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

Add some functional tests for the Classic BPF to DPDK BPF converter.
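
As background for the tests, here is a small standalone sketch of what
a classic BPF capture filter does before conversion. It hand-encodes
the usual four-instruction program for the filter "ip" on Ethernet and
runs it with a toy interpreter that understands only the three opcodes
that program needs. The names cbpf_insn, cbpf_run and the OP_* macros
are illustrative assumptions and are not part of libpcap or DPDK; the
point is only the convention the tests rely on: a non-zero return
value means the packet matched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as a classic BPF instruction; the name is illustrative. */
struct cbpf_insn {
	uint16_t code;
	uint8_t jt;	/* jump offset if comparison is true */
	uint8_t jf;	/* jump offset if comparison is false */
	uint32_t k;
};

/* The three classic BPF opcodes used by the "ip" filter. */
#define OP_LDH_ABS 0x28	/* A = 16-bit big-endian load from pkt[k] */
#define OP_JEQ_K   0x15	/* pc += (A == k) ? jt : jf */
#define OP_RET_K   0x06	/* return k */

/* Classic BPF for "ip" on Ethernet with a 262144 byte snapshot length. */
static const struct cbpf_insn ip_filter[] = {
	{ OP_LDH_ABS, 0, 0, 12 },	/* load EtherType */
	{ OP_JEQ_K,   0, 1, 0x0800 },	/* IPv4? */
	{ OP_RET_K,   0, 0, 262144 },	/* match: accept up to snaplen */
	{ OP_RET_K,   0, 0, 0 },	/* no match */
};

/* Run the program on a packet; returns 0 if the packet is rejected. */
static uint32_t cbpf_run(const struct cbpf_insn *prog, size_t len,
			 const uint8_t *pkt, size_t pkt_len)
{
	uint32_t a = 0;
	size_t pc = 0;

	assert(prog != NULL && pkt != NULL);

	while (pc < len) {
		const struct cbpf_insn *in = &prog[pc++];

		switch (in->code) {
		case OP_LDH_ABS:
			if ((size_t)in->k + 2 > pkt_len)
				return 0;	/* out of bounds: reject */
			a = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
			break;
		case OP_JEQ_K:
			pc += (a == in->k) ? in->jt : in->jf;
			break;
		case OP_RET_K:
			return in->k;
		default:
			return 0;	/* opcode outside this sketch */
		}
	}
	return 0;
}
```

Feeding cbpf_run() a 14 byte Ethernet header whose EtherType is 0x0800
yields a non-zero result, while an ARP header (0x0806) yields 0.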

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 200 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 8118a1849ba0..d3d385ce1eff 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3233,3 +3234,202 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str,
+	       struct rte_mbuf *mb)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	rc = rte_bpf_exec(bpf, mb);
+	/* The return code from bpf capture filter is non-zero if matched */
+	ret = (rc == 0);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	const uint32_t plen = 100;
+	struct rte_mbuf mb, *m;
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+	} *hdr;
+
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	m = &mb;
+
+	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_RAW,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+
+	if (test_bpf_match(pcap, "ip", m) != 0) {
+		printf("%s@%d: filter \"ip\" doesn't match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+	if (test_bpf_match(pcap, "not ip", m) == 0) {
+		printf("%s@%d: filter \"not ip\" does match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Some sample pcap filter strings from
+ * https://wiki.wireshark.org/CaptureFilters
+ */
+static const char * const sample_filters[] = {
+	"host 172.18.5.4",
+	"net 192.168.0.0/24",
+	"src net 192.168.0.0/24",
+	"src net 192.168.0.0 mask 255.255.255.0",
+	"dst net 192.168.0.0/24",
+	"dst net 192.168.0.0 mask 255.255.255.0",
+	"port 53",
+	"host dpdk.org and not (port 80 or port 25)",
+	"host dpdk.org and not port 80 and not port 25",
+	"port not 53 and not arp",
+	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
+	"ether proto 0x888e",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"not ether dst 01:80:c2:00:00:0e",
+	"not broadcast and not multicast",
+	"dst host ff02::1",
+	"port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420",
+	/* Worms */
+	"dst port 135 and tcp port 135 and ip[2:2]==48",
+	"icmp[icmptype]==icmp-echo and ip[2:2]==92 and icmp[8:4]==0xAAAAAAAA",
+	"dst port 135 or dst port 445 or dst port 1433"
+	" and tcp[tcpflags] & (tcp-syn) != 0"
+	" and tcp[tcpflags] & (tcp-ack) = 0 and src net 192.168.0.0/24",
+	"tcp src port 443 and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4] = 0x18)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 1] = 0x03)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 2] < 0x04)"
+	" and ((ip[2:2] - 4 * (ip[0] & 0x0F) - 4 * ((tcp[12] & 0xF0) >> 4) > 69))",
+	/* Other */
+	"len = 128",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else {
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++)
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v13 09/12] test: add a test for pcapng library
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 08/12] test: add test for bpf_convert Stephen Hemminger
@ 2021-10-15 18:29   ` Stephen Hemminger
  2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 10/12] test: enable bpf autotest Stephen Hemminger
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:29 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

Simple unit test that creates a pcapng file using the API.

To run this test you need to have at least one device.
For example:

DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
    --no-huge -m 2048 --vdev=net_tap,iface=dummy
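
For orientation, every pcapng file starts with a Section Header Block.
The sketch below builds the smallest valid one (28 bytes, no options)
in a caller-supplied buffer using plain C. It only illustrates the
on-disk layout and makes no claim about how the pcapng library writes
it; put(), pcapng_min_shb() and the PCAPNG_* macros are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAPNG_SHB_TYPE   0x0A0D0D0Au	/* Section Header Block type */
#define PCAPNG_BYTE_MAGIC 0x1A2B3C4Du	/* lets readers detect endianness */
#define PCAPNG_SHB_MINLEN 28u		/* block length without options */

/* Append n bytes to buf at *off, in host byte order. */
static void put(uint8_t *buf, size_t *off, const void *v, size_t n)
{
	memcpy(buf + *off, v, n);
	*off += n;
}

/*
 * Write a minimal Section Header Block into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t pcapng_min_shb(uint8_t *buf, size_t buf_len)
{
	const uint32_t type = PCAPNG_SHB_TYPE;
	const uint32_t len = PCAPNG_SHB_MINLEN;
	const uint32_t magic = PCAPNG_BYTE_MAGIC;
	const uint16_t major = 1, minor = 0;
	const uint64_t section_len = UINT64_MAX; /* -1: length unspecified */
	size_t off = 0;

	assert(buf != NULL);
	if (buf_len < PCAPNG_SHB_MINLEN)
		return 0;

	put(buf, &off, &type, sizeof(type));
	put(buf, &off, &len, sizeof(len));
	put(buf, &off, &magic, sizeof(magic));
	put(buf, &off, &major, sizeof(major));
	put(buf, &off, &minor, sizeof(minor));
	put(buf, &off, &section_len, sizeof(section_len));
	put(buf, &off, &len, sizeof(len));	/* trailing total length */

	return off;
}
```

The block length appears both after the type and at the end of the
block, which is what allows readers to walk a pcapng file backwards as
well as forwards.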

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/meson.build   |   4 +
 app/test/test_pcapng.c | 272 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 276 insertions(+)
 create mode 100644 app/test/test_pcapng.c

diff --git a/app/test/meson.build b/app/test/meson.build
index f144d8b8ed68..56863f97acbb 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -396,6 +396,10 @@ if dpdk_conf.has('RTE_NET_RING')
     fast_tests += [['pdump_autotest', true]]
 endif
 
+if dpdk_conf.has('RTE_PORT_PCAP')
+    test_sources += 'test_pcapng.c'
+endif
+
 if dpdk_conf.has('RTE_LIB_POWER')
     test_deps += 'power'
 endif
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
new file mode 100644
index 000000000000..ed1e87f9445d
--- /dev/null
+++ b/app/test/test_pcapng.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_net.h>
+#include <rte_pcapng.h>
+
+#include <pcap/pcap.h>
+
+#include "test.h"
+
+#define NUM_PACKETS    10
+#define DUMMY_MBUF_NUM 3
+
+static rte_pcapng_t *pcapng;
+static struct rte_mempool *mp;
+static const uint32_t pkt_len = 200;
+static uint16_t port_id;
+static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng";
+
+/* first mbuf in the packet, should always be at offset 0 */
+struct dummy_mbuf {
+	struct rte_mbuf mb[DUMMY_MBUF_NUM];
+	uint8_t buf[DUMMY_MBUF_NUM][RTE_MBUF_DEFAULT_BUF_SIZE];
+};
+
+static void
+dummy_mbuf_prep(struct rte_mbuf *mb, uint8_t buf[], uint32_t buf_len,
+	uint32_t data_len)
+{
+	uint32_t i;
+	uint8_t *db;
+
+	mb->buf_addr = buf;
+	mb->buf_iova = (uintptr_t)buf;
+	mb->buf_len = buf_len;
+	rte_mbuf_refcnt_set(mb, 1);
+
+	/* set pool pointer to dummy value, test doesn't use it */
+	mb->pool = (void *)buf;
+
+	rte_pktmbuf_reset(mb);
+	db = (uint8_t *)rte_pktmbuf_append(mb, data_len);
+
+	for (i = 0; i != data_len; i++)
+		db[i] = i;
+}
+
+/* Make an IP packet consisting of a chain of one mbuf */
+static void
+mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth;
+		struct rte_ipv4_hdr ip;
+	} pkt = {
+		.eth = {
+			.dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+			.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+		},
+		.ip = {
+			.version_ihl = RTE_IPV4_VHL_DEF,
+			.total_length = rte_cpu_to_be_16(plen),
+			.time_to_live = IPDEFTTL,
+			.next_proto_id = IPPROTO_RAW,
+			.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+			.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+		}
+	};
+
+	memset(dm, 0, sizeof(*dm));
+	dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen);
+
+	rte_eth_random_addr(pkt.eth.src_addr.addr_bytes);
+	memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen));
+}
+
+static int
+test_setup(void)
+{
+	int tmp_fd;
+
+	port_id = rte_eth_find_next(0);
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		fprintf(stderr, "No valid Ether port\n");
+		return -1;
+	}
+
+	tmp_fd = mkstemps(file_name, strlen(".pcapng"));
+	if (tmp_fd == -1) {
+		perror("mkstemps() failure");
+		return -1;
+	}
+	printf("pcapng: output file %s\n", file_name);
+
+	/* open a test capture file */
+	pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL);
+	if (pcapng == NULL) {
+		fprintf(stderr, "rte_pcapng_fdopen failed\n");
+		close(tmp_fd);
+		return -1;
+	}
+
+
+	/* Make a pool for cloned packets */
+	mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+					    0, 0,
+					    rte_pcapng_mbuf_size(pkt_len),
+					    SOCKET_ID_ANY, "ring_mp_sc");
+	if (mp == NULL) {
+		fprintf(stderr, "Cannot create mempool\n");
+		return -1;
+	}
+	return 0;
+
+}
+
+static int
+test_write_packets(void)
+{
+	struct rte_mbuf *orig;
+	struct rte_mbuf *clones[NUM_PACKETS] = { };
+	struct dummy_mbuf mbfs;
+	unsigned int i;
+	ssize_t len;
+
+	/* make a dummy packet */
+	mbuf1_prepare(&mbfs, pkt_len);
+
+	/* clone them */
+	orig  = &mbfs.mb[0];
+	for (i = 0; i < NUM_PACKETS; i++) {
+		struct rte_mbuf *mc;
+
+		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+				rte_get_tsc_cycles(), 0);
+		if (mc == NULL) {
+			fprintf(stderr, "Cannot copy packet\n");
+			return -1;
+		}
+		clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS);
+
+	rte_pktmbuf_free_bulk(clones, NUM_PACKETS);
+
+	if (len <= 0) {
+		fprintf(stderr, "Write of packets failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+test_write_stats(void)
+{
+	ssize_t len;
+
+	/* write a statistics block */
+	len = rte_pcapng_write_stats(pcapng, port_id,
+				     NULL, 0, 0,
+				     NUM_PACKETS, 0);
+	if (len <= 0) {
+		fprintf(stderr, "Write of statistics failed\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+pkt_print(u_char *user, const struct pcap_pkthdr *h,
+	  const u_char *bytes)
+{
+	unsigned int *countp = (unsigned int *)user;
+	const struct rte_ether_hdr *eh;
+	struct tm *tm;
+	char tbuf[128], src[64], dst[64];
+
+	tm = localtime(&h->ts.tv_sec);
+	if (tm == NULL) {
+		perror("localtime");
+		return;
+	}
+
+	if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) {
+		fprintf(stderr, "strftime returned 0!\n");
+		return;
+	}
+
+	eh = (const struct rte_ether_hdr *)bytes;
+	rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr);
+	rte_ether_format_addr(src, sizeof(src), &eh->src_addr);
+	printf("%s.%06lu: %s -> %s type %x length %u\n",
+	       tbuf, (unsigned long)h->ts.tv_usec,
+	       src, dst, rte_be_to_cpu_16(eh->ether_type), h->len);
+
+	*countp += 1;
+}
+
+/*
+ * Open the resulting pcapng file with libpcap
+ * Would be better to use capinfos from wireshark
+ * but that creates an unwanted dependency.
+ */
+static int
+test_validate(void)
+{
+	char errbuf[PCAP_ERRBUF_SIZE];
+	unsigned int count = 0;
+	pcap_t *pcap;
+	int ret;
+
+	pcap = pcap_open_offline(file_name, errbuf);
+	if (pcap == NULL) {
+		fprintf(stderr, "pcap_open_offline('%s') failed: %s\n",
+			file_name, errbuf);
+		return -1;
+	}
+
+	ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count);
+	if (ret == 0)
+		printf("Saw %u packets\n", count);
+	else
+		fprintf(stderr, "pcap_loop: failed: %s\n",
+			pcap_geterr(pcap));
+	pcap_close(pcap);
+
+	return ret;
+}
+
+static void
+test_cleanup(void)
+{
+	if (mp)
+		rte_mempool_free(mp);
+
+	if (pcapng)
+		rte_pcapng_close(pcapng);
+
+}
+
+static struct
+unit_test_suite test_pcapng_suite  = {
+	.setup = test_setup,
+	.teardown = test_cleanup,
+	.suite_name = "Test Pcapng Unit Test Suite",
+	.unit_test_cases = {
+		TEST_CASE(test_write_packets),
+		TEST_CASE(test_write_stats),
+		TEST_CASE(test_validate),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_pcapng(void)
+{
+	return unit_test_suite_runner(&test_pcapng_suite);
+}
+
+REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng);
-- 
2.30.2



* [dpdk-dev] [PATCH v13 10/12] test: enable bpf autotest
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 09/12] test: add a test for pcapng library Stephen Hemminger
@ 2021-10-15 18:29   ` Stephen Hemminger
  2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
  2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:29 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The BPF autotest is defined but not run automatically.
Since it is short, it should be added to the autotest suite.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index 56863f97acbb..aa298e971401 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -194,6 +194,8 @@ test_deps = [
 fast_tests = [
         ['acl_autotest', true],
         ['atomic_autotest', false],
+        ['bpf_autotest', true],
+        ['bpf_convert_autotest', true],
         ['bitops_autotest', true],
         ['byteorder_autotest', true],
         ['cksum_autotest', true],
-- 
2.30.2



* [dpdk-dev] [PATCH v13 11/12] doc: changes for new pcapng and dumpcap utility
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                     ` (9 preceding siblings ...)
  2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 10/12] test: enable bpf autotest Stephen Hemminger
@ 2021-10-15 18:29   ` Stephen Hemminger
  2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:29 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

Describe the new packet capture library and utility.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 69 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 25 +++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 230 insertions(+), 88 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
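
Note for reviewers: the pdump_lib.rst hunk in this patch says that when
RTE_PDUMP_FLAG_PCAPNG is set, the mbuf data is extended with a header and
trailer to match the Pcapng enhanced packet block. The following is only a
minimal standalone sketch of that framing, not the library code. It is plain
C with no DPDK calls; the names epb_frame, pad4 and put_u32 are hypothetical,
the options area (where metadata such as the queue could be carried) is left
out, and fields are written in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EPB_BLOCK_TYPE 6u   /* Enhanced Packet Block type in the pcapng format */
#define EPB_FIXED_LEN  32u  /* 28-byte header plus 4-byte trailing length */

/* Round a length up to the next 32-bit boundary. */
static uint32_t pad4(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* Store a 32-bit value in host byte order (hypothetical helper). */
static void put_u32(uint8_t *p, uint32_t v)
{
	memcpy(p, &v, sizeof(v));
}

/*
 * Wrap cap_len bytes of packet data in an enhanced packet block.
 * Returns the total block length, or 0 if the output buffer is too small.
 */
static uint32_t
epb_frame(uint8_t *out, size_t out_len, uint32_t if_id, uint64_t ts,
	  const uint8_t *pkt, uint32_t cap_len, uint32_t orig_len)
{
	uint32_t padded, total;

	assert(out != NULL && pkt != NULL);

	/* Guard against overflow when padding and adding the fixed part. */
	if (cap_len > UINT32_MAX - EPB_FIXED_LEN - 3u)
		return 0;

	padded = pad4(cap_len);
	total = EPB_FIXED_LEN + padded;
	if (out_len < total)
		return 0;

	put_u32(out + 0, EPB_BLOCK_TYPE);
	put_u32(out + 4, total);                 /* leading block total length */
	put_u32(out + 8, if_id);                 /* interface (port) index */
	put_u32(out + 12, (uint32_t)(ts >> 32)); /* timestamp, upper 32 bits */
	put_u32(out + 16, (uint32_t)ts);         /* timestamp, lower 32 bits */
	put_u32(out + 20, cap_len);              /* captured length */
	put_u32(out + 24, orig_len);             /* original length on the wire */

	memcpy(out + 28, pkt, cap_len);
	memset(out + 28 + cap_len, 0, padded - cap_len); /* zero the padding */

	put_u32(out + 28 + padded, total);       /* trailing block total length */
	return total;
}
```

The block length is stored both before and after the data so that a reader
can walk a capture file in either direction, which is why the framing needs
a trailer as well as a header.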

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..f933cc7e9311 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
-    Copyright(c) 2017 Intel Corporation.
+    Copyright(c) 2017-2021 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The DPDK packet capture framework consists of two
+libraries: ``librte_pdump`` for collecting packets and ``librte_pcapng``
+for writing packets to a file. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` is a tool that captures packets
+like Wireshark dumpcap does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the ``librte_pdump``
+  library. The ``dpdk-pdump`` tool provides more limited functionality
+  and depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch the ``dpdk-dumpcap`` tool as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..7b2d82d7bd3b
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,25 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2016 Intel Corporation.
+
+.. _pcapng_library:
+
+Packet Capture File Writer
+==========================
+
+Pcapng is a library for creating files in Pcapng file format.
+The Pcapng file format is the default capture file format for modern
+network capture processing tools. It can be read by Wireshark and tcpdump.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+
+References
+----------
+* Project repository: https://github.com/pcapng/pcapng/
+
+* PCAP Next Generation (pcapng) Capture File Format:
+  https://pcapng.github.io/pcapng/draft-tuexen-opsawg-pcapng.html
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..d04d9709e364 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+The Packet Capture Library
+==========================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 4c56cdfeaaa2..0909f4258cf8 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -159,6 +159,16 @@ New Features
   * Added tests to verify tunnel header verification in IPsec inbound.
   * Added tests to verify inner checksum.
 
+* **Revised packet capture framework.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility including: capture of multiple interfaces,
+    filtering, and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancements to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Fixes packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool. The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes capture files in the
+Pcapng packet format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In DPDK, only ``testpmd`` is modified to initialize the packet capture
+        framework; other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to ``app/test-pmd/testpmd.c``
+        to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in the style of *tshark*, use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it is a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options: ``-I|--monitor-mode`` and ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v13 12/12] MAINTAINERS: add entry for new packet capture features
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
                     ` (10 preceding siblings ...)
  2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
@ 2021-10-15 18:29   ` Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 18:29 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Thomas Monjalon

Since the packet capture is just an extension of the existing pdump,
add myself as maintainer of that.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ed8becce85cd..6d95d151ba4a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1425,12 +1425,17 @@ F: doc/guides/sample_app_ug/qos_scheduler.rst
 
 Packet capture
 M: Reshma Pattan <reshma.pattan@intel.com>
+M: Stephen Hemminger <stephen@networkplumber.org>
 F: lib/pdump/
+F: lib/pcapng/
 F: doc/guides/prog_guide/pdump_lib.rst
-F: app/test/test_pdump.*
-F: app/pdump/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: doc/guides/tools/dumpcap.rst
 F: doc/guides/tools/pdump.rst
-
+F: app/test/test_pdump.c
+F: app/test/test_pcapng.c
+F: app/pdump/
+F: app/dumpcap/
 
 Packet Framework
 ----------------
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v14 00/12] Packet capture framework update
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (16 preceding siblings ...)
  2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
@ 2021-10-15 20:11 ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 01/12] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (11 more replies)
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
  18 siblings, 12 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility:
  - are faster: they avoid lots of extra I/O, do bursting, etc.
  - give more information (multiple ports, queues, etc.)
  - have a better user interface (same as Wireshark dumpcap)
  - fix structural problems with VLANs and timestamps

There are no blocker items.
The following are worth noting:
  * bogus checkpatch warnings
        - the correct flag to open is O_CREAT
        - intentionally keeping macro with goto since that
          was in original code and is clearer
        - the tempfile name cannot be const since it is
          overwritten by tmpfile() call

  * enabling BPF tests causes CI to see a pre-existing bug
    https://bugs.dpdk.org/show_bug.cgi?id=811

  * future filtering for stripped VLAN tags needs collaboration
    with libpcap project to fix pcap_compile_filter().
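
For context on the stripped VLAN tag issue: when the NIC strips the tag,
the capture path has to put it back into the frame before writing it out.
The following is a minimal standalone sketch of that idea on a flat byte
buffer, not the code in this series. The name vlan_reinsert and its
parameters are hypothetical, and unlike the mbuf-based code (which prepends
into headroom and moves the MAC addresses), this sketch shifts the payload
right to open the gap.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12  /* destination + source MAC */
#define VLAN_TAG_LEN  4   /* TPID + TCI */

/*
 * Hypothetical helper: re-insert an 802.1Q tag into an untagged
 * Ethernet frame. 'frame' holds 'len' valid bytes and has 'cap'
 * bytes of capacity.
 * Returns the new length, or 0 if the frame is too short or the
 * tag does not fit.
 */
static size_t vlan_reinsert(uint8_t *frame, size_t len, size_t cap,
			    uint16_t tpid, uint16_t tci)
{
	if (len < ETH_ADDRS_LEN + 2 || len + VLAN_TAG_LEN > cap)
		return 0;

	/* Shift EtherType and payload right to open a 4-byte gap. */
	memmove(frame + ETH_ADDRS_LEN + VLAN_TAG_LEN,
		frame + ETH_ADDRS_LEN, len - ETH_ADDRS_LEN);

	/* Tag fields are big-endian on the wire. */
	frame[ETH_ADDRS_LEN + 0] = (uint8_t)(tpid >> 8);
	frame[ETH_ADDRS_LEN + 1] = (uint8_t)(tpid & 0xff);
	frame[ETH_ADDRS_LEN + 2] = (uint8_t)(tci >> 8);
	frame[ETH_ADDRS_LEN + 3] = (uint8_t)(tci & 0xff);

	return len + VLAN_TAG_LEN;
}
```

Calling it a second time with the QinQ TPID and the outer TCI would stack
the outer tag in front of the inner one, which mirrors the order used for
QinQ offload in this series.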

v14
  - fix checkpatch whitespace warning
  - enhance the pcapng prog guide documentation

v13
  - integrate feedback in documentation and pcapng library
  - rebase to align with rte_ether_addr changes

v12
  - fixes for capture offloaded VLAN tags.
    look at direction flag and handle QinQ offload.

v11
  - address review comments for pdump (patch 6)

v10:
  - fix to rte_bpf_dump to handle more instructions
    make sure all bpf_test cases are decoded

v9:
  - incorporate suggested change to BPF XOR
  - make autotest for pcapng more complete by reading the
    resulting file with libpcap

v8:
  - enable BPF tests in autotest
  - add more BPF test strings
  - use rte_strscpy to satisfy checkpatch
  - merge MAINTAINERS (put this in with existing pdump)

v7:
  - add functional tests for pcapng lib
  - bug fix for error returns in pcapng lib
  - handle long osname on FreeBSD
  - resolve almost all checkpatch issues

v5:
  - minor build and checkpatch fixes for RHEL/FreeBSD
  - disable lib/pdump on Windows. It was not useful before
    and now pdump depends on bpf.

v4:
  - minor checkpatch fixes.
    Note: some of the checkpatch warnings are bogus and won't be fixed.
  - fix build of dumpcap on FreeBSD

v3:
  - introduce packet filters using classic BPF to eBPF converter
    required small fix to DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup

v2:
   fix formatting of packet blocks
   fix the new packet capture statistics
   fix crash when primary process exits
   record start/end time
   various whitespace/checkpatch warnings

Stephen Hemminger (12):
  lib: pdump is not supported on Windows
  librte_pcapng: add new library for writing pcapng files
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  test: add test for bpf_convert
  test: add a test for pcapng library
  test: enable bpf autotest
  doc: changes for new pcapng and dumpcap utility
  MAINTAINERS: add entry for new packet capture features

 MAINTAINERS                                   |  11 +-
 app/dumpcap/main.c                            | 844 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/meson.build                          |   6 +
 app/test/test_bpf.c                           | 200 +++++
 app/test/test_pcapng.c                        | 272 ++++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  69 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  46 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 139 +++
 lib/bpf/bpf_validate.c                        |   9 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 607 +++++++++++++
 lib/pcapng/rte_pcapng.h                       | 195 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 432 ++++++---
 lib/pdump/rte_pdump.h                         | 113 ++-
 lib/pdump/version.map                         |   8 +
 33 files changed, 3757 insertions(+), 219 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 app/test/test_pcapng.c
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v14 01/12] lib: pdump is not supported on Windows
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.

The new version of pdump with filtering now has a dependency
on bpf, but the bpf library is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index b2ba7258d8ba..ef5ff522aeaf 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -85,7 +85,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
             'stack',
             'security',
     ] # only supported libraries for windows
-- 
2.30.2


^ permalink raw reply	[flat|nested] 220+ messages in thread

* [dpdk-dev] [PATCH v14 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 01/12] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-19 10:24     ` Pattan, Reshma
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 03/12] bpf: allow self-xor operation Stephen Hemminger
                     ` (9 subsequent siblings)
  11 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan, Ray Kinsella

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See
  https://github.com/pcapng/pcapng/
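
For readers unfamiliar with the format: pcapng blocks carry optional TLV
options, each a 16-bit code and a 16-bit length followed by the value,
padded to a 32-bit boundary. The following is a minimal standalone sketch
of that layout, not the library code. The names opt_hdr, opt_size and
opt_put are hypothetical and used only for illustration; fields are
written in host byte order, which readers detect from the byte-order
magic in the section header.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not the library's API. */
struct opt_hdr {
	uint16_t code;
	uint16_t length;	/* length of the value, excluding padding */
};

/* Size of one option in the file: header plus value, rounded up to 32 bits. */
static size_t opt_size(size_t value_len)
{
	return (sizeof(struct opt_hdr) + value_len + 3u) & ~(size_t)3u;
}

/*
 * Append one option at buf + off and return the offset of the next one.
 * The caller must supply a zero-filled buffer that is large enough,
 * so that the padding bytes end up as zero.
 */
static size_t opt_put(uint8_t *buf, size_t off, uint16_t code,
		      const void *value, uint16_t value_len)
{
	struct opt_hdr h = { .code = code, .length = value_len };

	memcpy(buf + off, &h, sizeof(h));
	/* Skip the copy for empty options such as end-of-options. */
	if (value_len > 0)
		memcpy(buf + off + sizeof(h), value, value_len);
	return off + opt_size(value_len);
}
```

With a zero-filled buffer, a 5-byte comment (code 1) occupies
opt_size(5) = 12 bytes and the end-of-options marker (code 0) another 4,
so the offset after writing both is 16.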

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 ++++++++
 lib/pcapng/rte_pcapng.c   | 607 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 195 ++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 952 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map

diff --git a/lib/meson.build b/lib/meson.build
index ef5ff522aeaf..15150efa19a7 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..3a399de3d037
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,607 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	/* reserve space for OPT_END */
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+
+	/* After the section header insert variable length options. */
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	/* The standard requires last option to be OPT_END */
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface block for a DPDK port */
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct pcapng_interface_block *hdr;
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr *ea, macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len;
+	void *buf;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	if (rte_eth_macaddr_get(port, &macaddr) < 0)
+		ea = NULL;
+	else
+		ea = &macaddr;
+
+	/* Compute length of interface block options */
+	len = sizeof(*hdr);
+
+	len += pcapng_optlen(sizeof(tsresol));	/* timestamp */
+	len += pcapng_optlen(strlen(ifname));	/* ifname */
+
+	if (ea)
+		len += pcapng_optlen(RTE_ETHER_ADDR_LEN); /* macaddr */
+	if (speed != 0)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (dev)
+		len += pcapng_optlen(strlen(ifhw));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = alloca(len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	*hdr = (struct pcapng_interface_block) {
+		.block_type = PCAPNG_INTERFACE_BLOCK,
+		.link_type = 1,		/* DLT_EN10MB - Ethernet */
+		.block_length = len,
+	};
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+				ifname, strlen(ifname));
+	if (ea)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					ea, RTE_ETHER_ADDR_LEN);
+	if (speed != 0)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &speed, sizeof(uint64_t));
+	if (dev)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 ifhw, strlen(ifhw));
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	/* clone block_length after options */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/* More generalized version of rte_vlan_insert() */
+static int
+pcapng_vlan_insert(struct rte_mbuf *m, uint16_t ether_type, uint16_t tci)
+{
+	struct rte_ether_hdr *nh, *oh;
+	struct rte_vlan_hdr *vh;
+
+	if (!RTE_MBUF_DIRECT(m) || rte_mbuf_refcnt_read(m) > 1)
+		return -EINVAL;
+
+	if (rte_pktmbuf_data_len(m) < sizeof(*oh))
+		return -EINVAL;
+
+	oh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
+	nh = (struct rte_ether_hdr *)
+		rte_pktmbuf_prepend(m, sizeof(struct rte_vlan_hdr));
+	if (nh == NULL)
+		return -ENOSPC;
+
+	memmove(nh, oh, 2 * RTE_ETHER_ADDR_LEN);
+	nh->ether_type = rte_cpu_to_be_16(ether_type);
+
+	vh = (struct rte_vlan_hdr *) (nh + 1);
+	vh->vlan_tci = rte_cpu_to_be_16(tci);
+
+	return 0;
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* Expand any offloaded VLAN information */
+	if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+	     (md->ol_flags & PKT_RX_VLAN_STRIPPED)) ||
+	    (direction == RTE_PCAPNG_DIRECTION_OUT &&
+	     (md->ol_flags & PKT_TX_VLAN))) {
+		if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_VLAN,
+				       md->vlan_tci) != 0)
+			goto fail;
+	}
+
+	if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+	     (md->ol_flags & PKT_RX_QINQ_STRIPPED)) ||
+	    (direction == RTE_PCAPNG_DIRECTION_OUT &&
+	     (md->ol_flags & PKT_TX_QINQ))) {
+		if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_QINQ,
+				       md->vlan_tci_outer) != 0)
+			goto fail;
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Note: END_OPT is not written here. Wireshark doesn't do it. */
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that this is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..8d3fbb1941b4
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,195 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for pcapng_write
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The largest packet that will be copied.
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ *
+ * @warning
+ * Do not pass original mbufs from transmit or receive
+ * or file will be invalid pcapng format.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For statistics, use 0 if don't know or care to report it.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v14 03/12] bpf: allow self-xor operation
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 01/12] lib: pdump is not supported on Windows Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

Some BPF programs XOR a register with itself to zero it
in one instruction. The BPF filter converter generates this
in the prologue of the generated code.

The BPF validator rejected this because the value of the
register was undefined. But after this operation the register
is always zero.

Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
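Reader note, not part of the commit: the rule described above can be
modelled with a toy register tracker. This is a simplified sketch, not
the DPDK validator. The sketch_* names are hypothetical, and the real
code marks both operands as the constant zero via eval_fill_imm()
instead of keeping a boolean per register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* eBPF has registers r0 to r10. */
#define SKETCH_NREGS 11

enum sketch_op { SKETCH_XOR, SKETCH_ADD };

/* Hypothetical per-register state: has the register been written? */
struct sketch_state {
	bool defined[SKETCH_NREGS];
};

/*
 * Check one register-register ALU instruction "dst op= src".
 * Returns true if the instruction is accepted; a self-XOR also
 * marks the register as defined.
 */
static bool
sketch_check_alu(struct sketch_state *st, enum sketch_op op,
		 uint8_t dst, uint8_t src)
{
	assert(dst < SKETCH_NREGS && src < SKETCH_NREGS);

	/* r ^= r yields zero whatever r held, so treat it as a definition. */
	if (op == SKETCH_XOR && dst == src) {
		st->defined[dst] = true;
		return true;
	}

	/* Any other ALU op reads both operands, so both must be defined. */
	if (!st->defined[dst] || !st->defined[src])
		return false;

	return true;
}
```

With an all-undefined state, "add r0, r0" is refused, "xor r0, r0" is
accepted, and a later "add r0, r0" then passes.
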
 lib/bpf/bpf_validate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..853279fee557 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,15 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X &&
+	    ins->src_reg == ins->dst_reg) {
+		eval_fill_imm(&rs, UINT64_MAX, 0);
+		eval_fill_imm(rd, UINT64_MAX, 0);
+	}
+
 	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+			   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v14 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Ray Kinsella

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from the old
format to the new.

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The conversion code was originally written as part of the Linux
kernel implementation, then converted to a userspace program.
See https://github.com/tklauser/filter2xdp

Both authors have agreed that it is allowable to create a modified
version of this code and license it with BSD license used by DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
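Reader note, not part of the commit: one classic instruction can expand
to several eBPF instructions, so jump offsets have to be recomputed
from a table of start addresses. The sketch below illustrates that
arithmetic only. The sketch_* names and the expand[] input are
hypothetical; the offset formula mirrors BPF_EMIT_JMP in the diff that
follows.

```c
#include <assert.h>
#include <stddef.h>

/* Prologue length: two self-XORs plus the context move. */
#define SKETCH_PROLOGUE_INSNS 3

/*
 * Fill addrs[i] with the eBPF index where classic instruction i starts.
 * expand[i] is how many eBPF instructions classic instruction i becomes.
 * Returns the total eBPF program length.
 */
static size_t
sketch_map_addrs(const int *expand, size_t len, int *addrs)
{
	int pos = SKETCH_PROLOGUE_INSNS;
	size_t i;

	assert(expand != NULL && addrs != NULL);
	for (i = 0; i < len; i++) {
		addrs[i] = pos;
		pos += expand[i];
	}
	return (size_t)pos;
}

/*
 * Relative offset for a jump emitted as instruction number slot
 * (0-based) within the expansion of classic instruction i, aimed at
 * the start of classic instruction target.
 */
static int
sketch_jump_off(const int *addrs, size_t i, size_t target, int slot)
{
	return addrs[target] - addrs[i] - 1 - slot;
}
```

For expansions {1, 2, 1, 2} the start addresses are {3, 4, 6, 7}. A
conditional jump in slot 0 of instruction 1 that targets instruction 3
gets offset 2, and the JA emitted in slot 1 for the false branch to
instruction 2 gets offset 0.
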
 lib/bpf/bpf_convert.c | 575 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 611 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..db84add7dcce
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,575 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* Linux has special negative offsets to access meta-data. */
+		RTE_BPF_LOG(ERR,
+			    "rte_bpf_convert: socket offset %d not supported\n",
+			    fp->k - SKF_AD_OFF);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the jump address map */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a Classic BPF program from libpcap into DPDK BPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @param prm
+ *  Result Extended BPF program.
+ * @return
+ *   Pointer to BPF program (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2
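
Reader note, not part of the patch: two of the remappings in
bpf_convert_filter() above reduce to small formulas. The sketch below
shows the scratch-memory offset used for the stack loads and stores,
and the value that the six-instruction "ldxb 4 * ([k] & 0xf)" sequence
leaves in X. The sketch_* names are hypothetical; the constant 16
mirrors BPF_MEMWORDS from libpcap.

```c
#include <assert.h>
#include <stdint.h>

/* Classic BPF has 16 scratch words; libpcap calls this BPF_MEMWORDS. */
#define SKETCH_MEMWORDS 16

/* Frame-pointer-relative byte offset of scratch word M[k]. */
static int
sketch_scratch_off(unsigned int k)
{
	assert(k < SKETCH_MEMWORDS);
	return -(int)(SKETCH_MEMWORDS - k) * 4;
}

/* Value left in X by "ldxb 4 * ([k] & 0xf)" for a given packet byte. */
static uint32_t
sketch_msh(uint8_t byte)
{
	return ((uint32_t)byte & 0xf) << 2;
}
```

For an IPv4 header whose first byte is 0x45, sketch_msh() gives 20,
the header length in bytes that filters use to locate the transport
header.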



* [dpdk-dev] [PATCH v14 05/12] bpf: add function to dump eBPF instructions
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 06/12] pdump: support pcapng and filtering Stephen Hemminger
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Ray Kinsella

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
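Reader note, not part of the commit: the disassembler picks its
mnemonics by slicing the 8-bit opcode into fields and shifting them
into table indices. A minimal sketch of that decoding follows. The
SKETCH_* and sketch_* names are hypothetical, and the masks are the
usual BPF opcode encoding rather than values copied from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks of the 8-bit opcode (usual BPF encoding). */
#define SKETCH_CLASS(code) ((code) & 0x07)
#define SKETCH_SIZE(code)  ((code) & 0x18)
#define SKETCH_OP(code)    ((code) & 0xf0)
#define SKETCH_SRC(code)   ((code) & 0x08)

/* Size mnemonics in encoding order: word, half, byte, double word. */
static const char *const sketch_size_tbl[4] = { "w", "h", "b", "dw" };

/* Index into a 16-entry ALU or jump mnemonic table. */
static unsigned int
sketch_op_index(uint8_t code)
{
	return SKETCH_OP(code) >> 4;
}

/* Index into the 4-entry load/store size table. */
static unsigned int
sketch_size_index(uint8_t code)
{
	return SKETCH_SIZE(code) >> 3;
}

/* Size mnemonic for a load or store opcode. */
static const char *
sketch_size_name(uint8_t code)
{
	unsigned int idx = sketch_size_index(code);

	assert(idx < 4);
	return sketch_size_tbl[idx];
}
```

Opcode 0xaf, the 64-bit "xor rX, rY" emitted by the converter
prologue, decodes to class 7, operation index 10 and a register
source.
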
 lib/bpf/bpf_dump.c  | 139 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 +++++
 lib/bpf/version.map |   1 +
 4 files changed, 155 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..b86977b96d08
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <inttypes.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i) {
+		const struct ebpf_insn *ins = buf + i;
+		uint8_t cls = BPF_CLASS(ins->code);
+		const char *op, *postfix = "";
+
+		fprintf(f, " L%u:\t", i);
+
+		switch (cls) {
+		default:
+			fprintf(f, "unimp 0x%x // class: %s\n",
+				ins->code, class_tbl[cls]);
+			break;
+		case BPF_ALU:
+			postfix = "32";
+			/* fall through */
+		case EBPF_ALU64:
+			op = alu_op_tbl[BPF_OP_INDEX(ins->code)];
+			if (BPF_SRC(ins->code) == BPF_X)
+				fprintf(f, "%s%s r%u, r%u\n", op, postfix, ins->dst_reg,
+					ins->src_reg);
+			else
+				fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			break;
+		case BPF_LD:
+			op = "ld";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (ins->code == (BPF_LD | BPF_IMM | EBPF_DW)) {
+				uint64_t val;
+
+				val = (uint32_t)ins[0].imm |
+					(uint64_t)(uint32_t)ins[1].imm << 32;
+				fprintf(f, "%s%s r%d, #0x%"PRIx64"\n",
+					op, postfix, ins->dst_reg, val);
+				i++;
+			} else if (BPF_MODE(ins->code) == BPF_IMM)
+				fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_ABS)
+				fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_IND)
+				fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+					ins->dst_reg, ins->src_reg, ins->imm);
+			else
+				fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_LDX:
+			op = "ldx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, ins->dst_reg,
+				ins->src_reg, ins->off);
+			break;
+		case BPF_ST:
+			op = "st";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (BPF_MODE(ins->code) == BPF_MEM)
+				fprintf(f, "%s%s [r%d + %d], #0x%x\n", op, postfix,
+					ins->dst_reg, ins->off, ins->imm);
+			else
+				fprintf(f, "// BUG: ST opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_STX:
+			op = "stx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s [r%d + %d], r%u\n", op, postfix,
+				ins->dst_reg, ins->off, ins->src_reg);
+			break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+		case BPF_JMP:
+			op = jump_tbl[BPF_OP_INDEX(ins->code)];
+			if (op == NULL)
+				fprintf(f, "invalid jump opcode: %#x\n", ins->code);
+			else if (BPF_OP(ins->code) == BPF_JA)
+				fprintf(f, "%s L%d\n", op, L(i, ins->off));
+			else if (BPF_OP(ins->code) == EBPF_EXIT)
+				fprintf(f, "%s\n", op);
+			else
+				fprintf(f, "%s r%u, #0x%x, L%d\n", op, ins->dst_reg,
+					ins->imm, L(i, ins->off));
+			break;
+		case BPF_RET:
+			fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+				ins->code);
+			break;
+		}
+	}
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+	'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2



* [dpdk-dev] [PATCH v14 06/12] pdump: support pcapng and filtering
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan, Ray Kinsella, Anatoly Burakov

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to pair a
mismatched primary and secondary process fails.
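
The version handling can be sketched roughly as follows. This is a
simplified standalone model, not the library code: the flag and version
values are copied from this patch, but the function names are hypothetical,
and errors are returned as negative errno values here, whereas the library
returns -1 and sets rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values copied from the rte_pdump.h hunk of this patch. */
enum {
	FLAG_RX = 1,
	FLAG_TX = 2,
	FLAG_RXTX = FLAG_RX | FLAG_TX,
	FLAG_PCAPNG = 4,
};

/* Internal protocol versions, as in rte_pdump.c. */
enum { V1 = 1, V2 = 2 };

/* Client side: same two checks as pdump_validate_flags(). */
static int validate_flags(uint32_t flags)
{
	if ((flags & FLAG_RXTX) == 0)
		return -EINVAL;  /* no direction requested */
	if (flags & ~(uint32_t)(FLAG_RXTX | FLAG_PCAPNG))
		return -ENOTSUP; /* bits this version does not know */
	return 0;
}

/* Client side: the pcapng flag selects the request version. */
static uint16_t request_version(uint32_t flags)
{
	assert(validate_flags(flags) == 0);
	return (flags & FLAG_PCAPNG) ? V2 : V1;
}

/* Server side: anything other than V1 or V2 is refused. */
static int check_version(uint16_t ver)
{
	return (ver == V1 || ver == V2) ? 0 : -EINVAL;
}
```

A request built with FLAG_TX | FLAG_PCAPNG is sent as V2, while a peer
sending any other version number is rejected by the server-side check.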

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).
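
The accounting can be modelled with a small standalone sketch. It assumes
one burst at a time and plain (non-atomic) counters, whereas the patch uses
relaxed atomic adds on shared memory. The struct capture_stats and the
function account_burst are hypothetical; only the counter names are taken
from struct rte_pdump_stats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counter names follow struct rte_pdump_stats in this patch. */
struct capture_stats {
	uint64_t accepted;
	uint64_t filtered;
	uint64_t nombuf;
	uint64_t ringfull;
};

/*
 * Hypothetical model of one capture burst.
 * rcs[i] is the filter result for packet i (0 means no match);
 * rcs is NULL when no filter is installed.
 * pool_free and ring_free model the mbufs and ring slots available.
 */
static void account_burst(struct capture_stats *st, const uint64_t *rcs,
			  unsigned int nb_pkts,
			  unsigned int pool_free, unsigned int ring_free)
{
	unsigned int i, copied = 0, queued;

	assert(st != NULL);
	for (i = 0; i < nb_pkts; i++) {
		if (rcs != NULL && rcs[i] == 0) {
			st->filtered++; /* rejected by the filter */
			continue;
		}
		if (copied == pool_free) {
			st->nombuf++;   /* no mbuf left for the copy */
			continue;
		}
		copied++;
	}

	/* As in the patch, accepted counts copies made before the enqueue. */
	st->accepted += copied;
	queued = copied < ring_free ? copied : ring_free;
	st->ringfull += copied - queued;
}
```

In this model a packet dropped because the ring was full is counted in both
accepted and ringfull, which matches the order of the counter updates in
pdump_copy().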

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 432 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 113 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 433 insertions(+), 126 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index 15150efa19a7..c71c6917dbb7 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -27,6 +27,7 @@ libraries = [
         'acl',
         'bbdev',
         'bitratestats',
+        'bpf',
         'cfgfile',
         'compressdev',
         'cryptodev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'rib',
@@ -55,10 +55,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 382217bc1564..2636a216994b 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,23 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/* Internal version number in request */
 enum pdump_version {
-	V1 = 1
+	V1 = 1,		    /* no filtering or snap */
+	V2 = 2,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
 	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	char device[RTE_DEV_NAME_MAX_LEN];
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
 };
 
 struct pdump_response {
@@ -63,80 +58,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
 
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+/*
+ * The packet capture statistics keep track of packets
+ * accepted, filtered and dropped. These are per-queue
+ * and in memory shared between primary and secondary processes.
+ */
+static const char MZ_RTE_PDUMP_STATS[] = "rte_pdump_stats";
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter)
+		rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts);
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * This uses the same BPF return value convention as socket
+		 * filters and pcap_offline_filter:
+		 * if the program returns zero then the packet doesn't match
+		 * the filter (and will be ignored).
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng, wrap each packet in pcapng format;
+		 * otherwise make a simple copy.
+		 */
+		if (cbs->ver == V2)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +200,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +224,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +258,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +287,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	/* Check for possible DPDK version mismatch */
+	if (!(p->ver == V1 || p->ver == V2)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +365,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +375,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -332,7 +403,7 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 		resp->err_value = set_pdump_rxtx_cbs(cli_req);
 	}
 
-	strlcpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_resp.len_param = sizeof(*resp);
 	mp_resp.num_fds = 0;
 	if (rte_mp_reply(&mp_resp, peer) < 0) {
@@ -347,8 +418,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -392,14 +473,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -426,12 +514,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -440,26 +528,22 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+
+	req->ver = (flags & RTE_PDUMP_FLAG_PCAPNG) ? V2 : V1;
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	rte_strscpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
-	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_req.len_param = sizeof(*req);
 	mp_req.num_fds = 0;
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
@@ -477,11 +561,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API can't depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -496,20 +586,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -518,10 +630,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, UINT32_MAX,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -537,8 +669,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -553,8 +685,68 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			/* rte_pdump_init was not called */
+			PDUMP_LOG(ERR, "pdump stats not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/* secondary process looks up the memzone */
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			/* rte_pdump_init was not called in primary process?? */
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..6efa0274f2ce 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,38 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum over both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Retrieve the packet capture statistics for a port.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2
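
As a reading aid for the statistics structure above, here is a minimal
standalone sketch of how a consumer might combine the four counters.
It assumes the same interpretation the dumpcap application in this series
uses (received = accepted + filtered, dropped = nombuf + ringfull).
The struct capture_stats type and the stats_*() helpers are hypothetical
stand-ins for illustration only; they are not part of the pdump API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the counters in struct rte_pdump_stats. */
struct capture_stats {
	uint64_t accepted;	/* packets accepted by the filter */
	uint64_t filtered;	/* packets rejected by the filter */
	uint64_t nombuf;	/* mbuf allocation failures */
	uint64_t ringfull;	/* packets missed because the ring was full */
};

/* Packets seen by the capture hook, whether or not the filter kept them. */
static uint64_t stats_received(const struct capture_stats *s)
{
	assert(s != NULL);
	return s->accepted + s->filtered;
}

/* Packets lost to mbuf exhaustion or to a full ring. */
static uint64_t stats_dropped(const struct capture_stats *s)
{
	assert(s != NULL);
	return s->nombuf + s->ringfull;
}

/* Received share of (received + dropped), as a percentage; 0 if idle. */
static double stats_percent(const struct capture_stats *s)
{
	uint64_t recv = stats_received(s);
	uint64_t drop = stats_dropped(s);

	if (recv == 0)
		return 0.0;
	return 100.0 * (double)recv / (double)(recv + drop);
}
```

With accepted=90, filtered=5, nombuf=3 and ringfull=2 the helpers report
95 received and 5 dropped. In a real application the counters would be
filled in by rte_pdump_stats() rather than by hand.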



* [dpdk-dev] [PATCH v14 07/12] app/dumpcap: add new packet capture application
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 08/12] test: add test for bpf_convert Stephen Hemminger
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

This is a new packet capture application to replace existing pdump.
The new application works like Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features, such as filtering, are not implemented.
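
For readers unfamiliar with the autostop syntax borrowed from Wireshark
dumpcap, the following standalone sketch shows how a "name:value"
condition such as "duration:2.5" might be split and converted. It uses
only the C library. The names parse_autostop and struct stop_cond are
hypothetical and chosen for illustration; the application in this patch
calls rte_exit() on bad input instead of returning an error, and the
upper bound on the duration is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed stop conditions. */
struct stop_cond {
	uint64_t duration_ns;	/* 0 means no time limit */
	unsigned long packets;	/* 0 means no packet limit */
	size_t size_bytes;	/* 0 means no size limit */
};

/*
 * Parse one "name:value" autostop condition into *sc.
 * The string is modified in place (the colon is overwritten).
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_autostop(char *opt, struct stop_cond *sc)
{
	char *value, *endp;
	unsigned long n;

	assert(opt != NULL && sc != NULL);

	value = strchr(opt, ':');
	if (value == NULL)
		return -1;
	*value++ = '\0';
	if (*value == '\0')
		return -1;

	if (strcmp(opt, "duration") == 0) {
		double secs = strtod(value, &endp);

		/* Reject trailing junk, NaN, non-positive and huge values. */
		if (*endp != '\0' || !(secs > 0 && secs <= 1e9))
			return -1;
		sc->duration_ns = (uint64_t)(secs * 1e9);
		return 0;
	}

	errno = 0;
	n = strtoul(value, &endp, 0);
	if (*endp != '\0' || errno == ERANGE)
		return -1;

	if (strcmp(opt, "filesize") == 0)
		sc->size_bytes = (size_t)n * 1024;	/* value is in kB */
	else if (strcmp(opt, "packets") == 0)
		sc->packets = n;
	else
		return -1;

	return 0;
}
```

Returning an error code rather than exiting keeps the helper easy to test
in isolation; a command line tool can still turn -1 into a fatal error.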

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 844 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 861 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..baf9eee46666
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,844 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+/* Can do either pcap or pcapng format output */
+typedef union {
+	rte_pcapng_t  *pcapng;
+	pcap_dumper_t *dumper;
+} dumpcap_out_t;
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = calloc(1, sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	intf->port = port;
+	rte_strscpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port; this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* "capture-comment" is long_options[1] */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current CLOCK_MONOTONIC time in nanoseconds */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will pdump. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(dumpcap_out_t out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out.pcapng, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	static const char * const args[] = {
+		"dumpcap", "--proc-type", "secondary",
+		"--log-level", "notice"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	eal_argv[0] = strdup(progname);
+	for (i = 1; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+/*
+ * Get Operating System information.
+ * Returns a string allocated via malloc().
+ */
+static char *get_os_info(void)
+{
+	struct utsname uts;
+	char *osname = NULL;
+
+	if (uname(&uts) < 0)
+		return NULL;
+
+	if (asprintf(&osname, "%s %s",
+		     uts.sysname, uts.release) == -1)
+		return NULL;
+
+	return osname;
+}
+
+static dumpcap_out_t create_output(void)
+{
+	dumpcap_out_t ret;
+	static char tmp_path[PATH_MAX];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		char *os = get_os_info();
+
+		ret.pcapng = rte_pcapng_fdopen(fd, os, NULL,
+					   version(), capture_comment);
+		if (ret.pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		free(os);
+	} else {
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		ret.dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (ret.dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+	}
+
+	return ret;
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(rte_errno));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.len;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out.pcapng, pkts, n);
+	else
+		written = pcap_write_packets(out.dumper, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	dumpcap_out_t out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out.pcapng);
+	else
+		pcap_dump_close(out.dumper);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_prm);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+    build = false
+    reason = 'not supported on Windows'
+    subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2



* [dpdk-dev] [PATCH v14 08/12] test: add test for bpf_convert
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 09/12] test: add a test for pcapng library Stephen Hemminger
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

Add some functional tests for the Classic BPF to DPDK BPF converter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 200 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 8118a1849ba0..d3d385ce1eff 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3233,3 +3234,202 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str,
+	       struct rte_mbuf *mb)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	rc = rte_bpf_exec(bpf, mb);
+	/* The return code from bpf capture filter is non-zero if matched */
+	ret = (rc == 0);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet? */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	const uint32_t plen = 100;
+	struct rte_mbuf mb, *m;
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+	} *hdr;
+
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	m = &mb;
+
+	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_RAW,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+
+	if (test_bpf_match(pcap, "ip", m) != 0) {
+		printf("%s@%d: filter \"ip\" doesn't match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+	if (test_bpf_match(pcap, "not ip", m) == 0) {
+		printf("%s@%d: filter \"not ip\" does match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Some sample pcap filter strings from
+ * https://wiki.wireshark.org/CaptureFilters
+ */
+static const char * const sample_filters[] = {
+	"host 172.18.5.4",
+	"net 192.168.0.0/24",
+	"src net 192.168.0.0/24",
+	"src net 192.168.0.0 mask 255.255.255.0",
+	"dst net 192.168.0.0/24",
+	"dst net 192.168.0.0 mask 255.255.255.0",
+	"port 53",
+	"host dpdk.org and not (port 80 or port 25)",
+	"host dpdk.org and not port 80 and not port 25",
+	"port not 53 and not arp",
+	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
+	"ether proto 0x888e",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"not ether dst 01:80:c2:00:00:0e",
+	"not broadcast and not multicast",
+	"dst host ff02::1",
+	"port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420",
+	/* Worms */
+	"dst port 135 and tcp port 135 and ip[2:2]==48",
+	"icmp[icmptype]==icmp-echo and ip[2:2]==92 and icmp[8:4]==0xAAAAAAAA",
+	"dst port 135 or dst port 445 or dst port 1433"
+	" and tcp[tcpflags] & (tcp-syn) != 0"
+	" and tcp[tcpflags] & (tcp-ack) = 0 and src net 192.168.0.0/24",
+	"tcp src port 443 and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4] = 0x18)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 1] = 0x03)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 2] < 0x04)"
+	" and ((ip[2:2] - 4 * (ip[0] & 0x0F) - 4 * ((tcp[12] & 0xF0) >> 4) > 69))",
+	/* Other */
+	"len = 128",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else if (prm != NULL) {
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++)
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2



* [dpdk-dev] [PATCH v14 09/12] test: add a test for pcapng library
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 08/12] test: add test for bpf_convert Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 10/12] test: enable bpf autotest Stephen Hemminger
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

Simple unit test that creates a pcapng file using the API.

To run this test you need to have at least one device.
For example:

DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
    --no-huge -m 2048 --vdev=net_tap,iface=dummy
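
One way to sanity check the file such a test produces is to look at its
first block. The sketch below is standalone C and does not use the
rte_pcapng API; it only checks that a buffer begins with a plausible
pcapng Section Header Block written in the host's byte order. The
function name looks_like_pcapng and the PCAPNG_* macro names are
hypothetical; the constant values are those of the pcapng file format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCAPNG_SHB_TYPE   0x0A0D0D0AU	/* Section Header Block type */
#define PCAPNG_BYTE_MAGIC 0x1A2B3C4DU	/* byte-order magic */
#define PCAPNG_SHB_MIN    28		/* size of an SHB with no options */

/* Read a 32-bit value in host byte order without alignment assumptions. */
static uint32_t load32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Return true if buf starts with a plausible Section Header Block
 * written in the host's byte order.
 */
static bool looks_like_pcapng(const uint8_t *buf, size_t len)
{
	uint32_t total;

	assert(buf != NULL);

	if (len < PCAPNG_SHB_MIN)
		return false;

	/* The block type reads the same in either byte order. */
	if (load32(buf) != PCAPNG_SHB_TYPE)
		return false;

	/* Block lengths are padded to a multiple of 32 bits. */
	total = load32(buf + 4);
	if (total < PCAPNG_SHB_MIN || total % 4 != 0)
		return false;

	return load32(buf + 8) == PCAPNG_BYTE_MAGIC;
}
```

A file written on a machine of the opposite endianness would fail the
magic comparison here; a fuller check would also accept the byte-swapped
magic and swap the length field accordingly.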

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/meson.build   |   4 +
 app/test/test_pcapng.c | 272 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 276 insertions(+)
 create mode 100644 app/test/test_pcapng.c

diff --git a/app/test/meson.build b/app/test/meson.build
index f144d8b8ed68..56863f97acbb 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -396,6 +396,10 @@ if dpdk_conf.has('RTE_NET_RING')
     fast_tests += [['pdump_autotest', true]]
 endif
 
+if dpdk_conf.has('RTE_PORT_PCAP')
+    test_sources += 'test_pcapng.c'
+endif
+
 if dpdk_conf.has('RTE_LIB_POWER')
     test_deps += 'power'
 endif
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
new file mode 100644
index 000000000000..ed1e87f9445d
--- /dev/null
+++ b/app/test/test_pcapng.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_net.h>
+#include <rte_pcapng.h>
+
+#include <pcap/pcap.h>
+
+#include "test.h"
+
+#define NUM_PACKETS    10
+#define DUMMY_MBUF_NUM 3
+
+static rte_pcapng_t *pcapng;
+static struct rte_mempool *mp;
+static const uint32_t pkt_len = 200;
+static uint16_t port_id;
+static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng";
+
+/* first mbuf in the packet, should always be at offset 0 */
+struct dummy_mbuf {
+	struct rte_mbuf mb[DUMMY_MBUF_NUM];
+	uint8_t buf[DUMMY_MBUF_NUM][RTE_MBUF_DEFAULT_BUF_SIZE];
+};
+
+static void
+dummy_mbuf_prep(struct rte_mbuf *mb, uint8_t buf[], uint32_t buf_len,
+	uint32_t data_len)
+{
+	uint32_t i;
+	uint8_t *db;
+
+	mb->buf_addr = buf;
+	mb->buf_iova = (uintptr_t)buf;
+	mb->buf_len = buf_len;
+	rte_mbuf_refcnt_set(mb, 1);
+
+	/* set pool pointer to dummy value, test doesn't use it */
+	mb->pool = (void *)buf;
+
+	rte_pktmbuf_reset(mb);
+	db = (uint8_t *)rte_pktmbuf_append(mb, data_len);
+
+	for (i = 0; i != data_len; i++)
+		db[i] = i;
+}
+
+/* Make an IP packet consisting of a chain of one mbuf */
+static void
+mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth;
+		struct rte_ipv4_hdr ip;
+	} pkt = {
+		.eth = {
+			.dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+			.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+		},
+		.ip = {
+			.version_ihl = RTE_IPV4_VHL_DEF,
+			.total_length = rte_cpu_to_be_16(plen),
+			.time_to_live = IPDEFTTL,
+			.next_proto_id = IPPROTO_RAW,
+			.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+			.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+		}
+	};
+
+	memset(dm, 0, sizeof(*dm));
+	dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen);
+
+	rte_eth_random_addr(pkt.eth.src_addr.addr_bytes);
+	memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen));
+}
+
+static int
+test_setup(void)
+{
+	int tmp_fd;
+
+	port_id = rte_eth_find_next(0);
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		fprintf(stderr, "No valid Ether port\n");
+		return -1;
+	}
+
+	tmp_fd = mkstemps(file_name, strlen(".pcapng"));
+	if (tmp_fd == -1) {
+		perror("mkstemps() failure");
+		return -1;
+	}
+	printf("pcapng: output file %s\n", file_name);
+
+	/* open a test capture file */
+	pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL);
+	if (pcapng == NULL) {
+		fprintf(stderr, "rte_pcapng_fdopen failed\n");
+		close(tmp_fd);
+		return -1;
+	}
+
+
+	/* Make a pool for cloned packets */
+	mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+					    0, 0,
+					    rte_pcapng_mbuf_size(pkt_len),
+					    SOCKET_ID_ANY, "ring_mp_sc");
+	if (mp == NULL) {
+		fprintf(stderr, "Cannot create mempool\n");
+		return -1;
+	}
+	return 0;
+
+}
+
+static int
+test_write_packets(void)
+{
+	struct rte_mbuf *orig;
+	struct rte_mbuf *clones[NUM_PACKETS] = { };
+	struct dummy_mbuf mbfs;
+	unsigned int i;
+	ssize_t len;
+
+	/* make a dummy packet */
+	mbuf1_prepare(&mbfs, pkt_len);
+
+	/* clone them */
+	orig  = &mbfs.mb[0];
+	for (i = 0; i < NUM_PACKETS; i++) {
+		struct rte_mbuf *mc;
+
+		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+				rte_get_tsc_cycles(), 0);
+		if (mc == NULL) {
+			fprintf(stderr, "Cannot copy packet\n");
+			return -1;
+		}
+		clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS);
+
+	rte_pktmbuf_free_bulk(clones, NUM_PACKETS);
+
+	if (len <= 0) {
+		fprintf(stderr, "Write of packets failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+test_write_stats(void)
+{
+	ssize_t len;
+
+	/* write a statistics block */
+	len = rte_pcapng_write_stats(pcapng, port_id,
+				     NULL, 0, 0,
+				     NUM_PACKETS, 0);
+	if (len <= 0) {
+		fprintf(stderr, "Write of statistics failed\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+pkt_print(u_char *user, const struct pcap_pkthdr *h,
+	  const u_char *bytes)
+{
+	unsigned int *countp = (unsigned int *)user;
+	const struct rte_ether_hdr *eh;
+	struct tm *tm;
+	char tbuf[128], src[64], dst[64];
+
+	tm = localtime(&h->ts.tv_sec);
+	if (tm == NULL) {
+		perror("localtime");
+		return;
+	}
+
+	if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) {
+		fprintf(stderr, "strftime returned 0!\n");
+		return;
+	}
+
+	eh = (const struct rte_ether_hdr *)bytes;
+	rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr);
+	rte_ether_format_addr(src, sizeof(src), &eh->src_addr);
+	printf("%s.%06lu: %s -> %s type %x length %u\n",
+	       tbuf, (unsigned long)h->ts.tv_usec,
+	       src, dst, rte_be_to_cpu_16(eh->ether_type), h->len);
+
+	*countp += 1;
+}
+
+/*
+ * Open the resulting pcapng file with libpcap
+ * Would be better to use capinfos from wireshark
+ * but that creates an unwanted dependency.
+ */
+static int
+test_validate(void)
+{
+	char errbuf[PCAP_ERRBUF_SIZE];
+	unsigned int count = 0;
+	pcap_t *pcap;
+	int ret;
+
+	pcap = pcap_open_offline(file_name, errbuf);
+	if (pcap == NULL) {
+		fprintf(stderr, "pcap_open_offline('%s') failed: %s\n",
+			file_name, errbuf);
+		return -1;
+	}
+
+	ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count);
+	if (ret == 0)
+		printf("Saw %u packets\n", count);
+	else
+		fprintf(stderr, "pcap_loop: failed: %s\n",
+			pcap_geterr(pcap));
+	pcap_close(pcap);
+
+	return ret;
+}
+
+static void
+test_cleanup(void)
+{
+	if (mp)
+		rte_mempool_free(mp);
+
+	if (pcapng)
+		rte_pcapng_close(pcapng);
+
+}
+
+static struct
+unit_test_suite test_pcapng_suite  = {
+	.setup = test_setup,
+	.teardown = test_cleanup,
+	.suite_name = "Test Pcapng Unit Test Suite",
+	.unit_test_cases = {
+		TEST_CASE(test_write_packets),
+		TEST_CASE(test_write_stats),
+		TEST_CASE(test_validate),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_pcapng(void)
+{
+	return unit_test_suite_runner(&test_pcapng_suite);
+}
+
+REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng);
-- 
2.30.2



* [dpdk-dev] [PATCH v14 10/12] test: enable bpf autotest
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 09/12] test: add a test for pcapng library Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The BPF autotest is defined but not run automatically.
Since it is short, it should be added to the autotest suite.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index 56863f97acbb..aa298e971401 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -194,6 +194,8 @@ test_deps = [
 fast_tests = [
         ['acl_autotest', true],
         ['atomic_autotest', false],
+        ['bpf_autotest', true],
+        ['bpf_convert_autotest', true],
         ['bitops_autotest', true],
         ['byteorder_autotest', true],
         ['cksum_autotest', true],
-- 
2.30.2



* [dpdk-dev] [PATCH v14 11/12] doc: changes for new pcapng and dumpcap utility
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
                     ` (9 preceding siblings ...)
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 10/12] test: enable bpf autotest Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-19  8:28     ` Pattan, Reshma
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  11 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

Describe the new packet capture library and utility.
Fix the title line on the pdump documentation.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 69 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 46 +++++++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 251 insertions(+), 88 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
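
Note on the "header and trailer" wording: the pdump_lib.rst text in this
patch says the mbuf data is extended to match a Pcapng enhanced packet
block (EPB). As a rough sketch of what that implies for sizing, the
snippet below lays out the fixed EPB fields as given in the pcapng
specification and computes the smallest possible block for a given capture
length. The names epb_header, pad4() and epb_min_len() are illustrative
only and are not the library's definitions; any options written into the
block add to this figure, so rte_pcapng_mbuf_size() may well return more.

```c
#include <stdint.h>

/* Fixed fields at the start of an Enhanced Packet Block (pcapng spec).
 * Illustrative layout only, not the structure used inside DPDK. */
struct epb_header {
	uint32_t block_type;      /* 0x00000006 identifies an EPB */
	uint32_t block_length;    /* total length, repeated in the trailer */
	uint32_t interface_id;
	uint32_t timestamp_hi;
	uint32_t timestamp_lo;
	uint32_t capture_length;  /* bytes of packet data stored in the block */
	uint32_t original_length; /* length of the packet as received */
};

/* Round up to a multiple of 4; packet data is padded to 32 bits. */
static inline uint32_t
pad4(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/* Smallest EPB for 'caplen' bytes: header, padded data, no options,
 * and the trailing copy of the block length. */
static inline uint32_t
epb_min_len(uint32_t caplen)
{
	return (uint32_t)sizeof(struct epb_header) + pad4(caplen)
		+ (uint32_t)sizeof(uint32_t);
}
```

For the 200-byte packets used in the unit test this gives 28 + 200 + 4 =
232 bytes before any options.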

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 1992107a0356..ee07394d1c78 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -223,3 +223,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 325a0195c6ab..aba17799a9a1 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -58,6 +58,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..f933cc7e9311 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
-    Copyright(c) 2017 Intel Corporation.
+    Copyright(c) 2017-2021 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The framework consists of two libraries:
+``librte_pdump`` for collecting packets and ``librte_pcapng`` for writing
+packets to a file. There are two sample applications:
+``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` tool captures packets in the same way
+that Wireshark ``dumpcap`` does for Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, which writes them to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the
+  ``librte_pdump`` library. It provides more limited functionality and
+  depends on the Pcap PMD. It is retained only for compatibility reasons;
+  users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch ``dpdk-dumpcap`` as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 2dce507f46a3..b440c77c2ba1 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -43,6 +43,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..09fa2934a2cc
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,46 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2021 Microsoft Corporation
+
+.. _pcapng_library:
+
+Packet Capture Next Generation Library
+======================================
+
+Exchanging packet traces becomes more and more critical every day.
+The de facto standard for this is the format defined by libpcap;
+but that format is rather old and is lacking in functionality
+for more modern applications. The `Pcapng file format`_
+is the default capture file format for modern network capture
+processing tools such as `wireshark`_ (can also be read by `tcpdump`_).
+
+The Pcapng library is an API for formatting packet data
+into a Pcapng file.
+The format conforms to the current `Pcapng RFC`_ standard.
+It is designed to be integrated with the packet capture library.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+The output stream is created with ``rte_pcapng_fdopen``,
+and should be closed with ``rte_pcapng_close``.
+
+The library requires a DPDK mempool to allocate mbufs. The mbufs
+need to be able to accommodate additional space for the pcapng packet
+format header and trailer information; the function ``rte_pcapng_mbuf_size``
+should be used to determine the lower bound based on MTU.
+
+Collecting packets is done in two parts. The function ``rte_pcapng_copy``
+is used to format and copy mbuf data and ``rte_pcapng_write_packets``
+writes a burst of packets to the output file.
+
+The function ``rte_pcapng_write_stats`` can be used to write
+statistics information into the output file. The summary statistics
+information is automatically added by ``rte_pcapng_close``.
+
+.. _Tcpdump: https://tcpdump.org/
+.. _Wireshark: https://wireshark.org/
+.. _Pcapng file format: https://github.com/pcapng/pcapng/
+.. _Pcapng RFC: https://datatracker.ietf.org/doc/html/draft-tuexen-opsawg-pcapng
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..f3ff8fd828dc 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+Packet Capture Library
+======================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter
+  and setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter
+  and setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 4c56cdfeaaa2..0909f4258cf8 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -159,6 +159,16 @@ New Features
   * Added tests to verify tunnel header verification in IPsec inbound.
   * Added tests to verify inner checksum.
 
+* **Revised packet capture framework.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    wireshark dumpcap utility including: capture of multiple interfaces,
+    filtering, and stopping after a number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancements to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Fixes packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool. The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes captured packets to files
+in the Pcapng capture file format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In DPDK, only ``testpmd`` is modified to initialize the packet capture
+        framework; other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in the style of *tshark* use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK.
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options: ``-I|--monitor-mode`` and ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2



* [dpdk-dev] [PATCH v14 12/12] MAINTAINERS: add entry for new packet capture features
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
                     ` (10 preceding siblings ...)
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
@ 2021-10-15 20:11   ` Stephen Hemminger
  2021-10-21 12:40     ` Pattan, Reshma
  11 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-15 20:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Thomas Monjalon

Since the packet capture is just an extension of the existing pdump,
add myself as maintainer of that.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ed8becce85cd..6d95d151ba4a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1425,12 +1425,17 @@ F: doc/guides/sample_app_ug/qos_scheduler.rst
 
 Packet capture
 M: Reshma Pattan <reshma.pattan@intel.com>
+M: Stephen Hemminger <stephen@networkplumber.org>
 F: lib/pdump/
+F: lib/pcapng/
 F: doc/guides/prog_guide/pdump_lib.rst
-F: app/test/test_pdump.*
-F: app/pdump/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: doc/guides/tools/dumpcap.rst
 F: doc/guides/tools/pdump.rst
-
+F: app/test/test_pdump.c
+F: app/test/test_pcapng.c
+F: app/pdump/
+F: app/dumpcap/
 
 Packet Framework
 ----------------
-- 
2.30.2



* Re: [dpdk-dev] [PATCH v12 11/12] doc: changes for new pcapng and dumpcap
  2021-10-15 17:29       ` Stephen Hemminger
@ 2021-10-18  9:23         ` Pattan, Reshma
  0 siblings, 0 replies; 220+ messages in thread
From: Pattan, Reshma @ 2021-10-18  9:23 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> On Fri, 15 Oct 2021 16:42:19 +0000
> "Pattan, Reshma" <reshma.pattan@intel.com> wrote:
> 
> > > @@ -0,0 +1,24 @@
> > > +..  SPDX-License-Identifier: BSD-3-Clause
> > > +    Copyright(c) 2016 Intel Corporation.
> >
> > need to edit the licence
> 
> Do you want me to change date on the existing doc as well.

Yes, looks good as in v14. 




* Re: [dpdk-dev] [PATCH v14 11/12] doc: changes for new pcapng and dumpcap utility
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
@ 2021-10-19  8:28     ` Pattan, Reshma
  0 siblings, 0 replies; 220+ messages in thread
From: Pattan, Reshma @ 2021-10-19  8:28 UTC (permalink / raw)
  To: Stephen Hemminger, dev



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Subject: [PATCH v14 11/12] doc: changes for new pcapng and dumpcap utility
> 
> Describe the new packet capture library and utility.
> Fix the title line on the pdump documentation.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Acked-by: Reshma Pattan <reshma.pattan@intel.com>



* Re: [dpdk-dev] [PATCH v14 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-10-19 10:24     ` Pattan, Reshma
  0 siblings, 0 replies; 220+ messages in thread
From: Pattan, Reshma @ 2021-10-19 10:24 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Ray Kinsella



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Subject: [PATCH v14 02/12] librte_pcapng: add new library for writing pcapng
> files
> 
> This is utility library for writing pcapng format files used by Wireshark family of
> utilities. Older tcpdump also knows how to read (but not write) this format.
> 
> See
>   https://github.com/pcapng/pcapng/
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Acked-by: Reshma Pattan <reshma.pattan@intel.com>


* [dpdk-dev] [PATCH v15 00/12] Packet capture framework update
  2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
                   ` (17 preceding siblings ...)
  2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
@ 2021-10-20 21:42 ` Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 01/12] lib: pdump is not supported on Windows Stephen Hemminger
                     ` (12 more replies)
  18 siblings, 13 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

This patch set is a more complete version of the enhanced
packet capture support described last year.

The new capture library and utility are:
  - faster: avoids lots of extra I/O, does bursting, etc.
  - gives more information (multiple ports, queues, etc.)
  - has a better user interface (same as Wireshark dumpcap)
  - fixes structural problems with VLANs and timestamps

There are no blocker items.
The following are worth noting:
  * bogus checkpatch warnings
        - the correct flag to open is O_CREAT
        - intentionally keeping macro with goto since that
          was in original code and is clearer
        - the tempfile name can not be const since it is
          overwritten by tmpfile() call

v15
  - fix minor spelling in doc
  - make sure CI tests are rerun after test/bpf fix

v14
  - fix checkpatch whitespace warning
  - enhance the pcapng prog guide documentation

v13
  - integrate feedback in documentation and pcapng library
  - rebase to align with rte_ether_addr changes

v12
  - fixes for capture offloaded VLAN tags.
    look at direction flag and handle QinQ offload.

v11
  - address review comments for pdump (patch 6)

v10:
  - fix to rte_bpf_dump to handle more instructions
    make sure all bpf_test cases are decoded

v9:
  - incorporate suggested change to BPF XOR
  - make autotest for pcapng more complete by reading the
    resulting file with libpcap

v8:
  - enable BPF tests in autotest
  - add more BPF test strings
  - use rte_strscpy to satisfy checkpatch
  - merge MAINTAINERS (put this in with existing pdump)

v7:
  - add functional tests for pcapng lib
  - bug fix for error returns in pcapng lib
  - handle long osname on FreeBSD
  - resolve almost all checkpatch issues

v5:
  - minor build and checkpatch fixes for RHEL/FreeBSD
  - disable lib/pdump on Windows. It was not useful before
    and now pdump depends on bpf.

v4:
  - minor checkpatch fixes.
    Note: some of the checkpatch warnings are bogus and won't be fixed.
  - fix build of dumpcap on FreeBSD

v3:
  - introduce packet filters using classic BPF to eBPF converter
    required small fix to DPDK BPF interpreter
  - introduce function to decode eBPF instructions
  - add option to dumpcap to show both classic BPF and eBPF result
  - drop some un-useful stubs
  - minor checkpatch warning cleanup

v2:
   fix formatting of packet blocks
   fix the new packet capture statistics
   fix crash when primary process exits
   record start/end time
   various whitespace/checkpatch warnings


Stephen Hemminger (12):
  lib: pdump is not supported on Windows
  librte_pcapng: add new library for writing pcapng files
  bpf: allow self-xor operation
  bpf: add function to convert classic BPF to DPDK BPF
  bpf: add function to dump eBPF instructions
  pdump: support pcapng and filtering
  app/dumpcap: add new packet capture application
  test: add test for bpf_convert
  test: add a test for pcapng library
  test: enable bpf autotest
  doc: changes for new pcapng and dumpcap utility
  MAINTAINERS: add entry for new packet capture features

 MAINTAINERS                                   |  11 +-
 app/dumpcap/main.c                            | 844 ++++++++++++++++++
 app/dumpcap/meson.build                       |  16 +
 app/meson.build                               |   1 +
 app/test/meson.build                          |   6 +
 app/test/test_bpf.c                           | 200 +++++
 app/test/test_pcapng.c                        | 272 ++++++
 doc/api/doxy-api-index.md                     |   1 +
 doc/api/doxy-api.conf.in                      |   1 +
 .../howto/img/packet_capture_framework.svg    |  96 +-
 doc/guides/howto/packet_capture_framework.rst |  69 +-
 doc/guides/prog_guide/index.rst               |   1 +
 doc/guides/prog_guide/pcapng_lib.rst          |  46 +
 doc/guides/prog_guide/pdump_lib.rst           |  28 +-
 doc/guides/rel_notes/release_21_11.rst        |  10 +
 doc/guides/tools/dumpcap.rst                  |  86 ++
 doc/guides/tools/index.rst                    |   1 +
 lib/bpf/bpf_convert.c                         | 575 ++++++++++++
 lib/bpf/bpf_dump.c                            | 139 +++
 lib/bpf/bpf_validate.c                        |   9 +-
 lib/bpf/meson.build                           |   6 +
 lib/bpf/rte_bpf.h                             |  39 +
 lib/bpf/version.map                           |   7 +
 lib/meson.build                               |   6 +-
 lib/pcapng/meson.build                        |   8 +
 lib/pcapng/pcapng_proto.h                     | 129 +++
 lib/pcapng/rte_pcapng.c                       | 607 +++++++++++++
 lib/pcapng/rte_pcapng.h                       | 195 ++++
 lib/pcapng/version.map                        |  12 +
 lib/pdump/meson.build                         |   2 +-
 lib/pdump/rte_pdump.c                         | 432 ++++++---
 lib/pdump/rte_pdump.h                         | 113 ++-
 lib/pdump/version.map                         |   8 +
 33 files changed, 3757 insertions(+), 219 deletions(-)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build
 create mode 100644 app/test/test_pcapng.c
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst
 create mode 100644 lib/bpf/bpf_convert.c
 create mode 100644 lib/bpf/bpf_dump.c
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map
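
For readers who have not looked at the pcapng layout that the new library
writes: each block starts with a type and a total length, and ends with a
second copy of that length. The following is only a minimal sketch of an
option-less Section Header Block in plain C. The names shb, SHB_TYPE and
shb_build are hypothetical and do not appear in the series; the real writer
is pcapng_section_block() in lib/pcapng/rte_pcapng.c, which also adds options.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; values match pcapng_proto.h. */
#define SHB_TYPE         0x0A0D0D0Au
#define BYTE_ORDER_MAGIC 0x1A2B3C4Du

/* Fixed fields of a Section Header Block, in host byte order. */
struct shb {
	uint32_t block_type;
	uint32_t block_length;
	uint32_t byte_order_magic;
	uint16_t major_version;
	uint16_t minor_version;
	uint64_t section_length;
};

/*
 * Serialize an option-less section header into `out` (at least 28 bytes)
 * and return the number of bytes written.
 */
static size_t shb_build(uint8_t *out)
{
	const uint32_t total = sizeof(struct shb) + sizeof(uint32_t);
	struct shb hdr = {
		.block_type = SHB_TYPE,
		.block_length = total,
		.byte_order_magic = BYTE_ORDER_MAGIC,
		.major_version = 1,
		.minor_version = 0,
		.section_length = UINT64_MAX,	/* length not known up front */
	};

	memcpy(out, &hdr, sizeof(hdr));
	/* Every block ends with a second copy of its total length. */
	memcpy(out + sizeof(hdr), &total, sizeof(total));
	return total;
}
```

The trailing length copy lets a reader walk the file backwards, which is
why every block writer in the library ends with the same memcpy of
block_length.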

-- 
2.30.2



* [dpdk-dev] [PATCH v15 01/12] lib: pdump is not supported on Windows
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Dmitry Kozlyuk, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

The current version of the pdump library was building on
Windows, but it was useless since the pdump utility was not being
built and Windows does not have multi-process support.

The new version of pdump with filtering now has a dependency
on bpf, but the bpf library is not available on Windows.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Cc: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>
Cc: Dmitry Malloy <dmitrym@microsoft.com>
Cc: Pallavi Kadam <pallavi.kadam@intel.com>
---
 lib/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/meson.build b/lib/meson.build
index 3b8b0998208a..5aa1be5134ed 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -86,7 +86,6 @@ if is_windows
             'gro',
             'gso',
             'latencystats',
-            'pdump',
             'stack',
             'security',
     ] # only supported libraries for windows
-- 
2.30.2



* [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 01/12] lib: pdump is not supported on Windows Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-21 14:14     ` Kinsella, Ray
                       ` (2 more replies)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 03/12] bpf: allow self-xor operation Stephen Hemminger
                     ` (10 subsequent siblings)
  12 siblings, 3 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan, Ray Kinsella

This is a utility library for writing pcapng format files
used by the Wireshark family of utilities. Older tcpdump
also knows how to read (but not write) this format.

See
  https://github.com/pcapng/pcapng/

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
---
 lib/meson.build           |   1 +
 lib/pcapng/meson.build    |   8 +
 lib/pcapng/pcapng_proto.h | 129 ++++++++
 lib/pcapng/rte_pcapng.c   | 607 ++++++++++++++++++++++++++++++++++++++
 lib/pcapng/rte_pcapng.h   | 195 ++++++++++++
 lib/pcapng/version.map    |  12 +
 6 files changed, 952 insertions(+)
 create mode 100644 lib/pcapng/meson.build
 create mode 100644 lib/pcapng/pcapng_proto.h
 create mode 100644 lib/pcapng/rte_pcapng.c
 create mode 100644 lib/pcapng/rte_pcapng.h
 create mode 100644 lib/pcapng/version.map
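
Note on option padding: the options written by this patch are TLV records
whose total size is rounded up to a 32-bit boundary, while the length field
keeps the unpadded value length. A stand-alone sketch of that arithmetic,
without DPDK headers, is given here. The names opt_hdr, opt_padded_len and
opt_append are hypothetical stand-ins for struct pcapng_option,
pcapng_optlen() and pcapng_add_option(); the NULL check is an addition of
this sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of a pcapng option: 16-bit code, then 16-bit value length. */
struct opt_hdr {
	uint16_t code;
	uint16_t length;
};

/* Bytes an option occupies in the file: header plus value, padded to 32 bits. */
static size_t opt_padded_len(uint16_t value_len)
{
	size_t raw = sizeof(struct opt_hdr) + value_len;

	return (raw + 3u) & ~(size_t)3u;
}

/*
 * Append one option at offset `off` of a zero-filled buffer and return the
 * offset of the next option. The length field holds the unpadded value length.
 */
static size_t opt_append(uint8_t *buf, size_t off, uint16_t code,
			 const void *value, uint16_t value_len)
{
	struct opt_hdr hdr = { .code = code, .length = value_len };

	/* An empty option (such as end-of-options) carries no value. */
	assert(value != NULL || value_len == 0);
	memcpy(buf + off, &hdr, sizeof(hdr));
	if (value_len != 0)
		memcpy(buf + off + sizeof(hdr), value, value_len);

	return off + opt_padded_len(value_len);
}
```

With this layout a 4-byte value occupies 8 bytes, a 5-byte value occupies
12 bytes, and the end-of-options marker occupies 4 bytes.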

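Note on timestamps: pcapng_tsc_to_ns() in this patch computes
(delta * NSEC_PER_SEC) / hz, and that 64-bit product wraps once delta
exceeds about 1.8e10 cycles, which is roughly nine seconds at 2 GHz. One
possible way around it, shown only as a sketch and not as what the patch
does, is to scale whole seconds and the sub-second remainder separately.
The names clock_base and cycles_to_ns are hypothetical, and hz is passed in
as a parameter instead of calling rte_get_tsc_hz().

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ull

/* Wall-clock time and TSC value sampled together at start-up. */
struct clock_base {
	uint64_t ns;
	uint64_t cycles;
};

/*
 * Convert a TSC sample to epoch nanoseconds.
 * Whole seconds and the sub-second remainder are scaled separately so the
 * intermediate product stays below hz * 1e9, which fits in 64 bits for
 * any hz below about 18.4 GHz.
 */
static uint64_t cycles_to_ns(const struct clock_base *base,
			     uint64_t cycles, uint64_t hz)
{
	uint64_t delta = cycles - base->cycles;
	uint64_t secs = delta / hz;
	uint64_t rem = delta % hz;

	return base->ns + secs * NS_PER_SEC + (rem * NS_PER_SEC) / hz;
}
```

The cost is one extra division per timestamp, so whether it belongs in the
per-packet path is a judgment call for the maintainers.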
diff --git a/lib/meson.build b/lib/meson.build
index 5aa1be5134ed..484b1da2b88d 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -41,6 +41,7 @@ libraries = [
         'latencystats',
         'lpm',
         'member',
+        'pcapng',
         'power',
         'pdump',
         'rawdev',
diff --git a/lib/pcapng/meson.build b/lib/pcapng/meson.build
new file mode 100644
index 000000000000..fe636bdf3c0b
--- /dev/null
+++ b/lib/pcapng/meson.build
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+version = 1
+sources = files('rte_pcapng.c')
+headers = files('rte_pcapng.h')
+
+deps += ['ethdev']
diff --git a/lib/pcapng/pcapng_proto.h b/lib/pcapng/pcapng_proto.h
new file mode 100644
index 000000000000..47161d8a1213
--- /dev/null
+++ b/lib/pcapng/pcapng_proto.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * PCAP Next Generation Capture File writer
+ *
+ * See: https://github.com/pcapng/pcapng/ for the file format.
+ */
+
+enum pcapng_block_types {
+	PCAPNG_INTERFACE_BLOCK		= 1,
+	PCAPNG_PACKET_BLOCK,		/* Obsolete */
+	PCAPNG_SIMPLE_PACKET_BLOCK,
+	PCAPNG_NAME_RESOLUTION_BLOCK,
+	PCAPNG_INTERFACE_STATS_BLOCK,
+	PCAPNG_ENHANCED_PACKET_BLOCK,
+
+	PCAPNG_SECTION_BLOCK		= 0x0A0D0D0A,
+};
+
+struct pcapng_option {
+	uint16_t code;
+	uint16_t length;
+	uint8_t data[];
+};
+
+#define PCAPNG_BYTE_ORDER_MAGIC 0x1A2B3C4D
+#define PCAPNG_MAJOR_VERS 1
+#define PCAPNG_MINOR_VERS 0
+
+enum pcapng_opt {
+	PCAPNG_OPT_END	= 0,
+	PCAPNG_OPT_COMMENT = 1,
+};
+
+struct pcapng_section_header {
+	uint32_t block_type;
+	uint32_t block_length;
+	uint32_t byte_order_magic;
+	uint16_t major_version;
+	uint16_t minor_version;
+	uint64_t section_length;
+};
+
+enum pcapng_section_opt {
+	PCAPNG_SHB_HARDWARE = 2,
+	PCAPNG_SHB_OS	    = 3,
+	PCAPNG_SHB_USERAPPL = 4,
+};
+
+struct pcapng_interface_block {
+	uint32_t block_type;	/* 1 */
+	uint32_t block_length;
+	uint16_t link_type;
+	uint16_t reserved;
+	uint32_t snap_len;
+};
+
+enum pcapng_interface_options {
+	PCAPNG_IFB_NAME	 = 2,
+	PCAPNG_IFB_DESCRIPTION,
+	PCAPNG_IFB_IPV4ADDR,
+	PCAPNG_IFB_IPV6ADDR,
+	PCAPNG_IFB_MACADDR,
+	PCAPNG_IFB_EUIADDR,
+	PCAPNG_IFB_SPEED,
+	PCAPNG_IFB_TSRESOL,
+	PCAPNG_IFB_TZONE,
+	PCAPNG_IFB_FILTER,
+	PCAPNG_IFB_OS,
+	PCAPNG_IFB_FCSLEN,
+	PCAPNG_IFB_TSOFFSET,
+	PCAPNG_IFB_HARDWARE,
+};
+
+struct pcapng_enhance_packet_block {
+	uint32_t block_type;	/* 6 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+	uint32_t capture_length;
+	uint32_t original_length;
+};
+
+/* Flags values */
+#define PCAPNG_IFB_INBOUND   0b01
+#define PCAPNG_IFB_OUTBOUND  0b10
+
+enum pcapng_epb_options {
+	PCAPNG_EPB_FLAGS = 2,
+	PCAPNG_EPB_HASH,
+	PCAPNG_EPB_DROPCOUNT,
+	PCAPNG_EPB_PACKETID,
+	PCAPNG_EPB_QUEUE,
+	PCAPNG_EPB_VERDICT,
+};
+
+enum pcapng_epb_hash {
+	PCAPNG_HASH_2COMP = 0,
+	PCAPNG_HASH_XOR,
+	PCAPNG_HASH_CRC32,
+	PCAPNG_HASH_MD5,
+	PCAPNG_HASH_SHA1,
+	PCAPNG_HASH_TOEPLITZ,
+};
+
+struct pcapng_simple_packet {
+	uint32_t block_type;	/* 3 */
+	uint32_t block_length;
+	uint32_t packet_length;
+};
+
+struct pcapng_statistics {
+	uint32_t block_type;	/* 5 */
+	uint32_t block_length;
+	uint32_t interface_id;
+	uint32_t timestamp_hi;
+	uint32_t timestamp_lo;
+};
+
+enum pcapng_isb_options {
+	PCAPNG_ISB_STARTTIME = 2,
+	PCAPNG_ISB_ENDTIME,
+	PCAPNG_ISB_IFRECV,
+	PCAPNG_ISB_IFDROP,
+	PCAPNG_ISB_FILTERACCEPT,
+	PCAPNG_ISB_OSDROP,
+	PCAPNG_ISB_USRDELIV,
+};
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
new file mode 100644
index 000000000000..3a399de3d037
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.c
@@ -0,0 +1,607 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/uio.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_pcapng.h>
+#include <rte_time.h>
+
+#include "pcapng_proto.h"
+
+/* conversion from DPDK speed to PCAPNG */
+#define PCAPNG_MBPS_SPEED 1000000ull
+
+/* Format of the capture file handle */
+struct rte_pcapng {
+	int  outfd;		/* output file */
+	/* DPDK port id to interface index in file */
+	uint32_t port_index[RTE_MAX_ETHPORTS];
+};
+
+/* For converting TSC cycles to PCAPNG ns format */
+struct pcapng_time {
+	uint64_t ns;
+	uint64_t cycles;
+} pcapng_time;
+
+RTE_INIT(pcapng_init)
+{
+	struct timespec ts;
+
+	pcapng_time.cycles = rte_get_tsc_cycles();
+	clock_gettime(CLOCK_REALTIME, &ts);
+	pcapng_time.ns = rte_timespec_to_ns(&ts);
+}
+
+/* PCAPNG timestamps are in nanoseconds */
+static uint64_t pcapng_tsc_to_ns(uint64_t cycles)
+{
+	uint64_t delta;
+
+	delta = cycles - pcapng_time.cycles;
+	return pcapng_time.ns + (delta * NSEC_PER_SEC) / rte_get_tsc_hz();
+}
+
+/* length of option including padding */
+static uint16_t pcapng_optlen(uint16_t len)
+{
+	return RTE_ALIGN(sizeof(struct pcapng_option) + len,
+			 sizeof(uint32_t));
+}
+
+/* build TLV option and return location of next */
+static struct pcapng_option *
+pcapng_add_option(struct pcapng_option *popt, uint16_t code,
+		  const void *data, uint16_t len)
+{
+	popt->code = code;
+	popt->length = len;
+	memcpy(popt->data, data, len);
+
+	return (struct pcapng_option *)((uint8_t *)popt + pcapng_optlen(len));
+}
+
+/*
+ * Write required initial section header describing the capture
+ */
+static int
+pcapng_section_block(rte_pcapng_t *self,
+		    const char *os, const char *hw,
+		    const char *app, const char *comment)
+{
+	struct pcapng_section_header *hdr;
+	struct pcapng_option *opt;
+	void *buf;
+	uint32_t len;
+	ssize_t cc;
+
+	len = sizeof(*hdr);
+	if (hw)
+		len += pcapng_optlen(strlen(hw));
+	if (os)
+		len += pcapng_optlen(strlen(os));
+	if (app)
+		len += pcapng_optlen(strlen(app));
+	if (comment)
+		len += pcapng_optlen(strlen(comment));
+
+	/* reserve space for OPT_END */
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = calloc(1, len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_section_header *)buf;
+	*hdr = (struct pcapng_section_header) {
+		.block_type = PCAPNG_SECTION_BLOCK,
+		.block_length = len,
+		.byte_order_magic = PCAPNG_BYTE_ORDER_MAGIC,
+		.major_version = PCAPNG_MAJOR_VERS,
+		.minor_version = PCAPNG_MINOR_VERS,
+		.section_length = UINT64_MAX,
+	};
+
+	/* After the section header insert variable length options. */
+	opt = (struct pcapng_option *)(hdr + 1);
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (hw)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_HARDWARE,
+					hw, strlen(hw));
+	if (os)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_OS,
+					os, strlen(os));
+	if (app)
+		opt = pcapng_add_option(opt, PCAPNG_SHB_USERAPPL,
+					app, strlen(app));
+
+	/* The standard requires last option to be OPT_END */
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	/* clone block_length after option */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	cc = write(self->outfd, buf, len);
+	free(buf);
+
+	return cc;
+}
+
+/* Write an interface block for a DPDK port */
+static int
+pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
+{
+	struct pcapng_interface_block *hdr;
+	struct rte_eth_dev_info dev_info;
+	struct rte_ether_addr *ea, macaddr;
+	const struct rte_device *dev;
+	struct rte_eth_link link;
+	struct pcapng_option *opt;
+	const uint8_t tsresol = 9;	/* nanosecond resolution */
+	uint32_t len;
+	void *buf;
+	char ifname[IF_NAMESIZE];
+	char ifhw[256];
+	uint64_t speed = 0;
+
+	if (rte_eth_dev_info_get(port, &dev_info) < 0)
+		return -1;
+
+	/* make something like an interface name */
+	if (if_indextoname(dev_info.if_index, ifname) == NULL)
+		snprintf(ifname, IF_NAMESIZE, "dpdk:%u", port);
+
+	/* make a useful device hardware string */
+	dev = dev_info.device;
+	if (dev)
+		snprintf(ifhw, sizeof(ifhw),
+			 "%s-%s", dev->bus->name, dev->name);
+
+	/* DPDK reports in units of Mbps */
+	rte_eth_link_get(port, &link);
+	if (link.link_status == ETH_LINK_UP)
+		speed = link.link_speed * PCAPNG_MBPS_SPEED;
+
+	if (rte_eth_macaddr_get(port, &macaddr) < 0)
+		ea = NULL;
+	else
+		ea = &macaddr;
+
+	/* Compute length of interface block options */
+	len = sizeof(*hdr);
+
+	len += pcapng_optlen(sizeof(tsresol));	/* timestamp */
+	len += pcapng_optlen(strlen(ifname));	/* ifname */
+
+	if (ea)
+		len += pcapng_optlen(RTE_ETHER_ADDR_LEN); /* macaddr */
+	if (speed != 0)
+		len += pcapng_optlen(sizeof(uint64_t));
+	if (dev)
+		len += pcapng_optlen(strlen(ifhw));
+
+	len += pcapng_optlen(0);
+	len += sizeof(uint32_t);
+
+	buf = alloca(len);
+	if (!buf)
+		return -1;
+
+	hdr = (struct pcapng_interface_block *)buf;
+	*hdr = (struct pcapng_interface_block) {
+		.block_type = PCAPNG_INTERFACE_BLOCK,
+		.link_type = 1,		/* DLT_EN10MB - Ethernet */
+		.block_length = len,
+	};
+
+	opt = (struct pcapng_option *)(hdr + 1);
+	opt = pcapng_add_option(opt, PCAPNG_IFB_TSRESOL,
+				&tsresol, sizeof(tsresol));
+	opt = pcapng_add_option(opt, PCAPNG_IFB_NAME,
+				ifname, strlen(ifname));
+	if (ea)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_MACADDR,
+					ea, RTE_ETHER_ADDR_LEN);
+	if (speed != 0)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_SPEED,
+					 &speed, sizeof(uint64_t));
+	if (dev)
+		opt = pcapng_add_option(opt, PCAPNG_IFB_HARDWARE,
+					 ifhw, strlen(ifhw));
+	opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	/* clone block_length after options */
+	memcpy(opt, &hdr->block_length, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+/*
+ * Write the list of possible interfaces at the start
+ * of the file.
+ */
+static int
+pcapng_interfaces(rte_pcapng_t *self)
+{
+	uint16_t port_id;
+	uint16_t index = 0;
+
+	RTE_ETH_FOREACH_DEV(port_id) {
+		/* The list of ports in pcapng needs to be contiguous */
+		self->port_index[port_id] = index++;
+		if (pcapng_add_interface(self, port_id) < 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Write an Interface statistics block at the end of capture.
+ */
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port_id,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop)
+{
+	struct pcapng_statistics *hdr;
+	struct pcapng_option *opt;
+	uint32_t optlen, len;
+	uint8_t *buf;
+	uint64_t ns;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
+
+	optlen = 0;
+
+	if (ifrecv != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		optlen += pcapng_optlen(sizeof(ifdrop));
+	if (start_time != 0)
+		optlen += pcapng_optlen(sizeof(start_time));
+	if (end_time != 0)
+		optlen += pcapng_optlen(sizeof(end_time));
+	if (comment)
+		optlen += pcapng_optlen(strlen(comment));
+	if (optlen != 0)
+		optlen += pcapng_optlen(0);
+
+	len = sizeof(*hdr) + optlen + sizeof(uint32_t);
+	buf = alloca(len);
+	if (buf == NULL)
+		return -1;
+
+	hdr = (struct pcapng_statistics *)buf;
+	opt = (struct pcapng_option *)(hdr + 1);
+
+	if (comment)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_COMMENT,
+					comment, strlen(comment));
+	if (start_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_STARTTIME,
+					 &start_time, sizeof(start_time));
+	if (end_time != 0)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_ENDTIME,
+					 &end_time, sizeof(end_time));
+	if (ifrecv != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFRECV,
+				&ifrecv, sizeof(ifrecv));
+	if (ifdrop != UINT64_MAX)
+		opt = pcapng_add_option(opt, PCAPNG_ISB_IFDROP,
+				&ifdrop, sizeof(ifdrop));
+	if (optlen != 0)
+		opt = pcapng_add_option(opt, PCAPNG_OPT_END, NULL, 0);
+
+	hdr->block_type = PCAPNG_INTERFACE_STATS_BLOCK;
+	hdr->block_length = len;
+	hdr->interface_id = self->port_index[port_id];
+
+	ns = pcapng_tsc_to_ns(rte_get_tsc_cycles());
+	hdr->timestamp_hi = ns >> 32;
+	hdr->timestamp_lo = (uint32_t)ns;
+
+	/* clone block_length after option */
+	memcpy(opt, &len, sizeof(uint32_t));
+
+	return write(self->outfd, buf, len);
+}
+
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length)
+{
+	/* The VLAN and EPB header must fit in the mbuf headroom. */
+	RTE_ASSERT(sizeof(struct pcapng_enhance_packet_block) +
+		   sizeof(struct rte_vlan_hdr) <= RTE_PKTMBUF_HEADROOM);
+
+	/* The flags and queue information are added at the end. */
+	return sizeof(struct rte_mbuf)
+		+ RTE_ALIGN(length, sizeof(uint32_t))
+		+ pcapng_optlen(sizeof(uint32_t)) /* flag option */
+		+ pcapng_optlen(sizeof(uint32_t)) /* queue option */
+		+ sizeof(uint32_t);		  /*  length */
+}
+
+/* More generalized version of rte_vlan_insert() */
+static int
+pcapng_vlan_insert(struct rte_mbuf *m, uint16_t ether_type, uint16_t tci)
+{
+	struct rte_ether_hdr *nh, *oh;
+	struct rte_vlan_hdr *vh;
+
+	if (!RTE_MBUF_DIRECT(m) || rte_mbuf_refcnt_read(m) > 1)
+		return -EINVAL;
+
+	if (rte_pktmbuf_data_len(m) < sizeof(*oh))
+		return -EINVAL;
+
+	oh = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
+	nh = (struct rte_ether_hdr *)
+		rte_pktmbuf_prepend(m, sizeof(struct rte_vlan_hdr));
+	if (nh == NULL)
+		return -ENOSPC;
+
+	memmove(nh, oh, 2 * RTE_ETHER_ADDR_LEN);
+	nh->ether_type = rte_cpu_to_be_16(ether_type);
+
+	vh = (struct rte_vlan_hdr *) (nh + 1);
+	vh->vlan_tci = rte_cpu_to_be_16(tci);
+
+	return 0;
+}
+
+/*
+ *   The mbufs created use the Pcapng standard enhanced packet block.
+ *
+ *                         1                   2                   3
+ *     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  0 |                    Block Type = 0x00000006                    |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  4 |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  8 |                         Interface ID                          |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 12 |                        Timestamp (High)                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 16 |                        Timestamp (Low)                        |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 20 |                    Captured Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 24 |                    Original Packet Length                     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * 28 /                                                               /
+ *    /                          Packet Data                          /
+ *    /              variable length, padded to 32 bits               /
+ *    /                                                               /
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0002     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Flags (direction)                                |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |      Option Code = 0x0006     |     Option Length = 0x004     |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |              Queue id                                         |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *    |                      Block Total Length                       |
+ *    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+/* Make a copy of original mbuf with pcapng header and options */
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length, uint64_t cycles,
+		enum rte_pcapng_direction direction)
+{
+	struct pcapng_enhance_packet_block *epb;
+	uint32_t orig_len, data_len, padding, flags;
+	struct pcapng_option *opt;
+	const uint16_t optlen = pcapng_optlen(sizeof(flags)) + pcapng_optlen(sizeof(queue));
+	struct rte_mbuf *mc;
+	uint64_t ns;
+
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+	ns = pcapng_tsc_to_ns(cycles);
+
+	orig_len = rte_pktmbuf_pkt_len(md);
+
+	/* Take snapshot of the data */
+	mc = rte_pktmbuf_copy(md, mp, 0, length);
+	if (unlikely(mc == NULL))
+		return NULL;
+
+	/* Expand any offloaded VLAN information */
+	if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+	     (md->ol_flags & PKT_RX_VLAN_STRIPPED)) ||
+	    (direction == RTE_PCAPNG_DIRECTION_OUT &&
+	     (md->ol_flags & PKT_TX_VLAN))) {
+		if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_VLAN,
+				       md->vlan_tci) != 0)
+			goto fail;
+	}
+
+	if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+	     (md->ol_flags & PKT_RX_QINQ_STRIPPED)) ||
+	    (direction == RTE_PCAPNG_DIRECTION_OUT &&
+	     (md->ol_flags & PKT_TX_QINQ))) {
+		if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_QINQ,
+				       md->vlan_tci_outer) != 0)
+			goto fail;
+	}
+
+	/* pad the packet to 32 bit boundary */
+	data_len = rte_pktmbuf_data_len(mc);
+	padding = RTE_ALIGN(data_len, sizeof(uint32_t)) - data_len;
+	if (padding > 0) {
+		void *tail = rte_pktmbuf_append(mc, padding);
+
+		if (tail == NULL)
+			goto fail;
+		memset(tail, 0, padding);
+	}
+
+	/* reserve trailing options and block length */
+	opt = (struct pcapng_option *)
+		rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+	if (unlikely(opt == NULL))
+		goto fail;
+
+	switch (direction) {
+	case RTE_PCAPNG_DIRECTION_IN:
+		flags = PCAPNG_IFB_INBOUND;
+		break;
+	case RTE_PCAPNG_DIRECTION_OUT:
+		flags = PCAPNG_IFB_OUTBOUND;
+		break;
+	default:
+		flags = 0;
+	}
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
+				&flags, sizeof(flags));
+
+	opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
+				&queue, sizeof(queue));
+
+	/* Note: END_OPT is not necessary here. Wireshark doesn't add it either. */
+
+	/* Add PCAPNG packet header */
+	epb = (struct pcapng_enhance_packet_block *)
+		rte_pktmbuf_prepend(mc, sizeof(*epb));
+	if (unlikely(epb == NULL))
+		goto fail;
+
+	epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
+	epb->block_length = rte_pktmbuf_data_len(mc);
+
+	/* Interface index is filled in later during write */
+	mc->port = port_id;
+
+	epb->timestamp_hi = ns >> 32;
+	epb->timestamp_lo = (uint32_t)ns;
+	epb->capture_length = data_len;
+	epb->original_length = orig_len;
+
+	/* set trailer of block length */
+	*(uint32_t *)opt = epb->block_length;
+
+	return mc;
+
+fail:
+	rte_pktmbuf_free(mc);
+	return NULL;
+}
+
+/* Count how many segments are in this array of mbufs */
+static unsigned int
+mbuf_burst_segs(struct rte_mbuf *pkts[], unsigned int n)
+{
+	unsigned int i, iovcnt;
+
+	for (iovcnt = 0, i = 0; i < n; i++) {
+		const struct rte_mbuf *m = pkts[i];
+
+		__rte_mbuf_sanity_check(m, 1);
+
+		iovcnt += m->nb_segs;
+	}
+	return iovcnt;
+}
+
+/* Write pre-formatted packets to file. */
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts)
+{
+	int iovcnt = mbuf_burst_segs(pkts, nb_pkts);
+	struct iovec iov[iovcnt];
+	unsigned int i, cnt;
+	ssize_t ret;
+
+	for (i = cnt = 0; i < nb_pkts; i++) {
+		struct rte_mbuf *m = pkts[i];
+		struct pcapng_enhance_packet_block *epb;
+
+		/* sanity check that this is really a pcapng mbuf */
+		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
+		if (unlikely(epb->block_type != PCAPNG_ENHANCED_PACKET_BLOCK ||
+			     epb->block_length != rte_pktmbuf_data_len(m))) {
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/*
+		 * The DPDK port is recorded during pcapng_copy.
+		 * Map that to PCAPNG interface in file.
+		 */
+		epb->interface_id = self->port_index[m->port];
+		do {
+			iov[cnt].iov_base = rte_pktmbuf_mtod(m, void *);
+			iov[cnt].iov_len = rte_pktmbuf_data_len(m);
+			++cnt;
+		} while ((m = m->next));
+	}
+
+	ret = writev(self->outfd, iov, iovcnt);
+	if (unlikely(ret < 0))
+		rte_errno = errno;
+	return ret;
+}
+
+/* Create new pcapng writer handle */
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment)
+{
+	rte_pcapng_t *self;
+
+	self = malloc(sizeof(*self));
+	if (!self) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	self->outfd = fd;
+
+	if (pcapng_section_block(self, osname, hardware, appname, comment) < 0)
+		goto fail;
+
+	if (pcapng_interfaces(self) < 0)
+		goto fail;
+
+	return self;
+fail:
+	free(self);
+	return NULL;
+}
+
+void
+rte_pcapng_close(rte_pcapng_t *self)
+{
+	close(self->outfd);
+	free(self);
+}
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
new file mode 100644
index 000000000000..8d3fbb1941b4
--- /dev/null
+++ b/lib/pcapng/rte_pcapng.h
@@ -0,0 +1,195 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Microsoft Corporation
+ */
+
+/**
+ * @file
+ * RTE pcapng
+ *
+ * @warning
+ * @b EXPERIMENTAL:
+ * All functions in this file may be changed or removed without prior notice.
+ *
+ * Pcapng is an evolution from the pcap format, created to address some of
+ * its deficiencies. Namely, the lack of extensibility and inability to store
+ * additional information.
+ *
+ * For details about the file format see RFC:
+ *   https://www.ietf.org/id/draft-tuexen-opsawg-pcapng-03.html
+ *  and
+ *    https://github.com/pcapng/pcapng/
+ */
+
+#ifndef _RTE_PCAPNG_H_
+#define _RTE_PCAPNG_H_
+
+#include <stdint.h>
+#include <sys/types.h>
+#include <rte_compat.h>
+#include <rte_common.h>
+#include <rte_mempool.h>
+#include <rte_ring.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Opaque handle used for functions in this library. */
+typedef struct rte_pcapng rte_pcapng_t;
+
+/**
+ * Write data to existing open file
+ *
+ * @param fd
+ *   file descriptor
+ * @param osname
+ *   Optional description of the operating system.
+ *   Examples: "Debian 11", "Windows Server 22"
+ * @param hardware
+ *   Optional description of the hardware used to create this file.
+ *   Examples: "x86 Virtual Machine"
+ * @param appname
+ *   Optional: application name recorded in the pcapng file.
+ *   Example: "dpdk-dumpcap 1.0 (DPDK 20.11)"
+ * @param comment
+ *   Optional comment to add to file header.
+ * @return
+ *   handle to library, or NULL in case of error (and rte_errno is set).
+ */
+__rte_experimental
+rte_pcapng_t *
+rte_pcapng_fdopen(int fd,
+		  const char *osname, const char *hardware,
+		  const char *appname, const char *comment);
+
+/**
+ * Close capture file
+ *
+ * @param self
+ *  handle to library
+ */
+__rte_experimental
+void
+rte_pcapng_close(rte_pcapng_t *self);
+
+/**
+ * Direction flag
+ * These should match Enhanced Packet Block flag bits
+ */
+enum rte_pcapng_direction {
+	RTE_PCAPNG_DIRECTION_UNKNOWN = 0,
+	RTE_PCAPNG_DIRECTION_IN  = 1,
+	RTE_PCAPNG_DIRECTION_OUT = 2,
+};
+
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data.
+ * @param timestamp
+ *   The timestamp in TSC cycles.
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for rte_pcapng_write_packets()
+ *   - NULL if allocation fails.
+ *
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length, uint64_t timestamp,
+		enum rte_pcapng_direction direction);
+
+
+/**
+ * Determine optimum mbuf data size.
+ *
+ * @param length
+ *   The largest packet that will be copied.
+ * @return
+ *   The minimum size of mbuf data to handle packet with length bytes.
+ *   Accounting for required header and trailer fields
+ */
+__rte_experimental
+uint32_t
+rte_pcapng_mbuf_size(uint32_t length);
+
+/**
+ * Write packets to the capture file.
+ *
+ * Packets to be captured are copied by rte_pcapng_copy()
+ * and then this function is called to write them to the file.
+ *
+ * @warning
+ * Do not pass original mbufs from transmit or receive
+ * or file will be invalid pcapng format.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param pkts
+ *  The address of an array of *nb_pkts* pointers to *rte_mbuf* structures
+ *  which contain the output packets
+ * @param nb_pkts
+ *  The number of packets to write to the file.
+ * @return
+ *  The number of bytes written to file, -1 on failure to write file.
+ *  The mbuf's in *pkts* are always freed.
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_packets(rte_pcapng_t *self,
+			 struct rte_mbuf *pkts[], uint16_t nb_pkts);
+
+/**
+ * Write an Interface statistics block.
+ * For times, use 0 if not known; for counters, use UINT64_MAX if not known.
+ * Should be called before closing capture to report results.
+ *
+ * @param self
+ *  The handle to the packet capture file
+ * @param port
+ *  The Ethernet port to report stats on.
+ * @param comment
+ *   Optional comment to add to statistics.
+ * @param start_time
+ *  The time when packet capture was started in nanoseconds.
+ *  Optional: can be zero if not known.
+ * @param end_time
+ *  The time when packet capture was stopped in nanoseconds.
+ *  Optional: can be zero if not finished.
+ * @param ifrecv
+ *  The number of packets received by capture.
+ *  Optional: use UINT64_MAX if not known.
+ * @param ifdrop
+ *  The number of packets missed by the capture process.
+ *  Optional: use UINT64_MAX if not known.
+ * @return
+ *  number of bytes written to file, -1 on failure to write file
+ */
+__rte_experimental
+ssize_t
+rte_pcapng_write_stats(rte_pcapng_t *self, uint16_t port,
+		       const char *comment,
+		       uint64_t start_time, uint64_t end_time,
+		       uint64_t ifrecv, uint64_t ifdrop);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PCAPNG_H_ */
diff --git a/lib/pcapng/version.map b/lib/pcapng/version.map
new file mode 100644
index 000000000000..05a9c86a7d91
--- /dev/null
+++ b/lib/pcapng/version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_pcapng_close;
+	rte_pcapng_copy;
+	rte_pcapng_fdopen;
+	rte_pcapng_mbuf_size;
+	rte_pcapng_write_packets;
+	rte_pcapng_write_stats;
+
+	local: *;
+};
-- 
2.30.2
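
As a rough illustration of the size arithmetic in rte_pcapng_copy() and
rte_pcapng_mbuf_size() from the patch above, the standalone sketch below
computes the Enhanced Packet Block length for a captured packet. The
names align4(), opt_len(), epb_block_len(), EPB_HDR_LEN and OPT_HDR_LEN
are made up for this example. The 28-byte header comes from the layout
diagram in the patch; the option padding rule is an assumption about what
pcapng_optlen() does (4-byte option header plus value, rounded up to 32
bits), since that helper is not shown in this part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EPB_HDR_LEN 28u  /* fixed EPB fields, offsets 0..27 in the diagram */
#define OPT_HDR_LEN 4u   /* option code (2 bytes) + option length (2 bytes) */

/* Round n up to the next multiple of 4. */
static uint32_t align4(uint32_t n)
{
	return (n + 3u) & ~3u;
}

/* Space taken by one option carrying value_len bytes (assumed rule). */
static uint32_t opt_len(uint32_t value_len)
{
	return align4(OPT_HDR_LEN + value_len);
}

/* Total block length for a packet with cap_len captured bytes. */
static uint32_t epb_block_len(uint32_t cap_len)
{
	/* Keep the sum below from wrapping around. */
	assert(cap_len <= UINT32_MAX - 64);

	return EPB_HDR_LEN
		+ align4(cap_len)		/* packet data, padded */
		+ opt_len(sizeof(uint32_t))	/* flags option */
		+ opt_len(sizeof(uint32_t))	/* queue option */
		+ (uint32_t)sizeof(uint32_t);	/* trailing block length */
}
```

Under these assumptions a 60-byte frame gives a 108-byte block
(28 + 60 + 8 + 8 + 4), and a 61-byte frame is first padded to 64 bytes
of data, giving 112.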



* [dpdk-dev] [PATCH v15 03/12] bpf: allow self-xor operation
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 01/12] lib: pdump is not supported on Windows Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

Some BPF programs XOR a register with itself as a way to
zero the register in one instruction.
The BPF filter converter generates this in the prolog
of the generated code.

The BPF validator would not allow this because the value of
the register was undefined. But after this operation it is always zero.
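
To illustrate why the result can be treated as a known constant:
x ^ x is zero for every x, so the register holds 0 after the instruction
no matter what it held before. The sketch below is a toy model, not DPDK
code; toy_reg, toy_xor() and the "defined" flag are hypothetical names
standing in for the validator's notion of a register with a known value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy register: a value plus a flag saying whether the value is known. */
struct toy_reg {
	uint64_t val;
	bool defined;
};

/*
 * Toy evaluation of "dst ^= src" on register numbers.
 * Returns false if the operation would read an undefined register.
 */
static bool toy_xor(struct toy_reg *regs, unsigned int dst, unsigned int src)
{
	assert(regs);

	if (dst == src) {
		/* Self-XOR: result is 0 whatever the register held. */
		regs[dst].val = 0;
		regs[dst].defined = true;
		return true;
	}
	if (!regs[dst].defined || !regs[src].defined)
		return false;

	regs[dst].val ^= regs[src].val;
	return true;
}
```

In this model, XOR-ing an undefined register into another one is
rejected, while XOR-ing it with itself is accepted and leaves it defined
with value 0.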

Fixes: 8021917293d0 ("bpf: add extra validation for input BPF program")
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/bpf/bpf_validate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/bpf/bpf_validate.c b/lib/bpf/bpf_validate.c
index 7b1291b382e9..853279fee557 100644
--- a/lib/bpf/bpf_validate.c
+++ b/lib/bpf/bpf_validate.c
@@ -661,8 +661,15 @@ eval_alu(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
 
 	op = BPF_OP(ins->code);
 
+	/* Allow self-xor as way to zero register */
+	if (op == BPF_XOR && BPF_SRC(ins->code) == BPF_X &&
+	    ins->src_reg == ins->dst_reg) {
+		eval_fill_imm(&rs, UINT64_MAX, 0);
+		eval_fill_imm(rd, UINT64_MAX, 0);
+	}
+
 	err = eval_defined((op != EBPF_MOV) ? rd : NULL,
-			(op != BPF_NEG) ? &rs : NULL);
+			   (op != BPF_NEG) ? &rs : NULL);
 	if (err != NULL)
 		return err;
 
-- 
2.30.2



* [dpdk-dev] [PATCH v15 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (2 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 03/12] bpf: allow self-xor operation Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-21 14:15     ` Kinsella, Ray
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
                     ` (8 subsequent siblings)
  12 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Ray Kinsella

The pcap library emits classic BPF (32 bit) and is useful for
creating filter programs.  The DPDK BPF library only implements
extended BPF (eBPF).  Add a function to convert from old to
new.
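
As a rough picture of what such a conversion involves: a classic BPF
instruction has a 16-bit opcode, two 8-bit jump offsets and a 32-bit
operand k, and works on an implicit accumulator A and index register X,
whereas an eBPF instruction names its destination and source registers
explicitly. The sketch below converts a single "ld #k" or "ldx #k"
(load immediate) into a 32-bit move. The names cbpf_insn,
ebpf_insn_sketch, SK_* and convert_ld_imm() are local stand-ins defined
only for this example, using the usual classic BPF and eBPF opcode
values and the A -> r0, X -> r7 mapping from this patch; it is not the
code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Classic BPF instruction layout (same shape as struct bpf_insn). */
struct cbpf_insn {
	uint16_t code;
	uint8_t jt;
	uint8_t jf;
	uint32_t k;
};

/* eBPF instruction layout (same shape as struct ebpf_insn). */
struct ebpf_insn_sketch {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

enum {
	SK_LD = 0x00, SK_LDX = 0x01, SK_ALU = 0x04,	/* instruction classes */
	SK_IMM = 0x00, SK_K = 0x00,			/* mode / source */
	SK_MOV = 0xb0,					/* eBPF move opcode */
	SK_REG_A = 0, SK_REG_X = 7,			/* A -> r0, X -> r7 */
};

/* Convert "ld #k" or "ldx #k"; returns false for anything else. */
static bool convert_ld_imm(const struct cbpf_insn *in,
			   struct ebpf_insn_sketch *out)
{
	assert(in && out);

	if (in->code != (SK_LD | SK_IMM) && in->code != (SK_LDX | SK_IMM))
		return false;

	out->code = SK_ALU | SK_MOV | SK_K;	/* 32-bit mov immediate */
	out->dst_reg = (in->code == (SK_LD | SK_IMM)) ? SK_REG_A : SK_REG_X;
	out->src_reg = 0;
	out->off = 0;
	out->imm = (int32_t)in->k;
	return true;
}
```

A full converter also has to rewrite jump offsets, because one classic
instruction may expand into several eBPF instructions.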

The rte_bpf_convert function uses rte_malloc to put the resulting
program in hugepage shared memory so it can be passed from a
secondary process to a primary process.

The code to convert was originally done as part of the Linux
kernel implementation then converted to a userspace program.
See https://github.com/tklauser/filter2xdp

Both authors have agreed that it is allowable to create a modified
version of this code and license it with BSD license used by DPDK.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_convert.c | 575 ++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build   |   5 +
 lib/bpf/rte_bpf.h     |  25 ++
 lib/bpf/version.map   |   6 +
 4 files changed, 611 insertions(+)
 create mode 100644 lib/bpf/bpf_convert.c

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
new file mode 100644
index 000000000000..db84add7dcce
--- /dev/null
+++ b/lib/bpf/bpf_convert.c
@@ -0,0 +1,575 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Microsoft Corporation
+ *
+ * Based on bpf_convert_filter() in the Linux kernel sources
+ * and filter2xdp.
+ *
+ * Licensed as BSD with permission of the original authors.
+ * Copyright (C) 2017 Tobias Klauser
+ * Copyright (c) 2011 - 2014 PLUMgrid, http://plumgrid.com
+ */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_bpf.h>
+#include <rte_log.h>
+#include <rte_malloc.h>
+#include <rte_errno.h>
+
+/* Workaround name conflicts with libpcap */
+#define bpf_validate(f, len) bpf_validate_libpcap(f, len)
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+#undef bpf_validate
+
+#include "bpf_impl.h"
+#include "bpf_def.h"
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/*
+ * Linux socket filter uses negative absolute offsets to
+ * reference ancillary data.
+ */
+#define SKF_AD_OFF    (-0x1000)
+#define SKF_AD_PROTOCOL 0
+#define SKF_AD_PKTTYPE	4
+#define SKF_AD_IFINDEX	8
+#define SKF_AD_NLATTR	12
+#define SKF_AD_NLATTR_NEST	16
+#define SKF_AD_MARK	20
+#define SKF_AD_QUEUE	24
+#define SKF_AD_HATYPE	28
+#define SKF_AD_RXHASH	32
+#define SKF_AD_CPU	36
+#define SKF_AD_ALU_XOR_X	40
+#define SKF_AD_VLAN_TAG	44
+#define SKF_AD_VLAN_TAG_PRESENT 48
+#define SKF_AD_PAY_OFFSET	52
+#define SKF_AD_RANDOM	56
+#define SKF_AD_VLAN_TPID	60
+#define SKF_AD_MAX	64
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1	EBPF_REG_1
+#define BPF_REG_ARG2	EBPF_REG_2
+#define BPF_REG_ARG3	EBPF_REG_3
+#define BPF_REG_ARG4	EBPF_REG_4
+#define BPF_REG_ARG5	EBPF_REG_5
+#define BPF_REG_CTX	EBPF_REG_6
+#define BPF_REG_FP	EBPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A	EBPF_REG_0
+#define BPF_REG_X	EBPF_REG_7
+#define BPF_REG_TMP	EBPF_REG_8
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define EBPF_ALU64_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | BPF_OP(OP) | BPF_X,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU32_IMM(OP, DST, IMM)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = EBPF_ALU64 | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+#define BPF_MOV32_REG(DST, SRC)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_X,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV32_IMM(DST, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_ALU | EBPF_MOV | BPF_SRC(TYPE),	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM)					\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
+	((struct ebpf_insn) {					\
+		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
+		.dst_reg = DST,					\
+		.src_reg = 0,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
+	((struct ebpf_insn) {					\
+		.code  = CODE,					\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = OFF,					\
+		.imm   = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN()						\
+	((struct ebpf_insn) {					\
+		.code  = BPF_JMP | EBPF_EXIT,			\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = 0 })
+
+/*
+ * Placeholder to convert BPF extensions like length and VLAN tag
+ * If and when DPDK BPF supports them.
+ */
+static bool convert_bpf_load(const struct bpf_insn *fp,
+			     struct ebpf_insn **new_insnp __rte_unused)
+{
+	switch (fp->k) {
+	case SKF_AD_OFF + SKF_AD_PROTOCOL:
+	case SKF_AD_OFF + SKF_AD_PKTTYPE:
+	case SKF_AD_OFF + SKF_AD_IFINDEX:
+	case SKF_AD_OFF + SKF_AD_HATYPE:
+	case SKF_AD_OFF + SKF_AD_MARK:
+	case SKF_AD_OFF + SKF_AD_RXHASH:
+	case SKF_AD_OFF + SKF_AD_QUEUE:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG:
+	case SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT:
+	case SKF_AD_OFF + SKF_AD_VLAN_TPID:
+	case SKF_AD_OFF + SKF_AD_PAY_OFFSET:
+	case SKF_AD_OFF + SKF_AD_NLATTR:
+	case SKF_AD_OFF + SKF_AD_NLATTR_NEST:
+	case SKF_AD_OFF + SKF_AD_CPU:
+	case SKF_AD_OFF + SKF_AD_RANDOM:
+	case SKF_AD_OFF + SKF_AD_ALU_XOR_X:
+		/* Linux has special negative offsets to access meta-data. */
+		RTE_BPF_LOG(ERR,
+			    "rte_bpf_convert: socket offset %d not supported\n",
+			    fp->k - SKF_AD_OFF);
+		return true;
+	default:
+		return false;
+	}
+}
+
+static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
+			      struct ebpf_insn *new_prog, uint32_t *new_len)
+{
+	unsigned int pass = 0;
+	size_t new_flen = 0, target, i;
+	struct ebpf_insn *new_insn;
+	const struct bpf_insn *fp;
+	int *addrs = NULL;
+	uint8_t bpf_src;
+
+	if (len > BPF_MAXINSNS) {
+		RTE_BPF_LOG(ERR, "%s: cBPF program too long (%zu insns)\n",
+			    __func__, len);
+		return -EINVAL;
+	}
+
+	/* On second pass, allocate the table of new instruction addresses */
+	if (new_prog) {
+		addrs = calloc(len, sizeof(*addrs));
+		if (addrs == NULL)
+			return -ENOMEM;
+	}
+
+do_pass:
+	new_insn = new_prog;
+	fp = prog;
+
+	/* Classic BPF related prologue emission. */
+	if (new_insn) {
+		/* Classic BPF expects A and X to be reset first. These need
+		 * to be guaranteed to be the first two instructions.
+		 */
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		/* All programs must keep CTX in callee saved BPF_REG_CTX.
+		 * In eBPF case it's done by the compiler, here we need to
+		 * do this ourselves. Initial CTX is present in BPF_REG_ARG1.
+		 */
+		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
+	} else {
+		new_insn += 3;
+	}
+
+	for (i = 0; i < len; fp++, i++) {
+		struct ebpf_insn tmp_insns[6] = { };
+		struct ebpf_insn *insn = tmp_insns;
+
+		if (addrs)
+			addrs[i] = new_insn - new_prog;
+
+		switch (fp->code) {
+			/* Absolute loads are how classic BPF accesses skb */
+		case BPF_LD | BPF_ABS | BPF_W:
+		case BPF_LD | BPF_ABS | BPF_H:
+		case BPF_LD | BPF_ABS | BPF_B:
+			if (convert_bpf_load(fp, &insn))
+				goto err;
+
+			*insn = BPF_RAW_INSN(fp->code, 0, 0, 0, fp->k);
+			break;
+
+		case BPF_ALU | BPF_DIV | BPF_X:
+		case BPF_ALU | BPF_MOD | BPF_X:
+			/* For cBPF, don't cause floating point exception */
+			*insn++ = BPF_MOV32_REG(BPF_REG_X, BPF_REG_X);
+			*insn++ = BPF_JMP_IMM(EBPF_JNE, BPF_REG_X, 0, 2);
+			*insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+			*insn++ = BPF_EXIT_INSN();
+			/* fallthrough */
+		case BPF_ALU | BPF_ADD | BPF_X:
+		case BPF_ALU | BPF_ADD | BPF_K:
+		case BPF_ALU | BPF_SUB | BPF_X:
+		case BPF_ALU | BPF_SUB | BPF_K:
+		case BPF_ALU | BPF_AND | BPF_X:
+		case BPF_ALU | BPF_AND | BPF_K:
+		case BPF_ALU | BPF_OR | BPF_X:
+		case BPF_ALU | BPF_OR | BPF_K:
+		case BPF_ALU | BPF_LSH | BPF_X:
+		case BPF_ALU | BPF_LSH | BPF_K:
+		case BPF_ALU | BPF_RSH | BPF_X:
+		case BPF_ALU | BPF_RSH | BPF_K:
+		case BPF_ALU | BPF_XOR | BPF_X:
+		case BPF_ALU | BPF_XOR | BPF_K:
+		case BPF_ALU | BPF_MUL | BPF_X:
+		case BPF_ALU | BPF_MUL | BPF_K:
+		case BPF_ALU | BPF_DIV | BPF_K:
+		case BPF_ALU | BPF_MOD | BPF_K:
+		case BPF_ALU | BPF_NEG:
+		case BPF_LD | BPF_IND | BPF_W:
+		case BPF_LD | BPF_IND | BPF_H:
+		case BPF_LD | BPF_IND | BPF_B:
+			/* All arithmetic insns map as-is. */
+			insn->code = fp->code;
+			insn->dst_reg = BPF_REG_A;
+			bpf_src = BPF_SRC(fp->code);
+			insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			insn->off = 0;
+			insn->imm = fp->k;
+			break;
+
+			/* Jump transformation cannot use BPF block macros
+			 * everywhere as offset calculation and target updates
+			 * require a bit more work than the rest, i.e. jump
+			 * opcodes map as-is, but offsets need adjustment.
+			 */
+
+#define BPF_EMIT_JMP							\
+			do {						\
+				if (target >= len)			\
+					goto err;			\
+				insn->off = addrs ? addrs[target] - addrs[i] - 1 : 0; \
+				/* Adjust pc relative offset for 2nd or 3rd insn. */ \
+				insn->off -= insn - tmp_insns;		\
+			} while (0)
+
+		case BPF_JMP | BPF_JA:
+			target = i + fp->k + 1;
+			insn->code = fp->code;
+			BPF_EMIT_JMP;
+			break;
+
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+			if (BPF_SRC(fp->code) == BPF_K && (int) fp->k < 0) {
+				/* BPF immediates are signed, zero extend
+				 * immediate into tmp register and use it
+				 * in compare insn.
+				 */
+				*insn++ = BPF_MOV32_IMM(BPF_REG_TMP, fp->k);
+
+				insn->dst_reg = BPF_REG_A;
+				insn->src_reg = BPF_REG_TMP;
+				bpf_src = BPF_X;
+			} else {
+				insn->dst_reg = BPF_REG_A;
+				insn->imm = fp->k;
+				bpf_src = BPF_SRC(fp->code);
+				insn->src_reg = bpf_src == BPF_X ? BPF_REG_X : 0;
+			}
+
+			/* Common case where 'jump_false' is next insn. */
+			if (fp->jf == 0) {
+				insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+				target = i + fp->jt + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Convert JEQ into JNE when 'jump_true' is next insn. */
+			if (fp->jt == 0 && BPF_OP(fp->code) == BPF_JEQ) {
+				insn->code = BPF_JMP | EBPF_JNE | bpf_src;
+				target = i + fp->jf + 1;
+				BPF_EMIT_JMP;
+				break;
+			}
+
+			/* Other jumps are mapped into two insns: Jxx and JA. */
+			target = i + fp->jt + 1;
+			insn->code = BPF_JMP | BPF_OP(fp->code) | bpf_src;
+			BPF_EMIT_JMP;
+			insn++;
+
+			insn->code = BPF_JMP | BPF_JA;
+			target = i + fp->jf + 1;
+			BPF_EMIT_JMP;
+			break;
+
+			/* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */
+		case BPF_LDX | BPF_MSH | BPF_B:
+			/* tmp = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_TMP, BPF_REG_A);
+			/* A = BPF_R0 = *(u8 *) (skb->data + K) */
+			*insn++ = BPF_LD_ABS(BPF_B, fp->k);
+			/* A &= 0xf */
+			*insn++ = BPF_ALU32_IMM(BPF_AND, BPF_REG_A, 0xf);
+			/* A <<= 2 */
+			*insn++ = BPF_ALU32_IMM(BPF_LSH, BPF_REG_A, 2);
+			/* X = A */
+			*insn++ = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			/* A = tmp */
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP);
+			break;
+
+			/* RET_K is remapped into 2 insns. RET_A case doesn't need an
+			 * extra mov as EBPF_REG_0 is already mapped into BPF_REG_A.
+			 */
+		case BPF_RET | BPF_A:
+		case BPF_RET | BPF_K:
+			if (BPF_RVAL(fp->code) == BPF_K) {
+				*insn++ = BPF_MOV32_RAW(BPF_K, EBPF_REG_0,
+							0, fp->k);
+			}
+			*insn = BPF_EXIT_INSN();
+			break;
+
+			/* Store to stack. */
+		case BPF_ST:
+		case BPF_STX:
+			*insn = BPF_STX_MEM(BPF_W, BPF_REG_FP, BPF_CLASS(fp->code) ==
+					    BPF_ST ? BPF_REG_A : BPF_REG_X,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* Load from stack. */
+		case BPF_LD | BPF_MEM:
+		case BPF_LDX | BPF_MEM:
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD  ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_FP,
+					    -(BPF_MEMWORDS - fp->k) * 4);
+			break;
+
+			/* A = K or X = K */
+		case BPF_LD | BPF_IMM:
+		case BPF_LDX | BPF_IMM:
+			*insn = BPF_MOV32_IMM(BPF_CLASS(fp->code) == BPF_LD ?
+					      BPF_REG_A : BPF_REG_X, fp->k);
+			break;
+
+			/* X = A */
+		case BPF_MISC | BPF_TAX:
+			*insn = BPF_MOV64_REG(BPF_REG_X, BPF_REG_A);
+			break;
+
+			/* A = X */
+		case BPF_MISC | BPF_TXA:
+			*insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_X);
+			break;
+
+			/* A = mbuf->len or X = mbuf->len */
+		case BPF_LD | BPF_W | BPF_LEN:
+		case BPF_LDX | BPF_W | BPF_LEN:
+			/* BPF_ABS/BPF_IND implicitly expect mbuf ptr in R6 */
+
+			*insn = BPF_LDX_MEM(BPF_W, BPF_CLASS(fp->code) == BPF_LD ?
+					    BPF_REG_A : BPF_REG_X, BPF_REG_CTX,
+					    offsetof(struct rte_mbuf, pkt_len));
+			break;
+
+			/* Unknown instruction. */
+		default:
+			RTE_BPF_LOG(ERR, "%s: Unknown instruction!: %#x\n",
+				    __func__, fp->code);
+			goto err;
+		}
+
+		insn++;
+		if (new_prog)
+			memcpy(new_insn, tmp_insns,
+			       sizeof(*insn) * (insn - tmp_insns));
+		new_insn += insn - tmp_insns;
+	}
+
+	if (!new_prog) {
+		/* Only calculating new length. */
+		*new_len = new_insn - new_prog;
+		return 0;
+	}
+
+	pass++;
+	if ((ptrdiff_t)new_flen != new_insn - new_prog) {
+		new_flen = new_insn - new_prog;
+		if (pass > 2)
+			goto err;
+		goto do_pass;
+	}
+
+	free(addrs);
+	assert(*new_len == new_flen);
+
+	return 0;
+err:
+	free(addrs);
+	return -1;
+}
+
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog)
+{
+	struct rte_bpf_prm *prm = NULL;
+	struct ebpf_insn *ebpf = NULL;
+	uint32_t ebpf_len = 0;
+	int ret;
+
+	if (prog == NULL) {
+		RTE_BPF_LOG(ERR, "%s: NULL program\n", __func__);
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* 1st pass: calculate the eBPF program length */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, NULL, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot get eBPF length\n", __func__);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(DEBUG, "%s: prog len cBPF=%u -> eBPF=%u\n",
+		    __func__, prog->bf_len, ebpf_len);
+
+	prm = rte_zmalloc("bpf_filter",
+			  sizeof(*prm) + ebpf_len * sizeof(*ebpf), 0);
+	if (prm == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	/* The eBPF instructions in this case are right after the header */
+	ebpf = (void *)(prm + 1);
+
+	/* 2nd pass: remap cBPF to eBPF instructions  */
+	ret = bpf_convert_filter(prog->bf_insns, prog->bf_len, ebpf, &ebpf_len);
+	if (ret < 0) {
+		RTE_BPF_LOG(ERR, "%s: cannot convert cBPF to eBPF\n", __func__);
+		rte_free(prm);
+		rte_errno = -ret;
+		return NULL;
+	}
+
+	prm->ins = ebpf;
+	prm->nb_ins = ebpf_len;
+
+	/* Classic BPF programs use mbufs */
+	prm->prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
+	prm->prog_arg.size = sizeof(struct rte_mbuf);
+
+	return prm;
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 63cbd60185e0..54f7610ae990 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -25,3 +25,8 @@ if dep.found()
     sources += files('bpf_load_elf.c')
     ext_deps += dep
 endif
+
+if dpdk_conf.has('RTE_PORT_PCAP')
+    sources += files('bpf_convert.c')
+    ext_deps += pcap_dep
+endif
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 69116f36ba8b..2f23e272a376 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,31 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+#ifdef RTE_PORT_PCAP
+
+struct bpf_program;
+
+/**
+ * Convert a Classic BPF program from libpcap into DPDK BPF code.
+ *
+ * @param prog
+ *  Classic BPF program from pcap_compile().
+ * @param prm
+ *  Result Extended BPF program.
+ * @return
+ *   Pointer to BPF program (allocated with *rte_malloc*)
+ *   that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+__rte_experimental
+struct rte_bpf_prm *
+rte_bpf_convert(const struct bpf_program *prog);
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 0bf35f487666..47082d5003ef 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -14,3 +14,9 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_convert;
+};
-- 
2.30.2



* [dpdk-dev] [PATCH v15 05/12] bpf: add function to dump eBPF instructions
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (3 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-21 14:15     ` Kinsella, Ray
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering Stephen Hemminger
                     ` (7 subsequent siblings)
  12 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Ray Kinsella

When debugging converted (and other) programs it is useful
to see disassembled eBPF output.
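
As a rough illustration of the output shape (a sketch, not the code
in this patch): a jump label is the index of the next instruction plus
the signed offset. The names demo_insn, demo_jump_target() and
demo_format_jeq() are hypothetical stand-ins, not part of the DPDK API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct ebpf_insn. */
struct demo_insn {
	uint8_t dst_reg;
	int16_t off;	/* signed jump offset, relative to the next insn */
	int32_t imm;
};

/* Jump target index: next instruction (pc + 1) plus the offset. */
static int
demo_jump_target(uint32_t pc, int16_t off)
{
	return (int)pc + 1 + off;
}

/* Format "jeq rD, #imm, L<target>" into buf; returns the snprintf() result. */
static int
demo_format_jeq(char *buf, size_t len, uint32_t pc, const struct demo_insn *ins)
{
	assert(buf != NULL && ins != NULL);
	return snprintf(buf, len, " L%u:\tjeq r%u, #0x%x, L%d", pc,
			(unsigned int)ins->dst_reg, (unsigned int)ins->imm,
			demo_jump_target(pc, ins->off));
}
```

A caller would format each instruction in turn and print the buffer, so
that every jump names the L<n> label of its destination line.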

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/bpf/bpf_dump.c  | 139 ++++++++++++++++++++++++++++++++++++++++++++
 lib/bpf/meson.build |   1 +
 lib/bpf/rte_bpf.h   |  14 +++++
 lib/bpf/version.map |   1 +
 4 files changed, 155 insertions(+)
 create mode 100644 lib/bpf/bpf_dump.c

diff --git a/lib/bpf/bpf_dump.c b/lib/bpf/bpf_dump.c
new file mode 100644
index 000000000000..b86977b96d08
--- /dev/null
+++ b/lib/bpf/bpf_dump.c
@@ -0,0 +1,139 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Stephen Hemminger
+ * Based on filter2xdp
+ * Copyright (C) 2017 Tobias Klauser
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+
+#include "rte_bpf.h"
+
+#define BPF_OP_INDEX(x) (BPF_OP(x) >> 4)
+#define BPF_SIZE_INDEX(x) (BPF_SIZE(x) >> 3)
+
+static const char *const class_tbl[] = {
+	[BPF_LD] = "ld",   [BPF_LDX] = "ldx",	 [BPF_ST] = "st",
+	[BPF_STX] = "stx", [BPF_ALU] = "alu",	 [BPF_JMP] = "jmp",
+	[BPF_RET] = "ret", [BPF_MISC] = "alu64",
+};
+
+static const char *const alu_op_tbl[16] = {
+	[BPF_ADD >> 4] = "add",	   [BPF_SUB >> 4] = "sub",
+	[BPF_MUL >> 4] = "mul",	   [BPF_DIV >> 4] = "div",
+	[BPF_OR >> 4] = "or",	   [BPF_AND >> 4] = "and",
+	[BPF_LSH >> 4] = "lsh",	   [BPF_RSH >> 4] = "rsh",
+	[BPF_NEG >> 4] = "neg",	   [BPF_MOD >> 4] = "mod",
+	[BPF_XOR >> 4] = "xor",	   [EBPF_MOV >> 4] = "mov",
+	[EBPF_ARSH >> 4] = "arsh", [EBPF_END >> 4] = "endian",
+};
+
+static const char *const size_tbl[] = {
+	[BPF_W >> 3] = "w",
+	[BPF_H >> 3] = "h",
+	[BPF_B >> 3] = "b",
+	[EBPF_DW >> 3] = "dw",
+};
+
+static const char *const jump_tbl[16] = {
+	[BPF_JA >> 4] = "ja",	   [BPF_JEQ >> 4] = "jeq",
+	[BPF_JGT >> 4] = "jgt",	   [BPF_JGE >> 4] = "jge",
+	[BPF_JSET >> 4] = "jset",  [EBPF_JNE >> 4] = "jne",
+	[EBPF_JSGT >> 4] = "jsgt", [EBPF_JSGE >> 4] = "jsge",
+	[EBPF_CALL >> 4] = "call", [EBPF_EXIT >> 4] = "exit",
+};
+
+void rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len)
+{
+	uint32_t i;
+
+	for (i = 0; i < len; ++i) {
+		const struct ebpf_insn *ins = buf + i;
+		uint8_t cls = BPF_CLASS(ins->code);
+		const char *op, *postfix = "";
+
+		fprintf(f, " L%u:\t", i);
+
+		switch (cls) {
+		default:
+			fprintf(f, "unimp 0x%x // class: %s\n",
+				ins->code, class_tbl[cls]);
+			break;
+		case BPF_ALU:
+			postfix = "32";
+			/* fall through */
+		case EBPF_ALU64:
+			op = alu_op_tbl[BPF_OP_INDEX(ins->code)];
+			if (BPF_SRC(ins->code) == BPF_X)
+				fprintf(f, "%s%s r%u, r%u\n", op, postfix, ins->dst_reg,
+					ins->src_reg);
+			else
+				fprintf(f, "%s%s r%u, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			break;
+		case BPF_LD:
+			op = "ld";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (ins->code == (BPF_LD | BPF_IMM | EBPF_DW)) {
+				uint64_t val;
+
+				val = (uint32_t)ins[0].imm |
+					(uint64_t)(uint32_t)ins[1].imm << 32;
+				fprintf(f, "%s%s r%d, #0x%"PRIx64"\n",
+					op, postfix, ins->dst_reg, val);
+				i++;
+			} else if (BPF_MODE(ins->code) == BPF_IMM)
+				fprintf(f, "%s%s r%d, #0x%x\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_ABS)
+				fprintf(f, "%s%s r%d, [%d]\n", op, postfix,
+					ins->dst_reg, ins->imm);
+			else if (BPF_MODE(ins->code) == BPF_IND)
+				fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix,
+					ins->dst_reg, ins->src_reg, ins->imm);
+			else
+				fprintf(f, "// BUG: LD opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_LDX:
+			op = "ldx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s r%d, [r%u + %d]\n", op, postfix, ins->dst_reg,
+				ins->src_reg, ins->off);
+			break;
+		case BPF_ST:
+			op = "st";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			if (BPF_MODE(ins->code) == BPF_MEM)
+				fprintf(f, "%s%s [r%d + %d], #0x%x\n", op, postfix,
+					ins->dst_reg, ins->off, ins->imm);
+			else
+				fprintf(f, "// BUG: ST opcode 0x%02x in eBPF insns\n",
+					ins->code);
+			break;
+		case BPF_STX:
+			op = "stx";
+			postfix = size_tbl[BPF_SIZE_INDEX(ins->code)];
+			fprintf(f, "%s%s [r%d + %d], r%u\n", op, postfix,
+				ins->dst_reg, ins->off, ins->src_reg);
+			break;
+#define L(pc, off) ((int)(pc) + 1 + (off))
+		case BPF_JMP:
+			op = jump_tbl[BPF_OP_INDEX(ins->code)];
+			if (op == NULL)
+				fprintf(f, "invalid jump opcode: %#x\n", ins->code);
+			else if (BPF_OP(ins->code) == BPF_JA)
+				fprintf(f, "%s L%d\n", op, L(i, ins->off));
+			else if (BPF_OP(ins->code) == EBPF_EXIT)
+				fprintf(f, "%s\n", op);
+			else
+				fprintf(f, "%s r%u, #0x%x, L%d\n", op, ins->dst_reg,
+					ins->imm, L(i, ins->off));
+			break;
+		case BPF_RET:
+			fprintf(f, "// BUG: RET opcode 0x%02x in eBPF insns\n",
+				ins->code);
+			break;
+		}
+	}
+}
diff --git a/lib/bpf/meson.build b/lib/bpf/meson.build
index 54f7610ae990..5b5585173aeb 100644
--- a/lib/bpf/meson.build
+++ b/lib/bpf/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2018 Intel Corporation
 
 sources = files('bpf.c',
+        'bpf_dump.c',
         'bpf_exec.c',
         'bpf_load.c',
         'bpf_pkt.c',
diff --git a/lib/bpf/rte_bpf.h b/lib/bpf/rte_bpf.h
index 2f23e272a376..0d0a84b130a0 100644
--- a/lib/bpf/rte_bpf.h
+++ b/lib/bpf/rte_bpf.h
@@ -198,6 +198,20 @@ rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
 int
 rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
 
+/**
+ * Dump eBPF instructions to a file.
+ *
+ * @param f
+ *   A pointer to a file for output
+ * @param buf
+ *   A pointer to BPF instructions
+ * @param len
+ *   Number of BPF instructions to dump.
+ */
+__rte_experimental
+void
+rte_bpf_dump(FILE *f, const struct ebpf_insn *buf, uint32_t len);
+
 #ifdef RTE_PORT_PCAP
 
 struct bpf_program;
diff --git a/lib/bpf/version.map b/lib/bpf/version.map
index 47082d5003ef..3b953f2f4592 100644
--- a/lib/bpf/version.map
+++ b/lib/bpf/version.map
@@ -19,4 +19,5 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_convert;
+	rte_bpf_dump;
 };
-- 
2.30.2



* [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (4 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-21 14:16     ` Kinsella, Ray
  2021-10-27  6:34     ` Wang, Yinan
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
                     ` (6 subsequent siblings)
  12 siblings, 2 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan, Ray Kinsella, Anatoly Burakov

This enhances the DPDK pdump library to support the new
pcapng format and filtering via BPF.

The internal client/server protocol is changed to support
two versions: the original pdump basic version and a
new pcapng version.

The internal version number (not part of the exposed API or ABI)
is intentionally increased so that any attempt to run
mismatched primary/secondary processes fails.

Add a new API to allow filtering of captured packets with a
DPDK BPF (eBPF) filter program. It keeps statistics
on packets captured, filtered, and missed (because the ring was full).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
---
 lib/meson.build       |   4 +-
 lib/pdump/meson.build |   2 +-
 lib/pdump/rte_pdump.c | 432 ++++++++++++++++++++++++++++++------------
 lib/pdump/rte_pdump.h | 113 ++++++++++-
 lib/pdump/version.map |   8 +
 5 files changed, 433 insertions(+), 126 deletions(-)

diff --git a/lib/meson.build b/lib/meson.build
index 484b1da2b88d..1a8ac30c4da6 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -27,6 +27,7 @@ libraries = [
         'acl',
         'bbdev',
         'bitratestats',
+        'bpf',
         'cfgfile',
         'compressdev',
         'cryptodev',
@@ -43,7 +44,6 @@ libraries = [
         'member',
         'pcapng',
         'power',
-        'pdump',
         'rawdev',
         'regexdev',
         'dmadev',
@@ -56,10 +56,10 @@ libraries = [
         'ipsec', # ipsec lib depends on net, crypto and security
         'fib', #fib lib depends on rib
         'port', # pkt framework libs which use other libs from above
+        'pdump', # pdump lib depends on bpf
         'table',
         'pipeline',
         'flow_classify', # flow_classify lib depends on pkt framework table lib
-        'bpf',
         'graph',
         'node',
 ]
diff --git a/lib/pdump/meson.build b/lib/pdump/meson.build
index 3a95eabde6a6..51ceb2afdec5 100644
--- a/lib/pdump/meson.build
+++ b/lib/pdump/meson.build
@@ -3,4 +3,4 @@
 
 sources = files('rte_pdump.c')
 headers = files('rte_pdump.h')
-deps += ['ethdev']
+deps += ['ethdev', 'bpf', 'pcapng']
diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 46a87e233904..71602685d544 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -7,8 +7,10 @@
 #include <rte_ethdev.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
+#include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
+#include <rte_pcapng.h>
 
 #include "rte_pdump.h"
 
@@ -27,30 +29,23 @@ enum pdump_operation {
 	ENABLE = 2
 };
 
+/* Internal version number in request */
 enum pdump_version {
-	V1 = 1
+	V1 = 1,		    /* no filtering or snap */
+	V2 = 2,
 };
 
 struct pdump_request {
 	uint16_t ver;
 	uint16_t op;
 	uint32_t flags;
-	union pdump_data {
-		struct enable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} en_v1;
-		struct disable_v1 {
-			char device[RTE_DEV_NAME_MAX_LEN];
-			uint16_t queue;
-			struct rte_ring *ring;
-			struct rte_mempool *mp;
-			void *filter;
-		} dis_v1;
-	} data;
+	char device[RTE_DEV_NAME_MAX_LEN];
+	uint16_t queue;
+	struct rte_ring *ring;
+	struct rte_mempool *mp;
+
+	const struct rte_bpf_prm *prm;
+	uint32_t snaplen;
 };
 
 struct pdump_response {
@@ -63,80 +58,140 @@ static struct pdump_rxtx_cbs {
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	const struct rte_eth_rxtx_callback *cb;
-	void *filter;
+	const struct rte_bpf *filter;
+	enum pdump_version ver;
+	uint32_t snaplen;
 } rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
 tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
 
 
-static inline void
-pdump_copy(struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
+/*
+ * The packet capture statistics keep track of packets
+ * accepted, filtered and dropped. These are per-queue
+ * and in memory shared between primary and secondary processes.
+ */
+static const char MZ_RTE_PDUMP_STATS[] = "rte_pdump_stats";
+static struct {
+	struct rte_pdump_stats rx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+	struct rte_pdump_stats tx[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+} *pdump_stats;
+
+/* Create a clone of mbuf to be placed into ring. */
+static void
+pdump_copy(uint16_t port_id, uint16_t queue,
+	   enum rte_pcapng_direction direction,
+	   struct rte_mbuf **pkts, uint16_t nb_pkts,
+	   const struct pdump_rxtx_cbs *cbs,
+	   struct rte_pdump_stats *stats)
 {
 	unsigned int i;
 	int ring_enq;
 	uint16_t d_pkts = 0;
 	struct rte_mbuf *dup_bufs[nb_pkts];
-	struct pdump_rxtx_cbs *cbs;
+	uint64_t ts;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 	struct rte_mbuf *p;
+	uint64_t rcs[nb_pkts];
+
+	if (cbs->filter)
+		rte_bpf_exec_burst(cbs->filter, (void **)pkts, rcs, nb_pkts);
 
-	cbs  = user_params;
+	ts = rte_get_tsc_cycles();
 	ring = cbs->ring;
 	mp = cbs->mp;
 	for (i = 0; i < nb_pkts; i++) {
-		p = rte_pktmbuf_copy(pkts[i], mp, 0, UINT32_MAX);
-		if (p)
+		/*
+		 * This uses the same BPF return value convention as socket
+		 * filters and pcap_offline_filter().
+		 * If the program returns zero, the packet does not match
+		 * the filter and is ignored.
+		 */
+		if (cbs->filter && rcs[i] == 0) {
+			__atomic_fetch_add(&stats->filtered,
+					   1, __ATOMIC_RELAXED);
+			continue;
+		}
+
+		/*
+		 * If using pcapng, wrap the packets;
+		 * otherwise make a simple copy.
+		 */
+		if (cbs->ver == V2)
+			p = rte_pcapng_copy(port_id, queue,
+					    pkts[i], mp, cbs->snaplen,
+					    ts, direction);
+		else
+			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
+
+		if (unlikely(p == NULL))
+			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+		else
 			dup_bufs[d_pkts++] = p;
 	}
 
+	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)dup_bufs, d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		PDUMP_LOG(DEBUG,
-			"only %d of packets enqueued to ring\n", ring_enq);
+		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
 
 static uint16_t
-pdump_rx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_rx(uint16_t port, uint16_t queue,
 	struct rte_mbuf **pkts, uint16_t nb_pkts,
-	uint16_t max_pkts __rte_unused,
-	void *user_params)
+	uint16_t max_pkts __rte_unused, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->rx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_IN,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static uint16_t
-pdump_tx(uint16_t port __rte_unused, uint16_t qidx __rte_unused,
+pdump_tx(uint16_t port, uint16_t queue,
 		struct rte_mbuf **pkts, uint16_t nb_pkts, void *user_params)
 {
-	pdump_copy(pkts, nb_pkts, user_params);
+	const struct pdump_rxtx_cbs *cbs = user_params;
+	struct rte_pdump_stats *stats = &pdump_stats->tx[port][queue];
+
+	pdump_copy(port, queue, RTE_PCAPNG_DIRECTION_OUT,
+		   pkts, nb_pkts, cbs, stats);
 	return nb_pkts;
 }
 
 static int
-pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_rx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &rx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &rx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"rx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_first_rx_callback(port, qid,
 								pdump_rx, cbs);
 			if (cbs->cb == NULL) {
@@ -145,8 +200,7 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -170,26 +224,32 @@ pdump_register_rx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 }
 
 static int
-pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
-				struct rte_ring *ring, struct rte_mempool *mp,
-				uint16_t operation)
+pdump_register_tx_callbacks(enum pdump_version ver,
+			    uint16_t end_q, uint16_t port, uint16_t queue,
+			    struct rte_ring *ring, struct rte_mempool *mp,
+			    struct rte_bpf *filter,
+			    uint16_t operation, uint32_t snaplen)
 {
 
 	uint16_t qid;
-	struct pdump_rxtx_cbs *cbs = NULL;
 
 	qid = (queue == RTE_PDUMP_ALL_QUEUES) ? 0 : queue;
 	for (; qid < end_q; qid++) {
-		cbs = &tx_cbs[port][qid];
-		if (cbs && operation == ENABLE) {
+		struct pdump_rxtx_cbs *cbs = &tx_cbs[port][qid];
+
+		if (operation == ENABLE) {
 			if (cbs->cb) {
 				PDUMP_LOG(ERR,
 					"tx callback for port=%d queue=%d, already exists\n",
 					port, qid);
 				return -EEXIST;
 			}
+			cbs->ver = ver;
 			cbs->ring = ring;
 			cbs->mp = mp;
+			cbs->snaplen = snaplen;
+			cbs->filter = filter;
+
 			cbs->cb = rte_eth_add_tx_callback(port, qid, pdump_tx,
 								cbs);
 			if (cbs->cb == NULL) {
@@ -198,8 +258,7 @@ pdump_register_tx_callbacks(uint16_t end_q, uint16_t port, uint16_t queue,
 					rte_errno);
 				return rte_errno;
 			}
-		}
-		if (cbs && operation == DISABLE) {
+		} else if (operation == DISABLE) {
 			int ret;
 
 			if (cbs->cb == NULL) {
@@ -228,37 +287,47 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	uint16_t nb_rx_q = 0, nb_tx_q = 0, end_q, queue;
 	uint16_t port;
 	int ret = 0;
+	struct rte_bpf *filter = NULL;
 	uint32_t flags;
 	uint16_t operation;
 	struct rte_ring *ring;
 	struct rte_mempool *mp;
 
-	flags = p->flags;
-	operation = p->op;
-	if (operation == ENABLE) {
-		ret = rte_eth_dev_get_port_by_name(p->data.en_v1.device,
-				&port);
-		if (ret < 0) {
+	/* Check for possible DPDK version mismatch */
+	if (!(p->ver == V1 || p->ver == V2)) {
+		PDUMP_LOG(ERR,
+			  "incorrect client version %u\n", p->ver);
+		return -EINVAL;
+	}
+
+	if (p->prm) {
+		if (p->prm->prog_arg.type != RTE_BPF_ARG_PTR_MBUF) {
 			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.en_v1.device);
+				  "invalid BPF program type: %u\n",
+				  p->prm->prog_arg.type);
 			return -EINVAL;
 		}
-		queue = p->data.en_v1.queue;
-		ring = p->data.en_v1.ring;
-		mp = p->data.en_v1.mp;
-	} else {
-		ret = rte_eth_dev_get_port_by_name(p->data.dis_v1.device,
-				&port);
-		if (ret < 0) {
-			PDUMP_LOG(ERR,
-				"failed to get port id for device id=%s\n",
-				p->data.dis_v1.device);
-			return -EINVAL;
+
+		filter = rte_bpf_load(p->prm);
+		if (filter == NULL) {
+			PDUMP_LOG(ERR, "cannot load BPF filter: %s\n",
+				  rte_strerror(rte_errno));
+			return -rte_errno;
 		}
-		queue = p->data.dis_v1.queue;
-		ring = p->data.dis_v1.ring;
-		mp = p->data.dis_v1.mp;
+	}
+
+	flags = p->flags;
+	operation = p->op;
+	queue = p->queue;
+	ring = p->ring;
+	mp = p->mp;
+
+	ret = rte_eth_dev_get_port_by_name(p->device, &port);
+	if (ret < 0) {
+		PDUMP_LOG(ERR,
+			  "failed to get port id for device id=%s\n",
+			  p->device);
+		return -EINVAL;
 	}
 
 	/* validation if packet capture is for all queues */
@@ -296,8 +365,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register RX callback */
 	if (flags & RTE_PDUMP_FLAG_RX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_rx_q : queue + 1;
-		ret = pdump_register_rx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_rx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -305,8 +375,9 @@ set_pdump_rxtx_cbs(const struct pdump_request *p)
 	/* register TX callback */
 	if (flags & RTE_PDUMP_FLAG_TX) {
 		end_q = (queue == RTE_PDUMP_ALL_QUEUES) ? nb_tx_q : queue + 1;
-		ret = pdump_register_tx_callbacks(end_q, port, queue, ring, mp,
-							operation);
+		ret = pdump_register_tx_callbacks(p->ver, end_q, port, queue,
+						  ring, mp, filter,
+						  operation, p->snaplen);
 		if (ret < 0)
 			return ret;
 	}
@@ -332,7 +403,7 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 		resp->err_value = set_pdump_rxtx_cbs(cli_req);
 	}
 
-	strlcpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_resp.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_resp.len_param = sizeof(*resp);
 	mp_resp.num_fds = 0;
 	if (rte_mp_reply(&mp_resp, peer) < 0) {
@@ -347,8 +418,18 @@ pdump_server(const struct rte_mp_msg *mp_msg, const void *peer)
 int
 rte_pdump_init(void)
 {
+	const struct rte_memzone *mz;
 	int ret;
 
+	mz = rte_memzone_reserve(MZ_RTE_PDUMP_STATS, sizeof(*pdump_stats),
+				 rte_socket_id(), 0);
+	if (mz == NULL) {
+		PDUMP_LOG(ERR, "cannot allocate pdump statistics\n");
+		rte_errno = ENOMEM;
+		return -1;
+	}
+	pdump_stats = mz->addr;
+
 	ret = rte_mp_action_register(PDUMP_MP, pdump_server);
 	if (ret && rte_errno != ENOTSUP)
 		return -1;
@@ -393,14 +474,21 @@ pdump_validate_ring_mp(struct rte_ring *ring, struct rte_mempool *mp)
 static int
 pdump_validate_flags(uint32_t flags)
 {
-	if (flags != RTE_PDUMP_FLAG_RX && flags != RTE_PDUMP_FLAG_TX &&
-		flags != RTE_PDUMP_FLAG_RXTX) {
+	if ((flags & RTE_PDUMP_FLAG_RXTX) == 0) {
 		PDUMP_LOG(ERR,
 			"invalid flags, should be either rx/tx/rxtx\n");
 		rte_errno = EINVAL;
 		return -1;
 	}
 
+	/* mask off the flags we know about */
+	if (flags & ~(RTE_PDUMP_FLAG_RXTX | RTE_PDUMP_FLAG_PCAPNG)) {
+		PDUMP_LOG(ERR,
+			  "unknown flags: %#x\n", flags);
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -427,12 +515,12 @@ pdump_validate_port(uint16_t port, char *name)
 }
 
 static int
-pdump_prepare_client_request(char *device, uint16_t queue,
-				uint32_t flags,
-				uint16_t operation,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+pdump_prepare_client_request(const char *device, uint16_t queue,
+			     uint32_t flags, uint32_t snaplen,
+			     uint16_t operation,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     const struct rte_bpf_prm *prm)
 {
 	int ret = -1;
 	struct rte_mp_msg mp_req, *mp_rep;
@@ -441,26 +529,22 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	struct pdump_request *req = (struct pdump_request *)mp_req.param;
 	struct pdump_response *resp;
 
-	req->ver = 1;
-	req->flags = flags;
+	memset(req, 0, sizeof(*req));
+
+	req->ver = (flags & RTE_PDUMP_FLAG_PCAPNG) ? V2 : V1;
+	req->flags = flags & RTE_PDUMP_FLAG_RXTX;
 	req->op = operation;
+	req->queue = queue;
+	rte_strscpy(req->device, device, sizeof(req->device));
+
 	if ((operation & ENABLE) != 0) {
-		strlcpy(req->data.en_v1.device, device,
-			sizeof(req->data.en_v1.device));
-		req->data.en_v1.queue = queue;
-		req->data.en_v1.ring = ring;
-		req->data.en_v1.mp = mp;
-		req->data.en_v1.filter = filter;
-	} else {
-		strlcpy(req->data.dis_v1.device, device,
-			sizeof(req->data.dis_v1.device));
-		req->data.dis_v1.queue = queue;
-		req->data.dis_v1.ring = NULL;
-		req->data.dis_v1.mp = NULL;
-		req->data.dis_v1.filter = NULL;
+		req->ring = ring;
+		req->mp = mp;
+		req->prm = prm;
+		req->snaplen = snaplen;
 	}
 
-	strlcpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
+	rte_strscpy(mp_req.name, PDUMP_MP, RTE_MP_MAX_NAME_LEN);
 	mp_req.len_param = sizeof(*req);
 	mp_req.num_fds = 0;
 	if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
@@ -478,11 +562,17 @@ pdump_prepare_client_request(char *device, uint16_t queue,
 	return ret;
 }
 
-int
-rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
-			struct rte_ring *ring,
-			struct rte_mempool *mp,
-			void *filter)
+/*
+ * There are two versions of this function because, although the original
+ * API left a placeholder for a future filter, it never checked the value.
+ * Therefore the API cannot depend on the application passing a
+ * non-bogus value.
+ */
+static int
+pdump_enable(uint16_t port, uint16_t queue,
+	     uint32_t flags, uint32_t snaplen,
+	     struct rte_ring *ring, struct rte_mempool *mp,
+	     const struct rte_bpf_prm *prm)
 {
 	int ret;
 	char name[RTE_DEV_NAME_MAX_LEN];
@@ -497,20 +587,42 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						ENABLE, ring, mp, filter);
+	if (snaplen == 0)
+		snaplen = UINT32_MAX;
 
-	return ret;
+	return pdump_prepare_client_request(name, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
 }
 
 int
-rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
-				uint32_t flags,
-				struct rte_ring *ring,
-				struct rte_mempool *mp,
-				void *filter)
+rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
+		 struct rte_ring *ring,
+		 struct rte_mempool *mp,
+		 void *filter __rte_unused)
 {
-	int ret = 0;
+	return pdump_enable(port, queue, flags, 0,
+			    ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf(uint16_t port, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm)
+{
+	return pdump_enable(port, queue, flags, snaplen,
+			    ring, mp, prm);
+}
+
+static int
+pdump_enable_by_deviceid(const char *device_id, uint16_t queue,
+			 uint32_t flags, uint32_t snaplen,
+			 struct rte_ring *ring,
+			 struct rte_mempool *mp,
+			 const struct rte_bpf_prm *prm)
+{
+	int ret;
 
 	ret = pdump_validate_ring_mp(ring, mp);
 	if (ret < 0)
@@ -519,10 +631,30 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						ENABLE, ring, mp, filter);
+	return pdump_prepare_client_request(device_id, queue, flags, snaplen,
+					    ENABLE, ring, mp, prm);
+}
 
-	return ret;
+int
+rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
+			     uint32_t flags,
+			     struct rte_ring *ring,
+			     struct rte_mempool *mp,
+			     void *filter __rte_unused)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, 0,
+					ring, mp, NULL);
+}
+
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *prm)
+{
+	return pdump_enable_by_deviceid(device_id, queue, flags, snaplen,
+					ring, mp, prm);
 }
 
 int
@@ -538,8 +670,8 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags)
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(name, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(name, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
@@ -554,8 +686,68 @@ rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 	if (ret < 0)
 		return ret;
 
-	ret = pdump_prepare_client_request(device_id, queue, flags,
-						DISABLE, NULL, NULL, NULL);
+	ret = pdump_prepare_client_request(device_id, queue, flags, 0,
+					   DISABLE, NULL, NULL, NULL);
 
 	return ret;
 }
+
+static void
+pdump_sum_stats(uint16_t port, uint16_t nq,
+		struct rte_pdump_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+		struct rte_pdump_stats *total)
+{
+	uint64_t *sum = (uint64_t *)total;
+	unsigned int i;
+	uint64_t val;
+	uint16_t qid;
+
+	for (qid = 0; qid < nq; qid++) {
+		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+
+		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
+			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			sum[i] += val;
+		}
+	}
+}
+
+int
+rte_pdump_stats(uint16_t port, struct rte_pdump_stats *stats)
+{
+	struct rte_eth_dev_info dev_info;
+	const struct rte_memzone *mz;
+	int ret;
+
+	memset(stats, 0, sizeof(*stats));
+	ret = rte_eth_dev_info_get(port, &dev_info);
+	if (ret != 0) {
+		PDUMP_LOG(ERR,
+			  "Error during getting device (port %u) info: %s\n",
+			  port, strerror(-ret));
+		return ret;
+	}
+
+	if (pdump_stats == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			/* rte_pdump_init was not called */
+			PDUMP_LOG(ERR, "pdump stats not initialized\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+
+		/* secondary process looks up the memzone */
+		mz = rte_memzone_lookup(MZ_RTE_PDUMP_STATS);
+		if (mz == NULL) {
+			/* rte_pdump_init was not called in primary process?? */
+			PDUMP_LOG(ERR, "can not find pdump stats\n");
+			rte_errno = EINVAL;
+			return -1;
+		}
+		pdump_stats = mz->addr;
+	}
+
+	pdump_sum_stats(port, dev_info.nb_rx_queues, pdump_stats->rx, stats);
+	pdump_sum_stats(port, dev_info.nb_tx_queues, pdump_stats->tx, stats);
+	return 0;
+}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index 6b00fc17aeb2..6efa0274f2ce 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -15,6 +15,7 @@
 #include <stdint.h>
 #include <rte_mempool.h>
 #include <rte_ring.h>
+#include <rte_bpf.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -26,7 +27,9 @@ enum {
 	RTE_PDUMP_FLAG_RX = 1,  /* receive direction */
 	RTE_PDUMP_FLAG_TX = 2,  /* transmit direction */
 	/* both receive and transmit directions */
-	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX)
+	RTE_PDUMP_FLAG_RXTX = (RTE_PDUMP_FLAG_RX|RTE_PDUMP_FLAG_TX),
+
+	RTE_PDUMP_FLAG_PCAPNG = 4, /* format for pcapng */
 };
 
 /**
@@ -68,7 +71,7 @@ rte_pdump_uninit(void);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -80,6 +83,41 @@ rte_pdump_enable(uint16_t port, uint16_t queue, uint32_t flags,
 		struct rte_mempool *mp,
 		void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given port and queue with filtering.
+ *
+ * @param port_id
+ *  The Ethernet port on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param prm
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf(uint16_t port_id, uint16_t queue,
+		     uint32_t flags, uint32_t snaplen,
+		     struct rte_ring *ring,
+		     struct rte_mempool *mp,
+		     const struct rte_bpf_prm *prm);
+
 /**
  * Disables packet capturing on given port and queue.
  *
@@ -118,7 +156,7 @@ rte_pdump_disable(uint16_t port, uint16_t queue, uint32_t flags);
  * @param mp
  *  mempool on to which original packets will be mirrored or duplicated.
  * @param filter
- *  place holder for packet filtering.
+ *  Unused, should be NULL.
  *
  * @return
  *    0 on success, -1 on error, rte_errno is set accordingly.
@@ -131,6 +169,43 @@ rte_pdump_enable_by_deviceid(char *device_id, uint16_t queue,
 				struct rte_mempool *mp,
 				void *filter);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Enables packet capturing on given device id and queue with filtering.
+ * device_id can be name or pci address of device.
+ *
+ * @param device_id
+ *  device id on which packet capturing should be enabled.
+ * @param queue
+ *  The queue on the Ethernet port which packet capturing
+ *  should be enabled. Pass UINT16_MAX to enable packet capturing on all
+ *  queues of a given port.
+ * @param flags
+ *  Pdump library flags that specify direction and packet format.
+ * @param snaplen
+ *  The upper limit on bytes to copy.
+ *  Passing UINT32_MAX means capture all the possible data.
+ * @param ring
+ *  The ring on which captured packets will be enqueued for user.
+ * @param mp
+ *  The mempool on to which original packets will be mirrored or duplicated.
+ * @param filter
+ *  BPF program used to filter packets (can be NULL).
+ *
+ * @return
+ *    0 on success, -1 on error, rte_errno is set accordingly.
+ */
+__rte_experimental
+int
+rte_pdump_enable_bpf_by_deviceid(const char *device_id, uint16_t queue,
+				 uint32_t flags, uint32_t snaplen,
+				 struct rte_ring *ring,
+				 struct rte_mempool *mp,
+				 const struct rte_bpf_prm *filter);
+
+
 /**
  * Disables packet capturing on given device_id and queue.
  * device_id can be name or pci address of device.
@@ -153,6 +228,38 @@ int
 rte_pdump_disable_by_deviceid(char *device_id, uint16_t queue,
 				uint32_t flags);
 
+
+/**
+ * A structure used to retrieve statistics from packet capture.
+ * The statistics are the sum of both receive and transmit queues.
+ */
+struct rte_pdump_stats {
+	uint64_t accepted; /**< Number of packets accepted by filter. */
+	uint64_t filtered; /**< Number of packets rejected by filter. */
+	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
+	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+
+	uint64_t reserved[4]; /**< Reserved and pad to cache line */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Retrieve the packet capture statistics for a port (summed over its queues).
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param stats
+ *   A pointer to structure of type *rte_pdump_stats* to be filled in.
+ * @return
+ *   Zero if successful. -1 on error and rte_errno is set.
+ */
+__rte_experimental
+int
+rte_pdump_stats(uint16_t port_id, struct rte_pdump_stats *stats);
+
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pdump/version.map b/lib/pdump/version.map
index f0a9d12c9a9e..ce5502d9cdf4 100644
--- a/lib/pdump/version.map
+++ b/lib/pdump/version.map
@@ -10,3 +10,11 @@ DPDK_22 {
 
 	local: *;
 };
+
+EXPERIMENTAL {
+	global:
+
+	rte_pdump_enable_bpf;
+	rte_pdump_enable_bpf_by_deviceid;
+	rte_pdump_stats;
+};
-- 
2.30.2
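
For readers following pdump_sum_stats() above: the helper treats a
structure made only of uint64_t counters as a flat array and adds each
per-queue entry into a total using relaxed atomic loads. Below is a
minimal standalone sketch of that idea. The names capture_stats and
sum_queue_stats are hypothetical stand-ins and are not part of the
patch; the real code indexes a [port][queue] array in a shared memzone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_pdump_stats: uint64_t fields only. */
struct capture_stats {
	uint64_t accepted;
	uint64_t filtered;
	uint64_t nombuf;
	uint64_t ringfull;
};

#define NUM_COUNTERS (sizeof(struct capture_stats) / sizeof(uint64_t))

/* The array view below is only valid if the struct is a whole number of counters. */
_Static_assert(sizeof(struct capture_stats) % sizeof(uint64_t) == 0,
	       "capture_stats must contain only uint64_t counters");

/* Add the counters of nq per-queue entries into *total (which is not reset). */
static void
sum_queue_stats(const struct capture_stats *perq, unsigned int nq,
		struct capture_stats *total)
{
	uint64_t *sum = (uint64_t *)total;
	unsigned int q;
	size_t i;

	assert(perq != NULL && total != NULL);

	for (q = 0; q < nq; q++) {
		const uint64_t *src = (const uint64_t *)&perq[q];

		for (i = 0; i < NUM_COUNTERS; i++) {
			/*
			 * Relaxed load: each counter is read atomically,
			 * but the set is not a consistent snapshot.
			 */
			sum[i] += __atomic_load_n(&src[i], __ATOMIC_RELAXED);
		}
	}
}
```

Because the total is accumulated rather than overwritten, the caller
has to zero it first, which is what the memset() at the top of
rte_pdump_stats() does before summing the rx and tx arrays.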



* [dpdk-dev] [PATCH v15 07/12] app/dumpcap: add new packet capture application
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (5 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 08/12] test: add test for bpf_convert Stephen Hemminger
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

This is a new packet capture application to replace existing pdump.
The new application works like Wireshark dumpcap program and supports
the pdump API features.

It is not complete yet; some features such as filtering are not implemented.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/dumpcap/main.c      | 844 ++++++++++++++++++++++++++++++++++++++++
 app/dumpcap/meson.build |  16 +
 app/meson.build         |   1 +
 3 files changed, 861 insertions(+)
 create mode 100644 app/dumpcap/main.c
 create mode 100644 app/dumpcap/meson.build

diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
new file mode 100644
index 000000000000..baf9eee46666
--- /dev/null
+++ b/app/dumpcap/main.c
@@ -0,0 +1,844 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019-2020 Microsoft Corporation
+ *
+ * DPDK application to dump network traffic
+ * This is designed to look and act like the Wireshark
+ * dumpcap program.
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/utsname.h>
+#include <time.h>
+#include <unistd.h>
+
+#include <rte_alarm.h>
+#include <rte_bpf.h>
+#include <rte_config.h>
+#include <rte_debug.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_ethdev.h>
+#include <rte_lcore.h>
+#include <rte_malloc.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_pcapng.h>
+#include <rte_pdump.h>
+#include <rte_ring.h>
+#include <rte_string_fns.h>
+#include <rte_time.h>
+#include <rte_version.h>
+
+#include <pcap/pcap.h>
+#include <pcap/bpf.h>
+
+#define RING_NAME "capture-ring"
+#define MONITOR_INTERVAL  (500 * 1000)
+#define MBUF_POOL_CACHE_SIZE 32
+#define BURST_SIZE 32
+#define SLEEP_THRESHOLD 1000
+
+/* command line flags */
+static const char *progname;
+static bool quit_signal;
+static bool group_read;
+static bool quiet;
+static bool promiscuous_mode = true;
+static bool use_pcapng = true;
+static char *output_name;
+static const char *filter_str;
+static unsigned int ring_size = 2048;
+static const char *capture_comment;
+static uint32_t snaplen = RTE_MBUF_DEFAULT_BUF_SIZE;
+static bool dump_bpf;
+static struct {
+	uint64_t  duration;	/* nanoseconds */
+	unsigned long packets;  /* number of packets in file */
+	size_t size;		/* file size (bytes) */
+} stop;
+
+/* Running state */
+static struct rte_bpf_prm *bpf_prm;
+static uint64_t start_time, end_time;
+static uint64_t packets_received;
+static size_t file_size;
+
+struct interface {
+	TAILQ_ENTRY(interface) next;
+	uint16_t port;
+	char name[RTE_ETH_NAME_MAX_LEN];
+
+	struct rte_rxtx_callback *rx_cb[RTE_MAX_QUEUES_PER_PORT];
+};
+
+TAILQ_HEAD(interface_list, interface);
+static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
+
+/* Can do either pcap or pcapng format output */
+typedef union {
+	rte_pcapng_t  *pcapng;
+	pcap_dumper_t *dumper;
+} dumpcap_out_t;
+
+static void usage(void)
+{
+	printf("Usage: %s [options] ...\n\n", progname);
+	printf("Capture Interface:\n"
+	       "  -i <interface>           name or port index of interface\n"
+	       "  -f <capture filter>      packet filter in libpcap filter syntax\n");
+	printf("  -s <snaplen>, --snapshot-length <snaplen>\n"
+	       "                           packet snapshot length (def: %u)\n",
+	       RTE_MBUF_DEFAULT_BUF_SIZE);
+	printf("  -p, --no-promiscuous-mode\n"
+	       "                           don't capture in promiscuous mode\n"
+	       "  -D, --list-interfaces    print list of interfaces and exit\n"
+	       "  -d                       print generated BPF code for capture filter\n"
+	       "\n"
+	       "Stop conditions:\n"
+	       "  -c <packet count>        stop after n packets (def: infinite)\n"
+	       "  -a <autostop cond.> ..., --autostop <autostop cond.> ...\n"
+	       "                           duration:NUM - stop after NUM seconds\n"
+	       "                           filesize:NUM - stop this file after NUM kB\n"
+	       "                            packets:NUM - stop after NUM packets\n"
+	       "Output (files):\n"
+	       "  -w <filename>            name of file to save (def: tempfile)\n"
+	       "  -g                       enable group read access on the output file(s)\n"
+	       "  -n                       use pcapng format instead of pcap (default)\n"
+	       "  -P                       use libpcap format instead of pcapng\n"
+	       "  --capture-comment <comment>\n"
+	       "                           add a capture comment to the output file\n"
+	       "\n"
+	       "Miscellaneous:\n"
+	       "  -q                       don't report packet capture counts\n"
+	       "  -v, --version            print version information and exit\n"
+	       "  -h, --help               display this help and exit\n"
+	       "\n"
+	       "Use Ctrl-C to stop capturing at any time.\n");
+}
+
+static const char *version(void)
+{
+	static char str[128];
+
+	snprintf(str, sizeof(str),
+		 "%s 1.0 (%s)\n", progname, rte_version());
+	return str;
+}
+
+/* Parse numeric argument from command line */
+static unsigned long get_uint(const char *arg, const char *name,
+			     unsigned int limit)
+{
+	unsigned long u;
+	char *endp;
+
+	u = strtoul(arg, &endp, 0);
+	if (*arg == '\0' || *endp != '\0')
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is not a valid number\n",
+			 name, arg);
+	if (limit && u > limit)
+		rte_exit(EXIT_FAILURE,
+			 "Specified %s \"%s\" is too large (greater than %u)\n",
+			 name, arg, limit);
+
+	return u;
+}
+
+/* Set auto stop values */
+static void auto_stop(char *opt)
+{
+	char *value, *endp;
+
+	value = strchr(opt, ':');
+	if (value == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Missing colon in auto stop parameter\n");
+
+	*value++ = '\0';
+	if (strcmp(opt, "duration") == 0) {
+		double interval = strtod(value, &endp);
+
+		if (*value == '\0' || *endp != '\0' || interval <= 0)
+			rte_exit(EXIT_FAILURE,
+				 "Invalid duration \"%s\"\n", value);
+		stop.duration = NSEC_PER_SEC * interval;
+	} else if (strcmp(opt, "filesize") == 0) {
+		stop.size = get_uint(value, "filesize", 0) * 1024;
+	} else if (strcmp(opt, "packets") == 0) {
+		stop.packets = get_uint(value, "packets", 0);
+	} else {
+		rte_exit(EXIT_FAILURE,
+			 "Unknown autostop parameter \"%s\"\n", opt);
+	}
+}
+
+/* Add interface to list of interfaces to capture */
+static void add_interface(uint16_t port, const char *name)
+{
+	struct interface *intf;
+
+	intf = calloc(1, sizeof(*intf));
+	if (!intf)
+		rte_exit(EXIT_FAILURE, "no memory for interface\n");
+
+	intf->port = port;
+	rte_strscpy(intf->name, name, sizeof(intf->name));
+
+	printf("Capturing on '%s'\n", name);
+
+	port2intf[port] = intf;
+	TAILQ_INSERT_TAIL(&interfaces, intf, next);
+}
+
+/* Select all valid DPDK interfaces */
+static void select_all_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+	}
+}
+
+/*
+ * Choose interface to capture if no -i option given.
+ * Select the first DPDK port, this matches what dumpcap does.
+ */
+static void set_default_interface(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		add_interface(p, name);
+		return;
+	}
+	rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
+}
+
+/* Lookup interface by name or port and add it to the list */
+static void select_interface(const char *arg)
+{
+	uint16_t port;
+
+	if (strcmp(arg, "*") == 0)
+		select_all_interfaces();
+	else if (rte_eth_dev_get_port_by_name(arg, &port) == 0)
+		add_interface(port, arg);
+	else {
+		char name[RTE_ETH_NAME_MAX_LEN];
+
+		port = get_uint(arg, "port_number", UINT16_MAX);
+		if (rte_eth_dev_get_name_by_port(port, name) < 0)
+			rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+				 port);
+		add_interface(port, name);
+	}
+}
+
+/* Display list of possible interfaces that can be used. */
+static void show_interfaces(void)
+{
+	char name[RTE_ETH_NAME_MAX_LEN];
+	uint16_t p;
+
+	RTE_ETH_FOREACH_DEV(p) {
+		if (rte_eth_dev_get_name_by_port(p, name) < 0)
+			continue;
+		printf("%u. %s\n", p, name);
+	}
+}
+
+static void compile_filter(void)
+{
+	struct bpf_program bf;
+	pcap_t *pcap;
+
+	pcap = pcap_open_dead(DLT_EN10MB, snaplen);
+	if (!pcap)
+		rte_exit(EXIT_FAILURE, "can not open pcap\n");
+
+	if (pcap_compile(pcap, &bf, filter_str,
+			 1, PCAP_NETMASK_UNKNOWN) != 0)
+		rte_exit(EXIT_FAILURE, "pcap filter string not valid (%s)\n",
+			 pcap_geterr(pcap));
+
+	bpf_prm = rte_bpf_convert(&bf);
+	if (bpf_prm == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "bpf convert failed\n");
+
+	if (dump_bpf) {
+		printf("cBPF program (%u insns)\n", bf.bf_len);
+		bpf_dump(&bf, 1);
+		printf("\neBPF program (%u insns)\n", bpf_prm->nb_ins);
+		rte_bpf_dump(stdout, bpf_prm->ins, bpf_prm->nb_ins);
+		exit(0);
+	}
+
+	/* Don't care about original program any more */
+	pcap_freecode(&bf);
+	pcap_close(pcap);
+}
+
+/*
+ * Parse command line options.
+ * These are chosen to be similar to dumpcap command.
+ */
+static void parse_opts(int argc, char **argv)
+{
+	static const struct option long_options[] = {
+		{ "autostop",        required_argument, NULL, 'a' },
+		{ "capture-comment", required_argument, NULL, 0 },
+		{ "help",            no_argument,       NULL, 'h' },
+		{ "interface",       required_argument, NULL, 'i' },
+		{ "list-interfaces", no_argument,       NULL, 'D' },
+		{ "no-promiscuous-mode", no_argument,   NULL, 'p' },
+		{ "output-file",     required_argument, NULL, 'w' },
+		{ "ring-buffer",     required_argument, NULL, 'b' },
+		{ "snapshot-length", required_argument, NULL, 's' },
+		{ "version",         no_argument,       NULL, 'v' },
+		{ NULL },
+	};
+	int option_index, c;
+
+	for (;;) {
+		c = getopt_long(argc, argv, "a:b:c:dDf:ghi:nN:pPqs:vw:",
+				long_options, &option_index);
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 0:
+			switch (option_index) {
+			case 1: /* index of "capture-comment" in long_options[] */
+				capture_comment = optarg;
+				break;
+			default:
+				usage();
+				exit(1);
+			}
+			break;
+		case 'a':
+			auto_stop(optarg);
+			break;
+		case 'b':
+			rte_exit(EXIT_FAILURE,
+				 "multiple files not implemented\n");
+			break;
+		case 'c':
+			stop.packets = get_uint(optarg, "packet_count", 0);
+			break;
+		case 'd':
+			dump_bpf = true;
+			break;
+		case 'D':
+			show_interfaces();
+			exit(0);
+		case 'f':
+			filter_str = optarg;
+			break;
+		case 'g':
+			group_read = true;
+			break;
+		case 'h':
+			printf("%s\n\n", version());
+			usage();
+			exit(0);
+		case 'i':
+			select_interface(optarg);
+			break;
+		case 'n':
+			use_pcapng = true;
+			break;
+		case 'N':
+			ring_size = get_uint(optarg, "packet_limit", 0);
+			break;
+		case 'p':
+			promiscuous_mode = false;
+			break;
+		case 'P':
+			use_pcapng = false;
+			break;
+		case 'q':
+			quiet = true;
+			break;
+		case 's':
+			snaplen = get_uint(optarg, "snap_len", 0);
+			break;
+		case 'w':
+			output_name = optarg;
+			break;
+		case 'v':
+			printf("%s\n", version());
+			exit(0);
+		default:
+			fprintf(stderr, "Invalid option: %s\n",
+				argv[optind - 1]);
+			usage();
+			exit(1);
+		}
+	}
+}
+
+static void
+signal_handler(int sig_num __rte_unused)
+{
+	__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+}
+
+/* Return the current CLOCK_MONOTONIC time in nanoseconds (not time since 1/1/1970) */
+static uint64_t create_timestamp(void)
+{
+	struct timespec now;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+	return rte_timespec_to_ns(&now);
+}
+
+static void
+cleanup_pdump_resources(void)
+{
+	struct interface *intf;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		rte_pdump_disable(intf->port,
+				  RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+		if (promiscuous_mode)
+			rte_eth_promiscuous_disable(intf->port);
+	}
+}
+
+/* Alarm handler, used to check that the primary process is still alive */
+static void
+monitor_primary(void *arg __rte_unused)
+{
+	if (__atomic_load_n(&quit_signal, __ATOMIC_RELAXED))
+		return;
+
+	if (rte_eal_primary_proc_alive(NULL)) {
+		rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	} else {
+		fprintf(stderr,
+			"Primary process is no longer active, exiting...\n");
+		__atomic_store_n(&quit_signal, true, __ATOMIC_RELAXED);
+	}
+}
+
+/* Setup handler to check when primary exits. */
+static void
+enable_primary_monitor(void)
+{
+	int ret;
+
+	/* Once primary exits, so will pdump. */
+	ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to enable monitor:%d\n", ret);
+}
+
+static void
+disable_primary_monitor(void)
+{
+	int ret;
+
+	ret = rte_eal_alarm_cancel(monitor_primary, NULL);
+	if (ret < 0)
+		fprintf(stderr, "Fail to disable monitor:%d\n", ret);
+}
+
+static void
+report_packet_stats(dumpcap_out_t out)
+{
+	struct rte_pdump_stats pdump_stats;
+	struct interface *intf;
+	uint64_t ifrecv, ifdrop;
+	double percent;
+
+	fputc('\n', stderr);
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+			continue;
+
+		/* do what Wiretap does */
+		ifrecv = pdump_stats.accepted + pdump_stats.filtered;
+		ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+
+		if (use_pcapng)
+			rte_pcapng_write_stats(out.pcapng, intf->port, NULL,
+					       start_time, end_time,
+					       ifrecv, ifdrop);
+
+		if (ifrecv == 0)
+			percent = 0;
+		else
+			percent = 100. * ifrecv / (ifrecv + ifdrop);
+
+		fprintf(stderr,
+			"Packets received/dropped on interface '%s': "
+			"%"PRIu64 "/%" PRIu64 " (%.1f)\n",
+			intf->name, ifrecv, ifdrop, percent);
+	}
+}
+
+/*
+ * Start DPDK EAL with arguments.
+ * Unlike most DPDK programs, this application does not use the
+ * typical EAL command line arguments.
+ * We don't want to expose all the DPDK internals to the user.
+ */
+static void dpdk_init(void)
+{
+	static const char * const args[] = {
+		"dumpcap", "--proc-type", "secondary",
+		"--log-level", "notice"
+
+	};
+	const int eal_argc = RTE_DIM(args);
+	char **eal_argv;
+	unsigned int i;
+
+	/* DPDK API requires mutable versions of command line arguments. */
+	eal_argv = calloc(eal_argc + 1, sizeof(char *));
+	if (eal_argv == NULL)
+		rte_panic("No memory\n");
+
+	eal_argv[0] = strdup(progname);
+	for (i = 1; i < RTE_DIM(args); i++)
+		eal_argv[i] = strdup(args[i]);
+
+	if (rte_eal_init(eal_argc, eal_argv) < 0)
+		rte_exit(EXIT_FAILURE, "EAL init failed: is primary process running?\n");
+
+	if (rte_eth_dev_count_avail() == 0)
+		rte_exit(EXIT_FAILURE, "No Ethernet ports found\n");
+}
+
+/* Create packet ring shared between callbacks and process */
+static struct rte_ring *create_ring(void)
+{
+	struct rte_ring *ring;
+	size_t size, log2;
+
+	/* Find next power of 2 >= size. */
+	size = ring_size;
+	log2 = sizeof(size) * 8 - __builtin_clzl(size - 1);
+	size = 1u << log2;
+
+	if (size != ring_size) {
+		fprintf(stderr, "Ring size %u rounded up to %zu\n",
+			ring_size, size);
+		ring_size = size;
+	}
+
+	ring = rte_ring_lookup(RING_NAME);
+	if (ring == NULL) {
+		ring = rte_ring_create(RING_NAME, ring_size,
+					rte_socket_id(), 0);
+		if (ring == NULL)
+			rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
+				 rte_strerror(rte_errno));
+	}
+	return ring;
+}
+
+static struct rte_mempool *create_mempool(void)
+{
+	static const char pool_name[] = "capture_mbufs";
+	size_t num_mbufs = 2 * ring_size;
+	struct rte_mempool *mp;
+
+	mp = rte_mempool_lookup(pool_name);
+	if (mp)
+		return mp;
+
+	mp = rte_pktmbuf_pool_create_by_ops(pool_name, num_mbufs,
+					    MBUF_POOL_CACHE_SIZE, 0,
+					    rte_pcapng_mbuf_size(snaplen),
+					    rte_socket_id(), "ring_mp_sc");
+	if (mp == NULL)
+		rte_exit(EXIT_FAILURE,
+			 "Mempool (%s) creation failed: %s\n", pool_name,
+			 rte_strerror(rte_errno));
+
+	return mp;
+}
+
+/*
+ * Get Operating System information.
+ * Returns a string allocated via malloc().
+ */
+static char *get_os_info(void)
+{
+	struct utsname uts;
+	char *osname = NULL;
+
+	if (uname(&uts) < 0)
+		return NULL;
+
+	if (asprintf(&osname, "%s %s",
+		     uts.sysname, uts.release) == -1)
+		return NULL;
+
+	return osname;
+}
+
+static dumpcap_out_t create_output(void)
+{
+	dumpcap_out_t ret;
+	static char tmp_path[PATH_MAX];
+	int fd;
+
+	/* If no filename specified make a tempfile name */
+	if (output_name == NULL) {
+		struct interface *intf;
+		struct tm *tm;
+		time_t now;
+		char ts[32];
+
+		intf = TAILQ_FIRST(&interfaces);
+		now = time(NULL);
+		tm = localtime(&now);
+		if (!tm)
+			rte_panic("localtime failed\n");
+
+		strftime(ts, sizeof(ts), "%Y%m%d%H%M%S", tm);
+
+		snprintf(tmp_path, sizeof(tmp_path),
+			 "/tmp/%s_%u_%s_%s.%s",
+			 progname, intf->port, intf->name, ts,
+			 use_pcapng ? "pcapng" : "pcap");
+		output_name = tmp_path;
+	}
+
+	if (strcmp(output_name, "-") == 0)
+		fd = STDOUT_FILENO;
+	else {
+		mode_t mode = group_read ? 0640 : 0600;
+
+		fd = open(output_name, O_WRONLY | O_CREAT, mode);
+		if (fd < 0)
+			rte_exit(EXIT_FAILURE, "Can not open \"%s\": %s\n",
+				 output_name, strerror(errno));
+	}
+
+	if (use_pcapng) {
+		char *os = get_os_info();
+
+		ret.pcapng = rte_pcapng_fdopen(fd, os, NULL,
+					   version(), capture_comment);
+		if (ret.pcapng == NULL)
+			rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
+				 strerror(rte_errno));
+		free(os);
+	} else {
+		pcap_t *pcap;
+
+		pcap = pcap_open_dead_with_tstamp_precision(DLT_EN10MB, snaplen,
+							    PCAP_TSTAMP_PRECISION_NANO);
+		if (pcap == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_open_dead failed\n");
+
+		ret.dumper = pcap_dump_fopen(pcap, fdopen(fd, "w"));
+		if (ret.dumper == NULL)
+			rte_exit(EXIT_FAILURE, "pcap_dump_fopen failed: %s\n",
+				 pcap_geterr(pcap));
+	}
+
+	return ret;
+}
+
+static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+{
+	struct interface *intf;
+	uint32_t flags;
+	int ret;
+
+	flags = RTE_PDUMP_FLAG_RXTX;
+	if (use_pcapng)
+		flags |= RTE_PDUMP_FLAG_PCAPNG;
+
+	TAILQ_FOREACH(intf, &interfaces, next) {
+		if (promiscuous_mode)
+			rte_eth_promiscuous_enable(intf->port);
+
+		ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
+					   flags, snaplen,
+					   r, mp, bpf_prm);
+		if (ret < 0)
+			rte_exit(EXIT_FAILURE,
+				 "Packet dump enable failed: %s\n",
+				 rte_strerror(rte_errno));
+	}
+}
+
+/*
+ * Show current count of captured packets
+ * with backspaces to overwrite last value.
+ */
+static void show_count(uint64_t count)
+{
+	unsigned int i;
+	static unsigned int bt;
+
+	for (i = 0; i < bt; i++)
+		fputc('\b', stderr);
+
+	bt = fprintf(stderr, "%"PRIu64" ", count);
+}
+
+/* Write multiple packets in older pcap format */
+static ssize_t
+pcap_write_packets(pcap_dumper_t *dumper,
+		   struct rte_mbuf *pkts[], uint16_t n)
+{
+	uint8_t temp_data[snaplen];
+	struct pcap_pkthdr header;
+	uint16_t i;
+	size_t total = 0;
+
+	gettimeofday(&header.ts, NULL);
+
+	for (i = 0; i < n; i++) {
+		struct rte_mbuf *m = pkts[i];
+
+		header.len = rte_pktmbuf_pkt_len(m);
+		header.caplen = RTE_MIN(header.len, snaplen);
+
+		pcap_dump((u_char *)dumper, &header,
+			  rte_pktmbuf_read(m, 0, header.caplen, temp_data));
+
+		total += sizeof(header) + header.caplen;
+	}
+
+	return total;
+}
+
+/* Process all packets in ring and dump to capture file */
+static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+{
+	struct rte_mbuf *pkts[BURST_SIZE];
+	unsigned int avail, n;
+	static unsigned int empty_count;
+	ssize_t written;
+
+	n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
+				      &avail);
+	if (n == 0) {
+		/* don't consume endless amounts of cpu if idle */
+		if (empty_count < SLEEP_THRESHOLD)
+			++empty_count;
+		else
+			usleep(10);
+		return 0;
+	}
+
+	empty_count = (avail == 0);
+
+	if (use_pcapng)
+		written = rte_pcapng_write_packets(out.pcapng, pkts, n);
+	else
+		written = pcap_write_packets(out.dumper, pkts, n);
+
+	rte_pktmbuf_free_bulk(pkts, n);
+
+	if (written < 0)
+		return -1;
+
+	file_size += written;
+	packets_received += n;
+	if (!quiet)
+		show_count(packets_received);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	struct rte_ring *r;
+	struct rte_mempool *mp;
+	dumpcap_out_t out;
+
+	progname = argv[0];
+
+	dpdk_init();
+	parse_opts(argc, argv);
+
+	if (filter_str)
+		compile_filter();
+
+	if (TAILQ_EMPTY(&interfaces))
+		set_default_interface();
+
+	r = create_ring();
+	mp = create_mempool();
+	out = create_output();
+
+	start_time = create_timestamp();
+	enable_pdump(r, mp);
+
+	signal(SIGINT, signal_handler);
+	signal(SIGPIPE, SIG_IGN);
+
+	enable_primary_monitor();
+
+	if (!quiet) {
+		fprintf(stderr, "Packets captured: ");
+		show_count(0);
+	}
+
+	while (!__atomic_load_n(&quit_signal, __ATOMIC_RELAXED)) {
+		if (process_ring(out, r) < 0) {
+			fprintf(stderr, "pcapng file write failed; %s\n",
+				strerror(errno));
+			break;
+		}
+
+		if (stop.size && file_size >= stop.size)
+			break;
+
+		if (stop.packets && packets_received >= stop.packets)
+			break;
+
+		if (stop.duration != 0 &&
+		    create_timestamp() - start_time > stop.duration)
+			break;
+	}
+
+	end_time = create_timestamp();
+	disable_primary_monitor();
+
+	if (rte_eal_primary_proc_alive(NULL))
+		report_packet_stats(out);
+
+	if (use_pcapng)
+		rte_pcapng_close(out.pcapng);
+	else
+		pcap_dump_close(out.dumper);
+
+	cleanup_pdump_resources();
+	rte_free(bpf_prm);
+	rte_ring_free(r);
+	rte_mempool_free(mp);
+
+	return rte_eal_cleanup() ? EXIT_FAILURE : 0;
+}
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
new file mode 100644
index 000000000000..794336211eff
--- /dev/null
+++ b/app/dumpcap/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2019 Microsoft Corporation
+
+if not dpdk_conf.has('RTE_PORT_PCAP')
+    build = false
+    reason = 'missing dependency, "libpcap"'
+endif
+
+if is_windows
+	build = false
+	reason = 'not supported on Windows'
+	subdir_done()
+endif
+
+sources = files('main.c')
+deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
diff --git a/app/meson.build b/app/meson.build
index 4c6049807cc3..e41a2e390236 100644
--- a/app/meson.build
+++ b/app/meson.build
@@ -2,6 +2,7 @@
 # Copyright(c) 2017-2019 Intel Corporation
 
 apps = [
+        'dumpcap',
         'pdump',
         'proc-info',
         'test-acl',
-- 
2.30.2
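
A note on create_ring() above: rte_ring sizes normally have to be a
power of two, so the application rounds the requested size up with
__builtin_clzl(). That expression is undefined for a request of 0 or 1.
Below is a minimal sketch of a rounding helper that handles those edge
cases with plain bit operations. The name round_up_pow2 is hypothetical
and not part of the patch; DPDK also provides rte_align32pow2() in
rte_common.h, which may be the simpler choice inside the application.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round n up to the next power of two.
 * Returns 0 if n is 0 or if the result would not fit in 32 bits.
 */
static uint32_t
round_up_pow2(uint32_t n)
{
	if (n == 0 || n > (UINT32_C(1) << 31))
		return 0;

	/* Smear the highest set bit of (n - 1) into all lower bits, then add 1. */
	n--;
	n |= n >> 1;
	n |= n >> 2;
	n |= n >> 4;
	n |= n >> 8;
	n |= n >> 16;
	n++;

	/* A power of two has exactly one bit set. */
	assert((n & (n - 1)) == 0);
	return n;
}
```

With a helper like this, a caller could treat a return value of 0 as an
invalid ring size and exit with an error instead of passing an
unexpected value to rte_ring_create().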



* [dpdk-dev] [PATCH v15 08/12] test: add test for bpf_convert
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (6 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 09/12] test: add a test for pcapng library Stephen Hemminger
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

Add some functional tests for the Classic BPF to DPDK BPF converter.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_bpf.c | 200 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 7fcf92e716fe..ef861d05e755 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -10,6 +10,7 @@
 #include <rte_memory.h>
 #include <rte_debug.h>
 #include <rte_hexdump.h>
+#include <rte_malloc.h>
 #include <rte_random.h>
 #include <rte_byteorder.h>
 #include <rte_errno.h>
@@ -3248,3 +3249,202 @@ test_bpf(void)
 }
 
 REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
+
+#ifdef RTE_PORT_PCAP
+#include <pcap/pcap.h>
+
+static void
+test_bpf_dump(struct bpf_program *cbf, const struct rte_bpf_prm *prm)
+{
+	printf("cBPF program (%u insns)\n", cbf->bf_len);
+	bpf_dump(cbf, 1);
+
+	printf("\neBPF program (%u insns)\n", prm->nb_ins);
+	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
+}
+
+static int
+test_bpf_match(pcap_t *pcap, const char *str,
+	       struct rte_mbuf *mb)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+	int ret = -1;
+	uint64_t rc;
+
+	if (pcap_compile(pcap, &fcode, str, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile(\"%s\") failed: %s;\n",
+		       __func__, __LINE__,  str, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, str, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	rc = rte_bpf_exec(bpf, mb);
+	/* The return code from bpf capture filter is non-zero if matched */
+	ret = (rc == 0);
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return ret;
+}
+
+/* Basic sanity test: can we match an IP packet? */
+static int
+test_bpf_filter_sanity(pcap_t *pcap)
+{
+	const uint32_t plen = 100;
+	struct rte_mbuf mb, *m;
+	uint8_t tbuf[RTE_MBUF_DEFAULT_BUF_SIZE];
+	struct {
+		struct rte_ether_hdr eth_hdr;
+		struct rte_ipv4_hdr ip_hdr;
+	} *hdr;
+
+	dummy_mbuf_prep(&mb, tbuf, sizeof(tbuf), plen);
+	m = &mb;
+
+	hdr = rte_pktmbuf_mtod(m, typeof(hdr));
+	hdr->eth_hdr = (struct rte_ether_hdr) {
+		.dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+		.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+	};
+	hdr->ip_hdr = (struct rte_ipv4_hdr) {
+		.version_ihl = RTE_IPV4_VHL_DEF,
+		.total_length = rte_cpu_to_be_16(plen),
+		.time_to_live = IPDEFTTL,
+		.next_proto_id = IPPROTO_RAW,
+		.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+		.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+	};
+
+	if (test_bpf_match(pcap, "ip", m) != 0) {
+		printf("%s@%d: filter \"ip\" doesn't match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+	if (test_bpf_match(pcap, "not ip", m) == 0) {
+		printf("%s@%d: filter \"not ip\" does match test data\n",
+		       __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * Some sample pcap filter strings from
+ * https://wiki.wireshark.org/CaptureFilters
+ */
+static const char * const sample_filters[] = {
+	"host 172.18.5.4",
+	"net 192.168.0.0/24",
+	"src net 192.168.0.0/24",
+	"src net 192.168.0.0 mask 255.255.255.0",
+	"dst net 192.168.0.0/24",
+	"dst net 192.168.0.0 mask 255.255.255.0",
+	"port 53",
+	"host dpdk.org and not (port 80 or port 25)",
+	"host dpdk.org and not port 80 and not port 25",
+	"port not 53 and not arp",
+	"(tcp[0:2] > 1500 and tcp[0:2] < 1550) or (tcp[2:2] > 1500 and tcp[2:2] < 1550)",
+	"ether proto 0x888e",
+	"ether[0] & 1 = 0 and ip[16] >= 224",
+	"icmp[icmptype] != icmp-echo and icmp[icmptype] != icmp-echoreply",
+	"tcp[tcpflags] & (tcp-syn|tcp-fin) != 0 and not src and dst net 127.0.0.1",
+	"not ether dst 01:80:c2:00:00:0e",
+	"not broadcast and not multicast",
+	"dst host ff02::1",
+	"port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420",
+	/* Worms */
+	"dst port 135 and tcp port 135 and ip[2:2]==48",
+	"icmp[icmptype]==icmp-echo and ip[2:2]==92 and icmp[8:4]==0xAAAAAAAA",
+	"dst port 135 or dst port 445 or dst port 1433"
+	" and tcp[tcpflags] & (tcp-syn) != 0"
+	" and tcp[tcpflags] & (tcp-ack) = 0 and src net 192.168.0.0/24",
+	"tcp src port 443 and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4] = 0x18)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 1] = 0x03)"
+	" and (tcp[((tcp[12] & 0xF0) >> 4 ) * 4 + 2] < 0x04)"
+	" and ((ip[2:2] - 4 * (ip[0] & 0x0F) - 4 * ((tcp[12] & 0xF0) >> 4) > 69))",
+	/* Other */
+	"len = 128",
+};
+
+static int
+test_bpf_filter(pcap_t *pcap, const char *s)
+{
+	struct bpf_program fcode;
+	struct rte_bpf_prm *prm = NULL;
+	struct rte_bpf *bpf = NULL;
+
+	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
+		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
+		       __func__, __LINE__, s, pcap_geterr(pcap));
+		return -1;
+	}
+
+	prm = rte_bpf_convert(&fcode);
+	if (prm == NULL) {
+		printf("%s@%d: bpf_convert('%s') failed, error=%d(%s);\n",
+		       __func__, __LINE__, s, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+	bpf = rte_bpf_load(prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		goto error;
+	}
+
+error:
+	if (bpf)
+		rte_bpf_destroy(bpf);
+	else {
+		printf("%s \"%s\"\n", __func__, s);
+		test_bpf_dump(&fcode, prm);
+	}
+
+	rte_free(prm);
+	pcap_freecode(&fcode);
+	return (bpf == NULL) ? -1 : 0;
+}
+
+static int
+test_bpf_convert(void)
+{
+	unsigned int i;
+	pcap_t *pcap;
+	int rc;
+
+	pcap = pcap_open_dead(DLT_EN10MB, 262144);
+	if (!pcap) {
+		printf("pcap_open_dead failed\n");
+		return -1;
+	}
+
+	rc = test_bpf_filter_sanity(pcap);
+	for (i = 0; i < RTE_DIM(sample_filters); i++)
+		rc |= test_bpf_filter(pcap, sample_filters[i]);
+
+	pcap_close(pcap);
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_convert_autotest, test_bpf_convert);
+#endif /* RTE_PORT_PCAP */
-- 
2.30.2



* [dpdk-dev] [PATCH v15 09/12] test: add a test for pcapng library
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (7 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 08/12] test: add test for bpf_convert Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 10/12] test: enable bpf autotest Stephen Hemminger
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

Simple unit test that creates a pcapng file using the API.

To run this test you need to have at least one device.
For example:

DPDK_TEST=pcapng_autotest ./build/app/test/dpdk-test -l 0-15 \
    --no-huge -m 2048 --vdev=net_tap,iface=dummy

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/meson.build   |   4 +
 app/test/test_pcapng.c | 272 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 276 insertions(+)
 create mode 100644 app/test/test_pcapng.c
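
Reader's note (not part of the patch): test_validate() reads the file back
with libpcap. As a dependency-free complement, the first block of a pcapng
file can be checked by hand. The sketch below validates a Section Header
Block using the constants from the pcapng specification (block type
0x0A0D0D0A, byte-order magic 0x1A2B3C4D, major version 1). The function names
are hypothetical and the code checks only framing, not options or later
blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHB_BLOCK_TYPE		0x0A0D0D0AU
#define SHB_BYTE_ORDER_MAGIC	0x1A2B3C4DU
#define SHB_MIN_LEN		28U	/* header, section length, trailer; no options */

/* Load a 32-bit field, byte-swapping when the file has the other endianness. */
static uint32_t
load32(const uint8_t *p, int swapped)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	if (swapped)
		v = (v >> 24) | ((v >> 8) & 0xff00U) |
		    ((v << 8) & 0xff0000U) | (v << 24);
	return v;
}

static uint16_t
load16(const uint8_t *p, int swapped)
{
	uint16_t v;

	memcpy(&v, p, sizeof(v));
	if (swapped)
		v = (uint16_t)((v >> 8) | (v << 8));
	return v;
}

/* Return 0 if buf starts with a well-formed Section Header Block, else -1. */
static int
pcapng_shb_check(const uint8_t *buf, size_t len)
{
	uint32_t total;
	int swapped;

	assert(buf != NULL);
	if (len < SHB_MIN_LEN)
		return -1;
	/* the block type is a byte palindrome, so byte order does not matter */
	if (load32(buf, 0) != SHB_BLOCK_TYPE)
		return -1;
	if (load32(buf + 8, 0) == SHB_BYTE_ORDER_MAGIC)
		swapped = 0;
	else if (load32(buf + 8, 1) == SHB_BYTE_ORDER_MAGIC)
		swapped = 1;
	else
		return -1;
	total = load32(buf + 4, swapped);
	if (total < SHB_MIN_LEN || total % 4 != 0 || total > len)
		return -1;
	/* the total length is repeated at the end of the block */
	if (load32(buf + total - 4, swapped) != total)
		return -1;
	if (load16(buf + 12, swapped) != 1)	/* major version */
		return -1;
	return 0;
}

/* Build the smallest valid block in host byte order, for demonstration. */
static size_t
pcapng_shb_build(uint8_t buf[SHB_MIN_LEN])
{
	const uint32_t type = SHB_BLOCK_TYPE, total = SHB_MIN_LEN;
	const uint32_t magic = SHB_BYTE_ORDER_MAGIC;
	const uint16_t major = 1, minor = 0;
	const uint64_t section_len = UINT64_MAX;	/* length unspecified */

	memcpy(buf, &type, 4);
	memcpy(buf + 4, &total, 4);
	memcpy(buf + 8, &magic, 4);
	memcpy(buf + 12, &major, 2);
	memcpy(buf + 14, &minor, 2);
	memcpy(buf + 16, &section_len, 8);
	memcpy(buf + 24, &total, 4);
	return SHB_MIN_LEN;
}
```

To apply this to the test output, read the first bytes of the generated file
into a buffer and pass them to pcapng_shb_check(). A real file carries
options in its header block, so the total length is normally larger than the
28-byte minimum built here.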

diff --git a/app/test/meson.build b/app/test/meson.build
index a16374b7a109..08de1c3d82b4 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -400,6 +400,10 @@ if dpdk_conf.has('RTE_NET_RING')
     fast_tests += [['pdump_autotest', true]]
 endif
 
+if dpdk_conf.has('RTE_PORT_PCAP')
+    test_sources += 'test_pcapng.c'
+endif
+
 if dpdk_conf.has('RTE_LIB_POWER')
     test_deps += 'power'
 endif
diff --git a/app/test/test_pcapng.c b/app/test/test_pcapng.c
new file mode 100644
index 000000000000..ed1e87f9445d
--- /dev/null
+++ b/app/test/test_pcapng.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2021 Microsoft Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+#include <rte_net.h>
+#include <rte_pcapng.h>
+
+#include <pcap/pcap.h>
+
+#include "test.h"
+
+#define NUM_PACKETS    10
+#define DUMMY_MBUF_NUM 3
+
+static rte_pcapng_t *pcapng;
+static struct rte_mempool *mp;
+static const uint32_t pkt_len = 200;
+static uint16_t port_id;
+static char file_name[] = "/tmp/pcapng_test_XXXXXX.pcapng";
+
+/* first mbuf in the packet, should always be at offset 0 */
+struct dummy_mbuf {
+	struct rte_mbuf mb[DUMMY_MBUF_NUM];
+	uint8_t buf[DUMMY_MBUF_NUM][RTE_MBUF_DEFAULT_BUF_SIZE];
+};
+
+static void
+dummy_mbuf_prep(struct rte_mbuf *mb, uint8_t buf[], uint32_t buf_len,
+	uint32_t data_len)
+{
+	uint32_t i;
+	uint8_t *db;
+
+	mb->buf_addr = buf;
+	mb->buf_iova = (uintptr_t)buf;
+	mb->buf_len = buf_len;
+	rte_mbuf_refcnt_set(mb, 1);
+
+	/* set pool pointer to dummy value, test doesn't use it */
+	mb->pool = (void *)buf;
+
+	rte_pktmbuf_reset(mb);
+	db = (uint8_t *)rte_pktmbuf_append(mb, data_len);
+
+	for (i = 0; i != data_len; i++)
+		db[i] = i;
+}
+
+/* Make an IP packet consisting of a chain of one mbuf */
+static void
+mbuf1_prepare(struct dummy_mbuf *dm, uint32_t plen)
+{
+	struct {
+		struct rte_ether_hdr eth;
+		struct rte_ipv4_hdr ip;
+	} pkt = {
+		.eth = {
+			.dst_addr.addr_bytes = "\xff\xff\xff\xff\xff\xff",
+			.ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+		},
+		.ip = {
+			.version_ihl = RTE_IPV4_VHL_DEF,
+			.total_length = rte_cpu_to_be_16(plen),
+			.time_to_live = IPDEFTTL,
+			.next_proto_id = IPPROTO_RAW,
+			.src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+			.dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+		}
+	};
+
+	memset(dm, 0, sizeof(*dm));
+	dummy_mbuf_prep(&dm->mb[0], dm->buf[0], sizeof(dm->buf[0]), plen);
+
+	rte_eth_random_addr(pkt.eth.src_addr.addr_bytes);
+	memcpy(rte_pktmbuf_mtod(dm->mb, void *), &pkt, RTE_MIN(sizeof(pkt), plen));
+}
+
+static int
+test_setup(void)
+{
+	int tmp_fd;
+
+	port_id = rte_eth_find_next(0);
+	if (port_id >= RTE_MAX_ETHPORTS) {
+		fprintf(stderr, "No valid Ether port\n");
+		return -1;
+	}
+
+	tmp_fd = mkstemps(file_name, strlen(".pcapng"));
+	if (tmp_fd == -1) {
+		perror("mkstemps() failure");
+		return -1;
+	}
+	printf("pcapng: output file %s\n", file_name);
+
+	/* open a test capture file */
+	pcapng = rte_pcapng_fdopen(tmp_fd, NULL, NULL, "pcapng_test", NULL);
+	if (pcapng == NULL) {
+		fprintf(stderr, "rte_pcapng_fdopen failed\n");
+		close(tmp_fd);
+		return -1;
+	}
+
+
+	/* Make a pool for cloned packets */
+	mp = rte_pktmbuf_pool_create_by_ops("pcapng_test_pool", NUM_PACKETS,
+					    0, 0,
+					    rte_pcapng_mbuf_size(pkt_len),
+					    SOCKET_ID_ANY, "ring_mp_sc");
+	if (mp == NULL) {
+		fprintf(stderr, "Cannot create mempool\n");
+		return -1;
+	}
+	return 0;
+
+}
+
+static int
+test_write_packets(void)
+{
+	struct rte_mbuf *orig;
+	struct rte_mbuf *clones[NUM_PACKETS] = { };
+	struct dummy_mbuf mbfs;
+	unsigned int i;
+	ssize_t len;
+
+	/* make a dummy packet */
+	mbuf1_prepare(&mbfs, pkt_len);
+
+	/* clone them */
+	orig  = &mbfs.mb[0];
+	for (i = 0; i < NUM_PACKETS; i++) {
+		struct rte_mbuf *mc;
+
+		mc = rte_pcapng_copy(port_id, 0, orig, mp, pkt_len,
+				rte_get_tsc_cycles(), 0);
+		if (mc == NULL) {
+			fprintf(stderr, "Cannot copy packet\n");
+			return -1;
+		}
+		clones[i] = mc;
+	}
+
+	/* write it to capture file */
+	len = rte_pcapng_write_packets(pcapng, clones, NUM_PACKETS);
+
+	rte_pktmbuf_free_bulk(clones, NUM_PACKETS);
+
+	if (len <= 0) {
+		fprintf(stderr, "Write of packets failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+test_write_stats(void)
+{
+	ssize_t len;
+
+	/* write a statistics block */
+	len = rte_pcapng_write_stats(pcapng, port_id,
+				     NULL, 0, 0,
+				     NUM_PACKETS, 0);
+	if (len <= 0) {
+		fprintf(stderr, "Write of statistics failed\n");
+		return -1;
+	}
+	return 0;
+}
+
+static void
+pkt_print(u_char *user, const struct pcap_pkthdr *h,
+	  const u_char *bytes)
+{
+	unsigned int *countp = (unsigned int *)user;
+	const struct rte_ether_hdr *eh;
+	struct tm *tm;
+	char tbuf[128], src[64], dst[64];
+
+	tm = localtime(&h->ts.tv_sec);
+	if (tm == NULL) {
+		perror("localtime");
+		return;
+	}
+
+	if (strftime(tbuf, sizeof(tbuf), "%X", tm) == 0) {
+		fprintf(stderr, "strftime returned 0!\n");
+		return;
+	}
+
+	eh = (const struct rte_ether_hdr *)bytes;
+	rte_ether_format_addr(dst, sizeof(dst), &eh->dst_addr);
+	rte_ether_format_addr(src, sizeof(src), &eh->src_addr);
+	printf("%s.%06lu: %s -> %s type %x length %u\n",
+	       tbuf, (unsigned long)h->ts.tv_usec,
+	       src, dst, rte_be_to_cpu_16(eh->ether_type), h->len);
+
+	*countp += 1;
+}
+
+/*
+ * Open the resulting pcapng file with libpcap
+ * Would be better to use capinfos from wireshark
+ * but that creates an unwanted dependency.
+ */
+static int
+test_validate(void)
+{
+	char errbuf[PCAP_ERRBUF_SIZE];
+	unsigned int count = 0;
+	pcap_t *pcap;
+	int ret;
+
+	pcap = pcap_open_offline(file_name, errbuf);
+	if (pcap == NULL) {
+		fprintf(stderr, "pcap_open_offline('%s') failed: %s\n",
+			file_name, errbuf);
+		return -1;
+	}
+
+	ret = pcap_loop(pcap, 0, pkt_print, (u_char *)&count);
+	if (ret == 0)
+		printf("Saw %u packets\n", count);
+	else
+		fprintf(stderr, "pcap_loop failed: %s\n",
+			pcap_geterr(pcap));
+	pcap_close(pcap);
+
+	return ret;
+}
+
+static void
+test_cleanup(void)
+{
+	if (mp)
+		rte_mempool_free(mp);
+
+	if (pcapng)
+		rte_pcapng_close(pcapng);
+
+}
+
+static struct
+unit_test_suite test_pcapng_suite  = {
+	.setup = test_setup,
+	.teardown = test_cleanup,
+	.suite_name = "Test Pcapng Unit Test Suite",
+	.unit_test_cases = {
+		TEST_CASE(test_write_packets),
+		TEST_CASE(test_write_stats),
+		TEST_CASE(test_validate),
+		TEST_CASES_END()
+	}
+};
+
+static int
+test_pcapng(void)
+{
+	return unit_test_suite_runner(&test_pcapng_suite);
+}
+
+REGISTER_TEST_COMMAND(pcapng_autotest, test_pcapng);
-- 
2.30.2



* [dpdk-dev] [PATCH v15 10/12] test: enable bpf autotest
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (8 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 09/12] test: add a test for pcapng library Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev

The BPF autotest is defined but not run automatically.
Since it is short, it should be added to the autotest suite.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/meson.build | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/app/test/meson.build b/app/test/meson.build
index 08de1c3d82b4..95b9d3447e08 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -197,6 +197,8 @@ test_deps = [
 fast_tests = [
         ['acl_autotest', true],
         ['atomic_autotest', false],
+        ['bpf_autotest', true],
+        ['bpf_convert_autotest', true],
         ['bitops_autotest', true],
         ['byteorder_autotest', true],
         ['cksum_autotest', true],
-- 
2.30.2



* [dpdk-dev] [PATCH v15 11/12] doc: changes for new pcapng and dumpcap utility
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (9 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 10/12] test: enable bpf autotest Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
  2021-10-22 13:55   ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Thomas Monjalon
  12 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Reshma Pattan

Describe the new packet capture library and utility.
Fix the title line on the pdump documentation.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Reshma Pattan <reshma.pattan@intel.com>
---
 doc/api/doxy-api-index.md                     |  1 +
 doc/api/doxy-api.conf.in                      |  1 +
 .../howto/img/packet_capture_framework.svg    | 96 +++++++++----------
 doc/guides/howto/packet_capture_framework.rst | 69 ++++++-------
 doc/guides/prog_guide/index.rst               |  1 +
 doc/guides/prog_guide/pcapng_lib.rst          | 46 +++++++++
 doc/guides/prog_guide/pdump_lib.rst           | 28 ++++--
 doc/guides/rel_notes/release_21_11.rst        | 10 ++
 doc/guides/tools/dumpcap.rst                  | 86 +++++++++++++++++
 doc/guides/tools/index.rst                    |  1 +
 10 files changed, 251 insertions(+), 88 deletions(-)
 create mode 100644 doc/guides/prog_guide/pcapng_lib.rst
 create mode 100644 doc/guides/tools/dumpcap.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 29390504318b..a447c1ab4ac0 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -224,3 +224,4 @@ The public API headers are grouped by topics:
   [experimental APIs]  (@ref rte_compat.h),
   [ABI versioning]     (@ref rte_function_versioning.h),
   [version]            (@ref rte_version.h)
+  [pcapng]             (@ref rte_pcapng.h)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 109ec1f6826b..096ebbaf0d1b 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -59,6 +59,7 @@ INPUT                   = @TOPDIR@/doc/api/doxy-api-index.md \
                           @TOPDIR@/lib/metrics \
                           @TOPDIR@/lib/node \
                           @TOPDIR@/lib/net \
+                          @TOPDIR@/lib/pcapng \
                           @TOPDIR@/lib/pci \
                           @TOPDIR@/lib/pdump \
                           @TOPDIR@/lib/pipeline \
diff --git a/doc/guides/howto/img/packet_capture_framework.svg b/doc/guides/howto/img/packet_capture_framework.svg
index a76baf71fdee..1c2646a81096 100644
--- a/doc/guides/howto/img/packet_capture_framework.svg
+++ b/doc/guides/howto/img/packet_capture_framework.svg
@@ -1,6 +1,4 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-<!-- Created with Inkscape (http://www.inkscape.org/) -->
-
 <svg
    xmlns:osb="http://www.openswatchbook.org/uri/2009/osb"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
@@ -16,8 +14,8 @@
    viewBox="0 0 425.19685 283.46457"
    id="svg2"
    version="1.1"
-   inkscape:version="0.91 r13725"
-   sodipodi:docname="drawing-pcap.svg">
+   inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
+   sodipodi:docname="packet_capture_framework.svg">
   <defs
      id="defs4">
     <marker
@@ -228,7 +226,7 @@
        x2="487.64606"
        y2="258.38232"
        gradientUnits="userSpaceOnUse"
-       gradientTransform="translate(-84.916417,744.90779)" />
+       gradientTransform="matrix(1.1457977,0,0,0.99944907,-151.97019,745.05014)" />
     <linearGradient
        inkscape:collect="always"
        xlink:href="#linearGradient5784"
@@ -277,17 +275,18 @@
      borderopacity="1.0"
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
-     inkscape:zoom="0.57434918"
-     inkscape:cx="215.17857"
-     inkscape:cy="285.26445"
+     inkscape:zoom="1"
+     inkscape:cx="226.77165"
+     inkscape:cy="78.124511"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
-     inkscape:window-width="1874"
-     inkscape:window-height="971"
-     inkscape:window-x="2"
-     inkscape:window-y="24"
-     inkscape:window-maximized="0" />
+     inkscape:window-width="2560"
+     inkscape:window-height="1414"
+     inkscape:window-x="0"
+     inkscape:window-y="0"
+     inkscape:window-maximized="1"
+     inkscape:document-rotation="0" />
   <metadata
      id="metadata7">
     <rdf:RDF>
@@ -296,7 +295,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -321,15 +320,15 @@
        y="790.82452" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="61.050636"
        y="807.3205"
-       id="text4152"
-       sodipodi:linespacing="125%"><tspan
+       id="text4152"><tspan
          sodipodi:role="line"
          id="tspan4154"
          x="61.050636"
-         y="807.3205">DPDK Primary Application</tspan></text>
+         y="807.3205"
+         style="font-size:12.5px;line-height:1.25">DPDK Primary Application</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6"
@@ -339,19 +338,20 @@
        y="827.01843" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="350.68585"
        y="841.16058"
-       id="text4189"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189"><tspan
          sodipodi:role="line"
          id="tspan4191"
          x="350.68585"
-         y="841.16058">dpdk-pdump</tspan><tspan
+         y="841.16058"
+         style="font-size:12.5px;line-height:1.25">dpdk-dumpcap</tspan><tspan
          sodipodi:role="line"
          x="350.68585"
          y="856.78558"
-         id="tspan4193">tool</tspan></text>
+         id="tspan4193"
+         style="font-size:12.5px;line-height:1.25">tool</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4"
@@ -361,15 +361,15 @@
        y="891.16315" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70612"
        y="905.3053"
-       id="text4189-1"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1"><tspan
          sodipodi:role="line"
          x="352.70612"
          y="905.3053"
-         id="tspan4193-3">PCAP PMD</tspan></text>
+         id="tspan4193-3"
+         style="font-size:12.5px;line-height:1.25">librte_pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5745);fill-opacity:1;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-6"
@@ -379,15 +379,15 @@
        y="923.9931" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.02846"
        y="938.13525"
-       id="text4189-0"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-0"><tspan
          sodipodi:role="line"
          x="136.02846"
          y="938.13525"
-         id="tspan4193-6">dpdk_port0</tspan></text>
+         id="tspan4193-6"
+         style="font-size:12.5px;line-height:1.25">dpdk_port0</tspan></text>
     <rect
        style="fill:#000000;fill-opacity:0;stroke:#257cdc;stroke-width:2;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-5"
@@ -397,33 +397,33 @@
        y="824.99817" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#000000;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="137.54369"
        y="839.14026"
-       id="text4189-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-4"><tspan
          sodipodi:role="line"
          x="137.54369"
          y="839.14026"
-         id="tspan4193-2">librte_pdump</tspan></text>
+         id="tspan4193-2"
+         style="font-size:12.5px;line-height:1.25">librte_pdump</tspan></text>
     <rect
-       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
+       style="fill:url(#linearGradient5788);fill-opacity:1;stroke:#257cdc;stroke-width:1.07013;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5"
-       width="94.449265"
-       height="35.355339"
-       x="307.7804"
-       y="985.61243" />
+       width="108.21974"
+       height="35.335861"
+       x="297.9809"
+       y="985.62219" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="352.70618"
        y="999.75458"
-       id="text4189-1-8"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8"><tspan
          sodipodi:role="line"
          x="352.70618"
          y="999.75458"
-         id="tspan4193-3-2">capture.pcap</tspan></text>
+         id="tspan4193-3-2"
+         style="font-size:12.5px;line-height:1.25">capture.pcapng</tspan></text>
     <rect
        style="fill:url(#linearGradient5788-1);fill-opacity:1;stroke:#257cdc;stroke-width:1.12555885;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1"
        id="rect4156-6-4-5-1"
@@ -433,15 +433,15 @@
        y="983.14984" />
     <text
        xml:space="preserve"
-       style="font-style:normal;font-weight:normal;font-size:12.5px;line-height:125%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;text-align:center;letter-spacing:0px;word-spacing:0px;text-anchor:middle;fill:#ffffff;fill-opacity:1;stroke:none;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
        x="136.53352"
        y="1002.785"
-       id="text4189-1-8-4"
-       sodipodi:linespacing="125%"><tspan
+       id="text4189-1-8-4"><tspan
          sodipodi:role="line"
          x="136.53352"
          y="1002.785"
-         id="tspan4193-3-2-7">Traffic Generator</tspan></text>
+         id="tspan4193-3-2-7"
+         style="font-size:12.5px;line-height:1.25">Traffic Generator</tspan></text>
     <path
        style="fill:none;fill-rule:evenodd;stroke:#000000;stroke-width:1px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1;marker-end:url(#marker7331)"
        d="m 351.46948,927.02357 c 0,57.5787 0,57.5787 0,57.5787"
diff --git a/doc/guides/howto/packet_capture_framework.rst b/doc/guides/howto/packet_capture_framework.rst
index c31bac52340e..f933cc7e9311 100644
--- a/doc/guides/howto/packet_capture_framework.rst
+++ b/doc/guides/howto/packet_capture_framework.rst
@@ -1,18 +1,19 @@
 ..  SPDX-License-Identifier: BSD-3-Clause
-    Copyright(c) 2017 Intel Corporation.
+    Copyright(c) 2017-2021 Intel Corporation.
 
-DPDK pdump Library and pdump Tool
-=================================
+DPDK packet capture libraries and tools
+=======================================
 
 This document describes how the Data Plane Development Kit (DPDK) Packet
 Capture Framework is used for capturing packets on DPDK ports. It is intended
 for users of DPDK who want to know more about the Packet Capture feature and
 for those who want to monitor traffic on DPDK-controlled devices.
 
-The DPDK packet capture framework was introduced in DPDK v16.07. The DPDK
-packet capture framework consists of the DPDK pdump library and DPDK pdump
-tool.
-
+The DPDK packet capture framework was introduced in DPDK v16.07 and
+enhanced in 21.11. The DPDK packet capture framework consists of
+``librte_pdump``, the library for collecting packets, and
+``librte_pcapng``, the library for writing packets to a file.
+There are two sample applications: ``dpdk-dumpcap`` and the older ``dpdk-pdump``.
 
 Introduction
 ------------
@@ -22,43 +23,46 @@ allow users to initialize the packet capture framework and to enable or
 disable packet capture. The library works on a multi process communication model and its
 usage is recommended for debugging purposes.
 
-The :ref:`dpdk-pdump <pdump_tool>` tool is developed based on the
-``librte_pdump`` library.  It runs as a DPDK secondary process and is capable
-of enabling or disabling packet capture on DPDK ports. The ``dpdk-pdump`` tool
-provides command-line options with which users can request enabling or
-disabling of the packet capture on DPDK ports.
+The :ref:`librte_pcapng <pcapng_library>` library provides the APIs to format
+packets and write them to a file in Pcapng format.
+
+
+The :ref:`dpdk-dumpcap <dumpcap_tool>` tool captures packets in the same way
+that Wireshark ``dumpcap`` does on Linux. It runs as a DPDK secondary process and
+captures packets from one or more interfaces and writes them to a file
+in Pcapng format.  The ``dpdk-dumpcap`` tool is designed to take
+most of the same options as the Wireshark ``dumpcap`` command.
 
-The application which initializes the packet capture framework will be a primary process
-and the application that enables or disables the packet capture will
-be a secondary process. The primary process sends the Rx and Tx packets from the DPDK ports
-to the secondary process.
+Without any options it will use the packet capture framework to
+capture traffic from the first available DPDK port.
 
 In DPDK the ``testpmd`` application can be used to initialize the packet
-capture framework and acts as a server, and the ``dpdk-pdump`` tool acts as a
+capture framework and acts as a server, and the ``dpdk-dumpcap`` tool acts as a
 client. To view Rx or Tx packets of ``testpmd``, the application should be
-launched first, and then the ``dpdk-pdump`` tool. Packets from ``testpmd``
-will be sent to the tool, which then sends them on to the Pcap PMD device and
-that device writes them to the Pcap file or to an external interface depending
-on the command-line option used.
+launched first, and then the ``dpdk-dumpcap`` tool. Packets from ``testpmd``
+will be sent to the tool, and then to the Pcapng file.
 
 Some things to note:
 
-* The ``dpdk-pdump`` tool can only be used in conjunction with a primary
+* All tools using ``librte_pdump`` can only be used in conjunction with a primary
   application which has the packet capture framework initialized already. In
   dpdk, only ``testpmd`` is modified to initialize packet capture framework,
-  other applications remain untouched. So, if the ``dpdk-pdump`` tool has to
+  other applications remain untouched. So, if the ``dpdk-dumpcap`` tool has to
   be used with any application other than the testpmd, the user needs to
   explicitly modify that application to call the packet capture framework
   initialization code. Refer to the ``app/test-pmd/testpmd.c`` code and look
   for ``pdump`` keyword to see how this is done.
 
-* The ``dpdk-pdump`` tool depends on the libpcap based PMD.
+* The ``dpdk-pdump`` tool is an older tool created as a demonstration of the
+  ``librte_pdump`` library. The ``dpdk-pdump`` tool provides more limited
+  functionality and depends on the Pcap PMD. It is retained only for
+  compatibility reasons; users should use ``dpdk-dumpcap`` instead.
 
 
 Test Environment
 ----------------
 
-The overview of using the Packet Capture Framework and the ``dpdk-pdump`` tool
+The overview of using the Packet Capture Framework and the ``dpdk-dumpcap`` utility
 for packet capturing on the DPDK port in
 :numref:`figure_packet_capture_framework`.
 
@@ -66,13 +70,13 @@ for packet capturing on the DPDK port in
 
 .. figure:: img/packet_capture_framework.*
 
-   Packet capturing on a DPDK port using the dpdk-pdump tool.
+   Packet capturing on a DPDK port using the dpdk-dumpcap utility.
 
 
 Running the Application
 -----------------------
 
-The following steps demonstrate how to run the ``dpdk-pdump`` tool to capture
+The following steps demonstrate how to run the ``dpdk-dumpcap`` tool to capture
 Rx side packets on dpdk_port0 in :numref:`figure_packet_capture_framework` and
 inspect them using ``tcpdump``.
 
@@ -80,16 +84,15 @@ inspect them using ``tcpdump``.
 
      sudo <build_dir>/app/dpdk-testpmd -c 0xf0 -n 4 -- -i --port-topology=chained
 
-#. Launch the pdump tool as follows::
+#. Launch ``dpdk-dumpcap`` as follows::
 
-     sudo <build_dir>/app/dpdk-pdump -- \
-          --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
+     sudo <build_dir>/app/dpdk-dumpcap -w /tmp/capture.pcapng
 
 #. Send traffic to dpdk_port0 from traffic generator.
-   Inspect packets captured in the file capture.pcap using a tool
-   that can interpret Pcap files, for example tcpdump::
+   Inspect packets captured in the file capture.pcapng using a tool such as
+   tcpdump or tshark that can interpret Pcapng files::
 
-     $tcpdump -nr /tmp/capture.pcap
+     $ tcpdump -nr /tmp/capture.pcapng
      reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet)
      11:11:36.891404 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
      11:11:36.891442 IP 4.4.4.4.whois++ > 3.3.3.3.whois++: UDP, length 18
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 89af28dacb72..a8e8e759ecf2 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -44,6 +44,7 @@ Programmer's Guide
     ip_fragment_reassembly_lib
     generic_receive_offload_lib
     generic_segmentation_offload_lib
+    pcapng_lib
     pdump_lib
     multi_proc_support
     kernel_nic_interface
diff --git a/doc/guides/prog_guide/pcapng_lib.rst b/doc/guides/prog_guide/pcapng_lib.rst
new file mode 100644
index 000000000000..fa1994c96f4d
--- /dev/null
+++ b/doc/guides/prog_guide/pcapng_lib.rst
@@ -0,0 +1,46 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2021 Microsoft Corporation
+
+.. _pcapng_library:
+
+Packet Capture Next Generation Library
+======================================
+
+Exchanging packet traces becomes more and more critical every day.
+The de facto standard for this is the format defined by libpcap;
+but that format is rather old and is lacking in functionality
+for more modern applications. The `Pcapng file format`_
+is the default capture file format for modern network capture
+processing tools such as `wireshark`_ (can also be read by `tcpdump`_).
+
+The Pcapng library is an API for formatting packet data
+into a Pcapng file.
+The format conforms to the current `Pcapng RFC`_ standard.
+It is designed to be integrated with the packet capture library.
+
+Usage
+-----
+
+Before the library can be used the function ``rte_pcapng_init``
+should be called once to initialize timestamp computation.
+
+The output stream is created with ``rte_pcapng_fdopen``,
+and should be closed with ``rte_pcapng_close``.
+
+The library requires a DPDK mempool to allocate mbufs. The mbufs
+need to be able to accommodate additional space for the pcapng packet
+format header and trailer information; the function ``rte_pcapng_mbuf_size``
+should be used to determine the lower bound based on MTU.
+
+Collecting packets is done in two parts. The function ``rte_pcapng_copy``
+is used to format and copy mbuf data and ``rte_pcapng_write_packets``
+writes a burst of packets to the output file.
+
+The function ``rte_pcapng_write_stats`` can be used to write
+statistics information into the output file. The summary statistics
+information is automatically added by ``rte_pcapng_close``.
+
+.. _Tcpdump: https://tcpdump.org/
+.. _Wireshark: https://wireshark.org/
+.. _Pcapng file format: https://github.com/pcapng/pcapng/
+.. _Pcapng RFC: https://datatracker.ietf.org/doc/html/draft-tuexen-opsawg-pcapng
diff --git a/doc/guides/prog_guide/pdump_lib.rst b/doc/guides/prog_guide/pdump_lib.rst
index 62c0b015b2fe..f3ff8fd828dc 100644
--- a/doc/guides/prog_guide/pdump_lib.rst
+++ b/doc/guides/prog_guide/pdump_lib.rst
@@ -3,10 +3,10 @@
 
 .. _pdump_library:
 
-The librte_pdump Library
-========================
+Packet Capture Library
+======================
 
-The ``librte_pdump`` library provides a framework for packet capturing in DPDK.
+The DPDK ``pdump`` library provides a framework for packet capturing in DPDK.
 The library does the complete copy of the Rx and Tx mbufs to a new mempool and
 hence it slows down the performance of the applications, so it is recommended
 to use this library for debugging purposes.
@@ -23,11 +23,19 @@ or disable the packet capture, and to uninitialize it.
 
 * ``rte_pdump_enable()``:
   This API enables the packet capture on a given port and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf()``:
+  This API enables the packet capture on a given port and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_enable_by_deviceid()``:
   This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
-  Note: The filter option in the API is a place holder for future enhancements.
+
+* ``rte_pdump_enable_bpf_by_deviceid()``:
+  This API enables the packet capture on a given device id (``vdev name or pci address``) and queue.
+  It also allows setting an optional filter using the DPDK BPF interpreter and
+  setting the captured packet length.
 
 * ``rte_pdump_disable()``:
   This API disables the packet capture on a given port and queue.
@@ -61,6 +69,12 @@ and enables the packet capture by registering the Ethernet RX and TX callbacks f
 and queue combinations. Then the primary process will mirror the packets to the new mempool and enqueue them to
 the rte_ring that secondary process have passed to these APIs.
 
+The packet ring supports one of two formats. The default format enqueues copies of the original packets
+into the rte_ring. If the ``RTE_PDUMP_FLAG_PCAPNG`` flag is set, the mbuf data is extended with a header and trailer
+to match the format of a Pcapng enhanced packet block. The enhanced packet block has meta-data such as the
+timestamp, port and queue the packet was captured on. It is up to the application consuming the
+packets from the ring to select the format desired.
+
 The library APIs ``rte_pdump_disable()`` and ``rte_pdump_disable_by_deviceid()`` disables the packet capture.
 For the calls to these APIs from secondary process, the library creates the "pdump disable" request and sends
 the request to the primary process over the multi process channel. The primary process takes this request and
@@ -74,5 +88,5 @@ function.
 Use Case: Packet Capturing
 --------------------------
 
-The DPDK ``app/pdump`` tool is developed based on this library to capture packets in DPDK.
-Users can use this as an example to develop their own packet capturing tools.
+The DPDK ``app/dpdk-dumpcap`` utility uses this library
+to capture packets in DPDK.
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 30175246c74a..c91f36500a7c 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -189,6 +189,16 @@ New Features
   * Added tests to verify tunnel header verification in IPsec inbound.
   * Added tests to verify inner checksum.
 
+* **Revised packet capture framework.**
+
+  * New dpdk-dumpcap program that has most of the features of the
+    Wireshark dumpcap utility, including capture of multiple interfaces,
+    filtering, and stopping after a given number of bytes or packets.
+  * New library for writing pcapng packet capture files.
+  * Enhancements to the pdump library to support:
+    * Packet filter with BPF.
+    * Pcapng format with timestamps and meta-data.
+    * Packet capture with stripped VLAN tags.
 
 Removed Items
 -------------
diff --git a/doc/guides/tools/dumpcap.rst b/doc/guides/tools/dumpcap.rst
new file mode 100644
index 000000000000..664ea0c79802
--- /dev/null
+++ b/doc/guides/tools/dumpcap.rst
@@ -0,0 +1,86 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2020 Microsoft Corporation.
+
+.. _dumpcap_tool:
+
+dpdk-dumpcap Application
+========================
+
+The ``dpdk-dumpcap`` tool is a Data Plane Development Kit (DPDK)
+network traffic dump tool.  The interface is similar to the dumpcap tool in Wireshark.
+It runs as a secondary DPDK process and lets you capture packets that are
+coming into and out of a DPDK primary process.
+The ``dpdk-dumpcap`` tool writes files in the Pcapng packet
+capture file format.
+
+Without any options set it will use DPDK to capture traffic from the first
+available DPDK interface and write the received raw packet data, along
+with timestamps into a pcapng file.
+
+If the ``-w`` option is not specified, ``dpdk-dumpcap`` writes to a newly
+created file with a name chosen based on interface name and timestamp.
+If the ``-w`` option is specified, then that file is used.
+
+   .. Note::
+      * The ``dpdk-dumpcap`` tool can only be used in conjunction with a primary
+        application which has the packet capture framework initialized already.
+        In DPDK, only ``testpmd`` is modified to initialize the packet capture
+        framework; other applications remain untouched. So, if the ``dpdk-dumpcap``
+        tool has to be used with any application other than testpmd, the user
+        needs to explicitly modify that application to call the packet capture
+        framework initialization code. Refer to the ``app/test-pmd/testpmd.c``
+        code to see how this is done.
+
+      * The ``dpdk-dumpcap`` tool runs as a DPDK secondary process. It exits when
+        the primary application exits.
+
+
+Running the Application
+-----------------------
+
+To list interfaces available for capture use ``--list-interfaces``.
+
+To filter packets in the style of *tshark*, use the ``-f`` flag.
+
+To capture on multiple interfaces at once, use multiple ``-i`` flags.
+
+Example
+-------
+
+.. code-block:: console
+
+   # ./<build_dir>/app/dpdk-dumpcap --list-interfaces
+   0. 0000:00:03.0
+   1. 0000:00:03.1
+
+   # ./<build_dir>/app/dpdk-dumpcap -i 0000:00:03.0 -c 6 -w /tmp/sample.pcapng
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 6/0
+
+   # ./<build_dir>/app/dpdk-dumpcap -f 'tcp port 80'
+   Packets captured: 6
+   Packets received/dropped on interface '0000:00:03.0' 10/8
+
+
+Limitations
+-----------
+The following option of Wireshark ``dumpcap`` is not yet implemented:
+
+   * ``-b|--ring-buffer`` -- more complex file management.
+
+The following options do not make sense in the context of DPDK:
+
+   * ``-C <byte_limit>`` -- it's a kernel thing
+
+   * ``-t`` -- use a thread per interface
+
+   * Timestamp type.
+
+   * Link data types. Only EN10MB (Ethernet) is supported.
+
+   * Wireless related options:  ``-I|--monitor-mode`` and  ``-k <freq>``
+
+
+.. Note::
+   * The options to ``dpdk-dumpcap`` are like the Wireshark dumpcap program and
+     are not the same as ``dpdk-pdump`` and other DPDK applications.
diff --git a/doc/guides/tools/index.rst b/doc/guides/tools/index.rst
index 93dde4148e90..b71c12b8f2dd 100644
--- a/doc/guides/tools/index.rst
+++ b/doc/guides/tools/index.rst
@@ -8,6 +8,7 @@ DPDK Tools User Guides
     :maxdepth: 2
     :numbered:
 
+    dumpcap
     proc_info
     pdump
     pmdinfo
-- 
2.30.2
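
The pcapng_lib.rst and pdump_lib.rst text in the patch above refers to a
header and trailer wrapped around each packet (the Pcapng enhanced packet
block). As a rough illustration of that on-disk layout, here is a minimal
sketch in Python, chosen only for brevity. It is not the rte_pcapng code and
does not use DPDK; all function names are hypothetical. It assumes
little-endian output, no block options, a single Ethernet interface, and the
format's default microsecond timestamp resolution.

```python
import struct

SHB_TYPE = 0x0A0D0D0A          # Section Header Block
IDB_TYPE = 0x00000001          # Interface Description Block
EPB_TYPE = 0x00000006          # Enhanced Packet Block
BYTE_ORDER_MAGIC = 0x1A2B3C4D
LINKTYPE_ETHERNET = 1


def _block(block_type, body):
    """Frame a body as a pcapng block: type, total length, body, total length."""
    body += b"\x00" * ((-len(body)) % 4)      # pad body to a 32-bit boundary
    total = 12 + len(body)                    # 4 (type) + 4 (length) + body + 4 (length)
    return struct.pack("<II", block_type, total) + body + struct.pack("<I", total)


def section_header():
    # Byte-order magic, version 1.0, section length unknown (-1), no options.
    return _block(SHB_TYPE, struct.pack("<IHHq", BYTE_ORDER_MAGIC, 1, 0, -1))


def interface_description(snaplen=262144):
    # Link type, reserved field, snap length; no options.
    return _block(IDB_TYPE, struct.pack("<HHI", LINKTYPE_ETHERNET, 0, snaplen))


def enhanced_packet(interface_id, timestamp_us, data):
    # Interface id, 64-bit timestamp split into two 32-bit halves,
    # captured length, original length, then the packet bytes.
    header = struct.pack("<IIIII", interface_id,
                         timestamp_us >> 32, timestamp_us & 0xFFFFFFFF,
                         len(data), len(data))
    return _block(EPB_TYPE, header + data)


def build_capture(frames):
    """frames is an iterable of (timestamp_us, packet_bytes) pairs."""
    out = [section_header(), interface_description()]
    for timestamp_us, data in frames:
        out.append(enhanced_packet(0, timestamp_us, data))
    return b"".join(out)
```

The trailing copy of the block length is the "trailer" part: it lets a reader
walk the file backwards as well as forwards. Writing the bytes returned by
build_capture() to a file should produce a capture that Pcapng-aware tools
accept, although that has not been checked here against output from the DPDK
library.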



* [dpdk-dev] [PATCH v15 12/12] MAINTAINERS: add entry for new packet capture features
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (10 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
@ 2021-10-20 21:42   ` Stephen Hemminger
  2021-10-21 16:02     ` Stephen Hemminger
  2021-10-22 13:55   ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Thomas Monjalon
  12 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-20 21:42 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Thomas Monjalon

Since the packet capture is just extension of existing pdump;
add myself as maintainer of that.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 629ec107cfed..5d8c9641103b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1430,12 +1430,17 @@ F: doc/guides/sample_app_ug/qos_scheduler.rst
 
 Packet capture
 M: Reshma Pattan <reshma.pattan@intel.com>
+M: Stephen Hemminger <stephen@networkplumber.org>
 F: lib/pdump/
+F: lib/pcapng/
 F: doc/guides/prog_guide/pdump_lib.rst
-F: app/test/test_pdump.*
-F: app/pdump/
+F: doc/guides/prog_guide/pcapng_lib.rst
+F: doc/guides/tools/dumpcap.rst
 F: doc/guides/tools/pdump.rst
-
+F: app/test/test_pdump.c
+F: app/test/test_pcapng.c
+F: app/pdump/
+F: app/dumpcap/
 
 Packet Framework
 ----------------
-- 
2.30.2



* Re: [dpdk-dev] [PATCH v14 12/12] MAINTAINERS: add entry for new packet capture features
  2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
@ 2021-10-21 12:40     ` Pattan, Reshma
  0 siblings, 0 replies; 220+ messages in thread
From: Pattan, Reshma @ 2021-10-21 12:40 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Thomas Monjalon



> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger
> Sent: Friday, October 15, 2021 9:11 PM
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Acked-by: Reshma Pattan <reshma.pattan@intel.com>


* Re: [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
@ 2021-10-21 14:14     ` Kinsella, Ray
  2021-10-21 15:29       ` Stephen Hemminger
  2021-10-22 13:43     ` Thomas Monjalon
  2021-10-29 17:50     ` Ferruh Yigit
  2 siblings, 1 reply; 220+ messages in thread
From: Kinsella, Ray @ 2021-10-21 14:14 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Reshma Pattan



On 20/10/2021 22:42, Stephen Hemminger wrote:
> This is utility library for writing pcapng format files
> used by Wireshark family of utilities. Older tcpdump
> also knows how to read (but not write) this format.
> 
> See
>    https://github.com/pcapng/pcapng/
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Acked-by: Reshma Pattan <reshma.pattan@intel.com>
> ---
>   lib/meson.build           |   1 +
>   lib/pcapng/meson.build    |   8 +
>   lib/pcapng/pcapng_proto.h | 129 ++++++++
>   lib/pcapng/rte_pcapng.c   | 607 ++++++++++++++++++++++++++++++++++++++
>   lib/pcapng/rte_pcapng.h   | 195 ++++++++++++
>   lib/pcapng/version.map    |  12 +
>   6 files changed, 952 insertions(+)
>   create mode 100644 lib/pcapng/meson.build
>   create mode 100644 lib/pcapng/pcapng_proto.h
>   create mode 100644 lib/pcapng/rte_pcapng.c
>   create mode 100644 lib/pcapng/rte_pcapng.h
>   create mode 100644 lib/pcapng/version.map
> 

Minor niggle, does this need a MAINTAINERS entry?

Acked-by: Ray Kinsella <mdr@ashroe.eu>


* Re: [dpdk-dev] [PATCH v15 04/12] bpf: add function to convert classic BPF to DPDK BPF
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
@ 2021-10-21 14:15     ` Kinsella, Ray
  0 siblings, 0 replies; 220+ messages in thread
From: Kinsella, Ray @ 2021-10-21 14:15 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Konstantin Ananyev



On 20/10/2021 22:42, Stephen Hemminger wrote:
> The pcap library emits classic BPF (32 bit) and is useful for
> creating filter programs.  The DPDK BPF library only implements
> extended BPF (eBPF).  Add an function to convert from old to
> new.
> 
> The rte_bpf_convert function uses rte_malloc to put the resulting
> program in hugepage shared memory so it can be passed from a
> secondary process to a primary process.
> 
> The code to convert was originally done as part of the Linux
> kernel implementation then converted to a userspace program.
> See https://github.com/tklauser/filter2xdp
> 
> Both authors have agreed that it is allowable to create a modified
> version of this code and license it with BSD license used by DPDK.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>   lib/bpf/bpf_convert.c | 575 ++++++++++++++++++++++++++++++++++++++++++
>   lib/bpf/meson.build   |   5 +
>   lib/bpf/rte_bpf.h     |  25 ++
>   lib/bpf/version.map   |   6 +
>   4 files changed, 611 insertions(+)
>   create mode 100644 lib/bpf/bpf_convert.c
> 

Acked-by: Ray Kinsella <mdr@ashroe.eu>
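
To make the classic BPF versus eBPF distinction in the quoted commit message
concrete, here is a small Python sketch of the two instruction encodings. It
is not the rte_bpf_convert code; the function name is hypothetical and it
handles only the single classic instruction "ret #k". It assumes
little-endian encoding and the usual opcode values from the BPF headers
(classic BPF_RET = 0x06, eBPF mov = 0xb0, eBPF exit = 0x90).

```python
import struct

# Instruction classes and operations (values as in the common BPF headers).
BPF_ALU = 0x04
BPF_JMP = 0x05
BPF_RET = 0x06
BPF_K = 0x00
EBPF_MOV = 0xB0
EBPF_EXIT = 0x90

# Classic BPF instruction: u16 code, u8 jt, u8 jf, u32 k          (8 bytes)
CBPF_FMT = "<HBBI"
# eBPF instruction: u8 code, u8 regs (dst low nibble, src high nibble),
#                   s16 off, s32 imm                              (8 bytes)
EBPF_FMT = "<BBhi"


def convert_ret_k(cbpf_insn):
    """Translate a classic 'ret #k' into 'mov32 r0, k; exit' (sketch only)."""
    code, _jt, _jf, k = struct.unpack(CBPF_FMT, cbpf_insn)
    if code != (BPF_RET | BPF_K):
        raise ValueError("only 'ret #k' is handled in this sketch")
    # Classic k is unsigned 32-bit; the eBPF immediate is signed 32-bit.
    imm = k - (1 << 32) if k >= (1 << 31) else k
    mov_r0 = struct.pack(EBPF_FMT, BPF_ALU | EBPF_MOV | BPF_K, 0, 0, imm)
    exit_insn = struct.pack(EBPF_FMT, BPF_JMP | EBPF_EXIT, 0, 0, 0)
    return mov_r0 + exit_insn
```

One classic instruction becomes two eBPF instructions here because eBPF
returns its result in register r0 rather than carrying the value in the
return instruction itself. A full converter also has to map the classic
accumulator and index registers, loads, and jump offsets, which this sketch
does not attempt.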


* Re: [dpdk-dev] [PATCH v15 05/12] bpf: add function to dump eBPF instructions
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
@ 2021-10-21 14:15     ` Kinsella, Ray
  0 siblings, 0 replies; 220+ messages in thread
From: Kinsella, Ray @ 2021-10-21 14:15 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Konstantin Ananyev



On 20/10/2021 22:42, Stephen Hemminger wrote:
> When debugging converted (and other) programs it is useful
> to see disassembled eBPF output.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>   lib/bpf/bpf_dump.c  | 139 ++++++++++++++++++++++++++++++++++++++++++++
>   lib/bpf/meson.build |   1 +
>   lib/bpf/rte_bpf.h   |  14 +++++
>   lib/bpf/version.map |   1 +
>   4 files changed, 155 insertions(+)
>   create mode 100644 lib/bpf/bpf_dump.c
> 
Acked-by: Ray Kinsella <mdr@ashroe.eu>


* Re: [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering Stephen Hemminger
@ 2021-10-21 14:16     ` Kinsella, Ray
  2021-10-27  6:34     ` Wang, Yinan
  1 sibling, 0 replies; 220+ messages in thread
From: Kinsella, Ray @ 2021-10-21 14:16 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Reshma Pattan, Anatoly Burakov



On 20/10/2021 22:42, Stephen Hemminger wrote:
> This enhances the DPDK pdump library to support new
> pcapng format and filtering via BPF.
> 
> The internal client/server protocol is changed to support
> two versions: the original pdump basic version and a
> new pcapng version.
> 
> The internal version number (not part of exposed API or ABI)
> is intentionally increased to cause any attempt to try
> mismatched primary/secondary process to fail.
> 
> Add new API to do allow filtering of captured packets with
> DPDK BPF (eBPF) filter program. It keeps statistics
> on packets captured, filtered, and missed (because ring was full).
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Acked-by: Reshma Pattan <reshma.pattan@intel.com>
> ---
>   lib/meson.build       |   4 +-
>   lib/pdump/meson.build |   2 +-
>   lib/pdump/rte_pdump.c | 432 ++++++++++++++++++++++++++++++------------
>   lib/pdump/rte_pdump.h | 113 ++++++++++-
>   lib/pdump/version.map |   8 +
>   5 files changed, 433 insertions(+), 126 deletions(-)
> 

Acked-by: Ray Kinsella <mdr@ashroe.eu>


* Re: [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-21 14:14     ` Kinsella, Ray
@ 2021-10-21 15:29       ` Stephen Hemminger
  2021-10-21 18:56         ` Thomas Monjalon
  0 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-21 15:29 UTC (permalink / raw)
  To: Kinsella, Ray; +Cc: dev, Reshma Pattan

On Thu, 21 Oct 2021 15:14:38 +0100
"Kinsella, Ray" <mdr@ashroe.eu> wrote:

> On 20/10/2021 22:42, Stephen Hemminger wrote:
> > This is utility library for writing pcapng format files
> > used by Wireshark family of utilities. Older tcpdump
> > also knows how to read (but not write) this format.
> > 
> > See
> >    https://github.com/pcapng/pcapng/
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > Acked-by: Reshma Pattan <reshma.pattan@intel.com>
> > ---
> >   lib/meson.build           |   1 +
> >   lib/pcapng/meson.build    |   8 +
> >   lib/pcapng/pcapng_proto.h | 129 ++++++++
> >   lib/pcapng/rte_pcapng.c   | 607 ++++++++++++++++++++++++++++++++++++++
> >   lib/pcapng/rte_pcapng.h   | 195 ++++++++++++
> >   lib/pcapng/version.map    |  12 +
> >   6 files changed, 952 insertions(+)
> >   create mode 100644 lib/pcapng/meson.build
> >   create mode 100644 lib/pcapng/pcapng_proto.h
> >   create mode 100644 lib/pcapng/rte_pcapng.c
> >   create mode 100644 lib/pcapng/rte_pcapng.h
> >   create mode 100644 lib/pcapng/version.map
> >   
> 
> Minor niggle, does this need a MAINTAINERS entry?
> 
> Acked-by: Ray Kinsella <mdr@ashroe.eu>

It doesn't need its own entry in MAINTAINERS.
If you look at the last patch, added that directory under the Packet Capture section.


* Re: [dpdk-dev] [PATCH v15 12/12] MAINTAINERS: add entry for new packet capture features
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
@ 2021-10-21 16:02     ` Stephen Hemminger
  0 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-21 16:02 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

On Wed, 20 Oct 2021 14:42:36 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> Since the packet capture is just extension of existing pdump;
> add myself as maintainer of that.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---

FYI the CI infrastructure is reporting false positives for spelling
errors on this.
"de facto" is correct spelling according to standard dictionaries.
It sees "fdopen" in the doc around pcapng and thinks the API call
is a word that has to be in dictionary.


* Re: [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-21 15:29       ` Stephen Hemminger
@ 2021-10-21 18:56         ` Thomas Monjalon
  0 siblings, 0 replies; 220+ messages in thread
From: Thomas Monjalon @ 2021-10-21 18:56 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Kinsella, Ray, dev, Reshma Pattan

21/10/2021 17:29, Stephen Hemminger:
> On Thu, 21 Oct 2021 15:14:38 +0100
> "Kinsella, Ray" <mdr@ashroe.eu> wrote:
> 
> > On 20/10/2021 22:42, Stephen Hemminger wrote:
> > > This is utility library for writing pcapng format files
> > > used by Wireshark family of utilities. Older tcpdump
> > > also knows how to read (but not write) this format.
> > > 
> > > See
> > >    https://github.com/pcapng/pcapng/
> > > 
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > Acked-by: Reshma Pattan <reshma.pattan@intel.com>
> > > ---
> > Minor niggle, does this need a MAINTAINERS entry?
> > 
> > Acked-by: Ray Kinsella <mdr@ashroe.eu>
> 
> It doesn't need its own entry in MAINTAINERS.
> If you look at the last patch, added that directory under the Packet Capture section.

It should not be the last patch.
I will try to squash where appropriate while merging.




* Re: [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-10-21 14:14     ` Kinsella, Ray
@ 2021-10-22 13:43     ` Thomas Monjalon
  2021-10-22 15:07       ` Stephen Hemminger
  2021-10-29 17:50     ` Ferruh Yigit
  2 siblings, 1 reply; 220+ messages in thread
From: Thomas Monjalon @ 2021-10-22 13:43 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Reshma Pattan, Ray Kinsella, david.marchand

20/10/2021 23:42, Stephen Hemminger:
> +++ b/lib/pcapng/meson.build
> +version = 1

What is the meaning of this version in meson? Is it used somewhere?





* Re: [dpdk-dev] [PATCH v15 00/12] Packet capture framework update
  2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
                     ` (11 preceding siblings ...)
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
@ 2021-10-22 13:55   ` Thomas Monjalon
  12 siblings, 0 replies; 220+ messages in thread
From: Thomas Monjalon @ 2021-10-22 13:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, reshma.pattan, david.marchand

20/10/2021 23:42, Stephen Hemminger:
> This patch set is a more complete version of the the enhanced
> packet capture support described last year.
> 
> The new capture library and utility are:
>   - faster avoids lots of extra I/O, does bursting, etc.
>   - gives more information (multiple ports, queues, etc)
>   - has a better user interface (same as Wireshark dumpcap)
>   - fixes structural problems with VLAN's and timestamps

I have re-organized the commits and fixed various stuff in the doc.
Doc and maintainers update are part of relevant commits.
Applied, thanks.





* Re: [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-22 13:43     ` Thomas Monjalon
@ 2021-10-22 15:07       ` Stephen Hemminger
  2021-10-22 15:21         ` Thomas Monjalon
  0 siblings, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-22 15:07 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Reshma Pattan, Ray Kinsella, david.marchand

On Fri, 22 Oct 2021 15:43:58 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:

> 20/10/2021 23:42, Stephen Hemminger:
> > +++ b/lib/pcapng/meson.build
> > +version = 1  
> 
> What is the meaning of this version in meson? Is it used somewhere?
> 
> 
> 

No just copy/paste from some other meson file.


* Re: [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-22 15:07       ` Stephen Hemminger
@ 2021-10-22 15:21         ` Thomas Monjalon
  0 siblings, 0 replies; 220+ messages in thread
From: Thomas Monjalon @ 2021-10-22 15:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Reshma Pattan, Ray Kinsella, david.marchand

22/10/2021 17:07, Stephen Hemminger:
> On Fri, 22 Oct 2021 15:43:58 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > 20/10/2021 23:42, Stephen Hemminger:
> > > +++ b/lib/pcapng/meson.build
> > > +version = 1  
> > 
> > What is the meaning of this version in meson? Is it used somewhere?
> 
> No just copy/paste from some other meson file.

OK, removed then.

For info, I've found multiple garbages in your patches, like repeated words,
misplaced line in doc, etc.




* Re: [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering Stephen Hemminger
  2021-10-21 14:16     ` Kinsella, Ray
@ 2021-10-27  6:34     ` Wang, Yinan
  2021-10-27 14:56       ` Stephen Hemminger
  1 sibling, 1 reply; 220+ messages in thread
From: Wang, Yinan @ 2021-10-27  6:34 UTC (permalink / raw)
  To: Stephen Hemminger, dev
  Cc: Pattan, Reshma, Ray Kinsella, Burakov, Anatoly, Ling, WeiX, He,
	Xingguang

Hi Hemminger,

I hit an issue when using dpdk-pdump with your patch. When we try to capture packets from a virtio port, all captured packets show as malformed; there is no issue if your patch is removed. Bug link: https://bugs.dpdk.org/show_bug.cgi?id=840
Could you help take a look at this issue?

BR,
Yinan

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Stephen Hemminger
> Sent: October 21, 2021 5:43
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; Pattan, Reshma
> <reshma.pattan@intel.com>; Ray Kinsella <mdr@ashroe.eu>; Burakov,
> Anatoly <anatoly.burakov@intel.com>
> Subject: [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering
> 
> This enhances the DPDK pdump library to support new
> pcapng format and filtering via BPF.
> 
> The internal client/server protocol is changed to support
> two versions: the original pdump basic version and a
> new pcapng version.
> 
> The internal version number (not part of exposed API or ABI)
> is intentionally increased to cause any attempt to try
> mismatched primary/secondary process to fail.
> 
> Add new API to allow filtering of captured packets with
> DPDK BPF (eBPF) filter program. It keeps statistics
> on packets captured, filtered, and missed (because ring was full).
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Acked-by: Reshma Pattan <reshma.pattan@intel.com>
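
The accounting described in the quoted commit message (packets captured, filtered, and missed because the ring was full) can be sketched as below. This is only an illustration under stated assumptions: every name here (sketch_stats, sketch_ring, sketch_capture, sketch_min_len) is hypothetical, the ring is a toy fixed-size array, and a plain C callback stands in for the BPF filter program. It is not the pdump implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters; the names are illustrative, not the library's. */
struct sketch_stats {
	uint64_t captured;  /* passed the filter and was queued */
	uint64_t filtered;  /* rejected by the filter */
	uint64_t missed;    /* passed the filter but the ring was full */
};

/* Hypothetical fixed-size ring that stores packet lengths only. */
#define SKETCH_RING_SIZE 4
struct sketch_ring {
	uint32_t slots[SKETCH_RING_SIZE];
	size_t   count;
};

/* A filter returns true to keep the packet (stand-in for a BPF program). */
typedef bool (*sketch_filter_t)(const uint8_t *pkt, uint32_t len);

/* Run the filter first, then try to queue; bump exactly one counter. */
void
sketch_capture(struct sketch_ring *ring, struct sketch_stats *stats,
	       sketch_filter_t filter, const uint8_t *pkt, uint32_t len)
{
	if (filter != NULL && !filter(pkt, len)) {
		stats->filtered++;
		return;
	}
	if (ring->count == SKETCH_RING_SIZE) {
		stats->missed++;
		return;
	}
	ring->slots[ring->count++] = len;
	stats->captured++;
}

/* Example filter: keep only frames of at least 60 bytes. */
bool
sketch_min_len(const uint8_t *pkt, uint32_t len)
{
	(void)pkt;
	return len >= 60;
}
```

Counting a packet in exactly one of the three buckets keeps the totals easy to reconcile: the sum of the counters equals the number of packets offered to the capture path.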



* Re: [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering
  2021-10-27  6:34     ` Wang, Yinan
@ 2021-10-27 14:56       ` Stephen Hemminger
  0 siblings, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-27 14:56 UTC (permalink / raw)
  To: Wang, Yinan
  Cc: dev, Pattan, Reshma, Ray Kinsella, Burakov, Anatoly, Ling, WeiX,
	He, Xingguang

On Wed, 27 Oct 2021 06:34:36 +0000
"Wang, Yinan" <yinan.wang@intel.com> wrote:

> Hi Hemminger,
> 
> I hit an issue when using dpdk-pdump with your patch: when we try to capture packets from a virtio port, all captured packets show up as malformed, and the issue goes away if your patch is removed. Bug link: https://bugs.dpdk.org/show_bug.cgi?id=840
> Could you help take a look at this issue?
> 
> BR,
> Yinan

Thanks, looking into it today.


* Re: [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
  2021-10-21 14:14     ` Kinsella, Ray
  2021-10-22 13:43     ` Thomas Monjalon
@ 2021-10-29 17:50     ` Ferruh Yigit
  2021-10-29 19:55       ` Stephen Hemminger
  2021-10-29 21:50       ` [dpdk-dev] [PATCH] pcapng: do not use deprecated ETH_LINK_UP Stephen Hemminger
  2 siblings, 2 replies; 220+ messages in thread
From: Ferruh Yigit @ 2021-10-29 17:50 UTC (permalink / raw)
  To: Stephen Hemminger, dev; +Cc: Reshma Pattan, Ray Kinsella

On 10/20/2021 10:42 PM, Stephen Hemminger wrote:
> +	/* DPDK reports in units of Mbps */
> +	rte_eth_link_get(port, &link);
> +	if (link.link_status == ETH_LINK_UP)

Should use renamed 'RTE_ETH_LINK_UP' macro with RTE_ prefix.
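
For context on the quoted lines, the unit conversion they perform can be sketched without DPDK as below. The struct, the macros, and the function name are hypothetical stand-ins chosen for illustration; the multiplier of 1000000 is an assumption about what PCAPNG_MBPS_SPEED expands to, based on the comment that DPDK reports link speed in Mbps while the pcapng interface speed option is expressed in bits per second.

```c
#include <stdint.h>

/* Hypothetical stand-ins so the sketch builds without DPDK headers. */
#define SKETCH_LINK_DOWN 0
#define SKETCH_LINK_UP   1
#define SKETCH_MBPS      1000000ULL  /* assumed: bits per second in one Mbps */

struct sketch_link {
	uint32_t link_speed;   /* in Mbps, as reported by the driver */
	uint8_t  link_status;  /* SKETCH_LINK_UP or SKETCH_LINK_DOWN */
};

/* Return the speed in bits per second, or 0 when the link is down. */
uint64_t
sketch_link_bps(const struct sketch_link *link)
{
	if (link->link_status != SKETCH_LINK_UP)
		return 0;
	/* Widen before multiplying so 10G and faster links do not overflow. */
	return (uint64_t)link->link_speed * SKETCH_MBPS;
}
```

In the real code the status comparison would use the RTE_-prefixed macro, as the review comment asks.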


* Re: [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files
  2021-10-29 17:50     ` Ferruh Yigit
@ 2021-10-29 19:55       ` Stephen Hemminger
  2021-10-29 21:50       ` [dpdk-dev] [PATCH] pcapng: do not use deprecated ETH_LINK_UP Stephen Hemminger
  1 sibling, 0 replies; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-29 19:55 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Reshma Pattan, Ray Kinsella

On Fri, 29 Oct 2021 18:50:57 +0100
Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 10/20/2021 10:42 PM, Stephen Hemminger wrote:
> > +	/* DPDK reports in units of Mbps */
> > +	rte_eth_link_get(port, &link);
> > +	if (link.link_status == ETH_LINK_UP)  
> 
> Should use renamed 'RTE_ETH_LINK_UP' macro with RTE_ prefix.

Sure, that wasn't there when this was first written.
Should we deprecate the old one?


* [dpdk-dev] [PATCH] pcapng: do not use deprecated ETH_LINK_UP
  2021-10-29 17:50     ` Ferruh Yigit
  2021-10-29 19:55       ` Stephen Hemminger
@ 2021-10-29 21:50       ` Stephen Hemminger
  2021-10-31 22:31         ` Thomas Monjalon
  1 sibling, 1 reply; 220+ messages in thread
From: Stephen Hemminger @ 2021-10-29 21:50 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev, Stephen Hemminger

RTE_ prefix was added by:
commit 295968d17407 ("ethdev: add namespace")

Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/pcapng/rte_pcapng.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index a3d5760e6835..03edabe73e96 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -178,7 +178,7 @@ pcapng_add_interface(rte_pcapng_t *self, uint16_t port)
 
 	/* DPDK reports in units of Mbps */
 	rte_eth_link_get(port, &link);
-	if (link.link_status == ETH_LINK_UP)
+	if (link.link_status == RTE_ETH_LINK_UP)
 		speed = link.link_speed * PCAPNG_MBPS_SPEED;
 
 	if (rte_eth_macaddr_get(port, &macaddr) < 0)
-- 
2.30.2



* Re: [dpdk-dev] [PATCH] pcapng: do not use deprecated ETH_LINK_UP
  2021-10-29 21:50       ` [dpdk-dev] [PATCH] pcapng: do not use deprecated ETH_LINK_UP Stephen Hemminger
@ 2021-10-31 22:31         ` Thomas Monjalon
  0 siblings, 0 replies; 220+ messages in thread
From: Thomas Monjalon @ 2021-10-31 22:31 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: ferruh.yigit, dev

29/10/2021 23:50, Stephen Hemminger:
> RTE_ prefix was added by:
> commit 295968d17407 ("ethdev: add namespace")
> 
> Fixes: 8d23ce8f5ee9 ("pcapng: add new library for writing pcapng files")
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Applied, thanks.
Note: I missed it when merging patches.




Thread overview: 220+ messages
2021-09-03  0:47 [dpdk-dev] [PATCH 0/5] Packet capture framework enhancements Stephen Hemminger
2021-09-03  0:47 ` [dpdk-dev] [PATCH 1/5] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-03  0:47 ` [dpdk-dev] [PATCH 2/5] pdump: support pcapng and filtering Stephen Hemminger
2021-09-03  0:47 ` [dpdk-dev] [PATCH 3/5] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-03  0:59   ` Stephen Hemminger
2021-09-03  0:47 ` [dpdk-dev] [PATCH 4/5] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-03  0:47 ` [dpdk-dev] [PATCH 5/5] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
2021-09-03 22:06 ` [dpdk-dev] [PATCH v2 0/5] Packet capture framework enhancements Stephen Hemminger
2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 1/5] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 2/5] pdump: support pcapng and filtering Stephen Hemminger
2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 3/5] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 4/5] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-03 22:06   ` [dpdk-dev] [PATCH v2 5/5] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
2021-09-08  4:50 ` [dpdk-dev] [PATCH v3 0/8] Packet capture framework enhancements Stephen Hemminger
2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 1/8] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 2/8] bpf: allow self-xor operation Stephen Hemminger
2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 3/8] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 4/8] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 5/8] pdump: support pcapng and filtering Stephen Hemminger
2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 6/8] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 7/8] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-08  4:50   ` [dpdk-dev] [PATCH v3 8/8] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
2021-09-08 17:16 ` [dpdk-dev] [PATCH v4 0/8] Packet capture framework enhancements Stephen Hemminger
2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 1/8] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 2/8] bpf: allow self-xor operation Stephen Hemminger
2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 3/8] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 4/8] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 5/8] pdump: support pcapng and filtering Stephen Hemminger
2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 6/8] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 7/8] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-08 17:16   ` [dpdk-dev] [PATCH v4 8/8] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
2021-09-08 21:50 ` [dpdk-dev] [PATCH v5 0/9] Packet capture framework enhancements Stephen Hemminger
2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 1/9] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 2/9] bpf: allow self-xor operation Stephen Hemminger
2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 3/9] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 4/9] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 5/9] lib: pdump is not supported on Windows Stephen Hemminger
2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 6/9] pdump: support pcapng and filtering Stephen Hemminger
2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 7/9] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 8/9] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-08 21:50   ` [dpdk-dev] [PATCH v5 9/9] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
2021-09-09 23:33 ` [dpdk-dev] [PATCH v6 00/10] Packet capture framework enhancements Stephen Hemminger
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 01/10] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 02/10] bpf: allow self-xor operation Stephen Hemminger
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 03/10] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-09-10  7:59     ` Dmitry Kozlyuk
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 04/10] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 05/10] test: add test for bpf_convert Stephen Hemminger
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 06/10] lib: pdump is not supported on Windows Stephen Hemminger
2021-09-10  8:17     ` Dmitry Kozlyuk
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 07/10] pdump: support pcapng and filtering Stephen Hemminger
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 08/10] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 09/10] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-09 23:33   ` [dpdk-dev] [PATCH v6 10/10] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
2021-09-10 18:18 ` [dpdk-dev] [PATCH v7 00/11] Packet capture framework enhancements Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 01/11] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 02/11] lib: pdump is not supported on Windows Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 03/11] bpf: allow self-xor operation Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 04/11] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 05/11] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 06/11] pdump: support pcapng and filtering Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 07/11] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 08/11] test: add test for bpf_convert Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 09/11] test: add a test for pcapng library Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 10/11] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-10 18:18   ` [dpdk-dev] [PATCH v7 11/11] MAINTAINERS: add entry for new pcapng and dumper Stephen Hemminger
2021-09-13 18:14 ` [dpdk-dev] [PATCH v8 00/12] Packet capture framework enhancements Stephen Hemminger
2021-09-13 18:14   ` [dpdk-dev] [PATCH v8 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 02/12] lib: pdump is not supported on Windows Stephen Hemminger
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 03/12] bpf: allow self-xor operation Stephen Hemminger
2021-09-15 10:55     ` Ananyev, Konstantin
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-09-15 11:02     ` Ananyev, Konstantin
2021-09-15 16:25       ` Stephen Hemminger
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-09-15 11:04     ` Ananyev, Konstantin
2021-09-15 16:26       ` Stephen Hemminger
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 06/12] pdump: support pcapng and filtering Stephen Hemminger
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 08/12] test: add test for bpf_convert Stephen Hemminger
2021-09-15 11:34     ` Ananyev, Konstantin
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 09/12] test: add a test for pcapng library Stephen Hemminger
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 10/12] test: enable bpf autotest Stephen Hemminger
2021-09-15 11:27     ` Ananyev, Konstantin
2021-09-15 23:36       ` Stephen Hemminger
2021-09-16  3:09     ` Stephen Hemminger
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-13 18:15   ` [dpdk-dev] [PATCH v8 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
2021-09-16  0:14 ` [dpdk-dev] [PATCH v9 00/12] Packet capture framework enhancements Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 02/12] lib: pdump is not supported on Windows Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 03/12] bpf: allow self-xor operation Stephen Hemminger
2021-09-16 15:23     ` Ananyev, Konstantin
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 06/12] pdump: support pcapng and filtering Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 08/12] test: add test for bpf_convert Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 09/12] test: add a test for pcapng library Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 10/12] test: enable bpf autotest Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-16  0:14   ` [dpdk-dev] [PATCH v9 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
2021-09-16 22:26 ` [dpdk-dev] [PATCH v10 00/12] Packet capture framework enhancements Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 02/12] lib: pdump is not supported on Windows Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 03/12] bpf: allow self-xor operation Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 06/12] pdump: support pcapng and filtering Stephen Hemminger
2021-09-23 16:11     ` Pattan, Reshma
2021-09-23 16:58       ` Stephen Hemminger
2021-09-23 18:14       ` Stephen Hemminger
2021-09-23 18:23       ` Stephen Hemminger
2021-09-24 15:33         ` Pattan, Reshma
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 08/12] test: add test for bpf_convert Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 09/12] test: add a test for pcapng library Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 10/12] test: enable bpf autotest Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-16 22:26   ` [dpdk-dev] [PATCH v10 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
2021-09-24 15:21 ` [dpdk-dev] [PATCH v11 00/12] Packet capture framework enhancements Stephen Hemminger
2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 01/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 02/12] lib: pdump is not supported on Windows Stephen Hemminger
2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 03/12] bpf: allow self-xor operation Stephen Hemminger
2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 06/12] pdump: support pcapng and filtering Stephen Hemminger
2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 08/12] test: add test for bpf_convert Stephen Hemminger
2021-09-24 15:21   ` [dpdk-dev] [PATCH v11 09/12] test: add a test for pcapng library Stephen Hemminger
2021-09-24 15:22   ` [dpdk-dev] [PATCH v11 10/12] test: enable bpf autotest Stephen Hemminger
2021-09-24 15:22   ` [dpdk-dev] [PATCH v11 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-09-24 15:22   ` [dpdk-dev] [PATCH v11 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
2021-10-01 16:26 ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 01/12] lib: pdump is not supported on Windows Stephen Hemminger
2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-10-15  9:36     ` Pattan, Reshma
2021-10-15 17:40       ` Stephen Hemminger
2021-10-15 18:14       ` Stephen Hemminger
2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 03/12] bpf: allow self-xor operation Stephen Hemminger
2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-10-01 16:26   ` [dpdk-dev] [PATCH v12 06/12] pdump: support pcapng and filtering Stephen Hemminger
2021-10-12 16:31     ` Pattan, Reshma
2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 08/12] test: add test for bpf_convert Stephen Hemminger
2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 09/12] test: add a test for pcapng library Stephen Hemminger
2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 10/12] test: enable bpf autotest Stephen Hemminger
2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 11/12] doc: changes for new pcapng and dumpcap Stephen Hemminger
2021-10-15 16:42     ` Pattan, Reshma
2021-10-15 17:29       ` Stephen Hemminger
2021-10-18  9:23         ` Pattan, Reshma
2021-10-01 16:27   ` [dpdk-dev] [PATCH v12 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
2021-10-12  2:31   ` [dpdk-dev] [PATCH v12 00/12] Packet capture framework update Stephen Hemminger
2021-10-12  7:09     ` Thomas Monjalon
2021-10-12 10:21       ` Pattan, Reshma
2021-10-12 15:44         ` Stephen Hemminger
2021-10-12 15:48           ` Thomas Monjalon
2021-10-12 18:00             ` Stephen Hemminger
2021-10-12 18:22               ` Thomas Monjalon
2021-10-13  8:44                 ` Pattan, Reshma
2021-10-15 18:28 ` [dpdk-dev] [PATCH v13 " Stephen Hemminger
2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 01/12] lib: pdump is not supported on Windows Stephen Hemminger
2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 03/12] bpf: allow self-xor operation Stephen Hemminger
2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 06/12] pdump: support pcapng and filtering Stephen Hemminger
2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
2021-10-15 18:28   ` [dpdk-dev] [PATCH v13 08/12] test: add test for bpf_convert Stephen Hemminger
2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 09/12] test: add a test for pcapng library Stephen Hemminger
2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 10/12] test: enable bpf autotest Stephen Hemminger
2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
2021-10-15 18:29   ` [dpdk-dev] [PATCH v13 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
2021-10-15 20:11 ` [dpdk-dev] [PATCH v14 00/12] Packet capture framework update Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 01/12] lib: pdump is not supported on Windows Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-10-19 10:24     ` Pattan, Reshma
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 03/12] bpf: allow self-xor operation Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 06/12] pdump: support pcapng and filtering Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 08/12] test: add test for bpf_convert Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 09/12] test: add a test for pcapng library Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 10/12] test: enable bpf autotest Stephen Hemminger
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
2021-10-19  8:28     ` Pattan, Reshma
2021-10-15 20:11   ` [dpdk-dev] [PATCH v14 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
2021-10-21 12:40     ` Pattan, Reshma
2021-10-20 21:42 ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Stephen Hemminger
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 01/12] lib: pdump is not supported on Windows Stephen Hemminger
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 02/12] librte_pcapng: add new library for writing pcapng files Stephen Hemminger
2021-10-21 14:14     ` Kinsella, Ray
2021-10-21 15:29       ` Stephen Hemminger
2021-10-21 18:56         ` Thomas Monjalon
2021-10-22 13:43     ` Thomas Monjalon
2021-10-22 15:07       ` Stephen Hemminger
2021-10-22 15:21         ` Thomas Monjalon
2021-10-29 17:50     ` Ferruh Yigit
2021-10-29 19:55       ` Stephen Hemminger
2021-10-29 21:50       ` [dpdk-dev] [PATCH] pcapng: do not use deprecated ETH_LINK_UP Stephen Hemminger
2021-10-31 22:31         ` Thomas Monjalon
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 03/12] bpf: allow self-xor operation Stephen Hemminger
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 04/12] bpf: add function to convert classic BPF to DPDK BPF Stephen Hemminger
2021-10-21 14:15     ` Kinsella, Ray
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 05/12] bpf: add function to dump eBPF instructions Stephen Hemminger
2021-10-21 14:15     ` Kinsella, Ray
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 06/12] pdump: support pcapng and filtering Stephen Hemminger
2021-10-21 14:16     ` Kinsella, Ray
2021-10-27  6:34     ` Wang, Yinan
2021-10-27 14:56       ` Stephen Hemminger
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 07/12] app/dumpcap: add new packet capture application Stephen Hemminger
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 08/12] test: add test for bpf_convert Stephen Hemminger
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 09/12] test: add a test for pcapng library Stephen Hemminger
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 10/12] test: enable bpf autotest Stephen Hemminger
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 11/12] doc: changes for new pcapng and dumpcap utility Stephen Hemminger
2021-10-20 21:42   ` [dpdk-dev] [PATCH v15 12/12] MAINTAINERS: add entry for new packet capture features Stephen Hemminger
2021-10-21 16:02     ` Stephen Hemminger
2021-10-22 13:55   ` [dpdk-dev] [PATCH v15 00/12] Packet capture framework update Thomas Monjalon
