* [RFC 00/13] Packet capture using port mirroring
@ 2025-04-11 23:44 Stephen Hemminger
2025-04-11 23:44 ` [RFC 01/13] app/testpmd: revert auto attach/detach Stephen Hemminger
` (13 more replies)
0 siblings, 14 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
This is a rework of how packet capture is done in DPDK.
The existing mechanism using callbacks has a number of problems;
the main one is that any packets sent/received in a secondary process
are not visible. The root cause is that callbacks can not function
across process boundaries, they are specific to the role.
The new mechanism builds on the concept of port mirroring used
in Linux/FreeBSD and also router vendors (SPAN ports). The infrastructure
is built around a new ethdev call to mirror a port.
The internals of dumpcap are redone but the program command
syntax is unchanged (no doc change needed). Usingthe new mirror mechanism,
the dumpcap program creates a port using ring PMD and
then directs mirror to that. Then the packets are extracted from the ring.
If capturing on multiple devices, they all get mirrored to
the same ring PMD.
Doing this uncovered a bunch of additional missing features and bugs.
The first is that test-pmd auto attach introduced in 24.11 breaks this
(and it broke old pdump as well) so it is removed.
The dumpcap program needs to be able to start the ring PMD it created
but the existing ethdev API is broken when used from secondary process.
So fix that.
The mirror API has a restriction that the port being mirrored to must allow
lock free transmit. This is because both rx and tx as well as multiple
queues need to go over the same transmit queue. Although it might be possible
to have some complex mapping between multiple ports/queues being mirrored
to multiple transmit queues, that gets to be a lot of overhead in API's
and implementation. Fixing the ring and null PMD to allow lockeless
transmit is good enough.
Added tests for the new features.
The performance has not been measured but the overhead of checking for
mirror port is no more that the overhead of looking at the callback list.
TODO items:
- need release notes for new features
- more through testing on real hardware
- mark old pdump API and application for deprecation in 25.07 and removal in 25.11
- more cleanup of patch wording and docs needed.
Stephen Hemminger (13):
app/testpmd: revert auto attach/detach
ethdev: allow start/stop from secondary process
test: add test for hotplug and secondary process operations
net/ring: allow lockfree transmit if ring supports it
net/ring: add argument to attach existing ring
net/ring: add timestamp devargs
net/null: all lockfree transmit
mbuf: add fields for mirroring
ethdev: add port mirror capability
pcapng: split packet copy from header insertion
test: add tests for ethdev mirror
app/testpmd: support for port mirroring
app/dumpcap: use port mirror instead of pdump
app/dumpcap/main.c | 361 +++++++++++++++-----
app/dumpcap/meson.build | 2 +-
app/test-pmd/cmdline.c | 1 +
app/test-pmd/cmdline_mirror.c | 120 +++++++
app/test-pmd/cmdline_mirror.h | 12 +
app/test-pmd/meson.build | 1 +
app/test-pmd/testpmd.c | 77 ++---
app/test/meson.build | 1 +
app/test/test_ethdev_mirror.c | 286 ++++++++++++++++
app/test/test_mp_secondary.c | 51 ++-
config/rte_config.h | 2 +
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 15 +
drivers/net/null/rte_eth_null.c | 1 +
drivers/net/ring/rte_eth_ring.c | 91 ++++-
lib/ethdev/ethdev_driver.c | 11 +-
lib/ethdev/ethdev_driver.h | 3 +
lib/ethdev/ethdev_private.c | 80 +++++
lib/ethdev/ethdev_private.h | 23 ++
lib/ethdev/ethdev_trace.h | 14 +
lib/ethdev/ethdev_trace_points.c | 6 +
lib/ethdev/rte_ethdev.c | 275 +++++++++++++--
lib/ethdev/rte_ethdev.h | 81 +++++
lib/ethdev/rte_ethdev_core.h | 6 +-
lib/mbuf/rte_mbuf_core.h | 8 +
lib/pcapng/rte_pcapng.c | 178 +++++-----
lib/pcapng/rte_pcapng.h | 27 +-
26 files changed, 1468 insertions(+), 265 deletions(-)
create mode 100644 app/test-pmd/cmdline_mirror.c
create mode 100644 app/test-pmd/cmdline_mirror.h
create mode 100644 app/test/test_ethdev_mirror.c
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 01/13] app/testpmd: revert auto attach/detach
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 02/13] ethdev: allow start/stop from secondary process Stephen Hemminger
` (12 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev
Cc: Stephen Hemminger, lihuisong, stable, Aman Singh, Chengwen Feng,
Dongdong Liu
Revert "app/testpmd: add port attach/detach for multiple process"
This reverts commit 994635edb2c038e64617bcf2790a8cd326c3e8e0.
This commit breaks using pdump and other secondary processes that
create there own devices. The patch makes testpmd grab any new
hotplug device and configure it.
It may also break failsafe and netvsc PMD handling of VF devices.
But I don't have access to test that part.
The patch is flawed in concept, and needs to go.
Bugzilla ID: 1695
Fixes: 994635edb2c0 ("app/testpmd: add port attach/detach for multiple process")
Cc: lihuisong@huawei.com
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/test-pmd/testpmd.c | 77 +++++++++++-------------------------------
1 file changed, 20 insertions(+), 57 deletions(-)
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index b5f0c02261..7f4e3b434d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -705,7 +705,7 @@ eth_dev_set_mtu_mp(uint16_t port_id, uint16_t mtu)
}
/* Forward function declarations */
-static void setup_attached_port(void *arg);
+static void setup_attached_port(portid_t pi);
static void check_all_ports_link_status(uint32_t port_mask);
static int eth_event_callback(portid_t port_id,
enum rte_eth_event_type type,
@@ -2906,7 +2906,6 @@ start_port(portid_t pid)
at_least_one_port_exist = true;
port = &ports[pi];
-
if (port->port_status == RTE_PORT_STOPPED) {
port->port_status = RTE_PORT_HANDLING;
all_ports_already_started = false;
@@ -3254,7 +3253,6 @@ remove_invalid_ports(void)
remove_invalid_ports_in(ports_ids, &nb_ports);
remove_invalid_ports_in(fwd_ports_ids, &nb_fwd_ports);
nb_cfg_ports = nb_fwd_ports;
- printf("Now total ports is %d\n", nb_ports);
}
static void
@@ -3427,11 +3425,14 @@ attach_port(char *identifier)
return;
}
- /* First attach mode: event
- * New port flag is updated on RTE_ETH_EVENT_NEW event
- */
+ /* first attach mode: event */
if (setup_on_probe_event) {
- goto out;
+ /* new ports are detected on RTE_ETH_EVENT_NEW event */
+ for (pi = 0; pi < RTE_MAX_ETHPORTS; pi++)
+ if (ports[pi].port_status == RTE_PORT_HANDLING &&
+ ports[pi].need_setup != 0)
+ setup_attached_port(pi);
+ return;
}
/* second attach mode: iterator */
@@ -3439,17 +3440,13 @@ attach_port(char *identifier)
/* setup ports matching the devargs used for probing */
if (port_is_forwarding(pi))
continue; /* port was already attached before */
- setup_attached_port((void *)(intptr_t)pi);
+ setup_attached_port(pi);
}
-out:
- printf("Port %s is attached.\n", identifier);
- printf("Done\n");
}
static void
-setup_attached_port(void *arg)
+setup_attached_port(portid_t pi)
{
- portid_t pi = (intptr_t)arg;
unsigned int socket_id;
int ret;
@@ -3464,8 +3461,14 @@ setup_attached_port(void *arg)
"Error during enabling promiscuous mode for port %u: %s - ignore\n",
pi, rte_strerror(-ret));
+ ports_ids[nb_ports++] = pi;
+ fwd_ports_ids[nb_fwd_ports++] = pi;
+ nb_cfg_ports = nb_fwd_ports;
ports[pi].need_setup = 0;
ports[pi].port_status = RTE_PORT_STOPPED;
+
+ printf("Port %d is attached. Now total ports is %d\n", pi, nb_ports);
+ printf("Done\n");
}
static void
@@ -3495,8 +3498,10 @@ detach_device(struct rte_device *dev)
TESTPMD_LOG(ERR, "Failed to detach device %s\n", rte_dev_name(dev));
return;
}
+ remove_invalid_ports();
printf("Device is detached\n");
+ printf("Now total ports is %d\n", nb_ports);
printf("Done\n");
return;
}
@@ -3728,25 +3733,7 @@ rmv_port_callback(void *arg)
struct rte_device *device = dev_info.device;
close_port(port_id);
detach_device(device); /* might be already removed or have more ports */
- remove_invalid_ports();
- }
- if (need_to_start)
- start_packet_forwarding(0);
-}
-
-static void
-remove_invalid_ports_callback(void *arg)
-{
- portid_t port_id = (intptr_t)arg;
- int need_to_start = 0;
-
- if (!test_done && port_is_forwarding(port_id)) {
- need_to_start = 1;
- stop_packet_forwarding();
}
-
- remove_invalid_ports();
-
if (need_to_start)
start_packet_forwarding(0);
}
@@ -3772,23 +3759,8 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
switch (type) {
case RTE_ETH_EVENT_NEW:
- /* The port in ports_id and fwd_ports_ids is always valid
- * from index 0 ~ (nb_ports - 1) due to updating their
- * position when one port is detached or removed.
- */
- ports_ids[nb_ports++] = port_id;
- fwd_ports_ids[nb_fwd_ports++] = port_id;
- nb_cfg_ports = nb_fwd_ports;
- printf("Port %d is probed. Now total ports is %d\n", port_id, nb_ports);
-
- if (setup_on_probe_event) {
- ports[port_id].need_setup = 1;
- ports[port_id].port_status = RTE_PORT_HANDLING;
- }
- /* Can't initialize port directly in new event. */
- if (rte_eal_alarm_set(100000, setup_attached_port,
- (void *)(intptr_t)port_id))
- fprintf(stderr, "Could not set up deferred task to setup this attached port.\n");
+ ports[port_id].need_setup = 1;
+ ports[port_id].port_status = RTE_PORT_HANDLING;
break;
case RTE_ETH_EVENT_INTR_RMV:
if (port_id_is_invalid(port_id, DISABLED_WARN))
@@ -3801,15 +3773,6 @@ eth_event_callback(portid_t port_id, enum rte_eth_event_type type, void *param,
case RTE_ETH_EVENT_DESTROY:
ports[port_id].port_status = RTE_PORT_CLOSED;
printf("Port %u is closed\n", port_id);
- /*
- * Defer to remove port id due to the reason that the ethdev
- * state is changed from 'ATTACHED' to 'UNUSED' only after the
- * event callback finished. Otherwise this port id can not be
- * removed.
- */
- if (rte_eal_alarm_set(100000, remove_invalid_ports_callback,
- (void *)(intptr_t)port_id))
- fprintf(stderr, "Could not set up deferred task to remove this port id.\n");
break;
case RTE_ETH_EVENT_RX_AVAIL_THRESH: {
uint16_t rxq_id;
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 02/13] ethdev: allow start/stop from secondary process
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
2025-04-11 23:44 ` [RFC 01/13] app/testpmd: revert auto attach/detach Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-15 0:19 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 03/13] test: add test for hotplug and secondary process operations Stephen Hemminger
` (11 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev
Cc: Stephen Hemminger, Thomas Monjalon, Ferruh Yigit,
Andrew Rybchenko, Anatoly Burakov
Before this patch if secondary process called start/stop
it would only impact the secondary process, the ethdev on the
primary process will still not be started.
With this patch, when start/stop is called from secondary,
it calls the primary and does the operation there. The design
is generic, and we can later add queue and other operations
as needed.
Bugzilla ID: 73
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/ethdev/ethdev_driver.c | 11 ++++-
lib/ethdev/ethdev_private.c | 77 ++++++++++++++++++++++++++++++++++
lib/ethdev/ethdev_private.h | 23 ++++++++++
lib/ethdev/rte_ethdev.c | 84 +++++++++++++++++++++----------------
4 files changed, 158 insertions(+), 37 deletions(-)
diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
index ec0c1e1176..66ba7dafc9 100644
--- a/lib/ethdev/ethdev_driver.c
+++ b/lib/ethdev/ethdev_driver.c
@@ -114,6 +114,14 @@ rte_eth_dev_allocate(const char *name)
goto unlock;
}
+ if (eth_dev_shared_data->allocated_ports == 0 &&
+ rte_mp_action_register(ETHDEV_MP, ethdev_server) &&
+ rte_errno != ENOTSUP) {
+ RTE_ETHDEV_LOG_LINE(ERR,
+ "Could not start %s service", ETHDEV_MP);
+ goto unlock;
+ }
+
eth_dev = eth_dev_get(port_id);
eth_dev->flow_fp_ops = &rte_flow_fp_default_ops;
strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name));
@@ -282,7 +290,8 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
eth_dev->data = NULL;
- eth_dev_shared_data->allocated_ports--;
+ if (--eth_dev_shared_data->allocated_ports == 0)
+ rte_mp_action_unregister(ETHDEV_MP);
eth_dev_shared_data_release();
}
diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index b96d992ea1..bded3f1722 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -477,3 +477,80 @@ eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues)
dev->data->nb_tx_queues = nb_queues;
return 0;
}
+
+static int
+ethdev_handle_request(const struct ethdev_mp_request *req)
+{
+ switch (req->operation) {
+ case ETH_REQ_START:
+ return rte_eth_dev_start(req->port_id);
+
+ case ETH_REQ_STOP:
+ return rte_eth_dev_stop(req->port_id);
+
+ default:
+ return -EINVAL;
+ }
+}
+
+static_assert(sizeof(struct ethdev_mp_request) <= RTE_MP_MAX_PARAM_LEN);
+
+int
+ethdev_server(const struct rte_mp_msg *mp_msg, const void *peer)
+{
+ struct rte_mp_msg mp_resp = {
+ .name = ETHDEV_MP,
+ };
+ struct ethdev_mp_response *resp
+ = (struct ethdev_mp_response *)mp_resp.param;
+ const struct ethdev_mp_request *req;
+
+ mp_resp.len_param = sizeof(*resp);
+
+ req = (const struct ethdev_mp_request *)mp_msg->param;
+
+ resp->res_op = req->operation;
+
+ /* recv client requests */
+ if (mp_msg->len_param != sizeof(*req))
+ resp->err_value = -EINVAL;
+ else
+ resp->err_value = ethdev_handle_request(req);
+
+ return rte_mp_reply(&mp_resp, peer);
+}
+
+int
+ethdev_request(enum ethdev_mp_operation operation, uint16_t port_id,
+ uint16_t queue_id __rte_unused)
+{
+ struct rte_mp_msg mp_req = { };
+ struct rte_mp_reply mp_reply;
+ struct ethdev_mp_request *req;
+ struct timespec ts = {.tv_sec = 5, .tv_nsec = 0};
+ int ret;
+
+ req = (struct ethdev_mp_request *)mp_req.param;
+ strlcpy(mp_req.name, ETHDEV_MP, RTE_MP_MAX_NAME_LEN);
+ mp_req.len_param = sizeof(*req);
+
+ req->operation = operation;
+ req->port_id = port_id;
+
+ if (rte_mp_request_sync(&mp_req, &mp_reply, &ts) == 0) {
+ const struct rte_mp_msg *mp_rep = &mp_reply.msgs[0];
+ const struct ethdev_mp_response *resp
+ = (const struct ethdev_mp_response *)mp_rep->param;
+
+ if (resp->err_value == 0)
+ ret = 0;
+ else
+ rte_errno = -resp->err_value;
+ free(mp_reply.msgs);
+ }
+
+ if (ret < 0)
+ RTE_ETHDEV_LOG_LINE(ERR,
+ "port %up ethdev op %u failed", port_id, operation);
+ return ret;
+}
diff --git a/lib/ethdev/ethdev_private.h b/lib/ethdev/ethdev_private.h
index b07b1b4c42..73df6c4b13 100644
--- a/lib/ethdev/ethdev_private.h
+++ b/lib/ethdev/ethdev_private.h
@@ -79,4 +79,27 @@ void eth_dev_txq_release(struct rte_eth_dev *dev, uint16_t qid);
int eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
int eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
+/* Used to allow start/stop from secondary */
+#define ETHDEV_MP "mp_ethdev"
+
+enum ethdev_mp_operation {
+ ETH_REQ_START,
+ ETH_REQ_STOP,
+};
+
+struct ethdev_mp_request {
+ uint16_t operation;
+ uint16_t port_id;
+ uint32_t reserved;
+};
+
+struct ethdev_mp_response {
+ uint16_t res_op;
+ uint16_t port_id;
+ int32_t err_value;
+};
+
+int ethdev_server(const struct rte_mp_msg *mp_msg, const void *peer);
+int ethdev_request(enum ethdev_mp_operation op, uint16_t port, uint16_t queue);
+
#endif /* _ETH_PRIVATE_H_ */
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index d4197322a0..f7bf762ff7 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -1800,54 +1800,61 @@ rte_eth_dev_start(uint16_t port_id)
if (dev->data->dev_configured == 0) {
RTE_ETHDEV_LOG_LINE(INFO,
- "Device with port_id=%"PRIu16" is not configured.",
- port_id);
+ "Device with port_id=%"PRIu16" is not configured.",
+ port_id);
return -EINVAL;
}
if (dev->data->dev_started != 0) {
RTE_ETHDEV_LOG_LINE(INFO,
- "Device with port_id=%"PRIu16" already started",
- port_id);
+ "Device with port_id=%"PRIu16" already started",
+ port_id);
return 0;
}
- ret = rte_eth_dev_info_get(port_id, &dev_info);
- if (ret != 0)
- return ret;
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ ret = rte_eth_dev_info_get(port_id, &dev_info);
+ if (ret != 0)
+ return ret;
- restore_flags = rte_eth_get_restore_flags(dev, RTE_ETH_START);
+ restore_flags = rte_eth_get_restore_flags(dev, RTE_ETH_START);
- /* Lets restore MAC now if device does not support live change */
- if ((*dev_info.dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR) &&
- (restore_flags & RTE_ETH_RESTORE_MAC_ADDR))
- eth_dev_mac_restore(dev, &dev_info);
+ /* Restore MAC now if device does not support live change */
+ if ((*dev_info.dev_flags & RTE_ETH_DEV_NOLIVE_MAC_ADDR) &&
+ (restore_flags & RTE_ETH_RESTORE_MAC_ADDR))
+ eth_dev_mac_restore(dev, &dev_info);
- diag = dev->dev_ops->dev_start(dev);
- if (diag == 0)
- dev->data->dev_started = 1;
- else
- return eth_err(port_id, diag);
+ diag = dev->dev_ops->dev_start(dev);
+ if (diag == 0)
+ dev->data->dev_started = 1;
+ else
+ return eth_err(port_id, diag);
- ret = eth_dev_config_restore(dev, &dev_info, restore_flags, port_id);
- if (ret != 0) {
- RTE_ETHDEV_LOG_LINE(ERR,
- "Error during restoring configuration for device (port %u): %s",
- port_id, rte_strerror(-ret));
- ret_stop = rte_eth_dev_stop(port_id);
- if (ret_stop != 0) {
+ ret = eth_dev_config_restore(dev, &dev_info, restore_flags, port_id);
+ if (ret != 0) {
RTE_ETHDEV_LOG_LINE(ERR,
- "Failed to stop device (port %u): %s",
- port_id, rte_strerror(-ret_stop));
- }
+ "Error during restoring configuration for device (port %u): %s",
+ port_id, rte_strerror(-ret));
+ ret_stop = rte_eth_dev_stop(port_id);
+ if (ret_stop != 0) {
+ RTE_ETHDEV_LOG_LINE(ERR,
+ "Failed to stop device (port %u): %s",
+ port_id, rte_strerror(-ret_stop));
+ }
- return ret;
- }
+ return ret;
+ }
- if (dev->data->dev_conf.intr_conf.lsc == 0) {
- if (dev->dev_ops->link_update == NULL)
- return -ENOTSUP;
- dev->dev_ops->link_update(dev, 0);
+ if (dev->data->dev_conf.intr_conf.lsc == 0) {
+ if (dev->dev_ops->link_update == NULL)
+ return -ENOTSUP;
+ dev->dev_ops->link_update(dev, 0);
+ }
+ } else {
+ /* in secondary, proxy to primary */
+ ret = ethdev_request(ETH_REQ_START, port_id, UINT16_MAX);
+ if (ret != 0)
+ return ret;
}
/* expose selection of PMD fast-path functions */
@@ -1880,9 +1887,14 @@ rte_eth_dev_stop(uint16_t port_id)
/* point fast-path functions to dummy ones */
eth_dev_fp_ops_reset(rte_eth_fp_ops + port_id);
- ret = dev->dev_ops->dev_stop(dev);
- if (ret == 0)
- dev->data->dev_started = 0;
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ ret = dev->dev_ops->dev_stop(dev);
+ if (ret == 0)
+ dev->data->dev_started = 0;
+ } else {
+ ret = ethdev_request(ETH_REQ_STOP, port_id, UINT16_MAX);
+ }
+
rte_ethdev_trace_stop(port_id, ret);
return ret;
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 03/13] test: add test for hotplug and secondary process operations
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
2025-04-11 23:44 ` [RFC 01/13] app/testpmd: revert auto attach/detach Stephen Hemminger
2025-04-11 23:44 ` [RFC 02/13] ethdev: allow start/stop from secondary process Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 04/13] net/ring: allow lockfree transmit if ring supports it Stephen Hemminger
` (10 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Anatoly Burakov
Use null device to exercise ethdev start/stop in secondary process.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/test/test_mp_secondary.c | 51 +++++++++++++++++++++++++++++++++---
1 file changed, 48 insertions(+), 3 deletions(-)
diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index f3694530a8..09611cc5c2 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -15,6 +15,9 @@
#include <string.h>
#include <unistd.h>
+#include <rte_ethdev.h>
+#include <rte_bus_vdev.h>
+
#ifdef RTE_EXEC_ENV_WINDOWS
int
test_mp_secondary(void)
@@ -204,6 +207,44 @@ run_object_creation_tests(void)
return 0;
}
+static int
+run_ethdev_tests(void)
+{
+ char vdev_name[] = "net_null";
+ struct rte_eth_conf dev_conf = { 0 };
+ uint16_t port;
+ int ret;
+
+ printf("### Testing hotplug and ethdev start/stop\n");
+
+ /* use hotplug to make a null vdev */
+ ret = rte_eal_hotplug_add("vdev", vdev_name, "");
+ TEST_ASSERT(ret == 0, "Hotplug add of '%s' failed", vdev_name);
+
+ printf("# Checked hotlug_add OK\n");
+
+ ret = rte_eth_dev_get_port_by_name(vdev_name, &port);
+ TEST_ASSERT(ret == 0, "Lookup vdev '%s' failed", vdev_name);
+
+ ret = rte_eth_dev_configure(port, 1, 1, &dev_conf);
+ TEST_ASSERT(ret == 0, "Configure port %u failed", port);
+
+ ret = rte_eth_dev_start(port);
+ TEST_ASSERT(ret == 0, "Start port %u failed", port);
+
+ printf("# Checked rte_eth_dev_start\n");
+
+ ret = rte_eth_dev_stop(port);
+ TEST_ASSERT(ret == 0, "Stop port %u failed", port);
+
+ printf("# Checked rte_eth_dev_stop\n");
+
+ rte_eal_hotplug_remove("vdev", vdev_name);
+
+ printf("# Checked hotlug_remove OK\n");
+ return 0;
+}
+
/* if called in a primary process, just spawns off a secondary process to
* run validation tests - which brings us right back here again...
* if called in a secondary process, this runs a series of API tests to check
@@ -212,15 +253,19 @@ run_object_creation_tests(void)
int
test_mp_secondary(void)
{
- if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ int ret;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY)
return run_secondary_instances();
- }
printf("IN SECONDARY PROCESS\n");
+ ret = run_ethdev_tests();
+ if (ret != 0)
+ return ret;
return run_object_creation_tests();
}
#endif /* !RTE_EXEC_ENV_WINDOWS */
-REGISTER_FAST_TEST(multiprocess_autotest, false, false, test_mp_secondary);
+REGISTER_FAST_TEST(multiprocess_autotest, true, true, test_mp_secondary);
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 04/13] net/ring: allow lockfree transmit if ring supports it
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (2 preceding siblings ...)
2025-04-11 23:44 ` [RFC 03/13] test: add test for hotplug and secondary process operations Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 05/13] net/ring: add argument to attach existing ring Stephen Hemminger
` (9 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Bruce Richardson
Need to be able have multiple threads all using same
transmit queue when using SPAN.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/ring/rte_eth_ring.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index b1085bf390..966c64d9a5 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -177,6 +177,19 @@ eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
return 0;
}
+static bool
+eth_tx_queue_lockfree(const struct pmd_internals *internals)
+{
+ unsigned int i;
+
+ for (i = 0; i < internals->max_tx_queues; i++) {
+ const struct ring_queue *r = &internals->tx_ring_queues[i];
+
+ if (r->rng->flags & RING_F_SP_ENQ)
+ return false;
+ }
+ return true;
+}
static int
eth_dev_info(struct rte_eth_dev *dev,
@@ -189,6 +202,10 @@ eth_dev_info(struct rte_eth_dev *dev,
dev_info->max_rx_queues = (uint16_t)internals->max_rx_queues;
dev_info->rx_offload_capa = RTE_ETH_RX_OFFLOAD_SCATTER;
dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS;
+
+ if (eth_tx_queue_lockfree(internals))
+ dev_info->tx_offload_capa |= RTE_ETH_TX_OFFLOAD_MT_LOCKFREE;
+
dev_info->max_tx_queues = (uint16_t)internals->max_tx_queues;
dev_info->min_rx_bufsize = 0;
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 05/13] net/ring: add argument to attach existing ring
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (3 preceding siblings ...)
2025-04-11 23:44 ` [RFC 04/13] net/ring: allow lockfree transmit if ring supports it Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 06/13] net/ring: add timestamp devargs Stephen Hemminger
` (8 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Bruce Richardson
Need ability to allow process like dumpcap to make a ring ethdev
with a pre-existing ring. Do this via devargs so it can work
with hotplug.
The API rte_eth_from_ring() doesn't do this.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/ring/rte_eth_ring.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index 966c64d9a5..94a739a925 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -23,9 +23,12 @@
#define ETH_RING_INTERNAL_ARG "internal"
#define ETH_RING_INTERNAL_ARG_MAX_LEN 19 /* "0x..16chars..\0" */
+#define ETH_RING_RING_ARG "ring"
+
static const char *valid_arguments[] = {
ETH_RING_NUMA_NODE_ACTION_ARG,
ETH_RING_INTERNAL_ARG,
+ ETH_RING_RING_ARG,
NULL
};
@@ -693,6 +696,20 @@ parse_internal_args(const char *key __rte_unused, const char *value,
return 0;
}
+static int
+parse_ring_arg(const char *key __rte_unused, const char *value, void *data)
+{
+ struct rte_ring **rp = data;
+
+ *rp = rte_ring_lookup(value);
+ if (*rp == NULL) {
+ PMD_LOG(ERR, "ring '%s' not found", value);
+ return -1;
+ }
+
+ return 0;
+}
+
static int
rte_pmd_ring_probe(struct rte_vdev_device *dev)
{
@@ -770,6 +787,18 @@ rte_pmd_ring_probe(struct rte_vdev_device *dev)
ð_dev);
if (ret >= 0)
ret = 0;
+ } else if (rte_kvargs_count(kvlist, ETH_RING_RING_ARG) == 1) {
+ struct rte_ring *rxtx[1] = { };
+
+ ret = rte_kvargs_process(kvlist, ETH_RING_RING_ARG, parse_ring_arg, rxtx);
+ if (ret < 0)
+ goto out_free;
+
+ /* Note: rte_eth_from_ring() does not do what is expected here! */
+ ret = do_eth_dev_ring_create(name, dev, rxtx, 1, rxtx, 1,
+ rte_socket_id(), DEV_ATTACH, ð_dev);
+ if (ret < 0)
+ goto out_free;
} else {
ret = rte_kvargs_count(kvlist, ETH_RING_NUMA_NODE_ACTION_ARG);
info = rte_zmalloc("struct node_action_list",
@@ -843,4 +872,5 @@ static struct rte_vdev_driver pmd_ring_drv = {
RTE_PMD_REGISTER_VDEV(net_ring, pmd_ring_drv);
RTE_PMD_REGISTER_ALIAS(net_ring, eth_ring);
RTE_PMD_REGISTER_PARAM_STRING(net_ring,
+ ETH_RING_RING_ARG "=<string> "
ETH_RING_NUMA_NODE_ACTION_ARG "=name:node:action(ATTACH|CREATE)");
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 06/13] net/ring: add timestamp devargs
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (4 preceding siblings ...)
2025-04-11 23:44 ` [RFC 05/13] net/ring: add argument to attach existing ring Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 07/13] net/null: all lockfree transmit Stephen Hemminger
` (7 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Bruce Richardson
Add argument to ring PMD creation to force packets going
into ring to be timestamped.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/ring/rte_eth_ring.c | 44 +++++++++++++++++++++++++++++++--
1 file changed, 42 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index 94a739a925..0f97cd1262 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -20,15 +20,18 @@
#define ETH_RING_ACTION_CREATE "CREATE"
#define ETH_RING_ACTION_ATTACH "ATTACH"
#define ETH_RING_ACTION_MAX_LEN 8 /* CREATE | ACTION */
+
#define ETH_RING_INTERNAL_ARG "internal"
#define ETH_RING_INTERNAL_ARG_MAX_LEN 19 /* "0x..16chars..\0" */
#define ETH_RING_RING_ARG "ring"
+#define ETH_RING_TIMESTAMP_ARG "timestamp"
static const char *valid_arguments[] = {
ETH_RING_NUMA_NODE_ACTION_ARG,
ETH_RING_INTERNAL_ARG,
ETH_RING_RING_ARG,
+ ETH_RING_TIMESTAMP_ARG,
NULL
};
@@ -41,6 +44,9 @@ struct ring_internal_args {
void *addr; /* self addr for sanity check */
};
+static uint64_t timestamp_dynflag;
+static int timestamp_dynfield_offset = -1;
+
enum dev_action {
DEV_CREATE,
DEV_ATTACH
@@ -49,6 +55,8 @@ enum dev_action {
struct ring_queue {
struct rte_ring *rng;
uint16_t in_port;
+ uint8_t timestamp;
+
RTE_ATOMIC(uint64_t) rx_pkts;
RTE_ATOMIC(uint64_t) tx_pkts;
};
@@ -56,6 +64,7 @@ struct ring_queue {
struct pmd_internals {
unsigned int max_rx_queues;
unsigned int max_tx_queues;
+ uint8_t timestamp;
struct ring_queue rx_ring_queues[RTE_PMD_RING_MAX_RX_RINGS];
struct ring_queue tx_ring_queues[RTE_PMD_RING_MAX_TX_RINGS];
@@ -64,6 +73,7 @@ struct pmd_internals {
enum dev_action action;
};
+
static struct rte_eth_link pmd_link = {
.link_speed = RTE_ETH_SPEED_NUM_10G,
.link_duplex = RTE_ETH_LINK_FULL_DUPLEX,
@@ -99,8 +109,23 @@ eth_ring_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs)
{
void **ptrs = (void *)&bufs[0];
struct ring_queue *r = q;
- const uint16_t nb_tx = (uint16_t)rte_ring_enqueue_burst(r->rng,
- ptrs, nb_bufs, NULL);
+ uint16_t nb_tx;
+
+ if (r->timestamp) {
+ unsigned int i;
+ uint64_t cycles = rte_get_tsc_cycles();
+
+ for (i = 0; i < nb_bufs; i++) {
+ struct rte_mbuf *m = bufs[i];
+ *RTE_MBUF_DYNFIELD(m, timestamp_dynfield_offset,
+ rte_mbuf_timestamp_t *) = cycles;
+ m->ol_flags |= timestamp_dynflag;
+ }
+ }
+
+
+ nb_tx = (uint16_t)rte_ring_enqueue_burst(r->rng, ptrs, nb_bufs, NULL);
+
if (r->rng->flags & RING_F_SP_ENQ)
r->tx_pkts += nb_tx;
else
@@ -176,6 +201,7 @@ eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t tx_queue_id,
{
struct pmd_internals *internals = dev->data->dev_private;
+ internals->tx_ring_queues[tx_queue_id].timestamp = internals->timestamp;
dev->data->tx_queues[tx_queue_id] = &internals->tx_ring_queues[tx_queue_id];
return 0;
}
@@ -835,6 +861,20 @@ rte_pmd_ring_probe(struct rte_vdev_device *dev)
}
}
}
+
+ if (rte_kvargs_count(kvlist, ETH_RING_TIMESTAMP_ARG) == 1) {
+ struct pmd_internals *internals = eth_dev->data->dev_private;
+
+ if (timestamp_dynfield_offset == -1) {
+ ret = rte_mbuf_dyn_rx_timestamp_register(×tamp_dynfield_offset,
+ ×tamp_dynflag);
+ if (ret < 0)
+ return ret;
+ }
+
+ internals->timestamp = 1;
+ }
+
}
out_free:
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 07/13] net/null: all lockfree transmit
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (5 preceding siblings ...)
2025-04-11 23:44 ` [RFC 06/13] net/ring: add timestamp devargs Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 08/13] mbuf: add fields for mirroring Stephen Hemminger
` (6 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Tetsuya Mukawa
For testing mirror, useful if this device allows lockfree Tx.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/null/rte_eth_null.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index 6764cf2ec1..d9210d867f 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -314,6 +314,7 @@ eth_dev_info(struct rte_eth_dev *dev,
dev_info->max_rx_queues = RTE_DIM(internals->rx_null_queues);
dev_info->max_tx_queues = RTE_DIM(internals->tx_null_queues);
dev_info->min_rx_bufsize = 0;
+ dev_info->tx_offload_capa = RTE_ETH_TX_OFFLOAD_MULTI_SEGS | RTE_ETH_TX_OFFLOAD_MT_LOCKFREE;
dev_info->reta_size = internals->reta_size;
dev_info->flow_type_rss_offloads = internals->flow_type_rss_offloads;
dev_info->hash_key_size = sizeof(internals->rss_key);
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 08/13] mbuf: add fields for mirroring
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (6 preceding siblings ...)
2025-04-11 23:44 ` [RFC 07/13] net/null: all lockfree transmit Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-12 9:59 ` Morten Brørup
2025-04-11 23:44 ` [RFC 09/13] ethdev: add port mirror capability Stephen Hemminger
` (5 subsequent siblings)
13 siblings, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
Add field to union used for sched/event etc, for use when
an mbuf is mirrored.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/mbuf/rte_mbuf_core.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index a0df265b5d..1806dddd67 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -589,6 +589,14 @@ struct __rte_cache_aligned rte_mbuf {
* @see rte_event_eth_tx_adapter_txq_set()
*/
} txadapter; /**< Eventdev ethdev Tx adapter */
+ struct rte_mbuf_mirror {
+ uint32_t orig_len;
+ uint16_t queue_id;
+ uint16_t direction;
+ /**< Port mirroring uses this to store origin
+ * @see rte_eth_mirror()
+ */
+ } mirror;
uint32_t usr;
/**< User defined tags. See rte_distributor_process() */
} hash; /**< hash information */
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 09/13] ethdev: add port mirror capability
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (7 preceding siblings ...)
2025-04-11 23:44 ` [RFC 08/13] mbuf: add fields for mirroring Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 10/13] pcapng: split packet copy from header insertion Stephen Hemminger
` (4 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev
Cc: Stephen Hemminger, Bruce Richardson, Thomas Monjalon,
Ferruh Yigit, Andrew Rybchenko
This adds new feature for port mirroring.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
config/rte_config.h | 2 +
lib/ethdev/ethdev_driver.h | 3 +
lib/ethdev/ethdev_private.c | 3 +
lib/ethdev/ethdev_trace.h | 14 +++
lib/ethdev/ethdev_trace_points.c | 6 +
lib/ethdev/rte_ethdev.c | 191 +++++++++++++++++++++++++++++++
lib/ethdev/rte_ethdev.h | 81 +++++++++++++
lib/ethdev/rte_ethdev_core.h | 6 +-
8 files changed, 304 insertions(+), 2 deletions(-)
diff --git a/config/rte_config.h b/config/rte_config.h
index 86897de75e..3a1fbec6b9 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -70,6 +70,8 @@
#define RTE_ETHDEV_QUEUE_STAT_CNTRS 16 /* max 256 */
#define RTE_ETHDEV_RXTX_CALLBACKS 1
#define RTE_MAX_MULTI_HOST_CTRLS 4
+#define RTE_ETHDEV_MIRROR 1
+#define RTE_MIRROR_BURST_SIZE 256
/* cryptodev defines */
#define RTE_CRYPTO_MAX_DEVS 64
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 2b4d2ae9c3..9b3215ae2d 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -172,6 +172,9 @@ struct __rte_cache_aligned rte_eth_dev_data {
uint32_t dev_flags; /**< Capabilities */
int numa_node; /**< NUMA node connection */
+ struct rte_eth_mirror *rx_mirror; /**< Port mirroring */
+ struct rte_eth_mirror *tx_mirror; /**< Port mirroring */
+
/** VLAN filter configuration */
struct rte_vlan_filter_conf vlan_filter_conf;
diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index bded3f1722..b7bcd1d089 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -284,6 +284,9 @@ eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
fpo->txq.data = dev->data->tx_queues;
fpo->txq.clbk = (void * __rte_atomic *)(uintptr_t)dev->pre_tx_burst_cbs;
+
+ fpo->rx_mirror = &dev->data->rx_mirror;
+ fpo->tx_mirror = &dev->data->tx_mirror;
}
RTE_EXPORT_SYMBOL(rte_eth_call_rx_callbacks)
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index c65b78590a..de93037581 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -1035,6 +1035,20 @@ RTE_TRACE_POINT(
rte_trace_point_emit_int(ret);
)
+RTE_TRACE_POINT(
+ rte_eth_trace_mirror_bind,
+ RTE_TRACE_POINT_ARGS(uint16_t port_id,
+ const struct rte_eth_mirror_conf *conf),
+ rte_trace_point_emit_u16(port_id);
+ rte_trace_point_emit_ptr(conf);
+)
+
+RTE_TRACE_POINT(
+ rte_eth_trace_mirror_unbind,
+ RTE_TRACE_POINT_ARGS(uint16_t port_id),
+ rte_trace_point_emit_u16(port_id);
+)
+
RTE_TRACE_POINT(
rte_eth_trace_rx_queue_info_get,
RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
diff --git a/lib/ethdev/ethdev_trace_points.c b/lib/ethdev/ethdev_trace_points.c
index 071c508327..e73ef01332 100644
--- a/lib/ethdev/ethdev_trace_points.c
+++ b/lib/ethdev/ethdev_trace_points.c
@@ -389,6 +389,12 @@ RTE_TRACE_POINT_REGISTER(rte_eth_trace_remove_rx_callback,
RTE_TRACE_POINT_REGISTER(rte_eth_trace_remove_tx_callback,
lib.ethdev.remove_tx_callback)
+RTE_TRACE_POINT_REGISTER(rte_eth_trace_mirror_bind,
+ lib.ethdev.mirror_bind)
+
+RTE_TRACE_POINT_REGISTER(rte_eth_trace_mirror_unbind,
+ lib.ethdev.mirror_unbind)
+
RTE_TRACE_POINT_REGISTER(rte_eth_trace_rx_queue_info_get,
lib.ethdev.rx_queue_info_get)
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f7bf762ff7..bea4dac502 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -14,6 +14,7 @@
#include <bus_driver.h>
#include <eal_export.h>
#include <rte_log.h>
+#include <rte_alarm.h>
#include <rte_interrupts.h>
#include <rte_kvargs.h>
#include <rte_memcpy.h>
@@ -52,6 +53,9 @@ static rte_spinlock_t eth_dev_rx_cb_lock = RTE_SPINLOCK_INITIALIZER;
/* spinlock for add/remove Tx callbacks */
static rte_spinlock_t eth_dev_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
+/* spinlock for setting up mirror ports */
+static rte_spinlock_t eth_dev_mirror_lock = RTE_SPINLOCK_INITIALIZER;
+
/* store statistics names and its offset in stats structure */
struct rte_eth_xstats_name_off {
char name[RTE_ETH_XSTATS_NAME_SIZE];
@@ -7050,6 +7054,193 @@ rte_eth_dev_hairpin_capability_get(uint16_t port_id,
return ret;
}
+
+struct rte_eth_mirror {
+ struct rte_mempool *pool;
+ uint32_t snaplen;
+ uint16_t target;
+};
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_add_mirror, 25.07)
+int
+rte_eth_add_mirror(uint16_t port_id, uint16_t target_id, const struct rte_eth_mirror_conf *conf)
+{
+#ifndef RTE_ETHDEV_MIRROR
+ return -ENOTSUP;
+#endif
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(target_id, -ENODEV);
+
+ struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+ if (conf == NULL) {
+ RTE_ETHDEV_LOG_LINE(ERR, "Missing configuration information");
+ return -EINVAL;
+ }
+
+ if (conf->mp == NULL) {
+ RTE_ETHDEV_LOG_LINE(ERR, "not a valid mempool");
+ return -EINVAL;
+ }
+
+ if (conf->direction == 0 ||
+ conf->direction > (RTE_MIRROR_DIRECTION_INGRESS | RTE_MIRROR_DIRECTION_EGRESS)) {
+ RTE_ETHDEV_LOG_LINE(ERR, "Invalid direction %#x", conf->direction);
+ return -EINVAL;
+ }
+
+ if (conf->snaplen < RTE_ETHER_HDR_LEN) {
+ RTE_ETHDEV_LOG_LINE(ERR, "Invalid snapshot length");
+ return -EINVAL;
+ }
+
+ if (target_id == port_id) {
+ RTE_ETHDEV_LOG_LINE(ERR, "Cannot mirror port to self");
+ return -EINVAL;
+ }
+
+ struct rte_eth_dev_info dev_info;
+ int ret = rte_eth_dev_info_get(target_id, &dev_info);
+ if (ret != 0)
+ return ret;
+
+ if (!(dev_info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_MT_LOCKFREE)) {
+ RTE_ETHDEV_LOG_LINE(ERR, "Mirror needs lockfree transmit");
+ return -ENOTSUP;
+ }
+
+ struct rte_eth_mirror *mirror = rte_zmalloc(NULL, sizeof(*mirror), 0);
+ if (mirror == NULL)
+ return -ENOMEM;
+
+ mirror->pool = conf->mp;
+ mirror->target = target_id;
+ mirror->snaplen = conf->snaplen;
+
+ rte_spinlock_lock(ð_dev_mirror_lock);
+ if (dev->data->rx_mirror != NULL || dev->data->tx_mirror != NULL)
+ ret = -EBUSY;
+ else {
+ if (conf->direction & RTE_MIRROR_DIRECTION_INGRESS)
+ rte_atomic_store_explicit(&dev->data->rx_mirror, mirror, rte_memory_order_relaxed);
+ if (conf->direction & RTE_MIRROR_DIRECTION_EGRESS)
+ rte_atomic_store_explicit(&dev->data->tx_mirror, mirror, rte_memory_order_relaxed);
+
+ rte_eth_trace_mirror_bind(port_id, conf);
+ ret = 0;
+ }
+ rte_spinlock_unlock(ð_dev_mirror_lock);
+
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_remove_mirror, 25.07)
+int
+rte_eth_remove_mirror(uint16_t port_id)
+{
+#ifndef RTE_ETHDEV_MIRROR
+ return -ENOTSUP;
+#endif
+ struct rte_eth_mirror *rx_mirror, *tx_mirror;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+
+ rte_spinlock_lock(ð_dev_mirror_lock);
+ rx_mirror = rte_atomic_exchange_explicit(&dev->data->rx_mirror, NULL,
+ rte_memory_order_acquire);
+ tx_mirror = rte_atomic_exchange_explicit(&dev->data->tx_mirror, NULL,
+ rte_memory_order_acquire);
+
+ rte_spinlock_unlock(ð_dev_mirror_lock);
+
+ struct rte_eth_mirror *mirror = NULL;
+ if (rx_mirror)
+ mirror = rx_mirror;
+ else if (tx_mirror)
+ mirror = tx_mirror;
+ else
+ return -ENOENT; /* no mirror present */
+
+ /* Defer freeing the mirror until after one second
+ * to allow for active threads that are using it.
+ * Assumes no PMD takes more than one second to transmit a burst.
+ * Alternative would be RCU, but RCU in DPDK is optional.
+ */
+ rte_eal_alarm_set(US_PER_S, rte_free, mirror);
+ rte_eth_trace_mirror_unbind(port_id);
+ return 0;
+}
+
+static inline void
+eth_dev_mirror(uint16_t port_id, uint16_t queue_id, uint8_t direction,
+ struct rte_mbuf **pkts, uint16_t nb_pkts,
+ const struct rte_eth_mirror *mirror)
+{
+ struct rte_mbuf *tosend[RTE_MIRROR_BURST_SIZE];
+ unsigned int count = 0;
+ unsigned int i;
+
+ for (i = 0; i < nb_pkts; i++) {
+ const struct rte_mbuf *m = pkts[i];
+ struct rte_mbuf *mc;
+
+ /*
+ * maybe use refcount to clone mbuf but lots of restrictions:
+ * - assumes application won't overwrite rx mbuf
+ * - no vlan insertion
+ */
+ mc = rte_pktmbuf_copy(m, mirror->pool, 0, mirror->snaplen);
+ if (unlikely(mc == NULL))
+ continue;
+
+ /* if original packet has VLAN offload, then undo offload */
+ if ((direction == RTE_MIRROR_DIRECTION_INGRESS &&
+ (m->ol_flags & RTE_MBUF_F_RX_VLAN_STRIPPED)) ||
+ (direction == RTE_MIRROR_DIRECTION_EGRESS &&
+ (m->ol_flags & RTE_MBUF_F_TX_VLAN))) {
+ if (unlikely(rte_vlan_insert(&mc) != 0)) {
+ rte_pktmbuf_free(mc);
+ continue;
+ }
+ }
+
+ mc->port = port_id;
+ mc->hash.mirror = (struct rte_mbuf_mirror) {
+ .orig_len = m->pkt_len,
+ .queue_id = queue_id,
+ .direction = direction,
+ };
+
+ tosend[count++] = mc;
+ }
+
+ uint16_t nsent = rte_eth_tx_burst(mirror->target, 0, tosend, count);
+ if (unlikely(nsent < count)) {
+ uint16_t drop = count - nsent;
+
+ /* TODO: need some stats here? */
+ rte_pktmbuf_free_bulk(pkts + nsent, drop);
+ }
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_mirror_burst, 25.07)
+void
+rte_eth_mirror_burst(uint16_t port_id, uint16_t queue_id, uint8_t direction,
+ struct rte_mbuf **pkts, uint16_t nb_pkts,
+ const struct rte_eth_mirror *mirror)
+{
+ unsigned int i;
+
+ for (i = 0; i < nb_pkts; i += RTE_MIRROR_BURST_SIZE) {
+ uint16_t left = nb_pkts - i;
+ uint16_t burst = RTE_MIN(left, RTE_MIRROR_BURST_SIZE);
+
+ eth_dev_mirror(port_id, queue_id, direction,
+ pkts + i, burst, mirror);
+ }
+}
+
RTE_EXPORT_SYMBOL(rte_eth_dev_pool_ops_supported)
int
rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index ea7f8c4a1a..0588d6f416 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -1464,6 +1464,66 @@ enum rte_eth_tunnel_type {
RTE_ETH_TUNNEL_TYPE_MAX,
};
+/* Definitions for mirror direction */
+#define RTE_MIRROR_DIRECTION_INGRESS 1
+#define RTE_MIRROR_DIRECTION_EGRESS 2
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Structure used to configure port mirroring.
+ */
+struct rte_eth_mirror_conf {
+ struct rte_mempool *mp; /**< Memory pool to allocate from */
+ uint32_t snaplen; /**< Amount of data to copy */
+ uint8_t direction; /**< bitmask of RTE_MIRROR_DIRECTION_XXX */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Create a port mirror.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @param target_id
+ * The port identifier of the mirror port.
+ * @param conf
+ * Settings for the mirror.
+ * @return
+ * Negative errno value on error, 0 on success.
+ */
+__rte_experimental
+int
+rte_eth_add_mirror(uint16_t port_id, uint16_t target_id,
+ const struct rte_eth_mirror_conf *conf);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Break port mirroring. After this call no more packets will be sent
+ * the target port.
+ *
+ * @param port_id
+ * The port identifier of the Ethernet device.
+ * @return
+ * Negative errno value on error, 0 on success.
+ */
+__rte_experimental
+int rte_eth_remove_mirror(uint16_t port_id);
+
+/**
+ * @internal
+ * Helper routine for rte_eth_rx_burst() and rte_eth_tx_burst().
+ */
+struct rte_eth_mirror;
+__rte_experimental
+void rte_eth_mirror_burst(uint16_t port_id, uint16_t queue_id, uint8_t dir,
+ struct rte_mbuf **pkts, uint16_t nb_pkts,
+ const struct rte_eth_mirror *mirror);
#ifdef __cplusplus
}
#endif
@@ -6331,6 +6391,17 @@ rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
nb_rx = p->rx_pkt_burst(qd, rx_pkts, nb_pkts);
+#ifdef RTE_ETHDEV_MIRROR
+ if (p->rx_mirror) {
+ const struct rte_eth_mirror *mirror;
+
+ mirror = rte_atomic_load_explicit(p->rx_mirror, rte_memory_order_relaxed);
+ if (unlikely(mirror != NULL))
+ rte_eth_mirror_burst(port_id, queue_id, RTE_MIRROR_DIRECTION_INGRESS,
+ rx_pkts, nb_rx, mirror);
+ }
+#endif
+
#ifdef RTE_ETHDEV_RXTX_CALLBACKS
{
void *cb;
@@ -6689,6 +6760,16 @@ rte_eth_tx_burst(uint16_t port_id, uint16_t queue_id,
}
#endif
+#ifdef RTE_ETHDEV_MIRROR
+ if (p->tx_mirror) {
+ const struct rte_eth_mirror *mirror;
+
+ mirror = rte_atomic_load_explicit(p->tx_mirror, rte_memory_order_relaxed);
+ if (unlikely(mirror != NULL))
+ rte_eth_mirror_burst(port_id, queue_id, RTE_MIRROR_DIRECTION_EGRESS,
+ tx_pkts, nb_pkts, mirror);
+ }
+#endif
nb_pkts = p->tx_pkt_burst(qd, tx_pkts, nb_pkts);
rte_ethdev_trace_tx_burst(port_id, queue_id, (void **)tx_pkts, nb_pkts);
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index e55fb42996..2ef3b659f7 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -101,7 +101,8 @@ struct __rte_cache_aligned rte_eth_fp_ops {
eth_rx_descriptor_status_t rx_descriptor_status;
/** Refill Rx descriptors with the recycling mbufs. */
eth_recycle_rx_descriptors_refill_t recycle_rx_descriptors_refill;
- uintptr_t reserved1[2];
+ uintptr_t reserved1;
+ RTE_ATOMIC(struct rte_eth_mirror *) *rx_mirror;
/**@}*/
/**@{*/
@@ -121,7 +122,8 @@ struct __rte_cache_aligned rte_eth_fp_ops {
eth_recycle_tx_mbufs_reuse_t recycle_tx_mbufs_reuse;
/** Get the number of used Tx descriptors. */
eth_tx_queue_count_t tx_queue_count;
- uintptr_t reserved2[1];
+ RTE_ATOMIC(struct rte_eth_mirror *) *tx_mirror;
+ uintptr_t reserved2;
/**@}*/
};
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 10/13] pcapng: split packet copy from header insertion
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (8 preceding siblings ...)
2025-04-11 23:44 ` [RFC 09/13] ethdev: add port mirror capability Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 11/13] test: add tests for ethdev mirror Stephen Hemminger
` (3 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Reshma Pattan
In new model, the packet was already copied, only need
to wrap it in pcapng format.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/pcapng/rte_pcapng.c | 178 +++++++++++++++++++++-------------------
lib/pcapng/rte_pcapng.h | 27 +++++-
2 files changed, 120 insertions(+), 85 deletions(-)
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index cacbefdc50..5b40a20b19 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -1,3 +1,4 @@
+
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright(c) 2019 Microsoft Corporation
*/
@@ -432,8 +433,24 @@ pcapng_vlan_insert(struct rte_mbuf *m, uint16_t ether_type, uint16_t tci)
return 0;
}
+/* pad the packet to 32 bit boundary */
+static inline int
+pcapng_mbuf_pad32(struct rte_mbuf *m)
+{
+ uint32_t pkt_len = rte_pktmbuf_pkt_len(m);
+ uint32_t padding = RTE_ALIGN(pkt_len, sizeof(uint32_t)) - pkt_len;
+
+ if (padding > 0) {
+ void *tail = rte_pktmbuf_append(m, padding);
+ if (tail == NULL)
+ return -1;
+ memset(tail, 0, padding);
+ }
+ return 0;
+}
+
/*
- * The mbufs created use the Pcapng standard enhanced packet block.
+ * The mbufs created use the Pcapng standard enhanced packet block.
*
* 1 2 3
* 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
@@ -468,71 +485,28 @@ pcapng_vlan_insert(struct rte_mbuf *m, uint16_t ether_type, uint16_t tci)
* | Block Total Length |
* +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
*/
-
-/* Make a copy of original mbuf with pcapng header and options */
-RTE_EXPORT_SYMBOL(rte_pcapng_copy)
-struct rte_mbuf *
-rte_pcapng_copy(uint16_t port_id, uint32_t queue,
- const struct rte_mbuf *md,
- struct rte_mempool *mp,
- uint32_t length,
- enum rte_pcapng_direction direction,
- const char *comment)
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pcapng_insert, 25.07)
+int
+rte_pcapng_insert(struct rte_mbuf *m, uint32_t queue,
+ enum rte_pcapng_direction direction, uint32_t orig_len,
+ uint64_t timestamp, const char *comment)
{
struct pcapng_enhance_packet_block *epb;
- uint32_t orig_len, pkt_len, padding, flags;
- struct pcapng_option *opt;
- uint64_t timestamp;
- uint16_t optlen;
- struct rte_mbuf *mc;
- bool rss_hash;
-
-#ifdef RTE_LIBRTE_ETHDEV_DEBUG
- RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
-#endif
- orig_len = rte_pktmbuf_pkt_len(md);
+ uint32_t pkt_len = rte_pktmbuf_pkt_len(m);
+ uint32_t flags;
- /* Take snapshot of the data */
- mc = rte_pktmbuf_copy(md, mp, 0, length);
- if (unlikely(mc == NULL))
- return NULL;
-
- /* Expand any offloaded VLAN information */
- if ((direction == RTE_PCAPNG_DIRECTION_IN &&
- (md->ol_flags & RTE_MBUF_F_RX_VLAN_STRIPPED)) ||
- (direction == RTE_PCAPNG_DIRECTION_OUT &&
- (md->ol_flags & RTE_MBUF_F_TX_VLAN))) {
- if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_VLAN,
- md->vlan_tci) != 0)
- goto fail;
- }
+ if (unlikely(pcapng_mbuf_pad32(m) < 0))
+ return -1;
- if ((direction == RTE_PCAPNG_DIRECTION_IN &&
- (md->ol_flags & RTE_MBUF_F_RX_QINQ_STRIPPED)) ||
- (direction == RTE_PCAPNG_DIRECTION_OUT &&
- (md->ol_flags & RTE_MBUF_F_TX_QINQ))) {
- if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_QINQ,
- md->vlan_tci_outer) != 0)
- goto fail;
- }
+ uint16_t optlen = pcapng_optlen(sizeof(flags));
- /* record HASH on incoming packets */
- rss_hash = (direction == RTE_PCAPNG_DIRECTION_IN &&
- (md->ol_flags & RTE_MBUF_F_RX_RSS_HASH));
+ /* make queue optional? */
+ optlen += pcapng_optlen(sizeof(queue));
- /* pad the packet to 32 bit boundary */
- pkt_len = rte_pktmbuf_pkt_len(mc);
- padding = RTE_ALIGN(pkt_len, sizeof(uint32_t)) - pkt_len;
- if (padding > 0) {
- void *tail = rte_pktmbuf_append(mc, padding);
+ /* does packet have valid RSS hash to include */
+ bool rss_hash = (direction == RTE_PCAPNG_DIRECTION_IN &&
+ (m->ol_flags & RTE_MBUF_F_RX_RSS_HASH));
- if (tail == NULL)
- goto fail;
- memset(tail, 0, padding);
- }
-
- optlen = pcapng_optlen(sizeof(flags));
- optlen += pcapng_optlen(sizeof(queue));
if (rss_hash)
optlen += pcapng_optlen(sizeof(uint8_t) + sizeof(uint32_t));
@@ -540,10 +514,10 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
optlen += pcapng_optlen(strlen(comment));
/* reserve trailing options and block length */
- opt = (struct pcapng_option *)
- rte_pktmbuf_append(mc, optlen + sizeof(uint32_t));
+ struct pcapng_option *opt = (struct pcapng_option *)
+ rte_pktmbuf_append(m, optlen + sizeof(uint32_t));
if (unlikely(opt == NULL))
- goto fail;
+ return -1;
switch (direction) {
case RTE_PCAPNG_DIRECTION_IN:
@@ -556,24 +530,20 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
flags = 0;
}
- opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS,
- &flags, sizeof(flags));
-
- opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE,
- &queue, sizeof(queue));
+ opt = pcapng_add_option(opt, PCAPNG_EPB_FLAGS, &flags, sizeof(flags));
+ opt = pcapng_add_option(opt, PCAPNG_EPB_QUEUE, &queue, sizeof(queue));
if (rss_hash) {
uint8_t hash_opt[5];
- /* The algorithm could be something else if
- * using rte_flow_action_rss; but the current API does not
- * have a way for ethdev to report this on a per-packet basis.
+ /* The algorithm could be something else but the current API does not
+ * have a way for to record this on a per-packet basis
+ * and the PCAPNG hash types don't match the DPDK types.
*/
hash_opt[0] = PCAPNG_HASH_TOEPLITZ;
- memcpy(&hash_opt[1], &md->hash.rss, sizeof(uint32_t));
- opt = pcapng_add_option(opt, PCAPNG_EPB_HASH,
- &hash_opt, sizeof(hash_opt));
+ memcpy(&hash_opt[1], &m->hash.rss, sizeof(uint32_t));
+ opt = pcapng_add_option(opt, PCAPNG_EPB_HASH, &hash_opt, sizeof(hash_opt));
}
if (comment)
@@ -583,19 +553,14 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
/* Note: END_OPT necessary here. Wireshark doesn't do it. */
/* Add PCAPNG packet header */
- epb = (struct pcapng_enhance_packet_block *)
- rte_pktmbuf_prepend(mc, sizeof(*epb));
+ epb = (struct pcapng_enhance_packet_block *) rte_pktmbuf_prepend(m, sizeof(*epb));
if (unlikely(epb == NULL))
- goto fail;
+ return -1;
epb->block_type = PCAPNG_ENHANCED_PACKET_BLOCK;
- epb->block_length = rte_pktmbuf_pkt_len(mc);
-
- /* Interface index is filled in later during write */
- mc->port = port_id;
+ epb->block_length = rte_pktmbuf_pkt_len(m);
- /* Put timestamp in cycles here - adjust in packet write */
- timestamp = rte_get_tsc_cycles();
+ /* Put timestamp in cycles here - adjusted in packet write */
epb->timestamp_hi = timestamp >> 32;
epb->timestamp_lo = (uint32_t)timestamp;
epb->capture_length = pkt_len;
@@ -603,9 +568,56 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
/* set trailer of block length */
*(uint32_t *)opt = epb->block_length;
+ return 0;
+}
+
+/* Make a copy of original mbuf with pcapng header and options */
+RTE_EXPORT_SYMBOL(rte_pcapng_copy)
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+ const struct rte_mbuf *md,
+ struct rte_mempool *mp,
+ uint32_t length,
+ enum rte_pcapng_direction direction,
+ const char *comment)
+{
+ uint32_t orig_len = rte_pktmbuf_pkt_len(md);
+ struct rte_mbuf *mc;
- return mc;
+#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
+#endif
+
+ /* Take snapshot of the data */
+ mc = rte_pktmbuf_copy(md, mp, 0, length);
+ if (unlikely(mc == NULL))
+ return NULL;
+
+ /* Expand any offloaded VLAN information */
+ if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+ (md->ol_flags & RTE_MBUF_F_RX_VLAN_STRIPPED)) ||
+ (direction == RTE_PCAPNG_DIRECTION_OUT &&
+ (md->ol_flags & RTE_MBUF_F_TX_VLAN))) {
+ if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_VLAN,
+ md->vlan_tci) != 0)
+ goto fail;
+ }
+
+ if ((direction == RTE_PCAPNG_DIRECTION_IN &&
+ (md->ol_flags & RTE_MBUF_F_RX_QINQ_STRIPPED)) ||
+ (direction == RTE_PCAPNG_DIRECTION_OUT &&
+ (md->ol_flags & RTE_MBUF_F_TX_QINQ))) {
+ if (pcapng_vlan_insert(mc, RTE_ETHER_TYPE_QINQ,
+ md->vlan_tci_outer) != 0)
+ goto fail;
+ }
+
+ /* Interface index is filled in later during write */
+ mc->port = port_id;
+ if (likely(rte_pcapng_insert(mc, queue, direction, orig_len,
+ rte_get_tsc_cycles(), comment) == 0))
+ return mc;
fail:
rte_pktmbuf_free(mc);
return NULL;
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index 48f2b57564..4914ac9622 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -99,7 +99,7 @@ enum rte_pcapng_direction {
};
/**
- * Format an mbuf for writing to file.
+ * Make a copy of mbuf for writing to file.
*
* @param port_id
* The Ethernet port on which packet was received
@@ -117,7 +117,7 @@ enum rte_pcapng_direction {
* @param direction
* The direction of the packer: receive, transmit or unknown.
* @param comment
- * Packet comment.
+ * Packet comment (optional).
*
* @return
* - The pointer to the new mbuf formatted for pcapng_write
@@ -129,6 +129,29 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
uint32_t length,
enum rte_pcapng_direction direction, const char *comment);
+/**
+ * Format an mbuf for writing to file.
+ *
+ * @param m
+ * The mbuf to modify.
+ * @param queue
+ * The queue on the Ethernet port where packet was received
+ * or is going to be transmitted.
+ * @param direction
+ * The direction of the packer: receive, transmit or unknown.
+ * @param orig_len
+ * The length of the original packet which maybe less than actual
+ * packet if only a snapshot was captured.
+ * @param timestamp
+ * The timestamp for packet in TSC cycles.
+ * @param comment
+ * Packet comment (optional).
+ */
+__rte_experimental
+int
+rte_pcapng_insert(struct rte_mbuf *m, uint32_t queue,
+ enum rte_pcapng_direction direction, uint32_t orig_len,
+ uint64_t timestamp, const char *comment);
/**
* Determine optimum mbuf data size.
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 11/13] test: add tests for ethdev mirror
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (9 preceding siblings ...)
2025-04-11 23:44 ` [RFC 10/13] pcapng: split packet copy from header insertion Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 12/13] app/testpmd: support for port mirroring Stephen Hemminger
` (2 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
Simple API and packet mirroring standalone tests.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/test/meson.build | 1 +
app/test/test_ethdev_mirror.c | 286 ++++++++++++++++++++++++++++++++++
2 files changed, 287 insertions(+)
create mode 100644 app/test/test_ethdev_mirror.c
diff --git a/app/test/meson.build b/app/test/meson.build
index b6285a6b45..ee90ca6648 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -73,6 +73,7 @@ source_file_deps = {
'test_errno.c': [],
'test_ethdev_api.c': ['ethdev'],
'test_ethdev_link.c': ['ethdev'],
+ 'test_ethdev_mirror.c': ['net_ring', 'ethdev', 'bus_vdev'],
'test_event_crypto_adapter.c': ['cryptodev', 'eventdev', 'bus_vdev'],
'test_event_dma_adapter.c': ['dmadev', 'eventdev', 'bus_vdev'],
'test_event_eth_rx_adapter.c': ['ethdev', 'eventdev', 'bus_vdev'],
diff --git a/app/test/test_ethdev_mirror.c b/app/test/test_ethdev_mirror.c
new file mode 100644
index 0000000000..924b8d0617
--- /dev/null
+++ b/app/test/test_ethdev_mirror.c
@@ -0,0 +1,286 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2025 Stephen Hemminger <stephen@networkplumber.org>
+ */
+#include "test.h"
+
+#include <stdio.h>
+
+#include <rte_eth_ring.h>
+#include <rte_ethdev.h>
+#include <rte_bus_vdev.h>
+
+#define SOCKET0 0
+#define RING_SIZE 256
+#define MIRROR_BURST 64
+#define NB_MBUF 512
+
+static struct rte_mempool *mirror_pool;
+static const uint32_t pkt_len = 200;
+
+static struct rte_ring *rxtx_ring;
+static const char ring_dev[] = "net_ring0";
+static uint16_t ring_port;
+static const char null_dev[] = "net_null0";
+static uint16_t null_port;
+
+static int
+configure_port(uint16_t port, const char *name)
+{
+ struct rte_eth_conf eth_conf = { 0 };
+
+ if (rte_eth_dev_configure(port, 1, 1, ð_conf) < 0) {
+ fprintf(stderr, "Configure failed for port %u: %s\n", port, name);
+ return -1;
+ }
+
+ /* only single queue */
+ if (rte_eth_tx_queue_setup(port, 0, RING_SIZE, SOCKET0, NULL) < 0) {
+ fprintf(stderr, "TX queue setup failed port %u: %s\n", port, name);
+ return -1;
+ }
+
+ if (rte_eth_rx_queue_setup(port, 0, RING_SIZE, SOCKET0, NULL, mirror_pool) < 0) {
+ fprintf(stderr, "RX queue setup failed port %u: %s\n", port, name);
+ return -1;
+ }
+
+ if (rte_eth_dev_start(port) < 0) {
+ fprintf(stderr, "Error starting port %u:%s\n", port, name);
+ return -1;
+ }
+
+ return 0;
+}
+
+static void
+test_cleanup(void)
+{
+ rte_ring_free(rxtx_ring);
+
+ rte_vdev_uninit("net_ring");
+ rte_vdev_uninit("net_null");
+
+ rte_mempool_free(mirror_pool);
+}
+
+/* Make two virtual devices ring and null */
+static int
+test_setup(void)
+{
+ char null_name[] = "net_null0";
+ int ret;
+
+ /* ring must support multiple enqueue for mirror to work */
+ rxtx_ring = rte_ring_create("R0", RING_SIZE, SOCKET0, RING_F_MP_RTS_ENQ | RING_F_SC_DEQ);
+ if (rxtx_ring == NULL) {
+ fprintf(stderr, "rte_ring_create R0 failed\n");
+ goto fail;
+ }
+ ret = rte_eth_from_rings(ring_dev, &rxtx_ring, 1, &rxtx_ring, 1, SOCKET0);
+ if (ret < 0) {
+ fprintf(stderr, "rte_eth_from_rings failed\n");
+ goto fail;
+ }
+ ring_port = ret;
+
+ /* Make a dummy null device to snoop on */
+ if (rte_vdev_init(null_dev, NULL) != 0) {
+ fprintf(stderr, "Failed to create vdev '%s'\n", null_dev);
+ goto fail;
+ }
+ if (rte_eth_dev_get_port_by_name(null_dev, &null_port) != 0) {
+ fprintf(stderr, "cannot find added vdev %s\n", null_name);
+ goto fail;
+ }
+
+ mirror_pool = rte_pktmbuf_pool_create("mirror_pool", NB_MBUF, 32,
+ 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
+ if (mirror_pool == NULL) {
+ fprintf(stderr, "rte_pktmbuf_pool_create failed\n");
+ goto fail;
+ }
+
+ ret = configure_port(ring_port, ring_dev);
+ if (ret < 0)
+ goto fail;
+
+ ret = configure_port(null_port, null_dev);
+ if (ret < 0)
+ goto fail;
+
+ return 0;
+fail:
+ test_cleanup();
+ return -1;
+}
+
+/* Make sure mirror API checks args */
+static int32_t
+ethdev_mirror_api(void)
+{
+ struct rte_eth_mirror_conf conf = {
+ .mp = mirror_pool,
+ .snaplen = UINT32_MAX,
+ .direction = RTE_MIRROR_DIRECTION_EGRESS,
+ };
+ uint16_t invalid_port = RTE_MAX_ETHPORTS;
+ int ret;
+
+ ret = rte_eth_add_mirror(invalid_port, null_port, &conf);
+ TEST_ASSERT(ret != 0, "Created mirror from invalid port");
+
+ ret = rte_eth_add_mirror(null_port, invalid_port, &conf);
+ TEST_ASSERT(ret != 0, "Created mirror to invalid port");
+
+ conf.snaplen = 1;
+ ret = rte_eth_add_mirror(null_port, ring_port, &conf);
+ TEST_ASSERT(ret != 0, "Created mirror with invalid snap len");
+
+ conf.snaplen = UINT16_MAX;
+ conf.direction = 0;
+ ret = rte_eth_add_mirror(null_port, ring_port, &conf);
+ TEST_ASSERT(ret != 0, "Created mirror with invalid direction");
+
+ conf.direction = RTE_MIRROR_DIRECTION_INGRESS;
+ ret = rte_eth_add_mirror(ring_port, ring_port, &conf);
+ TEST_ASSERT(ret != 0, "Created mirror to self");
+
+ ret = rte_eth_add_mirror(ring_port, null_port, &conf);
+ TEST_ASSERT(ret == 0, "Could not create mirror from ring to null");
+
+ ret = rte_eth_add_mirror(ring_port, null_port, &conf);
+ TEST_ASSERT(ret != 0, "Able to create duplicate mirror");
+
+ ret = rte_eth_remove_mirror(ring_port);
+ TEST_ASSERT(ret == 0, "Unable to delete mirror");
+
+ ret = rte_eth_remove_mirror(ring_port);
+ TEST_ASSERT(ret != 0, "Able to delete port without mirror");
+
+ return 0;
+}
+
+static int
+init_mbuf(struct rte_mbuf *m, uint16_t id)
+{
+ struct {
+ struct rte_ether_hdr eth;
+ struct rte_ipv4_hdr ip;
+ struct rte_udp_hdr udp;
+ } hdrs = {
+ .eth = {
+ .dst_addr.addr_bytes = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+ .ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4),
+ },
+ .ip = {
+ .version_ihl = RTE_IPV4_VHL_DEF,
+ .time_to_live = 1,
+ .next_proto_id = IPPROTO_UDP,
+ .src_addr = rte_cpu_to_be_32(RTE_IPV4_LOOPBACK),
+ .dst_addr = rte_cpu_to_be_32(RTE_IPV4_BROADCAST),
+ },
+ .udp = {
+ .dst_port = rte_cpu_to_be_16(9), /* Discard port */
+ },
+ };
+
+ rte_eth_random_addr(hdrs.eth.src_addr.addr_bytes);
+ uint16_t plen = pkt_len - sizeof(struct rte_ether_hdr);
+
+ hdrs.ip.packet_id = rte_cpu_to_be_16(id);
+ hdrs.ip.total_length = rte_cpu_to_be_16(plen);
+ hdrs.ip.hdr_checksum = rte_ipv4_cksum(&hdrs.ip);
+
+ plen -= sizeof(struct rte_ipv4_hdr);
+ hdrs.udp.src_port = rte_rand();
+ hdrs.udp.dgram_len = rte_cpu_to_be_16(plen);
+
+ void *data = rte_pktmbuf_append(m, pkt_len);
+ TEST_ASSERT(data != NULL, "could not add header");
+
+ memcpy(data, &hdrs, sizeof(hdrs));
+ return 0;
+}
+
+static int
+init_burst(struct rte_mbuf *pkts[], unsigned int n)
+{
+ for (unsigned int i = 0; i < n; i++) {
+ if (init_mbuf(pkts[i], i) < 0)
+ return -1;
+ }
+ return 0;
+}
+
+static int
+validate_burst(struct rte_mbuf *pkts[], unsigned int n)
+{
+ for (unsigned int i = 0; i < n; i++) {
+ struct rte_mbuf *m = pkts[i];
+
+ rte_mbuf_sanity_check(m, 1);
+ TEST_ASSERT(m->pkt_len != pkt_len,
+ "mirror packet not right len");
+
+ const struct rte_ether_hdr *eh
+ = rte_pktmbuf_mtod(m, struct rte_ether_hdr *);
+ TEST_ASSERT(rte_is_broadcast_ether_addr(&eh->dst_addr),
+ "mirrored packet is not broadcast");
+
+ }
+ return 0;
+}
+
+static int32_t
+ethdev_mirror_packets(void)
+{
+ struct rte_eth_mirror_conf conf = {
+ .mp = mirror_pool,
+ .snaplen = UINT32_MAX,
+ .direction = RTE_MIRROR_DIRECTION_EGRESS,
+ };
+ struct rte_mbuf *tx_pkts[MIRROR_BURST], *rx_pkts[MIRROR_BURST];
+ uint16_t nb_tx, nb_rx;
+ int ret;
+
+ ret = rte_pktmbuf_alloc_bulk(mirror_pool, tx_pkts, MIRROR_BURST);
+ TEST_ASSERT(ret == 0, "Could not allocate mbufs");
+
+ ret = init_burst(tx_pkts, MIRROR_BURST);
+ TEST_ASSERT(ret == 0, "Init mbufs failed");
+
+ ret = rte_eth_add_mirror(null_port, ring_port, &conf);
+ TEST_ASSERT(ret == 0, "Could not create mirror from ring to null");
+
+ nb_tx = rte_eth_tx_burst(null_port, 0, tx_pkts, MIRROR_BURST);
+ TEST_ASSERT(nb_tx == MIRROR_BURST, "Only sent %u burst to null (vs %u)",
+ nb_tx, MIRROR_BURST);
+
+ nb_rx = rte_eth_rx_burst(ring_port, 0, rx_pkts, MIRROR_BURST);
+ TEST_ASSERT(nb_rx == MIRROR_BURST, "Only received %u of %u packets",
+ nb_rx, MIRROR_BURST);
+
+ validate_burst(rx_pkts, nb_rx);
+ rte_pktmbuf_free_bulk(rx_pkts, nb_rx);
+
+ return 0;
+}
+
+static struct unit_test_suite ethdev_mirror_suite = {
+ .suite_name = "port mirroring",
+ .setup = test_setup,
+ .teardown = test_cleanup,
+ .unit_test_cases = {
+ TEST_CASE(ethdev_mirror_api),
+ TEST_CASE(ethdev_mirror_packets),
+ TEST_CASES_END()
+ }
+};
+
+static int
+test_ethdev_mirror(void)
+{
+ return unit_test_suite_runner(ðdev_mirror_suite);
+}
+
+REGISTER_FAST_TEST(ethdev_mirror, true, true, test_ethdev_mirror);
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 12/13] app/testpmd: support for port mirroring
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (10 preceding siblings ...)
2025-04-11 23:44 ` [RFC 11/13] test: add tests for ethdev mirror Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 13/13] app/dumpcap: use port mirror instead of pdump Stephen Hemminger
2025-04-12 11:06 ` [RFC 00/13] Packet capture using port mirroring Morten Brørup
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Aman Singh
Add new commands to enable testing of port mirror functionality.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/test-pmd/cmdline.c | 1 +
app/test-pmd/cmdline_mirror.c | 120 ++++++++++++++++++++
app/test-pmd/cmdline_mirror.h | 12 ++
app/test-pmd/meson.build | 1 +
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 15 +++
5 files changed, 149 insertions(+)
create mode 100644 app/test-pmd/cmdline_mirror.c
create mode 100644 app/test-pmd/cmdline_mirror.h
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index b4089d281b..64ba094e04 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -66,6 +66,7 @@
#include "testpmd.h"
#include "cmdline_cman.h"
#include "cmdline_mtr.h"
+#include "cmdline_mirror.h"
#include "cmdline_tm.h"
#include "bpf_cmd.h"
diff --git a/app/test-pmd/cmdline_mirror.c b/app/test-pmd/cmdline_mirror.c
new file mode 100644
index 0000000000..cea41302e1
--- /dev/null
+++ b/app/test-pmd/cmdline_mirror.c
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2025 Stephen Hemminger <stephen@networkplumber.org>
+ */
+
+#include <stdlib.h>
+
+#include <cmdline_parse.h>
+#include <cmdline_parse_num.h>
+#include <cmdline_parse_string.h>
+
+#include <rte_ethdev.h>
+
+#include "testpmd.h"
+#include "cmdline_mirror.h"
+
+/* *** Create Port Mirror Object *** */
+struct cmd_create_port_mirror_result {
+ cmdline_fixed_string_t create;
+ cmdline_fixed_string_t port;
+ cmdline_fixed_string_t mirror;
+ uint16_t port_id;
+ cmdline_fixed_string_t destination;
+ uint16_t target_id;
+};
+
+static cmdline_parse_token_string_t cmd_create_port_mirror_create =
+ TOKEN_STRING_INITIALIZER(struct cmd_create_port_mirror_result, create, "create");
+static cmdline_parse_token_string_t cmd_create_port_mirror_port =
+ TOKEN_STRING_INITIALIZER(struct cmd_create_port_mirror_result, port, "port");
+static cmdline_parse_token_string_t cmd_create_port_mirror_mirror =
+ TOKEN_STRING_INITIALIZER(struct cmd_create_port_mirror_result, mirror, "mirror");
+static cmdline_parse_token_string_t cmd_create_port_mirror_destination =
+ TOKEN_STRING_INITIALIZER(struct cmd_create_port_mirror_result, destination, "destination");
+static cmdline_parse_token_num_t cmd_create_port_mirror_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_create_port_mirror_result, port_id, RTE_UINT16);
+static cmdline_parse_token_num_t cmd_create_port_mirror_target_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_create_port_mirror_result, target_id, RTE_UINT16);
+
+static void cmd_create_port_mirror_parsed(void *parsed_result,
+ __rte_unused struct cmdline *cl,
+ __rte_unused void *data)
+{
+ const struct cmd_create_port_mirror_result *res = parsed_result;
+ struct rte_eth_mirror_conf mirror_conf = {
+ .snaplen = RTE_MBUF_DEFAULT_BUF_SIZE,
+ .direction = RTE_MIRROR_DIRECTION_INGRESS | RTE_MIRROR_DIRECTION_EGRESS
+ };
+ int ret;
+
+ if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+ return;
+
+ if (port_id_is_invalid(res->target_id, ENABLED_WARN))
+ return;
+
+ ret = rte_eth_add_mirror(res->port_id, res->target_id, &mirror_conf);
+ if (ret != 0)
+ fprintf(stderr, "%s\n", rte_strerror(-ret));
+}
+
+cmdline_parse_inst_t cmd_create_port_mirror = {
+ .f = cmd_create_port_mirror_parsed,
+ .data = NULL,
+ .help_str = "create port mirror <port_id> destination <port_id>",
+ .tokens = {
+ (void *)&cmd_create_port_mirror_create,
+ (void *)&cmd_create_port_mirror_port,
+ (void *)&cmd_create_port_mirror_mirror,
+ (void *)&cmd_create_port_mirror_port_id,
+ (void *)&cmd_create_port_mirror_destination,
+ (void *)&cmd_create_port_mirror_target_id,
+ NULL
+ },
+};
+
+/* *** Disable Port Mirror Object *** */
+struct cmd_disable_port_mirror_result {
+ cmdline_fixed_string_t disable;
+ cmdline_fixed_string_t port;
+ cmdline_fixed_string_t mirror;
+ uint16_t port_id;
+};
+
+static void cmd_disable_port_mirror_parsed(void *parsed_result,
+ __rte_unused struct cmdline *cl,
+ __rte_unused void *data)
+{
+ const struct cmd_disable_port_mirror_result *res = parsed_result;
+ int ret;
+
+ if (port_id_is_invalid(res->port_id, ENABLED_WARN))
+ return;
+
+ /* Disable Mirror */
+ ret = rte_eth_remove_mirror(res->port_id);
+ if (ret != 0)
+ fprintf(stderr, "%s\n", rte_strerror(-ret));
+}
+
+static cmdline_parse_token_string_t cmd_disable_port_mirror_disable =
+ TOKEN_STRING_INITIALIZER(struct cmd_disable_port_mirror_result, disable, "disable");
+static cmdline_parse_token_string_t cmd_disable_port_mirror_port =
+ TOKEN_STRING_INITIALIZER(struct cmd_disable_port_mirror_result, port, "port");
+static cmdline_parse_token_string_t cmd_disable_port_mirror_mirror =
+ TOKEN_STRING_INITIALIZER(struct cmd_disable_port_mirror_result, mirror, "mirror");
+static cmdline_parse_token_num_t cmd_disable_port_mirror_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_disable_port_mirror_result, port_id, RTE_UINT16);
+
+cmdline_parse_inst_t cmd_disable_port_mirror = {
+ .f = cmd_disable_port_mirror_parsed,
+ .data = NULL,
+ .help_str = "disable port mirror <port_id>",
+ .tokens = {
+ (void *)&cmd_disable_port_mirror_disable,
+ (void *)&cmd_disable_port_mirror_port,
+ (void *)&cmd_disable_port_mirror_mirror,
+ (void *)&cmd_disable_port_mirror_port_id,
+ NULL
+ },
+};
diff --git a/app/test-pmd/cmdline_mirror.h b/app/test-pmd/cmdline_mirror.h
new file mode 100644
index 0000000000..23ebe1ed75
--- /dev/null
+++ b/app/test-pmd/cmdline_mirror.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2025 Stephen Hemminger <stephen@networkplumber.org>
+ */
+
+#ifndef _CMDLINE_MIRROR_H_
+#define _CMDLINE_MIRROR_H_
+
+/* Port MIRROR commands */
+extern cmdline_parse_inst_t cmd_create_port_mirror;
+extern cmdline_parse_inst_t cmd_disable_port_mirror;
+
+#endif /* _CMDLINE_MIRROR_H_ */
diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index 000548c261..f34d80bb5d 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -9,6 +9,7 @@ sources = files(
'cmdline_cman.c',
'cmdline_flow.c',
'cmdline_mtr.c',
+ 'cmdline_mirror.c',
'cmdline_tm.c',
'cmd_flex_item.c',
'config.c',
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index eeef49500f..ea94f993fe 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -2568,6 +2568,21 @@ where:
* ``clear``: Flag that indicates whether the statistics counters should
be cleared (i.e. set to zero) immediately after they have been read or not.
+create mirror
+-----------
+
+Create a new MIRROR port for an ethernet device::
+
+ testpmd> create port mirror (port_id) destination (port_id)
+
+delete mirror
+-----------
+
+Disable MIRROR port for an ethernet device::
+
+ testpmd> disable port mirror (port_id)
+
+
Traffic Management
------------------
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [RFC 13/13] app/dumpcap: use port mirror instead of pdump
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (11 preceding siblings ...)
2025-04-11 23:44 ` [RFC 12/13] app/testpmd: support for port mirroring Stephen Hemminger
@ 2025-04-11 23:44 ` Stephen Hemminger
2025-04-12 11:06 ` [RFC 00/13] Packet capture using port mirroring Morten Brørup
13 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-11 23:44 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Reshma Pattan
Use the new port mirror API instead of pdump.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/dumpcap/main.c | 361 +++++++++++++++++++++++++++++++---------
app/dumpcap/meson.build | 2 +-
2 files changed, 284 insertions(+), 79 deletions(-)
diff --git a/app/dumpcap/main.c b/app/dumpcap/main.c
index 3d3c0dbc66..7adbab71c8 100644
--- a/app/dumpcap/main.c
+++ b/app/dumpcap/main.c
@@ -36,8 +36,6 @@
#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <rte_pcapng.h>
-#include <rte_pdump.h>
-#include <rte_ring.h>
#include <rte_string_fns.h>
#include <rte_thread.h>
#include <rte_time.h>
@@ -92,16 +90,22 @@ struct capture_options {
struct interface {
TAILQ_ENTRY(interface) next;
uint16_t port;
+ struct rte_bpf *filter;
+ uint64_t filteraccept;
struct capture_options opts;
- struct rte_bpf_prm *bpf_prm;
char name[RTE_ETH_NAME_MAX_LEN];
+ struct rte_eth_stats start_stats;
const char *ifname;
const char *ifdescr;
};
+static int timestamp_dynfield = -1;
+static uint64_t timestamp_dynflag;
+
TAILQ_HEAD(interface_list, interface);
static struct interface_list interfaces = TAILQ_HEAD_INITIALIZER(interfaces);
+static struct interface *port2intf[RTE_MAX_ETHPORTS];
/* Can do either pcap or pcapng format output */
typedef union {
@@ -239,14 +243,16 @@ static void find_interfaces(void)
TAILQ_FOREACH(intf, &interfaces, next) {
/* if name is valid then just record port */
- if (rte_eth_dev_get_port_by_name(intf->name, &intf->port) == 0)
- continue;
+ if (rte_eth_dev_get_port_by_name(intf->name, &intf->port) != 0) {
+ /* maybe got passed port number string as name */
+ intf->port = get_uint(intf->name, "port_number", UINT16_MAX);
+ if (rte_eth_dev_get_name_by_port(intf->port, intf->name) < 0)
+ rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
+ intf->port);
+ }
- /* maybe got passed port number string as name */
- intf->port = get_uint(intf->name, "port_number", UINT16_MAX);
- if (rte_eth_dev_get_name_by_port(intf->port, intf->name) < 0)
- rte_exit(EXIT_FAILURE, "Invalid port number %u\n",
- intf->port);
+ if (rte_eth_stats_get(intf->port, &intf->start_stats) < 0)
+ rte_exit(EXIT_FAILURE, "Could not read stats for port %u\n", intf->port);
}
}
@@ -266,6 +272,9 @@ static void set_default_interface(void)
intf = add_interface(name);
intf->port = p;
+
+ if (rte_eth_stats_get(intf->port, &intf->start_stats) < 0)
+ rte_exit(EXIT_FAILURE, "Could not read stats for default port %u\n", intf->port);
return;
}
rte_exit(EXIT_FAILURE, "No usable interfaces found\n");
@@ -295,6 +304,9 @@ static void compile_filters(void)
struct bpf_program bf;
pcap_t *pcap;
+ /* cache for filter packets */
+ port2intf[intf->port] = intf;
+
pcap = pcap_open_dead(DLT_EN10MB, intf->opts.snap_len);
if (!pcap)
rte_exit(EXIT_FAILURE, "can not open pcap\n");
@@ -324,7 +336,12 @@ static void compile_filters(void)
exit(0);
}
- intf->bpf_prm = bpf_prm;
+
+ intf->filter = rte_bpf_load(bpf_prm);
+ if (intf->filter == NULL)
+ rte_exit(EXIT_FAILURE,
+ "BPF load failed '%s'\n%s(%d)\n",
+ intf->name, rte_strerror(rte_errno), rte_errno);
/* Don't care about original program any more */
pcap_freecode(&bf);
@@ -518,13 +535,12 @@ static void statistics_loop(void)
}
static void
-cleanup_pdump_resources(void)
+disable_mirror_ports(void)
{
struct interface *intf;
TAILQ_FOREACH(intf, &interfaces, next) {
- rte_pdump_disable(intf->port,
- RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
+ rte_eth_remove_mirror(intf->port);
if (intf->opts.promisc_mode)
rte_eth_promiscuous_disable(intf->port);
}
@@ -552,7 +568,6 @@ enable_primary_monitor(void)
{
int ret;
- /* Once primary exits, so will pdump. */
ret = rte_eal_alarm_set(MONITOR_INTERVAL, monitor_primary, NULL);
if (ret < 0)
fprintf(stderr, "Fail to enable monitor:%d\n", ret);
@@ -571,23 +586,24 @@ disable_primary_monitor(void)
static void
report_packet_stats(dumpcap_out_t out)
{
- struct rte_pdump_stats pdump_stats;
struct interface *intf;
- uint64_t ifrecv, ifdrop;
- double percent;
fputc('\n', stderr);
+
TAILQ_FOREACH(intf, &interfaces, next) {
- if (rte_pdump_stats(intf->port, &pdump_stats) < 0)
+ uint64_t ifrecv, ifdrop;
+ struct rte_eth_stats stats;
+ double percent;
+
+ if (rte_eth_stats_get(intf->port, &stats) < 0)
continue;
- /* do what Wiretap does */
- ifrecv = pdump_stats.accepted + pdump_stats.filtered;
- ifdrop = pdump_stats.nombuf + pdump_stats.ringfull;
+ ifrecv = stats.ipackets - intf->start_stats.ipackets;
+ ifdrop = (stats.rx_nombuf - intf->start_stats.rx_nombuf)
+ + (stats.imissed - intf->start_stats.imissed);
if (use_pcapng)
- rte_pcapng_write_stats(out.pcapng, intf->port,
- ifrecv, ifdrop, NULL);
+ rte_pcapng_write_stats(out.pcapng, intf->port, ifrecv, ifdrop, NULL);
if (ifrecv == 0)
percent = 0;
@@ -668,12 +684,21 @@ static void dpdk_init(void)
rte_exit(EXIT_FAILURE, "Can not restore original CPU affinity\n");
}
-/* Create packet ring shared between callbacks and process */
-static struct rte_ring *create_ring(void)
+/* Create ring port to redirect packet to.
+ * This could be much simpler if the ring PMD API (rte_eth_from_rings)
+ * worked from a secondary process, but it doesn't.
+ */
+static struct rte_ring *
+create_ring_dev(char *vdev_name, uint16_t *mirror_port)
{
- struct rte_ring *ring;
+ struct rte_eth_dev_owner owner = { 0 };
+ struct rte_eth_conf dev_conf = { 0 };
+ struct rte_ring *ring = NULL;
char ring_name[RTE_RING_NAMESIZE];
+ char vdev_args[128];
size_t size, log2;
+ uint16_t port;
+ int ret;
/* Find next power of 2 >= size. */
size = ring_size;
@@ -686,17 +711,60 @@ static struct rte_ring *create_ring(void)
ring_size = size;
}
- /* Want one ring per invocation of program */
- snprintf(ring_name, sizeof(ring_name),
- "dumpcap-%d", getpid());
+ ret = rte_eth_dev_owner_new(&owner.id);
+ if (ret < 0)
+ rte_exit(EXIT_FAILURE, "rte_eth_dev_owner_new failed: %s\n",
+ strerror(-ret));
- ring = rte_ring_create(ring_name, ring_size,
- rte_socket_id(), 0);
+ /* Give the vdev a unique name */
+ snprintf(ring_name, sizeof(ring_name), "dumpcap%"PRIu64, owner.id);
+ ring = rte_ring_create(ring_name, ring_size, rte_socket_id(),
+ RING_F_MP_RTS_ENQ | RING_F_SC_DEQ);
if (ring == NULL)
rte_exit(EXIT_FAILURE, "Could not create ring :%s\n",
rte_strerror(rte_errno));
+ strlcpy(owner.name, ring_name, sizeof(owner.name));
+
+ snprintf(vdev_name, RTE_DEV_NAME_MAX_LEN,
+ "net_ring-dumpcap%"PRIu64, owner.id);
+ snprintf(vdev_args, sizeof(vdev_args),
+ "ring=%s,timestamp", ring_name);
+
+ if (rte_eal_hotplug_add("vdev", vdev_name, vdev_args) < 0)
+ rte_exit(EXIT_FAILURE,
+ "rte_eal_hotplug_add of %s failed:%s\n",
+ vdev_name, rte_strerror(rte_errno));
+
+ ret = rte_eth_dev_get_port_by_name(vdev_name, &port);
+ if (ret != 0) {
+ fprintf(stderr, "Could not port for %s: %s\n",
+ vdev_name, strerror(-ret));
+ goto unplug;
+ }
+
+ ret = rte_eth_dev_owner_set(port, &owner);
+ if (ret != 0) {
+ fprintf(stderr, "Could not set owner for port for %u: %s\n",
+ port, strerror(-ret));
+ goto unplug;
+ }
+
+ ret = rte_eth_dev_configure(port, 1, 1, &dev_conf);
+ if (ret < 0) {
+ fprintf(stderr, "Could not configure port %u: %s\n",
+ port, strerror(-ret));
+ goto unplug;
+ }
+
+ *mirror_port = port;
return ring;
+
+unplug:
+ rte_eal_hotplug_remove("vdev", vdev_name);
+ rte_ring_free(ring);
+ rte_eal_cleanup();
+ exit(EXIT_FAILURE);
}
static struct rte_mempool *create_mempool(void)
@@ -796,7 +864,7 @@ static dumpcap_out_t create_output(void)
version(), capture_comment);
if (ret.pcapng == NULL)
rte_exit(EXIT_FAILURE, "pcapng_fdopen failed: %s\n",
- strerror(rte_errno));
+ rte_strerror(rte_errno));
free(os);
TAILQ_FOREACH(intf, &interfaces, next) {
@@ -823,37 +891,30 @@ static dumpcap_out_t create_output(void)
return ret;
}
-static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
+static int enable_mirror(uint16_t mirror_port, struct rte_mempool *mpool)
{
struct interface *intf;
unsigned int count = 0;
- uint32_t flags;
int ret;
- flags = RTE_PDUMP_FLAG_RXTX;
- if (use_pcapng)
- flags |= RTE_PDUMP_FLAG_PCAPNG;
+ ret = rte_eth_dev_start(mirror_port);
+ if (ret < 0)
+ rte_exit(EXIT_FAILURE, "Could not start mirror port %u: %s\n",
+ mirror_port, strerror(-ret));
TAILQ_FOREACH(intf, &interfaces, next) {
- ret = rte_pdump_enable_bpf(intf->port, RTE_PDUMP_ALL_QUEUES,
- flags, intf->opts.snap_len,
- r, mp, intf->bpf_prm);
+ struct rte_eth_mirror_conf conf = {
+ .mp = mpool,
+ .snaplen = intf->opts.snap_len,
+ .direction = RTE_MIRROR_DIRECTION_INGRESS |
+ RTE_MIRROR_DIRECTION_EGRESS,
+ };
+
+ ret = rte_eth_add_mirror(intf->port, mirror_port, &conf);
if (ret < 0) {
- const struct interface *intf2;
-
- /* unwind any previous enables */
- TAILQ_FOREACH(intf2, &interfaces, next) {
- if (intf == intf2)
- break;
- rte_pdump_disable(intf2->port,
- RTE_PDUMP_ALL_QUEUES, RTE_PDUMP_FLAG_RXTX);
- if (intf2->opts.promisc_mode)
- rte_eth_promiscuous_disable(intf2->port);
- }
- rte_exit(EXIT_FAILURE,
- "Packet dump enable on %u:%s failed %s\n",
- intf->port, intf->name,
- rte_strerror(rte_errno));
+ fprintf(stderr, "Port mirror on %u:%s failed %s\n",
+ intf->port, intf->name, strerror(-ret));
+ return -1;
}
if (intf->opts.promisc_mode) {
@@ -868,6 +929,31 @@ static void enable_pdump(struct rte_ring *r, struct rte_mempool *mp)
}
++count;
}
+ return count;
+}
+
+static void setup_mbuf(void)
+{
+ int offset;
+
+ offset = rte_mbuf_dynfield_lookup(RTE_MBUF_DYNFIELD_TIMESTAMP_NAME, NULL);
+ if (offset < 0) {
+ fprintf(stderr, "Could not find timestamp dynamic field\n");
+ return;
+ }
+ timestamp_dynfield = offset;
+
+ offset = rte_mbuf_dynflag_lookup(RTE_MBUF_DYNFLAG_RX_TIMESTAMP_NAME, NULL);
+ if (offset < 0) {
+ fprintf(stderr, "Could not find timestamp flag\n");
+ return;
+ }
+ timestamp_dynflag = RTE_BIT64(offset);
+}
+
+static void show_capturing(unsigned int count)
+{
+ struct interface *intf;
fputs("Capturing on ", stdout);
TAILQ_FOREACH(intf, &interfaces, next) {
@@ -898,6 +984,42 @@ static void show_count(uint64_t count)
bt = fprintf(stderr, "%"PRIu64" ", count);
}
+static ssize_t
+pcapng_write_packets(rte_pcapng_t *pcapng,
+ struct rte_mbuf *pkts[], uint16_t n)
+{
+ struct rte_mbuf *towrite[BURST_SIZE];
+ unsigned int count = 0;
+ unsigned int i;
+
+ for (i = 0; i < n; i++) {
+ struct rte_mbuf *m = pkts[i];
+ const struct rte_mbuf_mirror *orig = &m->hash.mirror;
+ enum rte_pcapng_direction direction;
+ uint64_t ts;
+
+ if (orig->direction == RTE_MIRROR_DIRECTION_INGRESS)
+ direction = RTE_PCAPNG_DIRECTION_IN;
+ else if (orig->direction == RTE_MIRROR_DIRECTION_EGRESS)
+ direction = RTE_PCAPNG_DIRECTION_OUT;
+ else
+ direction = RTE_PCAPNG_DIRECTION_UNKNOWN;
+
+ if (likely(m->ol_flags & timestamp_dynflag))
+ ts = *RTE_MBUF_DYNFIELD(m, timestamp_dynfield, rte_mbuf_timestamp_t *);
+ else
+ ts = rte_get_tsc_cycles();
+
+ if (unlikely(rte_pcapng_insert(m, orig->queue_id, direction,
+ orig->orig_len, ts, NULL) < 0))
+ continue; /* skip no headroom? */
+
+ towrite[count++] = m;
+ }
+
+ return rte_pcapng_write_packets(pcapng, towrite, count);
+}
+
/* Write multiple packets in older pcap format */
static ssize_t
pcap_write_packets(pcap_dumper_t *dumper,
@@ -930,16 +1052,82 @@ pcap_write_packets(pcap_dumper_t *dumper,
return total;
}
+static unsigned int
+filter_burst(struct interface *intf, struct rte_mbuf *pkts[], unsigned int n)
+{
+ uint64_t results[BURST_SIZE];
+ struct rte_mbuf *towrite[BURST_SIZE];
+ unsigned int i, count;
+ uint32_t matches;
+
+ matches = rte_bpf_exec_burst(intf->filter, (void **)pkts, results, n);
+ if (matches == n)
+ return n;
+ intf->filteraccept += matches;
+
+ for (i = 0, count = 0; i < n; i++) {
+ if (results[i])
+ towrite[count++] = pkts[i];
+ else
+ rte_pktmbuf_free(pkts[i]);
+ }
+ memcpy(pkts, towrite, count * sizeof(struct rte_mbuf *));
+ return count;
+}
+
+static ssize_t
+process_burst(uint16_t in_port, dumpcap_out_t out,
+ struct rte_mbuf *pkts[], unsigned int n)
+{
+ struct interface *intf;
+ ssize_t written;
+
+ if (in_port >= RTE_MAX_ETHPORTS)
+ goto invalid_port;
+
+ intf = port2intf[in_port];
+ if (intf == NULL)
+ goto invalid_port;
+
+ if (intf->filter) {
+ n = filter_burst(intf, pkts, n);
+ if (n == 0)
+ return 0;
+ }
+
+ packets_received += n;
+ if (use_pcapng)
+ written = pcapng_write_packets(out.pcapng, pkts, n);
+ else
+ written = pcap_write_packets(out.dumper, pkts, n);
+
+ if (written > 0)
+ file_size += written;
+
+ rte_pktmbuf_free_bulk(pkts, n);
+ return written;
+
+invalid_port:
+ static bool warn_once = true;
+ if (warn_once) {
+ warn_once = false;
+ fprintf(stderr, "Packet in ring from port %u\n", in_port);
+ }
+
+ rte_pktmbuf_free_bulk(pkts, n);
+ return 0;
+}
+
/* Process all packets in ring and dump to capture file */
-static int process_ring(dumpcap_out_t out, struct rte_ring *r)
+static int process_ring(dumpcap_out_t out, struct rte_ring *ring)
{
struct rte_mbuf *pkts[BURST_SIZE];
- unsigned int avail, n;
+ uint16_t in_port;
+ unsigned int i, n, start, avail;
static unsigned int empty_count;
- ssize_t written;
+ int ret;
- n = rte_ring_sc_dequeue_burst(r, (void **) pkts, BURST_SIZE,
- &avail);
+ n = rte_ring_sc_dequeue_burst(ring, (void **)pkts, BURST_SIZE, &avail);
if (n == 0) {
/* don't consume endless amounts of cpu if idle */
if (empty_count < SLEEP_THRESHOLD)
@@ -951,18 +1139,24 @@ static int process_ring(dumpcap_out_t out, struct rte_ring *r)
empty_count = (avail == 0);
- if (use_pcapng)
- written = rte_pcapng_write_packets(out.pcapng, pkts, n);
- else
- written = pcap_write_packets(out.dumper, pkts, n);
+ /* process all packets mirrored from same port */
+ start = 0;
+ in_port = pkts[start]->port;
+ for (i = 1; i < n; i++) {
+ if (likely(pkts[i]->port == in_port))
+ continue;
+ ret = process_burst(in_port, out, pkts + start, i - start);
+ if (ret < 0)
+ return ret; /* stop on file write error */
- rte_pktmbuf_free_bulk(pkts, n);
+ start = i;
+ in_port = pkts[i]->port;
+ }
- if (written < 0)
- return -1;
+ ret = process_burst(in_port, out, pkts + start, n - start);
+ if (ret < 0)
+ return ret;
- file_size += written;
- packets_received += n;
if (!quiet)
show_count(packets_received);
@@ -971,15 +1165,18 @@ static int process_ring(dumpcap_out_t out, struct rte_ring *r)
int main(int argc, char **argv)
{
- struct rte_ring *r;
struct rte_mempool *mp;
struct sigaction action = {
.sa_flags = SA_RESTART,
.sa_handler = signal_handler,
};
struct sigaction origaction;
+ struct rte_ring *ring = NULL;
+ char vdev_name[RTE_DEV_NAME_MAX_LEN];
+ uint16_t mirror_port = UINT16_MAX;
dumpcap_out_t out;
char *p;
+ int ret;
p = strrchr(argv[0], '/');
if (p == NULL)
@@ -1018,12 +1215,19 @@ int main(int argc, char **argv)
exit(0);
}
- r = create_ring();
mp = create_mempool();
+ ring = create_ring_dev(vdev_name, &mirror_port);
out = create_output();
+ ret = enable_mirror(mirror_port, mp);
+ if (ret < 0)
+ goto cleanup;
+
+ setup_mbuf();
+
+ show_capturing(ret);
+
start_time = time(NULL);
- enable_pdump(r, mp);
if (!quiet) {
fprintf(stderr, "Packets captured: ");
@@ -1031,7 +1235,7 @@ int main(int argc, char **argv)
}
while (!rte_atomic_load_explicit(&quit_signal, rte_memory_order_relaxed)) {
- if (process_ring(out, r) < 0) {
+ if (process_ring(out, ring) < 0) {
fprintf(stderr, "pcapng file write failed; %s\n",
strerror(errno));
break;
@@ -1048,19 +1252,20 @@ int main(int argc, char **argv)
break;
}
- disable_primary_monitor();
-
if (rte_eal_primary_proc_alive(NULL))
report_packet_stats(out);
+cleanup:
if (use_pcapng)
rte_pcapng_close(out.pcapng);
else
pcap_dump_close(out.dumper);
- cleanup_pdump_resources();
+ disable_primary_monitor();
+ disable_mirror_ports();
- rte_ring_free(r);
+ rte_eal_hotplug_remove("vdev", vdev_name);
+ rte_ring_free(ring);
rte_mempool_free(mp);
return rte_eal_cleanup() ? EXIT_FAILURE : 0;
diff --git a/app/dumpcap/meson.build b/app/dumpcap/meson.build
index 69c016c780..6212291d40 100644
--- a/app/dumpcap/meson.build
+++ b/app/dumpcap/meson.build
@@ -14,4 +14,4 @@ endif
ext_deps += pcap_dep
sources = files('main.c')
-deps += ['ethdev', 'pdump', 'pcapng', 'bpf']
+deps += ['net_ring', 'ethdev', 'pcapng', 'bpf']
--
2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [RFC 08/13] mbuf: add fields for mirroring
2025-04-11 23:44 ` [RFC 08/13] mbuf: add fields for mirroring Stephen Hemminger
@ 2025-04-12 9:59 ` Morten Brørup
2025-04-12 16:56 ` Stephen Hemminger
0 siblings, 1 reply; 23+ messages in thread
From: Morten Brørup @ 2025-04-12 9:59 UTC (permalink / raw)
To: Stephen Hemminger, dev
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 12 April 2025 01.45
>
> Add field to union used for sched/event etc, for use when
> an mbuf is mirrored.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> lib/mbuf/rte_mbuf_core.h | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index a0df265b5d..1806dddd67 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -589,6 +589,14 @@ struct __rte_cache_aligned rte_mbuf {
> * @see
> rte_event_eth_tx_adapter_txq_set()
> */
> } txadapter; /**< Eventdev ethdev Tx
> adapter */
> + struct rte_mbuf_mirror {
> + uint32_t orig_len;
> + uint16_t queue_id;
> + uint16_t direction;
> + /**< Port mirroring uses this to
> store origin
> + * @see rte_eth_mirror()
> + */
> + } mirror;
> uint32_t usr;
> /**< User defined tags. See
> rte_distributor_process() */
> } hash; /**< hash information
Stop overloading the "hash" field!
We now have dynfields. The mbuf structure's dedicated fields should be limited to absolute core features.
Long term, the "hash" field should be cleaned up.
E.g. if we get rid of the Flow Director and make the 8 byte "sched" (Hierarchical Scheduler) a dynfield, the "hash" field can be reduced from 8 byte to 4 byte (RSS hash).
I acknowledge that some mbuf fields can be overloaded and thus used for multiple purposes - i.e. a value only used for ingress/forwarding (e.g. RSS hash) can share an mbuf field with a value only used for egress (e.g. Scheduler).
The overloading of the "hash" field is too much already. E.g. can the Hierarchical Scheduler be used together with the Eventdev ethdev Tx adapter, or are they mutually exclusive due to sharing the same mbuf field?
Going to the extreme, we would completely replace the "hash" field by dynfields.
In short: Overloading the "hash" field with port mirror information is a step in the wrong direction.
> */
> --
> 2.47.2
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [RFC 00/13] Packet capture using port mirroring
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
` (12 preceding siblings ...)
2025-04-11 23:44 ` [RFC 13/13] app/dumpcap: use port mirror instead of pdump Stephen Hemminger
@ 2025-04-12 11:06 ` Morten Brørup
2025-04-12 17:04 ` Stephen Hemminger
13 siblings, 1 reply; 23+ messages in thread
From: Morten Brørup @ 2025-04-12 11:06 UTC (permalink / raw)
To: Stephen Hemminger, dev
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 12 April 2025 01.45
>
> This is a rework of how packet capture is done in DPDK.
> The existing mechanism using callbacks has a number of problems;
> the main one is that any packets sent/received in a secondary process
> are not visible. The root cause is that callbacks can not function
> across process boundaries, they are specific to the role.
>
> The new mechanism builds on the concept of port mirroring used
> in Linux/FreeBSD and also router vendors (SPAN ports). The
> infrastructure
> is built around a new ethdev call to mirror a port.
>
> The internals of dumpcap are redone but the program command
> syntax is unchanged (no doc change needed). Usingthe new mirror
> mechanism,
> the dumpcap program creates a port using ring PMD and
> then directs mirror to that. Then the packets are extracted from the
> ring.
> If capturing on multiple devices, they all get mirrored to
> the same ring PMD.
Here's some general feedback...
I very much like the concept of using a shared ring for carrying the mirrored packets.
It allows other types of future consumers to process the mirrored packets, e.g. encapsulating and forwarding them into an L2 or L3 tunnel, or Wireshark remote capture.
Using a Ring PMD instead of setting up a dedicated ring also has some advantages, such as the ability to set up multiple separate mirror target instances.
For performance reasons, we should ensure that a lightweight Ring PMD is available for mirroring, in case the Ring PMD is extended with new features affecting its performance.
Or maybe create a new type of mirror/capture virtual PMD. This would allow applications to enqueue packets from non-ethdev interfaces into it.
I'm not convinced you should undo the VLAN offloads when enqueueing a mirror packet... If you do, you should also undo QinQ offloads. Undoing offloads is never going to end.
And if you create a dedicated PMD type for carrying mirrored packets, you can ensure that the offload fields remain intact on dequeue.
You should consider sampling and VLAN filtering as typical mirroring features.
It would improve the performance if such filtering is done before copying the packets.
PS: I agree with your choice of copying (rather than cloning by refcount) when mirroring the packets.
>
> Doing this uncovered a bunch of additional missing features and bugs.
> The first is that test-pmd auto attach introduced in 24.11 breaks this
> (and it broke old pdump as well) so it is removed.
>
> The dumpcap program needs to be able to start the ring PMD it created
> but the existing ethdev API is broken when used from secondary process.
> So fix that.
>
> The mirror API has a restriction that the port being mirrored to must
> allow
> lock free transmit. This is because both rx and tx as well as multiple
> queues need to go over the same transmit queue. Although it might be
> possible
> to have some complex mapping between multiple ports/queues being
> mirrored
> to multiple transmit queues, that gets to be a lot of overhead in API's
> and implementation. Fixing the ring and null PMD to allow lockeless
> transmit is good enough.
>
> Added tests for the new features.
> The performance has not been measured but the overhead of checking for
> mirror port is no more that the overhead of looking at the callback
> list.
>
> TODO items:
> - need release notes for new features
> - more through testing on real hardware
> - mark old pdump API and application for deprecation in 25.07 and
> removal in 25.11
> - more cleanup of patch wording and docs needed.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC 08/13] mbuf: add fields for mirroring
2025-04-12 9:59 ` Morten Brørup
@ 2025-04-12 16:56 ` Stephen Hemminger
2025-04-13 7:00 ` Morten Brørup
0 siblings, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-12 16:56 UTC (permalink / raw)
To: Morten Brørup; +Cc: dev
On Sat, 12 Apr 2025 11:59:10 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Saturday, 12 April 2025 01.45
> >
> > Add field to union used for sched/event etc, for use when
> > an mbuf is mirrored.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > lib/mbuf/rte_mbuf_core.h | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > index a0df265b5d..1806dddd67 100644
> > --- a/lib/mbuf/rte_mbuf_core.h
> > +++ b/lib/mbuf/rte_mbuf_core.h
> > @@ -589,6 +589,14 @@ struct __rte_cache_aligned rte_mbuf {
> > * @see
> > rte_event_eth_tx_adapter_txq_set()
> > */
> > } txadapter; /**< Eventdev ethdev Tx
> > adapter */
> > + struct rte_mbuf_mirror {
> > + uint32_t orig_len;
> > + uint16_t queue_id;
> > + uint16_t direction;
> > + /**< Port mirroring uses this to
> > store origin
> > + * @see rte_eth_mirror()
> > + */
> > + } mirror;
> > uint32_t usr;
> > /**< User defined tags. See
> > rte_distributor_process() */
> > } hash; /**< hash information
>
> Stop overloading the "hash" field!
>
> We now have dynfields. The mbuf structure's dedicated fields should be limited to absolute core features.
>
> Long term, the "hash" field should be cleaned up.
> E.g. if we get rid of the Flow Director and make the 8 byte "sched" (Hierarchical Scheduler) a dynfield, the "hash" field can be reduced from 8 byte to 4 byte (RSS hash).
>
> I acknowledge that some mbuf fields can be overloaded and thus used for multiple purposes - i.e. a value only used for ingress/forwarding (e.g. RSS hash) can share an mbuf field with a value only used for egress (e.g. Scheduler).
>
> The overloading of the "hash" field is too much already. E.g. can the Hierarchical Scheduler be used together with the Eventdev ethdev Tx adapter, or are they mutually exclusive due to sharing the same mbuf field?
>
> Going to the extreme, we would completely replace the "hash" field by dynfields.
>
> In short: Overloading the "hash" field with port mirror information is a step in the wrong direction.
Short answer: Dynamic Fields are hard to work with primary/secondary process model.
The goal was to allow dumpcap to run and just work without modifications to the primary application.
If secondary creates dynamic field, the primary doesn't see it.
The hash field is not going away, flow director is stuck, it has been scheduled for removal
for 3 years and Intel still needs it. Other uses such as storing received hash value are
still needed.
Long answer:
It maybe possible. The patchset went through many revisions during development.
Ended up having to have MP server for start/stop, and if that code was extended
to allow secondary to proxy setting up mirror, then the code in handling mirror() on
primary could also setup the dynamic fields. But accessing dynamic fields is slower,
not that it matters that much if we have to copy mbuf anyway. Other option
would be to pre-pend a pseudo header that capture could then use.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC 00/13] Packet capture using port mirroring
2025-04-12 11:06 ` [RFC 00/13] Packet capture using port mirroring Morten Brørup
@ 2025-04-12 17:04 ` Stephen Hemminger
2025-04-13 9:26 ` Morten Brørup
0 siblings, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-12 17:04 UTC (permalink / raw)
To: Morten Brørup; +Cc: dev
On Sat, 12 Apr 2025 13:06:55 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Saturday, 12 April 2025 01.45
> >
> > This is a rework of how packet capture is done in DPDK.
> > The existing mechanism using callbacks has a number of problems;
> > the main one is that any packets sent/received in a secondary process
> > are not visible. The root cause is that callbacks can not function
> > across process boundaries, they are specific to the role.
> >
> > The new mechanism builds on the concept of port mirroring used
> > in Linux/FreeBSD and also router vendors (SPAN ports). The
> > infrastructure
> > is built around a new ethdev call to mirror a port.
> >
> > The internals of dumpcap are redone but the program command
> > syntax is unchanged (no doc change needed). Usingthe new mirror
> > mechanism,
> > the dumpcap program creates a port using ring PMD and
> > then directs mirror to that. Then the packets are extracted from the
> > ring.
> > If capturing on multiple devices, they all get mirrored to
> > the same ring PMD.
>
> Here's some general feedback...
>
> I very much like the concept of using a shared ring for carrying the mirrored packets.
> It allows other types of future consumers to process the mirrored packets, e.g. encapsulating and forwarding them into an L2 or L3 tunnel, or Wireshark remote capture.
>
> Using a Ring PMD instead of setting up a dedicated ring also has some advantages, such as the ability to set up multiple separate mirror target instances.
> For performance reasons, we should ensure that a lightweight Ring PMD is available for mirroring, in case the Ring PMD is extended with new features affecting its performance.
> Or maybe create a new type of mirror/capture virtual PMD. This would allow applications to enqueue packets from non-ethdev interfaces into it.
The new modifications are simple, and do not impact performance of ring PMD.
We don't need another PMD for this.
It did uncover some broken API's in ring PMD, but those can be marked deprecated
and disappear in some future version.
>
> I'm not convinced you should undo the VLAN offloads when enqueueing a mirror packet... If you do, you should also undo QinQ offloads. Undoing offloads is never going to end.
> And if you create a dedicated PMD type for carrying mirrored packets, you can ensure that the offload fields remain intact on dequeue.
VLAN needs to be unoffloaded before filtering and writing to file.
Can move that logic to the other end of the ring (i.e in dumpcap).
The issue with filtering is that current DPDK BPF has no way to reference offload
meta data. There is a way to do that with Linux kernel BPF (via negative offsets).
Looked into doing this in the DPDK BPF, but there are several blockers: not all
the offloads are the same, and more importantly the pcap library to build filters
(pcap_compile) has some internal issues. The pcap library "knows" when it is
build a direct Linux socket filter, versus just class BPF. For example,
if you call pcap_compile() with "vlan 13", it will generate different code
based on whether it is a Linux filter or not.
> You should consider sampling and VLAN filtering as typical mirroring features.
> It would improve the performance if such filtering is done before copying the packets.
The problem is that filtering before going into the ring leads to creating BPF
dependency inside ethdev which is a build nightmare. Tried it and was not successful.
>
> PS: I agree with your choice of copying (rather than cloning by refcount) when mirroring the packets.
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [RFC 08/13] mbuf: add fields for mirroring
2025-04-12 16:56 ` Stephen Hemminger
@ 2025-04-13 7:00 ` Morten Brørup
2025-04-13 14:31 ` Stephen Hemminger
0 siblings, 1 reply; 23+ messages in thread
From: Morten Brørup @ 2025-04-13 7:00 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 12 April 2025 18.57
>
> On Sat, 12 Apr 2025 11:59:10 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Saturday, 12 April 2025 01.45
> > >
> > > Add field to union used for sched/event etc, for use when
> > > an mbuf is mirrored.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > ---
> > > lib/mbuf/rte_mbuf_core.h | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > index a0df265b5d..1806dddd67 100644
> > > --- a/lib/mbuf/rte_mbuf_core.h
> > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > @@ -589,6 +589,14 @@ struct __rte_cache_aligned rte_mbuf {
> > > * @see
> > > rte_event_eth_tx_adapter_txq_set()
> > > */
> > > } txadapter; /**< Eventdev ethdev Tx
> > > adapter */
> > > + struct rte_mbuf_mirror {
> > > + uint32_t orig_len;
> > > + uint16_t queue_id;
> > > + uint16_t direction;
> > > + /**< Port mirroring uses this to
> > > store origin
> > > + * @see rte_eth_mirror()
> > > + */
> > > + } mirror;
> > > uint32_t usr;
> > > /**< User defined tags. See
> > > rte_distributor_process() */
> > > } hash; /**< hash information
> >
> > Stop overloading the "hash" field!
> >
> > We now have dynfields. The mbuf structure's dedicated fields should
> be limited to absolute core features.
> >
> > Long term, the "hash" field should be cleaned up.
> > E.g. if we get rid of the Flow Director and make the 8 byte "sched"
> (Hierarchical Scheduler) a dynfield, the "hash" field can be reduced
> from 8 byte to 4 byte (RSS hash).
> >
> > I acknowledge that some mbuf fields can be overloaded and thus used
> for multiple purposes - i.e. a value only used for ingress/forwarding
> (e.g. RSS hash) can share an mbuf field with a value only used for
> egress (e.g. Scheduler).
> >
> > The overloading of the "hash" field is too much already. E.g. can the
> Hierarchical Scheduler be used together with the Eventdev ethdev Tx
> adapter, or are they mutually exclusive due to sharing the same mbuf
> field?
> >
> > Going to the extreme, we would completely replace the "hash" field by
> dynfields.
> >
> > In short: Overloading the "hash" field with port mirror information
> is a step in the wrong direction.
>
> Short answer: Dynamic Fields are hard to work with primary/secondary
> process model.
> The goal was to allow dumpcap to run and just work without
> modifications to the primary application.
> If secondary creates dynamic field, the primary doesn't see it.
I skimmed the mbuf dynfield source code, and it looks like it is designed for primary/secondary process model.
If the primary process doesn't see a dynfield created in a secondary process, it is a bug in the mbuf dynfield library. I couldn't find such a bug in Bugzilla.
I would be much better to fix the bug than overloading the "hash" field.
>
> The hash field is not going away, flow director is stuck, it has been
> scheduled for removal
> for 3 years and Intel still needs it. Other uses such as storing
> received hash value are
> still needed.
>
> Long answer:
> It maybe possible. The patchset went through many revisions during
> development.
> Ended up having to have MP server for start/stop, and if that code was
> extended
> to allow secondary to proxy setting up mirror, then the code in
> handling mirror() on
> primary could also setup the dynamic fields. But accessing dynamic
> fields is slower,
> not that it matters that much if we have to copy mbuf anyway. Other
> option
> would be to pre-pend a pseudo header that capture could then use.
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [RFC 00/13] Packet capture using port mirroring
2025-04-12 17:04 ` Stephen Hemminger
@ 2025-04-13 9:26 ` Morten Brørup
0 siblings, 0 replies; 23+ messages in thread
From: Morten Brørup @ 2025-04-13 9:26 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 12 April 2025 19.05
>
> On Sat, 12 Apr 2025 13:06:55 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Saturday, 12 April 2025 01.45
> > >
> > > This is a rework of how packet capture is done in DPDK.
> > > The existing mechanism using callbacks has a number of problems;
> > > the main one is that any packets sent/received in a secondary
> process
> > > are not visible. The root cause is that callbacks can not function
> > > across process boundaries, they are specific to the role.
> > >
> > > The new mechanism builds on the concept of port mirroring used
> > > in Linux/FreeBSD and also router vendors (SPAN ports). The
> > > infrastructure
> > > is built around a new ethdev call to mirror a port.
> > >
> > > The internals of dumpcap are redone but the program command
> > > syntax is unchanged (no doc change needed). Usingthe new mirror
> > > mechanism,
> > > the dumpcap program creates a port using ring PMD and
> > > then directs mirror to that. Then the packets are extracted from
> the
> > > ring.
> > > If capturing on multiple devices, they all get mirrored to
> > > the same ring PMD.
> >
> > Here's some general feedback...
> >
> > I very much like the concept of using a shared ring for carrying the
> mirrored packets.
> > It allows other types of future consumers to process the mirrored
> packets, e.g. encapsulating and forwarding them into an L2 or L3
> tunnel, or Wireshark remote capture.
> >
> > Using a Ring PMD instead of setting up a dedicated ring also has some
> advantages, such as the ability to set up multiple separate mirror
> target instances.
> > For performance reasons, we should ensure that a lightweight Ring PMD
> is available for mirroring, in case the Ring PMD is extended with new
> features affecting its performance.
> > Or maybe create a new type of mirror/capture virtual PMD. This would
> allow applications to enqueue packets from non-ethdev interfaces into
> it.
>
> The new modifications are simple, and do not impact performance of ring
> PMD.
> We don't need another PMD for this.
Agree; so long as the Ring PMD remains lightweight.
The Ring PMD sets the mbuf "port" field on RX, thereby overwriting the port of the mirrored packet.
Instead of adding a new mbuf field for the mirrored packet to hold the original port, you could add a "transparent" configuration option to the Ring PMD, so it doesn’t touch the transported mbuf.
Furthermore, not setting the mbuf "port" field would significantly improve performance of the Ring PMD; this is the only mbuf field written by the Ring PMD, so a memory write operation per packet is eliminated.
Metadata added to the mirrored packet, e.g. the egress port (for packets mirrored at egress) and maybe the queue_id and a timestamp (of the time of mirroring), could be stored in a dedicated dynfield or in a prepended header (as you already mentioned).
Prepending a header also works if there are no more dynfields available. Especially if we add a lot of metadata. I think a prepended header is more future proof.
Hmm... if the "port" field is also stored in the metadata, my suggested Ring PMD "transparent" option becomes a performance optimization only.
> It did uncover some broken API's in ring PMD, but those can be marked
> deprecated
> and disappear in some future version.
+1
>
> >
> > I'm not convinced you should undo the VLAN offloads when enqueueing a
> mirror packet... If you do, you should also undo QinQ offloads. Undoing
> offloads is never going to end.
> > And if you create a dedicated PMD type for carrying mirrored packets,
> you can ensure that the offload fields remain intact on dequeue.
>
> VLAN needs to be unoffloaded before filtering and writing to file.
> Can move that logic to the other end of the ring (i.e in dumpcap).
If the application at the dequeueing end of the ring is not dumpcap, but some other application, this other application might prefer getting the mbufs as raw as possible.
I think moving the unoffloading logic to the dequeueing end of the ring would be better.
BTW, unoffloading should probably be a public DPDK library function, to avoid copy-pasting it from dumpcap to other applications.
>
> The issue with filtering is that current DPDK BPF has no way to
> reference offload
> meta data. There is a way to do that with Linux kernel BPF (via
> negative offsets).
> Looked into doing this in the DPDK BPF, but there are several blockers:
> not all
> the offloads are the same, and more importantly the pcap library to
> build filters
> (pcap_compile) has some internal issues. The pcap library "knows" when
> it is
> build a direct Linux socket filter, versus just class BPF. For example,
> if you call pcap_compile() with "vlan 13", it will generate different
> code
> based on whether it is a Linux filter or not.
>
> > You should consider sampling and VLAN filtering as typical mirroring
> features.
> > It would improve the performance if such filtering is done before
> copying the packets.
>
> The problem is that filtering before going into the ring leads to
> creating BPF
> dependency inside ethdev which is a build nightmare. Tried it and was
> not successful.
Filtering before going into the ring is only a performance optimization.
Full BPF would be nice, but apparently not doable.
VLAN filtering (mirroring one interesting VLAN only) and sampling (to reduce the traffic volume) is common for port mirroring. Perhaps just offer this limited pre-filtering as a performance optimization. Full BPF remains in dumpcap.
Anyway, limited pre-filtering can be added later; just keep it in mind for API design purposes.
If VLAN pre-filtering is added to the enqueueing end of the ring, it obviously needs to parse the mbuf/packet to obtain the VLAN; but I still think it should not modify the packet, i.e. unoffloading the VLAN tag still belongs in the dequeueing end of the ring.
Another thought (more feature creep)...
It would be "nice to have" the ability to attach multiple mirrors to a port.
E.g. if an application uses the mirror feature for its own purposes, dumpcap can still attach its own mirror to ports already being mirrored by the application.
>
> >
> > PS: I agree with your choice of copying (rather than cloning by
> refcount) when mirroring the packets.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC 08/13] mbuf: add fields for mirroring
2025-04-13 7:00 ` Morten Brørup
@ 2025-04-13 14:31 ` Stephen Hemminger
2025-04-13 14:44 ` Morten Brørup
0 siblings, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-13 14:31 UTC (permalink / raw)
To: Morten Brørup; +Cc: dev
On Sun, 13 Apr 2025 09:00:19 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Saturday, 12 April 2025 18.57
> >
> > On Sat, 12 Apr 2025 11:59:10 +0200
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > Sent: Saturday, 12 April 2025 01.45
> > > >
> > > > Add field to union used for sched/event etc, for use when
> > > > an mbuf is mirrored.
> > > >
> > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > > ---
> > > > lib/mbuf/rte_mbuf_core.h | 8 ++++++++
> > > > 1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > index a0df265b5d..1806dddd67 100644
> > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > @@ -589,6 +589,14 @@ struct __rte_cache_aligned rte_mbuf {
> > > > * @see
> > > > rte_event_eth_tx_adapter_txq_set()
> > > > */
> > > > } txadapter; /**< Eventdev ethdev Tx
> > > > adapter */
> > > > + struct rte_mbuf_mirror {
> > > > + uint32_t orig_len;
> > > > + uint16_t queue_id;
> > > > + uint16_t direction;
> > > > + /**< Port mirroring uses this to
> > > > store origin
> > > > + * @see rte_eth_mirror()
> > > > + */
> > > > + } mirror;
> > > > uint32_t usr;
> > > > /**< User defined tags. See
> > > > rte_distributor_process() */
> > > > } hash; /**< hash information
> > >
> > > Stop overloading the "hash" field!
> > >
> > > We now have dynfields. The mbuf structure's dedicated fields should
> > be limited to absolute core features.
> > >
> > > Long term, the "hash" field should be cleaned up.
> > > E.g. if we get rid of the Flow Director and make the 8 byte "sched"
> > (Hierarchical Scheduler) a dynfield, the "hash" field can be reduced
> > from 8 byte to 4 byte (RSS hash).
> > >
> > > I acknowledge that some mbuf fields can be overloaded and thus used
> > for multiple purposes - i.e. a value only used for ingress/forwarding
> > (e.g. RSS hash) can share an mbuf field with a value only used for
> > egress (e.g. Scheduler).
> > >
> > > The overloading of the "hash" field is too much already. E.g. can the
> > Hierarchical Scheduler be used together with the Eventdev ethdev Tx
> > adapter, or are they mutually exclusive due to sharing the same mbuf
> > field?
> > >
> > > Going to the extreme, we would completely replace the "hash" field by
> > dynfields.
> > >
> > > In short: Overloading the "hash" field with port mirror information
> > is a step in the wrong direction.
> >
> > Short answer: Dynamic Fields are hard to work with primary/secondary
> > process model.
> > The goal was to allow dumpcap to run and just work without
> > modifications to the primary application.
> > If secondary creates dynamic field, the primary doesn't see it.
>
> I skimmed the mbuf dynfield source code, and it looks like it is designed for primary/secondary process model.
> If the primary process doesn't see a dynfield created in a secondary process, it is a bug in the mbuf dynfield library. I couldn't find such a bug in Bugzilla.
> I would be much better to fix the bug than overloading the "hash" field.
The problem is that if secondary makes a new field, the primary still has to lookup the offset.
And don't want to do that in the packet path. Need to invoke a control path argument in the primary.
If primary always makes the dynamic field, there really is not much point in it being dynamic.
^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [RFC 08/13] mbuf: add fields for mirroring
2025-04-13 14:31 ` Stephen Hemminger
@ 2025-04-13 14:44 ` Morten Brørup
0 siblings, 0 replies; 23+ messages in thread
From: Morten Brørup @ 2025-04-13 14:44 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Sunday, 13 April 2025 16.32
>
> On Sun, 13 Apr 2025 09:00:19 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Saturday, 12 April 2025 18.57
> > >
> > > On Sat, 12 Apr 2025 11:59:10 +0200
> > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > >
> > > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > > Sent: Saturday, 12 April 2025 01.45
> > > > >
> > > > > Add field to union used for sched/event etc, for use when
> > > > > an mbuf is mirrored.
> > > > >
> > > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > > > > ---
> > > > > lib/mbuf/rte_mbuf_core.h | 8 ++++++++
> > > > > 1 file changed, 8 insertions(+)
> > > > >
> > > > > diff --git a/lib/mbuf/rte_mbuf_core.h
> b/lib/mbuf/rte_mbuf_core.h
> > > > > index a0df265b5d..1806dddd67 100644
> > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > @@ -589,6 +589,14 @@ struct __rte_cache_aligned rte_mbuf {
> > > > > * @see
> > > > > rte_event_eth_tx_adapter_txq_set()
> > > > > */
> > > > > } txadapter; /**< Eventdev ethdev
> Tx
> > > > > adapter */
> > > > > + struct rte_mbuf_mirror {
> > > > > + uint32_t orig_len;
> > > > > + uint16_t queue_id;
> > > > > + uint16_t direction;
> > > > > + /**< Port mirroring uses this
> to
> > > > > store origin
> > > > > + * @see rte_eth_mirror()
> > > > > + */
> > > > > + } mirror;
> > > > > uint32_t usr;
> > > > > /**< User defined tags. See
> > > > > rte_distributor_process() */
> > > > > } hash; /**< hash
> information
> > > >
> > > > Stop overloading the "hash" field!
> > > >
> > > > We now have dynfields. The mbuf structure's dedicated fields
> should
> > > be limited to absolute core features.
> > > >
> > > > Long term, the "hash" field should be cleaned up.
> > > > E.g. if we get rid of the Flow Director and make the 8 byte
> "sched"
> > > (Hierarchical Scheduler) a dynfield, the "hash" field can be
> reduced
> > > from 8 byte to 4 byte (RSS hash).
> > > >
> > > > I acknowledge that some mbuf fields can be overloaded and thus
> used
> > > for multiple purposes - i.e. a value only used for
> ingress/forwarding
> > > (e.g. RSS hash) can share an mbuf field with a value only used for
> > > egress (e.g. Scheduler).
> > > >
> > > > The overloading of the "hash" field is too much already. E.g. can
> the
> > > Hierarchical Scheduler be used together with the Eventdev ethdev Tx
> > > adapter, or are they mutually exclusive due to sharing the same
> mbuf
> > > field?
> > > >
> > > > Going to the extreme, we would completely replace the "hash"
> field by
> > > dynfields.
> > > >
> > > > In short: Overloading the "hash" field with port mirror
> information
> > > is a step in the wrong direction.
> > >
> > > Short answer: Dynamic Fields are hard to work with
> primary/secondary
> > > process model.
> > > The goal was to allow dumpcap to run and just work without
> > > modifications to the primary application.
> > > If secondary creates dynamic field, the primary doesn't see it.
> >
> > I skimmed the mbuf dynfield source code, and it looks like it is
> designed for primary/secondary process model.
> > If the primary process doesn't see a dynfield created in a secondary
> process, it is a bug in the mbuf dynfield library. I couldn't find such
> a bug in Bugzilla.
> > I would be much better to fix the bug than overloading the "hash"
> field.
>
> The problem is that if secondary makes a new field, the primary still
> has to lookup the offset.
> And don't want to do that in the packet path. Need to invoke a control
> path argument in the primary.
> If primary always makes the dynamic field, there really is not much
> point in it being dynamic.
The secondary could provide the dynfield offset to the primary in the control plane when adding the mirror, either as part of the struct rte_eth_mirror, or by some other means before setting dev->data->rx/tx_mirror.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [RFC 02/13] ethdev: allow start/stop from secondary process
2025-04-11 23:44 ` [RFC 02/13] ethdev: allow start/stop from secondary process Stephen Hemminger
@ 2025-04-15 0:19 ` Stephen Hemminger
0 siblings, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2025-04-15 0:19 UTC (permalink / raw)
To: dev; +Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, Anatoly Burakov
On Fri, 11 Apr 2025 16:44:39 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:
> Before this patch if secondary process called start/stop
> it would only impact the secondary process, the ethdev on the
> primary process will still not be started.
>
> With this patch, when start/stop is called from secondary,
> it calls the primary and does the operation there. The design
> is generic, and we can later add queue and other operations
> as needed.
>
> Bugzilla ID: 73
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
Wanted to expand this to other functions like rte_eth_dev_configure()
but the structure rte_eth_conf has grown quite large
According to pahole
/* size: 2280, cachelines: 36, members: 8 */
/* sum members: 2268, holes: 2, sum holes: 8 */
/* padding: 4 */
/* member types with holes: 3, total: 4, bit holes: 1, total: 1, bit paddings: 1, total: 29 bits */
The biggest offender
there is DCB and VMDQ both of which are features I doubt anyone ever uses,
but unlikely to get that fixed.
Because it is so large, it is impossible to pass rte_eth_conf over the mp
primary/secondary API.
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2025-04-15 0:19 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-11 23:44 [RFC 00/13] Packet capture using port mirroring Stephen Hemminger
2025-04-11 23:44 ` [RFC 01/13] app/testpmd: revert auto attach/detach Stephen Hemminger
2025-04-11 23:44 ` [RFC 02/13] ethdev: allow start/stop from secondary process Stephen Hemminger
2025-04-15 0:19 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 03/13] test: add test for hotplug and secondary process operations Stephen Hemminger
2025-04-11 23:44 ` [RFC 04/13] net/ring: allow lockfree transmit if ring supports it Stephen Hemminger
2025-04-11 23:44 ` [RFC 05/13] net/ring: add argument to attach existing ring Stephen Hemminger
2025-04-11 23:44 ` [RFC 06/13] net/ring: add timestamp devargs Stephen Hemminger
2025-04-11 23:44 ` [RFC 07/13] net/null: all lockfree transmit Stephen Hemminger
2025-04-11 23:44 ` [RFC 08/13] mbuf: add fields for mirroring Stephen Hemminger
2025-04-12 9:59 ` Morten Brørup
2025-04-12 16:56 ` Stephen Hemminger
2025-04-13 7:00 ` Morten Brørup
2025-04-13 14:31 ` Stephen Hemminger
2025-04-13 14:44 ` Morten Brørup
2025-04-11 23:44 ` [RFC 09/13] ethdev: add port mirror capability Stephen Hemminger
2025-04-11 23:44 ` [RFC 10/13] pcapng: split packet copy from header insertion Stephen Hemminger
2025-04-11 23:44 ` [RFC 11/13] test: add tests for ethdev mirror Stephen Hemminger
2025-04-11 23:44 ` [RFC 12/13] app/testpmd: support for port mirroring Stephen Hemminger
2025-04-11 23:44 ` [RFC 13/13] app/dumpcap: use port mirror instead of pdump Stephen Hemminger
2025-04-12 11:06 ` [RFC 00/13] Packet capture using port mirroring Morten Brørup
2025-04-12 17:04 ` Stephen Hemminger
2025-04-13 9:26 ` Morten Brørup
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).