From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Stephen Hemminger" <stephen@networkplumber.org>
Cc: <dev@dpdk.org>
Subject: RE: [RFC 00/13] Packet capture using port mirroring
Date: Sun, 13 Apr 2025 11:26:04 +0200 [thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9FBD2@smartserver.smartshare.dk> (raw)
In-Reply-To: <20250412100442.0c48971a@hermes.local>
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Saturday, 12 April 2025 19.05
>
> On Sat, 12 Apr 2025 13:06:55 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Saturday, 12 April 2025 01.45
> > >
> > > This is a rework of how packet capture is done in DPDK.
> > > The existing mechanism using callbacks has a number of problems;
> > > the main one is that any packets sent/received in a secondary
> process
> > > are not visible. The root cause is that callbacks can not function
> > > across process boundaries, they are specific to the role.
> > >
> > > The new mechanism builds on the concept of port mirroring used
> > > in Linux/FreeBSD and also router vendors (SPAN ports). The
> > > infrastructure
> > > is built around a new ethdev call to mirror a port.
> > >
> > > The internals of dumpcap are redone but the program command
> > > syntax is unchanged (no doc change needed). Usingthe new mirror
> > > mechanism,
> > > the dumpcap program creates a port using ring PMD and
> > > then directs mirror to that. Then the packets are extracted from
> the
> > > ring.
> > > If capturing on multiple devices, they all get mirrored to
> > > the same ring PMD.
> >
> > Here's some general feedback...
> >
> > I very much like the concept of using a shared ring for carrying the
> mirrored packets.
> > It allows other types of future consumers to process the mirrored
> packets, e.g. encapsulating and forwarding them into an L2 or L3
> tunnel, or Wireshark remote capture.
> >
> > Using a Ring PMD instead of setting up a dedicated ring also has some
> advantages, such as the ability to set up multiple separate mirror
> target instances.
> > For performance reasons, we should ensure that a lightweight Ring PMD
> is available for mirroring, in case the Ring PMD is extended with new
> features affecting its performance.
> > Or maybe create a new type of mirror/capture virtual PMD. This would
> allow applications to enqueue packets from non-ethdev interfaces into
> it.
>
> The new modifications are simple, and do not impact performance of ring
> PMD.
> We don't need another PMD for this.
Agree; so long as the Ring PMD remains lightweight.
The Ring PMD sets the mbuf "port" field on RX, thereby overwriting the port of the mirrored packet.
Instead of adding a new mbuf field for the mirrored packet to hold the original port, you could add a "transparent" configuration option to the Ring PMD, so it doesn’t touch the transported mbuf.
Furthermore, not setting the mbuf "port" field would significantly improve performance of the Ring PMD; this is the only mbuf field written by the Ring PMD, so a memory write operation per packet is eliminated.
Metadata added to the mirrored packet, e.g. the egress port (for packets mirrored at egress) and maybe the queue_id and a timestamp (of the time of mirroring), could be stored in a dedicated dynfield or in a prepended header (as you already mentioned).
Prepending a header also works if there are no more dynfields available. Especially if we add a lot of metadata. I think a prepended header is more future proof.
Hmm... if the "port" field is also stored in the metadata, my suggested Ring PMD "transparent" option becomes a performance optimization only.
> It did uncover some broken API's in ring PMD, but those can be marked
> deprecated
> and disappear in some future version.
+1
>
> >
> > I'm not convinced you should undo the VLAN offloads when enqueueing a
> mirror packet... If you do, you should also undo QinQ offloads. Undoing
> offloads is never going to end.
> > And if you create a dedicated PMD type for carrying mirrored packets,
> you can ensure that the offload fields remain intact on dequeue.
>
> VLAN needs to be unoffloaded before filtering and writing to file.
> Can move that logic to the other end of the ring (i.e in dumpcap).
If the application at the dequeueing end of the ring is not dumpcap, but some other application, this other application might prefer getting the mbufs as raw as possible.
I think moving the unoffloading logic to the dequeueing end of the ring would be better.
BTW, unoffloading should probably be a public DPDK library function, to avoid copy-pasting it from dumpcap to other applications.
>
> The issue with filtering is that current DPDK BPF has no way to
> reference offload
> meta data. There is a way to do that with Linux kernel BPF (via
> negative offsets).
> Looked into doing this in the DPDK BPF, but there are several blockers:
> not all
> the offloads are the same, and more importantly the pcap library to
> build filters
> (pcap_compile) has some internal issues. The pcap library "knows" when
> it is
> build a direct Linux socket filter, versus just class BPF. For example,
> if you call pcap_compile() with "vlan 13", it will generate different
> code
> based on whether it is a Linux filter or not.
>
> > You should consider sampling and VLAN filtering as typical mirroring
> features.
> > It would improve the performance if such filtering is done before
> copying the packets.
>
> The problem is that filtering before going into the ring leads to
> creating BPF
> dependency inside ethdev which is a build nightmare. Tried it and was
> not successful.
Filtering before going into the ring is only a performance optimization.
Full BPF would be nice, but apparently not doable.
VLAN filtering (mirroring one interesting VLAN only) and sampling (to reduce the traffic volume) is common for port mirroring. Perhaps just offer this limited pre-filtering as a performance optimization. Full BPF remains in dumpcap.
Anyway, limited pre-filtering can be added later; just keep it in mind for API design purposes.
If VLAN pre-filtering is added to the enqueueing end of the ring, it obviously needs to parse the mbuf/packet to obtain the VLAN; but I still think it should not modify the packet, i.e. unoffloading the VLAN tag still belongs in the dequeueing end of the ring.
Another thought (more feature creep)...
It would be "nice to have" the ability to attach multiple mirrors to a port.
E.g. if an application uses the mirror feature for its own purposes, dumpcap can still attach its own mirror to ports already being mirrored by the application.
>
> >
> > PS: I agree with your choice of copying (rather than cloning by
> refcount) when mirroring the packets.
prev parent reply other threads:[~2025-04-13 9:26 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-11 23:44 Stephen Hemminger
2025-04-11 23:44 ` [RFC 01/13] app/testpmd: revert auto attach/detach Stephen Hemminger
2025-04-15 13:28 ` lihuisong (C)
2025-04-16 0:06 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 02/13] ethdev: allow start/stop from secondary process Stephen Hemminger
2025-04-15 0:19 ` Stephen Hemminger
2025-04-11 23:44 ` [RFC 03/13] test: add test for hotplug and secondary process operations Stephen Hemminger
2025-04-11 23:44 ` [RFC 04/13] net/ring: allow lockfree transmit if ring supports it Stephen Hemminger
2025-04-11 23:44 ` [RFC 05/13] net/ring: add argument to attach existing ring Stephen Hemminger
2025-04-11 23:44 ` [RFC 06/13] net/ring: add timestamp devargs Stephen Hemminger
2025-04-11 23:44 ` [RFC 07/13] net/null: all lockfree transmit Stephen Hemminger
2025-04-11 23:44 ` [RFC 08/13] mbuf: add fields for mirroring Stephen Hemminger
2025-04-12 9:59 ` Morten Brørup
2025-04-12 16:56 ` Stephen Hemminger
2025-04-13 7:00 ` Morten Brørup
2025-04-13 14:31 ` Stephen Hemminger
2025-04-13 14:44 ` Morten Brørup
2025-04-11 23:44 ` [RFC 09/13] ethdev: add port mirror capability Stephen Hemminger
2025-04-11 23:44 ` [RFC 10/13] pcapng: split packet copy from header insertion Stephen Hemminger
2025-04-11 23:44 ` [RFC 11/13] test: add tests for ethdev mirror Stephen Hemminger
2025-04-11 23:44 ` [RFC 12/13] app/testpmd: support for port mirroring Stephen Hemminger
2025-04-11 23:44 ` [RFC 13/13] app/dumpcap: use port mirror instead of pdump Stephen Hemminger
2025-04-12 11:06 ` [RFC 00/13] Packet capture using port mirroring Morten Brørup
2025-04-12 17:04 ` Stephen Hemminger
2025-04-13 9:26 ` Morten Brørup [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=98CBD80474FA8B44BF855DF32C47DC35E9FBD2@smartserver.smartshare.dk \
--to=mb@smartsharesystems.com \
--cc=dev@dpdk.org \
--cc=stephen@networkplumber.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).