From: Bruce Richardson <bruce.richardson@intel.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] tcpdump support in DPDK 2.3
Date: Wed, 16 Dec 2015 13:12:49 +0000
Message-ID: <20151216131249.GC10020@bricha3-MOBL3>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC358AF771@smartserver.smartshare.dk>
On Wed, Dec 16, 2015 at 01:26:11PM +0100, Morten Brørup wrote:
> Bruce,
>
> Please note that tcpdump is a stupid name for a packet capture application that supports much more than just TCP.
>
> I had missed the point about ethdev supporting virtual interfaces, so thank you for pointing that out. That covers my concerns about capturing packets inside tunnels.
>
> I will gladly admit that you Intel guys are probably much more competent in the field of DPDK performance and scalability than I am. So Matthew and I have been asking you to kindly ensure that your solution scales well at very high packet rates too, and pointing out that filtering before copying is probably cheaper than copying before filtering. You mention that it leads to an important choice about which lcores get to do the work of filtering the packets, so that might be worth some discussion.
>
> :-)
>
> Med venlig hilsen / kind regards
> - Morten Brørup
>
Thanks for your support.
We may look at having a certain amount of flexibility in the configuration of
the setup, so as to avoid limiting the use of the functionality.
For scalability at very high packet rates, it's something we'll need you guys to
give us pointers on too - what's acceptable or not inside an app, and what
level of scalability is needed. I'd admit that most of our initial thinking in this
area was for debugging apps at less than line rate i.e. for functional testing.
For full line rate introspection, we'll have to see when we get some working code.
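To make the filter-before-copy versus copy-before-filter trade-off concrete,
here is a rough Python sketch (not DPDK code). A bytearray copy stands in for
an mbuf copy and a deque for an rte_ring; make_frame, is_vlan,
copy_then_filter and filter_then_copy are made-up names used only for
illustration. It counts copies only and says nothing about the cost of running
the filter on the IO cores.

```python
from collections import deque


def make_frame(vlan_id=None):
    """Build a minimal Ethernet frame, optionally carrying an 802.1Q tag."""
    macs = b"\x00" * 12                      # dst + src MAC, zeroed
    payload = b"\x08\x00" + b"\x00" * 46     # EtherType IPv4 + padding
    if vlan_id is None:
        return macs + payload
    return macs + b"\x81\x00" + vlan_id.to_bytes(2, "big") + payload


def is_vlan(pkt, vlan_id):
    """Hypothetical filter: True for 802.1Q frames with the given VLAN ID."""
    if len(pkt) < 16 or bytes(pkt[12:14]) != b"\x81\x00":
        return False
    tci = int.from_bytes(pkt[14:16], "big")
    return (tci & 0x0FFF) == vlan_id


def copy_then_filter(burst, ring, vlan_id):
    """Copy every packet to the ring; the consumer filters afterwards."""
    for pkt in burst:
        ring.append(bytearray(pkt))          # one copy per received packet
    copied = len(burst)
    kept = sum(1 for p in ring if is_vlan(p, vlan_id))
    ring.clear()
    return copied, kept


def filter_then_copy(burst, ring, vlan_id):
    """Filter at the source; copy only the packets that match."""
    copied = 0
    for pkt in burst:
        if is_vlan(pkt, vlan_id):
            ring.append(bytearray(pkt))      # copy only on a match
            copied += 1
    kept = len(ring)
    ring.clear()
    return copied, kept


# 10 packets, of which 2 belong to VLAN 100.
burst = [make_frame(100), make_frame(), make_frame(200), make_frame(100)]
burst += [make_frame()] * 6
print(copy_then_filter(burst, deque(), 100))   # (copies made, packets kept)
print(filter_then_copy(burst, deque(), 100))
```

Both orderings keep the same two packets, but the first one copies all ten
while the second copies only the two that match. The price of the second is
that the match runs in the Rx path.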
/Bruce
>
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: 16. december 2015 12:56
> To: Morten Brørup
> Cc: Matthew Hall; Kyle Larose; dev@dpdk.org
> Subject: Re: [dpdk-dev] tcpdump support in DPDK 2.3
>
> On Wed, Dec 16, 2015 at 12:40:43PM +0100, Morten Brørup wrote:
> > Bruce,
> >
> > This doesn't really sound like tcpdump to me; it sounds like port mirroring.
>
> It's actually a bit of both, in my opinion. It's designed to allow basic mirroring of traffic on a port so that the traffic can be sent to a tcpdump destination.
> By going with a more generic approach, we hope to enable more possible use cases than just focusing on TCP.
>
>
> >
> > Your suggestion is limited to physical ports only, and cannot be attached further inside the application, e.g. for mirroring packets related to a specific VLAN.
>
> Yes, the lack of attachment inside the app is a limitation. There are two types of scenarios that could be considered for packet capture:
> * ones where the application can be modified to do its own filtering and capturing.
> * ones where you want a generic capture mechanism which can be used on any application without modification.
> We have chosen to focus more on the second one, as that is where a generic solution for DPDK is likely to lie. For the first case, the application writer himself knows the type of traffic and how best to capture and filter it, so I don't think a generic one-size-fits-all solution is possible. [Though a couple of helper libraries may be of use]
>
> As for physical ports, the scheme should work for any ethdev - why do you see it only being limited to physical ports? What would you want to see monitored that we are missing?
>
> >
> > Furthermore, it doesn't sound like the filtering part scales well. Consider a fully loaded 40 Gbit/s port. You would need to copy all packets into a single rte_ring to the attached filtering process, which would then require its own set of lcores to probably discard most of these packets when filtering. I agree with Matthew that the filtering needs to happen as close to the source as possible, and must be scalable to multiple lcores.
>
> Without modifying the application itself to do its own filtering, I suspect scalability is always going to be a problem. That being said, there is no particular reason why a single rte_ring needs to be used - we could allow one ring per NIC queue for instance. The trouble with filtering at the source itself is that you put extra load on the IO cores. By using a ring, we put the filtering load on extra cores in a secondary process which can be scaled by the user without touching the main app.
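
The one-ring-per-NIC-queue idea can be sketched roughly as follows. This is
plain Python, not DPDK code: BoundedRing is a made-up stand-in for an rte_ring
(fixed capacity, enqueue fails when full), and mirror and filter_worker are
hypothetical names. The assumption is that each Rx queue mirrors into its own
ring and each ring is drained by its own filtering worker.

```python
class BoundedRing:
    """Stand-in for an rte_ring: fixed capacity, enqueue fails when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.dropped = 0

    def enqueue(self, pkt):
        if len(self.items) >= self.capacity:
            self.dropped += 1      # only the mirrored copy is lost
            return False
        self.items.append(pkt)
        return True

    def dequeue_burst(self, n):
        burst, self.items = self.items[:n], self.items[n:]
        return burst


def make_rings(num_queues, capacity):
    """One ring per NIC queue, so queues never contend on a shared ring."""
    return [BoundedRing(capacity) for _ in range(num_queues)]


def mirror(rings, queue_id, pkt):
    """Hypothetical hook called from the Rx path of queue `queue_id`."""
    return rings[queue_id].enqueue(pkt)


def filter_worker(ring, predicate, burst_size=32):
    """One worker per ring: drain it and keep only matching packets."""
    matched = []
    while True:
        burst = ring.dequeue_burst(burst_size)
        if not burst:
            return matched
        matched.extend(p for p in burst if predicate(p))


# 4 queues with small rings; 40 packets spread round-robin over the queues.
rings = make_rings(4, capacity=8)
for i in range(40):
    mirror(rings, i % 4, b"pkt%d" % i)
print([r.dropped for r in rings])
print(filter_worker(rings[0], lambda p: p.endswith(b"0")))
```

Each ring overflows independently, so a slow worker on one queue only loses
mirrored packets from that queue, and the number of workers can be scaled with
the number of queues.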
>
> >
> > On the positive side, your idea has the advantage that the filter can be any application, and is not limited to BPF. However if the purpose is "tcpdump", we should probably consider BPF, which is the type of filtering offered by tcpdump.
>
> Having this work with any application is one of our primary targets here. The app author should not have to worry too much about getting basic debug support.
> Even if it doesn't work at 40G small packet rates, you can get a lot of benefit from a scheme that provides functional debugging for an app. Obviously, though, we aim to make this as scalable as possible, which is why we want to allow filtering in userspace before sending packets externally to DPDK.
>
> >
> > I would prefer having a BPF library available that the application can use at any point, either at the lowest level (when receiving/transmitting Ethernet packets) or at a higher level (e.g. when working with packets that go into or come out of a tunnel). The BPF library should implement packet length and relevant ancillary data, such as SKF_AD_VLAN_TAG etc. based on metadata in the mbuf.
> >
> > Transferring a BPF filter from an outside application could be done by using a simple text format, e.g. the output format of "tcpdump -ddd". This also opens an easy roadmap for Wireshark integration by simply extending extcap to include such a BPF filter format.
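
As a rough illustration of what receiving such a filter could look like, the
Python sketch below parses the "tcpdump -ddd" text format (an instruction
count, then one "code jt jf k" line of decimal numbers per instruction) and
runs it with a deliberately tiny classic BPF interpreter. parse_ddd and
run_filter are made-up names, only five opcodes are handled, and ancillary
data such as SKF_AD_VLAN_TAG is not modelled. IP_FILTER is a hand-written
program in that format equivalent to the filter "ip"; the accept value is a
snapshot length and is an assumption here.

```python
LDW_ABS, LDH_ABS, LDB_ABS = 0x20, 0x28, 0x30   # A = word/half/byte at pkt[k]
JEQ_K = 0x15                                    # skip jt insns if A == k, else jf
RET_K = 0x06                                    # return k (0 means reject)


def parse_ddd(text):
    """Parse 'count' followed by 'code jt jf k' lines into a list of tuples."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    count = int(lines[0][0])
    insns = [tuple(int(f) for f in ln) for ln in lines[1:]]
    if len(insns) != count or any(len(i) != 4 for i in insns):
        raise ValueError("malformed filter program")
    return insns


def run_filter(insns, pkt):
    """Return the number of bytes to capture, or 0 to reject the packet."""
    acc, pc = 0, 0
    while pc < len(insns):
        code, jt, jf, k = insns[pc]
        pc += 1
        if code in (LDW_ABS, LDH_ABS, LDB_ABS):
            size = {LDW_ABS: 4, LDH_ABS: 2, LDB_ABS: 1}[code]
            if k + size > len(pkt):
                return 0                        # out-of-bounds load rejects
            acc = int.from_bytes(pkt[k:k + size], "big")
        elif code == JEQ_K:
            pc += jt if acc == k else jf        # offsets are relative to next insn
        elif code == RET_K:
            return k
        else:
            raise NotImplementedError(f"opcode {code:#x} not in this sketch")
    return 0


# ldh [12]; jeq #0x800 ? accept : reject
IP_FILTER = """4
40 0 0 12
21 0 1 2048
6 0 0 262144
6 0 0 0
"""

prog = parse_ddd(IP_FILTER)
ipv4 = b"\x00" * 12 + b"\x08\x00" + b"\x00" * 46
arp = b"\x00" * 12 + b"\x08\x06" + b"\x00" * 46
print(run_filter(prog, ipv4) != 0, run_filter(prog, arp) != 0)
```

A real library would need the full instruction set, program validation, and
the mbuf-metadata extensions mentioned above; the point here is only that the
text format is trivial to transport and parse.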
> >
> >
> > Lots of negativity above. I very much like the idea of attaching the secondary process and going through an rte_ring. This allows the secondary process to pass the filtered and captured packets on in any format it likes to any destination it likes.
>
> Good, so we're not completely off-base here. :-)
>
> /Bruce
>
> >
> >
> > Med venlig hilsen / kind regards
> > - Morten Brørup
> >
> > -----Original Message-----
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: 16. december 2015 11:45
> >
> > Hi,
> >
> > we are currently doing some investigation and prototyping for this feature.
> > Our current thinking is the following:
> > * to allow dynamic control of the filtering, we are thinking of making use of
> > the multi-process infrastructure in DPDK. A secondary process can attach to a
> > primary at runtime and provide the packet filtering and dumping capability.
> > * ideally we want to create a generic packet mirroring callback inside the EAL,
> > that can be set up to mirror packets going through Rx/Tx on an ethdev.
> > * using this, packets being received on the port to be monitored are sent via
> > an rte_ring (ring ethdev) to the secondary process which takes those packets
> > and does any filtering on them. [This would be where BPF could fit into
> > things, but it's not something we have looked at yet.]
> > * initially we plan to have the secondary process then write packets to a pcap
> > file using a pcap PMD, but down the road if we get other PMDs, like a KNI PMD
> > or a TAP device PMD, those could be used as targets instead.
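
The data path outlined in the list above can be modelled in a few lines of
Python. This is a single-process toy, not DPDK code: a deque stands in for the
rte_ring between primary and secondary, an in-memory buffer stands in for the
pcap PMD target, and rx_mirror_callback and secondary_drain are hypothetical
names. The only real format used is the classic pcap file layout (24-byte
global header, 16-byte per-record header).

```python
import io
import struct
from collections import deque

PCAP_MAGIC = 0xA1B2C3D4        # classic pcap, microsecond timestamps
LINKTYPE_ETHERNET = 1


def write_pcap_header(out, snaplen=65535):
    """Write the 24-byte pcap global header (version 2.4, little-endian)."""
    out.write(struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0,
                          snaplen, LINKTYPE_ETHERNET))


def write_pcap_record(out, pkt, ts_sec=0, ts_usec=0):
    """Write one 16-byte record header followed by the packet bytes."""
    out.write(struct.pack("<IIII", ts_sec, ts_usec, len(pkt), len(pkt)))
    out.write(pkt)


def rx_mirror_callback(ring, burst):
    """Primary side: hypothetical Rx callback that mirrors a burst to the ring."""
    for pkt in burst:
        ring.append(bytes(pkt))
    return burst                   # the application still sees every packet


def secondary_drain(ring, out, keep=lambda pkt: True):
    """Secondary side: dequeue, filter, and dump the matching packets."""
    written = 0
    while ring:
        pkt = ring.popleft()
        if keep(pkt):
            write_pcap_record(out, pkt)
            written += 1
    return written


ring = deque()
capture = io.BytesIO()
write_pcap_header(capture)
rx_mirror_callback(ring, [b"\x00" * 60, b"\x01" * 64, b"\x00" * 128])
written = secondary_drain(ring, capture, keep=lambda p: len(p) >= 64)
print(written, len(capture.getvalue()))
```

The `keep` predicate marks the spot where filtering (BPF or otherwise) would
sit in the secondary process, and the output sink is the part that another
target could replace later.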
> >
> > We hope this implementation will provide enough hooks to enable the standard tools to be used for monitoring and capturing packets. We will send out draft implementation code for various parts of this as soon as we have it.
> >
> > Additional feedback welcome, as always. :-)
> >
> > Regards,
> > /Bruce
> >
> >
>