Subject: Re: [dpdk-dev] tcpdump support in DPDK 2.3
From: Morten Brørup
To: Bruce Richardson
Cc: dev@dpdk.org
Date: Wed, 16 Dec 2015 13:26:11 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC358AF771@smartserver.smartshare.dk>
In-Reply-To: <20151216115611.GB10020@bricha3-MOBL3>

Bruce,

Please note that tcpdump is a stupid name for a packet capture application that supports much more than just TCP.

I had missed the point about ethdev supporting virtual interfaces, so thank you for pointing that out. That covers my concerns about capturing packets inside tunnels.

I will gladly admit that you Intel guys are probably much more competent in the field of DPDK performance and scalability than I am.
So Matthew and I have been asking you to kindly ensure that your solution scales well at very high packet rates too, and we have been pointing out that filtering before copying is probably cheaper than copying before filtering. You mention that it leads to an important choice about which lcores get to do the work of filtering the packets, so that might be worth some discussion. :-)

Med venlig hilsen / kind regards
- Morten Brørup

-----Original Message-----
From: Bruce Richardson [mailto:bruce.richardson@intel.com]
Sent: 16. december 2015 12:56
To: Morten Brørup
Cc: Matthew Hall; Kyle Larose; dev@dpdk.org
Subject: Re: [dpdk-dev] tcpdump support in DPDK 2.3

On Wed, Dec 16, 2015 at 12:40:43PM +0100, Morten Brørup wrote:
> Bruce,
>
> This doesn't really sound like tcpdump to me; it sounds like port mirroring.

It's actually a bit of both, in my opinion: it's designed to allow basic mirroring of traffic on a port, so that the traffic can be sent to a tcpdump destination. By going with a more generic approach, we hope to enable more possible use cases than just focusing on TCP.

> Your suggestion is limited to physical ports only, and cannot be attached further inside the application, e.g. for mirroring packets related to a specific VLAN.

Yes, the lack of attachment inside the app is a limitation. There are two types of scenarios that could be considered for packet capture:

* ones where the application can be modified to do its own filtering and capturing;
* ones where you want a generic capture mechanism which can be used on any application without modification.

We have chosen to focus more on the second one, as that is where a generic solution for DPDK is likely to lie. For the first case, the application writer himself knows the type of traffic and how best to capture and filter it, so I don't think a generic one-size-fits-all solution is possible.
[Though a couple of helper libraries may be of use.]

As for physical ports, the scheme should work for any ethdev - why do you see it only being limited to physical ports? What would you want to see monitored that we are missing?

> Furthermore, it doesn't sound like the filtering part scales well. Consider a fully loaded 40 Gbit/s port. You would need to copy all packets into a single rte_ring to the attached filtering process, which would then require its own set of lcores to probably discard most of these packets when filtering. I agree with Matthew that the filtering needs to happen as close to the source as possible, and must be scalable to multiple lcores.

Without modifying the application itself to do its own filtering, I suspect scalability is always going to be a problem. That being said, there is no particular reason why a single rte_ring needs to be used - we could allow one ring per NIC queue, for instance. The trouble with filtering at the source itself is that you put extra load on the IO cores. By using a ring, we put the filtering load on extra cores in a secondary process, which can be scaled by the user without touching the main app.

> On the positive side, your idea has the advantage that the filter can be any application, and is not limited to BPF. However, if the purpose is "tcpdump", we should probably consider BPF, which is the type of filtering offered by tcpdump.

Having this work with any application is one of our primary targets here. The app author should not have to worry too much about getting basic debug support. Even if it doesn't work at 40G small-packet rates, you can get a lot of benefit from a scheme that provides functional debugging for an app. Obviously, though, we aim to make this as scalable as possible, which is why we want to allow filtering in userspace before sending packets externally to DPDK.
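[Not part of the proposal, just to make the copy-vs-filter tradeoff concrete: a rough, self-contained model of the ring scheme being discussed, using a plain C single-producer/single-consumer ring in place of rte_ring and a trivial struct in place of rte_mbuf. All names here (mirror_enqueue_burst, capture_drain, match_ipv4) are illustrative, not any proposed DPDK API.]

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MIRROR_RING_SIZE 1024          /* power of two, like rte_ring */

struct pkt {                           /* stand-in for struct rte_mbuf */
    uint16_t ethertype;
    uint32_t len;
};

struct mirror_ring {
    struct pkt *slots[MIRROR_RING_SIZE];
    unsigned head;                     /* producer index (IO core)      */
    unsigned tail;                     /* consumer index (capture core) */
};

/* Producer side: the mirroring callback only copies packet pointers
 * into the ring and drops on overflow, so the IO core never blocks
 * on the capture path. */
static unsigned
mirror_enqueue_burst(struct mirror_ring *r, struct pkt **pkts, unsigned n)
{
    unsigned i, free_slots = MIRROR_RING_SIZE - (r->head - r->tail);

    if (n > free_slots)
        n = free_slots;                /* capture is lossy by design */
    for (i = 0; i < n; i++)
        r->slots[(r->head + i) & (MIRROR_RING_SIZE - 1)] = pkts[i];
    r->head += n;
    return n;
}

/* Example filter: keep only IPv4 frames (a stand-in for a real BPF match). */
static int
match_ipv4(const struct pkt *p)
{
    return p->ethertype == 0x0800;
}

typedef int (*pkt_filter_t)(const struct pkt *);

/* Consumer side: the secondary process dequeues and filters, so the
 * cost of discarding uninteresting packets lands on its own lcores,
 * not on the primary's IO cores. */
static unsigned
capture_drain(struct mirror_ring *r, pkt_filter_t match,
              struct pkt **out, unsigned max_out)
{
    unsigned kept = 0;

    while (r->tail != r->head && kept < max_out) {
        struct pkt *p = r->slots[r->tail & (MIRROR_RING_SIZE - 1)];
        r->tail++;
        if (match(p))
            out[kept++] = p;           /* would go to the pcap writer  */
        /* non-matching packets would be freed back to the mempool here */
    }
    return kept;
}
```

[One ring per NIC queue, as suggested above, would just mean one such pair per queue, with the capture process scaling its consumer lcores independently.]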
> I would prefer having a BPF library available that the application can use at any point, either at the lowest level (when receiving/transmitting Ethernet packets) or at a higher level (e.g. when working with packets that go into or come out of a tunnel). The BPF library should implement packet length and relevant ancillary data, such as SKF_AD_VLAN_TAG etc., based on metadata in the mbuf.
>
> Transferring a BPF filter from an outside application could be done by using a simple text format, e.g. the output format of "tcpdump -ddd". This also opens an easy roadmap for Wireshark integration by simply extending extcap to include such a BPF filter format.
>
> Lots of negativity above. I very much like the idea of attaching the secondary process and going through an rte_ring. This allows the secondary process to pass the filtered and captured packets on in any format it likes, to any destination it likes.

Good, so we're not completely off-base here. :-)

/Bruce

> Med venlig hilsen / kind regards
> - Morten Brørup
>
> -----Original Message-----
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: 16. december 2015 11:45
>
> Hi,
>
> we are currently doing some investigation and prototyping for this feature.
> Our current thinking is the following:
> * to allow dynamic control of the filtering, we are thinking of making use of
>   the multi-process infrastructure in DPDK. A secondary process can attach to a
>   primary at runtime and provide the packet filtering and dumping capability.
> * ideally we want to create a generic packet mirroring callback inside the EAL,
>   that can be set up to mirror packets going through Rx/Tx on an ethdev.
> * using this, packets being received on the port to be monitored are sent via
>   an rte_ring (ring ethdev) to the secondary process, which takes those packets
>   and does any filtering on them.
>   [This would be where BPF could fit into
>   things, but it's not something we have looked at yet.]
> * initially we plan to have the secondary process then write packets to a pcap
>   file using a pcap PMD, but down the road, if we get other PMDs, like a KNI PMD
>   or a TAP device PMD, those could be used as targets instead.
>
> This implementation, we hope, should provide enough hooks to enable the standard tools to be used for monitoring and capturing packets. We will send out draft implementation code for various parts of this as soon as we have it.
>
> Additional feedback welcome, as always. :-)
>
> Regards,
> /Bruce
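[For reference on the "tcpdump -ddd" idea raised above: that format is one line with the instruction count, then one decimal "code jt jf k" quadruple per cBPF instruction. Below is a minimal, self-contained sketch of parsing and running such a program; it deliberately handles only the three opcodes the classic "ip" filter uses, and function names are illustrative. A real library would cover the full instruction set plus mbuf-based ancillary loads such as SKF_AD_VLAN_TAG.]

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct bpf_insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

/* Parse "tcpdump -ddd" output: an instruction count on the first line,
 * then one "code jt jf k" quadruple (all decimal) per line. */
static int
bpf_parse_ddd(const char *text, struct bpf_insn *prog, int max_insns)
{
    int n, i;
    unsigned code, jt, jf, k;

    if (sscanf(text, "%d", &n) != 1 || n <= 0 || n > max_insns)
        return -1;
    for (i = 0; i < n; i++) {
        text = strchr(text, '\n');
        if (text == NULL)
            return -1;
        text++;
        if (sscanf(text, "%u %u %u %u", &code, &jt, &jf, &k) != 4)
            return -1;
        prog[i] = (struct bpf_insn){ (uint16_t)code, (uint8_t)jt,
                                     (uint8_t)jf, k };
    }
    return n;
}

/* Deliberately tiny cBPF interpreter: ldh [k], jeq #k, ret #k only.
 * The opcode values are the standard classic-BPF encodings. */
static uint32_t
bpf_run(const struct bpf_insn *prog, int n,
        const uint8_t *pkt, uint32_t len)
{
    uint32_t acc = 0;
    int pc;

    for (pc = 0; pc < n; pc++) {
        const struct bpf_insn *in = &prog[pc];
        switch (in->code) {
        case 0x28:                      /* BPF_LD|BPF_H|BPF_ABS     */
            if (in->k + 2 > len)
                return 0;               /* out-of-bounds load: drop */
            acc = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
            break;
        case 0x15:                      /* BPF_JMP|BPF_JEQ|BPF_K    */
            pc += (acc == in->k) ? in->jt : in->jf;
            break;
        case 0x06:                      /* BPF_RET|BPF_K            */
            return in->k;               /* bytes to capture, 0=drop */
        default:
            return 0;                   /* unsupported opcode: drop */
        }
    }
    return 0;
}
```

[The secondary process could accept such a text program over any control channel, which is what makes the format attractive for Wireshark/extcap integration.]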