From: "Liang, Cunming"
To: Olivier Matz, "dev@dpdk.org"
Date: Fri, 8 Jul 2016 10:08:05 +0000
References: <1467733310-20875-1-git-send-email-olivier.matz@6wind.com> <1467733310-20875-6-git-send-email-olivier.matz@6wind.com> <577CA8D1.5000203@intel.com> <12989717-cc42-9ce6-f520-0ffbd3db7a8a@6wind.com> <683d73f9-62e0-8169-1222-80f5ea8d865b@6wind.com>
In-Reply-To: <683d73f9-62e0-8169-1222-80f5ea8d865b@6wind.com>
Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type from data
List-Id: patches and discussions about DPDK

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, July 07, 2016 11:49 PM
> To: Liang, Cunming; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
> from data
>
> Hi Cunming,
>
> Thank you for your feedback.
>
> On 07/07/2016 10:19 AM, Liang, Cunming wrote:
> > Hi Olivier,
> >
> >> -----Original Message-----
> >> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> >> Sent: Wednesday, July 06, 2016 3:43 PM
> >> To: Liang, Cunming; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
> >> from data
> >>
> >> Hi Cunming,
> >>
> >> On 07/06/2016 08:44 AM, Liang, Cunming wrote:
> >>> Hi Olivier,
> >>>
> >>> On 7/5/2016 11:41 PM, Olivier Matz wrote:
> >>>> Introduce the function rte_pktmbuf_get_ptype() that parses a
> >>>> mbuf and returns its packet type. For now, the following packet
> >>>> types are parsed:
> >>>>    L2: Ether
> >>>>    L3: IPv4, IPv6
> >>>>    L4: TCP, UDP, SCTP
> >>>>
> >>>> The goal here is to provide a reference implementation for packet type
> >>>> parsing. This function will be used by testpmd in next commits, allowing
> >>>> to compare its result with the value given by the hardware.
> >>>>
> >>>> This function will also be useful when implementing Rx offload support
> >>>> in virtio pmd. Indeed, the virtio protocol gives the csum start and
> >>>> offset, but it does not give the L4 protocol nor it tells if the
> >>>> checksum is relevant for inner or outer. This information has to be
> >>>> known to properly set the ol_flags in mbuf.
> >>>>
> >>>> Signed-off-by: Didier Pallard
> >>>> Signed-off-by: Jean Dao
> >>>> Signed-off-by: Olivier Matz
> >>>> ---
> >>>>  doc/guides/rel_notes/release_16_11.rst |   5 +
> >>>>  lib/librte_mbuf/Makefile               |   5 +-
> >>>>  lib/librte_mbuf/rte_mbuf_ptype.c       | 234 +++++++++++++++++++++++++++++++++
> >>>>  lib/librte_mbuf/rte_mbuf_ptype.h       |  43 ++++++
> >>>>  lib/librte_mbuf/rte_mbuf_version.map   |   1 +
> >>>>  5 files changed, 286 insertions(+), 2 deletions(-)
> >>>>  create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c
> >>>>
> >>>> [...]
> >>>> +
> >>>> +/* parse mbuf data to get packet type */
> >>>> +uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
> >>>> +    struct rte_mbuf_hdr_lens *hdr_lens)
> >>>> +{
> >>>> +    struct rte_mbuf_hdr_lens local_hdr_lens;
> >>>> +    const struct ether_hdr *eh;
> >>>> +    struct ether_hdr eh_copy;
> >>>> +    uint32_t pkt_type = RTE_PTYPE_L2_ETHER;
> >>>> +    uint32_t off = 0;
> >>>> +    uint16_t proto;
> >>>> +
> >>>> +    if (hdr_lens == NULL)
> >>>> +        hdr_lens = &local_hdr_lens;
> >>>> +
> >>>> +    eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy);
> >>>> +    if (unlikely(eh == NULL))
> >>>> +        return 0;
> >>>> +    proto = eh->ether_type;
> >>>> +    off = sizeof(*eh);
> >>>> +    hdr_lens->l2_len = off;
> >>>> +
> >>>> +    if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) {
> >>>> +        const struct ipv4_hdr *ip4h;
> >>>> +        struct ipv4_hdr ip4h_copy;
> >>>> +
> >>>> +        ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy);
> >>>> +        if (unlikely(ip4h == NULL))
> >>>> +            return pkt_type;
> >>>> +
> >>>> +        pkt_type |= ptype_l3_ip(ip4h->version_ihl);
> >>>> +        hdr_lens->l3_len = ip4_hlen(ip4h);
> >>>> +        off += hdr_lens->l3_len;
> >>>> +        if (ip4h->fragment_offset &
> >>>> +                rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK |
> >>>> +                    IPV4_HDR_MF_FLAG)) {
> >>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
> >>>> +            hdr_lens->l4_len = 0;
> >>>> +            return pkt_type;
> >>>> +        }
> >>>> +        proto = ip4h->next_proto_id;
> >>>> +        pkt_type |= ptype_l4(proto);
> >>>> +    } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) {
> >>>> +        const struct ipv6_hdr *ip6h;
> >>>> +        struct ipv6_hdr ip6h_copy;
> >>>> +        int frag = 0;
> >>>> +
> >>>> +        ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy);
> >>>> +        if (unlikely(ip6h == NULL))
> >>>> +            return pkt_type;
> >>>> +
> >>>> +        proto = ip6h->proto;
> >>>> +        hdr_lens->l3_len = sizeof(*ip6h);
> >>>> +        off += hdr_lens->l3_len;
> >>>> +        pkt_type |= ptype_l3_ip6(proto);
> >>>> +        if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT) {
> >>>> +            proto = skip_ip6_ext(proto, m, &off, &frag);
> >>>> +            hdr_lens->l3_len = off - hdr_lens->l2_len;
> >>>> +        }
> >>>> +        if (proto == 0)
> >>>> +            return pkt_type;
> >>>> +        if (frag) {
> >>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
> >>>> +            hdr_lens->l4_len = 0;
> >>>> +            return pkt_type;
> >>>> +        }
> >>>> +        pkt_type |= ptype_l4(proto);
> >>>> +    }
> >>>> +
> >>>> +    if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) {
> >>>> +        hdr_lens->l4_len = sizeof(struct udp_hdr);
> >>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) {
> >>>> +        const struct tcp_hdr *th;
> >>>> +        struct tcp_hdr th_copy;
> >>>> +
> >>>> +        th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy);
> >>>> +        if (unlikely(th == NULL))
> >>>> +            return pkt_type & (RTE_PTYPE_L2_MASK |
> >>>> +                RTE_PTYPE_L3_MASK);
> >>>> +        hdr_lens->l4_len = (th->data_off & 0xf0) >> 2;
> >>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) {
> >>>> +        hdr_lens->l4_len = sizeof(struct sctp_hdr);
> >>>> +    } else {
> >>>> +        hdr_lens->l4_len = 0;
> >>>> +    }
> >>>> +
> >>>> +    return pkt_type;
> >>>> +}
> >>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h b/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> index 4a34678..f468520 100644
> >>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> @@ -545,6 +545,49 @@ extern "C" {
> >>>>          RTE_PTYPE_INNER_L3_MASK | \
> >>>>          RTE_PTYPE_INNER_L4_MASK))
> >>>> +struct rte_mbuf;
> >>>> +
> >>>> +/**
> >>>> + * Structure containing header lengths associated to a packet.
> >>>> + */
> >>>> +struct rte_mbuf_hdr_lens {
> >>>> +    uint8_t l2_len;
> >>>> +    uint8_t l3_len;
> >>>> +    uint8_t l4_len;
> >>>> +    uint8_t tunnel_len;
> >>>> +    uint8_t inner_l2_len;
> >>>> +    uint8_t inner_l3_len;
> >>>> +    uint8_t inner_l4_len;
> >>>> +};
> >>> [LC] The header parsing graph usually is not unique. The definition
> >>> maybe nice for the basic IP and L4 tunnel.
> >>> However it can't scale out to other cases, e.g. qinq, mac-in-mac, mpls
> >>> l2/l3 tunnel.
> >>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of
> >>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser scheme.
> >>> In this case, the fixed function is to support below.
> >>>
> >>> + * Supported packet types are:
> >>> + *   L2: Ether
> >>> + *   L3: IPv4, IPv6
> >>> + *   L4: TCP, UDP, SCTP
> >>>
> >>> Of course, it can add more packet type detection logic in future. But
> >>> the more support, the higher the cost.
> >>>
> >>> One of the alternative way is to allow registering parser pair. APP
> >>> decides to choose the predefined scheme(by DPDK LIB), or to self-define
> >>> the parsing logic.
> >>> In this way, the scheme can do some assumption for the specific case and
> >>> ignore some useless graph detection.
> >>> In addition, besides the SW parser, the HW parser(identified by
> >>> packet_type in mbuf) can be turn on/off by leveraging the same manner.
> >>
> >>
> >> Sorry, I'm not sure I'm fully getting what you are saying. If I
> >> understand well, you would like to have something more flexible that
> >> supports the registration of protocol to be recognized?
> > [LC] Not on that granularity, but on the entire parsing routine.
> > rte_pktmbuf_get_ptype() as the common API, and can present in different
> > behavior.
> > Usually in different scenario, the interested packet set is different.
> > For the specific case, can do some speculation pre-checking on the optimization
> > perspective.
> >
> >>
> >> I'm not sure having a function with a dynamic registration method would
> >> really increase the performance compared to a static complete function.
> > [LC] No, it won't. But the overhead is not much, refer to rx_pkt_burst(is a
> > callback either).
> > If someone only interest for IPv4-NoFrag-TCP stream, the easiest way maybe
> > not layer by layer detection.
> > The straight forward way maybe, 1) load n bytes 2) compare mask 3) update
> > ptype.
> > We require a normal way to do SW detection, current version is perfect.
> > My point is, we can provide a simple mechanism to allow other way, and under
> > the same unified API.
>
> Again, sorry, I'm not perfectly sure I understand what you are saying.
>
> What you describe (mask packet data, then compare with a value) seems
> very similar to what ovs does. Do you mean we should have an API for that?

[LC] No. Sorry for the confusion.
If one function could detect all kinds of packets at low cost, that would be perfect.
But the packets of interest differ from case to case (IPDC, wireless, metro Ethernet, etc.).
Given the possible tradeoff between performance and completeness, allowing a
dedicated parser tuned for a special purpose is an alternative.

>
> I think once we have masked+compared the data, we may know much more
> than just a packet_type.

[LC] Detecting a packet layer by layer is the normal way, but in some cases it isn't necessary.
For example, assume a network that uses VXLAN-GPE.
Detecting the packet layer by layer takes two steps: the UDP port and the VXLAN NP.
In fact, by comparing UDP+VXLAN (16B) as a whole against a mask once, you can tell
whether it's VXLAN with an inner Ethernet frame or not.
It's probably not a perfect example.
A SW parser is not cheap. From case to case, where a scenario is special enough, there is room to optimize.
One possible pseudo code is below.

/* defined further down; declared here so def_parser can refer to it */
uint32_t ipdc_get_ptype(const struct rte_mbuf *m, void *hdr_lens);

struct rte_ptype_parser {
    char name[128];
    uint32_t (*get_ptype)(const struct rte_mbuf *m, void *hdr_lens);
};

struct rte_ptype_parser def_parser = {
    .name = "ipdc",
    .get_ptype = ipdc_get_ptype,
};

uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m, void *hdr_lens)
{
    struct rte_ptype_parser *parser = &def_parser;

    [...]
    parser->get_ptype(m, hdr_lens);
    [...]
}

/* scheme for ipdc */
struct ipdc_hdr_lens {
    uint8_t l2_len;
    uint8_t l3_len;
    uint8_t l4_len;
    uint8_t tunnel_len;
    uint8_t inner_l2_len;
    uint8_t inner_l3_len;
    uint8_t inner_l4_len;
};

uint32_t ipdc_get_ptype(const struct rte_mbuf *m, void *hdr_lens)
{
    struct ipdc_hdr_lens *ihl = (struct ipdc_hdr_lens *)hdr_lens;
    /* parser logic optimized for typical IP datacenter packet */
    [...]
}

/* scheme for l2mpls */
struct l2mpls_hdr_lens {
    uint8_t l2_len;
    uint8_t mpls_len; /* total length for multi-layer */
    uint8_t inner_l2_len;
    uint8_t inner_l3_len;
};

uint32_t l2mpls_get_ptype(const struct rte_mbuf *m, void *hdr_lens)
{
    struct l2mpls_hdr_lens *ihl = (struct l2mpls_hdr_lens *)hdr_lens;
    /* parser logic optimized for typical L2MPLS */
    [...]
}

>
>
>
> >
> >> Actually, we will never support a tons of protocols since each layer
> >> packet type is 4 bits, and since it requires that at least one hw
> >> supports it.
> > [LC] Agree, it is today. But maybe dynamic in future, packet type definition as a
> > template.
> >>
> >> As described in the cover letter, the 2 main goals of this patchset are
> >> to provide a reference implementation for packet type recognition, and
> >> enable the support of virtio offloads (I'll send the patchset soon).
> >> This function is adapted to these 2 usages. Are you thinking of another
> >> use-case that would not be covered?
> > [LC] That's excellent work.
> > Furthermore I believe it can cover all ethdev actually.
> > When HW can't report some demand packet type, then fallback to your SW
> > parser version.
> > If the auto-switch can be transparent, that's perfect. Maybe rx callback and
> > update ptype in mbuf?
>
> I was also thinking about calling rte_pktmbuf_get_ptype() from a driver.
> I think drivers should not access to mbuf data if it's not absolutely
> required.
> Calling rte_pktmbuf_get_ptype() from inside a rx callback seems easily
> feasible, it may be useful for applications that mostly relies on
> packet_type to select an action.
>
>
> Regards,
> Olivier
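
To make the "load n bytes, compare mask, update ptype" idea and the rx-callback fallback discussed in this thread a bit more concrete, here is a minimal, self-contained sketch in plain C. It is only an illustration under stated assumptions: the sk_* functions, the SK_PTYPE_* values and struct sk_pkt are hypothetical names made up for this sketch and are not DPDK APIs; a real version would operate on struct rte_mbuf and the RTE_PTYPE_* flags. The sketch recognizes Ethernet + IPv4 (20-byte header, not fragmented) + TCP with a single masked compare over the first 24 bytes, and only runs that check for packets whose type was left at 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for this sketch only (not the RTE_PTYPE_* flags). */
#define SK_PTYPE_UNKNOWN      0x0u
#define SK_PTYPE_ETH_IPV4_TCP 0x1u

/* Ethernet header (14 bytes) + IPv4 header up to the protocol byte (10). */
#define SK_HDR_BYTES 24

/* Bits that must match; everything else is "don't care". */
static const uint8_t sk_mask[SK_HDR_BYTES] = {
    [12] = 0xff, [13] = 0xff, /* ether_type */
    [14] = 0xff,              /* IPv4 version + IHL */
    [20] = 0x3f, [21] = 0xff, /* MF flag + fragment offset (DF ignored) */
    [23] = 0xff,              /* IPv4 protocol */
};

/* Expected values under the mask. */
static const uint8_t sk_value[SK_HDR_BYTES] = {
    [12] = 0x08, [13] = 0x00, /* IPv4 ether_type */
    [14] = 0x45,              /* version 4, 20-byte header */
    [23] = 0x06,              /* TCP */
};

/* Hypothetical stand-in for an mbuf: contiguous data plus a ptype field. */
struct sk_pkt {
    const uint8_t *data;
    size_t len;
    uint32_t packet_type; /* 0 means "hardware reported nothing" */
};

/* 1) load n bytes, 2) compare under the mask, 3) return the ptype. */
uint32_t sk_get_ptype(const uint8_t *data, size_t len)
{
    uint8_t diff = 0;
    size_t i;

    assert(data != NULL);
    if (len < SK_HDR_BYTES)
        return SK_PTYPE_UNKNOWN;
    for (i = 0; i < SK_HDR_BYTES; i++)
        diff |= (uint8_t)((data[i] ^ sk_value[i]) & sk_mask[i]);
    return diff == 0 ? SK_PTYPE_ETH_IPV4_TCP : SK_PTYPE_UNKNOWN;
}

/*
 * Callback-style fallback: parse in SW only when the ptype is unset.
 * Returns the number of packets that went through the SW check.
 */
size_t sk_fill_ptype(struct sk_pkt *pkts, size_t n)
{
    size_t i, parsed = 0;

    for (i = 0; i < n; i++) {
        if (pkts[i].packet_type != 0)
            continue; /* keep what the hardware reported */
        pkts[i].packet_type = sk_get_ptype(pkts[i].data, pkts[i].len);
        parsed++;
    }
    return parsed;
}
```

A dedicated check like this trades completeness for cost: anything outside the assumed shape (IPv4 options, a VLAN tag, IPv6, fragments) simply comes back as unknown and would still need the layer-by-layer parser.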