From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 4DCE96CC4 for ; Wed, 6 Jul 2016 14:21:52 +0200 (CEST) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga101.jf.intel.com with ESMTP; 06 Jul 2016 05:21:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.28,318,1464678000"; d="scan'208";a="989995596" Received: from irsmsx153.ger.corp.intel.com ([163.33.192.75]) by orsmga001.jf.intel.com with ESMTP; 06 Jul 2016 05:21:50 -0700 Received: from irsmsx106.ger.corp.intel.com ([169.254.8.145]) by IRSMSX153.ger.corp.intel.com ([169.254.9.105]) with mapi id 14.03.0248.002; Wed, 6 Jul 2016 13:21:49 +0100 From: "Chilikin, Andrey" To: Olivier MATZ , "Liang, Cunming" , "dev@dpdk.org" CC: "Ananyev, Konstantin" , "Zhang, Helin" Thread-Topic: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type from data Thread-Index: AQHR11HwCOOTbt1pqUinZrKb29oQQKAK9IeAgAA1OlCAABUkAIAAEt8Q Date: Wed, 6 Jul 2016 12:21:48 +0000 Message-ID: References: <1467733310-20875-1-git-send-email-olivier.matz@6wind.com> <1467733310-20875-6-git-send-email-olivier.matz@6wind.com> <577CA8D1.5000203@intel.com> <12989717-cc42-9ce6-f520-0ffbd3db7a8a@6wind.com> In-Reply-To: Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [163.33.239.181] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type from data X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Jul 2016 12:21:53 -0000 Hi Oliver, > -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz@6wind.com] > Sent: Wednesday, July 6, 2016 1:09 PM > To: Chilikin, Andrey ; Liang, Cunming > ; dev@dpdk.org > Cc: Ananyev, Konstantin > Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet ty= pe > from data >=20 > Hi Andrey, >=20 > On 07/06/2016 01:59 PM, Chilikin, Andrey wrote: > > Hi Oliver, > > > >> -----Original Message----- > >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ > >> Sent: Wednesday, July 6, 2016 8:43 AM > >> To: Liang, Cunming ; dev@dpdk.org > >> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get > >> packet type from data > >> > >> Hi Cunming, > >> > >> On 07/06/2016 08:44 AM, Liang, Cunming wrote: > >>> Hi Olivier, > >>> > >>> On 7/5/2016 11:41 PM, Olivier Matz wrote: > >>>> Introduce the function rte_pktmbuf_get_ptype() that parses a mbuf > >>>> and returns its packet type. For now, the following packet types > >>>> are > >>>> parsed: > >>>> L2: Ether > >>>> L3: IPv4, IPv6 > >>>> L4: TCP, UDP, SCTP > >>>> > >>>> The goal here is to provide a reference implementation for packet > >>>> type parsing. This function will be used by testpmd in next > >>>> commits, allowing to compare its result with the value given by the > hardware. > >>>> > >>>> This function will also be useful when implementing Rx offload > >>>> support in virtio pmd. Indeed, the virtio protocol gives the csum > >>>> start and offset, but it does not give the L4 protocol nor it tells > >>>> if the checksum is relevant for inner or outer. This information > >>>> has to be known to properly set the ol_flags in mbuf. > >>>> > >>>> Signed-off-by: Didier Pallard > >>>> Signed-off-by: Jean Dao > >>>> Signed-off-by: Olivier Matz > >>>> --- > >>>> doc/guides/rel_notes/release_16_11.rst | 5 + > >>>> lib/librte_mbuf/Makefile | 5 +- > >>>> lib/librte_mbuf/rte_mbuf_ptype.c | 234 > >>>> +++++++++++++++++++++++++++++++++ > >>>> lib/librte_mbuf/rte_mbuf_ptype.h | 43 ++++++ > >>>> lib/librte_mbuf/rte_mbuf_version.map | 1 + > >>>> 5 files changed, 286 insertions(+), 2 deletions(-) > >>>> create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c > >>>> > >>>> [...] > >>>> + > >>>> +/* parse mbuf data to get packet type */ uint32_t > >>>> +rte_pktmbuf_get_ptype(const struct rte_mbuf *m, > >>>> + struct rte_mbuf_hdr_lens *hdr_lens) { > >>>> + struct rte_mbuf_hdr_lens local_hdr_lens; > >>>> + const struct ether_hdr *eh; > >>>> + struct ether_hdr eh_copy; > >>>> + uint32_t pkt_type =3D RTE_PTYPE_L2_ETHER; > >>>> + uint32_t off =3D 0; > >>>> + uint16_t proto; > >>>> + > >>>> + if (hdr_lens =3D=3D NULL) > >>>> + hdr_lens =3D &local_hdr_lens; > >>>> + > >>>> + eh =3D rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy); > >>>> + if (unlikely(eh =3D=3D NULL)) > >>>> + return 0; > >>>> + proto =3D eh->ether_type; > >>>> + off =3D sizeof(*eh); > >>>> + hdr_lens->l2_len =3D off; > >>>> + > >>>> + if (proto =3D=3D rte_cpu_to_be_16(ETHER_TYPE_IPv4)) { > >>>> + const struct ipv4_hdr *ip4h; > >>>> + struct ipv4_hdr ip4h_copy; > >>>> + > >>>> + ip4h =3D rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy= ); > >>>> + if (unlikely(ip4h =3D=3D NULL)) > >>>> + return pkt_type; > >>>> + > >>>> + pkt_type |=3D ptype_l3_ip(ip4h->version_ihl); > >>>> + hdr_lens->l3_len =3D ip4_hlen(ip4h); > >>>> + off +=3D hdr_lens->l3_len; > >>>> + if (ip4h->fragment_offset & > >>>> + rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK | > >>>> + IPV4_HDR_MF_FLAG)) { > >>>> + pkt_type |=3D RTE_PTYPE_L4_FRAG; > >>>> + hdr_lens->l4_len =3D 0; > >>>> + return pkt_type; > >>>> + } > >>>> + proto =3D ip4h->next_proto_id; > >>>> + pkt_type |=3D ptype_l4(proto); > >>>> + } else if (proto =3D=3D rte_cpu_to_be_16(ETHER_TYPE_IPv6)) { > >>>> + const struct ipv6_hdr *ip6h; > >>>> + struct ipv6_hdr ip6h_copy; > >>>> + int frag =3D 0; > >>>> + > >>>> + ip6h =3D rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy= ); > >>>> + if (unlikely(ip6h =3D=3D NULL)) > >>>> + return pkt_type; > >>>> + > >>>> + proto =3D ip6h->proto; > >>>> + hdr_lens->l3_len =3D sizeof(*ip6h); > >>>> + off +=3D hdr_lens->l3_len; > >>>> + pkt_type |=3D ptype_l3_ip6(proto); > >>>> + if ((pkt_type & RTE_PTYPE_L3_MASK) =3D=3D RTE_PTYPE_L3_IPV6= _EXT) > { > >>>> + proto =3D skip_ip6_ext(proto, m, &off, &frag); > >>>> + hdr_lens->l3_len =3D off - hdr_lens->l2_len; > >>>> + } > >>>> + if (proto =3D=3D 0) > >>>> + return pkt_type; > >>>> + if (frag) { > >>>> + pkt_type |=3D RTE_PTYPE_L4_FRAG; > >>>> + hdr_lens->l4_len =3D 0; > >>>> + return pkt_type; > >>>> + } > >>>> + pkt_type |=3D ptype_l4(proto); > >>>> + } > >>>> + > >>>> + if ((pkt_type & RTE_PTYPE_L4_MASK) =3D=3D RTE_PTYPE_L4_UDP) { > >>>> + hdr_lens->l4_len =3D sizeof(struct udp_hdr); > >>>> + } else if ((pkt_type & RTE_PTYPE_L4_MASK) =3D=3D RTE_PTYPE_L4_T= CP) { > >>>> + const struct tcp_hdr *th; > >>>> + struct tcp_hdr th_copy; > >>>> + > >>>> + th =3D rte_pktmbuf_read(m, off, sizeof(*th), &th_copy); > >>>> + if (unlikely(th =3D=3D NULL)) > >>>> + return pkt_type & (RTE_PTYPE_L2_MASK | > >>>> + RTE_PTYPE_L3_MASK); > >>>> + hdr_lens->l4_len =3D (th->data_off & 0xf0) >> 2; > >>>> + } else if ((pkt_type & RTE_PTYPE_L4_MASK) =3D=3D RTE_PTYPE_L4_S= CTP) { > >>>> + hdr_lens->l4_len =3D sizeof(struct sctp_hdr); > >>>> + } else { > >>>> + hdr_lens->l4_len =3D 0; > >>>> + } > >>>> + > >>>> + return pkt_type; > >>>> +} > >>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h > >>>> b/lib/librte_mbuf/rte_mbuf_ptype.h > >>>> index 4a34678..f468520 100644 > >>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h > >>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h > >>>> @@ -545,6 +545,49 @@ extern "C" { > >>>> RTE_PTYPE_INNER_L3_MASK | \ > >>>> RTE_PTYPE_INNER_L4_MASK)) > >>>> +struct rte_mbuf; > >>>> + > >>>> +/** > >>>> + * Structure containing header lengths associated to a packet. > >>>> + */ > >>>> +struct rte_mbuf_hdr_lens { > >>>> + uint8_t l2_len; > >>>> + uint8_t l3_len; > >>>> + uint8_t l4_len; > >>>> + uint8_t tunnel_len; > >>>> + uint8_t inner_l2_len; > >>>> + uint8_t inner_l3_len; > >>>> + uint8_t inner_l4_len; > >>>> +}; > >>> [LC] The header parsing graph usually is not unique. The definition > >>> maybe nice for the basic IP and L4 tunnel. > >>> However it can't scale out to other cases, e.g. qinq, mac-in-mac, > >>> mpls > >>> l2/l3 tunnel. > >>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of > >>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser sch= eme. > >>> In this case, the fixed function is to support below. > >>> > >>> + * Supported packet types are: > >>> + * L2: Ether > >>> + * L3: IPv4, IPv6 > >>> + * L4: TCP, UDP, SCTP > >>> > >>> Of course, it can add more packet type detection logic in future. > >>> But the more support, the higher the cost. > >>> > >>> One of the alternative way is to allow registering parser pair. APP > >>> decides to choose the predefined scheme(by DPDK LIB), or to > >>> self-define the parsing logic. > >>> In this way, the scheme can do some assumption for the specific case > >>> and ignore some useless graph detection. > >>> In addition, besides the SW parser, the HW parser(identified by > >>> packet_type in mbuf) can be turn on/off by leveraging the same manner= . > >> > >> Sorry, I'm not sure I'm fully getting what you are saying. If I > >> understand well, you would like to have something more flexible that > >> supports the registration of protocol to be recognized? > >> > >> I'm not sure having a function with a dynamic registration method > >> would really increase the performance compared to a static complete > function. > >> Actually, we will never support a tons of protocols since each layer > >> packet type is 4 bits, and since it requires that at least one hw supp= orts it. > > > > This patch will be very useful as a reference implementation, but it al= so > highlights an issue with the current implementation of packet types repor= ting > by HW and SW - as you just mentioned there are only 4 bits per each layer= . As > these 4 bit are used as a enumeration it is impossible to reports multipl= e > headers located on the same layer. MPLS is one example, different packets > could have different numbers of MPLS labels, but it is impossible to repo= rt > using current packet_type structure. > > > > It is possible, however, to program HW to report user (application) sp= ecific > packet types. For example, for IPoMPLS with one MPLS label, HW will repor= t > packet type A, but for IPoMPLS with two MPLS labels HW will reports packe= t > type B. In this case, instead of defining and supporting tons of statical= ly defined > (or enumerated) protocol headers combinations, application will register > packet types it expects from HW in addition to standard packet types. At = the > moment we have high bits of packet_type reserved, so one possible soluti= on > would be to use the highest bit to indicate that this is user defined pac= ket_type, > specific to the application. Then it could be used with HW and with SW pa= rser. > For example, packet_type 0x8000000A is IPoMPLS with one MPLS label, > 0x8000000B is IPoMPLS with two MPLS labels and so on. >=20 > Thank you for the explanation. From your description, I wonder if the flo= w > director API recently [1] proposed by Adrien wouldn't solve this issue? >=20 > [1] http://dpdk.org/ml/archives/dev/2016-July/043365.html I'm reviewing Adrien's proposal (it is a big document :)) and reply with my= comment after reviewing. Regards, Andrey =20 > Regards, > Olivier