From: Olivier MATZ
To: "Chilikin, Andrey", "Liang, Cunming", "dev@dpdk.org"
Cc: "Ananyev, Konstantin"
Date: Wed, 6 Jul 2016 14:08:48 +0200
Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type from data
References: <1467733310-20875-1-git-send-email-olivier.matz@6wind.com>
 <1467733310-20875-6-git-send-email-olivier.matz@6wind.com>
 <577CA8D1.5000203@intel.com>
 <12989717-cc42-9ce6-f520-0ffbd3db7a8a@6wind.com>
List-Id: patches and discussions about DPDK

Hi Andrey,

On 07/06/2016 01:59 PM, Chilikin, Andrey wrote:
> Hi Oliver,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier MATZ
>> Sent: Wednesday, July 6, 2016 8:43 AM
>> To: Liang, Cunming; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
>> from data
>>
>> Hi Cunming,
>>
>> On 07/06/2016 08:44 AM, Liang, Cunming wrote:
>>> Hi Olivier,
>>>
>>> On 7/5/2016 11:41 PM, Olivier Matz wrote:
>>>> Introduce the function rte_pktmbuf_get_ptype() that parses a mbuf and
>>>> returns its packet type. For now, the following packet types are
>>>> parsed:
>>>>   L2: Ether
>>>>   L3: IPv4, IPv6
>>>>   L4: TCP, UDP, SCTP
>>>>
>>>> The goal here is to provide a reference implementation for packet
>>>> type parsing. This function will be used by testpmd in next commits,
>>>> allowing to compare its result with the value given by the hardware.
>>>>
>>>> This function will also be useful when implementing Rx offload
>>>> support in virtio pmd. Indeed, the virtio protocol gives the csum
>>>> start and offset, but it does not give the L4 protocol nor it tells
>>>> if the checksum is relevant for inner or outer. This information has
>>>> to be known to properly set the ol_flags in mbuf.
>>>>
>>>> Signed-off-by: Didier Pallard
>>>> Signed-off-by: Jean Dao
>>>> Signed-off-by: Olivier Matz
>>>> ---
>>>>  doc/guides/rel_notes/release_16_11.rst |   5 +
>>>>  lib/librte_mbuf/Makefile               |   5 +-
>>>>  lib/librte_mbuf/rte_mbuf_ptype.c       | 234 +++++++++++++++++++++++++++++++++
>>>>  lib/librte_mbuf/rte_mbuf_ptype.h       |  43 ++++++
>>>>  lib/librte_mbuf/rte_mbuf_version.map   |   1 +
>>>>  5 files changed, 286 insertions(+), 2 deletions(-)
>>>>  create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c
>>>>
>>>> [...]
>>>> +
>>>> +/* parse mbuf data to get packet type */
>>>> +uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
>>>> +	struct rte_mbuf_hdr_lens *hdr_lens)
>>>> +{
>>>> +	struct rte_mbuf_hdr_lens local_hdr_lens;
>>>> +	const struct ether_hdr *eh;
>>>> +	struct ether_hdr eh_copy;
>>>> +	uint32_t pkt_type = RTE_PTYPE_L2_ETHER;
>>>> +	uint32_t off = 0;
>>>> +	uint16_t proto;
>>>> +
>>>> +	if (hdr_lens == NULL)
>>>> +		hdr_lens = &local_hdr_lens;
>>>> +
>>>> +	eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy);
>>>> +	if (unlikely(eh == NULL))
>>>> +		return 0;
>>>> +	proto = eh->ether_type;
>>>> +	off = sizeof(*eh);
>>>> +	hdr_lens->l2_len = off;
>>>> +
>>>> +	if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) {
>>>> +		const struct ipv4_hdr *ip4h;
>>>> +		struct ipv4_hdr ip4h_copy;
>>>> +
>>>> +		ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy);
>>>> +		if (unlikely(ip4h == NULL))
>>>> +			return pkt_type;
>>>> +
>>>> +		pkt_type |= ptype_l3_ip(ip4h->version_ihl);
>>>> +		hdr_lens->l3_len = ip4_hlen(ip4h);
>>>> +		off += hdr_lens->l3_len;
>>>> +		if (ip4h->fragment_offset &
>>>> +				rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK |
>>>> +					IPV4_HDR_MF_FLAG)) {
>>>> +			pkt_type |= RTE_PTYPE_L4_FRAG;
>>>> +			hdr_lens->l4_len = 0;
>>>> +			return pkt_type;
>>>> +		}
>>>> +		proto = ip4h->next_proto_id;
>>>> +		pkt_type |= ptype_l4(proto);
>>>> +	} else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) {
>>>> +		const struct ipv6_hdr *ip6h;
>>>> +		struct ipv6_hdr ip6h_copy;
>>>> +		int frag = 0;
>>>> +
>>>> +		ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy);
>>>> +		if (unlikely(ip6h == NULL))
>>>> +			return pkt_type;
>>>> +
>>>> +		proto = ip6h->proto;
>>>> +		hdr_lens->l3_len = sizeof(*ip6h);
>>>> +		off += hdr_lens->l3_len;
>>>> +		pkt_type |= ptype_l3_ip6(proto);
>>>> +		if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT) {
>>>> +			proto = skip_ip6_ext(proto, m, &off, &frag);
>>>> +			hdr_lens->l3_len = off - hdr_lens->l2_len;
>>>> +		}
>>>> +		if (proto == 0)
>>>> +			return pkt_type;
>>>> +		if (frag) {
>>>> +			pkt_type |= RTE_PTYPE_L4_FRAG;
>>>> +			hdr_lens->l4_len = 0;
>>>> +			return pkt_type;
>>>> +		}
>>>> +		pkt_type |= ptype_l4(proto);
>>>> +	}
>>>> +
>>>> +	if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) {
>>>> +		hdr_lens->l4_len = sizeof(struct udp_hdr);
>>>> +	} else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) {
>>>> +		const struct tcp_hdr *th;
>>>> +		struct tcp_hdr th_copy;
>>>> +
>>>> +		th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy);
>>>> +		if (unlikely(th == NULL))
>>>> +			return pkt_type & (RTE_PTYPE_L2_MASK |
>>>> +				RTE_PTYPE_L3_MASK);
>>>> +		hdr_lens->l4_len = (th->data_off & 0xf0) >> 2;
>>>> +	} else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) {
>>>> +		hdr_lens->l4_len = sizeof(struct sctp_hdr);
>>>> +	} else {
>>>> +		hdr_lens->l4_len = 0;
>>>> +	}
>>>> +
>>>> +	return pkt_type;
>>>> +}
>>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h b/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> index 4a34678..f468520 100644
>>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> @@ -545,6 +545,49 @@ extern "C" {
>>>>  		RTE_PTYPE_INNER_L3_MASK | \
>>>>  		RTE_PTYPE_INNER_L4_MASK))
>>>> +struct rte_mbuf;
>>>> +
>>>> +/**
>>>> + * Structure containing header lengths associated to a packet.
>>>> + */
>>>> +struct rte_mbuf_hdr_lens {
>>>> +	uint8_t l2_len;
>>>> +	uint8_t l3_len;
>>>> +	uint8_t l4_len;
>>>> +	uint8_t tunnel_len;
>>>> +	uint8_t inner_l2_len;
>>>> +	uint8_t inner_l3_len;
>>>> +	uint8_t inner_l4_len;
>>>> +};
>>> [LC] The header parsing graph usually is not unique. The definition
>>> maybe nice for the basic IP and L4 tunnel.
>>> However it can't scale out to other cases, e.g. qinq, mac-in-mac, mpls
>>> l2/l3 tunnel.
>>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of
>>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser scheme.
>>> In this case, the fixed function is to support below.
>>>
>>> + * Supported packet types are:
>>> + *  L2: Ether
>>> + *  L3: IPv4, IPv6
>>> + *  L4: TCP, UDP, SCTP
>>>
>>> Of course, it can add more packet type detection logic in future. But
>>> the more support, the higher the cost.
>>>
>>> One of the alternative way is to allow registering parser pair. APP
>>> decides to choose the predefined scheme(by DPDK LIB), or to
>>> self-define the parsing logic.
>>> In this way, the scheme can do some assumption for the specific case
>>> and ignore some useless graph detection.
>>> In addition, besides the SW parser, the HW parser(identified by
>>> packet_type in mbuf) can be turn on/off by leveraging the same manner.
>>
>> Sorry, I'm not sure I'm fully getting what you are saying. If I understand well,
>> you would like to have something more flexible that supports the registration of
>> protocol to be recognized?
>>
>> I'm not sure having a function with a dynamic registration method would really
>> increase the performance compared to a static complete function.
>> Actually, we will never support a tons of protocols since each layer packet type
>> is 4 bits, and since it requires that at least one hw supports it.
>
> This patch will be very useful as a reference implementation, but it also
> highlights an issue with the current implementation of packet types
> reporting by HW and SW - as you just mentioned there are only 4 bits per
> each layer. As these 4 bit are used as a enumeration it is impossible to
> reports multiple headers located on the same layer. MPLS is one example,
> different packets could have different numbers of MPLS labels, but it is
> impossible to report using current packet_type structure.
>
> It is possible, however, to program HW to report user (application)
> specific packet types. For example, for IPoMPLS with one MPLS label, HW
> will report packet type A, but for IPoMPLS with two MPLS labels HW will
> reports packet type B. In this case, instead of defining and supporting
> tons of statically defined (or enumerated) protocol headers combinations,
> application will register packet types it expects from HW in addition to
> standard packet types. At the moment we have high bits of packet_type
> reserved, so one possible solution would be to use the highest bit to
> indicate that this is user defined packet_type, specific to the
> application. Then it could be used with HW and with SW parser. For
> example, packet_type 0x8000000A is IPoMPLS with one MPLS label, 0x8000000B
> is IPoMPLS with two MPLS labels and so on.

Thank you for the explanation. From your description, I wonder whether the
flow director API recently proposed by Adrien [1] wouldn't solve this issue?

[1] http://dpdk.org/ml/archives/dev/2016-July/043365.html

Regards,
Olivier
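
For illustration only, the user-defined packet_type encoding suggested by
Andrey in the quoted text could be sketched as below. This is a minimal
sketch, not existing DPDK code: USER_PTYPE_FLAG, USER_PTYPE_IP_MPLS_1,
USER_PTYPE_IP_MPLS_2 and the user_ptype_*() helpers are hypothetical names,
and the only things taken from the discussion are the use of the highest bit
of the 32-bit packet_type and the example values 0x8000000A and 0x8000000B.

```c
#include <stdint.h>

/* Hypothetical flag: highest bit of the 32-bit packet_type marks an
 * application-defined type. Not an existing DPDK definition. */
#define USER_PTYPE_FLAG UINT32_C(0x80000000)

/* Hypothetical application-specific types, using the example values
 * from the discussion. */
#define USER_PTYPE_IP_MPLS_1 (USER_PTYPE_FLAG | UINT32_C(0xA))
#define USER_PTYPE_IP_MPLS_2 (USER_PTYPE_FLAG | UINT32_C(0xB))

/* Return non-zero if ptype carries the user-defined flag. */
static inline int
user_ptype_is_user(uint32_t ptype)
{
	return (ptype & USER_PTYPE_FLAG) != 0;
}

/* Return the application-specific identifier (flag bit cleared). */
static inline uint32_t
user_ptype_id(uint32_t ptype)
{
	return ptype & ~USER_PTYPE_FLAG;
}

/* Map a packet_type to a description; values without the flag are
 * left to the standard per-layer decoding. */
static inline const char *
user_ptype_name(uint32_t ptype)
{
	if (!user_ptype_is_user(ptype))
		return "standard packet type";

	switch (ptype) {
	case USER_PTYPE_IP_MPLS_1:
		return "IPoMPLS, one MPLS label";
	case USER_PTYPE_IP_MPLS_2:
		return "IPoMPLS, two MPLS labels";
	default:
		return "unknown user-defined type";
	}
}
```

With such a scheme, a value with the top bit clear would still be decoded
through the existing per-layer masks, while a value with the top bit set
would be looked up in whatever table the application registered, whether the
value came from the HW or from a SW parser.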