From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.droids-corp.org (zoll.droids-corp.org [94.23.50.67]) by dpdk.org (Postfix) with ESMTP id 281BB2C22 for ; Thu, 7 Jul 2016 17:48:51 +0200 (CEST) Received: from lfbn-1-8274-170.w81-254.abo.wanadoo.fr ([81.254.171.170] helo=[192.168.1.13]) by mail.droids-corp.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.84_2) (envelope-from ) id 1bLBaA-00077x-2V; Thu, 07 Jul 2016 17:51:19 +0200 To: "Liang, Cunming" , "dev@dpdk.org" References: <1467733310-20875-1-git-send-email-olivier.matz@6wind.com> <1467733310-20875-6-git-send-email-olivier.matz@6wind.com> <577CA8D1.5000203@intel.com> <12989717-cc42-9ce6-f520-0ffbd3db7a8a@6wind.com> From: Olivier Matz Message-ID: <683d73f9-62e0-8169-1222-80f5ea8d865b@6wind.com> Date: Thu, 7 Jul 2016 17:48:41 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.1.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type from data X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Jul 2016 15:48:51 -0000 Hi Cunming, Thank you for your feedback. On 07/07/2016 10:19 AM, Liang, Cunming wrote: > Hi Olivier, > >> -----Original Message----- >> From: Olivier MATZ [mailto:olivier.matz@6wind.com] >> Sent: Wednesday, July 06, 2016 3:43 PM >> To: Liang, Cunming ; dev@dpdk.org >> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type >> from data >> >> Hi Cunming, >> >> On 07/06/2016 08:44 AM, Liang, Cunming wrote: >>> Hi Olivier, >>> >>> On 7/5/2016 11:41 PM, Olivier Matz wrote: >>>> Introduce the function rte_pktmbuf_get_ptype() that parses a >>>> mbuf and returns its packet type. For now, the following packet >>>> types are parsed: >>>> L2: Ether >>>> L3: IPv4, IPv6 >>>> L4: TCP, UDP, SCTP >>>> >>>> The goal here is to provide a reference implementation for packet type >>>> parsing. This function will be used by testpmd in next commits, allowing >>>> to compare its result with the value given by the hardware. >>>> >>>> This function will also be useful when implementing Rx offload support >>>> in virtio pmd. Indeed, the virtio protocol gives the csum start and >>>> offset, but it does not give the L4 protocol nor it tells if the >>>> checksum is relevant for inner or outer. This information has to be >>>> known to properly set the ol_flags in mbuf. >>>> >>>> Signed-off-by: Didier Pallard >>>> Signed-off-by: Jean Dao >>>> Signed-off-by: Olivier Matz >>>> --- >>>> doc/guides/rel_notes/release_16_11.rst | 5 + >>>> lib/librte_mbuf/Makefile | 5 +- >>>> lib/librte_mbuf/rte_mbuf_ptype.c | 234 >>>> +++++++++++++++++++++++++++++++++ >>>> lib/librte_mbuf/rte_mbuf_ptype.h | 43 ++++++ >>>> lib/librte_mbuf/rte_mbuf_version.map | 1 + >>>> 5 files changed, 286 insertions(+), 2 deletions(-) >>>> create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c >>>> >>>> [...] >>>> + >>>> +/* parse mbuf data to get packet type */ >>>> +uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m, >>>> + struct rte_mbuf_hdr_lens *hdr_lens) >>>> +{ >>>> + struct rte_mbuf_hdr_lens local_hdr_lens; >>>> + const struct ether_hdr *eh; >>>> + struct ether_hdr eh_copy; >>>> + uint32_t pkt_type = RTE_PTYPE_L2_ETHER; >>>> + uint32_t off = 0; >>>> + uint16_t proto; >>>> + >>>> + if (hdr_lens == NULL) >>>> + hdr_lens = &local_hdr_lens; >>>> + >>>> + eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy); >>>> + if (unlikely(eh == NULL)) >>>> + return 0; >>>> + proto = eh->ether_type; >>>> + off = sizeof(*eh); >>>> + hdr_lens->l2_len = off; >>>> + >>>> + if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) { >>>> + const struct ipv4_hdr *ip4h; >>>> + struct ipv4_hdr ip4h_copy; >>>> + >>>> + ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy); >>>> + if (unlikely(ip4h == NULL)) >>>> + return pkt_type; >>>> + >>>> + pkt_type |= ptype_l3_ip(ip4h->version_ihl); >>>> + hdr_lens->l3_len = ip4_hlen(ip4h); >>>> + off += hdr_lens->l3_len; >>>> + if (ip4h->fragment_offset & >>>> + rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK | >>>> + IPV4_HDR_MF_FLAG)) { >>>> + pkt_type |= RTE_PTYPE_L4_FRAG; >>>> + hdr_lens->l4_len = 0; >>>> + return pkt_type; >>>> + } >>>> + proto = ip4h->next_proto_id; >>>> + pkt_type |= ptype_l4(proto); >>>> + } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) { >>>> + const struct ipv6_hdr *ip6h; >>>> + struct ipv6_hdr ip6h_copy; >>>> + int frag = 0; >>>> + >>>> + ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy); >>>> + if (unlikely(ip6h == NULL)) >>>> + return pkt_type; >>>> + >>>> + proto = ip6h->proto; >>>> + hdr_lens->l3_len = sizeof(*ip6h); >>>> + off += hdr_lens->l3_len; >>>> + pkt_type |= ptype_l3_ip6(proto); >>>> + if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT) { >>>> + proto = skip_ip6_ext(proto, m, &off, &frag); >>>> + hdr_lens->l3_len = off - hdr_lens->l2_len; >>>> + } >>>> + if (proto == 0) >>>> + return pkt_type; >>>> + if (frag) { >>>> + pkt_type |= RTE_PTYPE_L4_FRAG; >>>> + hdr_lens->l4_len = 0; >>>> + return pkt_type; >>>> + } >>>> + pkt_type |= ptype_l4(proto); >>>> + } >>>> + >>>> + if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) { >>>> + hdr_lens->l4_len = sizeof(struct udp_hdr); >>>> + } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) { >>>> + const struct tcp_hdr *th; >>>> + struct tcp_hdr th_copy; >>>> + >>>> + th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy); >>>> + if (unlikely(th == NULL)) >>>> + return pkt_type & (RTE_PTYPE_L2_MASK | >>>> + RTE_PTYPE_L3_MASK); >>>> + hdr_lens->l4_len = (th->data_off & 0xf0) >> 2; >>>> + } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) { >>>> + hdr_lens->l4_len = sizeof(struct sctp_hdr); >>>> + } else { >>>> + hdr_lens->l4_len = 0; >>>> + } >>>> + >>>> + return pkt_type; >>>> +} >>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h >>>> b/lib/librte_mbuf/rte_mbuf_ptype.h >>>> index 4a34678..f468520 100644 >>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h >>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h >>>> @@ -545,6 +545,49 @@ extern "C" { >>>> RTE_PTYPE_INNER_L3_MASK | \ >>>> RTE_PTYPE_INNER_L4_MASK)) >>>> +struct rte_mbuf; >>>> + >>>> +/** >>>> + * Structure containing header lengths associated to a packet. >>>> + */ >>>> +struct rte_mbuf_hdr_lens { >>>> + uint8_t l2_len; >>>> + uint8_t l3_len; >>>> + uint8_t l4_len; >>>> + uint8_t tunnel_len; >>>> + uint8_t inner_l2_len; >>>> + uint8_t inner_l3_len; >>>> + uint8_t inner_l4_len; >>>> +}; >>> [LC] The header parsing graph usually is not unique. The definition >>> maybe nice for the basic IP and L4 tunnel. >>> However it can't scale out to other cases, e.g. qinq, mac-in-mac, mpls >>> l2/l3 tunnel. >>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of >>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser scheme. >>> In this case, the fixed function is to support below. >>> >>> + * Supported packet types are: >>> + * L2: Ether >>> + * L3: IPv4, IPv6 >>> + * L4: TCP, UDP, SCTP >>> >>> Of course, it can add more packet type detection logic in future. But >>> the more support, the higher the cost. >>> >>> One of the alternative way is to allow registering parser pair. APP >>> decides to choose the predefined scheme(by DPDK LIB), or to self-define >>> the parsing logic. >>> In this way, the scheme can do some assumption for the specific case and >>> ignore some useless graph detection. >>> In addition, besides the SW parser, the HW parser(identified by >>> packet_type in mbuf) can be turn on/off by leveraging the same manner. >> >> >> Sorry, I'm not sure I'm fully getting what you are saying. If I >> understand well, you would like to have something more flexible that >> supports the registration of protocol to be recognized? > [LC] Not on that granularity, but on the entire parsing routine. > rte_pktmbuf_get_ptype() as the common API, and can present in different behavior. > Usually in different scenario, the interested packet set is different. > For the specific case, can do some speculation pre-checking on the optimization perspective. > >> >> I'm not sure having a function with a dynamic registration method would >> really increase the performance compared to a static complete function. > [LC] No, it won't. But the overhead is not much, refer to rx_pkt_burst(is a callback either). > If someone only interest for IPv4-NoFrag-TCP stream, the easiest way maybe not layer by layer detection. > The straight forward way maybe, 1) load n bytes 2) compare mask 3) update ptype. > We require a normal way to do SW detection, current version is perfect. > My point is, we can provide a simple mechanism to allow other way, and under the same unified API. Again, sorry, I'm not perfectly sure I understand what you are saying. What you describe (mask packet data, then compare with a value) seems very similar to what ovs does. Do you mean we should have an API for that? I think once we have masked+compared the data, we may know much more than just a packet_type. > >> Actually, we will never support a tons of protocols since each layer >> packet type is 4 bits, and since it requires that at least one hw >> supports it. > [LC] Agree, it is today. But maybe dynamic in future, packet type definition as a template. >> >> As described in the cover letter, the 2 main goals of this patchset are >> to provide a reference implementation for packet type recognition, and >> enable the support of virtio offloads (I'll send the patchset soon). >> This function is adapted to these 2 usages. Are you thinking of another >> use-case that would not be covered? > [LC] That's excellent work. Furthermore I believe it can cover all ethdev actually. > When HW can't report some demand packet type, then fallback to your SW parser version. > If the auto-switch can be transparent, that's perfect. Maybe rx callback and update ptype in mbuf? I was also thinking about calling rte_pktmbuf_get_ptype() from a driver. I think drivers should not access to mbuf data if it's not absolutely required. Calling rte_pktmbuf_get_ptype() from inside a rx callback seems easily feasible, it may be useful for applications that mostly relies on packet_type to select an action. Regards, Olivier