From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <cunming.liang@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id A78255952
 for <dev@dpdk.org>; Fri,  8 Jul 2016 12:08:09 +0200 (CEST)
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga103.fm.intel.com with ESMTP; 08 Jul 2016 03:08:09 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.28,329,1464678000"; d="scan'208";a="1017915556"
Received: from fmsmsx106.amr.corp.intel.com ([10.18.124.204])
 by fmsmga002.fm.intel.com with ESMTP; 08 Jul 2016 03:08:08 -0700
Received: from fmsmsx121.amr.corp.intel.com (10.18.125.36) by
 FMSMSX106.amr.corp.intel.com (10.18.124.204) with Microsoft SMTP Server (TLS)
 id 14.3.248.2; Fri, 8 Jul 2016 03:08:08 -0700
Received: from shsmsx151.ccr.corp.intel.com (10.239.6.50) by
 fmsmsx121.amr.corp.intel.com (10.18.125.36) with Microsoft SMTP Server (TLS)
 id 14.3.248.2; Fri, 8 Jul 2016 03:08:08 -0700
Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.147]) by
 SHSMSX151.ccr.corp.intel.com ([169.254.3.150]) with mapi id 14.03.0248.002;
 Fri, 8 Jul 2016 18:08:06 +0800
From: "Liang, Cunming" <cunming.liang@intel.com>
To: Olivier Matz <olivier.matz@6wind.com>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
 from data
Thread-Index: AQHR11HfPzhKEWyUek2oW5QL5NqaGaAKfy6AgAIDNxCAABbsgIABpZ2g
Date: Fri, 8 Jul 2016 10:08:05 +0000
Message-ID: <D0158A423229094DA7ABF71CF2FA0DA315545B74@shsmsx102.ccr.corp.intel.com>
References: <1467733310-20875-1-git-send-email-olivier.matz@6wind.com>
 <1467733310-20875-6-git-send-email-olivier.matz@6wind.com>
 <577CA8D1.5000203@intel.com>
 <12989717-cc42-9ce6-f520-0ffbd3db7a8a@6wind.com>
 <D0158A423229094DA7ABF71CF2FA0DA31554517C@shsmsx102.ccr.corp.intel.com>
 <683d73f9-62e0-8169-1222-80f5ea8d865b@6wind.com>
In-Reply-To: <683d73f9-62e0-8169-1222-80f5ea8d865b@6wind.com>
Accept-Language: zh-CN, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
 from data
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 08 Jul 2016 10:08:10 -0000

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, July 07, 2016 11:49 PM
> To: Liang, Cunming <cunming.liang@intel.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet ty=
pe
> from data
>=20
> Hi Cunming,
>=20
> Thank you for your feedback.
>=20
> On 07/07/2016 10:19 AM, Liang, Cunming wrote:
> > Hi Olivier,
> >
> >> -----Original Message-----
> >> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> >> Sent: Wednesday, July 06, 2016 3:43 PM
> >> To: Liang, Cunming <cunming.liang@intel.com>; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet=
 type
> >> from data
> >>
> >> Hi Cunming,
> >>
> >> On 07/06/2016 08:44 AM, Liang, Cunming wrote:
> >>> Hi Olivier,
> >>>
> >>> On 7/5/2016 11:41 PM, Olivier Matz wrote:
> >>>> Introduce the function rte_pktmbuf_get_ptype() that parses a
> >>>> mbuf and returns its packet type. For now, the following packet
> >>>> types are parsed:
> >>>>     L2: Ether
> >>>>     L3: IPv4, IPv6
> >>>>     L4: TCP, UDP, SCTP
> >>>>
> >>>> The goal here is to provide a reference implementation for packet ty=
pe
> >>>> parsing. This function will be used by testpmd in next commits, allo=
wing
> >>>> to compare its result with the value given by the hardware.
> >>>>
> >>>> This function will also be useful when implementing Rx offload suppo=
rt
> >>>> in virtio pmd. Indeed, the virtio protocol gives the csum start and
> >>>> offset, but it does not give the L4 protocol nor it tells if the
> >>>> checksum is relevant for inner or outer. This information has to be
> >>>> known to properly set the ol_flags in mbuf.
> >>>>
> >>>> Signed-off-by: Didier Pallard <didier.pallard@6wind.com>
> >>>> Signed-off-by: Jean Dao <jean.dao@6wind.com>
> >>>> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> >>>> ---
> >>>>   doc/guides/rel_notes/release_16_11.rst |   5 +
> >>>>   lib/librte_mbuf/Makefile               |   5 +-
> >>>>   lib/librte_mbuf/rte_mbuf_ptype.c       | 234
> >>>> +++++++++++++++++++++++++++++++++
> >>>>   lib/librte_mbuf/rte_mbuf_ptype.h       |  43 ++++++
> >>>>   lib/librte_mbuf/rte_mbuf_version.map   |   1 +
> >>>>   5 files changed, 286 insertions(+), 2 deletions(-)
> >>>>   create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c
> >>>>
> >>>> [...]
> >>>> +
> >>>> +/* parse mbuf data to get packet type */
> >>>> +uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
> >>>> +    struct rte_mbuf_hdr_lens *hdr_lens)
> >>>> +{
> >>>> +    struct rte_mbuf_hdr_lens local_hdr_lens;
> >>>> +    const struct ether_hdr *eh;
> >>>> +    struct ether_hdr eh_copy;
> >>>> +    uint32_t pkt_type =3D RTE_PTYPE_L2_ETHER;
> >>>> +    uint32_t off =3D 0;
> >>>> +    uint16_t proto;
> >>>> +
> >>>> +    if (hdr_lens =3D=3D NULL)
> >>>> +        hdr_lens =3D &local_hdr_lens;
> >>>> +
> >>>> +    eh =3D rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy);
> >>>> +    if (unlikely(eh =3D=3D NULL))
> >>>> +        return 0;
> >>>> +    proto =3D eh->ether_type;
> >>>> +    off =3D sizeof(*eh);
> >>>> +    hdr_lens->l2_len =3D off;
> >>>> +
> >>>> +    if (proto =3D=3D rte_cpu_to_be_16(ETHER_TYPE_IPv4)) {
> >>>> +        const struct ipv4_hdr *ip4h;
> >>>> +        struct ipv4_hdr ip4h_copy;
> >>>> +
> >>>> +        ip4h =3D rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy=
);
> >>>> +        if (unlikely(ip4h =3D=3D NULL))
> >>>> +            return pkt_type;
> >>>> +
> >>>> +        pkt_type |=3D ptype_l3_ip(ip4h->version_ihl);
> >>>> +        hdr_lens->l3_len =3D ip4_hlen(ip4h);
> >>>> +        off +=3D hdr_lens->l3_len;
> >>>> +        if (ip4h->fragment_offset &
> >>>> +                rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK |
> >>>> +                    IPV4_HDR_MF_FLAG)) {
> >>>> +            pkt_type |=3D RTE_PTYPE_L4_FRAG;
> >>>> +            hdr_lens->l4_len =3D 0;
> >>>> +            return pkt_type;
> >>>> +        }
> >>>> +        proto =3D ip4h->next_proto_id;
> >>>> +        pkt_type |=3D ptype_l4(proto);
> >>>> +    } else if (proto =3D=3D rte_cpu_to_be_16(ETHER_TYPE_IPv6)) {
> >>>> +        const struct ipv6_hdr *ip6h;
> >>>> +        struct ipv6_hdr ip6h_copy;
> >>>> +        int frag =3D 0;
> >>>> +
> >>>> +        ip6h =3D rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy=
);
> >>>> +        if (unlikely(ip6h =3D=3D NULL))
> >>>> +            return pkt_type;
> >>>> +
> >>>> +        proto =3D ip6h->proto;
> >>>> +        hdr_lens->l3_len =3D sizeof(*ip6h);
> >>>> +        off +=3D hdr_lens->l3_len;
> >>>> +        pkt_type |=3D ptype_l3_ip6(proto);
> >>>> +        if ((pkt_type & RTE_PTYPE_L3_MASK) =3D=3D RTE_PTYPE_L3_IPV6=
_EXT) {
> >>>> +            proto =3D skip_ip6_ext(proto, m, &off, &frag);
> >>>> +            hdr_lens->l3_len =3D off - hdr_lens->l2_len;
> >>>> +        }
> >>>> +        if (proto =3D=3D 0)
> >>>> +            return pkt_type;
> >>>> +        if (frag) {
> >>>> +            pkt_type |=3D RTE_PTYPE_L4_FRAG;
> >>>> +            hdr_lens->l4_len =3D 0;
> >>>> +            return pkt_type;
> >>>> +        }
> >>>> +        pkt_type |=3D ptype_l4(proto);
> >>>> +    }
> >>>> +
> >>>> +    if ((pkt_type & RTE_PTYPE_L4_MASK) =3D=3D RTE_PTYPE_L4_UDP) {
> >>>> +        hdr_lens->l4_len =3D sizeof(struct udp_hdr);
> >>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) =3D=3D RTE_PTYPE_L4_T=
CP) {
> >>>> +        const struct tcp_hdr *th;
> >>>> +        struct tcp_hdr th_copy;
> >>>> +
> >>>> +        th =3D rte_pktmbuf_read(m, off, sizeof(*th), &th_copy);
> >>>> +        if (unlikely(th =3D=3D NULL))
> >>>> +            return pkt_type & (RTE_PTYPE_L2_MASK |
> >>>> +                RTE_PTYPE_L3_MASK);
> >>>> +        hdr_lens->l4_len =3D (th->data_off & 0xf0) >> 2;
> >>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) =3D=3D RTE_PTYPE_L4_S=
CTP) {
> >>>> +        hdr_lens->l4_len =3D sizeof(struct sctp_hdr);
> >>>> +    } else {
> >>>> +        hdr_lens->l4_len =3D 0;
> >>>> +    }
> >>>> +
> >>>> +    return pkt_type;
> >>>> +}
> >>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> b/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> index 4a34678..f468520 100644
> >>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> @@ -545,6 +545,49 @@ extern "C" {
> >>>>           RTE_PTYPE_INNER_L3_MASK |                \
> >>>>           RTE_PTYPE_INNER_L4_MASK))
> >>>>   +struct rte_mbuf;
> >>>> +
> >>>> +/**
> >>>> + * Structure containing header lengths associated to a packet.
> >>>> + */
> >>>> +struct rte_mbuf_hdr_lens {
> >>>> +    uint8_t l2_len;
> >>>> +    uint8_t l3_len;
> >>>> +    uint8_t l4_len;
> >>>> +    uint8_t tunnel_len;
> >>>> +    uint8_t inner_l2_len;
> >>>> +    uint8_t inner_l3_len;
> >>>> +    uint8_t inner_l4_len;
> >>>> +};
> >>> [LC] The header parsing graph usually is not unique. The definition
> >>> maybe nice for the basic IP and L4 tunnel.
> >>> However it can't scale out to other cases, e.g. qinq, mac-in-mac, mpl=
s
> >>> l2/l3 tunnel.
> >>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of
> >>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser sch=
eme.
> >>> In this case, the fixed function is to support below.
> >>>
> >>> + * Supported packet types are:
> >>> + *   L2: Ether
> >>> + *   L3: IPv4, IPv6
> >>> + *   L4: TCP, UDP, SCTP
> >>>
> >>> Of course, it can add more packet type detection logic in future. But
> >>> the more support, the higher the cost.
> >>>
> >>> One of the alternative way is to allow registering parser pair. APP
> >>> decides to choose the predefined scheme(by DPDK LIB), or to self-defi=
ne
> >>> the parsing logic.
> >>> In this way, the scheme can do some assumption for the specific case =
and
> >>> ignore some useless graph detection.
> >>> In addition, besides the SW parser, the HW parser(identified by
> >>> packet_type in mbuf) can be turn on/off by leveraging the same manner=
.
> >>
> >>
> >> Sorry, I'm not sure I'm fully getting what you are saying. If I
> >> understand well, you would like to have something more flexible that
> >> supports the registration of protocol to be recognized?
> > [LC] Not on that granularity, but on the entire parsing routine.
> > rte_pktmbuf_get_ptype() as the common API, and can present in different
> behavior.
> > Usually in different scenario, the interested packet set is different.
> > For the specific case, can do some speculation pre-checking on the opti=
mization
> perspective.
> >
> >>
> >> I'm not sure having a function with a dynamic registration method woul=
d
> >> really increase the performance compared to a static complete function=
.
> > [LC] No, it won't. But the overhead is not much, refer to rx_pkt_burst(=
is a
> callback either).
> > If someone only interest for IPv4-NoFrag-TCP stream, the easiest way ma=
ybe
> not layer by layer detection.
> > The straight forward way maybe, 1) load n bytes 2) compare mask 3) upda=
te
> ptype.
> > We require a normal way to do SW detection, current version is perfect.
> > My point is, we can provide a simple mechanism to allow other way, and =
under
> the same unified API.
>=20
> Again, sorry, I'm not perfectly sure I understand what you are saying.
>=20
> What you describe (mask packet data, then compare with a value) seems
> very similar to what ovs does. Do you mean we should have an API for that=
?
[LC] No, sorry for the confusion.
If one function can detect all kinds of packets at low cost, that's perfect.
But the packets of interest differ from case to case (IPDC, wireless, metro Ethernet, etc.).
Considering the possible tradeoff between performance and completeness,
allowing a dedicated parser tuned for a special purpose is an alternative.

>=20
> I think once we have masked+compared the data, we may know much more
> than just a packet_type.
[LC] Detecting the packet layer by layer is the normal way, but in some cases it isn't necessary.
For example, assume a network that uses VXLAN-GPE.
Detecting the packet layer by layer takes two steps: the UDP port, then the VXLAN NP (next protocol) field.
In fact, by comparing UDP+VXLAN (16B) as a whole against a mask once, you can tell whether it's VXLAN with an inner Ethernet or not.
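
As a rough sketch of that single compare (this is not DPDK code): the function name and the mask/value tables below are made up for illustration, and the UDP destination port 4790, the P flag bit 0x04 and the NP value 0x03 for Ethernet are my reading of the VXLAN-GPE draft, so please double check them before relying on this.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: 8-byte UDP header followed by 8-byte VXLAN-GPE header. */
#define UDP_VXLAN_GPE_LEN 16

/* Bytes that matter: UDP dst port (2-3), GPE P flag (8), GPE NP (11). */
static const uint8_t gpe_eth_mask[UDP_VXLAN_GPE_LEN] = {
	0x00, 0x00, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
	0x04, 0x00, 0x00, 0xff, 0x00, 0x00, 0x00, 0x00,
};

/* Expected: dst port 4790 (0x12b6), P bit set, NP = 0x03 (Ethernet). */
static const uint8_t gpe_eth_value[UDP_VXLAN_GPE_LEN] = {
	0x00, 0x00, 0x12, 0xb6, 0x00, 0x00, 0x00, 0x00,
	0x04, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00,
};

/*
 * l4 points to the start of the UDP header; the caller guarantees that
 * 16 contiguous bytes are readable. Returns 1 on match, 0 otherwise.
 */
static int is_vxlan_gpe_inner_ether(const uint8_t *l4)
{
	uint64_t data[2], mask[2], value[2];

	/* memcpy avoids unaligned loads; all three use the same byte order */
	memcpy(data, l4, sizeof(data));
	memcpy(mask, gpe_eth_mask, sizeof(mask));
	memcpy(value, gpe_eth_value, sizeof(value));

	return ((data[0] & mask[0]) == value[0]) &&
	       ((data[1] & mask[1]) == value[1]);
}
```

The point is only that one masked compare over 16 bytes replaces the two separate checks; a real parser would still have to validate the outer L2/L3 headers first.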

This is probably not a perfect example. A SW parser is not cheap, and in special cases there is room to optimize it. One possible approach, in pseudo code, is below.

struct rte_ptype_parser {
	char name[128];
	uint32_t (*get_ptype)(const struct rte_mbuf *m, void *hdr_lens);
};

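/* defined below; declared here so the default parser can refer to it */
uint32_t ipdc_get_ptype(const struct rte_mbuf *m, void *hdr_lens);

struct rte_ptype_parser def_parser =
{
	.name = "ipdc",
	.get_ptype = ipdc_get_ptype,
};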

uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
		void *hdr_lens)
{
	/* dispatch through a pointer to the currently selected parser */
	const struct rte_ptype_parser *parser = &def_parser;

	[...]
	parser->get_ptype(m, hdr_lens);
	[...]
}

/* scheme for ipdc */
struct ipdc_hdr_lens {
	uint8_t l2_len;
	uint8_t l3_len;
	uint8_t l4_len;
	uint8_t tunnel_len;
	uint8_t inner_l2_len;
	uint8_t inner_l3_len;
	uint8_t inner_l4_len;
};
uint32_t ipdc_get_ptype(const struct rte_mbuf *m, void *hdr_lens)
{
	struct ipdc_hdr_lens *ihl = (struct ipdc_hdr_lens *)hdr_lens;

	/* parser logic optimized for typical IP datacenter packet */
	[...]
}

/* scheme for l2mpls */
struct l2mpls_hdr_lens {
	uint8_t l2_len;
	uint8_t mpls_len;            /* total length for multi-layer */
	uint8_t inner_l2_len;
	uint8_t inner_l3_len;
};
uint32_t l2mpls_get_ptype(const struct rte_mbuf *m, void *hdr_lens)
{
	struct l2mpls_hdr_lens *ihl = (struct l2mpls_hdr_lens *)hdr_lens;

	/* parser logic optimized for typical L2MPLS */
	[...]
}
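
To make the idea a bit more concrete, here is a small compilable sketch of the same mechanism outside of DPDK. Everything in it is hypothetical: struct mbuf_stub stands in for struct rte_mbuf, the PT_* values are not the real RTE_PTYPE_* values, and ptype_parser_select() / pktmbuf_get_ptype() are made-up names, not a proposed API.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_mbuf so the sketch builds on its own. */
struct mbuf_stub {
	const uint8_t *data;
	uint16_t data_len;
};

/* Hypothetical packet type bits, not the RTE_PTYPE_* values. */
#define PT_L2_ETHER 0x01u
#define PT_L3_IPV4  0x10u

struct hdr_lens_stub {
	uint8_t l2_len;
	uint8_t l3_len;
};

struct ptype_parser {
	const char *name;
	uint32_t (*get_ptype)(const struct mbuf_stub *m, void *hdr_lens);
};

/* Generic scheme: Ethernet, then IPv4 if the ethertype says so. */
static uint32_t generic_get_ptype(const struct mbuf_stub *m, void *hdr_lens)
{
	struct hdr_lens_stub *hl = hdr_lens;
	uint32_t ptype;

	if (m->data_len < 14)
		return 0;
	ptype = PT_L2_ETHER;
	if (hl != NULL) {
		hl->l2_len = 14;
		hl->l3_len = 0;
	}
	if (m->data_len >= 34 && m->data[12] == 0x08 && m->data[13] == 0x00) {
		ptype |= PT_L3_IPV4;
		if (hl != NULL)
			hl->l3_len = (uint8_t)((m->data[14] & 0x0f) * 4);
	}
	return ptype;
}

/* Dedicated scheme: the application only cares about L2. */
static uint32_t l2only_get_ptype(const struct mbuf_stub *m, void *hdr_lens)
{
	struct hdr_lens_stub *hl = hdr_lens;

	if (m->data_len < 14)
		return 0;
	if (hl != NULL) {
		hl->l2_len = 14;
		hl->l3_len = 0;
	}
	return PT_L2_ETHER;
}

static const struct ptype_parser generic_parser = {
	.name = "generic",
	.get_ptype = generic_get_ptype,
};

static const struct ptype_parser l2only_parser = {
	.name = "l2only",
	.get_ptype = l2only_get_ptype,
};

static const struct ptype_parser *cur_parser = &generic_parser;

/* Select a parser; NULL falls back to the generic one. */
static void ptype_parser_select(const struct ptype_parser *p)
{
	cur_parser = (p != NULL) ? p : &generic_parser;
}

/* Unified entry point: one indirect call to the selected scheme. */
static uint32_t pktmbuf_get_ptype(const struct mbuf_stub *m, void *hdr_lens)
{
	return cur_parser->get_ptype(m, hdr_lens);
}
```

The caller always goes through pktmbuf_get_ptype(); the only extra cost compared to a static function is the indirect call, and each scheme interprets hdr_lens with its own structure.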

>=20
>=20
>=20
> >
> >> Actually, we will never support a tons of protocols since each layer
> >> packet type is 4 bits, and since it requires that at least one hw
> >> supports it.
> > [LC] Agree, it is today. But maybe dynamic in future, packet type defin=
ition as a
> template.
> >>
> >> As described in the cover letter, the 2 main goals of this patchset ar=
e
> >> to provide a reference implementation for packet type recognition, and
> >> enable the support of virtio offloads (I'll send the patchset soon).
> >> This function is adapted to these 2 usages. Are you thinking of anothe=
r
> >> use-case that would not be covered?
> > [LC] That's excellent work.  Furthermore I believe it can cover all eth=
dev actually.
> > When HW can't report some demand packet type, then fallback to your SW
> parser version.
> > If the auto-switch can be transparent, that's perfect. Maybe rx callbac=
k and
> update ptype in mbuf?
>=20
> I was also thinking about calling rte_pktmbuf_get_ptype() from a driver.
> I think drivers should not access to mbuf data if it's not absolutely
> required.
> Calling rte_pktmbuf_get_ptype() from inside a rx callback seems easily
> feasible, it may be useful for applications that mostly relies on
> packet_type to select an action.
>=20
>=20
> Regards,
> Olivier