From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id E01559AA5 for ; Tue, 3 Feb 2015 07:38:05 +0100 (CET) Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga102.jf.intel.com with ESMTP; 02 Feb 2015 22:34:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.09,510,1418112000"; d="scan'208";a="646512995" Received: from kmsmsx151.gar.corp.intel.com ([172.21.73.86]) by orsmga001.jf.intel.com with ESMTP; 02 Feb 2015 22:38:03 -0800 Received: from kmsmsx154.gar.corp.intel.com (172.21.73.14) by KMSMSX151.gar.corp.intel.com (172.21.73.86) with Microsoft SMTP Server (TLS) id 14.3.195.1; Tue, 3 Feb 2015 14:37:30 +0800 Received: from shsmsx152.ccr.corp.intel.com (10.239.6.52) by KMSMSX154.gar.corp.intel.com (172.21.73.14) with Microsoft SMTP Server (TLS) id 14.3.195.1; Tue, 3 Feb 2015 14:37:30 +0800 Received: from shsmsx104.ccr.corp.intel.com ([169.254.5.161]) by SHSMSX152.ccr.corp.intel.com ([169.254.6.129]) with mapi id 14.03.0195.001; Tue, 3 Feb 2015 14:37:30 +0800 From: "Zhang, Helin" To: 'Olivier MATZ' , "'dev@dpdk.org'" Thread-Topic: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet types Thread-Index: AQHQPJSYnLd4IgnLikW5X4rRpE4JWJzciU5wgAArTQCAAYnV0IAAPUVg Date: Tue, 3 Feb 2015 06:37:28 +0000 Message-ID: References: <1421637666-16872-1-git-send-email-helin.zhang@intel.com> <1422501365-12643-1-git-send-email-helin.zhang@intel.com> <1422501365-12643-2-git-send-email-helin.zhang@intel.com> <54CB8D81.2050205@6wind.com> <54CF5CF8.2090605@6wind.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet types X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 Feb 2015 06:38:06 -0000 > -----Original Message----- > From: Zhang, Helin > Sent: Tuesday, February 3, 2015 11:19 AM > To: Olivier MATZ; dev@dpdk.org > Cc: Stephen Hemminger > Subject: RE: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified pa= cket > types >=20 >=20 >=20 > > -----Original Message----- > > From: Olivier MATZ [mailto:olivier.matz@6wind.com] > > Sent: Monday, February 2, 2015 7:18 PM > > To: Zhang, Helin; dev@dpdk.org > > Cc: Stephen Hemminger > > Subject: Re: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified > > packet types > > > > Hi Helin, > > > > On 02/02/2015 02:43 AM, Zhang, Helin wrote: > > >>> +/* > > >>> + * Sixteen bits are divided into several fields to mark packet typ= es. > > >>> +Note that > > >>> + * each field is indexical. > > >>> + * - Bit 3:0 is for tunnel types. > > >>> + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types. > > >>> + * - Bit 10:8 is for L4 types. It can also be used for inner L4 ty= pes for > > >>> + * tunneling packets. > > >>> + * - Bit 13:11 is for inner L3 types. > > >>> + * - Bit 15:14 is reserved. > > >> > > >> Is there a reason why using this specific order? > > > Yes, to support ixgbe Vector PMD, outer L3 types and L4 types need > > > to be contiguous and in this order. > > > > When you say "need to be", do you mean it's impossible to do in > > another manner or just that it would be slower? > It was designed to be like this, otherwise, performance drop must be expe= cted. >=20 > > > > >> Also, there are 4 bits for outer L3 types and 3 bits for inner L3 > > >> types, but both of them have 6 different supported types. Is it inte= ntional? > > > Yes, it is to support ixgbe Vector PMD. Contiguous 7 bits are > > > needed, though > > 1 bit wasted. > > > > To be honnest, I'm always a surprised that in dpdk we prefer having a > > strange API just because it's faster or easier to do on one specific > > driver (usually i40e or ixgbe). Unfortunately, trying to optimize the > > API for one driver may result in making the rest of the code > > (application and other drivers) slower and more complex. > Based on my understanding, 'faster' is most of DPDK customers wanted. > Otherwise, they don't need DPDK. Different hardware must have different > capabilities, I am trying to unify at least packet types to get things ea= sier. >=20 > > > > In your proposition, there is no inner l4_type. I consider it's as > > useful as the other fields. From what I see, there are only 2 bits > > left. What do you think about changing the packet type to 64 bits now? > For tunneling cases, L4_type is for inner L4 type, outer L4 type is not n= eeded, as > it can be in tunnel type. > I can expect 64 bits are needed in the future. But for now, I don't see a= ny > strong demand on that for currently supported hardware. > In addition, there is no free bit in the first cache line of mbuf header,= mbuf > changes are needed to expand it. I'd prefer to do it later to make things= easier. Sorry, I misremember the usage of the first cache line of mbuf. It still ha= s some free space. Based on this, enlarging (to 32 or 64 bits) the packet type mig= ht be good. >=20 > > > > From an API point of view, I think it would be good to have the same > > structure for inner and outer types. For instance (this is just an exam= ple): > > > > union layer_pkt_type { > > struct { > > uint16_t l2_type:4; > > uint16_t l3_type:4; > > uint16_t l4_type:4; > > uint16_t tun_type:4; > > }; > > uint16_t u16; > > }; > > > > struct pkt_type { > > union layer_pkt_type outer; > > union layer_pkt_type inner; > > }; > > > > When your application decapsulates tunnels, you can just do outer =3D > > inner and enter into the same code. > Expanding packet_type is not easy, as there is no free bits in the first = cache > line. > Is there any tunnel type in inner packet? Is it a waste? > Is L2 type really needed? I don't know. If it is now not short of space in mbuf, the definition as yours might be g= ood. But tun_type is not required for inner packet, I'd prefer to define it as n= eeded with taking into account the Vector PMD support. It seems 32 bits might be = enough, like below, struct pkt_type { uint32_t l2_type:4; uint32_t l3_type:4; uint32_t l4_type:4; uint32_t tun_type:4; uint32_t inner_l2_type:4; uint32_t inner_l3_type:4; uint32_t inner_l4_type:4; } Regards, Helin >=20 > > > > > > >>> + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP, > > >>> +RTE_PTYPE_L4_UDP > > >>> + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous > > >>> +7 > > bits. > > >>> + * > > >>> + * Note that L3 types values are selected for checking IPV4/IPV6 > > >>> +header from > > >>> + * performance point of view. Reading annotations of > > >>> +RTE_ETH_IS_IPV4_HDR and > > >>> + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 > > >>> +type > > >> values. > > >>> + */ > > >>> +#define RTE_PTYPE_UNKNOWN 0x0000 /* > > >> 0b0000000000000000 */ > > >>> +/* bit 3:0 for tunnel types */ > > >>> +#define RTE_PTYPE_TUNNEL_IP 0x0001 /* > > >> 0b0000000000000001 */ > > >>> +#define RTE_PTYPE_TUNNEL_TCP 0x0002 /* > > >> 0b0000000000000010 */ > > >>> +#define RTE_PTYPE_TUNNEL_UDP 0x0003 /* > > >> 0b0000000000000011 */ > > >>> +#define RTE_PTYPE_TUNNEL_GRE 0x0004 /* > > >> 0b0000000000000100 */ > > >>> +#define RTE_PTYPE_TUNNEL_VXLAN 0x0005 /* > > >> 0b0000000000000101 */ > > >>> +#define RTE_PTYPE_TUNNEL_NVGRE 0x0006 /* > > >> 0b0000000000000110 */ > > >>> +#define RTE_PTYPE_TUNNEL_GENEVE 0x0007 /* > > >> 0b0000000000000111 */ > > >>> +#define RTE_PTYPE_TUNNEL_GRENAT 0x0008 /* > > >> 0b0000000000001000 */ > > >>> +#define RTE_PTYPE_TUNNEL_GRENAT_MAC 0x0009 /* > > >> 0b0000000000001001 */ > > >>> +#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN 0x000a /* > > >> 0b0000000000001010 */ > > >>> +#define RTE_PTYPE_TUNNEL_MASK 0x000f /* > > >> 0b0000000000001111 */ > > >>> +/* bit 7:4 for L3 types */ > > >>> +#define RTE_PTYPE_L3_IPV4 0x0010 /* > > >> 0b0000000000010000 */ > > >>> +#define RTE_PTYPE_L3_IPV4_EXT 0x0030 /* > > >> 0b0000000000110000 */ > > >>> +#define RTE_PTYPE_L3_IPV6 0x0040 /* > > >> 0b0000000001000000 */ > > >>> +#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN 0x0090 /* > > >> 0b0000000010010000 */ > > >>> +#define RTE_PTYPE_L3_IPV6_EXT 0x00c0 /* > > >> 0b0000000011000000 */ > > >>> +#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN 0x00e0 /* > > >> 0b0000000011100000 */ > > >>> +#define RTE_PTYPE_L3_MASK 0x00f0 /* > > >> 0b0000000011110000 */ > > >> > > >> can we expect that when RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT > or > > >> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN is set, the hardware also verified > > >> the > > >> L3 checksum? > > > RTE_PTYPE_L3_IPV4 means there is NONE-EXT. Each time only one of > > > above 3 > > can be set. > > > These bits don't indicate any checksum, checksum should be indicated > > > by > > other flags. > > > They are just for packet types hardware can recognized. > > > > I think these 2 information are linked: > > > > - if the hardware cannot recognize packet, it cannot calculate the > > checksum because it does not know the packet type > > - if the hardware can recognize the packet, it can verify that the > > checksum is good or wrong > We cannot know how hardware works, we care about what hardware can > report. >=20 > > > > Today, we have: > > > > - PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT to tell if the packet is > > seen as IPv4 by the hw. > > > > - We can suppose that: > > > > - PKT_RX_IPV4_HDR(_EXT)=3D0 -> no hw checksum information > > - PKT_RX_IPV4_HDR(_EXT)=3D1 and PKT_RX_IP_CKSUM_BAD=3D0 -> > checksum > > is correct > > - PKT_RX_IPV4_HDR(_EXT)=3D1 and PKT_RX_IP_CKSUM_BAD=3D1 -> > checksum > > is not correct > > > > - We cannot do the same with L4 because we have no L4 type info, > > but it would be good to be able to do the same. > > > > With your patch, you are removing the PKT_RX_IPV4_HDR and > > PKT_RX_IPV4_HDR_EXT flags, but I think the above assumption about > > checksum should be kept. As you are adding a L4 type info, the same > > method could be applied to L4 checksums. > > > > I think this would definitely solve the problem described by Stephen. > I think packet type and checksum are different things. They are reported = by > different fields. > PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT mean packet type only, > nothing about checksum. Checksum GOOD/BAD can be reported by other flags > in ol_flags. >=20 > > > > > > >> My understanding is: > > >> > > >> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 0 > > >> -> checksum was checked by hw and is good > > >> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 1 > > >> -> checksum was checked by hw and is bad > > >> - if packet_type is not IPv4* > > >> -> checksum was not checked by hw > > >> > > >> I think it would solve the problem asked by Stephen > > >> http://dpdk.org/ml/archives/dev/2015-January/011550.html > > >> > > >>> +/* bit 10:8 for L4 types */ > > >>> +#define RTE_PTYPE_L4_TCP 0x0100 /* > > >> 0b0000000100000000 */ > > >>> +#define RTE_PTYPE_L4_UDP 0x0200 /* > > >> 0b0000001000000000 */ > > >>> +#define RTE_PTYPE_L4_FRAG 0x0300 /* > > >> 0b0000001100000000 */ > > >>> +#define RTE_PTYPE_L4_SCTP 0x0400 /* > > >> 0b0000010000000000 */ > > >>> +#define RTE_PTYPE_L4_ICMP 0x0500 /* > > >> 0b0000010100000000 */ > > >>> +#define RTE_PTYPE_L4_NONFRAG 0x0600 /* > > >> 0b0000011000000000 */ > > >>> +#define RTE_PTYPE_L4_MASK 0x0700 /* > > >> 0b0000011100000000 */ > > >> > > >> Same question for L4. > > >> > > >> Note: it would means that if a hardware is able to recognize a TCP > > >> packet but not to verify the checksum, it has to set RTE_PTYPE_L4 > > >> to > > unknown. > > >> > > >>> +/* bit 13:11 for inner L3 types */ > > >>> +#define RTE_PTYPE_INNER_L3_IPV4 0x0800 /* > > >> 0b0000100000000000 */ > > >>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT 0x1000 /* > > >> 0b0001000000000000 */ > > >>> +#define RTE_PTYPE_INNER_L3_IPV6 0x1800 /* > > >> 0b0001100000000000 */ > > >>> +#define RTE_PTYPE_INNER_L3_IPV6_EXT 0x2000 /* > > >> 0b0010000000000000 */ > > >>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /* > > >>> +0b0010100000000000 */ #define > > >> RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /* > > 0b0011000000000000 */ > > > We cannot define the hardware behaviors, it just reports the > > > hardware recognized packet information directly to the mbuf. > > > Based on my experiment on i40e hardware, if a IPV4 packet with wrong > > > checksum, by default, the PMD driver cannot see the packet at all. > > > So we don't need to care about it too much! > > > > I agree that the hardware reports some info that can be different > > depending on the hw. But the role of the driver is to convert these > > info into a common API with a well-defined behavior. > Yes, driver should report the received packet information to a well-defin= ed > behavior, but not the same behavior, even for the same packet. > Capability can be queried for each port, and then the application can kno= w the > port capability well, and know what the hardware can report, and what the > hardware cannot report. > Driver should enable the hardware with its advanced capabilities as most = as > possible. >=20 > > > > Regards, > > Olivier