From: "Zhang, Helin"
To: Olivier MATZ, "dev@dpdk.org"
Date: Tue, 3 Feb 2015 03:18:59 +0000
Subject: Re: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet types
References: <1421637666-16872-1-git-send-email-helin.zhang@intel.com> <1422501365-12643-1-git-send-email-helin.zhang@intel.com> <1422501365-12643-2-git-send-email-helin.zhang@intel.com> <54CB8D81.2050205@6wind.com> <54CF5CF8.2090605@6wind.com>
In-Reply-To: <54CF5CF8.2090605@6wind.com>
List-Id: patches and discussions about DPDK

>
-----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Monday, February 2, 2015 7:18 PM
> To: Zhang, Helin; dev@dpdk.org
> Cc: Stephen Hemminger
> Subject: Re: [dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet
> types
>
> Hi Helin,
>
> On 02/02/2015 02:43 AM, Zhang, Helin wrote:
> >>> +/*
> >>> + * Sixteen bits are divided into several fields to mark packet types. Note that
> >>> + * each field is indexical.
> >>> + * - Bit 3:0 is for tunnel types.
> >>> + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
> >>> + * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
> >>> + *   tunneling packets.
> >>> + * - Bit 13:11 is for inner L3 types.
> >>> + * - Bit 15:14 is reserved.
> >>
> >> Is there a reason why using this specific order?
> > Yes, to support ixgbe Vector PMD, outer L3 types and L4 types need to
> > be contiguous and in this order.
>
> When you say "need to be", do you mean it's impossible to do in another
> manner or just that it would be slower?

It was designed to be like this; otherwise, a performance drop must be expected.

> >> Also, there are 4 bits for outer L3 types and 3 bits for inner L3
> >> types, but both of them have 6 different supported types. Is it intentional?
> > Yes, it is to support ixgbe Vector PMD. Contiguous 7 bits are needed, though
> > 1 bit wasted.
>
> To be honest, I'm always surprised that in dpdk we prefer having a strange
> API just because it's faster or easier to do on one specific driver (usually i40e or
> ixgbe). Unfortunately, trying to optimize the API for one driver may result in
> making the rest of the code (application and other drivers) slower and more
> complex.

Based on my understanding, 'faster' is what most DPDK customers want. Otherwise,
they don't need DPDK. Different hardware must have different capabilities; I am
trying to unify at least the packet types to make things easier.
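
For readers following the layout discussion, here is a minimal sketch of how an application might split the 16-bit packet type into its fields. The three RTE_PTYPE_*_MASK values and the sample type values are copied from the patch quoted further down in this mail. EX_PTYPE_INNER_L3_MASK, struct ex_ptype_fields and both helper functions are hypothetical names made up for this illustration; the inner L3 mask is only derived from the "Bit 13:11" comment and is not part of the quoted patch.

```c
#include <stdint.h>

/* Mask values as they appear in the quoted patch. */
#define RTE_PTYPE_TUNNEL_MASK   0x000f /* bits 3:0   */
#define RTE_PTYPE_L3_MASK       0x00f0 /* bits 7:4   */
#define RTE_PTYPE_L4_MASK       0x0700 /* bits 10:8  */
/* Assumption: derived from "Bit 13:11 is for inner L3 types". */
#define EX_PTYPE_INNER_L3_MASK  0x3800 /* bits 13:11 */

/* A few type values from the quoted patch, used in the usage note. */
#define RTE_PTYPE_TUNNEL_VXLAN  0x0005
#define RTE_PTYPE_L3_IPV4       0x0010
#define RTE_PTYPE_L3_IPV4_EXT   0x0030
#define RTE_PTYPE_L4_UDP        0x0200
#define RTE_PTYPE_INNER_L3_IPV6 0x1800

/* Hypothetical container for the four fields, each left unshifted. */
struct ex_ptype_fields {
	uint16_t tunnel;
	uint16_t l3;
	uint16_t l4;
	uint16_t inner_l3;
};

/* Split a packet type into its fields by masking. */
static inline struct ex_ptype_fields ex_ptype_split(uint16_t ptype)
{
	struct ex_ptype_fields f;

	f.tunnel   = ptype & RTE_PTYPE_TUNNEL_MASK;
	f.l3       = ptype & RTE_PTYPE_L3_MASK;
	f.l4       = ptype & RTE_PTYPE_L4_MASK;
	f.inner_l3 = ptype & EX_PTYPE_INNER_L3_MASK;
	return f;
}

/*
 * Each field holds an index, not independent flag bits, so an exact
 * match has to compare the whole masked field with the type value.
 */
static inline int ex_ptype_l3_is(uint16_t ptype, uint16_t l3_type)
{
	return (ptype & RTE_PTYPE_L3_MASK) == l3_type;
}
```

For example, a packet type built as RTE_PTYPE_TUNNEL_VXLAN | RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_UDP | RTE_PTYPE_INNER_L3_IPV6 splits back into exactly those four values, and ex_ptype_l3_is() does not treat RTE_PTYPE_L3_IPV4_EXT as RTE_PTYPE_L3_IPV4 even though the two values share bit 4.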
>
> In your proposition, there is no inner l4_type. I consider it's as useful as the
> other fields. From what I see, there are only 2 bits left. What do you think about
> changing the packet type to 64 bits now?

For tunneling cases, L4_type is for the inner L4 type; the outer L4 type is not needed,
as it can be in the tunnel type.
I can expect 64 bits to be needed in the future. But for now, I don't see any strong
demand for that from currently supported hardware.
In addition, there is no free bit in the first cache line of the mbuf header; mbuf
changes are needed to expand it. I'd prefer to do it later to make things easier.

> From an API point of view, I think it would be good to have the same structure
> for inner and outer types. For instance (this is just an example):
>
> union layer_pkt_type {
> 	struct {
> 		uint16_t l2_type:4;
> 		uint16_t l3_type:4;
> 		uint16_t l4_type:4;
> 		uint16_t tun_type:4;
> 	};
> 	uint16_t u16;
> };
>
> struct pkt_type {
> 	union layer_pkt_type outer;
> 	union layer_pkt_type inner;
> };
>
> When your application decapsulates tunnels, you can just do outer = inner and
> enter into the same code.

Expanding packet_type is not easy, as there are no free bits in the first cache line.
Is there any tunnel type in the inner packet? Is it a waste?
Is L2 type really needed? I don't know.

>
>
> >>> + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP, RTE_PTYPE_L4_UDP
> >>> + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
> >>> + *
> >>> + * Note that L3 types values are selected for checking IPV4/IPV6 header from
> >>> + * performance point of view. Reading annotations of RTE_ETH_IS_IPV4_HDR and
> >>> + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type values.
> >>> + */
> >>> +#define RTE_PTYPE_UNKNOWN               0x0000 /* 0b0000000000000000 */
> >>> +/* bit 3:0 for tunnel types */
> >>> +#define RTE_PTYPE_TUNNEL_IP             0x0001 /* 0b0000000000000001 */
> >>> +#define RTE_PTYPE_TUNNEL_TCP            0x0002 /* 0b0000000000000010 */
> >>> +#define RTE_PTYPE_TUNNEL_UDP            0x0003 /* 0b0000000000000011 */
> >>> +#define RTE_PTYPE_TUNNEL_GRE            0x0004 /* 0b0000000000000100 */
> >>> +#define RTE_PTYPE_TUNNEL_VXLAN          0x0005 /* 0b0000000000000101 */
> >>> +#define RTE_PTYPE_TUNNEL_NVGRE          0x0006 /* 0b0000000000000110 */
> >>> +#define RTE_PTYPE_TUNNEL_GENEVE         0x0007 /* 0b0000000000000111 */
> >>> +#define RTE_PTYPE_TUNNEL_GRENAT         0x0008 /* 0b0000000000001000 */
> >>> +#define RTE_PTYPE_TUNNEL_GRENAT_MAC     0x0009 /* 0b0000000000001001 */
> >>> +#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN 0x000a /* 0b0000000000001010 */
> >>> +#define RTE_PTYPE_TUNNEL_MASK           0x000f /* 0b0000000000001111 */
> >>> +/* bit 7:4 for L3 types */
> >>> +#define RTE_PTYPE_L3_IPV4               0x0010 /* 0b0000000000010000 */
> >>> +#define RTE_PTYPE_L3_IPV4_EXT           0x0030 /* 0b0000000000110000 */
> >>> +#define RTE_PTYPE_L3_IPV6               0x0040 /* 0b0000000001000000 */
> >>> +#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN   0x0090 /* 0b0000000010010000 */
> >>> +#define RTE_PTYPE_L3_IPV6_EXT           0x00c0 /* 0b0000000011000000 */
> >>> +#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN   0x00e0 /* 0b0000000011100000 */
> >>> +#define RTE_PTYPE_L3_MASK               0x00f0 /* 0b0000000011110000 */
> >>
> >> can we expect that when RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT or
> >> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN is set, the hardware also verified the
> >> L3 checksum?
> > RTE_PTYPE_L3_IPV4 means there is NONE-EXT. Each time only one of the above 3
> > can be set.
> > These bits don't indicate any checksum; checksum should be indicated by
> > other flags.
> > They are just for the packet types hardware can recognize.
>
> I think these 2 pieces of information are linked:
>
> - if the hardware cannot recognize the packet, it cannot calculate the
>   checksum because it does not know the packet type
> - if the hardware can recognize the packet, it can verify that the
>   checksum is good or wrong

We cannot know how hardware works; we care about what hardware can report.

> Today, we have:
>
> - PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT to tell if the packet is
>   seen as IPv4 by the hw.
>
> - We can suppose that:
>
>   - PKT_RX_IPV4_HDR(_EXT)=0 -> no hw checksum information
>   - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=0 -> checksum
>     is correct
>   - PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=1 -> checksum
>     is not correct
>
> - We cannot do the same with L4 because we have no L4 type info,
>   but it would be good to be able to do the same.
>
> With your patch, you are removing the PKT_RX_IPV4_HDR and
> PKT_RX_IPV4_HDR_EXT flags, but I think the above assumption about
> checksum should be kept. As you are adding a L4 type info, the same method
> could be applied to L4 checksums.
>
> I think this would definitely solve the problem described by Stephen.

I think packet type and checksum are different things. They are reported by different
fields. PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT mean packet type only, nothing
about checksum. Checksum GOOD/BAD can be reported by other flags in ol_flags.
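
To make the disagreement concrete, here is a minimal sketch of the three-case rule Olivier proposes above, applied to the new L3 packet type field. It is not behavior agreed in this thread: as stated in the reply just above, the packet type alone says nothing about whether the checksum was verified. The RTE_PTYPE_L3_* values come from the quoted patch. EX_RX_IP_CKSUM_BAD is a placeholder bit standing in for PKT_RX_IP_CKSUM_BAD (its real value is not given in this mail), and the enum, the function name and the 64-bit flags parameter are assumptions made for this illustration.

```c
#include <stdint.h>

/* L3 type values and mask as they appear in the quoted patch. */
#define RTE_PTYPE_L3_IPV4             0x0010
#define RTE_PTYPE_L3_IPV4_EXT         0x0030
#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN 0x0090
#define RTE_PTYPE_L3_MASK             0x00f0

/* Placeholder bit: stands in for PKT_RX_IP_CKSUM_BAD in ol_flags. */
#define EX_RX_IP_CKSUM_BAD (1ULL << 0)

/* Hypothetical three-state result of the proposed interpretation. */
enum ex_ip_cksum {
	EX_IP_CKSUM_UNKNOWN, /* packet not seen as IPv4: no hw information */
	EX_IP_CKSUM_GOOD,    /* seen as IPv4 and the BAD flag is clear     */
	EX_IP_CKSUM_BAD      /* seen as IPv4 and the BAD flag is set       */
};

/* Apply the proposed rule to a packet type and a set of offload flags. */
static inline enum ex_ip_cksum
ex_ip_cksum_status(uint16_t ptype, uint64_t ol_flags)
{
	switch (ptype & RTE_PTYPE_L3_MASK) {
	case RTE_PTYPE_L3_IPV4:
	case RTE_PTYPE_L3_IPV4_EXT:
	case RTE_PTYPE_L3_IPV4_EXT_UNKNOWN:
		return (ol_flags & EX_RX_IP_CKSUM_BAD) ?
			EX_IP_CKSUM_BAD : EX_IP_CKSUM_GOOD;
	default:
		return EX_IP_CKSUM_UNKNOWN;
	}
}
```

Under this reading, a driver that recognizes IPv4 but cannot verify the checksum would have to leave the L3 type unset, which is the coupling between the two fields that the reply above objects to.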
>
>
> >> My understanding is:
> >>
> >> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 0
> >>   -> checksum was checked by hw and is good
> >> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 1
> >>   -> checksum was checked by hw and is bad
> >> - if packet_type is not IPv4*
> >>   -> checksum was not checked by hw
> >>
> >> I think it would solve the problem asked by Stephen
> >> http://dpdk.org/ml/archives/dev/2015-January/011550.html
> >>
> >>> +/* bit 10:8 for L4 types */
> >>> +#define RTE_PTYPE_L4_TCP                    0x0100 /* 0b0000000100000000 */
> >>> +#define RTE_PTYPE_L4_UDP                    0x0200 /* 0b0000001000000000 */
> >>> +#define RTE_PTYPE_L4_FRAG                   0x0300 /* 0b0000001100000000 */
> >>> +#define RTE_PTYPE_L4_SCTP                   0x0400 /* 0b0000010000000000 */
> >>> +#define RTE_PTYPE_L4_ICMP                   0x0500 /* 0b0000010100000000 */
> >>> +#define RTE_PTYPE_L4_NONFRAG                0x0600 /* 0b0000011000000000 */
> >>> +#define RTE_PTYPE_L4_MASK                   0x0700 /* 0b0000011100000000 */
> >>
> >> Same question for L4.
> >>
> >> Note: it would mean that if a hardware is able to recognize a TCP
> >> packet but not to verify the checksum, it has to set RTE_PTYPE_L4 to unknown.
> >>
> >>> +/* bit 13:11 for inner L3 types */
> >>> +#define RTE_PTYPE_INNER_L3_IPV4             0x0800 /* 0b0000100000000000 */
> >>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT         0x1000 /* 0b0001000000000000 */
> >>> +#define RTE_PTYPE_INNER_L3_IPV6             0x1800 /* 0b0001100000000000 */
> >>> +#define RTE_PTYPE_INNER_L3_IPV6_EXT         0x2000 /* 0b0010000000000000 */
> >>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /* 0b0010100000000000 */
> >>> +#define RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /* 0b0011000000000000 */
> > We cannot define the hardware behaviors; it just reports the hardware
> > recognized packet information directly to the mbuf.
> > Based on my experiment on i40e hardware, if an IPV4 packet has a wrong
> > checksum, by default, the PMD driver cannot see the packet at all.
> > So we don't need to care about it too much!
>
> I agree that the hardware reports some info that can be different depending on
> the hw. But the role of the driver is to convert these info into a common API
> with a well-defined behavior.

Yes, the driver should report the received packet information with well-defined
behavior, but not necessarily the same behavior across hardware, even for the same packet.
Capability can be queried for each port, so the application can know the port's
capability well: what the hardware can report, and what it cannot.
The driver should enable the hardware's advanced capabilities as much as possible.

> Regards,
> Olivier
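
As a footnote to the API question raised earlier in this mail, here is a minimal sketch of the "outer = inner" idea from Olivier's layer_pkt_type example. The union and struct are copied from his example; ex_decap() is a hypothetical helper added only for illustration. Clearing the inner layer after the copy is an assumption, since the example does not say what the inner type should hold after decapsulation. The anonymous struct needs C11 (or the GNU extension), and uint16_t bit-fields are implementation-defined, so this is a sketch for GCC/Clang-style compilers and not a portable layout.

```c
#include <stdint.h>

/* Per-layer packet type, as in the example quoted in this mail. */
union layer_pkt_type {
	struct {
		uint16_t l2_type:4;
		uint16_t l3_type:4;
		uint16_t l4_type:4;
		uint16_t tun_type:4;
	};
	uint16_t u16;
};

struct pkt_type {
	union layer_pkt_type outer;
	union layer_pkt_type inner;
};

/*
 * Hypothetical helper: after the tunnel header has been stripped, the
 * former inner layer becomes the outer one, so later code can look at
 * pt->outer without knowing whether the packet arrived encapsulated.
 */
static inline void ex_decap(struct pkt_type *pt)
{
	pt->outer = pt->inner;
	pt->inner.u16 = 0; /* assumption: no information about deeper layers */
}
```

The cost raised in the reply above still applies: this layout takes 32 bits per packet, and there is no free room for it in the first cache line of the mbuf.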