From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Hu, Jiayu"
To: Stephen Hemminger, "Ananyev, Konstantin"
CC: Ilya Matveychikov, "dev@dpdk.org"
Date: Thu, 7 Dec 2017 08:31:35 +0000
Subject: Re: [dpdk-dev] A question about GRO neighbor packet matching
References: <4F9781B2-338C-4154-BDA1-BC24D1B2B689@gmail.com>
 <20171206101200.031afa39@shemminger-XPS-13-9360>
 <2111ED2C-DB90-4AE3-893E-2406EFE129AD@gmail.com>
 <20171206151532.3abaf2fb@xeon-e3>
 <2601191342CEEE43887BDE71AB9772585FAC57C1@irsmsx105.ger.corp.intel.com>
 <20171206170157.1d839de0@xeon-e3>
In-Reply-To: <20171206170157.1d839de0@xeon-e3>
List-Id: DPDK patches and discussions

Hi all,

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Thursday, December 7, 2017 9:02 AM
> To: Ananyev, Konstantin
> Cc: Ilya Matveychikov; dev@dpdk.org; Hu, Jiayu
> Subject: Re: [dpdk-dev] A question about GRO neighbor packet matching
>
> On Thu, 7 Dec 2017 00:19:46 +0000
> "Ananyev, Konstantin" wrote:
>
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Stephen Hemminger
> > > Sent: Wednesday, December 6, 2017 11:16 PM
> > > To: Ilya Matveychikov
> > > Cc: dev@dpdk.org; Hu, Jiayu
> > > Subject: Re: [dpdk-dev] A question about GRO neighbor packet matching
> > >
> > > On Wed, 6 Dec 2017 22:38:12 +0400
> > > Ilya Matveychikov wrote:
> > >
> > > > > On Dec 6, 2017, at 10:12 PM, Stephen Hemminger wrote:
> > > > >
> > > > > On Wed, 6 Dec 2017 18:02:21 +0400
> > > > > Ilya Matveychikov wrote:
> > > > >
> > > > >> Hello all,
> > > > >>
> > > > >>
> > > > >> My question is about neighbor packet matching algorithm for TCP. Is it
> > > > >> correct to expect that IP packets should have continuous ID enumeration
> > > > >> (i.e. iph-next.id = iph-prev.id + 1)?
> > > > >
> > > > >
> > > > > No.
> > > > >
> > > > >> ~~~
> > > > >> lib/librte_gro/gro_tcp4.c:check_seq_option()
> > > > >> ...
> > > > >> /* check if the two packets are neighbors */
> > > > >> tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
> > > > >> if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
> > > > >>         (ip_id == (item->ip_id + 1)))
> > > > >>     /* append the new packet */
> > > > >>     return 1;
> > > > >> else if (((sent_seq + tcp_dl) == item->sent_seq) &&
> > > > >>         ((ip_id + item->nb_merged) == item->ip_id))
> > > > >>     /* pre-pend the new packet */
> > > > >>     return -1;
> > > > >> else
> > > > >>     return 0;
> > > > >> ~~~
> > > > >>
> > > > >> As per RFC791:
> > > > >>
> > > > >> Identification: 16 bits
> > > > >>
> > > > >> An identifying value assigned by the sender to aid in assembling the
> > > > >> fragments of a datagram.
> > > > >
> > > > > The IP header id is meaningless in most TCP sessions.
> > > > > Good TCP implementations use PMTU discovery which sets the Don't Fragment bit.
> > > > > With DF, the IP id is unused (since no fragmentation).
> > > > > Many implementations just send 0 since generating unique IP id requires an
> > > > > atomic operation which is potential bottleneck.
> > > >
> > > > So, is my question correct and the code is wrong?
> > > >
> > >
> > > Yes. This code is wrong on several areas.
> > > * The ip_id on TCP flows is irrelevant.

@Stephen and @Konstantin:
In the latest Linux, GRO supports two kinds of IP ID: fixed or incremental.
You can see this in commit 1530545ed64b42e87acb43c0c16401bd1ebae6bf.
It uses "NAPI_GRO_CB(skb)->is_atomic" to reflect whether the IP ID is ignored.
Linux GRO only checks the IP ID for packets which are non-atomic (is_atomic
is 0), and these packets use an incremental IP ID. The others, which are
atomic, use a fixed IP ID, and Linux doesn't check their IP ID.
You can see the code in tcp_offload.c:
	if (NAPI_GRO_CB(p)->flush_id != 1 ||
	    NAPI_GRO_CB(p)->count != 1 ||
	    !NAPI_GRO_CB(p)->is_atomic)
		flush |= NAPI_GRO_CB(p)->flush_id;

In af_inet.c, is_atomic is set:
	NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));

I haven't figured out which kind of packets are set to is_atomic in Linux.
Maybe Linux has followed RFC 6864. I need to investigate further. In
particular, we plan to support tunneled GRO, where the outer IP ID will
encounter the same issue. If you have any suggestions, they would be highly
appreciated.

> > > * packet should only be merged if TCP flags are the same.

@Stephen, we do check TCP flags when deciding whether two packets can be merged.

Thanks,
Jiayu

> > >
> > >
> > > The author should look at Linux net/ipv4/tcp_offload.c
> >
> > As I remember, linux GRO implementation *does* require that IP IDs
> > of the merging packets to be continuous.
> >
> > net/ipv4/af_inet.c:
> > static struct sk_buff **inet_gro_receive(struct sk_buff **head,
> >                                          struct sk_buff *skb)
> > {
> > ...
> >     id = ntohl(*(__be32 *)&iph->id);
> >     flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
> >     id >>= 16;
> >
> > ...
> >
> >     NAPI_GRO_CB(p)->flush_id =
> >         ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
> >     NAPI_GRO_CB(p)->flush |= flush;
> > ....
> >
> > And then at net/ipv4/tcp_offload.c:
> > struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
> > {
> > ...
> >     /* Include the IP ID check below from the inner most IP hdr */
> >     flush = NAPI_GRO_CB(p)->flush | NAPI_GRO_CB(p)->flush_id;
> > ...
> >     if (flush || skb_gro_receive(head, skb)) {
> > ...
> >
> > The reason why we do need to check that IP ID is continuous -
> > DPDK GRO library doesn't strip off IPv4 header, instead it has to merge
> > them into one.
> > If IP ID would be non-contiguous it is unclear which one should be to used.
> > By same reason packets with different IP/TCP options are not allowed.
> > So in that case GRO lib makes a decision that it isn't safe to merge these
> > packets.
> > As I understand linux does pretty much the same.
> > Konstantin
>
> You are right, but still not sure that Linux and DPDK are doing
> the same thing with reordered packets.
>
> Ok, went RFC hunting and the relevant one seems to be RFC 6864.
> It mandates unique id's for each datagram so TCP does send them.
>
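
One possible direction for the neighbor check discussed in this thread is
sketched below. It is a minimal illustration, not the librte_gro code: the
names struct gro_item and check_neighbor, and the field layout, are
hypothetical. It assumes that fragments never reach this check, so DF set is
treated as "atomic" in the RFC 6864 sense, that item->ip_id holds the ID of
the most recently appended packet (mirroring the check quoted above), and
that packets are never merged across a change of the DF bit. With DF set the
IP ID comparison is skipped; without DF the incremental ID is still required.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a packet run already held by GRO. */
struct gro_item {
	uint32_t sent_seq;    /* TCP sequence number of the first merged byte */
	uint32_t payload_len; /* TCP payload bytes merged so far */
	uint16_t ip_id;       /* IPv4 ID of the last packet appended */
	uint16_t nb_merged;   /* number of packets merged into this item */
	bool df;              /* DF bit of the merged packets */
};

/*
 * Return 1 if the new packet can be appended to the item, -1 if it can
 * be prepended, and 0 if the two are not neighbors.
 */
static int
check_neighbor(const struct gro_item *item, uint32_t sent_seq,
	       uint32_t tcp_dl, uint16_t ip_id, bool df)
{
	bool id_ok_append = true;
	bool id_ok_prepend = true;

	/* Assumption: refuse to merge when the DF bit differs. */
	if (item->df != df)
		return 0;

	/* Without DF, keep requiring an incremental IP ID (mod 2^16). */
	if (!df) {
		id_ok_append = ip_id == (uint16_t)(item->ip_id + 1);
		id_ok_prepend =
			(uint16_t)(ip_id + item->nb_merged) == item->ip_id;
	}

	/* Sequence arithmetic wraps naturally in uint32_t. */
	if (sent_seq == item->sent_seq + item->payload_len && id_ok_append)
		return 1;
	if (sent_seq + tcp_dl == item->sent_seq && id_ok_prepend)
		return -1;
	return 0;
}
```

With this sketch, a DF flow that sends IP ID 0 on every segment is still
merged, while a non-DF flow is merged only when its IDs are consecutive.
Whether refusing to merge across a DF change is the right policy, and how the
merged header's IP ID should be chosen, are left open here.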