From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 79A1137A6 for ; Fri, 30 Jun 2017 17:40:31 +0200 (CEST) Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 30 Jun 2017 08:40:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,287,1496127600"; d="scan'208";a="987273527" Received: from fmsmsx105.amr.corp.intel.com ([10.18.124.203]) by orsmga003.jf.intel.com with ESMTP; 30 Jun 2017 08:40:30 -0700 Received: from fmsmsx112.amr.corp.intel.com (10.18.116.6) by FMSMSX105.amr.corp.intel.com (10.18.124.203) with Microsoft SMTP Server (TLS) id 14.3.319.2; Fri, 30 Jun 2017 08:40:24 -0700 Received: from shsmsx103.ccr.corp.intel.com (10.239.4.69) by FMSMSX112.amr.corp.intel.com (10.18.116.6) with Microsoft SMTP Server (TLS) id 14.3.319.2; Fri, 30 Jun 2017 08:40:23 -0700 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.146]) by SHSMSX103.ccr.corp.intel.com ([169.254.4.116]) with mapi id 14.03.0319.002; Fri, 30 Jun 2017 23:40:21 +0800 From: "Hu, Jiayu" To: "Ananyev, Konstantin" CC: "dev@dpdk.org" , "Tan, Jianfeng" , "stephen@networkplumber.org" , "yliu@fridaylinux.org" , "Wu, Jingjing" , "Yao, Lei A" , "Wiles, Keith" , "Bie, Tiwei" Thread-Topic: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support Thread-Index: AQHS8Zls7mU87+hBrkSDOYJ6Z34HKKI9arhg Date: Fri, 30 Jun 2017 15:40:20 +0000 Message-ID: References: <1498229000-94867-1-git-send-email-jiayu.hu@intel.com> <1498459430-116048-1-git-send-email-jiayu.hu@intel.com> <1498459430-116048-3-git-send-email-jiayu.hu@intel.com> <2601191342CEEE43887BDE71AB9772583FB13208@IRSMSX109.ger.corp.intel.com> <20170629022614.GB123467@dpdk15.sh.intel.com> <2601191342CEEE43887BDE71AB9772583FB18595@IRSMSX109.ger.corp.intel.com> In-Reply-To: <2601191342CEEE43887BDE71AB9772583FB18595@IRSMSX109.ger.corp.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 10.0.102.7 dlp-reaction: no-action x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jun 2017 15:40:32 -0000 Hi Konstantin, > -----Original Message----- > From: Ananyev, Konstantin > Sent: Friday, June 30, 2017 8:07 PM > To: Hu, Jiayu > Cc: dev@dpdk.org; Tan, Jianfeng ; > stephen@networkplumber.org; yliu@fridaylinux.org; Wu, Jingjing > ; Yao, Lei A ; Wiles, Keith > ; Bie, Tiwei > Subject: RE: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support >=20 > Hi Jiayu, >=20 > > > > + > > > > +int32_t > > > > +gro_tcp4_reassemble(struct rte_mbuf *pkt, > > > > + struct gro_tcp_tbl *tbl, > > > > + uint32_t max_packet_size) > > > > +{ > > > > + struct ether_hdr *eth_hdr; > > > > + struct ipv4_hdr *ipv4_hdr; > > > > + struct tcp_hdr *tcp_hdr; > > > > + uint16_t ipv4_ihl, tcp_hl, tcp_dl; > > > > + > > > > + struct tcp_key key; > > > > + uint32_t cur_idx, prev_idx, item_idx; > > > > + uint32_t i, key_idx; > > > > + > > > > + eth_hdr =3D rte_pktmbuf_mtod(pkt, struct ether_hdr *); > > > > + ipv4_hdr =3D (struct ipv4_hdr *)(eth_hdr + 1); > > > > + ipv4_ihl =3D IPv4_HDR_LEN(ipv4_hdr); > > > > + > > > > + /* check if the packet should be processed */ > > > > + if (ipv4_ihl < sizeof(struct ipv4_hdr)) > > > > + goto fail; > > > > + tcp_hdr =3D (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl); > > > > + tcp_hl =3D TCP_HDR_LEN(tcp_hdr); > > > > + tcp_dl =3D rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl > > > > + - tcp_hl; > > > > + if (tcp_dl =3D=3D 0) > > > > + goto fail; > > > > + > > > > + /* find a key and traverse all packets in its item group */ > > > > + key.eth_saddr =3D eth_hdr->s_addr; > > > > + key.eth_daddr =3D eth_hdr->d_addr; > > > > + key.ip_src_addr[0] =3D rte_be_to_cpu_32(ipv4_hdr->src_addr); > > > > + key.ip_dst_addr[0] =3D rte_be_to_cpu_32(ipv4_hdr->dst_addr); > > > > > > Your key.ip_src_addr[1-3] still contains some junk. > > > How memcmp below supposed to worj properly? > > > > When allocate an item, we already guarantee the content of its > > memory space is 0. So memcpy won't be error. >=20 > key is allocated on the stack. > Obviously fileds that are not initialized manually will contain undefined= values, > i.e.: ip_src-addr[1-3]. > Then below you are doing: > memcp((&(tbl->keys[i].key), &key, sizeof(struct tcp_key)); > ... > memcpy(&(tbl->keys[key_idx].key), &key, sizeof(struct tcp_key)); >=20 > So I still think you are comparing/copying some junk here. Oh, yes. Key is allocated in stack. Thanks, I will modify it. >=20 > > > > > BTW why do you need 4 elems here, why just not uint32_t ip_src_addr;? > > > Same for ip_dst_addr. > > > > I think tcp6 and tcp4 can share the same table structure. So I use > > 128bit IP address here. You mean we need to use different structures > > for tcp4 and tcp6? >=20 > That would be my preference. Got it. I will modify it. Thanks. >=20 > > > > > > > > > + key.src_port =3D rte_be_to_cpu_16(tcp_hdr->src_port); > > > > + key.dst_port =3D rte_be_to_cpu_16(tcp_hdr->dst_port); > > > > + key.recv_ack =3D rte_be_to_cpu_32(tcp_hdr->recv_ack); > > > > + key.tcp_flags =3D tcp_hdr->tcp_flags; > > > > + > > > > + for (i =3D 0; i < tbl->max_key_num; i++) { > > > > + if (tbl->keys[i].is_valid && > > > > + (memcmp(&(tbl->keys[i].key), &key, > > > > + sizeof(struct tcp_key)) > > > > + =3D=3D 0)) { > > > > + cur_idx =3D tbl->keys[i].start_index; > > > > + prev_idx =3D cur_idx; > > > > + while (cur_idx !=3D INVALID_ARRAY_INDEX) { > > > > + if (check_seq_option(tbl->items[cur_idx].pkt, > > > > + tcp_hdr, > > > > + tcp_hl) > 0) { > > > > > > As I remember linux gro also check ipv4 packet_id - it should be > consecutive. > > > > IP fragmented packets have the same IP ID, but others are consecutive. >=20 > Yes, you assume that they are consecutive. > But the only way to know for sure that they are - check it. Yes, I will modify it. Thanks. > Another thing - I think we need to perform GRO only for TCP packets with > only ACK bit set > (i.e. - no GRO for FIN/SYN/PUSH/URG/, etc.). Currently, we don't process packets whose payload length is 0. So if the pa= ckets are SYN or FIN, we won't process them, since their payload length is 0. For= URG, PSH and RST, TCP/IPv4 GRO may still process these packets. You are right. We shouldn't process packets whose URG, PSH or RST bit is se= t. Thanks, I will modify it.=20 >=20 > > As we suppose GRO can merge IP fragmented packets, so I think we > shouldn't check if > > the IP ID is consecutive here. How do you think? > > > > > > > > > + if (merge_two_tcp4_packets( > > > > + tbl- > >items[cur_idx].pkt, > > > > + pkt, > > > > + > max_packet_size) > 0) { > > > > + /* successfully merge two > packets */ > > > > + tbl- > >items[cur_idx].is_groed =3D 1; > > > > + return 1; > > > > + } > > > > > > If you allow more then packet per flow to be stored in the table, the= n you > should be > > > prepared that new segment can fill a gap between 2 packets. > > > Probably the easiest thing - don't allow more then one 'item' per flo= w. > > > > We allow the table to store same flow but out-of-order arriving packets= . > For > > these packets, they will occupy different 'item' and we won't re-merge > them. > > For example, there are three same flow tcp packets: p1, p2 and p3. And = p1 > arrives > > first, then p3, and last is p2. So TCP GRO will allocate one 'item' for= p1 and > one > > 'item' for p3, and when p2 arrives, p2 will be merged with p1. Therefor= e, in > the > > table, we will have two 'item': item1 to store merged p1 and p2, item2 = to > store p3. > > > > As you can see, TCP GRO can only merges sequential arriving packets. If= we > want to > > merge all out-of-order arriving packets, we need to re-process these > packets which > > are already processed and have one 'item'. IMO, this procedure will be = very > complicated. > > So we don't do that. > > > > Sorry, I don't understand how to allow one 'item' per-flow. Because > packets are arriving > > out-of-order. If we don't re-process these packets which already have o= ne > 'item', how to > > guarantee it? >=20 > As I understand you'll have an input array: > - you wouldn't be able to merge it. > So I think your merge need be prepared to merge both smaller and bigger > sequence. Oh yes, it's much better. I will modify it. > About one item my thought : instead of allowing N items per key(flow) - f= or > simplicity > just allow one item per flow. > In that case we wouldn't allow holes, but still would be able to merge > reordered packets. > Alternatively you can store items ordered by seq, and after merge, try to > merge neighbor > Items too. Yes, when we insert a new item, we can chain it with the packets of its ite= m group ordered by seq. After processing all packets, we need to traverse eac= h item group and try to merge neighbors. But when will 'merge neighbors' happ= en? When flush packets from the table? (e.g. gro_tcp_tbl_timeout_flush) BRs, Jiayu >=20 > > > > > > > > > + /** > > > > + * fail to merge two packets since > > > > + * it's beyond the max packet length. > > > > + * Insert it into the item group. > > > > + */ > > > > + goto insert_to_item_group; > > > > + } else { > > > > + prev_idx =3D cur_idx; > > > > + cur_idx =3D tbl- > >items[cur_idx].next_pkt_idx; > > > > + } > > > > + } > > > > + /** > > > > + * find a corresponding item group but fails to find > > > > + * one packet to merge. Insert it into this item group. > > > > + */ > > > > +insert_to_item_group: > > > > + item_idx =3D find_an_empty_item(tbl); > > > > + /* the item number is greater than the max value */ > > > > + if (item_idx =3D=3D INVALID_ARRAY_INDEX) > > > > + return -1; > > > > + tbl->items[prev_idx].next_pkt_idx =3D item_idx; > > > > + tbl->items[item_idx].pkt =3D pkt; > > > > + tbl->items[item_idx].is_groed =3D 0; > > > > + tbl->items[item_idx].next_pkt_idx =3D > INVALID_ARRAY_INDEX; > > > > + tbl->items[item_idx].is_valid =3D 1; > > > > + tbl->items[item_idx].start_time =3D rte_rdtsc(); > > > > + tbl->item_num++; > > > > + return 0; > > > > + } > > > > + } > > > > + > > > > + /** > > > > + * merge fail as the given packet has a new key. > > > > + * So insert a new key. > > > > + */ > > > > + item_idx =3D find_an_empty_item(tbl); > > > > + key_idx =3D find_an_empty_key(tbl); > > > > + /** > > > > + * if current key or item number is greater than the max > > > > + * value, don't insert the packet into the table and return > > > > + * immediately. > > > > + */ > > > > + if (item_idx =3D=3D INVALID_ARRAY_INDEX || > > > > + key_idx =3D=3D INVALID_ARRAY_INDEX) > > > > + return -1; > > > > + tbl->items[item_idx].pkt =3D pkt; > > > > + tbl->items[item_idx].next_pkt_idx =3D INVALID_ARRAY_INDEX; > > > > + tbl->items[item_idx].is_groed =3D 0; > > > > + tbl->items[item_idx].is_valid =3D 1; > > > > + tbl->items[item_idx].start_time =3D rte_rdtsc(); > > >