From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
 by dpdk.org (Postfix) with ESMTP id 04A7337A0
 for <dev@dpdk.org>; Fri, 30 Jun 2017 14:07:17 +0200 (CEST)
Received: from fmsmga004.fm.intel.com ([10.253.24.48])
 by fmsmga105.fm.intel.com with ESMTP; 30 Jun 2017 05:07:17 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.40,286,1496127600"; d="scan'208";a="280606196"
Received: from irsmsx107.ger.corp.intel.com ([163.33.3.99])
 by fmsmga004.fm.intel.com with ESMTP; 30 Jun 2017 05:07:16 -0700
Received: from irsmsx109.ger.corp.intel.com ([169.254.13.115]) by
 IRSMSX107.ger.corp.intel.com ([169.254.10.129]) with mapi id 14.03.0319.002;
 Fri, 30 Jun 2017 13:07:14 +0100
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Hu, Jiayu" <jiayu.hu@intel.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, "Tan, Jianfeng" <jianfeng.tan@intel.com>,
 "stephen@networkplumber.org" <stephen@networkplumber.org>,
 "yliu@fridaylinux.org" <yliu@fridaylinux.org>, "Wu, Jingjing"
 <jingjing.wu@intel.com>, "Yao, Lei A" <lei.a.yao@intel.com>, "Wiles, Keith"
 <keith.wiles@intel.com>, "Bie, Tiwei" <tiwei.bie@intel.com>
Thread-Topic: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
Thread-Index: AQHS7kdlXNHGhRYN9Eipev8pdR3kGKI6kXCQgAB/Z4CAAjYfoA==
Date: Fri, 30 Jun 2017 12:07:13 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772583FB18595@IRSMSX109.ger.corp.intel.com>
References: <1498229000-94867-1-git-send-email-jiayu.hu@intel.com>
 <1498459430-116048-1-git-send-email-jiayu.hu@intel.com>
 <1498459430-116048-3-git-send-email-jiayu.hu@intel.com>
 <2601191342CEEE43887BDE71AB9772583FB13208@IRSMSX109.ger.corp.intel.com>
 <20170629022614.GB123467@dpdk15.sh.intel.com>
In-Reply-To: <20170629022614.GB123467@dpdk15.sh.intel.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
dlp-product: dlpe-windows
dlp-version: 10.0.102.7
dlp-reaction: no-action
x-originating-ip: [163.33.239.181]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jun 2017 12:07:18 -0000

Hi Jiayu,

> > > +
> > > +int32_t
> > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > +		struct gro_tcp_tbl *tbl,
> > > +		uint32_t max_packet_size)
> > > +{
> > > +	struct ether_hdr *eth_hdr;
> > > +	struct ipv4_hdr *ipv4_hdr;
> > > +	struct tcp_hdr *tcp_hdr;
> > > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
> > > +
> > > +	struct tcp_key key;
> > > +	uint32_t cur_idx, prev_idx, item_idx;
> > > +	uint32_t i, key_idx;
> > > +
> > > +	eth_hdr =3D rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > +	ipv4_hdr =3D (struct ipv4_hdr *)(eth_hdr + 1);
> > > +	ipv4_ihl =3D IPv4_HDR_LEN(ipv4_hdr);
> > > +
> > > +	/* check if the packet should be processed */
> > > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > > +		goto fail;
> > > +	tcp_hdr =3D (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > +	tcp_hl =3D TCP_HDR_LEN(tcp_hdr);
> > > +	tcp_dl =3D rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > > +		- tcp_hl;
> > > +	if (tcp_dl =3D=3D 0)
> > > +		goto fail;
> > > +
> > > +	/* find a key and traverse all packets in its item group */
> > > +	key.eth_saddr =3D eth_hdr->s_addr;
> > > +	key.eth_daddr =3D eth_hdr->d_addr;
> > > +	key.ip_src_addr[0] =3D rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > > +	key.ip_dst_addr[0] =3D rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> >
> > Your key.ip_src_addr[1-3] still contains some junk.
> > How memcmp below supposed to worj properly?
>=20
> When allocate an item, we already guarantee the content of its
> memory space is 0. So memcpy won't be error.

key is allocated on the stack.
Obviously fileds that are not initialized manually will contain undefined v=
alues,
i.e.: ip_src-addr[1-3].
Then below you are doing:
memcp((&(tbl->keys[i].key), &key, sizeof(struct tcp_key));
...
memcpy(&(tbl->keys[key_idx].key), &key, sizeof(struct tcp_key));

So I still think you are comparing/copying some junk here.

>=20
> > BTW why do you need 4 elems here, why just not uint32_t ip_src_addr;?
> > Same for ip_dst_addr.
>=20
> I think tcp6 and tcp4 can share the same table structure. So I use
> 128bit IP address here. You mean we need to use different structures
> for tcp4 and tcp6?

That would be my preference.

>=20
> >
> > > +	key.src_port =3D rte_be_to_cpu_16(tcp_hdr->src_port);
> > > +	key.dst_port =3D rte_be_to_cpu_16(tcp_hdr->dst_port);
> > > +	key.recv_ack =3D rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > > +	key.tcp_flags =3D tcp_hdr->tcp_flags;
> > > +
> > > +	for (i =3D 0; i < tbl->max_key_num; i++) {
> > > +		if (tbl->keys[i].is_valid &&
> > > +				(memcmp(&(tbl->keys[i].key), &key,
> > > +						sizeof(struct tcp_key))
> > > +				 =3D=3D 0)) {
> > > +			cur_idx =3D tbl->keys[i].start_index;
> > > +			prev_idx =3D cur_idx;
> > > +			while (cur_idx !=3D INVALID_ARRAY_INDEX) {
> > > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > > +							tcp_hdr,
> > > +							tcp_hl) > 0) {
> >
> > As I remember linux gro also check ipv4 packet_id - it should be consec=
utive.
>=20
> IP fragmented packets have the same IP ID, but others are consecutive.

Yes, you assume that they are consecutive.
But the only way to know for sure that they are - check it.
Another thing - I think we need to perform GRO only for TCP packets with on=
ly ACK bit set
(i.e. - no GRO for FIN/SYN/PUSH/URG/, etc.).

> As we  suppose GRO can merge IP fragmented packets, so I think we shouldn=
't check if
> the IP ID is consecutive here. How do you think?
>=20
> >
> > > +					if (merge_two_tcp4_packets(
> > > +								tbl->items[cur_idx].pkt,
> > > +								pkt,
> > > +								max_packet_size) > 0) {
> > > +						/* successfully merge two packets */
> > > +						tbl->items[cur_idx].is_groed =3D 1;
> > > +						return 1;
> > > +					}
> >
> > If you allow more then packet per flow to be stored in the table, then =
you should be
> > prepared that new segment can fill a gap between 2 packets.
> > Probably the easiest thing - don't allow more then one 'item' per flow.
>=20
> We allow the table to store same flow but out-of-order arriving packets. =
For
> these packets, they will occupy different 'item' and we won't re-merge th=
em.
> For example, there are three same flow tcp packets: p1, p2 and p3. And p1=
 arrives
> first, then p3, and last is p2. So TCP GRO will allocate one 'item' for p=
1 and one
> 'item' for p3, and when p2 arrives, p2 will be merged with p1. Therefore,=
 in the
> table, we will have two 'item': item1 to store merged p1 and p2, item2 to=
 store p3.
>=20
> As you can see, TCP GRO can only merges sequential arriving packets. If w=
e want to
> merge all out-of-order arriving packets, we need to re-process these pack=
ets which
> are already processed and have one 'item'. IMO, this procedure will be ve=
ry complicated.
> So we don't do that.
>=20
> Sorry, I don't understand how to allow one 'item' per-flow. Because packe=
ts are arriving
> out-of-order. If we don't re-process these packets which already have one=
 'item', how to
> guarantee it?

As I understand you'll have an input array:
<seq=3D2, seq=3D1> - you wouldn't be able to merge it.
So I think your merge need be prepared to merge both smaller and bigger seq=
uence.
About one item my thought : instead of allowing N items per key(flow) - for=
 simplicity
just allow one item per flow.
In that case we wouldn't allow holes, but still would be able to merge reor=
dered packets.
Alternatively you can store items ordered by seq, and after merge, try to m=
erge neighbor
Items too.

>=20
> >
> > > +					/**
> > > +					 * fail to merge two packets since
> > > +					 * it's beyond the max packet length.
> > > +					 * Insert it into the item group.
> > > +					 */
> > > +					goto insert_to_item_group;
> > > +				} else {
> > > +					prev_idx =3D cur_idx;
> > > +					cur_idx =3D tbl->items[cur_idx].next_pkt_idx;
> > > +				}
> > > +			}
> > > +			/**
> > > +			 * find a corresponding item group but fails to find
> > > +			 * one packet to merge. Insert it into this item group.
> > > +			 */
> > > +insert_to_item_group:
> > > +			item_idx =3D find_an_empty_item(tbl);
> > > +			/* the item number is greater than the max value */
> > > +			if (item_idx =3D=3D INVALID_ARRAY_INDEX)
> > > +				return -1;
> > > +			tbl->items[prev_idx].next_pkt_idx =3D item_idx;
> > > +			tbl->items[item_idx].pkt =3D pkt;
> > > +			tbl->items[item_idx].is_groed =3D 0;
> > > +			tbl->items[item_idx].next_pkt_idx =3D INVALID_ARRAY_INDEX;
> > > +			tbl->items[item_idx].is_valid =3D 1;
> > > +			tbl->items[item_idx].start_time =3D rte_rdtsc();
> > > +			tbl->item_num++;
> > > +			return 0;
> > > +		}
> > > +	}
> > > +
> > > +	/**
> > > +	 * merge fail as the given packet has a new key.
> > > +	 * So insert a new key.
> > > +	 */
> > > +	item_idx =3D find_an_empty_item(tbl);
> > > +	key_idx =3D find_an_empty_key(tbl);
> > > +	/**
> > > +	 * if current key or item number is greater than the max
> > > +	 * value, don't insert the packet into the table and return
> > > +	 * immediately.
> > > +	 */
> > > +	if (item_idx =3D=3D INVALID_ARRAY_INDEX ||
> > > +			key_idx =3D=3D INVALID_ARRAY_INDEX)
> > > +		return -1;
> > > +	tbl->items[item_idx].pkt =3D pkt;
> > > +	tbl->items[item_idx].next_pkt_idx =3D INVALID_ARRAY_INDEX;
> > > +	tbl->items[item_idx].is_groed =3D 0;
> > > +	tbl->items[item_idx].is_valid =3D 1;
> > > +	tbl->items[item_idx].start_time =3D rte_rdtsc();
> >