From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jiayu.hu@intel.com>
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by dpdk.org (Postfix) with ESMTP id 79A1137A6
 for <dev@dpdk.org>; Fri, 30 Jun 2017 17:40:31 +0200 (CEST)
Received: from orsmga003.jf.intel.com ([10.7.209.27])
 by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 30 Jun 2017 08:40:30 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.40,287,1496127600"; d="scan'208";a="987273527"
Received: from fmsmsx105.amr.corp.intel.com ([10.18.124.203])
 by orsmga003.jf.intel.com with ESMTP; 30 Jun 2017 08:40:30 -0700
Received: from fmsmsx112.amr.corp.intel.com (10.18.116.6) by
 FMSMSX105.amr.corp.intel.com (10.18.124.203) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Fri, 30 Jun 2017 08:40:24 -0700
Received: from shsmsx103.ccr.corp.intel.com (10.239.4.69) by
 FMSMSX112.amr.corp.intel.com (10.18.116.6) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Fri, 30 Jun 2017 08:40:23 -0700
Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.146]) by
 SHSMSX103.ccr.corp.intel.com ([169.254.4.116]) with mapi id 14.03.0319.002;
 Fri, 30 Jun 2017 23:40:21 +0800
From: "Hu, Jiayu" <jiayu.hu@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, "Tan, Jianfeng" <jianfeng.tan@intel.com>,
 "stephen@networkplumber.org" <stephen@networkplumber.org>,
 "yliu@fridaylinux.org" <yliu@fridaylinux.org>, "Wu, Jingjing"
 <jingjing.wu@intel.com>, "Yao, Lei A" <lei.a.yao@intel.com>, "Wiles, Keith"
 <keith.wiles@intel.com>, "Bie, Tiwei" <tiwei.bie@intel.com>
Thread-Topic: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
Thread-Index: AQHS8Zls7mU87+hBrkSDOYJ6Z34HKKI9arhg
Date: Fri, 30 Jun 2017 15:40:20 +0000
Message-ID: <ED946F0BEFE0A141B63BABBD629A2A9B3876E7D4@shsmsx102.ccr.corp.intel.com>
References: <1498229000-94867-1-git-send-email-jiayu.hu@intel.com>
 <1498459430-116048-1-git-send-email-jiayu.hu@intel.com>
 <1498459430-116048-3-git-send-email-jiayu.hu@intel.com>
 <2601191342CEEE43887BDE71AB9772583FB13208@IRSMSX109.ger.corp.intel.com>
 <20170629022614.GB123467@dpdk15.sh.intel.com>
 <2601191342CEEE43887BDE71AB9772583FB18595@IRSMSX109.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB9772583FB18595@IRSMSX109.ger.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
dlp-product: dlpe-windows
dlp-version: 10.0.102.7
dlp-reaction: no-action
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Jun 2017 15:40:32 -0000

Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, June 30, 2017 8:07 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Tan, Jianfeng <jianfeng.tan@intel.com>;
> stephen@networkplumber.org; yliu@fridaylinux.org; Wu, Jingjing
> <jingjing.wu@intel.com>; Yao, Lei A <lei.a.yao@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>; Bie, Tiwei <tiwei.bie@intel.com>
> Subject: RE: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
>=20
> Hi Jiayu,
>=20
> > > > +
> > > > +int32_t
> > > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > > +		struct gro_tcp_tbl *tbl,
> > > > +		uint32_t max_packet_size)
> > > > +{
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	struct tcp_hdr *tcp_hdr;
> > > > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
> > > > +
> > > > +	struct tcp_key key;
> > > > +	uint32_t cur_idx, prev_idx, item_idx;
> > > > +	uint32_t i, key_idx;
> > > > +
> > > > +	eth_hdr =3D rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > > +	ipv4_hdr =3D (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +	ipv4_ihl =3D IPv4_HDR_LEN(ipv4_hdr);
> > > > +
> > > > +	/* check if the packet should be processed */
> > > > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > > > +		goto fail;
> > > > +	tcp_hdr =3D (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > > +	tcp_hl =3D TCP_HDR_LEN(tcp_hdr);
> > > > +	tcp_dl =3D rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > > > +		- tcp_hl;
> > > > +	if (tcp_dl =3D=3D 0)
> > > > +		goto fail;
> > > > +
> > > > +	/* find a key and traverse all packets in its item group */
> > > > +	key.eth_saddr =3D eth_hdr->s_addr;
> > > > +	key.eth_daddr =3D eth_hdr->d_addr;
> > > > +	key.ip_src_addr[0] =3D rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > > > +	key.ip_dst_addr[0] =3D rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > >
> > > Your key.ip_src_addr[1-3] still contains some junk.
> > > How memcmp below supposed to worj properly?
> >
> > When allocate an item, we already guarantee the content of its
> > memory space is 0. So memcpy won't be error.
>=20
> key is allocated on the stack.
> Obviously fileds that are not initialized manually will contain undefined=
 values,
> i.e.: ip_src-addr[1-3].
> Then below you are doing:
> memcp((&(tbl->keys[i].key), &key, sizeof(struct tcp_key));
> ...
> memcpy(&(tbl->keys[key_idx].key), &key, sizeof(struct tcp_key));
>=20
> So I still think you are comparing/copying some junk here.

Oh, yes. Key is allocated in stack. Thanks, I will modify it.

>=20
> >
> > > BTW why do you need 4 elems here, why just not uint32_t ip_src_addr;?
> > > Same for ip_dst_addr.
> >
> > I think tcp6 and tcp4 can share the same table structure. So I use
> > 128bit IP address here. You mean we need to use different structures
> > for tcp4 and tcp6?
>=20
> That would be my preference.

Got it. I will modify it. Thanks.

>=20
> >
> > >
> > > > +	key.src_port =3D rte_be_to_cpu_16(tcp_hdr->src_port);
> > > > +	key.dst_port =3D rte_be_to_cpu_16(tcp_hdr->dst_port);
> > > > +	key.recv_ack =3D rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > > > +	key.tcp_flags =3D tcp_hdr->tcp_flags;
> > > > +
> > > > +	for (i =3D 0; i < tbl->max_key_num; i++) {
> > > > +		if (tbl->keys[i].is_valid &&
> > > > +				(memcmp(&(tbl->keys[i].key), &key,
> > > > +						sizeof(struct tcp_key))
> > > > +				 =3D=3D 0)) {
> > > > +			cur_idx =3D tbl->keys[i].start_index;
> > > > +			prev_idx =3D cur_idx;
> > > > +			while (cur_idx !=3D INVALID_ARRAY_INDEX) {
> > > > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > > > +							tcp_hdr,
> > > > +							tcp_hl) > 0) {
> > >
> > > As I remember linux gro also check ipv4 packet_id - it should be
> consecutive.
> >
> > IP fragmented packets have the same IP ID, but others are consecutive.
>=20
> Yes, you assume that they are consecutive.
> But the only way to know for sure that they are - check it.

Yes, I will modify it. Thanks.

> Another thing - I think we need to perform GRO only for TCP packets with
> only ACK bit set
> (i.e. - no GRO for FIN/SYN/PUSH/URG/, etc.).

Currently, we don't process packets whose payload length is 0. So if the pa=
ckets
are SYN or FIN, we won't process them, since their payload length is 0. For=
 URG,
PSH and RST, TCP/IPv4 GRO may still process these packets.

You are right. We shouldn't process packets whose URG, PSH or RST bit is se=
t.
Thanks, I will modify it.=20

>=20
> > As we  suppose GRO can merge IP fragmented packets, so I think we
> shouldn't check if
> > the IP ID is consecutive here. How do you think?
> >
> > >
> > > > +					if (merge_two_tcp4_packets(
> > > > +								tbl-
> >items[cur_idx].pkt,
> > > > +								pkt,
> > > > +
> 	max_packet_size) > 0) {
> > > > +						/* successfully merge two
> packets */
> > > > +						tbl-
> >items[cur_idx].is_groed =3D 1;
> > > > +						return 1;
> > > > +					}
> > >
> > > If you allow more then packet per flow to be stored in the table, the=
n you
> should be
> > > prepared that new segment can fill a gap between 2 packets.
> > > Probably the easiest thing - don't allow more then one 'item' per flo=
w.
> >
> > We allow the table to store same flow but out-of-order arriving packets=
.
> For
> > these packets, they will occupy different 'item' and we won't re-merge
> them.
> > For example, there are three same flow tcp packets: p1, p2 and p3. And =
p1
> arrives
> > first, then p3, and last is p2. So TCP GRO will allocate one 'item' for=
 p1 and
> one
> > 'item' for p3, and when p2 arrives, p2 will be merged with p1. Therefor=
e, in
> the
> > table, we will have two 'item': item1 to store merged p1 and p2, item2 =
to
> store p3.
> >
> > As you can see, TCP GRO can only merges sequential arriving packets. If=
 we
> want to
> > merge all out-of-order arriving packets, we need to re-process these
> packets which
> > are already processed and have one 'item'. IMO, this procedure will be =
very
> complicated.
> > So we don't do that.
> >
> > Sorry, I don't understand how to allow one 'item' per-flow. Because
> packets are arriving
> > out-of-order. If we don't re-process these packets which already have o=
ne
> 'item', how to
> > guarantee it?
>=20
> As I understand you'll have an input array:
> <seq=3D2, seq=3D1> - you wouldn't be able to merge it.
> So I think your merge need be prepared to merge both smaller and bigger
> sequence.

Oh yes, it's much better. I will modify it.

> About one item my thought : instead of allowing N items per key(flow) - f=
or
> simplicity
> just allow one item per flow.
> In that case we wouldn't allow holes, but still would be able to merge
> reordered packets.
> Alternatively you can store items ordered by seq, and after merge, try to
> merge neighbor
> Items too.

Yes, when we insert a new item, we can chain it with the packets of its ite=
m
group ordered by seq. After processing all packets, we need to traverse eac=
h
item group and try to merge neighbors. But when will 'merge neighbors' happ=
en?
When flush packets from the table? (e.g.  gro_tcp_tbl_timeout_flush)

BRs,
Jiayu
>=20
> >
> > >
> > > > +					/**
> > > > +					 * fail to merge two packets since
> > > > +					 * it's beyond the max packet length.
> > > > +					 * Insert it into the item group.
> > > > +					 */
> > > > +					goto insert_to_item_group;
> > > > +				} else {
> > > > +					prev_idx =3D cur_idx;
> > > > +					cur_idx =3D tbl-
> >items[cur_idx].next_pkt_idx;
> > > > +				}
> > > > +			}
> > > > +			/**
> > > > +			 * find a corresponding item group but fails to find
> > > > +			 * one packet to merge. Insert it into this item group.
> > > > +			 */
> > > > +insert_to_item_group:
> > > > +			item_idx =3D find_an_empty_item(tbl);
> > > > +			/* the item number is greater than the max value */
> > > > +			if (item_idx =3D=3D INVALID_ARRAY_INDEX)
> > > > +				return -1;
> > > > +			tbl->items[prev_idx].next_pkt_idx =3D item_idx;
> > > > +			tbl->items[item_idx].pkt =3D pkt;
> > > > +			tbl->items[item_idx].is_groed =3D 0;
> > > > +			tbl->items[item_idx].next_pkt_idx =3D
> INVALID_ARRAY_INDEX;
> > > > +			tbl->items[item_idx].is_valid =3D 1;
> > > > +			tbl->items[item_idx].start_time =3D rte_rdtsc();
> > > > +			tbl->item_num++;
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/**
> > > > +	 * merge fail as the given packet has a new key.
> > > > +	 * So insert a new key.
> > > > +	 */
> > > > +	item_idx =3D find_an_empty_item(tbl);
> > > > +	key_idx =3D find_an_empty_key(tbl);
> > > > +	/**
> > > > +	 * if current key or item number is greater than the max
> > > > +	 * value, don't insert the packet into the table and return
> > > > +	 * immediately.
> > > > +	 */
> > > > +	if (item_idx =3D=3D INVALID_ARRAY_INDEX ||
> > > > +			key_idx =3D=3D INVALID_ARRAY_INDEX)
> > > > +		return -1;
> > > > +	tbl->items[item_idx].pkt =3D pkt;
> > > > +	tbl->items[item_idx].next_pkt_idx =3D INVALID_ARRAY_INDEX;
> > > > +	tbl->items[item_idx].is_groed =3D 0;
> > > > +	tbl->items[item_idx].is_valid =3D 1;
> > > > +	tbl->items[item_idx].start_time =3D rte_rdtsc();
> > >