From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jianfeng.tan@intel.com>
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
 by dpdk.org (Postfix) with ESMTP id 625425A3E
 for <dev@dpdk.org>; Wed, 21 Jun 2017 01:30:15 +0200 (CEST)
Received: from orsmga002.jf.intel.com ([10.7.209.21])
 by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 20 Jun 2017 16:30:14 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.39,366,1493708400"; d="scan'208";a="101777403"
Received: from tanjianf-mobl.ccr.corp.intel.com (HELO [10.255.27.34])
 ([10.255.27.34])
 by orsmga002.jf.intel.com with ESMTP; 20 Jun 2017 16:30:09 -0700
To: Jiayu Hu <jiayu.hu@intel.com>
References: <1496833731-53653-1-git-send-email-jiayu.hu@intel.com>
 <1497770469-16661-1-git-send-email-jiayu.hu@intel.com>
 <1497770469-16661-3-git-send-email-jiayu.hu@intel.com>
 <20fd3a2c-9b61-2732-5a34-5acb8fc639a0@intel.com>
 <20170620032220.GB12728@localhost.localdomain>
Cc: dev@dpdk.org, konstantin.ananyev@intel.com, yliu@fridaylinux.org,
 keith.wiles@intel.com, tiwei.bie@intel.com, lei.a.yao@intel.com
From: "Tan, Jianfeng" <jianfeng.tan@intel.com>
Message-ID: <21e9e28b-ba41-b88d-1f9a-b022b0d2c5ce@intel.com>
Date: Wed, 21 Jun 2017 07:30:08 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <20170620032220.GB12728@localhost.localdomain>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 20 Jun 2017 23:30:17 -0000

Hi Jiayu,


On 6/20/2017 11:22 AM, Jiayu Hu wrote:
> Hi Jianfeng,
>
> On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote:
>>
>> On 6/18/2017 3:21 PM, Jiayu Hu wrote:
>>> In this patch, we introduce six APIs to support TCP/IPv4 GRO.
>> Those functions are not used outside of this library. Don't make it as
>> extern visible.
> But they are called by functions in rte_gro.c, which are in the different
> file. If we define these functions with static, how can they be called by
> other functions in the different file?

We can define some ops for GRO engines. And in each GRO engine, tcp4 in 
this case, we just need to register those ops; then we can iterate all 
GRO engines in rte_gro,c. It's a better way for other developers to 
contribute other GRO engines.

>
>>> - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
>>>       merge packets.
>> Will tcp6 shares the same function with tcp4? If no, please rename it to
>> gro_tcp4_tlb_create
> In TCP GRO design, TCP4 and TCP6 will share a same table structure, but
> they will have different reassembly function. Therefore, I use
> gro_tcp_tlb_create instead of gro_tcp4_tlb_create here.

Then as far as I see, we are gonna call this function for all GRO 
engines except different flow structures are allocated for different GRO 
engines. Then I suggest we put this function into rte_gro.c.

>
>>> - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table.
>>> - gro_tcp_tbl_flush: flush packets in the TCP reassembly table.
>>> - gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP
>>>       reassembly table.
>>> - gro_tcp4_reassemble: merge an inputted packet.
>>> - gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for
>>>       all merged packets in the TCP reassembly table.
>>>
>>> In TCP GRO, we use a table structure, called TCP reassembly table, to
>>> reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
>>> structure. A TCP reassembly table includes a flow array and a item array,
>>> where the flow array is used to record flow information and the item
>>> array is used to record packets information.
>>>
>>> Each element in the flow array records the information of one flow,
>>> which includes two parts:
>>> - key: the criteria of the same flow. If packets have the same key
>>>       value, they belong to the same flow.
>>> - start_index: the index of the first incoming packet of this flow in
>>>       the item array. With start_index, we can locate the first incoming
>>>       packet of this flow.
>>> Each element in the item array records one packet information. It mainly
>>> includes two parts:
>>> - pkt: packet address
>>> - next_pkt_index: index of the next packet of the same flow in the item
>>>       array. All packets of the same flow are chained by next_pkt_index.
>>>       With next_pkt_index, we can locate all packets of the same flow
>>>       one by one.
>>>
>>> To process an incoming packet, we need three steps:
>>> a. check if the packet should be processed. Packets with the following
>>>       properties won't be processed:
>>> 	- packets without data;
>>> 	- packets with wrong checksums;
>> Why do we care to check this kind of error? Can we just suppose the
>> applications have already dropped the packets with wrong cksum?
> Indeed, if we assume all inputted packets are correct, we can avoid
> checksum checking overhead. But as a library, I think a more flexible
> way is to enable applications to tell GRO API if checksum checking
> is needed. For example, we can add a flag to struct rte_gro_tbl
> and struct rte_gro_param, which indicates if the checksum checking
> is needed. If applications set this flag, reassembly function won't
> check packet checksum. Otherwise, we check the checksum. How do you
> think?

My opinion is to keep the library focusing on what it does, and make 
clear its dependency. This flag thing will differ for different GRO 
engines, which makes it a little complicated to me.

>
>>> 	- fragmented packets.
>> IP fragmented? I don't think we need to check it here either. It's the
>> application's responsibility to call librte_ip_frag firstly to reassemble
>> IP-fragmented packets, and then call this gro library to merge TCP packets.
>> And this procedure should be shown in an example for other users to refer.
>>
>>> b. traverse the flow array to find a flow which the packet belongs to.
>>>       If not find, insert a new flow and store the packet into the item
>>>       array.
>> You do not store the packet now. "store the packet into the item array" ->
>> "then go to step c".
> Thanks, I will update it in next patch.
>
>>> c. locate the first packet of this flow in the item array via
>>>       start_index. Then traverse all packets of this flow one by one via
>>>       next_pkt_index. If find one packet to merge with the incoming packet,
>>>       merge them but without updating checksums. If not, allocate one item
>>>       in the item array to store the incoming packet and update
>>>       next_pkt_index value.
>>>
>>> For better performance, we don't update header checksums once two
>>> packets are merged. The header checksums are updated only when packets
>>> are flushed from TCP reassembly tables.
>> Why do we care to recalculate the L4 checksum when flushing? How about Just
>> keeping the wrong cksum, and letting the applications to handle that?
> Not all applications want GROed packets with wrong checksum. So I think a
> more reasonable way is to give a flag to applications to tell GRO API if
> they need calculate checksum when flush them from GRO table. How do you
> think?

The two main directions: (1) to be sent out from physical NIC; (2) to be 
sent out from a vhost port. It is very easy to take care of the wrong 
checksum for applications.

>
>>
>>> Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
>>> ---
>>>    lib/librte_gro/Makefile      |   1 +
>>>    lib/librte_gro/rte_gro.c     | 154 +++++++++++--
>>>    lib/librte_gro/rte_gro.h     |  34 +--
>>>    lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++
>>>    lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++
>>>    5 files changed, 895 insertions(+), 31 deletions(-)
>>>    create mode 100644 lib/librte_gro/rte_gro_tcp.c
>>>    create mode 100644 lib/librte_gro/rte_gro_tcp.h
>>>
>>> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
>>> index 9f4063a..3495dfc 100644
>>> --- a/lib/librte_gro/Makefile
>>> +++ b/lib/librte_gro/Makefile
>>> @@ -43,6 +43,7 @@ LIBABIVER := 1
>>>    # source files
>>>    SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
>>> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c
>> Again, if it's just for tcp4, please use the name rte_gro_tcp4.c.
> TCP4 and TCP6 reassembly functions will be placed in the same file,
> rte_gro_tcp.c. But currently, we don't support TCP6 GRO.

That's ok to me. But then we have to have different struct gro_tcp_flow 
for tcp4 and tcp6.

>
>>>    # install this header file
>>>    SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
>>> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
>>> index 1bc53a2..2620ef6 100644
>>> --- a/lib/librte_gro/rte_gro.c
>>> +++ b/lib/librte_gro/rte_gro.c
>>> @@ -32,11 +32,17 @@
>>>    #include <rte_malloc.h>
>>>    #include <rte_mbuf.h>
>>> +#include <rte_ethdev.h>
>>> +#include <rte_ip.h>
>>> +#include <rte_tcp.h>
>>>    #include "rte_gro.h"
>>> +#include "rte_gro_tcp.h"
>>> -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
>>> -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB];
>>> +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {
>>> +	gro_tcp_tbl_create, NULL};
>>> +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = {
>>> +	gro_tcp_tbl_destroy, NULL};
>>>    struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id,
>>>    		uint16_t max_flow_num,
>>> @@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
>>>    }
>>>    uint16_t
>>> -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
>>> +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>>>    		const uint16_t nb_pkts,
>>> -		const struct rte_gro_param param __rte_unused)
>>> +		const struct rte_gro_param param)
>>>    {
>>> -	return nb_pkts;
>>> +	struct ether_hdr *eth_hdr;
>>> +	struct ipv4_hdr *ipv4_hdr;
>>> +	uint16_t l3proc_type, i;
>> I did not catch the variable definition here: l3proc_type -> l3_proto?
> You can see it in line 158 and line 159.

It's not the reference. I mean the variable name is not that clear.

>
>>> +	uint16_t nb_after_gro = nb_pkts;
>>> +	uint16_t flow_num = nb_pkts < param.max_flow_num ?
>>> +		nb_pkts : param.max_flow_num;
>>> +	uint32_t item_num = nb_pkts <
>>> +		flow_num * param.max_item_per_flow ?
>>> +		nb_pkts :
>>> +		flow_num * param.max_item_per_flow;
>>> +
>>> +	/* allocate a reassembly table for TCP/IPv4 GRO */
>>> +	uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ?
>>> +		flow_num : GRO_TCP_TBL_MAX_FLOW_NUM;
>>> +	uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
>>> +		item_num : GRO_TCP_TBL_MAX_ITEM_NUM;
>> Below tcpv4-specific logic should be in rte_gro_tcp4.c; here, as my previous
>> comment, we iterate all ptypes of this packets to iterate all supported GRO
>> engine.
> Sorry, I don't get the point. The table which is created here is used by
> gro_tcp4_reassemble when merges packets. If we don't create table here,
> what does gro_tcp4_reassemble use to merge packets?

Too much tcp* code here. If we add another GRO engine, take udp as an 
example, shall we add more udp* code here? Not a good idea to me. In 
fact, gro_tcp4_reassemble is defined in rte_gro_tcp.c instead of this 
file. For better modularity, we'd better put these tcp-related code into 
rte_gro_tcp.c.


>
>>> +	struct gro_tcp_tbl tcp_tbl;
>>> +	struct gro_tcp_flow tcp_flows[tcp_flow_num];
>>> +	struct gro_tcp_item tcp_items[tcp_item_num];
>>> +	struct gro_tcp_rule tcp_rule;
>>> +
>>> +	struct rte_mbuf *unprocess_pkts[nb_pkts];
>>> +	uint16_t unprocess_num = 0;
>>> +	int32_t ret;
>>> +
>>> +	if (unlikely(nb_pkts <= 1))
>>> +		return nb_pkts;
>>> +
>>> +	memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) *
>>> +			tcp_flow_num);
>>> +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
>>> +			tcp_item_num);
>>> +	tcp_tbl.flows = tcp_flows;
>>> +	tcp_tbl.items = tcp_items;
>>> +	tcp_tbl.flow_num = 0;
>>> +	tcp_tbl.item_num = 0;
>>> +	tcp_tbl.max_flow_num = tcp_flow_num;
>>> +	tcp_tbl.max_item_num = tcp_item_num;
>>> +	tcp_rule.max_packet_size = param.max_packet_size;
>>> +
>>> +	for (i = 0; i < nb_pkts; i++) {
>>> +		eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *);
>>> +		l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
>>> +		if (l3proc_type == ETHER_TYPE_IPv4) {
>>> +			ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>>> +			if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
>>> +					(param.desired_gro_types &
>>> +					 GRO_TCP_IPV4)) {
>>> +				ret = gro_tcp4_reassemble(pkts[i],
>>> +						&tcp_tbl,
>>> +						&tcp_rule);
>>> +				if (ret > 0)
>>> +					nb_after_gro--;
>>> +				else if (ret < 0)
>>> +					unprocess_pkts[unprocess_num++] =
>>> +						pkts[i];
>>> +			} else
>>> +				unprocess_pkts[unprocess_num++] =
>>> +					pkts[i];
>>> +		} else
>>> +			unprocess_pkts[unprocess_num++] =
>>> +				pkts[i];
>>> +	}
>>> +
>>> +	if (nb_after_gro < nb_pkts) {
>>> +		/* update packets headers and re-arrange GROed packets */
>>> +		if (param.desired_gro_types & GRO_TCP_IPV4) {
>>> +			gro_tcp4_tbl_cksum_update(&tcp_tbl);
>>> +			for (i = 0; i < tcp_tbl.item_num; i++)
>>> +				pkts[i] = tcp_tbl.items[i].pkt;
>>> +		}
>>> +		if (unprocess_num > 0) {
>>> +			memcpy(&pkts[i], unprocess_pkts,
>>> +					sizeof(struct rte_mbuf *) *
>>> +					unprocess_num);
>>> +			i += unprocess_num;
>>> +		}
>>> +		if (nb_pkts > i)
>>> +			memset(&pkts[i], 0,
>>> +					sizeof(struct rte_mbuf *) *
>>> +					(nb_pkts - i));
>>> +	}
>>> +	return nb_after_gro;
>>>    }
>>> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
>>> -		struct rte_gro_tbl *gro_tbl __rte_unused)
>>> +int rte_gro_reassemble(struct rte_mbuf *pkt,
>>> +		struct rte_gro_tbl *gro_tbl)
>>>    {
>>> +	struct ether_hdr *eth_hdr;
>>> +	struct ipv4_hdr *ipv4_hdr;
>>> +	uint16_t l3proc_type;
>>> +	struct gro_tcp_rule tcp_rule;
>>> +
>>> +	if (pkt == NULL)
>>> +		return -1;
>>> +	tcp_rule.max_packet_size = gro_tbl->max_packet_size;
>>> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
>>> +	l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type);
>>> +	if (l3proc_type == ETHER_TYPE_IPv4) {
>>> +		ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>>> +		if (ipv4_hdr->next_proto_id == IPPROTO_TCP &&
>>> +				(gro_tbl->desired_gro_types & GRO_TCP_IPV4)) {
>>> +			return gro_tcp4_reassemble(pkt,
>>> +					gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
>>> +					&tcp_rule);
>>> +		}
>>> +	}
>>>    	return -1;
>>>    }
>>> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>> -		uint64_t desired_gro_types __rte_unused,
>>> -		uint16_t flush_num __rte_unused,
>>> -		struct rte_mbuf **out __rte_unused,
>>> -		const uint16_t max_nb_out __rte_unused)
>>> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
>>> +		uint64_t desired_gro_types,
>>> +		uint16_t flush_num,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t max_nb_out)
>>>    {
>> Ditto.
>>
>>> +	desired_gro_types = desired_gro_types &
>>> +		gro_tbl->desired_gro_types;
>>> +	if (desired_gro_types & GRO_TCP_IPV4)
>>> +		return gro_tcp_tbl_flush(
>>> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
>>> +				flush_num,
>>> +				out,
>>> +				max_nb_out);
>>>    	return 0;
>>>    }
>>>    uint16_t
>>> -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>> -		uint64_t desired_gro_types __rte_unused,
>>> -		struct rte_mbuf **out __rte_unused,
>>> -		const uint16_t max_nb_out __rte_unused)
>>> +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
>>> +		uint64_t desired_gro_types,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t max_nb_out)
>>>    {
>>> +	desired_gro_types = desired_gro_types &
>>> +		gro_tbl->desired_gro_types;
>>> +	if (desired_gro_types & GRO_TCP_IPV4)
>>> +		return gro_tcp_tbl_timeout_flush(
>>> +				gro_tbl->tbls[GRO_TCP_IPV4_INDEX],
>>> +				gro_tbl->max_timeout_cycles,
>>> +				out, max_nb_out);
>>>    	return 0;
>>>    }
>>> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
>>> index 67bd90d..e26aa5b 100644
>>> --- a/lib/librte_gro/rte_gro.h
>>> +++ b/lib/librte_gro/rte_gro.h
>>> @@ -35,7 +35,11 @@
>>>    /* maximum number of supported GRO types */
>>>    #define GRO_TYPE_MAX_NB 64
>>> -#define GRO_TYPE_SUPPORT_NB 0	/**< current supported GRO num */
>>> +#define GRO_TYPE_SUPPORT_NB 1	/**< supported GRO types number */
>>> +
>>> +/* TCP/IPv4 GRO flag */
>>> +#define GRO_TCP_IPV4_INDEX 0
>>> +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX)
>>>    /**
>>>     * GRO table structure. DPDK GRO uses GRO table to reassemble
>>> @@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl);
>>>     * @return
>>>     *  the number of packets after GROed.
>>>     */
>>> -uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
>>> -		const uint16_t nb_pkts __rte_unused,
>>> -		const struct rte_gro_param param __rte_unused);
>>> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>>> +		const uint16_t nb_pkts,
>>> +		const struct rte_gro_param param);
>>>    /**
>>>     * This is the main reassembly API used in heavyweight mode, which
>>> @@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
>>>     *  if merge the packet successfully, return a positive value. If fail
>>>     *  to merge, return zero. If errors happen, return a negative value.
>>>     */
>>> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
>>> -		struct rte_gro_tbl *gro_tbl __rte_unused);
>>> +int rte_gro_reassemble(struct rte_mbuf *pkt,
>>> +		struct rte_gro_tbl *gro_tbl);
>>>    /**
>>>     * This function flushed packets of desired GRO types from their
>>> @@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused,
>>>     * @return
>>>     *  the number of flushed packets. If no packets are flushed, return 0.
>>>     */
>>> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>> -		uint64_t desired_gro_types __rte_unused,
>>> -		uint16_t flush_num __rte_unused,
>>> -		struct rte_mbuf **out __rte_unused,
>>> -		const uint16_t max_nb_out __rte_unused);
>>> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl,
>>> +		uint64_t desired_gro_types,
>>> +		uint16_t flush_num,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t max_nb_out);
>>>    /**
>>>     * This function flushes the timeout packets from reassembly tables of
>>> @@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>>     * @return
>>>     *  the number of flushed packets. If no packets are flushed, return 0.
>>>     */
>>> -uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused,
>>> -		uint64_t desired_gro_types __rte_unused,
>>> -		struct rte_mbuf **out __rte_unused,
>>> -		const uint16_t max_nb_out __rte_unused);
>>> +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl,
>>> +		uint64_t desired_gro_types,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t max_nb_out);
>> Do you have any cases to test this API? I don't see following example use
>> this API. That means we are exposing an API that are never tested. I don't
>> know if we can add some experimental flag on this API. Let's seek advice
>> from others.
> These flush APIs are used in heavyweight mode. But testpmd is not a good case
> to use heavyweight mode. How do you think if we use some unit tests to test
> them?

I think vhost example is a good place to implement heavyweight mode. 
There is timeout mechanism in vhost example which can call this flush 
API. Feel free to ping yuanhan and Maxime for suggestions.

>
>>>    #endif
>>> diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c
>>> new file mode 100644
>>> index 0000000..86743cd
>>> --- /dev/null
>>> +++ b/lib/librte_gro/rte_gro_tcp.c
>>> @@ -0,0 +1,527 @@
>>> +/*-
>>> + *   BSD LICENSE
>>> + *
>>> + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
>>> + *
>>> + *   Redistribution and use in source and binary forms, with or without
>>> + *   modification, are permitted provided that the following conditions
>>> + *   are met:
>>> + *
>>> + *     * Redistributions of source code must retain the above copyright
>>> + *       notice, this list of conditions and the following disclaimer.
>>> + *     * Redistributions in binary form must reproduce the above copyright
>>> + *       notice, this list of conditions and the following disclaimer in
>>> + *       the documentation and/or other materials provided with the
>>> + *       distribution.
>>> + *     * Neither the name of Intel Corporation nor the names of its
>>> + *       contributors may be used to endorse or promote products derived
>>> + *       from this software without specific prior written permission.
>>> + *
>>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>> + */
>>> +
>>> +#include <rte_malloc.h>
>>> +#include <rte_mbuf.h>
>>> +#include <rte_cycles.h>
>>> +
>>> +#include <rte_ethdev.h>
>>> +#include <rte_ip.h>
>>> +#include <rte_tcp.h>
>>> +
>>> +#include "rte_gro_tcp.h"
>>> +
>>> +void *gro_tcp_tbl_create(uint16_t socket_id,
>> Define it as "static". Similar to other functions.
>>
>>> +		uint16_t max_flow_num,
>>> +		uint16_t max_item_per_flow)
>>> +{
>>> +	size_t size;
>>> +	uint32_t entries_num;
>>> +	struct gro_tcp_tbl *tbl;
>>> +
>>> +	max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ?
>>> +		GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num;
>>> +
>>> +	entries_num = max_flow_num * max_item_per_flow;
>>> +	entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ?
>>> +		GRO_TCP_TBL_MAX_ITEM_NUM : entries_num;
>>> +
>>> +	if (entries_num == 0 || max_flow_num == 0)
>>> +		return NULL;
>>> +
>>> +	tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket(
>>> +			__func__,
>>> +			sizeof(struct gro_tcp_tbl),
>>> +			RTE_CACHE_LINE_SIZE,
>>> +			socket_id);
>>> +
>>> +	size = sizeof(struct gro_tcp_item) * entries_num;
>>> +	tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket(
>>> +			__func__,
>>> +			size,
>>> +			RTE_CACHE_LINE_SIZE,
>>> +			socket_id);
>>> +	tbl->max_item_num = entries_num;
>>> +
>>> +	size = sizeof(struct gro_tcp_flow) * max_flow_num;
>>> +	tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket(
>>> +			__func__,
>>> +			size, RTE_CACHE_LINE_SIZE,
>>> +			socket_id);
>>> +	tbl->max_flow_num = max_flow_num;
>>> +	return tbl;
>>> +}
>>> +
>>> +void gro_tcp_tbl_destroy(void *tbl)
>>> +{
>>> +	struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl;
>>> +
>>> +	if (tcp_tbl) {
>>> +		if (tcp_tbl->items)
>>> +			rte_free(tcp_tbl->items);
>>> +		if (tcp_tbl->flows)
>>> +			rte_free(tcp_tbl->flows);
>>> +		rte_free(tcp_tbl);
>>> +	}
>>> +}
>>> +
>>> +/* update TCP header and IPv4 header checksum */
>>> +static void
>>> +gro_tcp4_cksum_update(struct rte_mbuf *pkt)
>>> +{
>>> +	uint32_t len, offset, cksum;
>>> +	struct ether_hdr *eth_hdr;
>>> +	struct ipv4_hdr *ipv4_hdr;
>>> +	struct tcp_hdr *tcp_hdr;
>>> +	uint16_t ipv4_ihl, cksum_pld;
>>> +
>>> +	if (pkt == NULL)
>>> +		return;
>>> +
>>> +	len = pkt->pkt_len;
>>> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
>>> +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>>> +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
>>> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
>>> +
>>> +	offset = sizeof(struct ether_hdr) + ipv4_ihl;
>>> +	len -= offset;
>>> +
>>> +	/* TCP cksum without IP pseudo header */
>>> +	ipv4_hdr->hdr_checksum = 0;
>>> +	tcp_hdr->cksum = 0;
>>> +	rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld);
>>> +
>>> +	/* IP pseudo header cksum */
>>> +	cksum = cksum_pld;
>>> +	cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0);
>>> +
>>> +	/* combine TCP checksum and IP pseudo header checksum */
>>> +	cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff);
>>> +	cksum = (~cksum) & 0xffff;
>>> +	cksum = (cksum == 0) ? 0xffff : cksum;
>>> +	tcp_hdr->cksum = cksum;
>>> +
>>> +	/* update IP header cksum */
>>> +	ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
>>> +}
>>> +
>>> +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl)
>>> +{
>>> +	uint32_t i;
>>> +	uint32_t item_num = tbl->item_num;
>>> +
>>> +	for (i = 0; i < tbl->max_item_num; i++) {
>>> +		if (tbl->items[i].is_valid) {
>>> +			item_num--;
>>> +			if (tbl->items[i].is_groed)
>>> +				gro_tcp4_cksum_update(tbl->items[i].pkt);
>>> +		}
>>> +		if (unlikely(item_num == 0))
>>> +			break;
>>> +	}
>>> +}
>>> +
>>> +/**
>>> + * merge two TCP/IPv4 packets without update header checksum.
>>> + */
>>> +static int
>>> +merge_two_tcp4_packets(struct rte_mbuf *pkt_src,
>>> +		struct rte_mbuf *pkt,
>>> +		struct gro_tcp_rule *rule)
>>> +{
>>> +	struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2;
>>> +	struct tcp_hdr *tcp_hdr1;
>>> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
>>> +	struct rte_mbuf *tail;
>>> +
>>> +	/* parse the given packet */
>>> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
>>> +				struct ether_hdr *) + 1);
>>> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
>>> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
>>> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
>>> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
>>> +		- tcp_hl1;
>>> +
>>> +	/* parse the original packet */
>>> +	ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src,
>>> +				struct ether_hdr *) + 1);
>>> +
>>> +	/* check reassembly rules */
>>> +	if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size)
>>> +		return -1;
>>> +
>>> +	/* remove the header of the incoming packet */
>>> +	rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) +
>>> +			ipv4_ihl1 + tcp_hl1);
>>> +
>>> +	/* chain the two packet together */
>>> +	tail = rte_pktmbuf_lastseg(pkt_src);
>>> +	tail->next = pkt;
>>> +
>>> +	/* update IP header */
>>> +	ipv4_hdr2->total_length = rte_cpu_to_be_16(
>>> +			rte_be_to_cpu_16(
>>> +				ipv4_hdr2->total_length)
>>> +			+ tcp_dl1);
>>> +
>>> +	/* update mbuf metadata for the merged packet */
>>> +	pkt_src->nb_segs++;
>>> +	pkt_src->pkt_len += pkt->pkt_len;
>>> +	return 1;
>>> +}
>>> +
>>> +static int
>>> +check_seq_option(struct rte_mbuf *pkt,
>>> +		struct tcp_hdr *tcp_hdr,
>>> +		uint16_t tcp_hl)
>>> +{
>>> +	struct ipv4_hdr *ipv4_hdr1;
>>> +	struct tcp_hdr *tcp_hdr1;
>>> +	uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1;
>>> +	uint32_t sent_seq1, sent_seq;
>>> +	int ret = -1;
>>> +
>>> +	ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt,
>>> +				struct ether_hdr *) + 1);
>>> +	ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1);
>>> +	tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1);
>>> +	tcp_hl1 = TCP_HDR_LEN(tcp_hdr1);
>>> +	tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1
>>> +		- tcp_hl1;
>>> +	sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1;
>>> +	sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
>>> +
>>> +	/* check if the two packets are neighbor */
>>> +	if ((sent_seq ^ sent_seq1) == 0) {
>>> +		/* check if TCP option field equals */
>>> +		if (tcp_hl1 > sizeof(struct tcp_hdr)) {
>>> +			if ((tcp_hl1 != tcp_hl) ||
>>> +					(memcmp(tcp_hdr1 + 1,
>>> +							tcp_hdr + 1,
>>> +							tcp_hl - sizeof
>>> +							(struct tcp_hdr))
>>> +					 == 0))
>>> +				ret = 1;
>>> +		}
>>> +	}
>>> +	return ret;
>>> +}
>>> +
>>> +static uint32_t
>>> +find_an_empty_item(struct gro_tcp_tbl *tbl)
>>> +{
>>> +	uint32_t i;
>>> +
>>> +	for (i = 0; i < tbl->max_item_num; i++)
>>> +		if (tbl->items[i].is_valid == 0)
>>> +			return i;
>>> +	return INVALID_ITEM_INDEX;
>>> +}
>>> +
>>> +static uint16_t
>>> +find_an_empty_flow(struct gro_tcp_tbl *tbl)
>>> +{
>>> +	uint16_t i;
>>> +
>>> +	for (i = 0; i < tbl->max_flow_num; i++)
>>> +		if (tbl->flows[i].is_valid == 0)
>>> +			return i;
>>> +	return INVALID_FLOW_INDEX;
>>> +}
>>> +
>>> +int32_t
>>> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
>>> +		struct gro_tcp_tbl *tbl,
>>> +		struct gro_tcp_rule *rule)
>>> +{
>>> +	struct ether_hdr *eth_hdr;
>>> +	struct ipv4_hdr *ipv4_hdr;
>>> +	struct tcp_hdr *tcp_hdr;
>>> +	uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum;
>>> +
>>> +	struct gro_tcp_flow_key key;
>>> +	uint64_t ol_flags;
>>> +	uint32_t cur_idx, prev_idx, item_idx;
>>> +	uint16_t i, flow_idx;
>>> +
>>> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
>>> +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
>>> +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
>>> +
>>> +	/* 1. check if the packet should be processed */
>>> +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
>>> +		goto fail;
>>> +	if (ipv4_hdr->next_proto_id != IPPROTO_TCP)
>>> +		goto fail;
>>> +	if ((ipv4_hdr->fragment_offset &
>>> +				rte_cpu_to_be_16(IPV4_HDR_DF_MASK))
>>> +			== 0)
>>> +		goto fail;
>>> +
>>> +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
>>> +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
>>> +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
>>> +		- tcp_hl;
>>> +	if (tcp_dl == 0)
>>> +		goto fail;
>>> +
>>> +	/**
>>> +	 * 2. if HW rx checksum offload isn't enabled, recalculate the
>>> +	 * checksum in SW. Then, check if the checksum is correct
>>> +	 */
>>> +	ol_flags = pkt->ol_flags;
>>> +	if ((ol_flags & PKT_RX_IP_CKSUM_MASK) !=
>>> +			PKT_RX_IP_CKSUM_UNKNOWN) {
>>> +		if (ol_flags == PKT_RX_IP_CKSUM_BAD)
>>> +			goto fail;
>>> +	} else {
>>> +		ip_cksum = ipv4_hdr->hdr_checksum;
>>> +		ipv4_hdr->hdr_checksum = 0;
>>> +		ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
>>> +		if (ipv4_hdr->hdr_checksum ^ ip_cksum)
>>> +			goto fail;
>>> +	}
>>> +
>>> +	if ((ol_flags & PKT_RX_L4_CKSUM_MASK) !=
>>> +			PKT_RX_L4_CKSUM_UNKNOWN) {
>>> +		if (ol_flags == PKT_RX_L4_CKSUM_BAD)
>>> +			goto fail;
>>> +	} else {
>>> +		tcp_cksum = tcp_hdr->cksum;
>>> +		tcp_hdr->cksum = 0;
>>> +		tcp_hdr->cksum = rte_ipv4_udptcp_cksum
>>> +			(ipv4_hdr, tcp_hdr);
>>> +		if (tcp_hdr->cksum ^ tcp_cksum)
>>> +			goto fail;
>>> +	}
>>> +
>>> +	/**
>>> +	 * 3. search for a flow and traverse all packets in the flow
>>> +	 * to find one to merge with the given packet.
>>> +	 */
>>> +	key.eth_saddr = eth_hdr->s_addr;
>>> +	key.eth_daddr = eth_hdr->d_addr;
>>> +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
>>> +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
>>> +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
>>> +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
>>> +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
>>> +	key.tcp_flags = tcp_hdr->tcp_flags;
>>> +
>>> +	for (i = 0; i < tbl->max_flow_num; i++) {
>>> +		/* search all packets in a valid flow. */
>>> +		if (tbl->flows[i].is_valid &&
>>> +				(memcmp(&(tbl->flows[i].key), &key,
>>> +						sizeof(struct gro_tcp_flow_key))
>>> +				 == 0)) {
>>> +			cur_idx = tbl->flows[i].start_index;
>>> +			prev_idx = cur_idx;
>>> +			while (cur_idx != INVALID_ITEM_INDEX) {
>>> +				if (check_seq_option(tbl->items[cur_idx].pkt,
>>> +							tcp_hdr,
>>> +							tcp_hl) > 0) {
>>> +					if (merge_two_tcp4_packets(
>>> +								tbl->items[cur_idx].pkt,
>>> +								pkt,
>>> +								rule) > 0) {
>>> +						/* successfully merge two packets */
>>> +						tbl->items[cur_idx].is_groed = 1;
>>> +						return 1;
>>> +					}
>>> +					/**
>>> +					 * fail to merge two packets since
>>> +					 * break the rules, add the packet
>>> +					 * into the flow.
>>> +					 */
>>> +					goto insert_to_existed_flow;
>>> +				} else {
>>> +					prev_idx = cur_idx;
>>> +					cur_idx = tbl->items[cur_idx].next_pkt_idx;
>>> +				}
>>> +			}
>>> +			/**
>>> +			 * fail to merge the given packet into an existed flow,
>>> +			 * add it into the flow.
>>> +			 */
>>> +insert_to_existed_flow:
>>> +			item_idx = find_an_empty_item(tbl);
>>> +			/* the item number is beyond the maximum value */
>>> +			if (item_idx == INVALID_ITEM_INDEX)
>>> +				return -1;
>>> +			tbl->items[prev_idx].next_pkt_idx = item_idx;
>>> +			tbl->items[item_idx].pkt = pkt;
>>> +			tbl->items[item_idx].is_groed = 0;
>>> +			tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
>>> +			tbl->items[item_idx].is_valid = 1;
>>> +			tbl->items[item_idx].start_time = rte_rdtsc();
>>> +			tbl->item_num++;
>>> +			return 0;
>>> +		}
>>> +	}
>>> +
>>> +	/**
>>> +	 * merge fail as the given packet is a new flow. Therefore,
>>> +	 * insert a new flow.
>>> +	 */
>>> +	item_idx = find_an_empty_item(tbl);
>>> +	flow_idx = find_an_empty_flow(tbl);
>>> +	/**
>>> +	 * if the flow or item number are beyond the maximum values,
>>> +	 * the inputted packet won't be processed.
>>> +	 */
>>> +	if (item_idx == INVALID_ITEM_INDEX ||
>>> +			flow_idx == INVALID_FLOW_INDEX)
>>> +		return -1;
>>> +	tbl->items[item_idx].pkt = pkt;
>>> +	tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX;
>>> +	tbl->items[item_idx].is_groed = 0;
>>> +	tbl->items[item_idx].is_valid = 1;
>>> +	tbl->items[item_idx].start_time = rte_rdtsc();
>>> +	tbl->item_num++;
>>> +
>>> +	memcpy(&(tbl->flows[flow_idx].key),
>>> +			&key, sizeof(struct gro_tcp_flow_key));
>>> +	tbl->flows[flow_idx].start_index = item_idx;
>>> +	tbl->flows[flow_idx].is_valid = 1;
>>> +	tbl->flow_num++;
>>> +
>>> +	return 0;
>>> +fail:
>>> +	return -1;
>>> +}
>>> +
>>> +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
>>> +		uint16_t flush_num,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t nb_out)
>>> +{
>>> +	uint16_t num, k;
>>> +	uint16_t i;
>>> +	uint32_t j;
>>> +
>>> +	k = 0;
>>> +	num = tbl->item_num > flush_num ? flush_num : tbl->item_num;
>>> +	num = num > nb_out ? nb_out : num;
>>> +	if (unlikely(num == 0))
>>> +		return 0;
>>> +
>>> +	for (i = 0; i < tbl->max_flow_num; i++) {
>>> +		if (tbl->flows[i].is_valid) {
>>> +			j = tbl->flows[i].start_index;
>>> +			while (j != INVALID_ITEM_INDEX) {
>>> +				/* update checksum for GROed packet */
>>> +				if (tbl->items[j].is_groed)
>>> +					gro_tcp4_cksum_update(tbl->items[j].pkt);
>>> +
>>> +				out[k++] = tbl->items[j].pkt;
>>> +				tbl->items[j].is_valid = 0;
>>> +				tbl->item_num--;
>>> +				j = tbl->items[j].next_pkt_idx;
>>> +
>>> +				if (k == num) {
>>> +					/* delete the flow */
>>> +					if (j == INVALID_ITEM_INDEX) {
>>> +						tbl->flows[i].is_valid = 0;
>>> +						tbl->flow_num--;
>>> +					} else
>>> +						/* update flow information */
>>> +						tbl->flows[i].start_index = j;
>>> +					goto end;
>>> +				}
>>> +			}
>>> +			/* delete the flow, as all of its packets are flushed */
>>> +			tbl->flows[i].is_valid = 0;
>>> +			tbl->flow_num--;
>>> +		}
>>> +		if (tbl->flow_num == 0)
>>> +			goto end;
>>> +	}
>>> +end:
>>> +	return num;
>>> +}
>>> +
>>> +uint16_t
>>> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
>>> +		uint64_t timeout_cycles,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t nb_out)
>>> +{
>>> +	uint16_t k;
>>> +	uint16_t i;
>>> +	uint32_t j;
>>> +	uint64_t current_time;
>>> +
>>> +	if (nb_out == 0)
>>> +		return 0;
>>> +	k = 0;
>>> +	current_time = rte_rdtsc();
>>> +
>>> +	for (i = 0; i < tbl->max_flow_num; i++) {
>>> +		if (tbl->flows[i].is_valid) {
>>> +			j = tbl->flows[i].start_index;
>>> +			while (j != INVALID_ITEM_INDEX) {
>>> +				if (current_time - tbl->items[j].start_time >=
>>> +						timeout_cycles) {
>>> +					/* update checksum for GROed packet */
>>> +					if (tbl->items[j].is_groed)
>>> +						gro_tcp4_cksum_update(tbl->items[j].pkt);
>>> +
>>> +					out[k++] = tbl->items[j].pkt;
>>> +					tbl->items[j].is_valid = 0;
>>> +					tbl->item_num--;
>>> +					j = tbl->items[j].next_pkt_idx;
>>> +
>>> +					if (k == nb_out &&
>>> +							j == INVALID_ITEM_INDEX) {
>>> +						/* delete the flow */
>>> +						tbl->flows[i].is_valid = 0;
>>> +						tbl->flow_num--;
>>> +						goto end;
>>> +					} else if (k == nb_out &&
>>> +							j != INVALID_ITEM_INDEX) {
>>> +						tbl->flows[i].start_index = j;
>>> +						goto end;
>>> +					}
>>> +				}
>>> +			}
>>> +			/* delete the flow, as all of its packets are flushed */
>>> +			tbl->flows[i].is_valid = 0;
>>> +			tbl->flow_num--;
>>> +		}
>>> +		if (tbl->flow_num == 0)
>>> +			goto end;
>>> +	}
>>> +end:
>>> +	return k;
>>> +}
>>> diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h
>>> new file mode 100644
>>> index 0000000..551efc4
>>> --- /dev/null
>>> +++ b/lib/librte_gro/rte_gro_tcp.h
>>> @@ -0,0 +1,210 @@
>>> +/*-
>>> + *   BSD LICENSE
>>> + *
>>> + *   Copyright(c) 2016-2017 Intel Corporation. All rights reserved.
>>> + *
>>> + *   Redistribution and use in source and binary forms, with or without
>>> + *   modification, are permitted provided that the following conditions
>>> + *   are met:
>>> + *
>>> + *     * Redistributions of source code must retain the above copyright
>>> + *       notice, this list of conditions and the following disclaimer.
>>> + *     * Redistributions in binary form must reproduce the above copyright
>>> + *       notice, this list of conditions and the following disclaimer in
>>> + *       the documentation and/or other materials provided with the
>>> + *       distribution.
>>> + *     * Neither the name of Intel Corporation nor the names of its
>>> + *       contributors may be used to endorse or promote products derived
>>> + *       from this software without specific prior written permission.
>>> + *
>>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>>> + */
>>> +
>>> +#ifndef _RTE_GRO_TCP_H_
>>> +#define _RTE_GRO_TCP_H_
>>> +
>>> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
>>> +#define TCP_HDR_LEN(tcph) \
>>> +	((tcph->data_off >> 4) * 4)
>>> +#define IPv4_HDR_LEN(iph) \
>>> +	((iph->version_ihl & 0x0f) * 4)
>>> +#else
>>> +#define TCP_DATAOFF_MASK 0x0f
>>> +#define TCP_HDR_LEN(tcph) \
>>> +	((tcph->data_off & TCP_DATAOFF_MASK) * 4)
>>> +#define IPv4_HDR_LEN(iph) \
>>> +	((iph->version_ihl >> 4) * 4)
>>> +#endif
>>> +
>>> +#define IPV4_HDR_DF_SHIFT 14
>>> +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
>>> +
>>> +#define INVALID_FLOW_INDEX 0xffffU
>>> +#define INVALID_ITEM_INDEX 0xffffffffUL
>>> +
>>> +#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1)
>>> +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
>>> +
>>> +/* criteria of mergeing packets */
>>> +struct gro_tcp_flow_key {
>>> +	struct ether_addr eth_saddr;
>>> +	struct ether_addr eth_daddr;
>>> +	uint32_t ip_src_addr[4];	/**< IPv4 uses the first 8B */
>>> +	uint32_t ip_dst_addr[4];
>>> +
>>> +	uint32_t recv_ack;	/**< acknowledgment sequence number. */
>>> +	uint16_t src_port;
>>> +	uint16_t dst_port;
>>> +	uint8_t tcp_flags;	/**< TCP flags. */
>>> +};
>>> +
>>> +struct gro_tcp_flow {
>>> +	struct gro_tcp_flow_key key;
>>> +	uint32_t start_index;	/**< the first packet index of the flow */
>>> +	uint8_t is_valid;
>>> +};
>>> +
>>> +struct gro_tcp_item {
>>> +	struct rte_mbuf *pkt;	/**< packet address. */
>>> +	/* the time when the packet in added into the table */
>>> +	uint64_t start_time;
>>> +	uint32_t next_pkt_idx;	/**< next packet index. */
>>> +	/* flag to indicate if the packet is GROed */
>>> +	uint8_t is_groed;
>>> +	uint8_t is_valid;	/**< flag indicates if the item is valid */
>>> +};
>>> +
>>> +/**
>>> + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table
>>> + * structure.
>>> + */
>>> +struct gro_tcp_tbl {
>>> +	struct gro_tcp_item *items;	/**< item array */
>>> +	struct gro_tcp_flow *flows;	/**< flow array */
>>> +	uint32_t item_num;	/**< current item number */
>>> +	uint16_t flow_num;	/**< current flow num */
>>> +	uint32_t max_item_num;	/**< item array size */
>>> +	uint16_t max_flow_num;	/**< flow array size */
>>> +};
>>> +
>>> +/* rules to reassemble TCP packets, which are decided by applications */
>>> +struct gro_tcp_rule {
>>> +	/* the maximum packet length after merged */
>>> +	uint32_t max_packet_size;
>>> +};
>> Are there any other rules? If not, I prefer to use max_packet_size directly.
> If we agree to use a flag to indicate if check checksum, this structure should
> be used to keep this flag.
>
>>> +
>>> +/**
>>> + * This function is to update TCP and IPv4 header checksums
>>> + * for merged packets in the TCP reassembly table.
>>> + */
>>> +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl);
>>> +
>>> +/**
>>> + * This function creates a TCP reassembly table.
>>> + *
>>> + * @param socket_id
>>> + *  socket index where the Ethernet port connects to.
>>> + * @param max_flow_num
>>> + *  the maximum number of flows in the TCP GRO table
>>> + * @param max_item_per_flow
>>> + *  the maximum packet number per flow.
>>> + * @return
>>> + *  if create successfully, return a pointer which points to the
>>> + *  created TCP GRO table. Otherwise, return NULL.
>>> + */
>>> +void *gro_tcp_tbl_create(uint16_t socket_id,
>>> +		uint16_t max_flow_num,
>>> +		uint16_t max_item_per_flow);
>>> +
>>> +/**
>>> + * This function destroys a TCP reassembly table.
>>> + * @param tbl
>>> + *  a pointer points to the TCP reassembly table.
>>> + */
>>> +void gro_tcp_tbl_destroy(void *tbl);
>>> +
>>> +/**
>>> + * This function searches for a packet in the TCP reassembly table to
>>> + * merge with the inputted one. To merge two packets is to chain them
>>> + * together and update packet headers. Note that this function won't
>>> + * re-calculate IPv4 and TCP checksums.
>>> + *
>>> + * If the packet doesn't have data, or with wrong checksums, or is
>>> + * fragmented etc., errors happen and gro_tcp4_reassemble returns
>>> + * immediately. If no errors happen, the packet is either merged, or
>>> + * inserted into the reassembly table.
>>> + *
>>> + * If applications want to get packets in the reassembly table, they
>>> + * need to manually flush the packets.
>>> + *
>>> + * @param pkt
>>> + *  packet to reassemble.
>>> + * @param tbl
>>> + *  a pointer that points to a TCP reassembly table.
>>> + * @param rule
>>> + *  TCP reassembly criteria defined by applications.
>>> + * @return
>>> + *  if the inputted packet is merged successfully, return an positive
>>> + *  value. If the packet hasn't be merged with any packets in the TCP
>>> + *  reassembly table. If errors happen, return a negative value and the
>>> + *  packet won't be inserted into the reassemble table.
>>> + */
>>> +int32_t
>>> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
>>> +		struct gro_tcp_tbl *tbl,
>>> +		struct gro_tcp_rule *rule);
>>> +
>>> +/**
>>> + * This function flushes the packets in a TCP reassembly table to
>>> + * applications. Before returning the packets, it will update TCP and
>>> + * IPv4 header checksums.
>>> + *
>>> + * @param tbl
>>> + *  a pointer that points to a TCP GRO table.
>>> + * @param flush_num
>>> + *  the number of packets that applications want to flush.
>>> + * @param out
>>> + *  pointer array which is used to keep flushed packets.
>>> + * @param nb_out
>>> + *  the maximum element number of out.
>>> + * @return
>>> + *  the number of packets that are flushed finally.
>>> + */
>>> +uint16_t
>>> +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl,
>>> +		uint16_t flush_num,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t nb_out);
>>> +
>>> +/**
>>> + * This function flushes timeout packets in a TCP reassembly table to
>>> + * applications. Before returning the packets, it updates TCP and IPv4
>>> + * header checksums.
>>> + *
>>> + * @param tbl
>>> + *  a pointer that points to a TCP GRO table.
>>> + * @param timeout_cycles
>>> + *  the maximum time that packets can stay in the table.
>>> + * @param out
>>> + *  pointer array which is used to keep flushed packets.
>>> + * @param nb_out
>>> + *  the maximum element number of out.
>>> + * @return
>>> + *  It returns the number of packets that are flushed finally.
>>> + */
>>> +uint16_t
>>> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl,
>>> +		uint64_t timeout_cycles,
>>> +		struct rte_mbuf **out,
>>> +		const uint16_t nb_out);
>>> +#endif