From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 0EE4B29CA for ; Thu, 29 Jun 2017 01:56:14 +0200 (CEST) Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Jun 2017 16:56:13 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,278,1496127600"; d="scan'208";a="102502267" Received: from irsmsx106.ger.corp.intel.com ([163.33.3.31]) by orsmga004.jf.intel.com with ESMTP; 28 Jun 2017 16:56:11 -0700 Received: from irsmsx109.ger.corp.intel.com ([169.254.13.115]) by IRSMSX106.ger.corp.intel.com ([169.254.8.236]) with mapi id 14.03.0319.002; Thu, 29 Jun 2017 00:56:11 +0100 From: "Ananyev, Konstantin" To: "Hu, Jiayu" , "dev@dpdk.org" CC: "Tan, Jianfeng" , "stephen@networkplumber.org" , "yliu@fridaylinux.org" , "Wu, Jingjing" , "Yao, Lei A" , "Wiles, Keith" , "Bie, Tiwei" Thread-Topic: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support Thread-Index: AQHS7kdlXNHGhRYN9Eipev8pdR3kGKI6kXCQ Date: Wed, 28 Jun 2017 23:56:10 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772583FB13208@IRSMSX109.ger.corp.intel.com> References: <1498229000-94867-1-git-send-email-jiayu.hu@intel.com> <1498459430-116048-1-git-send-email-jiayu.hu@intel.com> <1498459430-116048-3-git-send-email-jiayu.hu@intel.com> In-Reply-To: <1498459430-116048-3-git-send-email-jiayu.hu@intel.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-product: dlpe-windows dlp-version: 10.0.102.7 dlp-reaction: no-action x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 28 Jun 2017 23:56:16 -0000 Hi Jiayu, >=20 > In this patch, we introduce five APIs to support TCP/IPv4 GRO. > - gro_tcp_tbl_create: create a TCP reassembly table, which is used to > merge packets. > - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table. > - gro_tcp_tbl_flush: flush all packets from a TCP reassembly table. > - gro_tcp_tbl_timeout_flush: flush timeout packets from a TCP > reassembly table. > - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet. >=20 > TCP/IPv4 GRO API assumes all inputted packets are with correct IPv4 > and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP > checksums for merged packets. If inputted packets are IP fragmented, > TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4 > headers). >=20 > In TCP GRO, we use a table structure, called TCP reassembly table, to > reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table > structure. A TCP reassembly table includes a key array and a item array, > where the key array keeps the criteria to merge packets and the item > array keeps packet information. >=20 > One key in the key array points to an item group, which consists of > packets which have the same criteria value. If two packets are able to > merge, they must be in the same item group. Each key in the key array > includes two parts: > - criteria: the criteria of merging packets. If two packets can be > merged, they must have the same criteria value. > - start_index: the index of the first incoming packet of the item group. >=20 > Each element in the item array keeps the information of one packet. It > mainly includes two parts: > - pkt: packet address > - next_pkt_index: the index of the next packet in the same item group. > All packets in the same item group are chained by next_pkt_index. > With next_pkt_index, we can locate all packets in the same item > group one by one. >=20 > To process an incoming packet needs three steps: > a. check if the packet should be processed. Packets with the following > properties won't be processed: > - packets without data (e.g. SYN, SYN-ACK) > b. traverse the key array to find a key which has the same criteria > value with the incoming packet. If find, goto step c. Otherwise, > insert a new key and insert the packet into the item array. > c. locate the first packet in the item group via the start_index in the > key. Then traverse all packets in the item group via next_pkt_index. > If find one packet which can merge with the incoming one, merge them > together. If can't find, insert the packet into this item group. >=20 > Signed-off-by: Jiayu Hu > --- > doc/guides/rel_notes/release_17_08.rst | 7 + > lib/librte_gro/Makefile | 1 + > lib/librte_gro/rte_gro.c | 123 ++++++++-- > lib/librte_gro/rte_gro.h | 6 +- > lib/librte_gro/rte_gro_tcp.c | 394 +++++++++++++++++++++++++++= ++++++ > lib/librte_gro/rte_gro_tcp.h | 191 ++++++++++++++++ > 6 files changed, 706 insertions(+), 16 deletions(-) > create mode 100644 lib/librte_gro/rte_gro_tcp.c > create mode 100644 lib/librte_gro/rte_gro_tcp.h >=20 > diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_note= s/release_17_08.rst > index 842f46f..f067247 100644 > --- a/doc/guides/rel_notes/release_17_08.rst > +++ b/doc/guides/rel_notes/release_17_08.rst > @@ -75,6 +75,13 @@ New Features >=20 > Added support for firmwares with multiple Ethernet ports per physical = port. >=20 > +* **Add Generic Receive Offload API support.** > + > + Generic Receive Offload (GRO) API supports to reassemble TCP/IPv4 > + packets. GRO API assumes all inputted packets are with correct > + checksums. GRO API doesn't update checksums for merged packets. If > + inputted packets are IP fragmented, GRO API assumes they are complete > + packets (i.e. with L4 headers). >=20 > Resolved Issues > --------------- > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile > index 7e0f128..e89344d 100644 > --- a/lib/librte_gro/Makefile > +++ b/lib/librte_gro/Makefile > @@ -43,6 +43,7 @@ LIBABIVER :=3D 1 >=20 > # source files > SRCS-$(CONFIG_RTE_LIBRTE_GRO) +=3D rte_gro.c > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) +=3D rte_gro_tcp.c >=20 > # install this header file > SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include +=3D rte_gro.h > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c > index 33275e8..5b89928 100644 > --- a/lib/librte_gro/rte_gro.c > +++ b/lib/librte_gro/rte_gro.c > @@ -32,11 +32,15 @@ >=20 > #include > #include > +#include >=20 > #include "rte_gro.h" > +#include "rte_gro_tcp.h" >=20 > -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM]; > -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM]; > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NUM] =3D { > + gro_tcp_tbl_create, NULL}; > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NUM] =3D { > + gro_tcp_tbl_destroy, NULL}; >=20 > struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id, > uint16_t max_flow_num, > @@ -94,32 +98,121 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl= ) > } >=20 > uint16_t > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused, > +rte_gro_reassemble_burst(struct rte_mbuf **pkts, > const uint16_t nb_pkts, > - const struct rte_gro_param param __rte_unused) > + const struct rte_gro_param param) > { > - return nb_pkts; > + uint16_t i; > + uint16_t nb_after_gro =3D nb_pkts; > + uint32_t item_num =3D RTE_MIN(nb_pkts, param.max_flow_num * > + param.max_item_per_flow); > + > + /* allocate a reassembly table for TCP/IPv4 GRO */ > + uint32_t tcp_item_num =3D RTE_MIN(item_num, GRO_MAX_BURST_ITEM_NUM); > + struct gro_tcp_tbl tcp_tbl; > + struct gro_tcp_key tcp_keys[tcp_item_num]; > + struct gro_tcp_item tcp_items[tcp_item_num]; > + > + struct rte_mbuf *unprocess_pkts[nb_pkts]; > + uint16_t unprocess_num =3D 0; > + int32_t ret; > + > + memset(tcp_keys, 0, sizeof(struct gro_tcp_key) * > + tcp_item_num); > + memset(tcp_items, 0, sizeof(struct gro_tcp_item) * > + tcp_item_num); > + tcp_tbl.keys =3D tcp_keys; > + tcp_tbl.items =3D tcp_items; > + tcp_tbl.key_num =3D 0; > + tcp_tbl.item_num =3D 0; > + tcp_tbl.max_key_num =3D tcp_item_num; > + tcp_tbl.max_item_num =3D tcp_item_num; > + > + for (i =3D 0; i < nb_pkts; i++) { > + if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type)) { Why just not && for these 2 conditions? > + if ((pkts[i]->packet_type & RTE_PTYPE_L4_TCP) && > + (param.desired_gro_types & > + GRO_TCP_IPV4)) { No need to check param.desired_gro_types inside the loop. You can do that before the loop. > + ret =3D gro_tcp4_reassemble(pkts[i], > + &tcp_tbl, > + param.max_packet_size); > + /* merge successfully */ > + if (ret > 0) > + nb_after_gro--; > + else if (ret < 0) > + unprocess_pkts[unprocess_num++] =3D > + pkts[i]; > + } else > + unprocess_pkts[unprocess_num++] =3D > + pkts[i]; > + } else > + unprocess_pkts[unprocess_num++] =3D > + pkts[i]; > + } > + > + /* re-arrange GROed packets */ > + if (nb_after_gro < nb_pkts) { > + if (param.desired_gro_types & GRO_TCP_IPV4) > + i =3D gro_tcp_tbl_flush(&tcp_tbl, pkts, nb_pkts); > + if (unprocess_num > 0) { > + memcpy(&pkts[i], unprocess_pkts, > + sizeof(struct rte_mbuf *) * > + unprocess_num); > + i +=3D unprocess_num; > + } > + if (nb_pkts > i) > + memset(&pkts[i], 0, > + sizeof(struct rte_mbuf *) * > + (nb_pkts - i)); > + } Why do you need to zero remaining pkts[]? > + return nb_after_gro; > } >=20 > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused, > - struct rte_gro_tbl *gro_tbl __rte_unused) > +int rte_gro_reassemble(struct rte_mbuf *pkt, > + struct rte_gro_tbl *gro_tbl) > { > + if (unlikely(pkt =3D=3D NULL)) > + return -1; > + > + if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) { > + if ((pkt->packet_type & RTE_PTYPE_L4_TCP) && > + (gro_tbl->desired_gro_types & > + GRO_TCP_IPV4)) > + return gro_tcp4_reassemble(pkt, > + gro_tbl->tbls[GRO_TCP_IPV4_INDEX], > + gro_tbl->max_packet_size); > + } > + > return -1; > } >=20 > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused, > - uint64_t desired_gro_types __rte_unused, > - struct rte_mbuf **out __rte_unused, > - const uint16_t max_nb_out __rte_unused) > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl, > + uint64_t desired_gro_types, > + struct rte_mbuf **out, > + const uint16_t max_nb_out) > { > + desired_gro_types =3D desired_gro_types & > + gro_tbl->desired_gro_types; > + if (desired_gro_types & GRO_TCP_IPV4) > + return gro_tcp_tbl_flush( > + gro_tbl->tbls[GRO_TCP_IPV4_INDEX], > + out, > + max_nb_out); > return 0; > } >=20 > uint16_t > -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused, > - uint64_t desired_gro_types __rte_unused, > - struct rte_mbuf **out __rte_unused, > - const uint16_t max_nb_out __rte_unused) > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl, > + uint64_t desired_gro_types, > + struct rte_mbuf **out, > + const uint16_t max_nb_out) > { > + desired_gro_types =3D desired_gro_types & > + gro_tbl->desired_gro_types; > + if (desired_gro_types & GRO_TCP_IPV4) > + return gro_tcp_tbl_timeout_flush( > + gro_tbl->tbls[GRO_TCP_IPV4_INDEX], > + gro_tbl->max_timeout_cycles, > + out, max_nb_out); > return 0; > } > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h > index f9d36e8..a30b1c6 100644 > --- a/lib/librte_gro/rte_gro.h > +++ b/lib/librte_gro/rte_gro.h > @@ -45,7 +45,11 @@ extern "C" { >=20 > /* max number of supported GRO types */ > #define GRO_TYPE_MAX_NUM 64 > -#define GRO_TYPE_SUPPORT_NUM 0 /**< current supported GRO num */ > +#define GRO_TYPE_SUPPORT_NUM 1 /**< current supported GRO num */ > + > +/* TCP/IPv4 GRO flag */ > +#define GRO_TCP_IPV4_INDEX 0 > +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX) >=20 > /** > * GRO table, which is used to merge packets. It keeps many reassembly > diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c > new file mode 100644 > index 0000000..c0eaa45 > --- /dev/null > +++ b/lib/librte_gro/rte_gro_tcp.c > @@ -0,0 +1,394 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyrig= ht > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS F= OR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGH= T > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTA= L, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF US= E, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON A= NY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE U= SE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE= . > + */ > + > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +#include "rte_gro_tcp.h" > + > +void *gro_tcp_tbl_create(uint16_t socket_id, > + uint16_t max_flow_num, > + uint16_t max_item_per_flow) > +{ > + size_t size; > + uint32_t entries_num; > + struct gro_tcp_tbl *tbl; > + > + entries_num =3D max_flow_num * max_item_per_flow; > + entries_num =3D entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ? > + GRO_TCP_TBL_MAX_ITEM_NUM : entries_num; > + > + if (entries_num =3D=3D 0) > + return NULL; > + > + tbl =3D (struct gro_tcp_tbl *)rte_zmalloc_socket( > + __func__, > + sizeof(struct gro_tcp_tbl), > + RTE_CACHE_LINE_SIZE, > + socket_id); Here and everywhere - rte_malloc() can fail. Add proper error handling. > + > + size =3D sizeof(struct gro_tcp_item) * entries_num; > + tbl->items =3D (struct gro_tcp_item *)rte_zmalloc_socket( > + __func__, > + size, > + RTE_CACHE_LINE_SIZE, > + socket_id); > + tbl->max_item_num =3D entries_num; > + > + size =3D sizeof(struct gro_tcp_key) * entries_num; > + tbl->keys =3D (struct gro_tcp_key *)rte_zmalloc_socket( > + __func__, > + size, RTE_CACHE_LINE_SIZE, > + socket_id); > + tbl->max_key_num =3D entries_num; > + return tbl; > +} > + > +void gro_tcp_tbl_destroy(void *tbl) > +{ > + struct gro_tcp_tbl *tcp_tbl =3D (struct gro_tcp_tbl *)tbl; > + > + if (tcp_tbl) { > + if (tcp_tbl->items) No need to, rte_free(NULL) is a valid construction. Same below. > + rte_free(tcp_tbl->items); > + if (tcp_tbl->keys) > + rte_free(tcp_tbl->keys); > + rte_free(tcp_tbl); > + } > +} > + > +/** > + * merge two TCP/IPv4 packets without update checksums. > + */ > +static int > +merge_two_tcp4_packets(struct rte_mbuf *pkt_src, > + struct rte_mbuf *pkt, > + uint32_t max_packet_size) > +{ > + struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2; > + struct tcp_hdr *tcp_hdr1; > + uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1; > + struct rte_mbuf *tail; > + > + /* parse the given packet */ > + ipv4_hdr1 =3D (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, > + struct ether_hdr *) + 1); You probably shouldn't assume that l2_len is always 14B long. > + ipv4_ihl1 =3D IPv4_HDR_LEN(ipv4_hdr1); > + tcp_hdr1 =3D (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1); > + tcp_hl1 =3D TCP_HDR_LEN(tcp_hdr1); > + tcp_dl1 =3D rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1 > + - tcp_hl1; > + > + /* parse the original packet */ > + ipv4_hdr2 =3D (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src, > + struct ether_hdr *) + 1); > + > + if (pkt_src->pkt_len + tcp_dl1 > max_packet_size) > + return -1; > + > + /* remove the header of the incoming packet */ > + rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) + > + ipv4_ihl1 + tcp_hl1); > + > + /* chain the two packet together */ > + tail =3D rte_pktmbuf_lastseg(pkt_src); > + tail->next =3D pkt; What I see as a problem here: You have to reparse your packet and do lastseg for it for each new segment. That seems like a big overhead. Would be good instead to parse the packet once and then store that infomati= o inside mbuf: l2_len/l3_len/l4_len, etc. You can probably even avoid parsing inside your library - by adding as a pr= erequisite for the caller to fill these fields properly. Similar thought about lastseg - would be good to store it somewhere inside = your table. > + > + /* update IP header */ > + ipv4_hdr2->total_length =3D rte_cpu_to_be_16( > + rte_be_to_cpu_16( > + ipv4_hdr2->total_length) > + + tcp_dl1); > + > + /* update mbuf metadata for the merged packet */ > + pkt_src->nb_segs++; Why do you assume that incoming packet always contains only one segment? > + pkt_src->pkt_len +=3D pkt->pkt_len; > + return 1; > +} > + > +static int > +check_seq_option(struct rte_mbuf *pkt, > + struct tcp_hdr *tcp_hdr, > + uint16_t tcp_hl) > +{ > + struct ipv4_hdr *ipv4_hdr1; > + struct tcp_hdr *tcp_hdr1; > + uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1; > + uint32_t sent_seq1, sent_seq; > + int ret =3D -1; > + > + ipv4_hdr1 =3D (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, > + struct ether_hdr *) + 1); > + ipv4_ihl1 =3D IPv4_HDR_LEN(ipv4_hdr1); > + tcp_hdr1 =3D (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1); > + tcp_hl1 =3D TCP_HDR_LEN(tcp_hdr1); > + tcp_dl1 =3D rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1 > + - tcp_hl1; > + sent_seq1 =3D rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1; > + sent_seq =3D rte_be_to_cpu_32(tcp_hdr->sent_seq); > + > + /* check if the two packets are neighbor */ > + if ((sent_seq ^ sent_seq1) =3D=3D 0) { Why just not if sent_seq =3D=3D sent_seq1? > + /* check if TCP option field equals */ > + if (tcp_hl1 > sizeof(struct tcp_hdr)) { And what if tcp_hl1 =3D=3D sizeof(struct tcp_hdr), but tcp_hl > tcp_hl1? I think you need to remove that check. =20 > + if ((tcp_hl1 !=3D tcp_hl) || > + (memcmp(tcp_hdr1 + 1, > + tcp_hdr + 1, > + tcp_hl - sizeof > + (struct tcp_hdr)) > + =3D=3D 0)) > + ret =3D 1; > + } > + } > + return ret; > +} > + > +static uint32_t > +find_an_empty_item(struct gro_tcp_tbl *tbl) > +{ > + uint32_t i; > + > + for (i =3D 0; i < tbl->max_item_num; i++) > + if (tbl->items[i].is_valid =3D=3D 0) > + return i; > + return INVALID_ARRAY_INDEX; > +} > + > +static uint32_t > +find_an_empty_key(struct gro_tcp_tbl *tbl) > +{ > + uint32_t i; > + > + for (i =3D 0; i < tbl->max_key_num; i++) > + if (tbl->keys[i].is_valid =3D=3D 0) > + return i; > + return INVALID_ARRAY_INDEX; > +} > + > +int32_t > +gro_tcp4_reassemble(struct rte_mbuf *pkt, > + struct gro_tcp_tbl *tbl, > + uint32_t max_packet_size) > +{ > + struct ether_hdr *eth_hdr; > + struct ipv4_hdr *ipv4_hdr; > + struct tcp_hdr *tcp_hdr; > + uint16_t ipv4_ihl, tcp_hl, tcp_dl; > + > + struct tcp_key key; > + uint32_t cur_idx, prev_idx, item_idx; > + uint32_t i, key_idx; > + > + eth_hdr =3D rte_pktmbuf_mtod(pkt, struct ether_hdr *); > + ipv4_hdr =3D (struct ipv4_hdr *)(eth_hdr + 1); > + ipv4_ihl =3D IPv4_HDR_LEN(ipv4_hdr); > + > + /* check if the packet should be processed */ > + if (ipv4_ihl < sizeof(struct ipv4_hdr)) > + goto fail; > + tcp_hdr =3D (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl); > + tcp_hl =3D TCP_HDR_LEN(tcp_hdr); > + tcp_dl =3D rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl > + - tcp_hl; > + if (tcp_dl =3D=3D 0) > + goto fail; > + > + /* find a key and traverse all packets in its item group */ > + key.eth_saddr =3D eth_hdr->s_addr; > + key.eth_daddr =3D eth_hdr->d_addr; > + key.ip_src_addr[0] =3D rte_be_to_cpu_32(ipv4_hdr->src_addr); > + key.ip_dst_addr[0] =3D rte_be_to_cpu_32(ipv4_hdr->dst_addr); Your key.ip_src_addr[1-3] still contains some junk. How memcmp below supposed to worj properly? BTW why do you need 4 elems here, why just not uint32_t ip_src_addr;? Same for ip_dst_addr. > + key.src_port =3D rte_be_to_cpu_16(tcp_hdr->src_port); > + key.dst_port =3D rte_be_to_cpu_16(tcp_hdr->dst_port); > + key.recv_ack =3D rte_be_to_cpu_32(tcp_hdr->recv_ack); > + key.tcp_flags =3D tcp_hdr->tcp_flags; > + > + for (i =3D 0; i < tbl->max_key_num; i++) { > + if (tbl->keys[i].is_valid && > + (memcmp(&(tbl->keys[i].key), &key, > + sizeof(struct tcp_key)) > + =3D=3D 0)) { > + cur_idx =3D tbl->keys[i].start_index; > + prev_idx =3D cur_idx; > + while (cur_idx !=3D INVALID_ARRAY_INDEX) { > + if (check_seq_option(tbl->items[cur_idx].pkt, > + tcp_hdr, > + tcp_hl) > 0) { As I remember linux gro also check ipv4 packet_id - it should be consecutiv= e. > + if (merge_two_tcp4_packets( > + tbl->items[cur_idx].pkt, > + pkt, > + max_packet_size) > 0) { > + /* successfully merge two packets */ > + tbl->items[cur_idx].is_groed =3D 1; > + return 1; > + } If you allow more then packet per flow to be stored in the table, then you = should be prepared that new segment can fill a gap between 2 packets. Probably the easiest thing - don't allow more then one 'item' per flow. =20 > + /** > + * fail to merge two packets since > + * it's beyond the max packet length. > + * Insert it into the item group. > + */ > + goto insert_to_item_group; > + } else { > + prev_idx =3D cur_idx; > + cur_idx =3D tbl->items[cur_idx].next_pkt_idx; > + } > + } > + /** > + * find a corresponding item group but fails to find > + * one packet to merge. Insert it into this item group. > + */ > +insert_to_item_group: > + item_idx =3D find_an_empty_item(tbl); > + /* the item number is greater than the max value */ > + if (item_idx =3D=3D INVALID_ARRAY_INDEX) > + return -1; > + tbl->items[prev_idx].next_pkt_idx =3D item_idx; > + tbl->items[item_idx].pkt =3D pkt; > + tbl->items[item_idx].is_groed =3D 0; > + tbl->items[item_idx].next_pkt_idx =3D INVALID_ARRAY_INDEX; > + tbl->items[item_idx].is_valid =3D 1; > + tbl->items[item_idx].start_time =3D rte_rdtsc(); > + tbl->item_num++; > + return 0; > + } > + } > + > + /** > + * merge fail as the given packet has a new key. > + * So insert a new key. > + */ > + item_idx =3D find_an_empty_item(tbl); > + key_idx =3D find_an_empty_key(tbl); > + /** > + * if current key or item number is greater than the max > + * value, don't insert the packet into the table and return > + * immediately. > + */ > + if (item_idx =3D=3D INVALID_ARRAY_INDEX || > + key_idx =3D=3D INVALID_ARRAY_INDEX) > + return -1; > + tbl->items[item_idx].pkt =3D pkt; > + tbl->items[item_idx].next_pkt_idx =3D INVALID_ARRAY_INDEX; > + tbl->items[item_idx].is_groed =3D 0; > + tbl->items[item_idx].is_valid =3D 1; > + tbl->items[item_idx].start_time =3D rte_rdtsc(); You can pass start-time as a parameter instead. > + tbl->item_num++; > + > + memcpy(&(tbl->keys[key_idx].key), > + &key, sizeof(struct tcp_key)); > + tbl->keys[key_idx].start_index =3D item_idx; > + tbl->keys[key_idx].is_valid =3D 1; > + tbl->key_num++; > + > + return 0; > +fail: Please try to avoid goto whenever possible. Looks really ugly. > + return -1; > +} > + > +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl, > + struct rte_mbuf **out, > + const uint16_t nb_out) > +{ > + uint32_t i, num =3D 0; > + > + if (nb_out < tbl->item_num) > + return 0; And how user would now how many items are now in the table? > + > + for (i =3D 0; i < tbl->max_item_num; i++) { > + if (tbl->items[i].is_valid) { > + out[num++] =3D tbl->items[i].pkt; > + tbl->items[i].is_valid =3D 0; > + tbl->item_num--; > + } > + } > + memset(tbl->keys, 0, sizeof(struct gro_tcp_key) * > + tbl->max_key_num); > + tbl->key_num =3D 0; > + > + return num; > +} > + > +uint16_t > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl, > + uint64_t timeout_cycles, > + struct rte_mbuf **out, > + const uint16_t nb_out) > +{ > + uint16_t k; > + uint32_t i, j; > + uint64_t current_time; > + > + if (nb_out =3D=3D 0) > + return 0; > + k =3D 0; > + current_time =3D rte_rdtsc(); > + > + for (i =3D 0; i < tbl->max_key_num; i++) { > + if (tbl->keys[i].is_valid) { Seems pretty expensive to traverse the whole table... Would it worth to have some sort of LRU list? > + j =3D tbl->keys[i].start_index; > + while (j !=3D INVALID_ARRAY_INDEX) { > + if (current_time - tbl->items[j].start_time >=3D > + timeout_cycles) { > + out[k++] =3D tbl->items[j].pkt; > + tbl->items[j].is_valid =3D 0; > + tbl->item_num--; > + j =3D tbl->items[j].next_pkt_idx; > + > + if (k =3D=3D nb_out && > + j =3D=3D INVALID_ARRAY_INDEX) { > + /* delete the key */ > + tbl->keys[i].is_valid =3D 0; > + tbl->key_num--; > + goto end; Please rearrange the code to avoid gotos. > + } else if (k =3D=3D nb_out && > + j !=3D INVALID_ARRAY_INDEX) { > + /* update the first item index */ > + tbl->keys[i].start_index =3D j; > + goto end; > + } > + } > + } > + /* delete the key, as all of its packets are flushed */ > + tbl->keys[i].is_valid =3D 0; > + tbl->key_num--; > + } > + if (tbl->key_num =3D=3D 0) > + goto end; > + } > +end: > + return k; > +} > diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h > new file mode 100644 > index 0000000..a9a7aca > --- /dev/null > +++ b/lib/librte_gro/rte_gro_tcp.h > @@ -0,0 +1,191 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyrig= ht > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS F= OR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGH= T > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTA= L, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF US= E, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON A= NY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE U= SE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE= . > + */ > + > +#ifndef _RTE_GRO_TCP_H_ > +#define _RTE_GRO_TCP_H_ > + > +#if RTE_BYTE_ORDER =3D=3D RTE_LITTLE_ENDIAN > +#define TCP_HDR_LEN(tcph) \ > + ((tcph->data_off >> 4) * 4) > +#define IPv4_HDR_LEN(iph) \ > + ((iph->version_ihl & 0x0f) * 4) > +#else > +#define TCP_DATAOFF_MASK 0x0f > +#define TCP_HDR_LEN(tcph) \ > + ((tcph->data_off & TCP_DATAOFF_MASK) * 4) > +#define IPv4_HDR_LEN(iph) \ > + ((iph->version_ihl >> 4) * 4) > +#endif > + > +#define INVALID_ARRAY_INDEX 0xffffffffUL > +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1) > + > +/* criteria of mergeing packets */ > +struct tcp_key { > + struct ether_addr eth_saddr; > + struct ether_addr eth_daddr; > + uint32_t ip_src_addr[4]; /**< IPv4 uses the first 8B */ > + uint32_t ip_dst_addr[4]; > + > + uint32_t recv_ack; /**< acknowledgment sequence number. */ > + uint16_t src_port; > + uint16_t dst_port; > + uint8_t tcp_flags; /**< TCP flags. */ > +}; > + > +struct gro_tcp_key { > + struct tcp_key key; > + uint32_t start_index; /**< the first packet index of the flow */ > + uint8_t is_valid; > +}; > + > +struct gro_tcp_item { > + struct rte_mbuf *pkt; /**< packet address. */ > + /* the time when the packet in added into the table */ > + uint64_t start_time; > + uint32_t next_pkt_idx; /**< next packet index. */ > + /* flag to indicate if the packet is GROed */ > + uint8_t is_groed; > + uint8_t is_valid; /**< flag indicates if the item is valid */ Why do you need these 2 flags at all? Why not just reset let say pkt to NULL for invalid item? > +}; > + > +/** > + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table > + * structure. > + */ > +struct gro_tcp_tbl { > + struct gro_tcp_item *items; /**< item array */ > + struct gro_tcp_key *keys; /**< key array */ > + uint32_t item_num; /**< current item number */ > + uint32_t key_num; /**< current key num */ > + uint32_t max_item_num; /**< item array size */ > + uint32_t max_key_num; /**< key array size */ > +}; > + > +/** > + * This function creates a TCP reassembly table. > + * > + * @param socket_id > + * socket index where the Ethernet port connects to. > + * @param max_flow_num > + * the maximum number of flows in the TCP GRO table > + * @param max_item_per_flow > + * the maximum packet number per flow. > + * @return > + * if create successfully, return a pointer which points to the > + * created TCP GRO table. Otherwise, return NULL. > + */ > +void *gro_tcp_tbl_create(uint16_t socket_id, > + uint16_t max_flow_num, > + uint16_t max_item_per_flow); > + > +/** > + * This function destroys a TCP reassembly table. > + * @param tbl > + * a pointer points to the TCP reassembly table. > + */ > +void gro_tcp_tbl_destroy(void *tbl); > + > +/** > + * This function searches for a packet in the TCP reassembly table to > + * merge with the inputted one. To merge two packets is to chain them > + * together and update packet headers. If the packet is without data > + * (e.g. SYN, SYN-ACK packet), this function returns immediately. > + * Otherwise, the packet is either merged, or inserted into the table. > + * Besides, if there is no available space to insert the packet, this > + * function returns immediately too. > + * > + * This function assumes the inputted packet is with correct IPv4 and > + * TCP checksums. And if two packets are merged, it won't re-calculate > + * IPv4 and TCP checksums. Besides, if the inputted packet is IP > + * fragmented, it assumes the packet is complete (with TCP header). > + * > + * @param pkt > + * packet to reassemble. > + * @param tbl > + * a pointer that points to a TCP reassembly table. > + * @param max_packet_size > + * max packet length after merged > + * @return > + * if the packet doesn't have data, or there is no available space > + * in the table to insert a new item or a new key, return a negative > + * value. If the packet is merged successfully, return an positive > + * value. If the packet is inserted into the table, return 0. > + */ > +int32_t > +gro_tcp4_reassemble(struct rte_mbuf *pkt, > + struct gro_tcp_tbl *tbl, > + uint32_t max_packet_size); > + > +/** > + * This function flushes all packets in a TCP reassembly table to > + * applications, and without updating checksums for merged packets. > + * If the array which is used to keep flushed packets is not large > + * enough, error happens and this function returns immediately. > + * > + * @param tbl > + * a pointer that points to a TCP GRO table. > + * @param out > + * pointer array which is used to keep flushed packets. Applications > + * should guarantee it's large enough to hold all packets in the table. > + * @param nb_out > + * the element number of out. > + * @return > + * the number of flushed packets. If out is not large enough to hold > + * all packets in the table, return 0. > + */ > +uint16_t > +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl, > + struct rte_mbuf **out, > + const uint16_t nb_out); > + > +/** > + * This function flushes timeout packets in a TCP reassembly table to > + * applications, and without updating checksums for merged packets. > + * > + * @param tbl > + * a pointer that points to a TCP GRO table. > + * @param timeout_cycles > + * the maximum time that packets can stay in the table. > + * @param out > + * pointer array which is used to keep flushed packets. > + * @param nb_out > + * the element number of out. > + * @return > + * the number of packets that are returned. > + */ > +uint16_t > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl, > + uint64_t timeout_cycles, > + struct rte_mbuf **out, > + const uint16_t nb_out); > +#endif > -- > 2.7.4