From: "Tan, Jianfeng"
To: Jiayu Hu, dev@dpdk.org
Cc: konstantin.ananyev@intel.com, stephen@networkplumber.org, yliu@fridaylinux.org, keith.wiles@intel.com, tiwei.bie@intel.com, lei.a.yao@intel.com
Subject: Re: [dpdk-dev] [PATCH v6 2/3] lib/gro: add TCP/IPv4 GRO support
Date: Mon, 26 Jun 2017 00:53:31 +0800
Message-ID: <7b3cd553-65c7-535b-5f39-3c8316e5ee85@intel.com>
In-Reply-To: <1498229000-94867-3-git-send-email-jiayu.hu@intel.com>
References: <1497770469-16661-1-git-send-email-jiayu.hu@intel.com> <1498229000-94867-1-git-send-email-jiayu.hu@intel.com> <1498229000-94867-3-git-send-email-jiayu.hu@intel.com>
List-Id: DPDK patches and discussions

Hi Jiayu,

On 6/23/2017 10:43 PM, Jiayu Hu wrote:
> In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> - gro_tcp_tbl_create: create a TCP reassembly table, which is used to
>   merge packets.
> - gro_tcp_tbl_destroy: free the memory space of a TCP reassembly table.
> - gro_tcp_tbl_flush: flush all packets from a TCP reassembly table.
> - gro_tcp_tbl_timeout_flush: flush timed-out packets from a TCP
>   reassembly table.
> - gro_tcp4_reassemble: reassemble an incoming TCP/IPv4 packet.
>
> The TCP/IPv4 GRO API assumes all input packets have correct IPv4 and
> TCP checksums, and it does not update IPv4 or TCP checksums for merged
> packets. If input packets are IP fragmented, the API assumes they are
> complete packets (i.e. with L4 headers).
>
> In TCP GRO, we use a table structure, called a TCP reassembly table, to
> reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table
> structure. A TCP reassembly table includes a key array and an item
> array: the key array keeps the criteria used to merge packets, and the
> item array keeps per-packet information.
>
> A key in the key array points to an item group, which consists of
> packets that have the same criteria value. If two packets can be
> merged, they must be in the same item group. Each key in the key array
> includes two parts:
> - criteria: the merge criteria. Two packets can be merged only if they
>   have the same criteria value.
> - start_index: the index of the first incoming packet of the item group.
>
> Each element in the item array keeps the information of one packet. It
> mainly includes two parts:
> - pkt: packet address
> - next_pkt_index: the index of the next packet in the same item group.
>   All packets in an item group are chained via next_pkt_index, so we
>   can locate them one by one.
>
> Processing an incoming packet takes three steps:
> a. check if the packet should be processed. Packets with the following
>    properties are not processed:
>    - packets without data (e.g. SYN, SYN-ACK)
> b. traverse the key array to find a key with the same criteria value as
>    the incoming packet. If one is found, go to step c. Otherwise,
>    insert a new key and insert the packet into the item array.
> c.
locate the first packet in the item group via the key's start_index,
> then traverse all packets in the item group via next_pkt_index. If a
> packet is found that can be merged with the incoming one, merge them.
> Otherwise, insert the incoming packet into this item group.
>
> Signed-off-by: Jiayu Hu
> ---
>  doc/guides/rel_notes/release_17_08.rst |   7 +
>  lib/librte_gro/Makefile                |   1 +
>  lib/librte_gro/rte_gro.c               | 126 +++++++++--
>  lib/librte_gro/rte_gro.h               |   6 +-
>  lib/librte_gro/rte_gro_tcp.c           | 393 +++++++++++++++++++++++++++++++++
>  lib/librte_gro/rte_gro_tcp.h           | 188 ++++++++++++++++
>  6 files changed, 705 insertions(+), 16 deletions(-)
>  create mode 100644 lib/librte_gro/rte_gro_tcp.c
>  create mode 100644 lib/librte_gro/rte_gro_tcp.h
>
> diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
> index 842f46f..f067247 100644
> --- a/doc/guides/rel_notes/release_17_08.rst
> +++ b/doc/guides/rel_notes/release_17_08.rst
> @@ -75,6 +75,13 @@ New Features
>
>    Added support for firmwares with multiple Ethernet ports per physical port.
>
> +* **Add Generic Receive Offload API support.**
> +
> +  Generic Receive Offload (GRO) API supports to reassemble TCP/IPv4
> +  packets. GRO API assumes all inputted packets are with correct
> +  checksums. GRO API doesn't update checksums for merged packets. If
> +  inputted packets are IP fragmented, GRO API assumes they are complete
> +  packets (i.e. with L4 headers).
> > Resolved Issues > --------------- > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile > index 7e0f128..e89344d 100644 > --- a/lib/librte_gro/Makefile > +++ b/lib/librte_gro/Makefile > @@ -43,6 +43,7 @@ LIBABIVER := 1 > > # source files > SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c > > # install this header file > SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c > index ebc545f..ae800f9 100644 > --- a/lib/librte_gro/rte_gro.c > +++ b/lib/librte_gro/rte_gro.c > @@ -32,11 +32,15 @@ > > #include > #include > +#include > > #include "rte_gro.h" > +#include "rte_gro_tcp.h" > > -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB]; > -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB]; > +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = { > + gro_tcp_tbl_create, NULL}; > +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = { > + gro_tcp_tbl_destroy, NULL}; > > struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id, > uint16_t max_flow_num, > @@ -94,32 +98,124 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl) > } > > uint16_t > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused, > +rte_gro_reassemble_burst(struct rte_mbuf **pkts, > const uint16_t nb_pkts, > - const struct rte_gro_param param __rte_unused) > + const struct rte_gro_param param) > { > - return nb_pkts; > + uint16_t i; > + uint16_t nb_after_gro = nb_pkts; > + uint32_t item_num = nb_pkts < > + param.max_flow_num * param.max_item_per_flow ? > + nb_pkts : > + param.max_flow_num * param.max_item_per_flow; > + > + /* allocate a reassembly table for TCP/IPv4 GRO */ > + uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ? > + item_num : GRO_TCP_TBL_MAX_ITEM_NUM; This is a bad check here as GRO_TCP_TBL_MAX_ITEM_NUM is defined as (UINT32_MAX - 1), and we cannot allocate such a big array on the stack. 
What's more, I still don't think we should put any TCP-specific code here. As we discussed offline, the reason you did this is to make the allocation happen as early as possible. I suggest defining two macros, GRO_TCP_TBL_MAX_FLOWS and GRO_TCP_TBL_MAX_ITEMS_PER_FLOW, and allocating the memory when the library is loaded. That would even save users from having to fill in the rte_gro_param. If there are more flows than GRO_TCP_TBL_MAX_FLOWS, we can simply stop adding new flows.

> +	struct gro_tcp_tbl tcp_tbl;
> +	struct gro_tcp_key tcp_keys[tcp_item_num];
> +	struct gro_tcp_item tcp_items[tcp_item_num];
> +
> +	struct rte_mbuf *unprocess_pkts[nb_pkts];
> +	uint16_t unprocess_num = 0;
> +	int32_t ret;
> +
> +	memset(tcp_keys, 0, sizeof(struct gro_tcp_key) *
> +			tcp_item_num);
> +	memset(tcp_items, 0, sizeof(struct gro_tcp_item) *
> +			tcp_item_num);
> +	tcp_tbl.keys = tcp_keys;
> +	tcp_tbl.items = tcp_items;
> +	tcp_tbl.key_num = 0;
> +	tcp_tbl.item_num = 0;
> +	tcp_tbl.max_key_num = tcp_item_num;
> +	tcp_tbl.max_item_num = tcp_item_num;
> +
> +	for (i = 0; i < nb_pkts; i++) {
> +		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type)) {
> +			if ((pkts[i]->packet_type & RTE_PTYPE_L4_TCP) &&
> +					(param.desired_gro_types &
> +					 GRO_TCP_IPV4)) {
> +				ret = gro_tcp4_reassemble(pkts[i],
> +						&tcp_tbl,
> +						param.max_packet_size);
> +				/* merge successfully */
> +				if (ret > 0)
> +					nb_after_gro--;
> +				else if (ret < 0)
> +					unprocess_pkts[unprocess_num++] =
> +						pkts[i];
> +			} else
> +				unprocess_pkts[unprocess_num++] =
> +					pkts[i];
> +		} else
> +			unprocess_pkts[unprocess_num++] =
> +				pkts[i];
> +	}
> +
> +	/* re-arrange GROed packets */
> +	if (nb_after_gro < nb_pkts) {
> +		if (param.desired_gro_types & GRO_TCP_IPV4)
> +			i = gro_tcp_tbl_flush(&tcp_tbl, pkts, nb_pkts);
> +		if (unprocess_num > 0) {
> +			memcpy(&pkts[i], unprocess_pkts,
> +					sizeof(struct rte_mbuf *) *
> +					unprocess_num);
> +			i += unprocess_num;
> +		}
> +		if (nb_pkts > i)
> +			memset(&pkts[i], 0,
> +					sizeof(struct rte_mbuf *) *
> +					(nb_pkts -
i)); > + } > + return nb_after_gro; > } > > -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused, > - struct rte_gro_tbl *gro_tbl __rte_unused) > +int rte_gro_reassemble(struct rte_mbuf *pkt, > + struct rte_gro_tbl *gro_tbl) > { > + if (unlikely(pkt == NULL)) > + return -1; > + > + if (RTE_ETH_IS_IPV4_HDR(pkt->packet_type)) { > + if ((pkt->packet_type & RTE_PTYPE_L4_TCP) && > + (gro_tbl->desired_gro_types & > + GRO_TCP_IPV4)) > + return gro_tcp4_reassemble(pkt, > + gro_tbl->tbls[GRO_TCP_IPV4_INDEX], > + gro_tbl->max_packet_size); > + } > + > return -1; > } > > -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused, > - uint64_t desired_gro_types __rte_unused, > - struct rte_mbuf **out __rte_unused, > - const uint16_t max_nb_out __rte_unused) > +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl, > + uint64_t desired_gro_types, > + struct rte_mbuf **out, > + const uint16_t max_nb_out) > { > + desired_gro_types = desired_gro_types & > + gro_tbl->desired_gro_types; > + if (desired_gro_types & GRO_TCP_IPV4) > + return gro_tcp_tbl_flush( > + gro_tbl->tbls[GRO_TCP_IPV4_INDEX], > + out, > + max_nb_out); > return 0; > } > > uint16_t > -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused, > - uint64_t desired_gro_types __rte_unused, > - struct rte_mbuf **out __rte_unused, > - const uint16_t max_nb_out __rte_unused) > +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl, > + uint64_t desired_gro_types, > + struct rte_mbuf **out, > + const uint16_t max_nb_out) > { > + desired_gro_types = desired_gro_types & > + gro_tbl->desired_gro_types; > + if (desired_gro_types & GRO_TCP_IPV4) > + return gro_tcp_tbl_timeout_flush( > + gro_tbl->tbls[GRO_TCP_IPV4_INDEX], > + gro_tbl->max_timeout_cycles, > + out, max_nb_out); > return 0; > } > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h > index 2c547fa..41cd51a 100644 > --- a/lib/librte_gro/rte_gro.h > +++ b/lib/librte_gro/rte_gro.h > @@ -35,7 +35,11 @@ > > /* max number of supported GRO types */ > 
#define GRO_TYPE_MAX_NB 64 > -#define GRO_TYPE_SUPPORT_NB 0 /**< current supported GRO num */ > +#define GRO_TYPE_SUPPORT_NB 1 /**< supported GRO types number */ > + > +/* TCP/IPv4 GRO flag */ > +#define GRO_TCP_IPV4_INDEX 0 > +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX) > > /** > * GRO table, which is used to merge packets. It keeps many reassembly > diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c > new file mode 100644 > index 0000000..cfcd89e > --- /dev/null > +++ b/lib/librte_gro/rte_gro_tcp.c > @@ -0,0 +1,393 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +#include "rte_gro_tcp.h" > + > +void *gro_tcp_tbl_create(uint16_t socket_id, > + uint16_t max_flow_num, > + uint16_t max_item_per_flow) > +{ > + size_t size; > + uint32_t entries_num; > + struct gro_tcp_tbl *tbl; > + > + entries_num = max_flow_num * max_item_per_flow; > + entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ? > + GRO_TCP_TBL_MAX_ITEM_NUM : entries_num; > + > + if (entries_num == 0) > + return NULL; > + > + tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket( > + __func__, > + sizeof(struct gro_tcp_tbl), > + RTE_CACHE_LINE_SIZE, > + socket_id); > + > + size = sizeof(struct gro_tcp_item) * entries_num; > + tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket( > + __func__, > + size, > + RTE_CACHE_LINE_SIZE, > + socket_id); > + tbl->max_item_num = entries_num; > + > + size = sizeof(struct gro_tcp_key) * entries_num; > + tbl->keys = (struct gro_tcp_key *)rte_zmalloc_socket( > + __func__, > + size, RTE_CACHE_LINE_SIZE, > + socket_id); > + tbl->max_key_num = entries_num; > + return tbl; > +} > + > +void gro_tcp_tbl_destroy(void *tbl) > +{ > + struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl; > + > + if (tcp_tbl) { > + if (tcp_tbl->items) > + rte_free(tcp_tbl->items); > + if (tcp_tbl->keys) > + rte_free(tcp_tbl->keys); > + rte_free(tcp_tbl); > + } > +} > + > +/** > + * merge two TCP/IPv4 packets without update 
checksums. > + */ > +static int > +merge_two_tcp4_packets(struct rte_mbuf *pkt_src, > + struct rte_mbuf *pkt, > + uint32_t max_packet_size) > +{ > + struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2; > + struct tcp_hdr *tcp_hdr1; > + uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1; > + struct rte_mbuf *tail; > + > + /* parse the given packet */ > + ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, > + struct ether_hdr *) + 1); > + ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1); > + tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1); > + tcp_hl1 = TCP_HDR_LEN(tcp_hdr1); > + tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1 > + - tcp_hl1; > + > + /* parse the original packet */ > + ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src, > + struct ether_hdr *) + 1); > + > + if (pkt_src->pkt_len + tcp_dl1 > max_packet_size) > + return -1; > + > + /* remove the header of the incoming packet */ > + rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) + > + ipv4_ihl1 + tcp_hl1); > + > + /* chain the two packet together */ > + tail = rte_pktmbuf_lastseg(pkt_src); > + tail->next = pkt; > + > + /* update IP header */ > + ipv4_hdr2->total_length = rte_cpu_to_be_16( > + rte_be_to_cpu_16( > + ipv4_hdr2->total_length) > + + tcp_dl1); > + > + /* update mbuf metadata for the merged packet */ > + pkt_src->nb_segs++; > + pkt_src->pkt_len += pkt->pkt_len; > + return 1; > +} > + > +static int > +check_seq_option(struct rte_mbuf *pkt, > + struct tcp_hdr *tcp_hdr, > + uint16_t tcp_hl) > +{ > + struct ipv4_hdr *ipv4_hdr1; > + struct tcp_hdr *tcp_hdr1; > + uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1; > + uint32_t sent_seq1, sent_seq; > + int ret = -1; > + > + ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, > + struct ether_hdr *) + 1); > + ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1); > + tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1); > + tcp_hl1 = TCP_HDR_LEN(tcp_hdr1); > + tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1 > + - tcp_hl1; > + sent_seq1 = 
rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1; > + sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq); > + > + /* check if the two packets are neighbor */ > + if ((sent_seq ^ sent_seq1) == 0) { > + /* check if TCP option field equals */ > + if (tcp_hl1 > sizeof(struct tcp_hdr)) { > + if ((tcp_hl1 != tcp_hl) || > + (memcmp(tcp_hdr1 + 1, > + tcp_hdr + 1, > + tcp_hl - sizeof > + (struct tcp_hdr)) > + == 0)) > + ret = 1; > + } > + } > + return ret; > +} > + > +static uint32_t > +find_an_empty_item(struct gro_tcp_tbl *tbl) > +{ > + uint32_t i; > + > + for (i = 0; i < tbl->max_item_num; i++) > + if (tbl->items[i].is_valid == 0) > + return i; > + return INVALID_ARRAY_INDEX; > +} > + > +static uint32_t > +find_an_empty_key(struct gro_tcp_tbl *tbl) > +{ > + uint32_t i; > + > + for (i = 0; i < tbl->max_key_num; i++) > + if (tbl->keys[i].is_valid == 0) > + return i; > + return INVALID_ARRAY_INDEX; > +} > + > +int32_t > +gro_tcp4_reassemble(struct rte_mbuf *pkt, > + struct gro_tcp_tbl *tbl, > + uint32_t max_packet_size) > +{ > + struct ether_hdr *eth_hdr; > + struct ipv4_hdr *ipv4_hdr; > + struct tcp_hdr *tcp_hdr; > + uint16_t ipv4_ihl, tcp_hl, tcp_dl; > + > + struct tcp_key key; > + uint32_t cur_idx, prev_idx, item_idx; > + uint32_t i, key_idx; > + > + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); > + ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1); > + ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr); > + > + /* check if the packet should be processed */ > + if (ipv4_ihl < sizeof(struct ipv4_hdr)) > + goto fail; > + tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl); > + tcp_hl = TCP_HDR_LEN(tcp_hdr); > + tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl > + - tcp_hl; > + if (tcp_dl == 0) > + goto fail; > + > + /* find a key and traverse all packets in its item group */ > + key.eth_saddr = eth_hdr->s_addr; > + key.eth_daddr = eth_hdr->d_addr; > + key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr); > + key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr); > + 
key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port); > + key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port); > + key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack); > + key.tcp_flags = tcp_hdr->tcp_flags; > + > + for (i = 0; i < tbl->max_key_num; i++) { > + if (tbl->keys[i].is_valid && > + (memcmp(&(tbl->keys[i].key), &key, > + sizeof(struct tcp_key)) > + == 0)) { > + cur_idx = tbl->keys[i].start_index; > + prev_idx = cur_idx; > + while (cur_idx != INVALID_ARRAY_INDEX) { > + if (check_seq_option(tbl->items[cur_idx].pkt, > + tcp_hdr, > + tcp_hl) > 0) { > + if (merge_two_tcp4_packets( > + tbl->items[cur_idx].pkt, > + pkt, > + max_packet_size) > 0) { > + /* successfully merge two packets */ > + tbl->items[cur_idx].is_groed = 1; > + return 1; > + } > + /** > + * fail to merge two packets since > + * it's beyond the max packet length. > + * Insert it into the item group. > + */ > + goto insert_to_item_group; > + } else { > + prev_idx = cur_idx; > + cur_idx = tbl->items[cur_idx].next_pkt_idx; > + } > + } > + /** > + * find a corresponding item group but fails to find > + * one packet to merge. Insert it into this item group. > + */ > +insert_to_item_group: > + item_idx = find_an_empty_item(tbl); > + /* the item number is beyond the maximum value */ > + if (item_idx == INVALID_ARRAY_INDEX) > + return -1; > + tbl->items[prev_idx].next_pkt_idx = item_idx; > + tbl->items[item_idx].pkt = pkt; > + tbl->items[item_idx].is_groed = 0; > + tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX; > + tbl->items[item_idx].is_valid = 1; > + tbl->items[item_idx].start_time = rte_rdtsc(); > + tbl->item_num++; > + return 0; > + } > + } > + > + /** > + * merge fail as the given packet has a > + * new key. So insert a new key. > + */ > + item_idx = find_an_empty_item(tbl); > + key_idx = find_an_empty_key(tbl); > + /** > + * if the key or item number is beyond the maximum > + * value, the inputted packet won't be processed. 
> + */ > + if (item_idx == INVALID_ARRAY_INDEX || > + key_idx == INVALID_ARRAY_INDEX) > + return -1; > + tbl->items[item_idx].pkt = pkt; > + tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX; > + tbl->items[item_idx].is_groed = 0; > + tbl->items[item_idx].is_valid = 1; > + tbl->items[item_idx].start_time = rte_rdtsc(); > + tbl->item_num++; > + > + memcpy(&(tbl->keys[key_idx].key), > + &key, sizeof(struct tcp_key)); > + tbl->keys[key_idx].start_index = item_idx; > + tbl->keys[key_idx].is_valid = 1; > + tbl->key_num++; > + > + return 0; > +fail: > + return -1; > +} > + > +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl, > + struct rte_mbuf **out, > + const uint16_t nb_out) > +{ > + uint32_t i, num = 0; > + > + if (nb_out < tbl->item_num) > + return 0; > + > + for (i = 0; i < tbl->max_item_num; i++) { > + if (tbl->items[i].is_valid) { > + out[num++] = tbl->items[i].pkt; > + tbl->items[i].is_valid = 0; > + tbl->item_num--; > + } > + } > + memset(tbl->keys, 0, sizeof(struct gro_tcp_key) * > + tbl->max_key_num); > + tbl->key_num = 0; > + > + return num; > +} > + > +uint16_t > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl, > + uint64_t timeout_cycles, > + struct rte_mbuf **out, > + const uint16_t nb_out) > +{ > + uint16_t k; > + uint32_t i, j; > + uint64_t current_time; > + > + if (nb_out == 0) > + return 0; > + k = 0; > + current_time = rte_rdtsc(); > + > + for (i = 0; i < tbl->max_key_num; i++) { > + if (tbl->keys[i].is_valid) { > + j = tbl->keys[i].start_index; > + while (j != INVALID_ARRAY_INDEX) { > + if (current_time - tbl->items[j].start_time >= > + timeout_cycles) { > + out[k++] = tbl->items[j].pkt; > + tbl->items[j].is_valid = 0; > + tbl->item_num--; > + j = tbl->items[j].next_pkt_idx; > + > + if (k == nb_out && > + j == INVALID_ARRAY_INDEX) { > + /* delete the key */ > + tbl->keys[i].is_valid = 0; > + tbl->key_num--; > + goto end; > + } else if (k == nb_out && > + j != INVALID_ARRAY_INDEX) { > + /* update the first item index */ > + 
tbl->keys[i].start_index = j; > + goto end; > + } > + } > + } > + /* delete the key, as all of its packets are flushed */ > + tbl->keys[i].is_valid = 0; > + tbl->key_num--; > + } > + if (tbl->key_num == 0) > + goto end; > + } > +end: > + return k; > +} > diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h > new file mode 100644 > index 0000000..4c4f9c7 > --- /dev/null > +++ b/lib/librte_gro/rte_gro_tcp.h > @@ -0,0 +1,188 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#ifndef _RTE_GRO_TCP_H_ > +#define _RTE_GRO_TCP_H_ > + > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN > +#define TCP_HDR_LEN(tcph) \ > + ((tcph->data_off >> 4) * 4) > +#define IPv4_HDR_LEN(iph) \ > + ((iph->version_ihl & 0x0f) * 4) > +#else > +#define TCP_DATAOFF_MASK 0x0f > +#define TCP_HDR_LEN(tcph) \ > + ((tcph->data_off & TCP_DATAOFF_MASK) * 4) > +#define IPv4_HDR_LEN(iph) \ > + ((iph->version_ihl >> 4) * 4) > +#endif > + > +#define INVALID_ARRAY_INDEX 0xffffffffUL > +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1) Defining such a big number does not make any sense. > + > +/* criteria of mergeing packets */ > +struct tcp_key { > + struct ether_addr eth_saddr; > + struct ether_addr eth_daddr; > + uint32_t ip_src_addr[4]; /**< IPv4 uses the first 8B */ > + uint32_t ip_dst_addr[4]; > + > + uint32_t recv_ack; /**< acknowledgment sequence number. */ > + uint16_t src_port; > + uint16_t dst_port; > + uint8_t tcp_flags; /**< TCP flags. */ > +}; > + > +struct gro_tcp_key { > + struct tcp_key key; > + uint32_t start_index; /**< the first packet index of the flow */ > + uint8_t is_valid; > +}; > + > +struct gro_tcp_item { > + struct rte_mbuf *pkt; /**< packet address. */ > + /* the time when the packet in added into the table */ > + uint64_t start_time; > + uint32_t next_pkt_idx; /**< next packet index. 
*/ > + /* flag to indicate if the packet is GROed */ > + uint8_t is_groed; > + uint8_t is_valid; /**< flag indicates if the item is valid */ > +}; > + > +/** > + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table > + * structure. > + */ > +struct gro_tcp_tbl { > + struct gro_tcp_item *items; /**< item array */ > + struct gro_tcp_key *keys; /**< key array */ > + uint32_t item_num; /**< current item number */ > + uint32_t key_num; /**< current key num */ > + uint32_t max_item_num; /**< item array size */ > + uint32_t max_key_num; /**< key array size */ > +}; > + > +/** > + * This function creates a TCP reassembly table. > + * > + * @param socket_id > + * socket index where the Ethernet port connects to. > + * @param max_flow_num > + * the maximum number of flows in the TCP GRO table > + * @param max_item_per_flow > + * the maximum packet number per flow. > + * @return > + * if create successfully, return a pointer which points to the > + * created TCP GRO table. Otherwise, return NULL. > + */ > +void *gro_tcp_tbl_create(uint16_t socket_id, > + uint16_t max_flow_num, > + uint16_t max_item_per_flow); > + > +/** > + * This function destroys a TCP reassembly table. > + * @param tbl > + * a pointer points to the TCP reassembly table. > + */ > +void gro_tcp_tbl_destroy(void *tbl); > + > +/** > + * This function searches for a packet in the TCP reassembly table to > + * merge with the inputted one. To merge two packets is to chain them > + * together and update packet headers. If the packet is without data > + * (e.g. SYN, SYN-ACK packet), this function returns immediately. > + * Otherwise, the packet is either merged, or inserted into the table. > + * > + * This function assumes the inputted packet is with correct IPv4 and > + * TCP checksums. And if two packets are merged, it won't re-calculate > + * IPv4 and TCP checksums. Besides, if the inputted packet is IP > + * fragmented, it assumes the packet is complete (with TCP header). 
> + * > + * @param pkt > + * packet to reassemble. > + * @param tbl > + * a pointer that points to a TCP reassembly table. > + * @param max_packet_size > + * max packet length after merged > + * @return > + * if the packet doesn't have data, return a negative value. If the > + * packet is merged successfully, return an positive value. If the > + * packet is inserted into the table, return 0. > + */ > +int32_t > +gro_tcp4_reassemble(struct rte_mbuf *pkt, > + struct gro_tcp_tbl *tbl, > + uint32_t max_packet_size); > + > +/** > + * This function flushes all packets in a TCP reassembly table to > + * applications, and without updating checksums for merged packets. > + * If the array which is used to keep flushed packets is not large > + * enough, error happens and this function returns immediately. > + * > + * @param tbl > + * a pointer that points to a TCP GRO table. > + * @param out > + * pointer array which is used to keep flushed packets. Applications > + * should guarantee it's large enough to hold all packets in the table. > + * @param nb_out > + * the element number of out. > + * @return > + * the number of flushed packets. If out is not large enough to hold > + * all packets in the table, return 0. > + */ > +uint16_t > +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl, > + struct rte_mbuf **out, > + const uint16_t nb_out); > + > +/** > + * This function flushes timeout packets in a TCP reassembly table to > + * applications, and without updating checksums for merged packets. > + * > + * @param tbl > + * a pointer that points to a TCP GRO table. > + * @param timeout_cycles > + * the maximum time that packets can stay in the table. > + * @param out > + * pointer array which is used to keep flushed packets. > + * @param nb_out > + * the element number of out. > + * @return > + * the number of packets that are returned. 
> + */ > +uint16_t > +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl, > + uint64_t timeout_cycles, > + struct rte_mbuf **out, > + const uint16_t nb_out); > +#endif