From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id A53D02BFF for ; Fri, 7 Jul 2017 08:55:31 +0200 (CEST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga105.jf.intel.com with ESMTP; 06 Jul 2017 23:55:30 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.40,321,1496127600"; d="scan'208";a="876196915" Received: from tanjianf-mobl.ccr.corp.intel.com (HELO [10.67.64.98]) ([10.67.64.98]) by FMSMGA003.fm.intel.com with ESMTP; 06 Jul 2017 23:55:28 -0700 To: Jiayu Hu , dev@dpdk.org References: <1498907323-17563-1-git-send-email-jiayu.hu@intel.com> <1499227716-116583-1-git-send-email-jiayu.hu@intel.com> <1499227716-116583-3-git-send-email-jiayu.hu@intel.com> Cc: konstantin.ananyev@intel.com, yliu@fridaylinux.org, stephen@networkplumber.org, jingjing.wu@intel.com, lei.a.yao@intel.com From: "Tan, Jianfeng" Message-ID: <57a2400e-a626-fccb-3ab9-61b697fd2104@intel.com> Date: Fri, 7 Jul 2017 14:55:28 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <1499227716-116583-3-git-send-email-jiayu.hu@intel.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v11 2/3] lib/gro: add TCP/IPv4 GRO support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 07 Jul 2017 06:55:33 -0000 On 7/5/2017 12:08 PM, Jiayu Hu wrote: > In this patch, we introduce five APIs to support TCP/IPv4 GRO. > - gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used > to merge packets. > - gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table. > - gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4 > reassembly table. 
> - gro_tcp4_tbl_get_count: return the number of packets in a TCP/IPv4 > reassembly table. > - gro_tcp4_reassemble: reassemble an inputted TCP/IPv4 packet. > > TCP/IPv4 GRO API assumes all inputted packets are with correct IPv4 > and TCP checksums. And TCP/IPv4 GRO API doesn't update IPv4 and TCP > checksums for merged packets. If inputted packets are IP fragmented, > TCP/IPv4 GRO API assumes they are complete packets (i.e. with L4 > headers). > > In TCP/IPv4 GRO, we use a table structure, called TCP/IPv4 reassembly > table, to reassemble packets. A TCP/IPv4 reassembly table includes a key > array and a item array, where the key array keeps the criteria to merge > packets and the item array keeps packet information. > > One key in the key array points to an item group, which consists of > packets which have the same criteria value. If two packets are able to > merge, they must be in the same item group. Each key in the key array > includes two parts: > - criteria: the criteria of merging packets. If two packets can be > merged, they must have the same criteria value. > - start_index: the index of the first incoming packet of the item group. > > Each element in the item array keeps the information of one packet. It > mainly includes three parts: > - firstseg: the address of the first segment of the packet > - lastsegL the address of the last segment of the packet > - next_pkt_index: the index of the next packet in the same item group. > All packets in the same item group are chained by next_pkt_index. > With next_pkt_index, we can locate all packets in the same item > group one by one. > > To process an incoming packet needs three steps: > a. check if the packet should be processed. Packets with one of the > following properties won't be processed: > - FIN, SYN, RST URG, PSH, ECE or CWR bit is set; > - packet payload length is 0. > b. traverse the key array to find a key which has the same criteria > value with the incoming packet. If find, goto step c. 
Otherwise, > insert a new key and insert the packet into the item array. > c. locate the first packet in the item group via the start_index in the > key. Then traverse all packets in the item group via next_pkt_index. > If find one packet which can merge with the incoming one, merge them > together. If can't find, insert the packet into this item group. > > Signed-off-by: Jiayu Hu > --- > doc/guides/rel_notes/release_17_08.rst | 7 + > lib/librte_gro/Makefile | 1 + > lib/librte_gro/gro_tcp4.c | 493 +++++++++++++++++++++++++++++++++ > lib/librte_gro/gro_tcp4.h | 206 ++++++++++++++ > lib/librte_gro/rte_gro.c | 121 +++++++- > lib/librte_gro/rte_gro.h | 5 +- > 6 files changed, 819 insertions(+), 14 deletions(-) > create mode 100644 lib/librte_gro/gro_tcp4.c > create mode 100644 lib/librte_gro/gro_tcp4.h > > diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst > index 842f46f..f067247 100644 > --- a/doc/guides/rel_notes/release_17_08.rst > +++ b/doc/guides/rel_notes/release_17_08.rst > @@ -75,6 +75,13 @@ New Features > > Added support for firmwares with multiple Ethernet ports per physical port. > > +* **Add Generic Receive Offload API support.** > + > + Generic Receive Offload (GRO) API supports to reassemble TCP/IPv4 > + packets. GRO API assumes all inputted packets are with correct > + checksums. GRO API doesn't update checksums for merged packets. If > + inputted packets are IP fragmented, GRO API assumes they are complete > + packets (i.e. with L4 headers). 
> > Resolved Issues > --------------- > diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile > index 7e0f128..747eeec 100644 > --- a/lib/librte_gro/Makefile > +++ b/lib/librte_gro/Makefile > @@ -43,6 +43,7 @@ LIBABIVER := 1 > > # source files > SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c > +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c > > # install this header file > SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h > diff --git a/lib/librte_gro/gro_tcp4.c b/lib/librte_gro/gro_tcp4.c > new file mode 100644 > index 0000000..703282d > --- /dev/null > +++ b/lib/librte_gro/gro_tcp4.c > @@ -0,0 +1,493 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "gro_tcp4.h" > + > +void * > +gro_tcp4_tbl_create(uint16_t socket_id, > + uint16_t max_flow_num, > + uint16_t max_item_per_flow) > +{ > + struct gro_tcp4_tbl *tbl; > + size_t size; > + uint32_t entries_num; > + > + entries_num = max_flow_num * max_item_per_flow; > + entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ? > + GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num; As I commented before, this check is not useful: entries_num is a uint32_t holding the product of two uint16_t values, so it can never be greater than (UINT32_MAX - 1). Besides, we cannot allocate a memory block as big as sizeof(struct gro_tcp4_item) * UINT32_MAX anyway. If we really need a check, please make the limit smaller. Considering that each item corresponds to a flow to some extent, I think we can limit it to 1M flows for now. (Sorry, I should have made this comment at the definition of GRO_TCP4_TBL_MAX_ITEM_NUM.)
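For example, a smaller cap could look like the sketch below. This is only an illustration: the 1M value and the helper name gro_tcp4_entries_num are my assumptions, not part of the patch. The cast also makes the multiplication happen in uint32_t rather than in int.

```c
#include <stdint.h>

/* Assumed cap of 1M items; the exact value is open for discussion. */
#define GRO_TCP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL)

/*
 * Hypothetical helper: compute the number of table entries and
 * clamp it to the cap. One operand is widened to uint32_t first,
 * so the product is computed in unsigned 32-bit arithmetic.
 */
static uint32_t
gro_tcp4_entries_num(uint16_t max_flow_num, uint16_t max_item_per_flow)
{
	uint32_t entries_num;

	entries_num = (uint32_t)max_flow_num * max_item_per_flow;
	if (entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM)
		entries_num = GRO_TCP4_TBL_MAX_ITEM_NUM;

	return entries_num;
}
```

With a cap like this, the later "entries_num == 0" test still rejects empty tables, and the largest allocation is bounded by sizeof(struct gro_tcp4_item) * 1M.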
> + > + if (entries_num == 0) > + return NULL; > + > + tbl = rte_zmalloc_socket(__func__, > + sizeof(struct gro_tcp4_tbl), > + RTE_CACHE_LINE_SIZE, > + socket_id); > + if (tbl == NULL) > + return NULL; > + > + size = sizeof(struct gro_tcp4_item) * entries_num; > + tbl->items = rte_zmalloc_socket(__func__, > + size, > + RTE_CACHE_LINE_SIZE, > + socket_id); > + if (tbl->items == NULL) { > + rte_free(tbl); > + return NULL; > + } > + tbl->max_item_num = entries_num; > + > + size = sizeof(struct gro_tcp4_key) * entries_num; > + tbl->keys = rte_zmalloc_socket(__func__, > + size, > + RTE_CACHE_LINE_SIZE, > + socket_id); > + if (tbl->keys == NULL) { > + rte_free(tbl->items); > + rte_free(tbl); > + return NULL; > + } > + tbl->max_key_num = entries_num; > + > + return tbl; > +} > + > +void > +gro_tcp4_tbl_destroy(void *tbl) > +{ > + struct gro_tcp4_tbl *tcp_tbl = tbl; > + > + if (tcp_tbl) { > + rte_free(tcp_tbl->items); > + rte_free(tcp_tbl->keys); > + } > + rte_free(tcp_tbl); > +} > + > +/* > + * merge two TCP/IPv4 packets without updating checksums. > + * If cmp is larger than 0, append the new packet to the > + * original packet. Otherwise, pre-pend the new packet to > + * the original packet. > + */ > +static inline int > +merge_two_tcp4_packets(struct gro_tcp4_item *item_src, > + struct rte_mbuf *pkt, > + uint16_t ip_id, > + uint32_t sent_seq, > + int cmp) > +{ > + struct rte_mbuf *pkt_head, *pkt_tail, *lastseg; > + uint16_t tcp_dl1; There is no tcp_dl2, so the "1" suffix is not needed, and for readability we should not hide the meaning behind "dl". Please just rename it to tcp_datalen.
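To illustrate the naming, here is a minimal standalone sketch of the length check written with tcp_datalen. struct pkt_lens and tcp4_merge_fits are hypothetical stand-ins for the rte_mbuf length fields and for the check inside merge_two_tcp4_packets; they are not in the patch. Using uint32_t for tcp_datalen is my choice, so that the subtraction result is not truncated.

```c
#include <stdint.h>

/* Max L3 length of a TCP/IPv4 packet, as defined in the patch. */
#define TCP4_MAX_L3_LENGTH UINT16_MAX

/* Hypothetical stand-in for the rte_mbuf length fields used here. */
struct pkt_lens {
	uint32_t pkt_len; /* total packet length, including all headers */
	uint16_t l2_len;  /* Ethernet header length */
	uint16_t l3_len;  /* IPv4 header length */
	uint16_t l4_len;  /* TCP header length */
};

/*
 * Return 1 if appending the TCP payload of tail to head keeps the
 * merged L3 length within TCP4_MAX_L3_LENGTH, otherwise return 0.
 */
static int
tcp4_merge_fits(const struct pkt_lens *head, const struct pkt_lens *tail)
{
	uint32_t tcp_datalen;

	/* payload length of the tail packet (headers are stripped on merge) */
	tcp_datalen = tail->pkt_len - tail->l2_len -
		tail->l3_len - tail->l4_len;

	return (head->pkt_len - head->l2_len + tcp_datalen) <=
		TCP4_MAX_L3_LENGTH;
}
```

The caller would return -1 when this yields 0, as the patch does today.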
> + > + if (cmp > 0) { > + pkt_head = item_src->firstseg; > + pkt_tail = pkt; > + } else { > + pkt_head = pkt; > + pkt_tail = item_src->firstseg; > + } > + > + /* check if the packet length will be beyond the max value */ > + tcp_dl1 = pkt_tail->pkt_len - pkt_tail->l2_len - > + pkt_tail->l3_len - pkt_tail->l4_len; > + if (pkt_head->pkt_len - pkt_head->l2_len + tcp_dl1 > > + TCP4_MAX_L3_LENGTH) > + return -1; > + > + /* remove packet header for the tail packet */ > + rte_pktmbuf_adj(pkt_tail, > + pkt_tail->l2_len + > + pkt_tail->l3_len + > + pkt_tail->l4_len); > + > + /* chain two packets together */ > + if (cmp > 0) { > + item_src->lastseg->next = pkt; > + item_src->lastseg = rte_pktmbuf_lastseg(pkt); > + /* update IP ID to the larger value */ > + item_src->ip_id = ip_id; > + } else { > + lastseg = rte_pktmbuf_lastseg(pkt); > + lastseg->next = item_src->firstseg; > + item_src->firstseg = pkt; > + /* update sent_seq to the smaller value */ > + item_src->sent_seq = sent_seq; > + } > + item_src->nb_merged++; > + > + /* update mbuf metadata for the merged packet */ > + pkt_head->nb_segs += pkt_tail->nb_segs; > + pkt_head->pkt_len += pkt_tail->pkt_len; > + > + return 1; > +} > + > +static inline int > +check_seq_option(struct gro_tcp4_item *item, > + struct tcp_hdr *tcp_hdr, > + uint16_t tcp_hl, > + uint16_t tcp_dl, > + uint16_t ip_id, > + uint32_t sent_seq) > +{ > + struct rte_mbuf *pkt0 = item->firstseg; > + struct ipv4_hdr *ipv4_hdr0; > + struct tcp_hdr *tcp_hdr0; > + uint16_t tcp_hl0, tcp_dl0; > + uint16_t len; > + > + ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) + > + pkt0->l2_len); > + tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len); > + tcp_hl0 = pkt0->l4_len; > + > + /* check if TCP option fields equal. If not, return 0. 
*/ > + len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr); > + if ((tcp_hl != tcp_hl0) || > + ((len > 0) && (memcmp(tcp_hdr + 1, > + tcp_hdr0 + 1, > + len) != 0))) > + return 0; > + > + /* check if the two packets are neighbors */ > + tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0; > + if ((sent_seq == (item->sent_seq + tcp_dl0)) && > + (ip_id == (item->ip_id + 1))) > + /* append the new packet */ > + return 1; > + else if (((sent_seq + tcp_dl) == item->sent_seq) && > + ((ip_id + item->nb_merged) == item->ip_id)) > + /* pre-pend the new packet */ > + return -1; > + else > + return 0; > +} > + > +static inline uint32_t > +find_an_empty_item(struct gro_tcp4_tbl *tbl) > +{ > + uint32_t i; > + > + for (i = 0; i < tbl->max_item_num; i++) > + if (tbl->items[i].firstseg == NULL) > + return i; > + return INVALID_ARRAY_INDEX; > +} > + > +static inline uint32_t > +find_an_empty_key(struct gro_tcp4_tbl *tbl) > +{ > + uint32_t i; > + > + for (i = 0; i < tbl->max_key_num; i++) > + if (tbl->keys[i].is_valid == 0) > + return i; > + return INVALID_ARRAY_INDEX; > +} > + > +static inline uint32_t > +insert_new_item(struct gro_tcp4_tbl *tbl, > + struct rte_mbuf *pkt, > + uint16_t ip_id, > + uint32_t sent_seq, > + uint32_t prev_idx, > + uint64_t start_time) > +{ > + uint32_t item_idx; > + > + item_idx = find_an_empty_item(tbl); > + if (item_idx == INVALID_ARRAY_INDEX) > + return INVALID_ARRAY_INDEX; > + > + tbl->items[item_idx].firstseg = pkt; > + tbl->items[item_idx].lastseg = rte_pktmbuf_lastseg(pkt); > + tbl->items[item_idx].start_time = start_time; > + tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX; > + tbl->items[item_idx].sent_seq = sent_seq; > + tbl->items[item_idx].ip_id = ip_id; > + tbl->items[item_idx].nb_merged = 1; > + tbl->item_num++; > + > + /* if the previous packet exists, chain the new one with it */ > + if (prev_idx != INVALID_ARRAY_INDEX) > + tbl->items[prev_idx].next_pkt_idx = item_idx; > + > + return item_idx; > +} > + > +static 
inline uint32_t > +delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx) > +{ > + uint32_t next_idx = tbl->items[item_idx].next_pkt_idx; > + > + /* set NULL to firstseg to indicate it's an empty item */ > + tbl->items[item_idx].firstseg = NULL; > + tbl->item_num--; > + > + return next_idx; > +} > + > +static inline uint32_t > +insert_new_key(struct gro_tcp4_tbl *tbl, > + struct tcp4_key *key_src, > + uint32_t item_idx) > +{ > + struct tcp4_key *key_dst; > + uint32_t key_idx; > + > + key_idx = find_an_empty_key(tbl); > + if (key_idx == INVALID_ARRAY_INDEX) > + return INVALID_ARRAY_INDEX; > + > + key_dst = &(tbl->keys[key_idx].key); > + > + ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr)); > + ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr)); > + key_dst->ip_src_addr = key_src->ip_src_addr; > + key_dst->ip_dst_addr = key_src->ip_dst_addr; > + key_dst->recv_ack = key_src->recv_ack; > + key_dst->src_port = key_src->src_port; > + key_dst->dst_port = key_src->dst_port; > + > + tbl->keys[key_idx].start_index = item_idx; > + tbl->keys[key_idx].is_valid = 1; > + tbl->key_num++; > + > + return key_idx; > +} > + > +static inline int > +compare_key(struct tcp4_key k1, struct tcp4_key k2) > +{ > + uint16_t *c1, *c2; > + > + c1 = (uint16_t *)&(k1.eth_saddr); > + c2 = (uint16_t *)&(k2.eth_saddr); > + if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2])) > + return -1; > + c1 = (uint16_t *)&(k1.eth_daddr); > + c2 = (uint16_t *)&(k2.eth_daddr); > + if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2])) > + return -1; > + if ((k1.ip_src_addr != k2.ip_src_addr) || > + (k1.ip_dst_addr != k2.ip_dst_addr) || > + (k1.recv_ack != k2.recv_ack) || > + (k1.src_port != k2.src_port) || > + (k1.dst_port != k2.dst_port)) > + return -1; > + > + return 0; > +} Above function can be written in a cleaner way: static inline int is_same_key(struct tcp4_key k1, struct tcp4_key k2) { if (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) == 0) return 0; if 
(is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) == 0) return 0; return ((k1.ip_src_addr == k2.ip_src_addr) && (k1.ip_dst_addr == k2.ip_dst_addr) && (k1.recv_ack == k2.recv_ack) && (k1.src_port == k2.src_port) && (k1.dst_port == k2.dst_port)); } > + > +/* > + * update packet length and IP ID for the flushed packet. > + */ > +static inline void > +update_packet_header(struct gro_tcp4_item *item) > +{ > + struct ipv4_hdr *ipv4_hdr; > + struct rte_mbuf *pkt = item->firstseg; > + > + ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) + > + pkt->l2_len); > + ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len - > + pkt->l2_len); > + ipv4_hdr->packet_id = rte_cpu_to_be_16(item->ip_id); > +} > + > +int32_t > +gro_tcp4_reassemble(struct rte_mbuf *pkt, > + struct gro_tcp4_tbl *tbl, > + uint64_t start_time) > +{ > + struct ether_hdr *eth_hdr; > + struct ipv4_hdr *ipv4_hdr; > + struct tcp_hdr *tcp_hdr; > + uint32_t sent_seq; > + uint16_t tcp_dl, ip_id; > + > + struct tcp4_key key; > + uint32_t cur_idx, prev_idx, item_idx; > + uint32_t i; > + int cmp; > + > + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); > + ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len); > + tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len); > + > + /* > + * if FIN, SYN, RST, PSH, URG, ECE or CWR is set, return immediately. 
> + */ > + if (tcp_hdr->tcp_flags != TCP_ACK_FLAG) > + return -1; > + /* if payload length is 0, return immediately */ > + tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len - > + pkt->l4_len; > + if (tcp_dl == 0) > + return -1; > + > + ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id); > + sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq); > + > + ether_addr_copy(&(eth_hdr->s_addr), &(key.eth_saddr)); > + ether_addr_copy(&(eth_hdr->d_addr), &(key.eth_daddr)); > + key.ip_src_addr = ipv4_hdr->src_addr; > + key.ip_dst_addr = ipv4_hdr->dst_addr; > + key.src_port = tcp_hdr->src_port; > + key.dst_port = tcp_hdr->dst_port; > + key.recv_ack = tcp_hdr->recv_ack; > + > + /* search for a key */ > + for (i = 0; i < tbl->max_key_num; i++) { > + if ((tbl->keys[i].is_valid == 1) && > + (compare_key(tbl->keys[i].key, key) == 0)) > + break; Simplified as: for (i = 0; i < tbl->max_key_num; i++) if (tbl->keys[i].is_valid && is_same_key(tbl->keys[i].key, key)) break; > + } > + > + /* can't find a key, so insert a new key and a new item. */ > + if (i == tbl->max_key_num) { > + item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq, > + INVALID_ARRAY_INDEX, start_time); > + if (item_idx == INVALID_ARRAY_INDEX) > + return -1; > + if (insert_new_key(tbl, &key, item_idx) == > + INVALID_ARRAY_INDEX) { > + /* fail to insert a new key, delete the inserted item */ > + delete_item(tbl, item_idx); > + return -1; > + } > + return 0; > + } > + > + /* traverse all packets in the item group to find one to merge */ > + cur_idx = tbl->keys[i].start_index; > + prev_idx = cur_idx; > + do { > + cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr, > + pkt->l4_len, tcp_dl, ip_id, sent_seq); > + if (cmp != 0) { > + if (merge_two_tcp4_packets(&(tbl->items[cur_idx]), pkt, > + ip_id, sent_seq, cmp) > 0) > + return 1; > + /* > + * fail to merge two packets since the packet length > + * will be greater than the max value. So insert the > + * packet into the item group. 
> + */ > + if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx, > + start_time) == INVALID_ARRAY_INDEX) > + return -1; > + return 0; > + } > + prev_idx = cur_idx; > + cur_idx = tbl->items[cur_idx].next_pkt_idx; > + } while (cur_idx != INVALID_ARRAY_INDEX); > + > + /* > + * can't find a packet in the item group to merge, > + * so insert the packet into the item group. > + */ > + if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx, > + start_time) == INVALID_ARRAY_INDEX) > + return -1; > + > + return 0; > +} > + > +uint16_t > +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl, > + uint64_t timeout_cycles, > + struct rte_mbuf **out, > + uint16_t nb_out) > +{ > + uint16_t k = 0; > + uint32_t i, j; > + uint64_t current_time; > + > + current_time = rte_rdtsc(); > + > + for (i = 0; i < tbl->max_key_num; i++) { > + /* all keys have been checked, return immediately */ > + if (tbl->key_num == 0) > + return k; > + > + if (tbl->keys[i].is_valid == 0) > + continue; > + > + j = tbl->keys[i].start_index; > + do { > + if ((current_time - tbl->items[j].start_time) >= > + timeout_cycles) { > + out[k++] = tbl->items[j].firstseg; > + update_packet_header(&(tbl->items[j])); > + /* delete the item and get the next packet index */ > + j = delete_item(tbl, j); > + > + /* delete the key as all of packets are flushed */ > + if (j == INVALID_ARRAY_INDEX) { > + tbl->keys[i].is_valid = 0; > + tbl->key_num--; > + } else > + /* update start_index of the key */ > + tbl->keys[i].start_index = j; > + > + if (k == nb_out) > + return k; > + } else > + /* > + * left packets of this key won't be timeout, so go to > + * check other keys. 
> + */ > + break; > + } while (j != INVALID_ARRAY_INDEX); > + } > + return k; > +} > + > +uint32_t > +gro_tcp4_tbl_get_count(void *tbl) > +{ > + struct gro_tcp4_tbl *gro_tbl = tbl; > + > + if (gro_tbl) > + return gro_tbl->item_num; > + > + return 0; > +} > diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h > new file mode 100644 > index 0000000..4a57451 > --- /dev/null > +++ b/lib/librte_gro/gro_tcp4.h > @@ -0,0 +1,206 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2017 Intel Corporation. All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#ifndef _GRO_TCP4_H_ > +#define _GRO_TCP4_H_ > + > +#define INVALID_ARRAY_INDEX 0xffffffffUL > +#define GRO_TCP4_TBL_MAX_ITEM_NUM (UINT32_MAX - 1) > + > +/* > + * the max L3 length of a TCP/IPv4 packet. The L3 length > + * is the sum of ipv4 header, tcp header and L4 payload. > + */ > +#define TCP4_MAX_L3_LENGTH UINT16_MAX > + > +/* criteria of mergeing packets */ > +struct tcp4_key { > + struct ether_addr eth_saddr; > + struct ether_addr eth_daddr; > + uint32_t ip_src_addr; > + uint32_t ip_dst_addr; > + > + uint32_t recv_ack; > + uint16_t src_port; > + uint16_t dst_port; > +}; > + > +struct gro_tcp4_key { > + struct tcp4_key key; > + /* the index of the first packet in the item group */ > + uint32_t start_index; > + uint8_t is_valid; > +}; > + > +struct gro_tcp4_item { > + /* > + * first segment of the packet. If the value > + * is NULL, it means the item is empty. > + */ > + struct rte_mbuf *firstseg; > + /* last segment of the packet */ > + struct rte_mbuf *lastseg; > + /* > + * the time when the first packet is inserted > + * into the table. If a packet in the table is > + * merged with an incoming packet, this value > + * won't be updated. We set this value only > + * when the first packet is inserted into the > + * table. > + */ > + uint64_t start_time; > + /* > + * we use next_pkt_idx to chain the packets that > + * have same key value but can't be merged together. 
> + */ > + uint32_t next_pkt_idx; > + /* the sequence number of the packet */ > + uint32_t sent_seq; > + /* the IP ID of the packet */ > + uint16_t ip_id; > + /* the number of merged packets */ > + uint16_t nb_merged; > +}; > + > +/* > + * TCP/IPv4 reassembly table structure. > + */ > +struct gro_tcp4_tbl { > + /* item array */ > + struct gro_tcp4_item *items; > + /* key array */ > + struct gro_tcp4_key *keys; > + /* current item number */ > + uint32_t item_num; > + /* current key num */ > + uint32_t key_num; > + /* item array size */ > + uint32_t max_item_num; > + /* key array size */ > + uint32_t max_key_num; > +}; > + > +/** > + * This function creates a TCP/IPv4 reassembly table. > + * > + * @param socket_id > + * socket index for allocating TCP/IPv4 reassemblt table > + * @param max_flow_num > + * the maximum number of flows in the TCP/IPv4 GRO table > + * @param max_item_per_flow > + * the maximum packet number per flow. > + * > + * @return > + * if create successfully, return a pointer which points to the > + * created TCP/IPv4 GRO table. Otherwise, return NULL. > + */ > +void *gro_tcp4_tbl_create(uint16_t socket_id, > + uint16_t max_flow_num, > + uint16_t max_item_per_flow); > + > +/** > + * This function destroys a TCP/IPv4 reassembly table. > + * > + * @param tbl > + * a pointer points to the TCP/IPv4 reassembly table. > + */ > +void gro_tcp4_tbl_destroy(void *tbl); > + > +/** > + * This function searches for a packet in the TCP/IPv4 reassembly table > + * to merge with the inputted one. To merge two packets is to chain them > + * together and update packet headers. Packets, whose SYN, FIN, RST, PSH > + * CWR, ECE or URG bit is set, are returned immediately. Packets which > + * only have packet headers (i.e. without data) are also returned > + * immediately. Otherwise, the packet is either merged, or inserted into > + * the table. Besides, if there is no available space to insert the > + * packet, this function returns immediately too. 
> + * > + * This function assumes the inputted packet is with correct IPv4 and > + * TCP checksums. And if two packets are merged, it won't re-calculate > + * IPv4 and TCP checksums. Besides, if the inputted packet is IP > + * fragmented, it assumes the packet is complete (with TCP header). > + * > + * @param pkt > + * packet to reassemble. > + * @param tbl > + * a pointer that points to a TCP/IPv4 reassembly table. > + * @start_time > + * the start time that the packet is inserted into the table > + * > + * @return > + * if the packet doesn't have data, or SYN, FIN, RST, PSH, CWR, ECE > + * or URG bit is set, or there is no available space in the table to > + * insert a new item or a new key, return a negative value. If the > + * packet is merged successfully, return an positive value. If the > + * packet is inserted into the table, return 0. > + */ > +int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt, > + struct gro_tcp4_tbl *tbl, > + uint64_t start_time); > + > +/** > + * This function flushes timeout packets in a TCP/IPv4 reassembly table > + * to applications, and without updating checksums for merged packets. > + * The max number of flushed timeout packets is the element number of > + * the array which is used to keep flushed packets. > + * > + * @param tbl > + * a pointer that points to a TCP GRO table. > + * @param timeout_cycles > + * the maximum time that packets can stay in the table. > + * @param out > + * pointer array which is used to keep flushed packets. > + * @param nb_out > + * the element number of out. It's also the max number of timeout > + * packets that can be flushed finally. > + * > + * @return > + * the number of packets that are returned. > + */ > +uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl, > + uint64_t timeout_cycles, > + struct rte_mbuf **out, > + uint16_t nb_out); > + > +/** > + * This function returns the number of the packets in a TCP/IPv4 > + * reassembly table. 
> + * > + * @param tbl > + * pointer points to a TCP/IPv4 reassembly table. > + * > + * @return > + * the number of packets in the table > + */ > +uint32_t gro_tcp4_tbl_get_count(void *tbl); > +#endif > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c > index 24e5f2b..7488845 100644 > --- a/lib/librte_gro/rte_gro.c > +++ b/lib/librte_gro/rte_gro.c > @@ -32,8 +32,11 @@ > > #include > #include > +#include > +#include > > #include "rte_gro.h" > +#include "gro_tcp4.h" > > typedef void *(*gro_tbl_create_fn)(uint16_t socket_id, > uint16_t max_flow_num, > @@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id, > typedef void (*gro_tbl_destroy_fn)(void *tbl); > typedef uint32_t (*gro_tbl_get_count_fn)(void *tbl); > > -static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM]; > -static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM]; > -static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM]; > +static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM] = { > + gro_tcp4_tbl_create, NULL}; > +static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] = { > + gro_tcp4_tbl_destroy, NULL}; > +static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM] = { > + gro_tcp4_tbl_get_count, NULL}; > > /* > * GRO context structure, which is used to merge packets. 
It keeps > @@ -124,27 +130,116 @@ rte_gro_ctx_destroy(void *ctx) > } > > uint16_t > -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused, > +rte_gro_reassemble_burst(struct rte_mbuf **pkts, > uint16_t nb_pkts, > - const struct rte_gro_param *param __rte_unused) > + const struct rte_gro_param *param) > { > - return nb_pkts; > + uint16_t i; > + uint16_t nb_after_gro = nb_pkts; > + uint32_t item_num; > + > + /* allocate a reassembly table for TCP/IPv4 GRO */ > + struct gro_tcp4_tbl tcp_tbl; > + struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM] = {0}; > + struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0}; > + > + struct rte_mbuf *unprocess_pkts[nb_pkts]; > + uint16_t unprocess_num = 0; > + int32_t ret; > + uint64_t current_time; > + > + if ((param->gro_types & RTE_GRO_TCP_IPV4) == 0) > + return nb_pkts; > + > + /* get the actual number of packets */ > + item_num = RTE_MIN(nb_pkts, (param->max_flow_num * > + param->max_item_per_flow)); > + item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM); > + > + tcp_tbl.keys = tcp_keys; > + tcp_tbl.items = tcp_items; > + tcp_tbl.key_num = 0; > + tcp_tbl.item_num = 0; > + tcp_tbl.max_key_num = item_num; > + tcp_tbl.max_item_num = item_num; > + > + current_time = rte_rdtsc(); > + > + for (i = 0; i < nb_pkts; i++) { > + if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) && > + (pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) { Please keep one style to check the ptypes: either use the macros, or just compare the bits directly, like: (pkt->packet_type & (RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) == (RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP) > + ret = gro_tcp4_reassemble(pkts[i], > + &tcp_tbl, > + current_time); > + if (ret > 0) > + /* merge successfully */ > + nb_after_gro--; > + else if (ret < 0) > + unprocess_pkts[unprocess_num++] = pkts[i]; > + } else > + unprocess_pkts[unprocess_num++] = pkts[i]; > + } > + > + /* re-arrange GROed packets */ > + if (nb_after_gro < nb_pkts) { > + i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0, pkts, nb_pkts); > + if 
(unprocess_num > 0) { > + memcpy(&pkts[i], unprocess_pkts, > + sizeof(struct rte_mbuf *) * unprocess_num); > + } > + } > + > + return nb_after_gro; > } > > uint16_t > -rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused, > +rte_gro_reassemble(struct rte_mbuf **pkts, > uint16_t nb_pkts, > - void *ctx __rte_unused) > + void *ctx) > { > - return nb_pkts; > + uint16_t i, unprocess_num = 0; > + struct rte_mbuf *unprocess_pkts[nb_pkts]; > + struct gro_ctx *gro_ctx = ctx; > + uint64_t current_time; > + > + if ((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0) > + return nb_pkts; > + > + current_time = rte_rdtsc(); > + > + for (i = 0; i < nb_pkts; i++) { > + if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) && > + (pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) { > + if (gro_tcp4_reassemble(pkts[i], > + gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX], > + current_time) < 0) > + unprocess_pkts[unprocess_num++] = pkts[i]; > + } else > + unprocess_pkts[unprocess_num++] = pkts[i]; > + } > + if (unprocess_num > 0) { > + memcpy(pkts, unprocess_pkts, > + sizeof(struct rte_mbuf *) * unprocess_num); > + } > + > + return unprocess_num; > } > > uint16_t > -rte_gro_timeout_flush(void *ctx __rte_unused, > - uint64_t gro_types __rte_unused, > - struct rte_mbuf **out __rte_unused, > - uint16_t max_nb_out __rte_unused) > +rte_gro_timeout_flush(void *ctx, > + uint64_t gro_types, > + struct rte_mbuf **out, > + uint16_t max_nb_out) > { > + struct gro_ctx *gro_ctx = ctx; > + > + gro_types = gro_types & gro_ctx->gro_types; > + if (gro_types & RTE_GRO_TCP_IPV4) { > + return gro_tcp4_tbl_timeout_flush( > + gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX], > + gro_ctx->max_timeout_cycles, > + out, max_nb_out); > + } > return 0; > } > > diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h > index 54a6e82..c2140e6 100644 > --- a/lib/librte_gro/rte_gro.h > +++ b/lib/librte_gro/rte_gro.h > @@ -45,8 +45,11 @@ extern "C" { > /**< max number of supported GRO types */ > #define RTE_GRO_TYPE_MAX_NUM 64 > /**< current 
supported GRO num */ > -#define RTE_GRO_TYPE_SUPPORT_NUM 0 > +#define RTE_GRO_TYPE_SUPPORT_NUM 1 > > +/**< TCP/IPv4 GRO flag */ > +#define RTE_GRO_TCP_IPV4_INDEX 0 > +#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX) > > struct rte_gro_param { > /**< desired GRO types */