From: "Tan, Jianfeng"
To: Jiayu Hu
Cc: dev@dpdk.org, konstantin.ananyev@intel.com, yliu@fridaylinux.org, keith.wiles@intel.com, tiwei.bie@intel.com, lei.a.yao@intel.com
Subject: Re: [dpdk-dev] [PATCH v5 2/3] lib/gro: add TCP/IPv4 GRO support
Date: Wed, 21 Jun 2017 07:30:08 +0800
Message-ID: <21e9e28b-ba41-b88d-1f9a-b022b0d2c5ce@intel.com>
In-Reply-To: <20170620032220.GB12728@localhost.localdomain>
References: <1496833731-53653-1-git-send-email-jiayu.hu@intel.com> <1497770469-16661-1-git-send-email-jiayu.hu@intel.com> <1497770469-16661-3-git-send-email-jiayu.hu@intel.com> <20fd3a2c-9b61-2732-5a34-5acb8fc639a0@intel.com> <20170620032220.GB12728@localhost.localdomain>

Hi Jiayu,

On 6/20/2017 11:22 AM, Jiayu Hu wrote: > Hi Jianfeng, > > On Mon, Jun 19, 2017 at 11:43:20PM +0800, Tan, Jianfeng wrote: >> >> On 6/18/2017 3:21 PM, Jiayu Hu wrote: >>> In this patch, we introduce six APIs to support TCP/IPv4 GRO. >> Those functions are not used outside of this library. Don't make them >> externally visible. > But they are called by functions in rte_gro.c, which are in a different > file. If we define these functions as static, how can they be called by > other functions in a different file? We can define some ops for GRO engines. And in each GRO engine, tcp4 in this case, we just need to register those ops; then we can iterate all GRO engines in rte_gro.c (see the rough sketch at the end of this mail). It's a better way for other developers to contribute other GRO engines. > >>> - gro_tcp_tbl_create: create a TCP reassembly table, which is used to >>> merge packets. >> Will tcp6 share the same function with tcp4? If not, please rename it to >> gro_tcp4_tbl_create > In the TCP GRO design, TCP4 and TCP6 will share the same table structure, but > they will have different reassembly functions. Therefore, I use > gro_tcp_tbl_create instead of gro_tcp4_tbl_create here. Then as far as I can see, we are going to call this function for all GRO engines; only the flow structures allocated for the different engines differ. So I suggest we put this function into rte_gro.c. > >>> - gro_tcp_tbl_destroy: free memory space of a TCP reassembly table. >>> - gro_tcp_tbl_flush: flush packets in the TCP reassembly table. >>> - gro_tcp_tbl_timeout_flush: flush timeout packets in the TCP >>> reassembly table. >>> - gro_tcp4_reassemble: merge an inputted packet. >>> - gro_tcp4_tbl_cksum_update: update TCP and IPv4 header checksums for >>> all merged packets in the TCP reassembly table.
>>> >>> In TCP GRO, we use a table structure, called TCP reassembly table, to >>> reassemble packets. Both TCP/IPv4 and TCP/IPv6 GRO use the same table >>> structure. A TCP reassembly table includes a flow array and a item array, >>> where the flow array is used to record flow information and the item >>> array is used to record packets information. >>> >>> Each element in the flow array records the information of one flow, >>> which includes two parts: >>> - key: the criteria of the same flow. If packets have the same key >>> value, they belong to the same flow. >>> - start_index: the index of the first incoming packet of this flow in >>> the item array. With start_index, we can locate the first incoming >>> packet of this flow. >>> Each element in the item array records one packet information. It mainly >>> includes two parts: >>> - pkt: packet address >>> - next_pkt_index: index of the next packet of the same flow in the item >>> array. All packets of the same flow are chained by next_pkt_index. >>> With next_pkt_index, we can locate all packets of the same flow >>> one by one. >>> >>> To process an incoming packet, we need three steps: >>> a. check if the packet should be processed. Packets with the following >>> properties won't be processed: >>> - packets without data; >>> - packets with wrong checksums; >> Why do we care to check this kind of error? Can we just assume the >> applications have already dropped the packets with wrong cksum? > Indeed, if we assume all input packets are correct, we can avoid > checksum checking overhead. But as a library, I think a more flexible > way is to enable applications to tell the GRO API if checksum checking > is needed. For example, we can add a flag to struct rte_gro_tbl > and struct rte_gro_param, which indicates if checksum checking > is needed. If applications set this flag, the reassembly function won't > check packet checksums. Otherwise, we check the checksums. What do you > think? My opinion is to keep the library focused on what it does, and make its dependencies clear. This flag would differ for different GRO engines, which makes it a little complicated to me. > >>> - fragmented packets. >> IP fragmented? I don't think we need to check it here either. It's the >> application's responsibility to call librte_ip_frag first to reassemble >> IP-fragmented packets, and then call this GRO library to merge TCP packets. >> And this procedure should be shown in an example for other users to refer to. >> >>> b. traverse the flow array to find a flow which the packet belongs to. >>> If not find, insert a new flow and store the packet into the item >>> array. >> You do not store the packet now. "store the packet into the item array" -> >> "then go to step c". > Thanks, I will update it in the next patch. > >>> c. locate the first packet of this flow in the item array via >>> start_index. Then traverse all packets of this flow one by one via >>> next_pkt_index. If find one packet to merge with the incoming packet, >>> merge them but without updating checksums. If not, allocate one item >>> in the item array to store the incoming packet and update >>> next_pkt_index value. >>> >>> For better performance, we don't update header checksums once two >>> packets are merged. The header checksums are updated only when packets >>> are flushed from TCP reassembly tables. >> Why do we care to recalculate the L4 checksum when flushing? How about just >> keeping the wrong cksum, and letting the applications handle that?
> Not all applications want GROed packets with a wrong checksum. So I think a > more reasonable way is to give applications a flag to tell the GRO API if > it needs to calculate checksums when flushing packets from the GRO table. > What do you think? There are two main directions for GROed packets: (1) to be sent out from a physical NIC; (2) to be sent out from a vhost port. In both cases it is very easy for applications to take care of the wrong checksum. > >> >>> Signed-off-by: Jiayu Hu >>> --- >>> lib/librte_gro/Makefile | 1 + >>> lib/librte_gro/rte_gro.c | 154 +++++++++++-- >>> lib/librte_gro/rte_gro.h | 34 +-- >>> lib/librte_gro/rte_gro_tcp.c | 527 +++++++++++++++++++++++++++++++++++++++++++ >>> lib/librte_gro/rte_gro_tcp.h | 210 +++++++++++++++++ >>> 5 files changed, 895 insertions(+), 31 deletions(-) >>> create mode 100644 lib/librte_gro/rte_gro_tcp.c >>> create mode 100644 lib/librte_gro/rte_gro_tcp.h >>> >>> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile >>> index 9f4063a..3495dfc 100644 >>> --- a/lib/librte_gro/Makefile >>> +++ b/lib/librte_gro/Makefile >>> @@ -43,6 +43,7 @@ LIBABIVER := 1 >>> # source files >>> SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c >>> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro_tcp.c >> Again, if it's just for tcp4, please use the name rte_gro_tcp4.c. > TCP4 and TCP6 reassembly functions will be placed in the same file, > rte_gro_tcp.c. But currently, we don't support TCP6 GRO. That's OK to me. But then we will have to have different struct gro_tcp_flow definitions for tcp4 and tcp6. > >>> # install this header file >>> SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h >>> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c >>> index 1bc53a2..2620ef6 100644 >>> --- a/lib/librte_gro/rte_gro.c >>> +++ b/lib/librte_gro/rte_gro.c >>> @@ -32,11 +32,17 @@ >>> #include >>> #include >>> +#include >>> +#include >>> +#include >>> #include "rte_gro.h" >>> +#include "rte_gro_tcp.h" >>> -static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB]; >>> -static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB]; >>> +static gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = { >>> + gro_tcp_tbl_create, NULL}; >>> +static gro_tbl_destroy_fn tbl_destroy_functions[GRO_TYPE_MAX_NB] = { >>> + gro_tcp_tbl_destroy, NULL}; >>> struct rte_gro_tbl *rte_gro_tbl_create(uint16_t socket_id, >>> uint16_t max_flow_num, >>> @@ -94,33 +100,149 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl) >>> } >>> uint16_t >>> -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused, >>> +rte_gro_reassemble_burst(struct rte_mbuf **pkts, >>> const uint16_t nb_pkts, >>> - const struct rte_gro_param param __rte_unused) >>> + const struct rte_gro_param param) >>> { >>> - return nb_pkts; >>> + struct ether_hdr *eth_hdr; >>> + struct ipv4_hdr *ipv4_hdr; >>> + uint16_t l3proc_type, i; >> I did not catch the variable definition here: l3proc_type -> l3_proto? > You can see it in line 158 and line 159. I was not asking for the reference; I mean the variable name is not that clear. > >>> + uint16_t nb_after_gro = nb_pkts; >>> + uint16_t flow_num = nb_pkts < param.max_flow_num ? >>> + nb_pkts : param.max_flow_num; >>> + uint32_t item_num = nb_pkts < >>> + flow_num * param.max_item_per_flow ? >>> + nb_pkts : >>> + flow_num * param.max_item_per_flow; >>> + >>> + /* allocate a reassembly table for TCP/IPv4 GRO */ >>> + uint16_t tcp_flow_num = flow_num <= GRO_TCP_TBL_MAX_FLOW_NUM ? >>> + flow_num : GRO_TCP_TBL_MAX_FLOW_NUM; >>> + uint32_t tcp_item_num = item_num <= GRO_TCP_TBL_MAX_ITEM_NUM ?
>>> + item_num : GRO_TCP_TBL_MAX_ITEM_NUM; >> The tcpv4-specific logic below should be in rte_gro_tcp4.c; here, as in my previous >> comment, we should iterate over the ptypes of the packets to go through all supported GRO >> engines. > Sorry, I don't get the point. The table which is created here is used by > gro_tcp4_reassemble when merging packets. If we don't create the table here, > what does gro_tcp4_reassemble use to merge packets? There is too much tcp* code here. If we add another GRO engine, take udp as an example, shall we add more udp* code here? Not a good idea to me. In fact, gro_tcp4_reassemble is defined in rte_gro_tcp.c instead of this file. For better modularity, we'd better put this tcp-related code into rte_gro_tcp.c. > >>> + struct gro_tcp_tbl tcp_tbl; >>> + struct gro_tcp_flow tcp_flows[tcp_flow_num]; >>> + struct gro_tcp_item tcp_items[tcp_item_num]; >>> + struct gro_tcp_rule tcp_rule; >>> + >>> + struct rte_mbuf *unprocess_pkts[nb_pkts]; >>> + uint16_t unprocess_num = 0; >>> + int32_t ret; >>> + >>> + if (unlikely(nb_pkts <= 1)) >>> + return nb_pkts; >>> + >>> + memset(tcp_flows, 0, sizeof(struct gro_tcp_flow) * >>> + tcp_flow_num); >>> + memset(tcp_items, 0, sizeof(struct gro_tcp_item) * >>> + tcp_item_num); >>> + tcp_tbl.flows = tcp_flows; >>> + tcp_tbl.items = tcp_items; >>> + tcp_tbl.flow_num = 0; >>> + tcp_tbl.item_num = 0; >>> + tcp_tbl.max_flow_num = tcp_flow_num; >>> + tcp_tbl.max_item_num = tcp_item_num; >>> + tcp_rule.max_packet_size = param.max_packet_size; >>> + >>> + for (i = 0; i < nb_pkts; i++) { >>> + eth_hdr = rte_pktmbuf_mtod(pkts[i], struct ether_hdr *); >>> + l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type); >>> + if (l3proc_type == ETHER_TYPE_IPv4) { >>> + ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1); >>> + if (ipv4_hdr->next_proto_id == IPPROTO_TCP && >>> + (param.desired_gro_types & >>> + GRO_TCP_IPV4)) { >>> + ret = gro_tcp4_reassemble(pkts[i], >>> + &tcp_tbl, >>> + &tcp_rule); >>> + if (ret > 0) >>> + nb_after_gro--; >>> + else if (ret < 0) >>> + unprocess_pkts[unprocess_num++] = >>> + pkts[i]; >>> + } else >>> + unprocess_pkts[unprocess_num++] = >>> + pkts[i]; >>> + } else >>> + unprocess_pkts[unprocess_num++] = >>> + pkts[i]; >>> + } >>> + >>> + if (nb_after_gro < nb_pkts) { >>> + /* update packets headers and re-arrange GROed packets */ >>> + if (param.desired_gro_types & GRO_TCP_IPV4) { >>> + gro_tcp4_tbl_cksum_update(&tcp_tbl); >>> + for (i = 0; i < tcp_tbl.item_num; i++) >>> + pkts[i] = tcp_tbl.items[i].pkt; >>> + } >>> + if (unprocess_num > 0) { >>> + memcpy(&pkts[i], unprocess_pkts, >>> + sizeof(struct rte_mbuf *) * >>> + unprocess_num); >>> + i += unprocess_num; >>> + } >>> + if (nb_pkts > i) >>> + memset(&pkts[i], 0, >>> + sizeof(struct rte_mbuf *) * >>> + (nb_pkts - i)); >>> + } >>> + return nb_after_gro; >>> } >>> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused, >>> - struct rte_gro_tbl *gro_tbl __rte_unused) >>> +int rte_gro_reassemble(struct rte_mbuf *pkt, >>> + struct rte_gro_tbl *gro_tbl) >>> { >>> + struct ether_hdr *eth_hdr; >>> + struct ipv4_hdr *ipv4_hdr; >>> + uint16_t l3proc_type; >>> + struct gro_tcp_rule tcp_rule; >>> + >>> + if (pkt == NULL) >>> + return -1; >>> + tcp_rule.max_packet_size = gro_tbl->max_packet_size; >>> + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); >>> + l3proc_type = rte_be_to_cpu_16(eth_hdr->ether_type); >>> + if (l3proc_type == ETHER_TYPE_IPv4) { >>> + ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1); >>> + if (ipv4_hdr->next_proto_id == IPPROTO_TCP && >>> + (gro_tbl->desired_gro_types & GRO_TCP_IPV4)) { >>> +
return gro_tcp4_reassemble(pkt, >>> + gro_tbl->tbls[GRO_TCP_IPV4_INDEX], >>> + &tcp_rule); >>> + } >>> + } >>> return -1; >>> } >>> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused, >>> - uint64_t desired_gro_types __rte_unused, >>> - uint16_t flush_num __rte_unused, >>> - struct rte_mbuf **out __rte_unused, >>> - const uint16_t max_nb_out __rte_unused) >>> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl, >>> + uint64_t desired_gro_types, >>> + uint16_t flush_num, >>> + struct rte_mbuf **out, >>> + const uint16_t max_nb_out) >>> { >> Ditto. >> >>> + desired_gro_types = desired_gro_types & >>> + gro_tbl->desired_gro_types; >>> + if (desired_gro_types & GRO_TCP_IPV4) >>> + return gro_tcp_tbl_flush( >>> + gro_tbl->tbls[GRO_TCP_IPV4_INDEX], >>> + flush_num, >>> + out, >>> + max_nb_out); >>> return 0; >>> } >>> uint16_t >>> -rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused, >>> - uint64_t desired_gro_types __rte_unused, >>> - struct rte_mbuf **out __rte_unused, >>> - const uint16_t max_nb_out __rte_unused) >>> +rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl, >>> + uint64_t desired_gro_types, >>> + struct rte_mbuf **out, >>> + const uint16_t max_nb_out) >>> { >>> + desired_gro_types = desired_gro_types & >>> + gro_tbl->desired_gro_types; >>> + if (desired_gro_types & GRO_TCP_IPV4) >>> + return gro_tcp_tbl_timeout_flush( >>> + gro_tbl->tbls[GRO_TCP_IPV4_INDEX], >>> + gro_tbl->max_timeout_cycles, >>> + out, max_nb_out); >>> return 0; >>> } >>> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h >>> index 67bd90d..e26aa5b 100644 >>> --- a/lib/librte_gro/rte_gro.h >>> +++ b/lib/librte_gro/rte_gro.h >>> @@ -35,7 +35,11 @@ >>> /* maximum number of supported GRO types */ >>> #define GRO_TYPE_MAX_NB 64 >>> -#define GRO_TYPE_SUPPORT_NB 0 /**< current supported GRO num */ >>> +#define GRO_TYPE_SUPPORT_NB 1 /**< supported GRO types number */ >>> + >>> +/* TCP/IPv4 GRO flag */ >>> +#define GRO_TCP_IPV4_INDEX 0 >>> +#define GRO_TCP_IPV4 (1ULL << GRO_TCP_IPV4_INDEX) >>> /** >>> * GRO table structure. DPDK GRO uses GRO table to reassemble >>> @@ -139,9 +143,9 @@ void rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl); >>> * @return >>> * the number of packets after GROed. >>> */ >>> -uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused, >>> - const uint16_t nb_pkts __rte_unused, >>> - const struct rte_gro_param param __rte_unused); >>> +uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts, >>> + const uint16_t nb_pkts, >>> + const struct rte_gro_param param); >>> /** >>> * This is the main reassembly API used in heavyweight mode, which >>> @@ -164,8 +168,8 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused, >>> * if merge the packet successfully, return a positive value. If fail >>> * to merge, return zero. If errors happen, return a negative value. >>> */ >>> -int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused, >>> - struct rte_gro_tbl *gro_tbl __rte_unused); >>> +int rte_gro_reassemble(struct rte_mbuf *pkt, >>> + struct rte_gro_tbl *gro_tbl); >>> /** >>> * This function flushed packets of desired GRO types from their >>> @@ -184,11 +188,11 @@ int rte_gro_reassemble(struct rte_mbuf *pkt __rte_unused, >>> * @return >>> * the number of flushed packets. If no packets are flushed, return 0. 
>>> */ >>> -uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused, >>> - uint64_t desired_gro_types __rte_unused, >>> - uint16_t flush_num __rte_unused, >>> - struct rte_mbuf **out __rte_unused, >>> - const uint16_t max_nb_out __rte_unused); >>> +uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl, >>> + uint64_t desired_gro_types, >>> + uint16_t flush_num, >>> + struct rte_mbuf **out, >>> + const uint16_t max_nb_out); >>> /** >>> * This function flushes the timeout packets from reassembly tables of >>> @@ -206,8 +210,8 @@ uint16_t rte_gro_flush(struct rte_gro_tbl *gro_tbl __rte_unused, >>> * @return >>> * the number of flushed packets. If no packets are flushed, return 0. >>> */ >>> -uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl __rte_unused, >>> - uint64_t desired_gro_types __rte_unused, >>> - struct rte_mbuf **out __rte_unused, >>> - const uint16_t max_nb_out __rte_unused); >>> +uint16_t rte_gro_timeout_flush(struct rte_gro_tbl *gro_tbl, >>> + uint64_t desired_gro_types, >>> + struct rte_mbuf **out, >>> + const uint16_t max_nb_out); >> Do you have any cases to test this API? I don't see the following example use >> this API. That means we are exposing an API that is never tested. I don't >> know if we can add some experimental flag to this API. Let's seek advice >> from others. > These flush APIs are used in heavyweight mode. But testpmd is not a good case > for heavyweight mode. What do you think about using some unit tests to test > them? I think the vhost example is a good place to implement heavyweight mode. There is a timeout mechanism in the vhost example which can call this flush API. Feel free to ping Yuanhan and Maxime for suggestions. > >>> #endif >>> diff --git a/lib/librte_gro/rte_gro_tcp.c b/lib/librte_gro/rte_gro_tcp.c >>> new file mode 100644 >>> index 0000000..86743cd >>> --- /dev/null >>> +++ b/lib/librte_gro/rte_gro_tcp.c >>> @@ -0,0 +1,527 @@ >>> +/*- >>> + * BSD LICENSE >>> + * >>> + * Copyright(c) 2016-2017 Intel Corporation. All rights reserved. >>> + * >>> + * Redistribution and use in source and binary forms, with or without >>> + * modification, are permitted provided that the following conditions >>> + * are met: >>> + * >>> + * * Redistributions of source code must retain the above copyright >>> + * notice, this list of conditions and the following disclaimer. >>> + * * Redistributions in binary form must reproduce the above copyright >>> + * notice, this list of conditions and the following disclaimer in >>> + * the documentation and/or other materials provided with the >>> + * distribution. >>> + * * Neither the name of Intel Corporation nor the names of its >>> + * contributors may be used to endorse or promote products derived >>> + * from this software without specific prior written permission. >>> + * >>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS >>> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT >>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR >>> + * A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT >>> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, >>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, >>> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY >>> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT >>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE >>> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. >>> + */ >>> + >>> +#include >>> +#include >>> +#include >>> + >>> +#include >>> +#include >>> +#include >>> + >>> +#include "rte_gro_tcp.h" >>> + >>> +void *gro_tcp_tbl_create(uint16_t socket_id, >> Define it as "static". Similar to other functions. >> >>> + uint16_t max_flow_num, >>> + uint16_t max_item_per_flow) >>> +{ >>> + size_t size; >>> + uint32_t entries_num; >>> + struct gro_tcp_tbl *tbl; >>> + >>> + max_flow_num = max_flow_num > GRO_TCP_TBL_MAX_FLOW_NUM ? >>> + GRO_TCP_TBL_MAX_FLOW_NUM : max_flow_num; >>> + >>> + entries_num = max_flow_num * max_item_per_flow; >>> + entries_num = entries_num > GRO_TCP_TBL_MAX_ITEM_NUM ? >>> + GRO_TCP_TBL_MAX_ITEM_NUM : entries_num; >>> + >>> + if (entries_num == 0 || max_flow_num == 0) >>> + return NULL; >>> + >>> + tbl = (struct gro_tcp_tbl *)rte_zmalloc_socket( >>> + __func__, >>> + sizeof(struct gro_tcp_tbl), >>> + RTE_CACHE_LINE_SIZE, >>> + socket_id); >>> + >>> + size = sizeof(struct gro_tcp_item) * entries_num; >>> + tbl->items = (struct gro_tcp_item *)rte_zmalloc_socket( >>> + __func__, >>> + size, >>> + RTE_CACHE_LINE_SIZE, >>> + socket_id); >>> + tbl->max_item_num = entries_num; >>> + >>> + size = sizeof(struct gro_tcp_flow) * max_flow_num; >>> + tbl->flows = (struct gro_tcp_flow *)rte_zmalloc_socket( >>> + __func__, >>> + size, RTE_CACHE_LINE_SIZE, >>> + socket_id); >>> + tbl->max_flow_num = max_flow_num; >>> + return tbl; >>> +} >>> + >>> +void gro_tcp_tbl_destroy(void *tbl) >>> +{ >>> + struct gro_tcp_tbl *tcp_tbl = (struct gro_tcp_tbl *)tbl; >>> + >>> + if (tcp_tbl) { >>> + if (tcp_tbl->items) >>> + rte_free(tcp_tbl->items); >>> + if (tcp_tbl->flows) >>> + rte_free(tcp_tbl->flows); >>> + rte_free(tcp_tbl); >>> + } >>> +} >>> + >>> +/* update TCP header and IPv4 header checksum */ >>> +static void >>> +gro_tcp4_cksum_update(struct rte_mbuf *pkt) >>> +{ >>> + uint32_t len, offset, cksum; >>> + struct ether_hdr *eth_hdr; >>> + struct ipv4_hdr *ipv4_hdr; >>> + struct tcp_hdr *tcp_hdr; >>> + uint16_t ipv4_ihl, cksum_pld; >>> + >>> + if (pkt == NULL) >>> + return; >>> + >>> + len = pkt->pkt_len; >>> + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); >>> + ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1); >>> + ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr); >>> + tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl); >>> + >>> + offset = sizeof(struct ether_hdr) + ipv4_ihl; >>> + len -= offset; >>> + >>> + /* TCP cksum without IP pseudo header */ >>> + ipv4_hdr->hdr_checksum = 0; >>> + tcp_hdr->cksum = 0; >>> + rte_raw_cksum_mbuf(pkt, offset, len, &cksum_pld); >>> + >>> + /* IP pseudo header cksum */ >>> + cksum = cksum_pld; >>> + cksum += rte_ipv4_phdr_cksum(ipv4_hdr, 0); >>> + >>> + /* combine TCP checksum and IP pseudo header checksum */ >>> + cksum = ((cksum & 0xffff0000) >> 16) + (cksum & 0xffff); >>> + cksum = (~cksum) & 0xffff; >>> + cksum = (cksum == 0) ? 
0xffff : cksum; >>> + tcp_hdr->cksum = cksum; >>> + >>> + /* update IP header cksum */ >>> + ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr); >>> +} >>> + >>> +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl) >>> +{ >>> + uint32_t i; >>> + uint32_t item_num = tbl->item_num; >>> + >>> + for (i = 0; i < tbl->max_item_num; i++) { >>> + if (tbl->items[i].is_valid) { >>> + item_num--; >>> + if (tbl->items[i].is_groed) >>> + gro_tcp4_cksum_update(tbl->items[i].pkt); >>> + } >>> + if (unlikely(item_num == 0)) >>> + break; >>> + } >>> +} >>> + >>> +/** >>> + * merge two TCP/IPv4 packets without update header checksum. >>> + */ >>> +static int >>> +merge_two_tcp4_packets(struct rte_mbuf *pkt_src, >>> + struct rte_mbuf *pkt, >>> + struct gro_tcp_rule *rule) >>> +{ >>> + struct ipv4_hdr *ipv4_hdr1, *ipv4_hdr2; >>> + struct tcp_hdr *tcp_hdr1; >>> + uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1; >>> + struct rte_mbuf *tail; >>> + >>> + /* parse the given packet */ >>> + ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, >>> + struct ether_hdr *) + 1); >>> + ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1); >>> + tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1); >>> + tcp_hl1 = TCP_HDR_LEN(tcp_hdr1); >>> + tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1 >>> + - tcp_hl1; >>> + >>> + /* parse the original packet */ >>> + ipv4_hdr2 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_src, >>> + struct ether_hdr *) + 1); >>> + >>> + /* check reassembly rules */ >>> + if (pkt_src->pkt_len + tcp_dl1 > rule->max_packet_size) >>> + return -1; >>> + >>> + /* remove the header of the incoming packet */ >>> + rte_pktmbuf_adj(pkt, sizeof(struct ether_hdr) + >>> + ipv4_ihl1 + tcp_hl1); >>> + >>> + /* chain the two packet together */ >>> + tail = rte_pktmbuf_lastseg(pkt_src); >>> + tail->next = pkt; >>> + >>> + /* update IP header */ >>> + ipv4_hdr2->total_length = rte_cpu_to_be_16( >>> + rte_be_to_cpu_16( >>> + ipv4_hdr2->total_length) >>> + + tcp_dl1); >>> + >>> + /* update mbuf metadata for the merged packet */ >>> + pkt_src->nb_segs++; >>> + pkt_src->pkt_len += pkt->pkt_len; >>> + return 1; >>> +} >>> + >>> +static int >>> +check_seq_option(struct rte_mbuf *pkt, >>> + struct tcp_hdr *tcp_hdr, >>> + uint16_t tcp_hl) >>> +{ >>> + struct ipv4_hdr *ipv4_hdr1; >>> + struct tcp_hdr *tcp_hdr1; >>> + uint16_t ipv4_ihl1, tcp_hl1, tcp_dl1; >>> + uint32_t sent_seq1, sent_seq; >>> + int ret = -1; >>> + >>> + ipv4_hdr1 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, >>> + struct ether_hdr *) + 1); >>> + ipv4_ihl1 = IPv4_HDR_LEN(ipv4_hdr1); >>> + tcp_hdr1 = (struct tcp_hdr *)((char *)ipv4_hdr1 + ipv4_ihl1); >>> + tcp_hl1 = TCP_HDR_LEN(tcp_hdr1); >>> + tcp_dl1 = rte_be_to_cpu_16(ipv4_hdr1->total_length) - ipv4_ihl1 >>> + - tcp_hl1; >>> + sent_seq1 = rte_be_to_cpu_32(tcp_hdr1->sent_seq) + tcp_dl1; >>> + sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq); >>> + >>> + /* check if the two packets are neighbor */ >>> + if ((sent_seq ^ sent_seq1) == 0) { >>> + /* check if TCP option field equals */ >>> + if (tcp_hl1 > sizeof(struct tcp_hdr)) { >>> + if ((tcp_hl1 != tcp_hl) || >>> + (memcmp(tcp_hdr1 + 1, >>> + tcp_hdr + 1, >>> + tcp_hl - sizeof >>> + (struct tcp_hdr)) >>> + == 0)) >>> + ret = 1; >>> + } >>> + } >>> + return ret; >>> +} >>> + >>> +static uint32_t >>> +find_an_empty_item(struct gro_tcp_tbl *tbl) >>> +{ >>> + uint32_t i; >>> + >>> + for (i = 0; i < tbl->max_item_num; i++) >>> + if (tbl->items[i].is_valid == 0) >>> + return i; >>> + return INVALID_ITEM_INDEX; >>> +} >>> + >>> +static uint16_t >>> 
+find_an_empty_flow(struct gro_tcp_tbl *tbl) >>> +{ >>> + uint16_t i; >>> + >>> + for (i = 0; i < tbl->max_flow_num; i++) >>> + if (tbl->flows[i].is_valid == 0) >>> + return i; >>> + return INVALID_FLOW_INDEX; >>> +} >>> + >>> +int32_t >>> +gro_tcp4_reassemble(struct rte_mbuf *pkt, >>> + struct gro_tcp_tbl *tbl, >>> + struct gro_tcp_rule *rule) >>> +{ >>> + struct ether_hdr *eth_hdr; >>> + struct ipv4_hdr *ipv4_hdr; >>> + struct tcp_hdr *tcp_hdr; >>> + uint16_t ipv4_ihl, tcp_hl, tcp_dl, tcp_cksum, ip_cksum; >>> + >>> + struct gro_tcp_flow_key key; >>> + uint64_t ol_flags; >>> + uint32_t cur_idx, prev_idx, item_idx; >>> + uint16_t i, flow_idx; >>> + >>> + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *); >>> + ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1); >>> + ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr); >>> + >>> + /* 1. check if the packet should be processed */ >>> + if (ipv4_ihl < sizeof(struct ipv4_hdr)) >>> + goto fail; >>> + if (ipv4_hdr->next_proto_id != IPPROTO_TCP) >>> + goto fail; >>> + if ((ipv4_hdr->fragment_offset & >>> + rte_cpu_to_be_16(IPV4_HDR_DF_MASK)) >>> + == 0) >>> + goto fail; >>> + >>> + tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl); >>> + tcp_hl = TCP_HDR_LEN(tcp_hdr); >>> + tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl >>> + - tcp_hl; >>> + if (tcp_dl == 0) >>> + goto fail; >>> + >>> + /** >>> + * 2. if HW rx checksum offload isn't enabled, recalculate the >>> + * checksum in SW. Then, check if the checksum is correct >>> + */ >>> + ol_flags = pkt->ol_flags; >>> + if ((ol_flags & PKT_RX_IP_CKSUM_MASK) != >>> + PKT_RX_IP_CKSUM_UNKNOWN) { >>> + if (ol_flags == PKT_RX_IP_CKSUM_BAD) >>> + goto fail; >>> + } else { >>> + ip_cksum = ipv4_hdr->hdr_checksum; >>> + ipv4_hdr->hdr_checksum = 0; >>> + ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr); >>> + if (ipv4_hdr->hdr_checksum ^ ip_cksum) >>> + goto fail; >>> + } >>> + >>> + if ((ol_flags & PKT_RX_L4_CKSUM_MASK) != >>> + PKT_RX_L4_CKSUM_UNKNOWN) { >>> + if (ol_flags == PKT_RX_L4_CKSUM_BAD) >>> + goto fail; >>> + } else { >>> + tcp_cksum = tcp_hdr->cksum; >>> + tcp_hdr->cksum = 0; >>> + tcp_hdr->cksum = rte_ipv4_udptcp_cksum >>> + (ipv4_hdr, tcp_hdr); >>> + if (tcp_hdr->cksum ^ tcp_cksum) >>> + goto fail; >>> + } >>> + >>> + /** >>> + * 3. search for a flow and traverse all packets in the flow >>> + * to find one to merge with the given packet. >>> + */ >>> + key.eth_saddr = eth_hdr->s_addr; >>> + key.eth_daddr = eth_hdr->d_addr; >>> + key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr); >>> + key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr); >>> + key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port); >>> + key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port); >>> + key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack); >>> + key.tcp_flags = tcp_hdr->tcp_flags; >>> + >>> + for (i = 0; i < tbl->max_flow_num; i++) { >>> + /* search all packets in a valid flow. 
*/ >>> + if (tbl->flows[i].is_valid && >>> + (memcmp(&(tbl->flows[i].key), &key, >>> + sizeof(struct gro_tcp_flow_key)) >>> + == 0)) { >>> + cur_idx = tbl->flows[i].start_index; >>> + prev_idx = cur_idx; >>> + while (cur_idx != INVALID_ITEM_INDEX) { >>> + if (check_seq_option(tbl->items[cur_idx].pkt, >>> + tcp_hdr, >>> + tcp_hl) > 0) { >>> + if (merge_two_tcp4_packets( >>> + tbl->items[cur_idx].pkt, >>> + pkt, >>> + rule) > 0) { >>> + /* successfully merge two packets */ >>> + tbl->items[cur_idx].is_groed = 1; >>> + return 1; >>> + } >>> + /** >>> + * fail to merge two packets since >>> + * break the rules, add the packet >>> + * into the flow. >>> + */ >>> + goto insert_to_existed_flow; >>> + } else { >>> + prev_idx = cur_idx; >>> + cur_idx = tbl->items[cur_idx].next_pkt_idx; >>> + } >>> + } >>> + /** >>> + * fail to merge the given packet into an existed flow, >>> + * add it into the flow. >>> + */ >>> +insert_to_existed_flow: >>> + item_idx = find_an_empty_item(tbl); >>> + /* the item number is beyond the maximum value */ >>> + if (item_idx == INVALID_ITEM_INDEX) >>> + return -1; >>> + tbl->items[prev_idx].next_pkt_idx = item_idx; >>> + tbl->items[item_idx].pkt = pkt; >>> + tbl->items[item_idx].is_groed = 0; >>> + tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX; >>> + tbl->items[item_idx].is_valid = 1; >>> + tbl->items[item_idx].start_time = rte_rdtsc(); >>> + tbl->item_num++; >>> + return 0; >>> + } >>> + } >>> + >>> + /** >>> + * merge fail as the given packet is a new flow. Therefore, >>> + * insert a new flow. >>> + */ >>> + item_idx = find_an_empty_item(tbl); >>> + flow_idx = find_an_empty_flow(tbl); >>> + /** >>> + * if the flow or item number are beyond the maximum values, >>> + * the inputted packet won't be processed. >>> + */ >>> + if (item_idx == INVALID_ITEM_INDEX || >>> + flow_idx == INVALID_FLOW_INDEX) >>> + return -1; >>> + tbl->items[item_idx].pkt = pkt; >>> + tbl->items[item_idx].next_pkt_idx = INVALID_ITEM_INDEX; >>> + tbl->items[item_idx].is_groed = 0; >>> + tbl->items[item_idx].is_valid = 1; >>> + tbl->items[item_idx].start_time = rte_rdtsc(); >>> + tbl->item_num++; >>> + >>> + memcpy(&(tbl->flows[flow_idx].key), >>> + &key, sizeof(struct gro_tcp_flow_key)); >>> + tbl->flows[flow_idx].start_index = item_idx; >>> + tbl->flows[flow_idx].is_valid = 1; >>> + tbl->flow_num++; >>> + >>> + return 0; >>> +fail: >>> + return -1; >>> +} >>> + >>> +uint16_t gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl, >>> + uint16_t flush_num, >>> + struct rte_mbuf **out, >>> + const uint16_t nb_out) >>> +{ >>> + uint16_t num, k; >>> + uint16_t i; >>> + uint32_t j; >>> + >>> + k = 0; >>> + num = tbl->item_num > flush_num ? flush_num : tbl->item_num; >>> + num = num > nb_out ? 
nb_out : num; >>> + if (unlikely(num == 0)) >>> + return 0; >>> + >>> + for (i = 0; i < tbl->max_flow_num; i++) { >>> + if (tbl->flows[i].is_valid) { >>> + j = tbl->flows[i].start_index; >>> + while (j != INVALID_ITEM_INDEX) { >>> + /* update checksum for GROed packet */ >>> + if (tbl->items[j].is_groed) >>> + gro_tcp4_cksum_update(tbl->items[j].pkt); >>> + >>> + out[k++] = tbl->items[j].pkt; >>> + tbl->items[j].is_valid = 0; >>> + tbl->item_num--; >>> + j = tbl->items[j].next_pkt_idx; >>> + >>> + if (k == num) { >>> + /* delete the flow */ >>> + if (j == INVALID_ITEM_INDEX) { >>> + tbl->flows[i].is_valid = 0; >>> + tbl->flow_num--; >>> + } else >>> + /* update flow information */ >>> + tbl->flows[i].start_index = j; >>> + goto end; >>> + } >>> + } >>> + /* delete the flow, as all of its packets are flushed */ >>> + tbl->flows[i].is_valid = 0; >>> + tbl->flow_num--; >>> + } >>> + if (tbl->flow_num == 0) >>> + goto end; >>> + } >>> +end: >>> + return num; >>> +} >>> + >>> +uint16_t >>> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl, >>> + uint64_t timeout_cycles, >>> + struct rte_mbuf **out, >>> + const uint16_t nb_out) >>> +{ >>> + uint16_t k; >>> + uint16_t i; >>> + uint32_t j; >>> + uint64_t current_time; >>> + >>> + if (nb_out == 0) >>> + return 0; >>> + k = 0; >>> + current_time = rte_rdtsc(); >>> + >>> + for (i = 0; i < tbl->max_flow_num; i++) { >>> + if (tbl->flows[i].is_valid) { >>> + j = tbl->flows[i].start_index; >>> + while (j != INVALID_ITEM_INDEX) { >>> + if (current_time - tbl->items[j].start_time >= >>> + timeout_cycles) { >>> + /* update checksum for GROed packet */ >>> + if (tbl->items[j].is_groed) >>> + gro_tcp4_cksum_update(tbl->items[j].pkt); >>> + >>> + out[k++] = tbl->items[j].pkt; >>> + tbl->items[j].is_valid = 0; >>> + tbl->item_num--; >>> + j = tbl->items[j].next_pkt_idx; >>> + >>> + if (k == nb_out && >>> + j == INVALID_ITEM_INDEX) { >>> + /* delete the flow */ >>> + tbl->flows[i].is_valid = 0; >>> + tbl->flow_num--; >>> + goto end; >>> + } else if (k == nb_out && >>> + j != INVALID_ITEM_INDEX) { >>> + tbl->flows[i].start_index = j; >>> + goto end; >>> + } >>> + } >>> + } >>> + /* delete the flow, as all of its packets are flushed */ >>> + tbl->flows[i].is_valid = 0; >>> + tbl->flow_num--; >>> + } >>> + if (tbl->flow_num == 0) >>> + goto end; >>> + } >>> +end: >>> + return k; >>> +} >>> diff --git a/lib/librte_gro/rte_gro_tcp.h b/lib/librte_gro/rte_gro_tcp.h >>> new file mode 100644 >>> index 0000000..551efc4 >>> --- /dev/null >>> +++ b/lib/librte_gro/rte_gro_tcp.h >>> @@ -0,0 +1,210 @@ >>> +/*- >>> + * BSD LICENSE >>> + * >>> + * Copyright(c) 2016-2017 Intel Corporation. All rights reserved. >>> + * >>> + * Redistribution and use in source and binary forms, with or without >>> + * modification, are permitted provided that the following conditions >>> + * are met: >>> + * >>> + * * Redistributions of source code must retain the above copyright >>> + * notice, this list of conditions and the following disclaimer. >>> + * * Redistributions in binary form must reproduce the above copyright >>> + * notice, this list of conditions and the following disclaimer in >>> + * the documentation and/or other materials provided with the >>> + * distribution. >>> + * * Neither the name of Intel Corporation nor the names of its >>> + * contributors may be used to endorse or promote products derived >>> + * from this software without specific prior written permission. 
>>> + * >>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS >>> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT >>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR >>> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT >>> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, >>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, >>> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY >>> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT >>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE >>> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. >>> + */ >>> + >>> +#ifndef _RTE_GRO_TCP_H_ >>> +#define _RTE_GRO_TCP_H_ >>> + >>> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN >>> +#define TCP_HDR_LEN(tcph) \ >>> + ((tcph->data_off >> 4) * 4) >>> +#define IPv4_HDR_LEN(iph) \ >>> + ((iph->version_ihl & 0x0f) * 4) >>> +#else >>> +#define TCP_DATAOFF_MASK 0x0f >>> +#define TCP_HDR_LEN(tcph) \ >>> + ((tcph->data_off & TCP_DATAOFF_MASK) * 4) >>> +#define IPv4_HDR_LEN(iph) \ >>> + ((iph->version_ihl >> 4) * 4) >>> +#endif >>> + >>> +#define IPV4_HDR_DF_SHIFT 14 >>> +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT) >>> + >>> +#define INVALID_FLOW_INDEX 0xffffU >>> +#define INVALID_ITEM_INDEX 0xffffffffUL >>> + >>> +#define GRO_TCP_TBL_MAX_FLOW_NUM (UINT16_MAX - 1) >>> +#define GRO_TCP_TBL_MAX_ITEM_NUM (UINT32_MAX - 1) >>> + >>> +/* criteria of mergeing packets */ >>> +struct gro_tcp_flow_key { >>> + struct ether_addr eth_saddr; >>> + struct ether_addr eth_daddr; >>> + uint32_t ip_src_addr[4]; /**< IPv4 uses the first 8B */ >>> + uint32_t ip_dst_addr[4]; >>> + >>> + uint32_t recv_ack; /**< acknowledgment sequence number. */ >>> + uint16_t src_port; >>> + uint16_t dst_port; >>> + uint8_t tcp_flags; /**< TCP flags. */ >>> +}; >>> + >>> +struct gro_tcp_flow { >>> + struct gro_tcp_flow_key key; >>> + uint32_t start_index; /**< the first packet index of the flow */ >>> + uint8_t is_valid; >>> +}; >>> + >>> +struct gro_tcp_item { >>> + struct rte_mbuf *pkt; /**< packet address. */ >>> + /* the time when the packet in added into the table */ >>> + uint64_t start_time; >>> + uint32_t next_pkt_idx; /**< next packet index. */ >>> + /* flag to indicate if the packet is GROed */ >>> + uint8_t is_groed; >>> + uint8_t is_valid; /**< flag indicates if the item is valid */ >>> +}; >>> + >>> +/** >>> + * TCP reassembly table. Both TCP/IPv4 and TCP/IPv6 use the same table >>> + * structure. >>> + */ >>> +struct gro_tcp_tbl { >>> + struct gro_tcp_item *items; /**< item array */ >>> + struct gro_tcp_flow *flows; /**< flow array */ >>> + uint32_t item_num; /**< current item number */ >>> + uint16_t flow_num; /**< current flow num */ >>> + uint32_t max_item_num; /**< item array size */ >>> + uint16_t max_flow_num; /**< flow array size */ >>> +}; >>> + >>> +/* rules to reassemble TCP packets, which are decided by applications */ >>> +struct gro_tcp_rule { >>> + /* the maximum packet length after merged */ >>> + uint32_t max_packet_size; >>> +}; >> Are there any other rules? If not, I prefer to use max_packet_size directly. > If we agree to use a flag to indicate whether to check the checksum, this structure > should be used to keep that flag.
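Just to make sure we are talking about the same thing: I guess the flag version would look roughly like below (a hypothetical sketch, untested; the field name is made up), and it is exactly the kind of per-engine knob I would rather avoid:

struct gro_tcp_rule {
	/* the maximum packet length after merged */
	uint32_t max_packet_size;
	/* hypothetical flag: non-zero means gro_tcp4_reassemble
	 * verifies IP/TCP checksums before trying to merge */
	uint8_t check_csum;
};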
> >>> + >>> +/** >>> + * This function is to update TCP and IPv4 header checksums >>> + * for merged packets in the TCP reassembly table. >>> + */ >>> +void gro_tcp4_tbl_cksum_update(struct gro_tcp_tbl *tbl); >>> + >>> +/** >>> + * This function creates a TCP reassembly table. >>> + * >>> + * @param socket_id >>> + * socket index where the Ethernet port connects to. >>> + * @param max_flow_num >>> + * the maximum number of flows in the TCP GRO table >>> + * @param max_item_per_flow >>> + * the maximum packet number per flow. >>> + * @return >>> + * if create successfully, return a pointer which points to the >>> + * created TCP GRO table. Otherwise, return NULL. >>> + */ >>> +void *gro_tcp_tbl_create(uint16_t socket_id, >>> + uint16_t max_flow_num, >>> + uint16_t max_item_per_flow); >>> + >>> +/** >>> + * This function destroys a TCP reassembly table. >>> + * @param tbl >>> + * a pointer points to the TCP reassembly table. >>> + */ >>> +void gro_tcp_tbl_destroy(void *tbl); >>> + >>> +/** >>> + * This function searches for a packet in the TCP reassembly table to >>> + * merge with the inputted one. To merge two packets is to chain them >>> + * together and update packet headers. Note that this function won't >>> + * re-calculate IPv4 and TCP checksums. >>> + * >>> + * If the packet doesn't have data, or with wrong checksums, or is >>> + * fragmented etc., errors happen and gro_tcp4_reassemble returns >>> + * immediately. If no errors happen, the packet is either merged, or >>> + * inserted into the reassembly table. >>> + * >>> + * If applications want to get packets in the reassembly table, they >>> + * need to manually flush the packets. >>> + * >>> + * @param pkt >>> + * packet to reassemble. >>> + * @param tbl >>> + * a pointer that points to a TCP reassembly table. >>> + * @param rule >>> + * TCP reassembly criteria defined by applications. >>> + * @return >>> + * if the inputted packet is merged successfully, return an positive >>> + * value. If the packet hasn't be merged with any packets in the TCP >>> + * reassembly table. If errors happen, return a negative value and the >>> + * packet won't be inserted into the reassemble table. >>> + */ >>> +int32_t >>> +gro_tcp4_reassemble(struct rte_mbuf *pkt, >>> + struct gro_tcp_tbl *tbl, >>> + struct gro_tcp_rule *rule); >>> + >>> +/** >>> + * This function flushes the packets in a TCP reassembly table to >>> + * applications. Before returning the packets, it will update TCP and >>> + * IPv4 header checksums. >>> + * >>> + * @param tbl >>> + * a pointer that points to a TCP GRO table. >>> + * @param flush_num >>> + * the number of packets that applications want to flush. >>> + * @param out >>> + * pointer array which is used to keep flushed packets. >>> + * @param nb_out >>> + * the maximum element number of out. >>> + * @return >>> + * the number of packets that are flushed finally. >>> + */ >>> +uint16_t >>> +gro_tcp_tbl_flush(struct gro_tcp_tbl *tbl, >>> + uint16_t flush_num, >>> + struct rte_mbuf **out, >>> + const uint16_t nb_out); >>> + >>> +/** >>> + * This function flushes timeout packets in a TCP reassembly table to >>> + * applications. Before returning the packets, it updates TCP and IPv4 >>> + * header checksums. >>> + * >>> + * @param tbl >>> + * a pointer that points to a TCP GRO table. >>> + * @param timeout_cycles >>> + * the maximum time that packets can stay in the table. >>> + * @param out >>> + * pointer array which is used to keep flushed packets. >>> + * @param nb_out >>> + * the maximum element number of out. 
>>> + * @return >>> + * It returns the number of packets that are flushed finally. >>> + */ >>> +uint16_t >>> +gro_tcp_tbl_timeout_flush(struct gro_tcp_tbl *tbl, >>> + uint64_t timeout_cycles, >>> + struct rte_mbuf **out, >>> + const uint16_t nb_out); >>> +#endif
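PS: here is a rough sketch of the ops idea I mentioned at the top of this mail. All names are made up and the code is untested; it only shows the direction, not a final interface:

/* per-engine ops; each GRO engine (rte_gro_tcp.c for tcp4 here)
 * fills in one entry, so rte_gro.c never references tcp* symbols
 * directly and a new engine only needs to register itself. */
struct gro_engine_ops {
	void *(*tbl_create)(uint16_t socket_id,
			uint16_t max_flow_num,
			uint16_t max_item_per_flow);
	void (*tbl_destroy)(void *tbl);
	/* returns >0 if merged, 0 if inserted, <0 on error */
	int32_t (*reassemble)(struct rte_mbuf *pkt, void *tbl);
	uint16_t (*timeout_flush)(void *tbl,
			uint64_t timeout_cycles,
			struct rte_mbuf **out,
			const uint16_t max_nb_out);
};

/* one slot per GRO type, indexed by e.g. GRO_TCP_IPV4_INDEX */
static struct gro_engine_ops gro_engines[GRO_TYPE_MAX_NB];

void
gro_register_engine(uint8_t type_idx, const struct gro_engine_ops *ops)
{
	gro_engines[type_idx] = *ops;
}

Then rte_gro_reassemble() and the flush functions just loop over gro_engines[] for the desired GRO types, instead of hard-coding the tcp functions.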