From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 02822A04B1; Thu, 17 Sep 2020 04:06:44 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 7F1421D421; Thu, 17 Sep 2020 04:06:44 +0200 (CEST) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id D97591D37E for ; Thu, 17 Sep 2020 04:06:41 +0200 (CEST) IronPort-SDR: FeJM2anzA7YFMzCeutrRfXR+mlNvzF9I0ZQouDXYj6TKn7PlJY2CZ9pEQWqfsB1q7Zn+CrsRep Jr4db868RaWA== X-IronPort-AV: E=McAfee;i="6000,8403,9746"; a="244441800" X-IronPort-AV: E=Sophos;i="5.76,434,1592895600"; d="scan'208,217";a="244441800" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Sep 2020 19:06:40 -0700 IronPort-SDR: LsgYIxIejtcef+DyxoEJs8B3c+Jj45dVqlAnpRhmaMRSjTCxeJC8qkATODQV3tlK7k+PVaOooe tWfW4F3QVTUQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,434,1592895600"; d="scan'208,217";a="339282639" Received: from fmsmsx606.amr.corp.intel.com ([10.18.126.86]) by fmsmga002.fm.intel.com with ESMTP; 16 Sep 2020 19:06:40 -0700 Received: from shsmsx604.ccr.corp.intel.com (10.109.6.214) by fmsmsx606.amr.corp.intel.com (10.18.126.86) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1713.5; Wed, 16 Sep 2020 19:06:39 -0700 Received: from shsmsx606.ccr.corp.intel.com (10.109.6.216) by SHSMSX604.ccr.corp.intel.com (10.109.6.214) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.1713.5; Thu, 17 Sep 2020 10:06:37 +0800 Received: from shsmsx606.ccr.corp.intel.com ([10.109.6.216]) by SHSMSX606.ccr.corp.intel.com ([10.109.6.216]) with mapi id 15.01.1713.004; Thu, 17 Sep 2020 10:06:37 +0800 From: "Hu, Jiayu" To: yang_y_yi CC: "yangyi01@inspur.com" , "thomas@monjalon.net" , "dev@dpdk.org" , Stephen Hemminger Thread-Topic: Re:Re: RE: Re:Re: Re: [PATCH v5 2/2] gro: add VXLAN UDP GRO support Thread-Index: AQHWi9NUb//q9Ji/g0iuzvlYMopoxKlqDr4AgAIFRdA= Date: Thu, 17 Sep 2020 02:06:37 +0000 Message-ID: References: <20200910092914.120452-1-yang_y_yi@163.com> <20200910092914.120452-3-yang_y_yi@163.com> <20200911042416.GA46174@NPG_DPDK_VIRTIO_jiayuhu_15.sh.intel.com> <1623572b.1b82.1748a63b63f.Coremail.yang_y_yi@163.com> <20200914042126.GA15258@NPG_DPDK_VIRTIO_jiayuhu_15.sh.intel.com> <20ddfcd3.5b19.1748bb90144.Coremail.yang_y_yi@163.com> <662209b31e344cfb8994a5b5cdcb0375@intel.com> <778c4739.666a.1748be56189.Coremail.yang_y_yi@163.com> <20200916025424.GA90702@NPG_DPDK_VIRTIO_jiayuhu_15.sh.intel.com> <36eaaf64.21fb.17494dff30d.Coremail.yang_y_yi@163.com> In-Reply-To: <36eaaf64.21fb.17494dff30d.Coremail.yang_y_yi@163.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: dlp-reaction: no-action dlp-version: 11.5.1.3 dlp-product: dlpe-windows x-originating-ip: [10.239.127.36] MIME-Version: 1.0 Content-Type: text/plain; charset="koi8-r" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Subject: Re: [dpdk-dev] [PATCH v5 2/2] gro: add VXLAN UDP GRO support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" OK, I agree to ignore processing outer IP ID. But I think the comment need = to change, as it is not accurate. VxLAN/UDP GRO doesn't process outer IP ID is because it= is meaningless for reassembly in the ovs case you mentioned, rather than it's out-of-order= . You can give the ovs example in the commit log, and it's easier to understand the c= ode IMO. Thanks, Jiayu From: yang_y_yi Sent: Wednesday, September 16, 2020 11:06 AM To: Hu, Jiayu Cc: yangyi01@inspur.com; thomas@monjalon.net; dev@dpdk.org Subject: Re:Re: RE: Re:Re: Re: [PATCH v5 2/2] gro: add VXLAN UDP GRO suppor= t Importance: High No, next_proto_id of inner IP header can clearly identify it is a UDP packe= t even if it is a no-udp-header ip fragment. At 2020-09-16 10:54:24, "Jiayu Hu" > wrote: >On Mon, Sep 14, 2020 at 05:14:59PM +0800, yang_y_yi wrote: >> >> Jiayu, VM to VM case for big UDP packet (say udp parload size is 8192K, = but mtu >> 1500). VM will segment it as UDP framgments, but when ovs dpdk uses VxLa= n to >> encapsulate them, it doesn't know they are same flow, OVS DPDK has no kn= owldge >> about them, just encapsulate them as if they are not related. But on rec= eive >> side on another machine, OVS DPDK will GRO them, at this point, outer IP= ID is >> even random for every packet. For TCP, I think same issue is there, GRO = just >> considered them as different flow and doesn't do GRO, in my opinion, we = also >> should do GRO for such case. > >I am a little confused about ovs behavior. Does ovs know they are udp frag= ments? >Or it knows they are fragments but doesn't know they are from one flow? >Given ovs receives 3 udp fragments of a udp packet, if ovs doesn't know th= ey are >fragments of a udp packet, their mbuf->packet_type will be ether/ipv4/udp, >ether/ipv4, and ether/ipv4. dpdk udp gro will not process the second and t= hird >one as their packet type in mbuf is not ether/ipv4/udp. > >> >> >> 2020-09-14 16:50:08 "Hu, Jiayu" > =C4 >> >> Out IP ID is not used to check if they belong to the same flow, >> >> but check if they are neighbors. If two vxlan/udp fragments are >> >> neighbors, their outer IP ID and outer frag_oft should be incrementa= l. >> >> I still cannot understand why ignore outer IP ID when DF is 0. >> >> Can you give more explanation? >> >> >> >> From: yang_y_yi > >> Sent: Monday, September 14, 2020 4:27 PM >> To: Hu, Jiayu > >> Cc: thomas@monjalon.net; yangyi01@inspur= .com; dev@dpdk.org >> Subject: Re:Re: Re: [PATCH v5 2/2] gro: add VXLAN UDP GRO support >> Importance: High >> >> >> >> For outer_ip_id, there is same ip_id disorder issue existing as UDP = GRO, so >> we can't use ip_id +/-1 to check if they are same flow, here is my >> incrmental change for it with more comments. >> >> >> >> @@ -189,7 +188,12 @@ >> is_same_vxlan_udp4_flow(struct vxlan_udp4_flow_key k1, >> struct vxlan_udp4_flow_key k2) >> { >> - /* src port is changing, so shouldn't use it for flow check = */ >> + /* For VxLAN packet, outer udp src port is calculated from >> + * inner packet RSS hash, udp src port of the first UDP >> + * fragment is different from one of other UDP fragments >> + * even if they are same flow, so we have to skip outer udp >> + * src port comparison here. >> + */ >> return (rte_is_same_ether_addr(&k1.outer_eth_saddr, >> &k2.outer_eth_saddr) && >> rte_is_same_ether_addr(&k1.outer_eth_daddr, >> @@ -212,12 +216,16 @@ >> uint16_t l2_offset; >> int ret =3D 0; >> >> + /* Note: if outer DF bit is set, i.e outer_is_atomic is 0, >> + * we needn't compare outer_ip_id because they are same, >> + * for the case outer_is_atomic is 1, we also have no way >> + * to comapre outer_ip_id because the received packets can >> + * be out of order. So ignore outer_ip_id comparison here. >> + */ >> + >> l2_offset =3D pkt->outer_l2_len + pkt->outer_l3_len; >> cmp =3D udp4_check_neighbor(&item->inner_item, frag_offset, = ip_dl, >> l2_offset); >> - /* VXLAN outer IP ID is out of order, so don't touch it and >> - * don't compare it. >> - */ >> if (cmp > 0) >> /* Append the new packet. */ >> ret =3D 1; >> @@ -354,7 +362,9 @@ >> rte_ether_addr_copy(&(outer_eth_hdr->d_addr), & >> (key.outer_eth_daddr)); >> key.outer_ip_src_addr =3D outer_ipv4_hdr->src_addr; >> key.outer_ip_dst_addr =3D outer_ipv4_hdr->dst_addr; >> - key.outer_src_port =3D udp_hdr->src_port; >> + /* Note: It is unnecessary to save outer_src_port here becau= se it >> can >> + * be different for VxLAN UDP fragments from the same flow. >> + */ >> key.outer_dst_port =3D udp_hdr->dst_port; >> >> /* Search for a matched flow. */ >> >> >> >> >> >> >> >> >> At 2020-09-14 12:21:26, "Jiayu Hu" > wrote: >> >> >Replies are inline. >> >> > >> >> >BTW, you need to update the programmer guide >> >> >doc/guides/prog_guide/generic_receive_offload_lib.rst and >> >> >release note doc/guides/rel_notes/release_20_11.rst. >> >> > >> >> >Thanks, >> >> >Jiayu >> >> > >> >> >On Mon, Sep 14, 2020 at 10:13:44AM +0800, yang_y_yi wrote: >> >> >> Jiayu, thank you so much, please check my replies for some of you= r comments, >> >> >> here is incremental patch. I built it by meson this time :-) >> >> >> >> >> >> diff --git a/lib/librte_gro/gro_vxlan_udp4.c b/lib/librte_gro/gro= _vxlan_udp4.c >> >> >> index 9e9df72..1fcfaf1 100644 >> >> >> --- a/lib/librte_gro/gro_vxlan_udp4.c >> >> >> +++ b/lib/librte_gro/gro_vxlan_udp4.c >> >> >> @@ -203,7 +203,7 @@ >> >> >> } >> >> >> >> >> >> static inline int >> >> >> -udp_check_vxlan_neighbor(struct gro_vxlan_udp4_item *item, >> >> >> +udp4_check_vxlan_neighbor(struct gro_vxlan_udp4_item *item, >> >> >> uint16_t frag_offset, >> >> >> uint16_t ip_dl) >> >> >> { >> >> >> @@ -213,7 +213,7 @@ >> >> >> int ret =3D 0; >> >> >> >> >> >> l2_offset =3D pkt->outer_l2_len + pkt->outer_l3_len; >> >> >> - cmp =3D udp_check_neighbor(&item->inner_item, frag_offset= , ip_dl, >> >> >> + cmp =3D udp4_check_neighbor(&item->inner_item, frag_offse= t, ip_dl, >> >> >> l2_offset); >> >> >> /* VXLAN outer IP ID is out of order, so don't touch it a= nd >> >> >> * don't compare it. >> >> >> @@ -379,7 +379,7 @@ >> >> >> item_idx =3D insert_new_item(tbl, pkt, start_time= , >> >> >> INVALID_ARRAY_INDEX, frag_offset, >> >> >> is_last_frag, outer_ip_id, outer_= is_atomic); >> >> >> - if (item_idx =3D=3D INVALID_ARRAY_INDEX) >> >> >> + if (unlikely(item_idx =3D=3D INVALID_ARRAY_INDEX)= ) >> >> >> return -1; >> >> >> if (insert_new_flow(tbl, &key, item_idx) =3D=3D >> >> >> INVALID_ARRAY_INDEX) { >> >> >> @@ -397,7 +397,7 @@ >> >> >> cur_idx =3D tbl->flows[i].start_index; >> >> >> prev_idx =3D cur_idx; >> >> >> do { >> >> >> - cmp =3D udp_check_vxlan_neighbor(&(tbl->items[cur= _idx]), >> >> >> + cmp =3D udp4_check_vxlan_neighbor(&(tbl->items[cu= r_idx]), >> >> >> frag_offset, ip_dl); >> >> >> if (cmp) { >> >> >> if (merge_two_vxlan_udp4_packets( >> >> >> @@ -436,7 +436,7 @@ >> >> >> item_idx =3D insert_new_item(tbl, pkt, start_time= , >> >> >> INVALID_ARRAY_INDEX, frag_offset, >> >> >> is_last_frag, outer_ip_id, outer_= is_atomic); >> >> >> - if (item_idx =3D=3D INVALID_ARRAY_INDEX) >> >> >> + if (unlikely(item_idx =3D=3D INVALID_ARRAY_INDEX)= ) >> >> >> return -1; >> >> >> tbl->items[item_idx].inner_item.next_pkt_idx =3D = cur_idx; >> >> >> tbl->flows[i].start_index =3D item_idx; >> >> >> @@ -470,7 +470,7 @@ >> >> >> ip_dl =3D pkt->pkt_len - hdr_len; >> >> >> diff --git a/lib/librte_gro/gro_vxlan_udp4.c b/lib/librte_gro/gro= _vxlan_udp4.c >> >> >> index 9e9df72..1fcfaf1 100644 >> >> >> --- a/lib/librte_gro/gro_vxlan_udp4.c >> >> >> +++ b/lib/librte_gro/gro_vxlan_udp4.c >> >> >> @@ -203,7 +203,7 @@ >> >> >> } >> >> >> >> >> >> static inline int >> >> >> -udp_check_vxlan_neighbor(struct gro_vxlan_udp4_item *item, >> >> >> +udp4_check_vxlan_neighbor(struct gro_vxlan_udp4_item *item, >> >> >> uint16_t frag_offset, >> >> >> uint16_t ip_dl) >> >> >> { >> >> >> @@ -213,7 +213,7 @@ >> >> >> int ret =3D 0; >> >> >> >> >> >> l2_offset =3D pkt->outer_l2_len + pkt->outer_l3_len; >> >> >> - cmp =3D udp_check_neighbor(&item->inner_item, frag_offset= , ip_dl, >> >> >> + cmp =3D udp4_check_neighbor(&item->inner_item, frag_offse= t, ip_dl, >> >> >> l2_offset); >> >> >> /* VXLAN outer IP ID is out of order, so don't touch it a= nd >> >> >> * don't compare it. >> >> >> @@ -379,7 +379,7 @@ >> >> >> item_idx =3D insert_new_item(tbl, pkt, start_time= , >> >> >> INVALID_ARRAY_INDEX, frag_offset, >> >> >> is_last_frag, outer_ip_id, outer_= is_atomic); >> >> >> - if (item_idx =3D=3D INVALID_ARRAY_INDEX) >> >> >> + if (unlikely(item_idx =3D=3D INVALID_ARRAY_INDEX)= ) >> >> >> return -1; >> >> >> if (insert_new_flow(tbl, &key, item_idx) =3D=3D >> >> >> INVALID_ARRAY_INDEX) { >> >> >> @@ -397,7 +397,7 @@ >> >> >> cur_idx =3D tbl->flows[i].start_index; >> >> >> prev_idx =3D cur_idx; >> >> >> do { >> >> >> - cmp =3D udp_check_vxlan_neighbor(&(tbl->items[cur= _idx]), >> >> >> + cmp =3D udp4_check_vxlan_neighbor(&(tbl->items[cu= r_idx]), >> >> >> frag_offset, ip_dl); >> >> >> if (cmp) { >> >> >> if (merge_two_vxlan_udp4_packets( >> >> >> @@ -436,7 +436,7 @@ >> >> >> item_idx =3D insert_new_item(tbl, pkt, start_time= , >> >> >> INVALID_ARRAY_INDEX, frag_offset, >> >> >> is_last_frag, outer_ip_id, outer_= is_atomic); >> >> >> - if (item_idx =3D=3D INVALID_ARRAY_INDEX) >> >> >> + if (unlikely(item_idx =3D=3D INVALID_ARRAY_INDEX)= ) >> >> >> return -1; >> >> >> tbl->items[item_idx].inner_item.next_pkt_idx =3D = cur_idx; >> >> >> tbl->flows[i].start_index =3D item_idx; >> >> >> @@ -470,7 +470,7 @@ >> >> >> ip_dl =3D pkt->pkt_len - hdr_len; >> >> >> frag_offset =3D tbl->items[item_idx].inner_item.f= rag_offset; >> >> >> is_last_frag =3D tbl->items[item_idx].inner_item.= is_last_frag; >> >> >> - cmp =3D udp_check_vxlan_neighbor(&(tbl->items[sta= rt_idx]), >> >> >> + cmp =3D udp4_check_vxlan_neighbor(&(tbl->items[st= art_idx]), >> >> >> frag_offset, ip_dl); >> >> >> if (cmp) { >> >> >> if (merge_two_vxlan_udp4_packets( >> >> >> @@ -481,12 +481,10 @@ >> >> >> INVALID_A= RRAY_INDEX); >> >> >> tbl->items[start_idx].inner_item.= next_pkt_idx >> >> >> =3D item_idx; >> >> >> - } else { >> >> >> + } else >> >> >> return 0; >> >> >> - } >> >> >> - } else { >> >> >> + } else >> >> >> return 0; >> >> >> - } >> >> >> } >> >> >> >> >> >> return 0; >> >> >> >> >> >> >> >> >> At 2020-09-11 12:24:16, "Jiayu Hu" > wrote: >> >> >> >Some comments are inline. >> >> >> > >> >> >> >Thanks, >> >> >> >Jiayu >> >> >> > >> >> >> >On Thu, Sep 10, 2020 at 05:29:14PM +0800, yang_y_yi@163.com wrote: >> >> >> >> From: Yi Yang = > >> >> >> >> >> >> >> >> VXLAN UDP GRO can help improve VM-to-VM UDP performance >> >> >> >> when VM is enabled UFO or GSO, GRO must be supported if >> >> >> >> GSO or UFO is enabled, otherwise, performance gain will >> >> >> >> be hurt. >> >> >> >> >> >> >> >> With this enabled in DPDK, OVS DPDK can leverage it to >> >> >> >> improve VM-to-VM UDP performance, this will make sure >> >> >> >> IP fragments will be reassembled once it is received >> >> >> >> from physical NIC. It is very helpful in OVS DPDK VXLAN >> >> >> >> TSO case. >> >> >> >> >> >> >> >> Signed-off-by: Yi Yang > >> >> >> >> --- >> >> >> >> lib/librte_gro/gro_udp4.h | 1 + >> >> >> >> lib/librte_gro/gro_vxlan_udp4.c | 548 +++++++++++++++++++++++= +++++++++++++++++ >> >> >> >> lib/librte_gro/gro_vxlan_udp4.h | 152 +++++++++++ >> >> >> >> lib/librte_gro/meson.build | 2 +- >> >> >> >> lib/librte_gro/rte_gro.c | 115 +++++++-- >> >> >> >> lib/librte_gro/rte_gro.h | 3 + >> >> >> >> 6 files changed, 794 insertions(+), 27 deletions(-) >> >> >> >> create mode 100644 lib/librte_gro/gro_vxlan_udp4.c >> >> >> >> create mode 100644 lib/librte_gro/gro_vxlan_udp4.h >> >> >> >> >> >> >> >> diff --git a/lib/librte_gro/gro_udp4.h b/lib/librte_gro/gro_ud= p4.h >> >> >> >> index 0a078e4..d38b393 100644 >> >> >> >> --- a/lib/librte_gro/gro_udp4.h >> >> >> >> +++ b/lib/librte_gro/gro_udp4.h >> >> >> >> @@ -7,6 +7,7 @@ >> >> >> >> >> >> >> >> #include >> >> >> >> #include >> >> >> >> +#include >> >> >> >> >> >> >> >> #define INVALID_ARRAY_INDEX 0xffffffffUL >> >> >> >> #define GRO_UDP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL) >> >> >> >> diff --git a/lib/librte_gro/gro_vxlan_udp4.c b/lib/librte_gro/= gro_vxlan_udp4.c >> >> >> >> new file mode 100644 >> >> >> >> index 0000000..9e9df72 >> >> >> >> --- /dev/null >> >> >> >> +++ b/lib/librte_gro/gro_vxlan_udp4.c >> >> >> >> @@ -0,0 +1,548 @@ >> >> >> >> +/* SPDX-License-Identifier: BSD-3-Clause >> >> >> >> + * Copyright(c) 2020 Inspur Corporation >> >> >> >> + */ >> >> >> >> + >> >> >> >> + >> >> >> >> + >> >> >> >> +static inline uint32_t >> >> >> >> +delete_item(struct gro_vxlan_udp4_tbl *tbl, >> >> >> >> + uint32_t item_idx, >> >> >> >> + uint32_t prev_item_idx) >> >> >> >> +{ >> >> >> >> + uint32_t next_idx =3D tbl->items[item_idx].inner_item.nex= t_pkt_idx; >> >> >> >> + >> >> >> >> + /* NULL indicates an empty item. */ >> >> >> >> + tbl->items[item_idx].inner_item.firstseg =3D NULL; >> >> >> >> + tbl->item_num--; >> >> >> >> + if (prev_item_idx !=3D INVALID_ARRAY_INDEX) >> >> >> >> + tbl->items[prev_item_idx].inner_item.next_pkt_idx= =3D next_idx; >> >> >> >> + >> >> >> >> + return next_idx; >> >> >> >> +} >> >> >> >> + >> >> >> >> +static inline uint32_t >> >> >> >> +insert_new_flow(struct gro_vxlan_udp4_tbl *tbl, >> >> >> >> + struct vxlan_udp4_flow_key *src, >> >> >> >> + uint32_t item_idx) >> >> >> >> +{ >> >> >> >> + struct vxlan_udp4_flow_key *dst; >> >> >> >> + uint32_t flow_idx; >> >> >> >> + >> >> >> >> + flow_idx =3D find_an_empty_flow(tbl); >> >> >> >> + if (unlikely(flow_idx =3D=3D INVALID_ARRAY_INDEX)) >> >> >> >> + return INVALID_ARRAY_INDEX; >> >> >> >> + >> >> >> >> + dst =3D &(tbl->flows[flow_idx].key); >> >> >> >> + >> >> >> >> + rte_ether_addr_copy(&(src->inner_key.eth_saddr), >> >> >> >> + &(dst->inner_key.eth_saddr)); >> >> >> >> + rte_ether_addr_copy(&(src->inner_key.eth_daddr), >> >> >> >> + &(dst->inner_key.eth_daddr)); >> >> >> >> + dst->inner_key.ip_src_addr =3D src->inner_key.ip_src_addr= ; >> >> >> >> + dst->inner_key.ip_dst_addr =3D src->inner_key.ip_dst_addr= ; >> >> >> >> + dst->inner_key.ip_id =3D src->inner_key.ip_id; >> >> >> >> + >> >> >> >> + dst->vxlan_hdr.vx_flags =3D src->vxlan_hdr.vx_flags; >> >> >> >> + dst->vxlan_hdr.vx_vni =3D src->vxlan_hdr.vx_vni; >> >> >> >> + rte_ether_addr_copy(&(src->outer_eth_saddr), &(dst->outer= _eth_saddr)); >> >> >> >> + rte_ether_addr_copy(&(src->outer_eth_daddr), &(dst->outer= _eth_daddr)); >> >> >> >> + dst->outer_ip_src_addr =3D src->outer_ip_src_addr; >> >> >> >> + dst->outer_ip_dst_addr =3D src->outer_ip_dst_addr; >> >> >> >> + dst->outer_src_port =3D src->outer_src_port; >> >> >> >> + dst->outer_dst_port =3D src->outer_dst_port; >> >> >> >> + >> >> >> >> + tbl->flows[flow_idx].start_index =3D item_idx; >> >> >> >> + tbl->flow_num++; >> >> >> >> + >> >> >> >> + return flow_idx; >> >> >> >> +} >> >> >> >> + >> >> >> >> +static inline int >> >> >> >> +is_same_vxlan_udp4_flow(struct vxlan_udp4_flow_key k1, >> >> >> >> + struct vxlan_udp4_flow_key k2) >> >> >> >> +{ >> >> >> >> + /* src port is changing, so shouldn't use it for flow che= ck */ >> >> >> > >> >> >> >As I know, fragments of a vxlan/udp/ipv4 packet have same outer = source >> >> >> >port. In what cases outer source port will change? >> >> >> >> >> >> In OVS, vxlan udp soure port is calculated from inner packet RSS,= so for UDP fragment, RSS of first packet is >> >> >> different from other fragments because RSS is calculated based on= 5 tuples (src ip, src port, dst ip, dst port, >> >> >> protocol type). >> >> >> >> >> >> ovs/lib/netdev-native-tnl.h: >> >> >> >> >> >> static inline uint32_t * >> >> >> dp_packet_rss_ptr(const struct dp_packet *b) >> >> >> { >> >> >> return CONST_CAST(uint32_t *, &b->mbuf.hash.rss); >> >> >> } >> >> >> >> >> >> static inline uint32_t >> >> >> dp_packet_get_rss_hash(const struct dp_packet *p) >> >> >> { >> >> >> return *dp_packet_rss_ptr(p); >> >> >> } >> >> >> >> >> >> static inline ovs_be16 >> >> >> netdev_tnl_get_src_port(struct dp_packet *packet) >> >> >> { >> >> >> uint32_t hash; >> >> >> >> >> >> hash =3D dp_packet_get_rss_hash(packet); >> >> >> >> >> >> return htons((((uint64_t) hash * (tnl_udp_port_max - tnl_udp_= port_min)) >> 32) + >> >> >> tnl_udp_port_min); >> >> >> } >> >> >> >> >> >> void >> >> >> netdev_tnl_push_udp_header(const struct netdev *netdev OVS_UNUSED= , >> >> >> struct dp_packet *packet, >> >> >> const struct ovs_action_push_tnl *data= ) >> >> >> { >> >> >> struct udp_header *udp; >> >> >> int ip_tot_size; >> >> >> >> >> >> udp =3D netdev_tnl_push_ip_header(packet, data->header, data-= >header_len, &ip_tot_size); >> >> >> >> >> >> /* set udp src port */ >> >> >> udp->udp_src =3D netdev_tnl_get_src_port(packet); >> >> >> udp->udp_len =3D htons(ip_tot_size); >> >> >> >> >> >> if (udp->udp_csum) { >> >> >> netdev_tnl_calc_udp_csum(udp, packet, ip_tot_size); >> >> >> } >> >> >> } >> >> >> >> >> >> This is special for ECMP which only checks outer header to differ= entiate different flows (innner packet). The >> >> >> first fragment of original inner UDP is considered as different f= low from other fragments. >> >> >> >> >> >> I found this by debugging becuase the same UDP traffic did genera= te different vxlan UDP source port. >> >> >> >> >> > >> >> >You are right. I checked VxLAN RFC 7348, and outer source port is >> >> >calculated from inner packet. In OVS case, the first fragment >> >> >will have different outer source port compared with other fragments= . >> >> >Please add some comments in the code for people to better understan= d >> >> >the code. In addition, need to remove outer_src_port from the struc= ture >> >> >vxlan_udp4_flow_key. >> >> > >> >> >> > >> >> >> >> + return (rte_is_same_ether_addr(&k1.outer_eth_saddr, >> >> >> >> + &k2.outer_eth_saddr) && >> >> >> >> + rte_is_same_ether_addr(&k1.outer_eth_dadd= r, >> >> >> >> + &k2.outer_eth_daddr) && >> >> >> >> + (k1.outer_ip_src_addr =3D=3D k2.outer_ip_= src_addr) && >> >> >> >> + (k1.outer_ip_dst_addr =3D=3D k2.outer_ip_= dst_addr) && >> >> >> >> + (k1.outer_dst_port =3D=3D k2.outer_dst_po= rt) && >> >> >> >> + (k1.vxlan_hdr.vx_flags =3D=3D k2.vxlan_hd= r.vx_flags) && >> >> >> >> + (k1.vxlan_hdr.vx_vni =3D=3D k2.vxlan_hdr.= vx_vni) && >> >> >> >> + is_same_udp4_flow(k1.inner_key, k2.inner_= key)); >> >> >> >> +} >> >> >> >> + >> >> >> >> +static inline int >> >> >> >> +udp_check_vxlan_neighbor(struct gro_vxlan_udp4_item *item, >> >> >> >> + uint16_t frag_offset, >> >> >> >> + uint16_t ip_dl) >> >> >> > >> >> >> >Better rename it as udp4_check_vxlan_neighbor. >> >> >> >> >> >> Ok >> >> >> >> >> >> > >> >> >> >> +{ >> >> >> >> + struct rte_mbuf *pkt =3D item->inner_item.firstseg; >> >> >> >> + int cmp; >> >> >> >> + uint16_t l2_offset; >> >> >> >> + int ret =3D 0; >> >> >> >> + >> >> >> >> + l2_offset =3D pkt->outer_l2_len + pkt->outer_l3_len; >> >> >> >> + cmp =3D udp_check_neighbor(&item->inner_item, frag_offset= , ip_dl, >> >> >> >> + l2_offset); >> >> >> > >> >> >> >Why don't check outer IP ID? If outer DF bit is not set, all fra= gments >> >> >> >should have incremental outer IP ID; if DF bit is set, we don't = need >> >> >> >to check outer IP ID. >> >> >> >> >> >> By default, OVS set DF flag. I don't think we need to compare out= er IP id because inner check >> >> >> is enough to differentiate if they are same flow, isn't it? >> >> > >> >> >OVS is one of workloads using DPDK, and we can assume >> >> >other cases have the same design as OVS. I think IP ID >> >> >check is necessary. >> >> > >> >> >> >> >> >> > >> >> >> >In addition, udp_check_neighbor is renamed, but you didn't chang= e here. >> >> >> >> >> >> Forgot to build it, Makefile has been removed from dpdk main bran= ch, build isn't so convinient, will >> >> >> use meson build to check it for next version. >> >> >> >> >> >> > >> >> >> >> + /* VXLAN outer IP ID is out of order, so don't touch it a= nd >> >> >> >> + * don't compare it. >> >> >> >> + */ >> >> >> >> + if (cmp > 0) >> >> >> >> + /* Append the new packet. */ >> >> >> >> + ret =3D 1; >> >> >> >> + else if (cmp < 0) >> >> >> >> + /* Prepend the new packet. */ >> >> >> >> + ret =3D -1; >> >> >> >> + >> >> >> >> + return ret; >> >> >> >> +} >> >> >> >> + >> >> >> >> >> >> >> >> /* >> >> >> >> * GRO context structure. It keeps the table structures, whic= h are >> >> >> >> @@ -137,19 +148,27 @@ struct gro_ctx { >> >> >> >> struct gro_udp4_item udp_items[RTE_GRO_MAX_BURST_ITEM_NUM= ] =3D {{0} }; >> >> >> >> >> >> >> >> /* Allocate a reassembly table for VXLAN TCP GRO */ >> >> >> >> - struct gro_vxlan_tcp4_tbl vxlan_tbl; >> >> >> >> - struct gro_vxlan_tcp4_flow vxlan_flows[RTE_GRO_MAX_BURST_= ITEM_NUM]; >> >> >> >> - struct gro_vxlan_tcp4_item vxlan_items[RTE_GRO_MAX_BURST_= ITEM_NUM] >> >> >> >> + struct gro_vxlan_tcp4_tbl vxlan_tcp_tbl; >> >> >> >> + struct gro_vxlan_tcp4_flow vxlan_tcp_flows[RTE_GRO_MAX_BU= RST_ITEM_NUM]; >> >> >> >> + struct gro_vxlan_tcp4_item vxlan_tcp_items[RTE_GRO_MAX_BU= RST_ITEM_NUM] >> >> >> >> + =3D {{{0}, 0, 0} }; >> >> >> >> + >> >> >> >> + /* Allocate a reassembly table for VXLAN UDP GRO */ >> >> >> >> + struct gro_vxlan_udp4_tbl vxlan_udp_tbl; >> >> >> >> + struct gro_vxlan_udp4_flow vxlan_udp_flows[RTE_GRO_MAX_BU= RST_ITEM_NUM]; >> >> >> >> + struct gro_vxlan_udp4_item vxlan_udp_items[RTE_GRO_MAX_BU= RST_ITEM_NUM] >> >> >> >> =3D {{{0}, 0, 0} }; >> >> >> >> >> >> >> >> struct rte_mbuf *unprocess_pkts[nb_pkts]; >> >> >> >> uint32_t item_num; >> >> >> >> int32_t ret; >> >> >> >> uint16_t i, unprocess_num =3D 0, nb_after_gro =3D nb_pkts= ; >> >> >> >> - uint8_t do_tcp4_gro =3D 0, do_vxlan_gro =3D 0, do_udp4_gr= o =3D 0; >> >> >> >> + uint8_t do_tcp4_gro =3D 0, do_vxlan_tcp_gro =3D 0, do_udp= 4_gro =3D 0, >> >> >> >> + do_vxlan_udp_gro =3D 0; >> >> >> >> >> >> >> >> if (unlikely((param->gro_types & (RTE_GRO_IPV4_VXLAN_TCP_= IPV4 | >> >> >> >> RTE_GRO_TCP_IPV4 | >> >> >> >> + RTE_GRO_IPV4_VXLAN_UDP_IP= V4 | >> >> >> >> RTE_GRO_UDP_IPV4)) =3D=3D= 0)) >> >> >> >> return nb_pkts; >> >> >> >> >> >> >> >> @@ -160,15 +179,28 @@ struct gro_ctx { >> >> >> >> >> >> >> >> if (param->gro_types & RTE_GRO_IPV4_VXLAN_TCP_IPV4) { >> >> >> >> for (i =3D 0; i < item_num; i++) >> >> >> >> - vxlan_flows[i].start_index =3D INVALID_AR= RAY_INDEX; >> >> >> >> - >> >> >> >> - vxlan_tbl.flows =3D vxlan_flows; >> >> >> >> - vxlan_tbl.items =3D vxlan_items; >> >> >> >> - vxlan_tbl.flow_num =3D 0; >> >> >> >> - vxlan_tbl.item_num =3D 0; >> >> >> >> - vxlan_tbl.max_flow_num =3D item_num; >> >> >> >> - vxlan_tbl.max_item_num =3D item_num; >> >> >> >> - do_vxlan_gro =3D 1; >> >> >> >> + vxlan_tcp_flows[i].start_index =3D INVALI= D_ARRAY_INDEX; >> >> >> >> + >> >> >> >> + vxlan_tcp_tbl.flows =3D vxlan_tcp_flows; >> >> >> >> + vxlan_tcp_tbl.items =3D vxlan_tcp_items; >> >> >> >> + vxlan_tcp_tbl.flow_num =3D 0; >> >> >> >> + vxlan_tcp_tbl.item_num =3D 0; >> >> >> >> + vxlan_tcp_tbl.max_flow_num =3D item_num; >> >> >> >> + vxlan_tcp_tbl.max_item_num =3D item_num; >> >> >> >> + do_vxlan_tcp_gro =3D 1; >> >> >> >> + } >> >> >> >> + >> >> >> >> + if (param->gro_types & RTE_GRO_IPV4_VXLAN_UDP_IPV4) { >> >> >> >> + for (i =3D 0; i < item_num; i++) >> >> >> >> + vxlan_udp_flows[i].start_index =3D INVALI= D_ARRAY_INDEX; >> >> >> >> + >> >> >> >> + vxlan_udp_tbl.flows =3D vxlan_udp_flows; >> >> >> >> + vxlan_udp_tbl.items =3D vxlan_udp_items; >> >> >> >> + vxlan_udp_tbl.flow_num =3D 0; >> >> >> >> + vxlan_udp_tbl.item_num =3D 0; >> >> >> >> + vxlan_udp_tbl.max_flow_num =3D item_num; >> >> >> >> + vxlan_udp_tbl.max_item_num =3D item_num; >> >> >> >> + do_vxlan_udp_gro =3D 1; >> >> >> >> } >> >> >> >> >> >> >> >> if (param->gro_types & RTE_GRO_TCP_IPV4) { >> >> >> >> @@ -204,9 +236,18 @@ struct gro_ctx { >> >> >> >> * will be flushed from the tables. >> >> >> >> */ >> >> >> >> if (IS_IPV4_VXLAN_TCP4_PKT(pkts[i]->packet_type) = && >> >> >> >> - do_vxlan_gro) { >> >> >> >> + do_vxlan_tcp_gro) { >> >> >> >> ret =3D gro_vxlan_tcp4_reassemble(pkts[i]= , >> >> >> >> - &vxlan_tb= l, 0); >> >> >> >> + &vxlan_tc= p_tbl, 0); >> >> >> >> + if (ret > 0) >> >> >> >> + /* Merge successfully */ >> >> >> >> + nb_after_gro--; >> >> >> >> + else if (ret < 0) >> >> >> >> + unprocess_pkts[unprocess_num++] = =3D pkts[i]; >> >> >> >> + } else if (IS_IPV4_VXLAN_UDP4_PKT(pkts[i]->packet= _type) && >> >> >> >> + do_vxlan_udp_gro) { >> >> >> >> + ret =3D gro_vxlan_udp4_reassemble(pkts[i]= , >> >> >> >> + &vxlan_ud= p_tbl, 0); >> >> >> >> if (ret > 0) >> >> >> >> /* Merge successfully */ >> >> >> >> nb_after_gro--; >> >> >> >> @@ -236,11 +277,17 @@ struct gro_ctx { >> >> >> >> || (unprocess_num < nb_pkts)) { >> >> >> >> i =3D 0; >> >> >> >> /* Flush all packets from the tables */ >> >> >> >> - if (do_vxlan_gro) { >> >> >> >> - i =3D gro_vxlan_tcp4_tbl_timeout_flush(&v= xlan_tbl, >> >> >> >> + if (do_vxlan_tcp_gro) { >> >> >> >> + i =3D gro_vxlan_tcp4_tbl_timeout_flush(&v= xlan_tcp_tbl, >> >> >> >> 0, pkts, nb_pkts); >> >> >> >> } >> >> >> >> >> >> >> >> + if (do_vxlan_udp_gro) { >> >> >> >> + i +=3D gro_vxlan_udp4_tbl_timeout_flush(&= vxlan_udp_tbl, >> >> >> >> + 0, &pkts[i], nb_pkts - i)= ; >> >> >> >> + >> >> >> >> + } >> >> >> >> + >> >> >> >> if (do_tcp4_gro) { >> >> >> >> i +=3D gro_tcp4_tbl_timeout_flush(&tcp_tb= l, 0, >> >> >> >> &pkts[i], nb_pkts - i); >> >> >> >> @@ -269,33 +316,42 @@ struct gro_ctx { >> >> >> >> { >> >> >> >> struct rte_mbuf *unprocess_pkts[nb_pkts]; >> >> >> >> struct gro_ctx *gro_ctx =3D ctx; >> >> >> >> - void *tcp_tbl, *udp_tbl, *vxlan_tbl; >> >> >> >> + void *tcp_tbl, *udp_tbl, *vxlan_tcp_tbl, *vxlan_udp_tbl; >> >> >> >> uint64_t current_time; >> >> >> >> uint16_t i, unprocess_num =3D 0; >> >> >> >> - uint8_t do_tcp4_gro, do_vxlan_gro, do_udp4_gro; >> >> >> >> + uint8_t do_tcp4_gro, do_vxlan_tcp_gro, do_udp4_gro, do_vx= lan_udp_gro; >> >> >> >> >> >> >> >> if (unlikely((gro_ctx->gro_types & (RTE_GRO_IPV4_VXLAN_TC= P_IPV4 | >> >> >> >> RTE_GRO_TCP_IPV4 | >> >> >> >> + RTE_GRO_IPV4_VXLAN_UDP_IP= V4 | >> >> >> >> RTE_GRO_UDP_IPV4)) =3D=3D= 0)) >> >> >> >> return nb_pkts; >> >> >> >> >> >> >> >> tcp_tbl =3D gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX]; >> >> >> >> - vxlan_tbl =3D gro_ctx->tbls[RTE_GRO_IPV4_VXLAN_TCP_IPV4_I= NDEX]; >> >> >> >> + vxlan_tcp_tbl =3D gro_ctx->tbls[RTE_GRO_IPV4_VXLAN_TCP_IP= V4_INDEX]; >> >> >> >> udp_tbl =3D gro_ctx->tbls[RTE_GRO_UDP_IPV4_INDEX]; >> >> >> >> + vxlan_udp_tbl =3D gro_ctx->tbls[RTE_GRO_IPV4_VXLAN_UDP_IP= V4_INDEX]; >> >> >> >> >> >> >> >> do_tcp4_gro =3D (gro_ctx->gro_types & RTE_GRO_TCP_IPV4) = =3D=3D >> >> >> >> RTE_GRO_TCP_IPV4; >> >> >> >> - do_vxlan_gro =3D (gro_ctx->gro_types & RTE_GRO_IPV4_VXLAN= _TCP_IPV4) =3D=3D >> >> >> >> + do_vxlan_tcp_gro =3D (gro_ctx->gro_types & RTE_GRO_IPV4_V= XLAN_TCP_IPV4) =3D=3D >> >> >> >> RTE_GRO_IPV4_VXLAN_TCP_IPV4; >> >> >> >> do_udp4_gro =3D (gro_ctx->gro_types & RTE_GRO_UDP_IPV4) = =3D=3D >> >> >> >> RTE_GRO_UDP_IPV4; >> >> >> >> + do_vxlan_udp_gro =3D (gro_ctx->gro_types & RTE_GRO_IPV4_V= XLAN_UDP_IPV4) =3D=3D >> >> >> >> + RTE_GRO_IPV4_VXLAN_UDP_IPV4; >> >> >> >> >> >> >> >> current_time =3D rte_rdtsc(); >> >> >> >> >> >> >> >> for (i =3D 0; i < nb_pkts; i++) { >> >> >> >> if (IS_IPV4_VXLAN_TCP4_PKT(pkts[i]->packet_type) = && >> >> >> >> - do_vxlan_gro) { >> >> >> >> - if (gro_vxlan_tcp4_reassemble(pkts[i], vx= lan_tbl, >> >> >> >> + do_vxlan_tcp_gro) { >> >> >> >> + if (gro_vxlan_tcp4_reassemble(pkts[i], vx= lan_tcp_tbl, >> >> >> >> + current_time) < 0= ) >> >> >> >> + unprocess_pkts[unprocess_num++] = =3D pkts[i]; >> >> >> >> + } else if (IS_IPV4_VXLAN_UDP4_PKT(pkts[i]->packet= _type) && >> >> >> >> + do_vxlan_udp_gro) { >> >> >> >> + if (gro_vxlan_udp4_reassemble(pkts[i], vx= lan_udp_tbl, >> >> >> >> current_time) < 0= ) >> >> >> >> unprocess_pkts[unprocess_num++] = =3D pkts[i]; >> >> >> >> } else if (IS_IPV4_TCP_PKT(pkts[i]->packet_type) = && >> >> >> >> @@ -341,6 +397,13 @@ struct gro_ctx { >> >> >> >> left_nb_out =3D max_nb_out - num; >> >> >> >> } >> >> >> >> >> >> >> >> + if ((gro_types & RTE_GRO_IPV4_VXLAN_UDP_IPV4) && max_nb_o= ut > 0) { >> >> >> >> + num +=3D gro_vxlan_udp4_tbl_timeout_flush(gro_ctx= ->tbls[ >> >> >> >> + RTE_GRO_IPV4_VXLAN_UDP_IPV4_INDEX= ], >> >> >> >> + flush_timestamp, &out[num], left_= nb_out); >> >> >> >> + left_nb_out =3D max_nb_out - num; >> >> >> > >> >> >> >Don't need to calculate left_nb_out here. In the previous patch,= you >> >> >> >also calculate it for UDP GRO. Please remove it both. >> >> >> >> >> >> I think keeping it here isn't bad thing, if someone will add a ne= w GRO type here, he/she won't make a mistake. >> >> > >> >> >I don't think so. It's unnecessary for current design. >> >> >If others want to add new GRO types, they will write >> >> >correct code. >> >> > >> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >>