From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 698F2A04D7;
	Thu,  3 Sep 2020 09:42:54 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id D3F591C0B2;
	Thu,  3 Sep 2020 09:42:53 +0200 (CEST)
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by dpdk.org (Postfix) with ESMTP id 881CF1BEAF
 for <dev@dpdk.org>; Thu,  3 Sep 2020 09:42:52 +0200 (CEST)
IronPort-SDR: GgZNoE7dU0p5bJ7yUT+BknSCldT+D9iUdtiaLKqE2a5G2IWA074arTin5r3KTfDk9Wwb0Y3KYY
 iY9e09Pxndhw==
X-IronPort-AV: E=McAfee;i="6000,8403,9732"; a="175586176"
X-IronPort-AV: E=Sophos;i="5.76,385,1592895600"; d="scan'208";a="175586176"
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 03 Sep 2020 00:42:51 -0700
IronPort-SDR: mrRWGDkkD2+ZOhLLRcoWnvnamDScDDksTfoKCGDPN8R7n8tAEvpSFJI5SAUj2wXjH7swbhLsmG
 BwAsl1eXVoXw==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.76,385,1592895600"; d="scan'208";a="334387550"
Received: from fmsmsx606.amr.corp.intel.com ([10.18.126.86])
 by fmsmga002.fm.intel.com with ESMTP; 03 Sep 2020 00:42:51 -0700
Received: from shsmsx604.ccr.corp.intel.com (10.109.6.214) by
 fmsmsx606.amr.corp.intel.com (10.18.126.86) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.1713.5; Thu, 3 Sep 2020 00:42:50 -0700
Received: from shsmsx606.ccr.corp.intel.com (10.109.6.216) by
 SHSMSX604.ccr.corp.intel.com (10.109.6.214) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.1713.5; Thu, 3 Sep 2020 15:42:48 +0800
Received: from shsmsx606.ccr.corp.intel.com ([10.109.6.216]) by
 SHSMSX606.ccr.corp.intel.com ([10.109.6.216]) with mapi id 15.01.1713.004;
 Thu, 3 Sep 2020 15:42:48 +0800
From: "Hu, Jiayu" <jiayu.hu@intel.com>
To: "yang_y_yi@163.com" <yang_y_yi@163.com>, "dev@dpdk.org" <dev@dpdk.org>
CC: "thomas@monjalon.net" <thomas@monjalon.net>, "yangyi01@inspur.com"
 <yangyi01@inspur.com>
Thread-Topic: [PATCH V3 1/2] gro: add UDP GRO support
Thread-Index: AQHWgQs93LCFlkMQOkKZJ52UbJMO/KlWbzwA
Date: Thu, 3 Sep 2020 07:42:48 +0000
Message-ID: <93b32da783ad4a609ea11b1cf14184a3@intel.com>
References: <20200902092643.49924-1-yang_y_yi@163.com>
 <20200902092643.49924-2-yang_y_yi@163.com>
In-Reply-To: <20200902092643.49924-2-yang_y_yi@163.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
dlp-reaction: no-action
dlp-version: 11.5.1.3
dlp-product: dlpe-windows
x-originating-ip: [10.239.127.36]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH V3 1/2] gro: add UDP GRO support
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Hi Yi,

Some comments are inline.

In addition, have you tested the UDP GRO function and measured its
performance?

Thanks,
Jiayu

> -----Original Message-----
> From: yang_y_yi@163.com <yang_y_yi@163.com>
> Sent: Wednesday, September 2, 2020 5:27 PM
> To: dev@dpdk.org
> Cc: Hu, Jiayu <jiayu.hu@intel.com>; thomas@monjalon.net;
> yangyi01@inspur.com; yang_y_yi@163.com
> Subject: [PATCH V3 1/2] gro: add UDP GRO support
>=20
> From: Yi Yang <yangyi01@inspur.com>
>=20
> UDP GRO can help improve VM-to-VM UDP performance when
> VM is enabled UFO or GSO, GRO must be supported if GSO
> or UFO is enabled, otherwise, performance gain will be
> hurt.
>=20
> With this enabled in DPDK, OVS DPDK can leverage it
> to improve VM-to-VM UDP performance, this will make
> sure IP fragments will be reassembled once it is
> received from physical NIC. It is very helpful in OVS
> DPDK VLAN TSO case.
>=20
> Signed-off-by: Yi Yang <yangyi01@inspur.com>
> ---
>=20
>  # install this header file
> diff --git a/lib/librte_gro/gro_udp4.c b/lib/librte_gro/gro_udp4.c
> new file mode 100644
> index 0000000..d6beece
> --- /dev/null
> +++ b/lib/librte_gro/gro_udp4.c
> +static inline void
> +update_header(struct gro_udp4_item *item)
> +{
> +	struct rte_ipv4_hdr *ipv4_hdr;
> +	struct rte_mbuf *pkt =3D item->firstseg;
> +	uint16_t frag_offset;
> +
> +	ipv4_hdr =3D (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> +			pkt->l2_len);
> +	ipv4_hdr->total_length =3D rte_cpu_to_be_16(pkt->pkt_len -
> +			pkt->l2_len);
> +
> +	/* Clear MF bit if it is last fragment */
> +	if (item->is_last_frag) {
> +		frag_offset =3D rte_be_to_cpu_16(ipv4_hdr->fragment_offset);
> +		ipv4_hdr->fragment_offset =3D
> +			rte_cpu_to_be_16(frag_offset &
> ~RTE_IPV4_HDR_MF_FLAG);
> +	}

I think we need to clear both the MF bit and the fragment offset, since
a non-zero value in either field marks the packet as an IP fragment.
Once the packet has been reassembled successfully, both fields should
be zero. You can refer to the IP fragmentation library in DPDK.
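
A rough sketch of what I mean, not a drop-in replacement. The macros
below are local stand-ins that mirror RTE_IPV4_HDR_MF_FLAG and
RTE_IPV4_HDR_OFFSET_MASK, and clear_frag_fields() is a made-up helper
name. It works on the CPU-order value, so the caller would still wrap
it with rte_be_to_cpu_16()/rte_cpu_to_be_16() as update_header() does
today.

```c
#include <stdint.h>

/* Local stand-ins for RTE_IPV4_HDR_MF_FLAG / RTE_IPV4_HDR_OFFSET_MASK. */
#define IPV4_HDR_MF_FLAG     (1u << 13)
#define IPV4_HDR_OFFSET_MASK ((1u << 13) - 1)

/*
 * Hypothetical helper: takes the CPU-order fragment_offset field of a
 * fully reassembled packet and returns it with both the MF bit and the
 * 13-bit offset cleared. The reserved and DF bits are left untouched.
 */
static uint16_t
clear_frag_fields(uint16_t frag_field)
{
	return (uint16_t)(frag_field &
			~(IPV4_HDR_MF_FLAG | IPV4_HDR_OFFSET_MASK));
}
```

This should only be applied once the last fragment has been merged and
the item covers the whole datagram; a partially merged item is still a
fragment.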

> +}
> +
> +int32_t
> +gro_udp4_reassemble(struct rte_mbuf *pkt,
> +		struct gro_udp4_tbl *tbl,
> +		uint64_t start_time)
> +{
> +	struct rte_ether_hdr *eth_hdr;
> +	struct rte_ipv4_hdr *ipv4_hdr;
> +	uint16_t ip_dl;
> +	uint16_t ip_id, hdr_len;
> +	uint16_t frag_offset =3D 0;
> +	uint8_t is_last_frag;
> +
> +	struct udp4_flow_key key;
> +	uint32_t cur_idx, prev_idx, item_idx;
> +	uint32_t i, max_flow_num, remaining_flow_num;
> +	int cmp;
> +	uint8_t find;
> +
> +	eth_hdr =3D rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);
> +	ipv4_hdr =3D (struct rte_ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
> +	hdr_len =3D pkt->l2_len + pkt->l3_len + pkt->l4_len;
> +
> +	/*
> +	 * Don't process non-fragment packet.
> +	 */
> +	if (!is_ipv4_fragment(ipv4_hdr))
> +		return -1;
> +
> +	/*
> +	 * Don't process the packet whose payload length is less than or
> +	 * equal to 0.
> +	 */
> +	if (pkt->pkt_len - hdr_len <=3D 0)
> +		return -1;

Input packets are IP fragments, so the header length shouldn't include
l4_len.
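
Something along these lines could work. It is only a sketch:
frag_has_payload() is a hypothetical helper, and it takes plain
integers instead of an mbuf so that it stands alone. It also compares
before subtracting; assuming pkt_len is an unsigned 32-bit field, the
difference can never be negative, so a "<= 0" test on it only catches
the equal case.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if an IP fragment carries at least
 * one byte beyond its L2 and L3 headers, 0 otherwise. The L4 length is
 * left out on purpose, because only the first fragment carries the UDP
 * header. Comparing instead of subtracting avoids unsigned wrap-around
 * when the packet is shorter than its headers.
 */
static int
frag_has_payload(uint32_t pkt_len, uint16_t l2_len, uint16_t l3_len)
{
	uint32_t hdr_len = (uint32_t)l2_len + l3_len;

	return pkt_len > hdr_len;
}
```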

> +
> +	ip_dl =3D rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len;
> +	ip_id =3D rte_be_to_cpu_16(ipv4_hdr->packet_id);
> +	frag_offset =3D rte_be_to_cpu_16(ipv4_hdr->fragment_offset);
> +	is_last_frag =3D ((frag_offset & RTE_IPV4_HDR_MF_FLAG) =3D=3D 0) ? 1 : 0;
> +	frag_offset =3D (uint16_t)(frag_offset & RTE_IPV4_HDR_OFFSET_MASK)
> << 3;
> +
> +	return 0;
> +}
> +
> +
> +uint16_t
> +gro_udp4_tbl_timeout_flush(struct gro_udp4_tbl *tbl,
> +		uint64_t flush_timestamp,
> +		struct rte_mbuf **out,
> +		uint16_t nb_out)
> +{
> +	uint16_t k =3D 0;
> +	uint32_t i, j;
> +	uint32_t max_flow_num =3D tbl->max_flow_num;
> +
> +	for (i =3D 0; i < max_flow_num; i++) {
> +		if (unlikely(tbl->flow_num =3D=3D 0))
> +			return k;
> +
> +		j =3D tbl->flows[i].start_index;
> +		while (j !=3D INVALID_ARRAY_INDEX) {
> +			if (tbl->items[j].start_time <=3D flush_timestamp) {
> +				gro_udp4_merge_items(tbl, j);

Why do we need to merge packets again when flushing the table?

> +				out[k++] =3D tbl->items[j].firstseg;
> +				if (tbl->items[j].nb_merged > 1)
> +					update_header(&(tbl->items[j]));
> +				/*
> +				 * Delete the packet and get the next
> +				 * packet in the flow.
> +				 */
> +				j =3D delete_item(tbl, j, INVALID_ARRAY_INDEX);
> +				tbl->flows[i].start_index =3D j;
> +				if (j =3D=3D INVALID_ARRAY_INDEX)
> +					tbl->flow_num--;
> +
> +				if (unlikely(k =3D=3D nb_out))
> +					return k;
> +			} else
> +				/*
> +				 * The left packets in this flow won't be
> +				 * timeout. Go to check other flows.
> +				 */
> +				break;
> +		}
> +	}
> +	return k;
> +}
> +
> diff --git a/lib/librte_gro/gro_udp4.h b/lib/librte_gro/gro_udp4.h
> new file mode 100644
> index 0000000..e1002c6
> --- /dev/null
> +++ b/lib/librte_gro/gro_udp4.h
> @@ -0,0 +1,294 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Inspur Corporation
> + */
> +
> +#ifndef _GRO_UDP4_H_
> +#define _GRO_UDP4_H_
> +
> +#include <rte_ip.h>
> +#include <rte_udp.h>
> +#include <rte_vxlan.h>

rte_vxlan.h is used in VxLAN/UDP GRO. We don't need
it in this patch.

> +
> +struct gro_udp4_item {
> +	/*
> +	 * The first MBUF segment of the packet. If the value
> +	 * is NULL, it means the item is empty.
> +	 */
> +	struct rte_mbuf *firstseg;
> +	/* The last MBUF segment of the packet */
> +	struct rte_mbuf *lastseg;
> +	/*
> +	 * The time when the first packet is inserted into the table.
> +	 * This value won't be updated, even if the packet is merged
> +	 * with other packets.
> +	 */
> +	uint64_t start_time;
> +	/*
> +	 * next_pkt_idx is used to chain the packets that
> +	 * are in the same flow but can't be merged together
> +	 * (e.g. caused by packet reordering).
> +	 */
> +	uint32_t next_pkt_idx;
> +	/* offset of IP fragment packet */
> +	uint16_t frag_offset;
> +	/* is last IP fragment? */
> +	uint8_t is_last_frag;
> +	/* IPv4 ID of the packet */
> +	uint16_t ip_id;

All fragments of a UDP packet have the same IP ID. We only need to match
it in the udp4_flow_key structure; gro_udp4_item doesn't need it.

> +	/* the number of merged packets */
> +	uint16_t nb_merged;
> +	/* Indicate if IPv4 ID can be ignored */
> +	uint8_t is_atomic;

Is is_atomic used?

> +};
> +
> +
> +/*
> + * Check if two UDP/IPv4 packets belong to the same flow.
> + */
> +static inline int
> +is_same_udp4_flow(struct udp4_flow_key k1, struct udp4_flow_key k2)
> +{
> +	return (rte_is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) &&
> +			rte_is_same_ether_addr(&k1.eth_daddr,
> &k2.eth_daddr) &&
> +			(k1.ip_src_addr =3D=3D k2.ip_src_addr) &&
> +			(k1.ip_dst_addr =3D=3D k2.ip_dst_addr) &&
> +			(k1.ip_id =3D=3D k2.ip_id));
> +}
> +
> +/*
> + * Merge two UDP/IPv4 packets without updating checksums.
> + * If cmp is larger than 0, append the new packet to the
> + * original packet. Otherwise, pre-pend the new packet to
> + * the original packet.
> + */
> +static inline int
> +merge_two_udp4_packets(struct gro_udp4_item *item,
> +		struct rte_mbuf *pkt,
> +		int cmp,
> +		uint16_t frag_offset,
> +		uint8_t is_last_frag,
> +		uint16_t ip_id,
> +		uint16_t l2_offset)
> +{
> +	struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
> +	uint16_t hdr_len, l2_len;
> +	uint32_t ip_len;
> +
> +	if (cmp > 0) {
> +		pkt_head =3D item->firstseg;
> +		pkt_tail =3D pkt;
> +	} else {
> +		pkt_head =3D pkt;
> +		pkt_tail =3D item->firstseg;
> +	}
> +
> +	/* check if the IPv4 packet length is greater than the max value */
> +	hdr_len =3D l2_offset + pkt_head->l2_len + pkt_head->l3_len;
> +	l2_len =3D l2_offset > 0 ? pkt_head->outer_l2_len : pkt_head->l2_len;
> +	ip_len =3D pkt_head->pkt_len - l2_len
> +		 + pkt_tail->pkt_len - hdr_len;
> +	if (unlikely(ip_len > MAX_IPV4_PKT_LENGTH))
> +		return 0;
> +
> +	/* remove the packet header for the tail packet */
> +	rte_pktmbuf_adj(pkt_tail, hdr_len);
> +
> +	/* chain two packets together */
> +	if (cmp > 0) {
> +		item->lastseg->next =3D pkt;
> +		item->lastseg =3D rte_pktmbuf_lastseg(pkt);
> +		/* update IP ID to the larger value */
> +		item->ip_id =3D ip_id;

IP ID is the same for all fragments of a packet. I don't think
we need to update it.

> +	} else {
> +		lastseg =3D rte_pktmbuf_lastseg(pkt);
> +		lastseg->next =3D item->firstseg;
> +		item->firstseg =3D pkt;
> +		/* update sent_seq to the smaller value */
> +		item->frag_offset =3D frag_offset;
> +		item->ip_id =3D ip_id;
> +	}
> +	item->nb_merged++;
> +	if (is_last_frag)
> +		item->is_last_frag =3D is_last_frag;
> +
> +	/* update MBUF metadata for the merged packet */
> +	pkt_head->nb_segs +=3D pkt_tail->nb_segs;
> +	pkt_head->pkt_len +=3D pkt_tail->pkt_len;
> +
> +	return 1;
> +}
> +
> +/*
> + * Check if two UDP/IPv4 packets are neighbors.
> + */
> +static inline int
> +udp_check_neighbor(struct gro_udp4_item *item,
> +		uint16_t frag_offset,
> +		uint16_t ip_id,
> +		uint16_t ip_dl,
> +		uint16_t l2_offset)
> +{
> +	struct rte_mbuf *pkt_orig =3D item->firstseg;
> +	uint16_t len;
> +
> +	/* check if the two packets are neighbors */
> +	len =3D pkt_orig->pkt_len - l2_offset - pkt_orig->l2_len -
> +		pkt_orig->l3_len;
> +	if ((frag_offset =3D=3D item->frag_offset + len) &&
> +		(ip_id =3D=3D item->ip_id))
> +		/* append the new packet */
> +		return 1;
> +	else if ((frag_offset + ip_dl =3D=3D item->frag_offset) &&
> +			(ip_id =3D=3D item->ip_id))

is_same_udp4_flow() already checks ip_id, so there is no need to check
it again here.
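
As an illustration, the neighbor test then reduces to offset
arithmetic. This is a simplified sketch with a hypothetical
frag_neighbor() that takes plain byte offsets and lengths instead of
the item and mbuf, assuming the flow lookup has already matched the IP
ID.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the new fragment directly follows
 * the data already held by the item (append), -1 if it directly
 * precedes it (prepend), and 0 if the two are not adjacent.
 * All offsets and lengths are in bytes.
 */
static int
frag_neighbor(uint16_t item_offset, uint16_t item_len,
		uint16_t frag_offset, uint16_t frag_len)
{
	if ((uint32_t)item_offset + item_len == frag_offset)
		return 1;
	if ((uint32_t)frag_offset + frag_len == item_offset)
		return -1;
	return 0;
}
```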

> +		/* pre-pend the new packet */
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static inline int
> +is_ipv4_fragment(const struct rte_ipv4_hdr *hdr)
> +{
> +	uint16_t flag_offset, ip_flag, ip_ofs;
> +
> +	flag_offset =3D rte_be_to_cpu_16(hdr->fragment_offset);
> +	ip_ofs =3D (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
> +	ip_flag =3D (uint16_t)(flag_offset & RTE_IPV4_HDR_MF_FLAG);
> +
> +	return ip_flag !=3D 0 || ip_ofs  !=3D 0;

If the DF bit is set, the packet is not fragmented and shouldn't be
processed by UDP GRO. So we also need to make sure that the DF bit is
not set.
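
For example, roughly like this. The macros are local stand-ins that
mirror RTE_IPV4_HDR_DF_FLAG, RTE_IPV4_HDR_MF_FLAG and
RTE_IPV4_HDR_OFFSET_MASK, and is_ipv4_fragment_cpu() is a hypothetical
variant that takes the fragment_offset field already converted to CPU
order.

```c
#include <stdint.h>

/* Local stand-ins for the RTE_IPV4_HDR_* flag and mask macros. */
#define IPV4_HDR_DF_FLAG     (1u << 14)
#define IPV4_HDR_MF_FLAG     (1u << 13)
#define IPV4_HDR_OFFSET_MASK ((1u << 13) - 1)

/*
 * Hypothetical variant of is_ipv4_fragment(): returns 1 only when DF
 * is clear and either the MF bit or the fragment offset is non-zero.
 * flag_offset is the fragment_offset header field in CPU byte order.
 */
static int
is_ipv4_fragment_cpu(uint16_t flag_offset)
{
	if (flag_offset & IPV4_HDR_DF_FLAG)
		return 0;

	return (flag_offset &
		(IPV4_HDR_MF_FLAG | IPV4_HDR_OFFSET_MASK)) != 0;
}
```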

> +}
> +#endif
> diff --git a/lib/librte_gro/meson.build b/lib/librte_gro/meson.build
> index 501668c..0d18dc2 100644
> --- a/lib/librte_gro/rte_gro.c
> +++ b/lib/librte_gro/rte_gro.c
> @@ -9,6 +9,7 @@
>=20
>  #include "rte_gro.h"
>  #include "gro_tcp4.h"
> +#include "gro_udp4.h"
>  #include "gro_vxlan_tcp4.h"
>=20
>  typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> @@ -18,17 +19,23 @@
>  typedef uint32_t (*gro_tbl_pkt_count_fn)(void *tbl);
>=20
>  static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM] =3D {
> -		gro_tcp4_tbl_create, gro_vxlan_tcp4_tbl_create, NULL};
> +		gro_tcp4_tbl_create, gro_vxlan_tcp4_tbl_create,
> +		gro_udp4_tbl_create, NULL};
>  static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] =3D {
>  			gro_tcp4_tbl_destroy, gro_vxlan_tcp4_tbl_destroy,
> +			gro_udp4_tbl_destroy,
>  			NULL};
>  static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] =3D
> {
>  			gro_tcp4_tbl_pkt_count,
> gro_vxlan_tcp4_tbl_pkt_count,
> +			gro_udp4_tbl_pkt_count,
>  			NULL};
>=20
>  #define IS_IPV4_TCP_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \
>  		((ptype & RTE_PTYPE_L4_TCP) =3D=3D RTE_PTYPE_L4_TCP))
>=20
> +#define IS_IPV4_UDP_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \
> +		((ptype & RTE_PTYPE_L4_UDP) =3D=3D RTE_PTYPE_L4_UDP))
> +
>  #define IS_IPV4_VXLAN_TCP4_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype)
> && \
>  		((ptype & RTE_PTYPE_L4_UDP) =3D=3D RTE_PTYPE_L4_UDP) && \
>  		((ptype & RTE_PTYPE_TUNNEL_VXLAN) =3D=3D \
> @@ -40,6 +47,7 @@
>  		     RTE_PTYPE_INNER_L3_IPV4_EXT | \
>  		     RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN)) !=3D 0))
>=20
> +
>  /*
>   * GRO context structure. It keeps the table structures, which are
>   * used to merge packets, for different GRO types. Before using
> @@ -123,20 +131,26 @@ struct gro_ctx {
>  	struct gro_tcp4_flow tcp_flows[RTE_GRO_MAX_BURST_ITEM_NUM];
>  	struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM]
> =3D {{0} };
>=20
> -	/* Allocate a reassembly table for VXLAN GRO */
> -	struct gro_vxlan_tcp4_tbl vxlan_tbl;
> -	struct gro_vxlan_tcp4_flow
> vxlan_flows[RTE_GRO_MAX_BURST_ITEM_NUM];
> -	struct gro_vxlan_tcp4_item
> vxlan_items[RTE_GRO_MAX_BURST_ITEM_NUM] =3D {
> -		{{0}, 0, 0} };
> +	/* allocate a reassembly table for UDP/IPv4 GRO */
> +	struct gro_udp4_tbl udp_tbl;
> +	struct gro_udp4_flow
> udp_flows[RTE_GRO_MAX_BURST_ITEM_NUM];
> +	struct gro_udp4_item udp_items[RTE_GRO_MAX_BURST_ITEM_NUM]
> =3D {{0} };
> +
> +	/* Allocate a reassembly table for VXLAN TCP GRO */
> +	struct gro_vxlan_tcp4_tbl vxlan_tcp_tbl;
> +	struct gro_vxlan_tcp4_flow
> vxlan_tcp_flows[RTE_GRO_MAX_BURST_ITEM_NUM];
> +	struct gro_vxlan_tcp4_item
> vxlan_tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM]
> +			=3D {{{0}, 0, 0} };

Renaming vxlan_items/_flows should be done in the second patch, as this
patch only adds UDP GRO support.

>=20
>  	struct rte_mbuf *unprocess_pkts[nb_pkts];
>  	uint32_t item_num;
>  	int32_t ret;
>  	uint16_t i, unprocess_num =3D 0, nb_after_gro =3D nb_pkts;
> -	uint8_t do_tcp4_gro =3D 0, do_vxlan_gro =3D 0;
> +	uint8_t do_tcp4_gro =3D 0, do_vxlan_tcp_gro =3D 0, do_udp4_gro =3D 0;

Renaming do_vxlan_gro should be in the second patch.

> =20
>  	/* Get the maximum number of packets */
> @@ -146,15 +160,15 @@ struct gro_ctx {
>=20
>  	if (param->gro_types & RTE_GRO_TCP_IPV4) {
> @@ -170,14 +184,29 @@ struct gro_ctx {
>  		do_tcp4_gro =3D 1;
>  	}
> +
>=20
>  	return nb_after_gro;
> @@ -224,29 +269,33 @@ struct gro_ctx {
>  {
>  	struct rte_mbuf *unprocess_pkts[nb_pkts];
>  	struct gro_ctx *gro_ctx =3D ctx;
> -	void *tcp_tbl, *vxlan_tbl;
> +	void *tcp_tbl, *udp_tbl, *vxlan_tcp_tbl;
>  	uint64_t current_time;
>  	uint16_t i, unprocess_num =3D 0;
> -	uint8_t do_tcp4_gro, do_vxlan_gro;
> +	uint8_t do_tcp4_gro, do_vxlan_tcp_gro, do_udp4_gro;
>=20
>  	if (unlikely((gro_ctx->gro_types & (RTE_GRO_IPV4_VXLAN_TCP_IPV4
> |
> -					RTE_GRO_TCP_IPV4)) =3D=3D 0))
> +					RTE_GRO_TCP_IPV4 |
> +					RTE_GRO_UDP_IPV4)) =3D=3D 0))
>  		return nb_pkts;
>=20
>  	tcp_tbl =3D gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX];
> -	vxlan_tbl =3D gro_ctx->tbls[RTE_GRO_IPV4_VXLAN_TCP_IPV4_INDEX];
> +	vxlan_tcp_tbl =3D gro_ctx-
> >tbls[RTE_GRO_IPV4_VXLAN_TCP_IPV4_INDEX];
> +	udp_tbl =3D gro_ctx->tbls[RTE_GRO_UDP_IPV4_INDEX];
>=20
>  	do_tcp4_gro =3D (gro_ctx->gro_types & RTE_GRO_TCP_IPV4) =3D=3D
>  		RTE_GRO_TCP_IPV4;
> -	do_vxlan_gro =3D (gro_ctx->gro_types &
> RTE_GRO_IPV4_VXLAN_TCP_IPV4) =3D=3D
> +	do_vxlan_tcp_gro =3D (gro_ctx->gro_types &
> RTE_GRO_IPV4_VXLAN_TCP_IPV4) =3D=3D
>  		RTE_GRO_IPV4_VXLAN_TCP_IPV4;
> +	do_udp4_gro =3D (gro_ctx->gro_types & RTE_GRO_UDP_IPV4) =3D=3D
> +		RTE_GRO_UDP_IPV4;
>=20
>  	current_time =3D rte_rdtsc();
>=20
>  	for (i =3D 0; i < nb_pkts; i++) {
>  		if (IS_IPV4_VXLAN_TCP4_PKT(pkts[i]->packet_type) &&
> -				do_vxlan_gro) {
> -			if (gro_vxlan_tcp4_reassemble(pkts[i], vxlan_tbl,
> +				do_vxlan_tcp_gro) {
> +			if (gro_vxlan_tcp4_reassemble(pkts[i], vxlan_tcp_tbl,
>  						current_time) < 0)
>  				unprocess_pkts[unprocess_num++] =3D pkts[i];
>  		} else if (IS_IPV4_TCP_PKT(pkts[i]->packet_type) &&
> @@ -254,6 +303,11 @@ struct gro_ctx {
>  			if (gro_tcp4_reassemble(pkts[i], tcp_tbl,
>  						current_time) < 0)
>  				unprocess_pkts[unprocess_num++] =3D pkts[i];
> +		} else if (IS_IPV4_UDP_PKT(pkts[i]->packet_type) &&
> +				do_udp4_gro) {
> +			if (gro_udp4_reassemble(pkts[i], udp_tbl,
> +						current_time) < 0)
> +				unprocess_pkts[unprocess_num++] =3D pkts[i];
>  		} else
>  			unprocess_pkts[unprocess_num++] =3D pkts[i];
>  	}
> @@ -275,6 +329,7 @@ struct gro_ctx {
>  	struct gro_ctx *gro_ctx =3D ctx;
>  	uint64_t flush_timestamp;
>  	uint16_t num =3D 0;
> +	uint16_t left_nb_out =3D max_nb_out;
>=20
>  	gro_types =3D gro_types & gro_ctx->gro_types;
>  	flush_timestamp =3D rte_rdtsc() - timeout_cycles;
> @@ -282,8 +337,8 @@ struct gro_ctx {
>  	if (gro_types & RTE_GRO_IPV4_VXLAN_TCP_IPV4) {
>  		num =3D gro_vxlan_tcp4_tbl_timeout_flush(gro_ctx->tbls[
>  				RTE_GRO_IPV4_VXLAN_TCP_IPV4_INDEX],
> -				flush_timestamp, out, max_nb_out);
> -		max_nb_out -=3D num;
> +				flush_timestamp, out, left_nb_out);
> +		left_nb_out =3D max_nb_out - num;
>  	}
>=20
>  	/* If no available space in 'out', stop flushing. */
> @@ -291,7 +346,17 @@ struct gro_ctx {
>  		num +=3D gro_tcp4_tbl_timeout_flush(
>  				gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
>  				flush_timestamp,
> -				&out[num], max_nb_out);
> +				&out[num], left_nb_out);
> +		left_nb_out =3D max_nb_out - num;
> +	}
> +
> +	/* If no available space in 'out', stop flushing. */
> +	if ((gro_types & RTE_GRO_UDP_IPV4) && max_nb_out > 0) {
> +		num +=3D gro_udp4_tbl_timeout_flush(
> +				gro_ctx->tbls[RTE_GRO_UDP_IPV4_INDEX],
> +				flush_timestamp,
> +				&out[num], left_nb_out);
> +		left_nb_out =3D max_nb_out - num;

There's no need to update left_nb_out here; it isn't read again before
the function returns.
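
To illustrate the budget handling, here is a toy model rather than the
real rte_gro code: flush_fn, toy_tbl, toy_flush() and flush_all() are
all invented for this sketch. Each flush step gets only the space that
is still free in 'out', and the remaining budget only matters while
another step follows. Note also that the guard in front of the UDP step
tests max_nb_out, which no longer shrinks in this version; testing the
remaining budget, as the loop condition does here, may be what is
intended.

```c
#include <stdint.h>

/* Invented signature standing in for one per-type timeout flush. */
typedef uint16_t (*flush_fn)(void *tbl, void **out, uint16_t nb_out);

/* Toy table: just a count of timed-out packets waiting to be flushed. */
struct toy_tbl {
	uint16_t pending;
};

/* Toy flush: emits up to nb_out entries and returns how many it wrote. */
static uint16_t
toy_flush(void *tbl, void **out, uint16_t nb_out)
{
	struct toy_tbl *t = tbl;
	uint16_t n = t->pending < nb_out ? t->pending : nb_out;
	uint16_t i;

	for (i = 0; i < n; i++)
		out[i] = t;
	t->pending = (uint16_t)(t->pending - n);
	return n;
}

/*
 * Run the flush steps in order. Each step writes behind the packets
 * already flushed and is limited to the space that is still free.
 * The loop stops as soon as 'out' is full.
 */
static uint16_t
flush_all(flush_fn *fns, void **tbls, uint16_t nb_tbls,
		void **out, uint16_t max_nb_out)
{
	uint16_t num = 0, left = max_nb_out, i;

	for (i = 0; i < nb_tbls && left > 0; i++) {
		num = (uint16_t)(num + fns[i](tbls[i], &out[num], left));
		left = (uint16_t)(max_nb_out - num);
	}
	return num;
}
```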

>  	}
>=20
>  	return num;
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> index 8d781b5..470f3ed 100644
> --- a/lib/librte_gro/rte_gro.h
> +++ b/lib/librte_gro/rte_gro.h
> @@ -31,7 +31,10 @@
>  /**< TCP/IPv4 GRO flag */
>  #define RTE_GRO_IPV4_VXLAN_TCP_IPV4_INDEX 1
>  #define RTE_GRO_IPV4_VXLAN_TCP_IPV4 (1ULL <<
> RTE_GRO_IPV4_VXLAN_TCP_IPV4_INDEX)
> -/**< VxLAN GRO flag. */
> +/**< VxLAN TCP/IPv4 GRO flag. */
> +#define RTE_GRO_UDP_IPV4_INDEX 2
> +#define RTE_GRO_UDP_IPV4 (1ULL << RTE_GRO_UDP_IPV4_INDEX)
> +/**< UDP/IPv4 GRO flag */
>=20
>  /**
>   * Structure used to create GRO context objects or used to pass
> --
> 1.8.3.1