From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <mark.b.kavanagh@intel.com>
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
 by dpdk.org (Postfix) with ESMTP id 542C62BF3
 for <dev@dpdk.org>; Wed, 30 Aug 2017 11:25:42 +0200 (CEST)
Received: from fmsmga006.fm.intel.com ([10.253.24.20])
 by orsmga104.jf.intel.com with ESMTP; 30 Aug 2017 02:25:41 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.41,448,1498546800"; d="scan'208";a="146202326"
Received: from irsmsx108.ger.corp.intel.com ([163.33.3.3])
 by fmsmga006.fm.intel.com with ESMTP; 30 Aug 2017 02:25:40 -0700
Received: from irsmsx101.ger.corp.intel.com ([169.254.1.22]) by
 IRSMSX108.ger.corp.intel.com ([169.254.11.167]) with mapi id 14.03.0319.002;
 Wed, 30 Aug 2017 10:25:39 +0100
From: "Kavanagh, Mark B" <mark.b.kavanagh@intel.com>
To: "Hu, Jiayu" <jiayu.hu@intel.com>, "Ananyev, Konstantin"
 <konstantin.ananyev@intel.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, "Tan, Jianfeng" <jianfeng.tan@intel.com>
Thread-Topic: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support
Thread-Index: AQHTHOM2JlGz7S1xvEqkdDT8yCb80aKcFteAgAAVmACAAHucoA==
Date: Wed, 30 Aug 2017 09:25:38 +0000
Message-ID: <DC5AD7FA266D86499789B1BCAEC715F8C31A75A8@IRSMSX101.ger.corp.intel.com>
References: <1503584144-63181-1-git-send-email-jiayu.hu@intel.com>
 <1503584144-63181-3-git-send-email-jiayu.hu@intel.com>
 <2601191342CEEE43887BDE71AB9772584F23E097@IRSMSX103.ger.corp.intel.com>
 <20170830025550.GA113170@dpdk15.sh.intel.com>
In-Reply-To: <20170830025550.GA113170@dpdk15.sh.intel.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-ctpclassification: CTP_IC
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMDExOWU2OWQtYzliZC00YmE0LTk4MGQtMmZiZmQ5ODZhMjYxIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6IjVEYklGWkk1WDJ5TmoyWUtiNFVROHVSVXdtY1JYOFwvYTFzdXoxM3ZRVzZZPSJ9
dlp-product: dlpe-windows
dlp-version: 10.0.102.7
dlp-reaction: no-action
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Aug 2017 09:25:43 -0000

>From: Hu, Jiayu
>Sent: Wednesday, August 30, 2017 3:56 AM
>To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
>Cc: dev@dpdk.org; Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Tan, Jianf=
eng
><jianfeng.tan@intel.com>
>Subject: Re: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support
>
>Hi Konstantin,
>
>Thanks for your important suggestions. My feedbacks are inline.
>
>On Wed, Aug 30, 2017 at 09:38:33AM +0800, Ananyev, Konstantin wrote:
>>
>>
>> > -----Original Message-----
>> > From: Hu, Jiayu
>> > Sent: Thursday, August 24, 2017 3:16 PM
>> > To: dev@dpdk.org
>> > Cc: Kavanagh, Mark B <mark.b.kavanagh@intel.com>; Ananyev, Konstantin
><konstantin.ananyev@intel.com>; Tan, Jianfeng
>> > <jianfeng.tan@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>
>> > Subject: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support
>> >
>> > This patch adds GSO support for TCP/IPv4 packets. Supported packets
>> > may include a single VLAN tag. TCP/IPv4 GSO assumes that all input
>> > packets have correct checksums, and doesn't update checksums for outpu=
t
>> > packets (the responsibility for this lies with the application).
>> > Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets.
>> >
>> > TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect
>> > MBUF, to organize an output packet. Note that we refer to these two
>> > chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet
>> > header, while the indirect mbuf simply points to a location within the
>> > original packet's payload. Consequently, use of the GSO library requir=
es
>> > multi-segment MBUF support in the TX functions of the NIC driver.
>> >
>> > If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a
>> > result, when all of its GSOed segments are freed, the packet is freed
>> > automatically.
>> >
>> > Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
>> > Signed-off-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
>> > ---
>> >  lib/librte_eal/common/include/rte_log.h |   1 +
>> >  lib/librte_gso/Makefile                 |   2 +
>> >  lib/librte_gso/gso_common.c             | 270
>++++++++++++++++++++++++++++++++
>> >  lib/librte_gso/gso_common.h             | 120 ++++++++++++++
>> >  lib/librte_gso/gso_tcp.c                |  82 ++++++++++
>> >  lib/librte_gso/gso_tcp.h                |  73 +++++++++
>> >  lib/librte_gso/rte_gso.c                |  44 +++++-
>> >  lib/librte_gso/rte_gso.h                |   3 +
>> >  8 files changed, 593 insertions(+), 2 deletions(-)
>> >  create mode 100644 lib/librte_gso/gso_common.c
>> >  create mode 100644 lib/librte_gso/gso_common.h
>> >  create mode 100644 lib/librte_gso/gso_tcp.c
>> >  create mode 100644 lib/librte_gso/gso_tcp.h
>> >
>> > diff --git a/lib/librte_eal/common/include/rte_log.h
>b/lib/librte_eal/common/include/rte_log.h
>> > index ec8dba7..2fa1199 100644
>> > --- a/lib/librte_eal/common/include/rte_log.h
>> > +++ b/lib/librte_eal/common/include/rte_log.h
>> > @@ -87,6 +87,7 @@ extern struct rte_logs rte_logs;
>> >  #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */
>> >  #define RTE_LOGTYPE_EFD       18 /**< Log related to EFD. */
>> >  #define RTE_LOGTYPE_EVENTDEV  19 /**< Log related to eventdev. */
>> > +#define RTE_LOGTYPE_GSO       20 /**< Log related to GSO. */
>> >
>> >  /* these log types can be used in an application */
>> >  #define RTE_LOGTYPE_USER1     24 /**< User-defined log type 1. */
>> > diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile
>> > index aeaacbc..0f8e38f 100644
>> > --- a/lib/librte_gso/Makefile
>> > +++ b/lib/librte_gso/Makefile
>> > @@ -42,6 +42,8 @@ LIBABIVER :=3D 1
>> >
>> >  #source files
>> >  SRCS-$(CONFIG_RTE_LIBRTE_GSO) +=3D rte_gso.c
>> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) +=3D gso_common.c
>> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) +=3D gso_tcp.c
>> >
>> >  # install this header file
>> >  SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include +=3D rte_gso.h
>> > diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c
>> > new file mode 100644
>> > index 0000000..2b54fbd
>> > --- /dev/null
>> > +++ b/lib/librte_gso/gso_common.c
>> > @@ -0,0 +1,270 @@
>> > +/*-
>> > + *   BSD LICENSE
>> > + *
>> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
>> > + *   All rights reserved.
>> > + *
>> > + *   Redistribution and use in source and binary forms, with or witho=
ut
>> > + *   modification, are permitted provided that the following conditio=
ns
>> > + *   are met:
>> > + *
>> > + *     * Redistributions of source code must retain the above copyrig=
ht
>> > + *       notice, this list of conditions and the following disclaimer=
.
>> > + *     * Redistributions in binary form must reproduce the above
>copyright
>> > + *       notice, this list of conditions and the following disclaimer=
 in
>> > + *       the documentation and/or other materials provided with the
>> > + *       distribution.
>> > + *     * Neither the name of Intel Corporation nor the names of its
>> > + *       contributors may be used to endorse or promote products deri=
ved
>> > + *       from this software without specific prior written permission=
.
>> > + *
>> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUT=
ORS
>> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NO=
T
>> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNES=
S
>FOR
>> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYR=
IGHT
>> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>INCIDENTAL,
>> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
>USE,
>> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND O=
N
>ANY
>> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR T=
ORT
>> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF TH=
E
>USE
>> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM=
AGE.
>> > + */
>> > +
>> > +#include <stdbool.h>
>> > +#include <string.h>
>> > +
>> > +#include <rte_malloc.h>
>> > +
>> > +#include <rte_ether.h>
>> > +#include <rte_ip.h>
>> > +#include <rte_tcp.h>
>> > +
>> > +#include "gso_common.h"
>> > +
>> > +static inline void
>> > +hdr_segment_init(struct rte_mbuf *hdr_segment, struct rte_mbuf *pkt,
>> > +		uint16_t pkt_hdr_offset)
>> > +{
>> > +	/* copy mbuf metadata */
>> > +	hdr_segment->nb_segs =3D 1;
>> > +	hdr_segment->port =3D pkt->port;
>> > +	hdr_segment->ol_flags =3D pkt->ol_flags;
>> > +	hdr_segment->packet_type =3D pkt->packet_type;
>> > +	hdr_segment->pkt_len =3D pkt_hdr_offset;
>> > +	hdr_segment->data_len =3D pkt_hdr_offset;
>> > +	hdr_segment->tx_offload =3D pkt->tx_offload;
>> > +	/* copy packet header */
>> > +	rte_memcpy(rte_pktmbuf_mtod(hdr_segment, char *),
>> > +			rte_pktmbuf_mtod(pkt, char *),
>> > +			pkt_hdr_offset);
>> > +}
>> > +
>> > +static inline void
>> > +free_gso_segment(struct rte_mbuf **pkts, uint16_t nb_pkts)
>> > +{
>> > +	uint16_t i;
>> > +
>> > +	for (i =3D 0; i < nb_pkts; i++) {
>> > +		rte_pktmbuf_detach(pkts[i]->next);
>>
>> I don't think you need to call detach() here explicitly.
>> Just rte_pktmbuf_free(pkts[i]) should do I think.
>
>Yes, rte_pktmbuf_free() is enough. I will modify it. Thanks.
>
>>
>> > +		rte_pktmbuf_free(pkts[i]);
>> > +		pkts[i] =3D NULL;
>> > +	}
>> > +}
>> > +
>> > +int
>> > +gso_do_segment(struct rte_mbuf *pkt,
>> > +		uint16_t pkt_hdr_offset,
>> > +		uint16_t pyld_unit_size,
>> > +		struct rte_mempool *direct_pool,
>> > +		struct rte_mempool *indirect_pool,
>> > +		struct rte_mbuf **pkts_out,
>> > +		uint16_t nb_pkts_out)
>> > +{
>> > +	struct rte_mbuf *pkt_in;
>> > +	struct rte_mbuf *hdr_segment, *pyld_segment;
>> > +	uint32_t pkt_in_pyld_off;
>> > +	uint16_t pkt_in_segment_len, pkt_out_segment_len;
>> > +	uint16_t nb_segs;
>> > +	bool pkt_in_segment_processed;
>> > +
>> > +	pkt_in_pyld_off =3D pkt->data_off + pkt_hdr_offset;
>> > +	pkt_in =3D pkt;
>> > +	nb_segs =3D 0;
>> > +
>> > +	while (pkt_in) {
>> > +		pkt_in_segment_processed =3D false;
>> > +		pkt_in_segment_len =3D pkt_in->data_off + pkt_in->data_len;
>> > +
>> > +		while (!pkt_in_segment_processed) {
>> > +			if (unlikely(nb_segs >=3D nb_pkts_out)) {
>> > +				free_gso_segment(pkts_out, nb_segs);
>> > +				return -EINVAL;
>> > +			}
>> > +
>> > +			/* allocate direct mbuf */
>> > +			hdr_segment =3D rte_pktmbuf_alloc(direct_pool);
>> > +			if (unlikely(hdr_segment =3D=3D NULL)) {
>> > +				free_gso_segment(pkts_out, nb_segs);
>> > +				return -ENOMEM;
>> > +			}
>> > +
>> > +			/* allocate indirect mbuf */
>> > +			pyld_segment =3D rte_pktmbuf_alloc(indirect_pool);
>> > +			if (unlikely(pyld_segment =3D=3D NULL)) {
>> > +				rte_pktmbuf_free(hdr_segment);
>> > +				free_gso_segment(pkts_out, nb_segs);
>> > +				return -ENOMEM;
>> > +			}
>>
>> So, if I understand correctly each new packet would always contain just =
one
>data segment?
>> Why several data segments couldn't be chained together (if sum of their
>data_len <=3D mss)?
>> In a same way as done here:
>>
>http://dpdk.org/browse/dpdk/tree/lib/librte_ip_frag/rte_ipv4_fragmentation=
.c#n
>93
>> or here:
>>
>https://gerrit.fd.io/r/gitweb?p=3Dtldk.git;a=3Dblob;f=3Dlib/libtle_l4p/tcp=
_tx_seg.h;
>h=3Da8d2425597a7ad6f598aa4bb7fcd7f1da74305f0;hb=3DHEAD#l23
>> ?
>
>Oh, yes. I can chain these data segments when their total length is less t=
han
>the GSO segsz.
>I will change it in the next patch. Thanks very much.
>
>>
>> > +
>> > +			/* copy packet header */
>> > +			hdr_segment_init(hdr_segment, pkt, pkt_hdr_offset);
>> > +
>> > +			/* attach payload mbuf to current packet segment */
>> > +			rte_pktmbuf_attach(pyld_segment, pkt_in);
>> > +
>> > +			hdr_segment->next =3D pyld_segment;
>> > +			pkts_out[nb_segs++] =3D hdr_segment;
>> > +
>> > +			/* calculate payload length */
>> > +			pkt_out_segment_len =3D pyld_unit_size;
>> > +			if (pkt_in_pyld_off + pkt_out_segment_len >
>> > +					pkt_in_segment_len) {
>> > +				pkt_out_segment_len =3D pkt_in_segment_len -
>> > +					pkt_in_pyld_off;
>> > +			}
>> > +
>> > +			/* update payload segment */
>> > +			pyld_segment->data_off =3D pkt_in_pyld_off;
>> > +			pyld_segment->data_len =3D pkt_out_segment_len;
>> > +
>> > +			/* update header segment */
>> > +			hdr_segment->pkt_len +=3D pyld_segment->data_len;
>> > +			hdr_segment->nb_segs++;
>> > +
>> > +			/* update pkt_in_pyld_off */
>> > +			pkt_in_pyld_off +=3D pkt_out_segment_len;
>> > +			if (pkt_in_pyld_off =3D=3D pkt_in_segment_len)
>> > +				pkt_in_segment_processed =3D true;
>> > +		}
>> > +
>> > +		/* 'pkt_in' may contain numerous segments */
>> > +		pkt_in =3D pkt_in->next;
>> > +		if (pkt_in !=3D NULL)
>> > +			pkt_in_pyld_off =3D pkt_in->data_off;
>> > +	}
>> > +	return nb_segs;
>> > +}
>> > +
>> > +static inline void
>> > +parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct rte_mbuf *pkt)
>> > +{
>> > +	struct tcp_hdr *tcp_hdr;
>> > +
>> > +	switch (ipv4_hdr->next_proto_id) {
>> > +	case IPPROTO_TCP:
>> > +		pkt->packet_type |=3D RTE_PTYPE_L4_TCP;
>> > +		pkt->l3_len =3D IPv4_HDR_LEN(ipv4_hdr);
>> > +		tcp_hdr =3D (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
>> > +		pkt->l4_len =3D TCP_HDR_LEN(tcp_hdr);
>> > +		break;
>> > +	}
>> > +}
>> > +
>> > +static inline void
>> > +parse_ethernet(struct ether_hdr *eth_hdr, struct rte_mbuf *pkt)
>> > +{
>> > +	struct ipv4_hdr *ipv4_hdr;
>> > +	struct vlan_hdr *vlan_hdr;
>> > +	uint16_t ethertype;
>> > +
>> > +	ethertype =3D rte_be_to_cpu_16(eth_hdr->ether_type);
>> > +	if (ethertype =3D=3D ETHER_TYPE_VLAN) {
>> > +		vlan_hdr =3D (struct vlan_hdr *)(eth_hdr + 1);
>> > +		pkt->l2_len =3D sizeof(struct vlan_hdr);
>> > +		pkt->packet_type |=3D RTE_PTYPE_L2_ETHER_VLAN;
>> > +		ethertype =3D rte_be_to_cpu_16(vlan_hdr->eth_proto);
>> > +	}
>> > +
>> > +	switch (ethertype) {
>> > +	case ETHER_TYPE_IPv4:
>> > +		if (IS_VLAN_PKT(pkt)) {
>> > +			pkt->packet_type |=3D RTE_PTYPE_L3_IPV4;
>> > +		} else {
>> > +			pkt->packet_type |=3D RTE_PTYPE_L2_ETHER;
>> > +			pkt->packet_type |=3D RTE_PTYPE_L3_IPV4;
>> > +		}
>> > +		pkt->l2_len +=3D sizeof(struct ether_hdr);
>> > +		ipv4_hdr =3D (struct ipv4_hdr *) ((char *)eth_hdr +
>> > +				pkt->l2_len);
>> > +		parse_ipv4(ipv4_hdr, pkt);
>> > +		break;
>> > +	}
>> > +}
>> > +
>> > +void
>> > +gso_parse_packet(struct rte_mbuf *pkt)
>>
>> There is a function rte_net_get_ptype() that supposed to provide similar
>functionality.
>> So we probably don't need to create a new SW parse function here, instea=
d
>would be better
>> to reuse (and update if needed) an existing one.
>> Again user already might have l2/l3/l4.../_len and packet_type setuped.
>> So better to keep SW packet parsing out of scope of that library.
>
>Hmm, I know we have discussed this design choice in the GRO library, and I
>also think it's
>better to reuse these values.
>
>But from the perspective of OVS, it may add extra overhead, since OVS does=
n't
>parse every
>packet originally. Maybe @Mark can give us more inputs from the view of OV=
S.

Hi Jiayu, Konstantin

For GSO, the application needs to know:
- the packet type (as it only currently supports TCP/IPv4, VxLAN, GRE packe=
ts)
- the l2/3/4_lens, etc. (in order to replicate the original packet's header=
s across outgoing segments)

For this, we can use the rte_net_get_ptype function, as per Konstantin's su=
ggestion, as it provides both - thanks Konstantin!

WRT the extra overhead in OvS: TSO is the defacto standard, and GSO is prov=
ided purely as a fallback option. As such, and since the additional packet =
parsing is a necessity in order to facilitate GSO, the additional overhead =
is IMO acceptable.

Thanks,
Mark

>
>>
>> > +{
>> > +	struct ether_hdr *eth_hdr =3D rte_pktmbuf_mtod(pkt, struct ether_hdr=
 *);
>> > +
>> > +	pkt->packet_type =3D pkt->tx_offload =3D 0;
>> > +	parse_ethernet(eth_hdr, pkt);
>> > +}
>> > +
>> > +static inline void
>> > +update_ipv4_header(char *base, uint16_t offset, uint16_t length, uint=
16_t
>id)
>> > +{
>> > +	struct ipv4_hdr *ipv4_hdr =3D (struct ipv4_hdr *)(base + offset);
>> > +
>> > +	ipv4_hdr->total_length =3D rte_cpu_to_be_16(length - offset);
>> > +	ipv4_hdr->packet_id =3D rte_cpu_to_be_16(id);
>> > +}
>> > +
>> > +static inline void
>> > +update_tcp_header(char *base, uint16_t offset, uint32_t sent_seq,
>> > +	uint8_t non_tail)
>> > +{
>> > +	struct tcp_hdr *tcp_hdr =3D (struct tcp_hdr *)(base + offset);
>> > +
>> > +	tcp_hdr->sent_seq =3D rte_cpu_to_be_32(sent_seq);
>> > +	/* clean FIN and PSH for non-tail segments */
>> > +	if (non_tail)
>> > +		tcp_hdr->tcp_flags &=3D (~(TCP_HDR_PSH_MASK | TCP_HDR_FIN_MASK));
>> > +}
>> > +
>> > +void
>> > +gso_update_pkt_headers(struct rte_mbuf *pkt, uint16_t nb_segments,
>> > +		struct rte_mbuf **out_segments)
>> > +{
>> > +	struct ipv4_hdr *ipv4_hdr;
>> > +	struct tcp_hdr *tcp_hdr;
>> > +	struct rte_mbuf *seg;
>> > +	uint32_t sent_seq;
>> > +	uint16_t offset, i;
>> > +	uint16_t tail_seg_idx =3D nb_segments - 1, id;
>> > +
>> > +	switch (pkt->packet_type) {
>> > +	case ETHER_VLAN_IPv4_TCP_PKT:
>> > +	case ETHER_IPv4_TCP_PKT:
>>
>> Might be worth to put code below in a separate function:
>> update_inner_tcp_hdr(..) or so.
>> Then you can reuse it for tunneled cases too.
>
>Yes, I will modify it in the next patch. Thanks.
>
>>
>> > +		ipv4_hdr =3D (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *)
>> > +				pkt->l2_len);
>> > +		tcp_hdr =3D (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
>> > +		id =3D rte_be_to_cpu_16(ipv4_hdr->packet_id);
>> > +		sent_seq =3D rte_be_to_cpu_32(tcp_hdr->sent_seq);
>> > +
>> > +		for (i =3D 0; i < nb_segments; i++) {
>> > +			seg =3D out_segments[i];
>> > +
>> > +			offset =3D seg->l2_len;
>> > +			update_ipv4_header(rte_pktmbuf_mtod(seg, char *),
>> > +					offset, seg->pkt_len, id);
>> > +			id++;
>>
>> Who would be responsible to make sure that we wouldn't have consecutive
>packets with the IPV4 id?
>> Would be the upper layer that forms the packet or gso library or ...?
>
>Oh yes. I ingore this important issue. I don't think applications can
>guarantee it.
>I will check the design of linux and try to figure out a way. Thanks for
>reminder.
>
>>
>> > +
>> > +			offset +=3D seg->l3_len;
>> > +			update_tcp_header(rte_pktmbuf_mtod(seg, char *),
>> > +					offset, sent_seq, i < tail_seg_idx);
>> > +			sent_seq +=3D seg->next->data_len;
>> > +		}
>> > +		break;
>> > +	}
>> > +}
>> > diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
>> > new file mode 100644
>> > index 0000000..d750041
>> > --- /dev/null
>> > +++ b/lib/librte_gso/gso_common.h
>> > @@ -0,0 +1,120 @@
>> > +/*-
>> > + *   BSD LICENSE
>> > + *
>> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
>> > + *   All rights reserved.
>> > + *
>> > + *   Redistribution and use in source and binary forms, with or witho=
ut
>> > + *   modification, are permitted provided that the following conditio=
ns
>> > + *   are met:
>> > + *
>> > + *     * Redistributions of source code must retain the above copyrig=
ht
>> > + *       notice, this list of conditions and the following disclaimer=
.
>> > + *     * Redistributions in binary form must reproduce the above
>copyright
>> > + *       notice, this list of conditions and the following disclaimer=
 in
>> > + *       the documentation and/or other materials provided with the
>> > + *       distribution.
>> > + *     * Neither the name of Intel Corporation nor the names of its
>> > + *       contributors may be used to endorse or promote products deri=
ved
>> > + *       from this software without specific prior written permission=
.
>> > + *
>> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUT=
ORS
>> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NO=
T
>> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNES=
S
>FOR
>> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYR=
IGHT
>> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>INCIDENTAL,
>> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
>USE,
>> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND O=
N
>ANY
>> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR T=
ORT
>> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF TH=
E
>USE
>> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM=
AGE.
>> > + */
>> > +
>> > +#ifndef _GSO_COMMON_H_
>> > +#define _GSO_COMMON_H_
>> > +
>> > +#include <stdint.h>
>> > +#include <rte_mbuf.h>
>> > +
>> > +#define IPV4_HDR_DF_SHIFT 14
>> > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT)
>> > +#define IPv4_HDR_LEN(iph) ((iph->version_ihl & 0x0f) * 4)
>> > +
>> > +#define TCP_HDR_PSH_MASK ((uint8_t)0x08)
>> > +#define TCP_HDR_FIN_MASK ((uint8_t)0x01)
>> > +#define TCP_HDR_LEN(tcph) ((tcph->data_off & 0xf0) >> 2)
>> > +
>> > +#define ETHER_IPv4_PKT (RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4)
>> > +/* Supported packet types */
>> > +/* TCP/IPv4 packet. */
>> > +#define ETHER_IPv4_TCP_PKT (ETHER_IPv4_PKT | RTE_PTYPE_L4_TCP)
>> > +
>> > +/* TCP/IPv4 packet with VLAN tag. */
>> > +#define ETHER_VLAN_IPv4_TCP_PKT (RTE_PTYPE_L2_ETHER_VLAN | \
>> > +		RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)
>> > +
>> > +#define IS_VLAN_PKT(pkt) ((pkt->packet_type & RTE_PTYPE_L2_ETHER_VLAN=
) =3D=3D
>\
>> > +		RTE_PTYPE_L2_ETHER_VLAN)
>> > +
>> > +/**
>> > + * Internal function which parses a packet, setting outer_l2/l3_len a=
nd
>> > + * l2/l3/l4_len and packet_type.
>> > + *
>> > + * @param pkt
>> > + *  Packet to parse.
>> > + */
>> > +void gso_parse_packet(struct rte_mbuf *pkt);
>> > +
>> > +/**
>> > + * Internal function which updates relevant packet headers, following
>> > + * segmentation. This is required to update, for example, the IPv4
>> > + * 'total_length' field, to reflect the reduced length of the now-
>> > + * segmented packet.
>> > + *
>> > + * @param pkt
>> > + *  The original packet.
>> > + * @param nb_segments
>> > + *  The number of GSO segments into which pkt was split.
>> > + * @param out_segements
>> > + *  Pointer array used for storing mbuf addresses for GSO segments.
>> > + */
>> > +void gso_update_pkt_headers(struct rte_mbuf *pkt, uint16_t nb_segment=
s,
>> > +		struct rte_mbuf **out_segments);
>> > +
>> > +/**
>> > + * Internal function which divides the input packet into small segmen=
ts.
>> > + * Each of the newly-created segments is organized as a two-segment m=
buf,
>> > + * where the first segment is a standard mbuf, which stores a copy of
>> > + * packet header, and the second is an indirect mbuf which points to =
a
>> > + * section of data in the input packet.
>> > + *
>> > + * @param pkt
>> > + *  Packet to segment.
>> > + * @param pkt_hdr_offset
>> > + *  Packet header offset, measured in byte.
>> > + * @param pyld_unit_size
>> > + *  The max payload length of a GSO segment.
>> > + * @param direct_pool
>> > + *  MBUF pool used for allocating direct buffers for output segments.
>> > + * @param indirect_pool
>> > + *  MBUF pool used for allocating indirect buffers for output segment=
s.
>> > + * @param pkts_out
>> > + *  Pointer array used to keep the mbuf addresses of output segments.
>> > + * @param nb_pkts_out
>> > + *  The max number of items that pkts_out can keep.
>> > + *
>> > + * @return
>> > + *  - The number of segments created in the event of success.
>> > + *  - If no GSO is performed, return 1.
>> > + *  - If available memory in mempools is insufficient, return -ENOMEM=
.
>> > + *  - -EINVAL for invalid parameters
>> > + */
>> > +int gso_do_segment(struct rte_mbuf *pkt,
>> > +		uint16_t pkt_hdr_offset,
>> > +		uint16_t pyld_unit_size,
>> > +		struct rte_mempool *direct_pool,
>> > +		struct rte_mempool *indirect_pool,
>> > +		struct rte_mbuf **pkts_out,
>> > +		uint16_t nb_pkts_out);
>> > +#endif
>> > diff --git a/lib/librte_gso/gso_tcp.c b/lib/librte_gso/gso_tcp.c
>> > new file mode 100644
>> > index 0000000..9d5fc30
>> > --- /dev/null
>> > +++ b/lib/librte_gso/gso_tcp.c
>> > @@ -0,0 +1,82 @@
>> > +/*-
>> > + *   BSD LICENSE
>> > + *
>> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
>> > + *   All rights reserved.
>> > + *
>> > + *   Redistribution and use in source and binary forms, with or witho=
ut
>> > + *   modification, are permitted provided that the following conditio=
ns
>> > + *   are met:
>> > + *
>> > + *     * Redistributions of source code must retain the above copyrig=
ht
>> > + *       notice, this list of conditions and the following disclaimer=
.
>> > + *     * Redistributions in binary form must reproduce the above
>copyright
>> > + *       notice, this list of conditions and the following disclaimer=
 in
>> > + *       the documentation and/or other materials provided with the
>> > + *       distribution.
>> > + *     * Neither the name of Intel Corporation nor the names of its
>> > + *       contributors may be used to endorse or promote products deri=
ved
>> > + *       from this software without specific prior written permission=
.
>> > + *
>> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUT=
ORS
>> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NO=
T
>> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNES=
S
>FOR
>> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYR=
IGHT
>> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>INCIDENTAL,
>> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
>USE,
>> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND O=
N
>ANY
>> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR T=
ORT
>> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF TH=
E
>USE
>> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM=
AGE.
>> > + */
>> > +
>> > +
>> > +#include <rte_ether.h>
>> > +#include <rte_ip.h>
>> > +#include <rte_tcp.h>
>> > +
>> > +#include "gso_common.h"
>> > +#include "gso_tcp.h"
>> > +
>> > +int
>> > +gso_tcp4_segment(struct rte_mbuf *pkt,
>> > +		uint16_t gso_size,
>> > +		struct rte_mempool *direct_pool,
>> > +		struct rte_mempool *indirect_pool,
>> > +		struct rte_mbuf **pkts_out,
>> > +		uint16_t nb_pkts_out)
>> > +{
>> > +	struct ether_hdr *eth_hdr;
>> > +	struct ipv4_hdr *ipv4_hdr;
>> > +	uint16_t tcp_dl;
>> > +	uint16_t pyld_unit_size;
>> > +	uint16_t hdr_offset;
>> > +	int ret =3D 1;
>> > +
>> > +	eth_hdr =3D rte_pktmbuf_mtod(pkt, struct ether_hdr *);
>> > +	ipv4_hdr =3D (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
>> > +
>> > +	/* don't process fragmented packet */
>> > +	if ((ipv4_hdr->fragment_offset &
>> > +				rte_cpu_to_be_16(IPV4_HDR_DF_MASK)) =3D=3D 0)
>> > +		return ret;
>> > +
>> > +	tcp_dl =3D rte_be_to_cpu_16(ipv4_hdr->total_length) -
>> > +		pkt->l3_len - pkt->l4_len;
>> > +	/* don't process packet without data */
>> > +	if (tcp_dl =3D=3D 0)
>> > +		return ret;
>> > +
>> > +	hdr_offset =3D pkt->l2_len + pkt->l3_len + pkt->l4_len;
>> > +	pyld_unit_size =3D gso_size - hdr_offset - ETHER_CRC_LEN;
>> > +
>> > +	/* segment the payload */
>> > +	ret =3D gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool,
>> > +			indirect_pool, pkts_out, nb_pkts_out);
>> > +
>> > +	if (ret > 1)
>> > +		gso_update_pkt_headers(pkt, ret, pkts_out);
>> > +
>> > +	return ret;
>> > +}
>> > diff --git a/lib/librte_gso/gso_tcp.h b/lib/librte_gso/gso_tcp.h
>> > new file mode 100644
>> > index 0000000..f291ccb
>> > --- /dev/null
>> > +++ b/lib/librte_gso/gso_tcp.h
>> > @@ -0,0 +1,73 @@
>> > +/*-
>> > + *   BSD LICENSE
>> > + *
>> > + *   Copyright(c) 2017 Intel Corporation. All rights reserved.
>> > + *   All rights reserved.
>> > + *
>> > + *   Redistribution and use in source and binary forms, with or witho=
ut
>> > + *   modification, are permitted provided that the following conditio=
ns
>> > + *   are met:
>> > + *
>> > + *     * Redistributions of source code must retain the above copyrig=
ht
>> > + *       notice, this list of conditions and the following disclaimer=
.
>> > + *     * Redistributions in binary form must reproduce the above
>copyright
>> > + *       notice, this list of conditions and the following disclaimer=
 in
>> > + *       the documentation and/or other materials provided with the
>> > + *       distribution.
>> > + *     * Neither the name of Intel Corporation nor the names of its
>> > + *       contributors may be used to endorse or promote products deri=
ved
>> > + *       from this software without specific prior written permission=
.
>> > + *
>> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUT=
ORS
>> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NO=
T
>> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNES=
S
>FOR
>> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYR=
IGHT
>> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>INCIDENTAL,
>> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
>USE,
>> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND O=
N
>ANY
>> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR T=
ORT
>> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF TH=
E
>USE
>> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM=
AGE.
>> > + */
>> > +
>> > +#ifndef _GSO_TCP_H_
>> > +#define _GSO_TCP_H_
>> > +
>> > +#include <stdint.h>
>> > +#include <rte_mbuf.h>
>> > +
>> > +/**
>> > + * Segment an IPv4/TCP packet. This function assumes the input packet=
 has
>> > + * correct checksums and doesn't update checksums for GSO segment.
>> > + * Furthermore, it doesn't process IP fragment packets.
>> > + *
>> > + * @param pkt
>> > + *  The packet mbuf to segment.
>> > + * @param gso_size
>> > + *  The max length of a GSO segment, measured in bytes.
>> > + * @param direct_pool
>> > + *  MBUF pool used for allocating direct buffers for output segments.
>> > + * @param indirect_pool
>> > + *  MBUF pool used for allocating indirect buffers for output segment=
s.
>> > + * @param pkts_out
>> > + *  Pointer array, which is used to store mbuf addresses of GSO segme=
nts.
>> > + *  Caller should guarantee that 'pkts_out' is sufficiently large to
>store
>> > + *  all GSO segments.
>> > + * @param nb_pkts_out
>> > + *  The max number of items that 'pkts_out' can keep.
>> > + *
>> > + * @return
>> > + *   - The number of GSO segments on success.
>> > + *   - Return 1 if no GSO is performed.
>> > + *   - Return -ENOMEM if available memory in mempools is insufficient=
.
>> > + *   - Return -EINVAL for invalid parameters.
>> > + */
>> > +int gso_tcp4_segment(struct rte_mbuf *pkt,
>> > +		uint16_t gso_size,
>> > +		struct rte_mempool *direct_pool,
>> > +		struct rte_mempool *indirect_pool,
>> > +		struct rte_mbuf **pkts_out,
>> > +		uint16_t nb_pkts_out);
>> > +
>> > +#endif
>> > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c
>> > index b81afce..fac95f2 100644
>> > --- a/lib/librte_gso/rte_gso.c
>> > +++ b/lib/librte_gso/rte_gso.c
>> > @@ -31,17 +31,57 @@
>> >   *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM=
AGE.
>> >   */
>> >
>> > +#include <rte_log.h>
>> > +
>> >  #include "rte_gso.h"
>> > +#include "gso_common.h"
>> > +#include "gso_tcp.h"
>> >
>> >  int
>> >  rte_gso_segment(struct rte_mbuf *pkt,
>> >  		struct rte_gso_ctx gso_ctx,
>> >  		struct rte_mbuf **pkts_out,
>> > -		uint16_t nb_pkts_out __rte_unused)
>> > +		uint16_t nb_pkts_out)
>> >  {
>> > +	struct rte_mempool *direct_pool, *indirect_pool;
>> > +	struct rte_mbuf *pkt_seg;
>> > +	uint16_t nb_segments, gso_size;
>> > +
>> >  	if (pkt =3D=3D NULL || pkts_out =3D=3D NULL || gso_ctx.direct_pool =
=3D=3D
>> >  			NULL || gso_ctx.indirect_pool =3D=3D NULL)
>> >  		return -EINVAL;
>>
>> Probably we don't need to check gso_ctx values for each incoming packet.
>> If you feel it is necessary - create  new function rte_gso_ctx_check() t=
hat
>> could be performed just once per ctx.
>
>Agree. I will change it. Thanks.
>
>>
>> >
>> > -	return 1;
>> > +	if ((gso_ctx.gso_types & RTE_GSO_TCP_IPV4) =3D=3D 0 ||
>> > +			gso_ctx.gso_size >=3D pkt->pkt_len ||
>> > +			gso_ctx.gso_size =3D=3D 0)
>>
>>
>> First and third condition seems redundant.
>
>The reason to check gso_ctx.gso_types here is that we don't perform
>GSO if applications don't set RTE_GSO_TCP_IPV4 to gso_ctx.gso_types,
>even the input packet is TCP/IPv4. And if gso_ctx.gso_size is 0,
>we don't need to execute the following codes. So we still need to
>remove these two conditions?
>
>>
>> > +		return 1;
>>
>>
>> I think you forgot here:
>> pkts_out[0] =3D pkt;
>
>But why should we keep the input packet in the output array? Currently, if
>GSO is not performed, no packets will be kept in pkts_out[]. Applications
>can know it by the return value 1 of rte_gso_segment().
>
>>
>>
>> > +
>> > +	pkt_seg =3D pkt;
>> > +	gso_size =3D gso_ctx.gso_size;
>> > +	direct_pool =3D gso_ctx.direct_pool;
>> > +	indirect_pool =3D gso_ctx.indirect_pool;
>> > +
>> > +	/* Parse packet headers to determine how to segment 'pkt' */
>> > +	gso_parse_packet(pkt);
>>
>>
>> I don't think we need to parse packet here.
>>  Instead assume that user already filled packet_type and l2/l3/..._len
>fields correctly.
>
>Hmm, I see it. Thanks.
>
>Thanks,
>Jiayu