From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 542C62BF3 for ; Wed, 30 Aug 2017 11:25:42 +0200 (CEST) Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga104.jf.intel.com with ESMTP; 30 Aug 2017 02:25:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.41,448,1498546800"; d="scan'208";a="146202326" Received: from irsmsx108.ger.corp.intel.com ([163.33.3.3]) by fmsmga006.fm.intel.com with ESMTP; 30 Aug 2017 02:25:40 -0700 Received: from irsmsx101.ger.corp.intel.com ([169.254.1.22]) by IRSMSX108.ger.corp.intel.com ([169.254.11.167]) with mapi id 14.03.0319.002; Wed, 30 Aug 2017 10:25:39 +0100 From: "Kavanagh, Mark B" To: "Hu, Jiayu" , "Ananyev, Konstantin" CC: "dev@dpdk.org" , "Tan, Jianfeng" Thread-Topic: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support Thread-Index: AQHTHOM2JlGz7S1xvEqkdDT8yCb80aKcFteAgAAVmACAAHucoA== Date: Wed, 30 Aug 2017 09:25:38 +0000 Message-ID: References: <1503584144-63181-1-git-send-email-jiayu.hu@intel.com> <1503584144-63181-3-git-send-email-jiayu.hu@intel.com> <2601191342CEEE43887BDE71AB9772584F23E097@IRSMSX103.ger.corp.intel.com> <20170830025550.GA113170@dpdk15.sh.intel.com> In-Reply-To: <20170830025550.GA113170@dpdk15.sh.intel.com> Accept-Language: en-IE, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ctpclassification: CTP_IC x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiMDExOWU2OWQtYzliZC00YmE0LTk4MGQtMmZiZmQ5ODZhMjYxIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6IjVEYklGWkk1WDJ5TmoyWUtiNFVROHVSVXdtY1JYOFwvYTFzdXoxM3ZRVzZZPSJ9 dlp-product: dlpe-windows dlp-version: 10.0.102.7 dlp-reaction: no-action x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 30 Aug 2017 09:25:43 -0000 >From: Hu, Jiayu >Sent: Wednesday, August 30, 2017 3:56 AM >To: Ananyev, Konstantin >Cc: dev@dpdk.org; Kavanagh, Mark B ; Tan, Jianf= eng > >Subject: Re: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support > >Hi Konstantin, > >Thanks for your important suggestions. My feedbacks are inline. > >On Wed, Aug 30, 2017 at 09:38:33AM +0800, Ananyev, Konstantin wrote: >> >> >> > -----Original Message----- >> > From: Hu, Jiayu >> > Sent: Thursday, August 24, 2017 3:16 PM >> > To: dev@dpdk.org >> > Cc: Kavanagh, Mark B ; Ananyev, Konstantin >; Tan, Jianfeng >> > ; Hu, Jiayu >> > Subject: [PATCH 2/5] gso/lib: add TCP/IPv4 GSO support >> > >> > This patch adds GSO support for TCP/IPv4 packets. Supported packets >> > may include a single VLAN tag. TCP/IPv4 GSO assumes that all input >> > packets have correct checksums, and doesn't update checksums for outpu= t >> > packets (the responsibility for this lies with the application). >> > Additionally, TCP/IPv4 GSO doesn't process IP fragmented packets. >> > >> > TCP/IPv4 GSO uses two chained MBUFs, one direct MBUF and one indrect >> > MBUF, to organize an output packet. Note that we refer to these two >> > chained MBUFs as a two-segment MBUF. The direct MBUF stores the packet >> > header, while the indirect mbuf simply points to a location within the >> > original packet's payload. Consequently, use of the GSO library requir= es >> > multi-segment MBUF support in the TX functions of the NIC driver. >> > >> > If a packet is GSOed, TCP/IPv4 GSO reduces its MBUF refcnt by 1. As a >> > result, when all of its GSOed segments are freed, the packet is freed >> > automatically. >> > >> > Signed-off-by: Jiayu Hu >> > Signed-off-by: Mark Kavanagh >> > --- >> > lib/librte_eal/common/include/rte_log.h | 1 + >> > lib/librte_gso/Makefile | 2 + >> > lib/librte_gso/gso_common.c | 270 >++++++++++++++++++++++++++++++++ >> > lib/librte_gso/gso_common.h | 120 ++++++++++++++ >> > lib/librte_gso/gso_tcp.c | 82 ++++++++++ >> > lib/librte_gso/gso_tcp.h | 73 +++++++++ >> > lib/librte_gso/rte_gso.c | 44 +++++- >> > lib/librte_gso/rte_gso.h | 3 + >> > 8 files changed, 593 insertions(+), 2 deletions(-) >> > create mode 100644 lib/librte_gso/gso_common.c >> > create mode 100644 lib/librte_gso/gso_common.h >> > create mode 100644 lib/librte_gso/gso_tcp.c >> > create mode 100644 lib/librte_gso/gso_tcp.h >> > >> > diff --git a/lib/librte_eal/common/include/rte_log.h >b/lib/librte_eal/common/include/rte_log.h >> > index ec8dba7..2fa1199 100644 >> > --- a/lib/librte_eal/common/include/rte_log.h >> > +++ b/lib/librte_eal/common/include/rte_log.h >> > @@ -87,6 +87,7 @@ extern struct rte_logs rte_logs; >> > #define RTE_LOGTYPE_CRYPTODEV 17 /**< Log related to cryptodev. */ >> > #define RTE_LOGTYPE_EFD 18 /**< Log related to EFD. */ >> > #define RTE_LOGTYPE_EVENTDEV 19 /**< Log related to eventdev. */ >> > +#define RTE_LOGTYPE_GSO 20 /**< Log related to GSO. */ >> > >> > /* these log types can be used in an application */ >> > #define RTE_LOGTYPE_USER1 24 /**< User-defined log type 1. */ >> > diff --git a/lib/librte_gso/Makefile b/lib/librte_gso/Makefile >> > index aeaacbc..0f8e38f 100644 >> > --- a/lib/librte_gso/Makefile >> > +++ b/lib/librte_gso/Makefile >> > @@ -42,6 +42,8 @@ LIBABIVER :=3D 1 >> > >> > #source files >> > SRCS-$(CONFIG_RTE_LIBRTE_GSO) +=3D rte_gso.c >> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) +=3D gso_common.c >> > +SRCS-$(CONFIG_RTE_LIBRTE_GSO) +=3D gso_tcp.c >> > >> > # install this header file >> > SYMLINK-$(CONFIG_RTE_LIBRTE_GSO)-include +=3D rte_gso.h >> > diff --git a/lib/librte_gso/gso_common.c b/lib/librte_gso/gso_common.c >> > new file mode 100644 >> > index 0000000..2b54fbd >> > --- /dev/null >> > +++ b/lib/librte_gso/gso_common.c >> > @@ -0,0 +1,270 @@ >> > +/*- >> > + * BSD LICENSE >> > + * >> > + * Copyright(c) 2017 Intel Corporation. All rights reserved. >> > + * All rights reserved. >> > + * >> > + * Redistribution and use in source and binary forms, with or witho= ut >> > + * modification, are permitted provided that the following conditio= ns >> > + * are met: >> > + * >> > + * * Redistributions of source code must retain the above copyrig= ht >> > + * notice, this list of conditions and the following disclaimer= . >> > + * * Redistributions in binary form must reproduce the above >copyright >> > + * notice, this list of conditions and the following disclaimer= in >> > + * the documentation and/or other materials provided with the >> > + * distribution. >> > + * * Neither the name of Intel Corporation nor the names of its >> > + * contributors may be used to endorse or promote products deri= ved >> > + * from this software without specific prior written permission= . >> > + * >> > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUT= ORS >> > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NO= T >> > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNES= S >FOR >> > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYR= IGHT >> > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, >INCIDENTAL, >> > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >> > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF >USE, >> > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND O= N >ANY >> > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR T= ORT >> > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF TH= E >USE >> > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM= AGE. >> > + */ >> > + >> > +#include >> > +#include >> > + >> > +#include >> > + >> > +#include >> > +#include >> > +#include >> > + >> > +#include "gso_common.h" >> > + >> > +static inline void >> > +hdr_segment_init(struct rte_mbuf *hdr_segment, struct rte_mbuf *pkt, >> > + uint16_t pkt_hdr_offset) >> > +{ >> > + /* copy mbuf metadata */ >> > + hdr_segment->nb_segs =3D 1; >> > + hdr_segment->port =3D pkt->port; >> > + hdr_segment->ol_flags =3D pkt->ol_flags; >> > + hdr_segment->packet_type =3D pkt->packet_type; >> > + hdr_segment->pkt_len =3D pkt_hdr_offset; >> > + hdr_segment->data_len =3D pkt_hdr_offset; >> > + hdr_segment->tx_offload =3D pkt->tx_offload; >> > + /* copy packet header */ >> > + rte_memcpy(rte_pktmbuf_mtod(hdr_segment, char *), >> > + rte_pktmbuf_mtod(pkt, char *), >> > + pkt_hdr_offset); >> > +} >> > + >> > +static inline void >> > +free_gso_segment(struct rte_mbuf **pkts, uint16_t nb_pkts) >> > +{ >> > + uint16_t i; >> > + >> > + for (i =3D 0; i < nb_pkts; i++) { >> > + rte_pktmbuf_detach(pkts[i]->next); >> >> I don't think you need to call detach() here explicitly. >> Just rte_pktmbuf_free(pkts[i]) should do I think. > >Yes, rte_pktmbuf_free() is enough. I will modify it. Thanks. > >> >> > + rte_pktmbuf_free(pkts[i]); >> > + pkts[i] =3D NULL; >> > + } >> > +} >> > + >> > +int >> > +gso_do_segment(struct rte_mbuf *pkt, >> > + uint16_t pkt_hdr_offset, >> > + uint16_t pyld_unit_size, >> > + struct rte_mempool *direct_pool, >> > + struct rte_mempool *indirect_pool, >> > + struct rte_mbuf **pkts_out, >> > + uint16_t nb_pkts_out) >> > +{ >> > + struct rte_mbuf *pkt_in; >> > + struct rte_mbuf *hdr_segment, *pyld_segment; >> > + uint32_t pkt_in_pyld_off; >> > + uint16_t pkt_in_segment_len, pkt_out_segment_len; >> > + uint16_t nb_segs; >> > + bool pkt_in_segment_processed; >> > + >> > + pkt_in_pyld_off =3D pkt->data_off + pkt_hdr_offset; >> > + pkt_in =3D pkt; >> > + nb_segs =3D 0; >> > + >> > + while (pkt_in) { >> > + pkt_in_segment_processed =3D false; >> > + pkt_in_segment_len =3D pkt_in->data_off + pkt_in->data_len; >> > + >> > + while (!pkt_in_segment_processed) { >> > + if (unlikely(nb_segs >=3D nb_pkts_out)) { >> > + free_gso_segment(pkts_out, nb_segs); >> > + return -EINVAL; >> > + } >> > + >> > + /* allocate direct mbuf */ >> > + hdr_segment =3D rte_pktmbuf_alloc(direct_pool); >> > + if (unlikely(hdr_segment =3D=3D NULL)) { >> > + free_gso_segment(pkts_out, nb_segs); >> > + return -ENOMEM; >> > + } >> > + >> > + /* allocate indirect mbuf */ >> > + pyld_segment =3D rte_pktmbuf_alloc(indirect_pool); >> > + if (unlikely(pyld_segment =3D=3D NULL)) { >> > + rte_pktmbuf_free(hdr_segment); >> > + free_gso_segment(pkts_out, nb_segs); >> > + return -ENOMEM; >> > + } >> >> So, if I understand correctly each new packet would always contain just = one >data segment? >> Why several data segments couldn't be chained together (if sum of their >data_len <=3D mss)? >> In a same way as done here: >> >http://dpdk.org/browse/dpdk/tree/lib/librte_ip_frag/rte_ipv4_fragmentation= .c#n >93 >> or here: >> >https://gerrit.fd.io/r/gitweb?p=3Dtldk.git;a=3Dblob;f=3Dlib/libtle_l4p/tcp= _tx_seg.h; >h=3Da8d2425597a7ad6f598aa4bb7fcd7f1da74305f0;hb=3DHEAD#l23 >> ? > >Oh, yes. I can chain these data segments when their total length is less t= han >the GSO segsz. >I will change it in the next patch. Thanks very much. > >> >> > + >> > + /* copy packet header */ >> > + hdr_segment_init(hdr_segment, pkt, pkt_hdr_offset); >> > + >> > + /* attach payload mbuf to current packet segment */ >> > + rte_pktmbuf_attach(pyld_segment, pkt_in); >> > + >> > + hdr_segment->next =3D pyld_segment; >> > + pkts_out[nb_segs++] =3D hdr_segment; >> > + >> > + /* calculate payload length */ >> > + pkt_out_segment_len =3D pyld_unit_size; >> > + if (pkt_in_pyld_off + pkt_out_segment_len > >> > + pkt_in_segment_len) { >> > + pkt_out_segment_len =3D pkt_in_segment_len - >> > + pkt_in_pyld_off; >> > + } >> > + >> > + /* update payload segment */ >> > + pyld_segment->data_off =3D pkt_in_pyld_off; >> > + pyld_segment->data_len =3D pkt_out_segment_len; >> > + >> > + /* update header segment */ >> > + hdr_segment->pkt_len +=3D pyld_segment->data_len; >> > + hdr_segment->nb_segs++; >> > + >> > + /* update pkt_in_pyld_off */ >> > + pkt_in_pyld_off +=3D pkt_out_segment_len; >> > + if (pkt_in_pyld_off =3D=3D pkt_in_segment_len) >> > + pkt_in_segment_processed =3D true; >> > + } >> > + >> > + /* 'pkt_in' may contain numerous segments */ >> > + pkt_in =3D pkt_in->next; >> > + if (pkt_in !=3D NULL) >> > + pkt_in_pyld_off =3D pkt_in->data_off; >> > + } >> > + return nb_segs; >> > +} >> > + >> > +static inline void >> > +parse_ipv4(struct ipv4_hdr *ipv4_hdr, struct rte_mbuf *pkt) >> > +{ >> > + struct tcp_hdr *tcp_hdr; >> > + >> > + switch (ipv4_hdr->next_proto_id) { >> > + case IPPROTO_TCP: >> > + pkt->packet_type |=3D RTE_PTYPE_L4_TCP; >> > + pkt->l3_len =3D IPv4_HDR_LEN(ipv4_hdr); >> > + tcp_hdr =3D (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len); >> > + pkt->l4_len =3D TCP_HDR_LEN(tcp_hdr); >> > + break; >> > + } >> > +} >> > + >> > +static inline void >> > +parse_ethernet(struct ether_hdr *eth_hdr, struct rte_mbuf *pkt) >> > +{ >> > + struct ipv4_hdr *ipv4_hdr; >> > + struct vlan_hdr *vlan_hdr; >> > + uint16_t ethertype; >> > + >> > + ethertype =3D rte_be_to_cpu_16(eth_hdr->ether_type); >> > + if (ethertype =3D=3D ETHER_TYPE_VLAN) { >> > + vlan_hdr =3D (struct vlan_hdr *)(eth_hdr + 1); >> > + pkt->l2_len =3D sizeof(struct vlan_hdr); >> > + pkt->packet_type |=3D RTE_PTYPE_L2_ETHER_VLAN; >> > + ethertype =3D rte_be_to_cpu_16(vlan_hdr->eth_proto); >> > + } >> > + >> > + switch (ethertype) { >> > + case ETHER_TYPE_IPv4: >> > + if (IS_VLAN_PKT(pkt)) { >> > + pkt->packet_type |=3D RTE_PTYPE_L3_IPV4; >> > + } else { >> > + pkt->packet_type |=3D RTE_PTYPE_L2_ETHER; >> > + pkt->packet_type |=3D RTE_PTYPE_L3_IPV4; >> > + } >> > + pkt->l2_len +=3D sizeof(struct ether_hdr); >> > + ipv4_hdr =3D (struct ipv4_hdr *) ((char *)eth_hdr + >> > + pkt->l2_len); >> > + parse_ipv4(ipv4_hdr, pkt); >> > + break; >> > + } >> > +} >> > + >> > +void >> > +gso_parse_packet(struct rte_mbuf *pkt) >> >> There is a function rte_net_get_ptype() that supposed to provide similar >functionality. >> So we probably don't need to create a new SW parse function here, instea= d >would be better >> to reuse (and update if needed) an existing one. >> Again user already might have l2/l3/l4.../_len and packet_type setuped. >> So better to keep SW packet parsing out of scope of that library. > >Hmm, I know we have discussed this design choice in the GRO library, and I >also think it's >better to reuse these values. > >But from the perspective of OVS, it may add extra overhead, since OVS does= n't >parse every >packet originally. Maybe @Mark can give us more inputs from the view of OV= S. Hi Jiayu, Konstantin For GSO, the application needs to know: - the packet type (as it only currently supports TCP/IPv4, VxLAN, GRE packe= ts) - the l2/3/4_lens, etc. (in order to replicate the original packet's header= s across outgoing segments) For this, we can use the rte_net_get_ptype function, as per Konstantin's su= ggestion, as it provides both - thanks Konstantin! WRT the extra overhead in OvS: TSO is the defacto standard, and GSO is prov= ided purely as a fallback option. As such, and since the additional packet = parsing is a necessity in order to facilitate GSO, the additional overhead = is IMO acceptable. Thanks, Mark > >> >> > +{ >> > + struct ether_hdr *eth_hdr =3D rte_pktmbuf_mtod(pkt, struct ether_hdr= *); >> > + >> > + pkt->packet_type =3D pkt->tx_offload =3D 0; >> > + parse_ethernet(eth_hdr, pkt); >> > +} >> > + >> > +static inline void >> > +update_ipv4_header(char *base, uint16_t offset, uint16_t length, uint= 16_t >id) >> > +{ >> > + struct ipv4_hdr *ipv4_hdr =3D (struct ipv4_hdr *)(base + offset); >> > + >> > + ipv4_hdr->total_length =3D rte_cpu_to_be_16(length - offset); >> > + ipv4_hdr->packet_id =3D rte_cpu_to_be_16(id); >> > +} >> > + >> > +static inline void >> > +update_tcp_header(char *base, uint16_t offset, uint32_t sent_seq, >> > + uint8_t non_tail) >> > +{ >> > + struct tcp_hdr *tcp_hdr =3D (struct tcp_hdr *)(base + offset); >> > + >> > + tcp_hdr->sent_seq =3D rte_cpu_to_be_32(sent_seq); >> > + /* clean FIN and PSH for non-tail segments */ >> > + if (non_tail) >> > + tcp_hdr->tcp_flags &=3D (~(TCP_HDR_PSH_MASK | TCP_HDR_FIN_MASK)); >> > +} >> > + >> > +void >> > +gso_update_pkt_headers(struct rte_mbuf *pkt, uint16_t nb_segments, >> > + struct rte_mbuf **out_segments) >> > +{ >> > + struct ipv4_hdr *ipv4_hdr; >> > + struct tcp_hdr *tcp_hdr; >> > + struct rte_mbuf *seg; >> > + uint32_t sent_seq; >> > + uint16_t offset, i; >> > + uint16_t tail_seg_idx =3D nb_segments - 1, id; >> > + >> > + switch (pkt->packet_type) { >> > + case ETHER_VLAN_IPv4_TCP_PKT: >> > + case ETHER_IPv4_TCP_PKT: >> >> Might be worth to put code below in a separate function: >> update_inner_tcp_hdr(..) or so. >> Then you can reuse it for tunneled cases too. > >Yes, I will modify it in the next patch. Thanks. > >> >> > + ipv4_hdr =3D (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) >> > + pkt->l2_len); >> > + tcp_hdr =3D (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len); >> > + id =3D rte_be_to_cpu_16(ipv4_hdr->packet_id); >> > + sent_seq =3D rte_be_to_cpu_32(tcp_hdr->sent_seq); >> > + >> > + for (i =3D 0; i < nb_segments; i++) { >> > + seg =3D out_segments[i]; >> > + >> > + offset =3D seg->l2_len; >> > + update_ipv4_header(rte_pktmbuf_mtod(seg, char *), >> > + offset, seg->pkt_len, id); >> > + id++; >> >> Who would be responsible to make sure that we wouldn't have consecutive >packets with the IPV4 id? >> Would be the upper layer that forms the packet or gso library or ...? > >Oh yes. I ingore this important issue. I don't think applications can >guarantee it. >I will check the design of linux and try to figure out a way. Thanks for >reminder. > >> >> > + >> > + offset +=3D seg->l3_len; >> > + update_tcp_header(rte_pktmbuf_mtod(seg, char *), >> > + offset, sent_seq, i < tail_seg_idx); >> > + sent_seq +=3D seg->next->data_len; >> > + } >> > + break; >> > + } >> > +} >> > diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h >> > new file mode 100644 >> > index 0000000..d750041 >> > --- /dev/null >> > +++ b/lib/librte_gso/gso_common.h >> > @@ -0,0 +1,120 @@ >> > +/*- >> > + * BSD LICENSE >> > + * >> > + * Copyright(c) 2017 Intel Corporation. All rights reserved. >> > + * All rights reserved. >> > + * >> > + * Redistribution and use in source and binary forms, with or witho= ut >> > + * modification, are permitted provided that the following conditio= ns >> > + * are met: >> > + * >> > + * * Redistributions of source code must retain the above copyrig= ht >> > + * notice, this list of conditions and the following disclaimer= . >> > + * * Redistributions in binary form must reproduce the above >copyright >> > + * notice, this list of conditions and the following disclaimer= in >> > + * the documentation and/or other materials provided with the >> > + * distribution. >> > + * * Neither the name of Intel Corporation nor the names of its >> > + * contributors may be used to endorse or promote products deri= ved >> > + * from this software without specific prior written permission= . >> > + * >> > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUT= ORS >> > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NO= T >> > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNES= S >FOR >> > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYR= IGHT >> > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, >INCIDENTAL, >> > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >> > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF >USE, >> > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND O= N >ANY >> > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR T= ORT >> > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF TH= E >USE >> > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM= AGE. >> > + */ >> > + >> > +#ifndef _GSO_COMMON_H_ >> > +#define _GSO_COMMON_H_ >> > + >> > +#include >> > +#include >> > + >> > +#define IPV4_HDR_DF_SHIFT 14 >> > +#define IPV4_HDR_DF_MASK (1 << IPV4_HDR_DF_SHIFT) >> > +#define IPv4_HDR_LEN(iph) ((iph->version_ihl & 0x0f) * 4) >> > + >> > +#define TCP_HDR_PSH_MASK ((uint8_t)0x08) >> > +#define TCP_HDR_FIN_MASK ((uint8_t)0x01) >> > +#define TCP_HDR_LEN(tcph) ((tcph->data_off & 0xf0) >> 2) >> > + >> > +#define ETHER_IPv4_PKT (RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4) >> > +/* Supported packet types */ >> > +/* TCP/IPv4 packet. */ >> > +#define ETHER_IPv4_TCP_PKT (ETHER_IPv4_PKT | RTE_PTYPE_L4_TCP) >> > + >> > +/* TCP/IPv4 packet with VLAN tag. */ >> > +#define ETHER_VLAN_IPv4_TCP_PKT (RTE_PTYPE_L2_ETHER_VLAN | \ >> > + RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP) >> > + >> > +#define IS_VLAN_PKT(pkt) ((pkt->packet_type & RTE_PTYPE_L2_ETHER_VLAN= ) =3D=3D >\ >> > + RTE_PTYPE_L2_ETHER_VLAN) >> > + >> > +/** >> > + * Internal function which parses a packet, setting outer_l2/l3_len a= nd >> > + * l2/l3/l4_len and packet_type. >> > + * >> > + * @param pkt >> > + * Packet to parse. >> > + */ >> > +void gso_parse_packet(struct rte_mbuf *pkt); >> > + >> > +/** >> > + * Internal function which updates relevant packet headers, following >> > + * segmentation. This is required to update, for example, the IPv4 >> > + * 'total_length' field, to reflect the reduced length of the now- >> > + * segmented packet. >> > + * >> > + * @param pkt >> > + * The original packet. >> > + * @param nb_segments >> > + * The number of GSO segments into which pkt was split. >> > + * @param out_segements >> > + * Pointer array used for storing mbuf addresses for GSO segments. >> > + */ >> > +void gso_update_pkt_headers(struct rte_mbuf *pkt, uint16_t nb_segment= s, >> > + struct rte_mbuf **out_segments); >> > + >> > +/** >> > + * Internal function which divides the input packet into small segmen= ts. >> > + * Each of the newly-created segments is organized as a two-segment m= buf, >> > + * where the first segment is a standard mbuf, which stores a copy of >> > + * packet header, and the second is an indirect mbuf which points to = a >> > + * section of data in the input packet. >> > + * >> > + * @param pkt >> > + * Packet to segment. >> > + * @param pkt_hdr_offset >> > + * Packet header offset, measured in byte. >> > + * @param pyld_unit_size >> > + * The max payload length of a GSO segment. >> > + * @param direct_pool >> > + * MBUF pool used for allocating direct buffers for output segments. >> > + * @param indirect_pool >> > + * MBUF pool used for allocating indirect buffers for output segment= s. >> > + * @param pkts_out >> > + * Pointer array used to keep the mbuf addresses of output segments. >> > + * @param nb_pkts_out >> > + * The max number of items that pkts_out can keep. >> > + * >> > + * @return >> > + * - The number of segments created in the event of success. >> > + * - If no GSO is performed, return 1. >> > + * - If available memory in mempools is insufficient, return -ENOMEM= . >> > + * - -EINVAL for invalid parameters >> > + */ >> > +int gso_do_segment(struct rte_mbuf *pkt, >> > + uint16_t pkt_hdr_offset, >> > + uint16_t pyld_unit_size, >> > + struct rte_mempool *direct_pool, >> > + struct rte_mempool *indirect_pool, >> > + struct rte_mbuf **pkts_out, >> > + uint16_t nb_pkts_out); >> > +#endif >> > diff --git a/lib/librte_gso/gso_tcp.c b/lib/librte_gso/gso_tcp.c >> > new file mode 100644 >> > index 0000000..9d5fc30 >> > --- /dev/null >> > +++ b/lib/librte_gso/gso_tcp.c >> > @@ -0,0 +1,82 @@ >> > +/*- >> > + * BSD LICENSE >> > + * >> > + * Copyright(c) 2017 Intel Corporation. All rights reserved. >> > + * All rights reserved. >> > + * >> > + * Redistribution and use in source and binary forms, with or witho= ut >> > + * modification, are permitted provided that the following conditio= ns >> > + * are met: >> > + * >> > + * * Redistributions of source code must retain the above copyrig= ht >> > + * notice, this list of conditions and the following disclaimer= . >> > + * * Redistributions in binary form must reproduce the above >copyright >> > + * notice, this list of conditions and the following disclaimer= in >> > + * the documentation and/or other materials provided with the >> > + * distribution. >> > + * * Neither the name of Intel Corporation nor the names of its >> > + * contributors may be used to endorse or promote products deri= ved >> > + * from this software without specific prior written permission= . >> > + * >> > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUT= ORS >> > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NO= T >> > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNES= S >FOR >> > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYR= IGHT >> > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, >INCIDENTAL, >> > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >> > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF >USE, >> > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND O= N >ANY >> > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR T= ORT >> > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF TH= E >USE >> > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM= AGE. >> > + */ >> > + >> > + >> > +#include >> > +#include >> > +#include >> > + >> > +#include "gso_common.h" >> > +#include "gso_tcp.h" >> > + >> > +int >> > +gso_tcp4_segment(struct rte_mbuf *pkt, >> > + uint16_t gso_size, >> > + struct rte_mempool *direct_pool, >> > + struct rte_mempool *indirect_pool, >> > + struct rte_mbuf **pkts_out, >> > + uint16_t nb_pkts_out) >> > +{ >> > + struct ether_hdr *eth_hdr; >> > + struct ipv4_hdr *ipv4_hdr; >> > + uint16_t tcp_dl; >> > + uint16_t pyld_unit_size; >> > + uint16_t hdr_offset; >> > + int ret =3D 1; >> > + >> > + eth_hdr =3D rte_pktmbuf_mtod(pkt, struct ether_hdr *); >> > + ipv4_hdr =3D (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len); >> > + >> > + /* don't process fragmented packet */ >> > + if ((ipv4_hdr->fragment_offset & >> > + rte_cpu_to_be_16(IPV4_HDR_DF_MASK)) =3D=3D 0) >> > + return ret; >> > + >> > + tcp_dl =3D rte_be_to_cpu_16(ipv4_hdr->total_length) - >> > + pkt->l3_len - pkt->l4_len; >> > + /* don't process packet without data */ >> > + if (tcp_dl =3D=3D 0) >> > + return ret; >> > + >> > + hdr_offset =3D pkt->l2_len + pkt->l3_len + pkt->l4_len; >> > + pyld_unit_size =3D gso_size - hdr_offset - ETHER_CRC_LEN; >> > + >> > + /* segment the payload */ >> > + ret =3D gso_do_segment(pkt, hdr_offset, pyld_unit_size, direct_pool, >> > + indirect_pool, pkts_out, nb_pkts_out); >> > + >> > + if (ret > 1) >> > + gso_update_pkt_headers(pkt, ret, pkts_out); >> > + >> > + return ret; >> > +} >> > diff --git a/lib/librte_gso/gso_tcp.h b/lib/librte_gso/gso_tcp.h >> > new file mode 100644 >> > index 0000000..f291ccb >> > --- /dev/null >> > +++ b/lib/librte_gso/gso_tcp.h >> > @@ -0,0 +1,73 @@ >> > +/*- >> > + * BSD LICENSE >> > + * >> > + * Copyright(c) 2017 Intel Corporation. All rights reserved. >> > + * All rights reserved. >> > + * >> > + * Redistribution and use in source and binary forms, with or witho= ut >> > + * modification, are permitted provided that the following conditio= ns >> > + * are met: >> > + * >> > + * * Redistributions of source code must retain the above copyrig= ht >> > + * notice, this list of conditions and the following disclaimer= . >> > + * * Redistributions in binary form must reproduce the above >copyright >> > + * notice, this list of conditions and the following disclaimer= in >> > + * the documentation and/or other materials provided with the >> > + * distribution. >> > + * * Neither the name of Intel Corporation nor the names of its >> > + * contributors may be used to endorse or promote products deri= ved >> > + * from this software without specific prior written permission= . >> > + * >> > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUT= ORS >> > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NO= T >> > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNES= S >FOR >> > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYR= IGHT >> > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, >INCIDENTAL, >> > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT >> > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF >USE, >> > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND O= N >ANY >> > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR T= ORT >> > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF TH= E >USE >> > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM= AGE. >> > + */ >> > + >> > +#ifndef _GSO_TCP_H_ >> > +#define _GSO_TCP_H_ >> > + >> > +#include >> > +#include >> > + >> > +/** >> > + * Segment an IPv4/TCP packet. This function assumes the input packet= has >> > + * correct checksums and doesn't update checksums for GSO segment. >> > + * Furthermore, it doesn't process IP fragment packets. >> > + * >> > + * @param pkt >> > + * The packet mbuf to segment. >> > + * @param gso_size >> > + * The max length of a GSO segment, measured in bytes. >> > + * @param direct_pool >> > + * MBUF pool used for allocating direct buffers for output segments. >> > + * @param indirect_pool >> > + * MBUF pool used for allocating indirect buffers for output segment= s. >> > + * @param pkts_out >> > + * Pointer array, which is used to store mbuf addresses of GSO segme= nts. >> > + * Caller should guarantee that 'pkts_out' is sufficiently large to >store >> > + * all GSO segments. >> > + * @param nb_pkts_out >> > + * The max number of items that 'pkts_out' can keep. >> > + * >> > + * @return >> > + * - The number of GSO segments on success. >> > + * - Return 1 if no GSO is performed. >> > + * - Return -ENOMEM if available memory in mempools is insufficient= . >> > + * - Return -EINVAL for invalid parameters. >> > + */ >> > +int gso_tcp4_segment(struct rte_mbuf *pkt, >> > + uint16_t gso_size, >> > + struct rte_mempool *direct_pool, >> > + struct rte_mempool *indirect_pool, >> > + struct rte_mbuf **pkts_out, >> > + uint16_t nb_pkts_out); >> > + >> > +#endif >> > diff --git a/lib/librte_gso/rte_gso.c b/lib/librte_gso/rte_gso.c >> > index b81afce..fac95f2 100644 >> > --- a/lib/librte_gso/rte_gso.c >> > +++ b/lib/librte_gso/rte_gso.c >> > @@ -31,17 +31,57 @@ >> > * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAM= AGE. >> > */ >> > >> > +#include >> > + >> > #include "rte_gso.h" >> > +#include "gso_common.h" >> > +#include "gso_tcp.h" >> > >> > int >> > rte_gso_segment(struct rte_mbuf *pkt, >> > struct rte_gso_ctx gso_ctx, >> > struct rte_mbuf **pkts_out, >> > - uint16_t nb_pkts_out __rte_unused) >> > + uint16_t nb_pkts_out) >> > { >> > + struct rte_mempool *direct_pool, *indirect_pool; >> > + struct rte_mbuf *pkt_seg; >> > + uint16_t nb_segments, gso_size; >> > + >> > if (pkt =3D=3D NULL || pkts_out =3D=3D NULL || gso_ctx.direct_pool = =3D=3D >> > NULL || gso_ctx.indirect_pool =3D=3D NULL) >> > return -EINVAL; >> >> Probably we don't need to check gso_ctx values for each incoming packet. >> If you feel it is necessary - create new function rte_gso_ctx_check() t= hat >> could be performed just once per ctx. > >Agree. I will change it. Thanks. > >> >> > >> > - return 1; >> > + if ((gso_ctx.gso_types & RTE_GSO_TCP_IPV4) =3D=3D 0 || >> > + gso_ctx.gso_size >=3D pkt->pkt_len || >> > + gso_ctx.gso_size =3D=3D 0) >> >> >> First and third condition seems redundant. > >The reason to check gso_ctx.gso_types here is that we don't perform >GSO if applications don't set RTE_GSO_TCP_IPV4 to gso_ctx.gso_types, >even the input packet is TCP/IPv4. And if gso_ctx.gso_size is 0, >we don't need to execute the following codes. So we still need to >remove these two conditions? > >> >> > + return 1; >> >> >> I think you forgot here: >> pkts_out[0] =3D pkt; > >But why should we keep the input packet in the output array? Currently, if >GSO is not performed, no packets will be kept in pkts_out[]. Applications >can know it by the return value 1 of rte_gso_segment(). > >> >> >> > + >> > + pkt_seg =3D pkt; >> > + gso_size =3D gso_ctx.gso_size; >> > + direct_pool =3D gso_ctx.direct_pool; >> > + indirect_pool =3D gso_ctx.indirect_pool; >> > + >> > + /* Parse packet headers to determine how to segment 'pkt' */ >> > + gso_parse_packet(pkt); >> >> >> I don't think we need to parse packet here. >> Instead assume that user already filled packet_type and l2/l3/..._len >fields correctly. > >Hmm, I see it. Thanks. > >Thanks, >Jiayu