From: "Ananyev, Konstantin"
To: "Liu, Jijiang", "Olivier Matz (olivier.matz@6wind.com)"
Cc: "dev@dpdk.org"
Date: Thu, 27 Nov 2014 14:56:25 +0000
Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
Message-ID: <2601191342CEEE43887BDE71AB977258213BADB8@IRSMSX105.ger.corp.intel.com>
In-Reply-To: <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9EEA0@SHSMSX101.ccr.corp.intel.com>
References: <1417076319-629-1-git-send-email-jijiang.liu@intel.com> <1417076319-629-2-git-send-email-jijiang.liu@intel.com> <5476F626.2020708@6wind.com> <1ED644BD7E0A5F4091CF203DAFB8E4CC01D9EEA0@SHSMSX101.ccr.corp.intel.com>

>
> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Thursday, November 27, 2014 6:00 PM
> To: Liu, Jijiang; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/3] mbuf:add two TX offload flags and change three fields
>
> Hi Jijiang,
>
> Please see some comments below.
>
> On 11/27/2014 09:18 AM, Jijiang Liu wrote:
> > In place of removing the PKT_TX_VXLAN_CKSUM, we introduce 2 new flags: PKT_TX_OUT_IP_CKSUM, PKT_TX_UDP_TUNNEL_PKT, and a new field: l4_tun_len.
> > Replace the inner_l2_len and the inner_l3_len field with the outer_l2_len and outer_l3_len field.
> >
> > PKT_TX_OUT_IP_CKSUM: is not used for non-tunnelling packet; hardware outer checksum for tunnelling packet.
> > PKT_TX_UDP_TUNNEL_PKT: is used to tell PMD that the transmit packet is a UDP tunneling packet.
> > l4_tun_len: for VXLAN packet, it should be udp header length plus VXLAN header length.
> >
> > Signed-off-by: Jijiang Liu
> > ---
> >  lib/librte_mbuf/rte_mbuf.c |  2 +-
> >  lib/librte_mbuf/rte_mbuf.h | 23 ++++++++++++++---------
> >  2 files changed, 15 insertions(+), 10 deletions(-)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> > index 87c2963..e89c310 100644
> > --- a/lib/librte_mbuf/rte_mbuf.c
> > +++ b/lib/librte_mbuf/rte_mbuf.c
> > @@ -240,7 +240,7 @@ const char *rte_get_tx_ol_flag_name(uint64_t mask)
> >  	case PKT_TX_SCTP_CKSUM: return "PKT_TX_SCTP_CKSUM";
> >  	case PKT_TX_UDP_CKSUM: return "PKT_TX_UDP_CKSUM";
> >  	case PKT_TX_IEEE1588_TMST: return "PKT_TX_IEEE1588_TMST";
> > -	case PKT_TX_VXLAN_CKSUM: return "PKT_TX_VXLAN_CKSUM";
> > +	case PKT_TX_UDP_TUNNEL_PKT: return "PKT_TX_UDP_TUNNEL_PKT";
> >  	case PKT_TX_TCP_SEG: return "PKT_TX_TCP_SEG";
> >  	default: return NULL;
>
> As I said as a reply to the cover letter, I suggest to use PKT_TX_OUT_UDP_CKSUM instead of PKT_TX_UDP_TUNNEL_PKT.

HW doesn't support outer L4 checksum offload.
But to calculate inner checksums correctly, it needs a hint from SW about the L4 Tunneling Type.
Currently the following values are recognised by HW:
L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
10b - GRE tunneling header
Else - reserved
You can check yourself:
http://www.intel.com/content/www/us/en/embedded/products/networking/xl710-10-40-controller-datasheet.html
Sections 8.4.2.2.1 and 8.4.4.2

>
> Also, the PKT_TX_OUT_IP_CKSUM case is missing here.
>
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 367fc56..48cd8e1 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -99,10 +99,9 @@ extern "C" {
> >  #define PKT_RX_TUNNEL_IPV6_HDR (1ULL << 12) /**< RX tunnel packet with IPv6 header. */
> >  #define PKT_RX_FDIR_ID (1ULL << 13) /**< FD id reported if FDIR match. */
> >  #define PKT_RX_FDIR_FLX (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
> > -/* add new RX flags here */
> >
>
> We should probably not remove this line.
>
>
> >  /* add new TX flags here */
> > -#define PKT_TX_VXLAN_CKSUM (1ULL << 50) /**< TX checksum of VXLAN computed by NIC */
> > +#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50) /**< TX packet is an UDP tunneling packet */
> >  #define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
> >
> >  /**
> > @@ -125,13 +124,20 @@ extern "C" {
> >  #define PKT_TX_IP_CKSUM (1ULL << 54) /**< IP cksum of TX pkt. computed by NIC. */
> >  #define PKT_TX_IPV4_CSUM PKT_TX_IP_CKSUM /**< Alias of PKT_TX_IP_CKSUM. */
> >
> > +#define PKT_TX_VLAN_PKT (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > +
> >  /** Tell the NIC it's an IPv4 packet. Required for L4 checksum offload or TSO. */
> > -#define PKT_TX_IPV4 PKT_RX_IPV4_HDR
> > +#define PKT_TX_IPV4 (1ULL << 56)
> >
> >  /** Tell the NIC it's an IPv6 packet. Required for L4 checksum offload or TSO. */
> > -#define PKT_TX_IPV6 PKT_RX_IPV6_HDR
> > +#define PKT_TX_IPV6 (1ULL << 57)
>
> The description in comment does not match the description in the cover letter.
>
> Also, I think replacing PKT_RX_IPV[46]_HDR by the value may go in another commit.
>
>
> > -#define PKT_TX_VLAN_PKT (1ULL << 55) /**< TX packet is a 802.1q VLAN packet. */
> > +/** Outer IP cksum of TX pkt. computed by NIC for tunneling packet */
> > +#define PKT_TX_OUTER_IP_CKSUM (1ULL << 58)
> > +#define PKT_TX_OUTER_IPV4_CSUM PKT_TX_OUTER_IP_CKSUM /**< Alias of PKT_TX_OUTER_IP_CKSUM. */
>
> Why do we need an alias?
>
> By the way, I think the alias of PKT_TX_IP_CKSUM is also unneeded and can be removed. But it's not the topic of your series.
>
> Also, the name PKT_TX_OUTER_IP_CKSUM does not match the name in the cover letter and commit logs.
>
>
> > +
> > +/** Tell the NIC it's an outer IPv6 packet for tunneling packet.*/
> > +#define PKT_TX_OUTER_IPV6 (1ULL << 59)
> >
>
> This flag is not in the cover letter or commit log. What is its purpose?

My bad, I forgot that for the outer IP we will also need to specify its type.
So it is the same story here as for the inner IP.
So in total, we might need 3 flags for outer IP:
/* Tells HW that outer IP is IPV4 and checksum for it should be calculated by HW. */
PKT_TX_OUTER_IP_CKSUM
/* Tells HW that outer IP is IPV4 and checksum for it should not be calculated by HW. */
PKT_TX_OUTER_IPV4
/* Tells HW that outer IP is IPV6. */
PKT_TX_OUTER_IPV6

>
>
> >  /**
> >   * TCP segmentation offload. To enable this offload feature for a
> > @@ -266,10 +272,9 @@ struct rte_mbuf {
> >  			uint64_t tso_segsz:16; /**< TCP TSO segment size */
> >
> >  			/* fields for TX offloading of tunnels */
> > -			uint64_t inner_l3_len:9; /**< inner L3 (IP) Hdr Length. */
> > -			uint64_t inner_l2_len:7; /**< inner L2 (MAC) Hdr Length. */
> > -
> > -			/* uint64_t unused:8; */
> > +			uint64_t outer_l3_len:9; /**< outer L3 (IP) Hdr Length. */
> > +			uint64_t outer_l2_len:7; /**< outer L2 (MAC) Hdr Length. */
> > +			uint64_t l4_tun_len:8; /**< L4 tunnelling header length */
> >  		};
> >  	};
> >  } __rte_cache_aligned;
> >
>
> About l4_tun_len, I have another comment I forgot to add in the cover letter. Can we remove it and include its length in outer_l2_len
> instead? For instance, replace:
>
>   mb->l2_len = eth_hdr_in;
>   mb->l3_len = ipv4_hdr_in;
>   mb->outer_l2_len = eth_hdr_out;
>   mb->outer_l3_len = ipv4_hdr_out;
>   mb->l4tun_len = vxlan_hdr;
>   mb->ol_flags |= PKT_TX_OUT_IP_CKSUM | PKT_TX_UDP_TUNNEL |
>     PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
>
> by:
>
>   mb->l2_len = eth_hdr_in;
>   mb->l3_len = ipv4_hdr_in;
>   mb->outer_l2_len = eth_hdr_out + vxlan_hdr;
>   mb->outer_l3_len = ipv4_hdr_out;
>   mb->ol_flags |= PKT_TX_OUT_IP_CKSUM | PKT_TX_UDP_TUNNEL |
>     PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
>
> I think it won't bother the driver, and it's coherent with case B.2 of your cover letter.

You probably meant:
mb->l2_len = eth_hdr_in + vxlan_hdr;
?
Yes, I think it could be done that way too.
Though I still prefer to keep l4tun_len - it makes things a bit cleaner (at least to me).
After all we do have space for it in mbuf's tx_offload.
Konstantin

>
> Regards,
> Olivier
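
To make the two points above a bit more concrete, here is a rough, standalone sketch (not driver code, and not part of the patch). It shows how the flags discussed in this thread could be turned into the L4 Tunneling Type code and an outer-IP kind, and that folding the tunnel header length into l2_len leaves the offset of the inner L4 header unchanged. The bit positions for PKT_TX_UDP_TUNNEL_PKT, PKT_TX_OUTER_IP_CKSUM and PKT_TX_OUTER_IPV6 are taken from the quoted patch; the bit for PKT_TX_OUTER_IPV4, the enum names, the struct and the helper functions are all made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions copied from the quoted patch. */
#define PKT_TX_UDP_TUNNEL_PKT (1ULL << 50)
#define PKT_TX_OUTER_IP_CKSUM (1ULL << 58)
#define PKT_TX_OUTER_IPV6     (1ULL << 59)
/* ASSUMPTION: no bit was proposed for this flag in the thread; 60 is a placeholder. */
#define PKT_TX_OUTER_IPV4     (1ULL << 60)

/* L4 Tunneling Type codes as listed in the datasheet excerpt in this mail. */
enum l4_tun_type {
	L4_TUN_NONE = 0x0, /* 00b: no UDP / GRE tunneling */
	L4_TUN_UDP  = 0x1, /* 01b: UDP tunneling header (VXLAN, Geneve) */
	L4_TUN_GRE  = 0x2  /* 10b: GRE tunneling header (no mbuf flag proposed here) */
};

/* Hypothetical classification of the outer IP header, one value per proposed flag. */
enum outer_ip_kind {
	OUTER_IP_NONE,
	OUTER_IPV4_HW_CSUM, /* PKT_TX_OUTER_IP_CKSUM */
	OUTER_IPV4_NO_CSUM, /* PKT_TX_OUTER_IPV4 */
	OUTER_IPV6          /* PKT_TX_OUTER_IPV6 */
};

/* Hypothetical helper: derive the tunneling type hint from the TX flags. */
static inline enum l4_tun_type tx_l4_tun_type(uint64_t ol_flags)
{
	return (ol_flags & PKT_TX_UDP_TUNNEL_PKT) ? L4_TUN_UDP : L4_TUN_NONE;
}

/* Hypothetical helper: derive the outer IP kind from the three outer-IP flags. */
static inline enum outer_ip_kind tx_outer_ip_kind(uint64_t ol_flags)
{
	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
		return OUTER_IPV4_HW_CSUM;
	if (ol_flags & PKT_TX_OUTER_IPV4)
		return OUTER_IPV4_NO_CSUM;
	if (ol_flags & PKT_TX_OUTER_IPV6)
		return OUTER_IPV6;
	return OUTER_IP_NONE;
}

/* Plain (non-bitfield) stand-in for the mbuf TX offload length fields. */
struct tx_hdr_lens {
	uint16_t outer_l2_len;
	uint16_t outer_l3_len;
	uint16_t l4_tun_len; /* e.g. UDP header + VXLAN header for VXLAN */
	uint16_t l2_len;
	uint16_t l3_len;
};

/* Byte offset of the inner L4 header from the start of the packet. */
static inline uint32_t inner_l4_offset(const struct tx_hdr_lens *h)
{
	return (uint32_t)h->outer_l2_len + h->outer_l3_len +
	       h->l4_tun_len + h->l2_len + h->l3_len;
}

/* The alternative layout: no l4_tun_len, tunnel header counted in l2_len. */
static inline struct tx_hdr_lens fold_tunnel_into_l2(struct tx_hdr_lens h)
{
	assert(h.l4_tun_len <= UINT16_MAX - h.l2_len); /* guard against wraparound */
	h.l2_len = (uint16_t)(h.l2_len + h.l4_tun_len);
	h.l4_tun_len = 0;
	return h;
}
```

For a VXLAN packet with Ethernet/IPv4 outer and inner headers, both layouts describe the same inner L4 offset; the separate l4_tun_len only keeps the tunnel header visible as its own field, which is the "a bit cleaner" argument made above.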