DPDK patches and discussions
 help / color / mirror / Atom feed
From: luyicai <luyicai@huawei.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "Zhoujingbin (Robin, Russell Lab)" <zhoujingbin@huawei.com>,
	chenchanghu <chenchanghu@huawei.com>,
	"Lilijun (Jerry)" <jerry.lilijun@huawei.com>,
	Linhaifeng <haifeng.lin@huawei.com>,
	"Guohongzhi (Russell Lab)" <guohongzhi1@huawei.com>,
	wangyunjian <wangyunjian@huawei.com>,
	"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
Date: Wed, 16 Dec 2020 10:54:01 +0000
Message-ID: <c60db85447684a5d94ae6480efd3a0e9@huawei.com> (raw)
In-Reply-To: <BYAPR11MB3301257998873B995830D4259AC50@BYAPR11MB3301.namprd11.prod.outlook.com>



-----Original Message-----
From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com] 
Sent: Wednesday, December 16, 2020 6:45 PM
To: luyicai <luyicai@huawei.com>; dev@dpdk.org
Cc: Zhoujingbin (Robin, Russell Lab) <zhoujingbin@huawei.com>; chenchanghu <chenchanghu@huawei.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; Linhaifeng <haifeng.lin@huawei.com>; Guohongzhi (Russell Lab) <guohongzhi1@huawei.com>; wangyunjian <wangyunjian@huawei.com>; stable@dpdk.org
Subject: RE: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment



> Hi Yicai,
 
> > In some situations, we would get several ip fragments, which total 
> > data length is less than min_ip_len(64) and padding with zeros.
> > We simulated intermediate fragments by modifying the MTU.
> > To illustrate the problem, we simplify the packet format and ignore 
> > the impact of the packet header.In namespace2, a packet whose data 
> > length is 1520 is sent.
> > When the packet passes tap2, the packet is divided into two
> > fragments: fragment A and B, similar to (1520 = 1510 + 10).
> > When the packet passes tap3, the larger fragment packet A is divided 
> > into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> > Finally, the bond interface receives three fragments:
> > A1, A2, and B (1520 = 1500 + 10 + 10).
> > One fragmented packet A2 is smaller than the minimum Ethernet frame 
> > length, so it needs to be padded.
> >
> > |---------------------------------------------------|
> > |                      HOST                         |
> > | |--------------|   |----------------------------| |
> > | |      ns2     |   |      |--------------|      | |
> > | |  |--------|  |   |  |--------|    |--------|  | |
> > | |  |  tap1  |  |   |  |  tap2  | ns1|  tap3  |  | |
> > | |  |mtu=1510|  |   |  |mtu=1510|    |mtu=1500|  | |
> > | |--|1.1.1.1 |--|   |--|1.1.1.2 |----|2.1.1.1 |--| |
> > |    |--------|         |--------|    |--------|    |
> > |         |                 |              |        |
> > |         |-----------------|              |        |
> > |                                          |        |
> > |                                      |--------|   |
> > |                                      |  bond  |   |
> > |--------------------------------------|mtu=1500|---|
> >                                        |--------|
> >
> > When processing the preceding packets above, DPDK would aggregate 
> > fragmented packets A2 and B.
> > And error packets are generated, which padding(zero) is displayed in 
> > the middle of the packet.
> >
> > A2 + B:
> > 0000   fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> > 0010   00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> > 0020   01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> > 0030   00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> > 0040   cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> > 0050   dc dd de df e0 e1 e2 e3 e4 e5 e6
> >
> > So, we would calculate the length of padding, and remove the padding 
> > in pkt_len and data_len before aggregation.
> >
> > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming
> > packet")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Yicai Lu <luyicai@huawei.com>
> > ---
> > v4 -> v5: Update the comments and description.
> > ---
> >  lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > index 1dda8ac..fdf66a4 100644
> > --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > @@ -104,6 +104,7 @@ struct rte_mbuf *
> >  	const unaligned_uint64_t *psd;
> >  	uint16_t flag_offset, ip_ofs, ip_flag;
> >  	int32_t ip_len;
> > +	int32_t trim;
> >
> >  	flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
> >  	ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK); @@
> > -117,14 +118,15 @@ struct rte_mbuf *
> >
> >  	ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
> >  	ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
> > +	trim  = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
> >
> >  	IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> > -		"mbuf: %p, tms: %" PRIu64
> > -		", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
> > +		"mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
> > +		"ofs: %u, len: %d, padding: %d, flags: %#x\n"
> >  		"tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
> >  		"max_entries: %u, use_entries: %u\n\n",
> >  		__func__, __LINE__,
> > -		mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
> > +		mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
> >  		tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
> >  		tbl->use_entries);
> >
> > @@ -134,6 +136,10 @@ struct rte_mbuf *
> >  		return NULL;
> >  	}
> >
> > +	if (unlikely(trim > 0)) {
> > +		rte_pktmbuf_trim(mb, trim);
> > +	}
> 
> > As a nit {} braces are not required for single expression.
> > LGTM in general, just one thing: shouldn't we have the same fix for ipv6 then?
> > Konstantin
> 
> Hi Konstantin,
> 
> Thanks!
> 
> During the problem analysis, we have discussed on ipv6 and concluded 
> that it does not exist in ipv6.
> 
> For ipv6, it consists of the following parts:
> basic header = 40(bytes)
> DMAC = 6(bytes)
> SMAC = 6(bytes)
> Type = 2(bytes)
> CRC = 4(bytes)
> fragment header = 8(bytes)
> ...
> 
> 40 + 6 + 6 + 2 + 4 + 8 = 66 (bytes)
> 
> Total is already greater than min_ip_len(64). So it doesn't need to be 
> padded with zeros.

> For normal cases - yes, but in theory there could be some unusual scenarios (tunnelled packet, different media, etc.).
> So for consistency and to avoid unforeseen issues - I think better to have the fix for both ipv4 and ipv6.
> After all the impact looks neglectable.
> Konstantin 

Hi Konstantin,

Agree!

In terms of code symmetry, it should be better.

Whatever, I'll submit an another patch(v6) later.

> 
> > +
> >  	/* try to find/add entry into the fragment's table. */
> >  	if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
> >  		IP_FRAG_MBUF2DR(dr, mb);
> > --
> > 1.9.5.msysgit.1


  reply	other threads:[~2020-12-16 10:54 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-18 13:29 [dpdk-dev] [PATCH] ip_frag: recalculate data " Yicai Lu
2020-11-22 11:22 ` Thomas Monjalon
2020-11-23  2:27 ` [dpdk-dev] [PATCH v2] " Yicai Lu
2020-12-12 11:05 ` [dpdk-dev] [PATCH v5] ip_frag: remove padding " Yicai Lu
2020-12-14 14:44   ` Ananyev, Konstantin
2020-12-15  3:18     ` luyicai
2020-12-16 10:44       ` Ananyev, Konstantin
2020-12-16 10:54         ` luyicai [this message]
2020-12-16 13:36 ` [dpdk-dev] [PATCH v6] " Yicai Lu
2020-12-18 11:41   ` Ananyev, Konstantin
2021-01-15 10:29     ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c60db85447684a5d94ae6480efd3a0e9@huawei.com \
    --to=luyicai@huawei.com \
    --cc=chenchanghu@huawei.com \
    --cc=dev@dpdk.org \
    --cc=guohongzhi1@huawei.com \
    --cc=haifeng.lin@huawei.com \
    --cc=jerry.lilijun@huawei.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=stable@dpdk.org \
    --cc=wangyunjian@huawei.com \
    --cc=zhoujingbin@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git