* [dpdk-dev] [PATCH] ip_frag: recalculate data length of fragment
@ 2020-11-18 13:29 Yicai Lu
2020-11-22 11:22 ` Thomas Monjalon
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Yicai Lu @ 2020-11-18 13:29 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, zhoujingbin, chenchanghu, jerry.lilijun,
haifeng.lin, guohongzhi1, wangyunjian, Yicai Lu
In some situations, we would get several ip fragments, which total
data length is less than minimum frame(64) and padding with zeros.
Examples: Second Fragment is "a0a1 a2a3 a4a5 a6a7 0000 0000 ..."
and Third Fragment is "a8a9 aaab acad aeaf b0b1 b2b3 ...".
And then, we would reassemble Second and Third Fragment like this
"a0a1 a2a3 a4a5 a6a7 0000 0000 ... a8a9 aaab acad aeaf b0b1 ...",
which is not correct! So, we need recalculate the data length
of fragment to remove invalid padings(zero)!
Signed-off-by: Yicai Lu <luyicai@huawei.com>
---
lib/librte_ip_frag/ip_frag_internal.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/librte_ip_frag/ip_frag_internal.c b/lib/librte_ip_frag/ip_frag_internal.c
index b436a4c93..7bdd98090 100644
--- a/lib/librte_ip_frag/ip_frag_internal.c
+++ b/lib/librte_ip_frag/ip_frag_internal.c
@@ -155,6 +155,7 @@ ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
fp->frags[idx].ofs = ofs;
fp->frags[idx].len = len;
+ mb->data_len = mb->l2_len + mb->l3_len + len;
fp->frags[idx].mb = mb;
mb = NULL;
--
2.28.0.windows.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [dpdk-dev] [PATCH] ip_frag: recalculate data length of fragment
2020-11-18 13:29 [dpdk-dev] [PATCH] ip_frag: recalculate data length of fragment Yicai Lu
@ 2020-11-22 11:22 ` Thomas Monjalon
2020-11-23 2:27 ` [dpdk-dev] [PATCH v2] " Yicai Lu
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Thomas Monjalon @ 2020-11-22 11:22 UTC (permalink / raw)
To: Yicai Lu
Cc: dev, konstantin.ananyev, zhoujingbin, chenchanghu, jerry.lilijun,
haifeng.lin, guohongzhi1, wangyunjian
18/11/2020 14:29, Yicai Lu:
> In some situations, we would get several ip fragments, which total
> data length is less than minimum frame(64) and padding with zeros.
> Examples: Second Fragment is "a0a1 a2a3 a4a5 a6a7 0000 0000 ..."
> and Third Fragment is "a8a9 aaab acad aeaf b0b1 b2b3 ...".
> And then, we would reassemble Second and Third Fragment like this
> "a0a1 a2a3 a4a5 a6a7 0000 0000 ... a8a9 aaab acad aeaf b0b1 ...",
> which is not correct! So, we need recalculate the data length
> of fragment to remove invalid padings(zero)!
>
> Signed-off-by: Yicai Lu <luyicai@huawei.com>
Please add a line "Fixes" to identify the root cause,
it will help to know where to backport.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [dpdk-dev] [PATCH v2] ip_frag: recalculate data length of fragment
2020-11-18 13:29 [dpdk-dev] [PATCH] ip_frag: recalculate data length of fragment Yicai Lu
2020-11-22 11:22 ` Thomas Monjalon
@ 2020-11-23 2:27 ` Yicai Lu
2020-12-12 11:05 ` [dpdk-dev] [PATCH v5] ip_frag: remove padding " Yicai Lu
2020-12-16 13:36 ` [dpdk-dev] [PATCH v6] " Yicai Lu
3 siblings, 0 replies; 11+ messages in thread
From: Yicai Lu @ 2020-11-23 2:27 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, zhoujingbin, chenchanghu, jerry.lilijun,
haifeng.lin, guohongzhi1, wangyunjian, Yicai Lu, anatoly.burakov
In some situations, we would get several ip fragments, which total
data length is less than minimum frame(64) and padding with zeros.
Examples: Second Fragment "a0a1 a2a3 a4a5 a6a7 0000 0000 ..."
and Third Fragment "a8a9 aaab acad aeaf b0b1 b2b3 ...".
Finally, we would reassemble Second and Third Fragment like this
"a0a1 a2a3 a4a5 a6a7 0000 0000 ... a8a9 aaab acad aeaf b0b1 ...",
which is not correct!
So, we need recalculate data length of fragment to remove padings!
Fixes: 416707812c03 ("ip_frag: refactor reassembly code into a proper library")
Cc: anatoly.burakov@intel.com
Signed-off-by: Yicai Lu <luyicai@huawei.com>
---
v1 -> v2: Adjust commit info style
---
lib/librte_ip_frag/ip_frag_internal.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/librte_ip_frag/ip_frag_internal.c b/lib/librte_ip_frag/ip_frag_internal.c
index b436a4c..7bdd980 100644
--- a/lib/librte_ip_frag/ip_frag_internal.c
+++ b/lib/librte_ip_frag/ip_frag_internal.c
@@ -155,6 +155,7 @@ struct rte_mbuf *
fp->frags[idx].ofs = ofs;
fp->frags[idx].len = len;
+ mb->data_len = mb->l2_len + mb->l3_len + len;
fp->frags[idx].mb = mb;
mb = NULL;
--
1.9.5.msysgit.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
2020-11-18 13:29 [dpdk-dev] [PATCH] ip_frag: recalculate data length of fragment Yicai Lu
2020-11-22 11:22 ` Thomas Monjalon
2020-11-23 2:27 ` [dpdk-dev] [PATCH v2] " Yicai Lu
@ 2020-12-12 11:05 ` Yicai Lu
2020-12-14 14:44 ` Ananyev, Konstantin
2020-12-16 13:36 ` [dpdk-dev] [PATCH v6] " Yicai Lu
3 siblings, 1 reply; 11+ messages in thread
From: Yicai Lu @ 2020-12-12 11:05 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, zhoujingbin, chenchanghu, jerry.lilijun,
haifeng.lin, guohongzhi1, wangyunjian, Yicai Lu, stable
In some situations, we would get several ip fragments, which total
data length is less than min_ip_len(64) and padding with zeros.
We simulated intermediate fragments by modifying the MTU.
To illustrate the problem, we simplify the packet format and
ignore the impact of the packet header.In namespace2,
a packet whose data length is 1520 is sent.
When the packet passes tap2, the packet is divided into two
fragments: fragment A and B, similar to (1520 = 1510 + 10).
When the packet passes tap3, the larger fragment packet A is
divided into two fragments A1 and A2, similar to (1510 = 1500 + 10).
Finally, the bond interface receives three fragments:
A1, A2, and B (1520 = 1500 + 10 + 10).
One fragmented packet A2 is smaller than the minimum Ethernet
frame length, so it needs to be padded.
|---------------------------------------------------|
| HOST |
| |--------------| |----------------------------| |
| | ns2 | | |--------------| | |
| | |--------| | | |--------| |--------| | |
| | | tap1 | | | | tap2 | ns1| tap3 | | |
| | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
| |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
| |--------| |--------| |--------| |
| | | | |
| |-----------------| | |
| | |
| |--------| |
| | bond | |
|--------------------------------------|mtu=1500|---|
|--------|
When processing the preceding packets above,
DPDK would aggregate fragmented packets A2 and B.
And error packets are generated, which padding(zero)
is displayed in the middle of the packet.
A2 + B:
0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
So, we would calculate the length of padding, and remove
the padding in pkt_len and data_len before aggregation.
Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming packet")
Cc: stable@dpdk.org
Signed-off-by: Yicai Lu <luyicai@huawei.com>
---
v4 -> v5: Update the comments and description.
---
lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c b/lib/librte_ip_frag/rte_ipv4_reassembly.c
index 1dda8ac..fdf66a4 100644
--- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
+++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
@@ -104,6 +104,7 @@ struct rte_mbuf *
const unaligned_uint64_t *psd;
uint16_t flag_offset, ip_ofs, ip_flag;
int32_t ip_len;
+ int32_t trim;
flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
@@ -117,14 +118,15 @@ struct rte_mbuf *
ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
+ trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
IP_FRAG_LOG(DEBUG, "%s:%d:\n"
- "mbuf: %p, tms: %" PRIu64
- ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
+ "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
+ "ofs: %u, len: %d, padding: %d, flags: %#x\n"
"tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
"max_entries: %u, use_entries: %u\n\n",
__func__, __LINE__,
- mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
+ mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
tbl->use_entries);
@@ -134,6 +136,10 @@ struct rte_mbuf *
return NULL;
}
+ if (unlikely(trim > 0)) {
+ rte_pktmbuf_trim(mb, trim);
+ }
+
/* try to find/add entry into the fragment's table. */
if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
IP_FRAG_MBUF2DR(dr, mb);
--
1.9.5.msysgit.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
2020-12-12 11:05 ` [dpdk-dev] [PATCH v5] ip_frag: remove padding " Yicai Lu
@ 2020-12-14 14:44 ` Ananyev, Konstantin
2020-12-15 3:18 ` luyicai
0 siblings, 1 reply; 11+ messages in thread
From: Ananyev, Konstantin @ 2020-12-14 14:44 UTC (permalink / raw)
To: Yicai Lu, dev
Cc: zhoujingbin, chenchanghu, jerry.lilijun, haifeng.lin,
guohongzhi1, wangyunjian, stable
Hi Yicai,
> In some situations, we would get several ip fragments, which total
> data length is less than min_ip_len(64) and padding with zeros.
> We simulated intermediate fragments by modifying the MTU.
> To illustrate the problem, we simplify the packet format and
> ignore the impact of the packet header.In namespace2,
> a packet whose data length is 1520 is sent.
> When the packet passes tap2, the packet is divided into two
> fragments: fragment A and B, similar to (1520 = 1510 + 10).
> When the packet passes tap3, the larger fragment packet A is
> divided into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> Finally, the bond interface receives three fragments:
> A1, A2, and B (1520 = 1500 + 10 + 10).
> One fragmented packet A2 is smaller than the minimum Ethernet
> frame length, so it needs to be padded.
>
> |---------------------------------------------------|
> | HOST |
> | |--------------| |----------------------------| |
> | | ns2 | | |--------------| | |
> | | |--------| | | |--------| |--------| | |
> | | | tap1 | | | | tap2 | ns1| tap3 | | |
> | | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
> | |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
> | |--------| |--------| |--------| |
> | | | | |
> | |-----------------| | |
> | | |
> | |--------| |
> | | bond | |
> |--------------------------------------|mtu=1500|---|
> |--------|
>
> When processing the preceding packets above,
> DPDK would aggregate fragmented packets A2 and B.
> And error packets are generated, which padding(zero)
> is displayed in the middle of the packet.
>
> A2 + B:
> 0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
>
> So, we would calculate the length of padding, and remove
> the padding in pkt_len and data_len before aggregation.
>
> Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming packet")
> Cc: stable@dpdk.org
>
> Signed-off-by: Yicai Lu <luyicai@huawei.com>
> ---
> v4 -> v5: Update the comments and description.
> ---
> lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> index 1dda8ac..fdf66a4 100644
> --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> @@ -104,6 +104,7 @@ struct rte_mbuf *
> const unaligned_uint64_t *psd;
> uint16_t flag_offset, ip_ofs, ip_flag;
> int32_t ip_len;
> + int32_t trim;
>
> flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
> ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
> @@ -117,14 +118,15 @@ struct rte_mbuf *
>
> ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
> ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
> + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
>
> IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> - "mbuf: %p, tms: %" PRIu64
> - ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
> + "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
> + "ofs: %u, len: %d, padding: %d, flags: %#x\n"
> "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
> "max_entries: %u, use_entries: %u\n\n",
> __func__, __LINE__,
> - mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
> + mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
> tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
> tbl->use_entries);
>
> @@ -134,6 +136,10 @@ struct rte_mbuf *
> return NULL;
> }
>
> + if (unlikely(trim > 0)) {
> + rte_pktmbuf_trim(mb, trim);
> + }
As a nit {} braces are not required for single expression.
LGTM in general, just one thing: shouldn't we have the same fix for ipv6 then?
Konstantin
> +
> /* try to find/add entry into the fragment's table. */
> if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
> IP_FRAG_MBUF2DR(dr, mb);
> --
> 1.9.5.msysgit.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
2020-12-14 14:44 ` Ananyev, Konstantin
@ 2020-12-15 3:18 ` luyicai
2020-12-16 10:44 ` Ananyev, Konstantin
0 siblings, 1 reply; 11+ messages in thread
From: luyicai @ 2020-12-15 3:18 UTC (permalink / raw)
To: Ananyev, Konstantin, dev
Cc: Zhoujingbin (Robin, Russell Lab), chenchanghu, Lilijun (Jerry),
Linhaifeng, Guohongzhi (Russell Lab),
wangyunjian, stable
> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Monday, December 14, 2020 10:45 PM
> To: luyicai <luyicai@huawei.com>; dev@dpdk.org
> Cc: Zhoujingbin (Robin, Russell Lab) <zhoujingbin@huawei.com>; chenchanghu <chenchanghu@huawei.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; Linhaifeng <haifeng.lin@huawei.com>; Guohongzhi (Russell Lab) <guohongzhi1@huawei.com>; wangyunjian <wangyunjian@huawei.com>; stable@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
> Hi Yicai,
> In some situations, we would get several ip fragments, which total
> data length is less than min_ip_len(64) and padding with zeros.
> We simulated intermediate fragments by modifying the MTU.
> To illustrate the problem, we simplify the packet format and ignore
> the impact of the packet header.In namespace2, a packet whose data
> length is 1520 is sent.
> When the packet passes tap2, the packet is divided into two
> fragments: fragment A and B, similar to (1520 = 1510 + 10).
> When the packet passes tap3, the larger fragment packet A is divided
> into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> Finally, the bond interface receives three fragments:
> A1, A2, and B (1520 = 1500 + 10 + 10).
> One fragmented packet A2 is smaller than the minimum Ethernet frame
> length, so it needs to be padded.
>
> |---------------------------------------------------|
> | HOST |
> | |--------------| |----------------------------| |
> | | ns2 | | |--------------| | |
> | | |--------| | | |--------| |--------| | |
> | | | tap1 | | | | tap2 | ns1| tap3 | | |
> | | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
> | |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
> | |--------| |--------| |--------| |
> | | | | |
> | |-----------------| | |
> | | |
> | |--------| |
> | | bond | |
> |--------------------------------------|mtu=1500|---|
> |--------|
>
> When processing the preceding packets above, DPDK would aggregate
> fragmented packets A2 and B.
> And error packets are generated, which padding(zero) is displayed in
> the middle of the packet.
>
> A2 + B:
> 0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
>
> So, we would calculate the length of padding, and remove the padding
> in pkt_len and data_len before aggregation.
>
> Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming
> packet")
> Cc: stable@dpdk.org
>
> Signed-off-by: Yicai Lu <luyicai@huawei.com>
> ---
> v4 -> v5: Update the comments and description.
> ---
> lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> index 1dda8ac..fdf66a4 100644
> --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> @@ -104,6 +104,7 @@ struct rte_mbuf *
> const unaligned_uint64_t *psd;
> uint16_t flag_offset, ip_ofs, ip_flag;
> int32_t ip_len;
> + int32_t trim;
>
> flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
> ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK); @@
> -117,14 +118,15 @@ struct rte_mbuf *
>
> ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
> ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
> + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
>
> IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> - "mbuf: %p, tms: %" PRIu64
> - ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
> + "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
> + "ofs: %u, len: %d, padding: %d, flags: %#x\n"
> "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
> "max_entries: %u, use_entries: %u\n\n",
> __func__, __LINE__,
> - mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
> + mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
> tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
> tbl->use_entries);
>
> @@ -134,6 +136,10 @@ struct rte_mbuf *
> return NULL;
> }
>
> + if (unlikely(trim > 0)) {
> + rte_pktmbuf_trim(mb, trim);
> + }
> As a nit {} braces are not required for single expression.
> LGTM in general, just one thing: shouldn't we have the same fix for ipv6 then?
> Konstantin
Hi Konstantin,
Thanks!
During the problem analysis, we have discussed on ipv6
and concluded that it does not exist in ipv6.
For ipv6, it consists of the following parts:
basic header = 40(bytes)
DMAC = 6(bytes)
SMAC = 6(bytes)
Type = 2(bytes)
CRC = 4(bytes)
fragment header = 8(bytes)
...
40 + 6 + 6 + 2 + 4 + 8 = 66 (bytes)
Total is already greater than min_ip_len(64). So it doesn't
need to be padded with zeros.
> +
> /* try to find/add entry into the fragment's table. */
> if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
> IP_FRAG_MBUF2DR(dr, mb);
> --
> 1.9.5.msysgit.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
2020-12-15 3:18 ` luyicai
@ 2020-12-16 10:44 ` Ananyev, Konstantin
2020-12-16 10:54 ` luyicai
0 siblings, 1 reply; 11+ messages in thread
From: Ananyev, Konstantin @ 2020-12-16 10:44 UTC (permalink / raw)
To: luyicai, dev
Cc: Zhoujingbin (Robin, Russell Lab), chenchanghu, Lilijun (Jerry),
Linhaifeng, Guohongzhi (Russell Lab),
wangyunjian, stable
Hi Yicai,
> > In some situations, we would get several ip fragments, which total
> > data length is less than min_ip_len(64) and padding with zeros.
> > We simulated intermediate fragments by modifying the MTU.
> > To illustrate the problem, we simplify the packet format and ignore
> > the impact of the packet header.In namespace2, a packet whose data
> > length is 1520 is sent.
> > When the packet passes tap2, the packet is divided into two
> > fragments: fragment A and B, similar to (1520 = 1510 + 10).
> > When the packet passes tap3, the larger fragment packet A is divided
> > into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> > Finally, the bond interface receives three fragments:
> > A1, A2, and B (1520 = 1500 + 10 + 10).
> > One fragmented packet A2 is smaller than the minimum Ethernet frame
> > length, so it needs to be padded.
> >
> > |---------------------------------------------------|
> > | HOST |
> > | |--------------| |----------------------------| |
> > | | ns2 | | |--------------| | |
> > | | |--------| | | |--------| |--------| | |
> > | | | tap1 | | | | tap2 | ns1| tap3 | | |
> > | | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
> > | |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
> > | |--------| |--------| |--------| |
> > | | | | |
> > | |-----------------| | |
> > | | |
> > | |--------| |
> > | | bond | |
> > |--------------------------------------|mtu=1500|---|
> > |--------|
> >
> > When processing the preceding packets above, DPDK would aggregate
> > fragmented packets A2 and B.
> > And error packets are generated, which padding(zero) is displayed in
> > the middle of the packet.
> >
> > A2 + B:
> > 0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> > 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> > 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> > 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> > 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> > 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
> >
> > So, we would calculate the length of padding, and remove the padding
> > in pkt_len and data_len before aggregation.
> >
> > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming
> > packet")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Yicai Lu <luyicai@huawei.com>
> > ---
> > v4 -> v5: Update the comments and description.
> > ---
> > lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > index 1dda8ac..fdf66a4 100644
> > --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > @@ -104,6 +104,7 @@ struct rte_mbuf *
> > const unaligned_uint64_t *psd;
> > uint16_t flag_offset, ip_ofs, ip_flag;
> > int32_t ip_len;
> > + int32_t trim;
> >
> > flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
> > ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK); @@
> > -117,14 +118,15 @@ struct rte_mbuf *
> >
> > ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
> > ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
> > + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
> >
> > IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> > - "mbuf: %p, tms: %" PRIu64
> > - ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
> > + "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
> > + "ofs: %u, len: %d, padding: %d, flags: %#x\n"
> > "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
> > "max_entries: %u, use_entries: %u\n\n",
> > __func__, __LINE__,
> > - mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
> > + mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
> > tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
> > tbl->use_entries);
> >
> > @@ -134,6 +136,10 @@ struct rte_mbuf *
> > return NULL;
> > }
> >
> > + if (unlikely(trim > 0)) {
> > + rte_pktmbuf_trim(mb, trim);
> > + }
>
> > As a nit {} braces are not required for single expression.
> > LGTM in general, just one thing: shouldn't we have the same fix for ipv6 then?
> > Konstantin
>
> Hi Konstantin,
>
> Thanks!
>
> During the problem analysis, we have discussed on ipv6
> and concluded that it does not exist in ipv6.
>
> For ipv6, it consists of the following parts:
> basic header = 40(bytes)
> DMAC = 6(bytes)
> SMAC = 6(bytes)
> Type = 2(bytes)
> CRC = 4(bytes)
> fragment header = 8(bytes)
> ...
>
> 40 + 6 + 6 + 2 + 4 + 8 = 66 (bytes)
>
> Total is already greater than min_ip_len(64). So it doesn't
> need to be padded with zeros.
For normal cases - yes, but in theory there could be some unusual scenarios
(tunnelled packet, different media, etc.).
So for consistency and to avoid unforeseen issues - I think better to have
the fix for both ipv4 and ipv6.
After all the impact looks neglectable.
Konstantin
>
> > +
> > /* try to find/add entry into the fragment's table. */
> > if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
> > IP_FRAG_MBUF2DR(dr, mb);
> > --
> > 1.9.5.msysgit.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
2020-12-16 10:44 ` Ananyev, Konstantin
@ 2020-12-16 10:54 ` luyicai
0 siblings, 0 replies; 11+ messages in thread
From: luyicai @ 2020-12-16 10:54 UTC (permalink / raw)
To: Ananyev, Konstantin, dev
Cc: Zhoujingbin (Robin, Russell Lab), chenchanghu, Lilijun (Jerry),
Linhaifeng, Guohongzhi (Russell Lab),
wangyunjian, stable
-----Original Message-----
From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
Sent: Wednesday, December 16, 2020 6:45 PM
To: luyicai <luyicai@huawei.com>; dev@dpdk.org
Cc: Zhoujingbin (Robin, Russell Lab) <zhoujingbin@huawei.com>; chenchanghu <chenchanghu@huawei.com>; Lilijun (Jerry) <jerry.lilijun@huawei.com>; Linhaifeng <haifeng.lin@huawei.com>; Guohongzhi (Russell Lab) <guohongzhi1@huawei.com>; wangyunjian <wangyunjian@huawei.com>; stable@dpdk.org
Subject: RE: [dpdk-dev] [PATCH v5] ip_frag: remove padding length of fragment
> Hi Yicai,
> > In some situations, we would get several ip fragments, which total
> > data length is less than min_ip_len(64) and padding with zeros.
> > We simulated intermediate fragments by modifying the MTU.
> > To illustrate the problem, we simplify the packet format and ignore
> > the impact of the packet header.In namespace2, a packet whose data
> > length is 1520 is sent.
> > When the packet passes tap2, the packet is divided into two
> > fragments: fragment A and B, similar to (1520 = 1510 + 10).
> > When the packet passes tap3, the larger fragment packet A is divided
> > into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> > Finally, the bond interface receives three fragments:
> > A1, A2, and B (1520 = 1500 + 10 + 10).
> > One fragmented packet A2 is smaller than the minimum Ethernet frame
> > length, so it needs to be padded.
> >
> > |---------------------------------------------------|
> > | HOST |
> > | |--------------| |----------------------------| |
> > | | ns2 | | |--------------| | |
> > | | |--------| | | |--------| |--------| | |
> > | | | tap1 | | | | tap2 | ns1| tap3 | | |
> > | | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
> > | |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
> > | |--------| |--------| |--------| |
> > | | | | |
> > | |-----------------| | |
> > | | |
> > | |--------| |
> > | | bond | |
> > |--------------------------------------|mtu=1500|---|
> > |--------|
> >
> > When processing the preceding packets above, DPDK would aggregate
> > fragmented packets A2 and B.
> > And error packets are generated, which padding(zero) is displayed in
> > the middle of the packet.
> >
> > A2 + B:
> > 0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> > 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> > 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> > 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> > 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> > 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
> >
> > So, we would calculate the length of padding, and remove the padding
> > in pkt_len and data_len before aggregation.
> >
> > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming
> > packet")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Yicai Lu <luyicai@huawei.com>
> > ---
> > v4 -> v5: Update the comments and description.
> > ---
> > lib/librte_ip_frag/rte_ipv4_reassembly.c | 12 +++++++++---
> > 1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > index 1dda8ac..fdf66a4 100644
> > --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> > @@ -104,6 +104,7 @@ struct rte_mbuf *
> > const unaligned_uint64_t *psd;
> > uint16_t flag_offset, ip_ofs, ip_flag;
> > int32_t ip_len;
> > + int32_t trim;
> >
> > flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
> > ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK); @@
> > -117,14 +118,15 @@ struct rte_mbuf *
> >
> > ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
> > ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
> > + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
> >
> > IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> > - "mbuf: %p, tms: %" PRIu64
> > - ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
> > + "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
> > + "ofs: %u, len: %d, padding: %d, flags: %#x\n"
> > "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
> > "max_entries: %u, use_entries: %u\n\n",
> > __func__, __LINE__,
> > - mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
> > + mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
> > tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
> > tbl->use_entries);
> >
> > @@ -134,6 +136,10 @@ struct rte_mbuf *
> > return NULL;
> > }
> >
> > + if (unlikely(trim > 0)) {
> > + rte_pktmbuf_trim(mb, trim);
> > + }
>
> > As a nit {} braces are not required for single expression.
> > LGTM in general, just one thing: shouldn't we have the same fix for ipv6 then?
> > Konstantin
>
> Hi Konstantin,
>
> Thanks!
>
> During the problem analysis, we have discussed on ipv6 and concluded
> that it does not exist in ipv6.
>
> For ipv6, it consists of the following parts:
> basic header = 40(bytes)
> DMAC = 6(bytes)
> SMAC = 6(bytes)
> Type = 2(bytes)
> CRC = 4(bytes)
> fragment header = 8(bytes)
> ...
>
> 40 + 6 + 6 + 2 + 4 + 8 = 66 (bytes)
>
> Total is already greater than min_ip_len(64). So it doesn't need to be
> padded with zeros.
> For normal cases - yes, but in theory there could be some unusual scenarios (tunnelled packet, different media, etc.).
> So for consistency and to avoid unforeseen issues - I think better to have the fix for both ipv4 and ipv6.
> After all the impact looks neglectable.
> Konstantin
Hi Konstantin,
Agree!
In terms of code symmetry, it should be better.
Whatever, I'll submit an another patch(v6) later.
>
> > +
> > /* try to find/add entry into the fragment's table. */
> > if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
> > IP_FRAG_MBUF2DR(dr, mb);
> > --
> > 1.9.5.msysgit.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [dpdk-dev] [PATCH v6] ip_frag: remove padding length of fragment
2020-11-18 13:29 [dpdk-dev] [PATCH] ip_frag: recalculate data length of fragment Yicai Lu
` (2 preceding siblings ...)
2020-12-12 11:05 ` [dpdk-dev] [PATCH v5] ip_frag: remove padding " Yicai Lu
@ 2020-12-16 13:36 ` Yicai Lu
2020-12-18 11:41 ` Ananyev, Konstantin
3 siblings, 1 reply; 11+ messages in thread
From: Yicai Lu @ 2020-12-16 13:36 UTC (permalink / raw)
To: dev
Cc: konstantin.ananyev, zhoujingbin, chenchanghu, jerry.lilijun,
haifeng.lin, guohongzhi1, wangyunjian, Yicai Lu, stable
In some situations, we would get several ip fragments, which total
data length is less than min_ip_len(64) and padding with zeros.
We simulated intermediate fragments by modifying the MTU.
To illustrate the problem, we simplify the packet format and
ignore the impact of the packet header.In namespace2,
a packet whose data length is 1520 is sent.
When the packet passes tap2, the packet is divided into two
fragments: fragment A and B, similar to (1520 = 1510 + 10).
When the packet passes tap3, the larger fragment packet A is
divided into two fragments A1 and A2, similar to (1510 = 1500 + 10).
Finally, the bond interface receives three fragments:
A1, A2, and B (1520 = 1500 + 10 + 10).
One fragmented packet A2 is smaller than the minimum Ethernet
frame length, so it needs to be padded.
|---------------------------------------------------|
| HOST |
| |--------------| |----------------------------| |
| | ns2 | | |--------------| | |
| | |--------| | | |--------| |--------| | |
| | | tap1 | | | | tap2 | ns1| tap3 | | |
| | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
| |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
| |--------| |--------| |--------| |
| | | | |
| |-----------------| | |
| | |
| |--------| |
| | bond | |
|--------------------------------------|mtu=1500|---|
|--------|
When processing the preceding packets above,
DPDK would aggregate fragmented packets A2 and B.
And error packets are generated, which padding(zero)
is displayed in the middle of the packet.
A2 + B:
0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
So, we would calculate the length of padding, and remove
the padding in pkt_len and data_len before aggregation.
And also we have the fix for both ipv4 and ipv6.
Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming packet")
Cc: stable@dpdk.org
Signed-off-by: Yicai Lu <luyicai@huawei.com>
---
v5 -> v6: Have the fix for ipv6.
---
lib/librte_ip_frag/rte_ipv4_reassembly.c | 11 ++++++++---
lib/librte_ip_frag/rte_ipv6_reassembly.c | 9 +++++++--
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c b/lib/librte_ip_frag/rte_ipv4_reassembly.c
index 1dda8ac..69666c8 100644
--- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
+++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
@@ -104,6 +104,7 @@ struct rte_mbuf *
const unaligned_uint64_t *psd;
uint16_t flag_offset, ip_ofs, ip_flag;
int32_t ip_len;
+ int32_t trim;
flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
@@ -117,14 +118,15 @@ struct rte_mbuf *
ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
+ trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
IP_FRAG_LOG(DEBUG, "%s:%d:\n"
- "mbuf: %p, tms: %" PRIu64
- ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
+ "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
+ "ofs: %u, len: %d, padding: %d, flags: %#x\n"
"tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
"max_entries: %u, use_entries: %u\n\n",
__func__, __LINE__,
- mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
+ mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
tbl->use_entries);
@@ -134,6 +136,9 @@ struct rte_mbuf *
return NULL;
}
+ if (unlikely(trim > 0))
+ rte_pktmbuf_trim(mb, trim);
+
/* try to find/add entry into the fragment's table. */
if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
IP_FRAG_MBUF2DR(dr, mb);
diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c b/lib/librte_ip_frag/rte_ipv6_reassembly.c
index ad01055..4037da0 100644
--- a/lib/librte_ip_frag/rte_ipv6_reassembly.c
+++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c
@@ -142,6 +142,7 @@ struct rte_mbuf *
struct ip_frag_key key;
uint16_t ip_ofs;
int32_t ip_len;
+ int32_t trim;
rte_memcpy(&key.src_dst[0], ip_hdr->src_addr, 16);
rte_memcpy(&key.src_dst[2], ip_hdr->dst_addr, 16);
@@ -158,16 +159,17 @@ struct rte_mbuf *
* this is what we remove from the payload len.
*/
ip_len = rte_be_to_cpu_16(ip_hdr->payload_len) - sizeof(*frag_hdr);
+ trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
IP_FRAG_LOG(DEBUG, "%s:%d:\n"
"mbuf: %p, tms: %" PRIu64
", key: <" IPv6_KEY_BYTES_FMT ", %#x>, "
- "ofs: %u, len: %d, flags: %#x\n"
+ "ofs: %u, len: %d, padding: %d, flags: %#x\n"
"tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
"max_entries: %u, use_entries: %u\n\n",
__func__, __LINE__,
mb, tms, IPv6_KEY_BYTES(key.src_dst), key.id, ip_ofs, ip_len,
- RTE_IPV6_GET_MF(frag_hdr->frag_data),
+ trim, RTE_IPV6_GET_MF(frag_hdr->frag_data),
tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
tbl->use_entries);
@@ -177,6 +179,9 @@ struct rte_mbuf *
return NULL;
}
+ if (unlikely(trim > 0))
+ rte_pktmbuf_trim(mb, trim);
+
/* try to find/add entry into the fragment's table. */
fp = ip_frag_find(tbl, dr, &key, tms);
if (fp == NULL) {
--
1.9.5.msysgit.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [dpdk-dev] [PATCH v6] ip_frag: remove padding length of fragment
2020-12-16 13:36 ` [dpdk-dev] [PATCH v6] " Yicai Lu
@ 2020-12-18 11:41 ` Ananyev, Konstantin
2021-01-15 10:29 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
0 siblings, 1 reply; 11+ messages in thread
From: Ananyev, Konstantin @ 2020-12-18 11:41 UTC (permalink / raw)
To: Yicai Lu, dev
Cc: zhoujingbin, chenchanghu, jerry.lilijun, haifeng.lin,
guohongzhi1, wangyunjian, stable
>
> In some situations, we would get several ip fragments, which total
> data length is less than min_ip_len(64) and padding with zeros.
> We simulated intermediate fragments by modifying the MTU.
> To illustrate the problem, we simplify the packet format and
> ignore the impact of the packet header.In namespace2,
> a packet whose data length is 1520 is sent.
> When the packet passes tap2, the packet is divided into two
> fragments: fragment A and B, similar to (1520 = 1510 + 10).
> When the packet passes tap3, the larger fragment packet A is
> divided into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> Finally, the bond interface receives three fragments:
> A1, A2, and B (1520 = 1500 + 10 + 10).
> One fragmented packet A2 is smaller than the minimum Ethernet
> frame length, so it needs to be padded.
>
> |---------------------------------------------------|
> | HOST |
> | |--------------| |----------------------------| |
> | | ns2 | | |--------------| | |
> | | |--------| | | |--------| |--------| | |
> | | | tap1 | | | | tap2 | ns1| tap3 | | |
> | | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
> | |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
> | |--------| |--------| |--------| |
> | | | | |
> | |-----------------| | |
> | | |
> | |--------| |
> | | bond | |
> |--------------------------------------|mtu=1500|---|
> |--------|
>
> When processing the preceding packets above,
> DPDK would aggregate fragmented packets A2 and B.
> And error packets are generated, which padding(zero)
> is displayed in the middle of the packet.
>
> A2 + B:
> 0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
>
> So, we would calculate the length of padding, and remove
> the padding in pkt_len and data_len before aggregation.
> And also we have the fix for both ipv4 and ipv6.
>
> Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming packet")
> Cc: stable@dpdk.org
>
> Signed-off-by: Yicai Lu <luyicai@huawei.com>
> ---
> v5 -> v6: Have the fix for ipv6.
> ---
> lib/librte_ip_frag/rte_ipv4_reassembly.c | 11 ++++++++---
> lib/librte_ip_frag/rte_ipv6_reassembly.c | 9 +++++++--
> 2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> index 1dda8ac..69666c8 100644
> --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c
> @@ -104,6 +104,7 @@ struct rte_mbuf *
> const unaligned_uint64_t *psd;
> uint16_t flag_offset, ip_ofs, ip_flag;
> int32_t ip_len;
> + int32_t trim;
>
> flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset);
> ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
> @@ -117,14 +118,15 @@ struct rte_mbuf *
>
> ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
> ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len;
> + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
>
> IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> - "mbuf: %p, tms: %" PRIu64
> - ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n"
> + "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>"
> + "ofs: %u, len: %d, padding: %d, flags: %#x\n"
> "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
> "max_entries: %u, use_entries: %u\n\n",
> __func__, __LINE__,
> - mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag,
> + mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag,
> tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
> tbl->use_entries);
>
> @@ -134,6 +136,9 @@ struct rte_mbuf *
> return NULL;
> }
>
> + if (unlikely(trim > 0))
> + rte_pktmbuf_trim(mb, trim);
> +
> /* try to find/add entry into the fragment's table. */
> if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) {
> IP_FRAG_MBUF2DR(dr, mb);
> diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> index ad01055..4037da0 100644
> --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> @@ -142,6 +142,7 @@ struct rte_mbuf *
> struct ip_frag_key key;
> uint16_t ip_ofs;
> int32_t ip_len;
> + int32_t trim;
>
> rte_memcpy(&key.src_dst[0], ip_hdr->src_addr, 16);
> rte_memcpy(&key.src_dst[2], ip_hdr->dst_addr, 16);
> @@ -158,16 +159,17 @@ struct rte_mbuf *
> * this is what we remove from the payload len.
> */
> ip_len = rte_be_to_cpu_16(ip_hdr->payload_len) - sizeof(*frag_hdr);
> + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len);
>
> IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> "mbuf: %p, tms: %" PRIu64
> ", key: <" IPv6_KEY_BYTES_FMT ", %#x>, "
> - "ofs: %u, len: %d, flags: %#x\n"
> + "ofs: %u, len: %d, padding: %d, flags: %#x\n"
> "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, "
> "max_entries: %u, use_entries: %u\n\n",
> __func__, __LINE__,
> mb, tms, IPv6_KEY_BYTES(key.src_dst), key.id, ip_ofs, ip_len,
> - RTE_IPV6_GET_MF(frag_hdr->frag_data),
> + trim, RTE_IPV6_GET_MF(frag_hdr->frag_data),
> tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
> tbl->use_entries);
>
> @@ -177,6 +179,9 @@ struct rte_mbuf *
> return NULL;
> }
>
> + if (unlikely(trim > 0))
> + rte_pktmbuf_trim(mb, trim);
> +
> /* try to find/add entry into the fragment's table. */
> fp = ip_frag_find(tbl, dr, &key, tms);
> if (fp == NULL) {
> --
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 1.9.5.msysgit.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [dpdk-dev] [dpdk-stable] [PATCH v6] ip_frag: remove padding length of fragment
2020-12-18 11:41 ` Ananyev, Konstantin
@ 2021-01-15 10:29 ` Thomas Monjalon
0 siblings, 0 replies; 11+ messages in thread
From: Thomas Monjalon @ 2021-01-15 10:29 UTC (permalink / raw)
To: Yicai Lu
Cc: dev, stable, zhoujingbin, chenchanghu, jerry.lilijun,
haifeng.lin, guohongzhi1, wangyunjian, Ananyev, Konstantin
> > In some situations, we would get several ip fragments, which total
> > data length is less than min_ip_len(64) and padding with zeros.
> > We simulated intermediate fragments by modifying the MTU.
> > To illustrate the problem, we simplify the packet format and
> > ignore the impact of the packet header.In namespace2,
> > a packet whose data length is 1520 is sent.
> > When the packet passes tap2, the packet is divided into two
> > fragments: fragment A and B, similar to (1520 = 1510 + 10).
> > When the packet passes tap3, the larger fragment packet A is
> > divided into two fragments A1 and A2, similar to (1510 = 1500 + 10).
> > Finally, the bond interface receives three fragments:
> > A1, A2, and B (1520 = 1500 + 10 + 10).
> > One fragmented packet A2 is smaller than the minimum Ethernet
> > frame length, so it needs to be padded.
> >
> > |---------------------------------------------------|
> > | HOST |
> > | |--------------| |----------------------------| |
> > | | ns2 | | |--------------| | |
> > | | |--------| | | |--------| |--------| | |
> > | | | tap1 | | | | tap2 | ns1| tap3 | | |
> > | | |mtu=1510| | | |mtu=1510| |mtu=1500| | |
> > | |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| |
> > | |--------| |--------| |--------| |
> > | | | | |
> > | |-----------------| | |
> > | | |
> > | |--------| |
> > | | bond | |
> > |--------------------------------------|mtu=1500|---|
> > |--------|
> >
> > When processing the preceding packets above,
> > DPDK would aggregate fragmented packets A2 and B.
> > And error packets are generated, which padding(zero)
> > is displayed in the middle of the packet.
> >
> > A2 + B:
> > 0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00
> > 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01
> > 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00
> > 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb
> > 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db
> > 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6
> >
> > So, we would calculate the length of padding, and remove
> > the padding in pkt_len and data_len before aggregation.
> > And also we have the fix for both ipv4 and ipv6.
> >
> > Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming packet")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Yicai Lu <luyicai@huawei.com>
>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Applied, thanks
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-01-15 10:29 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-18 13:29 [dpdk-dev] [PATCH] ip_frag: recalculate data length of fragment Yicai Lu
2020-11-22 11:22 ` Thomas Monjalon
2020-11-23 2:27 ` [dpdk-dev] [PATCH v2] " Yicai Lu
2020-12-12 11:05 ` [dpdk-dev] [PATCH v5] ip_frag: remove padding " Yicai Lu
2020-12-14 14:44 ` Ananyev, Konstantin
2020-12-15 3:18 ` luyicai
2020-12-16 10:44 ` Ananyev, Konstantin
2020-12-16 10:54 ` luyicai
2020-12-16 13:36 ` [dpdk-dev] [PATCH v6] " Yicai Lu
2020-12-18 11:41 ` Ananyev, Konstantin
2021-01-15 10:29 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).