From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id E3CB4A09EF; Wed, 16 Dec 2020 14:37:14 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id BA4FCC9AA; Wed, 16 Dec 2020 14:37:12 +0100 (CET) Received: from szxga04-in.huawei.com (szxga04-in.huawei.com [45.249.212.190]) by dpdk.org (Postfix) with ESMTP id AB801C996; Wed, 16 Dec 2020 14:37:08 +0100 (CET) Received: from DGGEMS413-HUB.china.huawei.com (unknown [172.30.72.58]) by szxga04-in.huawei.com (SkyGuard) with ESMTP id 4Cwx3p0HF2z15WJr; Wed, 16 Dec 2020 21:36:22 +0800 (CST) Received: from localhost (10.174.243.72) by DGGEMS413-HUB.china.huawei.com (10.3.19.213) with Microsoft SMTP Server id 14.3.498.0; Wed, 16 Dec 2020 21:36:54 +0800 From: Yicai Lu To: CC: , , , , , , , Yicai Lu , Date: Wed, 16 Dec 2020 21:36:30 +0800 Message-ID: <1608125790-26496-1-git-send-email-luyicai@huawei.com> X-Mailer: git-send-email 1.9.5.msysgit.1 In-Reply-To: <1605706193-17192-1-git-send-email-luyicai@huawei.com> References: <1605706193-17192-1-git-send-email-luyicai@huawei.com> MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.174.243.72] X-CFilter-Loop: Reflected Subject: [dpdk-dev] [PATCH v6] ip_frag: remove padding length of fragment X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" In some situations, we would get several ip fragments, which total data length is less than min_ip_len(64) and padding with zeros. We simulated intermediate fragments by modifying the MTU. To illustrate the problem, we simplify the packet format and ignore the impact of the packet header.In namespace2, a packet whose data length is 1520 is sent. When the packet passes tap2, the packet is divided into two fragments: fragment A and B, similar to (1520 = 1510 + 10). When the packet passes tap3, the larger fragment packet A is divided into two fragments A1 and A2, similar to (1510 = 1500 + 10). Finally, the bond interface receives three fragments: A1, A2, and B (1520 = 1500 + 10 + 10). One fragmented packet A2 is smaller than the minimum Ethernet frame length, so it needs to be padded. |---------------------------------------------------| | HOST | | |--------------| |----------------------------| | | | ns2 | | |--------------| | | | | |--------| | | |--------| |--------| | | | | | tap1 | | | | tap2 | ns1| tap3 | | | | | |mtu=1510| | | |mtu=1510| |mtu=1500| | | | |--|1.1.1.1 |--| |--|1.1.1.2 |----|2.1.1.1 |--| | | |--------| |--------| |--------| | | | | | | | |-----------------| | | | | | | |--------| | | | bond | | |--------------------------------------|mtu=1500|---| |--------| When processing the preceding packets above, DPDK would aggregate fragmented packets A2 and B. And error packets are generated, which padding(zero) is displayed in the middle of the packet. A2 + B: 0000 fa 16 3e 9f fb 82 fa 47 b2 57 dc 20 08 00 45 00 0010 00 33 b4 66 00 ba 3f 01 c1 a5 01 01 01 01 02 01 0020 01 02 c0 c1 c2 c3 c4 c5 c6 c7 00 00 00 00 00 00 0030 00 00 00 00 00 00 00 00 00 00 00 00 c8 c9 ca cb 0040 cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db 0050 dc dd de df e0 e1 e2 e3 e4 e5 e6 So, we would calculate the length of padding, and remove the padding in pkt_len and data_len before aggregation. And also we have the fix for both ipv4 and ipv6. Fixes: 7f0983ee331c ("ip_frag: check fragment length of incoming packet") Cc: stable@dpdk.org Signed-off-by: Yicai Lu --- v5 -> v6: Have the fix for ipv6. --- lib/librte_ip_frag/rte_ipv4_reassembly.c | 11 ++++++++--- lib/librte_ip_frag/rte_ipv6_reassembly.c | 9 +++++++-- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/lib/librte_ip_frag/rte_ipv4_reassembly.c b/lib/librte_ip_frag/rte_ipv4_reassembly.c index 1dda8ac..69666c8 100644 --- a/lib/librte_ip_frag/rte_ipv4_reassembly.c +++ b/lib/librte_ip_frag/rte_ipv4_reassembly.c @@ -104,6 +104,7 @@ struct rte_mbuf * const unaligned_uint64_t *psd; uint16_t flag_offset, ip_ofs, ip_flag; int32_t ip_len; + int32_t trim; flag_offset = rte_be_to_cpu_16(ip_hdr->fragment_offset); ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK); @@ -117,14 +118,15 @@ struct rte_mbuf * ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS; ip_len = rte_be_to_cpu_16(ip_hdr->total_length) - mb->l3_len; + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len); IP_FRAG_LOG(DEBUG, "%s:%d:\n" - "mbuf: %p, tms: %" PRIu64 - ", key: <%" PRIx64 ", %#x>, ofs: %u, len: %d, flags: %#x\n" + "mbuf: %p, tms: %" PRIu64 ", key: <%" PRIx64 ", %#x>" + "ofs: %u, len: %d, padding: %d, flags: %#x\n" "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, " "max_entries: %u, use_entries: %u\n\n", __func__, __LINE__, - mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, ip_flag, + mb, tms, key.src_dst[0], key.id, ip_ofs, ip_len, trim, ip_flag, tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries, tbl->use_entries); @@ -134,6 +136,9 @@ struct rte_mbuf * return NULL; } + if (unlikely(trim > 0)) + rte_pktmbuf_trim(mb, trim); + /* try to find/add entry into the fragment's table. */ if ((fp = ip_frag_find(tbl, dr, &key, tms)) == NULL) { IP_FRAG_MBUF2DR(dr, mb); diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c b/lib/librte_ip_frag/rte_ipv6_reassembly.c index ad01055..4037da0 100644 --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c @@ -142,6 +142,7 @@ struct rte_mbuf * struct ip_frag_key key; uint16_t ip_ofs; int32_t ip_len; + int32_t trim; rte_memcpy(&key.src_dst[0], ip_hdr->src_addr, 16); rte_memcpy(&key.src_dst[2], ip_hdr->dst_addr, 16); @@ -158,16 +159,17 @@ struct rte_mbuf * * this is what we remove from the payload len. */ ip_len = rte_be_to_cpu_16(ip_hdr->payload_len) - sizeof(*frag_hdr); + trim = mb->pkt_len - (ip_len + mb->l3_len + mb->l2_len); IP_FRAG_LOG(DEBUG, "%s:%d:\n" "mbuf: %p, tms: %" PRIu64 ", key: <" IPv6_KEY_BYTES_FMT ", %#x>, " - "ofs: %u, len: %d, flags: %#x\n" + "ofs: %u, len: %d, padding: %d, flags: %#x\n" "tbl: %p, max_cycles: %" PRIu64 ", entry_mask: %#x, " "max_entries: %u, use_entries: %u\n\n", __func__, __LINE__, mb, tms, IPv6_KEY_BYTES(key.src_dst), key.id, ip_ofs, ip_len, - RTE_IPV6_GET_MF(frag_hdr->frag_data), + trim, RTE_IPV6_GET_MF(frag_hdr->frag_data), tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries, tbl->use_entries); @@ -177,6 +179,9 @@ struct rte_mbuf * return NULL; } + if (unlikely(trim > 0)) + rte_pktmbuf_trim(mb, trim); + /* try to find/add entry into the fragment's table. */ fp = ip_frag_find(tbl, dr, &key, tms); if (fp == NULL) { -- 1.9.5.msysgit.1