DPDK patches and discussions
From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
To: Huichao Cai <chcchc88@163.com>, dev@dpdk.org
Subject: Re: [PATCH v6] ip_frag: add IPv4 fragment copy packet API
Date: Sun, 7 Aug 2022 12:45:01 +0100
Message-ID: <bb464f37-fbae-8aad-d3fb-7fbd21c9e906@yandex.ru>
In-Reply-To: <1658650203-7831-1-git-send-email-chcchc88@163.com>

On 24/07/2022 09:10, Huichao Cai wrote:
> Some NIC drivers support the MBUF_FAST_FREE offload (the device supports
> optimization for fast release of mbufs; when it is set, the application
> must guarantee that, per queue, all mbufs come from the same mempool,
> have refcnt = 1, and are direct and non-segmented). To work with this
> offload, add this API. Also add some test data for this API.
> 
> Signed-off-by: Huichao Cai <chcchc88@163.com>
> ---
>   app/test/test_ipfrag.c               |   9 +-
>   lib/ip_frag/rte_ip_frag.h            |  34 +++++++
>   lib/ip_frag/rte_ipv4_fragmentation.c | 175 +++++++++++++++++++++++++++++++++++
>   lib/ip_frag/version.map              |   1 +
>   4 files changed, 218 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test/test_ipfrag.c b/app/test/test_ipfrag.c
> index ba0ffd0..88cc4cd 100644
> --- a/app/test/test_ipfrag.c
> +++ b/app/test/test_ipfrag.c
> @@ -418,10 +418,17 @@ static void ut_teardown(void)
>   		}
>   
>   		if (tests[i].ipv == 4)
> -			len = rte_ipv4_fragment_packet(b, pkts_out, BURST,
> +			if (i % 2)
> +				len = rte_ipv4_fragment_packet(b, pkts_out, BURST,
>   						       tests[i].mtu_size,
>   						       direct_pool,
>   						       indirect_pool);
> +			else
> +				len = rte_ipv4_fragment_copy_nonseg_packet(b,
> +						       pkts_out,
> +						       BURST,
> +						       tests[i].mtu_size,
> +						       direct_pool);
>   		else if (tests[i].ipv == 6)
>   			len = rte_ipv6_fragment_packet(b, pkts_out, BURST,
>   						       tests[i].mtu_size,
> diff --git a/lib/ip_frag/rte_ip_frag.h b/lib/ip_frag/rte_ip_frag.h
> index 7d2abe1..4a2b150 100644
> --- a/lib/ip_frag/rte_ip_frag.h
> +++ b/lib/ip_frag/rte_ip_frag.h
> @@ -179,6 +179,40 @@ int32_t rte_ipv4_fragment_packet(struct rte_mbuf *pkt_in,
>   			struct rte_mempool *pool_indirect);
>   
>   /**
> + * IPv4 fragmentation by copy.
> + *
> + * This function implements IPv4 fragmentation by copying the payload into
> + * new, non-segmented mbufs. It is mainly intended for the TX MBUF_FAST_FREE
> + * offload. MBUF_FAST_FREE: the device supports optimization for fast release
> + * of mbufs. When it is set, the application must guarantee that, per queue,
> + * all mbufs come from the same mempool, have refcnt = 1, and are direct and
> + * non-segmented.
> + *
> + * @param pkt_in
> + *   The input packet.
> + * @param pkts_out
> + *   Array storing the output fragments.
> + * @param nb_pkts_out
> + *   Size of the pkts_out array (maximum number of output fragments).
> + * @param mtu_size
> + *   Size in bytes of the Maximum Transmission Unit (MTU) for the outgoing IPv4
> + *   datagrams. This value includes the size of the IPv4 header.
> + * @param pool_direct
> + *   MBUF pool used for allocating direct buffers for the output fragments.
> + * @return
> + *   Upon successful completion - number of output fragments placed
> + *   in the pkts_out array.
> + *   Otherwise - (-1) * errno.
> + */
> +__rte_experimental
> +int32_t
> +rte_ipv4_fragment_copy_nonseg_packet(struct rte_mbuf *pkt_in,
> +	struct rte_mbuf **pkts_out,
> +	uint16_t nb_pkts_out,
> +	uint16_t mtu_size,
> +	struct rte_mempool *pool_direct);
> +
> +/**
>    * This function implements reassembly of fragmented IPv4 packets.
>    * Incoming mbufs should have its l2_len/l3_len fields setup correctly.
>    *
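
Regarding @param nb_pkts_out: the implementation rejects the packet with -EINVAL when frag_size * nb_pkts_out is smaller than the IP payload, so callers need to size pkts_out up front. A minimal standalone sketch of that arithmetic, not DPDK code: frag_payload_size() and frags_needed() are hypothetical helpers, and the rounding down to 8 bytes stands in for RTE_ALIGN_FLOOR(mtu_size - header_len, IPV4_HDR_FO_ALIGN):

```c
#include <stdint.h>

/* IPv4 fragment offsets are expressed in 8-byte units (RFC 791). */
#define FRAG_ALIGN 8u

/*
 * Hypothetical helper: payload bytes carried by each fragment, i.e.
 * (mtu_size - header_len) rounded down to a multiple of 8.
 * Returns 0 if the MTU cannot even hold the header.
 */
uint16_t frag_payload_size(uint16_t mtu_size, uint16_t header_len)
{
	if (mtu_size < header_len)
		return 0;
	return (uint16_t)((mtu_size - header_len) & ~(FRAG_ALIGN - 1u));
}

/*
 * Hypothetical helper: minimum number of pkts_out entries for a datagram
 * of pkt_len bytes (assumes pkt_len >= header_len). Returns 0 if the
 * rounded per-fragment payload size is 0.
 */
uint16_t frags_needed(uint32_t pkt_len, uint16_t mtu_size, uint16_t header_len)
{
	uint16_t frag_size = frag_payload_size(mtu_size, header_len);
	uint32_t payload = pkt_len - header_len;

	if (frag_size == 0)
		return 0;
	/* ceil(payload / frag_size) */
	return (uint16_t)((payload + frag_size - 1u) / frag_size);
}
```

For example, a 4000-byte datagram with a 20-byte header and an MTU of 1500 gives frag_size = 1480 and needs 3 entries in pkts_out.
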
> diff --git a/lib/ip_frag/rte_ipv4_fragmentation.c b/lib/ip_frag/rte_ipv4_fragmentation.c
> index 27a8ad2..e6ec408 100644
> --- a/lib/ip_frag/rte_ipv4_fragmentation.c
> +++ b/lib/ip_frag/rte_ipv4_fragmentation.c
> @@ -259,3 +259,178 @@ static inline uint16_t __create_ipopt_frag_hdr(uint8_t *iph,
>   
>   	return out_pkt_pos;
>   }
> +
> +/**
> + * IPv4 fragmentation by copy.
> + *
> + * This function implements IPv4 fragmentation by copying the payload into
> + * new, non-segmented mbufs. It is mainly intended for the TX MBUF_FAST_FREE
> + * offload. MBUF_FAST_FREE: the device supports optimization for fast release
> + * of mbufs. When it is set, the application must guarantee that, per queue,
> + * all mbufs come from the same mempool, have refcnt = 1, and are direct and
> + * non-segmented.
> + *
> + * @param pkt_in
> + *   The input packet.
> + * @param pkts_out
> + *   Array storing the output fragments.
> + * @param nb_pkts_out
> + *   Size of the pkts_out array (maximum number of output fragments).
> + * @param mtu_size
> + *   Size in bytes of the Maximum Transmission Unit (MTU) for the outgoing IPv4
> + *   datagrams. This value includes the size of the IPv4 header.
> + * @param pool_direct
> + *   MBUF pool used for allocating direct buffers for the output fragments.
> + * @return
> + *   Upon successful completion - number of output fragments placed
> + *   in the pkts_out array.
> + *   Otherwise - (-1) * errno.
> + */
> +int32_t
> +rte_ipv4_fragment_copy_nonseg_packet(struct rte_mbuf *pkt_in,
> +	struct rte_mbuf **pkts_out,
> +	uint16_t nb_pkts_out,
> +	uint16_t mtu_size,
> +	struct rte_mempool *pool_direct)
> +{
> +	struct rte_mbuf *in_seg = NULL;
> +	struct rte_ipv4_hdr *in_hdr;
> +	uint32_t out_pkt_pos, in_seg_data_pos;
> +	uint32_t more_in_segs;
> +	uint16_t fragment_offset, flag_offset, frag_size, header_len;
> +	uint16_t frag_bytes_remaining;
> +	uint8_t ipopt_frag_hdr[IPV4_HDR_MAX_LEN];
> +	uint16_t ipopt_len;
> +
> +	/*
> +	 * Formal parameter checking.
> +	 */
> +	if (unlikely(pkt_in == NULL) || unlikely(pkts_out == NULL) ||
> +	    unlikely(nb_pkts_out == 0) || unlikely(pool_direct == NULL) ||
> +	    unlikely(mtu_size < RTE_ETHER_MIN_MTU))
> +		return -EINVAL;
> +
> +	in_hdr = rte_pktmbuf_mtod(pkt_in, struct rte_ipv4_hdr *);
> +	header_len = (in_hdr->version_ihl & RTE_IPV4_HDR_IHL_MASK) *
> +	    RTE_IPV4_IHL_MULTIPLIER;
> +
> +	/* Check IP header length */
> +	if (unlikely(pkt_in->data_len < header_len) ||
> +	    unlikely(mtu_size < header_len))
> +		return -EINVAL;
> +
> +	/*
> +	 * Ensure the IP payload length of all fragments is aligned to a
> +	 * multiple of 8 bytes as per RFC791 section 2.3.
> +	 */
> +	frag_size = RTE_ALIGN_FLOOR((mtu_size - header_len),
> +				    IPV4_HDR_FO_ALIGN);
> +
> +	flag_offset = rte_cpu_to_be_16(in_hdr->fragment_offset);
> +
> +	/* If Don't Fragment flag is set */
> +	if (unlikely((flag_offset & IPV4_HDR_DF_MASK) != 0))
> +		return -ENOTSUP;
> +
> +	/* Check that pkts_out is big enough to hold all fragments */
> +	if (unlikely(frag_size * nb_pkts_out <
> +	    (uint16_t)(pkt_in->pkt_len - header_len)))
> +		return -EINVAL;
> +
> +	in_seg = pkt_in;
> +	in_seg_data_pos = header_len;
> +	out_pkt_pos = 0;
> +	fragment_offset = 0;
> +
> +	ipopt_len = header_len - sizeof(struct rte_ipv4_hdr);
> +	if (unlikely(ipopt_len > RTE_IPV4_HDR_OPT_MAX_LEN))
> +		return -EINVAL;
> +
> +	more_in_segs = 1;
> +	while (likely(more_in_segs)) {
> +		struct rte_mbuf *out_pkt = NULL;
> +		uint32_t more_out_segs;
> +		struct rte_ipv4_hdr *out_hdr;
> +
> +		/* Allocate direct buffer */
> +		out_pkt = rte_pktmbuf_alloc(pool_direct);
> +		if (unlikely(out_pkt == NULL)) {
> +			__free_fragments(pkts_out, out_pkt_pos);
> +			return -ENOMEM;
> +		}
> +		if (unlikely(out_pkt->buf_len - rte_pktmbuf_headroom(out_pkt) <
> +				frag_size)) {

As a nit, this might be better as:
if (rte_pktmbuf_tailroom(out_pkt) < frag_size) {...}
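
To illustrate the nit: rte_pktmbuf_tailroom() computes buf_len - headroom - data_len, and a freshly allocated mbuf has data_len == 0, so it yields the same value as the open-coded expression and mainly improves readability. Since each output mbuf ends up holding header_len bytes of header plus up to frag_size bytes of payload, a stricter variant would also count header_len; whether that matters depends on the buffer size of pool_direct. A standalone model (struct mbuf_model and the functions are hypothetical, not DPDK API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct rte_mbuf fields involved. */
struct mbuf_model {
	uint16_t buf_len;  /* size of the data buffer */
	uint16_t data_off; /* headroom: offset of the first data byte */
	uint16_t data_len; /* bytes of data currently in this segment */
};

/* Same arithmetic as rte_pktmbuf_tailroom(): buf_len - headroom - data_len. */
uint16_t model_tailroom(const struct mbuf_model *m)
{
	return (uint16_t)(m->buf_len - m->data_off - m->data_len);
}

/* The check as written in the patch / suggested above. */
bool fits_payload(const struct mbuf_model *m, uint16_t frag_size)
{
	return model_tailroom(m) >= frag_size;
}

/* Stricter variant: the header is stored in the same mbuf as the payload. */
bool fits_fragment(const struct mbuf_model *m, uint16_t header_len,
		   uint16_t frag_size)
{
	return (uint32_t)model_tailroom(m) >= (uint32_t)header_len + frag_size;
}
```

With a 1628-byte buffer and 128 bytes of headroom, the tailroom is 1500: a 1480-byte payload passes the first check but not the second when header_len is 60.
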

Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>

> +			rte_pktmbuf_free(out_pkt);
> +			__free_fragments(pkts_out, out_pkt_pos);
> +			return -EINVAL;
> +		}
> +
> +		/* Reserve space for the IP header that will be built later */
> +		out_pkt->data_len = header_len;
> +		out_pkt->pkt_len = header_len;
> +		frag_bytes_remaining = frag_size;
> +
> +		more_out_segs = 1;
> +		while (likely(more_out_segs && more_in_segs)) {
> +			uint32_t len;
> +
> +			len = frag_bytes_remaining;
> +			if (len > (in_seg->data_len - in_seg_data_pos))
> +				len = in_seg->data_len - in_seg_data_pos;
> +
> +			memcpy(rte_pktmbuf_mtod_offset(out_pkt, char *,
> +					out_pkt->data_len),
> +				rte_pktmbuf_mtod_offset(in_seg, char *,
> +					in_seg_data_pos),
> +				len);
> +
> +			in_seg_data_pos += len;
> +			frag_bytes_remaining -= len;
> +			out_pkt->data_len += len;
> +
> +			/* Current output packet (i.e. fragment) done ? */
> +			if (unlikely(frag_bytes_remaining == 0))
> +				more_out_segs = 0;
> +
> +			/* Current input segment done ? */
> +			if (unlikely(in_seg_data_pos == in_seg->data_len)) {
> +				in_seg = in_seg->next;
> +				in_seg_data_pos = 0;
> +
> +				if (unlikely(in_seg == NULL))
> +					more_in_segs = 0;
> +			}
> +		}
> +
> +		/* Build the IP header */
> +
> +		out_pkt->pkt_len = out_pkt->data_len;
> +		out_hdr = rte_pktmbuf_mtod(out_pkt, struct rte_ipv4_hdr *);
> +
> +		__fill_ipv4hdr_frag(out_hdr, in_hdr, header_len,
> +		    (uint16_t)out_pkt->pkt_len,
> +		    flag_offset, fragment_offset, more_in_segs);
> +
> +		if (unlikely((fragment_offset == 0) && (ipopt_len) &&
> +			    ((flag_offset & RTE_IPV4_HDR_OFFSET_MASK) == 0))) {
> +			ipopt_len = __create_ipopt_frag_hdr((uint8_t *)in_hdr,
> +				ipopt_len, ipopt_frag_hdr);
> +			fragment_offset = (uint16_t)(fragment_offset +
> +				out_pkt->pkt_len - header_len);
> +			out_pkt->l3_len = header_len;
> +
> +			header_len = sizeof(struct rte_ipv4_hdr) + ipopt_len;
> +			in_hdr = (struct rte_ipv4_hdr *)ipopt_frag_hdr;
> +		} else {
> +			fragment_offset = (uint16_t)(fragment_offset +
> +				out_pkt->pkt_len - header_len);
> +			out_pkt->l3_len = header_len;
> +		}
> +
> +		/* Write the fragment to the output list */
> +		pkts_out[out_pkt_pos] = out_pkt;
> +		out_pkt_pos++;
> +	}
> +
> +	return out_pkt_pos;
> +}
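
The loop above copies at most frag_size payload bytes into each fragment, then builds the header via __fill_ipv4hdr_frag() from the running byte offset (fragment_offset) and the More Fragments indication (more_in_segs); on the wire the offset field counts 8-byte units. A simplified standalone model of that bookkeeping, assuming the input is not itself a fragment and carries no IP options (so header_len stays constant); plan_fragments() and struct frag_desc are hypothetical, and values are in host byte order:

```c
#include <stddef.h>
#include <stdint.h>

#define MF_BIT 0x2000u /* More Fragments flag in the flags/offset field */

struct frag_desc {
	uint16_t total_length; /* header + payload bytes in this fragment */
	uint16_t flags_offset; /* MF flag | offset in 8-byte units */
};

/*
 * Split payload_len bytes into fragments of at most frag_size payload
 * bytes (frag_size must be a non-zero multiple of 8). Returns the number
 * of fragments written, or -1 if out[] has fewer than needed entries.
 */
int plan_fragments(uint32_t payload_len, uint16_t header_len,
		   uint16_t frag_size, struct frag_desc *out, size_t nb_out)
{
	uint32_t offset = 0; /* byte offset, like fragment_offset in the patch */
	size_t n = 0;

	while (offset < payload_len) {
		uint32_t len = payload_len - offset;

		if (len > frag_size)
			len = frag_size;
		if (n == nb_out)
			return -1;
		out[n].total_length = (uint16_t)(header_len + len);
		out[n].flags_offset = (uint16_t)(offset >> 3);
		/* more payload follows: set MF, as more_in_segs does */
		if (offset + len < payload_len)
			out[n].flags_offset |= MF_BIT;
		offset += len;
		n++;
	}
	return (int)n;
}
```

With 3980 payload bytes, a 20-byte header and frag_size = 1480, this yields three fragments with total lengths 1500, 1500 and 1040 at offsets 0, 185 and 370 (8-byte units), with MF set on the first two.
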
> diff --git a/lib/ip_frag/version.map b/lib/ip_frag/version.map
> index b9c1cca..8aad839 100644
> --- a/lib/ip_frag/version.map
> +++ b/lib/ip_frag/version.map
> @@ -17,4 +17,5 @@ EXPERIMENTAL {
>   	global:
>   
>   	rte_ip_frag_table_del_expired_entries;
> +	rte_ipv4_fragment_copy_nonseg_packet;
>   };
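
On the first-fragment branch that calls __create_ipopt_frag_hdr(): RFC 791 requires that only options whose copied flag (the high bit of the option-type octet) is set are repeated in the non-first fragments. That helper's implementation is not part of this patch, so the following is only a hedged standalone sketch of the filtering rule; filter_copied_options() is hypothetical and may differ from the DPDK helper (for example in how NOP or malformed options are handled):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_EOL 0x00         /* End of Option List */
#define OPT_NOP 0x01         /* No Operation */
#define OPT_COPIED_FLAG 0x80 /* high bit of the option-type octet */

/*
 * Copy only the options whose copied flag is set from opts[0..len) into
 * out, then pad with EOL bytes to a multiple of 4. out must have room for
 * len + 3 bytes. Parsing stops at EOL or at the first truncated or
 * malformed option. Returns the new options length.
 */
size_t filter_copied_options(const uint8_t *opts, size_t len, uint8_t *out)
{
	size_t i = 0, n = 0;

	while (i < len) {
		uint8_t type = opts[i];
		uint8_t olen;

		if (type == OPT_EOL)
			break;
		if (type == OPT_NOP) { /* single octet, copied flag clear */
			i++;
			continue;
		}
		if (i + 1 >= len)
			break; /* truncated: no length octet */
		olen = opts[i + 1];
		if (olen < 2 || i + olen > len)
			break; /* malformed length */
		if (type & OPT_COPIED_FLAG) {
			memcpy(out + n, opts + i, olen);
			n += olen;
		}
		i += olen;
	}
	while (n % 4 != 0)
		out[n++] = OPT_EOL;
	return n;
}
```

For instance, given NOP, a Record Route option (type 7, not copied) and a Loose Source Route option (type 0x83, copied), only the 7-byte source-route option survives, padded to 8 bytes.
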



Thread overview: 26+ messages
2022-06-09  2:39 [PATCH v1] " Huichao Cai
2022-06-09 14:19 ` [PATCH v2] " Huichao Cai
2022-07-10 23:35   ` Konstantin Ananyev
2022-07-11  9:14     ` Konstantin Ananyev
2022-07-15  8:05       ` Huichao Cai
2022-07-19  8:19         ` Konstantin Ananyev
2022-07-22 13:01   ` [PATCH v3] " Huichao Cai
2022-07-22 14:42     ` Morten Brørup
2022-07-22 14:49     ` Stephen Hemminger
2022-07-22 15:52       ` Morten Brørup
2022-07-22 15:58         ` Huichao Cai
2022-07-22 16:14           ` Morten Brørup
2022-07-22 22:35             ` Konstantin Ananyev
2022-07-23  8:24               ` Morten Brørup
2022-07-23 18:25                 ` Konstantin Ananyev
2022-07-23 22:27                   ` Morten Brørup
2022-07-22 14:49     ` [PATCH v4] " Huichao Cai
2022-07-24  4:50       ` [PATCH v5] " Huichao Cai
2022-07-24  8:10         ` [PATCH v6] " Huichao Cai
2022-07-25 15:42           ` Stephen Hemminger
2022-07-26  1:22             ` Huichao Cai
2022-08-07 11:49               ` Konstantin Ananyev
2022-08-07 11:45           ` Konstantin Ananyev [this message]
2022-08-08  1:48           ` [PATCH v7] " Huichao Cai
2022-08-08 22:29             ` Konstantin Ananyev
2022-08-29 14:22               ` Thomas Monjalon
