Message-ID: <2e121e35-0f42-beb4-2896-9d1d81b5fff1@oktetlabs.ru>
Date: Thu, 6 Oct 2022 13:11:52 +0300
Subject: Re: [PATCH v8 2/4] ethdev: introduce protocol hdr based buffer split
To: Yuan Wang, dev@dpdk.org, Thomas Monjalon, Ferruh Yigit
Cc: ferruh.yigit@xilinx.com, mdr@ashroe.eu, xiaoyun.li@intel.com, aman.deep.singh@intel.com, yuying.zhang@intel.com, qi.z.zhang@intel.com, qiming.yang@intel.com, jerinjacobk@gmail.com, viacheslavo@nvidia.com, stephen@networkplumber.org, xuan.ding@intel.com, hpothula@marvell.com, yaqi.tang@intel.com, Wenxuan Wu
From: Andrew Rybchenko
Organization: OKTET Labs
In-Reply-To: <20221005231836.215112-3-yuanx.wang@intel.com>

On 10/6/22 02:18, Yuan Wang wrote:
> Currently, Rx buffer split supports length based split. With Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT enabled and Rx packet segment
> configured, PMD will be able to split the received packets into
> multiple segments.
>
> However, length based buffer split is not suitable for NICs that do split
> based on protocol headers. Given an arbitrarily variable length in Rx
> packet segment, it is almost impossible to pass a fixed protocol header to
> driver. Besides, the existence of tunneling results in the composition of
> a packet is various, which makes the situation even worse.
>
> This patch extends current buffer split to support protocol header based
> buffer split. A new proto_hdr field is introduced in the reserved field
> of rte_eth_rxseg_split structure to specify protocol header. The proto_hdr
> field defines the split position of packet, splitting will always happen
> after the protocol header defined in the Rx packet segment. When Rx queue
> offload RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is enabled and corresponding
> protocol header is configured, driver will split the ingress packets into
> multiple segments.
>
> Examples for proto_hdr field defines:
> To split after ETH-IPV4-UDP, it should be defined as
> proto_hdr = RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
> RTE_PTYPE_L4_UDP
>
> For inner ETH-IPV4-UDP, it should be defined as
> proto_hdr = RTE_PTYPE_TUNNEL_GRENAT | RTE_PTYPE_INNER_L2_ETHER |
> RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_INNER_L4_UDP
>
> If the protocol header is repeated with the previously defined one,
> the repeated part can be omitted. For example, split after ETH, ETH-IPV4
> and ETH-IPV4-UDP, it should be defined as
> proto_hdr0 = RTE_PTYPE_L2_ETHER
> proto_hdr1 = RTE_PTYPE_L3_IPV4_EXT_UNKNOWN
> proto_hdr2 = RTE_PTYPE_L4_UDP

Ack

>
> struct rte_eth_rxseg_split {
>         struct rte_mempool *mp; /* memory pools to allocate segment from */
>         uint16_t length; /* segment maximal data length,
>                             configures split point */
>         uint16_t offset; /* data offset from beginning
>                             of mbuf data buffer */
>         /**
>          * Proto_hdr defines a bit mask of the protocol sequence as
>          * RTE_PTYPE_*, configures split point. The last RTE_PTYPE*
>          * in the mask indicates the split position.
>          * If one protocol header is defined to split packets into two
>          * segments, for non-tunneling packets, the complete protocol
>          * sequence should be defined.
>          * For tunneling packets, for simplicity,
>          * only the tunnel and inner part of comple protocol sequence
>          * is required.
>          * If several protocol headers are defined to split packets into
>          * multi-segments, the repeated parts of adjacent segments
>          * should be omitted.
>          */
>         uint32_t proto_hdr;
> };

Sorry, but I see no reason to repeat this in the description. What is
the purpose of the duplication?

>
> If protocol header split can be supported by a PMD, the
> rte_eth_buffer_split_get_supported_hdr_ptypes function can
> be use to obtain a list of these protocol headers.
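For illustration only, a minimal C sketch of the "repeated part can be omitted" rule from the commit message: OR-ing the proto_hdr values of consecutive segments recovers the full header sequence each split point refers to (ETH, ETH-IPV4, ETH-IPV4-UDP). The PT_* macros are local stand-ins mirroring the RTE_PTYPE_* values from rte_mbuf_ptype.h, so the snippet builds without DPDK, and expand_split_points() is a hypothetical helper, not an ethdev API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring rte_mbuf_ptype.h (assumption: not DPDK headers). */
#define PT_L2_ETHER            0x00000001u /* RTE_PTYPE_L2_ETHER */
#define PT_L3_IPV4_EXT_UNKNOWN 0x00000090u /* RTE_PTYPE_L3_IPV4_EXT_UNKNOWN */
#define PT_L4_UDP              0x00000200u /* RTE_PTYPE_L4_UDP */

/*
 * Hypothetical helper: expand per-segment proto_hdr values, written with the
 * repeated parts omitted, into the complete header sequence of each split
 * point. A proto_hdr of 0 ends the protocol-based segments.
 * Returns the number of split points written to full[].
 */
static size_t
expand_split_points(const uint32_t *proto_hdr, size_t n_seg, uint32_t *full)
{
	uint32_t acc = 0;
	size_t i;

	assert(proto_hdr != NULL && full != NULL);
	for (i = 0; i < n_seg && proto_hdr[i] != 0; i++) {
		acc |= proto_hdr[i];	/* add the newly named header(s) */
		full[i] = acc;		/* complete sequence up to this split */
	}
	return i;
}
```

With proto_hdr = { PT_L2_ETHER, PT_L3_IPV4_EXT_UNKNOWN, PT_L4_UDP, 0 } the three split points come out as ETH, ETH|IPV4 and ETH|IPV4|UDP, i.e. the same masks the first example in the commit message spells out in full.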
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
> seg0 - pool0, proto_hdr0=RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4,
>        off0=2B
> seg1 - pool1, proto_hdr1=RTE_PTYPE_L4_UDP, off1=128B
> seg2 - pool2, off1=0B
>
> The packet consists of ETH_IPV4_UDP_PAYLOAD will be split like
> following:
> seg0 - ipv4 header @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> seg1 - udp header @ 128 in mbuf from pool1
> seg2 - payload @ 0 in mbuf from pool2
>
> Note: NIC will only do split when the packets exactly match all the
> protocol headers in the segments. For example, if ARP packets received
> with above config, the NIC won't do split for ARP packets since
> it does not contains ipv4 header and udp header. These packets will be put

ipv4 -> IPv4, udp -> UDP.

> into the last valid mempool, with zero offset.

What should happen if we have seg1 -> ETH, seg2 -> IPv4,
seg3 -> remaining, and receive an ARP packet? Will we see the ETH header
split into seg1 and everything else in seg3? I would say yes.
Maybe we should define the intended behavior using pseudo-code?

>
> Now buffer split can be configured in two modes. For length based
> buffer split, the mp, length, offset field in Rx packet segment should
> be configured, while the proto_hdr field will be ignored.
> For protocol header based buffer split, the mp, offset, proto_hdr field
> in Rx packet segment should be configured, while the length field will
> be ignored.
>
> The split limitations imposed by underlying driver is reported in the
> rte_eth_dev_info->rx_seg_capa field. The memory attributes for the split
> parts may differ either, dpdk memory and external memory, respectively.
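One possible reading of the ARP question above, written as the pseudo-code it asks for: a hypothetical C model (not driver or ethdev code) that splits after each configured header the packet actually carries, stops at the first mismatch, and leaves everything else to the last segment. Assumptions: the PT_* macros mirror RTE_PTYPE_* values, layer fields are compared for equality, and the ARP frame is modeled as carrying only an Ethernet header (PT_L2_ETHER); a real PMD may classify ARP differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring rte_mbuf_ptype.h (assumption: not DPDK headers). */
#define PT_L2_ETHER            0x00000001u
#define PT_L2_MASK             0x0000000fu
#define PT_L3_IPV4_EXT_UNKNOWN 0x00000090u
#define PT_L3_MASK             0x000000f0u
#define PT_L4_UDP              0x00000200u
#define PT_L4_MASK             0x00000f00u

/* True if every layer named in 'want' appears with the same value in 'pkt'. */
static int
ptype_contains(uint32_t pkt, uint32_t want)
{
	static const uint32_t masks[] = { PT_L2_MASK, PT_L3_MASK, PT_L4_MASK };
	size_t i;

	for (i = 0; i < sizeof(masks) / sizeof(masks[0]); i++) {
		uint32_t w = want & masks[i];

		if (w != 0 && (pkt & masks[i]) != w)
			return 0;
	}
	return 1;
}

/*
 * Hypothetical model: walk the protocol-based segments in order and split
 * after each header the packet carries; stop at the first one it does not.
 * Returns how many leading segments received a header; the remainder of
 * the packet always lands in the last segment (n_seg - 1).
 */
static size_t
model_split(uint32_t pkt_ptype, const uint32_t *proto_hdr, size_t n_seg)
{
	uint32_t acc = 0;
	size_t i;

	assert(n_seg > 0 && proto_hdr[n_seg - 1] == 0);
	for (i = 0; i + 1 < n_seg; i++) {
		acc |= proto_hdr[i];	/* full header sequence up to segment i */
		if (!ptype_contains(pkt_ptype, acc))
			break;
	}
	return i;
}
```

With proto_hdr = { ETH, IPv4, 0 }, model_split() returns 1 for the Ethernet-only frame (ETH header in the first segment, the rest in the last one) and 2 for ETH/IPv4/UDP. Under the "exact match" rule quoted in the commit message, the first case would instead get no split at all.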
>
> Signed-off-by: Yuan Wang
> Signed-off-by: Xuan Ding
> Signed-off-by: Wenxuan Wu
> ---
>  doc/guides/rel_notes/release_22_11.rst |  4 ++
>  lib/ethdev/rte_ethdev.c                | 89 ++++++++++++++++++++++----
>  lib/ethdev/rte_ethdev.h                | 34 +++++++++-
>  3 files changed, 115 insertions(+), 12 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index 141fd9442b..4c3a7f8b8b 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -127,6 +127,10 @@ New Features
>
>    * Added ``rte_eth_buffer_split_get_supported_hdr_ptypes()``, to get supported
>      header protocols of a PMD to split.
> +  * Ethdev: The ``reserved`` field in the ``rte_eth_rxseg_split`` structure is
> +    replaced with ``proto_hdr`` to support protocol header based buffer split.
> +    User can choose length or protocol header to configure buffer split
> +    according to NIC's capability.

It sounds like this should be mentioned in the API changes section as
well. Here I'd concentrate on the top-level feature overview only, i.e.:

  Added support for protocol-based buffer split using the new
  ``proto_hdr`` field in structure ``rte_eth_rxseg_split``.
>
>
>  Removed Items
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index ee3b490889..60fe6eb2bd 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1650,14 +1650,18 @@ rte_eth_dev_is_removed(uint16_t port_id)
>  }
>
>  static int
> -rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
> -			     uint16_t n_seg, uint32_t *mbp_buf_size,
> -			     const struct rte_eth_dev_info *dev_info)
> +rte_eth_rx_queue_check_split(uint16_t port_id,
> +			const struct rte_eth_rxseg_split *rx_seg,
> +			uint16_t n_seg, uint32_t *mbp_buf_size,
> +			const struct rte_eth_dev_info *dev_info)
>  {
>  	const struct rte_eth_rxseg_capa *seg_capa = &dev_info->rx_seg_capa;
>  	struct rte_mempool *mp_first;
>  	uint32_t offset_mask;
>  	uint16_t seg_idx;
> +	int ptype_cnt;
> +	uint32_t *ptypes;
> +	int i;
>
>  	if (n_seg > seg_capa->max_nseg) {
>  		RTE_ETHDEV_LOG(ERR,
> @@ -1675,6 +1679,7 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  		struct rte_mempool *mpl = rx_seg[seg_idx].mp;
>  		uint32_t length = rx_seg[seg_idx].length;
>  		uint32_t offset = rx_seg[seg_idx].offset;
> +		uint32_t proto_hdr = rx_seg[seg_idx].proto_hdr;
>
>  		if (mpl == NULL) {
>  			RTE_ETHDEV_LOG(ERR, "null mempool pointer\n");
> @@ -1708,13 +1713,75 @@ rte_eth_rx_queue_check_split(const struct rte_eth_rxseg_split *rx_seg,
>  		}
>  		offset += seg_idx != 0 ? 0 : RTE_PKTMBUF_HEADROOM;
>  		*mbp_buf_size = rte_pktmbuf_data_room_size(mpl);
> -		length = length != 0 ? length : *mbp_buf_size;
> -		if (*mbp_buf_size < length + offset) {
> -			RTE_ETHDEV_LOG(ERR,
> -				       "%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> -				       mpl->name, *mbp_buf_size,
> -				       length + offset, length, offset);
> -			return -EINVAL;
> +
> +		if (proto_hdr > 0) {

Please use proto_hdr != 0. I know that it is the same, but != 0 raises
fewer questions about whether the field is signed or unsigned.

As the first condition here, we should check whether protocol-based
split is supported at all (see the note about a separate helper
function below).

> +			/* Split based on protocol headers. */
> +			if (length != 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Do not set length split and protocol split within a segment\n"
> +					);
> +				return -EINVAL;
> +			}
> +
> +			if (seg_idx == n_seg - 1) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"The proto_hdr in the last segment should be 0\n"
> +					);
> +				return -EINVAL;
> +			}

I think here we should check whether we have already seen a segment
with proto_hdr == 0. If so, we can't do protocol-based split any more.
Since we need to collect the already split protocols (prev_proto_hdrs)
anyway, I would use this variable as a marker and set it to the
all-ones mask (RTE_PTYPE_ALL_MASK) as soon as proto_hdr == 0 is met.
So, the condition will be

	if ((proto_hdr & prev_proto_hdrs) != 0)

This checks two things at once: no repeat of previous protocol headers
which are already split, and no protocol split after a length-based
split.

> +
> +			if (*mbp_buf_size < offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +						"%s mbuf_data_room_size %u < %u segment offset)\n",
> +						mpl->name, *mbp_buf_size,
> +						offset);
> +				return -EINVAL;
> +			}
> +

(separate helper function starts here)

> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id, NULL, 0);

There is no point in doing it in a loop. It should be done outside.
Moreover, it should be a helper function, to keep this function short.
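A rough sketch of the suggested helper, assuming it only needs to fetch the supported split ptypes once, before the per-segment loop. The query function is passed in as a pointer with the same (port_id, ptypes, num) shape that rte_eth_buffer_split_get_supported_hdr_ptypes() has in the patch, so the sketch builds without DPDK; rx_queue_get_split_ptypes(), get_hdr_ptypes_fn and example_get_hdr_ptypes() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Same call shape as in the patch: (port_id, NULL, 0) returns the count. */
typedef int (*get_hdr_ptypes_fn)(uint16_t port_id, uint32_t *ptypes, int num);

/* Hypothetical stand-in for a PMD supporting three split points. */
static int
example_get_hdr_ptypes(uint16_t port_id, uint32_t *ptypes, int num)
{
	/* ETH, ETH|IPV4_EXT_UNKNOWN, ETH|IPV4_EXT_UNKNOWN|UDP */
	static const uint32_t supported[] = { 0x001u, 0x091u, 0x291u };
	int n = (int)(sizeof(supported) / sizeof(supported[0]));
	int i;

	(void)port_id;
	for (i = 0; i < n && i < num; i++)
		ptypes[i] = supported[i];
	return n;
}

/*
 * Hypothetical helper: on success *ptypes is a malloc()ed array owned by
 * the caller and the number of valid entries is returned; on failure a
 * negative errno value is returned and *ptypes is NULL.
 */
static int
rx_queue_get_split_ptypes(uint16_t port_id, get_hdr_ptypes_fn get,
			  uint32_t **ptypes)
{
	uint32_t *buf;
	int cnt, n;

	assert(get != NULL && ptypes != NULL);
	*ptypes = NULL;

	cnt = get(port_id, NULL, 0);	/* first call: count only */
	if (cnt <= 0)
		return -EINVAL;		/* protocol-based split not supported */

	buf = malloc(sizeof(*buf) * (size_t)cnt);
	if (buf == NULL)
		return -ENOMEM;

	n = get(port_id, buf, cnt);	/* second call: fill the array */
	if (n <= 0) {
		free(buf);
		return -EINVAL;
	}
	*ptypes = buf;
	return n < cnt ? n : cnt;	/* never report more than was stored */
}
```

In rte_eth_rx_queue_check_split() such a helper would be called once, only if some segment has a non-zero proto_hdr, and the array freed after the loop. The -EINVAL return mirrors what the patch does in the same situation.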
> +			if (ptype_cnt <= 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				return -EINVAL;
> +			}
> +
> +			ptypes = malloc(sizeof(uint32_t) * ptype_cnt);
> +			if (ptypes == NULL)
> +				return -ENOMEM;
> +
> +			ptype_cnt = rte_eth_buffer_split_get_supported_hdr_ptypes(port_id,
> +								ptypes, ptype_cnt);
> +			if (ptype_cnt < 0) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Port %u failed to supported buffer split header protocols\n",
> +					port_id);
> +				free(ptypes);
> +				return -EINVAL;
> +			}

(separate helper function ends here)

> +
> +			for (i = 0; i < ptype_cnt; i++)
> +				if (ptypes[i] == proto_hdr)

It should be

	if ((prev_proto_hdrs | proto_hdr) == ptypes[i])

> +					break;
> +
> +			free(ptypes);
> +
> +			if (i == ptype_cnt) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"Requested Rx split header protocols 0x%x is not supported.\n",
> +					proto_hdr);
> +				return -EINVAL;
> +			}

	prev_proto_hdrs |= proto_hdr;

> +		} else {

NOTE: If the driver does not support length-based split, it should
reject such a configuration itself.

> +			/* Split at fixed length. */
> +			length = length != 0 ? length : *mbp_buf_size;
> +			if (*mbp_buf_size < length + offset) {
> +				RTE_ETHDEV_LOG(ERR,
> +					"%s mbuf_data_room_size %u < %u (segment length=%u + segment offset=%u)\n",
> +					mpl->name, *mbp_buf_size,
> +					length + offset, length, offset);
> +				return -EINVAL;
> +			}

	prev_proto_hdrs = RTE_PTYPE_ALL_MASK;

>  		}
>  	}
>  	return 0;
> @@ -1794,7 +1861,7 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  		n_seg = rx_conf->rx_nseg;
>
>  		if (rx_conf->offloads & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) {
> -			ret = rte_eth_rx_queue_check_split(rx_seg, n_seg,
> +			ret = rte_eth_rx_queue_check_split(port_id, rx_seg, n_seg,
>  							   &mbp_buf_size,
>  							   &dev_info);
>  			if (ret != 0)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c51c1f3fa0..4c9b121355 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -994,6 +994,9 @@ struct rte_eth_txmode {
>   *   specified in the first array element, the second buffer, from the
>   *   pool in the second element, and so on.
>   *
> + * - The proto_hdrs in the elements define the split position of
> + *   received packets.
> + *
>   * - The offsets from the segment description elements specify
>   *   the data offset from the buffer beginning except the first mbuf.
>   *   The first segment offset is added with RTE_PKTMBUF_HEADROOM.
> @@ -1015,12 +1018,41 @@ struct rte_eth_txmode {
>   *   - pool from the last valid element
>   *   - the buffer size from this pool
>   *   - zero offset
> + *
> + * - Length based buffer split:
> + *     - mp, length, offset should be configured.
> + *     - The proto_hdr field must be 0.
> + *
> + * - Protocol header based buffer split:
> + *     - mp, offset, proto_hdr should be configured.
> + *     - The length field must be 0.
> + *     - The proto_hdr field in the last segment should be 0.
> + *
> + * - For Protocol header based buffer split, if the received packets
> + *   don't exactly match all protocol headers in the elements, packets
> + *   will not be split.
> + *   These packets will be put into:
> + *     - pool from the last valid element
> + *     - the buffer size from this pool
> + *     - zero offset
>   */
>  struct rte_eth_rxseg_split {
>  	struct rte_mempool *mp; /**< Memory pool to allocate segment from. */
>  	uint16_t length; /**< Segment data length, configures split point. */
>  	uint16_t offset; /**< Data offset from beginning of mbuf data buffer. */
> -	uint32_t reserved; /**< Reserved field. */
> +	/**
> +	 * Proto_hdr defines a bit mask of the protocol sequence as RTE_PTYPE_*,
> +	 * configures split point. The last RTE_PTYPE* in the mask indicates the
> +	 * split position.
> +	 *
> +	 * If one protocol header is defined to split packets into two segments,
> +	 * for non-tunneling packets, the complete protocol sequence should be defined.
> +	 * For tunneling packets, for simplicity, only the tunnel and inner part of
> +	 * comple protocol sequence is required.
> +	 * If several protocol headers are defined to split packets into multi-segments,
> +	 * the repeated parts of adjacent segments should be omitted.
> +	 */
> +	uint32_t proto_hdr;
>  };
>
>  /**