From mboxrd@z Thu Jan 1 00:00:00 1970
To: Slava Ovsiienko, "dev@dpdk.org"
CC: Thomas Monjalon, "stephen@networkplumber.org", "ferruh.yigit@intel.com", Shahaf Shuler, "olivier.matz@6wind.com", "jerinjacobk@gmail.com", "maxime.coquelin@redhat.com", "david.marchand@redhat.com", Asaf Penso
From: Andrew Rybchenko
Message-ID: <1870d31a-0ec1-4198-fcee-03646c009458@solarflare.com>
Date: Thu, 17 Sep 2020 19:55:07 +0300
User-Agent: Mozilla/5.0 (X11; Linux
x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Language: en-US
Subject: Re: [dpdk-dev] [RFC] ethdev: introduce Rx buffer split
List-Id: DPDK patches and discussions
Sender: "dev"

On 8/17/20 8:49 PM, Slava Ovsiienko wrote:
>>>From 7f7052d8b85ff3ff7011bd844b6d3169c6e51923 Mon Sep 17 00:00:00 2001
> From: Viacheslav Ovsiienko
>
> Date: Mon, 17 Aug 2020 16:57:43 +0000
> Subject: [RFC] ethdev: introduce Rx buffer split
>
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multisegment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
>
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended infoirmation how to split the packets being received.

infoirmation -> information

>
> The following structure is introduced to specify the Rx packet
> segment:
>
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length */
>     uint16_t offset; /* data offset from beginning of mbuf data buffer */
>     uint32_t reserved; /* reserved field */
> };
>
> The new routine rte_eth_rx_queue_setup_ex() is introduced to
> setup the given Rx queue using the new extended Rx packet segment
> description:
>
> int
> rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
>                           uint16_t nb_rx_desc, unsigned int socket_id,
>                           const struct rte_eth_rxconf *rx_conf,
>                           const struct rte_eth_rxseg *rx_seg,
>                           uint16_t n_seg)
>
> This routine presents the two new parameters:
>     rx_seg - pointer the array of segment descriptions, each element
>              describes the memory pool, maximal data length, initial
>              data offset from the beginning of data buffer in mbuf
>     n_seg  - number of elements in the array
>
> The new offload flag DEV_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments.
> Prior invoking the rte_eth_rx_queue_setup_ex() routine
> application should check DEV_RX_OFFLOAD_BUFFER_SPLIT flag.
>
> If the Rx queue is configured with new routine the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will allocate the first mbuf
> from the pool specified in the first segment descriptor and puts
> the data staring at specified offeset in the allocated mbuf data

offeset -> offset

> buffer. If packet length exceeds the specified segment length
> the next mbuf will be allocated according to the next segment
> descriptor (if any) and data will be put in its data buffer at
> specified offset and not exceeding specified length. If there is
> no next descriptor the next mbuf will be allocated and filled in the
> same way (from the same pool and with the same buffer offset/length)
> as the current one.
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B,  off0=RTE_PKTMBUF_HEADROOM
>     seg1 - pool1, len1=20B,  off1=0B
>     seg2 - pool2, len2=20B,  off2=0B
>     seg3 - pool3, len3=512B, off3=0B
>
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>     seg1 - 20B long @ 0 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
>
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B  @ RTE_PKTMBUF_HEADROOM in mbuf from pool0
>     seg1 - 20B  @ 0 in mbuf from pool1
>     seg2 - 20B  @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3

The behaviour is logical, but what to do if the HW can't do it, i.e.
cannot reuse the last segment many times? Should the setup reject the
configuration if the provided segments are insufficient to fit an
MTU-sized packet? How would the limitation be reported?
(I'm still trying to argue that SCATTER and BUFFER_SPLIT should be
independent.)
>
> The offload DEV_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer spllit feature (if n_seg

spllit -> split

> is greater than one).
>
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ either - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility with preserving compatibility
> with existing API.
>
> Also, the proposed segment description might be used to specify
> Rx packet split for some other features. For example, provide
> the way to specify the extra memory pool for the Header Split
> feature of some Intel PMD.
>
> Signed-off-by: Viacheslav Ovsiienko
> ---
>  lib/librte_ethdev/rte_ethdev.c      | 166 ++++++++++++++++++++++++++++++++++++
>  lib/librte_ethdev/rte_ethdev.h      |  15 ++++
>  lib/librte_ethdev/rte_ethdev_core.h |  10 +++
>  3 files changed, 191 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 7858ad5..638e42d 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -1933,6 +1933,172 @@ struct rte_eth_dev *
>  }
>
>  int
> +rte_eth_rx_queue_setup_ex(uint16_t port_id, uint16_t rx_queue_id,
> +			  uint16_t nb_rx_desc, unsigned int socket_id,
> +			  const struct rte_eth_rxconf *rx_conf,
> +			  const struct rte_eth_rxseg *rx_seg, uint16_t n_seg)
> +{
> +	int ret;
> +	uint16_t seg_idx;
> +	uint32_t mbp_buf_size;
> +	struct rte_eth_dev *dev;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_rxconf local_conf;
> +	void **rxq;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +
> +	dev = &rte_eth_devices[port_id];
> +	if (rx_queue_id >= dev->data->nb_rx_queues) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", rx_queue_id);
> +		return -EINVAL;
> +	}
> +
> +	if (rx_seg == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid null description pointer\n");
> +		return -EINVAL;
> +	}
> +
> +	if (n_seg == 0) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid zero description number\n");
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup_ex, -ENOTSUP);
> +
> +	/*
> +	 * Check the size of the mbuf data buffer.
> +	 * This value must be provided in the private data of the memory pool.
> +	 * First check that the memory pool has a valid private data.
> +	 */
> +	ret = rte_eth_dev_info_get(port_id, &dev_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	for (seg_idx = 0; seg_idx < n_seg; seg_idx++) {
> +		struct rte_mempool *mp = rx_seg[seg_idx].mp;
> +
> +		if (mp->private_data_size <
> +				sizeof(struct rte_pktmbuf_pool_private)) {
> +			RTE_ETHDEV_LOG(ERR, "%s private_data_size %d < %d\n",
> +				mp->name, (int)mp->private_data_size,
> +				(int)sizeof(struct rte_pktmbuf_pool_private));
> +			return -ENOSPC;
> +		}
> +
> +		mbp_buf_size = rte_pktmbuf_data_room_size(mp);
> +		if (mbp_buf_size < rx_seg[seg_idx].length +
> +				   rx_seg[seg_idx].offset) {
> +			RTE_ETHDEV_LOG(ERR,
> +				"%s mbuf_data_room_size %d < %d"
> +				" (segment length=%d + segment offset=%d)\n",
> +				mp->name, (int)mbp_buf_size,
> +				(int)(rx_seg[seg_idx].length +
> +				      rx_seg[seg_idx].offset),
> +				(int)rx_seg[seg_idx].length,
> +				(int)rx_seg[seg_idx].offset);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	/* Use default specified by driver, if nb_rx_desc is zero */
> +	if (nb_rx_desc == 0) {
> +		nb_rx_desc = dev_info.default_rxportconf.ring_size;
> +		/* If driver default is also zero, fall back on EAL default */
> +		if (nb_rx_desc == 0)
> +			nb_rx_desc = RTE_ETH_DEV_FALLBACK_RX_RINGSIZE;
> +	}
> +
> +	if (nb_rx_desc > dev_info.rx_desc_lim.nb_max ||
> +	    nb_rx_desc < dev_info.rx_desc_lim.nb_min ||
> +	    nb_rx_desc % dev_info.rx_desc_lim.nb_align != 0) {
> +
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid value for nb_rx_desc(=%hu), should be: <= %hu, >= %hu, and a product of %hu\n",
> +			nb_rx_desc, dev_info.rx_desc_lim.nb_max,
> +			dev_info.rx_desc_lim.nb_min,
> +			dev_info.rx_desc_lim.nb_align);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->dev_started &&
> +	    !(dev_info.dev_capa &
> +	      RTE_ETH_DEV_CAPA_RUNTIME_RX_QUEUE_SETUP))
> +		return -EBUSY;
> +
> +	if (dev->data->dev_started &&
> +	    (dev->data->rx_queue_state[rx_queue_id] !=
> +	     RTE_ETH_QUEUE_STATE_STOPPED))
> +		return -EBUSY;
> +
> +	rxq = dev->data->rx_queues;
> +	if (rxq[rx_queue_id]) {
> +		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> +					-ENOTSUP);
> +		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> +		rxq[rx_queue_id] = NULL;
> +	}
> +
> +	if (rx_conf == NULL)
> +		rx_conf = &dev_info.default_rxconf;
> +
> +	local_conf = *rx_conf;
> +
> +	/*
> +	 * If an offloading has already been enabled in
> +	 * rte_eth_dev_configure(), it has been enabled on all queues,
> +	 * so there is no need to enable it in this queue again.
> +	 * The local_conf.offloads input to underlying PMD only carries
> +	 * those offloadings which are only enabled on this queue and
> +	 * not enabled on all queues.
> +	 */
> +	local_conf.offloads &= ~dev->data->dev_conf.rxmode.offloads;
> +
> +	/*
> +	 * New added offloadings for this queue are those not enabled in
> +	 * rte_eth_dev_configure() and they must be per-queue type.
> +	 * A pure per-port offloading can't be enabled on a queue while
> +	 * disabled on another queue. A pure per-port offloading can't
> +	 * be enabled for any queue as new added one if it hasn't been
> +	 * enabled in rte_eth_dev_configure().
> +	 */
> +	if ((local_conf.offloads & dev_info.rx_queue_offload_capa) !=
> +	     local_conf.offloads) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Ethdev port_id=%d rx_queue_id=%d, new added offloads 0x%"PRIx64" must be "
> +			"within per-queue offload capabilities 0x%"PRIx64" in %s()\n",
> +			port_id, rx_queue_id, local_conf.offloads,
> +			dev_info.rx_queue_offload_capa,
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * If LRO is enabled, check that the maximum aggregated packet
> +	 * size is supported by the configured device.
> +	 */
> +	if (local_conf.offloads & DEV_RX_OFFLOAD_TCP_LRO) {
> +		if (dev->data->dev_conf.rxmode.max_lro_pkt_size == 0)
> +			dev->data->dev_conf.rxmode.max_lro_pkt_size =
> +				dev->data->dev_conf.rxmode.max_rx_pkt_len;
> +		int ret = check_lro_pkt_size(port_id,
> +				dev->data->dev_conf.rxmode.max_lro_pkt_size,
> +				dev->data->dev_conf.rxmode.max_rx_pkt_len,
> +				dev_info.max_lro_pkt_size);
> +		if (ret != 0)
> +			return ret;
> +	}
> +
> +	ret = (*dev->dev_ops->rx_queue_setup_ex)(dev, rx_queue_id, nb_rx_desc,
> +						 socket_id, &local_conf,
> +						 rx_seg, n_seg);
> +	if (!ret) {
> +		if (!dev->data->min_rx_buf_size ||
> +		    dev->data->min_rx_buf_size > mbp_buf_size)
> +			dev->data->min_rx_buf_size = mbp_buf_size;
> +	}
> +
> +	return eth_err(port_id, ret);
> +}
> +
> +int
>  rte_eth_rx_hairpin_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  			       uint16_t nb_rx_desc,
>  			       const struct rte_eth_hairpin_conf *conf)

I dislike the idea of introducing a new device operation.
rte_eth_rxconf has reserved space, and the BUFFER_SPLIT offload flag
can mean that the PMD looks for the split configuration there.
The duplication above is simply not acceptable, and even if we factor
the shared code out into a helper function, I still see no point in
introducing a new device operation.

[snip]