From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9804BA04DB; Fri, 16 Oct 2020 11:19:14 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 784161EBDD; Fri, 16 Oct 2020 11:19:13 +0200 (CEST)
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id 8DF1C1EBA0 for ; Fri, 16 Oct 2020 11:19:12 +0200 (CEST)
IronPort-SDR: IaR5KgzsJ08EI/txwZAn5LdKFdOAhtLwPl9IoLiY8z4G8koENA73HD6CoDrIKb6vg9YOFWl6Qi pChZDVXD2gQw==
X-IronPort-AV: E=McAfee;i="6000,8403,9775"; a="154371591"
X-IronPort-AV: E=Sophos;i="5.77,382,1596524400"; d="scan'208";a="154371591"
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Oct 2020 02:19:10 -0700
IronPort-SDR: OdYVFU1JtgR+Ih8IHq0/l2EiTzRN9TsYnIEz0HcBih2hKEy0XGU5cj3sGIDcP+h12jA9rA+GlO trdzLWSD546A==
X-IronPort-AV: E=Sophos;i="5.77,382,1596524400"; d="scan'208";a="531648042"
Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.252.19.66]) ([10.252.19.66]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Oct 2020 02:19:07 -0700
To: Viacheslav Ovsiienko , dev@dpdk.org
Cc: thomas@monjalon.net, stephen@networkplumber.org, olivier.matz@6wind.com, jerinjacobk@gmail.com, maxime.coquelin@redhat.com, david.marchand@redhat.com, arybchenko@solarflare.com
References: <1602834519-8696-1-git-send-email-viacheslavo@nvidia.com> <1602834519-8696-2-git-send-email-viacheslavo@nvidia.com>
From: Ferruh Yigit
Message-ID: <3e1da2f4-074d-9d03-c44f-435498eedaa6@intel.com>
Date: Fri, 16 Oct 2020 10:19:04 +0100
MIME-Version: 1.0
In-Reply-To: <1602834519-8696-2-git-send-email-viacheslavo@nvidia.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Subject: Re: [dpdk-dev] [PATCH v8 1/6] ethdev: introduce Rx buffer split
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On 10/16/2020 8:48 AM, Viacheslav Ovsiienko wrote:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build the multi-segment packet and manages
> almost all data aspects - the memory pools where segments
> are allocated from, the segment lengths, the memory attributes
> like external buffers, registered for DMA, etc.
>
> In the receiving direction, the datapath is much less flexible,
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend receiving
> datapath capabilities it is proposed to add the way to provide
> extended information how to split the packets being received.
>
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in device
> capabilities is introduced to present the way for PMD to report to
> application about supporting Rx packet split to configurable
> segments. Prior invoking the rte_eth_rx_queue_setup() routine
> application should check RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
>
> The following structure is introduced to specify the Rx packet
> segment for RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload:
>
> struct rte_eth_rxseg_split {
>
>     struct rte_mempool *mp; /* memory pools to allocate segment from */
>     uint16_t length; /* segment maximal data length,
>                         configures "split point" */
>     uint16_t offset; /* data offset from beginning
>                         of mbuf data buffer */
>     uint32_t reserved; /* reserved field */
> };
>
> The segment descriptions are added to the rte_eth_rxconf structure:
>   rx_seg - pointer the array of segment descriptions, each element
>            describes the memory pool, maximal data length, initial
>            data offset from the beginning of data buffer in mbuf.
>            This array allows to specify the different settings for
>            each segment in individual fashion.
>   rx_nseg - number of elements in the array
>
> If the extended segment descriptions is provided with these new
> fields the mp parameter of the rte_eth_rx_queue_setup must be
> specified as NULL to avoid ambiguity.
>
> There are two options to specify Rx buffer configuration:
> - mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero,
>   it is compatible configuration, follows existing implementation,
>   provides single pool and no description for segment sizes
>   and offsets.
> - mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
>   zero, it provides the extended configuration, individually for
>   each segment.
>
> If the Rx queue is configured with new settings the packets being
> received will be split into multiple segments pushed to the mbufs
> with specified attributes. The PMD will split the received packets
> into multiple segments according to the specification in the
> description array.
>
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=2
>     seg1 - pool1, len1=20B, off1=128B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
>
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B long @ 128 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
>
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B @ 128 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
>
> The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> configured to support new buffer split feature (if rx_nseg
> is greater than one).
>
> The split limitations imposed by underlying PMD is reported
> in the new introduced rte_eth_dev_info->rx_seg_capa field.
>
> The new approach would allow splitting the ingress packets into
> multiple parts pushed to the memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into
> the external buffers attached to mbufs allocated from the
> different memory pools. The memory attributes for the split
> parts may differ either - for example the application data
> may be pushed into the external memory located on the dedicated
> physical device, say GPU or NVMe. This would improve the DPDK
> receiving datapath flexibility with preserving compatibility
> with existing API.
>
> Signed-off-by: Viacheslav Ovsiienko
> Acked-by: Ajit Khaparde
> Acked-by: Jerin Jacob

<...>

> +/**
>   * A structure used to configure an RX ring of an Ethernet port.
>   */
>  struct rte_eth_rxconf {
> @@ -977,6 +998,46 @@ struct rte_eth_rxconf {
>      uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
>      uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
>      uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
> +    uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
> +    /**
> +     * Points to the array of segment descriptions. Each array element
> +     * describes the properties for each segment in the receiving
> +     * buffer according to feature descripting structure.
> +     *
> +     * The supported capabilities of receiving segmentation is reported
> +     * in rte_eth_dev_info ->rx_seg_capa field.
> +     *
> +     * If RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag is set in offloads field,
> +     * the PMD will split the received packets into multiple segments
> +     * according to the specification in the description array:
> +     *
> +     * - the first network buffer will be allocated from the memory pool,
> +     *   specified in the first array element, the second buffer, from the
> +     *   pool in the second element, and so on.
> +     *
> +     * - the offsets from the segment description elements specify
> +     *   the data offset from the buffer beginning except the first mbuf.
> +     *   For this one the offset is added with RTE_PKTMBUF_HEADROOM.
> +     *
> +     * - the lengths in the elements define the maximal data amount
> +     *   being received to each segment. The receiving starts with filling
> +     *   up the first mbuf data buffer up to specified length. If the
> +     *   there are data remaining (packet is longer than buffer in the first
> +     *   mbuf) the following data will be pushed to the next segment
> +     *   up to its own length, and so on.
> +     *
> +     * - If the length in the segment description element is zero
> +     *   the actual buffer size will be deduced from the appropriate
> +     *   memory pool properties.
> +     *
> +     * - if there is not enough elements to describe the buffer for entire
> +     *   packet of maximal length the following parameters will be used
> +     *   for the all remaining segments:
> +     *     - pool from the last valid element
> +     *     - the buffer size from this pool
> +     *     - zero offset
> +     */
> +    struct rte_eth_rxseg *rx_seg;

"struct rte_eth_rxconf" is very commonly used; I think every application calls
'rte_eth_rx_queue_setup()'. "Buffer split", however, is not a common use case,
so I am against the "struct rte_eth_rxseg *rx_seg;" field creating this much
noise in the "struct rte_eth_rxconf" documentation.

As mentioned before, can you please move the detailed documentation above to
where "struct rte_eth_rxseg" is defined, and keep only a single comment for
"struct rte_eth_rxseg *rx_seg" in this struct?
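
For reference, the length rule in the quoted comment can be modelled in a few lines of standalone C. This is only a sketch, not DPDK code: "struct seg_desc" and "split_lengths()" are hypothetical names, pools and offsets are left out, zero lengths (deduced from the mempool) are not handled, and it assumes that segments past the last described element reuse that element's length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one segment description (NOT the DPDK struct). */
struct seg_desc {
	uint16_t length; /* maximal data length of this segment, non-zero here */
};

/*
 * Compute the data length placed in each segment for a packet of pkt_len
 * bytes. Segments beyond the last described element reuse that element
 * (assumption of this sketch). Returns the number of segments written to
 * out[], or 0 if out_cap is too small.
 */
static size_t
split_lengths(const struct seg_desc *segs, size_t nseg, uint32_t pkt_len,
	      uint16_t *out, size_t out_cap)
{
	size_t n = 0;

	assert(segs != NULL && nseg > 0);

	while (pkt_len > 0) {
		/* Clamp the index so the last element repeats. */
		const struct seg_desc *d = &segs[n < nseg ? n : nseg - 1];
		uint16_t take;

		assert(d->length > 0);
		if (n == out_cap)
			return 0;

		/* Fill this segment up to its maximal length. */
		take = pkt_len < d->length ? (uint16_t)pkt_len : d->length;
		out[n++] = take;
		pkt_len -= take;
	}
	return n;
}
```

With the four segments from the commit message (14B, 20B, 20B, 512B), this gives 14/20/12 for a 46-byte packet and 14/20/20/512/512/422 for a 1500-byte packet, matching the example quoted above.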