From mboxrd@z Thu Jan 1 00:00:00 1970
To: Viacheslav Ovsiienko , dev@dpdk.org
Cc: thomas@monjalon.net, stephen@networkplumber.org, olivier.matz@6wind.com,
 jerinjacobk@gmail.com, maxime.coquelin@redhat.com, david.marchand@redhat.com,
 arybchenko@solarflare.com
References: <1602843764-32331-1-git-send-email-viacheslavo@nvidia.com>
 <1602843764-32331-2-git-send-email-viacheslavo@nvidia.com>
From: Ferruh Yigit
Date: Fri, 16 Oct 2020 12:21:28 +0100
In-Reply-To: <1602843764-32331-2-git-send-email-viacheslavo@nvidia.com>
Subject: Re: [dpdk-dev] [PATCH v9 1/6] ethdev: introduce Rx buffer split
List-Id: DPDK patches and discussions

On 10/16/2020 11:22 AM, Viacheslav Ovsiienko wrote:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build multi-segment packets and manage
> almost all data aspects - the memory pools the segments are
> allocated from, the segment lengths, the memory attributes
> like external buffers registered for DMA, etc.
>
> In the receiving direction the datapath is much less flexible:
> an application can only specify the memory pool to configure the
> receiving queue and nothing more. In order to extend the receiving
> datapath capabilities it is proposed to add a way to provide
> extended information on how to split the packets being received.
>
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in the device
> capabilities is introduced to let a PMD report to the application
> that it supports splitting received packets into configurable
> segments. Before invoking the rte_eth_rx_queue_setup() routine the
> application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
>
> The following structure is introduced to specify the Rx packet
> segments for the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload:
>
> struct rte_eth_rxseg_split {
>     struct rte_mempool *mp; /* memory pool to allocate segment from */
>     uint16_t length;        /* segment maximal data length,
>                                configures the "split point" */
>     uint16_t offset;        /* data offset from the beginning
>                                of the mbuf data buffer */
>     uint32_t reserved;      /* reserved field */
> };
>
> The segment descriptions are added to the rte_eth_rxconf structure:
> rx_seg - pointer to the array of segment descriptions; each element
>     describes the memory pool, maximal data length and initial
>     data offset from the beginning of the data buffer in the mbuf.
>     This array allows specifying different settings for each
>     segment individually.
> rx_nseg - number of elements in the array
>
> If the extended segment descriptions are provided with these new
> fields, the mp parameter of rte_eth_rx_queue_setup() must be
> specified as NULL to avoid ambiguity.
>
> There are two options to specify the Rx buffer configuration:
> - mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero:
>   this is the compatible configuration; it follows the existing
>   implementation and provides a single pool with no description
>   of segment sizes and offsets.
> - mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
>   zero: this provides the extended configuration, individually for
>   each segment.
>
> If the Rx queue is configured with the new settings, the packets
> being received will be split into multiple segments pushed to mbufs
> with the specified attributes. The PMD will split the received
> packets into multiple segments according to the specification in
> the description array.
>
> For example, suppose we configured the Rx queue with the
> following segments:
> seg0 - pool0, len0=14B, off0=2B
> seg1 - pool1, len1=20B, off1=128B
> seg2 - pool2, len2=20B, off2=0B
> seg3 - pool3, len3=512B, off3=0B
>
> A packet 46 bytes long will look like the following:
> seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> seg1 - 20B long @ 128 in mbuf from pool1
> seg2 - 12B long @ 0 in mbuf from pool2
>
> A packet 1500 bytes long will look like the following:
> seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
> seg1 - 20B @ 128 in mbuf from pool1
> seg2 - 20B @ 0 in mbuf from pool2
> seg3 - 512B @ 0 in mbuf from pool3
> seg4 - 512B @ 0 in mbuf from pool3
> seg5 - 422B @ 0 in mbuf from pool3
>
> The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> configured to support the new buffer split feature (if rx_nseg
> is greater than one).
>
> The split limitations imposed by the underlying PMD are reported
> in the newly introduced rte_eth_dev_info->rx_seg_capa field.
>
> The new approach allows splitting the ingress packets into
> multiple parts pushed to memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data into external
> buffers attached to mbufs allocated from different memory pools.
> The memory attributes for the split parts may differ as well -
> for example, the application data may be pushed into external
> memory located on a dedicated physical device, say a GPU or NVMe.
> This improves the flexibility of the DPDK receiving datapath
> while preserving compatibility with the existing API.
>
> Signed-off-by: Viacheslav Ovsiienko
> Acked-by: Ajit Khaparde
> Acked-by: Jerin Jacob
> Reviewed-by: Andrew Rybchenko

<...>

> +.. _nic_features_buffer_split:
> +
> +Buffer Split on Rx
> +------------------
> +
> +Scatters the packets being received on specified boundaries to segmented mbufs.
> +
> +* **[uses] rte_eth_rxconf,rte_eth_rxmode**: ``offloads:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[uses] rte_eth_rxconf**: ``rx_conf.rx_seg, rx_conf.rx_nseg``.
> +* **[implements] datapath**: ``Buffer Split functionality``.
> +* **[provides] rte_eth_dev_info**: ``rx_offload_capa:RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``.
> +* **[provides] eth_dev_ops**: ``rxq_info_get:buffer_split``.

Previously you mentioned this is because 'rxq_info_get()' can provide
buffer_split information, but with the current implementation it doesn't,
and there is no field in the struct to report such.

I suggest either adding it now, while you can :) [with a techboard
approval], or removing the above documentation of it.

<...>

> /**
> + * Ethernet device Rx buffer segmentation capabilities.
> + */
> +__rte_experimental
> +struct rte_eth_rxseg_capa {
> +    __extension__
> +    uint32_t max_nseg:16; /**< Maximum amount of segments to split.
> */
> +    uint32_t multi_pools:1; /**< Supports receiving to multiple pools.*/
> +    uint32_t offset_allowed:1; /**< Supports buffer offsets. */
> +    uint32_t offset_align_log2:4; /**< Required offset alignment. */
> +};

Now we are fiddling with details, but I am not a fan of bitfields [1].
I assumed Thomas' request was to enable expanding the capabilities later
without breaking the ABI, which makes sense and suits this kind of
capability struct. If this is correct, why make 'max_nseg' a bitfield
too?

Why not:

uint16_t max_nseg;
uint16_t multi_pools:1;
uint16_t offset_allowed:1;
uint16_t offset_align_log2:4;
<This still leaves 10 bits to expand without an ABI break>

[1] Unless there is a very space-critical use case; otherwise they just
add more code to extract the same value, and are not as simple as a
plain variable :)