From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Monjalon
To: Ray Kinsella
Cc: dev@dpdk.org, "Yigit, Ferruh", 'Damjan Marion', "Wang, Haiyue",
 Jerin Jacob Kollanukkaran, viacheslavo@mellanox.com,
 stephen@networkplumber.org, arybchenko@solarflare.com
Date: Mon, 04 Nov 2019 14:09:38 +0100
Message-ID: <2680328.hM31vkk1Ea@xps>
In-Reply-To: <932da3d3-34b9-79a4-1a68-fb06056ee93f@ashroe.eu>
References: <20191015075133.38560-1-haiyue.wang@intel.com>
 <3333162.RZseXh5CjJ@xps>
 <932da3d3-34b9-79a4-1a68-fb06056ee93f@ashroe.eu>
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"
Subject: Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add the API for getting burst
 mode information
List-Id: DPDK patches and discussions

04/11/2019 13:07, Ray Kinsella:
> 
> On 04/11/2019 11:30, Thomas Monjalon wrote:
> > 04/11/2019 11:03, Ray Kinsella:
> >> On 04/11/2019 09:54, Thomas Monjalon wrote:
> >>> 04/11/2019 10:49, Ray Kinsella:
> >>>> On 03/11/2019 22:41, Thomas Monjalon wrote:
> >>>>> 03/11/2019 21:35, Ray Kinsella:
> >>>>>> On 29/10/2019 14:27, Ferruh Yigit wrote:
> >>>>>>> On 10/26/2019 5:23 PM, Thomas Monjalon wrote:
> >>>>>>>> 26/10/2019 11:23, Wang, Haiyue:
> >>>>>>>>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >>>>>>>>>> 26/10/2019 06:40, Wang, Haiyue:
> >>>>>>>>>>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >>>>>>>>>>>> 25/10/2019 18:02, Jerin Jacob:
> >>>>>>>>>>>>> On Fri, Oct 25, 2019 at 9:15 PM Thomas Monjalon wrote:
> >>>>>>>>>>>>>> 25/10/2019 16:08, Ferruh Yigit:
> >>>>>>>>>>>>>>> On 10/25/2019 10:36 AM, Thomas Monjalon wrote:
> >>>>>>>>>>>>>>>> 15/10/2019 09:51, Haiyue Wang:
> >>>>>>>>>>>>>>>>> Some PMDs have more than one RX/TX burst path; add the ethdev API
> >>>>>>>>>>>>>>>>> that allows an application to retrieve the mode information about
> >>>>>>>>>>>>>>>>> the Rx/Tx packet burst, such as Scalar or Vector, and the vector
> >>>>>>>>>>>>>>>>> technology, like AVX2.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I missed this patch. Andrew and I, the maintainers of ethdev, were not CC'ed.
> >>>>>>>>>>>>>>>> Ferruh, I would expect to be Cc'ed and/or get a notification before merging.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It has been discussed on the mailing list and went through multiple discussions;
> >>>>>>>>>>>>>>> the patch has been out since August. +1 to cc all maintainers, I missed that part,
> >>>>>>>>>>>>>>> but when the patch is reviewed and there is no objection, why block the merge?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm not saying we should block the merge.
> >>>>>>>>>>>>>> My bad is that I missed the patch, and I am asking for help with a notification
> >>>>>>>>>>>>>> in this case. Same for Andrew, I guess.
> >>>>>>>>>>>>>> Note: it is merged in master and I am looking to improve this feature.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +/**
> >>>>>>>>>>>>>>>>> + * Ethernet device RX/TX queue packet burst mode information structure.
> >>>>>>>>>>>>>>>>> + * Used to retrieve information about the packet burst mode setting.
> >>>>>>>>>>>>>>>>> + */
> >>>>>>>>>>>>>>>>> +struct rte_eth_burst_mode {
> >>>>>>>>>>>>>>>>> +	uint64_t options;
> >>>>>>>>>>>>>>>>> +};
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Why a struct for an integer?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Again, by a request from me, so as not to need to break the API if we need to
> >>>>>>>>>>>>>>> add more things in the future.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I would replace it with a string. This is the most flexible API.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> IMO, probably the best of both worlds makes a good option here,
> >>>>>>>>>>>>> as Haiyue suggested, if we have an additional dev_specific[1] in the structure,
> >>>>>>>>>>>>> and when it is passed to the application, let common code make the final string as
> >>>>>>>>>>>>> (options flags to string + dev_specific).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The options flag can be zero if the PMD does not have any generic flags, nor is
> >>>>>>>>>>>>> interested in such a scheme.
> >>>>>>>>>>>>> Generic flags will help at least to have some common code.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>> struct rte_eth_burst_mode {
> >>>>>>>>>>>>> 	uint64_t options;
> >>>>>>>>>>>>> 	char dev_specific[128]; /* PMD-specific burst mode information */
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>
> >>>>>>>>>>>> I really don't see how we can have generic flags.
> >>>>>>>>>>>> The flags which are proposed just match
> >>>>>>>>>>>> the functions implemented in Intel PMDs.
> >>>>>>>>>>>> And this is a complicated solution.
> >>>>>>>>>>>> Why not just return a name for the selected Rx/Tx mode?
> >>>>>>>>>>>
> >>>>>>>>>>> Intel PMDs use the *generic* methods like x86 SSE, AVX2, ARM NEON, PPC ALTIVEC,
> >>>>>>>>>>> 'dev->data->scattered_rx', etc. for the target: "DPDK is the Data Plane Development Kit
> >>>>>>>>>>> that consists of libraries to accelerate packet processing workloads running on a wide
> >>>>>>>>>>> variety of CPU architectures."
> >>>>>>>>>>
> >>>>>>>>>> How are RTE_ETH_BURST_SCATTERED and RTE_ETH_BURST_BULK_ALLOC generic?
> >>>>>>>>>> They just match some features of the Intel PMDs.
> >>>>>>>>>> Why not expose other optimizations of the Rx/Tx implementations?
> >>>>>>>>>> You totally missed the point of generic burst mode description.
> >>>>>>>>>>
> >>>>>>>>>>> If you understand these new experimental APIs from the above, then bit options
> >>>>>>>>>>> are the best, and we didn't invent new words to describe them, just took them
> >>>>>>>>>>> from the CPU & other *generic* technology. And the application can loop to check
> >>>>>>>>>>> which kind of burst is running by just a simple bit test.
> >>>>>>>>>>>
> >>>>>>>>>>> If PMDs missed these, they can update them in future roadmaps to enhance their PMDs,
> >>>>>>>>>>> like MLX5 supporting ARM NEON and x86 SSE.
> >>>>>>>>>>
> >>>>>>>>>> I have no words!
> >>>>>>>>>> Do you really think other PMDs should learn from Intel how to "enhance" their PMD?
> >>>>>>>>>> You talk about mlx5; did you look at its code? Did you see the burst modes
> >>>>>>>>>> depending on which specific hardware path is used (MPRQ, EMPW, inline)?
> >>>>>>>>>> Or depending on which offloads are handled?
> >>>>>>>>>>
> >>>>>>>>>> Again, the instruction set used by the function is a small part
> >>>>>>>>>> of the burst mode optimization.
> >>>>>>>>>>
> >>>>>>>>>> So you did not reply to my question:
> >>>>>>>>>> Why not just return a name for the selected Rx/Tx mode?
> >>>>>>>>>
> >>>>>>>>> In fact, RFC v1/v2 returned the *name*, but the *name* is hard for the
> >>>>>>>>> application to do further processing on: strcmp, strstr? Not so nice
> >>>>>>>>> for C code, and it is not so standard, so we switched it to bit definitions.
> >>>>>>>>
> >>>>>>>> Again, please answer my question: why do you need it?
> >>>>>>>> I think it is just informative; that's why a string should be enough.
> >>>>>>>> I am clearly against the bitmap because it is way too restrictive.
> >>>>>>>> I disagree that knowing it is using AVX2 or AVX512 is so interesting.
> >>>>>>>> What you would like to know is whether it is processing packets 4 by 4,
> >>>>>>>> for instance, or to know which offload is supported, or what hardware trick
> >>>>>>>> is used in the datapath design.
> >>>>>>>> There are so many options in a datapath design that it cannot be
> >>>>>>>> represented with a bitmap. And it makes no sense to consider some design
> >>>>>>>> criteria more important than others.
> >>>>>>>> I Cc an Intel architect (Edwin) who could explain to you how much more
> >>>>>>>> complicated a datapath design is than just using AVX instructions.
> >>>>>>>
> >>>>>>> As I understand it, this is to let applications make an informed decision based on
> >>>>>>> what vectorization is used in the driver; currently this is not known by the
> >>>>>>> application.
> >>>>>>>
> >>>>>>> And as previously replied, the main target of the API is to define the vector
> >>>>>>> path, not all optimizations, so the number is limited.
> >>>>>
> >>>>> No!
> >>>>> The name of this API is "burst mode information",
> >>>>> not "vector instructions used".
> >>>>> I think the main error is that in Intel PMDs,
> >>>>> each Rx/Tx function uses different vector instructions.
> >>>>> So you generalize that knowing the vector instructions
> >>>>> will give you good information about the performance.
> >>>>> But this is generally wrong!
> >>>>> The right level of info is much more complex.
> >>>>
> >>>> I don't think anyone was suggesting limiting it to purely describing PMD optimization
> >>>> with vector instructions. If there are other commonalities, let's describe those also.
> >>>>
> >>>> Vectorization was thought to be a good starting point - IMHO it is.
> >>>>
> >>>>>
> >>>>>>> There are many optimizations in the data path. I agree we may not represent all
> >>>>>>> of them, and agreed, the existing enum having "RTE_ETH_BURST_BULK_ALLOC" and similar
> >>>>>>> is causing this confusion; perhaps we can remove them.
> >>>>>>>
> >>>>>>> And if the requirement from the application is just informative, I would agree
> >>>>>>> that a free-text string will be better. Right now 'rte_eth_rx/tx_burst_mode_get()'
> >>>>>>> is the main API to provide the information, and
> >>>>>>> 'rte_eth_burst_mode_option_name()' is a helper for the application/driver to log
> >>>>>>> this information.
> >>>>>>>
> >>>>>>
> >>>>>> Well, look, we have a general deficit of information about what is happening under
> >>>>>> the covers in DPDK. The end user may get wildly different performance characteristics
> >>>>>> based on the DPDK configuration. A simple example: using flow director causes the i40e
> >>>>>> PMD to switch to using a scalar code path, and performance may drop by as much as half.
> >>>>>>
> >>>>>> This can cause no end of head-scratching in consuming products. I have done some
> >>>>>> of that head-scratching myself; it is a usability nightmare.
> >>>>>>
> >>>>>> FD.io VPP tries to work around this by mining the call stack, to give the user _some_
> >>>>>> kind of information about what is happening. These kinds of heroics should not be necessary.
> >>>>>>
> >>>>>> For exactly the same reasons as telemetry, we should be trying to give the users as much
> >>>>>> information as possible, in as standard a format as possible. Otherwise DPDK
> >>>>>> becomes arcane, leaving the user running gdb to understand what is going on, as I
> >>>>>> frequently do.
> >>>>>
> >>>>> I agree we must provide a clue to understand the performance result.
> >>>>> As Stephen commented at the very beginning, a log is enough for such debugging.
> >>>>> But his comment was ignored.
> >>>>
> >>>> Do we expect applications built on DPDK to have to grep its log to make such discoveries?
> >>>> It's a very brittle and arcane way to provide information, if nothing else.
> >>>>
> >>>>> You wanted an API, fine.
> >>>>> I am OK to have an API to request info which is also in the logs.
> >>>> I would point out that an API to query metadata is common practice elsewhere.
> >>>> GStreamer GstCaps and Linux sysfs are the closest examples I can think of.
> >>>>
> >>>>>
> >>>>>> Finally, again for the same reasons as telemetry, I would say that machine-readable
> >>>>>> is the ideal here.
> >>>>>
> >>>>> I disagree here. There is no need to make this info machine-readable.
> >>>>> We want a clue about the optimizations, which are all about creativity.
> >>>>> And we cannot make the creativity of developers "machine readable".
> >>>>
> >>>> I am more concerned about the creativity in how developers describe optimizations.
> >>>> If there is no standardization of strings (or bits), the API will be challenging to use.
> >>>
> >>> No, it won't be challenging, because it will be just a string to print.
> >>
> >> Well, the challenge is getting everyone to use the same set of strings,
> >> such that what is returned by the API has a common meaning.
> >>
> >> I am fine with strings,
> >> so long as we have a method of encouraging folks to use a standard set where possible.
> >
> > I don't understand why you insist on standardizing.
> > Every driver is different.
> > The routine names will make sense only in the context of the driver.
> 
> The more diversity in description, the more the user is reaching for documentation.
> If the user is lucky enough that the description has documentation.

I would go even further: the documentation will not explain each Rx/Tx routine.
The user can have some guess, but the main information is to see that the routine
changed from one run to the other, so he can expect a change in the performance
result. And as a bonus, he can ask the maintainers for more explanation by giving
them the routine name he got from the API.

> We can argue about this indefinitely, instead of proposing a standard. :-)
> The best way to do this is to leave the API as experimental for some period - as Haiyue suggests.
> And then review as the API drops experimental status, with a view to standardizing if possible?

Yes, we can argue indefinitely.
My position is against standardizing this information
(not even talking about the ABI breaks it could cause).

> >>> The challenge is trying to fix the design characteristics in an API.
> >>
> >> I thought Haiyue's patch, with a fair degree of input from Ferruh and others, is a pretty solid start.
> >> Let's describe those commonalities that _do_ exist today - it may not be enough, but it's better than what we had.
> >
> > The initial requirement is to describe what makes the performance
> > change from one routine to the other.
> > We don't want to describe the commonalities but the specific differences.
> > Please let's focus on the real requirement and build an API which really helps.
> > As said several times, a PMD-specific string would be a good API.