From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Monjalon <thomas@monjalon.net>
To: Ray Kinsella
Cc: dev@dpdk.org, "Yigit, Ferruh", 'Damjan Marion', "Wang, Haiyue",
 Jerin Jacob Kollanukkaran, viacheslavo@mellanox.com,
 stephen@networkplumber.org, arybchenko@solarflare.com
Date: Mon, 04 Nov 2019 12:30:47 +0100
Message-ID: <3333162.RZseXh5CjJ@xps>
In-Reply-To: <31ae4f6c-f55b-ae3f-4527-4870156b74ff@ashroe.eu>
References: <20191015075133.38560-1-haiyue.wang@intel.com>
 <4310025.8zu0hhGBfT@xps>
 <31ae4f6c-f55b-ae3f-4527-4870156b74ff@ashroe.eu>
Subject: Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add the API for getting burst mode information
List-Id: DPDK patches and discussions

04/11/2019 11:03, Ray Kinsella:
> On 04/11/2019 09:54, Thomas Monjalon wrote:
> > 04/11/2019 10:49, Ray Kinsella:
> >> On 03/11/2019 22:41, Thomas Monjalon wrote:
> >>> 03/11/2019 21:35, Ray Kinsella:
> >>>> On 29/10/2019 14:27, Ferruh Yigit wrote:
> >>>>> On 10/26/2019 5:23 PM, Thomas Monjalon wrote:
> >>>>>> 26/10/2019 11:23, Wang, Haiyue:
> >>>>>>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >>>>>>>> 26/10/2019 06:40, Wang, Haiyue:
> >>>>>>>>> From: Thomas Monjalon
> >>>>>>>>> [mailto:thomas@monjalon.net]
> >>>>>>>>>> 25/10/2019 18:02, Jerin Jacob:
> >>>>>>>>>>> On Fri, Oct 25, 2019 at 9:15 PM Thomas Monjalon wrote:
> >>>>>>>>>>>> 25/10/2019 16:08, Ferruh Yigit:
> >>>>>>>>>>>>> On 10/25/2019 10:36 AM, Thomas Monjalon wrote:
> >>>>>>>>>>>>>> 15/10/2019 09:51, Haiyue Wang:
> >>>>>>>>>>>>>>> Some PMDs have more than one RX/TX burst path; add the ethdev API
> >>>>>>>>>>>>>>> that allows an application to retrieve the mode information about
> >>>>>>>>>>>>>>> the Rx/Tx packet burst, such as Scalar or Vector, and the vector
> >>>>>>>>>>>>>>> technology used, like AVX2.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I missed this patch. Andrew and I, the maintainers of ethdev, were not CC'ed.
> >>>>>>>>>>>>>> Ferruh, I would expect to be CC'ed and/or get a notification before merging.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It has been discussed on the mailing list and went through multiple rounds
> >>>>>>>>>>>>> of discussion; the patch has been out since August. +1 to CC'ing all
> >>>>>>>>>>>>> maintainers, I missed that part, but when the patch is reviewed and there
> >>>>>>>>>>>>> is no objection, why block the merge?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm not saying we should block the merge.
> >>>>>>>>>>>> My mistake is that I missed the patch, and I am asking for help with a
> >>>>>>>>>>>> notification in this kind of case. Same for Andrew, I guess.
> >>>>>>>>>>>> Note: it is merged in master and I am looking to improve this feature.
> >>>>>>>>>>>
> >>>>>>>>>>>>>>> +/**
> >>>>>>>>>>>>>>> + * Ethernet device RX/TX queue packet burst mode information structure.
> >>>>>>>>>>>>>>> + * Used to retrieve information about packet burst mode setting.
> >>>>>>>>>>>>>>> + */
> >>>>>>>>>>>>>>> +struct rte_eth_burst_mode {
> >>>>>>>>>>>>>>> +	uint64_t options;
> >>>>>>>>>>>>>>> +};
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Why a struct for an integer?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Again, by a request from me, so that we do not need to break the API if we
> >>>>>>>>>>>>> have to add more things in the future.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I would replace it with a string.
> >>>>>>>>>>>> This is the most flexible API.
> >>>>>>>>>>>
> >>>>>>>>>>> IMO, probably the best of both worlds makes a good option here:
> >>>>>>>>>>> as Haiyue suggested, if we have an additional dev_specific[1] field in the
> >>>>>>>>>>> structure, then when it is passed to the application, let common code build
> >>>>>>>>>>> the final string as (option flags to string + dev_specific).
> >>>>>>>>>>>
> >>>>>>>>>>> The options flag can be zero if a PMD does not have any generic flags or is
> >>>>>>>>>>> not interested in such a scheme.
> >>>>>>>>>>> Generic flags will help at least to have some common code.
> >>>>>>>>>>>
> >>>>>>>>>>> [1]
> >>>>>>>>>>> struct rte_eth_burst_mode {
> >>>>>>>>>>> 	uint64_t options;
> >>>>>>>>>>> 	char dev_specific[128]; /* PMD-specific burst mode information */
> >>>>>>>>>>> };
> >>>>>>>>>>
> >>>>>>>>>> I really don't see how we can have generic flags.
> >>>>>>>>>> The flags which are proposed just match
> >>>>>>>>>> the functions implemented in Intel PMDs.
> >>>>>>>>>> And this is a complicated solution.
> >>>>>>>>>> Why not just return a name for the selected Rx/Tx mode?
> >>>>>>>>>
> >>>>>>>>> Intel PMDs use *generic* methods like x86 SSE, AVX2, ARM NEON, PPC ALTIVEC,
> >>>>>>>>> 'dev->data->scattered_rx' etc. for the target: "DPDK is the Data Plane
> >>>>>>>>> Development Kit that consists of libraries to accelerate packet processing
> >>>>>>>>> workloads running on a wide variety of CPU architectures."
> >>>>>>>>
> >>>>>>>> How are RTE_ETH_BURST_SCATTERED and RTE_ETH_BURST_BULK_ALLOC generic?
> >>>>>>>> They just match some features of the Intel PMDs.
> >>>>>>>> Why not expose other optimizations of the Rx/Tx implementations?
> >>>>>>>> You totally missed the point of a generic burst mode description.
> >>>>>>>>
> >>>>>>>>> If you look at these new experimental APIs from the above angle, then bit
> >>>>>>>>> options are the best, and we didn't invent new words to describe them; they
> >>>>>>>>> come from the CPU and other *generic* technology.
> >>>>>>>>> And the application can loop to check which kind of burst is running with
> >>>>>>>>> just a simple bit test.
> >>>>>>>>>
> >>>>>>>>> If PMDs missed these, they can add them in future roadmaps to enhance their
> >>>>>>>>> PMDs, e.g. MLX5 supports ARM NEON and x86 SSE.
> >>>>>>>>
> >>>>>>>> I have no words!
> >>>>>>>> You really think other PMDs should learn from Intel how to "enhance" their PMD?
> >>>>>>>> You talk about mlx5; did you look at its code? Did you see the burst modes
> >>>>>>>> depending on which specific hardware path is used (MPRQ, EMPW, inline)?
> >>>>>>>> Or depending on which offloads are handled?
> >>>>>>>>
> >>>>>>>> Again, the instruction set used by the function is a small part
> >>>>>>>> of the burst mode optimization.
> >>>>>>>>
> >>>>>>>> So you did not reply to my question:
> >>>>>>>> Why not just return a name for the selected Rx/Tx mode?
> >>>>>>>
> >>>>>>> In fact, RFC v1/v2 returned the *name*, but the *name* is hard for an
> >>>>>>> application to process further; strcmp, strstr? Not so nice for C code,
> >>>>>>> and not so standard, so we switched to bit definitions.
> >>>>>>
> >>>>>> Again, please answer my question: why do you need it?
> >>>>>> I think it is just informative; that's why a string should be enough.
> >>>>>> I am clearly against the bitmap because it is far too restrictive.
> >>>>>> I disagree that knowing whether it is using AVX2 or AVX512 is so interesting.
> >>>>>> What you would like to know is whether it is processing packets 4 by 4,
> >>>>>> for instance, or which offload is supported, or what hardware trick
> >>>>>> is used in the datapath design.
> >>>>>> There are so many options in a datapath design that it cannot be
> >>>>>> represented with a bitmap. And it makes no sense to treat some design
> >>>>>> criteria as more important than others.
> >>>>>> I CC an Intel architect (Edwin) who could explain to you how much more
> >>>>>> complicated a datapath design is than just using AVX instructions.
> >>>>>
> >>>>> As I understand, this is to let applications make an informed decision based on
> >>>>> what vectorization is used in the driver; currently this is not known by the
> >>>>> application.
> >>>>>
> >>>>> And as previously replied, the main target of the API is to define the vector
> >>>>> path, not all optimizations, so the number is limited.
> >>>
> >>> No!
> >>> The name of this API is "burst mode information",
> >>> not "vector instructions used".
> >>> I think the main error is that in Intel PMDs,
> >>> each Rx/Tx function uses different vector instructions.
> >>> So you generalize that knowing the vector instructions
> >>> will give you good information about the performance.
> >>> But this is generally wrong!
> >>> The right level of info is much more complex.
> >>
> >> I don't think anyone was suggesting limiting it to purely describing PMD
> >> optimization with vector instructions. If there are other commonalities,
> >> let's describe those also.
> >>
> >> Vectorization was thought to be a good starting point - IMHO it is.
> >>
> >>>
> >>>>> There are many optimizations in the data path; I agree we may not represent all
> >>>>> of them, and agreed that the existing enum having "RTE_ETH_BURST_BULK_ALLOC" and
> >>>>> similar is causing this confusion; perhaps we can remove them.
> >>>>>
> >>>>> And if the requirement from the application is just informative, I would agree
> >>>>> that a free-text string will be better; right now 'rte_eth_rx/tx_burst_mode_get()'
> >>>>> is the main API to provide the information and
> >>>>> 'rte_eth_burst_mode_option_name()' is a helper for the application/driver to log
> >>>>> this information.
> >>>>
> >>>> Well, look, we have a general deficit of information about what is happening
> >>>> under the covers in DPDK. The end user may get wildly different performance
> >>>> characteristics based on the DPDK configuration.
Simple example is using flow director causes the i40e > >>>> PMD to switch to using a scalar code path, and performance may as much as half. > >>>> > >>>> This can cause no end of head-scratching in consuming products, I have done some > >>>> of that head scratching myself, it is a usability nightmare. > >>>> > >>>> FD.io VPP tries to work around this by mining the call stack, to give the user _some_ > >>>> kind of information about what is happening. These kind of heroics should not be necessary. > >>>> > >>>> For exactly the same reasons as telemetry, we should be trying to give the users as much > >>>> information as possible, in as standard as format as possible. Otherwise DPDK > >>>> becomes arcane leaving the user running gdb to understand what is going on, as I > >>>> frequently do. > >>> > >>> I agree we must provide a clue to understand the performance result. > >>> As Stephen commented at the very beginning, a log is enough for such debug. > >>> But his comment was ignored. > >> > >> Do we expect applications built on DPDK to have to grep it's log to make such discoveries? > >> It's very brittle and arcane way to provide information, if nothing else. > >> > >>> You wanted an API, fine. > >>> I am OK to have an API to request infos which are also in logs. > >> > >> I would point out that an API to query meta-data is common practice else where. > >> GStreamer GstCaps and Linux Sysfs are the closest example I can think of. > >> > >>> > >>>> Finally, again for the same reasons as telemetry, I would say that machine readable is the > >>>> ideal here. > >>> > >>> I disagree here. There is no need to make this info machine readable. > >>> We want a clue about the optimizations which are all about creativity. > >>> And we cannot make creativity of developers "machine readable". > >> > >> I am more concerned about the creativity in how developers describe optimizations. > >> If there is no standardization of strings (or bits), the API will be challenging to use. 
> >
> > No it won't be challenging, because it will be just a string to print.
>
> Well, the challenge is getting everyone to use the same set of strings,
> such that what is returned by the API has common meaning.
>
> I am fine with strings,
> so long as we have a method of encouraging folks to use a standard set where possible.

I don't understand why you insist on standardizing.
Every driver is different. The routine names will make sense only in the context of the driver.

> > The challenge is trying to fix the design characteristics in an API.
> I thought Haiyue's patch, with a fair degree of input from Ferruh and others, is a pretty solid start.
> Let's describe those commonalities that _do_ exist today - it may not be enough, but it's better than what we had.

The initial requirement is to describe what makes the performance change from one routine to the other.
We don't want to describe the commonalities but the specific differences.
Please let's focus on the real requirement and build an API which really helps.
As said several times, a PMD-specific string would be a good API.