DPDK patches and discussions
From: Thomas Monjalon <thomas@monjalon.net>
To: Ray Kinsella <mdr@ashroe.eu>
Cc: dev@dpdk.org, "Yigit, Ferruh" <ferruh.yigit@intel.com>,
	'Damjan Marion' <dmarion@me.com>,
	"Wang, Haiyue" <haiyue.wang@intel.com>,
	Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
	viacheslavo@mellanox.com, stephen@networkplumber.org,
	arybchenko@solarflare.com
Subject: Re: [dpdk-dev] [PATCH v4 1/4] ethdev: add the API for getting burst mode information
Date: Mon, 04 Nov 2019 14:09:38 +0100	[thread overview]
Message-ID: <2680328.hM31vkk1Ea@xps> (raw)
In-Reply-To: <932da3d3-34b9-79a4-1a68-fb06056ee93f@ashroe.eu>

04/11/2019 13:07, Ray Kinsella:
> 
> On 04/11/2019 11:30, Thomas Monjalon wrote:
> > 04/11/2019 11:03, Ray Kinsella:
> >> On 04/11/2019 09:54, Thomas Monjalon wrote:
> >>> 04/11/2019 10:49, Ray Kinsella:
> >>>> On 03/11/2019 22:41, Thomas Monjalon wrote:
> >>>>> 03/11/2019 21:35, Ray Kinsella:
> >>>>>> On 29/10/2019 14:27, Ferruh Yigit wrote:
> >>>>>>> On 10/26/2019 5:23 PM, Thomas Monjalon wrote:
> >>>>>>>> 26/10/2019 11:23, Wang, Haiyue:
> >>>>>>>>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >>>>>>>>>> 26/10/2019 06:40, Wang, Haiyue:
> >>>>>>>>>>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >>>>>>>>>>>> 25/10/2019 18:02, Jerin Jacob:
> >>>>>>>>>>>>> On Fri, Oct 25, 2019 at 9:15 PM Thomas Monjalon <thomas@monjalon.net> wrote:
> >>>>>>>>>>>>>> 25/10/2019 16:08, Ferruh Yigit:
> >>>>>>>>>>>>>>> On 10/25/2019 10:36 AM, Thomas Monjalon wrote:
> >>>>>>>>>>>>>>>> 15/10/2019 09:51, Haiyue Wang:
> >>>>>>>>>>>>>>>>> Some PMDs have more than one RX/TX burst paths, add the ethdev API
> >>>>>>>>>>>>>>>>> that allows an application to retrieve the mode information about
> >>>>>>>>>>>>>>>>> Rx/Tx packet burst such as Scalar or Vector, and Vector technology
> >>>>>>>>>>>>>>>>> like AVX2.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I missed this patch. I and Andrew, maintainers of ethdev, were not CC'ed.
> >>>>>>>>>>>>>>>> Ferruh, I would expect to be Cc'ed and/or get a notification before merging.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> It has been discussed on the mailing list and went through multiple rounds of discussion;
> >>>>>>>>>>>>>>> the patch has been out since August. +1 to cc'ing all maintainers, I missed that part,
> >>>>>>>>>>>>>>> but when the patch is reviewed and there is no objection, why block the merge?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm not saying blocking the merge.
> >>>>>>>>>>>>>> My mistake is that I missed the patch, and I am asking for help with a notification
> >>>>>>>>>>>>>> in such cases. Same for Andrew, I guess.
> >>>>>>>>>>>>>> Note: it is merged in master and I am looking to improve this feature.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +/**
> >>>>>>>>>>>>>>>>> + * Ethernet device RX/TX queue packet burst mode information structure.
> >>>>>>>>>>>>>>>>> + * Used to retrieve information about packet burst mode setting.
> >>>>>>>>>>>>>>>>> + */
> >>>>>>>>>>>>>>>>> +struct rte_eth_burst_mode {
> >>>>>>>>>>>>>>>>> +  uint64_t options;
> >>>>>>>>>>>>>>>>> +};
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Why a struct for an integer?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Again, by a request from me, to avoid breaking the API if we need to add more
> >>>>>>>>>>>>>>> things in the future.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I would replace it with a string. This is the most flexible API.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> IMO, probably the best of both worlds would be a good option here:
> >>>>>>>>>>>>> as Haiyue suggested, add a dev_specific[1] field to the structure,
> >>>>>>>>>>>>> and when passing it to the application, let common code build the final
> >>>>>>>>>>>>> string as (option flags converted to string + dev_specific).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The options flag can be zero if a PMD does not have any generic flags
> >>>>>>>>>>>>> or is not interested in such a scheme.
> >>>>>>>>>>>>> Generic flags will help at least to have some common code.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>> struct rte_eth_burst_mode {
> >>>>>>>>>>>>>         uint64_t options;
> >>>>>>>>>>>>>         char dev_specific[128]; /* PMD has specific burst mode information */
> >>>>>>>>>>>>> };
> >>>>>>>>>>>>
> >>>>>>>>>>>> I really don't see how we can have generic flags.
> >>>>>>>>>>>> The flags which are proposed are just matching
> >>>>>>>>>>>> the functions implemented in Intel PMDs.
> >>>>>>>>>>>> And this is a complicated solution.
> >>>>>>>>>>>> Why not just return a name for the selected Rx/Tx mode?
> >>>>>>>>>>>
> >>>>>>>>>>> Intel PMDs use *generic* methods like x86 SSE, AVX2, ARM NEON, PPC ALTIVEC,
> >>>>>>>>>>> 'dev->data->scattered_rx', etc., in line with the goal: "DPDK is the Data Plane Development Kit
> >>>>>>>>>>> that consists of libraries to accelerate packet processing workloads running on a wide
> >>>>>>>>>>> variety of CPU architectures."
> >>>>>>>>>>
> >>>>>>>>>> How are RTE_ETH_BURST_SCATTERED and RTE_ETH_BURST_BULK_ALLOC generic?
> >>>>>>>>>> They just match some features of the Intel PMDs.
> >>>>>>>>>> Why not exposing other optimizations of the Rx/Tx implementations?
> >>>>>>>>>> You totally missed the point of generic burst mode description.
> >>>>>>>>>>
> >>>>>>>>>>> If you understand these new experimental APIs from the above, then bit options are best,
> >>>>>>>>>>> and we didn't invent new words to describe them; they come from the CPU and other *generic*
> >>>>>>>>>>> technologies. And the application can check which kind of burst is running with
> >>>>>>>>>>> just a simple bit test.
> >>>>>>>>>>>
> >>>>>>>>>>> If PMDs missed these, they can add them in future roadmaps to enhance their PMDs;
> >>>>>>>>>>> for example, MLX5 supports ARM NEON and x86 SSE.
> >>>>>>>>>>
> >>>>>>>>>> I have no words!
> >>>>>>>>>> You really think other PMDs should learn from Intel how to "enhance" their PMD?
> >>>>>>>>>> You talk about mlx5, did you look at its code? Did you see the burst modes
> >>>>>>>>>> depending on which specific hardware path is used (MPRQ, EMPW, inline)?
> >>>>>>>>>> Or depending on which offloads are handled?
> >>>>>>>>>>
> >>>>>>>>>> Again, the instruction set used by the function is a small part
> >>>>>>>>>> of the burst mode optimization.
> >>>>>>>>>>
> >>>>>>>>>> So you did not reply to my question:
> >>>>>>>>>> Why not just returning a name for the selected Rx/Tx mode?
> >>>>>>>>>
> >>>>>>>>> In fact, RFC v1/v2 returned the *name*, but the *name* is hard for the
> >>>>>>>>> application to process further (strcmp, strstr?). Not so nice
> >>>>>>>>> for C code, and not very standard, so we switched to bit definitions.
> >>>>>>>>
> >>>>>>>> Again, please answer my question: why do you need it?
> >>>>>>>> I think it is just informative, that's why a string should be enough.
> >>>>>>>> I am clearly against the bitmap because it is way too much restrictive.
> >>>>>>>> I disagree that knowing it is using AVX2 or AVX512 is so interesting.
> >>>>>>>> What you would like to know is whether it is processing packets 4 by 4,
> >>>>>>>> for instance, or to know which offload is supported, or what hardware trick
> >>>>>>>> is used in the datapath design.
> >>>>>>>> There are so many options in a datapath design that it cannot be
> >>>>>>>> represented with a bitmap. And it makes no sense to treat some design
> >>>>>>>> criteria as more important than others.
> >>>>>>>> I Cc an Intel architect (Edwin) who could explain to you how much
> >>>>>>>> more complicated a datapath design is than just using AVX instructions.
> >>>>>>>
> >>>>>>> As I understand it, this is to let applications make informed decisions based on
> >>>>>>> what vectorization is used in the driver; currently this is not known by the
> >>>>>>> application.
> >>>>>>>
> >>>>>>> And as previously replied, the main target of the API is to define the vector
> >>>>>>> path, not all optimizations, so the number is limited.
> >>>>>
> >>>>> No!
> >>>>> The name of this API is "burst mode information",
> >>>>> not "vector instructions used".
> >>>>> I think the main error is that in Intel PMDs,
> >>>>> each Rx/Tx function uses different vector instructions.
> >>>>> So you generalize that knowing the vector instructions
> >>>>> will give you good information about the performance.
> >>>>> But this is generally wrong!
> >>>>> The right level of infos is much more complex.
> >>>>
> >>>> I don't think anyone was suggesting limiting it to purely describing PMD optimization 
> >>>> with vector instructions. If there are other commonalities let's describe those also. 
> >>>>
> >>>> Vectorization was thought to be a good starting point - IMHO it is.
> >>>>
> >>>>>
> >>>>>>> There are many optimizations in the data path; I agree we may not represent all
> >>>>>>> of them, and I agree that the existing enum having "RTE_ETH_BURST_BULK_ALLOC" and
> >>>>>>> similar is causing this confusion; perhaps we can remove them.
> >>>>>>>
> >>>>>>> And if the requirement from the application is just informative, I would agree
> >>>>>>> that a free-text string would be better; right now 'rte_eth_rx/tx_burst_mode_get()'
> >>>>>>> is the main API to provide the information, and
> >>>>>>> 'rte_eth_burst_mode_option_name()' is a helper for the application/driver to log
> >>>>>>> this information.
> >>>>>>>
> >>>>>>
> >>>>>> Well, look, we have a general deficit of information about what is happening under
> >>>>>> the covers in DPDK. The end user may get wildly different performance characteristics
> >>>>>> based on the DPDK configuration. A simple example: using flow director causes the i40e
> >>>>>> PMD to switch to a scalar code path, and performance may drop by as much as half.
> >>>>>>
> >>>>>> This can cause no end of head-scratching in consuming products; I have done some
> >>>>>> of that head-scratching myself. It is a usability nightmare.
> >>>>>>
> >>>>>> FD.io VPP tries to work around this by mining the call stack, to give the user _some_
> >>>>>> kind of information about what is happening. These kinds of heroics should not be necessary.
> >>>>>>
> >>>>>> For exactly the same reasons as telemetry, we should be trying to give users as much
> >>>>>> information as possible, in as standard a format as possible. Otherwise DPDK
> >>>>>> becomes arcane, leaving the user running gdb to understand what is going on, as I
> >>>>>> frequently do.
> >>>>>
> >>>>> I agree we must provide a clue to understand the performance result.
> >>>>> As Stephen commented at the very beginning, a log is enough for such debugging.
> >>>>> But his comment was ignored.
> >>>>
> >>>> Do we expect applications built on DPDK to have to grep its logs to make such discoveries?
> >>>> That is a very brittle and arcane way to provide information, if nothing else.
> >>>>
> >>>>> You wanted an API, fine.
> >>>>> I am OK to have an API to request infos which are also in logs.
> >>>>
> >>>> I would point out that an API to query metadata is common practice elsewhere.
> >>>> GStreamer's GstCaps and Linux sysfs are the closest examples I can think of.
> >>>>
> >>>>>
> >>>>>> Finally, again for the same reasons as telemetry, I would say that machine readable is the 
> >>>>>> ideal here.
> >>>>>
> >>>>> I disagree here. There is no need to make this info machine readable.
> >>>>> We want a clue about the optimizations which are all about creativity.
> >>>>> And we cannot make creativity of developers "machine readable".
> >>>>
> >>>> I am more concerned about the creativity in how developers describe optimizations. 
> >>>> If there is no standardization of strings (or bits), the API will be challenging to use. 
> >>>
> >>> No it won't be challenging because it will be just a string to print.
> >>
> >> Well, the challenge is getting everyone to use the same set of strings,
> >> such that what is returned by the API has a common meaning.
> >>
> >> I am fine with strings,
> >> so long as we have a method of encouraging folks to use a standard set where possible.
> > 
> > I don't understand why you insist on standardizing.
> > Every driver is different.
> > The routine names will make sense only in the context of the driver.
> 
> The more diversity in descriptions, the more the user is reaching for documentation,
> that is, if the user is lucky enough that the description has documentation.

I would go even further:
The documentation will not explain each Rx/Tx routine.
The user can make some guesses, but the main information is seeing
that the routine changed from one run to the next, so they can expect
a change in the performance result.
And as a bonus, they can ask the maintainers for more explanation
by giving them the routine name they got from the API.

> We can argue about this indefinitely, instead of proposing a standard. :-)
> The best way to do this is to leave the API experimental for some period, as Haiyue suggests,
> and then review as the API drops experimental status, with a view to standardizing if possible.

Yes we can argue indefinitely.
My position is against standardizing this information.
(not even talking about the ABI breaks it could cause)

> >>> The challenge is trying to fix the design characteristics in an API.
> >>
> >> I thought Haiyue's patch, with a fair degree of input from Ferruh and others, is a pretty solid start.
> >> Let's describe those commonalities that _do_ exist today. It may not be enough, but it's better than what we had.
> > 
> > The initial requirement is to describe what makes the performance
> > change from one routine to the other.
> > We don't want to describe the commonalities but the specific differences.
> > Please let's focus on the real requirement and build an API which really helps.
> > As said several times, a PMD-specific string would be a good API.




