From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Konstantin Ananyev" <konstantin.ananyev@huawei.com>,
	"Jerin Jacob" <jerinjacobk@gmail.com>
Cc: <jerinj@marvell.com>, <dev@dpdk.org>,
	"Thomas Monjalon" <thomas@monjalon.net>,
	"Ferruh Yigit" <ferruh.yigit@amd.com>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	<ferruh.yigit@xilinx.com>, <ajit.khaparde@broadcom.com>,
	<aboyer@pensando.io>, <beilei.xing@intel.com>,
	<bruce.richardson@intel.com>, <chas3@att.com>,
	<chenbo.xia@intel.com>, <ciara.loftus@intel.com>,
	<dsinghrawat@marvell.com>, <ed.czeck@atomicrules.com>,
	<evgenys@amazon.com>, <grive@u256.net>, <g.singh@nxp.com>,
	<haiyue.wang@intel.com>, <hkalra@marvell.com>,
	<heinrich.kuhn@corigine.com>, <hemant.agrawal@nxp.com>,
	<hyonkim@cisco.com>, <igorch@amazon.com>, <irusskikh@marvell.com>,
	<jgrajcia@cisco.com>, <jasvinder.singh@intel.com>,
	<jianwang@trustnetic.com>, <jiawenwu@trustnetic.com>,
	<jingjing.wu@intel.com>, <johndale@cisco.com>,
	<john.miller@atomicrules.com>, <linville@tuxdriver.com>,
	<keith.wiles@intel.com>, <kirankumark@marvell.com>,
	<lironh@marvell.com>, <longli@microsoft.com>, <mw@semihalf.com>,
	<spinler@cesnet.cz>, <matan@nvidia.com>,
	<matt.peters@windriver.com>, <maxime.coquelin@redhat.com>,
	<mk@semihalf.com>, "humin (Q)" <humin29@huawei.com>,
	<pnalla@marvell.com>, <ndabilpuram@marvell.com>,
	<qiming.yang@intel.com>, <qi.z.zhang@intel.com>,
	<radhac@marvell.com>, <rahul.lakkireddy@chelsio.com>,
	<rmody@marvell.com>, <rosen.xu@intel.com>,
	<sachin.saxena@oss.nxp.com>, <skoteshwar@marvell.com>,
	<shshaikh@marvell.com>, <shaibran@amazon.com>,
	<shepard.siegel@atomicrules.com>, <asomalap@amd.com>,
	<somnath.kotur@broadcom.com>, <sthemmin@microsoft.com>,
	<steven.webster@windriver.com>, <skori@marvell.com>,
	<mtetsuyah@gmail.com>, <vburru@marvell.com>,
	<viacheslavo@nvidia.com>, <xiao.w.wang@intel.com>,
	"Wangxiaoyun (Cloud)" <cloud.wangxiaoyun@huawei.com>,
	"Zhuangyuzeng (Yisen)" <yisen.zhuang@huawei.com>,
	<yongwang@vmware.com>,
	"Xuanziyang (William)" <william.xuanziyang@huawei.com>,
	<cristian.dumitrescu@intel.com>
Subject: RE: [dpdk-dev] [v1] ethdev: support Tx queue used count
Date: Fri, 19 Jan 2024 11:32:08 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F179@smartserver.smartshare.dk>
In-Reply-To: <18d0e3e252bc46828f285d6a865592f2@huawei.com>

> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Friday, 19 January 2024 10.53
> 
> > > > > > Introduce a new API to retrieve the number of used descriptors
> > > > > > in a Tx queue. Applications can leverage this API in the fast path to
> > > > > > inspect the Tx queue occupancy and take appropriate actions based on the
> > > > > > available free descriptors.
> > > > > >
> > > > > > A notable use case could be implementing Random Early Discard (RED)
> > > > > > in software based on Tx queue occupancy.
> > > > > >
> > > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > > >
> > > > The general feedback is to align with the Rx queue API, specifically
> > > > rte_eth_rx_queue_count,
> > > > and it's noted that there is no equivalent rte_eth_rx_queue_free_count.
> > > >
> > > > Given that the free count can be obtained by subtracting the used
> > > > count from queue_txd_num,
> > > > it is considered that either approach is acceptable.
> > > >
> > > > The application configures queue_txd_num with tx_queue_setup(), and
> > > > the application can store that value in its structure.
> > > > This would enable fast-path usage for both base cases (whether the
> > > > application needs information about free or used descriptors)
> > > > with just one API (rte_eth_tx_queue_count())
> > >
> > > Right now I don't use these functions, but if I think what most people
> > > are interested in:
> > > - how many packets you can receive immediately (rx_queue_count)
> >
> > Agreed that "used" (not "free") is the preferred info for RX.
> >
> > > - how many packets you can transmit immediately (tx_queue_free_count)
> > > Sure, I understand that user can store txd_num somewhere and then do
> > > subtraction himself.
> > > Though it means more effort for the user, and the only reason for that,
> > > as I can see,
> > > is to have RX and TX function naming symmetric.
> > > Which seems much less important to me comparing to user convenience.
> >
> > I agree 100 % with your prioritization: Usability has higher priority than symmetric naming.
> >
> > So here are some example use cases supporting the TX "Used" API:
> > - RED (and similar queueing algorithms) need to know how many packets the queue holds (not how much room the queue has for more packets).
> 
> Ok, but to calculate percentage we do need both numbers: txd_num and
> txd_used_num (or txd_free_num).
> So in such case user still has to store txd_num somewhere and do the
> math after getting txd_used_num.
> So probably  no advantage between tx_queue_used_count() and
> tx_queue_free_count() for that case.

If used for simple mitigation of tail-dropping, you are correct. Then the percentages would be appropriate.

But not if used for traffic engineering. In that case, the queueing algorithm's thresholds should be based on absolute numbers, configured by the network engineer.
As an optional application optimization, if an application has RED queueing configured for 100 % dropping at 200 packets in queue, the application can configure the NIC with only 256 descriptors instead of 512 or whatever the default is. In this way, the NIC queue size depends on the RED parameters, not vice versa.
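
To illustrate the point about absolute thresholds, here is a minimal sketch, not code from the patch. It assumes a simplified RED that looks at the instantaneous queue depth (classic RED uses an averaged depth), one descriptor per packet, and power-of-two ring sizes. The names red_params, red_drop_prob, red_should_drop and tx_ring_size_for are hypothetical; in an application the "used" value would come from the Tx queue used count API discussed in this thread, whose final signature is not assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RED parameters in absolute packet counts, as a network
 * engineer might configure them. Not part of any DPDK API. */
struct red_params {
	uint16_t min_th; /* below this queue depth, never drop */
	uint16_t max_th; /* at or above this queue depth, always drop */
};

/* Drop probability in 1/65536 units (0 = never, 65536 = always),
 * growing linearly between min_th and max_th.
 * "used" is the number of packets currently held by the Tx queue. */
static inline uint32_t
red_drop_prob(const struct red_params *p, uint16_t used)
{
	assert(p->min_th < p->max_th);
	if (used < p->min_th)
		return 0;
	if (used >= p->max_th)
		return UINT32_C(65536);
	return ((uint32_t)(used - p->min_th) << 16) /
	       (uint32_t)(p->max_th - p->min_th);
}

/* Drop decision for one packet; rnd16 is a uniform 16-bit random number
 * supplied by the caller. */
static inline bool
red_should_drop(const struct red_params *p, uint16_t used, uint16_t rnd16)
{
	return (uint32_t)rnd16 < red_drop_prob(p, used);
}

/* Smallest power-of-two ring size that can hold max_th packets.
 * Assumption: the NIC wants a power-of-two descriptor count and each
 * packet occupies exactly one descriptor. */
static inline uint16_t
tx_ring_size_for(uint16_t max_th)
{
	uint32_t n = 1;

	assert(max_th <= 32768);
	while (n < max_th)
		n <<= 1;
	return (uint16_t)n;
}
```

With min_th = 100 and max_th = 200, a queue holding 150 packets gets a 50 % drop probability, and tx_ring_size_for(200) yields 256, matching the numbers above. Note that no ring size enters the drop calculation.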

> 
> > - Load Balancing across multiple links, in use cases where packet reordering is allowed.
> 
> I suppose for that case, you also will need to calc percentage, not the
> raw txd_used_num, no?

Not if optimizing for low latency. If one of the NICs supports 512 TX descriptors and another of the NICs supports 4096 TX descriptors, the application should transmit packets via the NIC with the lowest absolute number of used descriptors, assuming that this queue has the shortest queuing delay.
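
A small sketch of the difference, under stated assumptions: the struct tx_link and both helper functions are hypothetical, the "used" field is a snapshot the application would have obtained from the Tx queue used count API discussed here, and equal link speeds are assumed so that fewer queued packets means shorter queuing delay.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-link snapshot kept by the application. */
struct tx_link {
	uint16_t nb_desc; /* ring size configured at Tx queue setup */
	uint16_t used;    /* used Tx descriptors at the time of the snapshot */
};

/* Low-latency choice: the link with the fewest queued packets in
 * absolute terms. Requires n >= 1; ties go to the lowest index. */
static inline size_t
pick_lowest_used(const struct tx_link *links, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (links[i].used < links[best].used)
			best = i;
	return best;
}

/* For contrast: the link with the lowest fill ratio (used / nb_desc),
 * compared by cross-multiplication to avoid division. Requires n >= 1. */
static inline size_t
pick_lowest_fill(const struct tx_link *links, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if ((uint32_t)links[i].used * links[best].nb_desc <
		    (uint32_t)links[best].used * links[i].nb_desc)
			best = i;
	return best;
}
```

Take a 512-descriptor queue holding 300 packets and a 4096-descriptor queue holding 1000 packets. The fill ratio favors the second link (about 24 % versus 59 %), but a packet sent there waits behind 1000 others; the absolute count picks the first link.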

> 
> > - Monitoring egress queueing, especially in many-to-one-port traffic patterns, e.g. to catch micro-burst induced spikes (which may cause latency/"bufferbloat").
> > - The (obsolete) ifOutQLen object in the Interfaces MIB for SNMP, which I suppose was intended for monitoring egress queueing.
> >
> > > Anyway, as I stated above, I don't use these functions right now,
> > > so if the majority of users are happy with current approach, I would
> > > not insist :)
> >
> > I'm very happy with the current approach. :-)
> 
> As I said, if end users are happy, then I am fine too ;)

Then everyone is happy; we can enjoy the coming weekend, and leave the remaining work on this to Jerin. :-))

Great discussion, Konstantin! It is important to remain focused on what the users need, and to keep questioning whether that is really what we are doing.

