DPDK patches and discussions
* [dpdk-dev] Regarding HQOS with run-to-completion Model
@ 2025-04-28 11:25 farooq basha
  2025-05-21 14:18 ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: farooq basha @ 2025-04-28 11:25 UTC
  To: dev

Hello Dev Team,

I am planning to use DPDK HQoS for traffic shaping with a
run-to-completion model. While reading the DPDK QoS documentation, I came
across the following statement:

"Running enqueue and dequeue operations for the same output port from
different cores is likely to cause significant impact on scheduler’s
performance and it is therefore not recommended".

Let's take an example: Port1 and Port2 each have 4 Rx queues, and each queue
is mapped to a different CPU. Traffic arriving on Port1 gets forwarded to
Port2. With the above limitation, the application needs to take a lock before
calling rte_sched_port_enqueue() and rte_sched_port_dequeue(), so performance
is limited to that of 1 CPU even though traffic arrives on 4 different CPUs.

Please correct me if my understanding is wrong.

Thanks,
Basha


* Re: [dpdk-dev] Regarding HQOS with run-to-completion Model
  2025-04-28 11:25 [dpdk-dev] Regarding HQOS with run-to-completion Model farooq basha
@ 2025-05-21 14:18 ` Stephen Hemminger
  2025-05-22  2:45   ` farooq basha
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2025-05-21 14:18 UTC
  To: farooq basha; +Cc: dev

On Mon, 28 Apr 2025 16:55:07 +0530
farooq basha <farooq.juturu@gmail.com> wrote:

> Hello Dev Team,
>
> I am planning to use DPDK HQoS for traffic shaping with a
> run-to-completion model. While reading the DPDK QoS documentation, I came
> across the following statement:
>
> "Running enqueue and dequeue operations for the same output port from
> different cores is likely to cause significant impact on scheduler’s
> performance and it is therefore not recommended".
>
> Let's take an example: Port1 and Port2 each have 4 Rx queues, and each queue
> is mapped to a different CPU. Traffic arriving on Port1 gets forwarded to
> Port2. With the above limitation, the application needs to take a lock before
> calling rte_sched_port_enqueue() and rte_sched_port_dequeue(), so performance
> is limited to that of 1 CPU even though traffic arrives on 4 different CPUs.
>
> Please correct me if my understanding is wrong.
>
> Thanks,
> Basha

The HQoS code is not thread-safe; the scheduler is single-threaded, so yes,
you need a lock. Traffic scheduling (QoS) needs to be the last stage of the
pipeline, just before mbufs are passed to the device.

The statement in the documentation is misleading. The real overhead is the
lock; the secondary overhead is the cache misses that happen when the
scheduler is processed on different cores. If you do that, the cache misses
will cut performance a lot.
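
For illustration, here is a minimal sketch of the locking pattern discussed
above: several worker threads forward traffic to one output port, and every
enqueue/dequeue on that port's scheduler happens under a single per-port lock.
Everything named toy_* is a hypothetical stand-in written for this sketch and
is not DPDK API. In a real application the two calls inside the critical
section would be rte_sched_port_enqueue() and rte_sched_port_dequeue(),
followed by the TX burst, and the lock would more likely be a spinlock than a
pthread mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define TOY_QLEN        1024u  /* capacity of the toy scheduler queue */
#define TOY_BURST       32u    /* packets handled per loop iteration */
#define TOY_MAX_WORKERS 16u

/* Hypothetical stand-in for the scheduler port: NOT thread-safe. */
struct toy_sched {
    uint32_t pkts[TOY_QLEN];
    uint32_t head, tail;       /* tail - head = packets currently queued */
};

/* One lock per output port, shared by every core that touches it. */
struct toy_port {
    pthread_mutex_t lock;
    struct toy_sched sched;
    uint64_t tx_count;         /* packets handed to the "device" */
};

struct toy_worker {
    struct toy_port *port;
    uint32_t bursts;           /* loop iterations per worker */
};

static uint32_t toy_enqueue(struct toy_sched *s, const uint32_t *p, uint32_t n)
{
    uint32_t i;
    for (i = 0; i < n && s->tail - s->head < TOY_QLEN; i++)
        s->pkts[s->tail++ % TOY_QLEN] = p[i];
    return i;                  /* number of packets accepted */
}

static uint32_t toy_dequeue(struct toy_sched *s, uint32_t *p, uint32_t n)
{
    uint32_t done = 0;
    while (done < n && s->head != s->tail)
        p[done++] = s->pkts[s->head++ % TOY_QLEN];
    return done;               /* number of packets released */
}

/* Run-to-completion loop: RX, then scheduler enqueue/dequeue, then TX. */
static void *toy_worker_main(void *arg)
{
    struct toy_worker *w = arg;
    uint32_t rx[TOY_BURST], tx[TOY_BURST];

    for (uint32_t b = 0; b < w->bursts; b++) {
        for (uint32_t i = 0; i < TOY_BURST; i++)
            rx[i] = i;                         /* pretend RX burst */

        /* Critical section: only one core is in the scheduler at a time. */
        pthread_mutex_lock(&w->port->lock);
        uint32_t q = toy_enqueue(&w->port->sched, rx, TOY_BURST);
        uint32_t n = toy_dequeue(&w->port->sched, tx, TOY_BURST);
        w->port->tx_count += n;                /* stands in for the TX burst */
        pthread_mutex_unlock(&w->port->lock);

        assert(q == TOY_BURST);
        (void)q;
    }
    return NULL;
}

/* Runs n_workers threads against one port; returns packets transmitted. */
uint64_t toy_run(uint32_t n_workers, uint32_t bursts)
{
    struct toy_port port;
    struct toy_worker w = { &port, bursts };
    pthread_t tid[TOY_MAX_WORKERS];

    assert(n_workers <= TOY_MAX_WORKERS);
    memset(&port, 0, sizeof(port));
    pthread_mutex_init(&port.lock, NULL);

    for (uint32_t i = 0; i < n_workers; i++)
        pthread_create(&tid[i], NULL, toy_worker_main, &w);
    for (uint32_t i = 0; i < n_workers; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_destroy(&port.lock);
    return port.tx_count;
}
```

Calling toy_run(4, 1000) models the 4-core case from the question. The count
comes out right, but the scheduler section is fully serialized: adding cores
adds contention on the lock rather than scheduler throughput, which is the
overhead described above.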


* Re: [dpdk-dev] Regarding HQOS with run-to-completion Model
  2025-05-21 14:18 ` Stephen Hemminger
@ 2025-05-22  2:45   ` farooq basha
  2025-05-22 15:21     ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: farooq basha @ 2025-05-22  2:45 UTC
  To: stephen; +Cc: dev

Thanks, Stephen, for addressing my queries; that was helpful.

One more follow-up question on the same topic: can DPDK HQoS be customized
based on the use case?

For example, the HQoS config for one of the use cases is one port, one
subport, 16 pipes, and each pipe with only one TC. The 16-pipe config was
allowed, but changing the 13 TCs to 1 TC per pipe is not allowed.

Can I still use 13 TCs but set the QueueSize to 0? Would that impact
performance?

Thanks,
Farooq.J





* Re: [dpdk-dev] Regarding HQOS with run-to-completion Model
  2025-05-22  2:45   ` farooq basha
@ 2025-05-22 15:21     ` Stephen Hemminger
  0 siblings, 0 replies; 4+ messages in thread
From: Stephen Hemminger @ 2025-05-22 15:21 UTC
  To: farooq basha; +Cc: dev

On Thu, 22 May 2025 08:15:14 +0530
farooq basha <farooq.juturu@gmail.com> wrote:

> Thanks, Stephen, for addressing my queries; that was helpful.
>
> One more follow-up question on the same topic: can DPDK HQoS be customized
> based on the use case?
>
> For example, the HQoS config for one of the use cases is one port, one
> subport, 16 pipes, and each pipe with only one TC. The 16-pipe config was
> allowed, but changing the 13 TCs to 1 TC per pipe is not allowed.
>
> Can I still use 13 TCs but set the QueueSize to 0? Would that impact
> performance?
> 

No. The current QoS sched code has hard-coded assumptions about the number
of pipes, etc. I think it is modeled after some carrier standard and is not
generally that useful.
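
As a rough illustration of where the fixed count of 13 TCs per pipe comes
from, the sketch below reproduces the arithmetic with local constants. The
TOY_* names are hypothetical, and their values are an assumption about the
compile-time constants in rte_sched.h (16 queues per pipe, 4 of them behind
the best-effort class); check the header in your DPDK release before relying
on them.

```c
#include <stdint.h>

/* ASSUMED to mirror compile-time constants in DPDK's rte_sched.h.
 * The TOY_ names are local to this sketch, not DPDK identifiers. */
#define TOY_QUEUES_PER_PIPE     16u
#define TOY_BE_QUEUES_PER_PIPE  4u

/* Every non-best-effort TC owns one queue; the last TC (best effort)
 * owns the remaining four. Hence 16 - 4 + 1 = 13 traffic classes. */
#define TOY_TCS_PER_PIPE \
    (TOY_QUEUES_PER_PIPE - TOY_BE_QUEUES_PER_PIPE + 1u)
#define TOY_TC_BE (TOY_TCS_PER_PIPE - 1u)

/* Number of queues behind a given traffic class under this layout. */
uint32_t toy_queues_in_tc(uint32_t tc)
{
    return tc == TOY_TC_BE ? TOY_BE_QUEUES_PER_PIPE : 1u;
}

/* Sums the queues over all traffic classes of one pipe. */
uint32_t toy_total_queues(void)
{
    uint32_t total = 0;
    for (uint32_t tc = 0; tc < TOY_TCS_PER_PIPE; tc++)
        total += toy_queues_in_tc(tc);
    return total;
}
```

Because the count is derived from preprocessor constants rather than from a
runtime parameter, an application cannot ask for a single TC per pipe at
configuration time, which is consistent with the answer above.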
