Thanks, Stephen, for addressing my queries; that was helpful.
    
    One more follow-up question on the same topic: can DPDK HQOS be customized based on the use case?
    
    For example, the HQOS config for one of my use cases is one port, one subport, and 16 pipes, with only one TC per pipe.
    Configuring 16 pipes was allowed, but reducing the 13 TCs per pipe to 1 TC was not.
 
    Can I still keep the 13 TCs but set the queue size to 0 for the unused ones? Would that impact performance?
                           
    
Thanks
Farooq.J
 
  

On Wed, May 21, 2025 at 7:48 PM Stephen Hemminger <stephen@networkplumber.org> wrote:
On Mon, 28 Apr 2025 16:55:07 +0530
farooq basha <farooq.juturu@gmail.com> wrote:

> Hello DevTeam,
>
>     I am planning to use DPDK HQOS for Traffic shaping with a
> run-to-completion Model. While I was reading the dpdk-qos document, I came
> across the following statement.
>
> "*Running enqueue and dequeue operations for the same output port from
> different cores is likely to cause significant impact on scheduler’s
> performance and it is therefore not recommended"*
>
> Let's take an example: Port1 and Port2 each have 4 Rx queues, and each queue
> is mapped to a different CPU. Traffic coming in on Port1 gets forwarded to
> Port2. With the above limitation, the application needs to take a lock before
> calling rte_sched_port_enqueue and rte_sched_port_dequeue, so performance is
> limited to that of 1 CPU even though traffic arrives on 4 different CPUs.
>
> Please correct me if my understanding is wrong.
>
> Thanks
> Basha

The HQOS code is not thread safe, so yes, you need a lock.
The traffic scheduling (QOS) needs to be the last stage of the pipeline, just
before the mbufs are passed to the device.

The issue is that QOS is single threaded, so a lock is required.

The statement in the documentation is misleading: the primary overhead is the
lock, and the secondary overhead is the cache misses that occur when enqueue
and dequeue are processed on different cores. If you do split them across
cores, those cache misses will cut performance a lot.
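
To make the locking concrete, here is a minimal sketch of several Rx workers
feeding one scheduler through a single lock. It does not use DPDK itself:
fake_sched_port, locked_enqueue, locked_dequeue and run_demo are hypothetical
stand-ins invented for illustration, and a pthread mutex takes the place of
whatever lock the application would really use. In a real application the same
lock would presumably wrap the calls to rte_sched_port_enqueue() and
rte_sched_port_dequeue().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_SCHED_CAPACITY 8192u
#define MAX_WORKERS 16u

/* Hypothetical stand-in for a scheduler port: one FIFO plus one lock.
 * It only models the "single-threaded object shared by many cores" problem. */
struct fake_sched_port {
    pthread_mutex_t lock;
    uint32_t pkts[FAKE_SCHED_CAPACITY];
    uint32_t head;   /* index of the next packet to dequeue */
    uint32_t count;  /* number of packets currently queued */
};

static void fake_sched_init(struct fake_sched_port *p)
{
    pthread_mutex_init(&p->lock, NULL);
    p->head = 0;
    p->count = 0;
}

/* Enqueue up to n packets under the lock; returns how many were accepted. */
static uint32_t locked_enqueue(struct fake_sched_port *p,
                               const uint32_t *pkts, uint32_t n)
{
    uint32_t done = 0;

    pthread_mutex_lock(&p->lock);
    while (done < n && p->count < FAKE_SCHED_CAPACITY) {
        p->pkts[(p->head + p->count) % FAKE_SCHED_CAPACITY] = pkts[done++];
        p->count++;
    }
    pthread_mutex_unlock(&p->lock);
    return done;
}

/* Dequeue up to n packets under the same lock; returns how many were taken. */
static uint32_t locked_dequeue(struct fake_sched_port *p,
                               uint32_t *out, uint32_t n)
{
    uint32_t done = 0;

    pthread_mutex_lock(&p->lock);
    while (done < n && p->count > 0) {
        out[done++] = p->pkts[p->head];
        p->head = (p->head + 1) % FAKE_SCHED_CAPACITY;
        p->count--;
    }
    pthread_mutex_unlock(&p->lock);
    return done;
}

struct worker_arg {
    struct fake_sched_port *port;
    uint32_t n_pkts;
};

/* Each worker models one Rx core pushing packets to the shared output port. */
static void *rx_worker(void *arg)
{
    struct worker_arg *w = arg;

    for (uint32_t i = 0; i < w->n_pkts; i++) {
        uint32_t pkt = i;
        locked_enqueue(w->port, &pkt, 1);
    }
    return NULL;
}

/* Start the workers, wait for them, then drain the port from one thread.
 * Returns the total number of packets dequeued. */
static uint32_t run_demo(uint32_t n_workers, uint32_t pkts_per_worker)
{
    static struct fake_sched_port port;
    pthread_t tids[MAX_WORKERS];
    struct worker_arg args[MAX_WORKERS];
    uint32_t burst[32];
    uint32_t total = 0, n;

    if (n_workers > MAX_WORKERS)
        n_workers = MAX_WORKERS;

    fake_sched_init(&port);
    for (uint32_t i = 0; i < n_workers; i++) {
        args[i].port = &port;
        args[i].n_pkts = pkts_per_worker;
        pthread_create(&tids[i], NULL, rx_worker, &args[i]);
    }
    for (uint32_t i = 0; i < n_workers; i++)
        pthread_join(tids[i], NULL);

    while ((n = locked_dequeue(&port, burst, 32)) > 0)
        total += n;

    pthread_mutex_destroy(&port.lock);
    return total;
}
```

Calling run_demo(4, 1000) returns 4000: no packet is lost, but every enqueue
from the four workers is serialized on the one lock, which is the overhead
described above.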