* [dpdk-dev] Fwd: high latency detected in IP pipeline example
[not found] <CAGxG5cjY+npJ7wVqcb9MXdtKkpC6RrgYpDQA2qbaAjD7i7C2EQ@mail.gmail.com>
@ 2020-02-17 16:41 ` Victor Huertas
2020-02-17 23:10 ` James Huang
0 siblings, 1 reply; 10+ messages in thread
From: Victor Huertas @ 2020-02-17 16:41 UTC (permalink / raw)
To: dev; +Cc: cristian.dumitrescu
Hi all,
I am developing my own DPDK application based on the dpdk-stable
ip_pipeline example.
At the moment I am using the 17.11 LTS version of DPDK and I am observing
some strange behaviour. Maybe it is an old issue that can be solved
quickly, so I would appreciate it if some expert could shed some light on this.
The ip_pipeline example allows you to develop pipelines that perform
specific packet processing functions (ROUTING, FLOW_CLASSIFYING, etc.).
The thing is that I am extending some of these pipelines with pipelines of
my own. However, I want to take advantage of the built-in ip_pipeline
capability of arbitrarily assigning the logical core where each pipeline
(its f_run() function) must be executed, so that I can adapt the packet
processing power to the number of cores available.
Taking this into account, I have observed something strange, which I show
in the simple example below.
Case 1:
[PIPELINE 0 MASTER core =0]
[PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=2] -----SWQ2---->
[PIPELINE 3 core=3]
Case 2:
[PIPELINE 0 MASTER core =0]
[PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=1] -----SWQ2---->
[PIPELINE 3 core=1]
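For reference, Case 2 in the ip_pipeline configuration-file syntax would look
roughly like the sketch below; the pipeline types, queue names and port queue
IDs are only illustrative placeholders, not my exact configuration.

[PIPELINE0]
type = MASTER
core = 0

[PIPELINE1]
type = PASS-THROUGH
core = 1
pktq_in = RXQ0.0
pktq_out = SWQ1

[PIPELINE2]
type = PASS-THROUGH
core = 1
pktq_in = SWQ1
pktq_out = SWQ2

[PIPELINE3]
type = PASS-THROUGH
core = 1
pktq_in = SWQ2
pktq_out = TXQ1.0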
I send pings between two hosts connected at either side of this pipeline
model, so that the pings cross all the pipelines (from 1 to 3).
What I observe in Case 1 (each pipeline running in its own thread on a
different core) is that the reported RTT is less than 1 ms, whereas in
Case 2 (all pipelines except MASTER running in the same thread) it is 20 ms.
Furthermore, in Case 2, if I increase the packet rate a lot (hundreds of
Mbps), this RTT decreases to 3 or 4 ms.
Has somebody observed this behaviour in the past? Can it be solved somehow?
Thanks a lot for your attention
--
Victor
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] Fwd: high latency detected in IP pipeline example
2020-02-17 16:41 ` [dpdk-dev] Fwd: high latency detected in IP pipeline example Victor Huertas
@ 2020-02-17 23:10 ` James Huang
2020-02-18 7:04 ` Victor Huertas
0 siblings, 1 reply; 10+ messages in thread
From: James Huang @ 2020-02-17 23:10 UTC (permalink / raw)
To: Victor Huertas; +Cc: dev, cristian.dumitrescu
Yes, I experienced a similar issue in my application. The short answer is that
setting the SWQs' write burst value to 1 may reduce the latency significantly.
The default write burst value is 32.
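For example, something like the following in the SWQ sections of the
ip_pipeline config file; "burst_write" is the value I mean, while the other
key names and the size are from memory, so please check them against the
17.11 example's config parser:

[SWQ1]
size = 4096
burst_read = 32
burst_write = 1

[SWQ2]
size = 4096
burst_read = 32
burst_write = 1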
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] Fwd: high latency detected in IP pipeline example
2020-02-17 23:10 ` James Huang
@ 2020-02-18 7:04 ` Victor Huertas
2020-02-18 7:18 ` James Huang
0 siblings, 1 reply; 10+ messages in thread
From: Victor Huertas @ 2020-02-18 7:04 UTC (permalink / raw)
To: James Huang; +Cc: dev, cristian.dumitrescu
Thanks, James, for your quick answer.
I guess that this configuration change implies that the packets must be
written one by one into the SW ring. Did you notice a loss of performance
(in throughput) in your application because of that?
Regards
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] Fwd: high latency detected in IP pipeline example
2020-02-18 7:04 ` Victor Huertas
@ 2020-02-18 7:18 ` James Huang
2020-02-18 9:49 ` Victor Huertas
0 siblings, 1 reply; 10+ messages in thread
From: James Huang @ 2020-02-18 7:18 UTC (permalink / raw)
To: Victor Huertas; +Cc: dev, cristian.dumitrescu
No. We didn't see a noticeable throughput difference in our test.
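To illustrate the trade-off being discussed: with a write burst of N, the
output port of a pipeline typically buffers packets locally and only pushes
them to the ring once N have accumulated, so it pays one
rte_ring_enqueue_burst() call per N packets; with a burst of 1, every packet
is pushed immediately. A simplified sketch of that pattern (not the actual
ip_pipeline port code, and with drop handling omitted):

#include <stdint.h>
#include <rte_ring.h>
#include <rte_mbuf.h>

#define WRITE_BURST 32  /* 32 is the default discussed above; 1 lowers latency */

struct swq_writer {
	struct rte_ring *ring;
	struct rte_mbuf *buf[WRITE_BURST];
	uint32_t count;
};

/* Buffer packets and flush them to the ring once WRITE_BURST have
 * accumulated. With WRITE_BURST == 1 this degenerates to one enqueue per
 * packet: lower latency, a bit more per-packet ring overhead. */
static inline void
swq_write(struct swq_writer *w, struct rte_mbuf *pkt)
{
	w->buf[w->count++] = pkt;
	if (w->count < WRITE_BURST)
		return; /* the packet waits here until the burst fills up */

	rte_ring_enqueue_burst(w->ring, (void **)w->buf, w->count, NULL);
	w->count = 0;
}

With a burst of 1 the extra cost is essentially one ring operation per packet
instead of one per 32 packets, which is usually small compared to the rest of
the per-packet processing; that would be consistent with not seeing a
noticeable throughput difference in such a test.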
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] Fwd: high latency detected in IP pipeline example
2020-02-18 7:18 ` James Huang
@ 2020-02-18 9:49 ` Victor Huertas
2020-02-18 22:08 ` James Huang
0 siblings, 1 reply; 10+ messages in thread
From: Victor Huertas @ 2020-02-18 9:49 UTC (permalink / raw)
To: James Huang; +Cc: dev, cristian.dumitrescu
Dear James,
I have done two different tests with the following configuration:
[PIPELINE 0 MASTER core =0]
[PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=1] -----SWQ2---->
[PIPELINE 3 core=1]
The first test (sending a single ping across all the pipelines to measure the
RTT) was done with burst_write set to 32 in SWQ1 and SWQ2. NOTE: every time
we call rte_ring_enqueue_burst in pipelines 1 and 2, we set the number of
packets to write to 1.
The result of this first test is shown below:
64 bytes from 192.168.0.101: icmp_seq=343 ttl=63 time=59.8 ms
64 bytes from 192.168.0.101: icmp_seq=344 ttl=63 time=59.4 ms
64 bytes from 192.168.0.101: icmp_seq=345 ttl=63 time=59.2 ms
64 bytes from 192.168.0.101: icmp_seq=346 ttl=63 time=59.0 ms
64 bytes from 192.168.0.101: icmp_seq=347 ttl=63 time=59.0 ms
64 bytes from 192.168.0.101: icmp_seq=348 ttl=63 time=59.2 ms
64 bytes from 192.168.0.101: icmp_seq=349 ttl=63 time=59.3 ms
64 bytes from 192.168.0.101: icmp_seq=350 ttl=63 time=59.1 ms
64 bytes from 192.168.0.101: icmp_seq=351 ttl=63 time=58.9 ms
64 bytes from 192.168.0.101: icmp_seq=352 ttl=63 time=58.5 ms
64 bytes from 192.168.0.101: icmp_seq=353 ttl=63 time=58.4 ms
64 bytes from 192.168.0.101: icmp_seq=354 ttl=63 time=58.0 ms
64 bytes from 192.168.0.101: icmp_seq=355 ttl=63 time=58.4 ms
64 bytes from 192.168.0.101: icmp_seq=356 ttl=63 time=57.7 ms
64 bytes from 192.168.0.101: icmp_seq=357 ttl=63 time=56.9 ms
64 bytes from 192.168.0.101: icmp_seq=358 ttl=63 time=57.2 ms
64 bytes from 192.168.0.101: icmp_seq=359 ttl=63 time=57.5 ms
64 bytes from 192.168.0.101: icmp_seq=360 ttl=63 time=57.3 ms
As you can see, the RTT is quite high and the range of values is more or
less stable.
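My suspicion is that, with burst_write = 32, the lone ICMP packet never fills
the write burst of the SWQ output port, so at every hop it sits in the port's
local buffer until a periodic flush pushes it out, and those flush periods
add up to the ~59 ms seen above. A very rough sketch of the kind of run loop
I mean (an illustration of the idea, not the actual 17.11 thread code, and
FLUSH_PERIOD is just an assumed constant):

#include <stdint.h>
#include <rte_pipeline.h>

#define FLUSH_PERIOD 1024  /* assumed: iterations between flushes */

static void
thread_loop(struct rte_pipeline *p)
{
	uint64_t i;

	for (i = 0; ; i++) {
		/* Process up to one burst per input port. */
		rte_pipeline_run(p);

		/* Partially filled write bursts are only pushed out here,
		 * so a single buffered packet waits for this to happen. */
		if ((i & (FLUSH_PERIOD - 1)) == 0)
			rte_pipeline_flush(p);
	}
}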
The second test is the same as the first one, but with burst_write set to 1
for all SWQs. The result is this:
64 bytes from 192.168.0.101: icmp_seq=131 ttl=63 time=10.6 ms
64 bytes from 192.168.0.101: icmp_seq=132 ttl=63 time=10.6 ms
64 bytes from 192.168.0.101: icmp_seq=133 ttl=63 time=10.5 ms
64 bytes from 192.168.0.101: icmp_seq=134 ttl=63 time=10.7 ms
64 bytes from 192.168.0.101: icmp_seq=135 ttl=63 time=10.8 ms
64 bytes from 192.168.0.101: icmp_seq=136 ttl=63 time=10.4 ms
64 bytes from 192.168.0.101: icmp_seq=137 ttl=63 time=10.7 ms
64 bytes from 192.168.0.101: icmp_seq=138 ttl=63 time=10.5 ms
64 bytes from 192.168.0.101: icmp_seq=139 ttl=63 time=10.4 ms
64 bytes from 192.168.0.101: icmp_seq=140 ttl=63 time=10.2 ms
64 bytes from 192.168.0.101: icmp_seq=141 ttl=63 time=10.4 ms
64 bytes from 192.168.0.101: icmp_seq=142 ttl=63 time=10.9 ms
64 bytes from 192.168.0.101: icmp_seq=143 ttl=63 time=11.4 ms
64 bytes from 192.168.0.101: icmp_seq=144 ttl=63 time=11.3 ms
64 bytes from 192.168.0.101: icmp_seq=145 ttl=63 time=11.5 ms
64 bytes from 192.168.0.101: icmp_seq=146 ttl=63 time=11.6 ms
64 bytes from 192.168.0.101: icmp_seq=147 ttl=63 time=11.0 ms
64 bytes from 192.168.0.101: icmp_seq=148 ttl=63 time=11.3 ms
64 bytes from 192.168.0.101: icmp_seq=149 ttl=63 time=12.0 ms
64 bytes from 192.168.0.101: icmp_seq=150 ttl=63 time=12.6 ms
64 bytes from 192.168.0.101: icmp_seq=151 ttl=63 time=12.4 ms
64 bytes from 192.168.0.101: icmp_seq=152 ttl=63 time=12.3 ms
64 bytes from 192.168.0.101: icmp_seq=153 ttl=63 time=12.8 ms
64 bytes from 192.168.0.101: icmp_seq=154 ttl=63 time=12.4 ms
64 bytes from 192.168.0.101: icmp_seq=155 ttl=63 time=12.8 ms
64 bytes from 192.168.0.101: icmp_seq=156 ttl=63 time=12.7 ms
64 bytes from 192.168.0.101: icmp_seq=157 ttl=63 time=12.6 ms
64 bytes from 192.168.0.101: icmp_seq=158 ttl=63 time=12.9 ms
64 bytes from 192.168.0.101: icmp_seq=159 ttl=63 time=13.4 ms
64 bytes from 192.168.0.101: icmp_seq=160 ttl=63 time=13.8 ms
64 bytes from 192.168.0.101: icmp_seq=161 ttl=63 time=13.4 ms
64 bytes from 192.168.0.101: icmp_seq=162 ttl=63 time=13.3 ms
64 bytes from 192.168.0.101: icmp_seq=163 ttl=63 time=13.3 ms
64 bytes from 192.168.0.101: icmp_seq=164 ttl=63 time=13.7 ms
64 bytes from 192.168.0.101: icmp_seq=165 ttl=63 time=13.7 ms
64 bytes from 192.168.0.101: icmp_seq=166 ttl=63 time=13.8 ms
64 bytes from 192.168.0.101: icmp_seq=167 ttl=63 time=14.7 ms
64 bytes from 192.168.0.101: icmp_seq=168 ttl=63 time=14.7 ms
64 bytes from 192.168.0.101: icmp_seq=169 ttl=63 time=14.7 ms
64 bytes from 192.168.0.101: icmp_seq=170 ttl=63 time=14.7 ms
64 bytes from 192.168.0.101: icmp_seq=171 ttl=63 time=14.6 ms
64 bytes from 192.168.0.101: icmp_seq=172 ttl=63 time=14.6 ms
64 bytes from 192.168.0.101: icmp_seq=173 ttl=63 time=14.5 ms
64 bytes from 192.168.0.101: icmp_seq=174 ttl=63 time=14.5 ms
64 bytes from 192.168.0.101: icmp_seq=175 ttl=63 time=15.1 ms
64 bytes from 192.168.0.101: icmp_seq=176 ttl=63 time=15.6 ms
64 bytes from 192.168.0.101: icmp_seq=177 ttl=63 time=16.0 ms
64 bytes from 192.168.0.101: icmp_seq=178 ttl=63 time=16.9 ms
64 bytes from 192.168.0.101: icmp_seq=179 ttl=63 time=17.7 ms
64 bytes from 192.168.0.101: icmp_seq=180 ttl=63 time=17.6 ms
64 bytes from 192.168.0.101: icmp_seq=181 ttl=63 time=17.9 ms
64 bytes from 192.168.0.101: icmp_seq=182 ttl=63 time=17.9 ms
64 bytes from 192.168.0.101: icmp_seq=183 ttl=63 time=18.5 ms
64 bytes from 192.168.0.101: icmp_seq=184 ttl=63 time=18.9 ms
64 bytes from 192.168.0.101: icmp_seq=185 ttl=63 time=19.8 ms
64 bytes from 192.168.0.101: icmp_seq=186 ttl=63 time=19.8 ms
64 bytes from 192.168.0.101: icmp_seq=187 ttl=63 time=10.7 ms
64 bytes from 192.168.0.101: icmp_seq=188 ttl=63 time=10.5 ms
64 bytes from 192.168.0.101: icmp_seq=189 ttl=63 time=10.4 ms
64 bytes from 192.168.0.101: icmp_seq=190 ttl=63 time=10.3 ms
64 bytes from 192.168.0.101: icmp_seq=191 ttl=63 time=10.5 ms
64 bytes from 192.168.0.101: icmp_seq=192 ttl=63 time=10.7 ms
As you mentioned, the delay has decreased a lot, but it is still
considerably high (in a normal router this delay is less than 1 ms).
A second strange behaviour is seen in the evolution of the RTT: it starts
at around 10 ms, increases little by little until it reaches a peak of
roughly 20 ms, then suddenly drops back to 10 ms, after which it starts
increasing again towards 20 ms.
Is this the behaviour you see in your case when burst_write is set to 1?
Regards,
--
Victor
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] Fwd: high latency detected in IP pipeline example
2020-02-18 9:49 ` Victor Huertas
@ 2020-02-18 22:08 ` James Huang
2020-02-19 8:29 ` Victor Huertas
0 siblings, 1 reply; 10+ messages in thread
From: James Huang @ 2020-02-18 22:08 UTC (permalink / raw)
To: Victor Huertas; +Cc: dev, cristian.dumitrescu
No. I didn't notice the RTT bouncing symptom.
In a high-throughput scenario, if multiple pipelines run on a single CPU
core, it does increase the latency.
Regards,
James Huang
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] Fwd: high latency detected in IP pipeline example
2020-02-18 22:08 ` James Huang
@ 2020-02-19 8:29 ` Victor Huertas
2020-02-19 10:37 ` [dpdk-dev] Fwd: " Victor Huertas
0 siblings, 1 reply; 10+ messages in thread
From: Victor Huertas @ 2020-02-19 8:29 UTC (permalink / raw)
To: James Huang; +Cc: dev, cristian.dumitrescu
OK James,
Thanks for sharing your experience.
What I would need right now is to know from the maintainers whether this
latency behaviour is something inherent to DPDK in the particular case we
are discussing. Furthermore, I would also appreciate it if some maintainer
could tell us whether there is a workaround or special configuration that
completely mitigates this latency. I guess there is one mitigation
mechanism, which is the approach the new ip_pipeline app example takes: if
two or more pipelines run on the same core, the "connection" between them is
not a software queue but a "direct table connection".
This approach has a big impact on my application, and I would like to know
whether there is another mitigation approach that works with the "old"
version of the ip_pipeline example.
Thanks for your attention
--
Victor
^ permalink raw reply [flat|nested] 10+ messages in thread
* [dpdk-dev] Fwd: Fwd: high latency detected in IP pipeline example
2020-02-19 8:29 ` Victor Huertas
@ 2020-02-19 10:37 ` Victor Huertas
2020-02-19 10:53 ` Olivier Matz
0 siblings, 1 reply; 10+ messages in thread
From: Victor Huertas @ 2020-02-19 10:37 UTC (permalink / raw)
To: James Huang, cristian.dumitrescu, dev, olivier.matz
Hi,
I have added some maintainers as recipients who could provide some extra
information on this issue.
I hope they can shed some light on it.
Regards
--
Victor
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] Fwd: Fwd: high latency detected in IP pipeline example
2020-02-19 10:37 ` [dpdk-dev] Fwd: " Victor Huertas
@ 2020-02-19 10:53 ` Olivier Matz
2020-02-19 12:05 ` Victor Huertas
0 siblings, 1 reply; 10+ messages in thread
From: Olivier Matz @ 2020-02-19 10:53 UTC (permalink / raw)
To: Victor Huertas; +Cc: James Huang, cristian.dumitrescu, dev
Hi Victor,
I have no experience with ip_pipeline. I can at least say that this
latency is much higher than what you should get.
My initial thought was that you were using several pthreads bound to the
same core, but from what I read in your first mail, this is not the
case.
Do you have a simple way to reproduce your issue with the original
example app?
Olivier
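For instance, something along these lines would be a minimal starting point
(assuming the 17.11 build layout and the example's -f/-p options; the config
file and port mask below are just placeholders): build examples/ip_pipeline
from 17.11, take one of the shipped config files, edit the "core = ..."
entries of the non-MASTER [PIPELINE*] sections so they all share one core,
run e.g.

./build/ip_pipeline -f <edited_config>.cfg -p 0x3

and then ping between two hosts attached to the two ports, comparing the RTT
against the same config with each pipeline on its own core.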
> >>> 64 bytes from 192.168.0.101: icmp_seq=181 ttl=63 time=17.9 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=182 ttl=63 time=17.9 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=183 ttl=63 time=18.5 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=184 ttl=63 time=18.9 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=185 ttl=63 time=19.8 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=186 ttl=63 time=19.8 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=187 ttl=63 time=10.7 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=188 ttl=63 time=10.5 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=189 ttl=63 time=10.4 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=190 ttl=63 time=10.3 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=191 ttl=63 time=10.5 ms
> >>> 64 bytes from 192.168.0.101: icmp_seq=192 ttl=63 time=10.7 ms
> >>> As you mentioned, the delay has decreased a lot but it is still
> >>> considerably high (in a normal router this delay is less than 1 ms).
> >>> A second strange behaviour is seen in the evolution of the RTT:
> >>> it starts at 10 ms, increases little by little up to a peak of about
> >>> 20 ms, and then suddenly drops back to 10 ms before climbing to 20 ms
> >>> again.
> >>>
> >>> Is this the behaviour you have in your case when the burst_write is set
> >>> to 1?
> >>>
> >>> Regards,
> >>>
> >>> On Tue, Feb 18, 2020 at 8:18, James Huang (<jamsphon@gmail.com>)
> >>> wrote:
> >>>
> >>>> No. We didn't see a noticeable throughput difference in our test.
> >>>>
> >>>> On Mon., Feb. 17, 2020, 11:04 p.m. Victor Huertas <vhuertas@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Thanks James for your quick answer.
> >>>>> I guess this configuration change implies that the packets must be
> >>>>> written one by one into the SW ring. Did you notice a loss of
> >>>>> performance (in throughput) in your application because of that?
> >>>>>
> >>>>> Regards
> >>>>>
> >>>>> On Tue, Feb 18, 2020 at 0:10, James Huang <jamsphon@gmail.com> wrote:
> >>>>>
> >>>>>> Yes, I experienced similar issue in my application. In a short
> >>>>>> answer, set the swqs write burst value to 1 may reduce the latency
> >>>>>> significantly. The default write burst value is 32.
> >>>>>>
> >>>>>> On Mon., Feb. 17, 2020, 8:41 a.m. Victor Huertas <vhuertas@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I am developing my own DPDK application basing it in the dpdk-stable
> >>>>>>> ip_pipeline example.
> >>>>>>> At this moment I am using the 17.11 LTS version of DPDK and I amb
> >>>>>>> observing
> >>>>>>> some extrange behaviour. Maybe it is an old issue that can be solved
> >>>>>>> quickly so I would appreciate it if some expert can shade a light on
> >>>>>>> this.
> >>>>>>>
> >>>>>>> The ip_pipeline example allows you to develop Pipelines that perform
> >>>>>>> specific packet processing functions (ROUTING, FLOW_CLASSIFYING,
> >>>>>>> etc...).
> >>>>>>> The thing is that I am extending some of this pipelines with my own.
> >>>>>>> However I want to take advantage of the built-in ip_pipeline
> >>>>>>> capability of
> >>>>>>> arbitrarily assigning the logical core where the pipeline (f_run()
> >>>>>>> function) must be executed so that i can adapt the packet processing
> >>>>>>> power
> >>>>>>> to the amount of the number of cores available.
> >>>>>>> Taking this into account I have observed something strange. I show
> >>>>>>> you this
> >>>>>>> simple example below.
> >>>>>>>
> >>>>>>> Case 1:
> >>>>>>> [PIPELINE 0 MASTER core =0]
> >>>>>>> [PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=2] -----SWQ2---->
> >>>>>>> [PIPELINE 3 core=3]
> >>>>>>>
> >>>>>>> Case 2:
> >>>>>>> [PIPELINE 0 MASTER core =0]
> >>>>>>> [PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=1] -----SWQ2---->
> >>>>>>> [PIPELINE 3 core=1]
> >>>>>>>
> >>>>>>> I send a ping between two hosts connected at both sides of the
> >>>>>>> pipeline
> >>>>>>> model which allows these pings to cross all the pipelines (from 1 to
> >>>>>>> 3).
> >>>>>>> What I observe in Case 1 (each pipeline has its own thread in
> >>>>>>> different
> >>>>>>> core) is that the reported RTT is less than 1 ms, whereas in Case 2
> >>>>>>> (all
> >>>>>>> pipelines except MASTER are run in the same thread) is 20 ms.
> >>>>>>> Furthermore,
> >>>>>>> in Case 2, if I increase a lot (hundreds of Mbps) the packet rate
> >>>>>>> this RTT
> >>>>>>> decreases to 3 or 4 ms.
> >>>>>>>
> >>>>>>> Has somebody observed this behaviour in the past? Can it be solved
> >>>>>>> somehow?
> >>>>>>>
> >>>>>>> Thanks a lot for your attention
> >>>>>>> --
> >>>>>>> Victor
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Victor
> >>>>>>>
> >>>>>>
> >>>
> >>> --
> >>> Victor
> >>>
> >>
> >
> > --
> > Victor
> >
>
>
> --
> Victor
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [dpdk-dev] Fwd: Fwd: high latency detected in IP pipeline example
2020-02-19 10:53 ` Olivier Matz
@ 2020-02-19 12:05 ` Victor Huertas
0 siblings, 0 replies; 10+ messages in thread
From: Victor Huertas @ 2020-02-19 12:05 UTC (permalink / raw)
To: Olivier Matz; +Cc: James Huang, cristian.dumitrescu, dev
Hi Olivier,

Thanks for your answer. I think the most appropriate maintainer to answer
this issue is Cristian, as he is the maintainer of ip_pipeline.

To reproduce the problem, you should go back to DPDK v17.11 and run the
ip_pipeline app with a *.cfg configuration file that puts N pipelines
(N greater than 3) in a row, all mapped to the same logical core. It does
not matter whether the pipelines only apply the default entries of their
tables; the important thing is that the packet crosses all the pipelines,
not the processing each of them applies. In other words, several f_run()
functions (one per pipeline) fall into the same thread, which is bound to
a single logical core (the f_run() calls are executed one after the other),
and the rte_mbufs are read from and written to software queues between
pipelines.
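
Roughly, the per-core loop in that setup behaves like the sketch below
(the structure names and the flush interval are illustrative, not the
exact ones from examples/ip_pipeline; for the built-in pipelines, f_run()
essentially ends up in rte_pipeline_run()):

#include <stdint.h>
#include <rte_pipeline.h>

#define MAX_PIPELINES_PER_CORE 8    /* illustrative value */

struct core_ctx {
    struct rte_pipeline *p[MAX_PIPELINES_PER_CORE];
    uint32_t n_pipelines;
};

/* All pipelines bound to this core are run back to back in one thread;
 * a packet that crosses N of them through SWQs needs several loop
 * iterations (and flushes) to get through, which is where the extra
 * latency accumulates. */
static void
core_main_loop(struct core_ctx *c)
{
    uint64_t i;
    uint32_t j;

    for (i = 0; ; i++) {
        for (j = 0; j < c->n_pipelines; j++)
            rte_pipeline_run(c->p[j]);

        /* Output ports buffer up to burst_write packets, so they are
         * flushed only every so often. */
        if ((i & 0xFF) == 0)
            for (j = 0; j < c->n_pipelines; j++)
                rte_pipeline_flush(c->p[j]);
    }
}
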
If you need a particular configuration that you can test quickly, I would
have to set this environment up again in my lab and test it once more,
which would take me a while. I have evolved the pipelines a little from
the original app and I cannot provide you with them. The only thing I can
assure you is that the mechanism for "connecting" the pipelines depending
on the logical core where you want to run them is untouched from the
original app.
This latency issue worries me quite a lot, because I need to justify to
my bosses the use of DPDK as a key library for improving the performance
of the application we are developing at my company (please understand
that I cannot tell you more). I decided to base the application on the
ip_pipeline app because it let me mix pipelines using the built-in tables
and built-in f_run() with customized pipelines using a custom f_run()
function. I found this very attractive: the flexibility to compose our
own packet processing models by concatenating well-defined pipelines.
However, I then ran into the latency issue mentioned above.

I hope this helps you understand better where I stand now.
Regards,
On Wed, Feb 19, 2020 at 11:53, Olivier Matz (<olivier.matz@6wind.com>)
wrote:
> Hi Victor,
>
> I have no experience with ip_pipeline. I can at least say that this
> latency is much higher that what you should get.
>
> My initial thought was that you were using several pthreads bound to the
> same core, but from what I read in your first mail, this is not the
> case.
>
> Do you have a simple way to reproduce your issue with the original
> example app?
>
>
> Olivier
>
> On Wed, Feb 19, 2020 at 11:37:21AM +0100, Victor Huertas wrote:
> > Hi ,
> >
> > I put some maintainers as destination that could provide some extra
> > information on this issue.
> > I hope they can shed some light on this.
> >
> > Regards
> >
> > El mié., 19 feb. 2020 a las 9:29, Victor Huertas (<vhuertas@gmail.com>)
> > escribió:
> >
> > > OK James,
> > > Thanks for sharing your own experience.
> > > What I would need right now is to know from maintainers if this latency
> > > behaviour is something inherent in DPDK in the particular case we are
> > > talking about. Furthermore, I would also appreciate it if some
> maintainer
> > > could tell us if there is some workaround or special configuration that
> > > completely mitigate this latency. I guess that there is one mitigation
> > > mechanism, which is the approach that the new ip_pipeline app example
> > > exposes: if two or more pipelines are in the same core the "connection"
> > > between them is not a software queue but a "direct table connection".
> > >
> > > This proposed approach has a big impact on my application and I would
> like
> > > to know if there is other mitigation approach taking into account the
> "old"
> > > version of ip_pipeline example.
> > >
> > > Thanks for your attention
> > >
> > >
> > > El mar., 18 feb. 2020 a las 23:09, James Huang (<jamsphon@gmail.com>)
> > > escribió:
> > >
> > >> No. I didn't notice the RTT bouncing symptoms.
> > >> In high throughput scenario, if multiple pipelines runs in a single
> cpu
> > >> core, it does increase the latency.
> > >>
> > >>
> > >> Regards,
> > >> James Huang
> > >>
> > >>
> > >> On Tue, Feb 18, 2020 at 1:50 AM Victor Huertas <vhuertas@gmail.com>
> > >> wrote:
> > >>
> > >>> Dear James,
> > >>>
> > >>> I have done two different tests with the following configuration:
> > >>> [PIPELINE 0 MASTER core =0]
> > >>> [PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=1] -----SWQ2---->
> > >>> [PIPELINE 3 core=1]
> > >>>
> > >>> The first test (sending a single ping to cross all the pipelines to
> > >>> measure RTT) has been done by setting the burst_write to 32 in SWQ1
> and
> > >>> SWQ2. NOTE: All the times we use rte_ring_enqueue_burst in the
> pipelines 1
> > >>> and 2 we set the number of packets to write to 1.
> > >>>
> > >>> The result of this first test is as shown subsquently:
> > >>> 64 bytes from 192.168.0.101: icmp_seq=343 ttl=63 time=59.8 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=344 ttl=63 time=59.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=345 ttl=63 time=59.2 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=346 ttl=63 time=59.0 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=347 ttl=63 time=59.0 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=348 ttl=63 time=59.2 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=349 ttl=63 time=59.3 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=350 ttl=63 time=59.1 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=351 ttl=63 time=58.9 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=352 ttl=63 time=58.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=353 ttl=63 time=58.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=354 ttl=63 time=58.0 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=355 ttl=63 time=58.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=356 ttl=63 time=57.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=357 ttl=63 time=56.9 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=358 ttl=63 time=57.2 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=359 ttl=63 time=57.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=360 ttl=63 time=57.3 ms
> > >>>
> > >>> As you can see, the RTT is quite high and the range of values is
> more or
> > >>> less stable.
> > >>>
> > >>> The second test is the same as the first one but setting burst_write
> to
> > >>> 1 for all SWQs. The result is this one:
> > >>>
> > >>> 64 bytes from 192.168.0.101: icmp_seq=131 ttl=63 time=10.6 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=132 ttl=63 time=10.6 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=133 ttl=63 time=10.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=134 ttl=63 time=10.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=135 ttl=63 time=10.8 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=136 ttl=63 time=10.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=137 ttl=63 time=10.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=138 ttl=63 time=10.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=139 ttl=63 time=10.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=140 ttl=63 time=10.2 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=141 ttl=63 time=10.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=142 ttl=63 time=10.9 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=143 ttl=63 time=11.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=144 ttl=63 time=11.3 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=145 ttl=63 time=11.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=146 ttl=63 time=11.6 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=147 ttl=63 time=11.0 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=148 ttl=63 time=11.3 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=149 ttl=63 time=12.0 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=150 ttl=63 time=12.6 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=151 ttl=63 time=12.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=152 ttl=63 time=12.3 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=153 ttl=63 time=12.8 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=154 ttl=63 time=12.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=155 ttl=63 time=12.8 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=156 ttl=63 time=12.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=157 ttl=63 time=12.6 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=158 ttl=63 time=12.9 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=159 ttl=63 time=13.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=160 ttl=63 time=13.8 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=161 ttl=63 time=13.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=162 ttl=63 time=13.3 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=163 ttl=63 time=13.3 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=164 ttl=63 time=13.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=165 ttl=63 time=13.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=166 ttl=63 time=13.8 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=167 ttl=63 time=14.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=168 ttl=63 time=14.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=169 ttl=63 time=14.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=170 ttl=63 time=14.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=171 ttl=63 time=14.6 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=172 ttl=63 time=14.6 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=173 ttl=63 time=14.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=174 ttl=63 time=14.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=175 ttl=63 time=15.1 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=176 ttl=63 time=15.6 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=177 ttl=63 time=16.0 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=178 ttl=63 time=16.9 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=179 ttl=63 time=17.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=180 ttl=63 time=17.6 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=181 ttl=63 time=17.9 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=182 ttl=63 time=17.9 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=183 ttl=63 time=18.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=184 ttl=63 time=18.9 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=185 ttl=63 time=19.8 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=186 ttl=63 time=19.8 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=187 ttl=63 time=10.7 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=188 ttl=63 time=10.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=189 ttl=63 time=10.4 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=190 ttl=63 time=10.3 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=191 ttl=63 time=10.5 ms
> > >>> 64 bytes from 192.168.0.101: icmp_seq=192 ttl=63 time=10.7 ms
> > >>> As you mentioned, the delay has decreased a lot but it is still
> > >>> considerably high (in a normal router this delay is less than 1 ms).
> > >>> A second strange behaviour is seen in the evolution of the RTT
> detected.
> > >>> It begins in 10 ms and goes increasing little by litttle to reach a
> peak of
> > >>> 20 ms aprox and then it suddely comes back to 10 ms again to
> increase again
> > >>> till 20 ms.
> > >>>
> > >>> Is this the behaviour you have in your case when the burst_write is
> set
> > >>> to 1?
> > >>>
> > >>> Regards,
> > >>>
> > >>> El mar., 18 feb. 2020 a las 8:18, James Huang (<jamsphon@gmail.com>)
> > >>> escribió:
> > >>>
> > >>>> No. We didn't see noticable throughput difference in our test.
> > >>>>
> > >>>> On Mon., Feb. 17, 2020, 11:04 p.m. Victor Huertas <
> vhuertas@gmail.com>
> > >>>> wrote:
> > >>>>
> > >>>>> Thanks James for your quick answer.
> > >>>>> I guess that this configuration modification implies that the
> packets
> > >>>>> must be written one by one in the sw ring. Did you notice loose of
> > >>>>> performance (in throughput) in your aplicación because of that?
> > >>>>>
> > >>>>> Regards
> > >>>>>
> > >>>>> El mar., 18 feb. 2020 0:10, James Huang <jamsphon@gmail.com>
> escribió:
> > >>>>>
> > >>>>>> Yes, I experienced similar issue in my application. In a short
> > >>>>>> answer, set the swqs write burst value to 1 may reduce the latency
> > >>>>>> significantly. The default write burst value is 32.
> > >>>>>>
> > >>>>>> On Mon., Feb. 17, 2020, 8:41 a.m. Victor Huertas <
> vhuertas@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hi all,
> > >>>>>>>
> > >>>>>>> I am developing my own DPDK application basing it in the
> dpdk-stable
> > >>>>>>> ip_pipeline example.
> > >>>>>>> At this moment I am using the 17.11 LTS version of DPDK and I amb
> > >>>>>>> observing
> > >>>>>>> some extrange behaviour. Maybe it is an old issue that can be
> solved
> > >>>>>>> quickly so I would appreciate it if some expert can shade a
> light on
> > >>>>>>> this.
> > >>>>>>>
> > >>>>>>> The ip_pipeline example allows you to develop Pipelines that
> perform
> > >>>>>>> specific packet processing functions (ROUTING, FLOW_CLASSIFYING,
> > >>>>>>> etc...).
> > >>>>>>> The thing is that I am extending some of this pipelines with my
> own.
> > >>>>>>> However I want to take advantage of the built-in ip_pipeline
> > >>>>>>> capability of
> > >>>>>>> arbitrarily assigning the logical core where the pipeline
> (f_run()
> > >>>>>>> function) must be executed so that i can adapt the packet
> processing
> > >>>>>>> power
> > >>>>>>> to the amount of the number of cores available.
> > >>>>>>> Taking this into account I have observed something strange. I
> show
> > >>>>>>> you this
> > >>>>>>> simple example below.
> > >>>>>>>
> > >>>>>>> Case 1:
> > >>>>>>> [PIPELINE 0 MASTER core =0]
> > >>>>>>> [PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=2]
> -----SWQ2---->
> > >>>>>>> [PIPELINE 3 core=3]
> > >>>>>>>
> > >>>>>>> Case 2:
> > >>>>>>> [PIPELINE 0 MASTER core =0]
> > >>>>>>> [PIPELINE 1 core=1] --- SWQ1--->[PIPELINE 2 core=1]
> -----SWQ2---->
> > >>>>>>> [PIPELINE 3 core=1]
> > >>>>>>>
> > >>>>>>> I send a ping between two hosts connected at both sides of the
> > >>>>>>> pipeline
> > >>>>>>> model which allows these pings to cross all the pipelines (from
> 1 to
> > >>>>>>> 3).
> > >>>>>>> What I observe in Case 1 (each pipeline has its own thread in
> > >>>>>>> different
> > >>>>>>> core) is that the reported RTT is less than 1 ms, whereas in
> Case 2
> > >>>>>>> (all
> > >>>>>>> pipelines except MASTER are run in the same thread) is 20 ms.
> > >>>>>>> Furthermore,
> > >>>>>>> in Case 2, if I increase a lot (hundreds of Mbps) the packet rate
> > >>>>>>> this RTT
> > >>>>>>> decreases to 3 or 4 ms.
> > >>>>>>>
> > >>>>>>> Has somebody observed this behaviour in the past? Can it be
> solved
> > >>>>>>> somehow?
> > >>>>>>>
> > >>>>>>> Thanks a lot for your attention
> > >>>>>>> --
> > >>>>>>> Victor
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Victor
> > >>>>>>>
> > >>>>>>
> > >>>
> > >>> --
> > >>> Victor
> > >>>
> > >>
> > >
> > > --
> > > Victor
> > >
> >
> >
> > --
> > Victor
>
--
Victor
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-02-19 12:05 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAGxG5cjY+npJ7wVqcb9MXdtKkpC6RrgYpDQA2qbaAjD7i7C2EQ@mail.gmail.com>
2020-02-17 16:41 ` [dpdk-dev] Fwd: high latency detected in IP pipeline example Victor Huertas
2020-02-17 23:10 ` James Huang
2020-02-18 7:04 ` Victor Huertas
2020-02-18 7:18 ` James Huang
2020-02-18 9:49 ` Victor Huertas
2020-02-18 22:08 ` James Huang
2020-02-19 8:29 ` Victor Huertas
2020-02-19 10:37 ` [dpdk-dev] Fwd: " Victor Huertas
2020-02-19 10:53 ` Olivier Matz
2020-02-19 12:05 ` Victor Huertas