DPDK usage discussions
* [dpdk-users] Strange packet loss with multi-frame payloads
@ 2017-07-17 13:18 Harold Demure
  2017-07-17 20:38 ` Pavel Shirshov
  0 siblings, 1 reply; 11+ messages in thread
From: Harold Demure @ 2017-07-17 13:18 UTC (permalink / raw)
  To: users

Hello,
  I am having a problem with packet loss and I hope you can help me out.
Below you will find a description of the application and of the problem.
It is a little long, but I really hope somebody out there can help me,
because this is driving me crazy.

*Application*

I have a client-server application; single server, multiple clients.
The machines have 8 active cores, each polling its own RX queue to receive
packets and bursting replies out of its own TX queue (i.e., a
run-to-completion model).
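
For reference, here is a minimal sketch of the per-core run-to-completion
loop just described; the single port, the 1:1 lcore-to-queue mapping and
the burst size are illustrative assumptions, not my actual code:

#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32                              /* assumed burst size */

/* Each lcore polls one RX queue and bursts replies out of the matching
 * TX queue; here the "processing" simply echoes the received mbufs. */
static int
lcore_main_loop(__attribute__((unused)) void *arg)
{
        uint8_t port = 0;                          /* assumed single port */
        uint16_t queue = (uint16_t)rte_lcore_id(); /* assumed 1:1 mapping */
        struct rte_mbuf *pkts[BURST_SIZE];

        for (;;) {
                uint16_t nb_rx = rte_eth_rx_burst(port, queue, pkts, BURST_SIZE);
                if (nb_rx == 0)
                        continue;
                /* real application processing of pkts[0..nb_rx-1] goes here */
                uint16_t nb_tx = rte_eth_tx_burst(port, queue, pkts, nb_rx);
                while (nb_tx < nb_rx)        /* free what the TX ring refused */
                        rte_pktmbuf_free(pkts[nb_tx++]);
        }
        return 0;
}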

*Workload*

The workload is composed mostly of single-frame packets, but occasionally
the clients send the server multi-frame packets, and occasionally the
server sends multi-frame replies back to the clients.
Packets are fragmented at the UDP level (i.e., there is no IP
fragmentation: every frame of the same request has a frag_id == 0 in its
IP header, even though the frames share the same packet_id).
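
To make the scheme concrete, here is a minimal sketch of how a frame could
be stamped; the APP_HEADER layout and its field names are illustrative
assumptions, not my actual format:

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_ip.h>

/* Hypothetical application header carried after ETH|IP|UDP. */
struct app_hdr {
        uint16_t frag_id;     /* index of this frame within the request */
        uint16_t frag_count;  /* total number of frames in the request */
} __attribute__((packed));

/* All frames of a request share the same packet_id (the IPv4
 * identification field) and differ only in the app-level frag_id. */
static void
stamp_fragment(struct ipv4_hdr *ip, struct app_hdr *ah,
               uint16_t packet_id, uint16_t frag_id, uint16_t frag_count)
{
        ip->packet_id = rte_cpu_to_be_16(packet_id);
        ip->fragment_offset = 0;           /* no IP-level fragmentation */
        ah->frag_id = frag_id;
        ah->frag_count = frag_count;
}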

*Problem*

I experience huge packet loss on the server when the occasional
multi-frame requests from the clients carry a big payload (> 300 KB).
The eth stats that I gather on the server report no errors and no packet
loss (q_errors, imissed, ierrors, oerrors and rx_nombuf are all equal to
0). Yet the application does not see some packets of the big requests that
the clients send.
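
For completeness, those counters can be read with rte_eth_stats_get(); a
minimal sketch of the check (the port id and the printing are illustrative
only):

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

/* Dump the drop/error counters that were checked on the server. */
static void
dump_drop_counters(uint8_t port_id)
{
        struct rte_eth_stats st;

        rte_eth_stats_get(port_id, &st);
        printf("imissed=%" PRIu64 " ierrors=%" PRIu64 " oerrors=%" PRIu64
               " rx_nombuf=%" PRIu64 "\n",
               st.imissed, st.ierrors, st.oerrors, st.rx_nombuf);
        for (int q = 0; q < RTE_ETHDEV_QUEUE_STAT_CNTRS; q++)
                if (st.q_errors[q] != 0)
                        printf("q_errors[%d]=%" PRIu64 "\n", q, st.q_errors[q]);
}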

I have recorded some interesting facts:
1) The clients do not experience such packet loss, although they also
receive packets with an aggregate payload of the same size as the packets
received by the server. The only differences w.r.t. the server are that a
client machine of course has a lower RX load (it only gets the replies to
its own requests) and that a client thread only receives packets from a
single machine (the server).
2) This behavior does not arise as long as the biggest payload exchanged
between clients and servers is < 200 KB. This leads me to conclude that
fragmentation is not the issue (also, if I implement a stubborn
retransmission, eventually all packets are received even with bigger
payloads). Also, I reserve plenty of memory for my mempool, so I don't
think the server runs out of mbufs (and if that were the case, I guess I
would see it in the dropped-packets count, right?).
3) If I switch to the pipeline model (on the server only) this problem
basically disappears. By pipeline model I mean something like the
load-balancing app, where a single core on the server receives client
packets on a single RX queue and worker cores reply back to the clients
using their own TX queues (a minimal sketch of this dispatch pattern
follows this list). This leads me to think that the problem is on the
server, and not on the clients.
4) It doesn't seem to be a "load" problem. If I run the same tests
multiple times, in some "lucky" runs the run-to-completion model
outperforms the pipeline one. Also, with a workload of only single-frame
packets, the run-to-completion model can handle many more packets per
second than the number of frames per second generated by the workload
that includes some big packets.
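
Here is the minimal sketch of the dispatch pattern referred to in point 3;
the ring setup, the round-robin policy and the sizes are illustrative
assumptions in the spirit of the load-balancing example, not my actual
code:

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define NB_WORKERS 7
#define BURST_SIZE 32

/* One ring per worker core, created at init with rte_ring_create(). */
static struct rte_ring *work_rings[NB_WORKERS];

/* Single RX core: pull from one RX queue and spread the bursts over the
 * worker rings; workers dequeue, process and TX on their own queues. */
static int
rx_dispatch_loop(__attribute__((unused)) void *arg)
{
        uint8_t port = 0;                  /* assumed single port */
        unsigned next = 0;
        struct rte_mbuf *pkts[BURST_SIZE];

        for (;;) {
                uint16_t nb_rx = rte_eth_rx_burst(port, 0, pkts, BURST_SIZE);
                if (nb_rx == 0)
                        continue;
                unsigned sent = rte_ring_enqueue_burst(work_rings[next],
                                                       (void **)pkts, nb_rx);
                while (sent < nb_rx)       /* drop what the ring refused */
                        rte_pktmbuf_free(pkts[sent++]);
                next = (next + 1) % NB_WORKERS;
        }
        return 0;
}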


*Question*

Do you have any idea why I am witnessing this behavior? I know that having
fewer queues can help performance by relieving contention on the NIC, but
is it possible that the contention is actually causing packets to get
dropped?

*Platform*

DPDK: v2.2.0 (I know this is an old version, but I am dealing with legacy
code I cannot change)

MLNX_OFED_LINUX-3.1-1.0.3-ubuntu14.04-x86_64

My NIC: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

My machine runs a 4.4.0-72-generic kernel on Ubuntu 16.04.2

CPU is an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 2x8 cores


Thank you a lot, especially if you went through the whole email :)
Regards,
   Harold


* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-17 13:18 [dpdk-users] Strange packet loss with multi-frame payloads Harold Demure
@ 2017-07-17 20:38 ` Pavel Shirshov
  2017-07-17 21:23   ` Harold Demure
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Shirshov @ 2017-07-17 20:38 UTC (permalink / raw)
  To: Harold Demure; +Cc: users

Hi Harold,

Sorry, I don't have a direct answer to your request, but I have a bunch
of questions.

1. What is "packet_id" here? Is it something inside your UDP payload?
2. How do you know you have packet loss? How can you be sure it is
packet loss if you don't see it on your counters? How can you be sure
that these packets were sent by the clients? How can you be sure your
clients actually sent the packets?

Also, I see you're using a 2x8-core server, so your OS uses some cores
for itself. Could that be a problem too?

Thanks


* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-17 20:38 ` Pavel Shirshov
@ 2017-07-17 21:23   ` Harold Demure
  2017-07-17 23:24     ` Harold Demure
  2017-07-24 16:23     ` Pavel Shirshov
  0 siblings, 2 replies; 11+ messages in thread
From: Harold Demure @ 2017-07-17 21:23 UTC (permalink / raw)
  To: Pavel Shirshov; +Cc: users

Dear Pavel,
  Thank you for your feedback; I really appreciate it. I reply to your
questions inline.
Regards,
   Harold

2017-07-17 22:38 GMT+02:00 Pavel Shirshov <pavel.shirshov@gmail.com>:

> Hi Harold,
>
> Sorry I don't have a direct answer on your request, but I have a bunch
> of questions.
>
> 1. What is "packet_id" here? It's something inside of your udp payload?
>

I have a packet_id in the plain ipv4_hdr structure, and I have a
fragment_id in the header of each fragment I send.
So a typical packet is ETH|IP|UDP|APP_HEADER.
I defragment packets looking at the pckt_id in the ipv4 header and the
fragment_id in the app_header.
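
For context, the defragmentation bookkeeping is conceptually something like
the sketch below, keyed on the IPv4 packet_id plus the app-level fragment
id; the table layout, the sizes and the frag_count field are illustrative
assumptions, not my actual code:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_INFLIGHT 64    /* assumed number of requests tracked per core */
#define MAX_FRAGS    512   /* assumed upper bound on fragments per request */

struct reassembly_slot {
        bool     in_use;
        uint16_t packet_id;        /* from the IPv4 identification field */
        uint16_t frags_expected;
        uint16_t frags_seen;
        uint8_t  seen[MAX_FRAGS];  /* 1 if that frag_id already arrived */
};

static struct reassembly_slot table[MAX_INFLIGHT];

/* Returns true when the last missing fragment of a request arrives. */
static bool
note_fragment(uint16_t packet_id, uint16_t frag_id, uint16_t frag_count)
{
        struct reassembly_slot *s = &table[packet_id % MAX_INFLIGHT];

        if (!s->in_use || s->packet_id != packet_id) {  /* new or recycled slot */
                memset(s, 0, sizeof(*s));
                s->in_use = true;
                s->packet_id = packet_id;
                s->frags_expected = frag_count;
        }
        if (frag_id < MAX_FRAGS && !s->seen[frag_id]) {
                s->seen[frag_id] = 1;
                s->frags_seen++;
        }
        return s->frags_seen == s->frags_expected;
}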

> 2. How do you know you have the packet loss?


I know it because some fragmented packets never get reassembled fully. If
I print the packets seen by the server I see something like "PCKT_ID 10
FRAG 250, PCKT_ID 10 FRAG 252", and FRAG 251 is never printed.

Actually, something strange that sometimes happens is that a core receives
fragments of two packets interleaved: say, frag 1 of packet X, frag 2 of
packet Y, frag 3 of packet X, frag 4 of packet Y.
Or that, after "losing" a fragment of packet X, I only see fragments with
an EVEN frag_id printed for that packet X. At least for a while.

This also led me to consider a bug in my implementation (I don't
experience this problem if I run with a SINGLE client thread). However,
with smaller payloads, even fragmented ones, everything runs smoothly.
If you have any suggestions for tests to run to spot a possible bug in my
implementation, they'd be more than welcome!

MORE ON THIS: the buffers in which I store the packets taken from RX are
statically defined arrays, like struct rte_mbuf *temp_mbuf[SIZE]. SIZE can
be pretty high (say, 10K entries), and there are 3 of those arrays per
core. Can it be that, somehow, they mess up the memory layout (e.g., they
intersect)?
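
One way to rule those arrays out would be to allocate them per core with
rte_malloc instead; a minimal sketch, where the array name and SIZE come
from the description above and the alignment/socket arguments are
assumptions:

#include <rte_lcore.h>
#include <rte_malloc.h>
#include <rte_mbuf.h>

#define SIZE 10000   /* "say, 10K entries", as described above */

/* Per-core mbuf pointer array allocated (zeroed) on the core's own NUMA
 * socket and cache-line aligned, instead of a statically defined array. */
static struct rte_mbuf **
alloc_temp_mbuf_array(unsigned lcore_id)
{
        return rte_zmalloc_socket("temp_mbuf",
                                  SIZE * sizeof(struct rte_mbuf *),
                                  RTE_CACHE_LINE_SIZE,
                                  rte_lcore_to_socket_id(lcore_id));
}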


> How can you be sure it's
> packet loss if you don't see it on your counters?


Just because I tag every packet, and some packets that should be there
are not.


> How can you be sure
> that these packets were sent by clients?


I print all the packets that the clients send.

> How can you be sure your
> clients actually sent the packets?
>

The TX/mbuf error counters in the client eth_stats are 0.



>
> Also I see you're using 2x8 cores server. So your OS uses some cores
> for itself. Could it be a problem too?
>
>
I have no idea. I only use 8 cores out of the 16 I have, because I only
use the 8 that are in the same NUMA domain as the NIC. However, the PMD
should take the NIC out of the control of the kernel, so the OS should not
be able to see it or mess with it.



* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-17 21:23   ` Harold Demure
@ 2017-07-17 23:24     ` Harold Demure
  2017-07-18  5:50       ` Shyam Shrivastav
  2017-07-24 16:23     ` Pavel Shirshov
  1 sibling, 1 reply; 11+ messages in thread
From: Harold Demure @ 2017-07-17 23:24 UTC (permalink / raw)
  To: Pavel Shirshov; +Cc: users

Hello again,
  I tried to convert my statically defined buffers into buffers allocated
through rte_malloc (as discussed in my previous email).
Unfortunately, the problem is still there :(
Regards,
  Harold




* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-17 23:24     ` Harold Demure
@ 2017-07-18  5:50       ` Shyam Shrivastav
  2017-07-18  9:36         ` Harold Demure
  0 siblings, 1 reply; 11+ messages in thread
From: Shyam Shrivastav @ 2017-07-18  5:50 UTC (permalink / raw)
  To: Harold Demure; +Cc: Pavel Shirshov, users

As I understand it, the problem disappears with 1 RX queue on the server.
You can reduce the number of queues on the server from 8 and arrive at an
optimal value without packet loss.
For the Intel 82599 NIC, packet loss has been reported with more than 4 RX
queues; this was reported on the dpdk dev or users mailing list, which I
read in the archives some time back while looking for similar information
about the 82599.


* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-18  5:50       ` Shyam Shrivastav
@ 2017-07-18  9:36         ` Harold Demure
  2017-07-18 10:07           ` Shyam Shrivastav
  0 siblings, 1 reply; 11+ messages in thread
From: Harold Demure @ 2017-07-18  9:36 UTC (permalink / raw)
  To: Shyam Shrivastav; +Cc: Pavel Shirshov, users

Hello Shyam,
   Thank you for your suggestion. I will try what you say. However, this
problem arises only with specific workloads. For example, if the clients
only send requests of 1 frame, everything runs smoothly even with 16 active
queues. My problem arises only with bigger payloads and multiple queues.
Shouldn't this suggest that the problem is not "simply" that my NIC drops
packets with > X active queues?

Regards,
  Harold


* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-18  9:36         ` Harold Demure
@ 2017-07-18 10:07           ` Shyam Shrivastav
  2017-07-18 11:06             ` Harold Demure
  0 siblings, 1 reply; 11+ messages in thread
From: Shyam Shrivastav @ 2017-07-18 10:07 UTC (permalink / raw)
  To: Harold Demure; +Cc: Pavel Shirshov, users

Hi Harold
I meant optimal performance w.r.t. packets per second. If there is no
loss without app fragmentation at the target pps with, say, 8 RX queues,
yet the same setup misses packets with app fragmentation, then the issue
might be somewhere else. What is your RSS configuration? You should not
take the transport headers into account; ETH_RSS_IPV4 is safe, otherwise
different app fragments of the same packet can go to different RX queues.
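
To be concrete, a minimal sketch of the IP-only RSS configuration I mean
(only rss_hf matters here; the surrounding fields are illustrative):

#include <rte_ethdev.h>

/* Hash on the IPv4 header only, so all frames of a request (same src/dst
 * IP addresses) land on the same RX queue regardless of the UDP ports. */
static const struct rte_eth_conf port_conf_ip_rss = {
        .rxmode = {
                .mq_mode = ETH_MQ_RX_RSS,
        },
        .rx_adv_conf = {
                .rss_conf = {
                        .rss_key = NULL,   /* use the driver's default key */
                        .rss_hf  = ETH_RSS_IPV4,
                },
        },
        .txmode = {
                .mq_mode = ETH_MQ_TX_NONE,
        },
};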


* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-18 10:07           ` Shyam Shrivastav
@ 2017-07-18 11:06             ` Harold Demure
  2017-07-18 12:20               ` Shyam Shrivastav
  0 siblings, 1 reply; 11+ messages in thread
From: Harold Demure @ 2017-07-18 11:06 UTC (permalink / raw)
  To: Shyam Shrivastav; +Cc: Pavel Shirshov, users

Hello again,
  At the bottom of this email you will find my rte_eth_conf configuration,
which includes RSS.
For my NIC, the documentation says RSS can only be used if the transport
layer is also taken into account [1].
For a given client/server pair, all the packets with the same src/dst port
are received by the same core.
So, to ensure that all the fragments of a request are received by the same
core, I keep the src/dst ports fixed.

Indeed, this works just fine with smaller payloads (even multi-frame ones),
and the clients always get their multi-frame replies, because an individual
logical reply has all its segments delivered to the same client thread.

Thank you again for your feedback.
Regards,
  Harold

=========

[1] http://dpdk.org/doc/guides/nics/mlx4.html

static struct rte_eth_conf port_conf = {
        .rxmode = {
                .mq_mode    = ETH_MQ_RX_RSS,
                .split_hdr_size = 0,
                .header_split   = 0, /**< Header Split disabled */
                .hw_ip_checksum = 0, /**< IP checksum offload disabled */
                .hw_vlan_filter = 0, /**< VLAN filtering disabled */
                .jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
                .hw_strip_crc   = 0, /**< CRC stripping by hardware disabled */
                .max_rx_pkt_len =  ETHER_MAX_LEN,
                .enable_scatter = 1
        },
        .rx_adv_conf = {
                .rss_conf = {
                        .rss_key = NULL,
                        .rss_hf = ETH_RSS_IP | ETH_RSS_UDP,
                },
        },
        .txmode = {
                .mq_mode = ETH_MQ_TX_NONE,
        },
};



* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-18 11:06             ` Harold Demure
@ 2017-07-18 12:20               ` Shyam Shrivastav
  0 siblings, 0 replies; 11+ messages in thread
From: Shyam Shrivastav @ 2017-07-18 12:20 UTC (permalink / raw)
  To: Harold Demure; +Cc: Pavel Shirshov, users

Yes, your RSS configuration is not an issue.


* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-17 21:23   ` Harold Demure
  2017-07-17 23:24     ` Harold Demure
@ 2017-07-24 16:23     ` Pavel Shirshov
  2017-07-25 20:03       ` Harold Demure
  1 sibling, 1 reply; 11+ messages in thread
From: Pavel Shirshov @ 2017-07-24 16:23 UTC (permalink / raw)
  To: Harold Demure; +Cc: users

Hi Harold,

Some thoughts:

IPv4 packet_id: I would rather put the packet_id in the application
header. Otherwise a driver may interfere with the packet; in the case of
Mellanox there are two drivers involved: the PMD plus the whole Mellanox
driver infrastructure. Try to extend your application header with a
packet_id and test it.
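
A minimal sketch of what I mean, with illustrative field names and widths
(not your actual header):

#include <stdint.h>

/* Application header carrying its own request identifier, so nothing
 * below the UDP layer (drivers, NIC, middleboxes) can rewrite it. */
struct app_hdr_v2 {
        uint32_t packet_id;   /* request id, formerly in ipv4_hdr->packet_id */
        uint16_t frag_id;     /* index of this fragment within the request */
        uint16_t frag_count;  /* total fragments in the request */
} __attribute__((packed));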

About detecting missed packets: I would put a switch between client and
server and start looking at the switch counters. That could give more
precise information about the dropped packets and about which component
drops them. In your case it is not clear whether the client or the server
drops packets; I trust neither the client nor the server statistics.

Are you using RSS? What happens if you disable it? What if you enable it?
Could every second packet be going to a different NIC queue under high
load?

I don't think your static mbuf arrays could mess anything up.

As I see it, your Xeon has 8 cores/16 threads. Are you sure you're not
using two threads on the same core? Because I see you have one CPU, with
all cores/threads bound to the same NUMA domain.



* Re: [dpdk-users] Strange packet loss with multi-frame payloads
  2017-07-24 16:23     ` Pavel Shirshov
@ 2017-07-25 20:03       ` Harold Demure
  0 siblings, 0 replies; 11+ messages in thread
From: Harold Demure @ 2017-07-25 20:03 UTC (permalink / raw)
  To: Pavel Shirshov; +Cc: users

Dear Pavel,
  Thank you for your input. Please find my replies inline.
Unfortunately, I will not be able to try your suggestions right away
because I will not have access to my cluster for -- I guess -- a couple of
weeks.
I will update this thread of emails as soon as I have further news.

Regards,
   Harold

2017-07-24 18:23 GMT+02:00 Pavel Shirshov <pavel.shirshov@gmail.com>:

> Hi Harold,
>
> Some thoughts:
>
> IPv4 packet_id. I'd better put packet_id in application header.
> Otherwise a driver may interfere with packet. In case of Mellanox
> there're two drivers: PMD + whole mellanox driver infrastructure. Try
> to extend your application header with packet_id and test it.
>

This is something I have thought about several times but never tried. I
guess it will be the first thing I try.



>
> About registering missed packets. I'd put a switch between client and
> server and start seeing into switch counters. It could give more
> precise information about dropped packet and what component drops
> them. In your case it's not clear your client or server dropped
> packets. I don't believe both client and server statistics.
>

I don't have direct access to the switch, unfortunately. The packets are
counted as sent in the server's eth stats, but not as received on the
client.


>
> Are you using RSS? What if you disable it? What if you enable it? Can
> every second packet go to different NIC queue in high load?
>

The load that I generate is not very high (it also happens with only two
threads injecting traffic). RSS in general seems to be OK: if I only have
single-frame requests, everything runs smoothly.
I have not tried disabling RSS yet, because disabling it would cause
frames from the same request to be received by different cores, which
makes it harder to monitor exactly what is happening.
However, it is something I shall try.

>
> I don't think your static memory mbuf could mess anything.
>

Yes, making my buffers dynamic didn't change anything.


>
> As I see you XEON has 8 cores/16 threads. Are you sure you're not
> using threads on same cores? Because I see you have one cpu. All
> cores/threads bound to the same NUMA domain.
>
>
AFAIK, lcores are automatically mapped to different physical cores at
startup, so I don't "force" lcore-to-core affinity in any way. By spawning
lcores 0..7 I expect them to be pinned to cores 0..7, which are in the
same NUMA domain.
Looking at htop, it seems this is fine.
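
For what it's worth, the mapping can also be checked programmatically
instead of with htop; a minimal sketch (the port id is an assumption):

#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>

/* Print which NUMA socket the NIC sits on and which physical core and
 * socket each enabled lcore is pinned to. */
static void
check_numa_affinity(uint8_t port_id)
{
        unsigned lcore;

        printf("port %u is on socket %d\n",
               port_id, rte_eth_dev_socket_id(port_id));
        RTE_LCORE_FOREACH(lcore)
                printf("lcore %u -> physical core %u, socket %u\n",
                       lcore, lcore_config[lcore].core_id,
                       rte_lcore_to_socket_id(lcore));
}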


