Hi all,
I am using the DPDK 16.04 library to process packets on a VMware VM (say vm1). The traffic is sent from a client machine using Apache Bench. I am observing an issue once the number of packets reaching the vmxnet3 interface hits the descriptor ring size (set to 2048). Up to 2048, all packets correctly reach the 'vmxnet3_recv_pkts' function.

But as the number of packets received reaches 2048, I start seeing retransmissions on the client side (as shown by tcpdump there). I then captured the packets on the ESX host destined for vm1, and the capture shows that both the original packets and the corresponding retransmissions are reaching vm1. Somehow, though, these packets never make it to the 'vmxnet3_recv_pkts' function (I found this by putting a breakpoint in 'vmxnet3_recv_pkts' and dumping the packet contents, which showed that some packets never arrive there).

Now, 'vmxnet3_recv_pkts' is the first function that reads packets from the descriptor ring, and since the packets are not reaching it, I am not sure how to debug this further. Possibly there is some issue with the ring initialization. I enabled the init & RX logs for vmxnet3, but I don't see any error logs either. I also looked at the rxq stats, and they show 0 all the time:

(gdb) p rxq->stats
$1 = {drop_total = 0, drop_err = 0, drop_fcs = 0, rx_buf_alloc_failure = 0}

Can anyone please provide some clues on how to debug this further? Are there any known issues in this area that have been fixed after the 16.04 version?
thanks,
gaurav
Any suggestions here?
On Wed, Sep 25, 2019 at 4:09 PM Gaurav Bansal <zeebee48@gmail.com> wrote:
On Thu, 26 Sep 2019 10:45:03 +0530
Gaurav Bansal <zeebee48@gmail.com> wrote:
Try a newer version of DPDK first.
Thanks Stephen for the reply. I am planning to try that, but it will require major changes and a lot more time, as this library is tightly integrated with our code base. Meanwhile, please let me know if there is anything that can be tried on 16.04 itself.
thanks,
gaurav
On Thu, Sep 26, 2019 at 10:46 AM Stephen Hemminger <
stephen@networkplumber.org> wrote:
> Try a newer version of DPDK first.
>
hi all,
Debugging this further, I found that there is a skip in the rxdIdx sequence, as seen in the logs below. The 'rxdIdx' index is incremented by the NIC itself. Any ideas why an 'rxdIdx' value (5, in the log below) may be missing? Any suggestions for finding the root cause of the missing rx index?
PMD: vmxnet3_recv_pkts(): rxd idx: 0 ring idx: 0.
PMD: vmxnet3_recv_pkts(): rxd idx: 1 ring idx: 0.
PMD: vmxnet3_recv_pkts(): rxd idx: 2 ring idx: 0.
PMD: vmxnet3_recv_pkts(): rxd idx: 3 ring idx: 0.
PMD: vmxnet3_recv_pkts(): *rxd idx: 4* ring idx: 0.
PMD: vmxnet3_recv_pkts(): *rxd idx: 6* ring idx: 0.
PMD: vmxnet3_recv_pkts(): rxd idx: 7 ring idx: 0.
thanks,
gaurav
On Thu, Sep 26, 2019 at 1:25 PM Gaurav Bansal <zeebee48@gmail.com> wrote:
hi all,
I haven't heard back on this so far. Can anyone please tell me the possible reasons for the NIC to skip some indexes of the descriptor ring in the Rx path (as also shown by the vmxnet3 logs in my previous mail)?
thanks,
gaurav
On Sat, Sep 28, 2019 at 2:17 PM Gaurav Bansal <zeebee48@gmail.com> wrote:
On Mon, 30 Sep 2019 11:22:19 +0530
Gaurav Bansal <zeebee48@gmail.com> wrote:
A packet can span multiple segments if your receive mbuf is too small to fit all the data. That would cause the received data to be delivered as a chained mbuf.
Hi Stephen,
Thanks for the reply. Does that mean a receive index skip is expected when a packet spans multiple segments? I am seeing the following behaviour:

We receive the packets in the vmxnet3_recv_pkts function; this is done by checking gen bits in a while loop. For each packet received, we increment next2proc in comp_ring and then fetch rxdIdx using the next2proc value, as follows:

rcd = &rxq->comp_ring.base[rxq->comp_ring.next2proc].rcd;
idx = rcd->rxdIdx;

Now, in the case where I am seeing packet drops, next2proc is incremented by 1 but rxdIdx is incremented by 2. This is seen by printing the following in gdb:

p rxq->comp_ring.next2proc
p rxq->comp_ring.base[rxq->comp_ring.next2proc].rcd.rxdIdx
$1094 = *262*
$1095 = *6*
//for the next packet
$1099 = *263*
$1100 = *8*

Is this also expected behaviour? In that case, how will we ever receive the packet with rxdIdx = 7?
thanks,
gaurav
On Mon, Sep 30, 2019 at 8:29 PM Stephen Hemminger <
stephen@networkplumber.org> wrote: