DPDK patches and discussions
* [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed?
@ 2015-09-11 16:32 Kyle Larose
  2015-09-13 21:43 ` Thomas Monjalon
  0 siblings, 1 reply; 8+ messages in thread
From: Kyle Larose @ 2015-09-11 16:32 UTC (permalink / raw)
  To: dev

Hi all,

I've been debugging an issue we see occasionally when under load. We have a
non-DPDK linux-bridge host sending to our guest via vhost-net. The guest is
running a DPDK 1.8 application using virtio PMDs.

The symptom is that the host interface reports every packet sent to it as
tx-discarded, while the guest reports that no packets are received.

Debugging the guest process, I found that the PMD thought that there were
no entries queued for reception:

$5 = {hw = 0x7f131fcfd600, mz = 0x7f1346200128, virtio_net_hdr_mz = 0x0,
     mpool = 0x7f131ff8a0c0, queue_id = 0, port_id = 1 '\001',
     vq_ring_virt_mem = 0x7f1339ebd000, vq_alignment = 4096,
     vq_ring_size = 12288, vq_ring_mem = 2081148928,
     vq_ring = {num = 256, desc = 0x7f1339ebd000, avail = 0x7f1339ebe000,
                used = 0x7f1339ebf000},
     vq_free_cnt = 256, vq_nentries = 256, vq_queue_index = 0,
     vq_desc_head_idx = 33, vq_desc_tail_idx = 32,
     vq_used_cons_idx = 37409, vq_avail_idx = 37409,
     virtio_net_hdr_mem = 0, packets = 67473953, bytes = 4049880780,
     errors = 0, vq_descx = 0x7f131fcf41d0}


(Basically, VIRTQUEUE_NUSED() was returning 0).
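For reference, if I'm reading virtqueue.h correctly, VIRTQUEUE_NUSED() is just
the difference between the used index written by the host and our local
consumed index:

    #define VIRTQUEUE_NUSED(vq) \
            ((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))

so a return of 0 means the used ring's idx is still sitting at 37409, the same
value as vq_used_cons_idx in the dump above.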

This makes me suspect that the guest and host fell out of sync.

Looking through the version tree for virtio_rxtx.c, I saw the following
commit:

http://dpdk.org/browse/dpdk/commit/lib/librte_pmd_virtio?id=8c09c20fb4cde76e53d87bd50acf2b441ecf6eb8

Does anybody know offhand if the issue fixed by that commit could be the
root cause of what I am seeing?

Thanks,

Kyle


* Re: [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed?
  2015-09-11 16:32 [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed? Kyle Larose
@ 2015-09-13 21:43 ` Thomas Monjalon
  2015-09-15 21:04   ` Kyle Larose
  2015-09-18  6:28   ` Xie, Huawei
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Monjalon @ 2015-09-13 21:43 UTC (permalink / raw)
  To: dev

Hi,

2015-09-11 12:32, Kyle Larose:
> Looking through the version tree for virtio_rxtx.c, I saw the following
> commit:
> 
> http://dpdk.org/browse/dpdk/commit/lib/librte_pmd_virtio?id=8c09c20fb4cde76e53d87bd50acf2b441ecf6eb8
> 
> Does anybody know offhand if the issue fixed by that commit could be the
> root cause of what I am seeing?

I don't have a definitive answer, but I would like to use your question
to highlight a common issue in git messages:

PLEASE, authors of fixes, explain the bug you are fixing and how it can
be reproduced. Good commit messages really do get read, and they are useful.

Thanks


* Re: [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed?
  2015-09-13 21:43 ` Thomas Monjalon
@ 2015-09-15 21:04   ` Kyle Larose
  2015-09-16  1:37     ` Xie, Huawei
  2015-09-17  0:45     ` Ouyang, Changchun
  2015-09-18  6:28   ` Xie, Huawei
  1 sibling, 2 replies; 8+ messages in thread
From: Kyle Larose @ 2015-09-15 21:04 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev

On Sun, Sep 13, 2015 at 5:43 PM, Thomas Monjalon
<thomas.monjalon@6wind.com> wrote:
>
> Hi,
>
> 2015-09-11 12:32, Kyle Larose:
> > Looking through the version tree for virtio_rxtx.c, I saw the following
> > commit:
> >
> > http://dpdk.org/browse/dpdk/commit/lib/librte_pmd_virtio?id=8c09c20fb4cde76e53d87bd50acf2b441ecf6eb8
> >
> > Does anybody know offhand if the issue fixed by that commit could be the
> > root cause of what I am seeing?
>
> I won't have the definitive answer but I would like to use your question
> to highlight a common issue in git messages:
>
> PLEASE, authors of fixes, explain the bug you are fixing and how it can
> be reproduced. Good commit messages are REALLY read and useful.
>
> Thanks
>

I've figured out what happened. It has nothing to do with the fix I
pasted above. Instead, the issue has to do with running low on mbufs.

Here's the general logic:

1. If packets are not queued, return
2. Fetch each queued packet, as an mbuf, into the provided array. This
may involve some merging/etc
3. Try to fill the virtio receive ring with new mbufs
  3.a. If we fail to allocate an mbuf, break out of the refill loop
4. Update the receive ring information and kick the host

This is obviously a simplification, but the key point is 3.a. If we
hit this logic when the virtio receive ring is completely used up, we
essentially lock up. The host will have no buffers with which to queue
packets, so the next time we poll, we will hit case 1. However, since
we hit case 1, we will not allocate mbufs to the virtio receive ring,
regardless of how many are now free. Rinse and repeat; we are stuck
until the pmd is restarted or the link is restarted.
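To make the ordering concrete, here is a simplified C-style sketch of the
receive path as I understand it (function names are approximate and from
memory; check virtio_rxtx.c for the real code):

    nb_used = VIRTQUEUE_NUSED(rxvq);                /* 1. anything queued?  */
    if (nb_used == 0)
            return 0;                               /* early exit: the      */
                                                    /* refill never runs    */

    num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);   /* 2. */
    /* ... per-packet processing ... */

    while (!virtqueue_full(rxvq)) {                 /* 3. refill the ring   */
            new_mbuf = rte_rxmbuf_alloc(rxvq->mpool);
            if (new_mbuf == NULL)
                    break;                          /* 3.a. out of mbufs    */
            virtqueue_enqueue_recv_refill(rxvq, new_mbuf);
            nb_enqueued++;
    }

    if (nb_enqueued) {                              /* 4. publish and kick  */
            vq_update_avail_idx(rxvq);
            virtqueue_notify(rxvq);
    }

Once 3.a leaves the ring with nothing made available to the host, the host can
never complete another packet, so step 1 returns 0 on every subsequent poll and
step 3 is never reached again, no matter how many mbufs have since been freed.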

This is very easy to reproduce when the mbuf pool is fairly small, and
packets are being passed to worker threads/processes which may
increase the length of the pipeline.

I took a quick look at the ixgbe driver, and it looks like it checks
if it needs to allocate mbufs to the ring before trying to pull
packets off the nic. Should we not be doing something similar for
virtio? Rather than breaking out early if no packets are queued, we
should first make sure there are resources with which to queue
packets!

One solution here is to increase the mbuf pool to a size where such
exhaustion is impossible, but that doesn't seem like a graceful
solution. For example, it may be desirable to drop packets rather than
have a large memory pool, and getting stuck in such a situation is
not good. Further, it isn't easy to know the exact size required. You
may end up wasting a bunch of resources by allocating far more than
necessary, or you may unknowingly under-allocate, only to find out
once your application has been deployed into production, and it's
dropping everything on the floor.

Does anyone have thoughts on this? I took a look at virtio_rxtx.c at head,
and I didn't see anything resembling my suggestion.

Comments would be appreciated. Thanks,

Kyle


* Re: [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed?
  2015-09-15 21:04   ` Kyle Larose
@ 2015-09-16  1:37     ` Xie, Huawei
  2015-09-16 17:24       ` Kyle Larose
  2015-09-17  0:45     ` Ouyang, Changchun
  1 sibling, 1 reply; 8+ messages in thread
From: Xie, Huawei @ 2015-09-16  1:37 UTC (permalink / raw)
  To: Kyle Larose, Thomas Monjalon; +Cc: dev

On 9/16/2015 5:05 AM, Kyle Larose wrote:
> On Sun, Sep 13, 2015 at 5:43 PM, Thomas Monjalon
> <thomas.monjalon@6wind.com> wrote:
>> Hi,
>>
>> 2015-09-11 12:32, Kyle Larose:
>>> Looking through the version tree for virtio_rxtx.c, I saw the following
>>> commit:
>>>
>>> http://dpdk.org/browse/dpdk/commit/lib/librte_pmd_virtio?id=8c09c20fb4cde76e53d87bd50acf2b441ecf6eb8
>>>
>>> Does anybody know offhand if the issue fixed by that commit could be the
>>> root cause of what I am seeing?
>> I won't have the definitive answer but I would like to use your question
>> to highlight a common issue in git messages:
>>
>> PLEASE, authors of fixes, explain the bug you are fixing and how it can
>> be reproduced. Good commit messages are REALLY read and useful.
>>
>> Thanks
>>
> I've figured out what happened. It has nothing to do with the fix I
> pasted above. Instead, the issue has to do with running low on mbufs.
>
> Here's the general logic:
>
> 1. If packets are not queued, return
> 2. Fetch each queued packet, as an mbuf, into the provided array. This
> may involve some merging/etc
> 3. Try to fill the virtio receive ring with new mbufs
>   3.a. If we fail to allocate an mbuf, break out of the refill loop
> 4. Update the receive ring information and kick the host
>
> This is obviously a simplification, but the key point is 3.a. If we
> hit this logic when the virtio receive ring is completely used up, we
> essentially lock up. The host will have no buffers with which to queue
> packets, so the next time we poll, we will hit case 1. However, since
> we hit case 1, we will not allocate mbufs to the virtio receive ring,
> regardless of how many are now free. Rinse and repeat; we are stuck
> until the pmd is restarted or the link is restarted.
>
> This is very easy to reproduce when the mbuf pool is fairly small, and
> packets are being passed to worker threads/processes which may
> increase the length of the pipeline.
Sorry for the trouble, and thanks a lot for your investigation. It makes
sense. We will check the code and get back to you.
I remember we fixed a similar problem before; we will check whether there
are other deadlock issues.
>
> I took a quick look at the ixgbe driver, and it looks like it checks
> if it needs to allocate mbufs to the ring before trying to pull
> packets off the nic. Should we not be doing something similar for
> virtio? Rather than breaking out early if no packets are queued, we
> should first make sure there are resources with which to queue
> packets!
Yes, this is an implementation bug.
>
> One solution here is to increase the mbuf pool to a size where such
> exhaustion is impossible, but that doesn't seem like a graceful
> solution. For example, it may be desirable to drop packets rather than
> have a large memory pool, and becoming stuck under such a situation is
> not good. Further, it isn't easy to know the exact size required. You
> may end up wasting a bunch of resources allocating far more than
> necessary, or you may unknowingly under allocate, only to find out
> once your application has been deployed into production, and it's
> dropping everything on the floor.
Kyle:
Could you tell us how you produced this issue: a very small pool size,
or are you using a pipeline model?
>
> Does anyone have thoughts on this? I took a look at virtio_rxtx and
> head and I didn't see anything resembling my suggestion.
>
> Comments would be appreciated. Thanks,
>
> Kyle
>



* Re: [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed?
  2015-09-16  1:37     ` Xie, Huawei
@ 2015-09-16 17:24       ` Kyle Larose
  2015-09-18  5:58         ` Xie, Huawei
  0 siblings, 1 reply; 8+ messages in thread
From: Kyle Larose @ 2015-09-16 17:24 UTC (permalink / raw)
  To: Xie, Huawei; +Cc: dev

Hi Huawei,

> Kyle:
> Could you tell us how did you produce this issue, very small pool size
> or you are using pipeline model?

If I understand correctly, by pipeline model you mean a model whereby
multiple threads handle a given packet, with some sort of IPC (e.g. DPDK
rings) between them? If so, yes: we are using such a model. And I
suspect that this model is where we run into issues: the length of the
pipeline, combined with the queuing between stages, can lead to us
exhausting the mbufs, particularly when a stage's load causes queuing.

When I initially ran into this issue, I had a fairly large mbuf pool
(32K entries), with 3 stages in the pipeline: rx, worker, tx. There
were two worker threads, with a total of 6 rings. I was sending some
fairly bursty traffic, at a high packet rate (it was bursting up to
around 1Mpkt/s). There was a low chance that this actually caused the
problem. However, when I decreased the mbuf pool to 1000 entries, it
*always* happened.

In summary: the pipeline model is important here, and a small pool
size definitely exacerbates the problem.

I was able to reproduce the problem using the load_balancer sample
application, though it required some modification to get it to run
with virtio. I'm not sure if this is because I'm using DPDK 1.8,  or
something else. Either way, I made the number of mbufs configurable
via an environment variable, and was able to show that decreasing it
from the default of 32K to 1K would cause the problem to always happen
when using the same traffic as with my application. Applying the below
patch fixed the problem.
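(For what it's worth, making the pool size configurable is trivial; something
along these lines would do it, where the environment variable name is made up
for this illustration:

    #include <stdlib.h>

    /* Read the mbuf pool size from the environment, defaulting to the
     * sample app's usual 32K entries. */
    static unsigned
    get_nb_mbuf(void)
    {
            const char *s = getenv("LB_NB_MBUF");
            return s != NULL ? (unsigned)strtoul(s, NULL, 0) : 32768;
    }

with the result passed as the element count to the mempool creation call.)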

The following patch seems to fix the problem for me, though I'm not
sure it's the optimal solution. It does so by removing the early exit
which prevents us from allocating mbufs. After we skip over the packet
processing loop since there are no packets, the mbuf allocation loop
runs.  Note that the patch is on dpdk 1.8.

diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c
b/lib/librte_pmd_virtio/virtio_rxtx.c
index c013f97..7cadf52 100644
--- a/lib/librte_pmd_virtio/virtio_rxtx.c
+++ b/lib/librte_pmd_virtio/virtio_rxtx.c
@@ -463,9 +463,6 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
**rx_pkts, uint16_t nb_pkts)
        if (likely(num > DESC_PER_CACHELINE))
                num = num - ((rxvq->vq_used_cons_idx + num) %
DESC_PER_CACHELINE);

-       if (num == 0)
-               return 0;
-
        num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
        PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
        for (i = 0; i < num ; i++) {
@@ -549,9 +546,6 @@ virtio_recv_mergeable_pkts(void *rx_queue,

        rmb();

-       if (nb_used == 0)
-               return 0;
-
        PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);

        while (i < nb_used) {

Thanks,

Kyle


* Re: [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed?
  2015-09-15 21:04   ` Kyle Larose
  2015-09-16  1:37     ` Xie, Huawei
@ 2015-09-17  0:45     ` Ouyang, Changchun
  1 sibling, 0 replies; 8+ messages in thread
From: Ouyang, Changchun @ 2015-09-17  0:45 UTC (permalink / raw)
  To: Kyle Larose, Thomas Monjalon; +Cc: dev



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Kyle Larose
> Sent: Wednesday, September 16, 2015 5:05 AM
> To: Thomas Monjalon
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] vhost-net stops sending to virito pmd -- already
> fixed?
> 
> On Sun, Sep 13, 2015 at 5:43 PM, Thomas Monjalon
> <thomas.monjalon@6wind.com> wrote:
> >
> > Hi,
> >
> > 2015-09-11 12:32, Kyle Larose:
> > > Looking through the version tree for virtio_rxtx.c, I saw the
> > > following
> > > commit:
> > >
> > >
> > > http://dpdk.org/browse/dpdk/commit/lib/librte_pmd_virtio?id=8c09c20fb4cde76e53d87bd50acf2b441ecf6eb8
> > >
> > > Does anybody know offhand if the issue fixed by that commit could be
> > > the root cause of what I am seeing?
> >
> > I won't have the definitive answer but I would like to use your
> > question to highlight a common issue in git messages:
> >
> > PLEASE, authors of fixes, explain the bug you are fixing and how it
> > can be reproduced. Good commit messages are REALLY read and useful.
> >
> > Thanks
> >
> 
> I've figured out what happened. It has nothing to do with the fix I pasted
> above. Instead, the issue has to do with running low on mbufs.
> 
> Here's the general logic:
> 
> 1. If packets are not queued, return
> 2. Fetch each queued packet, as an mbuf, into the provided array. This
> may involve some merging/etc
> 3. Try to fill the virtio receive ring with new mbufs
>   3.a. If we fail to allocate an mbuf, break out of the refill loop
> 4. Update the receive ring information and kick the host
> 
> This is obviously a simplification, but the key point is 3.a. If we hit this logic
> when the virtio receive ring is completely used up, we essentially lock up.
> The host will have no buffers with which to queue packets, so the next time
> we poll, we will hit case 1. However, since we hit case 1, we will not allocate
> mbufs to the virtio receive ring, regardless of how many are now free. Rinse
> and repeat; we are stuck until the pmd is restarted or the link is restarted.
> 
> This is very easy to reproduce when the mbuf pool is fairly small, and packets
> are being passed to worker threads/processes which may increase the
> length of the pipeline.
> 
> I took a quick look at the ixgbe driver, and it looks like it checks if it needs to
> allocate mbufs to the ring before trying to pull packets off the nic. Should we
> not be doing something similar for virtio? Rather than breaking out early if no
> packets are queued, we should first make sure there are resources with
> which to queue packets!

Try to allocate mbufs and refill the vring descriptors when case 1 is hit;
that should probably address your issue.
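Something like this, as an untested sketch (virtio_rx_refill() here is a
hypothetical helper standing in for the existing refill loop):

    nb_used = VIRTQUEUE_NUSED(rxvq);
    if (nb_used == 0) {
            /* refill before bailing out, so the host always gets
             * descriptors back once mbufs are free again */
            virtio_rx_refill(rxvq);
            return 0;
    }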

> 
> One solution here is to increase the mbuf pool to a size where such
> exhaustion is impossible, but that doesn't seem like a graceful solution. For
> example, it may be desirable to drop packets rather than have a large
> memory pool, and becoming stuck under such a situation is not good. Further,
> it isn't easy to know the exact size required. You may end up wasting a bunch
> of resources allocating far more than necessary, or you may unknowingly
> under allocate, only to find out once your application has been deployed into
> production, and it's dropping everything on the floor.
> 
> Does anyone have thoughts on this? I took a look at virtio_rxtx and head and
> I didn't see anything resembling my suggestion.
> 
> Comments would be appreciated. Thanks,
> 
> Kyle


* Re: [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed?
  2015-09-16 17:24       ` Kyle Larose
@ 2015-09-18  5:58         ` Xie, Huawei
  0 siblings, 0 replies; 8+ messages in thread
From: Xie, Huawei @ 2015-09-18  5:58 UTC (permalink / raw)
  To: Kyle Larose; +Cc: dev

On 9/17/2015 1:25 AM, Kyle Larose wrote:
> Hi Huawei,
>
>> Kyle:
>> Could you tell us how did you produce this issue, very small pool size
>> or you are using pipeline model?
> If I understand correctly, by pipeline model you mean a model whereby
> multiple threads handle a given packet, with some sort IPC (e.g. dpdk
> rings) between them? If so, yes: we are using such a model. And I
> suspect that this model is where we run into issues: the length of the
> pipeline, combined with the queuing between stages, can lead to us
> exhausting the mbufs, particularly when a stage's load causes queuing.
Yes, exactly.
>
> When I initially ran into this issue, I had a fairly large mbuf pool
> (32K entries), with 3 stages in the pipeline: rx, worker, tx. There
> were two worker threads, with a total of 6 rings. I was sending some
> fairly bursty traffic, at a high packet rate (it was bursting up to
> around 1Mpkt/s). There was a low chance that this actually caused the
> problem. However, when I decreased the mbuf pool to 1000 entries, it
> *always* happened.
>
> In summary: the pipeline model is important here, and a small pool
> size definitely exacerbates the problem.
>
> I was able to reproduce the problem using the load_balancer sample
> application, though it required some modification to get it to run
> with virtio. I'm not sure if this is because I'm using DPDK 1.8,  or
> something else. Either way, I made the number of mbufs configurable
> via an environment variable, and was able to show that decreasing it
> from the default of 32K to 1K would cause the problem to always happen
> when using the same traffic as with my application. Applying the below
> patch fixed the problem.
>
> The following patch seems to fix the problem for me, though I'm not
> sure it's the optimal solution. It does so by removing the early exit
> which prevents us from allocating mbufs. After we skip over the packet
> processing loop since there are no packets, the mbuf allocation loop
> runs.  Note that the patch is on dpdk 1.8.
Yes, it will fix your problem. We could try to do the refill each time we
enter the loop, whether or not there are packets available.

> diff --git a/lib/librte_pmd_virtio/virtio_rxtx.c
> b/lib/librte_pmd_virtio/virtio_rxtx.c
> index c013f97..7cadf52 100644
> --- a/lib/librte_pmd_virtio/virtio_rxtx.c
> +++ b/lib/librte_pmd_virtio/virtio_rxtx.c
> @@ -463,9 +463,6 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf
> **rx_pkts, uint16_t nb_pkts)
>         if (likely(num > DESC_PER_CACHELINE))
>                 num = num - ((rxvq->vq_used_cons_idx + num) %
> DESC_PER_CACHELINE);
>
> -       if (num == 0)
> -               return 0;
> -
>         num = virtqueue_dequeue_burst_rx(rxvq, rcv_pkts, len, num);
>         PMD_RX_LOG(DEBUG, "used:%d dequeue:%d", nb_used, num);
>         for (i = 0; i < num ; i++) {
> @@ -549,9 +546,6 @@ virtio_recv_mergeable_pkts(void *rx_queue,
>
>         rmb();
>
> -       if (nb_used == 0)
> -               return 0;
> -
>         PMD_RX_LOG(DEBUG, "used:%d\n", nb_used);
>
>         while (i < nb_used) {
>
> Thanks,
>
> Kyle
>



* Re: [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed?
  2015-09-13 21:43 ` Thomas Monjalon
  2015-09-15 21:04   ` Kyle Larose
@ 2015-09-18  6:28   ` Xie, Huawei
  1 sibling, 0 replies; 8+ messages in thread
From: Xie, Huawei @ 2015-09-18  6:28 UTC (permalink / raw)
  To: Thomas Monjalon, dev

On 9/14/2015 5:44 AM, Thomas Monjalon wrote:
> Hi,
>
> 2015-09-11 12:32, Kyle Larose:
>> Looking through the version tree for virtio_rxtx.c, I saw the following
>> commit:
>>
>> http://dpdk.org/browse/dpdk/commit/lib/librte_pmd_virtio?id=8c09c20fb4cde76e53d87bd50acf2b441ecf6eb8
>>
>> Does anybody know offhand if the issue fixed by that commit could be the
>> root cause of what I am seeing?
> I won't have the definitive answer but I would like to use your question
> to highlight a common issue in git messages:
>
> PLEASE, authors of fixes, explain the bug you are fixing and how it can
> be reproduced. Good commit messages are REALLY read and useful.
Thomas:
Thanks for the reminder. I am not the author of this specific patch. :)
We will try to keep commit messages simple and useful.
In my opinion, this particular commit message is fine: the patch doesn't fix
anything, it just removes some unnecessary ring update operations.

>
> Thanks
>
>


