From: Kyle Larose
To: Thomas Monjalon
Cc: dev@dpdk.org
Date: Tue, 15 Sep 2015 17:04:57 -0400
Subject: Re: [dpdk-dev] vhost-net stops sending to virito pmd -- already fixed?

On Sun, Sep 13, 2015 at 5:43 PM, Thomas Monjalon wrote:
>
> Hi,
>
> 2015-09-11 12:32, Kyle Larose:
> > Looking through the version tree for virtio_rxtx.c, I saw the following
> > commit:
> >
> > http://dpdk.org/browse/dpdk/commit/lib/librte_pmd_virtio?id=8c09c20fb4cde76e53d87bd50acf2b441ecf6eb8
> >
> > Does anybody know offhand if the issue fixed by that commit could be the
> > root cause of what I am seeing?
>
> I won't have the definitive answer but I would like to use your question
> to highlight a common issue in git messages:
>
> PLEASE, authors of fixes, explain the bug you are fixing and how it can
> be reproduced. Good commit messages are REALLY read and useful.
>
> Thanks

I've figured out what happened. It has nothing to do with the fix I pasted
above. Instead, the issue has to do with running low on mbufs.

Here's the general logic of the receive path:

1. If no packets are queued, return.
2. Fetch each queued packet, as an mbuf, into the provided array. This may
   involve some merging, etc.
3. Try to fill the virtio receive ring with new mbufs.
   3.a. If we fail to allocate an mbuf, break out of the refill loop.
4. Update the receive ring information and kick the host.

This is obviously a simplification, but the key point is 3.a. If we hit this
logic when the virtio receive ring is completely used up, we essentially lock
up. The host will have no buffers with which to queue packets, so the next
time we poll, we will hit case 1. However, since we hit case 1, we will not
allocate mbufs to the virtio receive ring, regardless of how many are now
free. Rinse and repeat; we are stuck until the pmd is restarted or the link
is restarted. This is very easy to reproduce when the mbuf pool is fairly
small, and packets are being passed to worker threads/processes, which may
increase the length of the pipeline.
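
To make the failure mode concrete, here is a tiny standalone model of the
logic above. This is not the real virtio_rxtx.c code -- every name and number
below is invented for illustration -- but it shows how, once the refill in
step 3 fails with the ring fully drained, every later poll bails out at step 1
and the ring is never re-armed, no matter how many mbufs the workers return to
the pool:

/*
 * Toy model of the lockup described above -- NOT the real virtio code.
 * All names and sizes are invented for illustration.
 */
#include <stdio.h>

#define RING_SIZE 4
#define POOL_SIZE 4                 /* deliberately tiny mbuf pool        */

static int ring_avail = RING_SIZE;  /* descriptors the host may fill      */
static int ring_used;               /* packets queued for the guest       */
static int pool_free = POOL_SIZE - RING_SIZE; /* mbufs left in the pool   */
static int in_flight;               /* mbufs held by worker threads       */

static void host_rx(int n)          /* host queues up to n packets        */
{
    while (n-- > 0 && ring_avail > 0) {
        ring_avail--;
        ring_used++;
    }
}

static void refill_ring(void)       /* step 3: re-arm the ring            */
{
    while (ring_avail + ring_used < RING_SIZE) {
        if (pool_free == 0)
            return;                 /* step 3.a: allocation failed, break */
        pool_free--;
        ring_avail++;
    }
}

static int guest_poll(void)
{
    int pulled = ring_used;

    if (pulled == 0)
        return 0;                   /* step 1: nothing queued, return     */

    ring_used = 0;                  /* step 2: hand the mbufs to workers  */
    in_flight += pulled;

    refill_ring();                  /* step 3: never reached again once   */
    return pulled;                  /* we start returning at step 1       */
}

int main(void)
{
    host_rx(RING_SIZE);                            /* burst fills the ring */
    printf("poll 1: %d packets\n", guest_poll());  /* 4; refill fails      */

    in_flight -= RING_SIZE;                        /* workers eventually   */
    pool_free += RING_SIZE;                        /* free their mbufs     */

    host_rx(RING_SIZE);                            /* no descriptors left  */
    printf("poll 2: %d packets\n", guest_poll());  /* 0 -- stuck for good  */
    return 0;
}

Even though pool_free is back to 4 by the second poll, guest_poll() never
calls refill_ring() again, so the host never gets its descriptors back.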
I took a quick look at the ixgbe driver, and it looks like it checks whether
it needs to allocate mbufs to the ring before trying to pull packets off the
NIC. Should we not be doing something similar for virtio? Rather than breaking
out early if no packets are queued, we should first make sure there are
resources with which to queue packets! (A rough sketch of what I mean is at
the end of this mail.)

One solution here is to increase the mbuf pool to a size where such exhaustion
is impossible, but that doesn't seem like a graceful solution. For example, it
may be desirable to drop packets rather than use a large memory pool, and
becoming stuck in such a situation is not good. Further, it isn't easy to know
the exact size required. You may end up wasting a bunch of resources by
allocating far more than necessary, or you may unknowingly under-allocate,
only to find out once your application has been deployed to production and is
dropping everything on the floor.

Does anyone have thoughts on this? I took a look at virtio_rxtx.c at head and
didn't see anything resembling my suggestion. Comments would be appreciated.

Thanks,

Kyle
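
Reusing the toy state and refill_ring() helper from the sketch earlier in this
mail (again, invented names, not a patch against virtio_rxtx.c), the
reordering I have in mind looks roughly like this:

/* Same toy state as the earlier sketch; only the poll logic changes.
 * Attempt the refill *before* giving up when nothing is queued, so a
 * recovered mbuf pool can re-arm a completely drained ring. */
static int guest_poll_fixed(void)
{
    int pulled;

    refill_ring();              /* give descriptors back to the host first  */

    pulled = ring_used;
    if (pulled == 0)
        return 0;               /* nothing queued, but the ring is re-armed */

    ring_used = 0;              /* hand the mbufs to the workers            */
    in_flight += pulled;

    refill_ring();              /* replace what we just consumed            */
    return pulled;
}

With that ordering, the second poll in the earlier example re-arms the ring
even though it returns no packets; the host gets descriptors back, and traffic
resumes on the next burst. In the real driver the early refill would
presumably also need to update the avail index and kick the host, which the
toy model glosses over.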