From: Ilya Maximets
To: Flavio Leitner, Maxime Coquelin
Cc: Ilya Maximets, dev@dpdk.org, Shahaf Shuler, David Marchand,
 Tiwei Bie, Obrembski MichalX, Stokes Ian
Date: Wed, 16 Oct 2019 16:08:54 +0200
Subject: Re: [dpdk-dev] [PATCH v5] vhost: add support for large buffers
In-Reply-To: <20191016110243.14005e1c@p50.lan>
References: <20191015161727.32570-1-fbl@sysclose.org>
 <20191015185951.6295-1-fbl@sysclose.org>
 <5900f514-f1d9-7f7d-0ea1-f394b9e5d01f@ovn.org>
 <62fb6a50-d4ab-584d-8cb2-4820c141f137@redhat.com>
 <20191016110243.14005e1c@p50.lan>

On 16.10.2019 16:02, Flavio Leitner wrote:
> On Wed, 16 Oct 2019 15:46:15 +0200
> Maxime Coquelin wrote:
>
>> On 10/16/19 3:32 PM, Ilya Maximets wrote:
>>> On 16.10.2019 13:13, Maxime Coquelin wrote:
>>>>
>>>> On 10/15/19 8:59 PM, Flavio Leitner wrote:
>>>>> The rte_vhost_dequeue_burst supports two ways of dequeuing data.
>>>>> If the data fits into a buffer, then all data is copied and a
>>>>> single linear buffer is returned. Otherwise it allocates
>>>>> additional mbufs and chains them together to return a multiple
>>>>> segments mbuf.
>>>>>
>>>>> While that covers most use cases, it forces applications that
>>>>> need to work with larger data sizes to support multiple segments
>>>>> mbufs. The non-linear characteristic brings complexity and
>>>>> performance implications to the application.
>>>>>
>>>>> To resolve the issue, add support to attach external buffer
>>>>> to a pktmbuf and let the host provide during registration if
>>>>> attaching an external buffer to pktmbuf is supported and if
>>>>> only linear buffer are supported.
>>>>>
>>>>> Signed-off-by: Flavio Leitner
>>>>> ---
>>>>>   doc/guides/prog_guide/vhost_lib.rst |  35 +++++++++
>>>>>   lib/librte_vhost/rte_vhost.h        |   4 +
>>>>>   lib/librte_vhost/socket.c           |  22 ++++++
>>>>>   lib/librte_vhost/vhost.c            |  22 ++++++
>>>>>   lib/librte_vhost/vhost.h            |   4 +
>>>>>   lib/librte_vhost/virtio_net.c       | 109 ++++++++++++++++++++++++----
>>>>>   6 files changed, 182 insertions(+), 14 deletions(-)
>>>>>
>>>>> - Changelog:
>>>>>    v5:
>>>>>      - fixed to destroy mutex if incompatible flags
>>>>>    v4:
>>>>>      - allow to use pktmbuf if there is exact space
>>>>>      - removed log message if the buffer is too big
>>>>>      - fixed the length to include align padding
>>>>>      - free allocated buf if shinfo fails
>>>>>    v3:
>>>>>      - prevent the new features to be used with zero copy
>>>>>      - fixed sizeof() usage
>>>>>      - fixed log msg indentation
>>>>>      - removed/replaced asserts
>>>>>      - used the correct virt2iova function
>>>>>      - fixed the patch's title
>>>>>      - OvS PoC code:
>>>>>        https://github.com/fleitner/ovs/tree/rte_malloc-v3
>>>>>    v2:
>>>>>      - Used rte_malloc() instead of another mempool as suggested
>>>>>        by Shahaf.
>>>>>      - Added the documentation section.
>>>>>      - Using driver registration to negotiate the features.
>>>>>      - OvS PoC code:
>>>>>        https://github.com/fleitner/ovs/commit/8fc197c40b1d4fda331686a7b919e9e2b670dda7
>>>>
>>>> Applied to dpdk-next-virtio/master.
>>>
>>> Thanks Maxime,
>>>
>>> But can we return back the mbuf allocation failure message?
>>
>> Good point, I missed it.
>>
>>> I mean:
>>>
>>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>>> index 66f0c7206..f8af4e0b3 100644
>>> --- a/lib/librte_vhost/virtio_net.c
>>> +++ b/lib/librte_vhost/virtio_net.c
>>> @@ -1354,8 +1354,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
>>>  {
>>>      struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
>>>
>>> -    if (unlikely(pkt == NULL))
>>> +    if (unlikely(pkt == NULL)) {
>>> +        RTE_LOG(ERR, VHOST_DATA,
>>> +            "Failed to allocate memory for mbuf.\n");
>>>          return NULL;
>>> +    }
>>>
>>>      if (rte_pktmbuf_tailroom(pkt) >= data_len)
>>>          return pkt;
>>> ---
>>>
>>> It's a hard failure that highlights some significant issues with
>>> memory pool size or a mbuf leak.
>>
>> I agree, we need this error message.
>
> If it runs out of memory and there are many packets coming, then it
> will flood the logs.
> Maybe we could use something to report in a more moderated way, for
> example:
>
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index da69ab1db..b1572b3a4 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -1354,8 +1354,14 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, struct rte_mempool *mp,
>  {
>      struct rte_mbuf *pkt = rte_pktmbuf_alloc(mp);
>
> -    if (unlikely(pkt == NULL))
> +    if (unlikely(pkt == NULL)) {
> +        static int rate_lim = 0;
> +
> +        if (rate_lim++ % 1000 == 0)
> +            RTE_LOG...
> +
>          return NULL;
> +    }
>
>      if (rte_pktmbuf_tailroom(pkt) >= data_len)
>          return pkt;
>
> Or track this at mempool level and keep stats of failed requests and
> report that in OvS. That would cover other points too.

OVS is already rate-limiting DPDK logs. Basically, I introduced this
functionality to limit exactly this message. See:
  9fd38f6867df ("netdev-dpdk: Limit rate of DPDK logs.")

Best regards, Ilya Maximets.