From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id B8D2AA3168 for ; Wed, 16 Oct 2019 16:02:54 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 149931E960; Wed, 16 Oct 2019 16:02:54 +0200 (CEST) Received: from sysclose.org (smtp.sysclose.org [69.164.214.230]) by dpdk.org (Postfix) with ESMTP id EDD691E941 for ; Wed, 16 Oct 2019 16:02:52 +0200 (CEST) Received: by sysclose.org (Postfix, from userid 5001) id 41AEB677B; Wed, 16 Oct 2019 14:03:19 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 sysclose.org 41AEB677B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sysclose.org; s=201903; t=1571234599; bh=AoFHlPdB7KXLXvyjEIWPi/u8R4w65x+t7Y8VjaOUDRg=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=nZAbzQs2/TDByo6hgk0Pmry58re0IiZdEocskz4LLr+ecJm8yHqdscaA5WAdtSnmY URI8rSzB64fBLzg0uk/8PjDi2aybCfwWCrpjAQ8BGYbpeg4qGKyz3dYqBExGTgfnzd hR6FXEd9noczgZ6ZTE9uOUWOARRZ+lgBza4DAZT94gomSqyxlrhULM5XLrDVzbMUV7 nAvXOTrkKPpoF9hr8hK4+bL1/6ngVrAPGLkGL/qhtItYVsi6opZtDFEidKiNv9liqM cRMLPBMjEBbVI/S/idNz0Vr1cgL7ueXrpuQCzQlCq7Mo9CQWKeoIgzQC6w/E2Vsc8K TQjTN11jCIxZw== X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on mail.sysclose.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=5.0 tests=ALL_TRUSTED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from p50.lan (unknown [177.183.215.210]) by sysclose.org (Postfix) with ESMTPSA id BFD7B6509; Wed, 16 Oct 2019 14:03:16 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 sysclose.org BFD7B6509 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sysclose.org; s=201903; t=1571234598; bh=AoFHlPdB7KXLXvyjEIWPi/u8R4w65x+t7Y8VjaOUDRg=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=cik6CZLlxUnBnehhn7p+d77UuTITQyq+BEzGkHXzl+gEC8rMNbF59ePNDsO7XO79R kO1Oh/ZCSQTHrgoNQrTUzXYQGnUe6WCDkoZ1iXwtUCue4Ft16QJuwjfpKYaYA2axJb kol5dVgbL8d/XoPclQU6DVyUVNu49n+JyplVe+R9YooJCZ3oao5AvTeK1sHxs2+3kh YcU13At3gzxmWVjDiurqS3bmWSqSQ7CA0/ede7vWOVkHkL0swTmsdTyc3VVCXKcU8E BIwv7JspSt1+b6fvBXY5ZiSo8R95Lrw+k8nc9bFaHteLE/o1bk7dKnDwNjkPKtDMK8 SU3XjXH5iFf1Q== Date: Wed, 16 Oct 2019 11:02:43 -0300 From: Flavio Leitner To: Maxime Coquelin Cc: Ilya Maximets , dev@dpdk.org, Shahaf Shuler , David Marchand , Tiwei Bie , Obrembski MichalX , Stokes Ian Message-ID: <20191016110243.14005e1c@p50.lan> In-Reply-To: <62fb6a50-d4ab-584d-8cb2-4820c141f137@redhat.com> References: <20191015161727.32570-1-fbl@sysclose.org> <20191015185951.6295-1-fbl@sysclose.org> <5900f514-f1d9-7f7d-0ea1-f394b9e5d01f@ovn.org> <62fb6a50-d4ab-584d-8cb2-4820c141f137@redhat.com> X-Mailer: Claws Mail 3.17.4 (GTK+ 2.24.32; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [dpdk-dev] [PATCH v5] vhost: add support for large buffers X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Wed, 16 Oct 2019 15:46:15 +0200 Maxime Coquelin wrote: > On 10/16/19 3:32 PM, Ilya Maximets wrote: > > On 16.10.2019 13:13, Maxime Coquelin wrote: =20 > >> > >> > >> On 10/15/19 8:59 PM, Flavio Leitner wrote: =20 > >>> The rte_vhost_dequeue_burst supports two ways of dequeuing data. > >>> If the data fits into a buffer, then all data is copied and a > >>> single linear buffer is returned. Otherwise it allocates > >>> additional mbufs and chains them together to return a multiple > >>> segments mbuf. > >>> > >>> While that covers most use cases, it forces applications that > >>> need to work with larger data sizes to support multiple segments > >>> mbufs. The non-linear characteristic brings complexity and > >>> performance implications to the application. > >>> > >>> To resolve the issue, add support to attach external buffer > >>> to a pktmbuf and let the host provide during registration if > >>> attaching an external buffer to pktmbuf is supported and if > >>> only linear buffer are supported. > >>> > >>> Signed-off-by: Flavio Leitner > >>> --- > >>> =C2=A0 doc/guides/prog_guide/vhost_lib.rst |=C2=A0 35 +++++++++ > >>> =C2=A0 lib/librte_vhost/rte_vhost.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 |=C2=A0=C2=A0 4 + > >>> =C2=A0 lib/librte_vhost/socket.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 22 ++++++ > >>> =C2=A0 lib/librte_vhost/vhost.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 22 ++++++ > >>> =C2=A0 lib/librte_vhost/vhost.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0 4 + > >>> =C2=A0 lib/librte_vhost/virtio_net.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 | 109 > >>> ++++++++++++++++++++++++---- 6 files changed, 182 insertions(+), > >>> 14 deletions(-) > >>> > >>> - Changelog: > >>> =C2=A0=C2=A0 v5: > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - fixed to destroy mutex if incompatible fla= gs > >>> =C2=A0=C2=A0 v4: > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - allow to use pktmbuf if there is exact spa= ce > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - removed log message if the buffer is too b= ig > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - fixed the length to include align padding > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - free allocated buf if shinfo fails > >>> =C2=A0=C2=A0 v3: > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - prevent the new features to be used with z= ero copy > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - fixed sizeof() usage > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - fixed log msg indentation > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - removed/replaced asserts > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - used the correct virt2iova function > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - fixed the patch's title > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - OvS PoC code: > >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 https://github.com/fleitner/ovs/= tree/rte_malloc-v3 > >>> =C2=A0=C2=A0 v2: > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - Used rte_malloc() instead of another mempo= ol as suggested > >>> by Shahaf. > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - Added the documentation section. > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - Using driver registration to negotiate the= features. > >>> =C2=A0=C2=A0=C2=A0=C2=A0 - OvS PoC code: > >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 > >>> https://github.com/fleitner/ovs/commit/8fc197c40b1d4fda331686a7b919e9= e2b670dda7 > >>> =20 > >> > >> Applied to dpdk-next-virtio/master. =20 > >=20 > > Thanks Maxime, > >=20 > > But can we return back the mbuf allocation failure message? =20 >=20 > Good point, I missed it. >=20 > > I mean: > >=20 > > diff --git a/lib/librte_vhost/virtio_net.c > > b/lib/librte_vhost/virtio_net.c index 66f0c7206..f8af4e0b3 100644 > > --- a/lib/librte_vhost/virtio_net.c > > +++ b/lib/librte_vhost/virtio_net.c > > @@ -1354,8 +1354,11 @@ virtio_dev_pktmbuf_alloc(struct virtio_net > > *dev, struct rte_mempool *mp, > > =C2=A0{ > > =C2=A0=C2=A0=C2=A0=C2=A0 struct rte_mbuf *pkt =3D rte_pktmbuf_alloc(mp); > > =C2=A0 > > -=C2=A0=C2=A0=C2=A0 if (unlikely(pkt =3D=3D NULL)) > > +=C2=A0=C2=A0=C2=A0 if (unlikely(pkt =3D=3D NULL)) { > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 RTE_LOG(ERR, VHOST_DATA, > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 "Fa= iled to allocate memory for mbuf.\n"); > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return NULL; > > +=C2=A0=C2=A0=C2=A0 } > > =C2=A0 > > =C2=A0=C2=A0=C2=A0=C2=A0 if (rte_pktmbuf_tailroom(pkt) >=3D data_len) > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return pkt; > > --- > >=20 > > It's a hard failure that highlights some significant issues with > > memory pool size or a mbuf leak. =20 >=20 > I agree, we need this error message. If it runs out of memory and there are many packets coming, then it will flood the logs. Maybe we could use something to report in a more moderated way, for example: diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index da69ab1db..b1572b3a4 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1354,8 +1354,14 @@ virtio_dev_pktmbuf_alloc(struct virtio_net *dev, str= uct rte_mempool *mp, { struct rte_mbuf *pkt =3D rte_pktmbuf_alloc(mp); =20 - if (unlikely(pkt =3D=3D NULL)) + if (unlikely(pkt =3D=3D NULL)) { + static int rate_lim =3D 0; + + if (rate_lim++ % 1000 =3D=3D 0) + RTE_LOG... + return NULL; + } =20 if (rte_pktmbuf_tailroom(pkt) >=3D data_len) return pkt; Or track this at mempool level and keep stats of failed requests and report that in OvS. That would cover other points too. fbl