Date: Wed, 2 Oct 2019 09:58:31 -0300
From: Flavio Leitner
To: Shahaf Shuler
Cc: David Marchand, dev@dpdk.org, Maxime Coquelin, Tiwei Bie,
	Zhihong Wang, Obrembski MichalX, Stokes Ian
Message-ID: <20191002095831.5927af93@p50.lan>
References: <20191001221935.12140-1-fbl@sysclose.org>
Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large linear mbufs

Hi Shahaf,

Thanks for looking into this, see my inline comments.

On Wed, 2 Oct 2019 09:00:11 +0000
Shahaf Shuler wrote:

> Wednesday, October 2, 2019 11:05 AM, David Marchand:
> > Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large linear
> > mbufs
> >
> > Hello Shahaf,
> >
> > On Wed, Oct 2, 2019 at 6:46 AM Shahaf Shuler wrote:
> > >
> > > Wednesday, October 2, 2019 1:20 AM, Flavio Leitner:
> > > > Subject: [dpdk-dev] [PATCH] vhost: add support to large linear
> > > > mbufs
> > > >
> > > > The rte_vhost_dequeue_burst supports two ways of dequeuing
> > > > data. If the data fits into a buffer, then all data is copied
> > > > and a single linear buffer is returned. Otherwise it allocates
> > > > additional mbufs and chains them together to return a
> > > > multi-segment mbuf.
> > > >
> > > > While that covers most use cases, it forces applications that
> > > > need to work with larger data sizes to support multi-segment
> > > > mbufs.
> > > > The non-linear characteristic brings complexity and
> > > > performance implications to the application.
> > > >
> > > > To resolve the issue, change the API so that the application
> > > > can optionally provide a second mempool containing larger
> > > > mbufs. If that is not provided (NULL), the behavior remains as
> > > > before the change. Otherwise, the data size is checked and the
> > > > corresponding mempool is used to return linear mbufs.
> > >
> > > I understand the motivation.
> > > However, providing a static pool w/ large buffers is not so
> > > efficient in terms of memory footprint. You will need to prepare
> > > for the worst case (all packets are large) w/ a max size of
> > > 64KB.
> > > Also, the two mempools are quite restrictive as the memory fill
> > > of the mbufs might be very sparse. E.g. mempool1 mbuf.size =
> > > 1.5K, mempool2 mbuf.size = 64K, packet size 4KB.
> > >
> > > Instead, how about using the mbuf external buffer feature?
> > > The flow will be:
> > > 1. vhost PMD always receives a single mempool (like today)
> > > 2. on dequeue, the PMD looks at the virtio packet size. If it is
> > >    smaller than the mbuf size, use the mbuf as is (like today)
> > > 3. otherwise, allocate a new buffer (inside the PMD) and link it
> > >    to the mbuf as an external buffer (rte_pktmbuf_attach_extbuf)
> >
> > I am missing some piece here.
> > Which pool would the PMD take those external buffers from?
>
> The mbuf is always taken from the single mempool associated w/ the
> rxq. The buffer for the mbuf may be allocated (in case the virtio
> payload is bigger than the current mbuf size) from DPDK hugepages or
> any other system memory and be attached to the mbuf.
>
> You can see an example implementation of it in the mlx5 PMD (check
> out the rte_pktmbuf_attach_extbuf call)

Thanks, I wasn't aware of external buffers.

I see that attaching external buffers of the correct size would be
more efficient in terms of saving memory/avoiding sparse usage.
However, we still need to be prepared for the worst-case scenario
(all packets 64K), so that doesn't help with the total memory
required.

The current patch pushes the decision to the application, which knows
the workload better. If more memory is available, it can optionally
use large buffers, otherwise it just doesn't pass them. It can even
decide whether to share the same 64K mempool between multiple vhost
ports or use one mempool per port.

Perhaps I missed something, but managing memory with a mempool still
requires us to have buffers of 64K regardless of whether the data
consumes less space. Otherwise the application or the PMD will have
to manage memory itself.

If we let the PMD manage the memory, what happens if a port/queue is
closed and one or more buffers are still in use (switching)? I don't
see how to solve this cleanly.

fbl

> > If it is from an additional mempool passed to the vhost pmd, I
> > can't see the difference with Flavio's proposal.
> >
> > > The pros of this approach is that you have full flexibility on
> > > the memory allocation, and therefore a lower footprint.
> > > The cons is that OVS will need to know how to handle mbufs w/
> > > external buffers (not too complex IMO).
> >
> > --
> > David Marchand