To: Flavio Leitner, Maxime Coquelin
Cc: Shahaf Shuler, David Marchand, "dev@dpdk.org", Tiwei Bie,
    Zhihong Wang, Obrembski MichalX, Stokes Ian, Ilya Maximets
References: <20191001221935.12140-1-fbl@sysclose.org>
 <20191002095831.5927af93@p50.lan>
 <20191002151528.0f285b8a@p50.lan>
From: Ilya Maximets
Message-ID: <088ea83c-cc00-5542-a554-ca857b9ef6ec@ovn.org>
Date: Thu, 3 Oct 2019 18:57:32 +0200
In-Reply-To: <20191002151528.0f285b8a@p50.lan>
Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large linear mbufs

On 02.10.2019 20:15, Flavio Leitner wrote:
> On Wed, 2 Oct 2019 17:50:41 +0000
> Shahaf Shuler wrote:
>
>> Wednesday, October 2, 2019 3:59 PM, Flavio Leitner:
>>> Obrembski MichalX ; Stokes Ian
>>>
>>> Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large linear
>>> mbufs
>>>
>>>
>>> Hi Shahaf,
>>>
>>> Thanks for looking into this, see my inline comments.
>>>
>>> On Wed, 2 Oct 2019 09:00:11 +0000
>>> Shahaf Shuler wrote:
>>>
>>>> Wednesday, October 2, 2019 11:05 AM, David Marchand:
>>>>> Subject: Re: [dpdk-dev] [PATCH] vhost: add support to large
>>>>> linear mbufs
>>>>>
>>>>> Hello Shahaf,
>>>>>
>>>>> On Wed, Oct 2, 2019 at 6:46 AM Shahaf Shuler
>>>>> wrote:
>>>>>>
>>
>> [...]
>>
>>>>>
>>>>> I am missing some piece here.
>>>>> Which pool would the PMD take those external buffers from?
>>>>
>>>> The mbuf is always taken from the single mempool associated w/ the
>>>> rxq. The buffer for the mbuf may be allocated (in case virtio
>>>> payload is bigger than current mbuf size) from DPDK hugepages or
>>>> any other system memory and be attached to the mbuf.
>>>>
>>>> You can see example implementation of it in mlx5 PMD (checkout
>>>> rte_pktmbuf_attach_extbuf call)
>>>
>>> Thanks, I wasn't aware of external buffers.
>>>
>>> I see that attaching external buffers of the correct size would be
>>> more efficient in terms of saving memory/avoiding sparsing.
>>>
>>> However, we still need to be prepared for the worst case scenario
>>> (all packets 64K), so that doesn't help with the total memory
>>> required.
>>
>> Am not sure why.
>> The allocation can be on demand. That is - only when you encounter a
>> large buffer.
>>
>> Having the buffer allocated in advance will benefit only from removing
>> the cost of the rte_*malloc. However on such big buffers, and
>> furthermore w/ device offloads like TSO, am not sure that is an issue.
>
> Now I see what you're saying. I was thinking we had to reserve the
> memory before, like mempool does, then get the buffers as needed.
>
> OK, I can give a try with rte_*malloc and see how it goes.

This way we actually could have a nice API. For example, by introducing
some new flag RTE_VHOST_USER_NO_CHAINED_MBUFS (there might be a better
name) which could be passed to driver_register().
On receive, depending on this flag, the function would either create
chained mbufs or allocate a new contiguous memory chunk and attach it
as an external buffer if the data could not be stored in a single mbuf
from the registered memory pool.
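
Roughly, the receive side could look like the sketch below. This is only
an illustration: the flag value and the helper names
(vhost_alloc_linear_mbuf, extbuf_free_cb) are invented here, while
rte_malloc(), rte_pktmbuf_ext_shinfo_init_helper() and
rte_pktmbuf_attach_extbuf() are the existing DPDK APIs mentioned above.

    #include <rte_malloc.h>
    #include <rte_mbuf.h>

    /* Hypothetical registration flag; value chosen only for illustration. */
    #define RTE_VHOST_USER_NO_CHAINED_MBUFS (1ULL << 8)

    /* Called when the last mbuf referencing the external buffer is freed. */
    static void
    extbuf_free_cb(void *addr, void *opaque __rte_unused)
    {
            rte_free(addr);
    }

    /*
     * Return an mbuf able to hold 'pkt_len' contiguous bytes.  Small packets
     * keep using the mempool's data room; large ones get an external buffer
     * allocated with rte_malloc() and attached to the mbuf.
     */
    static struct rte_mbuf *
    vhost_alloc_linear_mbuf(struct rte_mempool *mp, uint32_t pkt_len)
    {
            struct rte_mbuf_ext_shared_info *shinfo;
            struct rte_mbuf *pkt;
            uint32_t total_len;
            uint16_t buf_len;
            void *buf;

            pkt = rte_pktmbuf_alloc(mp);
            if (pkt == NULL)
                    return NULL;

            /* The packet fits into the mempool buffer as before. */
            if (pkt_len <= rte_pktmbuf_tailroom(pkt))
                    return pkt;

            /* Headroom + data + shared info placed at the tail of the buffer. */
            total_len = RTE_PKTMBUF_HEADROOM + pkt_len + sizeof(*shinfo);
            total_len = RTE_ALIGN_CEIL(total_len, RTE_CACHE_LINE_SIZE);
            if (total_len > UINT16_MAX)
                    goto fail;      /* mbuf buf_len is only 16 bits wide. */
            buf_len = total_len;

            buf = rte_malloc(NULL, buf_len, RTE_CACHE_LINE_SIZE);
            if (buf == NULL)
                    goto fail;

            /* Carves the shared info out of the tail of 'buf', updates buf_len. */
            shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
                                                        extbuf_free_cb, NULL);
            if (shinfo == NULL) {
                    rte_free(buf);
                    goto fail;
            }

            rte_pktmbuf_attach_extbuf(pkt, buf, rte_malloc_virt2iova(buf),
                                      buf_len, shinfo);
            rte_pktmbuf_reset_headroom(pkt);
            return pkt;

    fail:
            rte_pktmbuf_free(pkt);
            return NULL;
    }

The chained-mbuf path would stay as it is today and would only be
bypassed when the application registered the socket with the new flag.
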
Supporting external memory in mbufs will require some additional work
from the OVS side (e.g. better handling of ol_flags), but we'll have to
do it anyway for the upgrade to DPDK 19.11.

Best regards, Ilya Maximets.

>
>>> The current patch pushes the decision to the application which
>>> knows better the workload. If more memory is available, it can
>>> optionally use large buffers, otherwise just don't pass that. Or
>>> even decide whether to share the same 64K mempool between multiple
>>> vhost ports or use one mempool per port.
>>>
>>> Perhaps I missed something, but managing memory with mempool still
>>> requires us to have buffers of 64K regardless if the data consumes
>>> less space. Otherwise the application or the PMD will have to
>>> manage memory itself.
>>>
>>> If we let the PMD manage the memory, what happens if a port/queue
>>> is closed and one or more buffers are still in use (switching)? I
>>> don't see how to solve this cleanly.
>>
>> Closing of the dev should return EBUSY till all buffers are free.
>> What is the use case of closing a port while still having packets
>> pending on another port of the switch? And why can we not wait for
>> them to complete transmission?
>
> The vswitch gets the request from outside and the assumption is that
> the command will succeed. AFAIK, there is no retry mechanism.
>
> Thanks Shahaf!
> fbl