Date: Mon, 5 Feb 2018 11:18:52 +0100
From: Nélio Laranjeiro
To: "Burakov, Anatoly"
Cc: Yongseok Koh, "Walker, Benjamin", "dev@dpdk.org", "thomas@monjalon.net", "andras.kovacs@ericsson.com", "Wiles, Keith", "Richardson, Bruce"
Message-ID: <20180205101852.owogsnbcach32z2k@laranjeiro-vm.dev.6wind.com>
References: <1513892309.2658.80.camel@intel.com> <1514308764.2658.93.camel@intel.com> <20180202192832.GA42096@yongseok-MBP.local>
Subject: Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK

On Mon, Feb 05, 2018 at 10:03:35AM +0000, Burakov, Anatoly wrote:
> On 02-Feb-18 7:28 PM, Yongseok Koh wrote:
> > On Tue, Dec 26, 2017 at 05:19:25PM +0000, Walker, Benjamin wrote:
> > > On Fri, 2017-12-22 at 09:13 +0000, Burakov, Anatoly wrote:
> > > > On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
> > > > > SPDK will need some way to register for a notification when pages are
> > > > > allocated or freed. For storage, the number of requests per second is
> > > > > (relative to networking) fairly small (hundreds of thousands per second
> > > > > in a traditional block storage stack, or a few million per second with
> > > > > SPDK).
> > > > > Given that, we can afford to do a dynamic lookup from va to pa/iova on
> > > > > each request in order to greatly simplify our APIs (users can just pass
> > > > > pointers around instead of mbufs). DPDK has a way to look up the pa from
> > > > > a given va, but it does so by scanning /proc/self/pagemap and is very
> > > > > slow. SPDK instead handles this by implementing a lookup table of va to
> > > > > pa/iova which we populate by scanning through the DPDK memory segments at
> > > > > start up, so the lookup in our table is sufficiently fast for storage use
> > > > > cases. If the list of memory segments changes, we need to know about it
> > > > > in order to update our map.
> > > >
> > > > Hi Benjamin,
> > > >
> > > > So, in other words, we need callbacks on alloc/free. What information
> > > > would SPDK need when receiving this notification? Since we can't really
> > > > know in advance how many pages we allocate (it may be one, it may be a
> > > > thousand) and they no longer are guaranteed to be contiguous, would a
> > > > per-page callback be OK? Alternatively, we could have one callback per
> > > > operation, but only provide VA and size of allocated memory, while
> > > > leaving everything else to the user. I do add a virt2memseg() function
> > > > which would allow you to look up segment physical addresses more easily, so
> > > > you won't have to manually scan memseg lists to get IOVA for a given VA.
> > > >
> > > > Thanks for your feedback and suggestions!
> > >
> > > Yes - callbacks on alloc/free would be perfect. Ideally for us we want one
> > > callback per virtual memory region allocated, plus a function we can call to
> > > find the physical addresses/page break points on that virtual region. The
> > > function that finds the physical addresses does not have to be efficient - we'll
> > > just call that once when the new region is allocated and store the results in a
> > > fast lookup table.
> > > One call per virtual region is better for us than one call
> > > per physical page because we're actually keeping multiple different types of
> > > memory address translation tables in SPDK. One translates from va to pa/iova, so
> > > for this one we need to break this up into physical pages and it doesn't matter
> > > if you do one call per virtual region or one per physical page. However another
> > > one translates from va to RDMA lkey, so it is much more efficient if we can
> > > register large virtual regions in a single call.
> >
> > Another yes to callbacks. Like Benjamin mentioned about RDMA, MLX PMD has to
> > look up LKEY per each packet DMA. Let me briefly explain about this for your
> > understanding. For security reasons, we don't allow an application to initiate a
> > DMA transaction with unknown random physical addresses. Instead, va-to-pa mapping
> > (we call it Memory Region) should be pre-registered and LKEY is the index of the
> > translation entry registered in device. With the current static memory model, it
> > is easy to manage because v-p mapping is unchanged over time. But if it becomes
> > dynamic, MLX PMD should get notified with the event to register/un-register
> > Memory Region.
> >
> > For MLX PMD, it is also enough to get one notification per allocation/free of a
> > virtual memory region. It shouldn't necessarily be a per-page call like Benjamin
> > mentioned because PA of region doesn't need to be contiguous for registration.
> > But it doesn't need to know about physical address of the region (I'm not saying
> > it is unnecessary, but just FYI :-).
> >
> > Thanks,
> > Yongseok
> >
>
> Thanks for your feedback, good to hear we're on the right track. I already
> have a prototype implementation of this working, due for v1 submission :)

Hi Anatoly,

Good to know.  Do you see any performance impact with this series?

Thanks,

-- 
Nélio Laranjeiro
6WIND
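
For readers of the archive, the scheme discussed in the quoted thread (one
callback per virtual region on alloc/free, a slow resolver called once per
page, and a fast va-to-pa table for the per-request path) might look roughly
like the sketch below. Everything in it is an assumption made for
illustration: the names on_region_alloc(), on_region_free(), va_to_pa() and
fake_resolve(), the 2 MiB page size and the fixed VA window are hypothetical
and are not DPDK or SPDK APIs.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 21u                          /* assumption: 2 MiB hugepages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define VA_BASE    UINT64_C(0x100000000)        /* hypothetical reserved VA window */
#define MAX_PAGES  512u                         /* toy capacity of the window */
#define PA_INVALID UINT64_MAX

/* One entry per page in the VA window: physical address of that page. */
static uint64_t va_to_pa_table[MAX_PAGES];

/* Slow per-page resolver, called once per page when a region appears. */
typedef uint64_t (*slow_resolve_fn)(uint64_t page_va);

/* Page index inside the window, or -1 if va lies outside it. */
static int page_index(uint64_t va)
{
    if (va < VA_BASE)
        return -1;
    uint64_t idx = (va - VA_BASE) >> PAGE_SHIFT;
    return idx < MAX_PAGES ? (int)idx : -1;
}

/* Stand-in for the slow path; hands out pages in reverse order so that
 * virtually contiguous pages are not physically contiguous. */
static uint64_t fake_resolve(uint64_t page_va)
{
    uint64_t idx = (uint64_t)page_index(page_va);
    return UINT64_C(0x40000000) + (MAX_PAGES - 1u - idx) * PAGE_SIZE;
}

static void map_init(void)
{
    for (unsigned i = 0; i < MAX_PAGES; i++)
        va_to_pa_table[i] = PA_INVALID;
}

/* Hypothetical alloc callback: one call per virtual region (va, len). */
static int on_region_alloc(uint64_t va, uint64_t len, slow_resolve_fn resolve)
{
    if (len == 0)
        return 0;
    if (va % PAGE_SIZE != 0 || len % PAGE_SIZE != 0)
        return -1;                              /* region must be page aligned */
    if (len > MAX_PAGES * PAGE_SIZE ||
        page_index(va) < 0 || page_index(va + len - PAGE_SIZE) < 0)
        return -1;                              /* region must fit the window */
    for (uint64_t off = 0; off < len; off += PAGE_SIZE) {
        int idx = page_index(va + off);
        assert(idx >= 0);                       /* guaranteed by the checks above */
        va_to_pa_table[idx] = resolve(va + off);
    }
    return 0;
}

/* Hypothetical free callback: drop every page of the region from the table. */
static void on_region_free(uint64_t va, uint64_t len)
{
    for (uint64_t off = 0; off < len; off += PAGE_SIZE) {
        int idx = page_index(va + off);
        if (idx >= 0)
            va_to_pa_table[idx] = PA_INVALID;
    }
}

/* Fast path: a single table read per request. */
static uint64_t va_to_pa(uint64_t va)
{
    int idx = page_index(va);
    if (idx < 0 || va_to_pa_table[idx] == PA_INVALID)
        return PA_INVALID;
    return va_to_pa_table[idx] + (va & (PAGE_SIZE - 1));
}
```

A consumer would call map_init() once, feed every alloc/free notification
into on_region_alloc()/on_region_free(), and use va_to_pa() on the request
path. A table keyed by region rather than by page would be the natural shape
for the va-to-LKEY case described in the thread, since a registration covers a
whole virtual region.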