From: Gopakumar Choorakkot Edakkunni
Date: Sat, 18 Mar 2017 14:37:56 -0700
To: Yuanhan Liu
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] virtio "how to restart applications" - //dpdk.org/doc/virtio-net-pmd
List-Id: DPDK patches and discussions

I mean vtpci_reset is called from rte_eal_pci_probe() which is the *last*
thing in rte_eal_init(), *after* hugepage init, so if I can somehow get that
done *before* hugepage init maybe all will be well (because I can't do
anything to fix the host side)

Rgds,
Gopa.

On Sat, Mar 18, 2017 at 2:32 PM, Gopakumar Choorakkot Edakkunni
<gopakumar.c.e@gmail.com> wrote:

> Hi Yuan,
>
> As a "hack"/"workaround", in rte_eal_init(), if I can call vtpci_reset()
> just before rte_eal_memory_init(), that should take care of the problem of
> host zeroing out hugepages right ? As of today vtpci_reset() is called in
> rte_eal_dev_init() which comes *after* rte_eal_memory_init()
>
> Rgds,
> Gopa.
>
> On Thu, Mar 16, 2017 at 10:50 PM, Gopakumar Choorakkot Edakkunni
> <gopakumar.c.e@gmail.com> wrote:
>
>> Thanks again Yuanhan, you are the true expert!!
>>
>> Rgds,
>> Gopa.
>>
>> On Thu, Mar 16, 2017 at 10:40 PM, Yuanhan Liu
>> <yuanhan.liu@linux.intel.com> wrote:
>>
>>> On Thu, Mar 16, 2017 at 10:30:09PM -0700, Gopakumar Choorakkot Edakkunni
>>> wrote:
>>> > Thanks for the confirmation, glad I reached the person who knows the
>>> > nuts and bolts of virtio :-).
>>> > So if the host is not in our control (i.e. if I am just running as a VM
>>> > on a host provided by a third-party vendor), is there any workaround I
>>> > can do from the guest side to prevent problems from happening on a
>>> > guest restart ?
>>>
>>> Not too much. You might want to hack the guest DPDK EAL memory
>>> initialization part though, to not reset the hugepage memory on start.
>>> But that's so hacky that I would not recommend you do so!
>>>
>>> > And if there are no workarounds at all and the host has to change,
>>> > instead of asking the third-party vendor to do a wholesale upgrade to
>>> > 16.04, is there one or a few commits that can be added to the host
>>> > ovs-dpdk to take care of this guest restart
>>> > virtio-reset-before-opening case ?
>>>
>>> Yes, backporting the commits I have mentioned should be able to fix it.
>>> But please note that I did some code refactoring before those fixes: they
>>> won't apply cleanly to DPDK v2.2.
>>>
>>> And if you want to upgrade, I'd suggest upgrading to v16.11, which is
>>> an LTS release.
>>>
>>> --yliu
>>> >
>>> > Rgds,
>>> > Gopa.
>>> >
>>> > On Thu, Mar 16, 2017 at 10:24 PM, Yuanhan Liu
>>> > <yuanhan.liu@linux.intel.com> wrote:
>>> >
>>> > On Thu, Mar 16, 2017 at 10:20:30PM -0700, Gopakumar Choorakkot
>>> > Edakkunni wrote:
>>> > > >> When I was saying dpdk version, I meant the DPDK version with OVS.
>>> > >
>>> > > Oh I see! My apologies for the misunderstanding. The dpdk version
>>> > > used by the host ovs should be dpdk2.2, the guest process uses
>>> > > dpdk16.07. The OVS process is not getting restarted, what is getting
>>> > > restarted is the guest process using dpdk16.07 - so the above
>>> > > clarifications you had about virtio being reset-before-opened on
>>> > > guest restart - does that still hold good or does that need the HOST
>>> > > side dpdk to be 16.04 or above ?
>>> >
>>> > Yes, the HOST dpdk should be >= v16.04.
>>> >
>>> > --yliu
>>> > >
>>> > > >> And yes, the fixes are not included in the DPDK required for
>>> > > >> OVS 2.4.
>>> > >
>>> > > Thanks for the info.
>>> > >
>>> > > Rgds,
>>> > > Gopa.
>>> > >
>>> > > On Thu, Mar 16, 2017 at 10:13 PM, Yuanhan Liu
>>> > > <yuanhan.liu@linux.intel.com> wrote:
>>> > >
>>> > > On Thu, Mar 16, 2017 at 09:56:01PM -0700, Gopakumar Choorakkot
>>> > > Edakkunni wrote:
>>> > > > Hi Yuanhan,
>>> > > >
>>> > > > Thanks for the confirmation about not having to do anything
>>> > > > special to close the ports on dpdk going down or coming up.
>>> > > >
>>> > > > As for the question about whether I met any issue of ovs getting
>>> > > > stuck - yes, my guest process runs dpdk 16.07 as I mentioned
>>> > > > earlier - and if I kill my guest process, then the OVS-dpdk on the
>>> > > > host reports a stall ! The OVS-dpdk and qemu versions I use are as
>>> > > > below. But maybe that is because of the ovs missing the fixes you
>>> > > > mentioned ?
>>> > >
>>> > > When I was saying dpdk version, I meant the DPDK version with OVS.
>>> > >
>>> > > > ~# ovs-vswitchd --version
>>> > > > ovs-vswitchd (Open vSwitch) 2.4.1
>>> > >
>>> > > And yes, the fixes are not included in the DPDK required for OVS 2.4.
>>> > >
>>> > > --yliu
>>> > >
>>> > > > Compiled Nov 14 2016 06:53:31
>>> > > > # kvm --version
>>> > > > QEMU emulator version 2.2.0, Copyright (c) 2003-2008 Fabrice Bellard
>>> > > > ~#
>>> > > >
>>> > > >
>>> > > > Rgds,
>>> > > > Gopa.
>>> > > >
>>> > > > On Thu, Mar 16, 2017 at 9:35 PM, Yuanhan Liu
>>> > > > <yuanhan.liu@linux.intel.com> wrote:
>>> > > >
>>> > > > On Thu, Mar 16, 2017 at 07:48:28PM -0700, Gopakumar Choorakkot
>>> > > > Edakkunni wrote:
>>> > > > > Thanks a lot for the response Yuanhan. I am using dpdk v16.07.
>>> > > > > So what you are saying is that in 16.07, we don't really need to
>>> > > > > call rte_eth_dev_close() on exit,
>>> > > >
>>> > > > It's not about "don't really need", it's more like "it's hard to".
>>> > > > Just think that it may crash at any time.
>>> > > >
>>> > > > > because dpdk will ensure that it will do virtio reset before
>>> > > > > init when it comes up right ?
>>> > > >
>>> > > > No, it just handles the abnormal case well when the guest APP
>>> > > > restarts.
>>> > > >
>>> > > > > Regarding the vhost commits you mentioned - do we still need
>>> > > > > those fixes if we have the "virtio reset before init" mechanism ?
>>> > > >
>>> > > > Yes, we still need them: just think, some malicious guest may also
>>> > > > forge data like that.
>>> > > >
>>> > > > I'm a bit confused then. Have you actually met any issue (like
>>> > > > getting stuck) with DPDK v16.07?
>>> > > >
>>> > > > --yliu
>>> > > >
>>> > > > > Or is that a separate problem
>>> > > > > altogether (and hence we would need those fixes) ?
>>> > > > >
>>> > > > > Rgds,
>>> > > > > Gopa.
>>> > > > >
>>> > > > > On Thu, Mar 16, 2017 at 7:06 PM, Yuanhan Liu
>>> > > > > <yuanhan.liu@linux.intel.com> wrote:
>>> > > > >
>>> > > > > On Thu, Mar 16, 2017 at 12:39:16PM -0700, Gopakumar Choorakkot
>>> > > > > Edakkunni wrote:
>>> > > > > > So the doc says we should call rte_eth_dev_close() *before*
>>> > > > > > going down. And I know that especially with dpdk-virtionet in
>>> > > > > > the guest + ovs-dpdk in the host, the ovs ends up getting
>>> > > > > > stalled/stuck (!!) if I don't close the port before
>>> > > > > > starting() it when the guest dpdk process comes back up.
>>> > > > >
>>> > > > > I'm assuming you were using an old version, something like dpdk
>>> > > > > v2.2? IIRC, DPDK v16.04 should have fixed your issue.
>>> > > > >
>>> > > > > > Considering that this, if not done properly, can screw up the
>>> > > > > > HOST ovs, and I want to do everything possible to avoid that,
>>> > > > > > I want to be 200% sure that I call close even if my process
>>> > > > > > gets a kill -9 .. So obviously the only way of doing that is
>>> > > > > > to close the port when the dpdk process comes back up and
>>> > > > > > *before* we init the port. rte_eth_dev_close() is not capable
>>> > > > > > of doing that as it expects the port parameters to be
>>> > > > > > initialized etc.. before it can be called.
>>> > > > >
>>> > > > > We do virtio reset before init, which is basically what
>>> > > > > rte_eth_dev_close() mainly does. So I see no big issue here.
>>> > > > >
>>> > > > > The stuck issue is due to the hugepage reset by the guest DPDK
>>> > > > > application, which leaves all virtio vring elements zeroed. The
>>> > > > > old vhost doesn't handle that well; as a result, it got stuck.
>>> > > > > Here are some relevant commits:
>>> > > > >
>>> > > > >     a436f53 vhost: avoid dead loop chain
>>> > > > >     c687b0b vhost: check for ring descriptors overflow
>>> > > > >     623bc47 vhost: do sanity check for ring descriptor length
>>> > > > >
>>> > > > > --yliu
>>> > > > >
>>> > > > > > Any other suggestions on what can be done to close on restart
>>> > > > > > rather than close on going down ?
>>> > > > > > Thought of bouncing this by the alias before I add a version
>>> > > > > > of close myself that can do this close-on-restart
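
For anyone reading this thread later: the three vhost commit titles quoted
above (dead loop chain, descriptor overflow, descriptor length) suggest a
defensive walk over a guest-supplied descriptor chain. Below is a rough sketch
of that idea only. It is not the DPDK vhost code and not taken from those
commits. The names sketch_desc, sketch_chain_len and min_first_len are
hypothetical; the descriptor layout and the NEXT flag bit are assumed to
follow the virtio split-ring format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the "next" flag bit of a virtio split-ring descriptor. */
#define SKETCH_DESC_F_NEXT 1u

/* Hypothetical stand-in for a split-ring descriptor (not a DPDK type). */
struct sketch_desc {
    uint64_t addr;   /* guest address of the buffer */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* SKETCH_DESC_F_NEXT if another descriptor follows */
    uint16_t next;   /* index of the next descriptor in the chain */
};

/*
 * Walk the chain starting at `head` and return its total length in bytes,
 * or -1 if the chain looks corrupt. Nothing written by the guest is trusted:
 *   - every index must be inside the ring (overflow check),
 *   - at most ring_size descriptors are followed (no endless loop),
 *   - the first descriptor must hold at least min_first_len bytes
 *     (length sanity check; a zeroed ring fails here).
 */
int64_t sketch_chain_len(const struct sketch_desc *ring, uint16_t ring_size,
                         uint16_t head, uint32_t min_first_len)
{
    assert(ring != NULL && ring_size > 0);

    uint64_t total = 0;
    uint32_t seen = 0;
    uint16_t idx = head;

    for (;;) {
        if (idx >= ring_size)
            return -1;              /* index points outside the ring */
        if (++seen > ring_size)
            return -1;              /* chain longer than the ring: a loop */

        const struct sketch_desc *d = &ring[idx];
        if (seen == 1 && d->len < min_first_len)
            return -1;              /* first buffer too small to be valid */

        total += d->len;
        if (!(d->flags & SKETCH_DESC_F_NEXT))
            break;                  /* end of chain */
        idx = d->next;
    }
    return (int64_t)total;
}
```

With checks of this kind, a ring zeroed by a guest restart is rejected at the
length check instead of being processed, and a forged chain that points back
at itself ends after ring_size steps.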