From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f41.google.com (mail-wm0-f41.google.com [74.125.82.41]) by dpdk.org (Postfix) with ESMTP id C92FE1B7B9 for ; Mon, 9 Apr 2018 16:07:03 +0200 (CEST) Received: by mail-wm0-f41.google.com with SMTP id x82so17041402wmg.1 for ; Mon, 09 Apr 2018 07:07:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=IrIwPChYM6I9AjeY4SC+eiTtqEm0h9p+8j0XgAT5vFc=; b=FpMmOot3/YySxXuNkGB2M16nGzrRvRS/qph27vMCrilfxxX2GEnzqoJFWAQ/gP+ldL HfCtjbtDQSNvzgPKKC/WWX34k7+WeRKe2z7iairPlwuuFKG/hMT+bWanbxUPPvmC3mT8 jXsknYKUJyswDGbqmqjXyg6rLetOHD+Msdzr7GabPTrf0wPwmYo/y+JPI937gdOvJZeN 7mXLTAZ9ab7wX+qTF6/Mg0XZY38hos7k63MkI6jz4mRKewLU5PjSW+mz0YjXL4S0k2KT XG8sWVfuqO8DpOKTt7pcZAp3WbR9S/pxitYmjSyZpooy5BTlA9iKC56o3UidMWaWK6Jz DVMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=IrIwPChYM6I9AjeY4SC+eiTtqEm0h9p+8j0XgAT5vFc=; b=Kvnv+BoOleP8+QECCwQQFGZ/T5S5Jki3D5IJuIX9fyMwTWpi/iQZkkfZR8nfKR59CE 8+8r5gWCmsDeXa3q1+iFPAgXsRWIkIE7KzEG8rhbX2KfJ9Yu4cJSRtjJAYuqt1rV2vig G84VErqGV2UPyL120PbYW6WGpLko3OXLtxtj3v2HnmNreHKDzjFE8Lm9MJf3Fru7pETu 6T9oxbRr+kVHhZ7AoP6pT1b/7gVDFhjGK12D1TUblEyOrHHKkKIfVDAHpcXNLbHroO12 abYvZYGoHfJ6w/LZBCxdDgYb6z2LMgiIzTsJVMWbRHplQzJmODd6bAoecitNZTwYtb3x wdqg== X-Gm-Message-State: ALQs6tAQpbj9ir3fULb6TZ/wEQIPQyB9VjYCKM0kpiU022YurmZeOmU6 2oSY/F7wRvejei4+zrnmEFek X-Google-Smtp-Source: AIpwx4+7vR3BIgSS5GHk0BNYzCCaIRZiVjygiAvXXoXltnozYqD65BD1oQBa2KXExYqeAyLAr5MxNg== X-Received: by 10.28.16.85 with SMTP id 82mr111485wmq.4.1523282823508; Mon, 09 Apr 2018 07:07:03 -0700 (PDT) Received: from laranjeiro-vm.dev.6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id 10sm636009wrz.58.2018.04.09.07.07.02 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 09 Apr 2018 07:07:02 -0700 (PDT) Date: Mon, 9 Apr 2018 16:07:18 +0200 From: =?iso-8859-1?Q?N=E9lio?= Laranjeiro To: Shahaf Shuler Cc: Adrien Mazarguil , Yongseok Koh , "dev@dpdk.org" Message-ID: <20180409140718.7xokbkqb6ygjowrs@laranjeiro-vm.dev.6wind.com> References: <20180403044817.27457-1-shahafs@mellanox.com> <20180404073009.zgqu3yrj26trwdfr@laranjeiro-vm.dev.6wind.com> <20180404121051.ersiyf75gykwfon5@laranjeiro-vm.dev.6wind.com> <20180405065120.jqttkvlhjflpfdbj@laranjeiro-vm.dev.6wind.com> <20180409082736.spzxnp3bbcllakui@laranjeiro-vm.dev.6wind.com> <20180409132641.a4s4eamfgs5vea3l@laranjeiro-vm.dev.6wind.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20180409132641.a4s4eamfgs5vea3l@laranjeiro-vm.dev.6wind.com> User-Agent: NeoMutt/20170113 (1.7.2) Subject: Re: [dpdk-dev] [PATCH] net/mlx5: fix link status initialization X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2018 14:07:03 -0000 On Mon, Apr 09, 2018 at 03:26:41PM +0200, Nélio Laranjeiro wrote: > On Mon, Apr 09, 2018 at 12:28:04PM +0000, Shahaf Shuler wrote: > > Monday, April 9, 2018 11:28 AM, Nélio Laranjeiro: > > > Subject: Re: [PATCH] net/mlx5: fix link status initialization > > > > > > On Sun, Apr 08, 2018 at 01:09:27PM +0000, Shahaf Shuler wrote: > > > > Thursday, April 5, 2018 9:51 AM, Nélio Laranjeiro: > > > > > Subject: Re: [PATCH] net/mlx5: fix link status initialization > > > > > > > > > > On Thu, Apr 05, 2018 at 05:35:57AM +0000, Shahaf Shuler wrote: > > > > > > Wednesday, April 4, 2018 3:11 PM, Nélio Laranjeiro: > > > > > > > Subject: Re: [PATCH] net/mlx5: fix link status initialization > > > > > > > > > > > > > > On Wed, Apr 04, 2018 at 09:58:33AM +0000, Shahaf Shuler wrote: > > > > > > > > Wednesday, April 4, 2018 10:30 AM, Nélio Laranjeiro: > > > > > > > > > Subject: Re: [PATCH] net/mlx5: fix link status > > > > > > > > > initialization > > > > > > > > > > > > > > > > > > On Tue, Apr 03, 2018 at 07:48:17AM +0300, Shahaf Shuler wrote: > > > > > > > > [..] > > > > > > > > > > > > > > > > > > According to your analysis, this is only necessary when the LCS > > > > > > > is configured in the device. Why not adding this call to > > > > > > > mlx5_dev_interrupt_handler_install() which is responsible to > > > > > > > install the LCS callback. > > > > > > > > > > > > I think it is good practice whether or not LSC is set. > > > > > > The link status should be initialized to the correct value after the probe. > > > > > > > > > > There is no guarantee the link will be accurate, at probe time the > > > > > link may be up so internal information has a status up with a speed with > > > this patch. > > > > > The application probes a second port, in the mean time the link of > > > > > the first port goes down, the interrupt is still not installed and > > > > > the internal status becomes wrong (still up whereas the port is down). > > > > > > > > > > Finally at start, the device installs the handler, but the link is > > > > > still down whereas internally it is up, the application will call > > > > > rte_eth_link_get_nowait() which will directly copy the wrong > > > > > internal status to the application. > > > > > > > > This is not correct. > > > > Using Verbs, the async_fd on which link status interrupts are reported is > > > created on probe. > > > > Even if the interrupt handler is not installed, interrupts still > > > > trigger on this fd. They will be processed when the interrupt handler > > > > will be installed as part of the port start. > > > > So in fact you have the whole trace on the link status changes waiting > > > > to be processed upon port start. > > > > > > Right, but in such case, this patch still does not solves the issue. > > > Until the dev_start() is called, the link may be inconsistent with the real > > > status. > > > > > > example: > > > > > > pci_probe --> link is up. > > > leaving pci probe, the link goes downs --> internally the PMD has a link up. > > > > > > Until the dev_start() is called any call to rte_eth_link_get_nowait() will copy > > > the internal PMD status which is not accurate. > > > > > > From this point, the issue seems to fully come from > > > rte_eth_link_get_nowait() which should not make any copy until the port is > > > not started, until then the link may be inconsistent between the driver and > > > the device. > > > > Right. However fixing it only on the ethdev layer is not enough. > > > > Assume the link was up from the beginning. > > For the case were the call for rte_eth_link_get_nowait happens only > > after the port start, the link will still be wrong as it will be > > reported as down (and no interrupt to fix it). > > > > So to summarize I plan to have this patch (with modification of the > > wait_to_complete) along with another fix for ethdev. > > Do we have an agreement? > > Sorry but no, as MLX5 is not the only PMD impacted by this issue and a > solution can be done for all of them (ixgbe, virtio, MLX5 I did not > check the others). > > Your proposition does not solves the following case neither, when the > link goes down and the application stops the port due to that, if the > application waits using the same API to restart the port, it will never > occur as the interrupts handlers are no more installed and thus the > internal link will never be updated remaining indefinitely down. This > also impact the three PMD above. > > Commit b77d21cc2364 ("ethdev: add link status get/set helper functions") > has been integrated a little too fast introducing this bug. > > All those issues can be fixed by adding another statement in the if of > the function rte_eth_link_get_nowait() > > - if (dev->data->dev_conf.intr_conf.lsc) > + if (dev->data->dev_conf.intr_conf.lsc && dev->data->dev_started) > rte_eth_linkstatus_get(dev, eth_link); > else { > RTE_FUNC_PTR_OR_RET(*dev->dev_ops->link_update); > (*dev->dev_ops->link_update)(dev, 1); > *eth_link = dev->data->dev_link; > } > > Doing this fixes all PMD having the same bug, and remove your specific > patch for MLX5 which becomes useless. Making our private little discussion public. Indeed with my proposal MLX5 still needs to make a first link update to have the internal value set in case the link is consistent until the dev_start() (i.e. up before the probing), otherwise, the returned value from rte_eth_link_get_nowait() will be down when the device is started. Regards, -- Nélio Laranjeiro 6WIND