From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f180.google.com (mail-wr0-f180.google.com [209.85.128.180]) by dpdk.org (Postfix) with ESMTP id 01C061B7C7 for ; Mon, 9 Apr 2018 15:26:27 +0200 (CEST) Received: by mail-wr0-f180.google.com with SMTP id 80so9590957wrb.2 for ; Mon, 09 Apr 2018 06:26:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=A2fPuvvEdS65ZT4z1DPMkajuqfC3x5dgBz1WjfbrX5o=; b=VvkOb9VZjNDj212D23aG7X8jCLWvxfrIZuWhoqEGuMKpCAB0m1fd5xYr34fxwkYHjw FqjMzFULZj3Za+XmsFWDKozI7sYU5zTCHQ79Hs9qV/ODqEWlAkxwM8BrNGbBseV+VaHN EnpuOlsilWnrw0RKma5p3CM02p21PBt/+Qb3uPXQXZqbjTGpwL3LsNnppUav2ZM+x29o pwKhtSormyAT07ZvDOFRsk8Njp5VcegVxA6w309avZdEtuiTVygbzE9f9C84xvr/ujrG pbKmjhQ9V0iFlBrOT3YZ+ENTG7ipDadoKXb1dQfO4qLfXTOJ61vnlvV1WF/xSW/AJGPj 3dqw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=A2fPuvvEdS65ZT4z1DPMkajuqfC3x5dgBz1WjfbrX5o=; b=ObFy4rrDprMfLqQVG2/Zev7NQycDzkbYsyBOEp35qU2U9X9G1B2Dtcj6jg+/26bP24 IHR8eT9Ai79CeYxAg7cCF1FYG+RWXhV/+eKHkiwxSXMOhOHHVbwkwqIgLhsD7pq/1bT0 w/uwACsTq16drkHonl372nQFcFV2+KjF1azBKH+5aNwPF7KAqxH3TfLw7lV5SlSsh3FT OmW4JexZPUiZKSPNgdq2v6XCgc2pbtXtiZsSu/tIR7Pgwjr7OuiIFkiwby0J6CRWJCt4 ES3e3/ON91rFsV3zpd5WrIz66QnCCZABTCmjx+qo2y9Y8uSlwqSrIgtaF7XrT2Fdwkj1 R+4w== X-Gm-Message-State: AElRT7Gf2B3QJnqMzDPl0yvc33Qgsce8U7q2sbhOkvRysLkzdijx/k4Q iVFEEuXs9uKUuZJuj56Rz7u58iWpsQ== X-Google-Smtp-Source: AIpwx4/1SmGalPFgcz49AmjVxgDSY+0K65e8Ow+THrMYZ9kVT2ZnhrJGxEMt+4uI0PMwkdJfsX9PtA== X-Received: by 10.223.225.4 with SMTP id d4mr24288625wri.24.1523280386670; Mon, 09 Apr 2018 06:26:26 -0700 (PDT) Received: from laranjeiro-vm.dev.6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id u127sm866241wmd.30.2018.04.09.06.26.25 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 09 Apr 2018 06:26:25 -0700 (PDT) Date: Mon, 9 Apr 2018 15:26:41 +0200 From: =?iso-8859-1?Q?N=E9lio?= Laranjeiro To: Shahaf Shuler Cc: Adrien Mazarguil , Yongseok Koh , "dev@dpdk.org" Message-ID: <20180409132641.a4s4eamfgs5vea3l@laranjeiro-vm.dev.6wind.com> References: <20180403044817.27457-1-shahafs@mellanox.com> <20180404073009.zgqu3yrj26trwdfr@laranjeiro-vm.dev.6wind.com> <20180404121051.ersiyf75gykwfon5@laranjeiro-vm.dev.6wind.com> <20180405065120.jqttkvlhjflpfdbj@laranjeiro-vm.dev.6wind.com> <20180409082736.spzxnp3bbcllakui@laranjeiro-vm.dev.6wind.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: NeoMutt/20170113 (1.7.2) Subject: Re: [dpdk-dev] [PATCH] net/mlx5: fix link status initialization X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 Apr 2018 13:26:27 -0000 On Mon, Apr 09, 2018 at 12:28:04PM +0000, Shahaf Shuler wrote: > Monday, April 9, 2018 11:28 AM, Nélio Laranjeiro: > > Subject: Re: [PATCH] net/mlx5: fix link status initialization > > > > On Sun, Apr 08, 2018 at 01:09:27PM +0000, Shahaf Shuler wrote: > > > Thursday, April 5, 2018 9:51 AM, Nélio Laranjeiro: > > > > Subject: Re: [PATCH] net/mlx5: fix link status initialization > > > > > > > > On Thu, Apr 05, 2018 at 05:35:57AM +0000, Shahaf Shuler wrote: > > > > > Wednesday, April 4, 2018 3:11 PM, Nélio Laranjeiro: > > > > > > Subject: Re: [PATCH] net/mlx5: fix link status initialization > > > > > > > > > > > > On Wed, Apr 04, 2018 at 09:58:33AM +0000, Shahaf Shuler wrote: > > > > > > > Wednesday, April 4, 2018 10:30 AM, Nélio Laranjeiro: > > > > > > > > Subject: Re: [PATCH] net/mlx5: fix link status > > > > > > > > initialization > > > > > > > > > > > > > > > > On Tue, Apr 03, 2018 at 07:48:17AM +0300, Shahaf Shuler wrote: > > > > > > [..] > > > > > > > > > > > > > > > According to your analysis, this is only necessary when the LCS > > > > > > is configured in the device. Why not adding this call to > > > > > > mlx5_dev_interrupt_handler_install() which is responsible to > > > > > > install the LCS callback. > > > > > > > > > > I think it is good practice whether or not LSC is set. > > > > > The link status should be initialized to the correct value after the probe. > > > > > > > > There is no guarantee the link will be accurate, at probe time the > > > > link may be up so internal information has a status up with a speed with > > this patch. > > > > The application probes a second port, in the mean time the link of > > > > the first port goes down, the interrupt is still not installed and > > > > the internal status becomes wrong (still up whereas the port is down). > > > > > > > > Finally at start, the device installs the handler, but the link is > > > > still down whereas internally it is up, the application will call > > > > rte_eth_link_get_nowait() which will directly copy the wrong > > > > internal status to the application. > > > > > > This is not correct. > > > Using Verbs, the async_fd on which link status interrupts are reported is > > created on probe. > > > Even if the interrupt handler is not installed, interrupts still > > > trigger on this fd. They will be processed when the interrupt handler > > > will be installed as part of the port start. > > > So in fact you have the whole trace on the link status changes waiting > > > to be processed upon port start. > > > > Right, but in such case, this patch still does not solves the issue. > > Until the dev_start() is called, the link may be inconsistent with the real > > status. > > > > example: > > > > pci_probe --> link is up. > > leaving pci probe, the link goes downs --> internally the PMD has a link up. > > > > Until the dev_start() is called any call to rte_eth_link_get_nowait() will copy > > the internal PMD status which is not accurate. > > > > From this point, the issue seems to fully come from > > rte_eth_link_get_nowait() which should not make any copy until the port is > > not started, until then the link may be inconsistent between the driver and > > the device. > > Right. However fixing it only on the ethdev layer is not enough. > > Assume the link was up from the beginning. > For the case were the call for rte_eth_link_get_nowait happens only > after the port start, the link will still be wrong as it will be > reported as down (and no interrupt to fix it). > > So to summarize I plan to have this patch (with modification of the > wait_to_complete) along with another fix for ethdev. > Do we have an agreement? Sorry but no, as MLX5 is not the only PMD impacted by this issue and a solution can be done for all of them (ixgbe, virtio, MLX5 I did not check the others). Your proposition does not solves the following case neither, when the link goes down and the application stops the port due to that, if the application waits using the same API to restart the port, it will never occur as the interrupts handlers are no more installed and thus the internal link will never be updated remaining indefinitely down. This also impact the three PMD above. Commit b77d21cc2364 ("ethdev: add link status get/set helper functions") has been integrated a little too fast introducing this bug. All those issues can be fixed by adding another statement in the if of the function rte_eth_link_get_nowait() - if (dev->data->dev_conf.intr_conf.lsc) + if (dev->data->dev_conf.intr_conf.lsc && dev->data->dev_started) rte_eth_linkstatus_get(dev, eth_link); else { RTE_FUNC_PTR_OR_RET(*dev->dev_ops->link_update); (*dev->dev_ops->link_update)(dev, 1); *eth_link = dev->data->dev_link; } Doing this fixes all PMD having the same bug, and remove your specific patch for MLX5 which becomes useless. Regards, -- Nélio Laranjeiro 6WIND