From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id D1C99A0540 for ; Mon, 27 Jul 2020 19:33:18 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 57B9A1C027; Mon, 27 Jul 2020 19:33:18 +0200 (CEST) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id DFA9F1023; Mon, 27 Jul 2020 19:33:14 +0200 (CEST) IronPort-SDR: 0asNhBOS0RRmB8DbQqpdmslwiQMTd/jn8k+Wb8od7Jk4SRStAuOK/YyNIKTgl6xD38GYcYUMKz GDADtJvDNJRA== X-IronPort-AV: E=McAfee;i="6000,8403,9695"; a="169195744" X-IronPort-AV: E=Sophos;i="5.75,402,1589266800"; d="scan'208";a="169195744" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jul 2020 10:33:13 -0700 IronPort-SDR: IrtTWVcmDMXyZ5CLrxdSnaE+DfrsURfvprhdmV5eToDfGM5+Mk7gcnMkQkk9rI2S/316gUi7H7 FCV3FrEpPyjA== X-IronPort-AV: E=Sophos;i="5.75,402,1589266800"; d="scan'208";a="434015869" Received: from fyigit-mobl.ger.corp.intel.com (HELO [10.213.196.62]) ([10.213.196.62]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jul 2020 10:33:12 -0700 To: Stephen Hemminger , Thomas Monjalon Cc: dev@dpdk.org, stable@dpdk.org References: <20191222175551.17684-1-stephen@networkplumber.org> <3101970.h16uAIiOU7@xps> <20200505171454.00274f10@hermes.lan> From: Ferruh Yigit Autocrypt: addr=ferruh.yigit@intel.com; keydata= mQINBFXZCFABEADCujshBOAaqPZpwShdkzkyGpJ15lmxiSr3jVMqOtQS/sB3FYLT0/d3+bvy qbL9YnlbPyRvZfnP3pXiKwkRoR1RJwEo2BOf6hxdzTmLRtGtwWzI9MwrUPj6n/ldiD58VAGQ +iR1I/z9UBUN/ZMksElA2D7Jgg7vZ78iKwNnd+vLBD6I61kVrZ45Vjo3r+pPOByUBXOUlxp9 GWEKKIrJ4eogqkVNSixN16VYK7xR+5OUkBYUO+sE6etSxCr7BahMPKxH+XPlZZjKrxciaWQb +dElz3Ab4Opl+ZT/bK2huX+W+NJBEBVzjTkhjSTjcyRdxvS1gwWRuXqAml/sh+KQjPV1PPHF YK5LcqLkle+OKTCa82OvUb7cr+ALxATIZXQkgmn+zFT8UzSS3aiBBohg3BtbTIWy51jNlYdy ezUZ4UxKSsFuUTPt+JjHQBvF7WKbmNGS3fCid5Iag4tWOfZoqiCNzxApkVugltxoc6rG2TyX CmI2rP0mQ0GOsGXA3+3c1MCdQFzdIn/5tLBZyKy4F54UFo35eOX8/g7OaE+xrgY/4bZjpxC1 1pd66AAtKb3aNXpHvIfkVV6NYloo52H+FUE5ZDPNCGD0/btFGPWmWRmkPybzColTy7fmPaGz cBcEEqHK4T0aY4UJmE7Ylvg255Kz7s6wGZe6IR3N0cKNv++O7QARAQABtCVGZXJydWggWWln aXQgPGZlcnJ1aC55aWdpdEBpbnRlbC5jb20+iQJsBBMBCgBWAhsDAh4BAheABQsJCAcDBRUK CQgLBRYCAwEABQkKqZZ8FiEE0jZTh0IuwoTjmYHH+TPrQ98TYR8FAl6ha3sXGHZrczovL2tl eXMub3BlbnBncC5vcmcACgkQ+TPrQ98TYR8uLA//QwltuFliUWe60xwmu9sY38c1DXvX67wk UryQ1WijVdIoj4H8cf/s2KtyIBjc89R254KMEfJDao/LrXqJ69KyGKXFhFPlF3VmFLsN4XiT PSfxkx8s6kHVaB3O183p4xAqnnl/ql8nJ5ph9HuwdL8CyO5/7dC/MjZ/mc4NGq5O9zk3YRGO lvdZAp5HW9VKW4iynvy7rl3tKyEqaAE62MbGyfJDH3C/nV/4+mPc8Av5rRH2hV+DBQourwuC ci6noiDP6GCNQqTh1FHYvXaN4GPMHD9DX6LtT8Fc5mL/V9i9kEVikPohlI0WJqhE+vQHFzR2 1q5nznE+pweYsBi3LXIMYpmha9oJh03dJOdKAEhkfBr6n8BWkWQMMiwfdzg20JX0o7a/iF8H 4dshBs+dXdIKzPfJhMjHxLDFNPNH8zRQkB02JceY9ESEah3wAbzTwz+e/9qQ5OyDTQjKkVOo cxC2U7CqeNt0JZi0tmuzIWrfxjAUulVhBmnceqyMOzGpSCQIkvalb6+eXsC9V1DZ4zsHZ2Mx Hi+7pCksdraXUhKdg5bOVCt8XFmx1MX4AoV3GWy6mZ4eMMvJN2hjXcrreQgG25BdCdcxKgqp e9cMbCtF+RZax8U6LkAWueJJ1QXrav1Jk5SnG8/5xANQoBQKGz+yFiWcgEs9Tpxth15o2v59 gXK5Ag0EV9ZMvgEQAKc0Db17xNqtSwEvmfp4tkddwW9XA0tWWKtY4KUdd/jijYqc3fDD54ES YpV8QWj0xK4YM0dLxnDU2IYxjEshSB1TqAatVWz9WtBYvzalsyTqMKP3w34FciuL7orXP4Ai bPtrHuIXWQOBECcVZTTOdZYGAzaYzxiAONzF9eTiwIqe9/oaOjTwTLnOarHt16QApTYQSnxD UQljeNvKYt1lZE/gAUUxNLWsYyTT+22/vU0GDUahsJxs1+f1yEr+OGrFiEAmqrzpF0lCS3f/ 3HVTU6rS9cK3glVUeaTF4+1SK5ZNO35piVQCwphmxa+dwTG/DvvHYCtgOZorTJ+OHfvCnSVj sM4kcXGjJPy3JZmUtyL9UxEbYlrffGPQI3gLXIGD5AN5XdAXFCjjaID/KR1c9RHd7Oaw0Pdc q9UtMLgM1vdX8RlDuMGPrj5sQrRVbgYHfVU/TQCk1C9KhzOwg4Ap2T3tE1umY/DqrXQgsgH7 1PXFucVjOyHMYXXugLT8YQ0gcBPHy9mZqw5mgOI5lCl6d4uCcUT0l/OEtPG/rA1lxz8ctdFB VOQOxCvwRG2QCgcJ/UTn5vlivul+cThi6ERPvjqjblLncQtRg8izj2qgmwQkvfj+h7Ex88bI 8iWtu5+I3K3LmNz/UxHBSWEmUnkg4fJlRr7oItHsZ0ia6wWQ8lQnABEBAAGJAjwEGAEKACYC GwwWIQTSNlOHQi7ChOOZgcf5M+tD3xNhHwUCXqFrngUJCKxSYAAKCRD5M+tD3xNhH3YWD/9b cUiWaHJasX+OpiuZ1Li5GG3m9aw4lR/k2lET0UPRer2Jy1JsL+uqzdkxGvPqzFTBXgx/6Byz EMa2mt6R9BCyR286s3lxVS5Bgr5JGB3EkpPcoJT3A7QOYMV95jBiiJTy78Qdzi5LrIu4tW6H o0MWUjpjdbR01cnj6EagKrDx9kAsqQTfvz4ff5JIFyKSKEHQMaz1YGHyCWhsTwqONhs0G7V2 0taQS1bGiaWND0dIBJ/u0pU998XZhmMzn765H+/MqXsyDXwoHv1rcaX/kcZIcN3sLUVcbdxA WHXOktGTQemQfEpCNuf2jeeJlp8sHmAQmV3dLS1R49h0q7hH4qOPEIvXjQebJGs5W7s2vxbA 5u5nLujmMkkfg1XHsds0u7Zdp2n200VC4GQf8vsUp6CSMgjedHeF9zKv1W4lYXpHp576ZV7T GgsEsvveAE1xvHnpV9d7ZehPuZfYlP4qgo2iutA1c0AXZLn5LPcDBgZ+KQZTzm05RU1gkx7n gL9CdTzVrYFy7Y5R+TrE9HFUnsaXaGsJwOB/emByGPQEKrupz8CZFi9pkqPuAPwjN6Wonokv ChAewHXPUadcJmCTj78Oeg9uXR6yjpxyFjx3vdijQIYgi5TEGpeTQBymLANOYxYWYOjXk+ae dYuOYKR9nbPv+2zK9pwwQ2NXbUBystaGyQ== Message-ID: Date: Mon, 27 Jul 2020 18:33:08 +0100 MIME-Version: 1.0 In-Reply-To: <20200505171454.00274f10@hermes.lan> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] kni: fix kernel deadlock when using mlx devices X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: stable-bounces@dpdk.org Sender: "stable" On 5/6/2020 1:14 AM, Stephen Hemminger wrote: > On Wed, 18 Mar 2020 16:17:57 +0100 > Thomas Monjalon wrote: > >> 17/01/2020 17:43, Ferruh Yigit: >>> On 12/22/2019 5:55 PM, Stephen Hemminger wrote: >>>> This fixes a deadlock when using KNI with bifurcated drivers. >>>> Bringing kni device up always times out when using Mellanox >>>> devices. >>>> >>>> The kernel KNI driver sends message to userspace to complete >>>> the request. For the case of bifurcated driver, this may involve >>>> an additional request to kernel to change state. This request >>>> would deadlock because KNI was holding the RTNL mutex. >>>> >>>> This was a bad design which goes back to the original code. >>>> A workaround is for KNI driver to drop RTNL while waiting. >>>> To prevent the device from disappearing while the operation >>>> is in progress, it needs to hold reference to network device >>>> while waiting. >>>> >>>> As an added benefit, an useless error check can also be removed. >>>> >>>> Fixes: 3fc5ca2f6352 ("kni: initial import") >>>> Cc: stable@dpdk.org >>>> Signed-off-by: Stephen Hemminger >>>> --- >>> >>> This patch cause a hang on my server, not sure what exactly was the problem but >>> kernel log was continuously printing "Cannot send to req_q". Will dig more. >> >> Ferruh, did you have a chance to check what is hanging? >> Stephen, is there any news on your side? >> >> > > It did not hang when I tested it. The bug report is still open > Sorry for the delay, since I am working remotely I was worried about loosing the connection to my server, finally I did create a virtual environment to test again. I confirm the hang observed %100 when two different process updates the kni interface, like two different process sets the mtu. Without this patch this works fine. I understand the motivation of the patch, but with change there is a possibility to hang the server, which we can't allow, need to find another way. Can updating mlx interface wait KNI interface operation to complete?