From: Amiya Mohakud
Date: Wed, 20 Apr 2022 14:07:04 +0530
Subject: Re: DPDK:20.11.1: net/ena crash while fetching xstats
To: Stephen Hemminger
Cc: Michał Krawczyk, dev, Sachin Kanoje, Megha Punjani, Sharad Saha, Eswar Sadaram, "Brandes, Shai", ena-dev
In-Reply-To: <20220419160942.75fd8703@hermes.local>
References: <20220419080150.2511dee2@hermes.local> <20220419160942.75fd8703@hermes.local>

Hi Stephen and Michal

Thanks a lot for all the discussions and progress made on this. Appreciate it.
Sorry for the late reply. To answer your questions:

1. Is the application you're using single-process or multiprocess?
If so, from which process are you probing for the xstats?
>> The system has both primary and secondary processes running, but the stats are being fetched from the primary process only. I'm not sure whether the presence of secondary processes can cause the crash even when we fetch stats from the primary process. Can we confirm this from the code?

2. Have you tried running the latest DPDK v20.11 LTS?
>> It's DPDK v20.11.1. We did not try the latest 20.11 LTS.

3. What kernel module are you using (igb_uio/vfio-pci)?
>> It's igb_uio.

4. On what AWS instance type was it reproduced?
>> It's c5n.2xlarge (8 cores; 1 primary process and 6 secondary processes).

5. Is the Seg Fault happening the first time you call for the xstats?
>> Yes. That's correct.

Regards
Amiya



On Wed, Apr 20, 2022 at 4:39 AM Stephen Hemminger <stephen@networkplumber.org> wrote:
> On Tue, 19 Apr 2022 22:27:32 +0200
> Michał Krawczyk <mk@semihalf.com> wrote:

> > Thanks Stephen, indeed the issue reproduces in the secondary process.
> >
> > Basically ENA v2.2.1 is not MP aware, meaning it cannot be used safely
> > from the secondary process. The main obstacle is the admin queue, which
> > processes the hardware requests and can be used safely only from the
> > primary process. It's not strictly a bug, as we weren't exposing
> > 'MP Awareness' in the PMD features list; it's more like a lack of
> > proper MP support.
> >
> > The latest ENA PMD release should be MP safe. We currently don't have
> > a PMD backport ready for the older LTS release (but we're planning to do
> > so for ENA v2.6.0 on the amzn-drivers repository:
> > https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk).

> I wish that ENA did not have its own versioning scheme.
> Driver versions are meaningful only to the driver writer/vendor, they
> don't help the end user.
>
> Since backporting is not part of the stable process, I suggest doing what
> XDP did for 21.11 and earlier releases.

> diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
> index 634c97acf60d..3778349f3fe9 100644
> --- a/drivers/net/ena/ena_ethdev.c
> +++ b/drivers/net/ena/ena_ethdev.c
> @@ -3212,6 +3212,12 @@ static int ena_rx_queue_intr_disable(struct rte_eth_dev *dev,
>  static int eth_ena_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
>  	struct rte_pci_device *pci_dev)
>  {
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +		PMD_INIT_LOG(ERR,
> +			"Ena PMD does not support secondary processes\n");
> +		return -ENOTSUP;
> +	}
> +
>  	return rte_eth_dev_pci_generic_probe(pci_dev,
>  		sizeof(struct ena_adapter), eth_ena_dev_init);
>  }
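
For readers who want to see the control flow of the guard in the patch quoted above in isolation, here is a minimal, self-contained C sketch. It deliberately does not use DPDK: model_proc_type, model_generic_probe and model_ena_probe are hypothetical stand-ins for rte_eal_process_type(), rte_eth_dev_pci_generic_probe() and eth_ena_pci_probe(). The only assumption carried over from the patch is that a secondary process is refused with -ENOTSUP before any device state is touched.

```c
#include <errno.h>
#include <stdio.h>

/* Stand-in for DPDK's process-type enum; hypothetical names, not the real API. */
enum model_proc_type {
	MODEL_PROC_PRIMARY,
	MODEL_PROC_SECONDARY
};

/* Hypothetical stand-in for the generic probe path taken on success. */
static int model_generic_probe(void)
{
	return 0;
}

/*
 * Models the guard from the quoted patch: refuse to probe in a secondary
 * process, otherwise fall through to the normal probe path.
 */
static int model_ena_probe(enum model_proc_type type)
{
	if (type == MODEL_PROC_SECONDARY) {
		fprintf(stderr, "ENA PMD does not support secondary processes\n");
		return -ENOTSUP;
	}

	return model_generic_probe();
}
```

With a guard of this shape, a secondary process would never get an ENA port at all, so xstats queries (and anything else that needs the admin queue) would have to stay in the primary process.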