From: Luca Boccassi
To: Tiwei Bie
Cc: dev@dpdk.org, maxime.coquelin@redhat.com, zhihong.wang@intel.com, bruce.richardson@intel.com, Brian Russell
Date: Thu, 16 Aug 2018 11:27:33 +0100
Message-ID: <1534415253.5764.42.camel@debian.org>
In-Reply-To: <20180816064600.GA17647@debian>
Subject: Re: [dpdk-dev] [PATCH 2/2] virtio: fix PCI config err handling

On Thu, 2018-08-16 at 14:46 +0800, Tiwei Bie wrote:
> On Wed, Aug 15, 2018 at 10:50:57AM +0100, Luca Boccassi wrote:
> > On Wed, 2018-08-15 at 11:11 +0800, Tiwei Bie wrote:
> > > On Tue, Aug 14, 2018 at 03:30:35PM +0100, Luca Boccassi wrote:
> > > > From: Brian Russell
> > > >
> > > > In virtio_read_caps, rte_pci_read_config returns the number of
> > > > bytes read from PCI config or < 0 on error.
> > > > If less than the expected number of bytes are read then log the
> > > > failure and return rather than carrying on with garbage.
> > >
> > > Is this a fix or an improvement?
> > > Or did you see anything broken without this patch?
> > > If so, we may need a fixes line and Cc stable.
> >
> > It is a fix, as it was creating problems in production due to the
> > constant flux of errors in the logs.
>
> Could you be a bit more specific about which errors
> were logged if possible?
>
> If my understanding is correct, you mean the errors
> were logged because less than the required amount of
> bytes were read?

Yes. On Linux, rte_pci_read_config() returns not just 0/-1 but the
actual number of bytes read. If that is less than the required amount,
the code carries on and parses garbage, which causes errors later in
the execution. Checking that we actually got the amount of data we
need fixes this issue.

> > But given patch 1/2 is effectively doing a small change in the BSD
> > bus API, and it's a requirement for 2/2, I don't think we can
> > include it in the stable releases unfortunately.
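To make the return-value point above concrete: a read that returns a positive count smaller than requested passes a `ret < 0` check, so the caller goes on with a partially filled buffer. The standalone sketch below is not DPDK code; it assumes a plain file descriptor read with POSIX pread() as a stand-in for config-space access, and `read_exact()` is a hypothetical helper name. It shows the "anything other than the full length is a failure" check the patch applies.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a DPDK API): read exactly len bytes at
 * offset from fd. Anything other than a full read, i.e. an error
 * (ret < 0) or a short read (0 <= ret < len), is reported as failure,
 * so the caller never parses a partially filled buffer.
 * Returns 0 on success, -1 otherwise.
 */
static int read_exact(int fd, void *buf, size_t len, off_t offset)
{
	ssize_t ret = pread(fd, buf, len, offset);

	if (ret < 0 || (size_t)ret != len) {
		/* Debug-level message: the caller decides how to recover. */
		fprintf(stderr, "debug: read at 0x%lx returned %zd, wanted %zu\n",
			(unsigned long)offset, ret, len);
		return -1;
	}
	return 0;
}
```

For example, on a 64-byte file, `read_exact(fd, buf, 16, 56)` gets only 8 bytes back from pread() and returns -1, whereas a bare `ret < 0` check would have accepted that read.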
>
> If it's a fix, we need a fixes line.

Sure, will send a v2.

>
> [...]
> > > > @@ -567,16 +567,18 @@ virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
> > > >  	}
> > > >  
> > > >  	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
> > > > -	if (ret < 0) {
> > > > -		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
> > > > +	if (ret != 1) {
> > > > +		PMD_INIT_LOG(DEBUG,
> > > > +			     "failed to read pci capability list, ret %d", ret);
> > > >  		return -1;
> > > >  	}
> > > >  
> > > >  	while (pos) {
> > > >  		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
> > > > -		if (ret < 0) {
> > > > -			PMD_INIT_LOG(ERR,
> > > > -				"failed to read pci cap at pos: %x", pos);
> > > > +		if (ret != sizeof(cap)) {
>
> Above code has to successfully read a full virtio
> PCI capability during each read, otherwise it will
> give up reading other capabilities and may fallback
> to the legacy mode. In which case it will fail to
> read the requested amount of bytes? Should we try
> to read the generic PCI fields first?

I do not know exactly what causes fewer than the required bytes to be
read, but we have seen it happen in production (not 100% of the time,
though, so I think it's worth keeping the structure as-is). As you
said, in that case it falls back to legacy mode, which, in our
production deployments, then succeeds. That's why the error-level
print is undesired: the code actually works via the fallback, but
customers see scary errors in the logs and open escalations :-)

> Besides, you also need to update other calls to
> rte_pci_read_config(), e.g.:
>
> https://github.com/DPDK/dpdk/blob/76b9d9de5c7d/drivers/net/virtio/virtio_pci.c#L696
>
> Thanks

Sure, I will apply the same changes in v2.

-- 
Kind regards,
Luca Boccassi
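The fallback behaviour discussed in this thread can be sketched as a simplified capability walk. This is an illustration under stated assumptions, not the virtio PMD's code: `fake_read_config()` and `struct fake_dev` are hypothetical stand-ins that mimic the Linux rte_pci_read_config() contract (bytes read, possibly short, or < 0), `struct cap_hdr` is a cut-down header rather than the real `struct virtio_pci_cap`, and the messages go to stderr instead of PMD_INIT_LOG. Returning -1 on any short read is what would let a caller fall back to legacy mode without an error-level log.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PCI_CAPABILITY_LIST 0x34 /* offset of the first capability pointer */
#define PCI_CAP_ID_VNDR     0x09 /* vendor-specific capability ID */
#define MAX_CAP_HOPS        48   /* (256 - 64) / 4: guard against loops */

/* Hypothetical device: a config-space snapshot of which only the first
 * cfg_len bytes are readable, to simulate short reads. */
struct fake_dev {
	const uint8_t *cfg;
	size_t cfg_len;
};

/* Hypothetical reader mimicking rte_pci_read_config() on Linux:
 * returns the number of bytes read (possibly < len) or -1 on error. */
static int fake_read_config(const struct fake_dev *dev, void *buf,
			    size_t len, size_t offset)
{
	size_t avail;

	if (offset >= dev->cfg_len)
		return -1;
	avail = dev->cfg_len - offset;
	if (len > avail)
		len = avail; /* short read */
	memcpy(buf, dev->cfg + offset, len);
	return (int)len;
}

/* Cut-down capability header (not the real struct virtio_pci_cap). */
struct cap_hdr {
	uint8_t cap_vndr;
	uint8_t cap_next;
	uint8_t cap_len;
	uint8_t cfg_type;
};

/* Counts vendor-specific capabilities. Returns -1 on any error or short
 * read, logging at "debug" level only, so a caller could quietly fall
 * back to legacy mode. */
static int walk_caps(const struct fake_dev *dev)
{
	struct cap_hdr cap;
	uint8_t pos;
	int ret, hops = 0, found = 0;

	ret = fake_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
	if (ret != 1) {
		fprintf(stderr, "debug: failed to read cap list, ret %d\n", ret);
		return -1;
	}

	while (pos && hops++ < MAX_CAP_HOPS) {
		ret = fake_read_config(dev, &cap, sizeof(cap), pos);
		if (ret != (int)sizeof(cap)) {
			fprintf(stderr, "debug: short cap read at 0x%x, ret %d\n",
				(unsigned int)pos, ret);
			return -1;
		}
		if (cap.cap_vndr == PCI_CAP_ID_VNDR)
			found++;
		pos = cap.cap_next;
	}
	return found;
}
```

Comparing `ret` against the exact requested length covers both errors and short reads in one check. The hop limit is an extra guard against a malformed, looping capability list and is not part of the patch.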