From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 222D943101; Fri, 25 Aug 2023 17:46:50 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 0DD5D40A7A; Fri, 25 Aug 2023 17:46:50 +0200 (CEST) Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by mails.dpdk.org (Postfix) with ESMTP id AEAEB400D5 for ; Fri, 25 Aug 2023 17:46:48 +0200 (CEST) Received: by linux.microsoft.com (Postfix, from userid 1086) id 1262A2127C8E; Fri, 25 Aug 2023 08:46:48 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 1262A2127C8E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1692978408; bh=2uDrqtKeWPFsZZPzYPlKt8S6oXiUmzf+AJtMw8rzld0=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=INGaq0QnFd8HDbdmjTMKX7R3gMQEjLisI2V2EFcrFjmTjMXAH4engPa2mqQaTxArV WkPnvzC+8unYsoJwP5JvpR+g7QGAh5hY9VcFSWelkdsEiktvZ/QIErE3GspPGFhjLH P6bXl5/W2KA2D/ToEspnhi65jvn3ZI+KT40Dqt1M= Date: Fri, 25 Aug 2023 08:46:48 -0700 From: Tyler Retzlaff To: David Marchand Cc: Maxime Coquelin , Morten =?iso-8859-1?Q?Br=F8rup?= , dev@dpdk.org, Bruce Richardson , Konstantin Ananyev , Ciara Power , thomas@monjalon.net Subject: Re: [PATCH v11 06/16] eal: use prefetch intrinsics Message-ID: <20230825154648.GB15412@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net> References: <1680558751-17931-1-git-send-email-roretzla@linux.microsoft.com> <1691781658-32520-1-git-send-email-roretzla@linux.microsoft.com> <1691781658-32520-7-git-send-email-roretzla@linux.microsoft.com> <98CBD80474FA8B44BF855DF32C47DC35D87B32@smartserver.smartshare.dk> <20230824155358.GA17491@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Fri, Aug 25, 2023 at 10:44:01AM +0200, David Marchand wrote: > On Thu, Aug 24, 2023 at 5:54 PM Tyler Retzlaff > wrote: > > > > On Thu, Aug 24, 2023 at 04:18:06PM +0200, David Marchand wrote: > > > On Thu, Aug 24, 2023 at 2:47 PM Morten Brørup wrote: > > > > > From: David Marchand [mailto:david.marchand@redhat.com] > > > > > However, I am a bit puzzled why the prefetch change makes the compiler > > > > > consider this loop differently. > > > > > We have the same constructs everywhere in this library and x86_64 > > > > > builds are fine... > > > > > > > > That is indeed the relevant question here! > > > > > > > > Perhaps the compiler somehow ignores the "const" part of the parameter given to the "asm" (in rte_prefetch0()) for 64 bit arch, but not for 32 bit arch? > > > > > > It is possible to reproduce the issue with current DPDK tree (with > > > existing prefetch implementation in asm) and removing all > > > rte_prefetch0 calls from the async rx path: > > > > > > diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c > > > index d7624d18c8..6f941cf27d 100644 > > > --- a/lib/vhost/virtio_net.c > > > +++ b/lib/vhost/virtio_net.c > > > @@ -748,8 +748,6 @@ map_one_desc(struct virtio_net *dev, struct > > > vhost_virtqueue *vq, > > > if (unlikely(!desc_addr)) > > > return -1; > > > > > > - rte_prefetch0((void *)(uintptr_t)desc_addr); > > > - > > > buf_vec[vec_id].buf_iova = desc_iova; > > > buf_vec[vec_id].buf_addr = desc_addr; > > > buf_vec[vec_id].buf_len = desc_chunck_len; > > > @@ -1808,8 +1806,6 @@ virtio_dev_rx_async_submit_split(struct > > > virtio_net *dev, struct vhost_virtqueue > > > */ > > > avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE); > > > > > > - rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]); > > > - > > > async_iter_reset(async); > > > > > > for (pkt_idx = 0; pkt_idx < count; pkt_idx++) { > > > @@ -1997,7 +1993,6 @@ virtio_dev_rx_async_packed_batch_enqueue(struct > > > virtio_net *dev, > > > uint16_t i; > > > > > > vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { > > > - rte_prefetch0((void *)(uintptr_t)desc_addrs[i]); > > > desc = vhost_iova_to_vva(dev, vq, desc_addrs[i], > > > &lens[i], VHOST_ACCESS_RW); > > > hdrs[i] = (struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc; > > > lens[i] = pkts[i]->pkt_len + > > > @@ -2106,8 +2101,6 @@ virtio_dev_rx_async_submit_packed(struct > > > virtio_net *dev, struct vhost_virtqueue > > > uint16_t i; > > > > > > do { > > > - rte_prefetch0(&vq->desc_packed[vq->last_avail_idx]); > > > - > > > if (count - pkt_idx >= PACKED_BATCH_SIZE) { > > > if (!virtio_dev_rx_async_packed_batch(dev, vq, > > > &pkts[pkt_idx], > > > dma_id, vchan_id)) { > > > > > > > > > If any prefetch is left, the warning disappears. > > > So the asm usage probably impacts/disables some compiler feature, for > > > this code path. > > > > So as an aside one of the reasons often cited as to why intrinsics are > > preferred over inline asm is that inline asm can't be considered when > > the compiler performs optimization. It could be that when moving to use > > the intrinsics the scope of what can be optimized around the prefetch0 > > calls can be used for optimization leading to the observations here. > > Agreed. > While we decide the next steps on the vhost library, and for making > progress on this series, I decided to keep the previous asm > implementation and put intrinsics under a #ifdef MSVC. I'm fine with this, we can micro-optimize things out of the MSVC only blocks later in more targeted changes. These kinds of little things is why earlier versions of the series only used intrinsics for MSVC bah! Thanks > > -- > David Marchand