From: Jerin Jacob
Date: Fri, 20 Dec 2019 12:25:41 +0530
To: Gavin Hu
Cc: dpdk-dev, nd, David Marchand, "thomas@monjalon.net", "rasland@mellanox.com", "maxime.coquelin@redhat.com", "tiwei.bie@intel.com", "hemant.agrawal@nxp.com", "jerinj@marvell.com", Pavan Nikhilesh, Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong, Steve Capper
Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
References: <1571758074-16445-1-git-send-email-gavin.hu@arm.com> <1576811391-19131-1-git-send-email-gavin.hu@arm.com> <1576811391-19131-2-git-send-email-gavin.hu@arm.com>

On Fri, Dec 20, 2019 at 12:02 PM Gavin Hu wrote:
>
> Hi Jerin,

Hi Gavin,

> > > > > > The peripheral coherence order for a memory-mapped peripheral
> > > > > > signifies the order in which accesses arrive at the endpoint.
> > > > > > For a read or a write RW1 and a read or a write RW2 to the same
> > > > > > peripheral, then RW1 will appear in the peripheral coherence
> > > > > > order for the peripheral before RW2 if either of the following
> > > > > > cases apply:
> > > > > > 1. RW1 and RW2 are accesses using Non-cacheable or Device
> > > > > >    attributes and RW1 is Ordered-before RW2.
> > > > > > 2. RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE
> > > > > >    attributes and RW1 appears in program order before RW2.
> > > > >
> > > > > This is true if RW1 and RW2 addresses are device memory, i.e. the
> > > > > registers in the PCI BAR address.
> > > > > If RW1 is a DDR address which is being used by the controller
> > > > > (say, a NIC ring descriptor), then there will be an issue.
> > > > > For example, the Intel i40e driver updates the admin queue in
> > > > > host DDR memory and then updates the doorbell.
> > > > > In such a case, this patch will create an issue. Correct? Have you
> > > > > checked this patch with ARM64 + XL710 controllers?
> > >
> > > This patch relaxes the rte_io_*mb barriers for pure PCI device memory
> > > accesses.
> >
> > Yes. This would break cases of mixed access in the i40e driver.
> >
> > > For mixed accesses of DDR and PCI device memory, rte_smp_*mb (DMB ISH)
> > > is not sufficient.
> > > But rte_cio_*mb (DMB OSH) is sufficient and can be used.
> >
> > Yes. Let me share a bit of history.
> >
> > 1) There are a lot of drivers (initially developed on x86) that have
> > mixed access and don't have any barriers, as x86 does not need them.
> > 2) rte_io was introduced to fix that.
> > 3) Item (2) introduced performance issues in the fast path; as an
> > optimization, rte_cio_* was introduced.
>
> Exactly, this patch is to mitigate the performance issues introduced by
> rte_io ('dsb' is too much and unnecessary here).
> rte_cio instead is definitely required for mixed access.
>
> > So in the current scheme of things, we have APIs to fix the
> > portability issue (rte_io) and the performance issue (rte_cio).
> > IMO, we may not need any change in infra code now. If you think the
> > documentation is missing, then we can enhance it.
> > If we make an infra change, then the drivers again need to be updated
> > and tested.
>
> No changes for rte_cio; the semantics and definitions of rte_io do not
> change either, if the scope is limited to PCI, which is the case in the
> DPDK context(?).
> The change lies only in the implementation, right?
>
> Just looked at the link you shared and found the i40e driver is missing
> rte_cio_*mb in i40e_asq_send_command, but the old rte_io_*mb rescued it.
> Will submit a new patch in this series to use rte_cio together with the
> new relaxed rte_io and do more tests.
>
> Yes, this is a big change, also a big optimization, for aarch64; in our
> tests it has very positive results.

It will be an optimization only if we are changing the fast path. In
the slow path, it does not matter.
I think the first step should be to use rte_cio_* wherever coherent
memory is used in the _fast path_. I think almost every driver has
fixed that.

I am not against this patch (changing the slow path to use rte_cio*
instead of rte_io*, and the virtio changes associated with that).
If you are taking that patch, pay attention to all the drivers in the
tree which are using rte_io* for mixed access in the slow path.

> But as in the i40e case, we must pay attention to where rte_cio was
> missing but rescued by the old rte_io (but not by the new rte_io).
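
For readers following the thread, the "mixed access" pattern discussed
above (a descriptor written to host DDR memory, then a doorbell write to
device memory) can be sketched roughly as below. This is a minimal,
portable illustration and not code from any DPDK driver: every demo_*
name is hypothetical, the doorbell is plain memory instead of a PCI BAR
mapping, and the C11 release fence is only a placeholder marking where a
driver would call rte_cio_wmb(). The fence is not claimed to give the
device-visible ordering that the thread attributes to DMB OSH on aarch64.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8

/* Hypothetical descriptor in ordinary host memory (DDR), which a device
 * would read by DMA. The layout is made up for illustration. */
struct demo_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
};

/* Hypothetical queue: a descriptor ring plus a doorbell pointer. In a
 * real driver the doorbell would point into device (PCI BAR) memory. */
struct demo_queue {
    struct demo_desc ring[DEMO_RING_SIZE];
    volatile uint32_t *doorbell;
    uint32_t tail;
};

/* Placeholder for rte_cio_wmb(). ASSUMPTION: a C11 release fence is used
 * only so the sketch compiles anywhere; it does not model DMB OSH. */
static inline void demo_cio_wmb(void)
{
    atomic_thread_fence(memory_order_release);
}

/* Post one descriptor: write DDR first, then the barrier, then the
 * doorbell. Returns the ring slot that was filled. */
uint32_t demo_post(struct demo_queue *q, uint64_t addr, uint32_t len)
{
    assert(q != NULL && q->doorbell != NULL);

    uint32_t slot = q->tail % DEMO_RING_SIZE;

    /* Step 1: descriptor update in host memory (the DDR access). */
    q->ring[slot].addr = addr;
    q->ring[slot].len = len;
    q->ring[slot].flags = 1; /* hypothetical "ready" flag */

    /* Step 2: the descriptor must be visible before the doorbell write.
     * This is the spot where the thread says rte_cio_*mb is required
     * once rte_io_*mb is relaxed. */
    demo_cio_wmb();

    /* Step 3: doorbell write (the device memory access). */
    q->tail++;
    *q->doorbell = q->tail;

    return slot;
}
```

The point of the sketch is only the position of the barrier: it sits
between the DDR write and the doorbell write, which is the place where
i40e_asq_send_command was reported above to rely on the old rte_io_*mb.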