* [dpdk-dev] [PATCH v1 1/3] eal/arm64: relax the io barrier for aarch64
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
@ 2019-10-22 15:27 ` Gavin Hu
2019-10-22 15:27 ` [dpdk-dev] [PATCH v1 2/3] net/virtio: virtual PCI requires smp barriers Gavin Hu
` (8 subsequent siblings)
9 siblings, 0 replies; 35+ messages in thread
From: Gavin Hu @ 2019-10-22 15:27 UTC (permalink / raw)
To: dev
Cc: nd, david.marchand, maxime.coquelin, tiwei.bie, thomas, rasland,
matan, shahafs, viacheslavo, arybchenko, stephen, hemant.agrawal,
jerinj, pbhagavatula, Honnappa.Nagarahalli, ruifeng.wang,
phil.yang, joyce.kong, steve.capper
Armv8's peripheral coherence order is a total order on all reads and writes
to that peripheral.[1]
The peripheral coherence order for a memory-mapped peripheral signifies the
order in which accesses arrive at the endpoint. For a read or a write RW1
and a read or a write RW2 to the same peripheral, RW1 will appear in
the peripheral coherence order for the peripheral before RW2 if either of
the following cases applies:
1. RW1 and RW2 are accesses using Non-cacheable or Device attributes and
RW1 is Ordered-before RW2.
2. RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE attributes
and RW1 appears in program order before RW2.
On Arm platforms, all the PCI resources are mapped to nGnRE device memory
[2], so case 2 above holds true. That means the peripheral coherence order
applies here and just a compiler barrier is sufficient for the rte io barriers.
[1] Section B2.3.4 of ARMARM, https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci-sysfs.c?h=v5.4-rc3#n1204
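For context, a minimal sketch of what the affected macros expand to on
aarch64, simplified from DPDK's rte_atomic_64.h and rte_atomic.h of this
era (the dsb/dmb helpers are the real building blocks; the comments are
editorial):

#define dsb(opt) asm volatile("dsb " #opt : : : "memory") /* data synchronization barrier */
#define dmb(opt) asm volatile("dmb " #opt : : : "memory") /* data memory barrier */

/* Before this patch: rte_io_wmb() -> rte_wmb() -> dsb(st), a full-system
 * barrier that stalls until all prior stores have completed. */

/* After this patch: rte_io_wmb() -> rte_compiler_barrier(), which only
 * stops the compiler from reordering accesses; the Device-nGnRE mapping
 * of the PCI BARs keeps the accesses ordered in hardware. */
#define rte_compiler_barrier() do { \
	asm volatile ("" : : : "memory"); \
} while (0)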
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
index 859ae12..fd63956 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
@@ -34,11 +34,11 @@ extern "C" {
#define rte_smp_rmb() dmb(ishld)
-#define rte_io_mb() rte_mb()
+#define rte_io_mb() rte_compiler_barrier()
-#define rte_io_wmb() rte_wmb()
+#define rte_io_wmb() rte_compiler_barrier()
-#define rte_io_rmb() rte_rmb()
+#define rte_io_rmb() rte_compiler_barrier()
#define rte_cio_wmb() dmb(oshst)
--
2.7.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH v1 2/3] net/virtio: virtual PCI requires smp barriers
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
2019-10-22 15:27 ` [dpdk-dev] [PATCH v1 1/3] eal/arm64: relax the io barrier for aarch64 Gavin Hu
@ 2019-10-22 15:27 ` Gavin Hu
2019-10-22 15:27 ` [dpdk-dev] [PATCH v1 3/3] crypto/virtio: " Gavin Hu
` (7 subsequent siblings)
9 siblings, 0 replies; 35+ messages in thread
From: Gavin Hu @ 2019-10-22 15:27 UTC (permalink / raw)
To: dev
Cc: nd, david.marchand, maxime.coquelin, tiwei.bie, thomas, rasland,
matan, shahafs, viacheslavo, arybchenko, stephen, hemant.agrawal,
jerinj, pbhagavatula, Honnappa.Nagarahalli, ruifeng.wang,
phil.yang, joyce.kong, steve.capper
Unlike real PCI reads and writes to device memory, which require the io
barriers, virtual pci memory is normal memory in the smp configuration
and requires the smp barriers.
Since the smp barriers and io barriers are identical on x86 and PPC,
this change only has an effect on aarch64.
As for the peripheral coherence order of 'virtual' devices, the
architectural intent is that the Hypervisor view of things takes
precedence: since translations are made in a holistic manner as the full
stage1+stage2 regime, there is no such thing as a transaction taking on
the "EL1" mapping as far as ordering is concerned. If the Hypervisor maps
stage2 as Normal but the OS at EL1 maps it as Device-nGnRE, then it is
Normal memory and follows the ordering rules for Normal memory.
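A minimal sketch of the substitution, assuming DPDK's generic I/O helpers
(rte_read8() is a relaxed load followed by rte_io_rmb(), so the virtio
wrappers below swap that I/O barrier for the SMP barrier that Normal
memory actually needs):

#include <rte_io.h>
#include <rte_atomic.h>

/* Real PCI device memory: relaxed load + rte_io_rmb(). */
static inline uint8_t
read_isr_real_pci(const volatile void *addr)
{
	return rte_read8(addr);
}

/* Virtual PCI device memory, i.e. Normal memory backed by the hypervisor:
 * an inner-shareable SMP barrier (dmb ishld on aarch64) is sufficient. */
static inline uint8_t
read_isr_virtual_pci(const volatile void *addr)
{
	uint8_t val = rte_read8_relaxed(addr);

	rte_smp_rmb();
	return val;
}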
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
---
drivers/net/virtio/virtio_pci.c | 124 ++++++++++++++++++++++++++++++----------
1 file changed, 94 insertions(+), 30 deletions(-)
diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 4468e89..b10f33a 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -24,6 +24,70 @@
#define PCI_CAP_ID_VNDR 0x09
#define PCI_CAP_ID_MSIX 0x11
+static __rte_always_inline uint8_t
+virtio_pci_read8(const volatile void *addr)
+{
+ uint8_t val;
+ val = rte_read8_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint16_t
+virtio_pci_read16(const volatile void *addr)
+{
+ uint16_t val;
+ val = rte_read16_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint32_t
+virtio_pci_read32(const volatile void *addr)
+{
+ uint32_t val;
+ val = rte_read32_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint64_t
+virtio_pci_read64(const volatile void *addr)
+{
+ uint64_t val;
+ val = rte_read64_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline void
+virtio_pci_write8(uint8_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write8_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write16(uint16_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write16_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write32(uint32_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write32_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write64(uint64_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write64_relaxed(value, addr);
+}
+
/*
* The remaining space is defined by each driver as the per-driver
* configuration space.
@@ -260,8 +324,8 @@ const struct virtio_pci_ops legacy_ops = {
static inline void
io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
{
- rte_write32(val & ((1ULL << 32) - 1), lo);
- rte_write32(val >> 32, hi);
+ virtio_pci_write32(val & ((1ULL << 32) - 1), lo);
+ virtio_pci_write32(val >> 32, hi);
}
static void
@@ -273,13 +337,13 @@ modern_read_dev_config(struct virtio_hw *hw, size_t offset,
uint8_t old_gen, new_gen;
do {
- old_gen = rte_read8(&hw->common_cfg->config_generation);
+ old_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
p = dst;
for (i = 0; i < length; i++)
- *p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+ *p++ = virtio_pci_read8((uint8_t *)hw->dev_cfg + offset + i);
- new_gen = rte_read8(&hw->common_cfg->config_generation);
+ new_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
} while (old_gen != new_gen);
}
@@ -291,7 +355,7 @@ modern_write_dev_config(struct virtio_hw *hw, size_t offset,
const uint8_t *p = src;
for (i = 0; i < length; i++)
- rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+ virtio_pci_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
}
static uint64_t
@@ -299,11 +363,11 @@ modern_get_features(struct virtio_hw *hw)
{
uint32_t features_lo, features_hi;
- rte_write32(0, &hw->common_cfg->device_feature_select);
- features_lo = rte_read32(&hw->common_cfg->device_feature);
+ virtio_pci_write32(0, &hw->common_cfg->device_feature_select);
+ features_lo = virtio_pci_read32(&hw->common_cfg->device_feature);
- rte_write32(1, &hw->common_cfg->device_feature_select);
- features_hi = rte_read32(&hw->common_cfg->device_feature);
+ virtio_pci_write32(1, &hw->common_cfg->device_feature_select);
+ features_hi = virtio_pci_read32(&hw->common_cfg->device_feature);
return ((uint64_t)features_hi << 32) | features_lo;
}
@@ -311,53 +375,53 @@ modern_get_features(struct virtio_hw *hw)
static void
modern_set_features(struct virtio_hw *hw, uint64_t features)
{
- rte_write32(0, &hw->common_cfg->guest_feature_select);
- rte_write32(features & ((1ULL << 32) - 1),
+ virtio_pci_write32(0, &hw->common_cfg->guest_feature_select);
+ virtio_pci_write32(features & ((1ULL << 32) - 1),
&hw->common_cfg->guest_feature);
- rte_write32(1, &hw->common_cfg->guest_feature_select);
- rte_write32(features >> 32,
+ virtio_pci_write32(1, &hw->common_cfg->guest_feature_select);
+ virtio_pci_write32(features >> 32,
&hw->common_cfg->guest_feature);
}
static uint8_t
modern_get_status(struct virtio_hw *hw)
{
- return rte_read8(&hw->common_cfg->device_status);
+ return virtio_pci_read8(&hw->common_cfg->device_status);
}
static void
modern_set_status(struct virtio_hw *hw, uint8_t status)
{
- rte_write8(status, &hw->common_cfg->device_status);
+ virtio_pci_write8(status, &hw->common_cfg->device_status);
}
static uint8_t
modern_get_isr(struct virtio_hw *hw)
{
- return rte_read8(hw->isr);
+ return virtio_pci_read8(hw->isr);
}
static uint16_t
modern_set_config_irq(struct virtio_hw *hw, uint16_t vec)
{
- rte_write16(vec, &hw->common_cfg->msix_config);
- return rte_read16(&hw->common_cfg->msix_config);
+ virtio_pci_write16(vec, &hw->common_cfg->msix_config);
+ return virtio_pci_read16(&hw->common_cfg->msix_config);
}
static uint16_t
modern_set_queue_irq(struct virtio_hw *hw, struct virtqueue *vq, uint16_t vec)
{
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
- rte_write16(vec, &hw->common_cfg->queue_msix_vector);
- return rte_read16(&hw->common_cfg->queue_msix_vector);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vec, &hw->common_cfg->queue_msix_vector);
+ return virtio_pci_read16(&hw->common_cfg->queue_msix_vector);
}
static uint16_t
modern_get_queue_num(struct virtio_hw *hw, uint16_t queue_id)
{
- rte_write16(queue_id, &hw->common_cfg->queue_select);
- return rte_read16(&hw->common_cfg->queue_size);
+ virtio_pci_write16(queue_id, &hw->common_cfg->queue_select);
+ return virtio_pci_read16(&hw->common_cfg->queue_size);
}
static int
@@ -375,7 +439,7 @@ modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
ring[vq->vq_nentries]),
VIRTIO_PCI_VRING_ALIGN);
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
&hw->common_cfg->queue_desc_hi);
@@ -384,11 +448,11 @@ modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
&hw->common_cfg->queue_used_hi);
- notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+ notify_off = virtio_pci_read16(&hw->common_cfg->queue_notify_off);
vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
notify_off * hw->notify_off_multiplier);
- rte_write16(1, &hw->common_cfg->queue_enable);
+ virtio_pci_write16(1, &hw->common_cfg->queue_enable);
PMD_INIT_LOG(DEBUG, "queue %u addresses:", vq->vq_queue_index);
PMD_INIT_LOG(DEBUG, "\t desc_addr: %" PRIx64, desc_addr);
@@ -403,7 +467,7 @@ modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
static void
modern_del_queue(struct virtio_hw *hw, struct virtqueue *vq)
{
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
&hw->common_cfg->queue_desc_hi);
@@ -412,13 +476,13 @@ modern_del_queue(struct virtio_hw *hw, struct virtqueue *vq)
io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
&hw->common_cfg->queue_used_hi);
- rte_write16(0, &hw->common_cfg->queue_enable);
+ virtio_pci_write16(0, &hw->common_cfg->queue_enable);
}
static void
modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
{
- rte_write16(vq->vq_queue_index, vq->notify_addr);
+ virtio_pci_write16(vq->vq_queue_index, vq->notify_addr);
}
const struct virtio_pci_ops modern_ops = {
--
2.7.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH v1 3/3] crypto/virtio: virtual PCI requires smp barriers
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
2019-10-22 15:27 ` [dpdk-dev] [PATCH v1 1/3] eal/arm64: relax the io barrier for aarch64 Gavin Hu
2019-10-22 15:27 ` [dpdk-dev] [PATCH v1 2/3] net/virtio: virtual PCI requires smp barriers Gavin Hu
@ 2019-10-22 15:27 ` Gavin Hu
2019-10-23 8:22 ` [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Maxime Coquelin
` (6 subsequent siblings)
9 siblings, 0 replies; 35+ messages in thread
From: Gavin Hu @ 2019-10-22 15:27 UTC (permalink / raw)
To: dev
Cc: nd, david.marchand, maxime.coquelin, tiwei.bie, thomas, rasland,
matan, shahafs, viacheslavo, arybchenko, stephen, hemant.agrawal,
jerinj, pbhagavatula, Honnappa.Nagarahalli, ruifeng.wang,
phil.yang, joyce.kong, steve.capper
Unlike real PCI reads and writes to device memory, which require the io
barriers, virtual pci memory is normal memory in the smp configuration
and requires the smp barriers.
Since the smp barriers and io barriers are identical on x86 and PPC,
this change only has an effect on aarch64.
As for the peripheral coherence order of 'virtual' devices, the
architectural intent is that the Hypervisor view of things takes
precedence: since translations are made in a holistic manner as the full
stage1+stage2 regime, there is no such thing as a transaction taking on
the "EL1" mapping as far as ordering is concerned. If the Hypervisor maps
stage2 as Normal but the OS at EL1 maps it as Device-nGnRE, then it is
Normal memory and follows the ordering rules for Normal memory.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
---
drivers/crypto/virtio/virtio_pci.c | 124 ++++++++++++++++++++++++++++---------
1 file changed, 94 insertions(+), 30 deletions(-)
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
index 8137b3c..200e0ac 100644
--- a/drivers/crypto/virtio/virtio_pci.c
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -24,6 +24,70 @@
#define PCI_CAP_ID_VNDR 0x09
#define PCI_CAP_ID_MSIX 0x11
+static __rte_always_inline uint8_t
+virtio_pci_read8(const volatile void *addr)
+{
+ uint8_t val;
+ val = rte_read8_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint16_t
+virtio_pci_read16(const volatile void *addr)
+{
+ uint16_t val;
+ val = rte_read16_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint32_t
+virtio_pci_read32(const volatile void *addr)
+{
+ uint32_t val;
+ val = rte_read32_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint64_t
+virtio_pci_read64(const volatile void *addr)
+{
+ uint64_t val;
+ val = rte_read64_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline void
+virtio_pci_write8(uint8_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write8_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write16(uint16_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write16_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write32(uint32_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write32_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write64(uint64_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write64_relaxed(value, addr);
+}
+
/*
* The remaining space is defined by each driver as the per-driver
* configuration space.
@@ -52,8 +116,8 @@ check_vq_phys_addr_ok(struct virtqueue *vq)
static inline void
io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
{
- rte_write32(val & ((1ULL << 32) - 1), lo);
- rte_write32(val >> 32, hi);
+ virtio_pci_write32(val & ((1ULL << 32) - 1), lo);
+ virtio_pci_write32(val >> 32, hi);
}
static void
@@ -65,13 +129,13 @@ modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
uint8_t old_gen, new_gen;
do {
- old_gen = rte_read8(&hw->common_cfg->config_generation);
+ old_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
p = dst;
for (i = 0; i < length; i++)
- *p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+ *p++ = virtio_pci_read8((uint8_t *)hw->dev_cfg + offset + i);
- new_gen = rte_read8(&hw->common_cfg->config_generation);
+ new_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
} while (old_gen != new_gen);
}
@@ -83,7 +147,7 @@ modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
const uint8_t *p = src;
for (i = 0; i < length; i++)
- rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+ virtio_pci_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
}
static uint64_t
@@ -91,11 +155,11 @@ modern_get_features(struct virtio_crypto_hw *hw)
{
uint32_t features_lo, features_hi;
- rte_write32(0, &hw->common_cfg->device_feature_select);
- features_lo = rte_read32(&hw->common_cfg->device_feature);
+ virtio_pci_write32(0, &hw->common_cfg->device_feature_select);
+ features_lo = virtio_pci_read32(&hw->common_cfg->device_feature);
- rte_write32(1, &hw->common_cfg->device_feature_select);
- features_hi = rte_read32(&hw->common_cfg->device_feature);
+ virtio_pci_write32(1, &hw->common_cfg->device_feature_select);
+ features_hi = virtio_pci_read32(&hw->common_cfg->device_feature);
return ((uint64_t)features_hi << 32) | features_lo;
}
@@ -103,25 +167,25 @@ modern_get_features(struct virtio_crypto_hw *hw)
static void
modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
{
- rte_write32(0, &hw->common_cfg->guest_feature_select);
- rte_write32(features & ((1ULL << 32) - 1),
+ virtio_pci_write32(0, &hw->common_cfg->guest_feature_select);
+ virtio_pci_write32(features & ((1ULL << 32) - 1),
&hw->common_cfg->guest_feature);
- rte_write32(1, &hw->common_cfg->guest_feature_select);
- rte_write32(features >> 32,
+ virtio_pci_write32(1, &hw->common_cfg->guest_feature_select);
+ virtio_pci_write32(features >> 32,
&hw->common_cfg->guest_feature);
}
static uint8_t
modern_get_status(struct virtio_crypto_hw *hw)
{
- return rte_read8(&hw->common_cfg->device_status);
+ return virtio_pci_read8(&hw->common_cfg->device_status);
}
static void
modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
{
- rte_write8(status, &hw->common_cfg->device_status);
+ virtio_pci_write8(status, &hw->common_cfg->device_status);
}
static void
@@ -134,30 +198,30 @@ modern_reset(struct virtio_crypto_hw *hw)
static uint8_t
modern_get_isr(struct virtio_crypto_hw *hw)
{
- return rte_read8(hw->isr);
+ return virtio_pci_read8(hw->isr);
}
static uint16_t
modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
{
- rte_write16(vec, &hw->common_cfg->msix_config);
- return rte_read16(&hw->common_cfg->msix_config);
+ virtio_pci_write16(vec, &hw->common_cfg->msix_config);
+ return virtio_pci_read16(&hw->common_cfg->msix_config);
}
static uint16_t
modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
uint16_t vec)
{
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
- rte_write16(vec, &hw->common_cfg->queue_msix_vector);
- return rte_read16(&hw->common_cfg->queue_msix_vector);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vec, &hw->common_cfg->queue_msix_vector);
+ return virtio_pci_read16(&hw->common_cfg->queue_msix_vector);
}
static uint16_t
modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
{
- rte_write16(queue_id, &hw->common_cfg->queue_select);
- return rte_read16(&hw->common_cfg->queue_size);
+ virtio_pci_write16(queue_id, &hw->common_cfg->queue_select);
+ return virtio_pci_read16(&hw->common_cfg->queue_size);
}
static int
@@ -175,7 +239,7 @@ modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
ring[vq->vq_nentries]),
VIRTIO_PCI_VRING_ALIGN);
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
&hw->common_cfg->queue_desc_hi);
@@ -184,11 +248,11 @@ modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
&hw->common_cfg->queue_used_hi);
- notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+ notify_off = virtio_pci_read16(&hw->common_cfg->queue_notify_off);
vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
notify_off * hw->notify_off_multiplier);
- rte_write16(1, &hw->common_cfg->queue_enable);
+ virtio_pci_write16(1, &hw->common_cfg->queue_enable);
VIRTIO_CRYPTO_INIT_LOG_DBG("queue %u addresses:", vq->vq_queue_index);
VIRTIO_CRYPTO_INIT_LOG_DBG("\t desc_addr: %" PRIx64, desc_addr);
@@ -203,7 +267,7 @@ modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
static void
modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
{
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
&hw->common_cfg->queue_desc_hi);
@@ -212,14 +276,14 @@ modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
&hw->common_cfg->queue_used_hi);
- rte_write16(0, &hw->common_cfg->queue_enable);
+ virtio_pci_write16(0, &hw->common_cfg->queue_enable);
}
static void
modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
struct virtqueue *vq)
{
- rte_write16(vq->vq_queue_index, vq->notify_addr);
+ virtio_pci_write16(vq->vq_queue_index, vq->notify_addr);
}
const struct virtio_pci_ops virtio_crypto_modern_ops = {
--
2.7.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
` (2 preceding siblings ...)
2019-10-22 15:27 ` [dpdk-dev] [PATCH v1 3/3] crypto/virtio: " Gavin Hu
@ 2019-10-23 8:22 ` Maxime Coquelin
2019-11-07 1:13 ` Gavin Hu (Arm Technology China)
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 " Gavin Hu
` (5 subsequent siblings)
9 siblings, 1 reply; 35+ messages in thread
From: Maxime Coquelin @ 2019-10-23 8:22 UTC (permalink / raw)
To: Gavin Hu, dev
Cc: nd, david.marchand, tiwei.bie, thomas, rasland, matan, shahafs,
viacheslavo, arybchenko, stephen, hemant.agrawal, jerinj,
pbhagavatula, Honnappa.Nagarahalli, ruifeng.wang, phil.yang,
joyce.kong, steve.capper
Hi Gavin,
On 10/22/19 5:27 PM, Gavin Hu wrote:
> Armv8's peripheral coherence order is a total order on all reads and
> writes to that peripheral, that makes a compiler barrier is enough for
> abstracted rte io barrier.
>
> For virtual PCI devices, the virtual device memory is actually normal
> memory and the Hypervisor view of things takes precedence and they are
> within a smp configuration and smp barriers should be used, the
> relaxed io barrier for aarch64 becomes insufficient.
IIUC, this series is for performance optimization and not fixing
coherency issues. Can you confirm?
If that's the case, I'm afraid we'll have to postpone it to v20.02;
our patch queues are already too big at that stage of the release.
Maxime
> Gavin Hu (3):
> eal/arm64: relax the io barrier for aarch64
> net/virtio: virtual PCI requires smp barriers
> crypto/virtio: virtual PCI requires smp barriers
>
> drivers/crypto/virtio/virtio_pci.c | 124 ++++++++++++++++-----
> drivers/net/virtio/virtio_pci.c | 124 ++++++++++++++++-----
> .../common/include/arch/arm/rte_atomic_64.h | 6 +-
> 3 files changed, 191 insertions(+), 63 deletions(-)
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory
2019-10-23 8:22 ` [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Maxime Coquelin
@ 2019-11-07 1:13 ` Gavin Hu (Arm Technology China)
0 siblings, 0 replies; 35+ messages in thread
From: Gavin Hu (Arm Technology China) @ 2019-11-07 1:13 UTC (permalink / raw)
To: Maxime Coquelin, dev
Cc: nd, david.marchand, tiwei.bie, thomas, rasland, matan, shahafs,
viacheslavo, arybchenko, stephen, hemant.agrawal, jerinj,
pbhagavatula, Honnappa Nagarahalli,
Ruifeng Wang (Arm Technology China),
Phil Yang (Arm Technology China),
Joyce Kong (Arm Technology China),
Steve Capper, nd
Hi Maxime,
> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Wednesday, October 23, 2019 4:23 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; dev@dpdk.org
> Cc: nd <nd@arm.com>; david.marchand@redhat.com; tiwei.bie@intel.com;
> thomas@monjalon.net; rasland@mellanox.com; matan@mellanox.com;
> shahafs@mellanox.com; viacheslavo@mellanox.com;
> arybchenko@solarflare.com; stephen@networkplumber.org;
> hemant.agrawal@nxp.com; jerinj@marvell.com;
> pbhagavatula@marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang (Arm Technology China)
> <Ruifeng.Wang@arm.com>; Phil Yang (Arm Technology China)
> <Phil.Yang@arm.com>; Joyce Kong (Arm Technology China)
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers
> for virtual pci memory
>
> Hi Gavin,
>
> On 10/22/19 5:27 PM, Gavin Hu wrote:
> > Armv8's peripheral coherence order is a total order on all reads and
> > writes to that peripheral, that makes a compiler barrier is enough for
> > abstracted rte io barrier.
> >
> > For virtual PCI devices, the virtual device memory is actually normal
> > memory and the Hypervisor view of things takes precedence and they are
> > within a smp configuration and smp barriers should be used, the
> > relaxed io barrier for aarch64 becomes insufficient.
>
> IIUC, this series is for performance optimization and not fixing
> coherency issues. Can you confirm?
Yes, this is for perf optimization.
>
> If that's the case, I'm afraid we'll have to postpone it to v20.02,
> our patch queues are already too big at that stage of the release.
Ok, no problem, thanks!
>
> Maxime
>
> > Gavin Hu (3):
> > eal/arm64: relax the io barrier for aarch64
> > net/virtio: virtual PCI requires smp barriers
> > crypto/virtio: virtual PCI requires smp barriers
> >
> > drivers/crypto/virtio/virtio_pci.c | 124 ++++++++++++++++-----
> > drivers/net/virtio/virtio_pci.c | 124 ++++++++++++++++-----
> > .../common/include/arch/arm/rte_atomic_64.h | 6 +-
> > 3 files changed, 191 insertions(+), 63 deletions(-)
> >
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH v2 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
` (3 preceding siblings ...)
2019-10-23 8:22 ` [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Maxime Coquelin
@ 2019-12-20 3:09 ` Gavin Hu
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64 Gavin Hu
` (4 subsequent siblings)
9 siblings, 0 replies; 35+ messages in thread
From: Gavin Hu @ 2019-12-20 3:09 UTC (permalink / raw)
To: dev
Cc: nd, david.marchand, maxime.coquelin, tiwei.bie, thomas, rasland,
matan, shahafs, viacheslavo, arybchenko, stephen, hemant.agrawal,
jerinj, pbhagavatula, Honnappa.Nagarahalli, ruifeng.wang,
phil.yang, joyce.kong, steve.capper
Armv8's peripheral coherence order is a total order on all reads and
writes to that peripheral, which makes a compiler barrier enough for
the abstracted rte io barrier.
For virtual PCI devices, the virtual device memory is actually normal
memory, the Hypervisor view of things takes precedence, and they are
within a smp configuration where smp barriers should be used; the
relaxed io barrier for aarch64 becomes insufficient.
Note that for the ordering of other MMIO device memories, other than PCI,
stronger barriers might be required, depending on the memory attributes
assigned to the memory regions. So far I have not found occurrences of such
io barriers used on non-PCI device memories within DPDK.
V2:
- remove the virtio_pci_read/write64 API definitions; they are not needed and generate compile errors like "error: unused function 'virtio_pci_write64' [-Werror,-Wunused-function]"
- update the reference link to the kernel source code
Gavin Hu (3):
eal/arm64: relax the io barrier for aarch64
net/virtio: virtual PCI requires smp barriers
crypto/virtio: virtual PCI requires smp barriers
drivers/crypto/virtio/virtio_pci.c | 108 +++++++++++++++------
drivers/net/virtio/virtio_pci.c | 108 +++++++++++++++------
.../common/include/arch/arm/rte_atomic_64.h | 6 +-
3 files changed, 159 insertions(+), 63 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
` (4 preceding siblings ...)
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 " Gavin Hu
@ 2019-12-20 3:09 ` Gavin Hu
2019-12-20 3:33 ` Jerin Jacob
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 2/3] net/virtio: virtual PCI requires smp barriers Gavin Hu
` (3 subsequent siblings)
9 siblings, 1 reply; 35+ messages in thread
From: Gavin Hu @ 2019-12-20 3:09 UTC (permalink / raw)
To: dev
Cc: nd, david.marchand, thomas, rasland, maxime.coquelin, tiwei.bie,
hemant.agrawal, jerinj, pbhagavatula, Honnappa.Nagarahalli,
ruifeng.wang, phil.yang, joyce.kong, steve.capper
Armv8's peripheral coherence order is a total order on all reads and writes
to that peripheral.[1]
The peripheral coherence order for a memory-mapped peripheral signifies the
order in which accesses arrive at the endpoint. For a read or a write RW1
and a read or a write RW2 to the same peripheral, RW1 will appear in
the peripheral coherence order for the peripheral before RW2 if either of
the following cases applies:
1. RW1 and RW2 are accesses using Non-cacheable or Device attributes and
RW1 is Ordered-before RW2.
2. RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE attributes
and RW1 appears in program order before RW2.
On Arm platforms, all the PCI resources are mapped to nGnRE device memory
[2], so case 2 above holds true. That means the peripheral coherence order
applies here and just a compiler barrier is sufficient for the rte io barriers.
[1] Section B2.3.4 of ARMARM, https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pci-sysfs.c#n1204
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
---
lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
index 859ae12..fd63956 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
@@ -34,11 +34,11 @@ extern "C" {
#define rte_smp_rmb() dmb(ishld)
-#define rte_io_mb() rte_mb()
+#define rte_io_mb() rte_compiler_barrier()
-#define rte_io_wmb() rte_wmb()
+#define rte_io_wmb() rte_compiler_barrier()
-#define rte_io_rmb() rte_rmb()
+#define rte_io_rmb() rte_compiler_barrier()
#define rte_cio_wmb() dmb(oshst)
--
2.7.4
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64 Gavin Hu
@ 2019-12-20 3:33 ` Jerin Jacob
2019-12-20 3:38 ` Jerin Jacob
0 siblings, 1 reply; 35+ messages in thread
From: Jerin Jacob @ 2019-12-20 3:33 UTC (permalink / raw)
To: Gavin Hu
Cc: dpdk-dev, nd, David Marchand, Thomas Monjalon, rasland,
maxime.coquelin, tiwei.bie, Hemant Agrawal, Jerin Jacob,
Pavan Nikhilesh, Honnappa Nagarahalli,
Ruifeng Wang (Arm Technology China),
Phil Yang, Joyce Kong, Steve Capper
On Fri, Dec 20, 2019 at 8:40 AM Gavin Hu <gavin.hu@arm.com> wrote:
>
> Armv8's peripheral coherence order is a total order on all reads and writes
> to that peripheral.[1]
>
> The peripheral coherence order for a memory-mapped peripheral signifies the
> order in which accesses arrive at the endpoint. For a read or a write RW1
> and a read or a write RW2 to the same peripheral, then RW1 will appear in
> the peripheral coherence order for the peripheral before RW2 if either of
> the following cases apply:
> 1. RW1 and RW2 are accesses using Non-cacheable or Device attributes and
> RW1 is Ordered-before RW2.
> 2. RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE attributes
> and RW1 appears in program order before RW2.
This is true if the RW1 and RW2 addresses are device memory, i.e. the
registers in the PCI BAR address space.
If RW1 is a DDR address which is being used by the controller (say, a NIC
ring descriptor) then there will be an issue.
For example, in the Intel i40e driver, the admin queue is updated in host
DDR memory and then the doorbell is updated.
In such a case, this patch will create an issue. Correct? Have you
checked this patch with ARM64 + XL710 controllers?
Some of the legacy code is missing such barriers; that's the reason
for adding the rte_io_* barriers.
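The hazard can be sketched as follows (hypothetical descriptor layout and
helper, not the actual i40e code): the descriptor lives in host DDR, i.e.
Normal memory, while the doorbell is a device register, so a bare compiler
barrier between the two stores is not enough on aarch64; an outer-shareable
barrier such as rte_cio_wmb() (dmb oshst) is needed:

#include <stdint.h>
#include <rte_io.h>
#include <rte_atomic.h>

/* Hypothetical ring descriptor, for illustration only. */
struct tx_desc {
	uint64_t buf_iova;
	uint32_t len;
	uint32_t flags;
};

static inline void
post_tx(struct tx_desc *ring, uint16_t tail, uint64_t iova, uint32_t len,
	volatile void *doorbell)
{
	/* 1. Fill the descriptor in host DDR (Normal memory). */
	ring[tail].buf_iova = iova;
	ring[tail].len = len;
	ring[tail].flags = 1; /* e.g. a "descriptor valid" bit */

	/* 2. Order the DDR stores before the doorbell write so the device
	 * never reads a stale descriptor; dmb oshst on aarch64. */
	rte_cio_wmb();

	/* 3. Ring the doorbell (MMIO write to the PCI BAR). */
	rte_write32_relaxed(tail + 1, doorbell);
}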
>
> On arm platforms, all the PCI resources are mapped to nGnRE device memory
> [2], the above case 2 holds true, that means the peripheral coherence order
> applies here and just a compiler barrier is sufficient for rte io barriers.
>
> [1] Section B2.3.4 of ARMARM, https://developer.arm.com/docs/ddi0487/lates
> t/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> tree/drivers/pci/pci-sysfs.c#n1204
>
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> ---
> lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> index 859ae12..fd63956 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> @@ -34,11 +34,11 @@ extern "C" {
>
> #define rte_smp_rmb() dmb(ishld)
>
> -#define rte_io_mb() rte_mb()
> +#define rte_io_mb() rte_compiler_barrier()
>
> -#define rte_io_wmb() rte_wmb()
> +#define rte_io_wmb() rte_compiler_barrier()
>
> -#define rte_io_rmb() rte_rmb()
> +#define rte_io_rmb() rte_compiler_barrier()
>
> #define rte_cio_wmb() dmb(oshst)
>
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-20 3:33 ` Jerin Jacob
@ 2019-12-20 3:38 ` Jerin Jacob
2019-12-20 4:19 ` Gavin Hu
0 siblings, 1 reply; 35+ messages in thread
From: Jerin Jacob @ 2019-12-20 3:38 UTC (permalink / raw)
To: Gavin Hu
Cc: dpdk-dev, nd, David Marchand, Thomas Monjalon, rasland,
maxime.coquelin, tiwei.bie, Hemant Agrawal, Jerin Jacob,
Pavan Nikhilesh, Honnappa Nagarahalli,
Ruifeng Wang (Arm Technology China),
Phil Yang, Joyce Kong, Steve Capper
On Fri, Dec 20, 2019 at 9:03 AM Jerin Jacob <jerinjacobk@gmail.com> wrote:
>
> On Fri, Dec 20, 2019 at 8:40 AM Gavin Hu <gavin.hu@arm.com> wrote:
> >
> > Armv8's peripheral coherence order is a total order on all reads and writes
> > to that peripheral.[1]
> >
> > The peripheral coherence order for a memory-mapped peripheral signifies the
> > order in which accesses arrive at the endpoint. For a read or a write RW1
> > and a read or a write RW2 to the same peripheral, then RW1 will appear in
> > the peripheral coherence order for the peripheral before RW2 if either of
> > the following cases apply:
> > 1. RW1 and RW2 are accesses using Non-cacheable or Device attributes and
> > RW1 is Ordered-before RW2.
> > 2. RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE attributes
> > and RW1 appears in program order before RW2.
>
>
> This is true if RW1 and RW2 addresses are device memory. i.e the
> registers in the PCI bar address.
> If RW1 is DDR address which is been used by the controller(say NIC
> ring descriptor) then there will be an issue.
> For example Intel i40e driver, the admin queue update in Host DDR
> memory and it updates the doorbell.
> In such a case, this patch will create an issue. Correct? Have you
> checked this patch with ARM64 + XL710 controllers?
>
> Some of the legacy code is missing such barriers, that's the reason
> for adding rte_io_* barrier.
More details:
https://dev.dpdk.narkive.com/DpIRqDuy/dpdk-dev-patch-v2-i40e-fix-eth-i40e-dev-init-sequence-on-thunderx
>
> >
> > On arm platforms, all the PCI resources are mapped to nGnRE device memory
> > [2], the above case 2 holds true, that means the peripheral coherence order
> > applies here and just a compiler barrier is sufficient for rte io barriers.
> >
> > [1] Section B2.3.4 of ARMARM, https://developer.arm.com/docs/ddi0487/lates
> > t/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> > tree/drivers/pci/pci-sysfs.c#n1204
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > ---
> > lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > index 859ae12..fd63956 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > @@ -34,11 +34,11 @@ extern "C" {
> >
> > #define rte_smp_rmb() dmb(ishld)
> >
> > -#define rte_io_mb() rte_mb()
> > +#define rte_io_mb() rte_compiler_barrier()
> >
> > -#define rte_io_wmb() rte_wmb()
> > +#define rte_io_wmb() rte_compiler_barrier()
> >
> > -#define rte_io_rmb() rte_rmb()
> > +#define rte_io_rmb() rte_compiler_barrier()
> >
> > #define rte_cio_wmb() dmb(oshst)
> >
> > --
> > 2.7.4
> >
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-20 3:38 ` Jerin Jacob
@ 2019-12-20 4:19 ` Gavin Hu
2019-12-20 4:34 ` Jerin Jacob
0 siblings, 1 reply; 35+ messages in thread
From: Gavin Hu @ 2019-12-20 4:19 UTC (permalink / raw)
To: Jerin Jacob
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper, nd
Hi Jerin,
Thanks for the review; inline comments below.
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, December 20, 2019 11:38 AM
> To: Gavin Hu <Gavin.Hu@arm.com>
> Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> <david.marchand@redhat.com>; thomas@monjalon.net;
> rasland@mellanox.com; maxime.coquelin@redhat.com;
> tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> aarch64
>
> On Fri, Dec 20, 2019 at 9:03 AM Jerin Jacob <jerinjacobk@gmail.com>
> wrote:
> >
> > On Fri, Dec 20, 2019 at 8:40 AM Gavin Hu <gavin.hu@arm.com> wrote:
> > >
> > > Armv8's peripheral coherence order is a total order on all reads and
> writes
> > > to that peripheral.[1]
> > >
> > > The peripheral coherence order for a memory-mapped peripheral
> signifies the
> > > order in which accesses arrive at the endpoint. For a read or a write
> RW1
> > > and a read or a write RW2 to the same peripheral, then RW1 will appear
> in
> > > the peripheral coherence order for the peripheral before RW2 if either
> of
> > > the following cases apply:
> > > 1. RW1 and RW2 are accesses using Non-cacheable or Device attributes
> and
> > > RW1 is Ordered-before RW2.
> > > 2. RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE
> attributes
> > > and RW1 appears in program order before RW2.
> >
> >
> > This is true if RW1 and RW2 addresses are device memory. i.e the
> > registers in the PCI bar address.
> > If RW1 is DDR address which is been used by the controller(say NIC
> > ring descriptor) then there will be an issue.
> > For example Intel i40e driver, the admin queue update in Host DDR
> > memory and it updates the doorbell.
> > In such a case, this patch will create an issue. Correct? Have you
> > checked this patch with ARM64 + XL710 controllers?
This patch relaxes the rte_io_*mb barriers for pure PCI device memory accesses.
For mixed accesses of DDR and PCI device memory, rte_smp_*mb (DMB ISH) is not sufficient.
But rte_cio_*mb (DMB OSH) is sufficient and can be used.
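For reference, the relevant definitions, simplified from rte_atomic_64.h
(the shareability domain is the key difference; the comments are editorial):

#define dmb(opt) asm volatile("dmb " #opt : : : "memory")

#define rte_smp_wmb() dmb(ishst) /* inner shareable: orders stores among CPUs */
#define rte_smp_rmb() dmb(ishld) /* inner shareable: orders loads among CPUs */

#define rte_cio_wmb() dmb(oshst) /* outer shareable: also reaches coherent devices */
#define rte_cio_rmb() dmb(oshld)

/* rte_io_*mb(): after this patch, a compiler barrier only, which is
 * sufficient solely for accesses targeting the same Device-nGnRE peripheral. */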
> >
> > Some of the legacy code is missing such barriers, that's the reason
> > for adding rte_io_* barrier.
>
>
> More details:
>
> https://dev.dpdk.narkive.com/DpIRqDuy/dpdk-dev-patch-v2-i40e-fix-eth-
> i40e-dev-init-sequence-on-thunderx
>
> >
> > >
> > > On arm platforms, all the PCI resources are mapped to nGnRE device
> memory
> > > [2], the above case 2 holds true, that means the peripheral coherence
> order
> > > applies here and just a compiler barrier is sufficient for rte io barriers.
> > >
> > > [1] Section B2.3.4 of ARMARM,
> https://developer.arm.com/docs/ddi0487/lates
> > > t/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-
> profile
> > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> > > tree/drivers/pci/pci-sysfs.c#n1204
> > >
> > > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > > ---
> > > lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > > index 859ae12..fd63956 100644
> > > --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > > +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > > @@ -34,11 +34,11 @@ extern "C" {
> > >
> > > #define rte_smp_rmb() dmb(ishld)
> > >
> > > -#define rte_io_mb() rte_mb()
> > > +#define rte_io_mb() rte_compiler_barrier()
> > >
> > > -#define rte_io_wmb() rte_wmb()
> > > +#define rte_io_wmb() rte_compiler_barrier()
> > >
> > > -#define rte_io_rmb() rte_rmb()
> > > +#define rte_io_rmb() rte_compiler_barrier()
> > >
> > > #define rte_cio_wmb() dmb(oshst)
> > >
> > > --
> > > 2.7.4
> > >
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-20 4:19 ` Gavin Hu
@ 2019-12-20 4:34 ` Jerin Jacob
2019-12-20 6:32 ` Gavin Hu
0 siblings, 1 reply; 35+ messages in thread
From: Jerin Jacob @ 2019-12-20 4:34 UTC (permalink / raw)
To: Gavin Hu
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper
On Fri, Dec 20, 2019 at 9:49 AM Gavin Hu <Gavin.Hu@arm.com> wrote:
>
> Hi Jerin,
>
> Thanks for review, inline comments,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Friday, December 20, 2019 11:38 AM
> > To: Gavin Hu <Gavin.Hu@arm.com>
> > Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> > <david.marchand@redhat.com>; thomas@monjalon.net;
> > rasland@mellanox.com; maxime.coquelin@redhat.com;
> > tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> > Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> > <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> > aarch64
> >
> > On Fri, Dec 20, 2019 at 9:03 AM Jerin Jacob <jerinjacobk@gmail.com>
> > wrote:
> > >
> > > On Fri, Dec 20, 2019 at 8:40 AM Gavin Hu <gavin.hu@arm.com> wrote:
> > > >
> > > > Armv8's peripheral coherence order is a total order on all reads and
> > writes
> > > > to that peripheral.[1]
> > > >
> > > > The peripheral coherence order for a memory-mapped peripheral
> > signifies the
> > > > order in which accesses arrive at the endpoint. For a read or a write
> > RW1
> > > > and a read or a write RW2 to the same peripheral, then RW1 will appear
> > in
> > > > the peripheral coherence order for the peripheral before RW2 if either
> > of
> > > > the following cases apply:
> > > > 1. RW1 and RW2 are accesses using Non-cacheable or Device attributes
> > and
> > > > RW1 is Ordered-before RW2.
> > > > 2. RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE
> > attributes
> > > > and RW1 appears in program order before RW2.
> > >
> > >
> > > This is true if RW1 and RW2 addresses are device memory. i.e the
> > > registers in the PCI bar address.
> > > If RW1 is DDR address which is been used by the controller(say NIC
> > > ring descriptor) then there will be an issue.
> > > For example Intel i40e driver, the admin queue update in Host DDR
> > > memory and it updates the doorbell.
> > > In such a case, this patch will create an issue. Correct? Have you
> > > checked this patch with ARM64 + XL710 controllers?
>
> This patch relaxes the rte_io_*mb barriers for pure PCI device memory accesses.
Yes. This would break cases of mixed access in i40e drivers.
>
> For mixed accesses of DDR and PCI device memory, rte_smp_*mb(DMB ISH) is not sufficient.
> But rte_cio_*mb(DMB OSH) is sufficient and can be used.
Yes. Let me share a bit of history.
1) There are a lot of drivers (initially developed on x86) that have
mixed access and don't have any barriers, as x86 does not need them.
2) rte_io was introduced to fix that.
3) Item (2) introduced performance issues in the fast path; as an
optimization, rte_cio_* was introduced.
So in the current scheme of things, we have APIs to FIX the
portability issue (rte_io) and the performance issue (rte_cio).
IMO, we may not need any change in the infra code now. If you think the
documentation is missing, then we can enhance it.
If we make an infra change, then the drivers need to be updated and tested again.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-20 4:34 ` Jerin Jacob
@ 2019-12-20 6:32 ` Gavin Hu
2019-12-20 6:55 ` Jerin Jacob
0 siblings, 1 reply; 35+ messages in thread
From: Gavin Hu @ 2019-12-20 6:32 UTC (permalink / raw)
To: Jerin Jacob
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper, nd
Hi Jerin,
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, December 20, 2019 12:34 PM
> To: Gavin Hu <Gavin.Hu@arm.com>
> Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> <david.marchand@redhat.com>; thomas@monjalon.net;
> rasland@mellanox.com; maxime.coquelin@redhat.com;
> tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> aarch64
>
> On Fri, Dec 20, 2019 at 9:49 AM Gavin Hu <Gavin.Hu@arm.com> wrote:
> >
> > Hi Jerin,
> >
> > Thanks for review, inline comments,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Friday, December 20, 2019 11:38 AM
> > > To: Gavin Hu <Gavin.Hu@arm.com>
> > > Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> > > <david.marchand@redhat.com>; thomas@monjalon.net;
> > > rasland@mellanox.com; maxime.coquelin@redhat.com;
> > > tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> > > Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce
> Kong
> > > <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier
> for
> > > aarch64
> > >
> > > On Fri, Dec 20, 2019 at 9:03 AM Jerin Jacob <jerinjacobk@gmail.com>
> > > wrote:
> > > >
> > > > On Fri, Dec 20, 2019 at 8:40 AM Gavin Hu <gavin.hu@arm.com> wrote:
> > > > >
> > > > > Armv8's peripheral coherence order is a total order on all reads and
> > > writes
> > > > > to that peripheral.[1]
> > > > >
> > > > > The peripheral coherence order for a memory-mapped peripheral
> > > signifies the
> > > > > order in which accesses arrive at the endpoint. For a read or a write
> > > RW1
> > > > > and a read or a write RW2 to the same peripheral, then RW1 will
> appear
> > > in
> > > > > the peripheral coherence order for the peripheral before RW2 if
> either
> > > of
> > > > > the following cases apply:
> > > > > 1. RW1 and RW2 are accesses using Non-cacheable or Device
> attributes
> > > and
> > > > > RW1 is Ordered-before RW2.
> > > > > 2. RW1 and RW2 are accesses using Device-nGnRE or Device-
> nGnRnE
> > > attributes
> > > > > and RW1 appears in program order before RW2.
> > > >
> > > >
> > > > This is true if RW1 and RW2 addresses are device memory. i.e the
> > > > registers in the PCI bar address.
> > > > If RW1 is DDR address which is been used by the controller(say NIC
> > > > ring descriptor) then there will be an issue.
> > > > For example Intel i40e driver, the admin queue update in Host DDR
> > > > memory and it updates the doorbell.
> > > > In such a case, this patch will create an issue. Correct? Have you
> > > > checked this patch with ARM64 + XL710 controllers?
> >
> > This patch relaxes the rte_io_*mb barriers for pure PCI device memory
> accesses.
>
> Yes. This would break cases for mixed access fro i40e drivers.
>
> >
> > For mixed accesses of DDR and PCI device memory, rte_smp_*mb(DMB
> ISH) is not sufficient.
> > But rte_cio_*mb(DMB OSH) is sufficient and can be used.
>
> Yes. Let me share a bit of history.
>
> 1) There are a lot of drivers(initially developed in x86) that have
> mixed access and don't have any barriers as x86 does not need it.
> 2) rte_io introduced to fix that
> 3) Item (2) introduced the performance issues in the fast path as an
> optimization rte_cio_* introduced.
Exactly, this patch is to mitigate the performance issues introduced by rte_io ('dsb' is too heavy and unnecessary here).
rte_cio instead is definitely required for mixed access.
>
> So in the current of the scheme of things, we have APIs to FIX
> portability issue(rte_io) and performance issue(rte_cio).
> IMO, we may not need any change in infra code now. If you think, the
> documentation is missing then we can enhance it.
> If we make infra change then again drivers needs to be updated and tested.
No changes for rte_cio; the semantics and definitions of rte_io do not change either, if we limit the scope to PCI, which is the case in the DPDK context(?).
The change lies only in the implementation, right?
I just looked at the link you shared and found the i40e driver is missing rte_cio_*mb in i40e_asq_send_command, but the old rte_io_*mb rescued it.
I will submit a new patch in this series to use rte_cio together with the new relaxed rte_io and do more tests.
Yes, this is a big change, and also a big optimization, for aarch64; in our tests it has very positive results.
But as in the i40e case, we must pay attention to places where rte_cio was missing but was rescued by the old rte_io (and would not be by the new rte_io).
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-20 6:32 ` Gavin Hu
@ 2019-12-20 6:55 ` Jerin Jacob
2019-12-23 9:14 ` Gavin Hu
0 siblings, 1 reply; 35+ messages in thread
From: Jerin Jacob @ 2019-12-20 6:55 UTC (permalink / raw)
To: Gavin Hu
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper
On Fri, Dec 20, 2019 at 12:02 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
>
> Hi Jerin,
Hi Gavin,
> > > > > >
> > > > > > The peripheral coherence order for a memory-mapped peripheral
> > > > signifies the
> > > > > > order in which accesses arrive at the endpoint. For a read or a write
> > > > RW1
> > > > > > and a read or a write RW2 to the same peripheral, then RW1 will
> > appear
> > > > in
> > > > > > the peripheral coherence order for the peripheral before RW2 if
> > either
> > > > of
> > > > > > the following cases apply:
> > > > > > 1. RW1 and RW2 are accesses using Non-cacheable or Device
> > attributes
> > > > and
> > > > > > RW1 is Ordered-before RW2.
> > > > > > 2. RW1 and RW2 are accesses using Device-nGnRE or Device-
> > nGnRnE
> > > > attributes
> > > > > > and RW1 appears in program order before RW2.
> > > > >
> > > > >
> > > > > This is true if RW1 and RW2 addresses are device memory. i.e the
> > > > > registers in the PCI bar address.
> > > > > If RW1 is DDR address which is been used by the controller(say NIC
> > > > > ring descriptor) then there will be an issue.
> > > > > For example Intel i40e driver, the admin queue update in Host DDR
> > > > > memory and it updates the doorbell.
> > > > > In such a case, this patch will create an issue. Correct? Have you
> > > > > checked this patch with ARM64 + XL710 controllers?
> > >
> > > This patch relaxes the rte_io_*mb barriers for pure PCI device memory
> > accesses.
> >
> > Yes. This would break cases for mixed access fro i40e drivers.
> >
> > >
> > > For mixed accesses of DDR and PCI device memory, rte_smp_*mb(DMB
> > ISH) is not sufficient.
> > > But rte_cio_*mb(DMB OSH) is sufficient and can be used.
> >
> > Yes. Let me share a bit of history.
> >
> > 1) There are a lot of drivers(initially developed in x86) that have
> > mixed access and don't have any barriers as x86 does not need it.
> > 2) rte_io introduced to fix that
> > 3) Item (2) introduced the performance issues in the fast path as an
> > optimization rte_cio_* introduced.
> Exactly, this patch is to mitigate the performance issues introduced by rte_io('dsb' is too much and unnecessary here).
> Rte_cio instead is definitely required for mixed access.
> >
> > So in the current of the scheme of things, we have APIs to FIX
> > portability issue(rte_io) and performance issue(rte_cio).
> > IMO, we may not need any change in infra code now. If you think, the
> > documentation is missing then we can enhance it.
> > If we make infra change then again drivers needs to be updated and tested.
> No changes for rte_cio, the semantics, and definitions of rte_io does not change either, if limited the scope to PCI, which is the case in DPDK context(?).
> The change lies only in the implementation, right?
>
> Just looked at the link you shared and found i40 driver is missing rte_cio_*mb in i40e_asq_send_command, but the old rte_io_*mb rescued.
> Will submit a new patch in this series to used rte_cio together with new relaxed rte_io and do more tests.
>
> Yes, this is a big change, also a big optimization, for aarch64, in our tests it has very positive results.
It will be an optimization only if we are changing the fast path.
In the slow path, it does not matter.
I think the first step should be to use rte_cio_* wherever coherent
memory is used in the _fast path_. I think almost every driver has
fixed that.
I am not against this patch (changing the slow path to use rte_cio_*
instead of rte_io_*, and the virtio changes associated with that).
If you are taking that patch, pay attention to all the drivers in the
tree which are using rte_io_* for mixed access in the slow path.
> But as the case in i40e, we must pay attention to where rte_cio was missing but rescued by old rte_io(but not by new rte_io).
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
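For reference, the barrier strengths compared in this exchange map to aarch64 roughly as follows. This is an illustrative sketch, not the verbatim EAL sources; the authoritative definitions live in lib/librte_eal/common/include/arch/arm/rte_atomic_64.h.

/* Illustrative aarch64 mapping of the barrier families discussed above */
#define rte_smp_wmb()	asm volatile("dmb ishst" : : : "memory") /* inner shareable: CPU-to-CPU   */
#define rte_cio_wmb()	asm volatile("dmb oshst" : : : "memory") /* outer shareable: CPU + device */
#define rte_wmb()	asm volatile("dsb st" : : : "memory")    /* full-system barrier           */
/* old rte_io_wmb() == rte_wmb(); this patch relaxes it to a compiler barrier */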
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-20 6:55 ` Jerin Jacob
@ 2019-12-23 9:14 ` Gavin Hu
2019-12-23 9:19 ` Jerin Jacob
0 siblings, 1 reply; 35+ messages in thread
From: Gavin Hu @ 2019-12-23 9:14 UTC (permalink / raw)
To: Jerin Jacob
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper, nd
Hi Jerin,
I think we are on the same page with regard to the problem and the situation; thanks for illuminating the historical background of the two barriers.
About the solution, I have added inline comments.
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, December 20, 2019 2:56 PM
> To: Gavin Hu <Gavin.Hu@arm.com>
> Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> <david.marchand@redhat.com>; thomas@monjalon.net;
> rasland@mellanox.com; maxime.coquelin@redhat.com;
> tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> aarch64
>
> On Fri, Dec 20, 2019 at 12:02 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
> >
> > Hi Jerin,
>
> Hi Gavin,
>
>
> > > > > > >
> > > > > > > The peripheral coherence order for a memory-mapped peripheral
> > > > > signifies the
> > > > > > > order in which accesses arrive at the endpoint. For a read or a
> write
> > > > > RW1
> > > > > > > and a read or a write RW2 to the same peripheral, then RW1 will
> > > appear
> > > > > in
> > > > > > > the peripheral coherence order for the peripheral before RW2 if
> > > either
> > > > > of
> > > > > > > the following cases apply:
> > > > > > > 1. RW1 and RW2 are accesses using Non-cacheable or Device
> > > attributes
> > > > > and
> > > > > > > RW1 is Ordered-before RW2.
> > > > > > > 2. RW1 and RW2 are accesses using Device-nGnRE or Device-
> > > nGnRnE
> > > > > attributes
> > > > > > > and RW1 appears in program order before RW2.
> > > > > >
> > > > > >
> > > > > > This is true if RW1 and RW2 addresses are device memory. i.e the
> > > > > > registers in the PCI bar address.
> > > > > > If RW1 is DDR address which is been used by the controller(say NIC
> > > > > > ring descriptor) then there will be an issue.
> > > > > > For example Intel i40e driver, the admin queue update in Host DDR
> > > > > > memory and it updates the doorbell.
> > > > > > In such a case, this patch will create an issue. Correct? Have you
> > > > > > checked this patch with ARM64 + XL710 controllers?
> > > >
> > > > This patch relaxes the rte_io_*mb barriers for pure PCI device memory
> > > accesses.
> > >
> > > Yes. This would break cases for mixed access fro i40e drivers.
> > >
> > > >
> > > > For mixed accesses of DDR and PCI device memory,
> rte_smp_*mb(DMB
> > > ISH) is not sufficient.
> > > > But rte_cio_*mb(DMB OSH) is sufficient and can be used.
> > >
> > > Yes. Let me share a bit of history.
> > >
> > > 1) There are a lot of drivers(initially developed in x86) that have
> > > mixed access and don't have any barriers as x86 does not need it.
> > > 2) rte_io introduced to fix that
> > > 3) Item (2) introduced the performance issues in the fast path as an
> > > optimization rte_cio_* introduced.
> > Exactly, this patch is to mitigate the performance issues introduced by
> rte_io('dsb' is too much and unnecessary here).
> > Rte_cio instead is definitely required for mixed access.
> > >
> > > So in the current of the scheme of things, we have APIs to FIX
> > > portability issue(rte_io) and performance issue(rte_cio).
> > > IMO, we may not need any change in infra code now. If you think, the
> > > documentation is missing then we can enhance it.
> > > If we make infra change then again drivers needs to be updated and
> tested.
> > No changes for rte_cio, the semantics, and definitions of rte_io does not
> change either, if limited the scope to PCI, which is the case in DPDK
> context(?).
> > The change lies only in the implementation, right?
> >
> > Just looked at the link you shared and found i40 driver is missing
> rte_cio_*mb in i40e_asq_send_command, but the old rte_io_*mb rescued.
> > Will submit a new patch in this series to used rte_cio together with new
> relaxed rte_io and do more tests.
> >
> > Yes, this is a big change, also a big optimization, for aarch64, in our tests it
> has very positive results.
>
> It will be optimization only when if we are changing in the fast path.
> In the slow path, it does not matter.
> I think, the First step should be to use rte_cio_* wherever it is
> coherent memory used in _fast path_. I think, Almost every driver
> fixed that.
>
> I am not against this patch(changing the slow path to use rte_cio*
> from rte_io* and virtio changes associated with that).
> If you are taking that patch, pay attention to all the drivers in the
> tree which is using rte_io* for mixed access in slowpath.
I see 30+ drivers calling rte_io* directly, or indirectly through rte_write/read*.
It is hard for me to figure out all the mixed accesses in these drivers, and as you said, it makes no sense to change the _slow path_.
How about we keep the old rte_io as is, and introduce a 'fast path' version of rte_io for new code to use?
Then in the future, we may merge the two?
Another reason for this proposal is that there may be rte_io calls in the fast path which are not mixed accesses, so rte_cio is not suitable for them.
Any thoughts?
>
> > But as the case in i40e, we must pay attention to where rte_cio was
> missing but rescued by old rte_io(but not by new rte_io).
> >
> >
^ permalink raw reply [flat|nested] 35+ messages in thread
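A hypothetical sketch of the 'fast path' variant proposed above; the rte_io_fp_* names are invented here purely for illustration and are not an API in the tree.

/* Hypothetical fast-path variants, assuming pure PCI (Device-nGnRE)
 * accesses only, where the peripheral coherence order already applies. */
#define rte_io_fp_wmb() rte_compiler_barrier() /* pure PCI writes */
#define rte_io_fp_rmb() rte_compiler_barrier() /* pure PCI reads  */
/* The existing rte_io_*mb() would keep its conservative DSB semantics
 * on aarch64 for the drivers that rely on it for mixed access. */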
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-23 9:14 ` Gavin Hu
@ 2019-12-23 9:19 ` Jerin Jacob
2019-12-23 10:16 ` Gavin Hu
0 siblings, 1 reply; 35+ messages in thread
From: Jerin Jacob @ 2019-12-23 9:19 UTC (permalink / raw)
To: Gavin Hu
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper
On Mon, Dec 23, 2019 at 2:44 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
>
> Hi Jerin,
Hi Gavin,
>
> I think we are on the same page with regard to the problem, and the situations, thanks for illuminating the historical background of the two barriers.
> About the solution, I added inline comments.
> > It will be optimization only when if we are changing in the fast path.
> > In the slow path, it does not matter.
> > I think, the First step should be to use rte_cio_* wherever it is
> > coherent memory used in _fast path_. I think, Almost every driver
> > fixed that.
> >
> > I am not against this patch(changing the slow path to use rte_cio*
> > from rte_io* and virtio changes associated with that).
> > If you are taking that patch, pay attention to all the drivers in the
> > tree which is using rte_io* for mixed access in slowpath.
> I see 30+ drivers has calling rte_io* directly or indirectly through rte_write/read*.
> It is hard for me to figure out all the mixed accesses in these drivers, and as you said, it makes no sense to change the _slow path_.
>
> How about we keep the old rte_io as is, and introduce 'fast path' version of rte_io for new code use?
> Then in future, we may merge the two?
> Another reason about this proposal is maybe there is rte_io calling in the fast path, but they are not mixed accesses and rte_cio is not suitable.
Could you share more details about the case where it is in the fast path,
rte_io is needed, and rte_cio is not suitable?
>
> Any thoughts?
>
> >
> > > But as the case in i40e, we must pay attention to where rte_cio was
> > missing but rescued by old rte_io(but not by new rte_io).
> > >
> > >
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-23 9:19 ` Jerin Jacob
@ 2019-12-23 10:16 ` Gavin Hu
2020-01-02 9:51 ` Jerin Jacob
0 siblings, 1 reply; 35+ messages in thread
From: Gavin Hu @ 2019-12-23 10:16 UTC (permalink / raw)
To: Jerin Jacob
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper, nd
Hi Jerin,
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, December 23, 2019 5:20 PM
> To: Gavin Hu <Gavin.Hu@arm.com>
> Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> <david.marchand@redhat.com>; thomas@monjalon.net;
> rasland@mellanox.com; maxime.coquelin@redhat.com;
> tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> aarch64
>
> On Mon, Dec 23, 2019 at 2:44 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
> >
> > Hi Jerin,
>
> Hi Gavin,
>
> >
> > I think we are on the same page with regard to the problem, and the
> situations, thanks for illuminating the historical background of the two
> barriers.
> > About the solution, I added inline comments.
> > > It will be optimization only when if we are changing in the fast path.
> > > In the slow path, it does not matter.
> > > I think, the First step should be to use rte_cio_* wherever it is
> > > coherent memory used in _fast path_. I think, Almost every driver
> > > fixed that.
> > >
> > > I am not against this patch(changing the slow path to use rte_cio*
> > > from rte_io* and virtio changes associated with that).
> > > If you are taking that patch, pay attention to all the drivers in the
> > > tree which is using rte_io* for mixed access in slowpath.
> > I see 30+ drivers has calling rte_io* directly or indirectly through
> rte_write/read*.
> > It is hard for me to figure out all the mixed accesses in these drivers, and
> as you said, it makes no sense to change the _slow path_.
> >
> > How about we keep the old rte_io as is, and introduce 'fast path' version
> of rte_io for new code use?
> > Then in future, we may merge the two?
> > Another reason about this proposal is maybe there is rte_io calling in the
> fast path, but they are not mixed accesses and rte_cio is not suitable.
>
> Could you share more details about the case where fastpath + rte_io
> needed + rte_cio is not suitable?
Here is an example from i40e: it is in the fast path, but it is only a pure IO memory access.
https://code.dpdk.org/dpdk/v19.11/source/drivers/net/i40e/i40e_rxtx.c#L1208
I wanted two variants of rte_io because x86 also requires two, as indicated here: one for non-WC and another for WC.
http://inbox.dpdk.org/dev/20191204151916.12607-1-xiaoyun.li@intel.com/T/#ea8bb1b4a378ab09baedbf95b4542bcb92f4a396f
>
> >
> > Any thoughts?
> >
> > >
> > > > But as the case in i40e, we must pay attention to where rte_cio was
> > > missing but rescued by old rte_io(but not by new rte_io).
> > > >
> > > >
^ permalink raw reply [flat|nested] 35+ messages in thread
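The i40e line referenced above is the Tx tail (doorbell) update at the end of the Tx burst routine, roughly:

/* drivers/net/i40e/i40e_rxtx.c (v19.11): a single MMIO write to the
 * tail register after the Tx descriptors have been set up. */
I40E_PCI_REG_WRITE(txq->qtx_tail, tx_id);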
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2019-12-23 10:16 ` Gavin Hu
@ 2020-01-02 9:51 ` Jerin Jacob
2020-01-03 6:30 ` Gavin Hu
0 siblings, 1 reply; 35+ messages in thread
From: Jerin Jacob @ 2020-01-02 9:51 UTC (permalink / raw)
To: Gavin Hu
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper
On Mon, Dec 23, 2019 at 3:46 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
>
> Hi Jerin,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Monday, December 23, 2019 5:20 PM
> > To: Gavin Hu <Gavin.Hu@arm.com>
> > Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> > <david.marchand@redhat.com>; thomas@monjalon.net;
> > rasland@mellanox.com; maxime.coquelin@redhat.com;
> > tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> > Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> > <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> > aarch64
> >
> > On Mon, Dec 23, 2019 at 2:44 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
> > >
> > > Hi Jerin,
> >
> > Hi Gavin,
> >
> > >
> > > I think we are on the same page with regard to the problem, and the
> > situations, thanks for illuminating the historical background of the two
> > barriers.
> > > About the solution, I added inline comments.
> > > > It will be optimization only when if we are changing in the fast path.
> > > > In the slow path, it does not matter.
> > > > I think, the First step should be to use rte_cio_* wherever it is
> > > > coherent memory used in _fast path_. I think, Almost every driver
> > > > fixed that.
> > > >
> > > > I am not against this patch(changing the slow path to use rte_cio*
> > > > from rte_io* and virtio changes associated with that).
> > > > If you are taking that patch, pay attention to all the drivers in the
> > > > tree which is using rte_io* for mixed access in slowpath.
> > > I see 30+ drivers has calling rte_io* directly or indirectly through
> > rte_write/read*.
> > > It is hard for me to figure out all the mixed accesses in these drivers, and
> > as you said, it makes no sense to change the _slow path_.
> > >
> > > How about we keep the old rte_io as is, and introduce 'fast path' version
> > of rte_io for new code use?
> > > Then in future, we may merge the two?
> > > Another reason about this proposal is maybe there is rte_io calling in the
> > fast path, but they are not mixed accesses and rte_cio is not suitable.
> >
> > Could you share more details about the case where fastpath + rte_io
> > needed + rte_cio is not suitable?
>
> Here is an example for i40e, in the fast path, but only a pure io memory access.
> https://code.dpdk.org/dpdk/v19.11/source/drivers/net/i40e/i40e_rxtx.c#L1208
Yes. That's a performance issue.
It could be changed to the following for a fix that works on x86 and arm64
with the existing infra.
From:
I40E_PCI_REG_WRITE()
to:
rte_cio_wmb()
I40E_PCI_REG_WRITE_RELAXED()
>
> I wanted two variants of rte_io, because also x86 requires two as indicated here, one for no-WC and another for WC.
> http://inbox.dpdk.org/dev/20191204151916.12607-1-xiaoyun.li@intel.com/T/#ea8bb1b4a378ab09baedbf95b4542bcb92f4a396f
> >
> > >
> > > Any thoughts?
> > >
> > > >
> > > > > But as the case in i40e, we must pay attention to where rte_cio was
> > > > missing but rescued by old rte_io(but not by new rte_io).
> > > > >
> > > > >
^ permalink raw reply [flat|nested] 35+ messages in thread
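In context, the suggested change looks roughly like this; a sketch against the v19.11 Tx burst code, not the exact patch.

/* Before: I40E_PCI_REG_WRITE() wraps rte_write32(), which issues
 * rte_io_wmb() -- a DSB on aarch64 -- on every Tx burst. */
I40E_PCI_REG_WRITE(txq->qtx_tail, tx_id);

/* After: order the preceding descriptor stores with DMB OSH only,
 * then do a relaxed MMIO write to the tail register. */
rte_cio_wmb();
I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id);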
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2020-01-02 9:51 ` Jerin Jacob
@ 2020-01-03 6:30 ` Gavin Hu
2020-01-03 7:34 ` Jerin Jacob
0 siblings, 1 reply; 35+ messages in thread
From: Gavin Hu @ 2020-01-03 6:30 UTC (permalink / raw)
To: Jerin Jacob
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper, nd
Hi Jerin,
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, January 2, 2020 5:52 PM
> To: Gavin Hu <Gavin.Hu@arm.com>
> Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> <david.marchand@redhat.com>; thomas@monjalon.net;
> rasland@mellanox.com; maxime.coquelin@redhat.com; tiwei.bie@intel.com;
> hemant.agrawal@nxp.com; jerinj@marvell.com; Pavan Nikhilesh
> <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> aarch64
>
> On Mon, Dec 23, 2019 at 3:46 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
> >
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Monday, December 23, 2019 5:20 PM
> > > To: Gavin Hu <Gavin.Hu@arm.com>
> > > Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> > > <david.marchand@redhat.com>; thomas@monjalon.net;
> > > rasland@mellanox.com; maxime.coquelin@redhat.com;
> > > tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> > > Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> > > <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> > > aarch64
> > >
> > > On Mon, Dec 23, 2019 at 2:44 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
> > > >
> > > > Hi Jerin,
> > >
> > > Hi Gavin,
> > >
> > > >
> > > > I think we are on the same page with regard to the problem, and the
> > > situations, thanks for illuminating the historical background of the two
> > > barriers.
> > > > About the solution, I added inline comments.
> > > > > It will be optimization only when if we are changing in the fast path.
> > > > > In the slow path, it does not matter.
> > > > > I think, the First step should be to use rte_cio_* wherever it is
> > > > > coherent memory used in _fast path_. I think, Almost every driver
> > > > > fixed that.
> > > > >
> > > > > I am not against this patch(changing the slow path to use rte_cio*
> > > > > from rte_io* and virtio changes associated with that).
> > > > > If you are taking that patch, pay attention to all the drivers in the
> > > > > tree which is using rte_io* for mixed access in slowpath.
> > > > I see 30+ drivers has calling rte_io* directly or indirectly through
> > > rte_write/read*.
> > > > It is hard for me to figure out all the mixed accesses in these drivers, and
> > > as you said, it makes no sense to change the _slow path_.
> > > >
> > > > How about we keep the old rte_io as is, and introduce 'fast path' version
> > > of rte_io for new code use?
> > > > Then in future, we may merge the two?
> > > > Another reason about this proposal is maybe there is rte_io calling in
> the
> > > fast path, but they are not mixed accesses and rte_cio is not suitable.
> > >
> > > Could you share more details about the case where fastpath + rte_io
> > > needed + rte_cio is not suitable?
> >
> > Here is an example for i40e, in the fast path, but only a pure io memory
> access.
> >
> https://code.dpdk.org/dpdk/v19.11/source/drivers/net/i40e/i40e_rxtx.c#L12
> 08
>
> Yes. That's a performance issue.
>
> It could be changed to following for the fix that works on x86, arm64
> with existing infra.
>
> From:
> I40E_PCI_REG_WRITE()
>
> to:
>
> rte_cio_wmb()
> I40E_PCI_REG_WRITE_RELAXED()
Yes, this is correct; I will submit a new patch for this.
This is one example out of all the cases that I must fix before relaxing the rte_io barriers.
My plan is as follows; any comments are welcome!
1. Replace rte_*mb and rte_io_*mb with rte_cio_*mb where applicable in the fast path; this is an optimization, as the barriers are relaxed.
2. Replace all the rte_io_*mb with rte_cio_*mb where applicable in the slow path and control path.
3. Only when *all* the occurrences in steps 1 and 2 are done can this patch be re-activated.
Please advise if the above approach works from your viewpoint.
Maybe I will stop at step 1; steps 2 and 3 are not necessary as they are not in the fast path?
>
> >
> > I wanted two variants of rte_io, because also x86 requires two as indicated
> here, one for no-WC and another for WC.
> > http://inbox.dpdk.org/dev/20191204151916.12607-1-
> xiaoyun.li@intel.com/T/#ea8bb1b4a378ab09baedbf95b4542bcb92f4a396f
> > >
> > > >
> > > > Any thoughts?
> > > >
> > > > >
> > > > > > But as the case in i40e, we must pay attention to where rte_cio was
> > > > > missing but rescued by old rte_io(but not by new rte_io).
> > > > > >
> > > > > >
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2020-01-03 6:30 ` Gavin Hu
@ 2020-01-03 7:34 ` Jerin Jacob
2020-01-03 9:12 ` Gavin Hu
0 siblings, 1 reply; 35+ messages in thread
From: Jerin Jacob @ 2020-01-03 7:34 UTC (permalink / raw)
To: Gavin Hu
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper
On Fri, Jan 3, 2020 at 12:00 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
>
> Hi Jerin,
Hi Gavin,
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk@gmail.com>
> > Sent: Thursday, January 2, 2020 5:52 PM
> > To: Gavin Hu <Gavin.Hu@arm.com>
> > Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> > <david.marchand@redhat.com>; thomas@monjalon.net;
> > rasland@mellanox.com; maxime.coquelin@redhat.com; tiwei.bie@intel.com;
> > hemant.agrawal@nxp.com; jerinj@marvell.com; Pavan Nikhilesh
> > <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> > <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> > aarch64
> >
> > On Mon, Dec 23, 2019 at 3:46 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
> > >
> > > Hi Jerin,
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > Sent: Monday, December 23, 2019 5:20 PM
> > > > To: Gavin Hu <Gavin.Hu@arm.com>
> > > > Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> > > > <david.marchand@redhat.com>; thomas@monjalon.net;
> > > > rasland@mellanox.com; maxime.coquelin@redhat.com;
> > > > tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> > > > Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > > > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> > > > <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> > > > aarch64
> > > >
> > > > On Mon, Dec 23, 2019 at 2:44 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
> > > > >
> > > > > Hi Jerin,
> > > >
> > > > Hi Gavin,
> > > >
> > > > >
> > > > > I think we are on the same page with regard to the problem, and the
> > > > situations, thanks for illuminating the historical background of the two
> > > > barriers.
> > > > > About the solution, I added inline comments.
> > > > > > It will be optimization only when if we are changing in the fast path.
> > > > > > In the slow path, it does not matter.
> > > > > > I think, the First step should be to use rte_cio_* wherever it is
> > > > > > coherent memory used in _fast path_. I think, Almost every driver
> > > > > > fixed that.
> > > > > >
> > > > > > I am not against this patch(changing the slow path to use rte_cio*
> > > > > > from rte_io* and virtio changes associated with that).
> > > > > > If you are taking that patch, pay attention to all the drivers in the
> > > > > > tree which is using rte_io* for mixed access in slowpath.
> > > > > I see 30+ drivers has calling rte_io* directly or indirectly through
> > > > rte_write/read*.
> > > > > It is hard for me to figure out all the mixed accesses in these drivers, and
> > > > as you said, it makes no sense to change the _slow path_.
> > > > >
> > > > > How about we keep the old rte_io as is, and introduce 'fast path' version
> > > > of rte_io for new code use?
> > > > > Then in future, we may merge the two?
> > > > > Another reason about this proposal is maybe there is rte_io calling in
> > the
> > > > fast path, but they are not mixed accesses and rte_cio is not suitable.
> > > >
> > > > Could you share more details about the case where fastpath + rte_io
> > > > needed + rte_cio is not suitable?
> > >
> > > Here is an example for i40e, in the fast path, but only a pure io memory
> > access.
> > >
> > https://code.dpdk.org/dpdk/v19.11/source/drivers/net/i40e/i40e_rxtx.c#L12
> > 08
> >
> > Yes. That's a performance issue.
> >
> > It could be changed to following for the fix that works on x86, arm64
> > with existing infra.
> >
> > From:
> > I40E_PCI_REG_WRITE()
> >
> > to:
> >
> > rte_cio_wmb()
> > I40E_PCI_REG_WRITE_RELAXED()
> Yes, this is correct, I will submit a new patch for this.
> This is an example out of all the cases that I must fix before relaxing the rte_io barriers.
> My plan is as follows, any comments are welcome!
> 1. replace rte_*mb and rte_io_*mb with rte_cio_*mb where applicable in the fastpath, this is an optimization, as the barriers are relaxed.
> 2. replace all the rte_io_*mb with rte_cio_*mb where applicable in the slowpath and control path
> 3. until *all* the occurrences in the step 1 and 2 are done, then this path can be re-activated.
>
> Please advise if the above approach works from your viewpoint.
I would prefer to have ONLY step (1) and add a note about it
in https://doc.dpdk.org/guides/prog_guide/perf_opt_guidelines.html
as a documentation reference.
> Maybe I will stop at step 1, step 2 and 3 are not necessary as they are not in the fastpath?
Yup.
>
> >
> > >
> > > I wanted two variants of rte_io, because also x86 requires two as indicated
> > here, one for no-WC and another for WC.
> > > http://inbox.dpdk.org/dev/20191204151916.12607-1-
> > xiaoyun.li@intel.com/T/#ea8bb1b4a378ab09baedbf95b4542bcb92f4a396f
> > > >
> > > > >
> > > > > Any thoughts?
> > > > >
> > > > > >
> > > > > > > But as the case in i40e, we must pay attention to where rte_cio was
> > > > > > missing but rescued by old rte_io(but not by new rte_io).
> > > > > > >
> > > > > > >
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64
2020-01-03 7:34 ` Jerin Jacob
@ 2020-01-03 9:12 ` Gavin Hu
0 siblings, 0 replies; 35+ messages in thread
From: Gavin Hu @ 2020-01-03 9:12 UTC (permalink / raw)
To: Jerin Jacob
Cc: dpdk-dev, nd, David Marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, hemant.agrawal, jerinj, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang, Phil Yang, Joyce Kong,
Steve Capper, nd
Hi Jerin,
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Friday, January 3, 2020 3:35 PM
> To: Gavin Hu <Gavin.Hu@arm.com>
> Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> <david.marchand@redhat.com>; thomas@monjalon.net;
> rasland@mellanox.com; maxime.coquelin@redhat.com; tiwei.bie@intel.com;
> hemant.agrawal@nxp.com; jerinj@marvell.com; Pavan Nikhilesh
> <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> aarch64
>
> On Fri, Jan 3, 2020 at 12:00 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
> >
> > Hi Jerin,
>
> Hi Gavin,
>
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > Sent: Thursday, January 2, 2020 5:52 PM
> > > To: Gavin Hu <Gavin.Hu@arm.com>
> > > Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> > > <david.marchand@redhat.com>; thomas@monjalon.net;
> > > rasland@mellanox.com; maxime.coquelin@redhat.com;
> tiwei.bie@intel.com;
> > > hemant.agrawal@nxp.com; jerinj@marvell.com; Pavan Nikhilesh
> > > <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> > > <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> > > aarch64
> > >
> > > On Mon, Dec 23, 2019 at 3:46 PM Gavin Hu <Gavin.Hu@arm.com> wrote:
> > > >
> > > > Hi Jerin,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jerin Jacob <jerinjacobk@gmail.com>
> > > > > Sent: Monday, December 23, 2019 5:20 PM
> > > > > To: Gavin Hu <Gavin.Hu@arm.com>
> > > > > Cc: dpdk-dev <dev@dpdk.org>; nd <nd@arm.com>; David Marchand
> > > > > <david.marchand@redhat.com>; thomas@monjalon.net;
> > > > > rasland@mellanox.com; maxime.coquelin@redhat.com;
> > > > > tiwei.bie@intel.com; hemant.agrawal@nxp.com; jerinj@marvell.com;
> > > > > Pavan Nikhilesh <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> > > > > <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > > > > <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce
> Kong
> > > > > <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > > > > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier
> for
> > > > > aarch64
> > > > >
> > > > > On Mon, Dec 23, 2019 at 2:44 PM Gavin Hu <Gavin.Hu@arm.com>
> wrote:
> > > > > >
> > > > > > Hi Jerin,
> > > > >
> > > > > Hi Gavin,
> > > > >
> > > > > >
> > > > > > I think we are on the same page with regard to the problem, and the
> > > > > situations, thanks for illuminating the historical background of the two
> > > > > barriers.
> > > > > > About the solution, I added inline comments.
> > > > > > > It will be optimization only when if we are changing in the fast path.
> > > > > > > In the slow path, it does not matter.
> > > > > > > I think, the First step should be to use rte_cio_* wherever it is
> > > > > > > coherent memory used in _fast path_. I think, Almost every driver
> > > > > > > fixed that.
> > > > > > >
> > > > > > > I am not against this patch(changing the slow path to use rte_cio*
> > > > > > > from rte_io* and virtio changes associated with that).
> > > > > > > If you are taking that patch, pay attention to all the drivers in the
> > > > > > > tree which is using rte_io* for mixed access in slowpath.
> > > > > > I see 30+ drivers has calling rte_io* directly or indirectly through
> > > > > rte_write/read*.
> > > > > > It is hard for me to figure out all the mixed accesses in these drivers,
> and
> > > > > as you said, it makes no sense to change the _slow path_.
> > > > > >
> > > > > > How about we keep the old rte_io as is, and introduce 'fast path'
> version
> > > > > of rte_io for new code use?
> > > > > > Then in future, we may merge the two?
> > > > > > Another reason about this proposal is maybe there is rte_io calling in
> > > the
> > > > > fast path, but they are not mixed accesses and rte_cio is not suitable.
> > > > >
> > > > > Could you share more details about the case where fastpath + rte_io
> > > > > needed + rte_cio is not suitable?
> > > >
> > > > Here is an example for i40e, in the fast path, but only a pure io memory
> > > access.
> > > >
> > >
> https://code.dpdk.org/dpdk/v19.11/source/drivers/net/i40e/i40e_rxtx.c#L12
> > > 08
> > >
> > > Yes. That's a performance issue.
> > >
> > > It could be changed to following for the fix that works on x86, arm64
> > > with existing infra.
> > >
> > > From:
> > > I40E_PCI_REG_WRITE()
> > >
> > > to:
> > >
> > > rte_cio_wmb()
> > > I40E_PCI_REG_WRITE_RELAXED()
> > Yes, this is correct, I will submit a new patch for this.
> > This is an example out of all the cases that I must fix before relaxing the
> rte_io barriers.
> > My plan is as follows, any comments are welcome!
> > 1. replace rte_*mb and rte_io_*mb with rte_cio_*mb where applicable in
> the fastpath, this is an optimization, as the barriers are relaxed.
> > 2. replace all the rte_io_*mb with rte_cio_*mb where applicable in the
> slowpath and control path
> > 3. until *all* the occurrences in the step 1 and 2 are done, then this path can
> be re-activated.
> >
> > Please advise if the above approach works from your viewpoint.
>
> I would prefer to have ONLY the step (1) and add a note for the same
> in https://doc.dpdk.org/guides/prog_guide/perf_opt_guidelines.html
> as documentation reference.
Ok, good idea, I will submit a documentation patch for this.
>
>
> > Maybe I will stop at step 1, step 2 and 3 are not necessary as they are not in
> the fastpath?
>
> Yup.
>
> >
> > >
> > > >
> > > > I wanted two variants of rte_io, because also x86 requires two as
> indicated
> > > here, one for no-WC and another for WC.
> > > > http://inbox.dpdk.org/dev/20191204151916.12607-1-
> > > xiaoyun.li@intel.com/T/#ea8bb1b4a378ab09baedbf95b4542bcb92f4a396f
> > > > >
> > > > > >
> > > > > > Any thoughts?
> > > > > >
> > > > > > >
> > > > > > > > But as the case in i40e, we must pay attention to where rte_cio
> was
> > > > > > > missing but rescued by old rte_io(but not by new rte_io).
> > > > > > > >
> > > > > > > >
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH v2 2/3] net/virtio: virtual PCI requires smp barriers
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
` (5 preceding siblings ...)
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for aarch64 Gavin Hu
@ 2019-12-20 3:09 ` Gavin Hu
2019-12-20 8:17 ` Tiwei Bie
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 3/3] crypto/virtio: " Gavin Hu
` (2 subsequent siblings)
9 siblings, 1 reply; 35+ messages in thread
From: Gavin Hu @ 2019-12-20 3:09 UTC (permalink / raw)
To: dev
Cc: nd, david.marchand, thomas, rasland, maxime.coquelin, tiwei.bie,
hemant.agrawal, jerinj, pbhagavatula, Honnappa.Nagarahalli,
ruifeng.wang, phil.yang, joyce.kong, steve.capper
Other than real PCI reads and writes to device memory, which require
the io barriers, virtual PCI memory is normal memory in an SMP
configuration and requires the smp barriers.
Since the smp barriers and io barriers are identical on x86 and PPC,
this change only has an effect on aarch64.
As far as peripheral coherence order for ‘virtual’ devices is concerned,
the arch intent is that the Hypervisor view of things takes precedence –
since translations are made in a holistic manner as the full stage1+stage2
regime, there is no such thing as a transaction taking on an “EL1”
mapping as far as ordering. If the Hypervisor maps stage2 as Normal
but the OS at EL1 maps it as Device-nGnRE, then it’s Normal memory and
follows the ordering rules for Normal memory.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
---
drivers/net/virtio/virtio_pci.c | 108 +++++++++++++++++++++++++++++-----------
1 file changed, 78 insertions(+), 30 deletions(-)
diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
index 4468e89..64aa0a0 100644
--- a/drivers/net/virtio/virtio_pci.c
+++ b/drivers/net/virtio/virtio_pci.c
@@ -24,6 +24,54 @@
#define PCI_CAP_ID_VNDR 0x09
#define PCI_CAP_ID_MSIX 0x11
+static __rte_always_inline uint8_t
+virtio_pci_read8(const volatile void *addr)
+{
+ uint8_t val;
+ val = rte_read8_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint16_t
+virtio_pci_read16(const volatile void *addr)
+{
+ uint16_t val;
+ val = rte_read16_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint32_t
+virtio_pci_read32(const volatile void *addr)
+{
+ uint32_t val;
+ val = rte_read32_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline void
+virtio_pci_write8(uint8_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write8_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write16(uint16_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write16_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write32(uint32_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write32_relaxed(value, addr);
+}
+
/*
* The remaining space is defined by each driver as the per-driver
* configuration space.
@@ -260,8 +308,8 @@ const struct virtio_pci_ops legacy_ops = {
static inline void
io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
{
- rte_write32(val & ((1ULL << 32) - 1), lo);
- rte_write32(val >> 32, hi);
+ virtio_pci_write32(val & ((1ULL << 32) - 1), lo);
+ virtio_pci_write32(val >> 32, hi);
}
static void
@@ -273,13 +321,13 @@ modern_read_dev_config(struct virtio_hw *hw, size_t offset,
uint8_t old_gen, new_gen;
do {
- old_gen = rte_read8(&hw->common_cfg->config_generation);
+ old_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
p = dst;
for (i = 0; i < length; i++)
- *p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+ *p++ = virtio_pci_read8((uint8_t *)hw->dev_cfg + offset + i);
- new_gen = rte_read8(&hw->common_cfg->config_generation);
+ new_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
} while (old_gen != new_gen);
}
@@ -291,7 +339,7 @@ modern_write_dev_config(struct virtio_hw *hw, size_t offset,
const uint8_t *p = src;
for (i = 0; i < length; i++)
- rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+ virtio_pci_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
}
static uint64_t
@@ -299,11 +347,11 @@ modern_get_features(struct virtio_hw *hw)
{
uint32_t features_lo, features_hi;
- rte_write32(0, &hw->common_cfg->device_feature_select);
- features_lo = rte_read32(&hw->common_cfg->device_feature);
+ virtio_pci_write32(0, &hw->common_cfg->device_feature_select);
+ features_lo = virtio_pci_read32(&hw->common_cfg->device_feature);
- rte_write32(1, &hw->common_cfg->device_feature_select);
- features_hi = rte_read32(&hw->common_cfg->device_feature);
+ virtio_pci_write32(1, &hw->common_cfg->device_feature_select);
+ features_hi = virtio_pci_read32(&hw->common_cfg->device_feature);
return ((uint64_t)features_hi << 32) | features_lo;
}
@@ -311,53 +359,53 @@ modern_get_features(struct virtio_hw *hw)
static void
modern_set_features(struct virtio_hw *hw, uint64_t features)
{
- rte_write32(0, &hw->common_cfg->guest_feature_select);
- rte_write32(features & ((1ULL << 32) - 1),
+ virtio_pci_write32(0, &hw->common_cfg->guest_feature_select);
+ virtio_pci_write32(features & ((1ULL << 32) - 1),
&hw->common_cfg->guest_feature);
- rte_write32(1, &hw->common_cfg->guest_feature_select);
- rte_write32(features >> 32,
+ virtio_pci_write32(1, &hw->common_cfg->guest_feature_select);
+ virtio_pci_write32(features >> 32,
&hw->common_cfg->guest_feature);
}
static uint8_t
modern_get_status(struct virtio_hw *hw)
{
- return rte_read8(&hw->common_cfg->device_status);
+ return virtio_pci_read8(&hw->common_cfg->device_status);
}
static void
modern_set_status(struct virtio_hw *hw, uint8_t status)
{
- rte_write8(status, &hw->common_cfg->device_status);
+ virtio_pci_write8(status, &hw->common_cfg->device_status);
}
static uint8_t
modern_get_isr(struct virtio_hw *hw)
{
- return rte_read8(hw->isr);
+ return virtio_pci_read8(hw->isr);
}
static uint16_t
modern_set_config_irq(struct virtio_hw *hw, uint16_t vec)
{
- rte_write16(vec, &hw->common_cfg->msix_config);
- return rte_read16(&hw->common_cfg->msix_config);
+ virtio_pci_write16(vec, &hw->common_cfg->msix_config);
+ return virtio_pci_read16(&hw->common_cfg->msix_config);
}
static uint16_t
modern_set_queue_irq(struct virtio_hw *hw, struct virtqueue *vq, uint16_t vec)
{
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
- rte_write16(vec, &hw->common_cfg->queue_msix_vector);
- return rte_read16(&hw->common_cfg->queue_msix_vector);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vec, &hw->common_cfg->queue_msix_vector);
+ return virtio_pci_read16(&hw->common_cfg->queue_msix_vector);
}
static uint16_t
modern_get_queue_num(struct virtio_hw *hw, uint16_t queue_id)
{
- rte_write16(queue_id, &hw->common_cfg->queue_select);
- return rte_read16(&hw->common_cfg->queue_size);
+ virtio_pci_write16(queue_id, &hw->common_cfg->queue_select);
+ return virtio_pci_read16(&hw->common_cfg->queue_size);
}
static int
@@ -375,7 +423,7 @@ modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
ring[vq->vq_nentries]),
VIRTIO_PCI_VRING_ALIGN);
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
&hw->common_cfg->queue_desc_hi);
@@ -384,11 +432,11 @@ modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
&hw->common_cfg->queue_used_hi);
- notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+ notify_off = virtio_pci_read16(&hw->common_cfg->queue_notify_off);
vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
notify_off * hw->notify_off_multiplier);
- rte_write16(1, &hw->common_cfg->queue_enable);
+ virtio_pci_write16(1, &hw->common_cfg->queue_enable);
PMD_INIT_LOG(DEBUG, "queue %u addresses:", vq->vq_queue_index);
PMD_INIT_LOG(DEBUG, "\t desc_addr: %" PRIx64, desc_addr);
@@ -403,7 +451,7 @@ modern_setup_queue(struct virtio_hw *hw, struct virtqueue *vq)
static void
modern_del_queue(struct virtio_hw *hw, struct virtqueue *vq)
{
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
&hw->common_cfg->queue_desc_hi);
@@ -412,13 +460,13 @@ modern_del_queue(struct virtio_hw *hw, struct virtqueue *vq)
io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
&hw->common_cfg->queue_used_hi);
- rte_write16(0, &hw->common_cfg->queue_enable);
+ virtio_pci_write16(0, &hw->common_cfg->queue_enable);
}
static void
modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
{
- rte_write16(vq->vq_queue_index, vq->notify_addr);
+ virtio_pci_write16(vq->vq_queue_index, vq->notify_addr);
}
const struct virtio_pci_ops modern_ops = {
--
2.7.4
^ permalink raw reply [flat|nested] 35+ messages in thread
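To see why the read-side barrier placement matters here, consider the config-generation loop patched above: the dev_cfg payload reads must not drift past the generation re-check that validates them. Compressed from the patched modern_read_dev_config() in the diff:

do {
	old_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
	/* Each virtio_pci_read8() ends in rte_smp_rmb(), so the payload
	 * reads below cannot be reordered past the re-read of the
	 * generation counter that validates them. */
	for (i = 0; i < length; i++)
		*p++ = virtio_pci_read8((uint8_t *)hw->dev_cfg + offset + i);
	new_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
} while (old_gen != new_gen);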
* Re: [dpdk-dev] [PATCH v2 2/3] net/virtio: virtual PCI requires smp barriers
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 2/3] net/virtio: virtual PCI requires smp barriers Gavin Hu
@ 2019-12-20 8:17 ` Tiwei Bie
2019-12-20 10:19 ` Gavin Hu
0 siblings, 1 reply; 35+ messages in thread
From: Tiwei Bie @ 2019-12-20 8:17 UTC (permalink / raw)
To: Gavin Hu
Cc: dev, nd, david.marchand, thomas, rasland, maxime.coquelin,
hemant.agrawal, jerinj, pbhagavatula, Honnappa.Nagarahalli,
ruifeng.wang, phil.yang, joyce.kong, steve.capper
On Fri, Dec 20, 2019 at 11:09:50AM +0800, Gavin Hu wrote:
> Other than real PCI reads and writes to the device memory requiring
> the io barriers, virtual pci memories are normal memory in the smp
> configuration, which requires the smp barriers.
>
> Since the smp barriers and io barriers are identical on x86 and PPC,
> this change has only effect on aarch64.
>
> As far as peripheral coherence order for ‘virtual’ devices, the arch
> intent is that the Hypervisor view of things takes precedence – since
> translations are made in holistic manner as the full stage1+stage2
> regime, there is no such thing as a transaction taking on “EL1”
> mapping as far as ordering. If the Hypervisor maps stage2 as Normal
> but the OS at EL1 maps it as Device-nGnRE, then it’s Normal memory and
> follows the ordering rules for Normal memory.
>
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> ---
> drivers/net/virtio/virtio_pci.c | 108 +++++++++++++++++++++++++++++-----------
> 1 file changed, 78 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
> index 4468e89..64aa0a0 100644
> --- a/drivers/net/virtio/virtio_pci.c
> +++ b/drivers/net/virtio/virtio_pci.c
> @@ -24,6 +24,54 @@
> #define PCI_CAP_ID_VNDR 0x09
> #define PCI_CAP_ID_MSIX 0x11
>
> +static __rte_always_inline uint8_t
> +virtio_pci_read8(const volatile void *addr)
> +{
> + uint8_t val;
> + val = rte_read8_relaxed(addr);
> + rte_smp_rmb();
> + return val;
> +}
> +
> +static __rte_always_inline uint16_t
> +virtio_pci_read16(const volatile void *addr)
> +{
> + uint16_t val;
> + val = rte_read16_relaxed(addr);
> + rte_smp_rmb();
> + return val;
> +}
> +
> +static __rte_always_inline uint32_t
> +virtio_pci_read32(const volatile void *addr)
> +{
> + uint32_t val;
> + val = rte_read32_relaxed(addr);
> + rte_smp_rmb();
> + return val;
> +}
> +
> +static __rte_always_inline void
> +virtio_pci_write8(uint8_t value, volatile void *addr)
> +{
> + rte_smp_wmb();
> + rte_write8_relaxed(value, addr);
> +}
> +
> +static __rte_always_inline void
> +virtio_pci_write16(uint16_t value, volatile void *addr)
> +{
> + rte_smp_wmb();
> + rte_write16_relaxed(value, addr);
> +}
> +
> +static __rte_always_inline void
> +virtio_pci_write32(uint32_t value, volatile void *addr)
> +{
> + rte_smp_wmb();
> + rte_write32_relaxed(value, addr);
> +}
We can't assume that the virtio device is software running
in an SMP configuration unless VIRTIO_F_ORDER_PLATFORM
is not negotiated.
https://github.com/oasis-tcs/virtio-spec/blob/94520b3af19c/content.tex#L5788
> +
^ permalink raw reply [flat|nested] 35+ messages in thread
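A sketch of how the negotiated feature bit could steer the barrier choice. vtpci_with_feature() and VIRTIO_F_ORDER_PLATFORM exist in virtio_pci.h, but the extra hw argument and the branch shown here are illustrative, not the submitted code.

static __rte_always_inline void
virtio_pci_write16(struct virtio_hw *hw, uint16_t value, volatile void *addr)
{
	if (vtpci_with_feature(hw, VIRTIO_F_ORDER_PLATFORM))
		rte_io_wmb();	/* device may need real platform (IO) ordering  */
	else
		rte_smp_wmb();	/* software device: normal-memory order suffices */
	rte_write16_relaxed(value, addr);
}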
* Re: [dpdk-dev] [PATCH v2 2/3] net/virtio: virtual PCI requires smp barriers
2019-12-20 8:17 ` Tiwei Bie
@ 2019-12-20 10:19 ` Gavin Hu
0 siblings, 0 replies; 35+ messages in thread
From: Gavin Hu @ 2019-12-20 10:19 UTC (permalink / raw)
To: Tiwei Bie
Cc: dev, nd, david.marchand, thomas, rasland, maxime.coquelin,
hemant.agrawal, jerinj, pbhagavatula, Honnappa Nagarahalli,
Ruifeng Wang, Phil Yang, Joyce Kong, Steve Capper, nd
Hi Tiwei,
Thanks for the review.
> -----Original Message-----
> From: Tiwei Bie <tiwei.bie@intel.com>
> Sent: Friday, December 20, 2019 4:18 PM
> To: Gavin Hu <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; david.marchand@redhat.com;
> thomas@monjalon.net; rasland@mellanox.com;
> maxime.coquelin@redhat.com; hemant.agrawal@nxp.com;
> jerinj@marvell.com; pbhagavatula@marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [PATCH v2 2/3] net/virtio: virtual PCI requires smp barriers
>
> On Fri, Dec 20, 2019 at 11:09:50AM +0800, Gavin Hu wrote:
> > Other than real PCI reads and writes to the device memory requiring
> > the io barriers, virtual pci memories are normal memory in the smp
> > configuration, which requires the smp barriers.
> >
> > Since the smp barriers and io barriers are identical on x86 and PPC,
> > this change has only effect on aarch64.
> >
> > As far as peripheral coherence order for ‘virtual’ devices, the arch
> > intent is that the Hypervisor view of things takes precedence – since
> > translations are made in holistic manner as the full stage1+stage2
> > regime, there is no such thing as a transaction taking on “EL1”
> > mapping as far as ordering. If the Hypervisor maps stage2 as Normal
> > but the OS at EL1 maps it as Device-nGnRE, then it’s Normal memory and
> > follows the ordering rules for Normal memory.
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > ---
> > drivers/net/virtio/virtio_pci.c | 108 +++++++++++++++++++++++++++++---
> --------
> > 1 file changed, 78 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c
> > index 4468e89..64aa0a0 100644
> > --- a/drivers/net/virtio/virtio_pci.c
> > +++ b/drivers/net/virtio/virtio_pci.c
> > @@ -24,6 +24,54 @@
> > #define PCI_CAP_ID_VNDR 0x09
> > #define PCI_CAP_ID_MSIX 0x11
> >
> > +static __rte_always_inline uint8_t
> > +virtio_pci_read8(const volatile void *addr)
> > +{
> > + uint8_t val;
> > + val = rte_read8_relaxed(addr);
> > + rte_smp_rmb();
> > + return val;
> > +}
> > +
> > +static __rte_always_inline uint16_t
> > +virtio_pci_read16(const volatile void *addr)
> > +{
> > + uint16_t val;
> > + val = rte_read16_relaxed(addr);
> > + rte_smp_rmb();
> > + return val;
> > +}
> > +
> > +static __rte_always_inline uint32_t
> > +virtio_pci_read32(const volatile void *addr)
> > +{
> > + uint32_t val;
> > + val = rte_read32_relaxed(addr);
> > + rte_smp_rmb();
> > + return val;
> > +}
> > +
> > +static __rte_always_inline void
> > +virtio_pci_write8(uint8_t value, volatile void *addr)
> > +{
> > + rte_smp_wmb();
> > + rte_write8_relaxed(value, addr);
> > +}
> > +
> > +static __rte_always_inline void
> > +virtio_pci_write16(uint16_t value, volatile void *addr)
> > +{
> > + rte_smp_wmb();
> > + rte_write16_relaxed(value, addr);
> > +}
> > +
> > +static __rte_always_inline void
> > +virtio_pci_write32(uint32_t value, volatile void *addr)
> > +{
> > + rte_smp_wmb();
> > + rte_write32_relaxed(value, addr);
> > +}
>
> We can't assume that virtio device is software running
> in an SMP configuration unless VIRTIO_F_ORDER_PLATFORM
> isn't negotiated.
That's true; rte_smp is sufficient for the *non*-VIRTIO_F_ORDER_PLATFORM case.
As in the previous patch, rte_io is relaxed to a compiler barrier, and rte_smp is stronger than that; put another way, rte_smp is also sufficient for the *VIRTIO_F_ORDER_PLATFORM* case.
As x86 and PPC make no distinction between rte_smp and rte_io, everything goes fine?
Anyway, I will update the commit log to reflect the above points.
>
> https://github.com/oasis-tcs/virtio-
> spec/blob/94520b3af19c/content.tex#L5788
>
> > +
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH v2 3/3] crypto/virtio: virtual PCI requires smp barriers
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
` (6 preceding siblings ...)
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 2/3] net/virtio: virtual PCI requires smp barriers Gavin Hu
@ 2019-12-20 3:09 ` Gavin Hu
2020-02-08 13:48 ` [dpdk-dev] [PATCH v3] net/i40e: relaxed barrier in the tx fastpath Gavin Hu
2020-02-12 5:56 ` [dpdk-dev] [PATCH v4] " Gavin Hu
9 siblings, 0 replies; 35+ messages in thread
From: Gavin Hu @ 2019-12-20 3:09 UTC (permalink / raw)
To: dev
Cc: nd, david.marchand, thomas, rasland, maxime.coquelin, tiwei.bie,
hemant.agrawal, jerinj, pbhagavatula, Honnappa.Nagarahalli,
ruifeng.wang, phil.yang, joyce.kong, steve.capper
Other than real PCI reads and writes to device memory, which require
the io barriers, virtual PCI memory is normal memory in an SMP
configuration and requires the smp barriers.
Since the smp barriers and io barriers are identical on x86 and PPC,
this change only has an effect on aarch64.
As far as peripheral coherence order for ‘virtual’ devices is concerned,
the arch intent is that the Hypervisor view of things takes precedence –
since translations are made in a holistic manner as the full stage1+stage2
regime, there is no such thing as a transaction taking on an “EL1”
mapping as far as ordering. If the Hypervisor maps stage2 as Normal
but the OS at EL1 maps it as Device-nGnRE, then it’s Normal memory and
follows the ordering rules for Normal memory.
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
---
drivers/crypto/virtio/virtio_pci.c | 108 ++++++++++++++++++++++++++-----------
1 file changed, 78 insertions(+), 30 deletions(-)
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
index 8137b3c..dd8eda8 100644
--- a/drivers/crypto/virtio/virtio_pci.c
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -24,6 +24,54 @@
#define PCI_CAP_ID_VNDR 0x09
#define PCI_CAP_ID_MSIX 0x11
+static __rte_always_inline uint8_t
+virtio_pci_read8(const volatile void *addr)
+{
+ uint8_t val;
+ val = rte_read8_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint16_t
+virtio_pci_read16(const volatile void *addr)
+{
+ uint16_t val;
+ val = rte_read16_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline uint32_t
+virtio_pci_read32(const volatile void *addr)
+{
+ uint32_t val;
+ val = rte_read32_relaxed(addr);
+ rte_smp_rmb();
+ return val;
+}
+
+static __rte_always_inline void
+virtio_pci_write8(uint8_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write8_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write16(uint16_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write16_relaxed(value, addr);
+}
+
+static __rte_always_inline void
+virtio_pci_write32(uint32_t value, volatile void *addr)
+{
+ rte_smp_wmb();
+ rte_write32_relaxed(value, addr);
+}
+
/*
* The remaining space is defined by each driver as the per-driver
* configuration space.
@@ -52,8 +100,8 @@ check_vq_phys_addr_ok(struct virtqueue *vq)
static inline void
io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
{
- rte_write32(val & ((1ULL << 32) - 1), lo);
- rte_write32(val >> 32, hi);
+ virtio_pci_write32(val & ((1ULL << 32) - 1), lo);
+ virtio_pci_write32(val >> 32, hi);
}
static void
@@ -65,13 +113,13 @@ modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
uint8_t old_gen, new_gen;
do {
- old_gen = rte_read8(&hw->common_cfg->config_generation);
+ old_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
p = dst;
for (i = 0; i < length; i++)
- *p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+ *p++ = virtio_pci_read8((uint8_t *)hw->dev_cfg + offset + i);
- new_gen = rte_read8(&hw->common_cfg->config_generation);
+ new_gen = virtio_pci_read8(&hw->common_cfg->config_generation);
} while (old_gen != new_gen);
}
@@ -83,7 +131,7 @@ modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
const uint8_t *p = src;
for (i = 0; i < length; i++)
- rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+ virtio_pci_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
}
static uint64_t
@@ -91,11 +139,11 @@ modern_get_features(struct virtio_crypto_hw *hw)
{
uint32_t features_lo, features_hi;
- rte_write32(0, &hw->common_cfg->device_feature_select);
- features_lo = rte_read32(&hw->common_cfg->device_feature);
+ virtio_pci_write32(0, &hw->common_cfg->device_feature_select);
+ features_lo = virtio_pci_read32(&hw->common_cfg->device_feature);
- rte_write32(1, &hw->common_cfg->device_feature_select);
- features_hi = rte_read32(&hw->common_cfg->device_feature);
+ virtio_pci_write32(1, &hw->common_cfg->device_feature_select);
+ features_hi = virtio_pci_read32(&hw->common_cfg->device_feature);
return ((uint64_t)features_hi << 32) | features_lo;
}
@@ -103,25 +151,25 @@ modern_get_features(struct virtio_crypto_hw *hw)
static void
modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
{
- rte_write32(0, &hw->common_cfg->guest_feature_select);
- rte_write32(features & ((1ULL << 32) - 1),
+ virtio_pci_write32(0, &hw->common_cfg->guest_feature_select);
+ virtio_pci_write32(features & ((1ULL << 32) - 1),
&hw->common_cfg->guest_feature);
- rte_write32(1, &hw->common_cfg->guest_feature_select);
- rte_write32(features >> 32,
+ virtio_pci_write32(1, &hw->common_cfg->guest_feature_select);
+ virtio_pci_write32(features >> 32,
&hw->common_cfg->guest_feature);
}
static uint8_t
modern_get_status(struct virtio_crypto_hw *hw)
{
- return rte_read8(&hw->common_cfg->device_status);
+ return virtio_pci_read8(&hw->common_cfg->device_status);
}
static void
modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
{
- rte_write8(status, &hw->common_cfg->device_status);
+ virtio_pci_write8(status, &hw->common_cfg->device_status);
}
static void
@@ -134,30 +182,30 @@ modern_reset(struct virtio_crypto_hw *hw)
static uint8_t
modern_get_isr(struct virtio_crypto_hw *hw)
{
- return rte_read8(hw->isr);
+ return virtio_pci_read8(hw->isr);
}
static uint16_t
modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
{
- rte_write16(vec, &hw->common_cfg->msix_config);
- return rte_read16(&hw->common_cfg->msix_config);
+ virtio_pci_write16(vec, &hw->common_cfg->msix_config);
+ return virtio_pci_read16(&hw->common_cfg->msix_config);
}
static uint16_t
modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
uint16_t vec)
{
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
- rte_write16(vec, &hw->common_cfg->queue_msix_vector);
- return rte_read16(&hw->common_cfg->queue_msix_vector);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vec, &hw->common_cfg->queue_msix_vector);
+ return virtio_pci_read16(&hw->common_cfg->queue_msix_vector);
}
static uint16_t
modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
{
- rte_write16(queue_id, &hw->common_cfg->queue_select);
- return rte_read16(&hw->common_cfg->queue_size);
+ virtio_pci_write16(queue_id, &hw->common_cfg->queue_select);
+ return virtio_pci_read16(&hw->common_cfg->queue_size);
}
static int
@@ -175,7 +223,7 @@ modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
ring[vq->vq_nentries]),
VIRTIO_PCI_VRING_ALIGN);
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
&hw->common_cfg->queue_desc_hi);
@@ -184,11 +232,11 @@ modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
&hw->common_cfg->queue_used_hi);
- notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+ notify_off = virtio_pci_read16(&hw->common_cfg->queue_notify_off);
vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
notify_off * hw->notify_off_multiplier);
- rte_write16(1, &hw->common_cfg->queue_enable);
+ virtio_pci_write16(1, &hw->common_cfg->queue_enable);
VIRTIO_CRYPTO_INIT_LOG_DBG("queue %u addresses:", vq->vq_queue_index);
VIRTIO_CRYPTO_INIT_LOG_DBG("\t desc_addr: %" PRIx64, desc_addr);
@@ -203,7 +251,7 @@ modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
static void
modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
{
- rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+ virtio_pci_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
&hw->common_cfg->queue_desc_hi);
@@ -212,14 +260,14 @@ modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
&hw->common_cfg->queue_used_hi);
- rte_write16(0, &hw->common_cfg->queue_enable);
+ virtio_pci_write16(0, &hw->common_cfg->queue_enable);
}
static void
modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
struct virtqueue *vq)
{
- rte_write16(vq->vq_queue_index, vq->notify_addr);
+ virtio_pci_write16(vq->vq_queue_index, vq->notify_addr);
}
const struct virtio_pci_ops virtio_crypto_modern_ops = {
--
2.7.4
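For context, the virtio_pci_read8/write8 helpers substituted throughout this
diff replace the rte_io-barrier-based rte_read8/rte_write8 accessors with
smp-barrier-based ones: a virtio device emulated by the hypervisor is in
effect other CPUs reading and writing shared memory, so SMP ordering is what
is actually needed. The helper names come from the diff; the bodies below are
a minimal sketch of such accessors, an assumption rather than the patch's
exact code:

    /* Sketch only: smp-barrier-based accessors for virtual PCI memory. */
    static inline uint8_t
    virtio_pci_read8(const volatile void *addr)
    {
            uint8_t val = rte_read8_relaxed(addr);

            rte_smp_rmb(); /* order this load before subsequent loads */
            return val;
    }

    static inline void
    virtio_pci_write8(uint8_t value, volatile void *addr)
    {
            rte_smp_wmb(); /* order prior stores before this store */
            rte_write8_relaxed(value, addr);
    }

The 16/32-bit variants would follow the same shape, differing only in the
access width.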
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH v3] net/i40e: relaxed barrier in the tx fastpath
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
` (7 preceding siblings ...)
2019-12-20 3:09 ` [dpdk-dev] [PATCH v2 3/3] crypto/virtio: " Gavin Hu
@ 2020-02-08 13:48 ` Gavin Hu
2020-02-11 2:11 ` Ye Xiaolong
2020-02-15 8:25 ` Jerin Jacob
2020-02-12 5:56 ` [dpdk-dev] [PATCH v4] " Gavin Hu
9 siblings, 2 replies; 35+ messages in thread
From: Gavin Hu @ 2020-02-08 13:48 UTC (permalink / raw)
To: dev
Cc: nd, david.marchand, thomas, rasland, maxime.coquelin, tiwei.bie,
matan, shahafs, viacheslavo, arybchenko, stephen, hemant.agrawal,
jerinj, pbhagavatula, Honnappa.Nagarahalli, ruifeng.wang,
phil.yang, joyce.kong, steve.capper
To keep ordering of mixed accesses, rte_cio is sufficient.
The rte_io barrier is overkill.[1]
[1] http://inbox.dpdk.org/dev/CALBAE1M-ezVWCjqCZDBw+MMDEC4O9
qf0Kpn89EMdGDajepKoZQ@mail.gmail.com
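For reference, the two macros decompose roughly as below; this is a sketch
based on the i40e base code and the generic rte_io.h of this era, not a
verbatim quote:

    /* Sketch: the non-relaxed write embeds an rte_io_wmb() via
     * rte_write32(); the _RELAXED variant is a plain store. */
    #define I40E_PCI_REG_WRITE(reg, value) \
            rte_write32(rte_cpu_to_le_32(value), (reg))
    #define I40E_PCI_REG_WRITE_RELAXED(reg, value) \
            rte_write32_relaxed(rte_cpu_to_le_32(value), (reg))

The patch thus trades the implicit rte_io_wmb() (dsb st on aarch64 before
the relaxation in patch 1/3 of the original series) for an explicit
rte_cio_wmb() (dmb oshst), which is enough to order the descriptor stores
ahead of the tail doorbell.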
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
---
V3:
- optimize the barriers in the fast path only; leave the barriers in the
slow path and control path as they are <jerin>
- drop the virtio patches from the list as they are in the control path
- it makes more sense to relax the barrier in the fast path, at the PMD level.
relaxing the fundamental rte_io_x barrier APIs requires scrutiny of each
PMD that uses the barriers directly or indirectly.
V2:
- remove the virtio_pci_read/write64 API definitions; they are not needed and generate compilation errors like "error: unused function 'virtio_pci_write64' [-Werror,-Wunused-function]"
- update the reference link to the kernel source code
---
drivers/net/i40e/i40e_rxtx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index fd1ae80da..8c0f7cc67 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1248,7 +1248,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
(unsigned) txq->port_id, (unsigned) txq->queue_id,
(unsigned) tx_id, (unsigned) nb_tx);
- I40E_PCI_REG_WRITE(txq->qtx_tail, tx_id);
+ rte_cio_wmb();
+ I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id);
txq->tx_tail = tx_id;
return nb_tx;
--
2.17.1
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v3] net/i40e: relaxed barrier in the tx fastpath
2020-02-08 13:48 ` [dpdk-dev] [PATCH v3] net/i40e: relaxed barrier in the tx fastpath Gavin Hu
@ 2020-02-11 2:11 ` Ye Xiaolong
2020-02-12 6:02 ` Gavin Hu
2020-02-15 8:25 ` Jerin Jacob
1 sibling, 1 reply; 35+ messages in thread
From: Ye Xiaolong @ 2020-02-11 2:11 UTC (permalink / raw)
To: Gavin Hu
Cc: dev, nd, david.marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, matan, shahafs, viacheslavo, arybchenko, stephen,
hemant.agrawal, jerinj, pbhagavatula, Honnappa.Nagarahalli,
ruifeng.wang, phil.yang, joyce.kong, steve.capper
Hi, Gavin
Please help add the Fixes tag and cc stable.
Thanks,
Xiaolong
On 02/08, Gavin Hu wrote:
>To keep ordering of mixed accesses, rte_cio is sufficient.
>The rte_io barrier is overkill.[1]
>
>[1] http://inbox.dpdk.org/dev/CALBAE1M-ezVWCjqCZDBw+MMDEC4O9
>qf0Kpn89EMdGDajepKoZQ@mail.gmail.com
>
>Signed-off-by: Gavin Hu <gavin.hu@arm.com>
>---
>V3:
>- optimize the barriers in the fast path only; leave the barriers in the
> slow path and control path as they are <jerin>
>- drop the virtio patches from the list as they are in the control path
>- it makes more sense to relax the barrier in the fast path, at the PMD level.
> relaxing the fundamental rte_io_x barrier APIs requires scrutiny of each
> PMD that uses the barriers directly or indirectly.
>V2:
>- remove the virtio_pci_read/write64 API definitions; they are not needed and generate compilation errors like "error: unused function 'virtio_pci_write64' [-Werror,-Wunused-function]"
>- update the reference link to the kernel source code
>---
> drivers/net/i40e/i40e_rxtx.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
>index fd1ae80da..8c0f7cc67 100644
>--- a/drivers/net/i40e/i40e_rxtx.c
>+++ b/drivers/net/i40e/i40e_rxtx.c
>@@ -1248,7 +1248,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> (unsigned) txq->port_id, (unsigned) txq->queue_id,
> (unsigned) tx_id, (unsigned) nb_tx);
>
>- I40E_PCI_REG_WRITE(txq->qtx_tail, tx_id);
>+ rte_cio_wmb();
>+ I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id);
> txq->tx_tail = tx_id;
>
> return nb_tx;
>--
>2.17.1
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v3] net/i40e: relaxed barrier in the tx fastpath
2020-02-11 2:11 ` Ye Xiaolong
@ 2020-02-12 6:02 ` Gavin Hu
0 siblings, 0 replies; 35+ messages in thread
From: Gavin Hu @ 2020-02-12 6:02 UTC (permalink / raw)
To: Ye Xiaolong
Cc: dev, nd, david.marchand, thomas, rasland, maxime.coquelin,
tiwei.bie, matan, shahafs, viacheslavo, arybchenko, stephen,
hemant.agrawal, jerinj, pbhagavatula, Honnappa Nagarahalli,
Ruifeng Wang, Phil Yang, Joyce Kong, Steve Capper, nd
Hi Xiaolong,
Just sent v4 for this, thanks for the reminder!
Best Regards,
Gavin
> -----Original Message-----
> From: Ye Xiaolong <xiaolong.ye@intel.com>
> Sent: Tuesday, February 11, 2020 10:11 AM
> To: Gavin Hu <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; david.marchand@redhat.com;
> thomas@monjalon.net; rasland@mellanox.com;
> maxime.coquelin@redhat.com; tiwei.bie@intel.com; matan@mellanox.com;
> shahafs@mellanox.com; viacheslavo@mellanox.com;
> arybchenko@solarflare.com; stephen@networkplumber.org;
> hemant.agrawal@nxp.com; jerinj@marvell.com;
> pbhagavatula@marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Phil Yang <Phil.Yang@arm.com>; Joyce Kong
> <Joyce.Kong@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v3] net/i40e: relaxed barrier in the tx
> fastpath
>
> Hi, Gavin
>
> Please help add the Fixes tag and cc stable.
>
> Thanks,
> Xiaolong
>
> On 02/08, Gavin Hu wrote:
> >To keep ordering of mixed accesses, rte_cio is sufficient.
> >The rte_io barrier is overkill.[1]
> >
> >[1] http://inbox.dpdk.org/dev/CALBAE1M-ezVWCjqCZDBw+MMDEC4O9
> >qf0Kpn89EMdGDajepKoZQ@mail.gmail.com
> >
> >Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> >---
> >V3:
> >- optimize the barriers in the fast path only; leave the barriers in the
> > slow path and control path as they are <jerin>
> >- drop the virtio patches from the list as they are in the control path
> >- it makes more sense to relax the barrier in the fast path, at the PMD level.
> > relaxing the fundamental rte_io_x barrier APIs requires scrutiny of each
> > PMD that uses the barriers directly or indirectly.
> >V2:
> >- remove the virtio_pci_read/write64 API definitions; they are not needed and generate compilation errors like "error: unused function 'virtio_pci_write64' [-Werror,-Wunused-function]"
> >- update the reference link to the kernel source code
> >---
> > drivers/net/i40e/i40e_rxtx.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> >index fd1ae80da..8c0f7cc67 100644
> >--- a/drivers/net/i40e/i40e_rxtx.c
> >+++ b/drivers/net/i40e/i40e_rxtx.c
> >@@ -1248,7 +1248,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf
> **tx_pkts, uint16_t nb_pkts)
> > (unsigned) txq->port_id, (unsigned) txq->queue_id,
> > (unsigned) tx_id, (unsigned) nb_tx);
> >
> >- I40E_PCI_REG_WRITE(txq->qtx_tail, tx_id);
> >+ rte_cio_wmb();
> >+ I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id);
> > txq->tx_tail = tx_id;
> >
> > return nb_tx;
> >--
> >2.17.1
> >
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v3] net/i40e: relaxed barrier in the tx fastpath
2020-02-08 13:48 ` [dpdk-dev] [PATCH v3] net/i40e: relaxed barrier in the tx fastpath Gavin Hu
2020-02-11 2:11 ` Ye Xiaolong
@ 2020-02-15 8:25 ` Jerin Jacob
1 sibling, 0 replies; 35+ messages in thread
From: Jerin Jacob @ 2020-02-15 8:25 UTC (permalink / raw)
To: Gavin Hu
Cc: dpdk-dev, nd, David Marchand, Thomas Monjalon, Raslan Darawsheh,
Maxime Coquelin, Tiwei Bie, Matan Azrad, Shahaf Shuler,
Slava Ovsiienko, Andrew Rybchenko, Stephen Hemminger,
Hemant Agrawal, Jerin Jacob, Pavan Nikhilesh,
Honnappa Nagarahalli, Ruifeng Wang (Arm Technology China),
Phil Yang, Joyce Kong, Steve Capper
On Sat, Feb 8, 2020 at 7:22 PM Gavin Hu <gavin.hu@arm.com> wrote:
>
> To keep ordering of mixed accesses, rte_cio is sufficient.
> The rte_io barrier is overkill.[1]
>
> [1] http://inbox.dpdk.org/dev/CALBAE1M-ezVWCjqCZDBw+MMDEC4O9
> qf0Kpn89EMdGDajepKoZQ@mail.gmail.com
>
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> ---
> V3:
> - optimize the barriers in the fast path only; leave the barriers in the
> slow path and control path as they are <jerin>
> - drop the virtio patches from the list as they are in the control path
> - it makes more sense to relax the barrier in the fast path, at the PMD level.
> relaxing the fundamental rte_io_x barrier APIs requires scrutiny of each
> PMD that uses the barriers directly or indirectly.
> V2:
> - remove the virtio_pci_read/write64 API definitions; they are not needed and generate compilation errors like "error: unused function 'virtio_pci_write64' [-Werror,-Wunused-function]"
> - update the reference link to the kernel source code
> ---
> drivers/net/i40e/i40e_rxtx.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index fd1ae80da..8c0f7cc67 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -1248,7 +1248,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> (unsigned) txq->port_id, (unsigned) txq->queue_id,
> (unsigned) tx_id, (unsigned) nb_tx);
>
> - I40E_PCI_REG_WRITE(txq->qtx_tail, tx_id);
> + rte_cio_wmb();
> + I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id);
Looks good to me from the ARM64 perspective or in general.
Reviewed-by: Jerin Jacob <jerinj@marvell.com>
> txq->tx_tail = tx_id;
>
> return nb_tx;
> --
> 2.17.1
>
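For readers checking the ARM64 claim: the write barriers map roughly as
below in this DPDK era (a sketch from the arm64 headers, not a verbatim
quote). dsb stalls until prior accesses complete, while dmb only enforces
ordering, which is why the cio variant is the cheaper choice here:

    /* Approximate aarch64 definitions (sketch): */
    #define rte_wmb()     asm volatile("dsb st"    : : : "memory") /* completion */
    #define rte_smp_wmb() asm volatile("dmb ishst" : : : "memory") /* inner shareable */
    #define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory") /* outer shareable */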
^ permalink raw reply [flat|nested] 35+ messages in thread
* [dpdk-dev] [PATCH v4] net/i40e: relaxed barrier in the tx fastpath
2019-10-22 15:27 [dpdk-dev] [PATCH v1 0/3] relax io barrier for aarch64 and use smp barriers for virtual pci memory Gavin Hu
` (8 preceding siblings ...)
2020-02-08 13:48 ` [dpdk-dev] [PATCH v3] net/i40e: relaxed barrier in the tx fastpath Gavin Hu
@ 2020-02-12 5:56 ` Gavin Hu
2020-02-15 15:16 ` Ye Xiaolong
9 siblings, 1 reply; 35+ messages in thread
From: Gavin Hu @ 2020-02-12 5:56 UTC (permalink / raw)
To: dev
Cc: nd, david.marchand, thomas, jerinj, xiaolong.ye,
Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
steve.capper, stable
To keep ordering of mixed accesses, rte_cio is sufficient.
The rte_io barrier inside the I40E_PCI_REG_WRITE is overkill.[1]
[1] http://inbox.dpdk.org/dev/CALBAE1M-ezVWCjqCZDBw+MMDEC4O9
qf0Kpn89EMdGDajepKoZQ@mail.gmail.com
Fixes: 4861cde46116 ("i40e: new poll mode driver")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
---
V4:
- add the Fixes tag and CC stable <Xiaolong Ye>
V3:
- optimize the barriers in the fast path only; leave the barriers in the
slow path and control path as they are <jerin>
- drop the virtio patches from the list as they are in the control path
- it makes more sense to relax the barrier in the fast path, at the PMD level.
relaxing the fundamental rte_io_x barrier APIs requires scrutiny of each
PMD that uses the barriers directly or indirectly.
V2:
- remove the virtio_pci_read/write64 API definitions; they are not needed and generate compilation errors like "error: unused function 'virtio_pci_write64' [-Werror,-Wunused-function]"
- update the reference link to the kernel source code
---
drivers/net/i40e/i40e_rxtx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index fd1ae80da..8c0f7cc67 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1248,7 +1248,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
(unsigned) txq->port_id, (unsigned) txq->queue_id,
(unsigned) tx_id, (unsigned) nb_tx);
- I40E_PCI_REG_WRITE(txq->qtx_tail, tx_id);
+ rte_cio_wmb();
+ I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id);
txq->tx_tail = tx_id;
return nb_tx;
--
2.17.1
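The resulting tail update follows the classic producer pattern: fill the
descriptor ring in normal (coherent) memory, order those stores against the
device with a cio barrier, then ring the doorbell with a relaxed MMIO write.
A sketch, with the surrounding i40e code elided:

    /* ... descriptor ring entries up to tx_id filled above ... */
    rte_cio_wmb();          /* descriptors visible to the device first */
    I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id); /* doorbell, no
                             * extra barrier inside the macro */
    txq->tx_tail = tx_id;   /* software copy of the tail pointer */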
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v4] net/i40e: relaxed barrier in the tx fastpath
2020-02-12 5:56 ` [dpdk-dev] [PATCH v4] " Gavin Hu
@ 2020-02-15 15:16 ` Ye Xiaolong
2020-02-16 9:51 ` Thomas Monjalon
0 siblings, 1 reply; 35+ messages in thread
From: Ye Xiaolong @ 2020-02-15 15:16 UTC (permalink / raw)
To: Gavin Hu
Cc: dev, nd, david.marchand, thomas, jerinj, Honnappa.Nagarahalli,
ruifeng.wang, phil.yang, joyce.kong, steve.capper, stable
s/relaxed/relax
On 02/12, Gavin Hu wrote:
>To keep ordering of mixed accesses, rte_cio is sufficient.
>The rte_io barrier inside the I40E_PCI_REG_WRITE is overkill.[1]
>
>[1] http://inbox.dpdk.org/dev/CALBAE1M-ezVWCjqCZDBw+MMDEC4O9
>qf0Kpn89EMdGDajepKoZQ@mail.gmail.com
>
>Fixes: 4861cde46116 ("i40e: new poll mode driver")
>Cc: stable@dpdk.org
>
>Signed-off-by: Gavin Hu <gavin.hu@arm.com>
>---
>V4:
>- add the Fixes tag and CC stable <Xiaolong Ye>
>V3:
>- optimize the barriers in the fast path only; leave the barriers in the
> slow path and control path as they are <jerin>
>- drop the virtio patches from the list as they are in the control path
>- it makes more sense to relax the barrier in the fast path, at the PMD level.
> relaxing the fundamental rte_io_x barrier APIs requires scrutiny of each
> PMD that uses the barriers directly or indirectly.
>V2:
>- remove the virtio_pci_read/write64 API definitions; they are not needed and generate compilation errors like "error: unused function 'virtio_pci_write64' [-Werror,-Wunused-function]"
>- update the reference link to the kernel source code
>---
> drivers/net/i40e/i40e_rxtx.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
>index fd1ae80da..8c0f7cc67 100644
>--- a/drivers/net/i40e/i40e_rxtx.c
>+++ b/drivers/net/i40e/i40e_rxtx.c
>@@ -1248,7 +1248,8 @@ i40e_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
> (unsigned) txq->port_id, (unsigned) txq->queue_id,
> (unsigned) tx_id, (unsigned) nb_tx);
>
>- I40E_PCI_REG_WRITE(txq->qtx_tail, tx_id);
>+ rte_cio_wmb();
>+ I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id);
> txq->tx_tail = tx_id;
>
> return nb_tx;
>--
>2.17.1
>
Applied to dpdk-next-net-intel with Jerin's Reviewed-by tag, Thanks.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v4] net/i40e: relaxed barrier in the tx fastpath
2020-02-15 15:16 ` Ye Xiaolong
@ 2020-02-16 9:51 ` Thomas Monjalon
2020-02-16 16:38 ` Ye Xiaolong
0 siblings, 1 reply; 35+ messages in thread
From: Thomas Monjalon @ 2020-02-16 9:51 UTC (permalink / raw)
To: Ye Xiaolong, ferruh.yigit
Cc: Gavin Hu, dev, nd, david.marchand, jerinj, Honnappa.Nagarahalli,
ruifeng.wang, phil.yang, joyce.kong, steve.capper, stable
15/02/2020 16:16, Ye Xiaolong:
> s/relaxed/relax
>
> On 02/12, Gavin Hu wrote:
> >To keep ordering of mixed accesses, rte_cio is sufficient.
> >The rte_io barrier inside the I40E_PCI_REG_WRITE is overkill.[1]
[...]
>
> Applied to dpdk-next-net-intel with Jerin's Reviewed-by tag, Thanks.
I assume it is too risky to do such an optimization post-rc3.
Ferruh, Xiaolong, don't you plan any more pulls from dpdk-next-net-intel
in 20.02?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v4] net/i40e: relaxed barrier in the tx fastpath
2020-02-16 9:51 ` Thomas Monjalon
@ 2020-02-16 16:38 ` Ye Xiaolong
2020-02-16 17:36 ` Thomas Monjalon
0 siblings, 1 reply; 35+ messages in thread
From: Ye Xiaolong @ 2020-02-16 16:38 UTC (permalink / raw)
To: Thomas Monjalon
Cc: ferruh.yigit, Gavin Hu, dev, nd, david.marchand, jerinj,
Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
steve.capper, stable
Hi, Thomas
On 02/16, Thomas Monjalon wrote:
>15/02/2020 16:16, Ye Xiaolong:
>> s/relaxed/relax
>>
>> On 02/12, Gavin Hu wrote:
>> >To keep ordering of mixed accesses, rte_cio is sufficient.
>> >The rte_io barrier inside the I40E_PCI_REG_WRITE is overkill.[1]
>[...]
>>
>> Applied to dpdk-next-net-intel with Jerin's Reviewed-by tag, Thanks.
>
>I assume it is too risky to do such an optimization post-rc3.
Yes, this is a valid concern. I agree to postpone it to the next release.
>
>Ferruh, Xiaolong, don't you plan any more pulls from dpdk-next-net-intel
>in 20.02?
There is still some bug-fixing work going on in PRC, so I assume there
will be some fix patches after RC3. They are still allowed to be merged
into 20.02 if the fix is relatively small in terms of lines of code and
scope, right?
Thanks,
Xiaolong
>
>
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [dpdk-dev] [PATCH v4] net/i40e: relaxed barrier in the tx fastpath
2020-02-16 16:38 ` Ye Xiaolong
@ 2020-02-16 17:36 ` Thomas Monjalon
0 siblings, 0 replies; 35+ messages in thread
From: Thomas Monjalon @ 2020-02-16 17:36 UTC (permalink / raw)
To: Ye Xiaolong
Cc: ferruh.yigit, Gavin Hu, dev, nd, david.marchand, jerinj,
Honnappa.Nagarahalli, ruifeng.wang, phil.yang, joyce.kong,
steve.capper, stable
16/02/2020 17:38, Ye Xiaolong:
> Hi, Thomas
>
> On 02/16, Thomas Monjalon wrote:
> >15/02/2020 16:16, Ye Xiaolong:
> >> s/relaxed/relax
> >>
> >> On 02/12, Gavin Hu wrote:
> >> >To keep ordering of mixed accesses, rte_cio is sufficient.
> >> >The rte_io barrier inside the I40E_PCI_REG_WRITE is overkill.[1]
> >[...]
> >>
> >> Applied to dpdk-next-net-intel with Jerin's Reviewed-by tag, Thanks.
> >
> >I assume it is too risky to do such an optimization post-rc3.
>
> Yes, this is a valid concern. I agree to postpone it to the next release.
>
> >
> >Ferruh, Xiaolong, don't you plan any more pulls from dpdk-next-net-intel
> >in 20.02?
>
> There is still some bug-fixing work going on in PRC, so I assume there
> will be some fix patches after RC3. They are still allowed to be merged
> into 20.02 if the fix is relatively small in terms of lines of code and
> scope, right?
Right
^ permalink raw reply [flat|nested] 35+ messages in thread