* [dpdk-dev] [PATCH 01/10] net/enetc: do not stall in clean Tx ring
2020-03-02 14:31 [dpdk-dev] [PATCH 00/10] net/enetc: optimization and cleanup Hemant Agrawal
@ 2020-03-02 14:32 ` Hemant Agrawal
2020-03-02 14:32 ` [dpdk-dev] [PATCH 02/10] net/enetc: use relaxed read for Tx CI in clean Tx Hemant Agrawal
` (9 subsequent siblings)
10 siblings, 0 replies; 13+ messages in thread
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
Don't read the hardware CI register in a loop; read it once, clean up and
exit.
The issue with reading the register in a loop is that we're stalling here
trying to catch up with hardware, which keeps sending as long as it has
traffic to send. In effect we could be waiting here for the Tx ring to be
drained by hardware, instead of doing Rx in the meantime.
By the time we return from the function there may be new BDs in the ring
that could be cleaned; we just leave those for the next call.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
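A minimal, self-contained sketch of the single-snapshot clean-up described above. It does not use DPDK: `struct sim_tx_ring`, `sim_clean_tx_ring()` and `SIM_CI_IDX_MASK` are hypothetical stand-ins for the driver's ring, `enetc_clean_tx_ring()` and `ENETC_TBCISR_IDX_MASK`, and clearing a slot replaces `rte_pktmbuf_free()`. It assumes hardware always reports an index below the ring size.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_CI_IDX_MASK 0xffffu /* hypothetical; stands in for ENETC_TBCISR_IDX_MASK */

/* Hypothetical, simplified Tx ring; not the driver's struct enetc_bdr. */
struct sim_tx_ring {
	void **bufs;                         /* non-NULL slot = sent, not yet reclaimed */
	int bd_count;                        /* ring size in BDs */
	int next_to_clean;                   /* first BD software has not reclaimed */
	const volatile unsigned int *ci_reg; /* stand-in for the CI register */
};

/* Reclaims BDs up to a CI value sampled exactly once; returns BDs cleaned. */
static int sim_clean_tx_ring(struct sim_tx_ring *r)
{
	int hwci, i, cleaned = 0;

	assert(r->bd_count > 0);
	/* Single read: BDs that HW completes after this point are left for
	 * the next call instead of spinning here to catch up with HW.
	 * Assumes HW reports an index < bd_count. */
	hwci = (int)(*r->ci_reg & SIM_CI_IDX_MASK);
	i = r->next_to_clean;
	while (i != hwci) {
		r->bufs[i] = NULL;      /* the driver frees the mbuf here */
		if (++i == r->bd_count)
			i = 0;
		cleaned++;
	}
	r->next_to_clean = i;
	return cleaned;
}
```

With an 8-BD ring, `next_to_clean` at 6 and a CI snapshot of 2, one call reclaims BDs 6, 7, 0 and 1; anything hardware completes afterwards waits for the next call.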
drivers/net/enetc/enetc_rxtx.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 81b0ef3b1..b7ecb75ec 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -1,5 +1,5 @@
/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018-2019 NXP
+ * Copyright 2018-2020 NXP
*/
#include <stdbool.h>
@@ -21,12 +21,24 @@ enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
{
int tx_frm_cnt = 0;
struct enetc_swbd *tx_swbd;
- int i;
+ int i, hwci;
i = tx_ring->next_to_clean;
tx_swbd = &tx_ring->q_swbd[i];
- while ((int)(enetc_rd_reg(tx_ring->tcisr) &
- ENETC_TBCISR_IDX_MASK) != i) {
+
+ hwci = (int)(enetc_rd_reg(tx_ring->tcisr) &
+ ENETC_TBCISR_IDX_MASK);
+
+ /* we're only reading the CI index once here, which means HW may update
+ * it while we're doing clean-up. We could read the register in a loop
+ * but for now I assume it's OK to leave a few Tx frames for next call.
+ * The issue with reading the register in a loop is that we're stalling
+ * here trying to catch up with HW which keeps sending traffic as long
+ * as it has traffic to send, so in effect we could be waiting here for
+ * the Tx ring to be drained by HW, instead of us doing Rx in that
+ * meantime.
+ */
+ while (i != hwci) {
rte_pktmbuf_free(tx_swbd->buffer_addr);
tx_swbd->buffer_addr = NULL;
tx_swbd++;
--
2.17.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [dpdk-dev] [PATCH 02/10] net/enetc: use relaxed read for Tx CI in clean Tx
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
We don't need barriers here since this read doesn't have to be strictly
serialized in relation to other surrounding memory/register accesses.
We only want a reasonably recent value out of hardware so we know how much
we can clean.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
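For illustration only, the difference between the two reads can be sketched in portable C11, assuming (as in DPDK's generic rte_io.h) that the ordered read is a relaxed read followed by an I/O read barrier. The C11 acquire fence below is only an analogy for that barrier, and `sim_read32()`, `sim_read32_relaxed()`, `sim_read_tx_ci()` and `SIM_TBCISR_IDX_MASK` are hypothetical names, not DPDK APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_TBCISR_IDX_MASK 0xffffu /* hypothetical index mask */

/* Relaxed read: a plain volatile load with no extra ordering. */
static inline uint32_t sim_read32_relaxed(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/* Ordered read: relaxed load followed by a fence, standing in for an
 * I/O read barrier (analogy only; MMIO ordering is platform specific). */
static inline uint32_t sim_read32(const volatile void *addr)
{
	uint32_t val = sim_read32_relaxed(addr);

	atomic_thread_fence(memory_order_acquire);
	return val;
}

/* CI snapshot for Tx clean-up: a slightly stale value only means fewer
 * BDs are reclaimed on this call, so the relaxed read is sufficient. */
static int sim_read_tx_ci(const volatile uint32_t *tcisr)
{
	assert(tcisr != NULL);
	return (int)(sim_read32_relaxed(tcisr) & SIM_TBCISR_IDX_MASK);
}
```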
drivers/net/enetc/enetc_rxtx.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index b7ecb75ec..395f5ecf4 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -23,12 +23,15 @@ enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
struct enetc_swbd *tx_swbd;
int i, hwci;
+ /* we don't need barriers here, we just want a relatively current value
+ * from HW.
+ */
+ hwci = (int)(rte_read32_relaxed(tx_ring->tcisr) &
+ ENETC_TBCISR_IDX_MASK);
+
i = tx_ring->next_to_clean;
tx_swbd = &tx_ring->q_swbd[i];
- hwci = (int)(enetc_rd_reg(tx_ring->tcisr) &
- ENETC_TBCISR_IDX_MASK);
-
/* we're only reading the CI index once here, which means HW may update
* it while we're doing clean-up. We could read the register in a loop
* but for now I assume it's OK to leave a few Tx frames for next call.
--
2.17.1
* [dpdk-dev] [PATCH 03/10] net/enetc: batch process enetc clean Tx ring calls
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
Each call to enetc_clean_tx_ring costs at least 150-200 CPU cycles, even
if no clean-up is done, due to the CI register read.
We now call it only once, at the end of enetc_xmit_pkts, on the
assumption that software is slower than hardware and hardware has
finished sending the older frames by then.
We're also cleaning up the ring before kicking off Tx for the new batch to
minimize chances of contention on the Tx ring.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
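A rough cost model of moving the call, assuming a hypothetical `struct sim_txq` whose only tracked cost is the CI register read; `sim_clean()` stands in for `enetc_clean_tx_ring()` and BD setup is omitted. For a 32-packet burst the old placement performs 32 CI reads and the new one performs 1, i.e. 31 fewer reads, or roughly 4650-6200 cycles at the 150-200 cycles per read quoted above.

```c
#include <assert.h>

/* Hypothetical Tx queue model that only counts CI register reads and
 * records the last producer index written; BD contents are omitted. */
struct sim_txq {
	int bd_count;
	int next_to_use;
	int ci_reads;   /* expensive CI register reads performed */
	int doorbell;   /* last producer index written to HW */
};

/* Stand-in for enetc_clean_tx_ring(): its fixed cost is the CI read. */
static void sim_clean(struct sim_txq *q)
{
	q->ci_reads++;
}

/* Enqueues nb_pkts BDs; per_pkt selects the pre-patch placement of the
 * clean-up call, otherwise it runs once before the doorbell write. */
static int sim_xmit(struct sim_txq *q, int nb_pkts, int per_pkt)
{
	int i = q->next_to_use, sent = 0;

	assert(q->bd_count > 0);
	while (nb_pkts-- > 0) {
		if (per_pkt)
			sim_clean(q);   /* old: once per packet */
		if (++i == q->bd_count)
			i = 0;
		sent++;
	}
	if (!per_pkt)
		sim_clean(q);           /* new: once per burst */
	q->next_to_use = i;
	q->doorbell = i;                /* the driver writes tcir here */
	return sent;
}
```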
drivers/net/enetc/enetc_rxtx.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 395f5ecf4..958e3a21d 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -76,7 +76,6 @@ enetc_xmit_pkts(void *tx_queue,
start = 0;
while (nb_pkts--) {
- enetc_clean_tx_ring(tx_ring);
tx_ring->q_swbd[i].buffer_addr = tx_pkts[start];
txbd = ENETC_TXBD(*tx_ring, i);
tx_swbd = &tx_ring->q_swbd[i];
@@ -92,6 +91,14 @@ enetc_xmit_pkts(void *tx_queue,
i = 0;
}
+ /* we're only cleaning up the Tx ring here, on the assumption that
+ * software is slower than hardware and hardware completed sending
+ * older frames out by now.
+ * We're also cleaning up the ring before kicking off Tx for the new
+ * batch to minimize chances of contention on the Tx ring
+ */
+ enetc_clean_tx_ring(tx_ring);
+
tx_ring->next_to_use = i;
enetc_wr_reg(tx_ring->tcir, i);
return start;
--
2.17.1
* [dpdk-dev] [PATCH 04/10] net/enetc: erratum wa for Rx lock-up issue
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
The default value in hardware for the Rx MAC FIFO (@) is higher than it
should be and can lead to Rx lock-up under traffic.
Set it to 1, the value recommended by the hardware team.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
---
drivers/net/enetc/base/enetc_hw.h | 3 ++-
drivers/net/enetc/enetc_ethdev.c | 5 ++++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/enetc/base/enetc_hw.h b/drivers/net/enetc/base/enetc_hw.h
index 2fe7ccb5b..00813284e 100644
--- a/drivers/net/enetc/base/enetc_hw.h
+++ b/drivers/net/enetc/base/enetc_hw.h
@@ -1,5 +1,5 @@
/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018-2019 NXP
+ * Copyright 2018-2020 NXP
*/
#ifndef _ENETC_HW_H_
@@ -86,6 +86,7 @@ enum enetc_bdr_type {TX, RX};
#define ENETC_PSIPMAR1(n) (0x00104 + (n) * 0x20)
#define ENETC_PCAPR0 0x00900
#define ENETC_PCAPR1 0x00904
+#define ENETC_PM0_RX_FIFO 0x801C
#define ENETC_PM0_IF_MODE 0x8300
#define ENETC_PM1_IF_MODE 0x9300
#define ENETC_PMO_IFM_RG BIT(2)
diff --git a/drivers/net/enetc/enetc_ethdev.c b/drivers/net/enetc/enetc_ethdev.c
index 20b77c006..eb637d030 100644
--- a/drivers/net/enetc/enetc_ethdev.c
+++ b/drivers/net/enetc/enetc_ethdev.c
@@ -1,5 +1,5 @@
/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018-2019 NXP
+ * Copyright 2018-2020 NXP
*/
#include <stdbool.h>
@@ -147,6 +147,9 @@ enetc_hardware_init(struct enetc_eth_hw *hw)
hw->hw.port = (void *)((size_t)hw->hw.reg + ENETC_PORT_BASE);
hw->hw.global = (void *)((size_t)hw->hw.reg + ENETC_GLOBAL_BASE);
+ /* WA for Rx lock-up HW erratum */
+ enetc_port_wr(enetc_hw, ENETC_PM0_RX_FIFO, 1);
+
/* Enabling Station Interface */
enetc_wr(enetc_hw, ENETC_SIMR, ENETC_SIMR_EN);
--
2.17.1
* [dpdk-dev] [PATCH 05/10] net/enetc: improve batching Rx ring refill
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
Instead of refilling the Rx ring in bundles of 8 BDs, refill it once per
enetc_clean_rx_ring call. One benefit is that we refill all the BDs we
just processed, which should still be in cache. The other is that the
hardware Rx index stays a little behind and doesn't cause contention on
the BDs being processed in the Rx loop.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
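A simplified model of the new refill placement, assuming a hypothetical `struct sim_rxq` in which a non-zero `status[i]` means hardware has filled BD i. Unlike the driver, whose `cleaned_cnt` starts from `enetc_bd_unused()` and so also covers BDs that were already unused on entry, this sketch refills only the BDs consumed in the call.

```c
#include <assert.h>

/* Hypothetical Rx ring model: status[i] != 0 means HW filled BD i. */
struct sim_rxq {
	int *status;
	int bd_count;
	int next_to_clean;
	int refill_calls;   /* how many times the refill routine ran */
	int refilled;       /* total BDs handed back to HW */
};

static void sim_refill(struct sim_rxq *q, int cnt)
{
	q->refill_calls++;
	q->refilled += cnt; /* real driver: allocate mbufs, write BDs, bump PI */
}

/* Processes up to work_limit frames, then refills consumed BDs once. */
static int sim_clean_rx_ring(struct sim_rxq *q, int work_limit)
{
	int i = q->next_to_clean, frames = 0;

	assert(q->bd_count > 0 && work_limit >= 0);
	while (frames < work_limit && q->status[i]) {
		q->status[i] = 0;           /* consume the BD */
		if (++i == q->bd_count)
			i = 0;
		frames++;
	}
	q->next_to_clean = i;
	if (frames)
		sim_refill(q, frames);      /* one batch per call */
	return frames;
}
```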
drivers/net/enetc/enetc_rxtx.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 958e3a21d..262ed8a0f 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -14,8 +14,6 @@
#include "enetc.h"
#include "enetc_logs.h"
-#define ENETC_RXBD_BUNDLE 8 /* Number of BDs to update at once */
-
static int
enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
{
@@ -305,12 +303,6 @@ enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
union enetc_rx_bd *rxbd;
uint32_t bd_status;
- if (cleaned_cnt >= ENETC_RXBD_BUNDLE) {
- int count = enetc_refill_rx_ring(rx_ring, cleaned_cnt);
-
- cleaned_cnt -= count;
- }
-
rxbd = ENETC_RXBD(*rx_ring, i);
bd_status = rte_le_to_cpu_32(rxbd->r.lstatus);
if (!bd_status)
@@ -337,6 +329,8 @@ enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
rx_frm_cnt++;
}
+ enetc_refill_rx_ring(rx_ring, cleaned_cnt);
+
return rx_frm_cnt;
}
--
2.17.1
* [dpdk-dev] [PATCH 06/10] net/enetc: cache align enetc bdr structure
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
Reorder the members of the structure so that the ones used on the datapath
fit in a single cache line, slightly reducing cache pressure and the miss
rate.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
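The effect of the reordering can be checked on the build host by mirroring the member lists from the '-' and '+' lines of the diff. `struct sim_bdr_old` and `struct sim_bdr_new` are hypothetical copies, the DPDK types are only forward-declared, and a 64-byte cache line is assumed. On an LP64 target the old order needs padding after `index` and after `crc_len`, giving 72 bytes; the new order packs `index` and `crc_len` together, giving 64 bytes. That fits in one line only if the structure itself starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rte_eth_dev;     /* opaque: only pointers are stored below */
struct rte_mempool;
struct enetc_swbd;

/* Member order before this patch (copied from the '-' lines). */
struct sim_bdr_old {
	struct rte_eth_dev *ndev;
	struct rte_mempool *mb_pool;
	void *bd_base;
	union { void *tcir; void *rcir; };
	uint16_t index;
	int bd_count;
	int next_to_use;
	int next_to_clean;
	struct enetc_swbd *q_swbd;
	union { void *tcisr; int next_to_alloc; };
	uint8_t crc_len;
};

/* Member order after this patch (copied from the '+' lines). */
struct sim_bdr_new {
	void *bd_base;
	struct enetc_swbd *q_swbd;
	union { void *tcir; void *rcir; };
	int bd_count;
	int next_to_use;
	int next_to_clean;
	uint16_t index;
	uint8_t crc_len;   /* now shares the padding that followed index */
	union { void *tcisr; int next_to_alloc; };
	struct rte_mempool *mb_pool;
	struct rte_eth_dev *ndev;
};

#define SIM_CACHE_LINE 64  /* assumption: 64-byte cache lines */

/* Holds on common ILP32/LP64 ABIs (64 bytes on LP64). */
static_assert(sizeof(struct sim_bdr_new) <= SIM_CACHE_LINE,
	      "reordered ring struct should fit in one cache line");
```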
drivers/net/enetc/enetc.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/enetc/enetc.h b/drivers/net/enetc/enetc.h
index 8c830a5c0..14ef3bc18 100644
--- a/drivers/net/enetc/enetc.h
+++ b/drivers/net/enetc/enetc.h
@@ -53,23 +53,23 @@ struct enetc_swbd {
};
struct enetc_bdr {
- struct rte_eth_dev *ndev;
- struct rte_mempool *mb_pool; /* mbuf pool to populate RX ring. */
void *bd_base; /* points to Rx or Tx BD ring */
+ struct enetc_swbd *q_swbd;
union {
void *tcir;
void *rcir;
};
- uint16_t index;
int bd_count; /* # of BDs */
int next_to_use;
int next_to_clean;
- struct enetc_swbd *q_swbd;
+ uint16_t index;
+ uint8_t crc_len; /* 0 if CRC stripped, 4 otherwise */
union {
void *tcisr; /* Tx */
int next_to_alloc; /* Rx */
};
- uint8_t crc_len; /* 0 if CRC stripped, 4 otherwise */
+ struct rte_mempool *mb_pool; /* mbuf pool to populate RX ring. */
+ struct rte_eth_dev *ndev;
};
/*
--
2.17.1
* [dpdk-dev] [PATCH 07/10] net/enetc: use bulk alloc in Rx refill ring
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
Since we know in advance that we're going to fill in multiple
descriptors, it's convenient to allocate the buffers in batches.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
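A sketch of the bundled allocation loop, using a hypothetical all-or-nothing `struct sim_pool` and `sim_alloc_bulk()` in place of the mempool and `rte_pktmbuf_alloc_bulk()`. The bundle size mirrors `ENETC_RXBD_BUNDLE` (16 in this patch, although the in-diff comment still says "up to 8"). One deliberate difference: the patch returns -1 when a bulk allocation fails, whereas this sketch stops and reports how many slots it did fill, so a caller could still publish them.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_BUNDLE 16  /* mirrors ENETC_RXBD_BUNDLE in this patch */

/* Hypothetical all-or-nothing pool, like a mempool bulk get. */
struct sim_pool {
	int avail;
	int bulk_calls;
	int storage;    /* placeholder object handed out */
};

static int sim_alloc_bulk(struct sim_pool *p, void **objs, int n)
{
	p->bulk_calls++;
	if (p->avail < n)
		return -1;              /* nothing allocated on failure */
	for (int k = 0; k < n; k++)
		objs[k] = &p->storage;
	p->avail -= n;
	return 0;
}

/* Fills buff_cnt slots starting at *next, allocating in bundles.
 * Returns the number of slots filled (may be < buff_cnt on failure). */
static int sim_refill(void **slots, int bd_count, int *next,
		      struct sim_pool *pool, int buff_cnt)
{
	void *m[SIM_BUNDLE];
	int i = *next, j, k = SIM_BUNDLE;

	assert(bd_count > 0 && buff_cnt >= 0);
	for (j = 0; j < buff_cnt; j++) {
		if (k == SIM_BUNDLE) {  /* current bundle used up */
			int m_cnt = buff_cnt - j < SIM_BUNDLE ?
				    buff_cnt - j : SIM_BUNDLE;

			if (sim_alloc_bulk(pool, m, m_cnt))
				break;  /* keep what was already filled */
			k = 0;
		}
		slots[i] = m[k++];
		if (++i == bd_count)
			i = 0;
	}
	*next = i;
	return j;
}
```

Refilling 40 slots from index 60 of a 64-entry ring takes three bulk requests (16, 16 and 8) and leaves the next index at 36.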
drivers/net/enetc/enetc_rxtx.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 262ed8a0f..8b85c5371 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -14,6 +14,8 @@
#include "enetc.h"
#include "enetc_logs.h"
+#define ENETC_RXBD_BUNDLE 16 /* Number of buffers to allocate at once */
+
static int
enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
{
@@ -107,15 +109,25 @@ enetc_refill_rx_ring(struct enetc_bdr *rx_ring, const int buff_cnt)
{
struct enetc_swbd *rx_swbd;
union enetc_rx_bd *rxbd;
- int i, j;
+ int i, j, k = ENETC_RXBD_BUNDLE;
+ struct rte_mbuf *m[ENETC_RXBD_BUNDLE];
+ struct rte_mempool *mb_pool;
i = rx_ring->next_to_use;
+ mb_pool = rx_ring->mb_pool;
rx_swbd = &rx_ring->q_swbd[i];
rxbd = ENETC_RXBD(*rx_ring, i);
for (j = 0; j < buff_cnt; j++) {
- rx_swbd->buffer_addr = (void *)(uintptr_t)
- rte_cpu_to_le_64((uint64_t)(uintptr_t)
- rte_pktmbuf_alloc(rx_ring->mb_pool));
+ /* bulk alloc for the next up to 8 BDs */
+ if (k == ENETC_RXBD_BUNDLE) {
+ k = 0;
+ int m_cnt = RTE_MIN(buff_cnt - j, ENETC_RXBD_BUNDLE);
+
+ if (rte_pktmbuf_alloc_bulk(mb_pool, m, m_cnt))
+ return -1;
+ }
+
+ rx_swbd->buffer_addr = m[k];
rxbd->w.addr = (uint64_t)(uintptr_t)
rx_swbd->buffer_addr->buf_iova +
rx_swbd->buffer_addr->data_off;
@@ -124,6 +136,7 @@ enetc_refill_rx_ring(struct enetc_bdr *rx_ring, const int buff_cnt)
rx_swbd++;
rxbd++;
i++;
+ k++;
if (unlikely(i == rx_ring->bd_count)) {
i = 0;
rxbd = ENETC_RXBD(*rx_ring, 0);
--
2.17.1
* [dpdk-dev] [PATCH 08/10] net/enetc: use bulk free in Tx clean
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
Use rte_pktmbuf_free_bulk to release all mbufs at once. This API is
flagged as experimental (not yet stable) in DPDK, hence the
ALLOW_EXPERIMENTAL_API build changes below, but it seems to be functional.
Don't count the released frames; the caller no longer needs the count.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
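The batching pattern in the new clean-up loop can be sketched without DPDK as below. `sim_free_bulk()` is a hypothetical counter standing in for `rte_pktmbuf_free_bulk()`, and the batch size of 16 mirrors the patch's reuse of `ENETC_RXBD_BUNDLE`.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_FREE_BUNDLE 16 /* the patch reuses ENETC_RXBD_BUNDLE (16) */

/* Hypothetical bulk-free stand-in that only records what it was given. */
struct sim_freer {
	int calls;
	int freed;
};

static void sim_free_bulk(struct sim_freer *f, void **objs, int n)
{
	(void)objs;             /* a real pool would return objs[0..n-1] */
	f->calls++;
	f->freed += n;
}

/* Reclaims BDs from *next_to_clean up to hwci, freeing in batches. */
static void sim_clean_tx_bulk(void **bufs, int bd_count, int *next_to_clean,
			      int hwci, struct sim_freer *f)
{
	void *m[SIM_FREE_BUNDLE];
	int i = *next_to_clean, cnt = 0;

	assert(bd_count > 0 && hwci >= 0 && hwci < bd_count);
	while (i != hwci) {
		if (cnt == SIM_FREE_BUNDLE) {   /* batch full: release it */
			sim_free_bulk(f, m, cnt);
			cnt = 0;
		}
		m[cnt++] = bufs[i];
		bufs[i] = NULL;
		if (++i == bd_count)
			i = 0;
	}
	if (cnt)
		sim_free_bulk(f, m, cnt);       /* release the remainder */
	*next_to_clean = i;
}
```

Reclaiming 40 BDs results in three bulk calls (16, 16 and the remaining 8) instead of 40 individual frees.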
drivers/net/enetc/Makefile | 1 +
drivers/net/enetc/enetc_rxtx.c | 32 ++++++++++++++++++++++++--------
drivers/net/enetc/meson.build | 1 +
3 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/drivers/net/enetc/Makefile b/drivers/net/enetc/Makefile
index 7276026e3..7f7a85f64 100644
--- a/drivers/net/enetc/Makefile
+++ b/drivers/net/enetc/Makefile
@@ -11,6 +11,7 @@ LIB = librte_pmd_enetc.a
CFLAGS += -O3
CFLAGS += $(WERROR_FLAGS)
CFLAGS += -I$(RTE_SDK)/drivers/common/dpaax
+CFLAGS += -DALLOW_EXPERIMENTAL_API
EXPORT_MAP := rte_pmd_enetc_version.map
SRCS-$(CONFIG_RTE_LIBRTE_ENETC_PMD) += enetc_ethdev.c
SRCS-$(CONFIG_RTE_LIBRTE_ENETC_PMD) += enetc_rxtx.c
diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 8b85c5371..1acc43a08 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -20,8 +20,9 @@ static int
enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
{
int tx_frm_cnt = 0;
- struct enetc_swbd *tx_swbd;
- int i, hwci;
+ struct enetc_swbd *tx_swbd, *tx_swbd_base;
+ int i, hwci, bd_count;
+ struct rte_mbuf *m[ENETC_RXBD_BUNDLE];
/* we don't need barriers here, we just want a relatively current value
* from HW.
@@ -29,8 +30,10 @@ enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
hwci = (int)(rte_read32_relaxed(tx_ring->tcisr) &
ENETC_TBCISR_IDX_MASK);
+ tx_swbd_base = tx_ring->q_swbd;
+ bd_count = tx_ring->bd_count;
i = tx_ring->next_to_clean;
- tx_swbd = &tx_ring->q_swbd[i];
+ tx_swbd = &tx_swbd_base[i];
/* we're only reading the CI index once here, which means HW may update
* it while we're doing clean-up. We could read the register in a loop
@@ -42,20 +45,33 @@ enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
* meantime.
*/
while (i != hwci) {
- rte_pktmbuf_free(tx_swbd->buffer_addr);
+ /* It seems calling rte_pktmbuf_free is wasting a lot of cycles,
+ * make a list and call _free when it's done.
+ */
+ if (tx_frm_cnt == ENETC_RXBD_BUNDLE) {
+ rte_pktmbuf_free_bulk(m, tx_frm_cnt);
+ tx_frm_cnt = 0;
+ }
+
+ m[tx_frm_cnt] = tx_swbd->buffer_addr;
tx_swbd->buffer_addr = NULL;
- tx_swbd++;
+
i++;
- if (unlikely(i == tx_ring->bd_count)) {
+ tx_swbd++;
+ if (unlikely(i == bd_count)) {
i = 0;
- tx_swbd = &tx_ring->q_swbd[0];
+ tx_swbd = tx_swbd_base;
}
tx_frm_cnt++;
}
+ if (tx_frm_cnt)
+ rte_pktmbuf_free_bulk(m, tx_frm_cnt);
+
tx_ring->next_to_clean = i;
- return tx_frm_cnt++;
+
+ return 0;
}
uint16_t
diff --git a/drivers/net/enetc/meson.build b/drivers/net/enetc/meson.build
index bea54bea8..af11c0960 100644
--- a/drivers/net/enetc/meson.build
+++ b/drivers/net/enetc/meson.build
@@ -11,3 +11,4 @@ sources = files('enetc_ethdev.c',
'enetc_rxtx.c')
includes += include_directories('base')
+allow_experimental_apis = true
--
2.17.1
* [dpdk-dev] [PATCH 09/10] net/enetc: improve prefetch in Rx ring clean
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
LS1028A does not have a platform cache, so any read following a hardware
write goes directly to DDR. The latency of such a read is in excess of 100
core cycles, so try to prefetch further in advance to mitigate this.
How much is worth prefetching really depends on traffic conditions. With
congested Rx this could go up to 4 cache lines or so. But if software
keeps up with hardware and follows behind the Rx PI by a cache line, then
prefetching more is harmful to performance: we would only prefetch data
that has yet to be written by ENETC, which would have to be evicted again
anyway.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
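The prefetch-distance arithmetic can be sketched as follows, assuming 16-byte Rx BDs and 64-byte cache lines (so `ENETC_CACHE_LINE_RXBDS` would be 4). `struct sim_rx_bd` is a hypothetical layout, not `union enetc_rx_bd`, and the GCC/Clang builtin `__builtin_prefetch()` stands in for `rte_prefetch0()`. As in the patch, the code touches the BDs one and two cache lines ahead of the current index, wrapping modulo the ring size.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical 16-byte BD layout; not the driver's union enetc_rx_bd. */
struct sim_rx_bd {
	uint64_t addr;
	uint32_t lstatus;
	uint32_t rsvd;
};

#define SIM_CACHE_LINE_RXBDS ((int)(SIM_CACHE_LINE / sizeof(struct sim_rx_bd)))

/* Index of the BD `lines` cache lines ahead of i, wrapping at bd_count. */
static int sim_prefetch_idx(int i, int lines, int bd_count)
{
	assert(bd_count > 0 && i >= 0 && lines >= 0);
	return (i + SIM_CACHE_LINE_RXBDS * lines) % bd_count;
}

/* Prefetch the BDs one and two cache lines ahead of i for reading. */
static void sim_prefetch_ahead(const struct sim_rx_bd *ring, int i,
			       int bd_count)
{
	__builtin_prefetch(&ring[sim_prefetch_idx(i, 1, bd_count)], 0, 3);
	__builtin_prefetch(&ring[sim_prefetch_idx(i, 2, bd_count)], 0, 3);
}
```

On a 16-BD ring, processing BD 14 prefetches BDs 2 and 6.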
drivers/net/enetc/enetc_rxtx.c | 38 +++++++++++++++++++++++++++++-----
1 file changed, 33 insertions(+), 5 deletions(-)
diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 1acc43a08..e57ecf2d4 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -14,6 +14,8 @@
#include "enetc.h"
#include "enetc_logs.h"
+#define ENETC_CACHE_LINE_RXBDS (RTE_CACHE_LINE_SIZE / \
+ sizeof(union enetc_rx_bd))
#define ENETC_RXBD_BUNDLE 16 /* Number of buffers to allocate at once */
static int
@@ -321,18 +323,37 @@ enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
int work_limit)
{
int rx_frm_cnt = 0;
- int cleaned_cnt, i;
+ int cleaned_cnt, i, bd_count;
struct enetc_swbd *rx_swbd;
+ union enetc_rx_bd *rxbd;
- cleaned_cnt = enetc_bd_unused(rx_ring);
/* next descriptor to process */
i = rx_ring->next_to_clean;
+ /* next descriptor to process */
+ rxbd = ENETC_RXBD(*rx_ring, i);
+ rte_prefetch0(rxbd);
+ bd_count = rx_ring->bd_count;
+ /* LS1028A does not have platform cache so any software access following
+ * a hardware write will go directly to DDR. Latency of such a read is
+ * in excess of 100 core cycles, so try to prefetch more in advance to
+ * mitigate this.
+ * How much is worth prefetching really depends on traffic conditions.
+ * With congested Rx this could go up to 4 cache lines or so. But if
+ * software keeps up with hardware and follows behind Rx PI by a cache
+ * line or less then it's harmful in terms of performance to cache more.
+ * We would only prefetch BDs that have yet to be written by ENETC,
+ * which will have to be evicted again anyway.
+ */
+ rte_prefetch0(ENETC_RXBD(*rx_ring,
+ (i + ENETC_CACHE_LINE_RXBDS) % bd_count));
+ rte_prefetch0(ENETC_RXBD(*rx_ring,
+ (i + ENETC_CACHE_LINE_RXBDS * 2) % bd_count));
+
+ cleaned_cnt = enetc_bd_unused(rx_ring);
rx_swbd = &rx_ring->q_swbd[i];
while (likely(rx_frm_cnt < work_limit)) {
- union enetc_rx_bd *rxbd;
uint32_t bd_status;
- rxbd = ENETC_RXBD(*rx_ring, i);
bd_status = rte_le_to_cpu_32(rxbd->r.lstatus);
if (!bd_status)
break;
@@ -353,11 +374,18 @@ enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
i = 0;
rx_swbd = &rx_ring->q_swbd[i];
}
+ rxbd = ENETC_RXBD(*rx_ring, i);
+ rte_prefetch0(ENETC_RXBD(*rx_ring,
+ (i + ENETC_CACHE_LINE_RXBDS) %
+ bd_count));
+ rte_prefetch0(ENETC_RXBD(*rx_ring,
+ (i + ENETC_CACHE_LINE_RXBDS * 2) %
+ bd_count));
- rx_ring->next_to_clean = i;
rx_frm_cnt++;
}
+ rx_ring->next_to_clean = i;
enetc_refill_rx_ring(rx_ring, cleaned_cnt);
return rx_frm_cnt;
--
2.17.1
* [dpdk-dev] [PATCH 10/10] net/enetc: init SI transactions attribute reg
From: Hemant Agrawal @ 2020-03-02 14:32 UTC
To: ferruh.yigit; +Cc: dev, g.singh, Alex Marginean
From: Alex Marginean <alexandru.marginean@nxp.com>
The SI transaction attribute register (SICAR0) was left at its default
value. With this patch, transactions are:
- coherent,
- do not allocate in downstream cache (there is none on LS1028A),
- merge surrounding data for BD writes,
- overwrite surrounding data for frame data writes.
Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
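The resulting initialization order can be sketched against a fake register file. The offsets and values are copied from the diffs in this series (patch 04 adds `ENETC_PM0_RX_FIFO`, this patch adds `ENETC_SICAR0`); `struct sim_hw`, `sim_si_wr()` and `sim_port_wr()` are hypothetical stand-ins for the mapped register spaces and `enetc_wr()`/`enetc_port_wr()`, and the port offset is used directly, without `ENETC_PORT_BASE`. The write order simply follows `enetc_hardware_init()` as patched; the series does not state that this order is required.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and values copied from the diffs in this series. */
#define ENETC_SIMR            0x0
#define ENETC_SIMR_EN         (1u << 31)
#define ENETC_SICAR0          0x40
#define ENETC_SICAR0_COHERENT 0x2B2B6727u
#define ENETC_PM0_RX_FIFO     0x801C

/* Hypothetical register file: SI and port spaces as plain arrays plus
 * a log of write order. Not a real MMIO mapping. */
struct sim_hw {
	uint32_t si[0x100 / 4];
	uint32_t port[0x10000 / 4];
	uint32_t log[8];   /* offsets written, in order */
	int nlog;
};

static void sim_si_wr(struct sim_hw *hw, uint32_t off, uint32_t val)
{
	assert(hw->nlog < 8);
	hw->si[off / 4] = val;
	hw->log[hw->nlog++] = off;
}

static void sim_port_wr(struct sim_hw *hw, uint32_t off, uint32_t val)
{
	assert(hw->nlog < 8);
	hw->port[off / 4] = val;
	hw->log[hw->nlog++] = off;
}

/* Order used in enetc_hardware_init() after patches 04 and 10. */
static void sim_hardware_init(struct sim_hw *hw)
{
	sim_port_wr(hw, ENETC_PM0_RX_FIFO, 1);              /* Rx lock-up WA */
	sim_si_wr(hw, ENETC_SICAR0, ENETC_SICAR0_COHERENT); /* txn attributes */
	sim_si_wr(hw, ENETC_SIMR, ENETC_SIMR_EN);           /* enable the SI */
}
```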
drivers/net/enetc/base/enetc_hw.h | 2 ++
drivers/net/enetc/enetc_ethdev.c | 6 ++++++
2 files changed, 8 insertions(+)
diff --git a/drivers/net/enetc/base/enetc_hw.h b/drivers/net/enetc/base/enetc_hw.h
index 00813284e..66fad58e5 100644
--- a/drivers/net/enetc/base/enetc_hw.h
+++ b/drivers/net/enetc/base/enetc_hw.h
@@ -22,6 +22,8 @@
#define ENETC_SIMR 0x0
#define ENETC_SIMR_EN BIT(31)
+#define ENETC_SICAR0 0x40
+#define ENETC_SICAR0_COHERENT 0x2B2B6727
#define ENETC_SIPMAR0 0x80
#define ENETC_SIPMAR1 0x84
diff --git a/drivers/net/enetc/enetc_ethdev.c b/drivers/net/enetc/enetc_ethdev.c
index eb637d030..1716e11dd 100644
--- a/drivers/net/enetc/enetc_ethdev.c
+++ b/drivers/net/enetc/enetc_ethdev.c
@@ -150,6 +150,12 @@ enetc_hardware_init(struct enetc_eth_hw *hw)
/* WA for Rx lock-up HW erratum */
enetc_port_wr(enetc_hw, ENETC_PM0_RX_FIFO, 1);
+ /* set ENETC transaction flags to coherent, don't allocate.
+ * BD writes merge with surrounding cache line data, frame data writes
+ * overwrite cache line.
+ */
+ enetc_wr(enetc_hw, ENETC_SICAR0, ENETC_SICAR0_COHERENT);
+
/* Enabling Station Interface */
enetc_wr(enetc_hw, ENETC_SIMR, ENETC_SIMR_EN);
--
2.17.1
* Re: [dpdk-dev] [PATCH 00/10] net/enetc: optimization and cleanup
From: Gagandeep Singh @ 2020-03-03 12:31 UTC
To: Hemant Agrawal, ferruh.yigit; +Cc: dev
> -----Original Message-----
> From: Hemant Agrawal <hemant.agrawal@nxp.com>
> Sent: Monday, March 2, 2020 8:02 PM
> To: ferruh.yigit@intel.com
> Cc: dev@dpdk.org; Gagandeep Singh <G.Singh@nxp.com>
> Subject: [PATCH 00/10] net/enetc: optimization and cleanup
>
> This patch series includes patches to optimize and clean
> the network driver for ENETC
>
> Alex Marginean (10):
> net/enetc: do not stall in clean Tx ring
> net/enetc: use relaxed read for Tx CI in clean Tx
> net/enetc: batch process enetc clean Tx ring calls
> net/enetc: erratum wa for Rx lock-up issue
> net/enetc: improve batching Rx ring refill
> net/enetc: cache align enetc bdr structure
> net/enetc: use bulk alloc in Rx refill ring
> net/enetc: use bulk free in Tx clean
> net/enetc: improve prefetch in Rx ring clean
> net/enetc: init SI transactions attribute reg
>
> drivers/net/enetc/Makefile | 1 +
> drivers/net/enetc/base/enetc_hw.h | 5 +-
> drivers/net/enetc/enetc.h | 10 +--
> drivers/net/enetc/enetc_ethdev.c | 11 ++-
> drivers/net/enetc/enetc_rxtx.c | 131 +++++++++++++++++++++++-------
> drivers/net/enetc/meson.build | 1 +
> 6 files changed, 123 insertions(+), 36 deletions(-)
>
Series-acked-by: Gagandeep Singh <g.singh@nxp.com>
> --
> 2.17.1
* Re: [dpdk-dev] [PATCH 00/10] net/enetc: optimization and cleanup
From: Ferruh Yigit @ 2020-03-03 14:02 UTC
To: Gagandeep Singh, Hemant Agrawal; +Cc: dev
On 3/3/2020 12:31 PM, Gagandeep Singh wrote:
> Series-acked-by: Gagandeep Singh <g.singh@nxp.com>
Series applied to dpdk-next-net/master, thanks.