This patch series includes patches to optimize and clean up the
network driver for ENETC.

Alex Marginean (10):
  net/enetc: do not stall in clean Tx ring
  net/enetc: use relaxed read for Tx CI in clean Tx
  net/enetc: batch process enetc clean Tx ring calls
  net/enetc: erratum wa for Rx lock-up issue
  net/enetc: improve batching Rx ring refill
  net/enetc: cache align enetc bdr structure
  net/enetc: use bulk alloc in Rx refill ring
  net/enetc: use bulk free in Tx clean
  net/enetc: improve prefetch in Rx ring clean
  net/enetc: init SI transactions attribute reg

 drivers/net/enetc/Makefile        |   1 +
 drivers/net/enetc/base/enetc_hw.h |   5 +-
 drivers/net/enetc/enetc.h         |  10 +--
 drivers/net/enetc/enetc_ethdev.c  |  11 ++-
 drivers/net/enetc/enetc_rxtx.c    | 131 +++++++++++++++++++++++-------
 drivers/net/enetc/meson.build     |   1 +
 6 files changed, 123 insertions(+), 36 deletions(-)

--
2.17.1
From: Alex Marginean <alexandru.marginean@nxp.com>

Don't read the hardware CI register in a loop; read it once, clean up
and exit. The problem with reading the register in a loop is that we
stall trying to catch up with hardware, which keeps sending traffic as
long as it has traffic to send. In effect we could be waiting for the
Tx ring to be drained by hardware instead of doing Rx in the meantime.
By the time the function returns there may be new BDs in the ring that
could be cleaned; those are left for the next call.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
 drivers/net/enetc/enetc_rxtx.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 81b0ef3b1..b7ecb75ec 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018-2019 NXP
+ * Copyright 2018-2020 NXP
  */

 #include <stdbool.h>
@@ -21,12 +21,24 @@ enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
 {
 	int tx_frm_cnt = 0;
 	struct enetc_swbd *tx_swbd;
-	int i;
+	int i, hwci;

 	i = tx_ring->next_to_clean;
 	tx_swbd = &tx_ring->q_swbd[i];
-	while ((int)(enetc_rd_reg(tx_ring->tcisr) &
-	       ENETC_TBCISR_IDX_MASK) != i) {
+
+	hwci = (int)(enetc_rd_reg(tx_ring->tcisr) &
+		     ENETC_TBCISR_IDX_MASK);
+
+	/* we're only reading the CI index once here, which means HW may update
+	 * it while we're doing clean-up. We could read the register in a loop
+	 * but for now I assume it's OK to leave a few Tx frames for next call.
+	 * The issue with reading the register in a loop is that we're stalling
+	 * here trying to catch up with HW which keeps sending traffic as long
+	 * as it has traffic to send, so in effect we could be waiting here for
+	 * the Tx ring to be drained by HW, instead of us doing Rx in that
+	 * meantime.
+	 */
+	while (i != hwci) {
 		rte_pktmbuf_free(tx_swbd->buffer_addr);
 		tx_swbd->buffer_addr = NULL;
 		tx_swbd++;
--
2.17.1
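The clean loop introduced by this patch can be sketched in isolation: walk the ring from the software's next-to-clean index up to the consumer index read once from hardware, wrapping at the ring size. This is a hedged, DPDK-free model with hypothetical names (`clean_to_hwci` does not exist in the driver); it only illustrates the wraparound and termination behavior.

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's Tx clean loop: walk the ring
 * from `next_to_clean` up to a consumer index `hwci` (read once from
 * hardware in the real code), wrapping at `bd_count`. Returns the new
 * next_to_clean; `freed`, if non-NULL, receives the entry count.
 */
static int clean_to_hwci(int next_to_clean, int hwci, int bd_count, int *freed)
{
	int i = next_to_clean;
	int cnt = 0;

	while (i != hwci) {
		/* real driver: rte_pktmbuf_free(tx_swbd->buffer_addr); */
		cnt++;
		i++;
		if (i == bd_count) /* wrap around the end of the ring */
			i = 0;
	}
	if (freed)
		*freed = cnt;
	return i;
}
```

Because `hwci` is sampled once, entries completed by hardware after the sample simply remain for the next call, exactly as the commit message describes.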
From: Alex Marginean <alexandru.marginean@nxp.com>

We don't need barriers here since this read doesn't have to be strictly
serialized in relation to other surrounding memory/register accesses.
We only want a reasonably recent value out of hardware so we know how
much we can clean.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
 drivers/net/enetc/enetc_rxtx.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index b7ecb75ec..395f5ecf4 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -23,12 +23,15 @@ enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
 	struct enetc_swbd *tx_swbd;
 	int i, hwci;

+	/* we don't need barriers here, we just want a relatively current value
+	 * from HW.
+	 */
+	hwci = (int)(rte_read32_relaxed(tx_ring->tcisr) &
+		     ENETC_TBCISR_IDX_MASK);
+
 	i = tx_ring->next_to_clean;
 	tx_swbd = &tx_ring->q_swbd[i];

-	hwci = (int)(enetc_rd_reg(tx_ring->tcisr) &
-		     ENETC_TBCISR_IDX_MASK);
-
 	/* we're only reading the CI index once here, which means HW may update
 	 * it while we're doing clean-up. We could read the register in a loop
 	 * but for now I assume it's OK to leave a few Tx frames for next call.
--
2.17.1
From: Alex Marginean <alexandru.marginean@nxp.com>

Each call to enetc_clean_tx_ring costs at least 150-200 CPU cycles
even if no clean-up is done, due to the CI register read. We're only
calling it once at the end of the function, on the assumption that
software is slower than hardware and hardware has completed sending
the older frames out by now. We're also cleaning up the ring before
kicking off Tx for the new batch, to minimize chances of contention
on the Tx ring.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
 drivers/net/enetc/enetc_rxtx.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 395f5ecf4..958e3a21d 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -76,7 +76,6 @@ enetc_xmit_pkts(void *tx_queue,

 	start = 0;
 	while (nb_pkts--) {
-		enetc_clean_tx_ring(tx_ring);
 		tx_ring->q_swbd[i].buffer_addr = tx_pkts[start];
 		txbd = ENETC_TXBD(*tx_ring, i);
 		tx_swbd = &tx_ring->q_swbd[i];
@@ -92,6 +91,14 @@ enetc_xmit_pkts(void *tx_queue,
 			i = 0;
 	}

+	/* we're only cleaning up the Tx ring here, on the assumption that
+	 * software is slower than hardware and hardware completed sending
+	 * older frames out by now.
+	 * We're also cleaning up the ring before kicking off Tx for the new
+	 * batch to minimize chances of contention on the Tx ring
+	 */
+	enetc_clean_tx_ring(tx_ring);
+
 	tx_ring->next_to_use = i;
 	enetc_wr_reg(tx_ring->tcir, i);
 	return start;
--
2.17.1
From: Alex Marginean <alexandru.marginean@nxp.com>

The default value in hardware for the Rx MAC FIFO register is higher
than it should be and can lead to Rx lock-up under traffic. Set it to
1, the value recommended by the hardware team.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
Signed-off-by: Gagandeep Singh <g.singh@nxp.com>
---
 drivers/net/enetc/base/enetc_hw.h | 3 ++-
 drivers/net/enetc/enetc_ethdev.c  | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/enetc/base/enetc_hw.h b/drivers/net/enetc/base/enetc_hw.h
index 2fe7ccb5b..00813284e 100644
--- a/drivers/net/enetc/base/enetc_hw.h
+++ b/drivers/net/enetc/base/enetc_hw.h
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018-2019 NXP
+ * Copyright 2018-2020 NXP
  */

 #ifndef _ENETC_HW_H_
@@ -86,6 +86,7 @@ enum enetc_bdr_type {TX, RX};
 #define ENETC_PSIPMAR1(n)	(0x00104 + (n) * 0x20)
 #define ENETC_PCAPR0		0x00900
 #define ENETC_PCAPR1		0x00904
+#define ENETC_PM0_RX_FIFO	0x801C
 #define ENETC_PM0_IF_MODE	0x8300
 #define ENETC_PM1_IF_MODE	0x9300
 #define ENETC_PMO_IFM_RG	BIT(2)
diff --git a/drivers/net/enetc/enetc_ethdev.c b/drivers/net/enetc/enetc_ethdev.c
index 20b77c006..eb637d030 100644
--- a/drivers/net/enetc/enetc_ethdev.c
+++ b/drivers/net/enetc/enetc_ethdev.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright 2018-2019 NXP
+ * Copyright 2018-2020 NXP
  */

 #include <stdbool.h>
@@ -147,6 +147,9 @@ enetc_hardware_init(struct enetc_eth_hw *hw)
 	hw->hw.port = (void *)((size_t)hw->hw.reg + ENETC_PORT_BASE);
 	hw->hw.global = (void *)((size_t)hw->hw.reg + ENETC_GLOBAL_BASE);

+	/* WA for Rx lock-up HW erratum */
+	enetc_port_wr(enetc_hw, ENETC_PM0_RX_FIFO, 1);
+
 	/* Enabling Station Interface */
 	enetc_wr(enetc_hw, ENETC_SIMR, ENETC_SIMR_EN);
--
2.17.1
From: Alex Marginean <alexandru.marginean@nxp.com>

Move from doing batch refill of the Rx ring in bundles of 8 to doing
it once per enetc_clean_rx_ring call. One benefit is that we clean up
all the BDs we just processed, which should still be cached. The other
is that the hardware Rx index stays a little behind and doesn't cause
contention on the BDs processed in the Rx loop.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
 drivers/net/enetc/enetc_rxtx.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 958e3a21d..262ed8a0f 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -14,8 +14,6 @@
 #include "enetc.h"
 #include "enetc_logs.h"

-#define ENETC_RXBD_BUNDLE 8 /* Number of BDs to update at once */
-
 static int
 enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
 {
@@ -305,12 +303,6 @@ enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
 		union enetc_rx_bd *rxbd;
 		uint32_t bd_status;

-		if (cleaned_cnt >= ENETC_RXBD_BUNDLE) {
-			int count = enetc_refill_rx_ring(rx_ring, cleaned_cnt);
-
-			cleaned_cnt -= count;
-		}
-
 		rxbd = ENETC_RXBD(*rx_ring, i);
 		bd_status = rte_le_to_cpu_32(rxbd->r.lstatus);
 		if (!bd_status)
@@ -337,6 +329,8 @@ enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
 		rx_frm_cnt++;
 	}

+	enetc_refill_rx_ring(rx_ring, cleaned_cnt);
+
 	return rx_frm_cnt;
 }
--
2.17.1
From: Alex Marginean <alexandru.marginean@nxp.com>

Reorder the members of the structure so that the ones used on the
datapath fit in a single cache line, to slightly reduce pressure on
cache and miss rate.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
 drivers/net/enetc/enetc.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/enetc/enetc.h b/drivers/net/enetc/enetc.h
index 8c830a5c0..14ef3bc18 100644
--- a/drivers/net/enetc/enetc.h
+++ b/drivers/net/enetc/enetc.h
@@ -53,23 +53,23 @@ struct enetc_swbd {
 };

 struct enetc_bdr {
-	struct rte_eth_dev *ndev;
-	struct rte_mempool *mb_pool;	/* mbuf pool to populate RX ring. */
 	void *bd_base;			/* points to Rx or Tx BD ring */
+	struct enetc_swbd *q_swbd;
 	union {
 		void *tcir;
 		void *rcir;
 	};
-	uint16_t index;
 	int bd_count; /* # of BDs */
 	int next_to_use;
 	int next_to_clean;
-	struct enetc_swbd *q_swbd;
+	uint16_t index;
+	uint8_t crc_len; /* 0 if CRC stripped, 4 otherwise */
 	union {
 		void *tcisr; /* Tx */
 		int next_to_alloc; /* Rx */
 	};
-	uint8_t crc_len; /* 0 if CRC stripped, 4 otherwise */
+	struct rte_mempool *mb_pool;	/* mbuf pool to populate RX ring. */
+	struct rte_eth_dev *ndev;
 };
--
2.17.1
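The effect of the reordering above can be checked with `offsetof`: the offset of the first cold member equals the size of the hot region, which should not exceed one cache line. The sketch below uses a reduced, hypothetical model of the layout (DPDK types replaced with plain pointers, names changed), so the exact offsets are an assumption about a typical 64-bit ABI, not a statement about the real `struct enetc_bdr`.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced model of the reordered ring descriptor structure: datapath
 * ("hot") members grouped at the front, setup-only ("cold") members at
 * the back, mirroring the patch's intent.
 */
struct bdr_model {
	/* hot: touched on every Rx/Tx burst */
	void *bd_base;
	void *q_swbd;
	void *cir;
	int bd_count;
	int next_to_use;
	int next_to_clean;
	uint16_t index;
	uint8_t crc_len;
	void *cisr;
	/* cold: touched only at setup/teardown */
	void *mb_pool;
	void *ndev;
};

/* offset of the first cold member == size of the hot region */
static const size_t hot_region_end = offsetof(struct bdr_model, mb_pool);
```

On an LP64 target the hot region ends at byte 48, comfortably inside a 64-byte cache line; with the original ordering the pointers `ndev` and `mb_pool` sat in front and pushed hot members onto a second line.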
From: Alex Marginean <alexandru.marginean@nxp.com>

Since we know in advance that we're going to fill in multiple
descriptors, it's convenient to allocate the buffers in batches.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
 drivers/net/enetc/enetc_rxtx.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 262ed8a0f..8b85c5371 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -14,6 +14,8 @@
 #include "enetc.h"
 #include "enetc_logs.h"

+#define ENETC_RXBD_BUNDLE 16 /* Number of buffers to allocate at once */
+
 static int
 enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
 {
@@ -107,15 +109,25 @@ enetc_refill_rx_ring(struct enetc_bdr *rx_ring, const int buff_cnt)
 {
 	struct enetc_swbd *rx_swbd;
 	union enetc_rx_bd *rxbd;
-	int i, j;
+	int i, j, k = ENETC_RXBD_BUNDLE;
+	struct rte_mbuf *m[ENETC_RXBD_BUNDLE];
+	struct rte_mempool *mb_pool;

 	i = rx_ring->next_to_use;
+	mb_pool = rx_ring->mb_pool;
 	rx_swbd = &rx_ring->q_swbd[i];
 	rxbd = ENETC_RXBD(*rx_ring, i);
 	for (j = 0; j < buff_cnt; j++) {
-		rx_swbd->buffer_addr = (void *)(uintptr_t)
-			rte_cpu_to_le_64((uint64_t)(uintptr_t)
-					 rte_pktmbuf_alloc(rx_ring->mb_pool));
+		/* bulk alloc for the next up to 8 BDs */
+		if (k == ENETC_RXBD_BUNDLE) {
+			k = 0;
+			int m_cnt = RTE_MIN(buff_cnt - j, ENETC_RXBD_BUNDLE);
+
+			if (rte_pktmbuf_alloc_bulk(mb_pool, m, m_cnt))
+				return -1;
+		}
+
+		rx_swbd->buffer_addr = m[k];
 		rxbd->w.addr = (uint64_t)(uintptr_t)
 			       rx_swbd->buffer_addr->buf_iova +
 			       rx_swbd->buffer_addr->data_off;
@@ -124,6 +136,7 @@ enetc_refill_rx_ring(struct enetc_bdr *rx_ring, const int buff_cnt)
 		rx_swbd++;
 		rxbd++;
 		i++;
+		k++;
 		if (unlikely(i == rx_ring->bd_count)) {
 			i = 0;
 			rxbd = ENETC_RXBD(*rx_ring, 0);
--
2.17.1
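The bundle logic in the refill loop above is easy to get wrong at the tail (when fewer than a full bundle of BDs remain), so here is a DPDK-free sketch of just that control flow. All names are hypothetical; `stub_alloc_bulk` stands in for `rte_pktmbuf_alloc_bulk` and only counts how many bulk allocations the loop performs.

```c
#include <stddef.h>

#define BUNDLE 16 /* hypothetical stand-in for ENETC_RXBD_BUNDLE */

static int bulk_calls; /* number of bulk allocations the loop made */

/* stub for rte_pktmbuf_alloc_bulk: pretend allocation always succeeds */
static int stub_alloc_bulk(void **m, int n)
{
	(void)m; (void)n;
	bulk_calls++;
	return 0;
}

/* Sketch of the batched refill: instead of one allocator call per BD,
 * grab up to BUNDLE buffers at a time and hand them out one per
 * descriptor. Returns the number of BDs refilled, or -1 on failure.
 */
static int refill_sketch(int buff_cnt)
{
	void *m[BUNDLE];
	int j, k = BUNDLE; /* k == BUNDLE forces an alloc on first pass */

	for (j = 0; j < buff_cnt; j++) {
		if (k == BUNDLE) { /* bundle exhausted, fetch the next one */
			int m_cnt = buff_cnt - j < BUNDLE ?
				    buff_cnt - j : BUNDLE;

			k = 0;
			if (stub_alloc_bulk(m, m_cnt))
				return -1;
		}
		/* real driver: rx_swbd->buffer_addr = m[k]; */
		k++;
	}
	return j;
}
```

The `m_cnt` clamp is the important detail: the final bundle requests only the BDs actually left, so no buffers are allocated and then leaked.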
From: Alex Marginean <alexandru.marginean@nxp.com>

Use rte_pktmbuf_free_bulk to release all mbufs at once. This API is
flagged as experimental (not yet stable) in DPDK but seems to be
functional. Don't count the released frames; the count is no longer
needed in the caller.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
 drivers/net/enetc/Makefile     |  1 +
 drivers/net/enetc/enetc_rxtx.c | 32 ++++++++++++++++++++++++--------
 drivers/net/enetc/meson.build  |  1 +
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/enetc/Makefile b/drivers/net/enetc/Makefile
index 7276026e3..7f7a85f64 100644
--- a/drivers/net/enetc/Makefile
+++ b/drivers/net/enetc/Makefile
@@ -11,6 +11,7 @@ LIB = librte_pmd_enetc.a
 CFLAGS += -O3
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -I$(RTE_SDK)/drivers/common/dpaax
+CFLAGS += -DALLOW_EXPERIMENTAL_API

 EXPORT_MAP := rte_pmd_enetc_version.map

 SRCS-$(CONFIG_RTE_LIBRTE_ENETC_PMD) += enetc_ethdev.c
 SRCS-$(CONFIG_RTE_LIBRTE_ENETC_PMD) += enetc_rxtx.c
diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 8b85c5371..1acc43a08 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -20,8 +20,9 @@ static int
 enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
 {
 	int tx_frm_cnt = 0;
-	struct enetc_swbd *tx_swbd;
-	int i, hwci;
+	struct enetc_swbd *tx_swbd, *tx_swbd_base;
+	int i, hwci, bd_count;
+	struct rte_mbuf *m[ENETC_RXBD_BUNDLE];

 	/* we don't need barriers here, we just want a relatively current value
 	 * from HW.
@@ -29,8 +30,10 @@ enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
 	hwci = (int)(rte_read32_relaxed(tx_ring->tcisr) &
 		     ENETC_TBCISR_IDX_MASK);

+	tx_swbd_base = tx_ring->q_swbd;
+	bd_count = tx_ring->bd_count;
 	i = tx_ring->next_to_clean;
-	tx_swbd = &tx_ring->q_swbd[i];
+	tx_swbd = &tx_swbd_base[i];

 	/* we're only reading the CI index once here, which means HW may update
 	 * it while we're doing clean-up. We could read the register in a loop
@@ -42,20 +45,33 @@ enetc_clean_tx_ring(struct enetc_bdr *tx_ring)
 	 * meantime.
 	 */
 	while (i != hwci) {
-		rte_pktmbuf_free(tx_swbd->buffer_addr);
+		/* It seems calling rte_pktmbuf_free is wasting a lot of cycles,
+		 * make a list and call _free when it's done.
+		 */
+		if (tx_frm_cnt == ENETC_RXBD_BUNDLE) {
+			rte_pktmbuf_free_bulk(m, tx_frm_cnt);
+			tx_frm_cnt = 0;
+		}
+
+		m[tx_frm_cnt] = tx_swbd->buffer_addr;
 		tx_swbd->buffer_addr = NULL;
-		tx_swbd++;
+
 		i++;
-		if (unlikely(i == tx_ring->bd_count)) {
+		tx_swbd++;
+		if (unlikely(i == bd_count)) {
 			i = 0;
-			tx_swbd = &tx_ring->q_swbd[0];
+			tx_swbd = tx_swbd_base;
 		}

 		tx_frm_cnt++;
 	}

+	if (tx_frm_cnt)
+		rte_pktmbuf_free_bulk(m, tx_frm_cnt);
+
 	tx_ring->next_to_clean = i;
-	return tx_frm_cnt++;
+
+	return 0;
 }

 uint16_t
diff --git a/drivers/net/enetc/meson.build b/drivers/net/enetc/meson.build
index bea54bea8..af11c0960 100644
--- a/drivers/net/enetc/meson.build
+++ b/drivers/net/enetc/meson.build
@@ -11,3 +11,4 @@ sources = files('enetc_ethdev.c',
 		'enetc_rxtx.c')

 includes += include_directories('base')
+allow_experimental_apis = true
--
2.17.1
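The deferred-free pattern this patch adds to the Tx clean loop — collect buffers into a small array, release them with one bulk call when the array fills, then flush the remainder — can be modeled without DPDK. Names here are hypothetical; `stub_free_bulk` stands in for `rte_pktmbuf_free_bulk` and only records call and mbuf counts.

```c
#include <stddef.h>

#define BUNDLE 16 /* hypothetical stand-in for ENETC_RXBD_BUNDLE */

static int free_calls;  /* number of bulk-free calls made */
static int freed_total; /* total buffers released */

/* stub for rte_pktmbuf_free_bulk */
static void stub_free_bulk(void **m, unsigned int n)
{
	(void)m;
	free_calls++;
	freed_total += (int)n;
}

/* Sketch of the deferred-free Tx clean: batch buffers into `m` and
 * release a full bundle at a time, with a final flush for the tail.
 */
static void clean_sketch(int n_entries)
{
	void *m[BUNDLE];
	int cnt = 0, i;

	for (i = 0; i < n_entries; i++) {
		if (cnt == BUNDLE) { /* bundle full, release it */
			stub_free_bulk(m, (unsigned int)cnt);
			cnt = 0;
		}
		m[cnt] = NULL; /* real driver: tx_swbd->buffer_addr */
		cnt++;
	}
	if (cnt) /* flush the partial bundle */
		stub_free_bulk(m, (unsigned int)cnt);
}
```

The trailing flush is what keeps the batching correct when the number of completed frames is not a multiple of the bundle size.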
From: Alex Marginean <alexandru.marginean@nxp.com>

LS1028A does not have a platform cache, so any read following a
hardware write goes directly to DDR. The latency of such a read is in
excess of 100 core cycles, so try to prefetch further in advance to
mitigate this. How much is worth prefetching really depends on traffic
conditions. With congested Rx this could go up to 4 cache lines or so.
But if software keeps up with hardware and follows behind the Rx PI by
a cache line, then caching more is harmful to performance: we would
only prefetch data that's yet to be written by ENETC, which will be
evicted again anyway.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
 drivers/net/enetc/enetc_rxtx.c | 38 +++++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/drivers/net/enetc/enetc_rxtx.c b/drivers/net/enetc/enetc_rxtx.c
index 1acc43a08..e57ecf2d4 100644
--- a/drivers/net/enetc/enetc_rxtx.c
+++ b/drivers/net/enetc/enetc_rxtx.c
@@ -14,6 +14,8 @@
 #include "enetc.h"
 #include "enetc_logs.h"

+#define ENETC_CACHE_LINE_RXBDS	(RTE_CACHE_LINE_SIZE / \
+				 sizeof(union enetc_rx_bd))
 #define ENETC_RXBD_BUNDLE 16 /* Number of buffers to allocate at once */

 static int
@@ -321,18 +323,37 @@ enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
 		    int work_limit)
 {
 	int rx_frm_cnt = 0;
-	int cleaned_cnt, i;
+	int cleaned_cnt, i, bd_count;
 	struct enetc_swbd *rx_swbd;
+	union enetc_rx_bd *rxbd;

-	cleaned_cnt = enetc_bd_unused(rx_ring);
 	/* next descriptor to process */
 	i = rx_ring->next_to_clean;
+	/* next descriptor to process */
+	rxbd = ENETC_RXBD(*rx_ring, i);
+	rte_prefetch0(rxbd);
+	bd_count = rx_ring->bd_count;
+	/* LS1028A does not have platform cache so any software access following
+	 * a hardware write will go directly to DDR. Latency of such a read is
+	 * in excess of 100 core cycles, so try to prefetch more in advance to
+	 * mitigate this.
+	 * How much is worth prefetching really depends on traffic conditions.
+	 * With congested Rx this could go up to 4 cache lines or so. But if
+	 * software keeps up with hardware and follows behind Rx PI by a cache
+	 * line or less then it's harmful in terms of performance to cache more.
+	 * We would only prefetch BDs that have yet to be written by ENETC,
+	 * which will have to be evicted again anyway.
+	 */
+	rte_prefetch0(ENETC_RXBD(*rx_ring,
+				 (i + ENETC_CACHE_LINE_RXBDS) % bd_count));
+	rte_prefetch0(ENETC_RXBD(*rx_ring,
+				 (i + ENETC_CACHE_LINE_RXBDS * 2) % bd_count));
+
+	cleaned_cnt = enetc_bd_unused(rx_ring);
 	rx_swbd = &rx_ring->q_swbd[i];
 	while (likely(rx_frm_cnt < work_limit)) {
-		union enetc_rx_bd *rxbd;
 		uint32_t bd_status;

-		rxbd = ENETC_RXBD(*rx_ring, i);
 		bd_status = rte_le_to_cpu_32(rxbd->r.lstatus);
 		if (!bd_status)
 			break;
@@ -353,11 +374,18 @@ enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
 			i = 0;
 			rx_swbd = &rx_ring->q_swbd[i];
 		}
+		rxbd = ENETC_RXBD(*rx_ring, i);
+		rte_prefetch0(ENETC_RXBD(*rx_ring,
+					 (i + ENETC_CACHE_LINE_RXBDS) %
+					 bd_count));
+		rte_prefetch0(ENETC_RXBD(*rx_ring,
+					 (i + ENETC_CACHE_LINE_RXBDS * 2) %
+					 bd_count));

-		rx_ring->next_to_clean = i;
 		rx_frm_cnt++;
 	}

+	rx_ring->next_to_clean = i;
 	enetc_refill_rx_ring(rx_ring, cleaned_cnt);

 	return rx_frm_cnt;
--
2.17.1
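The prefetch-distance arithmetic above reduces to: one cache line holds `RTE_CACHE_LINE_SIZE / sizeof(union enetc_rx_bd)` BDs, and the loop prefetches the lines one and two cache lines ahead of the current index, wrapping with a modulo at the end of the ring. A minimal model of that index math, assuming 64-byte cache lines and 16-byte Rx BDs (the BD size is an assumption here, not taken from the source):

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64
#define RX_BD_SIZE      16 /* assumed sizeof(union enetc_rx_bd) */
#define LINE_RXBDS      (CACHE_LINE_SIZE / RX_BD_SIZE) /* BDs per line */

/* index of the BD `lines_ahead` cache lines ahead of i, with ring wrap */
static int prefetch_idx(int i, int lines_ahead, int bd_count)
{
	return (i + LINE_RXBDS * lines_ahead) % bd_count;
}
```

The modulo keeps the prefetch target inside the ring near the wrap point, so the two lines ahead of index 510 in a 512-entry ring correctly land back at the start of the ring.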
From: Alex Marginean <alexandru.marginean@nxp.com>

This register was left at its default value. With the patch,
transactions are:
 - coherent,
 - do not allocate in the downstream cache (there is none on LS1028A),
 - merge surrounding data for BD writes,
 - overwrite surrounding data for frame data writes.

Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
---
 drivers/net/enetc/base/enetc_hw.h | 2 ++
 drivers/net/enetc/enetc_ethdev.c  | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/enetc/base/enetc_hw.h b/drivers/net/enetc/base/enetc_hw.h
index 00813284e..66fad58e5 100644
--- a/drivers/net/enetc/base/enetc_hw.h
+++ b/drivers/net/enetc/base/enetc_hw.h
@@ -22,6 +22,8 @@

 #define ENETC_SIMR		0x0
 #define ENETC_SIMR_EN		BIT(31)
+#define ENETC_SICAR0		0x40
+#define ENETC_SICAR0_COHERENT	0x2B2B6727
 #define ENETC_SIPMAR0		0x80
 #define ENETC_SIPMAR1		0x84

diff --git a/drivers/net/enetc/enetc_ethdev.c b/drivers/net/enetc/enetc_ethdev.c
index eb637d030..1716e11dd 100644
--- a/drivers/net/enetc/enetc_ethdev.c
+++ b/drivers/net/enetc/enetc_ethdev.c
@@ -150,6 +150,12 @@ enetc_hardware_init(struct enetc_eth_hw *hw)
 	/* WA for Rx lock-up HW erratum */
 	enetc_port_wr(enetc_hw, ENETC_PM0_RX_FIFO, 1);

+	/* set ENETC transaction flags to coherent, don't allocate.
+	 * BD writes merge with surrounding cache line data, frame data writes
+	 * overwrite cache line.
+	 */
+	enetc_wr(enetc_hw, ENETC_SICAR0, ENETC_SICAR0_COHERENT);
+
 	/* Enabling Station Interface */
 	enetc_wr(enetc_hw, ENETC_SIMR, ENETC_SIMR_EN);
--
2.17.1
On 3/3/2020 12:31 PM, Gagandeep Singh wrote:
>
>
>> -----Original Message-----
>> From: Hemant Agrawal <hemant.agrawal@nxp.com>
>> Sent: Monday, March 2, 2020 8:02 PM
>> To: ferruh.yigit@intel.com
>> Cc: dev@dpdk.org; Gagandeep Singh <G.Singh@nxp.com>
>> Subject: [PATCH 00/10] net/enetc: optimization and cleanup
>>
>> This patch series includes patches to optimize and clean
>> the network driver for ENETC
>>
>> Alex Marginean (10):
>> net/enetc: do not stall in clean Tx ring
>> net/enetc: use relaxed read for Tx CI in clean Tx
>> net/enetc: batch process enetc clean Tx ring calls
>> net/enetc: erratum wa for Rx lock-up issue
>> net/enetc: improve batching Rx ring refill
>> net/enetc: cache align enetc bdr structure
>> net/enetc: use bulk alloc in Rx refill ring
>> net/enetc: use bulk free in Tx clean
>> net/enetc: improve prefetch in Rx ring clean
>> net/enetc: init SI transactions attribute reg
>>
>> drivers/net/enetc/Makefile | 1 +
>> drivers/net/enetc/base/enetc_hw.h | 5 +-
>> drivers/net/enetc/enetc.h | 10 +--
>> drivers/net/enetc/enetc_ethdev.c | 11 ++-
>> drivers/net/enetc/enetc_rxtx.c | 131 +++++++++++++++++++++++-------
>> drivers/net/enetc/meson.build | 1 +
>> 6 files changed, 123 insertions(+), 36 deletions(-)
>>
>
> Series-acked-by: Gagandeep Singh <g.singh@nxp.com>
>
Series applied to dpdk-next-net/master, thanks.