* [dpdk-dev] [PATCH v2 1/7] net/ena: fix verification of the offload capabilities
2021-10-15 16:26 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Michal Krawczyk
@ 2021-10-15 16:26 ` Michal Krawczyk
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 2/7] net/ena: support Tx/Rx free thresholds Michal Krawczyk
` (7 subsequent siblings)
8 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-15 16:26 UTC (permalink / raw)
To: ferruh.yigit
Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk, stable
ENA PMD has multiple checksum offload flags, which are more fine-grained
than the DPDK offload capabilities flags.
As the driver wasn't storing its internal checksum offload capabilities
and was relying only on the DPDK capabilities, not all scenarios could
be properly covered (like when to prepare a pseudo header checksum and
when not to).
Moreover, the user could request an offload capability which isn't
supported by the HW, and the PMD would silently ignore the issue.
This commit reworks the eth_ena_prep_pkts() function to perform additional
checks and to properly reflect the HW requirements. With
RTE_LIBRTE_ETHDEV_DEBUG enabled, the function performs even more
verifications, to help the user find any issues with the mbuf
configuration.
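For illustration, consider the mbuf setup which the reworked function
validates. A minimal sketch (not part of the patch; the helper name is made
up, while the flag and field names are the standard DPDK mbuf API used in
the diff below):

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>

/* Request L3 and L4 Tx checksum offload for a TCP/IPv4 mbuf. */
static void
request_tx_csum_offload(struct rte_mbuf *m)
{
	/* l2_len and l3_len must be non-zero whenever a checksum offload
	 * is requested; with RTE_LIBRTE_ETHDEV_DEBUG the reworked
	 * eth_ena_prep_pkts() rejects mbufs where they are left at 0.
	 */
	m->l2_len = sizeof(struct rte_ether_hdr);
	m->l3_len = sizeof(struct rte_ipv4_hdr);
	m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
}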
Fixes: b3fc5a1ae10d ("net/ena: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
drivers/net/ena/ena_ethdev.c | 235 +++++++++++++++++++++++++++--------
drivers/net/ena/ena_ethdev.h | 6 +-
2 files changed, 184 insertions(+), 57 deletions(-)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index a82d4b6287..227831a98c 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -140,6 +140,23 @@ static const struct ena_stats ena_stats_rx_strings[] = {
#define ENA_TX_OFFLOAD_NOTSUP_MASK \
(PKT_TX_OFFLOAD_MASK ^ ENA_TX_OFFLOAD_MASK)
+/** HW specific offloads capabilities. */
+/* IPv4 checksum offload. */
+#define ENA_L3_IPV4_CSUM 0x0001
+/* TCP/UDP checksum offload for IPv4 packets. */
+#define ENA_L4_IPV4_CSUM 0x0002
+/* TCP/UDP checksum offload for IPv4 packets with pseudo header checksum. */
+#define ENA_L4_IPV4_CSUM_PARTIAL 0x0004
+/* TCP/UDP checksum offload for IPv6 packets. */
+#define ENA_L4_IPV6_CSUM 0x0008
+/* TCP/UDP checksum offload for IPv6 packets with pseudo header checksum. */
+#define ENA_L4_IPV6_CSUM_PARTIAL 0x0010
+/* TSO support for IPv4 packets. */
+#define ENA_IPV4_TSO 0x0020
+
+/* Device supports setting RSS hash. */
+#define ENA_RX_RSS_HASH 0x0040
+
static const struct rte_pci_id pci_id_ena_map[] = {
{ RTE_PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_ENA_VF) },
{ RTE_PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_ENA_VF_RSERV0) },
@@ -1624,6 +1641,50 @@ static uint32_t ena_calc_max_io_queue_num(struct ena_com_dev *ena_dev,
return max_num_io_queues;
}
+static void
+ena_set_offloads(struct ena_offloads *offloads,
+ struct ena_admin_feature_offload_desc *offload_desc)
+{
+ if (offload_desc->tx & ENA_ADMIN_FEATURE_OFFLOAD_DESC_TSO_IPV4_MASK)
+ offloads->tx_offloads |= ENA_IPV4_TSO;
+
+ /* Tx IPv4 checksum offloads */
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L3_CSUM_IPV4_MASK)
+ offloads->tx_offloads |= ENA_L3_IPV4_CSUM;
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_FULL_MASK)
+ offloads->tx_offloads |= ENA_L4_IPV4_CSUM;
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_PART_MASK)
+ offloads->tx_offloads |= ENA_L4_IPV4_CSUM_PARTIAL;
+
+ /* Tx IPv6 checksum offloads */
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV6_CSUM_FULL_MASK)
+ offloads->tx_offloads |= ENA_L4_IPV6_CSUM;
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV6_CSUM_PART_MASK)
+ offloads->tx_offloads |= ENA_L4_IPV6_CSUM_PARTIAL;
+
+ /* Rx IPv4 checksum offloads */
+ if (offload_desc->rx_supported &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L3_CSUM_IPV4_MASK)
+ offloads->rx_offloads |= ENA_L3_IPV4_CSUM;
+ if (offload_desc->rx_supported &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L4_IPV4_CSUM_MASK)
+ offloads->rx_offloads |= ENA_L4_IPV4_CSUM;
+
+ /* Rx IPv6 checksum offloads */
+ if (offload_desc->rx_supported &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L4_IPV6_CSUM_MASK)
+ offloads->rx_offloads |= ENA_L4_IPV6_CSUM;
+
+ if (offload_desc->rx_supported &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_HASH_MASK)
+ offloads->rx_offloads |= ENA_RX_RSS_HASH;
+}
+
static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
{
struct ena_calc_queue_size_ctx calc_queue_ctx = { 0 };
@@ -1745,17 +1806,7 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
/* Set max MTU for this device */
adapter->max_mtu = get_feat_ctx.dev_attr.max_mtu;
- /* set device support for offloads */
- adapter->offloads.tso4_supported = (get_feat_ctx.offload.tx &
- ENA_ADMIN_FEATURE_OFFLOAD_DESC_TSO_IPV4_MASK) != 0;
- adapter->offloads.tx_csum_supported = (get_feat_ctx.offload.tx &
- ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_PART_MASK) != 0;
- adapter->offloads.rx_csum_supported =
- (get_feat_ctx.offload.rx_supported &
- ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L4_IPV4_CSUM_MASK) != 0;
- adapter->offloads.rss_hash_supported =
- (get_feat_ctx.offload.rx_supported &
- ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_HASH_MASK) != 0;
+ ena_set_offloads(&adapter->offloads, &get_feat_ctx.offload);
/* Copy MAC address and point DPDK to it */
eth_dev->data->mac_addrs = (struct rte_ether_addr *)adapter->mac_addr;
@@ -1915,25 +1966,28 @@ static int ena_infos_get(struct rte_eth_dev *dev,
ETH_LINK_SPEED_100G;
/* Set Tx & Rx features available for device */
- if (adapter->offloads.tso4_supported)
+ if (adapter->offloads.tx_offloads & ENA_IPV4_TSO)
tx_feat |= DEV_TX_OFFLOAD_TCP_TSO;
- if (adapter->offloads.tx_csum_supported)
- tx_feat |= DEV_TX_OFFLOAD_IPV4_CKSUM |
- DEV_TX_OFFLOAD_UDP_CKSUM |
- DEV_TX_OFFLOAD_TCP_CKSUM;
+ if (adapter->offloads.tx_offloads & ENA_L3_IPV4_CSUM)
+ tx_feat |= DEV_TX_OFFLOAD_IPV4_CKSUM;
+ if (adapter->offloads.tx_offloads &
+ (ENA_L4_IPV4_CSUM_PARTIAL | ENA_L4_IPV4_CSUM |
+ ENA_L4_IPV6_CSUM | ENA_L4_IPV6_CSUM_PARTIAL))
+ tx_feat |= DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM;
- if (adapter->offloads.rx_csum_supported)
- rx_feat |= DEV_RX_OFFLOAD_IPV4_CKSUM |
- DEV_RX_OFFLOAD_UDP_CKSUM |
- DEV_RX_OFFLOAD_TCP_CKSUM;
+ if (adapter->offloads.rx_offloads & ENA_L3_IPV4_CSUM)
+ rx_feat |= DEV_RX_OFFLOAD_IPV4_CKSUM;
+ if (adapter->offloads.rx_offloads &
+ (ENA_L4_IPV4_CSUM | ENA_L4_IPV6_CSUM))
+ rx_feat |= DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM;
rx_feat |= DEV_RX_OFFLOAD_JUMBO_FRAME;
tx_feat |= DEV_TX_OFFLOAD_MULTI_SEGS;
/* Inform framework about available features */
dev_info->rx_offload_capa = rx_feat;
- if (adapter->offloads.rss_hash_supported)
+ if (adapter->offloads.rx_offloads & ENA_RX_RSS_HASH)
dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_RSS_HASH;
dev_info->rx_queue_offload_capa = rx_feat;
dev_info->tx_offload_capa = tx_feat;
@@ -2183,45 +2237,60 @@ eth_ena_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint32_t i;
struct rte_mbuf *m;
struct ena_ring *tx_ring = (struct ena_ring *)(tx_queue);
+ struct ena_adapter *adapter = tx_ring->adapter;
struct rte_ipv4_hdr *ip_hdr;
uint64_t ol_flags;
+ uint64_t l4_csum_flag;
+ uint64_t dev_offload_capa;
uint16_t frag_field;
+ bool need_pseudo_csum;
+ dev_offload_capa = adapter->offloads.tx_offloads;
for (i = 0; i != nb_pkts; i++) {
m = tx_pkts[i];
ol_flags = m->ol_flags;
- if (!(ol_flags & PKT_TX_IPV4))
+ /* Check if any offload flag was set */
+ if (ol_flags == 0)
continue;
- /* If there was not L2 header length specified, assume it is
- * length of the ethernet header.
- */
- if (unlikely(m->l2_len == 0))
- m->l2_len = sizeof(struct rte_ether_hdr);
-
- ip_hdr = rte_pktmbuf_mtod_offset(m, struct rte_ipv4_hdr *,
- m->l2_len);
- frag_field = rte_be_to_cpu_16(ip_hdr->fragment_offset);
-
- if ((frag_field & RTE_IPV4_HDR_DF_FLAG) != 0) {
- m->packet_type |= RTE_PTYPE_L4_NONFRAG;
-
- /* If IPv4 header has DF flag enabled and TSO support is
- * disabled, partial chcecksum should not be calculated.
- */
- if (!tx_ring->adapter->offloads.tso4_supported)
- continue;
- }
-
- if ((ol_flags & ENA_TX_OFFLOAD_NOTSUP_MASK) != 0 ||
- (ol_flags & PKT_TX_L4_MASK) ==
- PKT_TX_SCTP_CKSUM) {
+ l4_csum_flag = ol_flags & PKT_TX_L4_MASK;
+ /* SCTP checksum offload is not supported by the ENA. */
+ if ((ol_flags & ENA_TX_OFFLOAD_NOTSUP_MASK) ||
+ l4_csum_flag == PKT_TX_SCTP_CKSUM) {
+ PMD_TX_LOG(DEBUG,
+ "mbuf[%" PRIu32 "] has unsupported offloads flags set: 0x%" PRIu64 "\n",
+ i, ol_flags);
rte_errno = ENOTSUP;
return i;
}
#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+ /* Check if requested offload is also enabled for the queue */
+ if ((ol_flags & PKT_TX_IP_CKSUM &&
+ !(tx_ring->offloads & DEV_TX_OFFLOAD_IPV4_CKSUM)) ||
+ (l4_csum_flag == PKT_TX_TCP_CKSUM &&
+ !(tx_ring->offloads & DEV_TX_OFFLOAD_TCP_CKSUM)) ||
+ (l4_csum_flag == PKT_TX_UDP_CKSUM &&
+ !(tx_ring->offloads & DEV_TX_OFFLOAD_UDP_CKSUM))) {
+ PMD_TX_LOG(DEBUG,
+ "mbuf[%" PRIu32 "]: requested offloads: %" PRIu16 " are not enabled for the queue[%u]\n",
+ i, m->nb_segs, tx_ring->id);
+ rte_errno = EINVAL;
+ return i;
+ }
+
+ /* The caller is obligated to set l2 and l3 len if any cksum
+ * offload is enabled.
+ */
+ if (unlikely(ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK) &&
+ (m->l2_len == 0 || m->l3_len == 0))) {
+ PMD_TX_LOG(DEBUG,
+ "mbuf[%" PRIu32 "]: l2_len or l3_len values are 0 while the offload was requested\n",
+ i);
+ rte_errno = EINVAL;
+ return i;
+ }
ret = rte_validate_tx_offload(m);
if (ret != 0) {
rte_errno = -ret;
@@ -2229,16 +2298,76 @@ eth_ena_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
}
#endif
- /* In case we are supposed to TSO and have DF not set (DF=0)
- * hardware must be provided with partial checksum, otherwise
- * it will take care of necessary calculations.
+ /* Verify HW support for requested offloads and determine if
+ * pseudo header checksum is needed.
*/
+ need_pseudo_csum = false;
+ if (ol_flags & PKT_TX_IPV4) {
+ if (ol_flags & PKT_TX_IP_CKSUM &&
+ !(dev_offload_capa & ENA_L3_IPV4_CSUM)) {
+ rte_errno = ENOTSUP;
+ return i;
+ }
- ret = rte_net_intel_cksum_flags_prepare(m,
- ol_flags & ~PKT_TX_TCP_SEG);
- if (ret != 0) {
- rte_errno = -ret;
- return i;
+ if (ol_flags & PKT_TX_TCP_SEG &&
+ !(dev_offload_capa & ENA_IPV4_TSO)) {
+ rte_errno = ENOTSUP;
+ return i;
+ }
+
+ /* Check HW capabilities and if pseudo csum is needed
+ * for L4 offloads.
+ */
+ if (l4_csum_flag != PKT_TX_L4_NO_CKSUM &&
+ !(dev_offload_capa & ENA_L4_IPV4_CSUM)) {
+ if (dev_offload_capa &
+ ENA_L4_IPV4_CSUM_PARTIAL) {
+ need_pseudo_csum = true;
+ } else {
+ rte_errno = ENOTSUP;
+ return i;
+ }
+ }
+
+ /* Parse the DF flag */
+ ip_hdr = rte_pktmbuf_mtod_offset(m,
+ struct rte_ipv4_hdr *, m->l2_len);
+ frag_field = rte_be_to_cpu_16(ip_hdr->fragment_offset);
+ if (frag_field & RTE_IPV4_HDR_DF_FLAG) {
+ m->packet_type |= RTE_PTYPE_L4_NONFRAG;
+ } else if (ol_flags & PKT_TX_TCP_SEG) {
+ /* In case we are supposed to TSO and have DF
+ * not set (DF=0) hardware must be provided with
+ * partial checksum.
+ */
+ need_pseudo_csum = true;
+ }
+ } else if (ol_flags & PKT_TX_IPV6) {
+ /* There is no support for IPv6 TSO as for now. */
+ if (ol_flags & PKT_TX_TCP_SEG) {
+ rte_errno = ENOTSUP;
+ return i;
+ }
+
+ /* Check HW capabilities and if pseudo csum is needed */
+ if (l4_csum_flag != PKT_TX_L4_NO_CKSUM &&
+ !(dev_offload_capa & ENA_L4_IPV6_CSUM)) {
+ if (dev_offload_capa &
+ ENA_L4_IPV6_CSUM_PARTIAL) {
+ need_pseudo_csum = true;
+ } else {
+ rte_errno = ENOTSUP;
+ return i;
+ }
+ }
+ }
+
+ if (need_pseudo_csum) {
+ ret = rte_net_intel_cksum_flags_prepare(m, ol_flags);
+ if (ret != 0) {
+ rte_errno = -ret;
+ return i;
+ }
}
}
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 06ac8b06b5..26d425a893 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -223,10 +223,8 @@ struct ena_stats_eni {
};
struct ena_offloads {
- bool tso4_supported;
- bool tx_csum_supported;
- bool rx_csum_supported;
- bool rss_hash_supported;
+ uint32_t tx_offloads;
+ uint32_t rx_offloads;
};
/* board specific private data structure */
--
2.25.1
* [dpdk-dev] [PATCH v2 2/7] net/ena: support Tx/Rx free thresholds
2021-10-15 16:26 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Michal Krawczyk
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 1/7] net/ena: fix verification of the offload capabilities Michal Krawczyk
@ 2021-10-15 16:26 ` Michal Krawczyk
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 3/7] net/ena: fix per-queue offload capabilities Michal Krawczyk
` (6 subsequent siblings)
8 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-15 16:26 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk
The caller can pass a Tx or Rx free threshold value in the configuration
structure for each ring. It determines when the Tx/Rx function should
start cleaning up/refilling the descriptors. ENA was ignoring this value
and doing its own calculations.
Now the user can configure ENA's behavior using this parameter, and if
this value is not set, the ENA will continue with the old behavior
and will use its own threshold value.
The default value is not provided by the ENA in ena_infos_get(), as
it's determined dynamically, depending on the requested ring size.
Note that the NULL check for the Tx conf was removed from the function
ena_tx_queue_setup(), as at this point the configuration will either be
provided by the user or the default config will be used; this is
handled by the upper (rte_ethdev) layer.
The Tx threshold shouldn't be used as the Tx cleanup budget, as it can be
inadequate for the used burst size. Now the PMD tries to release mbufs for
the ring until it is depleted.
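For example, the thresholds can be passed at queue setup time. A hedged
sketch (not taken from the patch; the helper name, queue ID, ring size and
threshold values are made up, and a threshold of 0 keeps the driver's
dynamic default):

#include <rte_ethdev.h>
#include <rte_mempool.h>

static int
setup_queues_with_thresholds(uint16_t port_id, struct rte_mempool *mp)
{
	/* Clean up Tx descriptors once fewer than 64 are free; refill Rx
	 * once at least 32 descriptors are free.
	 */
	struct rte_eth_txconf txconf = { .tx_free_thresh = 64 };
	struct rte_eth_rxconf rxconf = { .rx_free_thresh = 32 };
	int ret;

	ret = rte_eth_tx_queue_setup(port_id, 0, 1024,
			rte_eth_dev_socket_id(port_id), &txconf);
	if (ret != 0)
		return ret;

	return rte_eth_rx_queue_setup(port_id, 0, 1024,
			rte_eth_dev_socket_id(port_id), &rxconf, mp);
}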
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
v2:
* Fix the calculation of the default tx_free_thresh if it wasn't provided by
the user. RTE_MIN was replaced with RTE_MAX.
doc/guides/rel_notes/release_21_11.rst | 7 ++++
drivers/net/ena/ena_ethdev.c | 44 ++++++++++++++++++--------
drivers/net/ena/ena_ethdev.h | 5 +++
3 files changed, 42 insertions(+), 14 deletions(-)
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 1f033cf80c..45d5cbdc78 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -93,6 +93,13 @@ New Features
* Disabled secondary process support.
+* **Updated Amazon ENA PMD.**
+
+ Updated the Amazon ENA PMD. The new driver version (v2.5.0) introduced
+ bug fixes and improvements, including:
+
+ * Support for the tx_free_thresh and rx_free_thresh configuration parameters.
+
* **Updated Broadcom bnxt PMD.**
* Added flow offload support for Thor.
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 227831a98c..87216f75a9 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1140,6 +1140,7 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
struct ena_ring *txq = NULL;
struct ena_adapter *adapter = dev->data->dev_private;
unsigned int i;
+ uint16_t dyn_thresh;
txq = &adapter->tx_ring[queue_idx];
@@ -1206,10 +1207,18 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
for (i = 0; i < txq->ring_size; i++)
txq->empty_tx_reqs[i] = i;
- if (tx_conf != NULL) {
- txq->offloads =
- tx_conf->offloads | dev->data->dev_conf.txmode.offloads;
+ txq->offloads = tx_conf->offloads | dev->data->dev_conf.txmode.offloads;
+
+ /* Check if caller provided the Tx cleanup threshold value. */
+ if (tx_conf->tx_free_thresh != 0) {
+ txq->tx_free_thresh = tx_conf->tx_free_thresh;
+ } else {
+ dyn_thresh = txq->ring_size -
+ txq->ring_size / ENA_REFILL_THRESH_DIVIDER;
+ txq->tx_free_thresh = RTE_MAX(dyn_thresh,
+ txq->ring_size - ENA_REFILL_THRESH_PACKET);
}
+
/* Store pointer to this queue in upper layer */
txq->configured = 1;
dev->data->tx_queues[queue_idx] = txq;
@@ -1228,6 +1237,7 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
struct ena_ring *rxq = NULL;
size_t buffer_size;
int i;
+ uint16_t dyn_thresh;
rxq = &adapter->rx_ring[queue_idx];
if (rxq->configured) {
@@ -1307,6 +1317,14 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
rxq->offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
+ if (rx_conf->rx_free_thresh != 0) {
+ rxq->rx_free_thresh = rx_conf->rx_free_thresh;
+ } else {
+ dyn_thresh = rxq->ring_size / ENA_REFILL_THRESH_DIVIDER;
+ rxq->rx_free_thresh = RTE_MIN(dyn_thresh,
+ (uint16_t)(ENA_REFILL_THRESH_PACKET));
+ }
+
/* Store pointer to this queue in upper layer */
rxq->configured = 1;
dev->data->rx_queues[queue_idx] = rxq;
@@ -2134,7 +2152,6 @@ static uint16_t eth_ena_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
{
struct ena_ring *rx_ring = (struct ena_ring *)(rx_queue);
unsigned int free_queue_entries;
- unsigned int refill_threshold;
uint16_t next_to_clean = rx_ring->next_to_clean;
uint16_t descs_in_use;
struct rte_mbuf *mbuf;
@@ -2216,12 +2233,9 @@ static uint16_t eth_ena_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rx_ring->next_to_clean = next_to_clean;
free_queue_entries = ena_com_free_q_entries(rx_ring->ena_com_io_sq);
- refill_threshold =
- RTE_MIN(rx_ring->ring_size / ENA_REFILL_THRESH_DIVIDER,
- (unsigned int)ENA_REFILL_THRESH_PACKET);
/* Burst refill to save doorbells, memory barriers, const interval */
- if (free_queue_entries > refill_threshold) {
+ if (free_queue_entries >= rx_ring->rx_free_thresh) {
ena_com_update_dev_comp_head(rx_ring->ena_com_io_cq);
ena_populate_rx_queue(rx_ring, free_queue_entries);
}
@@ -2588,12 +2602,12 @@ static int ena_xmit_mbuf(struct ena_ring *tx_ring, struct rte_mbuf *mbuf)
static void ena_tx_cleanup(struct ena_ring *tx_ring)
{
- unsigned int cleanup_budget;
unsigned int total_tx_descs = 0;
+ uint16_t cleanup_budget;
uint16_t next_to_clean = tx_ring->next_to_clean;
- cleanup_budget = RTE_MIN(tx_ring->ring_size / ENA_REFILL_THRESH_DIVIDER,
- (unsigned int)ENA_REFILL_THRESH_PACKET);
+ /* Attempt to release all Tx descriptors (ring_size - 1 -> size_mask) */
+ cleanup_budget = tx_ring->size_mask;
while (likely(total_tx_descs < cleanup_budget)) {
struct rte_mbuf *mbuf;
@@ -2634,6 +2648,7 @@ static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts)
{
struct ena_ring *tx_ring = (struct ena_ring *)(tx_queue);
+ int available_desc;
uint16_t sent_idx = 0;
#ifdef RTE_ETHDEV_DEBUG_TX
@@ -2653,8 +2668,8 @@ static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
tx_ring->size_mask)]);
}
- tx_ring->tx_stats.available_desc =
- ena_com_free_q_entries(tx_ring->ena_com_io_sq);
+ available_desc = ena_com_free_q_entries(tx_ring->ena_com_io_sq);
+ tx_ring->tx_stats.available_desc = available_desc;
/* If there are ready packets to be xmitted... */
if (likely(tx_ring->pkts_without_db)) {
@@ -2664,7 +2679,8 @@ static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
tx_ring->pkts_without_db = false;
}
- ena_tx_cleanup(tx_ring);
+ if (available_desc < tx_ring->tx_free_thresh)
+ ena_tx_cleanup(tx_ring);
tx_ring->tx_stats.available_desc =
ena_com_free_q_entries(tx_ring->ena_com_io_sq);
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 26d425a893..176d713dff 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -142,6 +142,11 @@ struct ena_ring {
struct ena_com_io_cq *ena_com_io_cq;
struct ena_com_io_sq *ena_com_io_sq;
+ union {
+ uint16_t tx_free_thresh;
+ uint16_t rx_free_thresh;
+ };
+
struct ena_com_rx_buf_info ena_bufs[ENA_PKT_MAX_BUFS]
__rte_cache_aligned;
--
2.25.1
* [dpdk-dev] [PATCH v2 3/7] net/ena: fix per-queue offload capabilities
2021-10-15 16:26 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Michal Krawczyk
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 1/7] net/ena: fix verification of the offload capabilities Michal Krawczyk
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 2/7] net/ena: support Tx/Rx free thresholds Michal Krawczyk
@ 2021-10-15 16:26 ` Michal Krawczyk
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 4/7] net/ena: indicate missing scattered Rx capability Michal Krawczyk
` (5 subsequent siblings)
8 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-15 16:26 UTC (permalink / raw)
To: ferruh.yigit
Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk, stable
The PMD shouldn't advertise the same offloads as both per-queue and
per-port [1]. Each offload capability should go to either
[rt]x_queue_offload_capa or [rt]x_offload_capa.
As ENA currently doesn't support offloads which could be configured
per-queue, only the per-port flags should be set.
In addition, to make the code cleaner, parsing the appropriate offload
flags is encapsulated into helper functions, in a similar manner to how
it's done by the other PMDs.
[1] https://doc.dpdk.org/guides/prog_guide/
poll_mode_drv.html?highlight=offloads#hardware-offload
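As a hedged illustration (not part of the patch; the helper name is made
up), the effect is visible through rte_eth_dev_info_get(): an ENA offload
such as TSO is now advertised only in the per-port capability field:

#include <rte_ethdev.h>

/* Return 1 if TSO is advertised per-port but not per-queue. */
static int
tso_is_port_wide_only(uint16_t port_id)
{
	struct rte_eth_dev_info dev_info;

	if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
		return -1;

	return (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) != 0 &&
		(dev_info.tx_queue_offload_capa & DEV_TX_OFFLOAD_TCP_TSO) == 0;
}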
Fixes: 7369f88f88c0 ("net/ena: convert to new Rx offloads API")
Fixes: 56b8b9b7e5d2 ("net/ena: convert to new Tx offloads API")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
drivers/net/ena/ena_ethdev.c | 90 ++++++++++++++++++++++++------------
1 file changed, 60 insertions(+), 30 deletions(-)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 87216f75a9..c2bd2f12af 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -223,6 +223,10 @@ static int ena_queue_start(struct rte_eth_dev *dev, struct ena_ring *ring);
static int ena_queue_start_all(struct rte_eth_dev *dev,
enum ena_ring_type ring_type);
static void ena_stats_restart(struct rte_eth_dev *dev);
+static uint64_t ena_get_rx_port_offloads(struct ena_adapter *adapter);
+static uint64_t ena_get_tx_port_offloads(struct ena_adapter *adapter);
+static uint64_t ena_get_rx_queue_offloads(struct ena_adapter *adapter);
+static uint64_t ena_get_tx_queue_offloads(struct ena_adapter *adapter);
static int ena_infos_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info);
static void ena_interrupt_handler_rte(void *cb_arg);
@@ -1959,12 +1963,63 @@ static void ena_init_rings(struct ena_adapter *adapter,
}
}
+static uint64_t ena_get_rx_port_offloads(struct ena_adapter *adapter)
+{
+ uint64_t port_offloads = 0;
+
+ if (adapter->offloads.rx_offloads & ENA_L3_IPV4_CSUM)
+ port_offloads |= DEV_RX_OFFLOAD_IPV4_CKSUM;
+
+ if (adapter->offloads.rx_offloads &
+ (ENA_L4_IPV4_CSUM | ENA_L4_IPV6_CSUM))
+ port_offloads |=
+ DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM;
+
+ if (adapter->offloads.rx_offloads & ENA_RX_RSS_HASH)
+ port_offloads |= DEV_RX_OFFLOAD_RSS_HASH;
+
+ port_offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;
+
+ return port_offloads;
+}
+
+static uint64_t ena_get_tx_port_offloads(struct ena_adapter *adapter)
+{
+ uint64_t port_offloads = 0;
+
+ if (adapter->offloads.tx_offloads & ENA_IPV4_TSO)
+ port_offloads |= DEV_TX_OFFLOAD_TCP_TSO;
+
+ if (adapter->offloads.tx_offloads & ENA_L3_IPV4_CSUM)
+ port_offloads |= DEV_TX_OFFLOAD_IPV4_CKSUM;
+ if (adapter->offloads.tx_offloads &
+ (ENA_L4_IPV4_CSUM_PARTIAL | ENA_L4_IPV4_CSUM |
+ ENA_L4_IPV6_CSUM | ENA_L4_IPV6_CSUM_PARTIAL))
+ port_offloads |=
+ DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM;
+
+ return port_offloads;
+}
+
+static uint64_t ena_get_rx_queue_offloads(struct ena_adapter *adapter)
+{
+ RTE_SET_USED(adapter);
+
+ return 0;
+}
+
+static uint64_t ena_get_tx_queue_offloads(struct ena_adapter *adapter)
+{
+ RTE_SET_USED(adapter);
+
+ return 0;
+}
+
static int ena_infos_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info)
{
struct ena_adapter *adapter;
struct ena_com_dev *ena_dev;
- uint64_t rx_feat = 0, tx_feat = 0;
ena_assert_msg(dev->data != NULL, "Uninitialized device\n");
ena_assert_msg(dev->data->dev_private != NULL, "Uninitialized device\n");
@@ -1983,33 +2038,11 @@ static int ena_infos_get(struct rte_eth_dev *dev,
ETH_LINK_SPEED_50G |
ETH_LINK_SPEED_100G;
- /* Set Tx & Rx features available for device */
- if (adapter->offloads.tx_offloads & ENA_IPV4_TSO)
- tx_feat |= DEV_TX_OFFLOAD_TCP_TSO;
-
- if (adapter->offloads.tx_offloads & ENA_L3_IPV4_CSUM)
- tx_feat |= DEV_TX_OFFLOAD_IPV4_CKSUM;
- if (adapter->offloads.tx_offloads &
- (ENA_L4_IPV4_CSUM_PARTIAL | ENA_L4_IPV4_CSUM |
- ENA_L4_IPV6_CSUM | ENA_L4_IPV6_CSUM_PARTIAL))
- tx_feat |= DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM;
-
- if (adapter->offloads.rx_offloads & ENA_L3_IPV4_CSUM)
- rx_feat |= DEV_RX_OFFLOAD_IPV4_CKSUM;
- if (adapter->offloads.rx_offloads &
- (ENA_L4_IPV4_CSUM | ENA_L4_IPV6_CSUM))
- rx_feat |= DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM;
-
- rx_feat |= DEV_RX_OFFLOAD_JUMBO_FRAME;
- tx_feat |= DEV_TX_OFFLOAD_MULTI_SEGS;
-
/* Inform framework about available features */
- dev_info->rx_offload_capa = rx_feat;
- if (adapter->offloads.rx_offloads & ENA_RX_RSS_HASH)
- dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_RSS_HASH;
- dev_info->rx_queue_offload_capa = rx_feat;
- dev_info->tx_offload_capa = tx_feat;
- dev_info->tx_queue_offload_capa = tx_feat;
+ dev_info->rx_offload_capa = ena_get_rx_port_offloads(adapter);
+ dev_info->tx_offload_capa = ena_get_tx_port_offloads(adapter);
+ dev_info->rx_queue_offload_capa = ena_get_rx_queue_offloads(adapter);
+ dev_info->tx_queue_offload_capa = ena_get_tx_queue_offloads(adapter);
dev_info->flow_type_rss_offloads = ENA_ALL_RSS_HF;
dev_info->hash_key_size = ENA_HASH_KEY_SIZE;
@@ -2022,9 +2055,6 @@ static int ena_infos_get(struct rte_eth_dev *dev,
dev_info->max_tx_queues = adapter->max_num_io_queues;
dev_info->reta_size = ENA_RX_RSS_TABLE_SIZE;
- adapter->tx_supported_offloads = tx_feat;
- adapter->rx_supported_offloads = rx_feat;
-
dev_info->rx_desc_lim.nb_max = adapter->max_rx_ring_size;
dev_info->rx_desc_lim.nb_min = ENA_MIN_RING_DESC;
dev_info->rx_desc_lim.nb_seg_max = RTE_MIN(ENA_PKT_MAX_BUFS,
--
2.25.1
* [dpdk-dev] [PATCH v2 4/7] net/ena: indicate missing scattered Rx capability
2021-10-15 16:26 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Michal Krawczyk
` (2 preceding siblings ...)
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 3/7] net/ena: fix per-queue offload capabilities Michal Krawczyk
@ 2021-10-15 16:26 ` Michal Krawczyk
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 5/7] net/ena: add NUMA aware allocations Michal Krawczyk
` (4 subsequent siblings)
8 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-15 16:26 UTC (permalink / raw)
To: ferruh.yigit
Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk, stable
ENA can't be forced to always pass a single descriptor for an Rx packet.
Even if the passed buffer size is big enough to hold the data, we can't
assume that the HW won't use an extra descriptor because of internal
optimizations. This assumption may be true, but only for some of the FW
revisions, which may differ depending on the used AWS instance type.
As scattered Rx support already exists on the Rx path, the driver
just needs to announce the DEV_RX_OFFLOAD_SCATTER capability by turning on
the rte_eth_dev_data::scattered_rx option.
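For reference, a minimal sketch (not from the patch; the helper name and
queue counts are made up) of a port configuration requesting the newly
announced capability:

#include <rte_ethdev.h>

static int
configure_port_with_scatter(uint16_t port_id)
{
	struct rte_eth_conf port_conf = {
		.rxmode = {
			/* Let the HW spread one packet over multiple
			 * Rx descriptors/mbufs.
			 */
			.offloads = DEV_RX_OFFLOAD_SCATTER,
		},
	};

	return rte_eth_dev_configure(port_id, 1, 1, &port_conf);
}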
Fixes: 1173fca25af9 ("ena: add polling-mode driver")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
drivers/net/ena/ena_ethdev.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index c2bd2f12af..35db2e8356 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1929,8 +1929,14 @@ static int ena_dev_configure(struct rte_eth_dev *dev)
dev->data->dev_conf.rxmode.offloads |= DEV_RX_OFFLOAD_RSS_HASH;
dev->data->dev_conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
+ /* Scattered Rx cannot be turned off in the HW, so this capability must
+ * be forced.
+ */
+ dev->data->scattered_rx = 1;
+
adapter->tx_selected_offloads = dev->data->dev_conf.txmode.offloads;
adapter->rx_selected_offloads = dev->data->dev_conf.rxmode.offloads;
+
return 0;
}
@@ -1978,7 +1984,7 @@ static uint64_t ena_get_rx_port_offloads(struct ena_adapter *adapter)
if (adapter->offloads.rx_offloads & ENA_RX_RSS_HASH)
port_offloads |= DEV_RX_OFFLOAD_RSS_HASH;
- port_offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;
+ port_offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME | DEV_RX_OFFLOAD_SCATTER;
return port_offloads;
}
--
2.25.1
* [dpdk-dev] [PATCH v2 5/7] net/ena: add NUMA aware allocations
2021-10-15 16:26 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Michal Krawczyk
` (3 preceding siblings ...)
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 4/7] net/ena: indicate missing scattered Rx capability Michal Krawczyk
@ 2021-10-15 16:26 ` Michal Krawczyk
2021-10-15 16:27 ` [dpdk-dev] [PATCH v2 6/7] net/ena: add check for missing Tx completions Michal Krawczyk
` (3 subsequent siblings)
8 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-15 16:26 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk
Only the IO rings' memory was allocated taking the socket ID into
account, while the other structures were allocated using the regular
rte_zmalloc() API.
Ring-specific structures are now being allocated using the ring's
socket ID.
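On the application side, the socket_id argument of the queue setup call is
what now drives these allocations. A short sketch (not part of the patch;
the helper name and ring size are made up, and a NULL Tx conf selects the
defaults):

#include <rte_ethdev.h>
#include <rte_lcore.h>

static int
setup_tx_queue_on_local_socket(uint16_t port_id, uint16_t queue_id)
{
	/* Place the queue's helper structures on the NUMA node of the
	 * lcore which will service this queue.
	 */
	unsigned int socket_id = rte_socket_id();

	return rte_eth_tx_queue_setup(port_id, queue_id, 1024, socket_id,
			NULL);
}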
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
doc/guides/rel_notes/release_21_11.rst | 1 +
drivers/net/ena/ena_ethdev.c | 42 ++++++++++++++------------
2 files changed, 24 insertions(+), 19 deletions(-)
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 45d5cbdc78..c87862e713 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -99,6 +99,7 @@ New Features
bug fixes and improvements, including:
* Support for the tx_free_thresh and rx_free_thresh configuration parameters.
+ * NUMA aware allocations for the queue helper structures.
* **Updated Broadcom bnxt PMD.**
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 35db2e8356..e31cb0b65c 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1177,19 +1177,20 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
txq->numa_socket_id = socket_id;
txq->pkts_without_db = false;
- txq->tx_buffer_info = rte_zmalloc("txq->tx_buffer_info",
- sizeof(struct ena_tx_buffer) *
- txq->ring_size,
- RTE_CACHE_LINE_SIZE);
+ txq->tx_buffer_info = rte_zmalloc_socket("txq->tx_buffer_info",
+ sizeof(struct ena_tx_buffer) * txq->ring_size,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!txq->tx_buffer_info) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for Tx buffer info\n");
return -ENOMEM;
}
- txq->empty_tx_reqs = rte_zmalloc("txq->empty_tx_reqs",
- sizeof(u16) * txq->ring_size,
- RTE_CACHE_LINE_SIZE);
+ txq->empty_tx_reqs = rte_zmalloc_socket("txq->empty_tx_reqs",
+ sizeof(uint16_t) * txq->ring_size,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!txq->empty_tx_reqs) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for empty Tx requests\n");
@@ -1198,9 +1199,10 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
}
txq->push_buf_intermediate_buf =
- rte_zmalloc("txq->push_buf_intermediate_buf",
- txq->tx_max_header_size,
- RTE_CACHE_LINE_SIZE);
+ rte_zmalloc_socket("txq->push_buf_intermediate_buf",
+ txq->tx_max_header_size,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!txq->push_buf_intermediate_buf) {
PMD_DRV_LOG(ERR, "Failed to alloc push buffer for LLQ\n");
rte_free(txq->tx_buffer_info);
@@ -1282,19 +1284,20 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
rxq->numa_socket_id = socket_id;
rxq->mb_pool = mp;
- rxq->rx_buffer_info = rte_zmalloc("rxq->buffer_info",
+ rxq->rx_buffer_info = rte_zmalloc_socket("rxq->buffer_info",
sizeof(struct ena_rx_buffer) * nb_desc,
- RTE_CACHE_LINE_SIZE);
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!rxq->rx_buffer_info) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for Rx buffer info\n");
return -ENOMEM;
}
- rxq->rx_refill_buffer = rte_zmalloc("rxq->rx_refill_buffer",
- sizeof(struct rte_mbuf *) * nb_desc,
- RTE_CACHE_LINE_SIZE);
-
+ rxq->rx_refill_buffer = rte_zmalloc_socket("rxq->rx_refill_buffer",
+ sizeof(struct rte_mbuf *) * nb_desc,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!rxq->rx_refill_buffer) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for Rx refill buffer\n");
@@ -1303,9 +1306,10 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
return -ENOMEM;
}
- rxq->empty_rx_reqs = rte_zmalloc("rxq->empty_rx_reqs",
- sizeof(uint16_t) * nb_desc,
- RTE_CACHE_LINE_SIZE);
+ rxq->empty_rx_reqs = rte_zmalloc_socket("rxq->empty_rx_reqs",
+ sizeof(uint16_t) * nb_desc,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!rxq->empty_rx_reqs) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for empty Rx requests\n");
--
2.25.1
* [dpdk-dev] [PATCH v2 6/7] net/ena: add check for missing Tx completions
2021-10-15 16:26 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Michal Krawczyk
` (4 preceding siblings ...)
2021-10-15 16:26 ` [dpdk-dev] [PATCH v2 5/7] net/ena: add NUMA aware allocations Michal Krawczyk
@ 2021-10-15 16:27 ` Michal Krawczyk
2021-10-15 16:27 ` [dpdk-dev] [PATCH v2 7/7] net/ena: update version to 2.5.0 Michal Krawczyk
` (2 subsequent siblings)
8 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-15 16:27 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk
In some cases Tx descriptors may never be completed by the HW, and as a
result they will never be released.
This patch adds a check for missing Tx completions to the ENA timer
service, so in order to use this feature, the application must call the
rte_timer_manage() function.
The missing Tx completion reset threshold is determined dynamically, by
taking into consideration the ring size and the default value.
Tx cleanup is associated with the Tx burst function. As DPDK
applications can call the Tx burst function at arbitrary times, the time
when the cleanup was last called must be tracked to avoid false detection
of missing Tx completions.
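A minimal sketch (not from the patch; the loop structure is made up) of the
application-side contract: the watchdog, including this check, only runs
when rte_timer_manage() is called periodically.

#include <stdbool.h>
#include <rte_timer.h>
#include <rte_cycles.h>

static void
main_loop(volatile bool *quit)
{
	uint64_t prev_tsc = rte_get_timer_cycles();
	const uint64_t period = rte_get_timer_hz(); /* roughly one second */

	while (!*quit) {
		/* ... Rx/Tx burst processing ... */

		uint64_t cur_tsc = rte_get_timer_cycles();
		if (cur_tsc - prev_tsc >= period) {
			/* Fires the ENA timer service, including the
			 * missing Tx completions check.
			 */
			rte_timer_manage();
			prev_tsc = cur_tsc;
		}
	}
}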
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
doc/guides/rel_notes/release_21_11.rst | 1 +
drivers/net/ena/ena_ethdev.c | 118 +++++++++++++++++++++++++
drivers/net/ena/ena_ethdev.h | 15 ++++
3 files changed, 134 insertions(+)
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index c87862e713..198f56a694 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -100,6 +100,7 @@ New Features
* Support for the tx_free_thresh and rx_free_thresh configuration parameters.
* NUMA aware allocations for the queue helper structures.
+ * Watchdog's feature which is checking for missing Tx completions.
* **Updated Broadcom bnxt PMD.**
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index e31cb0b65c..5554057ed3 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -99,6 +99,7 @@ static const struct ena_stats ena_stats_tx_strings[] = {
ENA_STAT_TX_ENTRY(doorbells),
ENA_STAT_TX_ENTRY(bad_req_id),
ENA_STAT_TX_ENTRY(available_desc),
+ ENA_STAT_TX_ENTRY(missed_tx),
};
static const struct ena_stats ena_stats_rx_strings[] = {
@@ -1176,6 +1177,7 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
txq->size_mask = nb_desc - 1;
txq->numa_socket_id = socket_id;
txq->pkts_without_db = false;
+ txq->last_cleanup_ticks = 0;
txq->tx_buffer_info = rte_zmalloc_socket("txq->tx_buffer_info",
sizeof(struct ena_tx_buffer) * txq->ring_size,
@@ -1225,6 +1227,9 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
txq->ring_size - ENA_REFILL_THRESH_PACKET);
}
+ txq->missing_tx_completion_threshold =
+ RTE_MIN(txq->ring_size / 2, ENA_DEFAULT_MISSING_COMP);
+
/* Store pointer to this queue in upper layer */
txq->configured = 1;
dev->data->tx_queues[queue_idx] = txq;
@@ -1551,6 +1556,85 @@ static void check_for_admin_com_state(struct ena_adapter *adapter)
}
}
+static int check_for_tx_completion_in_queue(struct ena_adapter *adapter,
+ struct ena_ring *tx_ring)
+{
+ struct ena_tx_buffer *tx_buf;
+ uint64_t timestamp;
+ uint64_t completion_delay;
+ uint32_t missed_tx = 0;
+ unsigned int i;
+ int rc = 0;
+
+ for (i = 0; i < tx_ring->ring_size; ++i) {
+ tx_buf = &tx_ring->tx_buffer_info[i];
+ timestamp = tx_buf->timestamp;
+
+ if (timestamp == 0)
+ continue;
+
+ completion_delay = rte_get_timer_cycles() - timestamp;
+ if (completion_delay > adapter->missing_tx_completion_to) {
+ if (unlikely(!tx_buf->print_once)) {
+ PMD_TX_LOG(WARNING,
+ "Found a Tx that wasn't completed on time, qid %d, index %d. Missing Tx outstanding for %" PRIu64 " msecs.\n",
+ tx_ring->id, i, completion_delay /
+ rte_get_timer_hz() * 1000);
+ tx_buf->print_once = true;
+ }
+ ++missed_tx;
+ }
+ }
+
+ if (unlikely(missed_tx > tx_ring->missing_tx_completion_threshold)) {
+ PMD_DRV_LOG(ERR,
+ "The number of lost Tx completions is above the threshold (%d > %d). Trigger the device reset.\n",
+ missed_tx,
+ tx_ring->missing_tx_completion_threshold);
+ adapter->reset_reason = ENA_REGS_RESET_MISS_TX_CMPL;
+ adapter->trigger_reset = true;
+ rc = -EIO;
+ }
+
+ tx_ring->tx_stats.missed_tx += missed_tx;
+
+ return rc;
+}
+
+static void check_for_tx_completions(struct ena_adapter *adapter)
+{
+ struct ena_ring *tx_ring;
+ uint64_t tx_cleanup_delay;
+ size_t qid;
+ int budget;
+ uint16_t nb_tx_queues = adapter->edev_data->nb_tx_queues;
+
+ if (adapter->missing_tx_completion_to == ENA_HW_HINTS_NO_TIMEOUT)
+ return;
+
+ nb_tx_queues = adapter->edev_data->nb_tx_queues;
+ budget = adapter->missing_tx_completion_budget;
+
+ qid = adapter->last_tx_comp_qid;
+ while (budget-- > 0) {
+ tx_ring = &adapter->tx_ring[qid];
+
+ /* Tx cleanup is called only by the burst function and can be
+ * called dynamically by the application. Also cleanup is
+ * limited by the threshold. To avoid false detection of the
+ * missing HW Tx completion, get the delay since last cleanup
+ * function was called.
+ */
+ tx_cleanup_delay = rte_get_timer_cycles() -
+ tx_ring->last_cleanup_ticks;
+ if (tx_cleanup_delay < adapter->tx_cleanup_stall_delay)
+ check_for_tx_completion_in_queue(adapter, tx_ring);
+ qid = (qid + 1) % nb_tx_queues;
+ }
+
+ adapter->last_tx_comp_qid = qid;
+}
+
static void ena_timer_wd_callback(__rte_unused struct rte_timer *timer,
void *arg)
{
@@ -1559,6 +1643,7 @@ static void ena_timer_wd_callback(__rte_unused struct rte_timer *timer,
check_for_missing_keep_alive(adapter);
check_for_admin_com_state(adapter);
+ check_for_tx_completions(adapter);
if (unlikely(adapter->trigger_reset)) {
PMD_DRV_LOG(ERR, "Trigger reset is on\n");
@@ -1938,6 +2023,20 @@ static int ena_dev_configure(struct rte_eth_dev *dev)
*/
dev->data->scattered_rx = 1;
+ adapter->last_tx_comp_qid = 0;
+
+ adapter->missing_tx_completion_budget =
+ RTE_MIN(ENA_MONITORED_TX_QUEUES, dev->data->nb_tx_queues);
+
+ adapter->missing_tx_completion_to = ENA_TX_TIMEOUT;
+ /* To avoid detection of the spurious Tx completion timeout due to
+ * application not calling the Tx cleanup function, set timeout for the
+ * Tx queue which should be half of the missing completion timeout for a
+ * safety. If there will be a lot of missing Tx completions in the
+ * queue, they will be detected sooner or later.
+ */
+ adapter->tx_cleanup_stall_delay = adapter->missing_tx_completion_to / 2;
+
adapter->tx_selected_offloads = dev->data->dev_conf.txmode.offloads;
adapter->rx_selected_offloads = dev->data->dev_conf.rxmode.offloads;
@@ -2440,6 +2539,20 @@ static void ena_update_hints(struct ena_adapter *adapter,
adapter->ena_dev.mmio_read.reg_read_to =
hints->mmio_read_timeout * 1000;
+ if (hints->missing_tx_completion_timeout) {
+ if (hints->missing_tx_completion_timeout ==
+ ENA_HW_HINTS_NO_TIMEOUT) {
+ adapter->missing_tx_completion_to =
+ ENA_HW_HINTS_NO_TIMEOUT;
+ } else {
+ /* Convert from msecs to ticks */
+ adapter->missing_tx_completion_to = rte_get_timer_hz() *
+ hints->missing_tx_completion_timeout / 1000;
+ adapter->tx_cleanup_stall_delay =
+ adapter->missing_tx_completion_to / 2;
+ }
+ }
+
if (hints->driver_watchdog_timeout) {
if (hints->driver_watchdog_timeout == ENA_HW_HINTS_NO_TIMEOUT)
adapter->keep_alive_timeout = ENA_HW_HINTS_NO_TIMEOUT;
@@ -2630,6 +2743,7 @@ static int ena_xmit_mbuf(struct ena_ring *tx_ring, struct rte_mbuf *mbuf)
}
tx_info->tx_descs = nb_hw_desc;
+ tx_info->timestamp = rte_get_timer_cycles();
tx_ring->tx_stats.cnt++;
tx_ring->tx_stats.bytes += mbuf->pkt_len;
@@ -2662,6 +2776,7 @@ static void ena_tx_cleanup(struct ena_ring *tx_ring)
/* Get Tx info & store how many descs were processed */
tx_info = &tx_ring->tx_buffer_info[req_id];
+ tx_info->timestamp = 0;
mbuf = tx_info->mbuf;
rte_pktmbuf_free(mbuf);
@@ -2682,6 +2797,9 @@ static void ena_tx_cleanup(struct ena_ring *tx_ring)
ena_com_comp_ack(tx_ring->ena_com_io_sq, total_tx_descs);
ena_com_update_dev_comp_head(tx_ring->ena_com_io_cq);
}
+
+ /* Notify completion handler that the cleanup was just called */
+ tx_ring->last_cleanup_ticks = rte_get_timer_cycles();
}
static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 176d713dff..4f4142ed12 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -36,6 +36,10 @@
#define ENA_WD_TIMEOUT_SEC 3
#define ENA_DEVICE_KALIVE_TIMEOUT (ENA_WD_TIMEOUT_SEC * rte_get_timer_hz())
+#define ENA_TX_TIMEOUT (5 * rte_get_timer_hz())
+#define ENA_MONITORED_TX_QUEUES 3
+#define ENA_DEFAULT_MISSING_COMP 256U
+
/* While processing submitted and completed descriptors (rx and tx path
* respectively) in a loop it is desired to:
* - perform batch submissions while populating sumbissmion queue
@@ -75,6 +79,8 @@ struct ena_tx_buffer {
struct rte_mbuf *mbuf;
unsigned int tx_descs;
unsigned int num_of_bufs;
+ uint64_t timestamp;
+ bool print_once;
struct ena_com_buf bufs[ENA_PKT_MAX_BUFS];
};
@@ -103,6 +109,7 @@ struct ena_stats_tx {
u64 doorbells;
u64 bad_req_id;
u64 available_desc;
+ u64 missed_tx;
};
struct ena_stats_rx {
@@ -118,6 +125,7 @@ struct ena_stats_rx {
struct ena_ring {
u16 next_to_use;
u16 next_to_clean;
+ uint64_t last_cleanup_ticks;
enum ena_ring_type type;
enum ena_admin_placement_policy_type tx_mem_queue_type;
@@ -171,6 +179,8 @@ struct ena_ring {
};
unsigned int numa_socket_id;
+
+ uint32_t missing_tx_completion_threshold;
} __rte_cache_aligned;
enum ena_adapter_state {
@@ -291,6 +301,11 @@ struct ena_adapter {
bool wd_state;
bool use_large_llq_hdr;
+
+ uint32_t last_tx_comp_qid;
+ uint64_t missing_tx_completion_to;
+ uint64_t missing_tx_completion_budget;
+ uint64_t tx_cleanup_stall_delay;
};
int ena_rss_reta_update(struct rte_eth_dev *dev,
--
2.25.1
* [dpdk-dev] [PATCH v2 7/7] net/ena: update version to 2.5.0
2021-10-15 16:26 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Michal Krawczyk
` (5 preceding siblings ...)
2021-10-15 16:27 ` [dpdk-dev] [PATCH v2 6/7] net/ena: add check for missing Tx completions Michal Krawczyk
@ 2021-10-15 16:27 ` Michal Krawczyk
2021-10-18 20:51 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Ferruh Yigit
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
8 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-15 16:27 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk
This version update contains:
* Fix for verification of the offload capabilities (especially for
IPv6 packets).
* Support for Tx and Rx free threshold values.
* Fixes for per-queue offload capabilities.
* Announce support of the scattered Rx offload.
* NUMA aware allocations.
* Check for the missing Tx completions.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
---
drivers/net/ena/ena_ethdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 5554057ed3..cad9d46198 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -21,7 +21,7 @@
#include <ena_eth_io_defs.h>
#define DRV_MODULE_VER_MAJOR 2
-#define DRV_MODULE_VER_MINOR 4
+#define DRV_MODULE_VER_MINOR 5
#define DRV_MODULE_VER_SUBMINOR 0
#define __MERGE_64B_H_L(h, l) (((uint64_t)h << 32) | l)
--
2.25.1
* Re: [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0
2021-10-15 16:26 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Michal Krawczyk
` (6 preceding siblings ...)
2021-10-15 16:27 ` [dpdk-dev] [PATCH v2 7/7] net/ena: update version to 2.5.0 Michal Krawczyk
@ 2021-10-18 20:51 ` Ferruh Yigit
2021-10-19 9:05 ` Michał Krawczyk
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
8 siblings, 1 reply; 29+ messages in thread
From: Ferruh Yigit @ 2021-10-18 20:51 UTC (permalink / raw)
To: Michal Krawczyk; +Cc: dev, upstream, shaibran, ndagan, igorch
On 10/15/2021 5:26 PM, Michal Krawczyk wrote:
> Hi,
>
> this version updates the driver to version 2.5.0. It mainly focuses on
> fixing the offload flags fixes. Other features included in this patchset
> are:
>
> * NUMA aware allocations for the queue specific structures
> * New watchdog - check for missing Tx completions
> * Support for [tr]x_free_thresh configuration parameters
>
> Regards,
> Michal
>
> Michal Krawczyk (7):
> net/ena: fix verification of the offload capabilities
> net/ena: support Tx/Rx free thresholds
> net/ena: fix per-queue offload capabilities
> net/ena: indicate missing scattered Rx capability
> net/ena: add NUMA aware allocations
> net/ena: add check for missing Tx completions
> net/ena: update version to 2.5.0
>
Hi Michal,
Can you please rebase on top of the latest next-net, mainly because of
the JUMBO frame offload check?
Resolving the conflict is not hard, but I wasn't sure if you want to
change the logic too; it's safer if you send a new version.
Thanks,
ferruh
* Re: [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0
2021-10-18 20:51 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Ferruh Yigit
@ 2021-10-19 9:05 ` Michał Krawczyk
0 siblings, 0 replies; 29+ messages in thread
From: Michał Krawczyk @ 2021-10-19 9:05 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev, upstream, Brandes, Shai, Dagan, Noam, Chauskin, Igor
On Mon, 18 Oct 2021 at 22:52, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
> On 10/15/2021 5:26 PM, Michal Krawczyk wrote:
> > Hi,
> >
> > this version updates the driver to version 2.5.0. It mainly focuses on
> > fixing the offload flags fixes. Other features included in this patchset
> > are:
> >
> > * NUMA aware allocations for the queue specific structures
> > * New watchdog - check for missing Tx completions
> > * Support for [tr]x_free_thresh configuration parameters
> >
> > Regards,
> > Michal
> >
> > Michal Krawczyk (7):
> > net/ena: fix verification of the offload capabilities
> > net/ena: support Tx/Rx free thresholds
> > net/ena: fix per-queue offload capabilities
> > net/ena: indicate missing scattered Rx capability
> > net/ena: add NUMA aware allocations
> > net/ena: add check for missing Tx completions
> > net/ena: update version to 2.5.0
> >
>
> Hi Michal,
>
> Can you please rebase on top of the latest next-net, mainly because of
> the JUMBO frame offload check?
> Resolving the conflict is not hard, but I wasn't sure if you want to
> change the logic too; it's safer if you send a new version.
>
Hi Ferruh,
sure, I'll do that and send the v3 of the patches.
Thanks,
Michal
> Thanks,
> ferruh
* [dpdk-dev] [PATCH v3 0/7] net/ena: update ENA PMD to v2.5.0
2021-10-15 16:26 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Michal Krawczyk
` (7 preceding siblings ...)
2021-10-18 20:51 ` [dpdk-dev] [PATCH 0/7] net/ena: update ENA PMD to v2.5.0 Ferruh Yigit
@ 2021-10-19 10:56 ` Michal Krawczyk
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 1/7] net/ena: fix verification of the offload capabilities Michal Krawczyk
` (7 more replies)
8 siblings, 8 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-19 10:56 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk
Hi,
this version updates the driver to version 2.5.0. It mainly focuses on
fixes for the offload flags. Other features included in this patchset
are:
* NUMA aware allocations for the queue specific structures
* New watchdog - check for missing Tx completions
* Support for [tr]x_free_thresh configuration parameters
Regards,
Michal
v3:
* Rebase the series on top of recent changes to resolve conflicts regarding
the driver's offload flags (especially DEV_RX_OFFLOAD_JUMBO_FRAME).
Michal Krawczyk (7):
net/ena: fix verification of the offload capabilities
net/ena: support Tx/Rx free thresholds
net/ena: fix per-queue offload capabilities
net/ena: indicate missing scattered Rx capability
net/ena: add NUMA aware allocations
net/ena: add check for missing Tx completions
net/ena: update version to 2.5.0
doc/guides/rel_notes/release_21_11.rst | 9 +
drivers/net/ena/ena_ethdev.c | 512 ++++++++++++++++++++-----
drivers/net/ena/ena_ethdev.h | 26 +-
3 files changed, 440 insertions(+), 107 deletions(-)
--
2.25.1
* [dpdk-dev] [PATCH v3 1/7] net/ena: fix verification of the offload capabilities
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
@ 2021-10-19 10:56 ` Michal Krawczyk
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 2/7] net/ena: support Tx/Rx free thresholds Michal Krawczyk
` (6 subsequent siblings)
7 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-19 10:56 UTC (permalink / raw)
To: ferruh.yigit
Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk, stable
ENA PMD has multiple checksum offload flags, which are more fine-grained
than the DPDK offload capabilities flags.
As the driver wasn't storing its internal checksum offload capabilities
and was relying only on the DPDK capabilities, not all scenarios could
be properly covered (like when to prepare a pseudo header checksum and
when not to).
Moreover, the user could request an offload capability which isn't
supported by the HW, and the PMD would silently ignore the issue.
This commit reworks the eth_ena_prep_pkts() function to perform additional
checks and to properly reflect the HW requirements. With
RTE_LIBRTE_ETHDEV_DEBUG enabled, the function performs even more
verifications, to help the user find any issues with the mbuf
configuration.
Fixes: b3fc5a1ae10d ("net/ena: add Tx preparation")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
drivers/net/ena/ena_ethdev.c | 235 +++++++++++++++++++++++++++--------
drivers/net/ena/ena_ethdev.h | 6 +-
2 files changed, 184 insertions(+), 57 deletions(-)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 3fde099ab4..197cb7ecd4 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -140,6 +140,23 @@ static const struct ena_stats ena_stats_rx_strings[] = {
#define ENA_TX_OFFLOAD_NOTSUP_MASK \
(PKT_TX_OFFLOAD_MASK ^ ENA_TX_OFFLOAD_MASK)
+/** HW specific offloads capabilities. */
+/* IPv4 checksum offload. */
+#define ENA_L3_IPV4_CSUM 0x0001
+/* TCP/UDP checksum offload for IPv4 packets. */
+#define ENA_L4_IPV4_CSUM 0x0002
+/* TCP/UDP checksum offload for IPv4 packets with pseudo header checksum. */
+#define ENA_L4_IPV4_CSUM_PARTIAL 0x0004
+/* TCP/UDP checksum offload for IPv6 packets. */
+#define ENA_L4_IPV6_CSUM 0x0008
+/* TCP/UDP checksum offload for IPv6 packets with pseudo header checksum. */
+#define ENA_L4_IPV6_CSUM_PARTIAL 0x0010
+/* TSO support for IPv4 packets. */
+#define ENA_IPV4_TSO 0x0020
+
+/* Device supports setting RSS hash. */
+#define ENA_RX_RSS_HASH 0x0040
+
static const struct rte_pci_id pci_id_ena_map[] = {
{ RTE_PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_ENA_VF) },
{ RTE_PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_ENA_VF_RSERV0) },
@@ -1612,6 +1629,50 @@ static uint32_t ena_calc_max_io_queue_num(struct ena_com_dev *ena_dev,
return max_num_io_queues;
}
+static void
+ena_set_offloads(struct ena_offloads *offloads,
+ struct ena_admin_feature_offload_desc *offload_desc)
+{
+ if (offload_desc->tx & ENA_ADMIN_FEATURE_OFFLOAD_DESC_TSO_IPV4_MASK)
+ offloads->tx_offloads |= ENA_IPV4_TSO;
+
+ /* Tx IPv4 checksum offloads */
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L3_CSUM_IPV4_MASK)
+ offloads->tx_offloads |= ENA_L3_IPV4_CSUM;
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_FULL_MASK)
+ offloads->tx_offloads |= ENA_L4_IPV4_CSUM;
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_PART_MASK)
+ offloads->tx_offloads |= ENA_L4_IPV4_CSUM_PARTIAL;
+
+ /* Tx IPv6 checksum offloads */
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV6_CSUM_FULL_MASK)
+ offloads->tx_offloads |= ENA_L4_IPV6_CSUM;
+ if (offload_desc->tx &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV6_CSUM_PART_MASK)
+ offloads->tx_offloads |= ENA_L4_IPV6_CSUM_PARTIAL;
+
+ /* Rx IPv4 checksum offloads */
+ if (offload_desc->rx_supported &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L3_CSUM_IPV4_MASK)
+ offloads->rx_offloads |= ENA_L3_IPV4_CSUM;
+ if (offload_desc->rx_supported &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L4_IPV4_CSUM_MASK)
+ offloads->rx_offloads |= ENA_L4_IPV4_CSUM;
+
+ /* Rx IPv6 checksum offloads */
+ if (offload_desc->rx_supported &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L4_IPV6_CSUM_MASK)
+ offloads->rx_offloads |= ENA_L4_IPV6_CSUM;
+
+ if (offload_desc->rx_supported &
+ ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_HASH_MASK)
+ offloads->rx_offloads |= ENA_RX_RSS_HASH;
+}
+
static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
{
struct ena_calc_queue_size_ctx calc_queue_ctx = { 0 };
@@ -1733,17 +1794,7 @@ static int eth_ena_dev_init(struct rte_eth_dev *eth_dev)
/* Set max MTU for this device */
adapter->max_mtu = get_feat_ctx.dev_attr.max_mtu;
- /* set device support for offloads */
- adapter->offloads.tso4_supported = (get_feat_ctx.offload.tx &
- ENA_ADMIN_FEATURE_OFFLOAD_DESC_TSO_IPV4_MASK) != 0;
- adapter->offloads.tx_csum_supported = (get_feat_ctx.offload.tx &
- ENA_ADMIN_FEATURE_OFFLOAD_DESC_TX_L4_IPV4_CSUM_PART_MASK) != 0;
- adapter->offloads.rx_csum_supported =
- (get_feat_ctx.offload.rx_supported &
- ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_L4_IPV4_CSUM_MASK) != 0;
- adapter->offloads.rss_hash_supported =
- (get_feat_ctx.offload.rx_supported &
- ENA_ADMIN_FEATURE_OFFLOAD_DESC_RX_HASH_MASK) != 0;
+ ena_set_offloads(&adapter->offloads, &get_feat_ctx.offload);
/* Copy MAC address and point DPDK to it */
eth_dev->data->mac_addrs = (struct rte_ether_addr *)adapter->mac_addr;
@@ -1903,24 +1954,27 @@ static int ena_infos_get(struct rte_eth_dev *dev,
ETH_LINK_SPEED_100G;
/* Set Tx & Rx features available for device */
- if (adapter->offloads.tso4_supported)
+ if (adapter->offloads.tx_offloads & ENA_IPV4_TSO)
tx_feat |= DEV_TX_OFFLOAD_TCP_TSO;
- if (adapter->offloads.tx_csum_supported)
- tx_feat |= DEV_TX_OFFLOAD_IPV4_CKSUM |
- DEV_TX_OFFLOAD_UDP_CKSUM |
- DEV_TX_OFFLOAD_TCP_CKSUM;
+ if (adapter->offloads.tx_offloads & ENA_L3_IPV4_CSUM)
+ tx_feat |= DEV_TX_OFFLOAD_IPV4_CKSUM;
+ if (adapter->offloads.tx_offloads &
+ (ENA_L4_IPV4_CSUM_PARTIAL | ENA_L4_IPV4_CSUM |
+ ENA_L4_IPV6_CSUM | ENA_L4_IPV6_CSUM_PARTIAL))
+ tx_feat |= DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM;
- if (adapter->offloads.rx_csum_supported)
- rx_feat |= DEV_RX_OFFLOAD_IPV4_CKSUM |
- DEV_RX_OFFLOAD_UDP_CKSUM |
- DEV_RX_OFFLOAD_TCP_CKSUM;
+ if (adapter->offloads.rx_offloads & ENA_L3_IPV4_CSUM)
+ rx_feat |= DEV_RX_OFFLOAD_IPV4_CKSUM;
+ if (adapter->offloads.rx_offloads &
+ (ENA_L4_IPV4_CSUM | ENA_L4_IPV6_CSUM))
+ rx_feat |= DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM;
tx_feat |= DEV_TX_OFFLOAD_MULTI_SEGS;
/* Inform framework about available features */
dev_info->rx_offload_capa = rx_feat;
- if (adapter->offloads.rss_hash_supported)
+ if (adapter->offloads.rx_offloads & ENA_RX_RSS_HASH)
dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_RSS_HASH;
dev_info->rx_queue_offload_capa = rx_feat;
dev_info->tx_offload_capa = tx_feat;
@@ -2173,45 +2227,60 @@ eth_ena_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint32_t i;
struct rte_mbuf *m;
struct ena_ring *tx_ring = (struct ena_ring *)(tx_queue);
+ struct ena_adapter *adapter = tx_ring->adapter;
struct rte_ipv4_hdr *ip_hdr;
uint64_t ol_flags;
+ uint64_t l4_csum_flag;
+ uint64_t dev_offload_capa;
uint16_t frag_field;
+ bool need_pseudo_csum;
+ dev_offload_capa = adapter->offloads.tx_offloads;
for (i = 0; i != nb_pkts; i++) {
m = tx_pkts[i];
ol_flags = m->ol_flags;
- if (!(ol_flags & PKT_TX_IPV4))
+ /* Check if any offload flag was set */
+ if (ol_flags == 0)
continue;
- /* If there was not L2 header length specified, assume it is
- * length of the ethernet header.
- */
- if (unlikely(m->l2_len == 0))
- m->l2_len = sizeof(struct rte_ether_hdr);
-
- ip_hdr = rte_pktmbuf_mtod_offset(m, struct rte_ipv4_hdr *,
- m->l2_len);
- frag_field = rte_be_to_cpu_16(ip_hdr->fragment_offset);
-
- if ((frag_field & RTE_IPV4_HDR_DF_FLAG) != 0) {
- m->packet_type |= RTE_PTYPE_L4_NONFRAG;
-
- /* If IPv4 header has DF flag enabled and TSO support is
- * disabled, partial chcecksum should not be calculated.
- */
- if (!tx_ring->adapter->offloads.tso4_supported)
- continue;
- }
-
- if ((ol_flags & ENA_TX_OFFLOAD_NOTSUP_MASK) != 0 ||
- (ol_flags & PKT_TX_L4_MASK) ==
- PKT_TX_SCTP_CKSUM) {
+ l4_csum_flag = ol_flags & PKT_TX_L4_MASK;
+ /* SCTP checksum offload is not supported by the ENA. */
+ if ((ol_flags & ENA_TX_OFFLOAD_NOTSUP_MASK) ||
+ l4_csum_flag == PKT_TX_SCTP_CKSUM) {
+ PMD_TX_LOG(DEBUG,
+ "mbuf[%" PRIu32 "] has unsupported offloads flags set: 0x%" PRIu64 "\n",
+ i, ol_flags);
rte_errno = ENOTSUP;
return i;
}
#ifdef RTE_LIBRTE_ETHDEV_DEBUG
+ /* Check if requested offload is also enabled for the queue */
+ if ((ol_flags & PKT_TX_IP_CKSUM &&
+ !(tx_ring->offloads & DEV_TX_OFFLOAD_IPV4_CKSUM)) ||
+ (l4_csum_flag == PKT_TX_TCP_CKSUM &&
+ !(tx_ring->offloads & DEV_TX_OFFLOAD_TCP_CKSUM)) ||
+ (l4_csum_flag == PKT_TX_UDP_CKSUM &&
+ !(tx_ring->offloads & DEV_TX_OFFLOAD_UDP_CKSUM))) {
+ PMD_TX_LOG(DEBUG,
+ "mbuf[%" PRIu32 "]: requested offloads: %" PRIu16 " are not enabled for the queue[%u]\n",
+ i, m->nb_segs, tx_ring->id);
+ rte_errno = EINVAL;
+ return i;
+ }
+
+ /* The caller is obligated to set l2 and l3 len if any cksum
+ * offload is enabled.
+ */
+ if (unlikely(ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK) &&
+ (m->l2_len == 0 || m->l3_len == 0))) {
+ PMD_TX_LOG(DEBUG,
+ "mbuf[%" PRIu32 "]: l2_len or l3_len values are 0 while the offload was requested\n",
+ i);
+ rte_errno = EINVAL;
+ return i;
+ }
ret = rte_validate_tx_offload(m);
if (ret != 0) {
rte_errno = -ret;
@@ -2219,16 +2288,76 @@ eth_ena_prep_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
}
#endif
- /* In case we are supposed to TSO and have DF not set (DF=0)
- * hardware must be provided with partial checksum, otherwise
- * it will take care of necessary calculations.
+ /* Verify HW support for requested offloads and determine if
+ * pseudo header checksum is needed.
*/
+ need_pseudo_csum = false;
+ if (ol_flags & PKT_TX_IPV4) {
+ if (ol_flags & PKT_TX_IP_CKSUM &&
+ !(dev_offload_capa & ENA_L3_IPV4_CSUM)) {
+ rte_errno = ENOTSUP;
+ return i;
+ }
- ret = rte_net_intel_cksum_flags_prepare(m,
- ol_flags & ~PKT_TX_TCP_SEG);
- if (ret != 0) {
- rte_errno = -ret;
- return i;
+ if (ol_flags & PKT_TX_TCP_SEG &&
+ !(dev_offload_capa & ENA_IPV4_TSO)) {
+ rte_errno = ENOTSUP;
+ return i;
+ }
+
+ /* Check HW capabilities and if pseudo csum is needed
+ * for L4 offloads.
+ */
+ if (l4_csum_flag != PKT_TX_L4_NO_CKSUM &&
+ !(dev_offload_capa & ENA_L4_IPV4_CSUM)) {
+ if (dev_offload_capa &
+ ENA_L4_IPV4_CSUM_PARTIAL) {
+ need_pseudo_csum = true;
+ } else {
+ rte_errno = ENOTSUP;
+ return i;
+ }
+ }
+
+ /* Parse the DF flag */
+ ip_hdr = rte_pktmbuf_mtod_offset(m,
+ struct rte_ipv4_hdr *, m->l2_len);
+ frag_field = rte_be_to_cpu_16(ip_hdr->fragment_offset);
+ if (frag_field & RTE_IPV4_HDR_DF_FLAG) {
+ m->packet_type |= RTE_PTYPE_L4_NONFRAG;
+ } else if (ol_flags & PKT_TX_TCP_SEG) {
+ /* In case we are supposed to TSO and have DF
+ * not set (DF=0) hardware must be provided with
+ * partial checksum.
+ */
+ need_pseudo_csum = true;
+ }
+ } else if (ol_flags & PKT_TX_IPV6) {
+ /* There is no support for IPv6 TSO as for now. */
+ if (ol_flags & PKT_TX_TCP_SEG) {
+ rte_errno = ENOTSUP;
+ return i;
+ }
+
+ /* Check HW capabilities and if pseudo csum is needed */
+ if (l4_csum_flag != PKT_TX_L4_NO_CKSUM &&
+ !(dev_offload_capa & ENA_L4_IPV6_CSUM)) {
+ if (dev_offload_capa &
+ ENA_L4_IPV6_CSUM_PARTIAL) {
+ need_pseudo_csum = true;
+ } else {
+ rte_errno = ENOTSUP;
+ return i;
+ }
+ }
+ }
+
+ if (need_pseudo_csum) {
+ ret = rte_net_intel_cksum_flags_prepare(m, ol_flags);
+ if (ret != 0) {
+ rte_errno = -ret;
+ return i;
+ }
}
}
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 06ac8b06b5..26d425a893 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -223,10 +223,8 @@ struct ena_stats_eni {
};
struct ena_offloads {
- bool tso4_supported;
- bool tx_csum_supported;
- bool rx_csum_supported;
- bool rss_hash_supported;
+ uint32_t tx_offloads;
+ uint32_t rx_offloads;
};
/* board specific private data structure */
--
2.25.1
^ permalink raw reply [flat|nested] 29+ messages in thread
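For illustration only (not part of the patch series): a minimal sketch of how an application could hand an mbuf through the reworked eth_ena_prep_pkts() path via rte_eth_tx_prepare(). Flag and header names follow the DPDK API used in this series; the helper function itself is hypothetical.

#include <rte_ethdev.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_mbuf.h>

static uint16_t
send_with_tx_csum(uint16_t port_id, uint16_t queue_id, struct rte_mbuf *m)
{
	uint16_t nb_prep;

	/* The reworked prep function requires l2_len/l3_len to be set
	 * whenever a checksum offload is requested. */
	m->l2_len = sizeof(struct rte_ether_hdr);
	m->l3_len = sizeof(struct rte_ipv4_hdr);
	m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;

	/* tx_prepare dispatches to eth_ena_prep_pkts(), which verifies HW
	 * support and computes the pseudo header checksum when only the
	 * partial L4 checksum offload is available. */
	nb_prep = rte_eth_tx_prepare(port_id, queue_id, &m, 1);
	if (nb_prep != 1)
		return 0; /* rte_errno holds ENOTSUP or EINVAL */

	return rte_eth_tx_burst(port_id, queue_id, &m, 1);
}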
* [dpdk-dev] [PATCH v3 2/7] net/ena: support Tx/Rx free thresholds
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 1/7] net/ena: fix verification of the offload capabilities Michal Krawczyk
@ 2021-10-19 10:56 ` Michal Krawczyk
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 3/7] net/ena: fix per-queue offload capabilities Michal Krawczyk
` (5 subsequent siblings)
7 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-19 10:56 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk
The caller can pass a Tx or Rx free threshold value in the configuration
structure for each ring. It determines when the Tx/Rx function should
start cleaning up/refilling the descriptors. ENA was ignoring this value
and doing its own calculations.
Now the user can configure ENA's behavior using this parameter. If the
value is not set, ENA will continue with the old behavior and use its
own threshold value.
The default value is not provided by ENA in ena_infos_get(), as it's
determined dynamically, depending on the requested ring size.
Note that the NULL check for Tx conf was removed from the function
ena_tx_queue_setup(), as at this point the configuration is either
provided by the user or filled with defaults by the upper (rte_ethdev)
layer.
The Tx threshold shouldn't be used as the Tx cleanup budget, as it can
be inadequate for the used burst size. Now the PMD tries to release
mbufs from the ring until it is depleted.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
v2:
* Fix calculations of the default tx_free_thresh if it wasn't provided by
the user. RTE_MIN was replaced with RTE_MAX.
doc/guides/rel_notes/release_21_11.rst | 7 ++++
drivers/net/ena/ena_ethdev.c | 44 ++++++++++++++++++--------
drivers/net/ena/ena_ethdev.h | 5 +++
3 files changed, 42 insertions(+), 14 deletions(-)
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index bd6a388c9d..8341d979aa 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -102,6 +102,13 @@ New Features
* Disabled secondary process support.
+* **Updated Amazon ENA PMD.**
+
+ Updated the Amazon ENA PMD. The new driver version (v2.5.0) introduced
+ bug fixes and improvements, including:
+
+ * Support for the tx_free_thresh and rx_free_thresh configuration parameters.
+
* **Updated Broadcom bnxt PMD.**
* Added flow offload support for Thor.
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 197cb7ecd4..fe9bac8888 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1128,6 +1128,7 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
struct ena_ring *txq = NULL;
struct ena_adapter *adapter = dev->data->dev_private;
unsigned int i;
+ uint16_t dyn_thresh;
txq = &adapter->tx_ring[queue_idx];
@@ -1194,10 +1195,18 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
for (i = 0; i < txq->ring_size; i++)
txq->empty_tx_reqs[i] = i;
- if (tx_conf != NULL) {
- txq->offloads =
- tx_conf->offloads | dev->data->dev_conf.txmode.offloads;
+ txq->offloads = tx_conf->offloads | dev->data->dev_conf.txmode.offloads;
+
+ /* Check if caller provided the Tx cleanup threshold value. */
+ if (tx_conf->tx_free_thresh != 0) {
+ txq->tx_free_thresh = tx_conf->tx_free_thresh;
+ } else {
+ dyn_thresh = txq->ring_size -
+ txq->ring_size / ENA_REFILL_THRESH_DIVIDER;
+ txq->tx_free_thresh = RTE_MAX(dyn_thresh,
+ txq->ring_size - ENA_REFILL_THRESH_PACKET);
}
+
/* Store pointer to this queue in upper layer */
txq->configured = 1;
dev->data->tx_queues[queue_idx] = txq;
@@ -1216,6 +1225,7 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
struct ena_ring *rxq = NULL;
size_t buffer_size;
int i;
+ uint16_t dyn_thresh;
rxq = &adapter->rx_ring[queue_idx];
if (rxq->configured) {
@@ -1295,6 +1305,14 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
rxq->offloads = rx_conf->offloads | dev->data->dev_conf.rxmode.offloads;
+ if (rx_conf->rx_free_thresh != 0) {
+ rxq->rx_free_thresh = rx_conf->rx_free_thresh;
+ } else {
+ dyn_thresh = rxq->ring_size / ENA_REFILL_THRESH_DIVIDER;
+ rxq->rx_free_thresh = RTE_MIN(dyn_thresh,
+ (uint16_t)(ENA_REFILL_THRESH_PACKET));
+ }
+
/* Store pointer to this queue in upper layer */
rxq->configured = 1;
dev->data->rx_queues[queue_idx] = rxq;
@@ -2124,7 +2142,6 @@ static uint16_t eth_ena_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
{
struct ena_ring *rx_ring = (struct ena_ring *)(rx_queue);
unsigned int free_queue_entries;
- unsigned int refill_threshold;
uint16_t next_to_clean = rx_ring->next_to_clean;
uint16_t descs_in_use;
struct rte_mbuf *mbuf;
@@ -2206,12 +2223,9 @@ static uint16_t eth_ena_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rx_ring->next_to_clean = next_to_clean;
free_queue_entries = ena_com_free_q_entries(rx_ring->ena_com_io_sq);
- refill_threshold =
- RTE_MIN(rx_ring->ring_size / ENA_REFILL_THRESH_DIVIDER,
- (unsigned int)ENA_REFILL_THRESH_PACKET);
/* Burst refill to save doorbells, memory barriers, const interval */
- if (free_queue_entries > refill_threshold) {
+ if (free_queue_entries >= rx_ring->rx_free_thresh) {
ena_com_update_dev_comp_head(rx_ring->ena_com_io_cq);
ena_populate_rx_queue(rx_ring, free_queue_entries);
}
@@ -2578,12 +2592,12 @@ static int ena_xmit_mbuf(struct ena_ring *tx_ring, struct rte_mbuf *mbuf)
static void ena_tx_cleanup(struct ena_ring *tx_ring)
{
- unsigned int cleanup_budget;
unsigned int total_tx_descs = 0;
+ uint16_t cleanup_budget;
uint16_t next_to_clean = tx_ring->next_to_clean;
- cleanup_budget = RTE_MIN(tx_ring->ring_size / ENA_REFILL_THRESH_DIVIDER,
- (unsigned int)ENA_REFILL_THRESH_PACKET);
+ /* Attempt to release all Tx descriptors (ring_size - 1 -> size_mask) */
+ cleanup_budget = tx_ring->size_mask;
while (likely(total_tx_descs < cleanup_budget)) {
struct rte_mbuf *mbuf;
@@ -2624,6 +2638,7 @@ static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
uint16_t nb_pkts)
{
struct ena_ring *tx_ring = (struct ena_ring *)(tx_queue);
+ int available_desc;
uint16_t sent_idx = 0;
#ifdef RTE_ETHDEV_DEBUG_TX
@@ -2643,8 +2658,8 @@ static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
tx_ring->size_mask)]);
}
- tx_ring->tx_stats.available_desc =
- ena_com_free_q_entries(tx_ring->ena_com_io_sq);
+ available_desc = ena_com_free_q_entries(tx_ring->ena_com_io_sq);
+ tx_ring->tx_stats.available_desc = available_desc;
/* If there are ready packets to be xmitted... */
if (likely(tx_ring->pkts_without_db)) {
@@ -2654,7 +2669,8 @@ static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
tx_ring->pkts_without_db = false;
}
- ena_tx_cleanup(tx_ring);
+ if (available_desc < tx_ring->tx_free_thresh)
+ ena_tx_cleanup(tx_ring);
tx_ring->tx_stats.available_desc =
ena_com_free_q_entries(tx_ring->ena_com_io_sq);
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 26d425a893..176d713dff 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -142,6 +142,11 @@ struct ena_ring {
struct ena_com_io_cq *ena_com_io_cq;
struct ena_com_io_sq *ena_com_io_sq;
+ union {
+ uint16_t tx_free_thresh;
+ uint16_t rx_free_thresh;
+ };
+
struct ena_com_rx_buf_info ena_bufs[ENA_PKT_MAX_BUFS]
__rte_cache_aligned;
--
2.25.1
^ permalink raw reply [flat|nested] 29+ messages in thread
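For illustration (not part of the patch): a sketch of how an application might pass its own tx_free_thresh when setting up a queue; the helper function and the chosen threshold value are assumptions, not taken from the series.

#include <rte_ethdev.h>

static int
setup_tx_queue(uint16_t port_id, uint16_t queue_id, uint16_t nb_desc,
	       unsigned int socket_id)
{
	struct rte_eth_dev_info dev_info;
	struct rte_eth_txconf txconf;
	int ret;

	ret = rte_eth_dev_info_get(port_id, &dev_info);
	if (ret != 0)
		return ret;

	/* Start from the driver defaults, then override the threshold.
	 * With this patch, ena_tx_cleanup() runs once the number of free
	 * descriptors drops below tx_free_thresh. */
	txconf = dev_info.default_txconf;
	txconf.tx_free_thresh = nb_desc / 4;

	return rte_eth_tx_queue_setup(port_id, queue_id, nb_desc,
				      socket_id, &txconf);
}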
* [dpdk-dev] [PATCH v3 3/7] net/ena: fix per-queue offload capabilities
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 1/7] net/ena: fix verification of the offload capabilities Michal Krawczyk
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 2/7] net/ena: support Tx/Rx free thresholds Michal Krawczyk
@ 2021-10-19 10:56 ` Michal Krawczyk
2021-10-19 12:25 ` Ferruh Yigit
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 4/7] net/ena: indicate missing scattered Rx capability Michal Krawczyk
` (4 subsequent siblings)
7 siblings, 1 reply; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-19 10:56 UTC (permalink / raw)
To: ferruh.yigit
Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk, stable
PMD shouldn't advertise the same offloads as both per-queue and
per-port [1]. Each offload capability should go either to the
[rt]x_queue_offload_capa or [rt]x_offload_capa.
As ENA currently doesn't support offloads which could be configured
per-queue, only per-port flags should be set.
In addition, to make the code cleaner, parsing the appropriate offload
flags is encapsulated into helper functions, in a similar manner to how
it's done by the other PMDs.
[1] https://doc.dpdk.org/guides/prog_guide/
poll_mode_drv.html?highlight=offloads#hardware-offload
Fixes: 7369f88f88c0 ("net/ena: convert to new Rx offloads API")
Fixes: 56b8b9b7e5d2 ("net/ena: convert to new Tx offloads API")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
drivers/net/ena/ena_ethdev.c | 89 ++++++++++++++++++++++++------------
1 file changed, 60 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index fe9bac8888..655c53b525 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -223,6 +223,10 @@ static int ena_queue_start(struct rte_eth_dev *dev, struct ena_ring *ring);
static int ena_queue_start_all(struct rte_eth_dev *dev,
enum ena_ring_type ring_type);
static void ena_stats_restart(struct rte_eth_dev *dev);
+static uint64_t ena_get_rx_port_offloads(struct ena_adapter *adapter);
+static uint64_t ena_get_tx_port_offloads(struct ena_adapter *adapter);
+static uint64_t ena_get_rx_queue_offloads(struct ena_adapter *adapter);
+static uint64_t ena_get_tx_queue_offloads(struct ena_adapter *adapter);
static int ena_infos_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info);
static void ena_interrupt_handler_rte(void *cb_arg);
@@ -1947,12 +1951,63 @@ static void ena_init_rings(struct ena_adapter *adapter,
}
}
+static uint64_t ena_get_rx_port_offloads(struct ena_adapter *adapter)
+{
+ uint64_t port_offloads = 0;
+
+ if (adapter->offloads.rx_offloads & ENA_L3_IPV4_CSUM)
+ port_offloads |= DEV_RX_OFFLOAD_IPV4_CKSUM;
+
+ if (adapter->offloads.rx_offloads &
+ (ENA_L4_IPV4_CSUM | ENA_L4_IPV6_CSUM))
+ port_offloads |=
+ DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM;
+
+ if (adapter->offloads.rx_offloads & ENA_RX_RSS_HASH)
+ port_offloads |= DEV_RX_OFFLOAD_RSS_HASH;
+
+ return port_offloads;
+}
+
+static uint64_t ena_get_tx_port_offloads(struct ena_adapter *adapter)
+{
+ uint64_t port_offloads = 0;
+
+ if (adapter->offloads.tx_offloads & ENA_IPV4_TSO)
+ port_offloads |= DEV_TX_OFFLOAD_TCP_TSO;
+
+ if (adapter->offloads.tx_offloads & ENA_L3_IPV4_CSUM)
+ port_offloads |= DEV_TX_OFFLOAD_IPV4_CKSUM;
+ if (adapter->offloads.tx_offloads &
+ (ENA_L4_IPV4_CSUM_PARTIAL | ENA_L4_IPV4_CSUM |
+ ENA_L4_IPV6_CSUM | ENA_L4_IPV6_CSUM_PARTIAL))
+ port_offloads |=
+ DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM;
+
+ port_offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
+
+ return port_offloads;
+}
+
+static uint64_t ena_get_rx_queue_offloads(struct ena_adapter *adapter)
+{
+ RTE_SET_USED(adapter);
+
+ return 0;
+}
+
+static uint64_t ena_get_tx_queue_offloads(struct ena_adapter *adapter)
+{
+ RTE_SET_USED(adapter);
+
+ return 0;
+}
+
static int ena_infos_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info)
{
struct ena_adapter *adapter;
struct ena_com_dev *ena_dev;
- uint64_t rx_feat = 0, tx_feat = 0;
ena_assert_msg(dev->data != NULL, "Uninitialized device\n");
ena_assert_msg(dev->data->dev_private != NULL, "Uninitialized device\n");
@@ -1971,32 +2026,11 @@ static int ena_infos_get(struct rte_eth_dev *dev,
ETH_LINK_SPEED_50G |
ETH_LINK_SPEED_100G;
- /* Set Tx & Rx features available for device */
- if (adapter->offloads.tx_offloads & ENA_IPV4_TSO)
- tx_feat |= DEV_TX_OFFLOAD_TCP_TSO;
-
- if (adapter->offloads.tx_offloads & ENA_L3_IPV4_CSUM)
- tx_feat |= DEV_TX_OFFLOAD_IPV4_CKSUM;
- if (adapter->offloads.tx_offloads &
- (ENA_L4_IPV4_CSUM_PARTIAL | ENA_L4_IPV4_CSUM |
- ENA_L4_IPV6_CSUM | ENA_L4_IPV6_CSUM_PARTIAL))
- tx_feat |= DEV_TX_OFFLOAD_UDP_CKSUM | DEV_TX_OFFLOAD_TCP_CKSUM;
-
- if (adapter->offloads.rx_offloads & ENA_L3_IPV4_CSUM)
- rx_feat |= DEV_RX_OFFLOAD_IPV4_CKSUM;
- if (adapter->offloads.rx_offloads &
- (ENA_L4_IPV4_CSUM | ENA_L4_IPV6_CSUM))
- rx_feat |= DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM;
-
- tx_feat |= DEV_TX_OFFLOAD_MULTI_SEGS;
-
/* Inform framework about available features */
- dev_info->rx_offload_capa = rx_feat;
- if (adapter->offloads.rx_offloads & ENA_RX_RSS_HASH)
- dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_RSS_HASH;
- dev_info->rx_queue_offload_capa = rx_feat;
- dev_info->tx_offload_capa = tx_feat;
- dev_info->tx_queue_offload_capa = tx_feat;
+ dev_info->rx_offload_capa = ena_get_rx_port_offloads(adapter);
+ dev_info->tx_offload_capa = ena_get_tx_port_offloads(adapter);
+ dev_info->rx_queue_offload_capa = ena_get_rx_queue_offloads(adapter);
+ dev_info->tx_queue_offload_capa = ena_get_tx_queue_offloads(adapter);
dev_info->flow_type_rss_offloads = ENA_ALL_RSS_HF;
dev_info->hash_key_size = ENA_HASH_KEY_SIZE;
@@ -2012,9 +2046,6 @@ static int ena_infos_get(struct rte_eth_dev *dev,
dev_info->max_tx_queues = adapter->max_num_io_queues;
dev_info->reta_size = ENA_RX_RSS_TABLE_SIZE;
- adapter->tx_supported_offloads = tx_feat;
- adapter->rx_supported_offloads = rx_feat;
-
dev_info->rx_desc_lim.nb_max = adapter->max_rx_ring_size;
dev_info->rx_desc_lim.nb_min = ENA_MIN_RING_DESC;
dev_info->rx_desc_lim.nb_seg_max = RTE_MIN(ENA_PKT_MAX_BUFS,
--
2.25.1
^ permalink raw reply [flat|nested] 29+ messages in thread
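For illustration: a small sketch showing how an application would observe the per-port vs. per-queue split this patch establishes (queue capabilities become empty for ENA); the printing helper is hypothetical.

#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>

static void
print_offload_split(uint16_t port_id)
{
	struct rte_eth_dev_info dev_info;

	if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
		return;

	printf("port Tx offloads:  0x%" PRIx64 "\n",
	       dev_info.tx_offload_capa);
	/* 0 for ENA after this patch, as no queue-specific offloads
	 * are supported. */
	printf("queue Tx offloads: 0x%" PRIx64 "\n",
	       dev_info.tx_queue_offload_capa);
}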
* Re: [dpdk-dev] [PATCH v3 3/7] net/ena: fix per-queue offload capabilities
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 3/7] net/ena: fix per-queue offload capabilities Michal Krawczyk
@ 2021-10-19 12:25 ` Ferruh Yigit
0 siblings, 0 replies; 29+ messages in thread
From: Ferruh Yigit @ 2021-10-19 12:25 UTC (permalink / raw)
To: Michal Krawczyk; +Cc: dev, upstream, shaibran, ndagan, igorch, stable
On 10/19/2021 11:56 AM, Michal Krawczyk wrote:
> PMD shouldn't advertise the same offloads as both per-queue and
> per-port [1]. Each offload capability should go either to the
> [rt]x_queue_offload_capa or [rt]x_offload_capa.
>
This is not exactly true.
It is expected that queue offloads are advertised as part of port offloads too.
The logic is: if an offload can be applied at queue granularity, it can
be applied to all queues, which makes it a port offload.
In the documentation:
Port capabilities = per-queue capabilities + pure per-port capabilities.
There is a difference between a "pure per-port capability" and a "port
capability", which may be the source of the confusion.
Since the driver doesn't support queue-specific offloads, the code is
not wrong. I will remove the above paragraph and merge the patch; if you
have an objection or change request, please let me know and I can update
it in next-net.
> As ENA currently doesn't support offloads which could be configured
> per-queue, only per-port flags should be set.
>
> In addition, to make the code cleaner, parsing the appropriate offload
> flags is encapsulated into helper functions, in a similar manner to how
> it's done by the other PMDs.
>
> [1] https://doc.dpdk.org/guides/prog_guide/
> poll_mode_drv.html?highlight=offloads#hardware-offload
>
> Fixes: 7369f88f88c0 ("net/ena: convert to new Rx offloads API")
> Fixes: 56b8b9b7e5d2 ("net/ena: convert to new Tx offloads API")
> Cc: stable@dpdk.org
>
> Signed-off-by: Michal Krawczyk <mk@semihalf.com>
> Reviewed-by: Igor Chauskin <igorch@amazon.com>
> Reviewed-by: Shai Brandes <shaibran@amazon.com>
<...>
^ permalink raw reply [flat|nested] 29+ messages in thread
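The relation described above can be stated as a subset check; a minimal sketch of how a test might assert it (the helper is hypothetical):

#include <assert.h>
#include <rte_ethdev.h>

static void
check_offload_relation(const struct rte_eth_dev_info *dev_info)
{
	/* Every queue-level capability must also be reported at port
	 * level: port capa = queue capa + pure per-port capa. */
	assert((dev_info->tx_queue_offload_capa &
		~dev_info->tx_offload_capa) == 0);
	assert((dev_info->rx_queue_offload_capa &
		~dev_info->rx_offload_capa) == 0);
}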
* [dpdk-dev] [PATCH v3 4/7] net/ena: indicate missing scattered Rx capability
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
` (2 preceding siblings ...)
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 3/7] net/ena: fix per-queue offload capabilities Michal Krawczyk
@ 2021-10-19 10:56 ` Michal Krawczyk
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 5/7] net/ena: add NUMA aware allocations Michal Krawczyk
` (3 subsequent siblings)
7 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-19 10:56 UTC (permalink / raw)
To: ferruh.yigit
Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk, stable
ENA can't be forced to always pass a single descriptor for the Rx
packet. Even if the passed buffer size is big enough to hold the data,
we can't assume that the HW won't use an extra descriptor because of
internal optimizations. This assumption may be true, but only for some
of the FW revisions, which may differ depending on the AWS instance
type in use.
As scattered Rx support already exists on the Rx path, the driver just
needs to announce the DEV_RX_OFFLOAD_SCATTER capability by turning on
the rte_eth_dev_data::scattered_rx option.
Fixes: 1173fca25af9 ("ena: add polling-mode driver")
Cc: stable@dpdk.org
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
drivers/net/ena/ena_ethdev.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 655c53b525..94dbb3164e 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1917,8 +1917,14 @@ static int ena_dev_configure(struct rte_eth_dev *dev)
dev->data->dev_conf.rxmode.offloads |= DEV_RX_OFFLOAD_RSS_HASH;
dev->data->dev_conf.txmode.offloads |= DEV_TX_OFFLOAD_MULTI_SEGS;
+ /* Scattered Rx cannot be turned off in the HW, so this capability must
+ * be forced.
+ */
+ dev->data->scattered_rx = 1;
+
adapter->tx_selected_offloads = dev->data->dev_conf.txmode.offloads;
adapter->rx_selected_offloads = dev->data->dev_conf.rxmode.offloads;
+
return 0;
}
@@ -1966,6 +1972,8 @@ static uint64_t ena_get_rx_port_offloads(struct ena_adapter *adapter)
if (adapter->offloads.rx_offloads & ENA_RX_RSS_HASH)
port_offloads |= DEV_RX_OFFLOAD_RSS_HASH;
+ port_offloads |= DEV_RX_OFFLOAD_SCATTER;
+
return port_offloads;
}
--
2.25.1
^ permalink raw reply [flat|nested] 29+ messages in thread
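For illustration: even though the PMD now forces scattered_rx internally, a portable application would still request the capability explicitly when Rx buffers may be smaller than the largest expected frame. A sketch, with the helper and queue counts as assumptions:

#include <string.h>
#include <rte_ethdev.h>

static int
configure_port_with_scatter(uint16_t port_id, uint16_t nb_rxq,
			    uint16_t nb_txq)
{
	struct rte_eth_conf port_conf;

	memset(&port_conf, 0, sizeof(port_conf));
	/* Advertised by ENA in rx_offload_capa after this patch. */
	port_conf.rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;

	return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &port_conf);
}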
* [dpdk-dev] [PATCH v3 5/7] net/ena: add NUMA aware allocations
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
` (3 preceding siblings ...)
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 4/7] net/ena: indicate missing scattered Rx capability Michal Krawczyk
@ 2021-10-19 10:56 ` Michal Krawczyk
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 6/7] net/ena: add check for missing Tx completions Michal Krawczyk
` (2 subsequent siblings)
7 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-19 10:56 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk
Only the IO rings memory was allocated taking the socket ID into
account, while the other structures were allocated using the regular
rte_zmalloc() API.
Ring-specific structures are now allocated using the ring's socket ID.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
doc/guides/rel_notes/release_21_11.rst | 1 +
drivers/net/ena/ena_ethdev.c | 42 ++++++++++++++------------
2 files changed, 24 insertions(+), 19 deletions(-)
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 8341d979aa..6ac867321b 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -108,6 +108,7 @@ New Features
bug fixes and improvements, including:
* Support for the tx_free_thresh and rx_free_thresh configuration parameters.
+ * NUMA aware allocations for the queue helper structures.
* **Updated Broadcom bnxt PMD.**
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 94dbb3164e..4e9925b6be 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1165,19 +1165,20 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
txq->numa_socket_id = socket_id;
txq->pkts_without_db = false;
- txq->tx_buffer_info = rte_zmalloc("txq->tx_buffer_info",
- sizeof(struct ena_tx_buffer) *
- txq->ring_size,
- RTE_CACHE_LINE_SIZE);
+ txq->tx_buffer_info = rte_zmalloc_socket("txq->tx_buffer_info",
+ sizeof(struct ena_tx_buffer) * txq->ring_size,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!txq->tx_buffer_info) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for Tx buffer info\n");
return -ENOMEM;
}
- txq->empty_tx_reqs = rte_zmalloc("txq->empty_tx_reqs",
- sizeof(u16) * txq->ring_size,
- RTE_CACHE_LINE_SIZE);
+ txq->empty_tx_reqs = rte_zmalloc_socket("txq->empty_tx_reqs",
+ sizeof(uint16_t) * txq->ring_size,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!txq->empty_tx_reqs) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for empty Tx requests\n");
@@ -1186,9 +1187,10 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
}
txq->push_buf_intermediate_buf =
- rte_zmalloc("txq->push_buf_intermediate_buf",
- txq->tx_max_header_size,
- RTE_CACHE_LINE_SIZE);
+ rte_zmalloc_socket("txq->push_buf_intermediate_buf",
+ txq->tx_max_header_size,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!txq->push_buf_intermediate_buf) {
PMD_DRV_LOG(ERR, "Failed to alloc push buffer for LLQ\n");
rte_free(txq->tx_buffer_info);
@@ -1270,19 +1272,20 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
rxq->numa_socket_id = socket_id;
rxq->mb_pool = mp;
- rxq->rx_buffer_info = rte_zmalloc("rxq->buffer_info",
+ rxq->rx_buffer_info = rte_zmalloc_socket("rxq->buffer_info",
sizeof(struct ena_rx_buffer) * nb_desc,
- RTE_CACHE_LINE_SIZE);
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!rxq->rx_buffer_info) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for Rx buffer info\n");
return -ENOMEM;
}
- rxq->rx_refill_buffer = rte_zmalloc("rxq->rx_refill_buffer",
- sizeof(struct rte_mbuf *) * nb_desc,
- RTE_CACHE_LINE_SIZE);
-
+ rxq->rx_refill_buffer = rte_zmalloc_socket("rxq->rx_refill_buffer",
+ sizeof(struct rte_mbuf *) * nb_desc,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!rxq->rx_refill_buffer) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for Rx refill buffer\n");
@@ -1291,9 +1294,10 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
return -ENOMEM;
}
- rxq->empty_rx_reqs = rte_zmalloc("rxq->empty_rx_reqs",
- sizeof(uint16_t) * nb_desc,
- RTE_CACHE_LINE_SIZE);
+ rxq->empty_rx_reqs = rte_zmalloc_socket("rxq->empty_rx_reqs",
+ sizeof(uint16_t) * nb_desc,
+ RTE_CACHE_LINE_SIZE,
+ socket_id);
if (!rxq->empty_rx_reqs) {
PMD_DRV_LOG(ERR,
"Failed to allocate memory for empty Rx requests\n");
--
2.25.1
^ permalink raw reply [flat|nested] 29+ messages in thread
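The allocation pattern above, shown in isolation as a sketch (names are illustrative): rte_zmalloc_socket() behaves like rte_zmalloc() but pins the allocation to the given NUMA socket, so per-queue helpers end up on the same node as the queue that uses them.

#include <rte_malloc.h>

static void *
alloc_ring_helper(size_t nb_desc, size_t elem_size, int socket_id)
{
	/* Same semantics as rte_zmalloc(), plus socket placement. */
	return rte_zmalloc_socket("ring_helper",
				  nb_desc * elem_size,
				  RTE_CACHE_LINE_SIZE,
				  socket_id);
}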
* [dpdk-dev] [PATCH v3 6/7] net/ena: add check for missing Tx completions
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
` (4 preceding siblings ...)
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 5/7] net/ena: add NUMA aware allocations Michal Krawczyk
@ 2021-10-19 10:56 ` Michal Krawczyk
2021-10-19 12:40 ` Ferruh Yigit
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 7/7] net/ena: update version to 2.5.0 Michal Krawczyk
2021-10-19 13:05 ` [dpdk-dev] [PATCH v3 0/7] net/ena: update ENA PMD to v2.5.0 Ferruh Yigit
7 siblings, 1 reply; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-19 10:56 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk
In some cases Tx descriptors may never be completed by the HW and as a
result they will never be released.
This patch adds a check for missing Tx completions to the ENA timer
service, so in order to use this feature, the application must call the
function rte_timer_manage().
The missing Tx completion reset threshold is determined dynamically,
taking into consideration the ring size and the default value.
Tx cleanup is associated with the Tx burst function. As DPDK
applications can call the Tx burst function at arbitrary times, the time
when the cleanup was last called must be tracked to avoid false
detection of missing Tx completions.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
Reviewed-by: Igor Chauskin <igorch@amazon.com>
Reviewed-by: Shai Brandes <shaibran@amazon.com>
---
doc/guides/rel_notes/release_21_11.rst | 1 +
drivers/net/ena/ena_ethdev.c | 118 +++++++++++++++++++++++++
drivers/net/ena/ena_ethdev.h | 15 ++++
3 files changed, 134 insertions(+)
diff --git a/doc/guides/rel_notes/release_21_11.rst b/doc/guides/rel_notes/release_21_11.rst
index 6ac867321b..c5f76081e5 100644
--- a/doc/guides/rel_notes/release_21_11.rst
+++ b/doc/guides/rel_notes/release_21_11.rst
@@ -109,6 +109,7 @@ New Features
* Support for the tx_free_thresh and rx_free_thresh configuration parameters.
* NUMA aware allocations for the queue helper structures.
+ * Watchdog's feature which is checking for missing Tx completions.
* **Updated Broadcom bnxt PMD.**
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 4e9925b6be..1a70cd781c 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -99,6 +99,7 @@ static const struct ena_stats ena_stats_tx_strings[] = {
ENA_STAT_TX_ENTRY(doorbells),
ENA_STAT_TX_ENTRY(bad_req_id),
ENA_STAT_TX_ENTRY(available_desc),
+ ENA_STAT_TX_ENTRY(missed_tx),
};
static const struct ena_stats ena_stats_rx_strings[] = {
@@ -1164,6 +1165,7 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
txq->size_mask = nb_desc - 1;
txq->numa_socket_id = socket_id;
txq->pkts_without_db = false;
+ txq->last_cleanup_ticks = 0;
txq->tx_buffer_info = rte_zmalloc_socket("txq->tx_buffer_info",
sizeof(struct ena_tx_buffer) * txq->ring_size,
@@ -1213,6 +1215,9 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
txq->ring_size - ENA_REFILL_THRESH_PACKET);
}
+ txq->missing_tx_completion_threshold =
+ RTE_MIN(txq->ring_size / 2, ENA_DEFAULT_MISSING_COMP);
+
/* Store pointer to this queue in upper layer */
txq->configured = 1;
dev->data->tx_queues[queue_idx] = txq;
@@ -1539,6 +1544,85 @@ static void check_for_admin_com_state(struct ena_adapter *adapter)
}
}
+static int check_for_tx_completion_in_queue(struct ena_adapter *adapter,
+ struct ena_ring *tx_ring)
+{
+ struct ena_tx_buffer *tx_buf;
+ uint64_t timestamp;
+ uint64_t completion_delay;
+ uint32_t missed_tx = 0;
+ unsigned int i;
+ int rc = 0;
+
+ for (i = 0; i < tx_ring->ring_size; ++i) {
+ tx_buf = &tx_ring->tx_buffer_info[i];
+ timestamp = tx_buf->timestamp;
+
+ if (timestamp == 0)
+ continue;
+
+ completion_delay = rte_get_timer_cycles() - timestamp;
+ if (completion_delay > adapter->missing_tx_completion_to) {
+ if (unlikely(!tx_buf->print_once)) {
+ PMD_TX_LOG(WARNING,
+ "Found a Tx that wasn't completed on time, qid %d, index %d. Missing Tx outstanding for %" PRIu64 " msecs.\n",
+ tx_ring->id, i, completion_delay /
+ rte_get_timer_hz() * 1000);
+ tx_buf->print_once = true;
+ }
+ ++missed_tx;
+ }
+ }
+
+ if (unlikely(missed_tx > tx_ring->missing_tx_completion_threshold)) {
+ PMD_DRV_LOG(ERR,
+ "The number of lost Tx completions is above the threshold (%d > %d). Trigger the device reset.\n",
+ missed_tx,
+ tx_ring->missing_tx_completion_threshold);
+ adapter->reset_reason = ENA_REGS_RESET_MISS_TX_CMPL;
+ adapter->trigger_reset = true;
+ rc = -EIO;
+ }
+
+ tx_ring->tx_stats.missed_tx += missed_tx;
+
+ return rc;
+}
+
+static void check_for_tx_completions(struct ena_adapter *adapter)
+{
+ struct ena_ring *tx_ring;
+ uint64_t tx_cleanup_delay;
+ size_t qid;
+ int budget;
+ uint16_t nb_tx_queues = adapter->edev_data->nb_tx_queues;
+
+ if (adapter->missing_tx_completion_to == ENA_HW_HINTS_NO_TIMEOUT)
+ return;
+
+ nb_tx_queues = adapter->edev_data->nb_tx_queues;
+ budget = adapter->missing_tx_completion_budget;
+
+ qid = adapter->last_tx_comp_qid;
+ while (budget-- > 0) {
+ tx_ring = &adapter->tx_ring[qid];
+
+ /* Tx cleanup is called only by the burst function and can be
+ * called dynamically by the application. Also cleanup is
+ * limited by the threshold. To avoid false detection of the
+ * missing HW Tx completion, get the delay since last cleanup
+ * function was called.
+ */
+ tx_cleanup_delay = rte_get_timer_cycles() -
+ tx_ring->last_cleanup_ticks;
+ if (tx_cleanup_delay < adapter->tx_cleanup_stall_delay)
+ check_for_tx_completion_in_queue(adapter, tx_ring);
+ qid = (qid + 1) % nb_tx_queues;
+ }
+
+ adapter->last_tx_comp_qid = qid;
+}
+
static void ena_timer_wd_callback(__rte_unused struct rte_timer *timer,
void *arg)
{
@@ -1547,6 +1631,7 @@ static void ena_timer_wd_callback(__rte_unused struct rte_timer *timer,
check_for_missing_keep_alive(adapter);
check_for_admin_com_state(adapter);
+ check_for_tx_completions(adapter);
if (unlikely(adapter->trigger_reset)) {
PMD_DRV_LOG(ERR, "Trigger reset is on\n");
@@ -1926,6 +2011,20 @@ static int ena_dev_configure(struct rte_eth_dev *dev)
*/
dev->data->scattered_rx = 1;
+ adapter->last_tx_comp_qid = 0;
+
+ adapter->missing_tx_completion_budget =
+ RTE_MIN(ENA_MONITORED_TX_QUEUES, dev->data->nb_tx_queues);
+
+ adapter->missing_tx_completion_to = ENA_TX_TIMEOUT;
+ /* To avoid detection of the spurious Tx completion timeout due to
+ * application not calling the Tx cleanup function, set timeout for the
+ * Tx queue which should be half of the missing completion timeout for a
+ * safety. If there will be a lot of missing Tx completions in the
+ * queue, they will be detected sooner or later.
+ */
+ adapter->tx_cleanup_stall_delay = adapter->missing_tx_completion_to / 2;
+
adapter->tx_selected_offloads = dev->data->dev_conf.txmode.offloads;
adapter->rx_selected_offloads = dev->data->dev_conf.rxmode.offloads;
@@ -2433,6 +2532,20 @@ static void ena_update_hints(struct ena_adapter *adapter,
adapter->ena_dev.mmio_read.reg_read_to =
hints->mmio_read_timeout * 1000;
+ if (hints->missing_tx_completion_timeout) {
+ if (hints->missing_tx_completion_timeout ==
+ ENA_HW_HINTS_NO_TIMEOUT) {
+ adapter->missing_tx_completion_to =
+ ENA_HW_HINTS_NO_TIMEOUT;
+ } else {
+ /* Convert from msecs to ticks */
+ adapter->missing_tx_completion_to = rte_get_timer_hz() *
+ hints->missing_tx_completion_timeout / 1000;
+ adapter->tx_cleanup_stall_delay =
+ adapter->missing_tx_completion_to / 2;
+ }
+ }
+
if (hints->driver_watchdog_timeout) {
if (hints->driver_watchdog_timeout == ENA_HW_HINTS_NO_TIMEOUT)
adapter->keep_alive_timeout = ENA_HW_HINTS_NO_TIMEOUT;
@@ -2623,6 +2736,7 @@ static int ena_xmit_mbuf(struct ena_ring *tx_ring, struct rte_mbuf *mbuf)
}
tx_info->tx_descs = nb_hw_desc;
+ tx_info->timestamp = rte_get_timer_cycles();
tx_ring->tx_stats.cnt++;
tx_ring->tx_stats.bytes += mbuf->pkt_len;
@@ -2655,6 +2769,7 @@ static void ena_tx_cleanup(struct ena_ring *tx_ring)
/* Get Tx info & store how many descs were processed */
tx_info = &tx_ring->tx_buffer_info[req_id];
+ tx_info->timestamp = 0;
mbuf = tx_info->mbuf;
rte_pktmbuf_free(mbuf);
@@ -2675,6 +2790,9 @@ static void ena_tx_cleanup(struct ena_ring *tx_ring)
ena_com_comp_ack(tx_ring->ena_com_io_sq, total_tx_descs);
ena_com_update_dev_comp_head(tx_ring->ena_com_io_cq);
}
+
+ /* Notify completion handler that the cleanup was just called */
+ tx_ring->last_cleanup_ticks = rte_get_timer_cycles();
}
static uint16_t eth_ena_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
diff --git a/drivers/net/ena/ena_ethdev.h b/drivers/net/ena/ena_ethdev.h
index 176d713dff..4f4142ed12 100644
--- a/drivers/net/ena/ena_ethdev.h
+++ b/drivers/net/ena/ena_ethdev.h
@@ -36,6 +36,10 @@
#define ENA_WD_TIMEOUT_SEC 3
#define ENA_DEVICE_KALIVE_TIMEOUT (ENA_WD_TIMEOUT_SEC * rte_get_timer_hz())
+#define ENA_TX_TIMEOUT (5 * rte_get_timer_hz())
+#define ENA_MONITORED_TX_QUEUES 3
+#define ENA_DEFAULT_MISSING_COMP 256U
+
/* While processing submitted and completed descriptors (rx and tx path
* respectively) in a loop it is desired to:
* - perform batch submissions while populating sumbissmion queue
@@ -75,6 +79,8 @@ struct ena_tx_buffer {
struct rte_mbuf *mbuf;
unsigned int tx_descs;
unsigned int num_of_bufs;
+ uint64_t timestamp;
+ bool print_once;
struct ena_com_buf bufs[ENA_PKT_MAX_BUFS];
};
@@ -103,6 +109,7 @@ struct ena_stats_tx {
u64 doorbells;
u64 bad_req_id;
u64 available_desc;
+ u64 missed_tx;
};
struct ena_stats_rx {
@@ -118,6 +125,7 @@ struct ena_stats_rx {
struct ena_ring {
u16 next_to_use;
u16 next_to_clean;
+ uint64_t last_cleanup_ticks;
enum ena_ring_type type;
enum ena_admin_placement_policy_type tx_mem_queue_type;
@@ -171,6 +179,8 @@ struct ena_ring {
};
unsigned int numa_socket_id;
+
+ uint32_t missing_tx_completion_threshold;
} __rte_cache_aligned;
enum ena_adapter_state {
@@ -291,6 +301,11 @@ struct ena_adapter {
bool wd_state;
bool use_large_llq_hdr;
+
+ uint32_t last_tx_comp_qid;
+ uint64_t missing_tx_completion_to;
+ uint64_t missing_tx_completion_budget;
+ uint64_t tx_cleanup_stall_delay;
};
int ena_rss_reta_update(struct rte_eth_dev *dev,
--
2.25.1
^ permalink raw reply [flat|nested] 29+ messages in thread
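Since the new watchdog runs from the ENA timer service, the application has to service the timer library as noted above. A minimal main-loop sketch (assumes rte_timer_subsystem_init() was already called; the period is illustrative):

#include <rte_cycles.h>
#include <rte_timer.h>

static void
lcore_main_loop(void)
{
	uint64_t prev_tsc = 0, cur_tsc;
	const uint64_t period = rte_get_timer_hz(); /* ~1 second */

	for (;;) {
		/* ... Rx/Tx burst work ... */

		cur_tsc = rte_get_timer_cycles();
		if (cur_tsc - prev_tsc > period) {
			/* Fires ena_timer_wd_callback(), which now also
			 * checks for missing Tx completions. */
			rte_timer_manage();
			prev_tsc = cur_tsc;
		}
	}
}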
* Re: [dpdk-dev] [PATCH v3 6/7] net/ena: add check for missing Tx completions
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 6/7] net/ena: add check for missing Tx completions Michal Krawczyk
@ 2021-10-19 12:40 ` Ferruh Yigit
0 siblings, 0 replies; 29+ messages in thread
From: Ferruh Yigit @ 2021-10-19 12:40 UTC (permalink / raw)
To: Michal Krawczyk
Cc: dev, upstream, shaibran, ndagan, igorch, Thomas Monjalon,
David Marchand, Stephen Hemminger
On 10/19/2021 11:56 AM, Michal Krawczyk wrote:
> +static int check_for_tx_completion_in_queue(struct ena_adapter *adapter,
> + struct ena_ring *tx_ring)
> +{
> + struct ena_tx_buffer *tx_buf;
> + uint64_t timestamp;
> + uint64_t completion_delay;
> + uint32_t missed_tx = 0;
> + unsigned int i;
> + int rc = 0;
> +
> + for (i = 0; i < tx_ring->ring_size; ++i) {
> + tx_buf = &tx_ring->tx_buffer_info[i];
> + timestamp = tx_buf->timestamp;
> +
> + if (timestamp == 0)
> + continue;
> +
> + completion_delay = rte_get_timer_cycles() - timestamp;
> + if (completion_delay > adapter->missing_tx_completion_to) {
> + if (unlikely(!tx_buf->print_once)) {
> + PMD_TX_LOG(WARNING,
> + "Found a Tx that wasn't completed on time, qid %d, index %d. Missing Tx outstanding for %" PRIu64 " msecs.\n",
This line is too long. Normally we allow long lines for logs; the
intention there is to enable the user to search for a log message in the
code, and when the line is broken the search fails.
But when there is a format specifier in the log, it already breaks the
search, so there is no point in keeping the string on a single line,
which reduces code readability.
I will break the line while merging.
^ permalink raw reply [flat|nested] 29+ messages in thread
* [dpdk-dev] [PATCH v3 7/7] net/ena: update version to 2.5.0
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
` (5 preceding siblings ...)
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 6/7] net/ena: add check for missing Tx completions Michal Krawczyk
@ 2021-10-19 10:56 ` Michal Krawczyk
2021-10-19 13:05 ` [dpdk-dev] [PATCH v3 0/7] net/ena: update ENA PMD to v2.5.0 Ferruh Yigit
7 siblings, 0 replies; 29+ messages in thread
From: Michal Krawczyk @ 2021-10-19 10:56 UTC (permalink / raw)
To: ferruh.yigit; +Cc: dev, upstream, shaibran, ndagan, igorch, Michal Krawczyk
This version update contains:
* Fix for verification of the offload capabilities (especially for
IPv6 packets).
* Support for Tx and Rx free threshold values.
* Fixes for per-queue offload capabilities.
* Announce support of the scattered Rx offload.
* NUMA aware allocations.
* Check for the missing Tx completions.
Signed-off-by: Michal Krawczyk <mk@semihalf.com>
---
drivers/net/ena/ena_ethdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 1a70cd781c..4d2f7d727c 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -21,7 +21,7 @@
#include <ena_eth_io_defs.h>
#define DRV_MODULE_VER_MAJOR 2
-#define DRV_MODULE_VER_MINOR 4
+#define DRV_MODULE_VER_MINOR 5
#define DRV_MODULE_VER_SUBMINOR 0
#define __MERGE_64B_H_L(h, l) (((uint64_t)h << 32) | l)
--
2.25.1
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/7] net/ena: update ENA PMD to v2.5.0
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 " Michal Krawczyk
` (6 preceding siblings ...)
2021-10-19 10:56 ` [dpdk-dev] [PATCH v3 7/7] net/ena: update version to 2.5.0 Michal Krawczyk
@ 2021-10-19 13:05 ` Ferruh Yigit
7 siblings, 0 replies; 29+ messages in thread
From: Ferruh Yigit @ 2021-10-19 13:05 UTC (permalink / raw)
To: Michal Krawczyk; +Cc: dev, upstream, shaibran, ndagan, igorch
On 10/19/2021 11:56 AM, Michal Krawczyk wrote:
> Hi,
>
> this version updates the driver to version 2.5.0. It mainly focuses on
> fixes for the offload flags. Other features included in this patchset
> are:
>
> * NUMA aware allocations for the queue specific structures
> * New watchdog - check for missing Tx completions
> * Support for [tr]x_free_thresh configuration parameters
>
> Regards,
> Michal
>
> v3:
> * Rebase series on top of recent changed to resolve conflicts regarding the
> driver's offload flags (especially the DEV_RX_OFFLOAD_JUMBO_FRAME).
>
> Michal Krawczyk (7):
> net/ena: fix verification of the offload capabilities
> net/ena: support Tx/Rx free thresholds
> net/ena: fix per-queue offload capabilities
> net/ena: indicate missing scattered Rx capability
> net/ena: add NUMA aware allocations
> net/ena: add check for missing Tx completions
> net/ena: update version to 2.5.0
>
Series applied to dpdk-next-net/main, thanks.
^ permalink raw reply [flat|nested] 29+ messages in thread