From: luca.boccassi@gmail.com
To: Dariusz Sosnowski
Cc: Ori Kam, dpdk stable
Subject: patch 'net/mlx5: fix counter query loop getting stuck' has been queued to stable release 22.11.7
Date: Wed, 20 Nov 2024 23:41:43 +0000
Message-ID: <20241120234215.233355-18-luca.boccassi@gmail.com>
In-Reply-To: <20241120234215.233355-1-luca.boccassi@gmail.com>
References: <20241112220754.666489-40-luca.boccassi@gmail.com> <20241120234215.233355-1-luca.boccassi@gmail.com>
Hi,

FYI, your patch has been queued to stable release 22.11.7

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 11/22/24. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.

Queued patches are on a temporary branch at:
https://github.com/bluca/dpdk-stable

This queued commit can be viewed at:
https://github.com/bluca/dpdk-stable/commit/0e162f1a25b59dbb3c40455a0249154e0532b61f

Thanks.

Luca Boccassi

---
>From 0e162f1a25b59dbb3c40455a0249154e0532b61f Mon Sep 17 00:00:00 2001
From: Dariusz Sosnowski
Date: Wed, 30 Oct 2024 17:30:46 +0100
Subject: [PATCH] net/mlx5: fix counter query loop getting stuck

[ upstream commit c0e29968294c92ca15fdb34ce63fbba01c4562a6 ]

Counter service thread, responsible for refreshing counter values
stored in host memory, is running an "infinite loop" with the following
logic:

- For each port:
  - Refresh port's counter pool - call to __mlx5_hws_cnt_svc().
  - Perform aging checks.
- Go to sleep if time left in current cycle.
- Repeat.

__mlx5_hws_cnt_svc(), used to perform counter value refresh,
implemented the following logic:

1. Store number of counters waiting for reset.
2. Issue ASO WQEs to refresh all counter values.
3. Move counters from reset to reuse list. Number of moved counters
   is limited by number stored in step 1 or step 4.
4. Store number of counters waiting for reset.
5. If number of counters waiting for reset > 0, go to step 2.

Now, if an application constantly creates/destroys flow rules with
counters and even a single counter is added to reset list during
step 2, counter service thread might end up issuing ASO WQEs endlessly,
without going to sleep and respecting the configured cycle time.

This patch fixes that by removing the loop inside __mlx5_hws_cnt_svc().
As a drawback of this fix, the application must allocate enough
counters to accommodate the cycle time. This number is roughly equal
to the expected counter release rate.

This patch also:

- Ensures that a proper counter related error code is returned
  when flow rule create fails due to a counter allocation problem.
- Adds debug logging to counter service thread.
- Adds documentation for counter service thread.

Fixes: 4d368e1da3a4 ("net/mlx5: support flow counter action for HWS")

Signed-off-by: Dariusz Sosnowski
Acked-by: Ori Kam
---
 doc/guides/nics/mlx5.rst        | 71 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_flow_hw.c | 17 +++++---
 drivers/net/mlx5/mlx5_hws_cnt.c | 46 ++++++++++++---------
 3 files changed, 110 insertions(+), 24 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index b047d7db58..d2f741a472 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -1589,6 +1589,77 @@ directly but neither destroyed nor flushed.
 The application should re-create the flows as required after the port restart.
 
+Notes for flow counters
+-----------------------
+
+mlx5 PMD supports the ``COUNT`` flow action,
+which provides an ability to count packets (and bytes)
+matched against a given flow rule.
+This section describes the high level overview of
+how this support is implemented and limitations.
+
+HW steering flow engine
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Flow counters are allocated from HW in bulks.
+A set of bulks forms a flow counter pool managed by PMD.
+When flow counters are queried from HW,
+each counter is identified by an offset in a given bulk.
+Querying HW flow counter requires sending a request to HW,
+which will request a read of counter values for given offsets.
+HW will asynchronously provide these values through a DMA write.
+
+In order to optimize HW to SW communication,
+these requests are handled in a separate counter service thread
+spawned by mlx5 PMD.
+This service thread will refresh the counter values stored in memory,
+in cycles, each spanning ``svc_cycle_time`` milliseconds.
+By default, ``svc_cycle_time`` is set to 500.
+When applications query the ``COUNT`` flow action,
+PMD returns the values stored in host memory.
+
+mlx5 PMD manages 3 global rings of allocated counter offsets:
+
+- ``free`` ring - Counters which were not used at all.
+- ``wait_reset`` ring - Counters which were used in some flow rules,
+  but were recently freed (flow rule was destroyed
+  or an indirect action was destroyed).
+  Since the count value might have changed
+  between the last counter service thread cycle and the moment it was freed,
+  the value in host memory might be stale.
+  During the next service thread cycle,
+  such counters will be moved to ``reuse`` ring.
+- ``reuse`` ring - Counters which were used at least once
+  and can be reused in new flow rules.
+
+When counters are assigned to a flow rule (or allocated to indirect action),
+the PMD first tries to fetch a counter from ``reuse`` ring.
+If it's empty, the PMD fetches a counter from ``free`` ring.
+
+The counter service thread works as follows:
+
+#. Record counters stored in ``wait_reset`` ring.
+#. Read values of all counters which were used at least once
+   or are currently in use.
+#. Move recorded counters from ``wait_reset`` to ``reuse`` ring.
+#. Sleep for ``svc_cycle_time - (query time)`` milliseconds.
+#. Repeat.
+
+Because freeing a counter (by destroying a flow rule or destroying indirect action)
+does not immediately make it available for the application,
+the PMD might return:
+
+- ``ENOENT`` if no counter is available in ``free``, ``reuse``
+  or ``wait_reset`` rings.
+  No counter will be available until the application releases some of them.
+- ``EAGAIN`` if no counter is available in ``free`` and ``reuse`` rings,
+  but there are counters in ``wait_reset`` ring.
+  This means that after the next service thread cycle new counters will be available.
+
+The application has to be aware that flow rule create or indirect action create
+might need to be retried.
+
+
 Notes for hairpin
 -----------------
 
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index bff95133cf..dd64cb224f 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -2377,8 +2377,11 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
 		case RTE_FLOW_ACTION_TYPE_COUNT:
 			cnt_queue = mlx5_hws_cnt_get_queue(priv, &queue);
 			ret = mlx5_hws_cnt_pool_get(priv->hws_cpool, cnt_queue, &cnt_id, age_idx);
-			if (ret != 0)
+			if (ret != 0) {
+				rte_flow_error_set(error, -ret, RTE_FLOW_ERROR_TYPE_ACTION,
+						action, "Failed to allocate flow counter");
 				goto error;
+			}
 			ret = mlx5_hws_cnt_pool_get_action_offset
 				(priv->hws_cpool,
 				 cnt_id,
@@ -2571,6 +2574,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	struct mlx5_hw_q_job *job;
 	const struct rte_flow_item *rule_items;
 	uint32_t flow_idx;
+	struct rte_flow_error sub_error = { 0 };
 	int ret;
 
 	if (unlikely(!priv->hw_q[queue].job_idx)) {
@@ -2611,7 +2615,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
 	if (flow_hw_actions_construct(dev, job,
 				      &table->ats[action_template_index],
 				      pattern_template_index, actions,
-				      rule_acts, queue, error))
+				      rule_acts, queue, &sub_error))
 		goto free;
 	rule_items = flow_hw_get_rule_items(dev, table, items,
 					    pattern_template_index, job);
@@ -2628,9 +2632,12 @@ free:
 		mlx5_ipool_free(table->flow, flow_idx);
 	priv->hw_q[queue].job_idx++;
 error:
-	rte_flow_error_set(error, rte_errno,
-			   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-			   "fail to create rte flow");
+	if (sub_error.cause != RTE_FLOW_ERROR_TYPE_NONE && error != NULL)
+		*error = sub_error;
+	else
+		rte_flow_error_set(error, rte_errno,
+				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   "fail to create rte flow");
 	return NULL;
 }
 
diff --git a/drivers/net/mlx5/mlx5_hws_cnt.c b/drivers/net/mlx5/mlx5_hws_cnt.c
index 8415aa411f..3250255727 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.c
+++ b/drivers/net/mlx5/mlx5_hws_cnt.c
@@ -72,26 +72,29 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
 	uint32_t ret __rte_unused;
 
 	reset_cnt_num = rte_ring_count(reset_list);
-	do {
-		cpool->query_gen++;
-		mlx5_aso_cnt_query(sh, cpool);
-		zcdr.n1 = 0;
-		zcdu.n1 = 0;
-		ret = rte_ring_enqueue_zc_burst_elem_start(reuse_list,
-				sizeof(cnt_id_t),
-				reset_cnt_num, &zcdu,
-				NULL);
-		MLX5_ASSERT(ret == reset_cnt_num);
-		ret = rte_ring_dequeue_zc_burst_elem_start(reset_list,
-				sizeof(cnt_id_t),
-				reset_cnt_num, &zcdr,
-				NULL);
-		MLX5_ASSERT(ret == reset_cnt_num);
-		__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
-		rte_ring_dequeue_zc_elem_finish(reset_list, reset_cnt_num);
-		rte_ring_enqueue_zc_elem_finish(reuse_list, reset_cnt_num);
+	cpool->query_gen++;
+	mlx5_aso_cnt_query(sh, cpool);
+	zcdr.n1 = 0;
+	zcdu.n1 = 0;
+	ret = rte_ring_enqueue_zc_burst_elem_start(reuse_list,
+			sizeof(cnt_id_t),
+			reset_cnt_num, &zcdu,
+			NULL);
+	MLX5_ASSERT(ret == reset_cnt_num);
+	ret = rte_ring_dequeue_zc_burst_elem_start(reset_list,
+			sizeof(cnt_id_t),
+			reset_cnt_num, &zcdr,
+			NULL);
+	MLX5_ASSERT(ret == reset_cnt_num);
+	__hws_cnt_r2rcpy(&zcdu, &zcdr, reset_cnt_num);
+	rte_ring_dequeue_zc_elem_finish(reset_list, reset_cnt_num);
+	rte_ring_enqueue_zc_elem_finish(reuse_list, reset_cnt_num);
+
+	if (rte_log_can_log(mlx5_logtype, RTE_LOG_DEBUG)) {
 		reset_cnt_num = rte_ring_count(reset_list);
-	} while (reset_cnt_num > 0);
+		DRV_LOG(DEBUG, "ibdev %s cpool %p wait_reset_cnt=%" PRIu32,
+			sh->ibdev_name, (void *)cpool, reset_cnt_num);
+	}
 }
 
 /**
@@ -331,6 +334,11 @@ mlx5_hws_cnt_svc(void *opaque)
 		rte_spinlock_unlock(&sh->cpool_lock);
 		query_us = query_cycle / (rte_get_timer_hz() / US_PER_S);
 		sleep_us = interval - query_us;
+		DRV_LOG(DEBUG, "ibdev %s counter service thread: "
+			"interval_us=%" PRIu64 " query_us=%" PRIu64 " "
+			"sleep_us=%" PRIu64,
+			sh->ibdev_name, interval, query_us,
+			interval > query_us ? sleep_us : 0);
 		if (interval > query_us)
 			rte_delay_us_sleep(sleep_us);
 	}
-- 
2.45.2

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2024-11-20 23:41:23.344119428 +0000
+++ 0018-net-mlx5-fix-counter-query-loop-getting-stuck.patch	2024-11-20 23:41:22.732195467 +0000
@@ -1 +1 @@
-From c0e29968294c92ca15fdb34ce63fbba01c4562a6 Mon Sep 17 00:00:00 2001
+From 0e162f1a25b59dbb3c40455a0249154e0532b61f Mon Sep 17 00:00:00 2001
@@ -5,0 +6,2 @@
+[ upstream commit c0e29968294c92ca15fdb34ce63fbba01c4562a6 ]
+
@@ -46 +47,0 @@
-Cc: stable@dpdk.org
@@ -57 +58 @@
-index b1d6863f36..145c01fbda 100644
+index b047d7db58..d2f741a472 100644
@@ -60 +61 @@
-@@ -2021,6 +2021,77 @@ directly but neither destroyed nor flushed.
+@@ -1589,6 +1589,77 @@ directly but neither destroyed nor flushed.
@@ -139 +140 @@
-index 488ef4ce3c..6ad98d40f7 100644
+index bff95133cf..dd64cb224f 100644
@@ -142 +143 @@
-@@ -3734,8 +3734,11 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
+@@ -2377,8 +2377,11 @@ flow_hw_actions_construct(struct rte_eth_dev *dev,
@@ -155,3 +156,2 @@
-@@ -3980,6 +3983,7 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
- 	struct mlx5dr_rule_action *rule_acts;
- 	struct rte_flow_hw *flow = NULL;
+@@ -2571,6 +2574,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
+ 	struct mlx5_hw_q_job *job;
@@ -158,0 +159 @@
+ 	uint32_t flow_idx;
@@ -160,2 +160,0 @@
- 	uint32_t flow_idx = 0;
- 	uint32_t res_idx = 0;
@@ -163 +162,4 @@
-@@ -4037,7 +4041,7 @@ flow_hw_async_flow_create_generic(struct rte_eth_dev *dev,
+ 
+ 	if (unlikely(!priv->hw_q[queue].job_idx)) {
+@@ -2611,7 +2615,7 @@ flow_hw_async_flow_create(struct rte_eth_dev *dev,
+ 	if (flow_hw_actions_construct(dev, job,
@@ -165,2 +167 @@
- 				  table->its[pattern_template_index]->item_flags,
- 				  flow->table, actions,
+ 				      pattern_template_index, actions,
@@ -169 +170 @@
- 		goto error;
+ 		goto free;
@@ -171,5 +172,5 @@
- 					  pattern_template_index, &priv->hw_q[queue].pp);
-@@ -4074,9 +4078,12 @@ error:
- 	mlx5_ipool_free(table->resource, res_idx);
- 	if (flow_idx)
- 		mlx5_ipool_free(table->flow, flow_idx);
+ 					    pattern_template_index, job);
+@@ -2628,9 +2632,12 @@ free:
+ 		mlx5_ipool_free(table->flow, flow_idx);
+ 	priv->hw_q[queue].job_idx++;
+ error:
@@ -189 +190 @@
-index def0b19deb..0197c098f6 100644
+index 8415aa411f..3250255727 100644
@@ -192 +193 @@
-@@ -56,26 +56,29 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
+@@ -72,26 +72,29 @@ __mlx5_hws_cnt_svc(struct mlx5_dev_ctx_shared *sh,
@@ -241 +242 @@
-@@ -325,6 +328,11 @@ mlx5_hws_cnt_svc(void *opaque)
+@@ -331,6 +334,11 @@ mlx5_hws_cnt_svc(void *opaque)
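
Editor's note, not part of the queued patch: the documentation added above says that a flow rule
or indirect action create may fail with EAGAIN until the next counter service cycle. Below is a
minimal sketch of how an application could react to that; the helper name, the retry budget and
the fixed template indexes are assumptions made for illustration, and only public DPDK APIs
(rte_flow_async_create, rte_errno, rte_delay_us_sleep) are used.

#include <errno.h>
#include <rte_cycles.h>
#include <rte_errno.h>
#include <rte_flow.h>

/*
 * Illustrative only: create a flow rule carrying a COUNT action and retry
 * when the PMD reports EAGAIN, i.e. counters exist but are still sitting
 * on the wait_reset ring until the next counter service cycle.
 * ENOENT (no counters left at all) is not retried.
 */
static struct rte_flow *
create_counted_flow_with_retry(uint16_t port_id, uint32_t queue_id,
			       struct rte_flow_template_table *table,
			       const struct rte_flow_item pattern[],
			       const struct rte_flow_action actions[],
			       void *user_data)
{
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	struct rte_flow_error error = { 0 };
	struct rte_flow *flow;
	int retries = 4; /* arbitrary budget for this sketch */

	while (retries-- > 0) {
		flow = rte_flow_async_create(port_id, queue_id, &op_attr, table,
					     pattern, 0 /* pattern template */,
					     actions, 0 /* actions template */,
					     user_data, &error);
		if (flow != NULL)
			return flow; /* enqueued; result arrives via rte_flow_pull() */
		if (rte_errno != EAGAIN)
			break; /* e.g. ENOENT: retrying will not help */
		/* Freed counters only become reusable once per service cycle
		 * (svc_cycle_time, 500 ms by default), so wait one cycle. */
		rte_delay_us_sleep(500 * 1000);
	}
	return NULL;
}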
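
A second illustrative note on the sizing drawback called out in the commit message: counters
freed during one service cycle only return to the reuse ring at the end of that cycle, so the
pool needs roughly one cycle's worth of released counters as headroom on top of the counters
held by active rules. The helper below is a back-of-envelope sketch with made-up names, not an
mlx5 PMD API.

#include <stdint.h>

/*
 * Rough sizing hint: steady-state counters in use plus the counters
 * expected to be released during one svc_cycle_time window.
 */
static inline uint32_t
hws_counter_pool_size_hint(uint32_t rules_holding_counters,
			   uint32_t releases_per_second,
			   uint32_t svc_cycle_time_ms)
{
	uint64_t headroom = ((uint64_t)releases_per_second * svc_cycle_time_ms) / 1000;

	return rules_holding_counters + (uint32_t)headroom;
}

For example, at 100000 rule deletions per second and the default 500 ms cycle, this budgets
roughly 50000 extra counters on top of the steady-state count.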