From: Mattias Rönnblom
To: Ferruh Yigit, "John W. Linville"
Cc: Thomas Monjalon, dev@dpdk.org, Mattias Rönnblom, Stephen Hemminger, Morten Brørup
Subject: Re: [RFC v2] net/af_packet: make stats reset reliable
Date: Wed, 8 May 2024 08:25:35 +0200
Message-ID: <4df0898d-f921-4d55-aadc-d0a62b261c14@lysator.liu.se>
References: <20240425174617.2126159-1-ferruh.yigit@amd.com>
 <20240426143848.2280689-1-ferruh.yigit@amd.com>
 <108e0c40-33e7-4eed-83de-eaedee454480@lysator.liu.se>
 <238675d1-b0bb-41df-8338-a1052c1a88c1@lysator.liu.se>

On 2024-05-07 15:49, Ferruh Yigit wrote:
> On 5/7/2024 8:23 AM, Mattias Rönnblom wrote:
>> On 2024-04-28 17:11, Mattias Rönnblom wrote:
>>> On 2024-04-26 16:38, Ferruh Yigit wrote:
>>>> For stats reset, use an offset instead of zeroing out the actual
>>>> stats values; get_stats() displays the diff between the stats and
>>>> the offset.
>>>> This way, stats are only updated in the datapath and the offset is
>>>> only updated in the stats reset function. This makes the stats
>>>> reset function more reliable.
>>>>
>>>> As stats are only written by a single thread, we can remove the
>>>> 'volatile' qualifier, which should improve performance in the
>>>> datapath.
>>>>
>>>
>>> volatile wouldn't help you if you had multiple writers, so that can't
>>> be the reason for its removal. It would be more accurate to say it
>>> should be replaced with atomic updates. If you don't use volatile and
>>> don't use atomics, you have to consider whether the compiler can
>>> reach the conclusion that it does not need to store the counter value
>>> for future use *for that thread*. Otherwise, I don't think the store
>>> actually needs to occur. Since DPDK statistics tend to work, it's
>>> pretty obvious that current compilers tend not to reach this
>>> conclusion.
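
To make that concrete, here is a minimal sketch (hypothetical names,
not from the driver) of what the compiler is allowed to do with a
plain, non-volatile, non-atomic counter:

#include <stdint.h>

uint64_t rx_pkts; /* read by a stats-collecting thread */

void
count_bursts(const uint16_t *burst_sizes, int n)
{
    int i;

    /* With neither volatile nor atomics, the compiler may keep the
     * running sum in a register and store to rx_pkts only once,
     * after the loop, or never, if it can prove the value is not
     * used by this thread again. A concurrent reader then sees
     * stale updates, or none at all. */
    for (i = 0; i < n; i++)
        rx_pkts += burst_sizes[i];
}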
>>>
>>> If this should be done 100% properly, the update operation should be
>>> a non-atomic load, a non-atomic add, and an atomic store. Similarly,
>>> for the reset, the offset store should be atomic.
>>>
>>> Considering the state of the rest of the DPDK code base, I think a
>>> non-atomic, non-volatile solution is also fine.
>>>
>>> (That said, I think we're better off just deprecating stats reset
>>> altogether, and returning -ENOTSUP here meanwhile.)
>>>
>>>> Signed-off-by: Ferruh Yigit
>>>> ---
>>>> Cc: Mattias Rönnblom
>>>> Cc: Stephen Hemminger
>>>> Cc: Morten Brørup
>>>>
>>>> This update was triggered by the mail list discussion [1].
>>>>
>>>> [1] https://inbox.dpdk.org/dev/3b2cf48e-2293-4226-b6cd-5f4dd3969f99@lysator.liu.se/
>>>>
>>>> v2:
>>>> * Remove wrapping check for stats
>>>> ---
>>>>  drivers/net/af_packet/rte_eth_af_packet.c | 66 ++++++++++++++---------
>>>>  1 file changed, 41 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> index 397a32db5886..10c8e1e50139 100644
>>>> --- a/drivers/net/af_packet/rte_eth_af_packet.c
>>>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> @@ -51,8 +51,10 @@ struct pkt_rx_queue {
>>>>      uint16_t in_port;
>>>>      uint8_t vlan_strip;
>>>>
>>>> -    volatile unsigned long rx_pkts;
>>>> -    volatile unsigned long rx_bytes;
>>>> +    uint64_t rx_pkts;
>>>> +    uint64_t rx_bytes;
>>>> +    uint64_t rx_pkts_offset;
>>>> +    uint64_t rx_bytes_offset;
>>>
>>> I suggest you introduce a separate struct for reset-able counters.
>>> It'll make things cleaner, and you can sneak in atomics without too
>>> much atomics-related bloat.
>>>
>>> struct counter
>>> {
>>>     uint64_t count;
>>>     uint64_t offset;
>>> };
>>>
>>> /../
>>>     struct counter rx_pkts;
>>>     struct counter rx_bytes;
>>> /../
>>>
>>> static uint64_t
>>> counter_value(struct counter *counter)
>>> {
>>>     uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
>>>     uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);
>>>
>>
>> Since the count and the offset are written to independently, without
>> any ordering restrictions, an update and a reset in quick succession
>> may cause the offset store to be globally visible before the new
>> count. In such a scenario, a reader could see an offset > count.
>>
>> Thus, unless I'm missing something, one should add a
>>
>> if (unlikely(offset > count))
>>     return 0;
>>
>> here, with an appropriate comment explaining why this might happen.
>>
>> Another approach would be to think about what memory barriers may be
>> required to make sure one sees the count update before the offset
>> update, but, intuitively, that seems both more complex and more
>> costly (performance-wise).
>>
>
> We are going with the lazy alternative and requesting that forwarding
> be stopped before a stats reset; this should prevent 'count' and
> 'offset' from being updated simultaneously.
>
> In that case, 'offset' is not needed.

>>>     return count - offset;
>>> }
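
For reference, here is the read function above with the guard folded
in; a sketch only, not part of the patch. unlikely() is DPDK's branch
hint from <rte_branch_prediction.h>.

#include <stdint.h>
#include <rte_branch_prediction.h>

struct counter {
    uint64_t count;
    uint64_t offset;
};

static uint64_t
counter_value(struct counter *counter)
{
    uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
    uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);

    /* A reset racing with an update may make the offset store
     * globally visible before the count it was derived from;
     * report 0 rather than a bogus, huge value. */
    if (unlikely(offset > count))
        return 0;

    return count - offset;
}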
>>>
>>> static void
>>> counter_reset(struct counter *counter)
>>> {
>>>     uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
>>>
>>>     __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
>>> }
>>>
>>> static void
>>> counter_add(struct counter *counter, uint64_t operand)
>>> {
>>>     __atomic_store_n(&counter->count, counter->count + operand,
>>>         __ATOMIC_RELAXED);
>>> }
>>>
>>> You'd have to port this to <rte_stdatomic.h> calls, since non-atomic
>>> loads from RTE_ATOMIC() variables are not permitted. The non-atomic
>>> reads above must be replaced with explicit relaxed atomic loads.
>>> Otherwise, if you just use "counter->count", that would be an atomic
>>> load with sequential-consistency memory order on C11 atomics-based
>>> builds, which would result in a barrier, at least on weakly ordered
>>> machines (e.g., ARM).
>>>
>>> I would still use a struct and some helper functions even for the
>>> less ambitious, non-atomic variant.
>>>
>>> The only drawback of using GCC built-in-type atomics here, versus an
>>> atomic- and volatile-free approach, is that current compilers seem
>>> to refuse to merge atomic stores. It's beyond me why this is the
>>> case. If you store to a variable twice in quick succession, it'll be
>>> two store machine instructions, even in cases where the compiler
>>> *knows* the value is identical. So you get volatile semantics, even
>>> though you didn't ask for them. Weird.
>>>
>>> So if you have a loop, you may want to do a single counter_add() at
>>> the end, from a temporary, to get the final 0.001% of performance.
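
As a sketch of that pattern (assuming the struct counter and
counter_add() above; the mbuf-based byte accounting is just an
example):

#include <stdint.h>
#include <rte_mbuf.h>

/* Requires the struct counter and counter_add() definitions above. */
static void
count_rx_burst(struct counter *pkts, struct counter *bytes,
               struct rte_mbuf **bufs, uint16_t nb_rx)
{
    uint64_t rx_bytes = 0;
    uint16_t i;

    for (i = 0; i < nb_rx; i++)
        rx_bytes += rte_pktmbuf_pkt_len(bufs[i]);

    /* One relaxed atomic store per counter per burst, rather than
     * one per packet. */
    counter_add(pkts, nb_rx);
    counter_add(bytes, rx_bytes);
}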
>>>
>>> If the tech board thinks MT-safe, reset-able, software-managed
>>> statistics are the future (as opposed to dropping reset support, for
>>> example), I think this stuff should go into a separate header file,
>>> so other PMDs can reuse it. Maybe out of scope for this patch.
>>>
>>>>  };
>>>>
>>>>  struct pkt_tx_queue {
>>>> @@ -64,9 +66,12 @@ struct pkt_tx_queue {
>>>>      unsigned int framecount;
>>>>      unsigned int framenum;
>>>>
>>>> -    volatile unsigned long tx_pkts;
>>>> -    volatile unsigned long err_pkts;
>>>> -    volatile unsigned long tx_bytes;
>>>> +    uint64_t tx_pkts;
>>>> +    uint64_t err_pkts;
>>>> +    uint64_t tx_bytes;
>>>> +    uint64_t tx_pkts_offset;
>>>> +    uint64_t err_pkts_offset;
>>>> +    uint64_t tx_bytes_offset;
>>>>  };
>>>>
>>>>  struct pmd_internals {
>>>> @@ -385,8 +390,15 @@ eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>>>>      return 0;
>>>>  }
>>>>
>>>> +
>>>> +static uint64_t
>>>> +stats_get_diff(uint64_t stats, uint64_t offset)
>>>> +{
>>>> +    return stats - offset;
>>>> +}
>>>> +
>>>>  static int
>>>> -eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
>>>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>>>>  {
>>>>      unsigned i, imax;
>>>>      unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
>>>> @@ -396,27 +408,29 @@ eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
>>>>      imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
>>>>              internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS);
>>>>      for (i = 0; i < imax; i++) {
>>>> -        igb_stats->q_ipackets[i] = internal->rx_queue[i].rx_pkts;
>>>> -        igb_stats->q_ibytes[i] = internal->rx_queue[i].rx_bytes;
>>>> -        rx_total += igb_stats->q_ipackets[i];
>>>> -        rx_bytes_total += igb_stats->q_ibytes[i];
>>>> +        struct pkt_rx_queue *rxq = &internal->rx_queue[i];
>>>> +        stats->q_ipackets[i] = stats_get_diff(rxq->rx_pkts, rxq->rx_pkts_offset);
>>>> +        stats->q_ibytes[i] = stats_get_diff(rxq->rx_bytes, rxq->rx_bytes_offset);
>>>> +        rx_total += stats->q_ipackets[i];
>>>> +        rx_bytes_total += stats->q_ibytes[i];
>>>>      }
>>>>
>>>>      imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
>>>>              internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS);
>>>>      for (i = 0; i < imax; i++) {
>>>> -        igb_stats->q_opackets[i] = internal->tx_queue[i].tx_pkts;
>>>> -        igb_stats->q_obytes[i] = internal->tx_queue[i].tx_bytes;
>>>> -        tx_total += igb_stats->q_opackets[i];
>>>> -        tx_err_total += internal->tx_queue[i].err_pkts;
>>>> -        tx_bytes_total += igb_stats->q_obytes[i];
>>>> +        struct pkt_tx_queue *txq = &internal->tx_queue[i];
>>>> +        stats->q_opackets[i] = stats_get_diff(txq->tx_pkts, txq->tx_pkts_offset);
>>>> +        stats->q_obytes[i] = stats_get_diff(txq->tx_bytes, txq->tx_bytes_offset);
>>>> +        tx_total += stats->q_opackets[i];
>>>> +        tx_err_total += stats_get_diff(txq->err_pkts, txq->err_pkts_offset);
>>>> +        tx_bytes_total += stats->q_obytes[i];
>>>>      }
>>>>
>>>> -    igb_stats->ipackets = rx_total;
>>>> -    igb_stats->ibytes = rx_bytes_total;
>>>> -    igb_stats->opackets = tx_total;
>>>> -    igb_stats->oerrors = tx_err_total;
>>>> -    igb_stats->obytes = tx_bytes_total;
>>>> +    stats->ipackets = rx_total;
>>>> +    stats->ibytes = rx_bytes_total;
>>>> +    stats->opackets = tx_total;
>>>> +    stats->oerrors = tx_err_total;
>>>> +    stats->obytes = tx_bytes_total;
>>>>
>>>>      return 0;
>>>>  }
>>>>
>>>> @@ -427,14 +441,16 @@ eth_stats_reset(struct rte_eth_dev *dev)
>>>>      struct pmd_internals *internal = dev->data->dev_private;
>>>>
>>>>      for (i = 0; i < internal->nb_queues; i++) {
>>>> -        internal->rx_queue[i].rx_pkts = 0;
>>>> -        internal->rx_queue[i].rx_bytes = 0;
>>>> +        struct pkt_rx_queue *rxq = &internal->rx_queue[i];
>>>> +        rxq->rx_pkts_offset = rxq->rx_pkts;
>>>> +        rxq->rx_bytes_offset = rxq->rx_bytes;
>>>>      }
>>>>
>>>>      for (i = 0; i < internal->nb_queues; i++) {
>>>> -        internal->tx_queue[i].tx_pkts = 0;
>>>> -        internal->tx_queue[i].err_pkts = 0;
>>>> -        internal->tx_queue[i].tx_bytes = 0;
>>>> +        struct pkt_tx_queue *txq = &internal->tx_queue[i];
>>>> +        txq->tx_pkts_offset = txq->tx_pkts;
>>>> +        txq->err_pkts_offset = txq->err_pkts;
>>>> +        txq->tx_bytes_offset = txq->tx_bytes;
>>>>      }
>>>>
>>>>      return 0;
>
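
For readers following the thread: the reset-by-offset scheme the patch
implements boils down to this standalone toy model (my sketch, not
driver code):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* The datapath only ever increments count; reset only snapshots count
 * into offset; readers report the difference. */
struct stat { uint64_t count; uint64_t offset; };

static uint64_t stat_read(const struct stat *s) { return s->count - s->offset; }
static void stat_reset(struct stat *s) { s->offset = s->count; }

int
main(void)
{
    struct stat rx_pkts = { 0, 0 };

    rx_pkts.count += 100;                         /* datapath update */
    printf("%" PRIu64 "\n", stat_read(&rx_pkts)); /* prints 100 */
    stat_reset(&rx_pkts);                         /* control-path reset */
    rx_pkts.count += 5;
    printf("%" PRIu64 "\n", stat_read(&rx_pkts)); /* prints 5 */

    return 0;
}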