From: Mattias Rönnblom
To: Ferruh Yigit, "John W. Linville"
Cc: Thomas Monjalon, dev@dpdk.org, Mattias Rönnblom, Stephen Hemminger, Morten Brørup
Subject: Re: [RFC v2] net/af_packet: make stats reset reliable
Date: Wed, 8 May 2024 08:25:35 +0200
Message-ID: <4df0898d-f921-4d55-aadc-d0a62b261c14@lysator.liu.se>
References: <20240425174617.2126159-1-ferruh.yigit@amd.com>
 <20240426143848.2280689-1-ferruh.yigit@amd.com>
 <108e0c40-33e7-4eed-83de-eaedee454480@lysator.liu.se>
 <238675d1-b0bb-41df-8338-a1052c1a88c1@lysator.liu.se>

On 2024-05-07 15:49, Ferruh Yigit wrote:
> On 5/7/2024 8:23 AM, Mattias Rönnblom wrote:
>> On 2024-04-28 17:11, Mattias Rönnblom wrote:
>>> On 2024-04-26 16:38, Ferruh Yigit wrote:
>>>> For stats reset, use an offset instead of zeroing out the actual
>>>> stats values; get_stats() displays the diff between the stats and
>>>> the offset.
>>>> This way, stats are only updated in the datapath and the offset is
>>>> only updated in the stats reset function. This makes the stats
>>>> reset function more reliable.
>>>>
>>>> As stats are only written by a single thread, we can remove the
>>>> 'volatile' qualifier, which should improve performance in the
>>>> datapath.
>>>>
>>>
>>> volatile wouldn't help you if you had multiple writers, so that can't
>>> be the reason for its removal. It would be more accurate to say it
>>> should be replaced with atomic updates. If you don't use volatile and
>>> don't use atomics, you have to consider whether the compiler can
>>> reach the conclusion that it does not need to store the counter value
>>> for future use *for that thread*. Otherwise, I don't think the store
>>> actually needs to occur. Since DPDK statistics tend to work, it's
>>> pretty obvious that current compilers tend not to reach this
>>> conclusion.
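
To make that concrete, here is a minimal sketch (hypothetical names,
not from the driver) of what the compiler is allowed to do with a
plain, non-volatile, non-atomic counter:

#include <stdint.h>

uint64_t rx_pkts; /* read by a stats-collecting thread */

void
count_bursts(const uint16_t *burst_sizes, int n)
{
    int i;

    /* With neither volatile nor atomics, the compiler may keep the
     * running sum in a register and store to rx_pkts only once,
     * after the loop, or never, if it can prove the value is not
     * used by this thread again. A concurrent reader then sees
     * stale updates, or none at all. */
    for (i = 0; i < n; i++)
        rx_pkts += burst_sizes[i];
}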
>>>
>>> If this should be done 100% properly, the update operation should be
>>> a non-atomic load, a non-atomic add, and an atomic store. Similarly,
>>> for the reset, the offset store should be atomic.
>>>
>>> Considering the state of the rest of the DPDK code base, I think a
>>> non-atomic, non-volatile solution is also fine.
>>>
>>> (That said, I think we're better off just deprecating stats reset
>>> altogether, and returning -ENOTSUP here meanwhile.)
>>>
>>>> Signed-off-by: Ferruh Yigit
>>>> ---
>>>> Cc: Mattias Rönnblom
>>>> Cc: Stephen Hemminger
>>>> Cc: Morten Brørup
>>>>
>>>> This update was triggered by the mail list discussion [1].
>>>>
>>>> [1] https://inbox.dpdk.org/dev/3b2cf48e-2293-4226-b6cd-5f4dd3969f99@lysator.liu.se/
>>>>
>>>> v2:
>>>> * Remove wrapping check for stats
>>>> ---
>>>>  drivers/net/af_packet/rte_eth_af_packet.c | 66 ++++++++++++++---------
>>>>  1 file changed, 41 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> index 397a32db5886..10c8e1e50139 100644
>>>> --- a/drivers/net/af_packet/rte_eth_af_packet.c
>>>> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
>>>> @@ -51,8 +51,10 @@ struct pkt_rx_queue {
>>>>      uint16_t in_port;
>>>>      uint8_t vlan_strip;
>>>>
>>>> -    volatile unsigned long rx_pkts;
>>>> -    volatile unsigned long rx_bytes;
>>>> +    uint64_t rx_pkts;
>>>> +    uint64_t rx_bytes;
>>>> +    uint64_t rx_pkts_offset;
>>>> +    uint64_t rx_bytes_offset;
>>>
>>> I suggest you introduce a separate struct for reset-able counters.
>>> It'll make things cleaner, and you can sneak in atomics without too
>>> much atomics-related bloat.
>>>
>>> struct counter
>>> {
>>>     uint64_t count;
>>>     uint64_t offset;
>>> };
>>>
>>> /../
>>>     struct counter rx_pkts;
>>>     struct counter rx_bytes;
>>> /../
>>>
>>> static uint64_t
>>> counter_value(struct counter *counter)
>>> {
>>>     uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
>>>     uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);
>>>
>>
>> Since the count and the offset are written to independently, without
>> any ordering restrictions, an update and a reset in quick succession
>> may cause the offset store to be globally visible before the new
>> count. In such a scenario, a reader could see an offset > count.
>>
>> Thus, unless I'm missing something, one should add a
>>
>> if (unlikely(offset > count))
>>     return 0;
>>
>> here, with an appropriate comment explaining why this might happen.
>>
>> Another approach would be to think about what memory barriers may be
>> required to make sure one sees the count update before the offset
>> update, but, intuitively, that seems both more complex and more
>> costly (performance-wise).
>>
>
> We are going with the lazy alternative and requesting that forwarding
> be stopped before a stats reset; this should prevent 'count' and
> 'offset' from being updated simultaneously.
>
> In that case, 'offset' is not needed.

>>>     return count - offset;
>>> }
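
For reference, here is the read function above with the guard folded
in; a sketch only, not part of the patch. unlikely() is DPDK's branch
hint from <rte_branch_prediction.h>.

#include <stdint.h>
#include <rte_branch_prediction.h>

struct counter {
    uint64_t count;
    uint64_t offset;
};

static uint64_t
counter_value(struct counter *counter)
{
    uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
    uint64_t offset = __atomic_load_n(&counter->offset, __ATOMIC_RELAXED);

    /* A reset racing with an update may make the offset store
     * globally visible before the count it was derived from;
     * report 0 rather than a bogus, huge value. */
    if (unlikely(offset > count))
        return 0;

    return count - offset;
}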
>>>
>>> static void
>>> counter_reset(struct counter *counter)
>>> {
>>>     uint64_t count = __atomic_load_n(&counter->count, __ATOMIC_RELAXED);
>>>
>>>     __atomic_store_n(&counter->offset, count, __ATOMIC_RELAXED);
>>> }
>>>
>>> static void
>>> counter_add(struct counter *counter, uint64_t operand)
>>> {
>>>     __atomic_store_n(&counter->count, counter->count + operand,
>>>         __ATOMIC_RELAXED);
>>> }
>>>
>>> You'd have to port this to <rte_stdatomic.h> calls, since non-atomic
>>> loads from RTE_ATOMIC() variables are not permitted. The non-atomic
>>> reads above must be replaced with explicit relaxed atomic loads.
>>> Otherwise, if you just use "counter->count", that would be an atomic
>>> load with sequential-consistency memory order on C11 atomics-based
>>> builds, which would result in a barrier, at least on weakly ordered
>>> machines (e.g., ARM).
>>>
>>> I would still use a struct and some helper functions even for the
>>> less ambitious, non-atomic variant.
>>>
>>> The only drawback of using GCC built-in-type atomics here, versus an
>>> atomic- and volatile-free approach, is that current compilers seem
>>> to refuse to merge atomic stores. It's beyond me why this is the
>>> case. If you store to a variable twice in quick succession, it'll be
>>> two store machine instructions, even in cases where the compiler
>>> *knows* the value is identical. So you get volatile semantics, even
>>> though you didn't ask for them. Weird.
>>>
>>> So if you have a loop, you may want to do a single counter_add() at
>>> the end, from a temporary, to get the final 0.001% of performance.
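
As a sketch of that pattern (assuming the struct counter and
counter_add() above; the mbuf-based byte accounting is just an
example):

#include <stdint.h>
#include <rte_mbuf.h>

/* Requires the struct counter and counter_add() definitions above. */
static void
count_rx_burst(struct counter *pkts, struct counter *bytes,
               struct rte_mbuf **bufs, uint16_t nb_rx)
{
    uint64_t rx_bytes = 0;
    uint16_t i;

    for (i = 0; i < nb_rx; i++)
        rx_bytes += rte_pktmbuf_pkt_len(bufs[i]);

    /* One relaxed atomic store per counter per burst, rather than
     * one per packet. */
    counter_add(pkts, nb_rx);
    counter_add(bytes, rx_bytes);
}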
>>>
>>> If the tech board thinks MT-safe, reset-able, software-managed
>>> statistics are the future (as opposed to dropping reset support, for
>>> example), I think this stuff should go into a separate header file,
>>> so other PMDs can reuse it. Maybe out of scope for this patch.
>>>
>>>>  };
>>>>
>>>>  struct pkt_tx_queue {
>>>> @@ -64,9 +66,12 @@ struct pkt_tx_queue {
>>>>      unsigned int framecount;
>>>>      unsigned int framenum;
>>>>
>>>> -    volatile unsigned long tx_pkts;
>>>> -    volatile unsigned long err_pkts;
>>>> -    volatile unsigned long tx_bytes;
>>>> +    uint64_t tx_pkts;
>>>> +    uint64_t err_pkts;
>>>> +    uint64_t tx_bytes;
>>>> +    uint64_t tx_pkts_offset;
>>>> +    uint64_t err_pkts_offset;
>>>> +    uint64_t tx_bytes_offset;
>>>>  };
>>>>
>>>>  struct pmd_internals {
>>>> @@ -385,8 +390,15 @@ eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>>>>      return 0;
>>>>  }
>>>>
>>>> +
>>>> +static uint64_t
>>>> +stats_get_diff(uint64_t stats, uint64_t offset)
>>>> +{
>>>> +    return stats - offset;
>>>> +}
>>>> +
>>>>  static int
>>>> -eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
>>>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
>>>>  {
>>>>      unsigned i, imax;
>>>>      unsigned long rx_total = 0, tx_total = 0, tx_err_total = 0;
>>>> @@ -396,27 +408,29 @@ eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
>>>>      imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
>>>>              internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS);
>>>>      for (i = 0; i < imax; i++) {
>>>> -        igb_stats->q_ipackets[i] = internal->rx_queue[i].rx_pkts;
>>>> -        igb_stats->q_ibytes[i] = internal->rx_queue[i].rx_bytes;
>>>> -        rx_total += igb_stats->q_ipackets[i];
>>>> -        rx_bytes_total += igb_stats->q_ibytes[i];
>>>> +        struct pkt_rx_queue *rxq = &internal->rx_queue[i];
>>>> +        stats->q_ipackets[i] = stats_get_diff(rxq->rx_pkts, rxq->rx_pkts_offset);
>>>> +        stats->q_ibytes[i] = stats_get_diff(rxq->rx_bytes, rxq->rx_bytes_offset);
>>>> +        rx_total += stats->q_ipackets[i];
>>>> +        rx_bytes_total += stats->q_ibytes[i];
>>>>      }
>>>>
>>>>      imax = (internal->nb_queues < RTE_ETHDEV_QUEUE_STAT_CNTRS ?
>>>>              internal->nb_queues : RTE_ETHDEV_QUEUE_STAT_CNTRS);
>>>>      for (i = 0; i < imax; i++) {
>>>> -        igb_stats->q_opackets[i] = internal->tx_queue[i].tx_pkts;
>>>> -        igb_stats->q_obytes[i] = internal->tx_queue[i].tx_bytes;
>>>> -        tx_total += igb_stats->q_opackets[i];
>>>> -        tx_err_total += internal->tx_queue[i].err_pkts;
>>>> -        tx_bytes_total += igb_stats->q_obytes[i];
>>>> +        struct pkt_tx_queue *txq = &internal->tx_queue[i];
>>>> +        stats->q_opackets[i] = stats_get_diff(txq->tx_pkts, txq->tx_pkts_offset);
>>>> +        stats->q_obytes[i] = stats_get_diff(txq->tx_bytes, txq->tx_bytes_offset);
>>>> +        tx_total += stats->q_opackets[i];
>>>> +        tx_err_total += stats_get_diff(txq->err_pkts, txq->err_pkts_offset);
>>>> +        tx_bytes_total += stats->q_obytes[i];
>>>>      }
>>>>
>>>> -    igb_stats->ipackets = rx_total;
>>>> -    igb_stats->ibytes = rx_bytes_total;
>>>> -    igb_stats->opackets = tx_total;
>>>> -    igb_stats->oerrors = tx_err_total;
>>>> -    igb_stats->obytes = tx_bytes_total;
>>>> +    stats->ipackets = rx_total;
>>>> +    stats->ibytes = rx_bytes_total;
>>>> +    stats->opackets = tx_total;
>>>> +    stats->oerrors = tx_err_total;
>>>> +    stats->obytes = tx_bytes_total;
>>>>
>>>>      return 0;
>>>>  }
>>>>
>>>> @@ -427,14 +441,16 @@ eth_stats_reset(struct rte_eth_dev *dev)
>>>>      struct pmd_internals *internal = dev->data->dev_private;
>>>>
>>>>      for (i = 0; i < internal->nb_queues; i++) {
>>>> -        internal->rx_queue[i].rx_pkts = 0;
>>>> -        internal->rx_queue[i].rx_bytes = 0;
>>>> +        struct pkt_rx_queue *rxq = &internal->rx_queue[i];
>>>> +        rxq->rx_pkts_offset = rxq->rx_pkts;
>>>> +        rxq->rx_bytes_offset = rxq->rx_bytes;
>>>>      }
>>>>
>>>>      for (i = 0; i < internal->nb_queues; i++) {
>>>> -        internal->tx_queue[i].tx_pkts = 0;
>>>> -        internal->tx_queue[i].err_pkts = 0;
>>>> -        internal->tx_queue[i].tx_bytes = 0;
>>>> +        struct pkt_tx_queue *txq = &internal->tx_queue[i];
>>>> +        txq->tx_pkts_offset = txq->tx_pkts;
>>>> +        txq->err_pkts_offset = txq->err_pkts;
>>>> +        txq->tx_bytes_offset = txq->tx_bytes;
>>>>      }
>>>>
>>>>      return 0;
>
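
For readers following the thread: the reset-by-offset scheme the patch
implements boils down to this standalone toy model (my sketch, not
driver code):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* The datapath only ever increments count; reset only snapshots count
 * into offset; readers report the difference. */
struct stat { uint64_t count; uint64_t offset; };

static uint64_t stat_read(const struct stat *s) { return s->count - s->offset; }
static void stat_reset(struct stat *s) { s->offset = s->count; }

int
main(void)
{
    struct stat rx_pkts = { 0, 0 };

    rx_pkts.count += 100;                         /* datapath update */
    printf("%" PRIu64 "\n", stat_read(&rx_pkts)); /* prints 100 */
    stat_reset(&rx_pkts);                         /* control-path reset */
    rx_pkts.count += 5;
    printf("%" PRIu64 "\n", stat_read(&rx_pkts)); /* prints 5 */

    return 0;
}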