From: Matan Azrad <matan@mellanox.com>
To: "Gaëtan Rivet" <gaetan.rivet@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] net/failsafe: stat support enhancement
Date: Tue, 5 Sep 2017 15:12:55 +0000 [thread overview]
Message-ID: <DB6PR0502MB3048D9AE5D12DC486A1E1A92D2960@DB6PR0502MB3048.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <20170905132244.GF21444@bidouze.vm.6wind.com>
Hi Gaetan,
> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Tuesday, September 5, 2017 4:23 PM
> To: Matan Azrad <matan@mellanox.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] net/failsafe: stat support enhancement
>
> Hi Matan,
>
> On Tue, Sep 05, 2017 at 12:56:34PM +0300, Matan Azrad wrote:
> > The previous stats code returned only the current TX sub device stats.
> >
> > This enhancement extends it to return the sum of all sub devices stats
> > with history of removed sub-devices.
> >
> > Dedicated stats accumulator saves the stat history of all sub device
> > remove events.
> >
> > Each failsafe sub device contains the last stats asked by the user and
> > updates the accumulator in removal time.
> >
> > I would like to implement ultimate snapshot on removal time.
> > The stats_get API needs to be changed to return error in the case it
> > is too late to retrieve statistics.
>
> You need to have this API evolution first. It is not assured to be accepted by
> the community.
>
I already sent suggestion to mailing list, find "stats_get API: return value suggestion":
No objections for now.
> As it is, this version is incorrect, because the only available stats for a
> removed slave is the last snapshot.
>
It is for sure better from the previous version, the accuracy can be better if
my suggestion will be accepted.
If it is, I will improve it.
> Thus it complicates the rules while still being incorrect. Even if you were able
> to push for this API evolution, PMDs with hard counters would be the ones
> to return an error code on stat read while removed.
> You may be lucky at the moment because MLX drivers do not support hard
> counters, but this won't always be true (and it will be false soon enough).
> You will thus be back at square one, with a new useless API and still incorrect
> stats in the fail-safe. On the other hand, the fail-safe should strive to be as
> easy to use as possible with most drivers, and not cater specifically to soft-
> counters ones.
>
> So, my take on this: I understand that aggregated stats would be interesting.
> Keep it simple & stupid: simple aggregation on stats_get.
> You will have a rollback at some point on device removal, but this is still easier
> to detect / deal with than partially / sometimes incorrect stats.
>
100% accuracy is impossible when removal event is in the game,
but we can do it better as we can.
As I wrote in commit log(if stat gets will return value):
When some drivers can give the stats after RMV interrupt and before dev_close,
We just can to improve accuracy and be closer to 100%.
The stats of the sub devices which return error will be taken from the last snapshot.
It is just enhancement which is important for customers.
> > By this way, failsafe can get stats snapshot in removal interrupt
> > callback for each PMD which can give stats after removal event.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> > drivers/net/failsafe/failsafe_ether.c | 33
> +++++++++++++++++++++++++++++++++
> > drivers/net/failsafe/failsafe_ops.c | 16 ++++++++++++----
> > drivers/net/failsafe/failsafe_private.h | 5 +++++
> > 3 files changed, 50 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > b/drivers/net/failsafe/failsafe_ether.c
> > index a3a8cce..133080d 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> > @@ -399,6 +399,37 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev
> *dev)
> > return ret;
> > }
> >
> > +void
> > +fs_increment_stats(struct rte_eth_stats *from, struct rte_eth_stats
> > +*to) {
> > + uint8_t i;
> > +
> > + RTE_ASSERT(from != NULL && to != NULL);
> > + to->ipackets += from->ipackets;
> > + to->opackets += from->opackets;
> > + to->ibytes += from->ibytes;
> > + to->obytes += from->obytes;
> > + to->imissed += from->imissed;
> > + to->ierrors += from->ierrors;
> > + to->oerrors += from->oerrors;
> > + to->rx_nombuf += from->rx_nombuf;
> > + for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS; i++) {
> > + to->q_ipackets[i] += from->q_ipackets[i];
> > + to->q_opackets[i] += from->q_opackets[i];
> > + to->q_ibytes[i] += from->q_ibytes[i];
> > + to->q_obytes[i] += from->q_obytes[i];
> > + to->q_errors[i] += from->q_errors[i];
> > + }
> > +}
> > +
> > +void
> > +fs_increment_stats_accumulator(struct sub_device *sdev) {
> > + fs_increment_stats(&sdev->stats_snapshot,
> > + &PRIV(sdev->fs_dev)->stats_accumulator);
> > + memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats)); }
> > +
> > int
> > failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
> > enum rte_eth_event_type event
> __rte_unused, @@ -410,6 +441,8 @@
> > failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
> > fs_switch_dev(sdev->fs_dev, sdev);
> > /* Use safe bursts in any case. */
> > set_burst_fn(sdev->fs_dev, 1);
> > + /* Increment the stats accumulator by the last stats snapshot. */
> > + fs_increment_stats_accumulator(sdev);
> > /*
> > * Async removal, the sub-PMD will try to unregister
> > * the callback at the source of the current thread context.
> > diff --git a/drivers/net/failsafe/failsafe_ops.c
> > b/drivers/net/failsafe/failsafe_ops.c
> > index ff9ad15..e47cc85 100644
> > --- a/drivers/net/failsafe/failsafe_ops.c
> > +++ b/drivers/net/failsafe/failsafe_ops.c
> > @@ -586,9 +586,14 @@ static void
> > fs_stats_get(struct rte_eth_dev *dev,
> > struct rte_eth_stats *stats)
> > {
> > - if (TX_SUBDEV(dev) == NULL)
> > - return;
> > - rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
> > + struct sub_device *sdev;
> > + uint8_t i;
> > +
> > + memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
> > + FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > + rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot);
> > + fs_increment_stats(&sdev->stats_snapshot, stats);
> > + }
> > }
> >
> > static void
> > @@ -597,8 +602,11 @@ fs_stats_reset(struct rte_eth_dev *dev)
> > struct sub_device *sdev;
> > uint8_t i;
> >
> > - FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
> > + FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > rte_eth_stats_reset(PORT_ID(sdev));
> > + memset(&sdev->stats_snapshot, 0, sizeof(struct
> rte_eth_stats));
> > + }
> > + memset(&PRIV(dev)->stats_accumulator, 0, sizeof(struct
> > +rte_eth_stats));
> > }
> >
> > /**
> > diff --git a/drivers/net/failsafe/failsafe_private.h
> > b/drivers/net/failsafe/failsafe_private.h
> > index 0361cf4..267c749 100644
> > --- a/drivers/net/failsafe/failsafe_private.h
> > +++ b/drivers/net/failsafe/failsafe_private.h
> > @@ -102,6 +102,8 @@ struct sub_device {
> > uint8_t sid;
> > /* Device state machine */
> > enum dev_state state;
> > + /* Last stats snapshot passed to user */
> > + struct rte_eth_stats stats_snapshot;
> > /* Some device are defined as a command line */
> > char *cmdline;
> > /* fail-safe device backreference */ @@ -140,6 +142,7 @@ struct
> > fs_priv {
> > * synchronized state.
> > */
> > enum dev_state state;
> > + struct rte_eth_stats stats_accumulator;
> > unsigned int pending_alarm:1; /* An alarm is pending */
> > /* flow isolation state */
> > int flow_isolated:1;
> > @@ -180,6 +183,8 @@ int failsafe_eal_uninit(struct rte_eth_dev *dev);
> >
> > int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev); void
> > failsafe_dev_remove(struct rte_eth_dev *dev);
> > +void fs_increment_stats(struct rte_eth_stats *from, struct
> > +rte_eth_stats *to); void fs_increment_stats_accumulator(struct
> > +sub_device *sdev);
> > int failsafe_eth_rmv_event_callback(uint8_t port_id,
> > enum rte_eth_event_type type,
> > void *arg, void *out);
> > --
> > 2.7.4
> >
>
> --
> Gaëtan Rivet
> 6WIND
next prev parent reply other threads:[~2017-09-05 15:12 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-05 9:56 Matan Azrad
2017-09-05 13:22 ` Gaëtan Rivet
2017-09-05 15:12 ` Matan Azrad [this message]
2017-09-05 16:34 ` Gaëtan Rivet
2017-09-06 6:54 ` Matan Azrad
2017-09-07 8:49 ` Gaëtan Rivet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DB6PR0502MB3048D9AE5D12DC486A1E1A92D2960@DB6PR0502MB3048.eurprd05.prod.outlook.com \
--to=matan@mellanox.com \
--cc=dev@dpdk.org \
--cc=gaetan.rivet@6wind.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).