DPDK patches and discussions
 help / color / mirror / Atom feed
From: Matan Azrad <matan@mellanox.com>
To: "Gaëtan Rivet" <gaetan.rivet@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] net/failsafe: stat support enhancement
Date: Tue, 5 Sep 2017 15:12:55 +0000	[thread overview]
Message-ID: <DB6PR0502MB3048D9AE5D12DC486A1E1A92D2960@DB6PR0502MB3048.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <20170905132244.GF21444@bidouze.vm.6wind.com>

Hi Gaetan,

> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Tuesday, September 5, 2017 4:23 PM
> To: Matan Azrad <matan@mellanox.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] net/failsafe: stat support enhancement
> 
> Hi Matan,
> 
> On Tue, Sep 05, 2017 at 12:56:34PM +0300, Matan Azrad wrote:
> > The previous stats code returned only the current TX sub device stats.
> >
> > This enhancement extends it to return the sum of all sub devices stats
> > with history of removed sub-devices.
> >
> > Dedicated stats accumulator saves the stat history of all sub device
> > remove events.
> >
> > Each failsafe sub device contains the last stats asked by the user and
> > updates the accumulator in removal time.
> >
> > I would like to implement ultimate snapshot on removal time.
> > The stats_get API needs to be changed to return error in the case it
> > is too late to retrieve statistics.
> 
> You need to have this API evolution first. It is not assured to be accepted by
> the community.
> 

I already sent suggestion to mailing list, find "stats_get API: return value suggestion":
No objections for now.


> As it is, this version is incorrect, because the only available stats for a
> removed slave is the last snapshot.
> 

It is for sure better from the previous version, the accuracy can be better if 
my suggestion will be accepted. 
If it is, I will improve it.

> Thus it complicates the rules while still being incorrect. Even if you were able
> to push for this API evolution, PMDs with hard counters would be the ones
> to return an error code on stat read while removed.
> You may be lucky at the moment because MLX drivers do not support hard
> counters, but this won't always be true (and it will be false soon enough).
> You will thus be back at square one, with a new useless API and still incorrect
> stats in the fail-safe. On the other hand, the fail-safe should strive to be as
> easy to use as possible with most drivers, and not cater specifically to soft-
> counters ones.
> 
> So, my take on this: I understand that aggregated stats would be interesting.
> Keep it simple & stupid: simple aggregation on stats_get.
> You will have a rollback at some point on device removal, but this is still easier
> to detect / deal with than partially / sometimes incorrect stats.
> 

100% accuracy is impossible when removal event is in the game,
but we can do it better as we can.

As I wrote in commit log(if stat gets will return value):
When some drivers can give the stats after RMV interrupt and before dev_close,
We just can to improve accuracy and be closer to 100%.
The stats of the sub devices which return error will be taken from the last snapshot. 

It is just enhancement which is important for customers.

> > By this way, failsafe can get stats snapshot in removal interrupt
> > callback for each PMD which can give stats after removal event.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> >  drivers/net/failsafe/failsafe_ether.c   | 33
> +++++++++++++++++++++++++++++++++
> >  drivers/net/failsafe/failsafe_ops.c     | 16 ++++++++++++----
> >  drivers/net/failsafe/failsafe_private.h |  5 +++++
> >  3 files changed, 50 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > b/drivers/net/failsafe/failsafe_ether.c
> > index a3a8cce..133080d 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> > @@ -399,6 +399,37 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev
> *dev)
> >  	return ret;
> >  }
> >
> > +void
> > +fs_increment_stats(struct rte_eth_stats *from, struct rte_eth_stats
> > +*to) {
> > +	uint8_t i;
> > +
> > +	RTE_ASSERT(from != NULL && to != NULL);
> > +	to->ipackets += from->ipackets;
> > +	to->opackets += from->opackets;
> > +	to->ibytes += from->ibytes;
> > +	to->obytes += from->obytes;
> > +	to->imissed += from->imissed;
> > +	to->ierrors += from->ierrors;
> > +	to->oerrors += from->oerrors;
> > +	to->rx_nombuf += from->rx_nombuf;
> > +	for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS; i++) {
> > +		to->q_ipackets[i] += from->q_ipackets[i];
> > +		to->q_opackets[i] += from->q_opackets[i];
> > +		to->q_ibytes[i] += from->q_ibytes[i];
> > +		to->q_obytes[i] += from->q_obytes[i];
> > +		to->q_errors[i] += from->q_errors[i];
> > +	}
> > +}
> > +
> > +void
> > +fs_increment_stats_accumulator(struct sub_device *sdev) {
> > +	fs_increment_stats(&sdev->stats_snapshot,
> > +			&PRIV(sdev->fs_dev)->stats_accumulator);
> > +	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats)); }
> > +
> >  int
> >  failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
> >  				enum rte_eth_event_type event
> __rte_unused, @@ -410,6 +441,8 @@
> > failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused,
> >  	fs_switch_dev(sdev->fs_dev, sdev);
> >  	/* Use safe bursts in any case. */
> >  	set_burst_fn(sdev->fs_dev, 1);
> > +	/* Increment the stats accumulator by the last stats snapshot. */
> > +	fs_increment_stats_accumulator(sdev);
> >  	/*
> >  	 * Async removal, the sub-PMD will try to unregister
> >  	 * the callback at the source of the current thread context.
> > diff --git a/drivers/net/failsafe/failsafe_ops.c
> > b/drivers/net/failsafe/failsafe_ops.c
> > index ff9ad15..e47cc85 100644
> > --- a/drivers/net/failsafe/failsafe_ops.c
> > +++ b/drivers/net/failsafe/failsafe_ops.c
> > @@ -586,9 +586,14 @@ static void
> >  fs_stats_get(struct rte_eth_dev *dev,
> >  	     struct rte_eth_stats *stats)
> >  {
> > -	if (TX_SUBDEV(dev) == NULL)
> > -		return;
> > -	rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
> > +	struct sub_device *sdev;
> > +	uint8_t i;
> > +
> > +	memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
> > +	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> > +		rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot);
> > +		fs_increment_stats(&sdev->stats_snapshot, stats);
> > +	}
> >  }
> >
> >  static void
> > @@ -597,8 +602,11 @@ fs_stats_reset(struct rte_eth_dev *dev)
> >  	struct sub_device *sdev;
> >  	uint8_t i;
> >
> > -	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE)
> > +	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> >  		rte_eth_stats_reset(PORT_ID(sdev));
> > +		memset(&sdev->stats_snapshot, 0, sizeof(struct
> rte_eth_stats));
> > +	}
> > +	memset(&PRIV(dev)->stats_accumulator, 0, sizeof(struct
> > +rte_eth_stats));
> >  }
> >
> >  /**
> > diff --git a/drivers/net/failsafe/failsafe_private.h
> > b/drivers/net/failsafe/failsafe_private.h
> > index 0361cf4..267c749 100644
> > --- a/drivers/net/failsafe/failsafe_private.h
> > +++ b/drivers/net/failsafe/failsafe_private.h
> > @@ -102,6 +102,8 @@ struct sub_device {
> >  	uint8_t sid;
> >  	/* Device state machine */
> >  	enum dev_state state;
> > +	/* Last stats snapshot passed to user */
> > +	struct rte_eth_stats stats_snapshot;
> >  	/* Some device are defined as a command line */
> >  	char *cmdline;
> >  	/* fail-safe device backreference */ @@ -140,6 +142,7 @@ struct
> > fs_priv {
> >  	 * synchronized state.
> >  	 */
> >  	enum dev_state state;
> > +	struct rte_eth_stats stats_accumulator;
> >  	unsigned int pending_alarm:1; /* An alarm is pending */
> >  	/* flow isolation state */
> >  	int flow_isolated:1;
> > @@ -180,6 +183,8 @@ int failsafe_eal_uninit(struct rte_eth_dev *dev);
> >
> >  int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev);  void
> > failsafe_dev_remove(struct rte_eth_dev *dev);
> > +void fs_increment_stats(struct rte_eth_stats *from, struct
> > +rte_eth_stats *to); void fs_increment_stats_accumulator(struct
> > +sub_device *sdev);
> >  int failsafe_eth_rmv_event_callback(uint8_t port_id,
> >  				    enum rte_eth_event_type type,
> >  				    void *arg, void *out);
> > --
> > 2.7.4
> >
> 
> --
> Gaëtan Rivet
> 6WIND

  reply	other threads:[~2017-09-05 15:12 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-05  9:56 Matan Azrad
2017-09-05 13:22 ` Gaëtan Rivet
2017-09-05 15:12   ` Matan Azrad [this message]
2017-09-05 16:34     ` Gaëtan Rivet
2017-09-06  6:54       ` Matan Azrad
2017-09-07  8:49 ` Gaëtan Rivet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DB6PR0502MB3048D9AE5D12DC486A1E1A92D2960@DB6PR0502MB3048.eurprd05.prod.outlook.com \
    --to=matan@mellanox.com \
    --cc=dev@dpdk.org \
    --cc=gaetan.rivet@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).