* [dpdk-dev] [PATCH] net/failsafe: improve stats accuracy
@ 2017-10-16  7:41 Matan Azrad
  2017-10-16  8:27 ` Gaëtan Rivet
  2017-10-19 14:31 ` [dpdk-dev] [PATCH v2] " Matan Azrad
  0 siblings, 2 replies; 10+ messages in thread
From: Matan Azrad @ 2017-10-16  7:41 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev
The stats_get API was changed to return error in case of failure at
stats getting process time.
By this way, failsafe can get stats snapshot in removal process for
each PMD which can give stats after removal event.
This patch implements ultimate stats snapshot on removal time by
trying to get the removed device stats before calling to dev_close.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/failsafe/failsafe_ether.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index f4db423..2758d4c 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -312,8 +312,12 @@
 static void
 fs_dev_stats_save(struct sub_device *sdev)
 {
+	struct rte_eth_stats stats;
+
+	/* Get stats now or take it from last snapshot. */
 	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
-			&sdev->stats_snapshot);
+		rte_eth_stats_get(PORT_ID(sdev), &stats) ?
+		&sdev->stats_snapshot : &stats);
 	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
 }
 
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] net/failsafe: improve stats accuracy
  2017-10-16  7:41 [dpdk-dev] [PATCH] net/failsafe: improve stats accuracy Matan Azrad
@ 2017-10-16  8:27 ` Gaëtan Rivet
  2017-10-16  8:51   ` Matan Azrad
  2017-10-19 14:31 ` [dpdk-dev] [PATCH v2] " Matan Azrad
  1 sibling, 1 reply; 10+ messages in thread
From: Gaëtan Rivet @ 2017-10-16  8:27 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev
Hey Matan,
Thanks for the patch. Here are some remarks:
On Mon, Oct 16, 2017 at 07:41:50AM +0000, Matan Azrad wrote:
> The stats_get API was changed to return error in case of failure at
> stats getting process time.
> By this way, failsafe can get stats snapshot in removal process for
> each PMD which can give stats after removal event.
> 
> This patch implements ultimate stats snapshot on removal time by
> trying to get the removed device stats before calling to dev_close.
> 
I would reformulate the commit log as such:
The stats_get API has changed to signal a potential failure to read
stats. Furthermore, some PMDs are able to provide statistics even
after a removal event occured.
Considering this, the fail-safe can try to access the latest statistics
of a PMD to improve statistics readings.
Attempt an ultimate statistics read on removal time;
if that fails, use the latest recorded snapshot.
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  drivers/net/failsafe/failsafe_ether.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index f4db423..2758d4c 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -312,8 +312,12 @@
>  static void
>  fs_dev_stats_save(struct sub_device *sdev)
>  {
> +	struct rte_eth_stats stats;
> +
> +	/* Get stats now or take it from last snapshot. */
>  	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
> -			&sdev->stats_snapshot);
> +		rte_eth_stats_get(PORT_ID(sdev), &stats) ?
> +		&sdev->stats_snapshot : &stats);
The conditional is awkward within the parameter list here.
Something like:
    struct rte_eth_stats stats;
    int err;
    /* Attempt to read current stats. */
    err = rte_eth_stats_get(PORT_ID(sdev), &stats);
    if (err)
        WARN("Could not access latest statistics from sub-device %d, using latest snapshot",
             SUB_ID(sdev));
    /* Use either current stats or latest snapshot. */
    failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
                             err ? &sdev->stats_snapshot : &stats);
Seems easier to read and allows warning the user of potential inaccuracies.
On that matter, it may be interesting to measure the time since the last snapshot
and display it as well.
>  	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
>  }
>  
> -- 
> 1.8.3.1
> 
-- 
Gaëtan Rivet
6WIND
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH] net/failsafe: improve stats accuracy
  2017-10-16  8:27 ` Gaëtan Rivet
@ 2017-10-16  8:51   ` Matan Azrad
  0 siblings, 0 replies; 10+ messages in thread
From: Matan Azrad @ 2017-10-16  8:51 UTC (permalink / raw)
  To: Gaëtan Rivet; +Cc: dev
Hi Geatan
> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Monday, October 16, 2017 11:28 AM
> To: Matan Azrad <matan@mellanox.com>
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] net/failsafe: improve stats accuracy
> 
> Hey Matan,
> 
> Thanks for the patch. Here are some remarks:
> 
> On Mon, Oct 16, 2017 at 07:41:50AM +0000, Matan Azrad wrote:
> > The stats_get API was changed to return error in case of failure at
> > stats getting process time.
> > By this way, failsafe can get stats snapshot in removal process for
> > each PMD which can give stats after removal event.
> >
> > This patch implements ultimate stats snapshot on removal time by
> > trying to get the removed device stats before calling to dev_close.
> >
> 
> I would reformulate the commit log as such:
> 
> The stats_get API has changed to signal a potential failure to read stats.
> Furthermore, some PMDs are able to provide statistics even after a removal
> event occured.
> 
> Considering this, the fail-safe can try to access the latest statistics of a PMD to
> improve statistics readings.
> 
> Attempt an ultimate statistics read on removal time; if that fails, use the
> latest recorded snapshot.
> 
Looks ok for me, thanks.
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> >  drivers/net/failsafe/failsafe_ether.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/failsafe/failsafe_ether.c
> > b/drivers/net/failsafe/failsafe_ether.c
> > index f4db423..2758d4c 100644
> > --- a/drivers/net/failsafe/failsafe_ether.c
> > +++ b/drivers/net/failsafe/failsafe_ether.c
> > @@ -312,8 +312,12 @@
> >  static void
> >  fs_dev_stats_save(struct sub_device *sdev)  {
> > +	struct rte_eth_stats stats;
> > +
> > +	/* Get stats now or take it from last snapshot. */
> >  	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
> > -			&sdev->stats_snapshot);
> > +		rte_eth_stats_get(PORT_ID(sdev), &stats) ?
> > +		&sdev->stats_snapshot : &stats);
> 
> The conditional is awkward within the parameter list here.
> Something like:
> 
>     struct rte_eth_stats stats;
>     int err;
> 
>     /* Attempt to read current stats. */
>     err = rte_eth_stats_get(PORT_ID(sdev), &stats);
>     if (err)
>         WARN("Could not access latest statistics from sub-device %d, using latest
> snapshot",
>              SUB_ID(sdev));
>     /* Use either current stats or latest snapshot. */
>     failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
>                              err ? &sdev->stats_snapshot : &stats);
> 
> Seems easier to read and allows warning the user of potential inaccuracies.
> On that matter, it may be interesting to measure the time since the last
> snapshot and display it as well.
>
OK, I think it is good idea, I will try to add this here.
 
> >  	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));  }
> >
> > --
> > 1.8.3.1
> >
> 
> --
> Gaëtan Rivet
> 6WIND
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [dpdk-dev] [PATCH v2] net/failsafe: improve stats accuracy
  2017-10-16  7:41 [dpdk-dev] [PATCH] net/failsafe: improve stats accuracy Matan Azrad
  2017-10-16  8:27 ` Gaëtan Rivet
@ 2017-10-19 14:31 ` Matan Azrad
  2017-10-19 15:08   ` Gaëtan Rivet
  2017-10-21 20:54   ` [dpdk-dev] [PATCH v3 1/2] " Matan Azrad
  1 sibling, 2 replies; 10+ messages in thread
From: Matan Azrad @ 2017-10-19 14:31 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev
The stats_get API was changed to signal a potential failure to read
stats. Furthermore, some PMDs are able to provide statistics even after
a removal event occurred.
Considering this, the fail-safe can try to access the latest statistics
of a PMD to improve statistics accuracy.
Attempt an ultimate statistics read on removal time; if that fails, use
the latest recorded snapshot.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/failsafe/failsafe_ether.c   | 19 +++++++++++++++++--
 drivers/net/failsafe/failsafe_ops.c     | 10 ++++++++--
 drivers/net/failsafe/failsafe_private.h |  7 ++++++-
 3 files changed, 31 insertions(+), 5 deletions(-)
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index f4db423..df38360 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -35,6 +35,7 @@
 
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
+#include <rte_cycles.h>
 
 #include "failsafe_private.h"
 
@@ -312,9 +313,23 @@
 static void
 fs_dev_stats_save(struct sub_device *sdev)
 {
+	struct rte_eth_stats stats;
+	int err;
+
+	/* Attempt to read current stats. */
+	err = rte_eth_stats_get(PORT_ID(sdev), &stats);
+	if (err) {
+		uint64_t cycles = sdev->stats_snapshot.cycles;
+
+		WARN("Could not access latest statistics from sub-device %d.\n",
+			 SUB_ID(sdev));
+		if (cycles != 0)
+			WARN("Using latest snapshot taken before %lu seconds.\n",
+				 (rte_rdtsc() - cycles) / rte_get_tsc_hz());
+	}
 	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
-			&sdev->stats_snapshot);
-	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
+			err ? &sdev->stats_snapshot.stats : &stats);
+	memset(&sdev->stats_snapshot, 0, sizeof(sdev->stats_snapshot));
 }
 
 static inline int
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index d360965..818f12d 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -38,6 +38,7 @@
 #include <rte_ethdev.h>
 #include <rte_malloc.h>
 #include <rte_flow.h>
+#include <rte_cycles.h>
 
 #include "failsafe_private.h"
 
@@ -592,13 +593,18 @@
 
 	rte_memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
-		ret = rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot);
+		struct rte_eth_stats *snapshot = &sdev->stats_snapshot.stats;
+		uint64_t *cycles = &sdev->stats_snapshot.cycles;
+
+		ret = rte_eth_stats_get(PORT_ID(sdev), snapshot);
 		if (ret) {
 			ERROR("Operation rte_eth_stats_get failed for sub_device %d with error %d",
 				  i, ret);
+			*cycles = 0;
 			return ret;
 		}
-		failsafe_stats_increment(stats, &sdev->stats_snapshot);
+		*cycles = rte_rdtsc();
+		failsafe_stats_increment(stats, snapshot);
 	}
 	return 0;
 }
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index d343ebf..1df52f4 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -93,6 +93,11 @@ enum dev_state {
 	DEV_STARTED,
 };
 
+struct fs_stats {
+	struct rte_eth_stats stats;
+	uint64_t cycles;
+};
+
 struct sub_device {
 	/* Exhaustive DPDK device description */
 	struct rte_devargs devargs;
@@ -103,7 +108,7 @@ struct sub_device {
 	/* Device state machine */
 	enum dev_state state;
 	/* Last stats snapshot passed to user */
-	struct rte_eth_stats stats_snapshot;
+	struct fs_stats stats_snapshot;
 	/* Some device are defined as a command line */
 	char *cmdline;
 	/* fail-safe device backreference */
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH v2] net/failsafe: improve stats accuracy
  2017-10-19 14:31 ` [dpdk-dev] [PATCH v2] " Matan Azrad
@ 2017-10-19 15:08   ` Gaëtan Rivet
  2017-10-21 20:54   ` [dpdk-dev] [PATCH v3 1/2] " Matan Azrad
  1 sibling, 0 replies; 10+ messages in thread
From: Gaëtan Rivet @ 2017-10-19 15:08 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev
Hello Matan,
Adding the time delta should have been done in a separate commit.
Can you please divide this patch in two?
The first one will only attempt the ultimate stat read, the second one
would add the delay warning.
Small nit below for your v3.
On Thu, Oct 19, 2017 at 02:31:54PM +0000, Matan Azrad wrote:
> The stats_get API was changed to signal a potential failure to read
> stats. Furthermore, some PMDs are able to provide statistics even after
> a removal event occurred.
> 
> Considering this, the fail-safe can try to access the latest statistics
> of a PMD to improve statistics accuracy.
> 
> Attempt an ultimate statistics read on removal time; if that fails, use
> the latest recorded snapshot.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  drivers/net/failsafe/failsafe_ether.c   | 19 +++++++++++++++++--
>  drivers/net/failsafe/failsafe_ops.c     | 10 ++++++++--
>  drivers/net/failsafe/failsafe_private.h |  7 ++++++-
>  3 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index f4db423..df38360 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -35,6 +35,7 @@
>  
>  #include <rte_flow.h>
>  #include <rte_flow_driver.h>
> +#include <rte_cycles.h>
>  
>  #include "failsafe_private.h"
>  
> @@ -312,9 +313,23 @@
>  static void
>  fs_dev_stats_save(struct sub_device *sdev)
>  {
> +	struct rte_eth_stats stats;
> +	int err;
> +
> +	/* Attempt to read current stats. */
> +	err = rte_eth_stats_get(PORT_ID(sdev), &stats);
> +	if (err) {
> +		uint64_t cycles = sdev->stats_snapshot.cycles;
> +
> +		WARN("Could not access latest statistics from sub-device %d.\n",
> +			 SUB_ID(sdev));
> +		if (cycles != 0)
> +			WARN("Using latest snapshot taken before %lu seconds.\n",
> +				 (rte_rdtsc() - cycles) / rte_get_tsc_hz());
> +	}
>  	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
> -			&sdev->stats_snapshot);
> -	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
> +			err ? &sdev->stats_snapshot.stats : &stats);
> +	memset(&sdev->stats_snapshot, 0, sizeof(sdev->stats_snapshot));
>  }
>  
>  static inline int
> diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> index d360965..818f12d 100644
> --- a/drivers/net/failsafe/failsafe_ops.c
> +++ b/drivers/net/failsafe/failsafe_ops.c
> @@ -38,6 +38,7 @@
>  #include <rte_ethdev.h>
>  #include <rte_malloc.h>
>  #include <rte_flow.h>
> +#include <rte_cycles.h>
>  
>  #include "failsafe_private.h"
>  
> @@ -592,13 +593,18 @@
>  
>  	rte_memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
>  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> -		ret = rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot);
> +		struct rte_eth_stats *snapshot = &sdev->stats_snapshot.stats;
> +		uint64_t *cycles = &sdev->stats_snapshot.cycles;
> +
> +		ret = rte_eth_stats_get(PORT_ID(sdev), snapshot);
>  		if (ret) {
>  			ERROR("Operation rte_eth_stats_get failed for sub_device %d with error %d",
>  				  i, ret);
> +			*cycles = 0;
>  			return ret;
>  		}
> -		failsafe_stats_increment(stats, &sdev->stats_snapshot);
> +		*cycles = rte_rdtsc();
> +		failsafe_stats_increment(stats, snapshot);
>  	}
>  	return 0;
>  }
> diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
> index d343ebf..1df52f4 100644
> --- a/drivers/net/failsafe/failsafe_private.h
> +++ b/drivers/net/failsafe/failsafe_private.h
> @@ -93,6 +93,11 @@ enum dev_state {
>  	DEV_STARTED,
>  };
>  
> +struct fs_stats {
> +	struct rte_eth_stats stats;
> +	uint64_t cycles;
What do you think of the name "timestamp" for this field?
It would be more descriptive of its use.
> +};
> +
>  struct sub_device {
>  	/* Exhaustive DPDK device description */
>  	struct rte_devargs devargs;
> @@ -103,7 +108,7 @@ struct sub_device {
>  	/* Device state machine */
>  	enum dev_state state;
>  	/* Last stats snapshot passed to user */
> -	struct rte_eth_stats stats_snapshot;
> +	struct fs_stats stats_snapshot;
>  	/* Some device are defined as a command line */
>  	char *cmdline;
>  	/* fail-safe device backreference */
> -- 
> 1.8.3.1
> 
Thanks,
-- 
Gaëtan Rivet
6WIND
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [dpdk-dev] [PATCH v3 1/2] net/failsafe: improve stats accuracy
  2017-10-19 14:31 ` [dpdk-dev] [PATCH v2] " Matan Azrad
  2017-10-19 15:08   ` Gaëtan Rivet
@ 2017-10-21 20:54   ` Matan Azrad
  2017-10-21 20:54     ` [dpdk-dev] [PATCH v3 2/2] net/failsafe: add timestamp to stats snapshot Matan Azrad
  2017-10-23  8:46     ` [dpdk-dev] [PATCH v3 1/2] net/failsafe: improve stats accuracy Gaëtan Rivet
  1 sibling, 2 replies; 10+ messages in thread
From: Matan Azrad @ 2017-10-21 20:54 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev
The stats_get API was changed to signal a potential failure to read
stats. Furthermore, some PMDs are able to provide statistics even
after a removal event occurred.
Considering this, the fail-safe can try to access the latest
statistics of a PMD to improve statistics accuracy.
Attempt an ultimate statistics read on removal time; if that
fails, use the latest recorded snapshot.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/failsafe/failsafe_ether.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
V2:
Improve commit message.
Add warning massage when using stats snapshot.
Add time report from last snapshot.
V3:
Separate patch.
Replace "cycles" by "timestamp".
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index f4db423..0282891 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -312,8 +312,16 @@
 static void
 fs_dev_stats_save(struct sub_device *sdev)
 {
+	struct rte_eth_stats stats;
+	int err;
+
+	/* Attempt to read current stats. */
+	err = rte_eth_stats_get(PORT_ID(sdev), &stats);
+	if (err)
+		WARN("Could not access latest statistics from sub-device %d,"
+			 " using latest snapshot.\n", SUB_ID(sdev));
 	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
-			&sdev->stats_snapshot);
+			err ? &sdev->stats_snapshot : &stats);
 	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
 }
 
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 10+ messages in thread
* [dpdk-dev] [PATCH v3 2/2] net/failsafe: add timestamp to stats snapshot
  2017-10-21 20:54   ` [dpdk-dev] [PATCH v3 1/2] " Matan Azrad
@ 2017-10-21 20:54     ` Matan Azrad
  2017-10-23  8:46       ` Gaëtan Rivet
  2017-10-23  8:46     ` [dpdk-dev] [PATCH v3 1/2] net/failsafe: improve stats accuracy Gaëtan Rivet
  1 sibling, 1 reply; 10+ messages in thread
From: Matan Azrad @ 2017-10-21 20:54 UTC (permalink / raw)
  To: Gaetan Rivet; +Cc: dev
Fail-safe attempts to read an ultimate statistics on removal time; if
that fails, it uses the latest recorded snapshot.
This patch adds timestamp for each stats snapshot to allow a time report
since the last snapshot in case of the above failure.
By this way, the user can estimate the stats read accuracy.
Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 drivers/net/failsafe/failsafe_ether.c   | 17 ++++++++++++-----
 drivers/net/failsafe/failsafe_ops.c     | 10 ++++++++--
 drivers/net/failsafe/failsafe_private.h |  7 ++++++-
 3 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 0282891..6cb435f 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -35,6 +35,7 @@
 
 #include <rte_flow.h>
 #include <rte_flow_driver.h>
+#include <rte_cycles.h>
 
 #include "failsafe_private.h"
 
@@ -317,12 +318,18 @@
 
 	/* Attempt to read current stats. */
 	err = rte_eth_stats_get(PORT_ID(sdev), &stats);
-	if (err)
-		WARN("Could not access latest statistics from sub-device %d,"
-			 " using latest snapshot.\n", SUB_ID(sdev));
+	if (err) {
+		uint64_t timestamp = sdev->stats_snapshot.timestamp;
+
+		WARN("Could not access latest statistics from sub-device %d.\n",
+			 SUB_ID(sdev));
+		if (timestamp != 0)
+			WARN("Using latest snapshot taken before %lu seconds.\n",
+				 (rte_rdtsc() - timestamp) / rte_get_tsc_hz());
+	}
 	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
-			err ? &sdev->stats_snapshot : &stats);
-	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
+			err ? &sdev->stats_snapshot.stats : &stats);
+	memset(&sdev->stats_snapshot, 0, sizeof(sdev->stats_snapshot));
 }
 
 static inline int
diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
index d360965..f460551 100644
--- a/drivers/net/failsafe/failsafe_ops.c
+++ b/drivers/net/failsafe/failsafe_ops.c
@@ -38,6 +38,7 @@
 #include <rte_ethdev.h>
 #include <rte_malloc.h>
 #include <rte_flow.h>
+#include <rte_cycles.h>
 
 #include "failsafe_private.h"
 
@@ -592,13 +593,18 @@
 
 	rte_memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
 	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
-		ret = rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot);
+		struct rte_eth_stats *snapshot = &sdev->stats_snapshot.stats;
+		uint64_t *timestamp = &sdev->stats_snapshot.timestamp;
+
+		ret = rte_eth_stats_get(PORT_ID(sdev), snapshot);
 		if (ret) {
 			ERROR("Operation rte_eth_stats_get failed for sub_device %d with error %d",
 				  i, ret);
+			*timestamp = 0;
 			return ret;
 		}
-		failsafe_stats_increment(stats, &sdev->stats_snapshot);
+		*timestamp = rte_rdtsc();
+		failsafe_stats_increment(stats, snapshot);
 	}
 	return 0;
 }
diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
index d343ebf..d81cc3c 100644
--- a/drivers/net/failsafe/failsafe_private.h
+++ b/drivers/net/failsafe/failsafe_private.h
@@ -93,6 +93,11 @@ enum dev_state {
 	DEV_STARTED,
 };
 
+struct fs_stats {
+	struct rte_eth_stats stats;
+	uint64_t timestamp;
+};
+
 struct sub_device {
 	/* Exhaustive DPDK device description */
 	struct rte_devargs devargs;
@@ -103,7 +108,7 @@ struct sub_device {
 	/* Device state machine */
 	enum dev_state state;
 	/* Last stats snapshot passed to user */
-	struct rte_eth_stats stats_snapshot;
+	struct fs_stats stats_snapshot;
 	/* Some device are defined as a command line */
 	char *cmdline;
 	/* fail-safe device backreference */
-- 
1.8.3.1
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/2] net/failsafe: improve stats accuracy
  2017-10-21 20:54   ` [dpdk-dev] [PATCH v3 1/2] " Matan Azrad
  2017-10-21 20:54     ` [dpdk-dev] [PATCH v3 2/2] net/failsafe: add timestamp to stats snapshot Matan Azrad
@ 2017-10-23  8:46     ` Gaëtan Rivet
  2017-10-23 21:01       ` Ferruh Yigit
  1 sibling, 1 reply; 10+ messages in thread
From: Gaëtan Rivet @ 2017-10-23  8:46 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev
Thanks Matan,
On Sat, Oct 21, 2017 at 08:54:45PM +0000, Matan Azrad wrote:
> The stats_get API was changed to signal a potential failure to read
> stats. Furthermore, some PMDs are able to provide statistics even
> after a removal event occurred.
> 
> Considering this, the fail-safe can try to access the latest
> statistics of a PMD to improve statistics accuracy.
> 
> Attempt an ultimate statistics read on removal time; if that
> fails, use the latest recorded snapshot.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
> ---
>  drivers/net/failsafe/failsafe_ether.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> V2:
> Improve commit message.
> Add warning massage when using stats snapshot.
> Add time report from last snapshot.
> 
> V3:
> Separate patch.
> Replace "cycles" by "timestamp".
> 
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index f4db423..0282891 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -312,8 +312,16 @@
>  static void
>  fs_dev_stats_save(struct sub_device *sdev)
>  {
> +	struct rte_eth_stats stats;
> +	int err;
> +
> +	/* Attempt to read current stats. */
> +	err = rte_eth_stats_get(PORT_ID(sdev), &stats);
> +	if (err)
> +		WARN("Could not access latest statistics from sub-device %d,"
> +			 " using latest snapshot.\n", SUB_ID(sdev));
>  	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
> -			&sdev->stats_snapshot);
> +			err ? &sdev->stats_snapshot : &stats);
>  	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
>  }
>  
> -- 
> 1.8.3.1
> 
-- 
Gaëtan Rivet
6WIND
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/2] net/failsafe: add timestamp to stats snapshot
  2017-10-21 20:54     ` [dpdk-dev] [PATCH v3 2/2] net/failsafe: add timestamp to stats snapshot Matan Azrad
@ 2017-10-23  8:46       ` Gaëtan Rivet
  0 siblings, 0 replies; 10+ messages in thread
From: Gaëtan Rivet @ 2017-10-23  8:46 UTC (permalink / raw)
  To: Matan Azrad; +Cc: dev
On Sat, Oct 21, 2017 at 08:54:46PM +0000, Matan Azrad wrote:
> Fail-safe attempts to read an ultimate statistics on removal time; if
> that fails, it uses the latest recorded snapshot.
> 
> This patch adds timestamp for each stats snapshot to allow a time report
> since the last snapshot in case of the above failure.
> 
> By this way, the user can estimate the stats read accuracy.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
> ---
>  drivers/net/failsafe/failsafe_ether.c   | 17 ++++++++++++-----
>  drivers/net/failsafe/failsafe_ops.c     | 10 ++++++++--
>  drivers/net/failsafe/failsafe_private.h |  7 ++++++-
>  3 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
> index 0282891..6cb435f 100644
> --- a/drivers/net/failsafe/failsafe_ether.c
> +++ b/drivers/net/failsafe/failsafe_ether.c
> @@ -35,6 +35,7 @@
>  
>  #include <rte_flow.h>
>  #include <rte_flow_driver.h>
> +#include <rte_cycles.h>
>  
>  #include "failsafe_private.h"
>  
> @@ -317,12 +318,18 @@
>  
>  	/* Attempt to read current stats. */
>  	err = rte_eth_stats_get(PORT_ID(sdev), &stats);
> -	if (err)
> -		WARN("Could not access latest statistics from sub-device %d,"
> -			 " using latest snapshot.\n", SUB_ID(sdev));
> +	if (err) {
> +		uint64_t timestamp = sdev->stats_snapshot.timestamp;
> +
> +		WARN("Could not access latest statistics from sub-device %d.\n",
> +			 SUB_ID(sdev));
> +		if (timestamp != 0)
> +			WARN("Using latest snapshot taken before %lu seconds.\n",
> +				 (rte_rdtsc() - timestamp) / rte_get_tsc_hz());
> +	}
>  	failsafe_stats_increment(&PRIV(sdev->fs_dev)->stats_accumulator,
> -			err ? &sdev->stats_snapshot : &stats);
> -	memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats));
> +			err ? &sdev->stats_snapshot.stats : &stats);
> +	memset(&sdev->stats_snapshot, 0, sizeof(sdev->stats_snapshot));
>  }
>  
>  static inline int
> diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> index d360965..f460551 100644
> --- a/drivers/net/failsafe/failsafe_ops.c
> +++ b/drivers/net/failsafe/failsafe_ops.c
> @@ -38,6 +38,7 @@
>  #include <rte_ethdev.h>
>  #include <rte_malloc.h>
>  #include <rte_flow.h>
> +#include <rte_cycles.h>
>  
>  #include "failsafe_private.h"
>  
> @@ -592,13 +593,18 @@
>  
>  	rte_memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats));
>  	FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) {
> -		ret = rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot);
> +		struct rte_eth_stats *snapshot = &sdev->stats_snapshot.stats;
> +		uint64_t *timestamp = &sdev->stats_snapshot.timestamp;
> +
> +		ret = rte_eth_stats_get(PORT_ID(sdev), snapshot);
>  		if (ret) {
>  			ERROR("Operation rte_eth_stats_get failed for sub_device %d with error %d",
>  				  i, ret);
> +			*timestamp = 0;
>  			return ret;
>  		}
> -		failsafe_stats_increment(stats, &sdev->stats_snapshot);
> +		*timestamp = rte_rdtsc();
> +		failsafe_stats_increment(stats, snapshot);
>  	}
>  	return 0;
>  }
> diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
> index d343ebf..d81cc3c 100644
> --- a/drivers/net/failsafe/failsafe_private.h
> +++ b/drivers/net/failsafe/failsafe_private.h
> @@ -93,6 +93,11 @@ enum dev_state {
>  	DEV_STARTED,
>  };
>  
> +struct fs_stats {
> +	struct rte_eth_stats stats;
> +	uint64_t timestamp;
> +};
> +
>  struct sub_device {
>  	/* Exhaustive DPDK device description */
>  	struct rte_devargs devargs;
> @@ -103,7 +108,7 @@ struct sub_device {
>  	/* Device state machine */
>  	enum dev_state state;
>  	/* Last stats snapshot passed to user */
> -	struct rte_eth_stats stats_snapshot;
> +	struct fs_stats stats_snapshot;
>  	/* Some device are defined as a command line */
>  	char *cmdline;
>  	/* fail-safe device backreference */
> -- 
> 1.8.3.1
> 
-- 
Gaëtan Rivet
6WIND
^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [dpdk-dev] [PATCH v3 1/2] net/failsafe: improve stats accuracy
  2017-10-23  8:46     ` [dpdk-dev] [PATCH v3 1/2] net/failsafe: improve stats accuracy Gaëtan Rivet
@ 2017-10-23 21:01       ` Ferruh Yigit
  0 siblings, 0 replies; 10+ messages in thread
From: Ferruh Yigit @ 2017-10-23 21:01 UTC (permalink / raw)
  To: Gaëtan Rivet, Matan Azrad; +Cc: dev
On 10/23/2017 1:46 AM, Gaëtan Rivet wrote:
> Thanks Matan,
> 
> On Sat, Oct 21, 2017 at 08:54:45PM +0000, Matan Azrad wrote:
>> The stats_get API was changed to signal a potential failure to read
>> stats. Furthermore, some PMDs are able to provide statistics even
>> after a removal event occurred.
>>
>> Considering this, the fail-safe can try to access the latest
>> statistics of a PMD to improve statistics accuracy.
>>
>> Attempt an ultimate statistics read on removal time; if that
>> fails, use the latest recorded snapshot.
>>
>> Signed-off-by: Matan Azrad <matan@mellanox.com>
> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Series applied to dpdk-next-net/master, thanks.
(Fixed build error for 32bits [1] while applying, please check.)
[1]
.../dpdk/drivers/net/failsafe/failsafe_ether.c: In function ‘fs_dev_stats_save’:
.../dpdk/drivers/net/failsafe/failsafe_ether.c:328:50: error: format ‘%lu’
expects argument of type ‘long unsigned int’, but argument 4 has type ‘uint64_t
{aka long long unsigned int}’ [-Werror=format=]
      (rte_rdtsc() - timestamp) / rte_get_tsc_hz());
                                                  ^
.../dpdk/i686-native-linuxapp-gcc/include/rte_log.h:345:25: note: in definition
of macro ‘RTE_LOG’
    RTE_LOGTYPE_ ## t, # t ": " __VA_ARGS__)
                         ^
.../dpdk/drivers/net/failsafe/failsafe_private.h:299:26: note: in expansion of
macro ‘LOG__’
 #define LOG_(level, ...) LOG__(level, __VA_ARGS__, '\n')
                          ^~~~~
.../dpdk/drivers/net/failsafe/failsafe_private.h:302:19: note: in expansion of
macro ‘LOG_’
 #define WARN(...) LOG_(WARNING, __VA_ARGS__)
                   ^~~~
.../dpdk/drivers/net/failsafe/failsafe_ether.c:327:4: note: in expansion of
macro ‘WARN’
    WARN("Using latest snapshot taken before %lu seconds.\n",
    ^~~~
^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-10-23 21:02 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-16  7:41 [dpdk-dev] [PATCH] net/failsafe: improve stats accuracy Matan Azrad
2017-10-16  8:27 ` Gaëtan Rivet
2017-10-16  8:51   ` Matan Azrad
2017-10-19 14:31 ` [dpdk-dev] [PATCH v2] " Matan Azrad
2017-10-19 15:08   ` Gaëtan Rivet
2017-10-21 20:54   ` [dpdk-dev] [PATCH v3 1/2] " Matan Azrad
2017-10-21 20:54     ` [dpdk-dev] [PATCH v3 2/2] net/failsafe: add timestamp to stats snapshot Matan Azrad
2017-10-23  8:46       ` Gaëtan Rivet
2017-10-23  8:46     ` [dpdk-dev] [PATCH v3 1/2] net/failsafe: improve stats accuracy Gaëtan Rivet
2017-10-23 21:01       ` Ferruh Yigit
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).