From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wr0-f176.google.com (mail-wr0-f176.google.com [209.85.128.176]) by dpdk.org (Postfix) with ESMTP id ADDAE293B for ; Tue, 5 Sep 2017 18:34:32 +0200 (CEST) Received: by mail-wr0-f176.google.com with SMTP id a47so8117166wra.1 for ; Tue, 05 Sep 2017 09:34:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=KOYxVX7Q2aTLw69XCE8fhKhSXHmvSfMGHjPuEzZ9PTQ=; b=hGNOptFcUWSv+sjlUHdJL7yLxL2XvdPqFcGiCI3aMYugZcwTCgCJJJjJNsnoy6reb+ Y9JItdykSiVA8rt+ZycVZCDI2pkY3q3wyyVlCTmcC4E9peNytbhCsE0wewuQrRxDalpb JIokRHJ2p576agggZJkuXJc7dO0RAnimkHc5rsK59B2tp2JMlfzqy3EnHRZb3jxrUlUF pDqUKxKgl/bpzzKcB/xrKvue3hxH3yRPIFnca8Qv5270hfum0INx8KXln7y4iTkMANyT yJxZGMC5MmOhZRCJO8TXehDhlYoidMUD1ABYtkecxTC2UQzqcG38wSiDFYZ9sd4OVdJF r44w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=KOYxVX7Q2aTLw69XCE8fhKhSXHmvSfMGHjPuEzZ9PTQ=; b=rgyAnE2aNhrnfpd69JQKuhc45EfnEvgj05wNuj30bBf0zTAnZgSr8Xw1D504PXKgk+ BIgk/IRzkHUWtFfBBfRCZAWs6vK3RCAGj9aqy2LgNKsfM1fqjcax3wy30bUt7UaUIzKl +bqTvAWX35FTEZGuQ7zidkvmCKQ5h+3yT2M3yjDArwGSu8wjEI+YPr9RtceNKJmNbR8C lvpyUwwmeuiPCSsBevpX3GDcuBgRuB7fMOAh3+2l51+a4aq7n+h78+U7fYeD6LcbkRv0 UYntzFN+mbAhA9B3Y/F0PVpEb8tiS2jZB1r7ZcjvBMszVXFjDLSNL1NgBv5BeF11Bhf5 2rdQ== X-Gm-Message-State: AHPjjUghPKwRNAjPK4Lg2JecmFq7oPbkhMezreCwYE8Ia9BGaXP1olf/ CAhJr3IAy9wxhP1Y X-Google-Smtp-Source: ADKCNb7w/+3WIrZUkt2YO5+VPbKC0qj+mSgNK9+qpuN8lIaiSNCaVSCTnznf2IopU2KAxyEjL0bN6w== X-Received: by 10.223.171.206 with SMTP id s72mr2575857wrc.27.1504629272310; Tue, 05 Sep 2017 09:34:32 -0700 (PDT) Received: from bidouze.vm.6wind.com (host.78.145.23.62.rev.coltfrance.com. [62.23.145.78]) by smtp.gmail.com with ESMTPSA id z51sm1295691wrz.80.2017.09.05.09.34.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 05 Sep 2017 09:34:31 -0700 (PDT) Date: Tue, 5 Sep 2017 18:34:21 +0200 From: =?iso-8859-1?Q?Ga=EBtan?= Rivet To: Matan Azrad Cc: "dev@dpdk.org" Message-ID: <20170905163421.GJ21444@bidouze.vm.6wind.com> References: <1504605394-6450-1-git-send-email-matan@mellanox.com> <20170905132244.GF21444@bidouze.vm.6wind.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [dpdk-dev] [PATCH] net/failsafe: stat support enhancement X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Sep 2017 16:34:32 -0000 On Tue, Sep 05, 2017 at 03:12:55PM +0000, Matan Azrad wrote: > Hi Gaetan, > > > -----Original Message----- > > From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com] > > Sent: Tuesday, September 5, 2017 4:23 PM > > To: Matan Azrad > > Cc: dev@dpdk.org > > Subject: Re: [PATCH] net/failsafe: stat support enhancement > > > > Hi Matan, > > > > On Tue, Sep 05, 2017 at 12:56:34PM +0300, Matan Azrad wrote: > > > The previous stats code returned only the current TX sub device stats. > > > > > > This enhancement extends it to return the sum of all sub devices stats > > > with history of removed sub-devices. > > > > > > Dedicated stats accumulator saves the stat history of all sub device > > > remove events. > > > > > > Each failsafe sub device contains the last stats asked by the user and > > > updates the accumulator in removal time. > > > > > > I would like to implement ultimate snapshot on removal time. > > > The stats_get API needs to be changed to return error in the case it > > > is too late to retrieve statistics. > > > > You need to have this API evolution first. It is not assured to be accepted by > > the community. > > > > I already sent suggestion to mailing list, find "stats_get API: return value suggestion": > No objections for now. > > > > As it is, this version is incorrect, because the only available stats for a > > removed slave is the last snapshot. > > > > It is for sure better from the previous version, the accuracy can be better if > my suggestion will be accepted. > If it is, I will improve it. > > > Thus it complicates the rules while still being incorrect. Even if you were able > > to push for this API evolution, PMDs with hard counters would be the ones > > to return an error code on stat read while removed. > > You may be lucky at the moment because MLX drivers do not support hard > > counters, but this won't always be true (and it will be false soon enough). > > You will thus be back at square one, with a new useless API and still incorrect > > stats in the fail-safe. On the other hand, the fail-safe should strive to be as > > easy to use as possible with most drivers, and not cater specifically to soft- > > counters ones. > > > > So, my take on this: I understand that aggregated stats would be interesting. > > Keep it simple & stupid: simple aggregation on stats_get. > > You will have a rollback at some point on device removal, but this is still easier > > to detect / deal with than partially / sometimes incorrect stats. > > > > 100% accuracy is impossible when removal event is in the game, > but we can do it better as we can. > > As I wrote in commit log(if stat gets will return value): > When some drivers can give the stats after RMV interrupt and before dev_close, > We just can to improve accuracy and be closer to 100%. > The stats of the sub devices which return error will be taken from the last snapshot. > > It is just enhancement which is important for customers. > The trade-off is between accuracy and complexity, both at the fail-safe level and at the application level. Your solution | KISS solution -------------------------+-------------------------------- Accuracy Closer to reality. | Farther from reality. Upon device removal, | Upon device removal, only packets received | all previous accounted received since the last stats_get | packets are missing, inducing are missing. | a stats rollback from the app | PoV. -------------------------+-------------------------------- Complexity Medium implementation | Simple implementation. (fast-path) complexity. | Two hotplug sequences will be The accumulator induces | identical (from a stats PoV). a trailing effect | between device remove / | plugin, thus incurring | edge-effects and | reducing idempotency of | hotplug sequences. | -------------------------+-------------------------------- Complexity Statistical errors | Glaring statistical errors (application) induced by device | induced by device removal. removal is hard. | Detecting those (stats rollback) Device removal is | is trivial and can thus be pretty much invisible. | mitigated. | Plays nice with the | While easier to deal with, "seamless" feature of | it actually pushes application fail-safe, but might be | to have custom code-path for a curse in disguise. | the fail-safe PMD, which goes | against its design principles. I consider that your proposed API will only somewhat fix errors with PMDs having software counters. Thus it being accepted or rejected is irrelevant for this assessment. While I strongly support the simplest solution for any implementation, I dislike going against the fail-safe core principles. This is a very flimsy sheet of ice we are walking upon. Keep in mind that at the first complication due to your proposal, it will be reverted and the KISS solution used instead. But as it is, I thus accept it. I will do a proper review (there are a few things to fix still, now that the `why` is dealt with) soon. > > > By this way, failsafe can get stats snapshot in removal interrupt > > > callback for each PMD which can give stats after removal event. > > > > > > Signed-off-by: Matan Azrad > > > --- > > > drivers/net/failsafe/failsafe_ether.c | 33 > > +++++++++++++++++++++++++++++++++ > > > drivers/net/failsafe/failsafe_ops.c | 16 ++++++++++++---- > > > drivers/net/failsafe/failsafe_private.h | 5 +++++ > > > 3 files changed, 50 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/net/failsafe/failsafe_ether.c > > > b/drivers/net/failsafe/failsafe_ether.c > > > index a3a8cce..133080d 100644 > > > --- a/drivers/net/failsafe/failsafe_ether.c > > > +++ b/drivers/net/failsafe/failsafe_ether.c > > > @@ -399,6 +399,37 @@ failsafe_eth_dev_state_sync(struct rte_eth_dev > > *dev) > > > return ret; > > > } > > > > > > +void > > > +fs_increment_stats(struct rte_eth_stats *from, struct rte_eth_stats > > > +*to) { > > > + uint8_t i; > > > + > > > + RTE_ASSERT(from != NULL && to != NULL); > > > + to->ipackets += from->ipackets; > > > + to->opackets += from->opackets; > > > + to->ibytes += from->ibytes; > > > + to->obytes += from->obytes; > > > + to->imissed += from->imissed; > > > + to->ierrors += from->ierrors; > > > + to->oerrors += from->oerrors; > > > + to->rx_nombuf += from->rx_nombuf; > > > + for (i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS; i++) { > > > + to->q_ipackets[i] += from->q_ipackets[i]; > > > + to->q_opackets[i] += from->q_opackets[i]; > > > + to->q_ibytes[i] += from->q_ibytes[i]; > > > + to->q_obytes[i] += from->q_obytes[i]; > > > + to->q_errors[i] += from->q_errors[i]; > > > + } > > > +} > > > + > > > +void > > > +fs_increment_stats_accumulator(struct sub_device *sdev) { > > > + fs_increment_stats(&sdev->stats_snapshot, > > > + &PRIV(sdev->fs_dev)->stats_accumulator); > > > + memset(&sdev->stats_snapshot, 0, sizeof(struct rte_eth_stats)); } > > > + > > > int > > > failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused, > > > enum rte_eth_event_type event > > __rte_unused, @@ -410,6 +441,8 @@ > > > failsafe_eth_rmv_event_callback(uint8_t port_id __rte_unused, > > > fs_switch_dev(sdev->fs_dev, sdev); > > > /* Use safe bursts in any case. */ > > > set_burst_fn(sdev->fs_dev, 1); > > > + /* Increment the stats accumulator by the last stats snapshot. */ > > > + fs_increment_stats_accumulator(sdev); > > > /* > > > * Async removal, the sub-PMD will try to unregister > > > * the callback at the source of the current thread context. > > > diff --git a/drivers/net/failsafe/failsafe_ops.c > > > b/drivers/net/failsafe/failsafe_ops.c > > > index ff9ad15..e47cc85 100644 > > > --- a/drivers/net/failsafe/failsafe_ops.c > > > +++ b/drivers/net/failsafe/failsafe_ops.c > > > @@ -586,9 +586,14 @@ static void > > > fs_stats_get(struct rte_eth_dev *dev, > > > struct rte_eth_stats *stats) > > > { > > > - if (TX_SUBDEV(dev) == NULL) > > > - return; > > > - rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats); > > > + struct sub_device *sdev; > > > + uint8_t i; > > > + > > > + memcpy(stats, &PRIV(dev)->stats_accumulator, sizeof(*stats)); > > > + FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) { > > > + rte_eth_stats_get(PORT_ID(sdev), &sdev->stats_snapshot); > > > + fs_increment_stats(&sdev->stats_snapshot, stats); > > > + } > > > } > > > > > > static void > > > @@ -597,8 +602,11 @@ fs_stats_reset(struct rte_eth_dev *dev) > > > struct sub_device *sdev; > > > uint8_t i; > > > > > > - FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) > > > + FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_ACTIVE) { > > > rte_eth_stats_reset(PORT_ID(sdev)); > > > + memset(&sdev->stats_snapshot, 0, sizeof(struct > > rte_eth_stats)); > > > + } > > > + memset(&PRIV(dev)->stats_accumulator, 0, sizeof(struct > > > +rte_eth_stats)); > > > } > > > > > > /** > > > diff --git a/drivers/net/failsafe/failsafe_private.h > > > b/drivers/net/failsafe/failsafe_private.h > > > index 0361cf4..267c749 100644 > > > --- a/drivers/net/failsafe/failsafe_private.h > > > +++ b/drivers/net/failsafe/failsafe_private.h > > > @@ -102,6 +102,8 @@ struct sub_device { > > > uint8_t sid; > > > /* Device state machine */ > > > enum dev_state state; > > > + /* Last stats snapshot passed to user */ > > > + struct rte_eth_stats stats_snapshot; > > > /* Some device are defined as a command line */ > > > char *cmdline; > > > /* fail-safe device backreference */ @@ -140,6 +142,7 @@ struct > > > fs_priv { > > > * synchronized state. > > > */ > > > enum dev_state state; > > > + struct rte_eth_stats stats_accumulator; > > > unsigned int pending_alarm:1; /* An alarm is pending */ > > > /* flow isolation state */ > > > int flow_isolated:1; > > > @@ -180,6 +183,8 @@ int failsafe_eal_uninit(struct rte_eth_dev *dev); > > > > > > int failsafe_eth_dev_state_sync(struct rte_eth_dev *dev); void > > > failsafe_dev_remove(struct rte_eth_dev *dev); > > > +void fs_increment_stats(struct rte_eth_stats *from, struct > > > +rte_eth_stats *to); void fs_increment_stats_accumulator(struct > > > +sub_device *sdev); > > > int failsafe_eth_rmv_event_callback(uint8_t port_id, > > > enum rte_eth_event_type type, > > > void *arg, void *out); > > > -- > > > 2.7.4 > > > > > > > -- > > Gaëtan Rivet > > 6WIND -- Gaëtan Rivet 6WIND