From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk1-f174.google.com (mail-vk1-f174.google.com [209.85.221.174]) by dpdk.org (Postfix) with ESMTP id 630721B4B7 for ; Fri, 15 Feb 2019 18:32:41 +0100 (CET) Received: by mail-vk1-f174.google.com with SMTP id h128so2402235vkg.11 for ; Fri, 15 Feb 2019 09:32:41 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=bnkkiCGAXkDeXQ0K52plhP8uACxqYcCG0hRFCitu7b0=; b=UcVZZXUw5LsGzUZvFEnao80sFt/DMppjkoUmTdAu3zDWplHxh0rK7BK127O0wKtBiG /qH48Q7FgRR0A3pYVNoWgMDH0jM9D6nM3BLeH5GXeoeG+8zAVszbWz6nOaHQ2FTD03p+ wJmiPMDJ4gdsMEQmeg3287mUO110UiW5Trzh4riPmgxLaOyjiwlej1mVYC0i6rJOVONe OfjjjBk0V3Lyc3ocMsyMDiYCnS2hkWsLwNncwqRLaA6VzK8uHrnxa7bfglv+EHkhn6rq 8FE6G5I2Rr4AxhMYhCC6nuFsyIaXmrDvSr78bO8E+Wm1MX/iGMOmXK3QBJnpXHEwq95h YqOQ== X-Gm-Message-State: AHQUAuYiG4hus3Bm3tK/cqzC77oV3MWg/DE9GIJ5melK/A/CCWo5LZ0o 8DeGWtR09TU/HyvgqiO5nuGBHawpUDRdHbAlB1c/4g== X-Google-Smtp-Source: AHgI3IZNT46qnpn3atdxfsij737UVp8vFYXf4bRO4ZIp2Ww60oFswoVIZPnL6MQGZGX0Gy0LCZs8WukmIZ3fawgbSss= X-Received: by 2002:a1f:7f10:: with SMTP id o16mr5398920vki.31.1550251960376; Fri, 15 Feb 2019 09:32:40 -0800 (PST) MIME-Version: 1.0 References: <1550158972-21895-1-git-send-email-david.marchand@redhat.com> <20190215140526.GB790616@bricha3-MOBL.ger.corp.intel.com> <1551030.Hp446SNAY3@xps> In-Reply-To: <1551030.Hp446SNAY3@xps> From: David Marchand Date: Fri, 15 Feb 2019 18:32:28 +0100 Message-ID: To: Thomas Monjalon Cc: Bruce Richardson , dev@dpdk.org, Wenzhuo Lu , Jingjing Wu , bernard.iremonger@intel.com, Maxime Coquelin , "Yigit, Ferruh" , Andrew Rybchenko , keith.wiles@intel.com Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Subject: Re: [dpdk-dev] [PATCH 3/5] app/testpmd: add missing transmit errors stats X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Feb 2019 17:32:43 -0000 On Fri, Feb 15, 2019 at 5:19 PM Thomas Monjalon wrote: > 15/02/2019 16:04, David Marchand: > > On Fri, Feb 15, 2019 at 3:05 PM Bruce Richardson < > bruce.richardson@intel.com> > > wrote: > > > > > On Fri, Feb 15, 2019 at 10:33:47AM +0100, David Marchand wrote: > > > > On Fri, Feb 15, 2019 at 9:58 AM Thomas Monjalon > > > > <[1]thomas@monjalon.net> wrote: > > > > > > > > 14/02/2019 19:51, David Marchand: > > > > > What is the purpose of oerrors ? > > > > > > > > > > Since the drivers (via rte_eth_tx_burst return value) report > the > > > > numbers of > > > > > packets successfully transmitted, the application can try to > > > > retransmit the > > > > > packets that did not make it and counts this. > > > > > If the driver counts such "missed" packets, then it does the > job > > > > the > > > > > application will do anyway (wasting some cycles). > > > > > But what is more a problem is that the application does not > know > > > > if the > > > > > packets in oerrors are its own retries or problems that the > driver > > > > can not > > > > > detect (hw problems) but the hw can. > > > > > > > > > > So the best option is that oerrors does not report the > packets the > > > > driver > > > > > refuses (and I can see some drivers that do not comply to > this) > > > > but only > > > > > "external" errors from the driver pov. > > > > I can see the benefit of having driver errors in the stats, > > > > so it is generically stored for later analysis or print. > > > > It could be managed at ethdev level instead of the application > > > > doing the computation. > > > > What about splitting the Tx errors in 2 fields? oerrors / ofull > ? > > > > Who said it's awful? sorry Bruce for anticipating ;) > > > > > > > > Summary, correct me if we are not aligned :-) > > > > - ofull (maybe ofifoerrors?) is actually a count of SW failed > > > transmits > > > > - it would be handled in rte_eth_tx_burst() itself in a generic > way > > > > - the drivers do not need to track such SW failed transmits > > > > - oerrors only counts packets HW failed transmits, dropped out of > the > > > > driver tx_pkt_burst() knowledge > > > > - the application does not have to track SW failed transmits > since the > > > > stats is in ethdev > > > > It sounds good to me, this means an ethdev abi breakage. > > > > > > Hang on, why do we need ethdev to track this at all, given that it's > > > trivial for apps to track this themselves. Would we not be better just > to > > > add this tracking into testpmd and leave ethdev and drivers alone? > Perhaps > > > I'm missing something? > > > > > > > This was my first intention but Thomas hopped in ;-) > > I was just opening the discussion :) > > > testpmd does it already via the fs->fwd_dropped stats and ovs has its > > tx_dropped stat. > > > > The problem is that all drivers have different approach about this. > > Some drivers only count real hw errors in oerrors. > > But others count the packets it can't send in oerrors (+ there are some > > cases that seem buggy to me where the driver will always refuse the mbufs > > for reason X and the application can retry indefinitely to send...). > > We have 3 options: > 1/ status quo = oerrors is inconsistent across drivers > 2/ API break = oerrors stop being incremented for temporary > unavailability (i.e. queue full, kind of ERETRY), > report only packets which will be never sent, > may be a small performance gain for some drivers > 3/ API + ABI break = same as 2/ + > report ERETRY errors in ofull (same as tx_burst() delta) > > Note that the option 2 is a light API break which does not require > any deprecation notice because the original definition of oerrors > is really vague: "failed transmitted packets" > By changing the definition of errors to "packets lost", we can count > HW errors + packets not matching requirements. > As David suggests, the packets not matching requirements can be freed > as it is done for packets successfully transmitted to the HW. > We need also to update the definition of the return value of > rte_eth_tx_burst(): "packets actually stored in transmit descriptors". > We should also count the bad packets rejected by the driver. > Then the number of bad packets would be the difference between > the return value of rte_eth_tx_burst() and opackets counter. > This solution is fixing some bugs and enforce a consistent behaviour. > I am also for option 2 especially because of this. A driver that refuses a packet for reason X (which is a limitation, or an incorrect config or whatever that is not a transient condition) but gives it back to the application is a bad driver. > The option 3 is breaking the ABI and may degrade the performances. > The only benefit is convenience or semantic: ofull would be the > equivalent of imissed. > The application can count the same by making the difference between > the burst size and the return of rte_eth_tx_burst. > > My vote is for the option 2. > -- David Marchand