DPDK patches and discussions
 help / color / mirror / Atom feed
From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: Matan Azrad <matan@mellanox.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 4/8] net/mlx4: optimize Tx multi-segment case
Date: Wed, 6 Dec 2017 12:55:31 +0100	[thread overview]
Message-ID: <20171206115531.GA4062@6wind.com> (raw)
In-Reply-To: <HE1PR0502MB3659DB14EFE589B5FAF075AED2320@HE1PR0502MB3659.eurprd05.prod.outlook.com>

On Wed, Dec 06, 2017 at 11:29:38AM +0000, Matan Azrad wrote:
> Hi Adrien
> 
> > -----Original Message-----
> > From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> > Sent: Wednesday, December 6, 2017 12:59 PM
> > To: Matan Azrad <matan@mellanox.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH 4/8] net/mlx4: optimize Tx multi-segment case
> > 
> > On Tue, Nov 28, 2017 at 12:19:26PM +0000, Matan Azrad wrote:
> > > mlx4 Tx block can handle up to 4 data segments or control segment + up
> > > to 3 data segments. The first data segment in each not first Tx block
> > > must validate Tx queue wraparound and must use IO memory barrier
> > > before writing the byte count.
> > >
> > > The previous multi-segment code used "for" loop to iterate over all
> > > packet segments and separated first Tx block data case by "if"
> > > statments.
> > 
> > statments => statements
> > 
> > >
> > > Use switch case and unconditional branches instead of "for" loop can
> > > optimize the case and prevents the unnecessary checks for each data
> > > segment; This hints to compiler to create opitimized jump table.
> > 
> > opitimized => optimized
> > 
> > >
> > > Optimize this case by switch case and unconditional branches usage.
> > >
> > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > ---
> > >  drivers/net/mlx4/mlx4_rxtx.c | 165
> > > +++++++++++++++++++++++++------------------
> > >  drivers/net/mlx4/mlx4_rxtx.h |  33 +++++++++
> > >  2 files changed, 128 insertions(+), 70 deletions(-)
> > >
> > > diff --git a/drivers/net/mlx4/mlx4_rxtx.c
<snip>
> > > +	/*
> > > +	 * Fill the data segments with buffer information.
> > > +	 * First WQE TXBB head segment is always control segment,
> > > +	 * so jump to tail TXBB data segments code for the first
> > > +	 * WQE data segments filling.
> > > +	 */
> > > +	goto txbb_tail_segs;
> > > +txbb_head_seg:
> > 
> > I'm not fundamentally opposed to "goto" unlike a lot of people out there,
> > but this doesn't look good. It's OK to use goto for error cases and to extricate
> > yourself when trapped in an inner loop, also in some optimization scenarios
> > where it sometimes make sense, but not when the same can be achieved
> > through standard loop constructs and keywords.
> > 
> > In this case I'm under the impression you could have managed with a do { ... }
> > while (...) construct. You need to try harder to reorganize these changes or
> > prove it can't be done without negatively impacting performance.
> > 
> > Doing so should make this patch shorter as well.
> > 
> 
> I noticed this could be done with loop and without unconditional branches, but I checked it and found nice performance improvement using that way.
> When I used the loop I degraded performance.  

OK, you can leave it as is in the meantime then, we'll keep that in mind for
the next refactoring iteration.

<snip>
> > > a/drivers/net/mlx4/mlx4_rxtx.h b/drivers/net/mlx4/mlx4_rxtx.h index
> > > 463df2b..8207232 100644
> > > --- a/drivers/net/mlx4/mlx4_rxtx.h
> > > +++ b/drivers/net/mlx4/mlx4_rxtx.h
> > > @@ -212,4 +212,37 @@ int mlx4_tx_queue_setup(struct rte_eth_dev
> > *dev, uint16_t idx,
> > >  	return mlx4_txq_add_mr(txq, mp, i);
> > >  }
> > >
> > > +/**
> > > + * Write Tx data segment to the SQ.
> > > + *
> > > + * @param dseg
> > > + *   Pointer to data segment in SQ.
> > > + * @param lkey
> > > + *   Memory region lkey.
> > > + * @param addr
> > > + *   Data address.
> > > + * @param byte_count
> > > + *   Big Endian bytes count of the data to send.
> > 
> > Big Endian => Big endian
> > 
> > How about using the dedicated type to properly document it?
> > See rte_be32_t from rte_byteorder.h.
> > 
> Learned new something, thanks, will check it.
> 
> > > + */
> > > +static inline void
> > > +mlx4_fill_tx_data_seg(volatile struct mlx4_wqe_data_seg *dseg,
> > > +		       uint32_t lkey, uintptr_t addr, uint32_t byte_count) {
> > > +	dseg->addr = rte_cpu_to_be_64(addr);
> > > +	dseg->lkey = rte_cpu_to_be_32(lkey); #if RTE_CACHE_LINE_SIZE <
> > 64
> > > +	/*
> > > +	 * Need a barrier here before writing the byte_count
> > > +	 * fields to make sure that all the data is visible
> > > +	 * before the byte_count field is set.
> > > +	 * Otherwise, if the segment begins a new cacheline,
> > > +	 * the HCA prefetcher could grab the 64-byte chunk and
> > > +	 * get a valid (!= 0xffffffff) byte count but stale
> > > +	 * data, and end up sending the wrong data.
> > > +	 */
> > > +	rte_io_wmb();
> > > +#endif /* RTE_CACHE_LINE_SIZE */
> > > +	dseg->byte_count = byte_count;
> > > +}
> > > +
> > 
> > No need to expose this function in a header file. Note that rte_cpu_*() and
> > rte_io*() require the inclusion of rte_byteorder.h and rte_atomic.h
> > respectively.
> > 
> 
> Shouldn't inline functions be in header files?

Not necessarily, like other function types actually. They have to be
declared where needed, in a header file only if several files depend on
them, otherwise they bring namespace pollution for no apparent reason.

-- 
Adrien Mazarguil
6WIND

  reply	other threads:[~2017-12-06 11:55 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-28 12:19 [dpdk-dev] [PATCH 0/8] improve mlx4 Tx performance Matan Azrad
2017-11-28 12:19 ` [dpdk-dev] [PATCH 1/8] net/mlx4: fix Tx packet drop application report Matan Azrad
2017-12-06 10:57   ` Adrien Mazarguil
2017-11-28 12:19 ` [dpdk-dev] [PATCH 2/8] net/mlx4: remove unnecessary Tx wraparound checks Matan Azrad
2017-12-06 10:57   ` Adrien Mazarguil
2017-11-28 12:19 ` [dpdk-dev] [PATCH 3/8] net/mlx4: remove restamping from Tx error path Matan Azrad
2017-12-06 10:58   ` Adrien Mazarguil
2017-11-28 12:19 ` [dpdk-dev] [PATCH 4/8] net/mlx4: optimize Tx multi-segment case Matan Azrad
2017-12-06 10:58   ` Adrien Mazarguil
2017-12-06 11:29     ` Matan Azrad
2017-12-06 11:55       ` Adrien Mazarguil [this message]
2017-11-28 12:19 ` [dpdk-dev] [PATCH 5/8] net/mlx4: merge Tx queue rings management Matan Azrad
2017-12-06 10:58   ` Adrien Mazarguil
2017-12-06 11:43     ` Matan Azrad
2017-12-06 12:09       ` Adrien Mazarguil
2017-11-28 12:19 ` [dpdk-dev] [PATCH 6/8] net/mlx4: mitigate Tx send entry size calculations Matan Azrad
2017-12-06 10:59   ` Adrien Mazarguil
2017-11-28 12:19 ` [dpdk-dev] [PATCH 7/8] net/mlx4: align Tx descriptors number Matan Azrad
2017-12-06 10:59   ` Adrien Mazarguil
2017-12-06 11:44     ` Matan Azrad
2017-11-28 12:19 ` [dpdk-dev] [PATCH 8/8] net/mlx4: remove Tx completion elements counter Matan Azrad
2017-12-06 10:59   ` Adrien Mazarguil
2017-12-06 14:48 ` [dpdk-dev] [PATCH v2 0/8] improve mlx4 Tx performance Matan Azrad
2017-12-06 14:48   ` [dpdk-dev] [PATCH v2 1/8] net/mlx4: fix Tx packet drop application report Matan Azrad
2017-12-06 14:48   ` [dpdk-dev] [PATCH v2 2/8] net/mlx4: remove unnecessary Tx wraparound checks Matan Azrad
2017-12-06 14:48   ` [dpdk-dev] [PATCH v2 3/8] net/mlx4: remove restamping from Tx error path Matan Azrad
2017-12-06 14:48   ` [dpdk-dev] [PATCH v2 4/8] net/mlx4: optimize Tx multi-segment case Matan Azrad
2017-12-06 16:22     ` Adrien Mazarguil
2017-12-06 14:48   ` [dpdk-dev] [PATCH v2 5/8] net/mlx4: merge Tx queue rings management Matan Azrad
2017-12-06 16:22     ` Adrien Mazarguil
2017-12-06 14:48   ` [dpdk-dev] [PATCH v2 6/8] net/mlx4: mitigate Tx send entry size calculations Matan Azrad
2017-12-06 14:48   ` [dpdk-dev] [PATCH v2 7/8] net/mlx4: align Tx descriptors number Matan Azrad
2017-12-06 16:22     ` Adrien Mazarguil
2017-12-06 17:24       ` Matan Azrad
2017-12-06 14:48   ` [dpdk-dev] [PATCH v2 8/8] net/mlx4: remove Tx completion elements counter Matan Azrad
2017-12-06 16:22     ` Adrien Mazarguil
2017-12-06 17:57   ` [dpdk-dev] [PATCH v3 0/8] improve mlx4 Tx performance Matan Azrad
2017-12-06 17:57     ` [dpdk-dev] [PATCH v3 1/8] net/mlx4: fix Tx packet drop application report Matan Azrad
2017-12-06 17:57     ` [dpdk-dev] [PATCH v3 2/8] net/mlx4: remove unnecessary Tx wraparound checks Matan Azrad
2017-12-06 17:57     ` [dpdk-dev] [PATCH v3 3/8] net/mlx4: remove restamping from Tx error path Matan Azrad
2017-12-06 17:57     ` [dpdk-dev] [PATCH v3 4/8] net/mlx4: optimize Tx multi-segment case Matan Azrad
2017-12-06 17:57     ` [dpdk-dev] [PATCH v3 5/8] net/mlx4: merge Tx queue rings management Matan Azrad
2017-12-06 17:57     ` [dpdk-dev] [PATCH v3 6/8] net/mlx4: mitigate Tx send entry size calculations Matan Azrad
2017-12-06 17:57     ` [dpdk-dev] [PATCH v3 7/8] net/mlx4: align Tx descriptors number Matan Azrad
2017-12-06 17:57     ` [dpdk-dev] [PATCH v3 8/8] net/mlx4: remove Tx completion elements counter Matan Azrad
2017-12-07 10:56     ` [dpdk-dev] [PATCH v3 0/8] improve mlx4 Tx performance Adrien Mazarguil
2017-12-10 10:22       ` Shahaf Shuler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171206115531.GA4062@6wind.com \
    --to=adrien.mazarguil@6wind.com \
    --cc=dev@dpdk.org \
    --cc=matan@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).