DPDK patches and discussions
From: Matan Azrad <matan@mellanox.com>
To: Mordechay Haimovsky <motih@mellanox.com>,
	Adrien Mazarguil <adrien.mazarguil@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v5] net/mlx4: support hardware TSO
Date: Mon, 9 Jul 2018 18:18:17 +0000
Message-ID: <VI1PR0501MB26080CBC826E88325B547919D2440@VI1PR0501MB2608.eurprd05.prod.outlook.com>
In-Reply-To: <AM0PR05MB4435D0C04626CF1D768C1EE5D2440@AM0PR05MB4435.eurprd05.prod.outlook.com>

Hi Moti

I am continuing the discussion here, even though the next version is already out, just to keep the full discussion visible.

From: Mordechay Haimovsky
> inline
> 
> > From: Matan Azrad
> > Hi Moti
> >
> > Please see some comments below.
> >
> > From: Mordechay Haimovsky
> > > Implement support for hardware TSO.
> > >
> > > Signed-off-by: Moti Haimovsky <motih@mellanox.com>
...
> > > +	do {
> > > +		/* How many dseg entries do we have in the current TXBB? */
> > > +		nb_segs_txbb = (MLX4_TXBB_SIZE -
> > > +				((uintptr_t)dseg & (MLX4_TXBB_SIZE - 1))) >>
> > > +			       MLX4_SEG_SHIFT;
> > > +		switch (nb_segs_txbb) {
> > > +		default:
> > > +			/* Should never happen. */
> > > +			rte_panic("%p: Invalid number of SGEs(%d) for a TXBB",
> > > +			(void *)txq, nb_segs_txbb);
> > > +			/* rte_panic never returns. */
> >
> > Since this default case should not happen because of the above
> > calculation, I think we don't need it.
> > Just "break" if the compiler complains about the missing default case.
> >
> Although "default" is not mandatory in a switch statement, it is good
> practice to have it, even just for code clarity,
> so I will keep it there.

But the rte_panic code (and the whole default block) is redundant, and we don't need redundant code in our data path.
You can keep a comment there for clarity if you want.
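
For example, just a sketch of what I mean (illustrative only, on top of your nb_segs_txbb calculation above, not tested):

	switch (nb_segs_txbb) {
	default:
		/* Cannot happen: nb_segs_txbb is always within [1, 4]. */
		break;
	case 4:
		/* ... as in your patch ... */

This keeps the clarity of the default label without paying for rte_panic in the data path.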
 

> > > +		case 4:
> > > +			/* Memory region key for this memory pool. */
> > > +			lkey = mlx4_tx_mb2mr(txq, sbuf);
> > > +			if (unlikely(lkey == (uint32_t)-1))
> > > +				goto err;
> > > +			dseg->addr =
> > > +				rte_cpu_to_be_64(rte_pktmbuf_mtod_offset(sbuf,
> > > +									 uintptr_t,
> > > +									 sb_of));
> > > +			dseg->lkey = lkey;
> > > +			/*
> > > +			 * This data segment starts at the beginning of a new
> > > +			 * TXBB, so we need to postpone its byte_count writing
> > > +			 * for later.
> > > +			 */
> > > +			pv[*pv_counter].dseg = dseg;
> > > +			/*
> > > +			 * Zero length segment is treated as inline segment
> > > +			 * with zero data.
> > > +			 */
> > > +			data_len = sbuf->data_len - sb_of;
> >
> > Is there a chance that data_len will be negative? Will it wrap around in this case?
> Since we verify up front that all the L2, L3 and L4 headers reside in the same
> fragment, there is no reason for data_len to become negative. This is why I use
> uint16_t, which is the same data type used in struct rte_mbuf for representing
> data_len, and the same as we do in mlx4_tx_burst_segs.
> 
> > Maybe it is better to change it to int16_t and to replace the next check
> > with:
> > data_len > 0 ? data_len : 0x80000000
> >
> I will keep this the way it is for 2 reasons:
> 1. It seems to me more cumbersome than what I wrote.

OK, you're right here: if it cannot be negative we shouldn't change it :)

> 2. Code-consistency-wise, this is also how we wrote it in mlx4_tx_burst_segs;
>    what's good there is also good here.

I don't agree: this is really a different case from that one, a lot of the assumptions are different, and the code may reflect it.

> > And I think I found a way to remove the sb_of calculations for each
> > segment:
> >
> > Each segment will prepare the next segment's parameters, while only the
> > pre-loop calculation for the first segment's parameters will take the
> > header offset into account:
> >
> > The parameters: data_len and sb_of.
> >
> > So before the loop:
> > sb_of = tinfo->tso_header_size;
> > data_len = sbuf->data_len - sb_of;
> >
> > And inside the loop (after the check of nb_segs):
> > sb_of = 0;
> > data_len = sbuf->data_len (of the next sbuf);
> >
> > so each segment calculates the next segment's parameters and we don't
> > need the "- sb_of" calculation per segment.
> >
> NICE :)
> 

Sorry for seeing it only now, but we don't even need the "sb_of = 0" per segment:
we can add one more parameter for the next segment,
addr = rte_pktmbuf_mtod_offset(sbuf, uintptr_t, tinfo->tso_header_size)
before the loop
and
addr = rte_pktmbuf_mtod(sbuf, uintptr_t)
inside the loop,

so finally we save 2 cycles per segment :)
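
To make the whole idea concrete, here is a rough sketch (not tested, identifiers taken from your patch, loop structure and byte_count/pv handling simplified):

	/* Before the loop: the first data segment starts right after the TSO header. */
	data_len = sbuf->data_len - tinfo->tso_header_size;
	addr = rte_pktmbuf_mtod_offset(sbuf, uintptr_t, tinfo->tso_header_size);
	do {
		...
		lkey = mlx4_tx_mb2mr(txq, sbuf);
		if (unlikely(lkey == (uint32_t)-1))
			goto err;
		dseg->addr = rte_cpu_to_be_64(addr);
		dseg->lkey = lkey;
		/* byte_count/pv handling as in your patch, using data_len. */
		...
		sbuf = sbuf->next;
		if (sbuf == NULL)
			break;
		/* Following segments start at offset 0: no "- sb_of" per segment. */
		data_len = sbuf->data_len;
		addr = rte_pktmbuf_mtod(sbuf, uintptr_t);
	} while (...);

so the header offset is paid only once, before the loop.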
...
> > > +static inline volatile struct mlx4_wqe_data_seg *
> > > +mlx4_tx_burst_fill_tso_hdr(struct rte_mbuf *buf,
> > > +			   struct txq *txq,
> > > +			   struct tso_info *tinfo,
> > > +			   volatile struct mlx4_wqe_ctrl_seg *ctrl)
> > > +{
> > > +	volatile struct mlx4_wqe_lso_seg *tseg =
> > > +		(volatile struct mlx4_wqe_lso_seg *)(ctrl + 1);
> > > +	struct mlx4_sq *sq = &txq->msq;
> > > +	struct pv *pv = tinfo->pv;
> > > +	int *pv_counter = &tinfo->pv_counter;
> > > +	int remain_size = tinfo->tso_header_size;
> > > +	char *from = rte_pktmbuf_mtod(buf, char *);
> > > +	uint16_t txbb_avail_space;
> > > +	/* Union to overcome volatile constraints when copying TSO header. */
> > > +	union {
> > > +		volatile uint8_t *vto;
> > > +		uint8_t *to;
> > > +	} thdr = { .vto = (volatile uint8_t *)tseg->header, };
> > > +
> > > +	/*
> > > +	 * TSO data always starts at offset 20 from the beginning of the TXBB
> > > +	 * (16 byte ctrl + 4byte TSO desc). Since each TXBB is 64Byte aligned
> > > +	 * we can write the first 44 TSO header bytes without worry for TxQ
> > > +	 * wrapping or overwriting the first TXBB 32bit word.
> > > +	 */
> > > +	txbb_avail_space = MLX4_TXBB_SIZE -
> > > +			   (sizeof(struct mlx4_wqe_ctrl_seg) +
> > > +			    sizeof(struct mlx4_wqe_lso_seg));
> >
> > I think that a better name is txbb_tail_size.
> I think that txbb_avail_space is good enough, so no change here.

My suggestion is because this size is only the tail size of the TXBB, without the first 4 bytes, so it may be more descriptive.
I can also understand your choice, since "avail" refers to the TXBB space available to write (the first 4 bytes are not available now, only later).
I'm not going to argue about it :)
 
> 
> >
> > > +	while (remain_size >= (int)(txbb_avail_space + sizeof(uint32_t))) {
> > > +		/* Copy to end of txbb. */
> > > +		rte_memcpy(thdr.to, from, txbb_avail_space);
> > > +		from += txbb_avail_space;
> > > +		thdr.to += txbb_avail_space;
> > > +		/* New TXBB, Check for TxQ wrap. */
> > > +		if (thdr.to >= sq->eob)
> > > +			thdr.vto = sq->buf;
> > > +		/* New TXBB, stash the first 32bits for later use. */
> > > +		pv[*pv_counter].dst = (volatile uint32_t *)thdr.to;
> > > +		pv[(*pv_counter)++].val = *(uint32_t *)from,
> > > +		from += sizeof(uint32_t);
> > > +		thdr.to += sizeof(uint32_t);
> > > +		remain_size -= (txbb_avail_space + sizeof(uint32_t));
> >
> > You don't need the () here.
> True
> >
> > > +		/* Avail space in new TXBB is TXBB size - 4 */
> > > +		txbb_avail_space = MLX4_TXBB_SIZE - sizeof(uint32_t);
> > > +	}
> > > +	if (remain_size > txbb_avail_space) {
> > > +		rte_memcpy(thdr.to, from, txbb_avail_space);
> > > +		from += txbb_avail_space;
> > > +		thdr.to += txbb_avail_space;
> > > +		remain_size -= txbb_avail_space;
> > > +		/* New TXBB, Check for TxQ wrap. */
> > > +		if (thdr.to >= sq->eob)
> > > +			thdr.vto = sq->buf;
> > > +		pv[*pv_counter].dst = (volatile uint32_t *)thdr.to;
> > > +		rte_memcpy(&pv[*pv_counter].val, from, remain_size);
> > > +		(*pv_counter)++;
> > > +	} else {
> >
> > Here it should be else if (remain_size > 0).
> true
> >
> > > +		rte_memcpy(thdr.to, from, remain_size);
> > > +	}
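
I.e. just (a sketch of the fix we agreed on above):

	} else if (remain_size > 0) {
		/* The rest of the header fits in the current TXBB. */
		rte_memcpy(thdr.to, from, remain_size);
	}

so the memcpy is skipped entirely when nothing is left to copy.
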
> > > +
> > > +	tseg->mss_hdr_size = rte_cpu_to_be_32((buf->tso_segsz << 16) |
> > > +					      tinfo->tso_header_size);
> > > +	/* Calculate data segment location */
> > > +	return (volatile struct mlx4_wqe_data_seg *)
> > > +				((uintptr_t)tseg + tinfo->wqe_tso_seg_size);
> > > }
> > > +
> > > +/**
> > > + * Write data segments and header for TSO uni/multi segment packet.
> > > + *
> > > + * @param buf
> > > + *   Pointer to the first packet mbuf.
> > > + * @param txq
> > > + *   Pointer to Tx queue structure.
> > > + * @param ctrl
> > > + *   Pointer to the WQE control segment.
> > > + *
> > > + * @return
> > > + *   Pointer to the next WQE control segment on success, NULL otherwise.
> > > + */
> > > +static volatile struct mlx4_wqe_ctrl_seg *
> > > +mlx4_tx_burst_tso(struct rte_mbuf *buf, struct txq *txq,
> > > +		  volatile struct mlx4_wqe_ctrl_seg *ctrl)
> > > +{
> > > +	volatile struct mlx4_wqe_data_seg *dseg;
> > > +	volatile struct mlx4_wqe_ctrl_seg *ctrl_next;
> > > +	struct mlx4_sq *sq = &txq->msq;
> > > +	struct tso_info tinfo;
> > > +	struct pv *pv;
> > > +	int pv_counter;
> > > +	int ret;
> > > +
> > > +	ret = mlx4_tx_burst_tso_get_params(buf, txq, &tinfo);
> > > +	if (unlikely(ret))
> > > +		goto error;
> > > +	dseg = mlx4_tx_burst_fill_tso_hdr(buf, txq, &tinfo, ctrl);
> > > +	if (unlikely(dseg == NULL))
> > > +		goto error;
> > > +	if ((uintptr_t)dseg >= (uintptr_t)sq->eob)
> > > +		dseg = (volatile struct mlx4_wqe_data_seg *)
> > > +					((uintptr_t)dseg - sq->size);
> > > +	ctrl_next = mlx4_tx_burst_fill_tso_dsegs(buf, txq, &tinfo, dseg, ctrl);
> > > +	if (unlikely(ctrl_next == NULL))
> > > +		goto error;
> > > +	/* Write the first DWORD of each TXBB save earlier. */
> > > +	pv = tinfo.pv;
> > > +	pv_counter = tinfo.pv_counter;
> > > +	/* Need a barrier here before writing the first TXBB word. */
> > > +	rte_io_wmb();
> >
> > > +	for (--pv_counter; pv_counter  >= 0; pv_counter--)
> >
> > Since we don't need the first check, a do-while statement is better.
> > To be fully safe you can use a likely() check before the memory barrier.
> >
> I will bring back the if statement, but will not change the loop, as it is the same as in
> mlx4_tx_burst_segs and I do want to have consistent code.

As above, I don't agree with this statement: different assumptions call for different optimized code.

Here and in the mlx4_tx_burst_segs code we don't need the first check in the for loop, and we don't need redundant checks in our data path.

So both should be updated.

The difference is in the prior check: here it should be likely() and there it should not.
So actually, there the "for" loop can stay as is, but we don't need the first if check of pv_counter because it is already checked in the for loop.

I suggest optimizing both (a prior patch for the mlx4_tx_burst_segs optimization).
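
For this TSO path I mean something like the following sketch (not tested):

	if (likely(pv_counter)) {
		/* Need a barrier here before writing the first TXBB word. */
		rte_io_wmb();
		do {
			--pv_counter;
			*pv[pv_counter].dst = pv[pv_counter].val;
		} while (pv_counter > 0);
	}

so there is a single likely() check before the barrier and no redundant check on the first loop iteration.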

> > > +		*pv[pv_counter].dst = pv[pv_counter].val;
> > > +	ctrl->fence_size = tinfo.fence_size;
> > > +	sq->remain_size -= tinfo.wqe_size;
> > > +	return ctrl_next;

Thread overview: 14+ messages
     [not found] <1530184583-30166-1-git-send-email-motih@mellanox.com>
2018-06-28 11:57 ` [dpdk-dev] [PATCH v2] " Moti Haimovsky
2018-06-28 12:48   ` [dpdk-dev] [PATCH v3] " Moti Haimovsky
2018-06-28 14:15     ` Adrien Mazarguil
2018-06-28 15:19     ` Matan Azrad
2018-07-04 14:53     ` [dpdk-dev] [PATCH v4] " Moti Haimovsky
2018-07-05 12:30       ` Matan Azrad
2018-07-09 10:43       ` [dpdk-dev] [PATCH v5] " Moti Haimovsky
2018-07-09 13:07         ` Matan Azrad
2018-07-09 16:22           ` Mordechay Haimovsky
2018-07-09 18:18             ` Matan Azrad [this message]
2018-07-09 16:33         ` [dpdk-dev] [PATCH v6] " Moti Haimovsky
2018-07-10 10:45           ` [dpdk-dev] [PATCH v7] " Moti Haimovsky
2018-07-10 11:02             ` Matan Azrad
2018-07-10 12:03               ` Shahaf Shuler
