DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2] ixgbe: don't override mbuf buffer length
Date: Fri, 5 Dec 2014 12:10:44 +0000	[thread overview]
Message-ID: <20141205121044.GA4084@bricha3-MOBL3> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB977258213BCF7D@IRSMSX105.ger.corp.intel.com>

On Fri, Dec 05, 2014 at 11:59:50AM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Friday, December 05, 2014 11:29 AM
> > To: Ananyev, Konstantin
> > Cc: Jean-Mickael Guerin; dev@dpdk.org
> > Subject: Re: [PATCH v2] ixgbe: don't override mbuf buffer length
> > 
> > On Fri, Dec 05, 2014 at 11:19:11AM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richardson, Bruce
> > > > Sent: Friday, December 05, 2014 10:39 AM
> > > > To: Ananyev, Konstantin
> > > > Cc: Jean-Mickael Guerin; dev@dpdk.org
> > > > Subject: Re: [PATCH v2] ixgbe: don't override mbuf buffer length
> > > >
> > > > On Fri, Dec 05, 2014 at 01:10:28AM +0000, Ananyev, Konstantin wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jean-Mickael Guerin [mailto:jean-mickael.guerin@6wind.com]
> > > > > > Sent: Thursday, December 04, 2014 6:09 PM
> > > > > > To: dev@dpdk.org
> > > > > > Cc: Richardson, Bruce; Ananyev, Konstantin
> > > > > > Subject: [PATCH v2] ixgbe: don't override mbuf buffer length
> > > > > >
> > > > > > The template mbuf_initializer is hard coded with a buflen which
> > > > > > might have been set differently by the application at the time of
> > > > > > mbuf pool creation.
> > > > > >
> > > > > > Switch to a mbuf allocation, to fetch the correct default values.
> > > > > > There is no performance impact because this is not a data-plane API.
> > > > > >
> > > > > > Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
> > > > > > Acked-by: David Marchand <david.marchand@6wind.com>
> > > > > > Fixes: 0ff3324da2 ("ixgbe: rework vector pmd following mbuf changes")
> > > > > > ---
> > > > > >
> > > > > >  v2: check returned value of ixgbe_rxq_vec_setup
> > > > > >
> > > > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c     |  5 ++++-
> > > > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c | 19 ++++++++++++-------
> > > > > >  2 files changed, 16 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > > > > > index 5c36bff..7994da1 100644
> > > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > > > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c
> > > > > > @@ -2244,7 +2244,10 @@ ixgbe_dev_rx_queue_setup(struct rte_eth_dev *dev,
> > > > > >  	use_def_burst_func = check_rx_burst_bulk_alloc_preconditions(rxq);
> > > > > >
> > > > > >  #ifdef RTE_IXGBE_INC_VECTOR
> > > > > > -	ixgbe_rxq_vec_setup(rxq);
> > > > > > +	if (ixgbe_rxq_vec_setup(rxq) < 0) {
> > > > > > +		ixgbe_rx_queue_release(rxq);
> > > > > > +		return (-ENOMEM);
> > > > > > +	}
> > > > > >  #endif
> > > > > >  	/* Check if pre-conditions are satisfied, and no Scattered Rx */
> > > > > >  	if (!use_def_burst_func && !dev->data->scattered_rx) {
> > > > > > diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> > > > > > index c1b5a78..f7b02f5 100644
> > > > > > --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> > > > > > +++ b/lib/librte_pmd_ixgbe/ixgbe_rxtx_vec.c
> > > > > > @@ -732,17 +732,22 @@ static struct ixgbe_txq_ops vec_txq_ops = {
> > > > > >  int
> > > > > >  ixgbe_rxq_vec_setup(struct igb_rx_queue *rxq)
> > > > > >  {
> > > > > > -	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
> > > > > > +	struct rte_mbuf *mb_def;
> > > > > >
> > > > > > -	mb_def.nb_segs = 1;
> > > > > > -	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
> > > > > > -	mb_def.buf_len = rxq->mb_pool->elt_size - sizeof(struct rte_mbuf);
> > > > > > -	mb_def.port = rxq->port_id;
> > > > > > -	rte_mbuf_refcnt_set(&mb_def, 1);
> > > > > > +	mb_def = rte_pktmbuf_alloc(rxq->mb_pool);
> > > > > > +	if (mb_def == NULL) {
> > > > > > +		PMD_INIT_LOG(ERR, "ixgbe_rxq_vec_setup: could not allocate one mbuf");
> > > > > > +		return -1;
> > > > > > +	}
> > > > > > +	/* nb_segs, refcnt, data_off and buf_len are already set */
> > > > > > +	mb_def->port = rxq->port_id;
> > > > > >
> > > > > >  	/* prevent compiler reordering: rearm_data covers previous fields */
> > > > > >  	rte_compiler_barrier();
> > > > > > -	rxq->mbuf_initializer = *((uint64_t *)&mb_def.rearm_data);
> > > > > > +	rxq->mbuf_initializer = *((uint64_t *)&mb_def->rearm_data);
> > > > > > +
> > > > > > +	rte_pktmbuf_free(mb_def);
> > > > > > +
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
> > > > > > --
> > > > > > 2.1.3
> > > > >
> > > > > As I said in another mail, I don't think it is a proper fix.
> > > > > What we did here - just changed one assumption to another.
> > > > > Current assumption - all mbuf obj_init() would setup buf_len in exactly the same manner as  rte_pktmbuf_init() does:
> > > > > buf_len = mp->elt_size - sizeof(struct rte_mbuf);
> > > > > New assumption - all mbuf obj_init() would setup buf_len for all mbufs in the pool to the same value.
> > > > > Both assumptions, I believe, are not always correct.
> > > > > Though, probably the new one would be true more often.
> > > > >
> > > > > I still think the proper fix is not to update mbuf's buf_len field at ixgbe_rxq_rearm() at all.
> > > > > We should just leave the original value unmodified.
> > > > > Actually, while looking at ixgbe_rxq_rearm(), I don't see any reason why we need to update buf_len field.
> > > > > It is not the data that need to be rearmed.
> > > > > The fields that need to be rearmed are:
> > > > > uint16_t data_off;
> > > > > uint16_t refcnt
> > > > > uint8_t nb_segs;
> > > > > uint8_t port;
> > > > >
> > > > > 6B in total.
> > > > > We probably would like to keep rearming as one 64bit load/store.
> > > > > Though straight below them we have:
> > > > > uint64_t ol_flags;
> > > > >
> > > > > As RX fully override ol_flags anyway, we can safely overwrite first 2B of it.
> > > > > That would allow us to still read/write whole 64bits and avoid overwriting buf_len.
> > > > > I am talking about something like patch  below.
> > > > > I admit that it looks not so pretty, but I think it is much safer and correct.
> > > > > Konstantin
> > > >
> > > > I just don't see this as worthwhile doing. We are looking here at an mbuf pool
> > > > which is to be used for packet buffers for RX. If the packet buffers in that
> > > > pool are all of different sizes then we need to go back and look at other places
> > > > throughout the code too. For instance, we query the buffer length of the mbuf
> > > > pool when initializing the RX queues to determine if we need to enable scattered
> > > > RX. If the mbufs in the pool can potentially be of different sizes, we need to
> > > > turn off the no-scattered-packets optimization and always use the scatter
> > > > packets code path - because the assumption that buffers don't get resized could
> > > > also be false. Similarly here, if the mbufs are going to be of different
> > > > sizes then the user should disable the vector PMD, and use RX code that doesn't
> > > > override the buf_len each time.
> > >
> > > Yes, all buffers in the pool supposed to be not smaller than some threshold.
> > > But it doesn't mean that buffer can't be bigger.
> > > Let say our usual case - buf_len = 2K.
> > > Why it should be prohibited to use for RX mbuf with buf_len == 4K?
> > > There is absolutely nothing wrong with it.
> > >
> > > >
> > > > Furthermore, we still don't have an actual use-case where the user would want to
> > > > have different size mbufs in an mbuf pool used for RX. They can still have
> > > > variable-sized mbufs in other pools, but having all buffers the same size in the
> > > > pool used to receive packets seems a perfectly fair restriction to have. If
> > > > someone has an app that they are creating that needs this functionality, I'll
> > > > reconsider this opinion, but right now this is all theoretical.
> > >
> > > Well, the question is why do we need to override buf_len at all?
> > > At first place, it doesn't look correct.
> > > Second - what advantage we are gaining from it?
> > > Performance?
> > > I tried with the changes below and didn't see any performance difference at all.
> > > So what is the point in keeping that potential point of failure?
> > >
> > > About use cases - I think I gave two examples in my previous mail.
> > > Not sure why you consider them artificial.
> > > From other side, do we have right now any actual use-case where people setup buf_len to something
> > > different then: mp->elt_size - sizeof(struct rte_mbuf)?
> > >
> > > Konstantin
> > >
> > Ok, thanks for the info Konstantin - I assume there is a current case where the
> > mbuf is being initialized to something other than the default, on the basis that
> > this patch was proposed as a fix for it.
> > 
> > However, rather than argue this further, I'll defer to whatever the rest of the
> > community decides. Thomas, Jean-Mickael, David - what are your opinions on this?
> 
> That's ok, but I still wonder why do we need to keep buf_len under rearm_data marker and overwrite it for each RX?
> Honestly, probably there exist some obvious reason, that I am missing?
> To keep code tidy and easier to read?
> Or to keep whole (uint64_t)rearm_data on 64bits boundary?
> That are all good reasons, but I suppose data integrity is more important, right?
>
Yes, those are the reasons it was put there (easy to read, aligned), but also 
because it was viewed as being constant for a queue, i.e. the case where you 
would have different sized mbufs in a pool was never considered. [The first line 
of our documentation for the mempools is "A memory pool is an allocator of a 
fixed-sized object", and everything is based around that idea of fixed size 
object pools.]
When doing the mbuf redesign, I pulled together what were the fields that were
constant for each queue, and grouped them to a single 8-byte write to give
faster (and shorter in terms of code) mbuf initialization.

/Bruce

  reply	other threads:[~2014-12-05 12:10 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-04 18:09 Jean-Mickael Guerin
2014-12-05  1:10 ` Ananyev, Konstantin
2014-12-05 10:38   ` Bruce Richardson
2014-12-05 11:19     ` Ananyev, Konstantin
2014-12-05 11:28       ` Bruce Richardson
2014-12-05 11:59         ` Ananyev, Konstantin
2014-12-05 12:10           ` Bruce Richardson [this message]
2014-12-05 15:23             ` Ananyev, Konstantin
2014-12-05 10:40 ` Bruce Richardson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141205121044.GA4084@bricha3-MOBL3 \
    --to=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.ananyev@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).