From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "Richardson, Bruce" <bruce.richardson@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"thomas.monjalon@6wind.com" <thomas.monjalon@6wind.com>,
"viktorin@rehivetech.com" <viktorin@rehivetech.com>,
"jianbo.liu@linaro.org" <jianbo.liu@linaro.org>
Subject: Re: [dpdk-dev] [PATCH] mbuf: make rearm_data address naturally aligned
Date: Thu, 19 May 2016 19:05:54 +0530 [thread overview]
Message-ID: <20160519133548.GA5308@localhost.localdomain> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB97725836B5AB67@irsmsx105.ger.corp.intel.com>
On Thu, May 19, 2016 at 12:18:57PM +0000, Ananyev, Konstantin wrote:
>
> Hi everyone,
>
> > On Thu, May 19, 2016 at 12:20:16AM +0530, Jerin Jacob wrote:
> > > On Wed, May 18, 2016 at 05:43:00PM +0100, Bruce Richardson wrote:
> > > > On Wed, May 18, 2016 at 07:27:43PM +0530, Jerin Jacob wrote:
> > > > > To avoid multiple stores on the fast path, Ethernet drivers
> > > > > aggregate the writes to data_off, refcnt, nb_segs and port
> > > > > into a single uint64_t and write it in one shot
> > > > > through a uint64_t * at the &mbuf->rearm_data address.
> > > > >
> > > > > Some non-IA platforms have a store operation overhead
> > > > > if the store address is not naturally aligned. This patch
> > > > > fixes the performance issue on those targets.
> > > > >
> > > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > > ---
> > > > >
> > > > > Tested this patch on IA and non-IA (ThunderX) platforms.
> > > > > This patch shows a 400 Kpps/core improvement on ThunderX + ixgbe + vector environment,
> > > > > and it does not add any overhead on the IA platform.
> > > > >
> > > > > I have tried another similar approach, replacing "buf_len" with "pad"
> > > > > (in this patch's context).
> > > > > Since it adds the overhead of a read and then a mask to keep "buf_len" intact,
> > > > > it did not show much improvement.
> > > > > ref: http://dpdk.org/ml/archives/dev/2016-May/038914.html
> > > > >
> > > > > ---
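[For context, a minimal sketch of the one-shot rearm store the commit message
describes, modelled on the ixgbe vector RX path; the helper names and the
mbuf_initializer name are illustrative, not a definitive implementation.]

#include <rte_mbuf.h>

/* At queue setup time, pack the per-queue constant values of
 * data_off, refcnt, nb_segs and port into one 64-bit word. */
static uint64_t
build_mbuf_initializer(uint8_t port_id)
{
        struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed template */

        mb_def.nb_segs = 1;
        mb_def.data_off = RTE_PKTMBUF_HEADROOM;
        mb_def.port = port_id;
        rte_mbuf_refcnt_set(&mb_def, 1);

        /* snapshot the 8 bytes starting at the rearm_data marker */
        return *(uint64_t *)&mb_def.rearm_data;
}

/* In the RX rearm loop, one 64-bit store replaces four narrow stores.
 * If &mb->rearm_data is not 8-byte aligned, that store is penalised
 * on some non-IA CPUs, which is what the patch addresses. */
static inline void
rearm_one_mbuf(struct rte_mbuf *mb, uint64_t mbuf_initializer)
{
        *(uint64_t *)&mb->rearm_data = mbuf_initializer;
}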
> > > > While this will work and from your tests doesn't seem to have a performance
> > > > impact, I'm not sure I particularly like it. It's extending out the end of
> > > > cacheline0 of the mbuf by 16 bytes, though I suppose it's not technically using
> > > > up any more space of it.
> > >
> > > Extending by 2 bytes, right? Yes, I guess so; now we are using only 56 out of 64 bytes
> > > in the first 64-byte cache line.
> > >
> > > >
> > > > What I'm wondering about though, is do we have any usecases where we need a
> > > > variable buf_len for packets for RX. These mbufs come directly from a mempool,
> > > > which is generally understood to be a set of fixed-sized buffers. I realise that
> > > > this change was made in the past after some discussion, but one of the key points
> > > > there [at least to my reading] was that - even though nobody actually made a
> > > > concrete case where they had variable-sized buffers - having support for them
> > > > made no performance difference.
>
> I was going to point to vhost zcp support, but as Thomas pointed out
> that functionality was removed from dpdk.org recently.
> So I am not aware whether such a case exists in the 'real world' right now or not.
> Though I still think the RX function should leave the buf_len field intact.
>
> > > >
> > > > The latter part of that has now changed, and supporting variable-sized mbufs
> > > > from an mbuf pool has a perf impact. Do we definitely need that functionality,
> > > > because the easiest fix here is just to move the rxrearm marker back above
> > > > mbuf_len as it was originally in releases like 1.8?
> > >
> > > And initialize the buf_len with mp->elt_size - sizeof(struct rte_mbuf).
> > > Right?
> > >
> > > I don't have a strong opinion on this; I can do it if there are no
> > > objections. Let me know.
> > >
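[For reference, buf_len is already set once per mbuf when the pool is
populated, which is roughly what rte_pktmbuf_init() does today. A hedged
sketch of that idea, simplified to ignore the per-mbuf private area; the
constructor name is hypothetical.]

#include <string.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch of a pool object constructor: buf_len is derived once from the
 * mempool element size, so the RX path never has to rewrite it. */
static void
mbuf_ctor_sketch(struct rte_mempool *mp, void *opaque, void *obj,
                 unsigned int idx)
{
        struct rte_mbuf *m = obj;

        (void)opaque;
        (void)idx;
        memset(m, 0, mp->elt_size);

        /* the data buffer starts right after the mbuf header */
        m->buf_addr = (char *)m + sizeof(struct rte_mbuf);
        m->buf_len = (uint16_t)(mp->elt_size - sizeof(struct rte_mbuf));
        m->data_off = RTE_PKTMBUF_HEADROOM;
        m->nb_segs = 1;
        m->pool = mp;
}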
> > > However, I do see that in the future "buf_len" may need to move to the end of the
> > > first 64-byte cache line, since "port" is currently defined as uint8_t, which IMO
> > > is too small. We may need to widen it to uint16_t. The reason I think so is that
> > > current ThunderX HW has 128 VFs per socket for the built-in NIC, so a two-node
> > > configuration plus one external PCIe NW card can easily go beyond 256 ports.
>
> I wonder, does anyone really use the mbuf port field?
> My thought was: could we drop it completely?
> Actually, after discussing it with Bruce offline, an interesting idea came out:
> if we drop port and make mbuf_prefree() reset nb_segs=1, then
> we can reduce the RX rearm_data to 4B. So with that layout:
>
> struct rte_mbuf {
>
> MARKER cacheline0;
>
> void *buf_addr;
> phys_addr_t buf_physaddr;
> uint16_t buf_len;
> uint8_t nb_segs;
> uint8_t reserved_1byte; /* former port */
>
> MARKER32 rearm_data;
> uint16_t data_off;
> uint16_t refcnt;
>
> uint64_t ol_flags;
> ...
>
> We can keep buf_len in its place and avoid a 2B gap, while making rearm_data
> 4B long and 4B aligned.
A couple of comments:
- IMO, it would be good if nb_segs could move under rearm_data, as some
drivers (maybe not ixgbe) can then write nb_segs as part of the same one-shot
store, even in the segmented rx handler case.
- I think it makes sense to keep port in the mbuf so that applications
can make use of it (not sure what real application developers think of
this).
- If writing 4B and 8B costs the same number of cycles (at least on arm64), then I
think it makes sense to keep it 8B wide, with as much of it pre-built into
constants as possible (see the sketch below).
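[To make that last point concrete, a hedged sketch of the two store widths
being compared. The constants would be pre-built once per queue at setup
time; rearm_4b assumes the proposed MARKER32 layout covering only
data_off + refcnt, while rearm_8b assumes a layout where the marker still
covers 8 bytes of per-packet constant fields, as today.]

#include <rte_mbuf.h>

/* 4B variant: one aligned 32-bit store covering data_off + refcnt. */
static inline void
rearm_4b(struct rte_mbuf *m, uint32_t rearm32)
{
        *(uint32_t *)&m->rearm_data = rearm32;
}

/* 8B variant: if a naturally aligned 64-bit store costs the same as a
 * 32-bit one (as on arm64), one pre-built constant can also cover
 * nb_segs and port. */
static inline void
rearm_8b(struct rte_mbuf *m, uint64_t rearm64)
{
        *(uint64_t *)&m->rearm_data = rearm64;
}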
>
> Another similar alternative is to make mbuf_prefree() set refcnt=1
> (as it updates it anyway). Then we can remove refcnt from the RX rearm_data,
> and again make rearm_data 4B long and 4B aligned:
>
> struct rte_mbuf {
>
> MARKER cacheline0;
>
> void *buf_addr;
> phys_addr_t buf_physaddr;
> uint16_t buf_len;
> uint16_t refcnt;
>
> MARKER32 rearm_data;
> uint16_t data_off;
> uint8_t nb_segs;
> uint8_t port;
The only problem I see with this approach is that the port data type cannot be
extended to uint16_t in the future.
>
> uint64_t ol_flags;
> ..
>
> As an additional plus, __rte_mbuf_raw_alloc() wouldn't need to modify the mbuf contents at all,
> which is probably a good thing.
> As a drawback, we'll have free mbufs in the pool with refcnt==1, which probably reduces
> the debuggability of the mbuf code.
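[A hedged sketch of that prefree idea, not the real __rte_pktmbuf_prefree_seg():
park the mbuf in the pool with refcnt == 1 (and nb_segs == 1) so RX rearm and
raw_alloc never have to touch those fields again. The function name is
hypothetical.]

#include <rte_mbuf.h>
#include <rte_branch_prediction.h>

static inline struct rte_mbuf *
prefree_seg_sketch(struct rte_mbuf *m)
{
        if (likely(rte_mbuf_refcnt_read(m) == 1)) {
                /* exclusively owned: refcnt already 1, just reset chaining */
                m->nb_segs = 1;
                m->next = NULL;
                return m;
        }
        if (rte_mbuf_refcnt_update(m, -1) == 0) {
                /* last reference dropped: leave it parked at refcnt = 1 */
                rte_mbuf_refcnt_set(m, 1);
                m->nb_segs = 1;
                m->next = NULL;
                return m;
        }
        return NULL; /* still referenced elsewhere: caller must not free */
}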
>
> Konstantin
>
> > >
> > Ok, good point. If you think it's needed, and if we are changing the mbuf
> > structure, it might be a good time to extend that field while you are at it, saving
> > a second ABI break later on.
> >
> > /Bruce
> >
> > > >
> > > > Regards,
> > > > /Bruce
> > > >
> > > > Ref: http://dpdk.org/ml/archives/dev/2014-December/009432.html
> > > >