From: Olivier MATZ <olivier.matz@6wind.com>
To: "Wiles, Keith" <keith.wiles@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] mbuf: add helpers to prefetch mbuf
Date: Tue, 10 May 2016 10:08:19 +0200
Message-ID: <573196F3.2030608@6wind.com>
In-Reply-To: <645005AB-A5B0-43AC-8E44-AD8D6526DF3D@intel.com>
Hi,
On 05/10/2016 12:02 AM, Wiles, Keith wrote:
>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>> index 529debb..e3ee0b3 100644
>> --- a/lib/librte_mbuf/rte_mbuf.h
>> +++ b/lib/librte_mbuf/rte_mbuf.h
>> @@ -842,6 +842,44 @@ struct rte_mbuf {
>> 	uint16_t timesync;
>> } __rte_cache_aligned;
>>
>> +/**
>> + * Prefetch the first part of the mbuf
>> + *
>> + * The first 64 bytes of the mbuf corresponds to fields that are used early
>> + * in the receive path. If the cache line of the architecture is higher than
>> + * 64B, the second part will also be prefetched.
>> + *
>> + * @param m
>> + * The pointer to the mbuf.
>> + */
>> +static inline void
>> +rte_mbuf_prefetch_part0(struct rte_mbuf *m)
>> +{
>> +	rte_prefetch0(&m->cacheline0);
>> +}
>> +
>> +/**
>> + * Prefetch the second part of the mbuf
>> + *
>> + * The next 64 bytes of the mbuf corresponds to fields that are used in the
>> + * transmit path. If the cache line of the architecture is higher than 64B,
>> + * this function does nothing as it is expected that the full mbuf is
>> + * already in cache.
>> + *
>> + * @param m
>> + * The pointer to the mbuf.
>> + */
>> +static inline void
>> +rte_mbuf_prefetch_part1(struct rte_mbuf *m)
>> +{
>> +#if RTE_CACHE_LINE_SIZE == 64
>> +	rte_prefetch0(&m->cacheline1);
>> +#else
>> +	RTE_SET_USED(m);
>> +#endif
>> +}
>
> I am not super happy with the names here, but I understand that rte_mbuf_prefetch_cacheline0() is a bit long. I could live with them being longer if that makes more sense and adds to readability.
Naming these functions rte_mbuf_prefetch_cacheline0() and
rte_mbuf_prefetch_cacheline1() was my first intention, but
as you said, it's long, and it's also not accurate because
here we don't really deal with cache lines. That's why
I preferred to use "part" instead.
I'm not opposed to naming them part1/part2 instead of part0/part1,
as Thomas suggested. Another option would be:
- rte_mbuf_prefetch_rx_part(m)
- rte_mbuf_prefetch_tx_part(m)
The objective is to spare the drivers from handling the two possible
cache line sizes with #ifdefs. So I don't think the functions should
be called something_cacheline.
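A minimal sketch of that objective, not the DPDK implementation: the cache line size test lives in one helper, so the driver-style loop contains no #if. Here struct demo_mbuf, DEMO_CACHE_LINE_SIZE, demo_prefetch0() and the demo_mbuf_prefetch_*() functions are hypothetical stand-ins for struct rte_mbuf, RTE_CACHE_LINE_SIZE, rte_prefetch0() and the proposed helpers, and __builtin_prefetch assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_CACHE_LINE_SIZE. */
#ifndef DEMO_CACHE_LINE_SIZE
#define DEMO_CACHE_LINE_SIZE 64
#endif

/* Hypothetical two-part layout: 64 bytes used on RX, 64 bytes used on TX. */
struct demo_mbuf {
	uint8_t rx_part[64];
	uint8_t tx_part[64];
};

/* Stand-in for rte_prefetch0(); relies on a GCC/Clang builtin. */
static inline void
demo_prefetch0(const void *p)
{
	__builtin_prefetch(p, 0, 3);
}

static inline void
demo_mbuf_prefetch_rx_part(struct demo_mbuf *m)
{
	demo_prefetch0(&m->rx_part);
}

static inline void
demo_mbuf_prefetch_tx_part(struct demo_mbuf *m)
{
#if DEMO_CACHE_LINE_SIZE == 64
	demo_prefetch0(&m->tx_part);
#else
	(void)m; /* assumed: a larger line already brought in both parts */
#endif
}

/* Driver-style loop with no #if at the call site.
 * Returns the number of mbufs handled. */
static inline unsigned
demo_rx_burst(struct demo_mbuf **pkts, unsigned n)
{
	unsigned i;

	for (i = 0; i < n; i++) {
		if (i + 1 < n)
			demo_mbuf_prefetch_rx_part(pkts[i + 1]);
		pkts[i]->rx_part[0] = 1; /* pretend to fill the RX fields */
	}
	return i;
}
```

Building with -DDEMO_CACHE_LINE_SIZE=128 changes only the body of demo_mbuf_prefetch_tx_part(); the loop is untouched.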
As a side note, I'm not really satisfied with the RTE_CACHE_LINE_MIN_SIZE
and __rte_cache_min_aligned macros and I think it would be clearer
to explicitly align to 64. If other people agree, I can submit a patch
for this too.
Any comments?
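To illustrate what explicit alignment to 64 could look like, here is a sketch in standard C11 rather than with DPDK macros. struct demo_mbuf and its field selection are hypothetical; the only point is that the second part starts at byte 64 regardless of the CPU cache line size, and that the compiler can check it.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical, heavily reduced mbuf layout. */
struct demo_mbuf {
	/* first part: fields used early in the receive path */
	void *buf_addr;
	uint16_t data_off;

	/* second part: explicitly aligned to 64 bytes */
	alignas(64) void *pool;
	struct demo_mbuf *next;
};

/* Compile-time checks: the layout does not depend on the cache line size. */
static_assert(offsetof(struct demo_mbuf, pool) == 64,
	      "second part must start at byte 64");
static_assert(sizeof(struct demo_mbuf) == 128,
	      "mbuf must span exactly two 64-byte parts");
```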
> Another idea is to have only one function for both:
>
> enum { MBUF_CACHELINE0 = 0, MBUF_CACHELINE1, MBUF_CACHELINES }; // Optional enum if you want
>
> static inline void
> rte_mbuf_prefetch(struct rte_mbuf *m, unsigned cacheline) // Make sure we add a comment about the constant value
> {
> 	if (cacheline == MBUF_CACHELINE0)
> 		rte_prefetch0(&m->cacheline0);
> 	else if (cacheline == MBUF_CACHELINE1)
> 		rte_prefetch0(&m->cacheline1);
> 	else {
> 		rte_prefetch0(&m->cacheline0);
> 		rte_prefetch0(&m->cacheline1);
> 	}
> }
>
> I believe if you use constant value in the call for the cacheline variable then the extra code should be optimized out. If not then what about a macro instead.
>
> #define rte_mbuf_prefetch(m, c) \
> 	do { \
> 		if ((c) == MBUF_CACHELINE0) \
> 			rte_prefetch0(&(m)->cacheline0); \
> 		else if ((c) == MBUF_CACHELINE1) \
> 			rte_prefetch0(&(m)->cacheline1); \
> 		else { \
> 			rte_prefetch0(&(m)->cacheline0); \
> 			rte_prefetch0(&(m)->cacheline1); \
> 		} \
> 	} while (0)
>
> Call like this:
> rte_mbuf_prefetch(m, 0); // For cacheline 0
> rte_mbuf_prefetch(m, 1); // For cacheline 1
> rte_mbuf_prefetch(m, 2); // For cacheline 0 and 1
In my opinion, the implementation and usage are simpler with 2
separate functions. What would be the advantage of this?
> We could have another routine:
> rte_mbuf_prefetch_data(m, 0); // Prefetch the first cacheline of the packet data.
Well, here, I think there is no need to replace rte_prefetch0(m->data).
A helper is useful for prefetching the mbuf structure because many
drivers want to prefetch the rx part first, then the tx part. For data,
the same function can be used whatever the cache line size.
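As a rough sketch of the data case: the start of the packet data is a single address, so one plain prefetch call is the same for any cache line size and needs no #if. struct demo_mbuf, demo_mbuf_data(), demo_prefetch0() and demo_prefetch_data() are hypothetical stand-ins (the layout loosely follows buf_addr plus data_off), and __builtin_prefetch assumes GCC or Clang.

```c
#include <stdint.h>

/* Hypothetical, reduced mbuf: buffer address plus offset of the data. */
struct demo_mbuf {
	void *buf_addr;
	uint16_t data_off;
};

/* Address of the first byte of packet data. */
static inline void *
demo_mbuf_data(const struct demo_mbuf *m)
{
	return (uint8_t *)m->buf_addr + m->data_off;
}

/* Stand-in for rte_prefetch0(); relies on a GCC/Clang builtin. */
static inline void
demo_prefetch0(const void *p)
{
	__builtin_prefetch(p, 0, 3);
}

/* Prefetch the first cache line of packet data and return its address.
 * No conditional compilation: the call is identical for 64B and 128B lines. */
static inline void *
demo_prefetch_data(const struct demo_mbuf *m)
{
	void *data = demo_mbuf_data(m);

	demo_prefetch0(data);
	return data;
}
```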
Regards,
Olivier