From: "Harris, James R" <james.r.harris@intel.com>
To: "Howell, Seth" <seth.howell@intel.com>,
"Varghese, Vipin" <vipin.varghese@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Aligned rte_mempool for storage applications
Date: Tue, 26 Mar 2019 18:59:12 +0000 [thread overview]
Message-ID: <06C0B6E8-83E6-4513-BF5C-7EDAB50D1E1E@intel.com> (raw)
In-Reply-To: <EA913ED399BBA34AA4EAC2EDC24CDD00AAA711DA@FMSMSX105.amr.corp.intel.com>
On 3/26/19, 11:34 AM, "Howell, Seth" <seth.howell@intel.com> wrote:
Hi Vipin,
Thanks for your quick reply. I will respond to your queries in order.
1. Yes, in at least one case we have buffers of size 4096 bytes. Some of our other buffers are much larger (>64 KiB).
2. These buffers are used in the I/O path, so performance is very important. Allocating and freeing a buffer each time we use it could be pretty costly.
I think Vipin may have been suggesting allocating one (or multiple) very large buffers, and then splitting those buffers on 4KB boundaries in SPDK. If so, that would still require SPDK to develop its own mempool-like feature to hold those buffers. We'd really like to use the DPDK rte_mempool implementation rather than inventing our own.
3. Could you describe the idea of an indirect buffer in more detail? I don't think I quite understand that concept. I know we couldn't use mbufs because we often have buffers that are larger than 64k. I think there are more reasons we don't use the mbuf structure in our use case, but am not familiar with all of them. Maybe Jim can explain those in more detail.
SPDK doesn't use rte_mbufs (except when absolutely required for things like DPDK cryptodev/compressdev). Most of that data structure is filled with network packet related fields that would never be used for storage. We could create our own very small data structure and do something similar to Vipin's indirect mbuf suggestion. And I think this is what Vipin was starting to allude to in query #2.
It would be less optimal than a native aligned mempool because we'd be adding an extra pointer dereference on every get from the mempool - but probably only slightly less optimal. Seth - let's sync up offline and see if we can quickly collect some benchmarking data to measure the performance impact of this extra dereference.
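For illustration, a minimal sketch of the indirect approach under discussion; the descriptor struct, pool parameters, and names below are hypothetical rather than existing SPDK code, and only the rte_ calls are real DPDK APIs:

#include <stdint.h>
#include <rte_malloc.h>
#include <rte_memory.h>
#include <rte_mempool.h>

/* Hypothetical descriptor: the mempool holds these small objects, each
 * pointing at a separately allocated, 4 KiB-aligned data buffer. */
struct io_buf_desc {
        void *aligned_buf;
};

/* Per-object init callback run once at pool creation: allocate the aligned
 * data buffer up front so the I/O path only pays the extra dereference.
 * (Allocation failure handling is omitted for brevity.) */
static void
io_buf_desc_init(struct rte_mempool *mp, void *opaque, void *obj,
                 unsigned int idx)
{
        struct io_buf_desc *d = obj;
        size_t buf_size = (size_t)(uintptr_t)opaque;

        (void)mp;
        (void)idx;
        d->aligned_buf = rte_malloc("io_buf", buf_size, 4096);
}

static struct rte_mempool *
create_indirect_pool(void)
{
        return rte_mempool_create("io_bufs", 1024,
                                  sizeof(struct io_buf_desc),
                                  256, 0, NULL, NULL,
                                  io_buf_desc_init, (void *)(uintptr_t)4096,
                                  SOCKET_ID_ANY, 0);
}

/* I/O path: one extra pointer dereference versus a natively aligned pool. */
static void *
get_io_buf(struct rte_mempool *mp, struct io_buf_desc **desc)
{
        if (rte_mempool_get(mp, (void **)desc) != 0)
                return NULL;
        return (*desc)->aligned_buf;
}

The put path would return the descriptor rather than the data pointer, which is the bookkeeping cost this suggestion trades for leaving the mempool library untouched.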
Thanks Vipin - this definitely gives us an alternative direction to investigate that we hadn't considered.
-Jim
Thanks,
Seth
-----Original Message-----
From: Varghese, Vipin
Sent: Monday, March 25, 2019 7:53 PM
To: Harris, James R <james.r.harris@intel.com>; Howell, Seth <seth.howell@intel.com>; dev@dpdk.org
Subject: RE: Aligned rte_mempool for storage applications
Hi Seth,
If I may, I would like to offer a suggestion and ask a few queries about the mempool alignment details. Please find my suggestions and queries inline below.
Snipped
>
> In SPDK, we use the rte_mempool struct for many internal structure
> collections. The per-thread cache and ease of allocation of mempools
> are very useful features.
> Some of the collections we store in SPDK are pools of I/O buffers.
> Typically, these pools contain elements of at least 4096 bytes, and we
> would like them to be aligned to 4k for performance reasons.
Query-1> Is the total memory required only 4096 bytes (the data portion)?
>
> [Jim] Just to clarify Seth's point - the performance reasons are
> specifically to avoid wasteful memory copies. The vast majority of NVMe
> SSDs in the market today do not have full scatter/gather support -
> rather they only support something called PRP (Physical Region Pages),
> which requires all scatter/gather elements except the first to be 4KB
> aligned. There are other storage interfaces, such as Linux AIO, that also impose alignment restrictions.
>
> -Jim
>
>
> Currently, the rte_mempool API doesn't support aligned mempool
> objects. This means that when we allocate a 4k buffer and want it
> aligned to 4k, we actually need to allocate an 8k buffer and calculate
> an offset into it each time we want to use it.
Query-2> Why not create contiguous 4K-aligned memory with rte_malloc?
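For context, the over-allocation workaround described above boils down to something like the following sketch (the function name and the 4 KiB size are illustrative):

#include <rte_common.h>
#include <rte_mempool.h>

#define IO_BUF_SIZE 4096

/* Workaround sketch: each pool element is created at 2 * IO_BUF_SIZE so a
 * 4 KiB-aligned region is guaranteed to exist inside it, and the offset is
 * recomputed on every get. The caller must also remember the original
 * element pointer in order to put the buffer back into the pool. */
static void *
get_aligned_buf(struct rte_mempool *mp, void **elem)
{
        if (rte_mempool_get(mp, elem) != 0)
                return NULL;

        /* Round the element pointer up to the next 4 KiB boundary. */
        return RTE_PTR_ALIGN(*elem, IO_BUF_SIZE);
}

A plain rte_malloc("io_buf", IO_BUF_SIZE, 4096) call, as Query-2 suggests, returns a 4K-aligned buffer without the doubling, but on its own it gives up the mempool's per-thread cache, which is the feature Seth and Jim note they want to keep.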
> We recently did a proof of concept using the rte_mempool_ops hook
> where we allocated a mempool and populated it with aligned entries.
> This allowed us to retrieve aligned addresses directly from
> rte_mempool_get(), but didn't help with the allocation size.
> Because the rte_mempool struct assumes that each element has a
> header attached to it, we still need to live up to that assumption for
> each object we create in a mempool. This means that the actual size of
> a buffer becomes 4k + 24 bytes. In order to get to our next aligned
> address, we need to add about 4k of padding to each element.
> Modifying the current rte_mempool struct to allow entries without
> headers seems impossible since it would break rte_mempool_obj_iter
> and rte_mempool_from_obj. However, I still think there is a lot of
> benefit to be gained from a mempool structure that supports aligned objects without headers.
> I am wondering if DPDK would be open to us introducing an
> rte_mempool_aligned structure. This structure would essentially be a
> wrapper around a regular mempool struct. However, it would not require
> headers or trailers for each object in the pool.
Query-3> Using a mempool with a data portion of size 0, we can either create an indirect buffer or use the external-buffer mbuf API to attach the mbuf to 4K-aligned rte_malloc areas.
Note: we did something similar in the prototype for the AF_XDP_ZC_PMD (presented at the BLR summit 2019).
Advantage: no change to the mempool library, mbuf library, or rte_malloc. The application works with zero changes.
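For reference, a rough sketch of the external-buffer variant of that suggestion, using rte_pktmbuf_attach_extbuf() (available since DPDK 18.05) on mbufs taken from a pool created with a data room size of 0; the function names, pool, and free callback here are illustrative:

#include <stdint.h>
#include <rte_malloc.h>
#include <rte_mbuf.h>

/* Free callback for the external buffer, invoked when the last mbuf
 * referencing it is freed. */
static void
ext_buf_free_cb(void *addr, void *opaque)
{
        (void)opaque;
        rte_free(addr);
}

/* Attach a 4 KiB-aligned rte_malloc'd area to an mbuf that carries no
 * data room of its own. */
static struct rte_mbuf *
alloc_aligned_mbuf(struct rte_mempool *zero_room_pool, size_t len)
{
        struct rte_mbuf *m;
        struct rte_mbuf_ext_shared_info *shinfo;
        uint16_t buf_len = (uint16_t)len;
        void *buf;

        m = rte_pktmbuf_alloc(zero_room_pool);
        if (m == NULL)
                return NULL;

        buf = rte_malloc("ext_io_buf", len, 4096);
        if (buf == NULL) {
                rte_pktmbuf_free(m);
                return NULL;
        }

        /* The helper carves the shared info out of the tail of the buffer
         * and shrinks buf_len accordingly. */
        shinfo = rte_pktmbuf_ext_shinfo_init_helper(buf, &buf_len,
                                                    ext_buf_free_cb, NULL);
        if (shinfo == NULL) {
                rte_free(buf);
                rte_pktmbuf_free(m);
                return NULL;
        }

        rte_pktmbuf_attach_extbuf(m, buf, rte_malloc_virt2iova(buf),
                                  buf_len, shinfo);
        return m;
}

One caveat for the 4 KiB use case: because the shared info lives at the tail of the attached buffer, the usable area ends up slightly under 4 KiB unless the buffer is over-allocated or the shared info is allocated separately.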
>
> This structure would only be applicable to a subset of mempools
> with the following characteristics:
> 1. mempools for which the following flags were set:
> MEMPOOL_F_NO_CACHE_ALIGN, MEMPOOL_F_NO_IOVA_CONTIG,
> MEMPOOL_F_NO_SPREAD
> 2. mempools that do not require the use of the following
> functions: rte_mempool_from_obj (which requires a pointer to the mp in
> the header of each obj) and rte_mempool_obj_iter.
> 3. Any attempt to create this object when
> RTE_LIBRTE_MEMPOOL_DEBUG was enabled would necessarily fail since we
> can't check the header cookies.
>
> My thought would be that we could implement this data structure in
> a header and it would look something like this:
>
> struct rte_mempool_aligned {
>         struct rte_mempool mp;
>         size_t obj_alignment;
> };
>
> The rest of the functions in the header would primarily be
> wrappers around the original functions. Most functions
> (rte_mempool_alloc, rte_mempool_free, rte_mempool_enqueue/dequeue,
> rte_mempool_get_count, etc.) could be implemented directly as
> wrappers, and others such as rte_mempool_create and the populate
> functions would have to be re-implemented to some degree in the new
> header. The remaining functions (check_cookies, obj_iter) would not be implemented in the rte_mempool_aligned.h file.
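To make the wrapper idea concrete, a hypothetical sketch of the thin get/put wrappers such a header might contain, mirroring the struct quoted above (rte_mempool_aligned is the proposed type, not an existing DPDK API):

#include <stddef.h>
#include <rte_mempool.h>

/* Proposed wrapper type from the discussion above (hypothetical). */
struct rte_mempool_aligned {
        struct rte_mempool mp;
        size_t obj_alignment;
};

/* Get/put simply forward to the embedded mempool; because the objects
 * carry no per-object header or trailer, no cookie or header fixups
 * are needed on either path. */
static inline int
rte_mempool_aligned_get(struct rte_mempool_aligned *amp, void **obj_p)
{
        return rte_mempool_get(&amp->mp, obj_p);
}

static inline void
rte_mempool_aligned_put(struct rte_mempool_aligned *amp, void *obj)
{
        rte_mempool_put(&amp->mp, obj);
}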
>
> Would the community be welcoming of a new rte_mempool_aligned
> struct? If you don't feel like this would be the way to go, are there
> other options in DPDK for creating a pool of pre-allocated aligned objects?
>
> Thank you,
>
> Seth Howell
>
>
>
Thread overview: 14+ messages
2019-03-25 21:06 Howell, Seth
2019-03-25 21:13 ` Harris, James R
2019-03-26 2:52 ` Varghese, Vipin
2019-03-26 18:34 ` Howell, Seth
2019-03-26 18:59 ` Harris, James R [this message]
2019-03-27 2:33 ` Varghese, Vipin
2019-03-27 8:28 ` Varghese, Vipin