From: "Richardson, Bruce" <bruce.richardson@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH 2/4] distributor: new packet distributor library
Date: Wed, 21 May 2014 10:21:26 +0000 [thread overview]
Message-ID: <59AF69C657FD0841A61C55336867B5B01AA1C540@IRSMSX103.ger.corp.intel.com> (raw)
In-Reply-To: <20140520181844.GF6648@hmsreliant.think-freely.org>
> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Tuesday, May 20, 2014 7:19 PM
> To: Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/4] distributor: new packet distributor library
>
> On Tue, May 20, 2014 at 11:00:55AM +0100, Bruce Richardson wrote:
> > This adds the code for a new Intel DPDK library for packet distribution.
> > The distributor is a component which is designed to pass packets
> > one-at-a-time to workers, with dynamic load balancing. Using the RSS
> > field in the mbuf as a tag, the distributor tracks what packet tag is
> > being processed by what worker and then ensures that no two packets with
> > the same tag are in-flight simultaneously. Once a tag is not in-flight,
> > then the next packet with that tag will be sent to the next available
> > core.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> ><snip>
>
> > +#define RTE_DISTRIB_GET_BUF (1)
> > +#define RTE_DISTRIB_RETURN_BUF (2)
> > +
> Can you document the meaning of these bits please, the code makes it
> somewhat
> confusing to differentiate them. As I read the code, GET_BUF is used as a flag
> to indicate that rte_distributor_get_pkt needs to wait while a buffer is
> filled in by the processing thread, while RETURN_BUF indicates that a worker is
> leaving and the buffer needs to be (re)assigned to an alternate worker, is that
> correct?
Pretty much. I'll add additional comments to the code.
>
> > +#define RTE_DISTRIB_BACKLOG_SIZE 8
> > +#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
> > +
> > +#define RTE_DISTRIB_MAX_RETURNS 128
> > +#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
> > +
> > +union rte_distributor_buffer {
> > + volatile int64_t bufptr64;
> > + char pad[CACHE_LINE_SIZE*3];
> Do you need the pad, if you mark the struct as cache aligned?
Yes, for performance reasons we actually want the structure to take up three cache lines, not just one. For instance, this will guarantee that we don't have adjacent line prefetcher in hardware pull in an additional cache line -belonging to a different worker - when we access the memory.
> > +} __rte_cache_aligned;
> >
> +
> ><snip>
> > +
> > +struct rte_mbuf *
> > +rte_distributor_get_pkt(struct rte_distributor *d,
> > + unsigned worker_id, struct rte_mbuf *oldpkt,
> > + unsigned reserved __rte_unused)
> > +{
> > + union rte_distributor_buffer *buf = &d->bufs[worker_id];
> > + int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS) |
> \
> > + RTE_DISTRIB_GET_BUF;
> > + while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
> > + rte_pause();
> > + buf->bufptr64 = req;
> > + while (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
> > + rte_pause();
> You may want to document the fact that this is deadlock prone. You clearly
> state that only a single thread can run the processing routine, but if a user
> selects a single worker thread to preform double duty, the GET_BUF_FLAG will
> never get cleared here, and no other queues will get processed.
Agreed, I'll update the comments.
>
> > + /* since bufptr64 is a signed value, this should be an arithmetic shift */
> > + int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
> > + return (struct rte_mbuf *)((uintptr_t)ret);
> > +}
> > +
> > +int
> > +rte_distributor_return_pkt(struct rte_distributor *d,
> > + unsigned worker_id, struct rte_mbuf *oldpkt)
> > +{
> Maybe some optional sanity checking, here and above, to ensure that a packet
> returned through get_pkt doesn't also get returned here, mangling the flags
> field?
That actually shouldn't be an issue.
When we return a packet using this call, we just get the in_flight_ids value for the worker to zero (and re-assign the backlog, if any), and move on to the next worker. No checking of the returned packet is done. Also, since get_pkt always returns a new packet, the internal logic will still work ok - all that will happen if you return the wrong packet, e.g. by returning the same packet twice rather than returning the latest packet each time, is that the returns array will have the duplicated pointer in it. Whatever gets passed back by the worker gets stored directly there - it's up to the worker to return the correct pointer to the distributor.
>
> ><snip>
> > +
> > +/* flush the distributor, so that there are no outstanding packets in flight or
> > + * queued up. */
> > +int
> > +rte_distributor_flush(struct rte_distributor *d)
> > +{
> You need to document that this function can only be called by the same thread
> that is running rte_distributor_process, lest your corrupt your queue data.
> Alternatively, it might be nicer to modify this functions internals to set a
> flag in the distributor status bits to make the process routine do the flush
> work when it gets set. that would allow this function to be called by any
> other thread, which seems like a more natural interface.
Agreed. At minimum I'll update the comments, and I'll also look into what would be involved in changing the mechanism like you describe. However, given the limited time to code freeze date, it may not be possible to do here. [I also don't anticipate this function being much used in normal operations anyway - it was written in order to allow me to write proper unit tests to test the process function. We need a flush function for unit testing to sure that our packet counts are predictable at the end of each test run, and eliminate any dependency in the tests on the internal buffer sizes of the distributor.]
>
> ><snip>
> > +}
> > +
> > +/* clears the internal returns array in the distributor */
> > +void
> > +rte_distributor_clear_returns(struct rte_distributor *d)
> > +{
> This can also only be called by the same thread that runs the process routine,
> lest the start and count values get mis-assigned.
Agreed. Will update comments.
>
> > + d->returns.start = d->returns.count = 0;
> > +#ifndef __OPTIMIZE__
> > + memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
> > +#endif
> > +}
> > +
> > +/* creates a distributor instance */
> > +struct rte_distributor *
> > +rte_distributor_create(const char *name,
> > + unsigned socket_id,
> > + unsigned num_workers,
> > + struct rte_distributor_extra_args *args __rte_unused)
> > +{
> > + struct rte_distributor *d;
> > + struct rte_distributor_list *distributor_list;
> > + char mz_name[RTE_MEMZONE_NAMESIZE];
> > + const struct rte_memzone *mz;
> > +
> > + /* compilation-time checks */
> > + RTE_BUILD_BUG_ON((sizeof(*d) & CACHE_LINE_MASK) != 0);
> > + RTE_BUILD_BUG_ON((RTE_MAX_LCORE & 7) != 0);
> > +
> > + if (name == NULL || num_workers >= RTE_MAX_LCORE) {
> > + rte_errno = EINVAL;
> > + return NULL;
> > + }
> > + rte_snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s",
> name);
> > + mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id,
> NO_FLAGS);
> > + if (mz == NULL) {
> > + rte_errno = ENOMEM;
> > + return NULL;
> > + }
> > +
> > + /* check that we have an initialised tail queue */
> > + if ((distributor_list =
> RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_DISTRIBUTOR,
> > + rte_distributor_list)) == NULL) {
> > + rte_errno = E_RTE_NO_TAILQ;
> > + return NULL;
> > + }
> > +
> > + d = mz->addr;
> > + rte_snprintf(d->name, sizeof(d->name), "%s", name);
> > + d->num_workers = num_workers;
> > + TAILQ_INSERT_TAIL(distributor_list, d, next);
> You need locking around this list unless you intend to assert that distributor
> creation and destruction must only be preformed from a single thread. Also,
> where is the API method to tear down a distributor instance?
Ack re locking, will make this as used in other structures.
For tearing down, that's not possible until such time as we get a function to free memzones back. Rings and mempools similarly have no free function.
>
> ><snip>
> > +#endif
> > +
> > +#include <rte_mbuf.h>
> > +
> > +#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance
> */
> > +
> > +struct rte_distributor;
> > +
> > +struct rte_distributor_extra_args { }; /**< reserved for future use*/
> > +
> You don't need to reserve a struct name for future use. No one will use it
> until you create it.
>
> > +struct rte_mbuf *
> > +rte_distributor_get_pkt(struct rte_distributor *d,
> > + unsigned worker_id, struct rte_mbuf *oldpkt, unsigned
> reserved);
> > +
> Don't need to reserve an extra argument here. You're not ABI safe currently,
> and if DPDK becomes ABI safe in the future, we will use a linker script to
> provide versions with backward compatibility easily enough.
We may not have ABI compatibility between releases, but on the other hand we try to reduce the amount of code changes that need to be made by our customers who are compiling their code against the libraries - generally linking against static rather than shared libraries. Since we have a reasonable expectation that this field will be needed in a future release, we want to include it now so that when we do need it, no code changes need to be made to upgrade this particular library to a new Intel DPDK version.
next prev parent reply other threads:[~2014-05-21 10:21 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-20 10:00 [dpdk-dev] [PATCH 0/4] New library: rte_distributor Bruce Richardson
2014-05-20 10:00 ` [dpdk-dev] [PATCH 1/4] eal: add tailq for new distributor component Bruce Richardson
2014-05-20 10:00 ` [dpdk-dev] [PATCH 2/4] distributor: new packet distributor library Bruce Richardson
2014-05-20 18:18 ` Neil Horman
2014-05-21 10:21 ` Richardson, Bruce [this message]
2014-05-21 15:23 ` Neil Horman
2014-05-20 10:00 ` [dpdk-dev] [PATCH 3/4] distributor: add distributor library to build Bruce Richardson
2014-05-20 10:00 ` [dpdk-dev] [PATCH 4/4] distributor: add unit tests for distributor lib Bruce Richardson
2014-05-20 10:38 ` [dpdk-dev] [PATCH 0/4] New library: rte_distributor Neil Horman
2014-05-20 11:02 ` Richardson, Bruce
2014-05-20 17:14 ` Neil Horman
2014-05-20 19:32 ` Richardson, Bruce
2014-05-27 22:32 ` Thomas Monjalon
2014-05-28 8:48 ` Richardson, Bruce
2014-05-29 10:12 ` [dpdk-dev] [PATCH v2 0/5] " Bruce Richardson
2014-06-05 1:58 ` Cao, Waterman
2014-06-12 13:57 ` Thomas Monjalon
2014-05-29 10:12 ` [dpdk-dev] [PATCH v2 1/5] eal: add tailq for new distributor component Bruce Richardson
2014-05-29 10:12 ` [dpdk-dev] [PATCH v2 2/5] distributor: new packet distributor library Bruce Richardson
2014-05-29 13:48 ` Neil Horman
2014-06-02 21:40 ` Richardson, Bruce
2014-06-03 11:01 ` Neil Horman
2014-06-03 14:33 ` Richardson, Bruce
2014-06-03 14:51 ` Neil Horman
2014-06-03 18:04 ` [dpdk-dev] [PATCH v3 " Bruce Richardson
2014-06-03 18:38 ` Neil Horman
2014-05-29 10:12 ` [dpdk-dev] [PATCH v2 3/5] distributor: add distributor library to build Bruce Richardson
2014-05-29 10:12 ` [dpdk-dev] [PATCH v2 4/5] distributor: add unit tests for distributor lib Bruce Richardson
2014-05-29 10:12 ` [dpdk-dev] [PATCH v2 5/5] docs: add distributor lib to API docs Bruce Richardson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=59AF69C657FD0841A61C55336867B5B01AA1C540@IRSMSX103.ger.corp.intel.com \
--to=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=nhorman@tuxdriver.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).