From: "Richardson, Bruce"
To: Neil Horman
Cc: "dev@dpdk.org"
Date: Wed, 21 May 2014 10:21:26 +0000
Subject: Re: [dpdk-dev] [PATCH 2/4] distributor: new packet distributor library
Message-ID: <59AF69C657FD0841A61C55336867B5B01AA1C540@IRSMSX103.ger.corp.intel.com>
In-Reply-To: <20140520181844.GF6648@hmsreliant.think-freely.org>

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Tuesday, May 20, 2014 7:19 PM
> To: Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/4] distributor: new packet distributor library
>
> On Tue, May 20, 2014 at 11:00:55AM +0100, Bruce Richardson wrote:
> > This adds the code for a new Intel DPDK library for packet distribution.
> > The distributor is a component which is designed to pass packets
> > one-at-a-time to workers, with dynamic load balancing. Using the RSS
> > field in the mbuf as a tag, the distributor tracks what packet tag is
> > being processed by what worker and then ensures that no two packets
> > with the same tag are in-flight simultaneously. Once a tag is not
> > in-flight, then the next packet with that tag will be sent to the next
> > available core.
> >
> > Signed-off-by: Bruce Richardson
> >
>
> > +#define RTE_DISTRIB_GET_BUF (1)
> > +#define RTE_DISTRIB_RETURN_BUF (2)
> > +
> Can you document the meaning of these bits please? The code makes it
> somewhat confusing to differentiate them. As I read the code, GET_BUF is
> used as a flag to indicate that rte_distributor_get_pkt needs to wait
> while a buffer is filled in by the processing thread, while RETURN_BUF
> indicates that a worker is leaving and the buffer needs to be
> (re)assigned to an alternate worker, is that correct?

Pretty much. I'll add additional comments to the code.
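To make the intended meaning a bit more concrete in the meantime, the
distributor-side handling amounts to something like the sketch below. This
is not the code from the patch - RTE_DISTRIB_FLAG_BITS is assumed to be 2
here, and the actions are only described in comments:

#include <stdint.h>

#define RTE_DISTRIB_FLAG_BITS    2   /* assumed: low bits carry the flags */
#define RTE_DISTRIB_GET_BUF     (1)  /* worker wants the next packet */
#define RTE_DISTRIB_RETURN_BUF  (2)  /* worker is leaving the pool */

struct rte_mbuf;

static void
handle_worker_request(int64_t bufptr64)
{
        /* the mbuf pointer travels in the upper bits, flags in the lower */
        struct rte_mbuf *oldpkt =
                (struct rte_mbuf *)(uintptr_t)(bufptr64 >> RTE_DISTRIB_FLAG_BITS);

        if (bufptr64 & RTE_DISTRIB_GET_BUF) {
                /* worker handed back oldpkt and is spinning until the
                 * distributor writes a new packet into bufptr64 */
        } else if (bufptr64 & RTE_DISTRIB_RETURN_BUF) {
                /* worker is shutting down: clear its in-flight tag and
                 * reassign any backlog to another worker */
        }
        (void)oldpkt;
}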
>
> > +#define RTE_DISTRIB_BACKLOG_SIZE 8
> > +#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
> > +
> > +#define RTE_DISTRIB_MAX_RETURNS 128
> > +#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
> > +
> > +union rte_distributor_buffer {
> > +        volatile int64_t bufptr64;
> > +        char pad[CACHE_LINE_SIZE*3];
> Do you need the pad, if you mark the struct as cache aligned?

Yes, for performance reasons we actually want the structure to take up
three cache lines, not just one. For instance, this guarantees that the
hardware adjacent-line prefetcher doesn't pull in an additional cache
line - belonging to a different worker - when we access the memory.

> > +} __rte_cache_aligned;
> > +
> > +struct rte_mbuf *
> > +rte_distributor_get_pkt(struct rte_distributor *d,
> > +                unsigned worker_id, struct rte_mbuf *oldpkt,
> > +                unsigned reserved __rte_unused)
> > +{
> > +        union rte_distributor_buffer *buf = &d->bufs[worker_id];
> > +        int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS) |
> > +                        RTE_DISTRIB_GET_BUF;
> > +        while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
> > +                rte_pause();
> > +        buf->bufptr64 = req;
> > +        while (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
> > +                rte_pause();
> You may want to document the fact that this is deadlock prone. You
> clearly state that only a single thread can run the processing routine,
> but if a user selects a single worker thread to perform double duty, the
> GET_BUF flag will never get cleared here, and no other queues will get
> processed.

Agreed, I'll update the comments.

>
> > +        /* since bufptr64 is a signed value, this should be an arithmetic shift */
> > +        int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
> > +        return (struct rte_mbuf *)((uintptr_t)ret);
> > +}
> > +
> > +int
> > +rte_distributor_return_pkt(struct rte_distributor *d,
> > +                unsigned worker_id, struct rte_mbuf *oldpkt)
> > +{
> Maybe some optional sanity checking, here and above, to ensure that a
> packet returned through get_pkt doesn't also get returned here, mangling
> the flags field?

That actually shouldn't be an issue. When we return a packet using this
call, we just set the in_flight_ids value for the worker to zero (and
re-assign the backlog, if any), and move on to the next worker. No
checking of the returned packet is done. Also, since get_pkt always
returns a new packet, the internal logic will still work ok - all that
will happen if you return the wrong packet, e.g. by returning the same
packet twice rather than returning the latest packet each time, is that
the returns array will have the duplicated pointer in it. Whatever gets
passed back by the worker gets stored directly there - it's up to the
worker to return the correct pointer to the distributor.
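For reference, the intended usage on the worker side is a dedicated loop
per worker lcore, roughly as sketched below. This is only an illustration:
process_packet() and worker_should_quit() are application placeholders
rather than part of the library, and - per your deadlock point above - the
loop has to run on its own core, separate from the thread calling
rte_distributor_process().

/* assumes the distributor header has been included */
static int process_packet(struct rte_mbuf *pkt);       /* app-defined */
static int worker_should_quit(void);                    /* app-defined */

static int
worker_loop(struct rte_distributor *d, unsigned worker_id)
{
        struct rte_mbuf *pkt = NULL;

        while (!worker_should_quit()) {
                /* hand back the previous packet (NULL the first time
                 * round) and spin until the distributor supplies the next
                 * one; the final argument is the reserved parameter */
                pkt = rte_distributor_get_pkt(d, worker_id, pkt, 0);
                process_packet(pkt);
        }
        /* on shutdown, return the final packet so its tag is released */
        rte_distributor_return_pkt(d, worker_id, pkt);
        return 0;
}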
>
> > +
> > +/* flush the distributor, so that there are no outstanding packets in flight or
> > + * queued up. */
> > +int
> > +rte_distributor_flush(struct rte_distributor *d)
> > +{
> You need to document that this function can only be called by the same
> thread that is running rte_distributor_process, lest you corrupt your
> queue data. Alternatively, it might be nicer to modify this function's
> internals to set a flag in the distributor status bits to make the
> process routine do the flush work when it gets set. That would allow this
> function to be called by any other thread, which seems like a more
> natural interface.

Agreed. At minimum I'll update the comments, and I'll also look into what
would be involved in changing the mechanism as you describe. However,
given the limited time to the code-freeze date, it may not be possible to
do here. [I also don't anticipate this function being much used in normal
operations anyway - it was written in order to allow me to write proper
unit tests to test the process function. We need a flush function for unit
testing to ensure that our packet counts are predictable at the end of
each test run, and to eliminate any dependency in the tests on the
internal buffer sizes of the distributor.]

>
> > +}
> > +
> > +/* clears the internal returns array in the distributor */
> > +void
> > +rte_distributor_clear_returns(struct rte_distributor *d)
> > +{
> This can also only be called by the same thread that runs the process
> routine, lest the start and count values get mis-assigned.

Agreed. Will update comments.

>
> > +        d->returns.start = d->returns.count = 0;
> > +#ifndef __OPTIMIZE__
> > +        memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
> > +#endif
> > +}
> > +
> > +/* creates a distributor instance */
> > +struct rte_distributor *
> > +rte_distributor_create(const char *name,
> > +                unsigned socket_id,
> > +                unsigned num_workers,
> > +                struct rte_distributor_extra_args *args __rte_unused)
> > +{
> > +        struct rte_distributor *d;
> > +        struct rte_distributor_list *distributor_list;
> > +        char mz_name[RTE_MEMZONE_NAMESIZE];
> > +        const struct rte_memzone *mz;
> > +
> > +        /* compilation-time checks */
> > +        RTE_BUILD_BUG_ON((sizeof(*d) & CACHE_LINE_MASK) != 0);
> > +        RTE_BUILD_BUG_ON((RTE_MAX_LCORE & 7) != 0);
> > +
> > +        if (name == NULL || num_workers >= RTE_MAX_LCORE) {
> > +                rte_errno = EINVAL;
> > +                return NULL;
> > +        }
> > +        rte_snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
> > +        mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
> > +        if (mz == NULL) {
> > +                rte_errno = ENOMEM;
> > +                return NULL;
> > +        }
> > +
> > +        /* check that we have an initialised tail queue */
> > +        if ((distributor_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_DISTRIBUTOR,
> > +                        rte_distributor_list)) == NULL) {
> > +                rte_errno = E_RTE_NO_TAILQ;
> > +                return NULL;
> > +        }
> > +
> > +        d = mz->addr;
> > +        rte_snprintf(d->name, sizeof(d->name), "%s", name);
> > +        d->num_workers = num_workers;
> > +        TAILQ_INSERT_TAIL(distributor_list, d, next);
> You need locking around this list unless you intend to assert that
> distributor creation and destruction must only be performed from a single
> thread. Also, where is the API method to tear down a distributor
> instance?

Ack re locking, will make this match what is used in other structures.
For tearing down, that's not possible until such time as we get a function
to free memzones back. Rings and mempools similarly have no free function.
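For completeness, the distribution side is intended to look roughly like
the sketch below, with create, process, flush and clear_returns all driven
from the one core. It's only a sketch: rx_burst_somehow() and
dist_should_quit() are placeholders, and the rte_distributor_process()
prototype used here is assumed from this discussion rather than quoted
from the patch.

static unsigned rx_burst_somehow(struct rte_mbuf **bufs, unsigned n);  /* app-defined */
static int dist_should_quit(void);                                     /* app-defined */

static int
distribution_loop(unsigned socket_id, unsigned num_workers)
{
        struct rte_distributor *d;
        struct rte_mbuf *bufs[64];
        unsigned n;

        d = rte_distributor_create("dist0", socket_id, num_workers, NULL);
        if (d == NULL)
                return -1;

        while (!dist_should_quit()) {
                n = rx_burst_somehow(bufs, 64);
                rte_distributor_process(d, bufs, n);
        }
        /* same-thread only: push out anything queued or in flight, then
         * reset the returns tracking */
        rte_distributor_flush(d);
        rte_distributor_clear_returns(d);
        return 0;
}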
>
> > +#endif
> > +
> > +#include
> > +
> > +#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
> > +
> > +struct rte_distributor;
> > +
> > +struct rte_distributor_extra_args { }; /**< reserved for future use */
> > +
> You don't need to reserve a struct name for future use. No one will use
> it until you create it.
>
> > +struct rte_mbuf *
> > +rte_distributor_get_pkt(struct rte_distributor *d,
> > +        unsigned worker_id, struct rte_mbuf *oldpkt, unsigned reserved);
> > +
> Don't need to reserve an extra argument here. You're not ABI safe
> currently, and if DPDK becomes ABI safe in the future, we will use a
> linker script to provide versions with backward compatibility easily
> enough.

We may not have ABI compatibility between releases, but on the other hand
we try to reduce the amount of code changes that need to be made by our
customers who are compiling their code against the libraries - generally
linking against static rather than shared libraries. Since we have a
reasonable expectation that this field will be needed in a future release,
we want to include it now so that when we do need it, no code changes need
to be made to upgrade this particular library to a new Intel DPDK version.
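To make that concrete with a purely hypothetical example - the flag name
below is invented for illustration and is not part of the proposed API:

/* code written against this release passes 0 for the reserved argument */
pkt = rte_distributor_get_pkt(d, worker_id, oldpkt, 0);

/*
 * If a later release gives that argument a meaning (say, a set of flags),
 * the call above keeps compiling and behaving exactly as before; only
 * callers that opt in to the new behaviour change, e.g. (hypothetical
 * flag name):
 *
 *        pkt = rte_distributor_get_pkt(d, worker_id, oldpkt,
 *                        RTE_DISTRIB_SOME_FUTURE_FLAG);
 *
 * Dropping the argument now and adding it later would instead force every
 * existing caller to be edited and recompiled.
 */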