From: "Richardson, Bruce"
To: Neil Horman
Cc: "dev@dpdk.org"
Date: Wed, 21 May 2014 10:21:26 +0000
Subject: Re: [dpdk-dev] [PATCH 2/4] distributor: new packet distributor library
Message-ID: <59AF69C657FD0841A61C55336867B5B01AA1C540@IRSMSX103.ger.corp.intel.com>
In-Reply-To: <20140520181844.GF6648@hmsreliant.think-freely.org>

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Tuesday, May 20, 2014 7:19 PM
> To: Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 2/4] distributor: new packet distributor library
>
> On Tue, May 20, 2014 at 11:00:55AM +0100, Bruce Richardson wrote:
> > This adds the code for a new Intel DPDK library for packet distribution.
> > The distributor is a component which is designed to pass packets
> > one-at-a-time to workers, with dynamic load balancing. Using the RSS
> > field in the mbuf as a tag, the distributor tracks what packet tag is
> > being processed by what worker and then ensures that no two packets
> > with the same tag are in-flight simultaneously. Once a tag is not
> > in-flight, then the next packet with that tag will be sent to the next
> > available core.
> >
> > Signed-off-by: Bruce Richardson
> >
>
> > +#define RTE_DISTRIB_GET_BUF (1)
> > +#define RTE_DISTRIB_RETURN_BUF (2)
> > +
> Can you document the meaning of these bits please? The code makes it
> somewhat confusing to differentiate them. As I read the code, GET_BUF is
> used as a flag to indicate that rte_distributor_get_pkt needs to wait
> while a buffer is filled in by the processing thread, while RETURN_BUF
> indicates that a worker is leaving and the buffer needs to be
> (re)assigned to an alternate worker, is that correct?

Pretty much. I'll add additional comments to the code.
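To make the intended meaning a bit more concrete in the meantime, the
distributor-side handling amounts to something like the sketch below. This
is not the code from the patch - RTE_DISTRIB_FLAG_BITS is assumed to be 2
here, and the actions are only described in comments:

#include <stdint.h>

#define RTE_DISTRIB_FLAG_BITS    2   /* assumed: low bits carry the flags */
#define RTE_DISTRIB_GET_BUF     (1)  /* worker wants the next packet */
#define RTE_DISTRIB_RETURN_BUF  (2)  /* worker is leaving the pool */

struct rte_mbuf;

static void
handle_worker_request(int64_t bufptr64)
{
        /* the mbuf pointer travels in the upper bits, flags in the lower */
        struct rte_mbuf *oldpkt =
                (struct rte_mbuf *)(uintptr_t)(bufptr64 >> RTE_DISTRIB_FLAG_BITS);

        if (bufptr64 & RTE_DISTRIB_GET_BUF) {
                /* worker handed back oldpkt and is spinning until the
                 * distributor writes a new packet into bufptr64 */
        } else if (bufptr64 & RTE_DISTRIB_RETURN_BUF) {
                /* worker is shutting down: clear its in-flight tag and
                 * reassign any backlog to another worker */
        }
        (void)oldpkt;
}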
>
> > +#define RTE_DISTRIB_BACKLOG_SIZE 8
> > +#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
> > +
> > +#define RTE_DISTRIB_MAX_RETURNS 128
> > +#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
> > +
> > +union rte_distributor_buffer {
> > +        volatile int64_t bufptr64;
> > +        char pad[CACHE_LINE_SIZE*3];
> Do you need the pad, if you mark the struct as cache aligned?

Yes, for performance reasons we actually want the structure to take up
three cache lines, not just one. For instance, this guarantees that the
hardware adjacent-line prefetcher doesn't pull in an additional cache
line - belonging to a different worker - when we access the memory.

> > +} __rte_cache_aligned;
> > +
> > +struct rte_mbuf *
> > +rte_distributor_get_pkt(struct rte_distributor *d,
> > +                unsigned worker_id, struct rte_mbuf *oldpkt,
> > +                unsigned reserved __rte_unused)
> > +{
> > +        union rte_distributor_buffer *buf = &d->bufs[worker_id];
> > +        int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS) |
> > +                        RTE_DISTRIB_GET_BUF;
> > +        while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
> > +                rte_pause();
> > +        buf->bufptr64 = req;
> > +        while (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
> > +                rte_pause();
> You may want to document the fact that this is deadlock prone. You
> clearly state that only a single thread can run the processing routine,
> but if a user selects a single worker thread to perform double duty, the
> GET_BUF flag will never get cleared here, and no other queues will get
> processed.

Agreed, I'll update the comments.

>
> > +        /* since bufptr64 is a signed value, this should be an arithmetic shift */
> > +        int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
> > +        return (struct rte_mbuf *)((uintptr_t)ret);
> > +}
> > +
> > +int
> > +rte_distributor_return_pkt(struct rte_distributor *d,
> > +                unsigned worker_id, struct rte_mbuf *oldpkt)
> > +{
> Maybe some optional sanity checking, here and above, to ensure that a
> packet returned through get_pkt doesn't also get returned here, mangling
> the flags field?

That actually shouldn't be an issue. When we return a packet using this
call, we just set the in_flight_ids value for the worker to zero (and
re-assign the backlog, if any), and move on to the next worker. No
checking of the returned packet is done. Also, since get_pkt always
returns a new packet, the internal logic will still work ok - all that
will happen if you return the wrong packet, e.g. by returning the same
packet twice rather than returning the latest packet each time, is that
the returns array will have the duplicated pointer in it. Whatever gets
passed back by the worker gets stored directly there - it's up to the
worker to return the correct pointer to the distributor.
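For reference, the intended usage on the worker side is a dedicated loop
per worker lcore, roughly as sketched below. This is only an illustration:
process_packet() and worker_should_quit() are application placeholders
rather than part of the library, and - per your deadlock point above - the
loop has to run on its own core, separate from the thread calling
rte_distributor_process().

/* assumes the distributor header has been included */
static int process_packet(struct rte_mbuf *pkt);       /* app-defined */
static int worker_should_quit(void);                    /* app-defined */

static int
worker_loop(struct rte_distributor *d, unsigned worker_id)
{
        struct rte_mbuf *pkt = NULL;

        while (!worker_should_quit()) {
                /* hand back the previous packet (NULL the first time
                 * round) and spin until the distributor supplies the next
                 * one; the final argument is the reserved parameter */
                pkt = rte_distributor_get_pkt(d, worker_id, pkt, 0);
                process_packet(pkt);
        }
        /* on shutdown, return the final packet so its tag is released */
        rte_distributor_return_pkt(d, worker_id, pkt);
        return 0;
}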
>
> > +
> > +/* flush the distributor, so that there are no outstanding packets in flight or
> > + * queued up. */
> > +int
> > +rte_distributor_flush(struct rte_distributor *d)
> > +{
> You need to document that this function can only be called by the same
> thread that is running rte_distributor_process, lest you corrupt your
> queue data. Alternatively, it might be nicer to modify this function's
> internals to set a flag in the distributor status bits to make the
> process routine do the flush work when it gets set. That would allow this
> function to be called by any other thread, which seems like a more
> natural interface.

Agreed. At minimum I'll update the comments, and I'll also look into what
would be involved in changing the mechanism as you describe. However,
given the limited time to the code-freeze date, it may not be possible to
do here. [I also don't anticipate this function being much used in normal
operations anyway - it was written in order to allow me to write proper
unit tests to test the process function. We need a flush function for unit
testing to ensure that our packet counts are predictable at the end of
each test run, and to eliminate any dependency in the tests on the
internal buffer sizes of the distributor.]

>
> > +}
> > +
> > +/* clears the internal returns array in the distributor */
> > +void
> > +rte_distributor_clear_returns(struct rte_distributor *d)
> > +{
> This can also only be called by the same thread that runs the process
> routine, lest the start and count values get mis-assigned.

Agreed. Will update comments.

>
> > +        d->returns.start = d->returns.count = 0;
> > +#ifndef __OPTIMIZE__
> > +        memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
> > +#endif
> > +}
> > +
> > +/* creates a distributor instance */
> > +struct rte_distributor *
> > +rte_distributor_create(const char *name,
> > +                unsigned socket_id,
> > +                unsigned num_workers,
> > +                struct rte_distributor_extra_args *args __rte_unused)
> > +{
> > +        struct rte_distributor *d;
> > +        struct rte_distributor_list *distributor_list;
> > +        char mz_name[RTE_MEMZONE_NAMESIZE];
> > +        const struct rte_memzone *mz;
> > +
> > +        /* compilation-time checks */
> > +        RTE_BUILD_BUG_ON((sizeof(*d) & CACHE_LINE_MASK) != 0);
> > +        RTE_BUILD_BUG_ON((RTE_MAX_LCORE & 7) != 0);
> > +
> > +        if (name == NULL || num_workers >= RTE_MAX_LCORE) {
> > +                rte_errno = EINVAL;
> > +                return NULL;
> > +        }
> > +        rte_snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
> > +        mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
> > +        if (mz == NULL) {
> > +                rte_errno = ENOMEM;
> > +                return NULL;
> > +        }
> > +
> > +        /* check that we have an initialised tail queue */
> > +        if ((distributor_list = RTE_TAILQ_LOOKUP_BY_IDX(RTE_TAILQ_DISTRIBUTOR,
> > +                        rte_distributor_list)) == NULL) {
> > +                rte_errno = E_RTE_NO_TAILQ;
> > +                return NULL;
> > +        }
> > +
> > +        d = mz->addr;
> > +        rte_snprintf(d->name, sizeof(d->name), "%s", name);
> > +        d->num_workers = num_workers;
> > +        TAILQ_INSERT_TAIL(distributor_list, d, next);
> You need locking around this list unless you intend to assert that
> distributor creation and destruction must only be performed from a single
> thread. Also, where is the API method to tear down a distributor
> instance?

Ack re locking, will make this match what is used in other structures.
For tearing down, that's not possible until such time as we get a function
to free memzones back. Rings and mempools similarly have no free function.
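For completeness, the distribution side is intended to look roughly like
the sketch below, with create, process, flush and clear_returns all driven
from the one core. It's only a sketch: rx_burst_somehow() and
dist_should_quit() are placeholders, and the rte_distributor_process()
prototype used here is assumed from this discussion rather than quoted
from the patch.

static unsigned rx_burst_somehow(struct rte_mbuf **bufs, unsigned n);  /* app-defined */
static int dist_should_quit(void);                                     /* app-defined */

static int
distribution_loop(unsigned socket_id, unsigned num_workers)
{
        struct rte_distributor *d;
        struct rte_mbuf *bufs[64];
        unsigned n;

        d = rte_distributor_create("dist0", socket_id, num_workers, NULL);
        if (d == NULL)
                return -1;

        while (!dist_should_quit()) {
                n = rx_burst_somehow(bufs, 64);
                rte_distributor_process(d, bufs, n);
        }
        /* same-thread only: push out anything queued or in flight, then
         * reset the returns tracking */
        rte_distributor_flush(d);
        rte_distributor_clear_returns(d);
        return 0;
}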
>
> > +#endif
> > +
> > +#include
> > +
> > +#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
> > +
> > +struct rte_distributor;
> > +
> > +struct rte_distributor_extra_args { }; /**< reserved for future use */
> > +
> You don't need to reserve a struct name for future use. No one will use
> it until you create it.
>
> > +struct rte_mbuf *
> > +rte_distributor_get_pkt(struct rte_distributor *d,
> > +        unsigned worker_id, struct rte_mbuf *oldpkt, unsigned reserved);
> > +
> Don't need to reserve an extra argument here. You're not ABI safe
> currently, and if DPDK becomes ABI safe in the future, we will use a
> linker script to provide versions with backward compatibility easily
> enough.

We may not have ABI compatibility between releases, but on the other hand
we try to reduce the amount of code changes that need to be made by our
customers who are compiling their code against the libraries - generally
linking against static rather than shared libraries. Since we have a
reasonable expectation that this field will be needed in a future release,
we want to include it now so that when we do need it, no code changes need
to be made to upgrade this particular library to a new Intel DPDK version.
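To make that concrete with a purely hypothetical example - the flag name
below is invented for illustration and is not part of the proposed API:

/* code written against this release passes 0 for the reserved argument */
pkt = rte_distributor_get_pkt(d, worker_id, oldpkt, 0);

/*
 * If a later release gives that argument a meaning (say, a set of flags),
 * the call above keeps compiling and behaving exactly as before; only
 * callers that opt in to the new behaviour change, e.g. (hypothetical
 * flag name):
 *
 *        pkt = rte_distributor_get_pkt(d, worker_id, oldpkt,
 *                        RTE_DISTRIB_SOME_FUTURE_FLAG);
 *
 * Dropping the argument now and adding it later would instead force every
 * existing caller to be edited and recompiled.
 */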