Date: Mon, 21 Jul 2014 15:54:15 -0400
From: Neil Horman
To: Chris Pappas
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] Random numbers at line-rate
Message-ID: <20140721195415.GA25740@hmsreliant.think-freely.org>

On Mon, Jul 21, 2014 at 09:24:36PM +0200, Chris Pappas wrote:
> Hi,
>
> I need to generate a random number per packet and I used the rte_fast_rand
> function to do so. When I run the code for one port-core I get almost
> line-rate performance. However, running simultaneously on multiple cores
> degrades performance significantly. (in all cases I use minimum-sized
> packets).
>
> Shouldn't the implementation scale for multicore and not degrade
> performance or am I missing anything? Also, is there another recommendation
> for generating randomness at line-rate? (the cpu does not support rdrand).
>
> Best regards,
> Chris
>

That's an odd random number generator. I think that, without locking, it's likely on a multicore system to produce identical values on multiple cores operating in parallel (since multiple cores can read rte_red_rand_seed at the same time).
That may well lead to multiple packets getting the same nonce, which might cause odd behavior.

If your CPU supports it, I'd suggest writing some inline assembly to use the rdrand instruction instead. I'm not sure how its performance compares to the current implementation, but IIRC the instruction is handled internally by the core, so it should scale with any number of CPUs.

Neil
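
For CPUs without rdrand, one way to sidestep the shared-seed problem described above is to give every thread its own generator state. The following is only a minimal sketch under stated assumptions, not DPDK code: the names local_seed, local_rand_seed and local_fast_rand are hypothetical, the multiplier and increment are the common 214013 / 2531011 LCG pair picked for illustration, and C11 _Thread_local is assumed to be available.

```c
#include <stdint.h>

/* Hypothetical per-thread generator state (not a DPDK symbol).
 * Each thread gets its own copy, so no two cores touch the same seed. */
static _Thread_local uint32_t local_seed = 1;

/* Seed the calling thread's generator, e.g. with a value derived
 * from its core id so that each core starts at a different point. */
static inline void local_rand_seed(uint32_t seed)
{
    local_seed = seed;
}

/* One linear congruential step on the thread-local state.
 * The low 10 bits are dropped because the low bits of an LCG
 * have short periods; the result therefore has 22 bits. */
static inline uint32_t local_fast_rand(void)
{
    local_seed = 214013u * local_seed + 2531011u;
    return local_seed >> 10;
}
```

Each worker would call local_rand_seed() once at startup with a distinct value and then local_fast_rand() per packet. Note that an LCG like this is not cryptographically secure, so it may not be adequate if the nonce has to be unpredictable.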