From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 11 Sep 2018 12:12:54 -0700
From: Stephen Hemminger <stephen@networkplumber.org>
To: Arvind Narayanan
Cc: keith.wiles@intel.com, users@dpdk.org
Message-ID: <20180911121254.6c1531de@xeon-e3>
References: <20180911110744.7ef55fc2@xeon-e3>
Subject: Re: [dpdk-users] How to use software prefetching for custom structures to increase throughput on the fast path
List-Id: DPDK usage discussions

On Tue, 11 Sep 2018 13:39:24 -0500
Arvind Narayanan wrote:

> Stephen, thanks!
>
> That is it! Not sure if there is any workaround.
>
> So, essentially, what I am doing is -- core 0 gets a burst of
> my_packet(s) from its pre-allocated mempool, and then (bulk) enqueues
> them into a rte_ring. Core 1 then (bulk) dequeues from this ring, and
> when it accesses the data pointed to by the ring's element (i.e.
> my_packet->tag1), this memory access latency issue is seen. I cannot
> advance the prefetch any earlier. Is there any clever workaround (or
> hack) to overcome this issue other than using the same core for all
> the functions? E.g., can I prefetch the packets in core 0 for core
> 1's cache (could be a dumb question!)?
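
A core cannot prefetch into another core's cache: on x86 the prefetch
instructions only fill the cache hierarchy of the core that executes
them. What core 1 can do is prefetch the whole dequeued burst before
touching any element, so the memory loads overlap instead of stalling
one at a time. A rough sketch of that shape (untested; struct my_packet
and rx_table here are stand-ins for the structures you described):

```
#include <stdint.h>
#include <rte_ring.h>
#include <rte_hash.h>
#include <rte_prefetch.h>

#define BURST_SIZE 32

/* Placeholder for the custom structure described in this thread;
 * substitute your real my_packet layout. */
struct my_packet {
    uint32_t tag1;
    /* ... */
};

static void
consumer_poll(struct rte_ring *ring, struct rte_hash *rx_table)
{
    struct my_packet *pkts[BURST_SIZE];
    void *val;
    unsigned int i, n;

    n = rte_ring_dequeue_burst(ring, (void **)pkts, BURST_SIZE, NULL);

    /* Issue all the prefetches before using any of the data, so the
     * loads are in flight while the first packets are processed. */
    for (i = 0; i < n; i++)
        rte_prefetch0(pkts[i]);

    for (i = 0; i < n; i++) {
        if (rte_hash_lookup_data(rx_table, &pkts[i]->tag1, &val) < 0) {
            /* lookup miss */
        }
    }
}
```

Whether this buys enough time depends on the burst size and on how much
work you do per packet; with very little work per packet, the prefetch
for the first elements still won't complete before they are used.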
>
> Thanks,
> Arvind
>
> On Tue, Sep 11, 2018 at 1:07 PM Stephen Hemminger <
> stephen@networkplumber.org> wrote:
>
> > On Tue, 11 Sep 2018 12:18:42 -0500
> > Arvind Narayanan wrote:
> >
> > > If I don't do any processing, I easily get 10G. It is only when
> > > I access the tag that the throughput drops. What confuses me is
> > > that if I use the following snippet, it works at line rate.
> > >
> > > ```
> > > int temp_key = 1; // declared outside of the for loop
> > >
> > > for (i = 0; i < pkt_count; i++) {
> > >     if (rte_hash_lookup_data(rx_table, &(temp_key),
> > >                              (void **)&val[i]) < 0) {
> > >     }
> > > }
> > > ```
> > >
> > > But as soon as I replace `temp_key` with `my_packet->tag1`, the
> > > throughput falls (which in a way confirms the issue is due to
> > > cache misses).
> >
> > Your packet data is not in cache.
> > Doing prefetch can help, but it is very timing sensitive. If the
> > prefetch is done before the data is available, it won't help. And
> > if the prefetch is done just before the data is used, there aren't
> > enough cycles to get it from memory into the cache.

In my experience, if you want performance, don't pass packets between
cores. It is slightly less bad if the core that does the passing never
touches the packet, and really bad if the handling core writes to it.
It gets worse as the cache distance between the cores grows (NUMA).
If you have to split work across cores, use two logical cores that are
hyper-thread siblings on the same physical core, since they share the
same caches.
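
For comparison, the run-to-completion shape looks roughly like this:
the lcore that produces the my_packet burst also does the tag lookup,
so the cache line is still hot when it is used. Sketch only;
get_packet_burst() is a hypothetical stand-in for whatever core 0 does
today before enqueuing to the ring:

```
#include <stdint.h>
#include <rte_hash.h>
#include <rte_eal.h>
#include <rte_lcore.h>

#define BURST_SIZE 32

/* Placeholder for the custom structure from this thread. */
struct my_packet {
    uint32_t tag1;
    /* ... */
};

/* Hypothetical stand-in for however core 0 currently fills a burst of
 * my_packet pointers before enqueuing them to the ring. */
extern unsigned int get_packet_burst(struct my_packet **pkts,
                                     unsigned int n);

static int
lcore_run_to_completion(void *arg)
{
    struct rte_hash *rx_table = arg;   /* your existing hash table */
    struct my_packet *pkts[BURST_SIZE];
    void *val;
    unsigned int i, n;

    for (;;) {
        n = get_packet_burst(pkts, BURST_SIZE);
        for (i = 0; i < n; i++) {
            /* tag1 was just written by this same core, so the line
             * is still in its local cache: no cross-core miss. */
            if (rte_hash_lookup_data(rx_table, &pkts[i]->tag1,
                                     &val) < 0) {
                /* lookup miss */
            }
        }
    }
    return 0;
}
```

Launched with rte_eal_remote_launch(lcore_run_to_completion, rx_table,
lcore_id), this keeps allocation, lookup, and any further processing on
one core, which is the layout I would try first.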