From: Jun Han
To: "dev@dpdk.org"
Date: Sun, 23 Mar 2014 17:29:36 +0100
Subject: [dpdk-dev] l3fwd LPM lookup - issue when measuring latency

Hi all,

I've been trying to measure the possible performance penalty of performing an LPM table lookup in the l3fwd code (as opposed to simple forwarding without a lookup, i.e., forwarding back to the ingress port). I ran two sets of experiments: (1) generate a fixed dst IP address from DPDK pktgen; (2) generate random dst IP addresses from DPDK pktgen.
My hypothesis is that in case (1), since many packets arrive with the same dst IP, DPDK l3fwd should only need to fetch the LPM table entry from the cache. Case (2), however, should generate more cache misses and therefore require fetches from memory, which should increase the latency. (My current machine has 20 MB of L3 cache.)

However, when I measure the average number of cycles it takes to perform a lookup indexed by the received dst IP address, the two cases yield nearly identical results of around 34 cycles. I am using rdtsc to measure the cycles in the rte_lpm_lookup() function in rte_lpm.h (under lib/librte_lpm). I am not sure whether this is due to a problem with rte_rdtsc or whether I am misunderstanding something.

    tsc1 = rte_rdtsc();
    tbl_entry = *(const uint16_t *)&lpm->tbl24[tbl24_index];
    tscdif = rte_rdtsc() - tsc1;
    aggreg_dif += tscdif;

I would appreciate it if someone could provide their opinion on this phenomenon.

Thanks in advance!
Jun
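
P.S. For anyone who wants to try the fixed vs. random comparison outside DPDK, here is a rough standalone sketch. It is not the l3fwd or rte_lpm code: the table is a plain 2^24-entry array standing in for tbl24, the index is assumed to be the top 24 bits of the dst IP, and all names (tbl24_alloc, fill_fixed, fill_random, avg_lookup_ns) are made up for illustration. It times a whole batch of lookups with clock_gettime() instead of wrapping each single load in rdtsc, on the assumption that this keeps the timer's own cost from dominating such a short measured region.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the LPM tbl24: 2^24 16-bit entries (32 MB). */
#define TBL24_ENTRIES (1u << 24)

/* Allocate a zeroed stand-in table; returns NULL on failure. */
static uint16_t *tbl24_alloc(void)
{
    return calloc(TBL24_ENTRIES, sizeof(uint16_t));
}

/* Assumption: the table is indexed by the top 24 bits of the IPv4 dst. */
static inline uint32_t tbl24_index(uint32_t dst_ip)
{
    return dst_ip >> 8;
}

/* Small xorshift32 generator; the state must be nonzero. */
static inline uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Case (1): every packet carries the same dst IP. */
static void fill_fixed(uint32_t *dst_ips, size_t n, uint32_t ip)
{
    for (size_t i = 0; i < n; i++)
        dst_ips[i] = ip;
}

/* Case (2): every packet carries a pseudo-random dst IP. */
static void fill_random(uint32_t *dst_ips, size_t n, uint32_t seed)
{
    assert(seed != 0);
    for (size_t i = 0; i < n; i++)
        dst_ips[i] = xorshift32(&seed);
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Time n table loads as one batch and return the average ns per load.
 * The table pointer is volatile so the compiler performs every load;
 * *sink receives the sum of the loaded entries.
 */
static double avg_lookup_ns(const volatile uint16_t *tbl24,
                            const uint32_t *dst_ips, size_t n,
                            uint64_t *sink)
{
    assert(n > 0);
    uint64_t acc = 0;
    uint64_t t0 = now_ns();
    for (size_t i = 0; i < n; i++)
        acc += tbl24[tbl24_index(dst_ips[i])];
    uint64_t t1 = now_ns();
    *sink = acc;
    return (double)(t1 - t0) / (double)n;
}
```

To use it, allocate the table once with tbl24_alloc(), fill one address array with fill_fixed() and another with fill_random(), and call avg_lookup_ns() on each with a large n (say a few million). The two averages can then be compared directly. Note that the address array itself is read sequentially inside the timed loop, so it contributes some cache traffic in both cases.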