From: Dor Green
To: dev@dpdk.org
Date: Mon, 6 Apr 2015 15:18:21 +0300
Subject: [dpdk-dev] rte_ring's dequeue appears to be slow

I have an app which captures packets on a single core and then passes them to multiple workers on different lcores, using the ring queues. While I manage to capture packets at 10Gbps, there is substantial packet loss when I send them to the processing lcores. At first I figured the cause was the processing I do on the packets, so I optimized that; it helped a little but did not eliminate the problem.

I used Intel VTune Amplifier to profile the program, and in every profiling run the majority of the program's time (about 70%) is spent in "__rte_ring_sc_do_dequeue". Can anyone tell me how to optimize this, or whether I'm using the queues incorrectly, or perhaps even doing the profiling wrong? I do find it odd that this dequeuing is so slow.

My program architecture is as follows (constants replaced with actual values):

A queue is created for each processing lcore:

    rte_ring_create(qname, swsize, NUMA_SOCKET, 1024*1024, RING_F_SP_ENQ | RING_F_SC_DEQ);

The capture core enqueues packets one by one to each of the queues (the packet burst size is 256):

    rte_ring_sp_enqueue(lc[queue_index].queue, (void *const)pkts[i]);

They are then dequeued in bulk on the processing lcores:

    rte_ring_sc_dequeue_bulk(lc->queue, (void**) &mbufs, 128);

I'm using 16 1GB hugepages and running the new 2.0 release.

If any further information about the program is required, let me know.

Thank you.
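
For reference, the enqueue-one-by-one / dequeue-in-bulk pattern described above can be modelled with the toy sketch below. It is plain single-threaded C rather than DPDK code: every toy_* name is hypothetical, the ring size is arbitrary, and there are no memory barriers, so it only illustrates the calling semantics, not performance. toy_ring_dequeue_bulk is assumed here to mirror the all-or-nothing result documented for rte_ring_sc_dequeue_bulk in this release; toy_ring_dequeue_burst is a best-effort variant included only for comparison.

```c
/* Toy model only: plain C, not DPDK. All toy_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 1024u                 /* must be a power of two */
#define TOY_RING_MASK (TOY_RING_SIZE - 1u)

struct toy_ring {
    unsigned head;                          /* next slot the producer writes */
    unsigned tail;                          /* next slot the consumer reads */
    void *slots[TOY_RING_SIZE];
};

static void toy_ring_init(struct toy_ring *r)
{
    assert((TOY_RING_SIZE & TOY_RING_MASK) == 0);
    r->head = 0;
    r->tail = 0;
}

/* Number of objects currently queued (unsigned wrap-around is intended). */
static unsigned toy_ring_count(const struct toy_ring *r)
{
    return r->head - r->tail;
}

/* Enqueue a single object: 0 on success, -1 if the ring is full. */
static int toy_ring_enqueue(struct toy_ring *r, void *obj)
{
    if (toy_ring_count(r) == TOY_RING_SIZE)
        return -1;
    r->slots[r->head & TOY_RING_MASK] = obj;
    r->head++;
    return 0;
}

/* All-or-nothing dequeue: 0 and exactly n objects copied, or -1 and
 * nothing copied when fewer than n objects are queued. */
static int toy_ring_dequeue_bulk(struct toy_ring *r, void **objs, unsigned n)
{
    unsigned i;

    if (toy_ring_count(r) < n)
        return -1;
    for (i = 0; i < n; i++)
        objs[i] = r->slots[(r->tail + i) & TOY_RING_MASK];
    r->tail += n;
    return 0;
}

/* Best-effort dequeue: copies up to n objects and returns how many. */
static unsigned toy_ring_dequeue_burst(struct toy_ring *r, void **objs, unsigned n)
{
    unsigned avail = toy_ring_count(r);
    unsigned i;

    if (n > avail)
        n = avail;
    for (i = 0; i < n; i++)
        objs[i] = r->slots[(r->tail + i) & TOY_RING_MASK];
    r->tail += n;
    return n;
}
```

In this model, with 100 objects queued, toy_ring_dequeue_bulk(&r, out, 128) returns -1 and leaves the ring untouched, whereas toy_ring_dequeue_burst(&r, out, 128) would return 100. A worker that polls with the bulk variant therefore keeps re-entering the dequeue call until 128 objects have accumulated.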