From: Mattias Rönnblom
To: Venky Venkatesh, dev@dpdk.org
Date: Wed, 28 Nov 2018 17:55:41 +0100
Subject: Re: [dpdk-dev] Application used for DSW event_dev performance testing

On 2018-11-27 23:33, Venky Venkatesh wrote:
> As you can see, the DSW overhead dominates the scene and very little real work is getting done. Is there some configuration or tuning to be done to get the sort of performance you are seeing with multiple cores?

I can't explain the behavior you are seeing based on the information you have supplied.

Attached is a small DSW throughput test program that I thought might help you find the issue. It works much like the pipeline simulator I used when developing the scheduler, but it's a lot simpler. Remember to supply "--vdev=event_dsw0".

I ran it on my 12-core Skylake desktop (@ 2.9 GHz, turbo disabled). With zero work and one stage, I get ~640 Mevent/s.

For the first few stages you add, you'll see a drop in performance. For example, with 3 stages, you are at ~310 Mevent/s.

If you increase DSW_MAX_PORT_OUT_BUFFER and DSW_MAX_PORT_OPS_PER_BG_TASK, you will see improvements in efficiency on high-core-count machines. On my system, the above goes to 675 Mevent/s for a 1-stage pipeline and 460 Mevent/s for a 3-stage pipeline, if I apply the following changes to dsw_evdev.h:

-#define DSW_MAX_PORT_OUT_BUFFER (32)
+#define DSW_MAX_PORT_OUT_BUFFER (64)

-#define DSW_MAX_PORT_OPS_PER_BG_TASK (128)
+#define DSW_MAX_PORT_OPS_PER_BG_TASK (512)

With 500 clock cycles of dummy work, the per-event overhead is ~16 TSC clock cycles per stage and event (i.e. per scheduled event; one enqueue plus one dequeue), if my quick-and-dirty benchmark program does the math correctly. This figure also includes the overhead of the benchmark program itself; the overhead with a real application will be higher.
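
Since the list's MIME filter strips attachments, here is a rough sketch of what the per-stage worker loop of such a throughput test could look like. This is not the attached program: NUM_STAGES, DUMMY_WORK_CYCLES, BURST_SIZE and worker_loop() are illustrative assumptions, and the usual EAL/eventdev setup (rte_eal_init(), rte_event_dev_configure(), queue and port setup, lcore launch) is omitted.

/*
 * Illustrative sketch only, not the attached benchmark. Events travel
 * through NUM_STAGES queues; each worker dequeues a burst, burns some
 * dummy cycles per event, forwards the event to the next queue, and
 * releases it after the last stage.
 */
#include <rte_cycles.h>
#include <rte_eventdev.h>
#include <rte_pause.h>

#define NUM_STAGES 3            /* hypothetical pipeline depth */
#define DUMMY_WORK_CYCLES 500   /* per-event busy-wait, as in the mail */
#define BURST_SIZE 32

static void
dummy_work(uint64_t cycles)
{
	uint64_t deadline = rte_rdtsc() + cycles;

	while (rte_rdtsc() < deadline)
		rte_pause();
}

static void
worker_loop(uint8_t dev_id, uint8_t port_id)
{
	struct rte_event evs[BURST_SIZE];

	for (;;) {
		uint16_t nb = rte_event_dequeue_burst(dev_id, port_id, evs,
						      BURST_SIZE, 0);
		uint16_t i;

		for (i = 0; i < nb; i++) {
			dummy_work(DUMMY_WORK_CYCLES);

			if (evs[i].queue_id + 1 < NUM_STAGES) {
				/* forward to the next pipeline stage */
				evs[i].queue_id++;
				evs[i].op = RTE_EVENT_OP_FORWARD;
			} else {
				/* last stage: drop the event */
				evs[i].op = RTE_EVENT_OP_RELEASE;
			}
		}

		if (nb > 0) {
			uint16_t sent = 0;

			/* retry until the whole burst is enqueued */
			do {
				sent += rte_event_enqueue_burst(dev_id,
								port_id,
								evs + sent,
								nb - sent);
			} while (sent < nb);
		}
	}
}

A loop along these lines would run on every worker lcore, with the application launched with "--vdev=event_dsw0" so that the DSW PMD provides the event device.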