From: Shihabur Rahman Chowdhury
Date: Thu, 13 Apr 2017 10:21:21 -0400
To: Shahaf Shuler
Cc: Dave Wallace, Olga Shern, Adrien Mazarguil, "Wiles, Keith", users@dpdk.org
Subject: Re: [dpdk-users] Low Rx throughput when using Mellanox ConnectX-3 card with DPDK

On Thu, Apr 13, 2017 at 1:19 AM, Shahaf Shuler wrote:

> Why did you choose such configuration?
> Such configuration may cause high overhead in snoop cycles, as the first
> cache line of the packet will first be on the Rx lcore and then it will
> need to be invalidated when the Tx lcore swaps the MACs.
>
> Since you are using 2 cores anyway, have you tried that each core will do
> both Rx and Tx (run to completion)?

To give a bit more context: we are developing a set of packet processors
that can be deployed as separate processes and scaled out independently.
A batch of packets goes through a sequence of such processes until, at some
point, the packets are written to a Tx queue or dropped because of a
processing decision. These packet processors run as secondary DPDK
processes, and Rx takes place in a primary process (since the Mellanox PMD
does not allow Rx from a secondary process).
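Roughly, the data path in question looks like the sketch below (simplified
and illustrative, not our exact code: the ring name, ring size and burst
size are made up, and the exact enqueue/dequeue signatures differ a bit
between DPDK releases):

#include <stdint.h>
#include <string.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST 32

/* Primary process: owns the port, does all of the Rx and hands the
 * packets to the worker through a shared ring. */
static void
primary_rx_loop(uint16_t port)
{
        struct rte_ring *to_worker = rte_ring_create("rx_to_worker", 4096,
                        rte_socket_id(), RING_F_SP_ENQ | RING_F_SC_DEQ);
        struct rte_mbuf *bufs[BURST];

        for (;;) {
                uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST);
                if (n == 0)
                        continue;
                unsigned int q = rte_ring_enqueue_burst(to_worker,
                                (void **)bufs, n, NULL);
                while (q < n)                   /* ring full: drop the rest */
                        rte_pktmbuf_free(bufs[q++]);
        }
}

/* Secondary process: finds the ring by name, swaps the MACs and Tx-es. */
static void
secondary_worker_loop(uint16_t port)
{
        struct rte_ring *from_primary = rte_ring_lookup("rx_to_worker");
        struct rte_mbuf *bufs[BURST];

        for (;;) {
                unsigned int n = rte_ring_dequeue_burst(from_primary,
                                (void **)bufs, BURST, NULL);
                if (n == 0)
                        continue;
                for (unsigned int i = 0; i < n; i++) {
                        /* swap src/dst MAC in the packet's first cache line */
                        uint8_t *eth = rte_pktmbuf_mtod(bufs[i], uint8_t *);
                        uint8_t tmp[6];

                        memcpy(tmp, eth, 6);
                        memcpy(eth, eth + 6, 6);
                        memcpy(eth + 6, tmp, 6);
                }
                uint16_t sent = rte_eth_tx_burst(port, 0, bufs, n);
                while (sent < n)                /* Tx queue full: drop */
                        rte_pktmbuf_free(bufs[sent++]);
        }
}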
In this configuration, one primary process does the Rx and hands the packets
over to a secondary process through a shared ring; that secondary process
swaps the MACs and writes the packets to the Tx queue. We do expect some
performance drop from the cache invalidation across lcores (and we also
cannot put different secondary processes on the same lcore, to avoid mempool
cache corruption), but still, 7.3 Mpps is ~30+% overhead.

As you suggested, we tried run-to-completion processing in the primary
process (i.e., Rx and Tx are now on the same lcore). We also configured
pktgen to handle Rx and Tx on the same lcore. With that we now get
~9.9-10 Mpps with 64B packets, and with our multi-process setup it drops to
~8.4 Mpps. So it seems pktgen was not configured properly before.

That is a bit counter-intuitive, though: on pktgen's side, doing Rx and Tx
on different lcores should not cause any cache invalidation, since the sets
of Rx and Tx packets are disjoint. So, for pktgen, using different lcores
should in theory be better than handling both Rx and Tx on the same lcore.
Am I missing something here?

Thanks
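P.S. For completeness, the run-to-completion variant we tried in the primary
process is essentially the loop below (again a simplified, illustrative
sketch with the same headers and burst size as the earlier one, not our
exact code):

/* Run-to-completion: the same lcore does Rx, MAC swap and Tx, so the
 * packet's first cache line never has to move between cores. */
static void
run_to_completion_loop(uint16_t port)
{
        struct rte_mbuf *bufs[BURST];

        for (;;) {
                uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST);
                if (n == 0)
                        continue;
                for (uint16_t i = 0; i < n; i++) {
                        uint8_t *eth = rte_pktmbuf_mtod(bufs[i], uint8_t *);
                        uint8_t tmp[6];

                        memcpy(tmp, eth, 6);
                        memcpy(eth, eth + 6, 6);
                        memcpy(eth + 6, tmp, 6);
                }
                uint16_t sent = rte_eth_tx_burst(port, 0, bufs, n);
                while (sent < n)                /* Tx queue full: drop */
                        rte_pktmbuf_free(bufs[sent++]);
        }
}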