From: Shihabur Rahman Chowdhury
Date: Thu, 13 Apr 2017 10:21:21 -0400
To: Shahaf Shuler
Cc: Dave Wallace, Olga Shern, Adrien Mazarguil, "Wiles, Keith", users@dpdk.org
Subject: Re: [dpdk-users] Low Rx throughput when using Mellanox ConnectX-3 card with DPDK

On Thu, Apr 13, 2017 at 1:19 AM, Shahaf Shuler wrote:

> Why did you choose such configuration?
> Such configuration may cause high overhead in snoop cycles, as the first
> cache line of the packet will first be on the Rx lcore and then it will
> need to be invalidated when the Tx lcore swaps the MACs.
>
> Since you are using 2 cores anyway, have you tried that each core will do
> both Rx and Tx (run to completion)?

To give a bit more context: we are developing a set of packet processors
that can be deployed as separate processes and scaled out independently.
A batch of packets goes through a sequence of such processes until, at some
point, the packets are written to a Tx queue or dropped because of a
processing decision. These packet processors run as secondary DPDK
processes, and Rx takes place in a primary process (since the Mellanox PMD
does not allow Rx from a secondary process).
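Roughly, the data path in question looks like the sketch below (simplified
and illustrative, not our exact code: the ring name, ring size and burst
size are made up, and the exact enqueue/dequeue signatures differ a bit
between DPDK releases):

#include <stdint.h>
#include <string.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define BURST 32

/* Primary process: owns the port, does all of the Rx and hands the
 * packets to the worker through a shared ring. */
static void
primary_rx_loop(uint16_t port)
{
        struct rte_ring *to_worker = rte_ring_create("rx_to_worker", 4096,
                        rte_socket_id(), RING_F_SP_ENQ | RING_F_SC_DEQ);
        struct rte_mbuf *bufs[BURST];

        for (;;) {
                uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST);
                if (n == 0)
                        continue;
                unsigned int q = rte_ring_enqueue_burst(to_worker,
                                (void **)bufs, n, NULL);
                while (q < n)                   /* ring full: drop the rest */
                        rte_pktmbuf_free(bufs[q++]);
        }
}

/* Secondary process: finds the ring by name, swaps the MACs and Tx-es. */
static void
secondary_worker_loop(uint16_t port)
{
        struct rte_ring *from_primary = rte_ring_lookup("rx_to_worker");
        struct rte_mbuf *bufs[BURST];

        for (;;) {
                unsigned int n = rte_ring_dequeue_burst(from_primary,
                                (void **)bufs, BURST, NULL);
                if (n == 0)
                        continue;
                for (unsigned int i = 0; i < n; i++) {
                        /* swap src/dst MAC in the packet's first cache line */
                        uint8_t *eth = rte_pktmbuf_mtod(bufs[i], uint8_t *);
                        uint8_t tmp[6];

                        memcpy(tmp, eth, 6);
                        memcpy(eth, eth + 6, 6);
                        memcpy(eth + 6, tmp, 6);
                }
                uint16_t sent = rte_eth_tx_burst(port, 0, bufs, n);
                while (sent < n)                /* Tx queue full: drop */
                        rte_pktmbuf_free(bufs[sent++]);
        }
}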
In this configuration, one primary process does the Rx and hands the packets
over to a secondary process through a shared ring; that secondary process
swaps the MACs and writes the packets to the Tx queue. We do expect some
performance drop from the cache invalidation across lcores (and we also
cannot put different secondary processes on the same lcore, to avoid mempool
cache corruption), but still, 7.3 Mpps is ~30+% overhead.

As you suggested, we tried run-to-completion processing in the primary
process (i.e., Rx and Tx are now on the same lcore). We also configured
pktgen to handle Rx and Tx on the same lcore. With that we now get
~9.9-10 Mpps with 64B packets, and with our multi-process setup it drops to
~8.4 Mpps. So it seems pktgen was not configured properly before.

That is a bit counter-intuitive, though: on pktgen's side, doing Rx and Tx
on different lcores should not cause any cache invalidation, since the sets
of Rx and Tx packets are disjoint. So, for pktgen, using different lcores
should in theory be better than handling both Rx and Tx on the same lcore.
Am I missing something here?

Thanks
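P.S. For completeness, the run-to-completion variant we tried in the primary
process is essentially the loop below (again a simplified, illustrative
sketch with the same headers and burst size as the earlier one, not our
exact code):

/* Run-to-completion: the same lcore does Rx, MAC swap and Tx, so the
 * packet's first cache line never has to move between cores. */
static void
run_to_completion_loop(uint16_t port)
{
        struct rte_mbuf *bufs[BURST];

        for (;;) {
                uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST);
                if (n == 0)
                        continue;
                for (uint16_t i = 0; i < n; i++) {
                        uint8_t *eth = rte_pktmbuf_mtod(bufs[i], uint8_t *);
                        uint8_t tmp[6];

                        memcpy(tmp, eth, 6);
                        memcpy(eth, eth + 6, 6);
                        memcpy(eth + 6, tmp, 6);
                }
                uint16_t sent = rte_eth_tx_burst(port, 0, bufs, n);
                while (sent < n)                /* Tx queue full: drop */
                        rte_pktmbuf_free(bufs[sent++]);
        }
}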