Date: Tue, 25 Jun 2024 03:22:24 +0300
From: Dmitry Kozlyuk
To: Mário Kuka
Cc: dev@dpdk.org, orika@nvidia.com, bingz@nvidia.com, viktorin@cesnet.cz
Subject: Re: Hairpin Queues Throughput ConnectX-6

Hi Mário,

2024-06-19 08:45 (UTC+0200), Mário Kuka:
> Hello,
>
> I want to use hairpin queues to forward high priority traffic (such as
> LACP).
> My goal is to ensure that this traffic is not dropped in case the
> software pipeline is overwhelmed.
> But during testing with dpdk-testpmd I can't achieve full throughput for
> hairpin queues.

For maintainers: I'd like to express interest in this use case too.

>
> The best result I have been able to achieve for 64B packets is 83 Gbps
> in this configuration:
> $ sudo dpdk-testpmd -l 0-1 -n 4 -a 0000:17:00.0,hp_buf_log_sz=19 --
> --rxq=1 --txq=1 --rxd=4096 --txd=4096 --hairpinq=2
> testpmd> flow create 0 ingress pattern eth src is 00:10:94:00:00:03 /
> end actions rss queues 1 2 end / end

Try enabling "Explicit Tx rule" mode if possible.
I was able to achieve 137 Mpps @ 64B with the following command:

	dpdk-testpmd -a 21:00.0 -a c1:00.0 --in-memory -- \
		-i --rxq=1 --txq=1 --hairpinq=8 --hairpin-mode=0x10

You might get even better speed, because my flow rules were more
complicated (RTE Flow based "router on-a-stick"):

flow create 0 ingress group 1 pattern eth / vlan vid is 721 / end actions of_set_vlan_vid vlan_vid 722 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 1 ingress group 1 pattern eth / vlan vid is 721 / end actions of_set_vlan_vid vlan_vid 722 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 0 ingress group 1 pattern eth / vlan vid is 722 / end actions of_set_vlan_vid vlan_vid 721 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 1 ingress group 1 pattern eth / vlan vid is 722 / end actions of_set_vlan_vid vlan_vid 721 / rss queues 1 2 3 4 5 6 7 8 end / end
flow create 0 ingress group 0 pattern end actions jump group 1 / end
flow create 1 ingress group 0 pattern end actions jump group 1 / end
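
If you later move from dpdk-testpmd to your own application, the same
setup maps onto the ethdev hairpin API roughly as below. This is only a
rough, untested sketch (the helper name and queue numbering are mine;
field names as in recent DPDK releases, see rte_ethdev.h):
"--hairpinq=N" becomes N calls to rte_eth_rx_hairpin_queue_setup() /
rte_eth_tx_hairpin_queue_setup(), and "--hairpin-mode=0x10" becomes
tx_explicit=1 in struct rte_eth_hairpin_conf.

#include <errno.h>
#include <stdint.h>
#include <rte_ethdev.h>

/*
 * Sketch: set up n_hairpin single-port hairpin queue pairs in
 * "Explicit Tx rule" mode (what testpmd --hairpin-mode=0x10 requests).
 * Assumes the port was configured with nb_rxq + n_hairpin Rx queues and
 * nb_txq + n_hairpin Tx queues, and that the regular queues are set up
 * elsewhere.
 */
static int
setup_hairpin_queues(uint16_t port_id, uint16_t nb_rxq, uint16_t nb_txq,
		     uint16_t n_hairpin, uint16_t nb_desc)
{
	struct rte_eth_hairpin_cap cap;
	struct rte_eth_hairpin_conf conf = {
		.peer_count = 1,
		.tx_explicit = 1, /* "Explicit Tx rule" mode (testpmd bit 4, 0x10) */
	};
	uint16_t i;
	int ret;

	ret = rte_eth_dev_hairpin_capability_get(port_id, &cap);
	if (ret != 0)
		return ret;
	if (cap.max_nb_queues < n_hairpin || cap.max_nb_desc < nb_desc)
		return -ENOTSUP;

	for (i = 0; i < n_hairpin; i++) {
		/* Rx hairpin queue nb_rxq + i is peered with Tx hairpin queue nb_txq + i. */
		conf.peers[0].port = port_id;
		conf.peers[0].queue = nb_txq + i;
		ret = rte_eth_rx_hairpin_queue_setup(port_id, nb_rxq + i,
						     nb_desc, &conf);
		if (ret != 0)
			return ret;
		conf.peers[0].queue = nb_rxq + i;
		ret = rte_eth_tx_hairpin_queue_setup(port_id, nb_txq + i,
						     nb_desc, &conf);
		if (ret != 0)
			return ret;
	}
	return 0;
}

In explicit Tx rule mode, as far as I understand, the application is
expected to install the egress-side flow rules itself.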
>
> For packets in the range 68-80B I measured even lower throughput.
> Full throughput I measured only from packets larger than 112B
>
> For only one queue, I didn't get more than 55Gbps:
> $ sudo dpdk-testpmd -l 0-1 -n 4 -a 0000:17:00.0,hp_buf_log_sz=19 --
> --rxq=1 --txq=1 --rxd=4096 --txd=4096 --hairpinq=1 -i
> testpmd> flow create 0 ingress pattern eth src is 00:10:94:00:00:03 /
> end actions queue index 1 / end
>
> I tried to use locked device memory for TX and RX queues, but it seems
> that this is not supported:
> "--hairpin-mode=0x011000" (bit 16 - hairpin TX queues will use locked
> device memory, bit 12 - hairpin RX queues will use locked device memory)

RxQ pinned in device memory requires firmware configuration [1]:

	mlxconfig -y -d $pci_addr set MEMIC_SIZE_LIMIT=0 HAIRPIN_DATA_BUFFER_LOCK=1
	mlxfwreset -y -d $pci_addr reset

[1]: https://doc.dpdk.org/guides/platform/mlx5.html?highlight=hairpin_data_buffer_lock

However, pinned RxQ didn't improve anything for me.
TxQ pinned in device memory is not supported by net/mlx5.
TxQ pinned to DPDK memory made performance awful (predictably).

> I was expecting that achieving full throughput with hairpin queues would
> not be a problem.
> Is my expectation too optimistic?
>
> What other parameters besides 'hp_buf_log_sz' can I use to achieve full
> throughput?

In my experiments, default "hp_buf_log_sz" of 16 is optimal.
The most influential parameter appears to be the number of hairpin queues.

> I tried combining the following parameters: mprq_en=, rxqs_min_mprq=,
> mprq_log_stride_num=, txq_inline_mpw=, rxq_pkt_pad_en=,
> but with no positive impact on throughput.
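
As a side note on the locked-memory bits above: in an application you can
query which hairpin memory placements the PMD actually advertises before
requesting them. A small sketch, assuming the struct rte_eth_hairpin_cap
layout from DPDK 22.11+ (the rx_cap/tx_cap flags):

#include <stdio.h>
#include <stdint.h>
#include <rte_ethdev.h>

/* Print which hairpin memory placements a port advertises. */
static void
print_hairpin_caps(uint16_t port_id)
{
	struct rte_eth_hairpin_cap cap;

	if (rte_eth_dev_hairpin_capability_get(port_id, &cap) != 0) {
		printf("port %u: hairpin not supported\n", (unsigned int)port_id);
		return;
	}
	printf("port %u: max hairpin queues %u, max descriptors %u\n",
	       (unsigned int)port_id, (unsigned int)cap.max_nb_queues,
	       (unsigned int)cap.max_nb_desc);
	printf("  Rx locked device memory: %s, Rx RTE memory: %s\n",
	       cap.rx_cap.locked_device_memory ? "yes" : "no",
	       cap.rx_cap.rte_memory ? "yes" : "no");
	printf("  Tx locked device memory: %s, Tx RTE memory: %s\n",
	       cap.tx_cap.locked_device_memory ? "yes" : "no",
	       cap.tx_cap.rte_memory ? "yes" : "no");
}

The matching request flags at queue setup time are use_locked_device_memory
and use_rte_memory (plus force_memory to fail instead of falling back) in
struct rte_eth_hairpin_conf; bits 12 and 16 of testpmd's --hairpin-mode
correspond to the Rx and Tx locked-memory flags respectively.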