Date: Thu, 20 Jun 2024 16:05:34 +0300
From: Dmitry Kozlyuk
To: Dariusz Sosnowski
Cc: "users@dpdk.org"
Subject: Re: [net/mlx5] Performance drop with HWS compared to SWS
Message-ID: <20240620160534.218f544c@sovereign>
References: <20240613120145.057d4963@sovereign> <20240613231448.63f1dbbd@sovereign>
List-Id: DPDK usage discussions

2024-06-19 19:15 (UTC+0000), Dariusz Sosnowski:
[snip]
> > Case "jump_rss", SWS
> > ====================
> > Jump to group 1, then RSS.
> > Result: 37 Mpps (?!)
> > This "37 Mpps" seems to be caused by a PCIe bottleneck, which MPRQ is
> > supposed to overcome.
> > Is MPRQ limited only to default RSS in SWS mode?
> >
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >     -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >     -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> >
> > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > flow create 0 ingress group 1 pattern end actions rss queues 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 end / end
> > #
> > start
> >
> > mlnx_perf -i enp33s0f0np0 -t 1:
> >
> > rx_vport_unicast_packets: 38,155,359
> > rx_vport_unicast_bytes: 2,441,942,976 Bps = 19,535.54 Mbps
> > tx_packets_phy: 7,586
> > rx_packets_phy: 151,531,694
> > tx_bytes_phy: 485,568 Bps = 3.88 Mbps
> > rx_bytes_phy: 9,698,029,248 Bps = 77,584.23 Mbps
> > tx_mac_control_phy: 7,587
> > tx_pause_ctrl_phy: 7,587
> > rx_discards_phy: 113,376,265
> > rx_64_bytes_phy: 151,531,748 Bps = 1,212.25 Mbps
> > rx_buffer_passed_thres_phy: 203
> > rx_prio0_bytes: 9,698,066,560 Bps = 77,584.53 Mbps
> > rx_prio0_packets: 38,155,328
> > rx_prio0_discards: 113,376,963
> > tx_global_pause: 7,587
> > tx_global_pause_duration: 1,018,266
> >
> > Attached: "neohost-cx6dx-jump_rss-sws.txt".
>
> How are you generating the traffic? Are both IP addresses and TCP ports changing?
>
> The "jump_rss" case degradation seems to be caused by the RSS configuration.
> It appears that packets are not distributed across all queues.
> With these flow commands in SWS, all packets should go to queue 0 only.
> Could you please check if that's the case on your side?
>
> It can be alleviated by specifying RSS hash types on the RSS action:
>
> flow create 0 ingress group 0 pattern end actions jump group 1 / end
> flow create 0 ingress group 1 pattern end actions rss queues end types ip tcp end / end
>
> Could you please try that on your side?
>
> With the HWS flow engine, if the RSS action does not have hash types specified,
> the implementation defaults to hashing on IP addresses.
> If IP addresses are variable in your test traffic, that would explain the difference.

You are right, SWS performance is 148 Mpps with "types ipv4-tcp end"!
I was missing the difference between the default RSS "types" for SWS and HWS.

For reference, the generated traffic fields are:

* Ethernet: fixed unicast MACs
* VLAN: fixed VLAN IDs, zero TCI
* Source IPv4: random from 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
* Destination IPv4: random from a /16 subnet
* Source TCP port: random
* Destination TCP port: random
* TCP flags: SYN
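For anyone reading this from the archives: the "types" fix above corresponds
roughly to the rte_flow call below in application code. This is a minimal,
untested sketch; the helper name create_rss_rule and the 32-queue setup are
only illustrative, and the flag names assume DPDK 22.11+.

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_flow.h>

/* Match-all rule in group 1 that spreads traffic over nb_queues Rx queues,
 * hashing explicitly on IPv4 addresses and TCP ports, i.e. roughly
 * "flow create <port> ingress group 1 pattern end
 *    actions rss queues 0 .. N end types ip tcp end / end". */
static struct rte_flow *
create_rss_rule(uint16_t port_id, uint16_t nb_queues)
{
	static uint16_t queues[32];
	struct rte_flow_error error;

	if (nb_queues > 32)
		nb_queues = 32;
	for (uint16_t i = 0; i < nb_queues; i++)
		queues[i] = i;

	struct rte_flow_attr attr = { .ingress = 1, .group = 1 };
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_action_rss rss = {
		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
		/* Explicit hash types; SWS and HWS choose different defaults
		 * when none are given, which is what hid the queue imbalance. */
		.types = RTE_ETH_RSS_IPV4 | RTE_ETH_RSS_NONFRAG_IPV4_TCP,
		.queue_num = nb_queues,
		.queue = queues,
	};
	struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};

	return rte_flow_create(port_id, &attr, pattern, actions, &error);
}

As far as I can tell, RTE_ETH_RSS_NONFRAG_IPV4_TCP is the flag behind
testpmd's "ipv4-tcp" token.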
[snip]

> > Case "jump_miss", SWS
> > =====================
> > Result: 148 Mpps
> >
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >     -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >     -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> >
> > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > start
> >
> > mlnx_perf -i enp33s0f0np0
> >
> > rx_vport_unicast_packets: 151,526,489
> > rx_vport_unicast_bytes: 9,697,695,296 Bps = 77,581.56 Mbps
> > rx_packets_phy: 151,526,193
> > rx_bytes_phy: 9,697,676,672 Bps = 77,581.41 Mbps
> > rx_64_bytes_phy: 151,525,423 Bps = 1,212.20 Mbps
> > rx_prio0_bytes: 9,697,488,256 Bps = 77,579.90 Mbps
> > rx_prio0_packets: 151,523,240
> >
> > Attached: "neohost-cx6dx-jump_miss-sws.txt".
> >
> >
> > Case "jump_miss", HWS
> > =====================
> > Result: 107 Mpps
> > Neohost shows RX Packet Rate = 148 Mpps, but RX Steering Packets = 107 Mpps.
> >
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >     -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >     -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> >
> > port stop 0
> > flow configure 0 queues_number 1 queues_size 128 counters_number 16
> > port start 0
> > flow pattern_template 0 create pattern_template_id 1 ingress template end
> > flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> > flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> > flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> > flow pull 0 queue 0
> >
> > mlnx_perf -i enp33s0f0np0
> >
> > rx_steer_missed_packets: 109,463,466
> > rx_vport_unicast_packets: 109,463,450
> > rx_vport_unicast_bytes: 7,005,660,800 Bps = 56,045.28 Mbps
> > rx_packets_phy: 151,518,062
> > rx_bytes_phy: 9,697,155,840 Bps = 77,577.24 Mbps
> > rx_64_bytes_phy: 151,516,201 Bps = 1,212.12 Mbps
> > rx_prio0_bytes: 9,697,137,280 Bps = 77,577.9 Mbps
> > rx_prio0_packets: 151,517,782
> > rx_prio0_buf_discard: 42,055,156
> >
> > Attached: "neohost-cx6dx-jump_miss-hws.txt".
>
> As you can see, HWS provides the "rx_steer_missed_packets" counter, which is not available with SWS.
> It counts the number of packets which did not hit any rule and, in the end, had to be dropped.
> To enable that, additional HW flows are required to handle packets which did not hit any rule.
> This has a side effect: those HW flows cause enough backpressure that,
> at a very high packet rate, Rx buffers overflow on the CX6 Dx.
>
> After some internal discussions, I learned that this is kind of expected,
> because such a high number of missed packets is already an indication of a problem:
> NIC resources are wasted on packets for which there is no specified destination.

Fair enough. In production there will be no misses and thus no issue.
This case exists mostly to exhaust the search space.
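As a side note, the testpmd template-API sequence above maps roughly onto the
following application code. This is an untested sketch with error handling
omitted; struct field names may differ slightly between DPDK versions, and the
port must be stopped before rte_flow_configure(), as in the testpmd session.

#include <stdint.h>
#include <rte_flow.h>

/* Rough equivalent of the group-0 "match all, jump to group 1" table
 * created above with the async (template) flow API. */
static void
setup_jump_table(uint16_t port_id)
{
	struct rte_flow_error error;

	/* flow configure 0 queues_number 1 queues_size 128 counters_number 16 */
	const struct rte_flow_port_attr port_attr = { .nb_counters = 16 };
	const struct rte_flow_queue_attr queue_attr = { .size = 128 };
	const struct rte_flow_queue_attr *queue_attrs[] = { &queue_attr };
	rte_flow_configure(port_id, &port_attr, 1, queue_attrs, &error);

	/* flow pattern_template 0 create ... ingress template end */
	const struct rte_flow_pattern_template_attr pt_attr = { .ingress = 1 };
	const struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};
	struct rte_flow_pattern_template *pt =
		rte_flow_pattern_template_create(port_id, &pt_attr, pattern, &error);

	/* flow actions_template 0 create ingress ...
	 * template jump group 1 / end mask jump group 0xFFFFFFFF / end */
	const struct rte_flow_actions_template_attr at_attr = { .ingress = 1 };
	const struct rte_flow_action_jump jump = { .group = 1 };
	const struct rte_flow_action_jump jump_mask = { .group = UINT32_MAX };
	const struct rte_flow_action actions[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	const struct rte_flow_action masks[] = {
		{ .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump_mask },
		{ .type = RTE_FLOW_ACTION_TYPE_END },
	};
	struct rte_flow_actions_template *at =
		rte_flow_actions_template_create(port_id, &at_attr, actions, masks, &error);

	/* flow template_table 0 create ingress group 0 ... rules_number 1 */
	const struct rte_flow_template_table_attr tbl_attr = {
		.flow_attr = { .group = 0, .ingress = 1 },
		.nb_flows = 1,
	};
	struct rte_flow_template_table *tbl =
		rte_flow_template_table_create(port_id, &tbl_attr, &pt, 1, &at, 1, &error);

	/* flow queue 0 create 0 ... postpone false ..., then flow pull 0 queue 0 */
	const struct rte_flow_op_attr op_attr = { .postpone = 0 };
	rte_flow_async_create(port_id, 0, &op_attr, tbl,
			      pattern, 0, actions, 0, NULL, &error);
	struct rte_flow_op_result result;
	while (rte_flow_pull(port_id, 0, &result, 1, &error) == 0)
		; /* wait until the enqueued rule is confirmed */
}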
> > Case "jump_drop", SWS
> > =====================
> > Result: 148 Mpps
> > Match all in group 0, jump to group 1; match all in group 1, drop.
> >
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >     -a 21:00.0,dv_flow_en=1,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >     -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> >
> > flow create 0 ingress group 0 pattern end actions jump group 1 / end
> > flow create 0 ingress group 1 pattern end actions drop / end
> >
> > mlnx_perf -i enp33s0f0np0
> >
> > rx_vport_unicast_packets: 151,705,269
> > rx_vport_unicast_bytes: 9,709,137,216 Bps = 77,673.9 Mbps
> > rx_packets_phy: 151,701,498
> > rx_bytes_phy: 9,708,896,128 Bps = 77,671.16 Mbps
> > rx_64_bytes_phy: 151,693,532 Bps = 1,213.54 Mbps
> > rx_prio0_bytes: 9,707,005,888 Bps = 77,656.4 Mbps
> > rx_prio0_packets: 151,671,959
> >
> > Attached: "neohost-cx6dx-jump_drop-sws.txt".
> >
> >
> > Case "jump_drop", HWS
> > =====================
> > Result: 107 Mpps
> > Match all in group 0, jump to group 1; match all in group 1, drop.
> > I've also run this test with a counter attached to the dropping table, and it
> > showed that indeed only 107 Mpps hit the rule.
> >
> > /root/build/app/dpdk-testpmd -l 0-31,64-95 \
> >     -a 21:00.0,dv_flow_en=2,mprq_en=1,rx_vec_en=1 --in-memory -- \
> >     -i --rxq=32 --txq=32 --forward-mode=rxonly --nb-cores=32
> >
> > port stop 0
> > flow configure 0 queues_number 1 queues_size 128 counters_number 16
> > port start 0
> > flow pattern_template 0 create pattern_template_id 1 ingress template end
> > flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
> > flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
> > flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
> > flow pull 0 queue 0
> > #
> > flow actions_template 0 create ingress actions_template_id 2 template drop / end mask drop / end
> > flow template_table 0 create ingress group 1 table_id 2 pattern_template 1 actions_template 2 rules_number 1
> > flow queue 0 create 0 template_table 2 pattern_template 0 actions_template 0 postpone false pattern end actions drop / end
> > flow pull 0 queue 0
> >
> > mlnx_perf -i enp33s0f0np0
> >
> > rx_vport_unicast_packets: 109,500,637
> > rx_vport_unicast_bytes: 7,008,040,768 Bps = 56,064.32 Mbps
> > rx_packets_phy: 151,568,915
> > rx_bytes_phy: 9,700,410,560 Bps = 77,603.28 Mbps
> > rx_64_bytes_phy: 151,569,146 Bps = 1,212.55 Mbps
> > rx_prio0_bytes: 9,699,889,216 Bps = 77,599.11 Mbps
> > rx_prio0_packets: 151,560,756
> > rx_prio0_buf_discard: 42,065,705
> >
> > Attached: "neohost-cx6dx-jump_drop-hws.txt".
>
> We're still looking into the "jump_drop" case.

Looking forward to your verdict.
Thank you for investigating and explaining; understanding what is going on helps a lot.

> By the way - may I ask what your target use case with HWS is?

The subject area is DDoS attack protection, which is why I am interested in
64B packets at wire speed, even though they are not frequent in normal traffic.
Drop action performance is crucial, because it is to be used for blocking
malicious traffic that comes at the highest packet rates.
IPv4 is the primary target; IPv6 is nice to have.
Hopefully, HWS relaxed matching will allow matching by 5-tuple at max pps.
SWS proved to be incapable of that in groups other than 0 because of extra hops,
and group 0 has other limitations.

Typical flow structures expected (a rough sketch of the first one, in terms of
the async flow API, is appended at the end of this message):

- match by 5-tuple, drop on match, RSS to SW otherwise;
- match by 5-tuple, mark and RSS to SW on match, just RSS to SW otherwise;
- match by 5-tuple, modify MAC and/or VLAN, RSS to 1-port hairpin queues;
- match by 5-tuple, meter (?), RSS to SW or 2-port hairpin queues.

These 5-tuples come in bulk from packet inspection on the data path,
so the high insertion speed of HWS is very desirable.
Worst-case scale: millions of flow rules, ~100K insertions/second,
but HWS may prove useful to us even at a much smaller scale.

I have already noticed that the cost of a table miss with HWS depends on the
table size even if the table is empty (going from 1024 to 2048 adds 0.5 hops),
so maybe very large tables are not realistic at max pps.
I am also exploring other features and their performance in pps.
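Appendix: a rough, untested sketch of how such 5-tuple drop rules could be
bulk-inserted through the async flow API. It assumes a template table
("drop_tbl") created beforehand, whose pattern template matches IPv4 addresses
and TCP ports with full masks and relaxed matching enabled, plus a single flow
queue; all names here are illustrative, not actual application code.

#include <stdint.h>
#include <rte_byteorder.h>
#include <rte_flow.h>

struct tuple5 {
	rte_be32_t src_ip;
	rte_be32_t dst_ip;
	rte_be16_t src_port;
	rte_be16_t dst_port;
};

/* Enqueue drop rules for a batch of 5-tuples into an existing HWS table. */
static void
block_tuples(uint16_t port_id, struct rte_flow_template_table *drop_tbl,
	     const struct tuple5 *t, unsigned int n)
{
	struct rte_flow_error error;
	const struct rte_flow_op_attr postponed = { .postpone = 1 };

	for (unsigned int i = 0; i < n; i++) {
		struct rte_flow_item_ipv4 ip_spec = {
			.hdr = { .src_addr = t[i].src_ip, .dst_addr = t[i].dst_ip },
		};
		struct rte_flow_item_tcp tcp_spec = {
			.hdr = { .src_port = t[i].src_port, .dst_port = t[i].dst_port },
		};
		struct rte_flow_item pattern[] = {
			{ .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ip_spec },
			{ .type = RTE_FLOW_ITEM_TYPE_TCP,  .spec = &tcp_spec },
			{ .type = RTE_FLOW_ITEM_TYPE_END },
		};
		struct rte_flow_action actions[] = {
			{ .type = RTE_FLOW_ACTION_TYPE_DROP },
			{ .type = RTE_FLOW_ACTION_TYPE_END },
		};

		/* Enqueue without flushing; postponed operations are batched. */
		rte_flow_async_create(port_id, 0, &postponed, drop_tbl,
				      pattern, 0, actions, 0, NULL, &error);
	}

	/* Flush the batch to the NIC and drain the completions
	 * (a real application would track outstanding operations). */
	rte_flow_push(port_id, 0, &error);
	struct rte_flow_op_result results[64];
	while (rte_flow_pull(port_id, 0, results, 64, &error) > 0)
		;
}

The postpone flag plus one rte_flow_push() per burst is the batching model
I have in mind for reaching high insertion rates.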