From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stephen Hemminger
To: dev@dpdk.org
Cc: Madhuker Mythri, Stephen Hemminger
Subject: [PATCH v6 2/3] net/tap: Fixed RSS algorithm to support fragmented packets
Date: Wed, 1 Nov 2023 11:02:47 -0700
Message-ID: <20231101180526.214773-3-stephen@networkplumber.org>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231101180526.214773-1-stephen@networkplumber.org>
References: <20230716212544.5625-1-stephen@networkplumber.org> <20231101180526.214773-1-stephen@networkplumber.org>
List-Id: DPDK patches and discussions

From: Madhuker Mythri

Analysis of the TAP PMD shows that the existing RSS algorithm hashes on
the 4-tuple (Src-IP, Dst-IP, Src-port and Dst-port) and does not identify
fragmented packets. As a result, the fragments of a single packet get
different RSS hash values and are distributed across multiple queues.

The RSS algorithm assumes that every incoming IP packet carries an L4
protocol (UDP/TCP) and tries to fetch the L4 fields (Src-port and
Dst-port) from each packet. In a fragmented packet the L4 header is
present only in the first fragment, so the L4 header fields should not
be included in the RSS hash for fragments. This is a bug in the RSS
algorithm implemented in the BPF functionality of the TAP PMD.

So, the RSS eBPF C program is modified, and the C array in the
'tap_bpf_insns.h' file, which holds the eBPF byte-code instructions, is
generated from it.

Bugzilla Id: 870
Signed-off-by: Madhuker Mythri
Signed-off-by: Stephen Hemminger
---
 drivers/net/tap/bpf/tap_bpf_program.c | 47 ++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
index d65021d8a1..369c7b107f 100644
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ b/drivers/net/tap/bpf/tap_bpf_program.c
@@ -19,6 +19,8 @@
 #include "bpf_elf.h"
 #include "../tap_rss.h"
 
+#include "bpf_api.h"
+
 /** Create IPv4 address */
 #define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
 		(((b) & 0xff) << 16) | \
@@ -133,6 +135,8 @@ rss_l3_l4(struct __sk_buff *skb)
 	__u8 *key = 0;
 	__u32 len;
 	__u32 queue = 0;
+	bool mf = 0;
+	__u16 frag_off = 0;
 
 	rsskey = map_lookup_elem(&map_keys, &key_idx);
 	if (!rsskey) {
@@ -157,6 +161,8 @@ rss_l3_l4(struct __sk_buff *skb)
 			return TC_ACT_OK;
 
 		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
+		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
+		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
 		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
 		struct ipv4_l3_l4_tuple v4_tuple = {
 			.src_addr = IPv4(*(src_dst_addr + 0),
@@ -167,11 +173,25 @@ rss_l3_l4(struct __sk_buff *skb)
 					*(src_dst_addr + 5),
 					*(src_dst_addr + 6),
 					*(src_dst_addr + 7)),
-			.sport = PORT(*(src_dst_port + 0),
-					*(src_dst_port + 1)),
-			.dport = PORT(*(src_dst_port + 2),
-					*(src_dst_port + 3)),
+			.sport = 0,
+			.dport = 0,
 		};
+		/** Fetch the L4-layer port numbers only in-case of TCP/UDP
+		 ** and also if the packet is not fragmented. Since fragmented
+		 ** chunks do not have L4 TCP/UDP header.
+		 **/
+		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
+			frag_off = PORT(*(frag_off_addr + 0),
+					*(frag_off_addr + 1));
+			mf = frag_off & 0x2000;
+			frag_off = frag_off & 0x1fff;
+			if (mf == 0 && frag_off == 0) {
+				v4_tuple.sport = PORT(*(src_dst_port + 0),
+						*(src_dst_port + 1));
+				v4_tuple.dport = PORT(*(src_dst_port + 2),
+						*(src_dst_port + 3));
+			}
+		}
 		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
 		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
 			input_len--;
@@ -184,6 +204,9 @@ rss_l3_l4(struct __sk_buff *skb)
 					offsetof(struct ipv6hdr, saddr);
 		__u8 *src_dst_port = data + off +
 					sizeof(struct ipv6hdr);
+		__u8 *next_hdr = data + off +
+					offsetof(struct ipv6hdr, nexthdr);
+
 		struct ipv6_l3_l4_tuple v6_tuple;
 		for (j = 0; j < 4; j++)
 			*((uint32_t *)&v6_tuple.src_addr + j) =
@@ -193,10 +216,18 @@ rss_l3_l4(struct __sk_buff *skb)
 			*((uint32_t *)&v6_tuple.dst_addr + j) =
 				__builtin_bswap32(*((uint32_t *)
 						src_dst_addr + 4 + j));
-		v6_tuple.sport = PORT(*(src_dst_port + 0),
-			      *(src_dst_port + 1));
-		v6_tuple.dport = PORT(*(src_dst_port + 2),
-			      *(src_dst_port + 3));
+
+		/** Fetch the L4 header port-numbers only if next-header
+		 * is TCP/UDP **/
+		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
+			v6_tuple.sport = PORT(*(src_dst_port + 0),
+				      *(src_dst_port + 1));
+			v6_tuple.dport = PORT(*(src_dst_port + 2),
+				      *(src_dst_port + 3));
+		} else {
+			v6_tuple.sport = 0;
+			v6_tuple.dport = 0;
+		}
 
 		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
 		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-- 
2.41.0