From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Monjalon
To: Ilya Maximets
Cc: dev, Sriharsha Basavapatna, Gaetan Rivet, Eli Britstein, Ivan Malov,
 Andrew Rybchenko, Ori Kam, Ian Stokes
Subject: Re: rte_flow API change request: tunnel info restoration is too slow
Date: Wed, 16 Mar 2022 14:46:42 +0100
Message-ID: <4878373.LvFx2qVVIh@thomas>
In-Reply-To: <7c70ff8a-296b-6f4e-53c7-ad0f825b838c@ovn.org>
References: <5248c2ca-f2a6-3fb0-38b8-7f659bfa40de@ovn.org>
 <6043769.DvuYhMxLoT@thomas>
 <7c70ff8a-296b-6f4e-53c7-ad0f825b838c@ovn.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
16/03/2022 13:25, Ilya Maximets:
> On 3/16/22 10:41, Thomas Monjalon wrote:
> > 15/03/2022 23:12, Ilya Maximets:
> >> Hi, everyone.
> >>
> >> After implementing support for tunnel offloading in OVS, we faced a
> >> significant performance issue caused by the requirement to call the
> >> rte_flow_get_restore_info() function on a per-packet basis.
> >>
> >> The main problem is that once tunnel offloading is configured,
> >> there is no way for the application to tell whether a particular
> >> packet was partially processed in the hardware and the tunnel info
> >> has to be restored. What we have to do right now is call
> >> rte_flow_get_restore_info() unconditionally for every packet; the
> >> result of that call says whether we have the tunnel info or not.
> >>
> >> The rte_flow_get_restore_info() call itself is very heavy. It is at
> >> least a couple of indirect function calls plus the device lock on
> >> the application side (the not-really-thread-safe nature of the
> >> rte_flow API is a separate topic). The internal info lookup inside
> >> the driver also costs a lot (depending on the driver).
> >>
> >> It has been measured that having this call on a per-packet basis can
> >> reduce application performance (OVS) by a factor of 3 in some cases:
> >> https://mail.openvswitch.org/pipermail/ovs-dev/2021-November/389265.html
> >> https://github.com/openvswitch/ovs/commit/6e50c1651869de0335eb4b7fd0960059c5505f5c
> >> (The above patch avoids the problem in a hacky way for devices that
> >> don't support tunnel offloading, but it's not applicable to the
> >> situation where the device actually supports it, since the API has
> >> to be called.)
> >>
> >> Another tricky part is that we have to call rte_flow_get_restore_info()
> >> before checking other parts of the mbuf, because the mlx5 driver, for
> >> example, re-uses the mbuf.hash.fdir value for both tunnel info
> >> restoration and classification offloading, so the application has
> >> no way to tell which one is used right now and has to call the
> >> restoration API first in order to find out.
> >>
> >>
> >> What we need:
> >>
> >> A generic and fast (a couple of CPU cycles) API that will clearly say
> >> whether the heavy rte_flow_get_restore_info() has to be called for a
> >> particular packet or not. Ideally, that should be a static mbuf flag
> >> that can be easily checked by the application.
> >
> > A dynamic mbuf flag, defined in the API, looks to be a good idea.
>
> Makes sense. OTOH, I'm not sure what the benefit of having it
> dynamically allocated is if it will always need to be defined.

True. We need to discuss whether we can have situations
where it is not registered at all.
We recently introduced a function for the initial config of rte_flow;
it could be the trigger to register such a flag or field.

> But, well, it doesn't really matter, I guess.

That's a detail, but it should be discussed.

> >> Calls inside the device driver are way too slow for that purpose,
> >> especially if they are not fully thread-safe or require complex
> >> lookups or calculations.
> >>
> >> I'm assuming here that packets that really need the tunnel info
> >> restoration should be fairly rare.
> >>
> >>
> >> Current state:
> >>
> >> Currently, the get_restore_info() API is implemented only for the
> >> mlx5 and sfc drivers, AFAICT.
> >> The SFC driver is already using an mbuf flag, but
> >> it's dynamic and not exposed to the application.
> >
> > What is this flag?
>
> The SFC driver defines a dynamic field 'rte_net_sfc_dynfield_ft_id'
> and the corresponding flag 'rte_net_sfc_dynflag_ft_id_valid' to
> check whether the field contains valid data.

OK, so we could deprecate this flag.

> >> The MLX5 driver re-uses the mbuf.hash.fdir value
> >> and performs a heavy lookup inside the driver.
> >
> > We should avoid re-using a field.
>
> +1 from me, but I'm not familiar enough with the mlx5 driver
> to tell how to change it.
>
> >
> >> For now, OVS doesn't formally support tunnel offload with DPDK;
> >> the code in OVS is under the experimental API ifdef and not
> >> compiled in by default.
> >>
> >> //Let me know if there is a more formal way to submit such requests.
> >
> > That's a well-written request, thanks.
> > If you are looking for something more formal,
> > it could be a patch in the API.
>
> I'm not looking for that for now. :)
> I think we need an agreement from driver maintainers first. And
> I don't think we can introduce such an API change without changing
> the drivers first, because otherwise we'll end up with the 'has_vlan'
> situation and broken offloading support.

OK
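
For reference, below is a minimal sketch of the per-packet pattern described
in the thread: with tunnel offloading configured, the application has to call
the heavy rte_flow_get_restore_info() unconditionally for every received
packet. This is not OVS code; the burst size and the placeholder processing
comments are illustrative only.

    #include <rte_ethdev.h>
    #include <rte_flow.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    static void
    rx_burst_restore_all(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *pkts[BURST_SIZE];
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);

        for (uint16_t i = 0; i < nb_rx; i++) {
            struct rte_flow_restore_info info;
            struct rte_flow_error error;

            /* Heavy call: indirect function calls, a device lock on the
             * application side and a driver-internal lookup, paid for
             * every packet just to learn if tunnel info is present. */
            if (rte_flow_get_restore_info(port_id, pkts[i], &info,
                                          &error) == 0) {
                /* Packet was partially processed in hardware;
                 * restore the tunnel metadata here. */
            }
            /* ... regular packet processing ... */
            rte_pktmbuf_free(pkts[i]);
        }
    }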
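
And a sketch of the kind of fast check being requested, built on the existing
generic rte_mbuf_dynflag_register() mechanism. The flag name and the
assumption that the PMD would set the bit for packets needing restoration are
hypothetical; no such rte_flow flag is defined at the time of this discussion.

    #include <rte_bitops.h>
    #include <rte_flow.h>
    #include <rte_mbuf.h>
    #include <rte_mbuf_dyn.h>

    static uint64_t restore_info_flag; /* bit mask in mbuf->ol_flags */

    /* Done once at initialization, e.g. when tunnel offloading is set up. */
    static int
    register_restore_info_flag(void)
    {
        static const struct rte_mbuf_dynflag desc = {
            .name = "example_flow_restore_info", /* hypothetical name */
        };
        int offset = rte_mbuf_dynflag_register(&desc);

        if (offset < 0)
            return -1;
        restore_info_flag = RTE_BIT64(offset);
        return 0;
    }

    /* Fast path: a couple of cycles per packet unless the (hypothetical)
     * flag was set by the driver. */
    static inline void
    process_packet(uint16_t port_id, struct rte_mbuf *m)
    {
        if (m->ol_flags & restore_info_flag) {
            struct rte_flow_restore_info info;
            struct rte_flow_error error;

            if (rte_flow_get_restore_info(port_id, m, &info, &error) == 0) {
                /* restore tunnel metadata ... */
            }
        }
        /* ... regular processing ... */
    }

Whether the flag ends up statically defined in the API or dynamically
registered (the point debated above) only changes how restore_info_flag is
obtained; the per-packet cost stays a single branch either way.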