* rte_flow API change request: tunnel info restoration is too slow

From: Ilya Maximets @ 2022-03-15 22:12 UTC
To: dev
Cc: i.maximets, Sriharsha Basavapatna, Gaetan Rivet, Eli Britstein,
    Ivan Malov, Andrew Rybchenko, Ori Kam, Thomas Monjalon, Ian Stokes

Hi, everyone.

After implementing support for tunnel offloading in OVS, we faced a
significant performance issue caused by the requirement to call the
rte_flow_get_restore_info() function on a per-packet basis.

The main problem is that once tunnel offloading is configured, there is
no way for the application to tell whether a particular packet was
partially processed in the hardware and its tunnel info has to be
restored.  What we have to do right now is call
rte_flow_get_restore_info() unconditionally for every packet; the
result of that call says whether we have the tunnel info or not.

The rte_flow_get_restore_info() call itself is very heavy.  It involves
at least a couple of indirect function calls plus a device lock on the
application side (the not-really-thread-safe nature of the rte_flow API
is a separate topic).  The internal info lookup inside the driver also
costs a lot (how much depends on the driver).

It has been measured that making this call on a per-packet basis can
reduce application performance (OVS) by a factor of 3 in some cases:

  https://mail.openvswitch.org/pipermail/ovs-dev/2021-November/389265.html
  https://github.com/openvswitch/ovs/commit/6e50c1651869de0335eb4b7fd0960059c5505f5c

(The above patch avoids the problem in a hacky way for devices that
don't support tunnel offloading, but it's not applicable to the
situation where the device actually supports it, since then the API has
to be called.)

Another tricky part is that we have to call rte_flow_get_restore_info()
before checking other parts of the mbuf, because the mlx5 driver, for
example, re-uses the mbuf.hash.fdir value for both tunnel info
restoration and classification offloading.  The application has no way
to tell which one is in use for a given packet and has to call the
restoration API first in order to find out.


What we need:

A generic and fast (a couple of CPU cycles) API that will clearly say
whether the heavy rte_flow_get_restore_info() has to be called for a
particular packet or not.  Ideally, that should be a static mbuf flag
that the application can check cheaply.  Calls inside the device driver
are way too slow for that purpose, especially if they are not fully
thread-safe or require complex lookups or calculations.

I'm assuming here that packets that really need tunnel info restoration
should be fairly rare.


Current state:

Currently, the get_restore_info() API is implemented only for the mlx5
and sfc drivers, AFAICT.  The sfc driver is already using an mbuf flag,
but it's dynamic and not exposed to the application.  The mlx5 driver
re-uses the mbuf.hash.fdir value and performs a heavy lookup inside the
driver.

For now, OVS doesn't formally support tunnel offload with DPDK; the
code in OVS is guarded by the experimental API ifdef and is not
compiled in by default.

// Let me know if there is a more formal way to submit such requests.

Best regards, Ilya Maximets.
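P.S. To make the per-packet cost concrete, here is a minimal sketch of
the pattern described above (paraphrased, not the exact OVS code;
port_id, queue_id, pkts[] and BURST_SIZE are the usual rx-burst
context, and restore_tunnel_metadata() is a hypothetical helper):

    #include <rte_ethdev.h>
    #include <rte_flow.h>

    uint16_t nb = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);

    for (uint16_t i = 0; i < nb; i++) {
        struct rte_flow_restore_info info;
        struct rte_flow_error error;

        /* Heavy call, made unconditionally for every packet: a couple
         * of indirect calls into the PMD plus a lock on our side. */
        if (rte_flow_get_restore_info(port_id, pkts[i],
                                      &info, &error) == 0 &&
            (info.flags & RTE_FLOW_RESTORE_INFO_TUNNEL)) {
            /* Packet was partially processed in hardware; recover the
             * outer tunnel headers from info.tunnel. */
            restore_tunnel_metadata(pkts[i], &info.tunnel);
        }
        /* Only after this point is it safe to interpret
         * mbuf.hash.fdir and the rest of the offload metadata. */
    }

A cheap "restore info present" check would let us skip the heavy call
for the vast majority of packets.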
* Re: rte_flow API change request: tunnel info restoration is too slow

From: Thomas Monjalon @ 2022-03-16 9:41 UTC
To: Ilya Maximets
Cc: dev, Sriharsha Basavapatna, Gaetan Rivet, Eli Britstein, Ivan Malov,
    Andrew Rybchenko, Ori Kam, Ian Stokes

15/03/2022 23:12, Ilya Maximets:
> Hi, everyone.
>
> After implementing support for tunnel offloading in OVS, we faced a
> significant performance issue caused by the requirement to call the
> rte_flow_get_restore_info() function on a per-packet basis.
>
> [...]
>
> What we need:
>
> A generic and fast (a couple of CPU cycles) API that will clearly say
> whether the heavy rte_flow_get_restore_info() has to be called for a
> particular packet or not.  Ideally, that should be a static mbuf flag
> that the application can check cheaply.

A dynamic mbuf flag, defined in the API, looks to be a good idea.

> Calls inside the device driver are way too slow for that purpose,
> especially if they are not fully thread-safe or require complex
> lookups or calculations.
>
> I'm assuming here that packets that really need tunnel info
> restoration should be fairly rare.
>
>
> Current state:
>
> Currently, the get_restore_info() API is implemented only for the
> mlx5 and sfc drivers, AFAICT.  The sfc driver is already using an
> mbuf flag, but it's dynamic and not exposed to the application.

What is this flag?

> The mlx5 driver re-uses the mbuf.hash.fdir value and performs a heavy
> lookup inside the driver.

We should avoid re-using a field.

> For now, OVS doesn't formally support tunnel offload with DPDK; the
> code in OVS is guarded by the experimental API ifdef and is not
> compiled in by default.
>
> // Let me know if there is a more formal way to submit such requests.

That's a well written request, thanks.
If you are looking for something more formal,
it could be a patch in the API.
* Re: rte_flow API change request: tunnel info restoration is too slow

From: Ilya Maximets @ 2022-03-16 12:25 UTC
To: Thomas Monjalon
Cc: i.maximets, dev, Sriharsha Basavapatna, Gaetan Rivet, Eli Britstein,
    Ivan Malov, Andrew Rybchenko, Ori Kam, Ian Stokes

On 3/16/22 10:41, Thomas Monjalon wrote:
> 15/03/2022 23:12, Ilya Maximets:
>> [...]
>>
>> What we need:
>>
>> A generic and fast (a couple of CPU cycles) API that will clearly say
>> whether the heavy rte_flow_get_restore_info() has to be called for a
>> particular packet or not.  Ideally, that should be a static mbuf flag
>> that the application can check cheaply.
>
> A dynamic mbuf flag, defined in the API, looks to be a good idea.

Makes sense.  OTOH, I'm not sure what the benefit of dynamic allocation
is if the flag will always need to be defined.  But, well, it doesn't
really matter, I guess.

>> Currently, the get_restore_info() API is implemented only for the
>> mlx5 and sfc drivers, AFAICT.  The sfc driver is already using an
>> mbuf flag, but it's dynamic and not exposed to the application.
>
> What is this flag?

The sfc driver defines a dynamic field, 'rte_net_sfc_dynfield_ft_id',
and a corresponding flag, 'rte_net_sfc_dynflag_ft_id_valid', to signal
that the field contains valid data.

>> The mlx5 driver re-uses the mbuf.hash.fdir value and performs a heavy
>> lookup inside the driver.
>
> We should avoid re-using a field.

+1 from me, but I'm not familiar enough with the mlx5 driver to tell
how to change it.

>> // Let me know if there is a more formal way to submit such requests.
>
> That's a well written request, thanks.
> If you are looking for something more formal,
> it could be a patch in the API.

I'm not looking for one right now. :)
I think we need an agreement from the driver maintainers first.  And I
don't think we can introduce such an API change without changing the
drivers first, because otherwise we'll end up with the 'has_vlan'
situation and broken offloading support.

Best regards, Ilya Maximets.
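P.S. For reference, the generic way for an application to reach such a
flag, if the name were public, is the standard dynflag lookup.  A
sketch (it assumes the sfc driver registers the flag under the same
name as the variable quoted above, which I haven't verified):

    #include <rte_mbuf.h>
    #include <rte_mbuf_dyn.h>
    #include <rte_bitops.h>

    /* One-time setup: look up the bit number the PMD registered. */
    int bit = rte_mbuf_dynflag_lookup("rte_net_sfc_dynflag_ft_id_valid",
                                      NULL);
    uint64_t ft_id_valid = (bit < 0) ? 0 : RTE_BIT64(bit);

    /* The per-packet check is then a single mask test: */
    if (pkt->ol_flags & ft_id_valid) {
        /* The ft_id dynamic field contains valid data. */
    }

The per-packet cost here is exactly what we need; the problem is only
that the name is driver-private, so a portable application can't use it.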
* Re: rte_flow API change request: tunnel info restoration is too slow

From: Thomas Monjalon @ 2022-03-16 13:46 UTC
To: Ilya Maximets
Cc: dev, Sriharsha Basavapatna, Gaetan Rivet, Eli Britstein, Ivan Malov,
    Andrew Rybchenko, Ori Kam, Ian Stokes

16/03/2022 13:25, Ilya Maximets:
> On 3/16/22 10:41, Thomas Monjalon wrote:
> > 15/03/2022 23:12, Ilya Maximets:
> >> [...]
> >>
> >> What we need:
> >>
> >> A generic and fast (a couple of CPU cycles) API that will clearly
> >> say whether the heavy rte_flow_get_restore_info() has to be called
> >> for a particular packet or not.  Ideally, that should be a static
> >> mbuf flag that the application can check cheaply.
> >
> > A dynamic mbuf flag, defined in the API, looks to be a good idea.
>
> Makes sense.  OTOH, I'm not sure what the benefit of dynamic
> allocation is if the flag will always need to be defined.

True.
We need to discuss whether we can have situations where it is not
registered at all.  We recently introduced a function for the initial
configuration of rte_flow; it could be the trigger to register such a
flag or field.

> But, well, it doesn't really matter, I guess.

That's a detail, but it should be discussed.

> >> Currently, the get_restore_info() API is implemented only for the
> >> mlx5 and sfc drivers, AFAICT.  The sfc driver is already using an
> >> mbuf flag, but it's dynamic and not exposed to the application.
> >
> > What is this flag?
>
> The sfc driver defines a dynamic field, 'rte_net_sfc_dynfield_ft_id',
> and a corresponding flag, 'rte_net_sfc_dynflag_ft_id_valid', to
> signal that the field contains valid data.

OK, so we could deprecate this flag.

> >> The mlx5 driver re-uses the mbuf.hash.fdir value and performs a
> >> heavy lookup inside the driver.
> >
> > We should avoid re-using a field.
>
> +1 from me, but I'm not familiar enough with the mlx5 driver to tell
> how to change it.
>
> >> // Let me know if there is a more formal way to submit such
> >> requests.
> >
> > That's a well written request, thanks.
> > If you are looking for something more formal,
> > it could be a patch in the API.
>
> I'm not looking for one right now. :)
> I think we need an agreement from the driver maintainers first.  And
> I don't think we can introduce such an API change without changing
> the drivers first, because otherwise we'll end up with the 'has_vlan'
> situation and broken offloading support.

OK
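P.S. To make this concrete, the application side could look roughly
like this if the flag were defined by rte_flow itself (the flag name
below is hypothetical; nothing like it exists in the API today):

    #include <rte_mbuf_dyn.h>
    #include <rte_bitops.h>
    #include <rte_flow.h>

    /* Hypothetical name, owned by rte_flow and registered by the PMD
     * when tunnel offload is first configured. */
    #define RTE_FLOW_RESTORE_INFO_DYNFLAG_NAME \
            "rte_flow_dynflag_restore_info"

    static uint64_t restore_info_mask;

    /* Once, after tunnel offload setup: */
    int bit = rte_mbuf_dynflag_lookup(RTE_FLOW_RESTORE_INFO_DYNFLAG_NAME,
                                      NULL);
    if (bit >= 0)
        restore_info_mask = RTE_BIT64(bit);

    /* Datapath, per packet: the cheap check requested above. */
    if (pkt->ol_flags & restore_info_mask)
        (void)rte_flow_get_restore_info(port_id, pkt, &info, &error);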
* RE: rte_flow API change request: tunnel info restoration is too slow

From: Ori Kam @ 2022-03-16 14:01 UTC
To: Thomas Monjalon, Ilya Maximets
Cc: dev, Sriharsha Basavapatna, Gaetan Rivet, Eli Britstein, Ivan Malov,
    Andrew Rybchenko, Ian Stokes, Slava Ovsiienko, Matan Azrad

Hi,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, March 16, 2022 3:47 PM
> Subject: Re: rte_flow API change request: tunnel info restoration is
> too slow
>
> 16/03/2022 13:25, Ilya Maximets:
> > On 3/16/22 10:41, Thomas Monjalon wrote:
> > > 15/03/2022 23:12, Ilya Maximets:
> > >> [...]
> > >>
> > >> It has been measured that making this call on a per-packet basis
> > >> can reduce application performance (OVS) by a factor of 3 in some
> > >> cases:
> > >>
> > >>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-November/389265.html
> > >>   https://github.com/openvswitch/ovs/commit/6e50c1651869de0335eb4b7fd0960059c5505f5c
> > >>
> > >> [...]
> > >>
> > >> What we need:
> > >>
> > >> A generic and fast (a couple of CPU cycles) API that will clearly
> > >> say whether the heavy rte_flow_get_restore_info() has to be
> > >> called for a particular packet or not.  Ideally, that should be a
> > >> static mbuf flag that the application can check cheaply.
> > >
> > > A dynamic mbuf flag, defined in the API, looks to be a good idea.
> >
> > Makes sense.  OTOH, I'm not sure what the benefit of dynamic
> > allocation is if the flag will always need to be defined.
>
> True.
> We need to discuss whether we can have situations where it is not
> registered at all.  We recently introduced a function for the initial
> configuration of rte_flow; it could be the trigger to register such a
> flag or field.

Why would it always be defined?  As I see it, this flag is only needed
if the application is planning to use tunnel offload.  We should also
introduce some way to let the application know whether the PMD is
taking over the fdir or metadata.

> > The sfc driver defines a dynamic field, 'rte_net_sfc_dynfield_ft_id',
> > and a corresponding flag, 'rte_net_sfc_dynflag_ft_id_valid', to
> > signal that the field contains valid data.
>
> OK, so we could deprecate this flag.
>
> > >> The mlx5 driver re-uses the mbuf.hash.fdir value and performs a
> > >> heavy lookup inside the driver.
> > >
> > > We should avoid re-using a field.
> >
> > +1 from me, but I'm not familiar enough with the mlx5 driver to
> > tell how to change it.

Normally I would agree, but at least in the mlx5 case, due to HW
limitations, the same field has to be used, so the application can't
use the fdir value in any case, and adding a new field would mean a
performance penalty.

> > I think we need an agreement from the driver maintainers first.
> > And I don't think we can introduce such an API change without
> > changing the drivers first, because otherwise we'll end up with the
> > 'has_vlan' situation and broken offloading support.
>
> OK

I think you are missing some driver maintainers.

Best,
Ori
Thread overview: 5 messages
  2022-03-15 22:12  rte_flow API change request: tunnel info restoration is too slow  (Ilya Maximets)
  2022-03-16  9:41  ` Thomas Monjalon
  2022-03-16 12:25    ` Ilya Maximets
  2022-03-16 13:46      ` Thomas Monjalon
  2022-03-16 14:01        ` Ori Kam