DPDK patches and discussions
 help / color / mirror / Atom feed
* rte_fib network order bug
@ 2024-11-12  9:31 Robin Jarry
  2024-11-13 10:42 ` Medvedkin, Vladimir
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Jarry @ 2024-11-12  9:31 UTC (permalink / raw)
  To: Vladimir Medvedkin, dev

Hi Vladimir,

I started playing with the new RTE_FIB_F_NETWORK_ORDER flag and I found 
that it does not work at all.

rte_fib is based on rte_rib to perform all slow path lookups and 
modifications. Unfortunately, rte_rib does not handle network order 
addresses. This causes the added routes to be incorrectly masked and 
thus inserted at the wrong place in the dir24 structure.

Here is the config I am using:

static struct rte_fib_conf fib_conf = {
	.type = RTE_FIB_DIR24_8,
	.default_nh = 0,
	.max_routes = 65536,
	.rib_ext_sz = 0,
	.dir24_8 = {
		.nh_sz = RTE_FIB_DIR24_8_8B,
		.num_tbl8 = 1 << 15,
	},
	.flags = RTE_FIB_F_NETWORK_ORDER,
};

And here is a short gdb session demonstrating the issue:

rte_fib_add (fib=0x16ed79740, ip=67305985, depth=24 '\030', next_hop=6438576192) at ../subprojects/dpdk/lib/fib/rte_fib.c:126
126             if ((fib == NULL) || (fib->modify == NULL) ||
(gdb) 
129             return fib->modify(fib, ip, depth, next_hop, RTE_FIB_ADD);
(gdb) p (char*)inet_ntoa(ip)
$9 = 0x7ffff72a471c "1.2.3.4"
(gdb) p depth
$10 = 24 '\030'
(gdb) p fib->flags 
$11 = 1 (
(gdb) s
dir24_8_modify (fib=0x16ed79740, ip=67305985, depth=24 '\030', next_hop=6438576192, op=0)
    at ../subprojects/dpdk/lib/fib/dir24_8.c:452
452             struct rte_rib_node *tmp = NULL;
(gdb) n
455             int ret = 0;
(gdb) 
458             if ((fib == NULL) || (depth > RTE_FIB_MAXDEPTH))
(gdb) 
461             dp = rte_fib_get_dp(fib);
(gdb) 
462             rib = rte_fib_get_rib(fib);
(gdb) 
465             if (next_hop > get_max_nh(dp->nh_sz))
(gdb) 
468             ip &= rte_rib_depth_to_mask(depth);
(gdb) p (char*)inet_ntoa(ip)
$12 = 0x7ffff72a471c "1.2.3.4"
(gdb) p depth
$13 = 24 '\030'
(gdb) n
470             node = rte_rib_lookup_exact(rib, ip, depth);
(gdb) p (char*)inet_ntoa(ip)
$14 = 0x7ffff72a471c "0.2.3.4"
(gdb) p (char*)inet_ntoa(rte_rib_depth_to_mask(depth))
$15 = 0x7ffff72a471c "0.255.255.255"

As you can see, the generated mask is invalid. It should have been 
255.255.255.0.

rte_rib would need to have similar network order support and should 
inherit it from rte_fib.

Let me know if you need more information.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: rte_fib network order bug
  2024-11-12  9:31 rte_fib network order bug Robin Jarry
@ 2024-11-13 10:42 ` Medvedkin, Vladimir
  2024-11-13 13:27   ` Robin Jarry
  0 siblings, 1 reply; 7+ messages in thread
From: Medvedkin, Vladimir @ 2024-11-13 10:42 UTC (permalink / raw)
  To: Robin Jarry, dev

Hi Robin,

It should not. Here is documentation says regarding this flag:

/** If set, fib lookup is expecting IPv4 address in network byte order */
#define RTE_FIB_F_NETWORK_ORDER    1

As stated above lookups will be performed while expecting addresses to 
be in BE byte order. Control plane API expects IPv4 prefix address to be 
in CPU byte order.

On 12/11/2024 09:31, Robin Jarry wrote:
> Hi Vladimir,
>
> I started playing with the new RTE_FIB_F_NETWORK_ORDER flag and I 
> found that it does not work at all.
>
> rte_fib is based on rte_rib to perform all slow path lookups and 
> modifications. Unfortunately, rte_rib does not handle network order 
> addresses. This causes the added routes to be incorrectly masked and 
> thus inserted at the wrong place in the dir24 structure.
>
> Here is the config I am using:
>
> static struct rte_fib_conf fib_conf = {
>     .type = RTE_FIB_DIR24_8,
>     .default_nh = 0,
>     .max_routes = 65536,
>     .rib_ext_sz = 0,
>     .dir24_8 = {
>         .nh_sz = RTE_FIB_DIR24_8_8B,
>         .num_tbl8 = 1 << 15,
>     },
>     .flags = RTE_FIB_F_NETWORK_ORDER,
> };
>
> And here is a short gdb session demonstrating the issue:
>
> rte_fib_add (fib=0x16ed79740, ip=67305985, depth=24 '\030', 
> next_hop=6438576192) at ../subprojects/dpdk/lib/fib/rte_fib.c:126
> 126             if ((fib == NULL) || (fib->modify == NULL) ||
> (gdb) 129             return fib->modify(fib, ip, depth, next_hop, 
> RTE_FIB_ADD);
> (gdb) p (char*)inet_ntoa(ip)
> $9 = 0x7ffff72a471c "1.2.3.4"
> (gdb) p depth
> $10 = 24 '\030'
> (gdb) p fib->flags $11 = 1 (
> (gdb) s
> dir24_8_modify (fib=0x16ed79740, ip=67305985, depth=24 '\030', 
> next_hop=6438576192, op=0)
>    at ../subprojects/dpdk/lib/fib/dir24_8.c:452
> 452             struct rte_rib_node *tmp = NULL;
> (gdb) n
> 455             int ret = 0;
> (gdb) 458             if ((fib == NULL) || (depth > RTE_FIB_MAXDEPTH))
> (gdb) 461             dp = rte_fib_get_dp(fib);
> (gdb) 462             rib = rte_fib_get_rib(fib);
> (gdb) 465             if (next_hop > get_max_nh(dp->nh_sz))
> (gdb) 468             ip &= rte_rib_depth_to_mask(depth);
> (gdb) p (char*)inet_ntoa(ip)
> $12 = 0x7ffff72a471c "1.2.3.4"
> (gdb) p depth
> $13 = 24 '\030'
> (gdb) n
> 470             node = rte_rib_lookup_exact(rib, ip, depth);
> (gdb) p (char*)inet_ntoa(ip)
> $14 = 0x7ffff72a471c "0.2.3.4"
> (gdb) p (char*)inet_ntoa(rte_rib_depth_to_mask(depth))
> $15 = 0x7ffff72a471c "0.255.255.255"
>
> As you can see, the generated mask is invalid. It should have been 
> 255.255.255.0.
>
> rte_rib would need to have similar network order support and should 
> inherit it from rte_fib.
>
> Let me know if you need more information.
>
-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: rte_fib network order bug
  2024-11-13 10:42 ` Medvedkin, Vladimir
@ 2024-11-13 13:27   ` Robin Jarry
  2024-11-13 19:39     ` Medvedkin, Vladimir
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Jarry @ 2024-11-13 13:27 UTC (permalink / raw)
  To: Medvedkin, Vladimir, dev

Medvedkin, Vladimir, Nov 13, 2024 at 11:42:
> Hi Robin,
>
> It should not. Here is documentation says regarding this flag:
>
> /** If set, fib lookup is expecting IPv4 address in network byte order */
> #define RTE_FIB_F_NETWORK_ORDER    1
>
> As stated above lookups will be performed while expecting addresses to 
> be in BE byte order. Control plane API expects IPv4 prefix address to be 
> in CPU byte order.

I had not understood that it was *only* the lookups that were network 
order.

The original reason why a RTE_FIB_F_NETWORK_ORDER flag was suggested 
some time ago is that inet_pton() always returns network order 
addresses. It makes it much more natural to keep everything in network 
order instead of having to swap things around.

Now, only having the lookup functions requiring network order addresses 
but all the other functions in the API requiring host order addresses is 
even more confusing.

In my opinion, when RTE_FIB_F_NETWORK_ORDER is set, *all* functions of 
the fib API should take network order addresses. That obviously mean 
that rib should also have a similar flag.

Is that possible without a massive rework? If not, then I think we 
should revert the patch that adds it or at least discourage its use in 
its current state.

What do you think?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: rte_fib network order bug
  2024-11-13 13:27   ` Robin Jarry
@ 2024-11-13 19:39     ` Medvedkin, Vladimir
  2024-11-14  7:43       ` Morten Brørup
  0 siblings, 1 reply; 7+ messages in thread
From: Medvedkin, Vladimir @ 2024-11-13 19:39 UTC (permalink / raw)
  To: Robin Jarry, dev

I think control plane API should work with prefix addresses in CPU byte order. At least our RTE_IPV4 macro works this way. Also, prefix is an address + prefix length (not the mask), so it is more natural if address is in cpu byte order.
Also, I think byte swap should be done on the interface where byte order changes, and this boundary lies outside the FIB library. However, I've added this feature not only because it was asked, but also trying to improve performance in some cases, such as using AVX512 byte swap in vector path for users who don't want to bother about manually do byteswap on the fast path.

Why do you think this would discourage users?


-----Original Message-----
From: Robin Jarry <rjarry@redhat.com> 
Sent: Wednesday, November 13, 2024 1:27 PM
To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; dev@dpdk.org
Subject: Re: rte_fib network order bug

Medvedkin, Vladimir, Nov 13, 2024 at 11:42:
> Hi Robin,
>
> It should not. Here is documentation says regarding this flag:
>
> /** If set, fib lookup is expecting IPv4 address in network byte order 
> */ #define RTE_FIB_F_NETWORK_ORDER    1
>
> As stated above lookups will be performed while expecting addresses to 
> be in BE byte order. Control plane API expects IPv4 prefix address to 
> be in CPU byte order.

I had not understood that it was *only* the lookups that were network order.

The original reason why a RTE_FIB_F_NETWORK_ORDER flag was suggested some time ago is that inet_pton() always returns network order addresses. It makes it much more natural to keep everything in network order instead of having to swap things around.

Now, only having the lookup functions requiring network order addresses but all the other functions in the API requiring host order addresses is even more confusing.

In my opinion, when RTE_FIB_F_NETWORK_ORDER is set, *all* functions of the fib API should take network order addresses. That obviously mean that rib should also have a similar flag.

Is that possible without a massive rework? If not, then I think we should revert the patch that adds it or at least discourage its use in its current state.

What do you think?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: rte_fib network order bug
  2024-11-13 19:39     ` Medvedkin, Vladimir
@ 2024-11-14  7:43       ` Morten Brørup
  2024-11-14 10:18         ` Robin Jarry
  0 siblings, 1 reply; 7+ messages in thread
From: Morten Brørup @ 2024-11-14  7:43 UTC (permalink / raw)
  To: Medvedkin, Vladimir, Robin Jarry, dev

> -----Original Message-----
> From: Medvedkin, Vladimir [mailto:vladimir.medvedkin@intel.com]
> Sent: Wednesday, 13 November 2024 20.39
> To: Robin Jarry; dev@dpdk.org
> Subject: RE: rte_fib network order bug
> 
> I think control plane API should work with prefix addresses in CPU byte
> order. At least our RTE_IPV4 macro works this way. Also, prefix is an
> address + prefix length (not the mask), so it is more natural if
> address is in cpu byte order.
> Also, I think byte swap should be done on the interface where byte
> order changes, and this boundary lies outside the FIB library. However,
> I've added this feature not only because it was asked, but also trying
> to improve performance in some cases, such as using AVX512 byte swap in
> vector path for users who don't want to bother about manually do
> byteswap on the fast path.
> 
> Why do you think this would discourage users?
> 

Joining the discussing with a couple of comments.

1. When I saw the byte order flag the first time, it was not clear to me either that it only affected lookups - I too thought it covered the entire API of the library. This needs to be emphasized in the description of the flag. And the flag's name should contain LOOKUP, e.g.:

/** If set, FIB lookup is expecting IPv4 address in network byte order. Note: Only lookup! */
#define RTE_FIB_F_LOOKUP_NETWORK_ORDER    1

2. Control plane API should use CPU byte order. I consider inet_pton() irrelevant in this context.
Adding network byte order lookup for fast path optimization makes good sense, and adding it to the RIB library would be nice too.

If it was an address table (not longest prefix table, but a hash or similar), the learn()/update() function could be fast path, and thus support network byte order too; but it's not, so add() is control plane.

> 
> -----Original Message-----
> From: Robin Jarry <rjarry@redhat.com>
> Sent: Wednesday, November 13, 2024 1:27 PM
> To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; dev@dpdk.org
> Subject: Re: rte_fib network order bug
> 
> Medvedkin, Vladimir, Nov 13, 2024 at 11:42:
> > Hi Robin,
> >
> > It should not. Here is documentation says regarding this flag:
> >
> > /** If set, fib lookup is expecting IPv4 address in network byte
> order
> > */ #define RTE_FIB_F_NETWORK_ORDER    1
> >
> > As stated above lookups will be performed while expecting addresses
> to
> > be in BE byte order. Control plane API expects IPv4 prefix address to
> > be in CPU byte order.
> 
> I had not understood that it was *only* the lookups that were network
> order.
> 
> The original reason why a RTE_FIB_F_NETWORK_ORDER flag was suggested
> some time ago is that inet_pton() always returns network order
> addresses. It makes it much more natural to keep everything in network
> order instead of having to swap things around.
> 
> Now, only having the lookup functions requiring network order addresses
> but all the other functions in the API requiring host order addresses
> is even more confusing.
> 
> In my opinion, when RTE_FIB_F_NETWORK_ORDER is set, *all* functions of
> the fib API should take network order addresses. That obviously mean
> that rib should also have a similar flag.
> 
> Is that possible without a massive rework? If not, then I think we
> should revert the patch that adds it or at least discourage its use in
> its current state.
> 
> What do you think?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: rte_fib network order bug
  2024-11-14  7:43       ` Morten Brørup
@ 2024-11-14 10:18         ` Robin Jarry
  2024-11-14 14:35           ` Morten Brørup
  0 siblings, 1 reply; 7+ messages in thread
From: Robin Jarry @ 2024-11-14 10:18 UTC (permalink / raw)
  To: Morten Brørup, Medvedkin, Vladimir, dev

Hi folks,

Morten Brørup, Nov 14, 2024 at 08:43:
> Medvedkin, Vladimir:
>> I think control plane API should work with prefix addresses in CPU 
>> byte order. At least our RTE_IPV4 macro works this way. Also, prefix 
>> is an address + prefix length (not the mask), so it is more natural 
>> if address is in cpu byte order.

This may get into a religion debate, but in my opinion, an IPv4 address 
is *not* an integer. It should be treated as an opaque value. RTE_IPV4 
is only useful to define addresses in unit tests.

I do not know of any IPv4 stack implementation that deals with 
*host order* addresses. Here are a couple of examples where all 
addresses are stored in network order in the control plane:

https://elixir.bootlin.com/linux/v6.11.6/source/net/ipv4/fib_frontend.c#L1069

https://github.com/freebsd/freebsd-src/blob/release/14.1.0/sys/net/route/route_ctl.c#L692

https://git.fd.io/vpp/tree/src/vnet/fib/fib_table.c?h=v24.10#n237

>>
>> Also, I think byte swap should be done on the interface where byte 
>> order changes, and this boundary lies outside the FIB library. 
>> However, I've added this feature not only because it was asked, but 
>> also trying to improve performance in some cases, such as using 
>> AVX512 byte swap in vector path for users who don't want to bother 
>> about manually do byteswap on the fast path.
>>
>> Why do you think this would discourage users?
>
> Joining the discussing with a couple of comments.
>
> 1. When I saw the byte order flag the first time, it was not clear to 
>    me either that it only affected lookups - I too thought it covered 
>    the entire API of the library. This needs to be emphasized in the 
>    description of the flag. And the flag's name should contain LOOKUP, 
>    e.g.:
>
> /** If set, FIB lookup is expecting IPv4 address in network byte order. Note: Only lookup! */
> #define RTE_FIB_F_LOOKUP_NETWORK_ORDER    1
>
> 2. Control plane API should use CPU byte order. I consider inet_pton() 
>    irrelevant in this context. Adding network byte order lookup for 
>    fast path optimization makes good sense, and adding it to the RIB 
>    library would be nice too.
>
>    If it was an address table (not longest prefix table, but a hash or 
>    similar), the learn()/update() function could be fast path, and 
>    thus support network byte order too; but it's not, so add() is 
>    control plane.

Why would control plane use a different representation of addresses 
compared to data plane? Also for consistency with IPv6, I really think 
that *all* addresses should be dealt in their network form.

Cheers.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: rte_fib network order bug
  2024-11-14 10:18         ` Robin Jarry
@ 2024-11-14 14:35           ` Morten Brørup
  0 siblings, 0 replies; 7+ messages in thread
From: Morten Brørup @ 2024-11-14 14:35 UTC (permalink / raw)
  To: Robin Jarry, Medvedkin, Vladimir, dev

> From: Robin Jarry [mailto:rjarry@redhat.com]
> Sent: Thursday, 14 November 2024 11.19
> 
> Hi folks,
> 
> Morten Brørup, Nov 14, 2024 at 08:43:
> > Medvedkin, Vladimir:
> >> I think control plane API should work with prefix addresses in CPU
> >> byte order. At least our RTE_IPV4 macro works this way. Also, prefix
> >> is an address + prefix length (not the mask), so it is more natural
> >> if address is in cpu byte order.
> 
> This may get into a religion debate, but in my opinion, an IPv4 address
> is *not* an integer. It should be treated as an opaque value.

In theory yes. But it is easier/faster performing range comparisons (First <= IP <= Last) when converting to host endian integer for IPv4.
E.g. our appliance allows configuring IP ranges, not only IP subnets.

Thinking more about it, it might just be an old habit, hard to shake off. Treating IPv4 addresses as byte arrays would allow simple memcmp() for range comparison, like IPv6 addresses.

> RTE_IPV4
> is only useful to define addresses in unit tests.

There are plenty of special IP addresses and subnets, where a shortcut macro makes the address easier readable in the code.

> 
> I do not know of any IPv4 stack implementation that deals with
> *host order* addresses. Here are a couple of examples where all
> addresses are stored in network order in the control plane:
> 
> https://elixir.bootlin.com/linux/v6.11.6/source/net/ipv4/fib_frontend.c
> #L1069
> 
> https://github.com/freebsd/freebsd-
> src/blob/release/14.1.0/sys/net/route/route_ctl.c#L692
> 
> https://git.fd.io/vpp/tree/src/vnet/fib/fib_table.c?h=v24.10#n237
> 
> >>
> >> Also, I think byte swap should be done on the interface where byte
> >> order changes, and this boundary lies outside the FIB library.
> >> However, I've added this feature not only because it was asked, but
> >> also trying to improve performance in some cases, such as using
> >> AVX512 byte swap in vector path for users who don't want to bother
> >> about manually do byteswap on the fast path.
> >>
> >> Why do you think this would discourage users?
> >
> > Joining the discussing with a couple of comments.
> >
> > 1. When I saw the byte order flag the first time, it was not clear to
> >    me either that it only affected lookups - I too thought it covered
> >    the entire API of the library. This needs to be emphasized in the
> >    description of the flag. And the flag's name should contain
> LOOKUP,
> >    e.g.:
> >
> > /** If set, FIB lookup is expecting IPv4 address in network byte
> order. Note: Only lookup! */
> > #define RTE_FIB_F_LOOKUP_NETWORK_ORDER    1
> >
> > 2. Control plane API should use CPU byte order. I consider
> inet_pton()
> >    irrelevant in this context. Adding network byte order lookup for
> >    fast path optimization makes good sense, and adding it to the RIB
> >    library would be nice too.
> >
> >    If it was an address table (not longest prefix table, but a hash
> or
> >    similar), the learn()/update() function could be fast path, and
> >    thus support network byte order too; but it's not, so add() is
> >    control plane.
> 
> Why would control plane use a different representation of addresses
> compared to data plane?

Excellent question.
Old habit? Growing up using big endian CPUs, we have come to think of IPv4 addresses as 32 bit numbers, so we keep treating them as such. With this old way of thinking, the only reason to use network endian in the fast path with little endian CPUs is for performance reasons (to save the byte swap) - if not, we would still prefer using host endian in the fast path too.

> Also for consistency with IPv6, I really think
> that *all* addresses should be dealt in their network form.

Food for thought!

> 
> Cheers.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-11-14 14:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-12  9:31 rte_fib network order bug Robin Jarry
2024-11-13 10:42 ` Medvedkin, Vladimir
2024-11-13 13:27   ` Robin Jarry
2024-11-13 19:39     ` Medvedkin, Vladimir
2024-11-14  7:43       ` Morten Brørup
2024-11-14 10:18         ` Robin Jarry
2024-11-14 14:35           ` Morten Brørup

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).