* rte_fib network order bug
@ 2024-11-12 9:31 Robin Jarry
2024-11-13 10:42 ` Medvedkin, Vladimir
0 siblings, 1 reply; 7+ messages in thread
From: Robin Jarry @ 2024-11-12 9:31 UTC (permalink / raw)
To: Vladimir Medvedkin, dev
Hi Vladimir,
I started playing with the new RTE_FIB_F_NETWORK_ORDER flag and I found
that it does not work at all.
rte_fib is based on rte_rib to perform all slow path lookups and
modifications. Unfortunately, rte_rib does not handle network order
addresses. This causes the added routes to be incorrectly masked and
thus inserted at the wrong place in the dir24 structure.
Here is the config I am using:
static struct rte_fib_conf fib_conf = {
.type = RTE_FIB_DIR24_8,
.default_nh = 0,
.max_routes = 65536,
.rib_ext_sz = 0,
.dir24_8 = {
.nh_sz = RTE_FIB_DIR24_8_8B,
.num_tbl8 = 1 << 15,
},
.flags = RTE_FIB_F_NETWORK_ORDER,
};
And here is a short gdb session demonstrating the issue:
rte_fib_add (fib=0x16ed79740, ip=67305985, depth=24 '\030', next_hop=6438576192) at ../subprojects/dpdk/lib/fib/rte_fib.c:126
126 if ((fib == NULL) || (fib->modify == NULL) ||
(gdb)
129 return fib->modify(fib, ip, depth, next_hop, RTE_FIB_ADD);
(gdb) p (char*)inet_ntoa(ip)
$9 = 0x7ffff72a471c "1.2.3.4"
(gdb) p depth
$10 = 24 '\030'
(gdb) p fib->flags
$11 = 1 (
(gdb) s
dir24_8_modify (fib=0x16ed79740, ip=67305985, depth=24 '\030', next_hop=6438576192, op=0)
at ../subprojects/dpdk/lib/fib/dir24_8.c:452
452 struct rte_rib_node *tmp = NULL;
(gdb) n
455 int ret = 0;
(gdb)
458 if ((fib == NULL) || (depth > RTE_FIB_MAXDEPTH))
(gdb)
461 dp = rte_fib_get_dp(fib);
(gdb)
462 rib = rte_fib_get_rib(fib);
(gdb)
465 if (next_hop > get_max_nh(dp->nh_sz))
(gdb)
468 ip &= rte_rib_depth_to_mask(depth);
(gdb) p (char*)inet_ntoa(ip)
$12 = 0x7ffff72a471c "1.2.3.4"
(gdb) p depth
$13 = 24 '\030'
(gdb) n
470 node = rte_rib_lookup_exact(rib, ip, depth);
(gdb) p (char*)inet_ntoa(ip)
$14 = 0x7ffff72a471c "0.2.3.4"
(gdb) p (char*)inet_ntoa(rte_rib_depth_to_mask(depth))
$15 = 0x7ffff72a471c "0.255.255.255"
As you can see, the generated mask is invalid. It should have been
255.255.255.0.
rte_rib would need to have similar network order support and should
inherit it from rte_fib.
Let me know if you need more information.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: rte_fib network order bug
2024-11-12 9:31 rte_fib network order bug Robin Jarry
@ 2024-11-13 10:42 ` Medvedkin, Vladimir
2024-11-13 13:27 ` Robin Jarry
0 siblings, 1 reply; 7+ messages in thread
From: Medvedkin, Vladimir @ 2024-11-13 10:42 UTC (permalink / raw)
To: Robin Jarry, dev
Hi Robin,
It should not. Here is documentation says regarding this flag:
/** If set, fib lookup is expecting IPv4 address in network byte order */
#define RTE_FIB_F_NETWORK_ORDER 1
As stated above lookups will be performed while expecting addresses to
be in BE byte order. Control plane API expects IPv4 prefix address to be
in CPU byte order.
On 12/11/2024 09:31, Robin Jarry wrote:
> Hi Vladimir,
>
> I started playing with the new RTE_FIB_F_NETWORK_ORDER flag and I
> found that it does not work at all.
>
> rte_fib is based on rte_rib to perform all slow path lookups and
> modifications. Unfortunately, rte_rib does not handle network order
> addresses. This causes the added routes to be incorrectly masked and
> thus inserted at the wrong place in the dir24 structure.
>
> Here is the config I am using:
>
> static struct rte_fib_conf fib_conf = {
> .type = RTE_FIB_DIR24_8,
> .default_nh = 0,
> .max_routes = 65536,
> .rib_ext_sz = 0,
> .dir24_8 = {
> .nh_sz = RTE_FIB_DIR24_8_8B,
> .num_tbl8 = 1 << 15,
> },
> .flags = RTE_FIB_F_NETWORK_ORDER,
> };
>
> And here is a short gdb session demonstrating the issue:
>
> rte_fib_add (fib=0x16ed79740, ip=67305985, depth=24 '\030',
> next_hop=6438576192) at ../subprojects/dpdk/lib/fib/rte_fib.c:126
> 126 if ((fib == NULL) || (fib->modify == NULL) ||
> (gdb) 129 return fib->modify(fib, ip, depth, next_hop,
> RTE_FIB_ADD);
> (gdb) p (char*)inet_ntoa(ip)
> $9 = 0x7ffff72a471c "1.2.3.4"
> (gdb) p depth
> $10 = 24 '\030'
> (gdb) p fib->flags $11 = 1 (
> (gdb) s
> dir24_8_modify (fib=0x16ed79740, ip=67305985, depth=24 '\030',
> next_hop=6438576192, op=0)
> at ../subprojects/dpdk/lib/fib/dir24_8.c:452
> 452 struct rte_rib_node *tmp = NULL;
> (gdb) n
> 455 int ret = 0;
> (gdb) 458 if ((fib == NULL) || (depth > RTE_FIB_MAXDEPTH))
> (gdb) 461 dp = rte_fib_get_dp(fib);
> (gdb) 462 rib = rte_fib_get_rib(fib);
> (gdb) 465 if (next_hop > get_max_nh(dp->nh_sz))
> (gdb) 468 ip &= rte_rib_depth_to_mask(depth);
> (gdb) p (char*)inet_ntoa(ip)
> $12 = 0x7ffff72a471c "1.2.3.4"
> (gdb) p depth
> $13 = 24 '\030'
> (gdb) n
> 470 node = rte_rib_lookup_exact(rib, ip, depth);
> (gdb) p (char*)inet_ntoa(ip)
> $14 = 0x7ffff72a471c "0.2.3.4"
> (gdb) p (char*)inet_ntoa(rte_rib_depth_to_mask(depth))
> $15 = 0x7ffff72a471c "0.255.255.255"
>
> As you can see, the generated mask is invalid. It should have been
> 255.255.255.0.
>
> rte_rib would need to have similar network order support and should
> inherit it from rte_fib.
>
> Let me know if you need more information.
>
--
Regards,
Vladimir
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: rte_fib network order bug
2024-11-13 10:42 ` Medvedkin, Vladimir
@ 2024-11-13 13:27 ` Robin Jarry
2024-11-13 19:39 ` Medvedkin, Vladimir
0 siblings, 1 reply; 7+ messages in thread
From: Robin Jarry @ 2024-11-13 13:27 UTC (permalink / raw)
To: Medvedkin, Vladimir, dev
Medvedkin, Vladimir, Nov 13, 2024 at 11:42:
> Hi Robin,
>
> It should not. Here is documentation says regarding this flag:
>
> /** If set, fib lookup is expecting IPv4 address in network byte order */
> #define RTE_FIB_F_NETWORK_ORDER 1
>
> As stated above lookups will be performed while expecting addresses to
> be in BE byte order. Control plane API expects IPv4 prefix address to be
> in CPU byte order.
I had not understood that it was *only* the lookups that were network
order.
The original reason why a RTE_FIB_F_NETWORK_ORDER flag was suggested
some time ago is that inet_pton() always returns network order
addresses. It makes it much more natural to keep everything in network
order instead of having to swap things around.
Now, only having the lookup functions requiring network order addresses
but all the other functions in the API requiring host order addresses is
even more confusing.
In my opinion, when RTE_FIB_F_NETWORK_ORDER is set, *all* functions of
the fib API should take network order addresses. That obviously mean
that rib should also have a similar flag.
Is that possible without a massive rework? If not, then I think we
should revert the patch that adds it or at least discourage its use in
its current state.
What do you think?
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: rte_fib network order bug
2024-11-13 13:27 ` Robin Jarry
@ 2024-11-13 19:39 ` Medvedkin, Vladimir
2024-11-14 7:43 ` Morten Brørup
0 siblings, 1 reply; 7+ messages in thread
From: Medvedkin, Vladimir @ 2024-11-13 19:39 UTC (permalink / raw)
To: Robin Jarry, dev
I think control plane API should work with prefix addresses in CPU byte order. At least our RTE_IPV4 macro works this way. Also, prefix is an address + prefix length (not the mask), so it is more natural if address is in cpu byte order.
Also, I think byte swap should be done on the interface where byte order changes, and this boundary lies outside the FIB library. However, I've added this feature not only because it was asked, but also trying to improve performance in some cases, such as using AVX512 byte swap in vector path for users who don't want to bother about manually do byteswap on the fast path.
Why do you think this would discourage users?
-----Original Message-----
From: Robin Jarry <rjarry@redhat.com>
Sent: Wednesday, November 13, 2024 1:27 PM
To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; dev@dpdk.org
Subject: Re: rte_fib network order bug
Medvedkin, Vladimir, Nov 13, 2024 at 11:42:
> Hi Robin,
>
> It should not. Here is documentation says regarding this flag:
>
> /** If set, fib lookup is expecting IPv4 address in network byte order
> */ #define RTE_FIB_F_NETWORK_ORDER 1
>
> As stated above lookups will be performed while expecting addresses to
> be in BE byte order. Control plane API expects IPv4 prefix address to
> be in CPU byte order.
I had not understood that it was *only* the lookups that were network order.
The original reason why a RTE_FIB_F_NETWORK_ORDER flag was suggested some time ago is that inet_pton() always returns network order addresses. It makes it much more natural to keep everything in network order instead of having to swap things around.
Now, only having the lookup functions requiring network order addresses but all the other functions in the API requiring host order addresses is even more confusing.
In my opinion, when RTE_FIB_F_NETWORK_ORDER is set, *all* functions of the fib API should take network order addresses. That obviously mean that rib should also have a similar flag.
Is that possible without a massive rework? If not, then I think we should revert the patch that adds it or at least discourage its use in its current state.
What do you think?
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: rte_fib network order bug
2024-11-13 19:39 ` Medvedkin, Vladimir
@ 2024-11-14 7:43 ` Morten Brørup
2024-11-14 10:18 ` Robin Jarry
0 siblings, 1 reply; 7+ messages in thread
From: Morten Brørup @ 2024-11-14 7:43 UTC (permalink / raw)
To: Medvedkin, Vladimir, Robin Jarry, dev
> -----Original Message-----
> From: Medvedkin, Vladimir [mailto:vladimir.medvedkin@intel.com]
> Sent: Wednesday, 13 November 2024 20.39
> To: Robin Jarry; dev@dpdk.org
> Subject: RE: rte_fib network order bug
>
> I think control plane API should work with prefix addresses in CPU byte
> order. At least our RTE_IPV4 macro works this way. Also, prefix is an
> address + prefix length (not the mask), so it is more natural if
> address is in cpu byte order.
> Also, I think byte swap should be done on the interface where byte
> order changes, and this boundary lies outside the FIB library. However,
> I've added this feature not only because it was asked, but also trying
> to improve performance in some cases, such as using AVX512 byte swap in
> vector path for users who don't want to bother about manually do
> byteswap on the fast path.
>
> Why do you think this would discourage users?
>
Joining the discussing with a couple of comments.
1. When I saw the byte order flag the first time, it was not clear to me either that it only affected lookups - I too thought it covered the entire API of the library. This needs to be emphasized in the description of the flag. And the flag's name should contain LOOKUP, e.g.:
/** If set, FIB lookup is expecting IPv4 address in network byte order. Note: Only lookup! */
#define RTE_FIB_F_LOOKUP_NETWORK_ORDER 1
2. Control plane API should use CPU byte order. I consider inet_pton() irrelevant in this context.
Adding network byte order lookup for fast path optimization makes good sense, and adding it to the RIB library would be nice too.
If it was an address table (not longest prefix table, but a hash or similar), the learn()/update() function could be fast path, and thus support network byte order too; but it's not, so add() is control plane.
>
> -----Original Message-----
> From: Robin Jarry <rjarry@redhat.com>
> Sent: Wednesday, November 13, 2024 1:27 PM
> To: Medvedkin, Vladimir <vladimir.medvedkin@intel.com>; dev@dpdk.org
> Subject: Re: rte_fib network order bug
>
> Medvedkin, Vladimir, Nov 13, 2024 at 11:42:
> > Hi Robin,
> >
> > It should not. Here is documentation says regarding this flag:
> >
> > /** If set, fib lookup is expecting IPv4 address in network byte
> order
> > */ #define RTE_FIB_F_NETWORK_ORDER 1
> >
> > As stated above lookups will be performed while expecting addresses
> to
> > be in BE byte order. Control plane API expects IPv4 prefix address to
> > be in CPU byte order.
>
> I had not understood that it was *only* the lookups that were network
> order.
>
> The original reason why a RTE_FIB_F_NETWORK_ORDER flag was suggested
> some time ago is that inet_pton() always returns network order
> addresses. It makes it much more natural to keep everything in network
> order instead of having to swap things around.
>
> Now, only having the lookup functions requiring network order addresses
> but all the other functions in the API requiring host order addresses
> is even more confusing.
>
> In my opinion, when RTE_FIB_F_NETWORK_ORDER is set, *all* functions of
> the fib API should take network order addresses. That obviously mean
> that rib should also have a similar flag.
>
> Is that possible without a massive rework? If not, then I think we
> should revert the patch that adds it or at least discourage its use in
> its current state.
>
> What do you think?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: rte_fib network order bug
2024-11-14 7:43 ` Morten Brørup
@ 2024-11-14 10:18 ` Robin Jarry
2024-11-14 14:35 ` Morten Brørup
0 siblings, 1 reply; 7+ messages in thread
From: Robin Jarry @ 2024-11-14 10:18 UTC (permalink / raw)
To: Morten Brørup, Medvedkin, Vladimir, dev
Hi folks,
Morten Brørup, Nov 14, 2024 at 08:43:
> Medvedkin, Vladimir:
>> I think control plane API should work with prefix addresses in CPU
>> byte order. At least our RTE_IPV4 macro works this way. Also, prefix
>> is an address + prefix length (not the mask), so it is more natural
>> if address is in cpu byte order.
This may get into a religion debate, but in my opinion, an IPv4 address
is *not* an integer. It should be treated as an opaque value. RTE_IPV4
is only useful to define addresses in unit tests.
I do not know of any IPv4 stack implementation that deals with
*host order* addresses. Here are a couple of examples where all
addresses are stored in network order in the control plane:
https://elixir.bootlin.com/linux/v6.11.6/source/net/ipv4/fib_frontend.c#L1069
https://github.com/freebsd/freebsd-src/blob/release/14.1.0/sys/net/route/route_ctl.c#L692
https://git.fd.io/vpp/tree/src/vnet/fib/fib_table.c?h=v24.10#n237
>>
>> Also, I think byte swap should be done on the interface where byte
>> order changes, and this boundary lies outside the FIB library.
>> However, I've added this feature not only because it was asked, but
>> also trying to improve performance in some cases, such as using
>> AVX512 byte swap in vector path for users who don't want to bother
>> about manually do byteswap on the fast path.
>>
>> Why do you think this would discourage users?
>
> Joining the discussing with a couple of comments.
>
> 1. When I saw the byte order flag the first time, it was not clear to
> me either that it only affected lookups - I too thought it covered
> the entire API of the library. This needs to be emphasized in the
> description of the flag. And the flag's name should contain LOOKUP,
> e.g.:
>
> /** If set, FIB lookup is expecting IPv4 address in network byte order. Note: Only lookup! */
> #define RTE_FIB_F_LOOKUP_NETWORK_ORDER 1
>
> 2. Control plane API should use CPU byte order. I consider inet_pton()
> irrelevant in this context. Adding network byte order lookup for
> fast path optimization makes good sense, and adding it to the RIB
> library would be nice too.
>
> If it was an address table (not longest prefix table, but a hash or
> similar), the learn()/update() function could be fast path, and
> thus support network byte order too; but it's not, so add() is
> control plane.
Why would control plane use a different representation of addresses
compared to data plane? Also for consistency with IPv6, I really think
that *all* addresses should be dealt in their network form.
Cheers.
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: rte_fib network order bug
2024-11-14 10:18 ` Robin Jarry
@ 2024-11-14 14:35 ` Morten Brørup
0 siblings, 0 replies; 7+ messages in thread
From: Morten Brørup @ 2024-11-14 14:35 UTC (permalink / raw)
To: Robin Jarry, Medvedkin, Vladimir, dev
> From: Robin Jarry [mailto:rjarry@redhat.com]
> Sent: Thursday, 14 November 2024 11.19
>
> Hi folks,
>
> Morten Brørup, Nov 14, 2024 at 08:43:
> > Medvedkin, Vladimir:
> >> I think control plane API should work with prefix addresses in CPU
> >> byte order. At least our RTE_IPV4 macro works this way. Also, prefix
> >> is an address + prefix length (not the mask), so it is more natural
> >> if address is in cpu byte order.
>
> This may get into a religion debate, but in my opinion, an IPv4 address
> is *not* an integer. It should be treated as an opaque value.
In theory yes. But it is easier/faster performing range comparisons (First <= IP <= Last) when converting to host endian integer for IPv4.
E.g. our appliance allows configuring IP ranges, not only IP subnets.
Thinking more about it, it might just be an old habit, hard to shake off. Treating IPv4 addresses as byte arrays would allow simple memcmp() for range comparison, like IPv6 addresses.
> RTE_IPV4
> is only useful to define addresses in unit tests.
There are plenty of special IP addresses and subnets, where a shortcut macro makes the address easier readable in the code.
>
> I do not know of any IPv4 stack implementation that deals with
> *host order* addresses. Here are a couple of examples where all
> addresses are stored in network order in the control plane:
>
> https://elixir.bootlin.com/linux/v6.11.6/source/net/ipv4/fib_frontend.c
> #L1069
>
> https://github.com/freebsd/freebsd-
> src/blob/release/14.1.0/sys/net/route/route_ctl.c#L692
>
> https://git.fd.io/vpp/tree/src/vnet/fib/fib_table.c?h=v24.10#n237
>
> >>
> >> Also, I think byte swap should be done on the interface where byte
> >> order changes, and this boundary lies outside the FIB library.
> >> However, I've added this feature not only because it was asked, but
> >> also trying to improve performance in some cases, such as using
> >> AVX512 byte swap in vector path for users who don't want to bother
> >> about manually do byteswap on the fast path.
> >>
> >> Why do you think this would discourage users?
> >
> > Joining the discussing with a couple of comments.
> >
> > 1. When I saw the byte order flag the first time, it was not clear to
> > me either that it only affected lookups - I too thought it covered
> > the entire API of the library. This needs to be emphasized in the
> > description of the flag. And the flag's name should contain
> LOOKUP,
> > e.g.:
> >
> > /** If set, FIB lookup is expecting IPv4 address in network byte
> order. Note: Only lookup! */
> > #define RTE_FIB_F_LOOKUP_NETWORK_ORDER 1
> >
> > 2. Control plane API should use CPU byte order. I consider
> inet_pton()
> > irrelevant in this context. Adding network byte order lookup for
> > fast path optimization makes good sense, and adding it to the RIB
> > library would be nice too.
> >
> > If it was an address table (not longest prefix table, but a hash
> or
> > similar), the learn()/update() function could be fast path, and
> > thus support network byte order too; but it's not, so add() is
> > control plane.
>
> Why would control plane use a different representation of addresses
> compared to data plane?
Excellent question.
Old habit? Growing up using big endian CPUs, we have come to think of IPv4 addresses as 32 bit numbers, so we keep treating them as such. With this old way of thinking, the only reason to use network endian in the fast path with little endian CPUs is for performance reasons (to save the byte swap) - if not, we would still prefer using host endian in the fast path too.
> Also for consistency with IPv6, I really think
> that *all* addresses should be dealt in their network form.
Food for thought!
>
> Cheers.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-11-14 14:35 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-12 9:31 rte_fib network order bug Robin Jarry
2024-11-13 10:42 ` Medvedkin, Vladimir
2024-11-13 13:27 ` Robin Jarry
2024-11-13 19:39 ` Medvedkin, Vladimir
2024-11-14 7:43 ` Morten Brørup
2024-11-14 10:18 ` Robin Jarry
2024-11-14 14:35 ` Morten Brørup
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).