From: Vladimir Medvedkin <medvedkinv@gmail.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: "Robin Jarry" <rjarry@redhat.com>,
"Morten Brørup" <mb@smartsharesystems.com>,
"Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>,
dev@dpdk.org
Subject: Re: rte_fib network order bug
Date: Sun, 17 Nov 2024 15:04:00 +0000 [thread overview]
Message-ID: <CANDrEHkxt32+PHRWNBYjpo2t3cxLemwK_iKYfNbA85OR_t3cjw@mail.gmail.com> (raw)
In-Reply-To: <20241115082046.46af52d5@hermes.local>
[-- Attachment #1: Type: text/plain, Size: 4983 bytes --]
Hi all,
[Robin] > I had not understood that it was *only* the lookups that were
network order
[Morten] >When I saw the byte order flag the first time, it was not clear
to me either that it only affected lookups - I too thought it covered the
entire API of the library. This needs to be emphasized in the description
of the flag. And the flag's name should contain LOOKUP
[Morten] > And/or rename RTE_FIB_F_NETWORK_ORDER to
RTE_FIB_F_NETWORK_ORDER_LOOKUP or similar.
There is a clear comment for this flag that it has effects on lookup.
Repeating the statement with an exclamation mark seems too much. Moreover,
at first this flag was named "RTE_FIB_FLAG_LOOKUP_BE" and it was suggested
for renaming here:
https://inbox.dpdk.org/dev/D4SWPKOPRD5Z.87YIET3Y4AW@redhat.com/
[Morten] >Control plane API should use CPU byte order ... adding it
(support for network byte order) to the RIB library would be nice too.
I'm not sure if I understood you correctly here, RIB is a control plane
library.
[Robin] > an IPv4 address is *not* an integer. It should be treated as an
opaque value.
I don't agree here. IPv4 is 32 bits of information. CPUs usually can treat
32 bits of information as an integer, which is really useful.
[Morten] > Treating IPv4 addresses as byte arrays would allow simple
memcmp() for range comparison
How is it possible for a general case? For example, I need to test IP
addresses against range 1.1.1.7 - 10.20.30.37.
[Robin] >Also for consistency with IPv6, I really think that *all*
addresses should be dealt in their network form.
There is no such a problem as byte order mismatch for IPv6 addresses since
they can not be treated by modern CPUs as an native integer type.
[Robin] >But it (RTE_IPV4) will always generate addresses in *host order*.
Which means they cannot be used in IPv4 headers without passing them
through htonl().
RTE_IPV4 is not limited by setting IPv4 headers values.
[Robin] >Maybe we could revert that patch and defer a complete change of
the rib/fib APIs to only expose network order addresses?
I don't agree with that. Don't limit yourself to just manipulating network
headers.
[Robin] >Thinking about it some more. Having a flag for such a drastic
change in behaviour does not seem right.
This flag is optional. I don't see any problems with that.
In general, here we just have different perspectives on the problem. I can
see and understand your point.
My considerations are:
- The vast majority of the longest prefix match algorithms works with
addresses in host byte order (binary trees, multibit tries, DXR, except
only hash based lookup)
- If you do byteswap two or more times - If you run byteswap two or more
times, you are probably doing something wrong in terms of computations
So, feel free to submit patches adding this feature to the control plane
API, but let's consider:
- default behaviour should remain the same. Why? At least because for my
usecases I'd like to have "data representation" (byte swap) outside of the
library. Not to mention ABI/API breakage
- IPv4 should stay as uint32_t. C doesn't know such a thing as byte order,
it knows about size and signedness. rte_be32_t is just a hint for us -
humans :)
пт, 15 нояб. 2024 г. в 17:00, Stephen Hemminger <stephen@networkplumber.org
>:
> On Fri, 15 Nov 2024 15:28:33 +0100
> "Robin Jarry" <rjarry@redhat.com> wrote:
>
> > Morten Brørup, Nov 15, 2024 at 14:52:
> > > Robin, you've totally won me over on this endian discussion. :-)
> > > Especially the IPv6 comparison make it clear why IPv4 should also be
> > > network byte order.
> > >
> > > API/ABI stability is a pain... we're stuck with host endian IPv4
> > > addresses; e.g. for the RTE_IPV4() macro, which I now agree produces
> > > the wrong endian value (on little endian CPUs).
> >
> > At least for 24.11 it is too late. But maybe we could make it right for
> > the next LTS?
> >
> > >> Vladimir, could we at least consider adding a real network order mode
> > >> for the rib and fib libraries? So that we can have consistent APIs
> > >> between IPv4 and IPv6?
> > >
> > > And/or rename RTE_FIB_F_NETWORK_ORDER to
> > > RTE_FIB_F_NETWORK_ORDER_LOOKUP or similar. This is important if real
> > > network order mode is added (now or later)!
> >
> > Maybe we could revert that patch and defer a complete change of the
> > rib/fib APIs to only expose network order addresses? It would be an ABI
> > breakage but if properly announced in advance, it should be possible.
> >
> > Thinking about it some more. Having a flag for such a drastic change in
> > behaviour does not seem right.
>
> It was a mistake for DPDK to define its own data structures for IP
> addresses.
> Would have been much better to stick with the legacy what BSD, Linux (and
> Windows)
> uses in API. 'struct in_addr' and 'struct in6_addr'
>
> Reinvention did not help users.
>
--
Regards,
Vladimir
[-- Attachment #2: Type: text/html, Size: 11432 bytes --]
prev parent reply other threads:[~2024-11-17 15:04 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-12 9:31 Robin Jarry
2024-11-13 10:42 ` Medvedkin, Vladimir
2024-11-13 13:27 ` Robin Jarry
2024-11-13 19:39 ` Medvedkin, Vladimir
2024-11-14 7:43 ` Morten Brørup
2024-11-14 10:18 ` Robin Jarry
2024-11-14 14:35 ` Morten Brørup
2024-11-15 13:01 ` Robin Jarry
2024-11-15 13:52 ` Morten Brørup
2024-11-15 14:07 ` Bruce Richardson
2024-11-15 14:28 ` Robin Jarry
2024-11-15 16:20 ` Stephen Hemminger
2024-11-17 15:04 ` Vladimir Medvedkin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CANDrEHkxt32+PHRWNBYjpo2t3cxLemwK_iKYfNbA85OR_t3cjw@mail.gmail.com \
--to=medvedkinv@gmail.com \
--cc=dev@dpdk.org \
--cc=mb@smartsharesystems.com \
--cc=rjarry@redhat.com \
--cc=stephen@networkplumber.org \
--cc=vladimir.medvedkin@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).