DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Robin Jarry" <rjarry@redhat.com>,
	"Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>,
	<dev@dpdk.org>
Subject: RE: rte_fib network order bug
Date: Thu, 14 Nov 2024 15:35:27 +0100	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F8CC@smartserver.smartshare.dk> (raw)
In-Reply-To: <D5LTPJ4FK5O0.1C6RBQKKUU055@redhat.com>

> From: Robin Jarry [mailto:rjarry@redhat.com]
> Sent: Thursday, 14 November 2024 11.19
> 
> Hi folks,
> 
> Morten Brørup, Nov 14, 2024 at 08:43:
> > Medvedkin, Vladimir:
> >> I think control plane API should work with prefix addresses in CPU
> >> byte order. At least our RTE_IPV4 macro works this way. Also, prefix
> >> is an address + prefix length (not the mask), so it is more natural
> >> if address is in cpu byte order.
> 
> This may get into a religion debate, but in my opinion, an IPv4 address
> is *not* an integer. It should be treated as an opaque value.

In theory yes. But it is easier/faster performing range comparisons (First <= IP <= Last) when converting to host endian integer for IPv4.
E.g. our appliance allows configuring IP ranges, not only IP subnets.

Thinking more about it, it might just be an old habit, hard to shake off. Treating IPv4 addresses as byte arrays would allow simple memcmp() for range comparison, like IPv6 addresses.

> RTE_IPV4
> is only useful to define addresses in unit tests.

There are plenty of special IP addresses and subnets, where a shortcut macro makes the address easier readable in the code.

> 
> I do not know of any IPv4 stack implementation that deals with
> *host order* addresses. Here are a couple of examples where all
> addresses are stored in network order in the control plane:
> 
> https://elixir.bootlin.com/linux/v6.11.6/source/net/ipv4/fib_frontend.c
> #L1069
> 
> https://github.com/freebsd/freebsd-
> src/blob/release/14.1.0/sys/net/route/route_ctl.c#L692
> 
> https://git.fd.io/vpp/tree/src/vnet/fib/fib_table.c?h=v24.10#n237
> 
> >>
> >> Also, I think byte swap should be done on the interface where byte
> >> order changes, and this boundary lies outside the FIB library.
> >> However, I've added this feature not only because it was asked, but
> >> also trying to improve performance in some cases, such as using
> >> AVX512 byte swap in vector path for users who don't want to bother
> >> about manually do byteswap on the fast path.
> >>
> >> Why do you think this would discourage users?
> >
> > Joining the discussing with a couple of comments.
> >
> > 1. When I saw the byte order flag the first time, it was not clear to
> >    me either that it only affected lookups - I too thought it covered
> >    the entire API of the library. This needs to be emphasized in the
> >    description of the flag. And the flag's name should contain
> LOOKUP,
> >    e.g.:
> >
> > /** If set, FIB lookup is expecting IPv4 address in network byte
> order. Note: Only lookup! */
> > #define RTE_FIB_F_LOOKUP_NETWORK_ORDER    1
> >
> > 2. Control plane API should use CPU byte order. I consider
> inet_pton()
> >    irrelevant in this context. Adding network byte order lookup for
> >    fast path optimization makes good sense, and adding it to the RIB
> >    library would be nice too.
> >
> >    If it was an address table (not longest prefix table, but a hash
> or
> >    similar), the learn()/update() function could be fast path, and
> >    thus support network byte order too; but it's not, so add() is
> >    control plane.
> 
> Why would control plane use a different representation of addresses
> compared to data plane?

Excellent question.
Old habit? Growing up using big endian CPUs, we have come to think of IPv4 addresses as 32 bit numbers, so we keep treating them as such. With this old way of thinking, the only reason to use network endian in the fast path with little endian CPUs is for performance reasons (to save the byte swap) - if not, we would still prefer using host endian in the fast path too.

> Also for consistency with IPv6, I really think
> that *all* addresses should be dealt in their network form.

Food for thought!

> 
> Cheers.


      reply	other threads:[~2024-11-14 14:35 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-12  9:31 Robin Jarry
2024-11-13 10:42 ` Medvedkin, Vladimir
2024-11-13 13:27   ` Robin Jarry
2024-11-13 19:39     ` Medvedkin, Vladimir
2024-11-14  7:43       ` Morten Brørup
2024-11-14 10:18         ` Robin Jarry
2024-11-14 14:35           ` Morten Brørup [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98CBD80474FA8B44BF855DF32C47DC35E9F8CC@smartserver.smartshare.dk \
    --to=mb@smartsharesystems.com \
    --cc=dev@dpdk.org \
    --cc=rjarry@redhat.com \
    --cc=vladimir.medvedkin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).