From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 8917945D2E; Sun, 17 Nov 2024 16:04:15 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id E5A9C40298; Sun, 17 Nov 2024 16:04:14 +0100 (CET) Received: from mail-oo1-f49.google.com (mail-oo1-f49.google.com [209.85.161.49]) by mails.dpdk.org (Postfix) with ESMTP id 02E9E4003C for ; Sun, 17 Nov 2024 16:04:12 +0100 (CET) Received: by mail-oo1-f49.google.com with SMTP id 006d021491bc7-5ee9dbf1b47so1702166eaf.1 for ; Sun, 17 Nov 2024 07:04:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731855852; x=1732460652; darn=dpdk.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=MnDNVwLbRwIHU3VP14DJjND3Bn/5XZcYKPJguNyUm3M=; b=AuqLVqdo43XLV4cXZ3LPNs9ReSrpr9l/DoaGctKdwx4FtZr1F/+O2ShZdXukIMGc7D NR9kJ73WfirYjn0RrrQZmlmT9O0iJnrr91+SZLMJ+HOWEjZ4Hur0tx7WHuB4I5sLjfHr YezvVoXRIKVKMfJrW8m9IEiE6zxP653YUvC4+9EmwdmuPfIGNBaaXy6ZSdtmmB3NuPo8 UfBsrxXYOyi9EFZecTj0tCvAdsd7TE1onVEgA8J/rA56kdTOQdZ3/QuhAmMsKefEeUGW RtNrA0lea77sfN2KVPAsDYhttH71wW2QPq/VvrzFdlFSiCtsVnbqHDyDWVU+A12k/7e0 h/VA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731855852; x=1732460652; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=MnDNVwLbRwIHU3VP14DJjND3Bn/5XZcYKPJguNyUm3M=; b=H4KxtzkRSxlXLNhP6/yG3b+1jX3rGMqJi8wtRc2KbuLiKjrRiNKrbL8ccboCpcJGnR ALon6yrxDzNN6hW0H6I50U/99+32y2cXoET4/6x/qyUFdHC0tVfIn07+4tlPTmbMO8eR nFN+h8Ad5l9MokktS1BnnIvZpZGUXBXJTyofAOyO75WodAlj65rtf6b7a82PCTqgf6iv cojWjugfNR/nlDc8kKyreS8AksloPMpTH7Zp99i+OB6c7BrP7ybwZC0tKUiFnaEU7JBQ 7MNnlw95SrFfGrjx67CpOs7MrLFBQZlXg5kRV21se7qn44wQMGvMfBQInlU8vLhHkaLL xlxg== X-Forwarded-Encrypted: i=1; AJvYcCWES63kBxMTtlv+ISJknhQw6tQKxaGbSqsFYZMHD2LcetaP08+C6yU9zgPC0iiE66qjZJw=@dpdk.org X-Gm-Message-State: AOJu0Yz1YBO+zDOEgxlV+KQjoTySJtKrY8uj7+Ux8idz48If9o49Fah6 KoX9nnPOw9/PxDL8Y/RipEt9Wo2fvP8cup/cwCUoq6hC7JWG6AeaMbVXwtGXD031qQ98TQ22gN5 xQ7LDw3wKQA6qw6RChy5aNOZuS0I= X-Google-Smtp-Source: AGHT+IES0x33wTtV7N5a9RewsoSttg6Qv4mu8OsByw2fdCu9nMuXc983lWJiG1UHaqdRm1bfIZLrAct3uPmO2gi011E= X-Received: by 2002:a05:6830:6703:b0:713:d159:ec with SMTP id 46e09a7af769-71a77455372mr6678259a34.10.1731855851634; Sun, 17 Nov 2024 07:04:11 -0800 (PST) MIME-Version: 1.0 References: <98CBD80474FA8B44BF855DF32C47DC35E9F8CB@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35E9F8CC@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35E9F8CD@smartserver.smartshare.dk> <20241115082046.46af52d5@hermes.local> In-Reply-To: <20241115082046.46af52d5@hermes.local> From: Vladimir Medvedkin Date: Sun, 17 Nov 2024 15:04:00 +0000 Message-ID: Subject: Re: rte_fib network order bug To: Stephen Hemminger Cc: Robin Jarry , =?UTF-8?Q?Morten_Br=C3=B8rup?= , "Medvedkin, Vladimir" , dev@dpdk.org Content-Type: multipart/alternative; boundary="000000000000d6c80906271d1c44" X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org --000000000000d6c80906271d1c44 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi all, [Robin] > I had not understood that it was *only* the lookups that were network order [Morten] >When I saw the byte order flag the first time, it was not clear to me either that it only affected lookups - I too thought it covered the entire API of the library. This needs to be emphasized in the description of the flag. And the flag's name should contain LOOKUP [Morten] > And/or rename RTE_FIB_F_NETWORK_ORDER to RTE_FIB_F_NETWORK_ORDER_LOOKUP or similar. There is a clear comment for this flag that it has effects on lookup. Repeating the statement with an exclamation mark seems too much. Moreover, at first this flag was named "RTE_FIB_FLAG_LOOKUP_BE" and it was suggested for renaming here: https://inbox.dpdk.org/dev/D4SWPKOPRD5Z.87YIET3Y4AW@redhat.com/ [Morten] >Control plane API should use CPU byte order ... adding it (support for network byte order) to the RIB library would be nice too. I'm not sure if I understood you correctly here, RIB is a control plane library. [Robin] > an IPv4 address is *not* an integer. It should be treated as an opaque value. I don't agree here. IPv4 is 32 bits of information. CPUs usually can treat 32 bits of information as an integer, which is really useful. [Morten] > Treating IPv4 addresses as byte arrays would allow simple memcmp() for range comparison How is it possible for a general case? For example, I need to test IP addresses against range 1.1.1.7 - 10.20.30.37. [Robin] >Also for consistency with IPv6, I really think that *all* addresses should be dealt in their network form. There is no such a problem as byte order mismatch for IPv6 addresses since they can not be treated by modern CPUs as an native integer type. [Robin] >But it (RTE_IPV4) will always generate addresses in *host order*. Which means they cannot be used in IPv4 headers without passing them through htonl(). RTE_IPV4 is not limited by setting IPv4 headers values. [Robin] >Maybe we could revert that patch and defer a complete change of the rib/fib APIs to only expose network order addresses? I don't agree with that. Don't limit yourself to just manipulating network headers. [Robin] >Thinking about it some more. Having a flag for such a drastic change in behaviour does not seem right. This flag is optional. I don't see any problems with that. In general, here we just have different perspectives on the problem. I can see and understand your point. My considerations are: - The vast majority of the longest prefix match algorithms works with addresses in host byte order (binary trees, multibit tries, DXR, except only hash based lookup) - If you do byteswap two or more times - If you run byteswap two or more times, you are probably doing something wrong in terms of computations So, feel free to submit patches adding this feature to the control plane API, but let's consider: - default behaviour should remain the same. Why? At least because for my usecases I'd like to have "data representation" (byte swap) outside of the library. Not to mention ABI/API breakage - IPv4 should stay as uint32_t. C doesn't know such a thing as byte order, it knows about size and signedness. rte_be32_t is just a hint for us - humans :) =D0=BF=D1=82, 15 =D0=BD=D0=BE=D1=8F=D0=B1. 2024=E2=80=AF=D0=B3. =D0=B2 17:0= 0, Stephen Hemminger : > On Fri, 15 Nov 2024 15:28:33 +0100 > "Robin Jarry" wrote: > > > Morten Br=C3=B8rup, Nov 15, 2024 at 14:52: > > > Robin, you've totally won me over on this endian discussion. :-) > > > Especially the IPv6 comparison make it clear why IPv4 should also be > > > network byte order. > > > > > > API/ABI stability is a pain... we're stuck with host endian IPv4 > > > addresses; e.g. for the RTE_IPV4() macro, which I now agree produces > > > the wrong endian value (on little endian CPUs). > > > > At least for 24.11 it is too late. But maybe we could make it right for > > the next LTS? > > > > >> Vladimir, could we at least consider adding a real network order mod= e > > >> for the rib and fib libraries? So that we can have consistent APIs > > >> between IPv4 and IPv6? > > > > > > And/or rename RTE_FIB_F_NETWORK_ORDER to > > > RTE_FIB_F_NETWORK_ORDER_LOOKUP or similar. This is important if real > > > network order mode is added (now or later)! > > > > Maybe we could revert that patch and defer a complete change of the > > rib/fib APIs to only expose network order addresses? It would be an ABI > > breakage but if properly announced in advance, it should be possible. > > > > Thinking about it some more. Having a flag for such a drastic change in > > behaviour does not seem right. > > It was a mistake for DPDK to define its own data structures for IP > addresses. > Would have been much better to stick with the legacy what BSD, Linux (and > Windows) > uses in API. 'struct in_addr' and 'struct in6_addr' > > Reinvention did not help users. > --=20 Regards, Vladimir --000000000000d6c80906271d1c44 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi all,

[Robin] > I had n= ot understood that it was *only* the lookups that were network order
<= div>[Morten] >When I saw the byte order flag the first time, it was not = clear to me=20 either that it only affected lookups - I too thought it covered the=20 entire API of the library. This needs to be emphasized in the=20 description of the flag. And the flag's name should contain LOOKUP
[Morten] > And/or rename RTE_FIB_F_NETWORK_ORDER to RTE_FIB_F_NETW= ORK_ORDER_LOOKUP or similar.

There is a clear comm= ent for this flag that it has effects on lookup. Repeating the statement wi= th an exclamation mark seems too much. Moreover, at first this flag was nam= ed "RTE_FIB_FLAG_LOOKUP_BE" and it was suggested for renaming her= e:

[Morten] >Control plane API should u= se CPU byte order ... adding it (support for network byte order) to the RIB= library would be nice too.
I'm not sure if I understood you = correctly here, RIB is a control plane library.

[R= obin] > an IPv4 address is *not* an integer. It should be treated as an = opaque value.
I don't agree here. IPv4 is 32 bits of informat= ion. CPUs usually can treat 32 bits of information as an integer, which is = really useful.

[Morten] > Treating IPv4 ad= dresses as byte arrays would allow simple memcmp() for range comparison
How is it possible for a general case? For example, I need to test I= P addresses against range 1.1.1.7 - 10.20.30.37.

[= Robin] >Also for consistency with IPv6, I really think that *all* addres= ses should be dealt in their network form.
There is no such a pro= blem as byte order mismatch for IPv6 addresses since they can not be treate= d by modern CPUs as an native integer type.

[Robin= ] >But it (RTE_IPV4) will always generate addresses in *host order*. Whi= ch means they cannot be used in IPv4 headers without passing them through h= tonl().
RTE_IPV4 is not limited by setting IPv4 headers values.

[Robin] >Maybe we could revert that patch and de= fer a complete change of the rib/fib APIs to only expose network order addr= esses?
I don't agree with that. = Don't limit yourself to just manipulating network headers.

[Robin] >Thinking about it some more.= Having a flag for such a drastic change in behaviour does not seem right.<= span class=3D"gmail-im">
This flag is optional. = I don't see any problems with that.

In general= , here we just have different perspectives on the problem. I can see and un= derstand your point.
My considerations are:
- The vast = majority of the longest prefix match algorithms works with addresses in hos= t byte order (binary trees, multibit tries, DXR, except only hash based loo= kup)
- If you do byteswap two or more times - If you run byteswap<= /span> two or= more times= , you are probably doing somethi= ng wrong in terms of computatio= ns

So, feel free to submit patches adding t= his feature to the control plane API, but let's consider:
- d= efault behaviour should remain the same. Why? At least because for my useca= ses I'd like to have "data representation" (byte swap) outsid= e of the library. Not to mention ABI/API breakage
-=C2=A0 IPv4 sh= ould stay as uint32_t. C doesn't know<= span style=3D"white-space:pre-wrap"> such a thing= as byte order, it = knows about= size and <= /span>signedness. rte_be32_t is just a hint for us - humans :)<= br>


=D0=BF=D1=82, 15 =D0=BD=D0=BE=D1=8F=D0=B1. 2024=E2= =80=AF=D0=B3. =D0=B2 17:00, Stephen Hemminger <stephen@networkplumber.org>:
On Fri, 15 Nov 2024 15:28:33 +0100=
"Robin Jarry" <rjarry@redhat.com> wrote:

> Morten Br=C3=B8rup, Nov 15, 2024 at 14:52:
> > Robin, you've totally won me over on this endian discussion. = :-)
> > Especially the IPv6 comparison make it clear why IPv4 should also= be
> > network byte order.
> >
> > API/ABI stability is a pain... we're stuck with host endian I= Pv4
> > addresses; e.g. for the RTE_IPV4() macro, which I now agree produ= ces
> > the wrong endian value (on little endian CPUs).=C2=A0
>
> At least for 24.11 it is too late. But maybe we could make it right fo= r
> the next LTS?
>
> >> Vladimir, could we at least consider adding a real network or= der mode
> >> for the rib and fib libraries? So that we can have consistent= APIs
> >> between IPv4 and IPv6?=C2=A0
> >
> > And/or rename RTE_FIB_F_NETWORK_ORDER to
> > RTE_FIB_F_NETWORK_ORDER_LOOKUP or similar. This is important if r= eal
> > network order mode is added (now or later)!=C2=A0
>
> Maybe we could revert that patch and defer a complete change of the > rib/fib APIs to only expose network order addresses? It would be an AB= I
> breakage but if properly announced in advance, it should be possible.<= br> >
> Thinking about it some more. Having a flag for such a drastic change i= n
> behaviour does not seem right.

It was a mistake for DPDK to define its own data structures for IP addresse= s.
Would have been much better to stick with the legacy what BSD, Linux (and W= indows)
uses in API. 'struct in_addr' and 'struct in6_addr'

Reinvention did not help users.


--
Regards,
Vladimir
--000000000000d6c80906271d1c44--