From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id BCEE74563A; Thu, 18 Jul 2024 23:25:38 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 4DB3340662; Thu, 18 Jul 2024 23:25:38 +0200 (CEST) Received: from mail-oo1-f43.google.com (mail-oo1-f43.google.com [209.85.161.43]) by mails.dpdk.org (Postfix) with ESMTP id 5E66F40616 for ; Thu, 18 Jul 2024 23:25:36 +0200 (CEST) Received: by mail-oo1-f43.google.com with SMTP id 006d021491bc7-5c66909738fso1068408eaf.1 for ; Thu, 18 Jul 2024 14:25:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1721337935; x=1721942735; darn=dpdk.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=TdtL2JDQWPgHNq3qT9Kgb7s1ANPu+H377wTw4GvBzQY=; b=lzkL+E/TU7Hi3saoZ2QDm4nLU/IxwXwju8eKbT9wMnvVvEUBnnkA8RDTalsaD3lVMe BU2aMmn6QyHT3jyZZpCDPuCK5CQE83YHPhh69Nyi81gjbJe7ZFkqBHvI/WoA+zCwJEjp XKI1Xd9aJ8MXOVICFC48MMR2yjWlE7aUEpizD0rK30j2K2JcLP4bnDOZ93udxaPk/MNe 5JFcqBPghoeYws4mfkuOikKJN83P4DzHNZ0RKH8Zwh1zGu8h4LTy6ptWd6rgQaDYqg5Q nemDOz5h3yjyj3dOdWk4S4aHpcZb4yVYx5Zjnocc+eVTtGc3itvWDfqZP0ZURoUsBgIW Whgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721337935; x=1721942735; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=TdtL2JDQWPgHNq3qT9Kgb7s1ANPu+H377wTw4GvBzQY=; b=dGUIsc7bfKmDT8ABWBkKPsZ9fQDUqnmAJpCulvXq34uMZ71S6P3MvAnbRkGhwGzG6I a49ng8iQ59LxXPGKdCh9jgL0CNN9cWt4OBdSZ6OV3bfUNIBDYHS7stY8VOyqnoqB6PnO FV6B+M5UEM9NVgrvRQAVuACQp0q/RWnviax3AyUD239Gp6OoM4wGQdnmIaHFczL+9BDx OezFAu9YjAgavbwiDD+BwZvoci8ioJK9ggib8hw/zy7lpRgo6Us4mabwLM/mtjtwENGg 9kA2MnoBYBJRAV+WFjnAdHoM4cLvMMAglbiIft08n3bSDZdAiwFGVMkJqWnBAnWIX9+F 8iHg== X-Forwarded-Encrypted: i=1; AJvYcCV4na1EU4Nw8qrCYrP629p+od2ujN0jTjdHhBLNepOeVRoJzdCgFz+7ax3DlbL8MjBDbeKJNJluaKcuDGg= X-Gm-Message-State: AOJu0YyeH4dhdy4Azuo/un4cQUVv/iMlOuRgWbD433nnQNonLaaA7npf olnjeP2lPtCqtAULHsbmvkpcE5wx1NdTRSl4jZFqsJ7bVdFxDLGxpvbCdo3UYjGIsTfqBYVDDUz prj4kTISQ4Z1tQuLL0KTDnN2sPdw= X-Google-Smtp-Source: AGHT+IG7Bdtr0EFepsHcKm1bkevwLuBIZ/CMOsuZHHNx0nraL3nqTpP7nLv7YyYXk8d7OogSm2CCv6EEB+GvHvR7W6o= X-Received: by 2002:a05:6870:fb87:b0:25e:2b75:1f89 with SMTP id 586e51a60fabf-260ef382cfdmr2056691fac.22.1721337935239; Thu, 18 Jul 2024 14:25:35 -0700 (PDT) MIME-Version: 1.0 References: <98CBD80474FA8B44BF855DF32C47DC35E9F5AB@smartserver.smartshare.dk> In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F5AB@smartserver.smartshare.dk> From: Vladimir Medvedkin Date: Thu, 18 Jul 2024 22:25:24 +0100 Message-ID: Subject: Re: IPv6 APIs rework To: =?UTF-8?Q?Morten_Br=C3=B8rup?= Cc: Robin Jarry , dev@dpdk.org, Sunil Kumar Kori , Rakesh Kudurumalla , Vladimir Medvedkin , Wisam Jaddo , Cristian Dumitrescu , Konstantin Ananyev , Akhil Goyal , Fan Zhang , Bruce Richardson , Yipeng Wang , Sameh Gobriel , Nithin Dabilpuram , Kiran Kumar K , Satha Rao , Harman Kalra , Ankur Dwivedi , Anoob Joseph , Tejasree Kondoj , Gagandeep Singh , Hemant Agrawal , Ajit Khaparde , Somnath Kotur , Chas Williams , "Min Hu (Connor)" , Potnuri Bharat Teja , Sachin Saxena , Ziyang Xuan , Xiaoyun Wang , Jie Hai , Yisen Zhuang , Jingjing Wu , Dariusz Sosnowski , Viacheslav Ovsiienko , Bing Zhao , Ori Kam , Suanming Mou , Matan Azrad , Chaoyong He , Devendra Singh Rawat , Alok Prasad , Andrew Rybchenko , Stephen Hemminger , Jiawen Wu , Jian Wang , Thomas Monjalon , Ferruh Yigit , Jiayu Hu , Pavan Nikhilesh , Maxime Coquelin , Chenbo Xia Content-Type: multipart/alternative; boundary="0000000000002b233f061d8c38fa" X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org --0000000000002b233f061d8c38fa Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Robin, Thanks, that is a good idea. =D1=87=D1=82, 18 =D0=B8=D1=8E=D0=BB. 2024=E2=80=AF=D0=B3. =D0=B2 21:27, Mor= ten Br=C3=B8rup : > > From: Robin Jarry [mailto:rjarry@redhat.com] > > > > Hi folks, > > > > while working on IPv6 support for grout [1], I noticed that all DPDK > > IPv6 APIs used fixed sized arrays in the route lookup functions [2]. > > > > int rte_fib6_lookup_bulk(struct rte_fib6 *fib, > > uint8_t ips[][RTE_FIB6_IPV6_ADDR_SIZE], > > uint64_t *next_hops, > > int n); > > > > If I'm not mistaken, using sized arrays in function signatures is only > > for documentation purposes and does not result in any specific compiler > > checks. In the above example, the ips parameter is considered as a plai= n > > old `uint8_t **` pointer. > > > > Also, not having a dedicated type for IPv6 addresses requires obscure > > pointer arithmetic [3] and casting [4]. > > > > I'd like to introduce a real IPv6 address structure that has the same > > alignment than a dumb `uint8_t *` pointer but has an union to ease > > casting and most importantly presents the whole thing as an explicit > > typed structure: > > > > #define RTE_IPV6_ADDR_SIZE 16 > > > > struct rte_ipv6_addr { > > union { > > uint8_t u8[RTE_IPV6_ADDR_SIZE]; > > uint16_t u16[RTE_IPV6_ADDR_SIZE / sizeof(uint16_t)]; > > uint32_t u32[RTE_IPV6_ADDR_SIZE / sizeof(uint32_t)]; > > uint64_t u64[RTE_IPV6_ADDR_SIZE / sizeof(uint64_t)]; > > }; > > } __rte_packed __rte_aligned(1); > > > > This would require some breakage of the APIs but I think it would > > benefit code readability and maintainability in the long term. > > In short: Although I like the idea of a unified IPv6 address type very > much, I'm not sure consensus can be reached about the optimal alignment o= f > such a type. > > The long version: > > Please consider this proposal in a broader perspective. > > The IPv4 FIB lookup takes an uint32_t array, so the IPv4 address type her= e > is 4 byte aligned: uint32_t *ips > Generally, uint32_t or rte_be32_t is used for IPv4 addresses, and both > these types are 4 byte aligned. In other words: IPv4 addresses are > considered 4 byte aligned by DPDK. > > I don't think it is similarly simple for IPv6 addresses. > > The alignment of IPv6 addresses may depend on how/where they are used, > e.g.: > 1. For the FIB library, it might be good for vector implementations to > have the IPv6 addresses naturally aligned (i.e. 16 byte aligned), like th= e > uint128_t/__int128/__m128i type (or the rte_xmm_t type [XMM]). Furthermor= e, > a simple integer type (uint128_t equivalent) might be preferable in this > API. > I think alignment should be 1 since in FIB6 users usually don't copy IPv6 address and just provide a pointer to the memory inside the packet. Current vector implementation loads IPv6 addresses using unaligned access ( _mm512_loadu_si512) so it doesn't rely on alignment. > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned, > they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte > aligned. > Not necessary, if Ethernet frame in mbuf starts on 8b aligned address, then IPv6 is aligned only by 2 bytes. > I fear that broadly dumbing down the IPv6 address type to always use 1 > byte alignment could potentially introduce unwanted performance penalties > (now or in the future). We didn't do it for IPv4 addresses, so let's not = do > it for IPv6 addresses. > > Perhaps we could use the lowest "non-exotic" (considering the use of IPv6 > addresses) alignment, which I would guess is 8 byte - as in the IPv6 pack= et > header. > For reference, Ethernet addresses are defined as 2 byte aligned [ETH]. > > [XMM]: > https://elixir.bootlin.com/dpdk/v24.03/source/lib/eal/x86/include/rte_vec= t.h#L42 > [ETH]: > https://elixir.bootlin.com/dpdk/v24.07-rc2/source/lib/net/rte_ether.h#L74 > > > > > int rte_fib6_lookup_bulk(struct rte_fib6 *fib, > > const struct rte_ipv6_addr *ips, > > uint64_t *next_hops, > > int n); > > > > I already have a semi-working draft and am in the process of splitting > > the changes into small chunks to make them easier to review. > > > > https://github.com/DPDK/dpdk/compare/main...rjarry:dpdk:ipv6-address- > > rework > > > > Is that something that would be of interest? If yes, I would like to > > announce API breakage before the release of 24.07 so that the changes > > can be integrated into 24.11. > > > > Cheers! > > > > [1] https://github.com/rjarry/grout > > [2] > > https://doc.dpdk.org/api/rte__fib6_8h.html#a924678410ccb9551cda3e75d742= a > > 11e3 > > [3] https://git.dpdk.org/dpdk/tree/lib/fib/trie_avx512.c?h=3Dv24.07- > > rc2#n340 > > [4] https://git.dpdk.org/dpdk/tree/lib/hash/rte_thash.h?h=3Dv24.07- > > rc2#n156 > > > > -- > > Robin > > --=20 Regards, Vladimir --0000000000002b233f061d8c38fa Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Robin,

Thanks, that= is a good idea.



=D1=87= =D1=82, 18 =D0=B8=D1=8E=D0=BB. 2024=E2=80=AF=D0=B3. =D0=B2 21:27, Morten Br= =C3=B8rup <mb@smartsharesyst= ems.com>:
> From: Robin Jarry [mailto:<= a href=3D"mailto:rjarry@redhat.com" target=3D"_blank">rjarry@redhat.com= ]
>
> Hi folks,
>
> while working on IPv6 support for grout [1], I noticed that all DPDK > IPv6 APIs used fixed sized arrays in the route lookup functions [2]. >
>=C2=A0 int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0uint8_t ips[][RTE_FIB6_IPV6_ADDR_SIZE],
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0uint64_t *next_hops,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0int n);
>
> If I'm not mistaken, using sized arrays in function signatures is = only
> for documentation purposes and does not result in any specific compile= r
> checks. In the above example, the ips parameter is considered as a pla= in
> old `uint8_t **` pointer.
>
> Also, not having a dedicated type for IPv6 addresses requires obscure<= br> > pointer arithmetic [3] and casting [4].
>
> I'd like to introduce a real IPv6 address structure that has the s= ame
> alignment than a dumb `uint8_t *` pointer but has an union to ease
> casting and most importantly presents the whole thing as an explicit > typed structure:
>
>=C2=A0 =C2=A0 =C2=A0#define RTE_IPV6_ADDR_SIZE 16
>
>=C2=A0 =C2=A0 =C2=A0struct rte_ipv6_addr {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0union {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0uint8_t u8[RTE_IPV6_ADD= R_SIZE];
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0uint16_t u16[RTE_IPV6_A= DDR_SIZE / sizeof(uint16_t)];
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0uint32_t u32[RTE_IPV6_A= DDR_SIZE / sizeof(uint32_t)];
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0uint64_t u64[RTE_IPV6_A= DDR_SIZE / sizeof(uint64_t)];
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0};
>=C2=A0 =C2=A0 =C2=A0} __rte_packed __rte_aligned(1);
>
> This would require some breakage of the APIs but I think it would
> benefit code readability and maintainability in the long term.

In short: Although I like the idea of a unified IPv6 address type very much= , I'm not sure consensus can be reached about the optimal alignment of = such a type.

The long version:

Please consider this proposal in a broader perspective.

The IPv4 FIB lookup takes an uint32_t array, so the IPv4 address type here = is 4 byte aligned: uint32_t *ips
Generally, uint32_t or rte_be32_t is used for IPv4 addresses, and both thes= e types are 4 byte aligned. In other words: IPv4 addresses are considered 4= byte aligned by DPDK.

I don't think it is similarly simple for IPv6 addresses.

The alignment of IPv6 addresses may depend on how/where they are used, e.g.= :
1. For the FIB library, it might be good for vector implementations to have= the IPv6 addresses naturally aligned (i.e. 16 byte aligned), like the uint= 128_t/__int128/__m128i type (or the rte_xmm_t type [XMM]). Furthermore, a s= imple integer type (uint128_t equivalent) might be preferable in this API.<= br>

I think alignment should be 1 sinc= e in FIB6 users usually don't copy IPv6 address and just provide a poin= ter to the memory inside the packet. Current vector implementation loads IP= v6 addresses using unaligned access (_mm512_loadu_si512)= so it doesn't rely on alignment.
=C2=A0
2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned, t= hey are 8 byte aligned. So we cannot make the IPv6 address type 16 byte ali= gned.
Not necessary, if Ethernet frame in mbuf starts = on 8b aligned address, then IPv6 is aligned only by 2 bytes.

=

I fear that broadly dumbing down the IPv6 address type to always use 1 byte= alignment could potentially introduce unwanted performance penalties (now = or in the future). We didn't do it for IPv4 addresses, so let's not= do it for IPv6 addresses.

Perhaps we could use the lowest "non-exotic" (considering the use= of IPv6 addresses) alignment, which I would guess is 8 byte - as in the IP= v6 packet header.
For reference, Ethernet addresses are defined as 2 byte aligned [ETH].

[XMM]: https://elixi= r.bootlin.com/dpdk/v24.03/source/lib/eal/x86/include/rte_vect.h#L42
[ETH]: https://elixir.bootl= in.com/dpdk/v24.07-rc2/source/lib/net/rte_ether.h#L74

>
>=C2=A0 int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0const struct rte_ipv6_addr *ips,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0uint64_t *next_hops,
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0int n);
>
> I already have a semi-working draft and am in the process of splitting=
> the changes into small chunks to make them easier to review.
>
> https://github.com/DPDK/dp= dk/compare/main...rjarry:dpdk:ipv6-address-
> rework
>
> Is that something that would be of interest? If yes, I would like to > announce API breakage before the release of 24.07 so that the changes<= br> > can be integrated into 24.11.
>
> Cheers!
>
> [1] https://github.com/rjarry/grout
> [2]
> https://doc.dpdk.org/ap= i/rte__fib6_8h.html#a924678410ccb9551cda3e75d742a
> 11e3
> [3] https://git.dpdk.org/dpdk/= tree/lib/fib/trie_avx512.c?h=3Dv24.07-
> rc2#n340
> [4] https://git.dpdk.org/dpdk/= tree/lib/hash/rte_thash.h?h=3Dv24.07-
> rc2#n156
>
> --
> Robin



--
Regards,
Vladimir
--0000000000002b233f061d8c38fa--