DPDK patches and discussions
 help / color / mirror / Atom feed
* IPv6 APIs rework
@ 2024-07-18 15:03 Robin Jarry
  2024-07-18 20:27 ` Morten Brørup
  0 siblings, 1 reply; 21+ messages in thread
From: Robin Jarry @ 2024-07-18 15:03 UTC (permalink / raw)
  To: dev
  Cc: Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Stephen Hemminger, Jiawen Wu, Jian Wang,
	Thomas Monjalon, Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh,
	Maxime Coquelin, Chenbo Xia

Hi folks,

while working on IPv6 support for grout [1], I noticed that all DPDK 
IPv6 APIs used fixed sized arrays in the route lookup functions [2].

 int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
                          uint8_t ips[][RTE_FIB6_IPV6_ADDR_SIZE],
                          uint64_t *next_hops,
                          int n);

If I'm not mistaken, using sized arrays in function signatures is only 
for documentation purposes and does not result in any specific compiler 
checks. In the above example, the ips parameter is considered as a plain 
old `uint8_t **` pointer.

Also, not having a dedicated type for IPv6 addresses requires obscure 
pointer arithmetic [3] and casting [4].

I'd like to introduce a real IPv6 address structure that has the same 
alignment than a dumb `uint8_t *` pointer but has an union to ease 
casting and most importantly presents the whole thing as an explicit 
typed structure:

    #define RTE_IPV6_ADDR_SIZE 16

    struct rte_ipv6_addr {
        union {
            uint8_t u8[RTE_IPV6_ADDR_SIZE];
            uint16_t u16[RTE_IPV6_ADDR_SIZE / sizeof(uint16_t)];
            uint32_t u32[RTE_IPV6_ADDR_SIZE / sizeof(uint32_t)];
            uint64_t u64[RTE_IPV6_ADDR_SIZE / sizeof(uint64_t)];
        };
    } __rte_packed __rte_aligned(1);

This would require some breakage of the APIs but I think it would 
benefit code readability and maintainability in the long term.

 int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
                          const struct rte_ipv6_addr *ips,
                          uint64_t *next_hops,
                          int n);

I already have a semi-working draft and am in the process of splitting 
the changes into small chunks to make them easier to review.

https://github.com/DPDK/dpdk/compare/main...rjarry:dpdk:ipv6-address-rework

Is that something that would be of interest? If yes, I would like to 
announce API breakage before the release of 24.07 so that the changes 
can be integrated into 24.11.

Cheers!

[1] https://github.com/rjarry/grout
[2] https://doc.dpdk.org/api/rte__fib6_8h.html#a924678410ccb9551cda3e75d742a11e3
[3] https://git.dpdk.org/dpdk/tree/lib/fib/trie_avx512.c?h=v24.07-rc2#n340
[4] https://git.dpdk.org/dpdk/tree/lib/hash/rte_thash.h?h=v24.07-rc2#n156

-- 
Robin


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: IPv6 APIs rework
  2024-07-18 15:03 IPv6 APIs rework Robin Jarry
@ 2024-07-18 20:27 ` Morten Brørup
  2024-07-18 21:15   ` Stephen Hemminger
  2024-07-18 21:25   ` Vladimir Medvedkin
  0 siblings, 2 replies; 21+ messages in thread
From: Morten Brørup @ 2024-07-18 20:27 UTC (permalink / raw)
  To: Robin Jarry, dev
  Cc: Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Stephen Hemminger, Jiawen Wu, Jian Wang,
	Thomas Monjalon, Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh,
	Maxime Coquelin, Chenbo Xia

> From: Robin Jarry [mailto:rjarry@redhat.com]
> 
> Hi folks,
> 
> while working on IPv6 support for grout [1], I noticed that all DPDK
> IPv6 APIs used fixed sized arrays in the route lookup functions [2].
> 
>  int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
>                           uint8_t ips[][RTE_FIB6_IPV6_ADDR_SIZE],
>                           uint64_t *next_hops,
>                           int n);
> 
> If I'm not mistaken, using sized arrays in function signatures is only
> for documentation purposes and does not result in any specific compiler
> checks. In the above example, the ips parameter is considered as a plain
> old `uint8_t **` pointer.
> 
> Also, not having a dedicated type for IPv6 addresses requires obscure
> pointer arithmetic [3] and casting [4].
> 
> I'd like to introduce a real IPv6 address structure that has the same
> alignment than a dumb `uint8_t *` pointer but has an union to ease
> casting and most importantly presents the whole thing as an explicit
> typed structure:
> 
>     #define RTE_IPV6_ADDR_SIZE 16
> 
>     struct rte_ipv6_addr {
>         union {
>             uint8_t u8[RTE_IPV6_ADDR_SIZE];
>             uint16_t u16[RTE_IPV6_ADDR_SIZE / sizeof(uint16_t)];
>             uint32_t u32[RTE_IPV6_ADDR_SIZE / sizeof(uint32_t)];
>             uint64_t u64[RTE_IPV6_ADDR_SIZE / sizeof(uint64_t)];
>         };
>     } __rte_packed __rte_aligned(1);
> 
> This would require some breakage of the APIs but I think it would
> benefit code readability and maintainability in the long term.

In short: Although I like the idea of a unified IPv6 address type very much, I'm not sure consensus can be reached about the optimal alignment of such a type.

The long version:

Please consider this proposal in a broader perspective.

The IPv4 FIB lookup takes an uint32_t array, so the IPv4 address type here is 4 byte aligned: uint32_t *ips
Generally, uint32_t or rte_be32_t is used for IPv4 addresses, and both these types are 4 byte aligned. In other words: IPv4 addresses are considered 4 byte aligned by DPDK.

I don't think it is similarly simple for IPv6 addresses.

The alignment of IPv6 addresses may depend on how/where they are used, e.g.:
1. For the FIB library, it might be good for vector implementations to have the IPv6 addresses naturally aligned (i.e. 16 byte aligned), like the uint128_t/__int128/__m128i type (or the rte_xmm_t type [XMM]). Furthermore, a simple integer type (uint128_t equivalent) might be preferable in this API.
2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned, they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte aligned.

I fear that broadly dumbing down the IPv6 address type to always use 1 byte alignment could potentially introduce unwanted performance penalties (now or in the future). We didn't do it for IPv4 addresses, so let's not do it for IPv6 addresses.

Perhaps we could use the lowest "non-exotic" (considering the use of IPv6 addresses) alignment, which I would guess is 8 byte - as in the IPv6 packet header.
For reference, Ethernet addresses are defined as 2 byte aligned [ETH].

[XMM]: https://elixir.bootlin.com/dpdk/v24.03/source/lib/eal/x86/include/rte_vect.h#L42
[ETH]: https://elixir.bootlin.com/dpdk/v24.07-rc2/source/lib/net/rte_ether.h#L74

> 
>  int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
>                           const struct rte_ipv6_addr *ips,
>                           uint64_t *next_hops,
>                           int n);
> 
> I already have a semi-working draft and am in the process of splitting
> the changes into small chunks to make them easier to review.
> 
> https://github.com/DPDK/dpdk/compare/main...rjarry:dpdk:ipv6-address-
> rework
> 
> Is that something that would be of interest? If yes, I would like to
> announce API breakage before the release of 24.07 so that the changes
> can be integrated into 24.11.
> 
> Cheers!
> 
> [1] https://github.com/rjarry/grout
> [2]
> https://doc.dpdk.org/api/rte__fib6_8h.html#a924678410ccb9551cda3e75d742a
> 11e3
> [3] https://git.dpdk.org/dpdk/tree/lib/fib/trie_avx512.c?h=v24.07-
> rc2#n340
> [4] https://git.dpdk.org/dpdk/tree/lib/hash/rte_thash.h?h=v24.07-
> rc2#n156
> 
> --
> Robin


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-18 20:27 ` Morten Brørup
@ 2024-07-18 21:15   ` Stephen Hemminger
  2024-07-18 21:40     ` Robin Jarry
  2024-07-18 21:25   ` Vladimir Medvedkin
  1 sibling, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2024-07-18 21:15 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Robin Jarry, dev, Sunil Kumar Kori, Rakesh Kudurumalla,
	Vladimir Medvedkin, Wisam Jaddo, Cristian Dumitrescu,
	Konstantin Ananyev, Akhil Goyal, Fan Zhang, Bruce Richardson,
	Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K,
	Satha Rao, Harman Kalra, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal, Ajit Khaparde,
	Somnath Kotur, Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

On Thu, 18 Jul 2024 22:27:03 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Robin Jarry [mailto:rjarry@redhat.com]
> > 
> > Hi folks,
> > 
> > while working on IPv6 support for grout [1], I noticed that all DPDK
> > IPv6 APIs used fixed sized arrays in the route lookup functions [2].
> > 
> >  int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
> >                           uint8_t ips[][RTE_FIB6_IPV6_ADDR_SIZE],
> >                           uint64_t *next_hops,
> >                           int n);
> > 
> > If I'm not mistaken, using sized arrays in function signatures is only
> > for documentation purposes and does not result in any specific compiler
> > checks. In the above example, the ips parameter is considered as a plain
> > old `uint8_t **` pointer.
> > 
> > Also, not having a dedicated type for IPv6 addresses requires obscure
> > pointer arithmetic [3] and casting [4].
> > 
> > I'd like to introduce a real IPv6 address structure that has the same
> > alignment than a dumb `uint8_t *` pointer but has an union to ease
> > casting and most importantly presents the whole thing as an explicit
> > typed structure:
> > 
> >     #define RTE_IPV6_ADDR_SIZE 16
> > 
> >     struct rte_ipv6_addr {
> >         union {
> >             uint8_t u8[RTE_IPV6_ADDR_SIZE];
> >             uint16_t u16[RTE_IPV6_ADDR_SIZE / sizeof(uint16_t)];
> >             uint32_t u32[RTE_IPV6_ADDR_SIZE / sizeof(uint32_t)];
> >             uint64_t u64[RTE_IPV6_ADDR_SIZE / sizeof(uint64_t)];
> >         };
> >     } __rte_packed __rte_aligned(1);
> > 
> > This would require some breakage of the APIs but I think it would
> > benefit code readability and maintainability in the long term.  
> 
> In short: Although I like the idea of a unified IPv6 address type very much, I'm not sure consensus can be reached about the optimal alignment of such a type.
> 
> The long version:
> 
> Please consider this proposal in a broader perspective.
> 
> The IPv4 FIB lookup takes an uint32_t array, so the IPv4 address type here is 4 byte aligned: uint32_t *ips
> Generally, uint32_t or rte_be32_t is used for IPv4 addresses, and both these types are 4 byte aligned. In other words: IPv4 addresses are considered 4 byte aligned by DPDK.
> 
> I don't think it is similarly simple for IPv6 addresses.
> 
> The alignment of IPv6 addresses may depend on how/where they are used, e.g.:
> 1. For the FIB library, it might be good for vector implementations to have the IPv6 addresses naturally aligned (i.e. 16 byte aligned), like the uint128_t/__int128/__m128i type (or the rte_xmm_t type [XMM]). Furthermore, a simple integer type (uint128_t equivalent) might be preferable in this API.
> 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned, they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte aligned.
> 
> I fear that broadly dumbing down the IPv6 address type to always use 1 byte alignment could potentially introduce unwanted performance penalties (now or in the future). We didn't do it for IPv4 addresses, so let's not do it for IPv6 addresses.
> 
> Perhaps we could use the lowest "non-exotic" (considering the use of IPv6 addresses) alignment, which I would guess is 8 byte - as in the IPv6 packet header.
> For reference, Ethernet addresses are defined as 2 byte aligned [ETH].
> 
> [XMM]: https://elixir.bootlin.com/dpdk/v24.03/source/lib/eal/x86/include/rte_vect.h#L42
> [ETH]: https://elixir.bootlin.com/dpdk/v24.07-rc2/source/lib/net/rte_ether.h#L74
> 
> > 
> >  int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
> >                           const struct rte_ipv6_addr *ips,
> >                           uint64_t *next_hops,
> >                           int n);
> > 
> > I already have a semi-working draft and am in the process of splitting
> > the changes into small chunks to make them easier to review.
> > 
> > https://github.com/DPDK/dpdk/compare/main...rjarry:dpdk:ipv6-address-
> > rework
> > 
> > Is that something that would be of interest? If yes, I would like to
> > announce API breakage before the release of 24.07 so that the changes
> > can be integrated into 24.11.
> > 
> > Cheers!
> > 
> > [1] https://github.com/rjarry/grout
> > [2]
> > https://doc.dpdk.org/api/rte__fib6_8h.html#a924678410ccb9551cda3e75d742a
> > 11e3
> > [3] https://git.dpdk.org/dpdk/tree/lib/fib/trie_avx512.c?h=v24.07-
> > rc2#n340
> > [4] https://git.dpdk.org/dpdk/tree/lib/hash/rte_thash.h?h=v24.07-
> > rc2#n156
> > 
> > --
> > Robin  
> 

If you look at the standard netinet/in.h the storage of IPv6 addresses
is in in6_addr. DPDK has always wanted to do its own thing...

The in6_addr is a union with no explicit alignment.

struct in6_addr
  {
    union
      {
        uint8_t __u6_addr8[16];
        uint16_t __u6_addr16[8];
        uint32_t __u6_addr32[4];
      } __in6_u;

Better to not have explicit alignment and not have 64 bit value.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-18 20:27 ` Morten Brørup
  2024-07-18 21:15   ` Stephen Hemminger
@ 2024-07-18 21:25   ` Vladimir Medvedkin
  2024-07-18 21:34     ` Robin Jarry
  1 sibling, 1 reply; 21+ messages in thread
From: Vladimir Medvedkin @ 2024-07-18 21:25 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Robin Jarry, dev, Sunil Kumar Kori, Rakesh Kudurumalla,
	Vladimir Medvedkin, Wisam Jaddo, Cristian Dumitrescu,
	Konstantin Ananyev, Akhil Goyal, Fan Zhang, Bruce Richardson,
	Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K,
	Satha Rao, Harman Kalra, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal, Ajit Khaparde,
	Somnath Kotur, Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Stephen Hemminger, Jiawen Wu, Jian Wang,
	Thomas Monjalon, Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh,
	Maxime Coquelin, Chenbo Xia

[-- Attachment #1: Type: text/plain, Size: 5085 bytes --]

Hi Robin,

Thanks, that is a good idea.




чт, 18 июл. 2024 г. в 21:27, Morten Brørup <mb@smartsharesystems.com>:

> > From: Robin Jarry [mailto:rjarry@redhat.com]
> >
> > Hi folks,
> >
> > while working on IPv6 support for grout [1], I noticed that all DPDK
> > IPv6 APIs used fixed sized arrays in the route lookup functions [2].
> >
> >  int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
> >                           uint8_t ips[][RTE_FIB6_IPV6_ADDR_SIZE],
> >                           uint64_t *next_hops,
> >                           int n);
> >
> > If I'm not mistaken, using sized arrays in function signatures is only
> > for documentation purposes and does not result in any specific compiler
> > checks. In the above example, the ips parameter is considered as a plain
> > old `uint8_t **` pointer.
> >
> > Also, not having a dedicated type for IPv6 addresses requires obscure
> > pointer arithmetic [3] and casting [4].
> >
> > I'd like to introduce a real IPv6 address structure that has the same
> > alignment than a dumb `uint8_t *` pointer but has an union to ease
> > casting and most importantly presents the whole thing as an explicit
> > typed structure:
> >
> >     #define RTE_IPV6_ADDR_SIZE 16
> >
> >     struct rte_ipv6_addr {
> >         union {
> >             uint8_t u8[RTE_IPV6_ADDR_SIZE];
> >             uint16_t u16[RTE_IPV6_ADDR_SIZE / sizeof(uint16_t)];
> >             uint32_t u32[RTE_IPV6_ADDR_SIZE / sizeof(uint32_t)];
> >             uint64_t u64[RTE_IPV6_ADDR_SIZE / sizeof(uint64_t)];
> >         };
> >     } __rte_packed __rte_aligned(1);
> >
> > This would require some breakage of the APIs but I think it would
> > benefit code readability and maintainability in the long term.
>
> In short: Although I like the idea of a unified IPv6 address type very
> much, I'm not sure consensus can be reached about the optimal alignment of
> such a type.
>
> The long version:
>
> Please consider this proposal in a broader perspective.
>
> The IPv4 FIB lookup takes an uint32_t array, so the IPv4 address type here
> is 4 byte aligned: uint32_t *ips
> Generally, uint32_t or rte_be32_t is used for IPv4 addresses, and both
> these types are 4 byte aligned. In other words: IPv4 addresses are
> considered 4 byte aligned by DPDK.
>
> I don't think it is similarly simple for IPv6 addresses.
>
> The alignment of IPv6 addresses may depend on how/where they are used,
> e.g.:
> 1. For the FIB library, it might be good for vector implementations to
> have the IPv6 addresses naturally aligned (i.e. 16 byte aligned), like the
> uint128_t/__int128/__m128i type (or the rte_xmm_t type [XMM]). Furthermore,
> a simple integer type (uint128_t equivalent) might be preferable in this
> API.
>

I think alignment should be 1 since in FIB6 users usually don't copy IPv6
address and just provide a pointer to the memory inside the packet. Current
vector implementation loads IPv6 addresses using unaligned access (
_mm512_loadu_si512) so it doesn't rely on alignment.


> 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned,
> they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte
> aligned.
>
Not necessary, if Ethernet frame in mbuf starts on 8b aligned address, then
IPv6 is aligned only by 2 bytes.


> I fear that broadly dumbing down the IPv6 address type to always use 1
> byte alignment could potentially introduce unwanted performance penalties
> (now or in the future). We didn't do it for IPv4 addresses, so let's not do
> it for IPv6 addresses.
>
> Perhaps we could use the lowest "non-exotic" (considering the use of IPv6
> addresses) alignment, which I would guess is 8 byte - as in the IPv6 packet
> header.
> For reference, Ethernet addresses are defined as 2 byte aligned [ETH].
>
> [XMM]:
> https://elixir.bootlin.com/dpdk/v24.03/source/lib/eal/x86/include/rte_vect.h#L42
> [ETH]:
> https://elixir.bootlin.com/dpdk/v24.07-rc2/source/lib/net/rte_ether.h#L74
>
> >
> >  int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
> >                           const struct rte_ipv6_addr *ips,
> >                           uint64_t *next_hops,
> >                           int n);
> >
> > I already have a semi-working draft and am in the process of splitting
> > the changes into small chunks to make them easier to review.
> >
> > https://github.com/DPDK/dpdk/compare/main...rjarry:dpdk:ipv6-address-
> > rework
> >
> > Is that something that would be of interest? If yes, I would like to
> > announce API breakage before the release of 24.07 so that the changes
> > can be integrated into 24.11.
> >
> > Cheers!
> >
> > [1] https://github.com/rjarry/grout
> > [2]
> > https://doc.dpdk.org/api/rte__fib6_8h.html#a924678410ccb9551cda3e75d742a
> > 11e3
> > [3] https://git.dpdk.org/dpdk/tree/lib/fib/trie_avx512.c?h=v24.07-
> > rc2#n340
> > [4] https://git.dpdk.org/dpdk/tree/lib/hash/rte_thash.h?h=v24.07-
> > rc2#n156
> >
> > --
> > Robin
>
>

-- 
Regards,
Vladimir

[-- Attachment #2: Type: text/html, Size: 7520 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-18 21:25   ` Vladimir Medvedkin
@ 2024-07-18 21:34     ` Robin Jarry
  2024-07-19  8:25       ` Konstantin Ananyev
  2024-07-19  9:12       ` Morten Brørup
  0 siblings, 2 replies; 21+ messages in thread
From: Robin Jarry @ 2024-07-18 21:34 UTC (permalink / raw)
  To: Vladimir Medvedkin, Morten Brørup
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Stephen Hemminger, Jiawen Wu, Jian Wang,
	Thomas Monjalon, Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh,
	Maxime Coquelin, Chenbo Xia

Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> I think alignment should be 1 since in FIB6 users usually don't copy IPv6
> address and just provide a pointer to the memory inside the packet. Current
> vector implementation loads IPv6 addresses using unaligned access (
> _mm512_loadu_si512) so it doesn't rely on alignment.

Yes, my intention was exactly that, being able to map that structure 
directly in packets without copying them on the stack.

> > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned,
> > they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte
> > aligned.

> Not necessary, if Ethernet frame in mbuf starts on 8b aligned address, then
> IPv6 is aligned only by 2 bytes.

We probably could safely say that aligning on 2 bytes would be OK. But 
is there any benefit, performance wise, in doing so? Keeping the same 
alignment as before the change would at least make it ABI compatible.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-18 21:15   ` Stephen Hemminger
@ 2024-07-18 21:40     ` Robin Jarry
  0 siblings, 0 replies; 21+ messages in thread
From: Robin Jarry @ 2024-07-18 21:40 UTC (permalink / raw)
  To: Stephen Hemminger, Morten Brørup
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

Stephen Hemminger, Jul 18, 2024 at 23:15:
> If you look at the standard netinet/in.h the storage of IPv6 addresses
> is in in6_addr. DPDK has always wanted to do its own thing...
>
> The in6_addr is a union with no explicit alignment.
>
> struct in6_addr
>   {
>     union
>       {
>         uint8_t __u6_addr8[16];
>         uint16_t __u6_addr16[8];
>         uint32_t __u6_addr32[4];
>       } __in6_u;
>
> Better to not have explicit alignment and not have 64 bit value.

The main reason why I didn't use the standard POSIX type is that it has 
an alignment of 4 which means it cannot always be mapped directly to 
packets in memory depending on the encapsulating protocol.

Also, ip->__in6_u.__u6_addr8 is really ugly as a field name, even if 
the "helper" macros (ip->s6_addr8) make them a bit better :)

What do you have against adding a 64 bit value in the union?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: IPv6 APIs rework
  2024-07-18 21:34     ` Robin Jarry
@ 2024-07-19  8:25       ` Konstantin Ananyev
  2024-07-19  9:12       ` Morten Brørup
  1 sibling, 0 replies; 21+ messages in thread
From: Konstantin Ananyev @ 2024-07-19  8:25 UTC (permalink / raw)




> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > I think alignment should be 1 since in FIB6 users usually don't copy IPv6
> > address and just provide a pointer to the memory inside the packet. Current
> > vector implementation loads IPv6 addresses using unaligned access (
> > _mm512_loadu_si512) so it doesn't rely on alignment.
> 
> Yes, my intention was exactly that, being able to map that structure
> directly in packets without copying them on the stack.
> 
> > > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned,
> > > they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte
> > > aligned.
> 
> > Not necessary, if Ethernet frame in mbuf starts on 8b aligned address, then
> > IPv6 is aligned only by 2 bytes.
> 
> We probably could safely say that aligning on 2 bytes would be OK. But
> is there any benefit, performance wise, in doing so? Keeping the same
> alignment as before the change would at least make it ABI compatible.

I am also not sure that this extra alignment (2B or 4B) here will give us any benefit,
while it most likely will introduce extra restrictions. 
AFAIK, right now we do have ipv6 as array of plain chars, and there were no much
complaints about it.
So I am for keeping it 1B aligned.
Overall proposal looks reasonable to me... might be 24.11 is a good opportunity for such change.
Konstantin  

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: IPv6 APIs rework
  2024-07-18 21:34     ` Robin Jarry
  2024-07-19  8:25       ` Konstantin Ananyev
@ 2024-07-19  9:12       ` Morten Brørup
  2024-07-19 10:02         ` Robin Jarry
  2024-07-19 10:41         ` Medvedkin, Vladimir
  1 sibling, 2 replies; 21+ messages in thread
From: Morten Brørup @ 2024-07-19  9:12 UTC (permalink / raw)
  To: Robin Jarry, Vladimir Medvedkin, stephen
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Stephen Hemminger, Jiawen Wu, Jian Wang,
	Thomas Monjalon, Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh,
	Maxime Coquelin, Chenbo Xia

> From: Robin Jarry [mailto:rjarry@redhat.com]
> 
> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > I think alignment should be 1 since in FIB6 users usually don't copy
> IPv6
> > address and just provide a pointer to the memory inside the packet.

How can they do that? The bulk lookup function takes an array of IPv6 addresses, not an array of pointers to IPv6 addresses.

What you are suggesting only works with single lookup, not bulk lookup.

> Current
> > vector implementation loads IPv6 addresses using unaligned access (
> > _mm512_loadu_si512) so it doesn't rely on alignment.
> 
> Yes, my intention was exactly that, being able to map that structure
> directly in packets without copying them on the stack.

This would require changing the bulk lookup API to take an array of pointers instead of an array of IPv6 addresses.

It would be acceptable to introduce a new single address lookup function, taking a pointer to an unaligned (or 2 byte aligned) IPv6 address for the single lookup use cases mentioned above.

> 
> > > 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte
> aligned,
> > > they are 8 byte aligned. So we cannot make the IPv6 address type 16
> byte
> > > aligned.
> 
> > Not necessary, if Ethernet frame in mbuf starts on 8b aligned address,
> then
> > IPv6 is aligned only by 2 bytes.
> 
> We probably could safely say that aligning on 2 bytes would be OK. But
> is there any benefit, performance wise, in doing so? Keeping the same
> alignment as before the change would at least make it ABI compatible.

I'm not worried about the IPv6 FIB functions. This proposal introduces a generic IPv6 address type for *all of DPDK*, so you need to consider *all* aspects, not just one library!

There may be current or future CPUs, where alignment makes a performance difference. Do all architectures support unaligned 128 bit access at 100 % similar performance to aligned 128 bit access? I think not!
E.g. on X86 architecture, load/store across a cache boundary has a performance impact. If the type is explicitly unaligned, an instance on the stack (i.e. a local variable holding an IPv6 address) might cross a cache boundary, whereas an 128 bit aligned instance on the stack is guaranteed not to cross a cache boundary.

The generic IPv4 address type is natively aligned (i.e. 4 byte). When accessing an IPv4 address in an IPv4 header following an Ethernet header, it is not 4 byte aligned, so this is an *exception* from the general case, and must be treated as such. You don't want to make the general type unaligned (and thus inefficient) everywhere it is being used, only because a few use cases require the unaligned form.

The same principle must apply to the IPv6 address type. Let's make the generic type natively aligned (16 byte). And you might also offer an explicitly unaligned type for the exception use cases requiring unaligned access.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-19  9:12       ` Morten Brørup
@ 2024-07-19 10:02         ` Robin Jarry
  2024-07-19 10:09           ` Bruce Richardson
  2024-07-19 10:46           ` Morten Brørup
  2024-07-19 10:41         ` Medvedkin, Vladimir
  1 sibling, 2 replies; 21+ messages in thread
From: Robin Jarry @ 2024-07-19 10:02 UTC (permalink / raw)
  To: Morten Brørup, Vladimir Medvedkin, stephen
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

Morten Brørup, Jul 19, 2024 at 11:12:
> > Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > > I think alignment should be 1 since in FIB6 users usually don't 
> > > copy IPv6 address and just provide a pointer to the memory inside 
> > > the packet.
>
> How can they do that? The bulk lookup function takes an array of IPv6 
> addresses, not an array of pointers to IPv6 addresses.
>
> What you are suggesting only works with single lookup, not bulk 
> lookup.

Indeed for bulk lookup, you need to copy addresses on the stack.

> > Yes, my intention was exactly that, being able to map that structure 
> > directly in packets without copying them on the stack.
>
> This would require changing the bulk lookup API to take an array of 
> pointers instead of an array of IPv6 addresses.
>
> It would be acceptable to introduce a new single address lookup 
> function, taking a pointer to an unaligned (or 2 byte aligned) IPv6 
> address for the single lookup use cases mentioned above.

That would require two different IPv6 structures. I would prefer it we 
could avoid that. Or the unaligned lookup API needs to take a simple 
`const uint8_t *` parameter.

> I'm not worried about the IPv6 FIB functions. This proposal introduces 
> a generic IPv6 address type for *all of DPDK*, so you need to consider 
> *all* aspects, not just one library!
>
> There may be current or future CPUs, where alignment makes 
> a performance difference. Do all architectures support unaligned 128 
> bit access at 100 % similar performance to aligned 128 bit access? 
> I think not!
> E.g. on X86 architecture, load/store across a cache boundary has 
> a performance impact. If the type is explicitly unaligned, an instance 
> on the stack (i.e. a local variable holding an IPv6 address) might 
> cross a cache boundary, whereas an 128 bit aligned instance on the 
> stack is guaranteed not to cross a cache boundary.
>
> The generic IPv4 address type is natively aligned (i.e. 4 byte). When 
> accessing an IPv4 address in an IPv4 header following an Ethernet 
> header, it is not 4 byte aligned, so this is an *exception* from the 
> general case, and must be treated as such. You don't want to make the 
> general type unaligned (and thus inefficient) everywhere it is being 
> used, only because a few use cases require the unaligned form.

I think the main difference is that you almost never pass IPv4 addresses 
as reference but always as values. So alignment does not matter.

> The same principle must apply to the IPv6 address type. Let's make the 
> generic type natively aligned (16 byte). And you might also offer an 
> explicitly unaligned type for the exception use cases requiring 
> unaligned access.

The main issue with this is that you would not be able to use that type 
in the IPv6 header structure to map it to mbuf data. That leaves us with 
two options:

1) Keep a single unaligned IPv6 type and hope for the best with 
   performance. It will not be different from the current state of 
   things where every IPv6 is a uint8_t pointer.

2) Have two IPv6 types, one 16 bytes aligned, and another one 1 byte 
   aligned. The main issue with that second approach is that users may 
   get confused about which one to use and when.

I would prefer to keep it simple at first and go with option 1). We can 
always revisit that later and introduce an aligned IPv6 type for certain 
use cases.

What do you think?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-19 10:02         ` Robin Jarry
@ 2024-07-19 10:09           ` Bruce Richardson
  2024-07-19 10:46           ` Morten Brørup
  1 sibling, 0 replies; 21+ messages in thread
From: Bruce Richardson @ 2024-07-19 10:09 UTC (permalink / raw)
  To: Robin Jarry
  Cc: Morten Brørup, Vladimir Medvedkin, stephen, dev,
	Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Yipeng Wang, Sameh Gobriel,
	Nithin Dabilpuram, Kiran Kumar K, Satha Rao, Harman Kalra,
	Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj, Gagandeep Singh,
	Hemant Agrawal, Ajit Khaparde, Somnath Kotur, Chas Williams,
	Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

On Fri, Jul 19, 2024 at 12:02:38PM +0200, Robin Jarry wrote:
> Morten Brørup, Jul 19, 2024 at 11:12:
> > > Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > > > I think alignment should be 1 since in FIB6 users usually don't > >
> > copy IPv6 address and just provide a pointer to the memory inside > >
> > the packet.
> > 
> > How can they do that? The bulk lookup function takes an array of IPv6
> > addresses, not an array of pointers to IPv6 addresses.
> > 
> > What you are suggesting only works with single lookup, not bulk lookup.
> 
> Indeed for bulk lookup, you need to copy addresses on the stack.
> 
> > > Yes, my intention was exactly that, being able to map that structure >
> > directly in packets without copying them on the stack.
> > 
> > This would require changing the bulk lookup API to take an array of
> > pointers instead of an array of IPv6 addresses.
> > 
> > It would be acceptable to introduce a new single address lookup
> > function, taking a pointer to an unaligned (or 2 byte aligned) IPv6
> > address for the single lookup use cases mentioned above.
> 
> That would require two different IPv6 structures. I would prefer it we could
> avoid that. Or the unaligned lookup API needs to take a simple `const
> uint8_t *` parameter.
> 
> > I'm not worried about the IPv6 FIB functions. This proposal introduces a
> > generic IPv6 address type for *all of DPDK*, so you need to consider
> > *all* aspects, not just one library!
> > 
> > There may be current or future CPUs, where alignment makes a performance
> > difference. Do all architectures support unaligned 128 bit access at 100
> > % similar performance to aligned 128 bit access? I think not!
> > E.g. on X86 architecture, load/store across a cache boundary has a
> > performance impact. If the type is explicitly unaligned, an instance on
> > the stack (i.e. a local variable holding an IPv6 address) might cross a
> > cache boundary, whereas an 128 bit aligned instance on the stack is
> > guaranteed not to cross a cache boundary.
> > 
> > The generic IPv4 address type is natively aligned (i.e. 4 byte). When
> > accessing an IPv4 address in an IPv4 header following an Ethernet
> > header, it is not 4 byte aligned, so this is an *exception* from the
> > general case, and must be treated as such. You don't want to make the
> > general type unaligned (and thus inefficient) everywhere it is being
> > used, only because a few use cases require the unaligned form.
> 
> I think the main difference is that you almost never pass IPv4 addresses as
> reference but always as values. So alignment does not matter.
> 
> > The same principle must apply to the IPv6 address type. Let's make the
> > generic type natively aligned (16 byte). And you might also offer an
> > explicitly unaligned type for the exception use cases requiring
> > unaligned access.
> 
> The main issue with this is that you would not be able to use that type in
> the IPv6 header structure to map it to mbuf data. That leaves us with two
> options:
> 
> 1) Keep a single unaligned IPv6 type and hope for the best with
> performance. It will not be different from the current state of   things
> where every IPv6 is a uint8_t pointer.
> 
> 2) Have two IPv6 types, one 16 bytes aligned, and another one 1 byte
> aligned. The main issue with that second approach is that users may   get
> confused about which one to use and when.
> 
> I would prefer to keep it simple at first and go with option 1). We can
> always revisit that later and introduce an aligned IPv6 type for certain use
> cases.
> 
> What do you think?
>

+1 for option 1 - keep minimally aligned type. Having two types would be
confusing.

/Bruce 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-19  9:12       ` Morten Brørup
  2024-07-19 10:02         ` Robin Jarry
@ 2024-07-19 10:41         ` Medvedkin, Vladimir
  1 sibling, 0 replies; 21+ messages in thread
From: Medvedkin, Vladimir @ 2024-07-19 10:41 UTC (permalink / raw)
  To: Morten Brørup, Robin Jarry, Vladimir Medvedkin, stephen
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Wisam Jaddo,
	Cristian Dumitrescu, Konstantin Ananyev, Akhil Goyal, Fan Zhang,
	Bruce Richardson, Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram,
	Kiran Kumar K, Satha Rao, Harman Kalra, Ankur Dwivedi,
	Anoob Joseph, Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal,
	Ajit Khaparde, Somnath Kotur, Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

Hi Morten,

On 19/07/2024 10:12, Morten Brørup wrote:
>> From: Robin Jarry [mailto:rjarry@redhat.com]
>>
>> Vladimir Medvedkin, Jul 18, 2024 at 23:25:
>>> I think alignment should be 1 since in FIB6 users usually don't copy
>> IPv6
>>> address and just provide a pointer to the memory inside the packet.
> How can they do that? The bulk lookup function takes an array of IPv6 addresses, not an array of pointers to IPv6 addresses.
>
> What you are suggesting only works with single lookup, not bulk lookup.

You're right, sorry, confused with an internal implementation that 
passes an array of pointers


>> Current
>>> vector implementation loads IPv6 addresses using unaligned access (
>>> _mm512_loadu_si512) so it doesn't rely on alignment.
>> Yes, my intention was exactly that, being able to map that structure
>> directly in packets without copying them on the stack.
> This would require changing the bulk lookup API to take an array of pointers instead of an array of IPv6 addresses.
>
> It would be acceptable to introduce a new single address lookup function, taking a pointer to an unaligned (or 2 byte aligned) IPv6 address for the single lookup use cases mentioned above.
>
>>>> 2. In the IPv6 packet header, the IPv6 addresses are not 16 byte
>> aligned,
>>>> they are 8 byte aligned. So we cannot make the IPv6 address type 16
>> byte
>>>> aligned.
>>> Not necessary, if Ethernet frame in mbuf starts on 8b aligned address,
>> then
>>> IPv6 is aligned only by 2 bytes.
>> We probably could safely say that aligning on 2 bytes would be OK. But
>> is there any benefit, performance wise, in doing so? Keeping the same
>> alignment as before the change would at least make it ABI compatible.
> I'm not worried about the IPv6 FIB functions. This proposal introduces a generic IPv6 address type for *all of DPDK*, so you need to consider *all* aspects, not just one library!
>
> There may be current or future CPUs, where alignment makes a performance difference. Do all architectures support unaligned 128 bit access at 100 % similar performance to aligned 128 bit access? I think not!
> E.g. on X86 architecture, load/store across a cache boundary has a performance impact. If the type is explicitly unaligned, an instance on the stack (i.e. a local variable holding an IPv6 address) might cross a cache boundary, whereas an 128 bit aligned instance on the stack is guaranteed not to cross a cache boundary.
>
> The generic IPv4 address type is natively aligned (i.e. 4 byte). When accessing an IPv4 address in an IPv4 header following an Ethernet header, it is not 4 byte aligned, so this is an *exception* from the general case, and must be treated as such. You don't want to make the general type unaligned (and thus inefficient) everywhere it is being used, only because a few use cases require the unaligned form.
>
> The same principle must apply to the IPv6 address type. Let's make the generic type natively aligned (16 byte). And you might also offer an explicitly unaligned type for the exception use cases requiring unaligned access.
>
-- 
Regards,
Vladimir


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: IPv6 APIs rework
  2024-07-19 10:02         ` Robin Jarry
  2024-07-19 10:09           ` Bruce Richardson
@ 2024-07-19 10:46           ` Morten Brørup
  2024-07-19 11:09             ` Robin Jarry
  1 sibling, 1 reply; 21+ messages in thread
From: Morten Brørup @ 2024-07-19 10:46 UTC (permalink / raw)
  To: Robin Jarry, Vladimir Medvedkin, stephen
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

> From: Robin Jarry [mailto:rjarry@redhat.com]
> 
> Morten Brørup, Jul 19, 2024 at 11:12:
> > > Vladimir Medvedkin, Jul 18, 2024 at 23:25:
> > > > I think alignment should be 1 since in FIB6 users usually don't
> > > > copy IPv6 address and just provide a pointer to the memory inside
> > > > the packet.
> >
> > How can they do that? The bulk lookup function takes an array of IPv6
> > addresses, not an array of pointers to IPv6 addresses.
> >
> > What you are suggesting only works with single lookup, not bulk
> > lookup.
> 
> Indeed for bulk lookup, you need to copy addresses on the stack.
> 
> > > Yes, my intention was exactly that, being able to map that structure
> > > directly in packets without copying them on the stack.
> >
> > This would require changing the bulk lookup API to take an array of
> > pointers instead of an array of IPv6 addresses.
> >
> > It would be acceptable to introduce a new single address lookup
> > function, taking a pointer to an unaligned (or 2 byte aligned) IPv6
> > address for the single lookup use cases mentioned above.
> 
> That would require two different IPv6 structures. I would prefer it we
> could avoid that. Or the unaligned lookup API needs to take a simple
> `const uint8_t *` parameter.
> 
> > I'm not worried about the IPv6 FIB functions. This proposal introduces
> > a generic IPv6 address type for *all of DPDK*, so you need to consider
> > *all* aspects, not just one library!
> >
> > There may be current or future CPUs, where alignment makes
> > a performance difference. Do all architectures support unaligned 128
> > bit access at 100 % similar performance to aligned 128 bit access?
> > I think not!
> > E.g. on X86 architecture, load/store across a cache boundary has
> > a performance impact. If the type is explicitly unaligned, an instance
> > on the stack (i.e. a local variable holding an IPv6 address) might
> > cross a cache boundary, whereas an 128 bit aligned instance on the
> > stack is guaranteed not to cross a cache boundary.
> >
> > The generic IPv4 address type is natively aligned (i.e. 4 byte). When
> > accessing an IPv4 address in an IPv4 header following an Ethernet
> > header, it is not 4 byte aligned, so this is an *exception* from the
> > general case, and must be treated as such. You don't want to make the
> > general type unaligned (and thus inefficient) everywhere it is being
> > used, only because a few use cases require the unaligned form.
> 
> I think the main difference is that you almost never pass IPv4 addresses
> as reference but always as values. So alignment does not matter.

When passing an IPv4 address as a value, alignment does matter; it must be 4 byte aligned.

On a CPU with 128 bit registers, I would probably also pass an IPv6 address as a value. With such a CPU, the parameter type should be uint128_t or rte_be128_t, depending on byte order.

> 
> > The same principle must apply to the IPv6 address type. Let's make the
> > generic type natively aligned (16 byte). And you might also offer an
> > explicitly unaligned type for the exception use cases requiring
> > unaligned access.
> 
> The main issue with this is that you would not be able to use that type
> in the IPv6 header structure to map it to mbuf data. That leaves us with
> two options:
> 
> 1) Keep a single unaligned IPv6 type and hope for the best with
>    performance. It will not be different from the current state of
>    things where every IPv6 is a uint8_t pointer.
> 
> 2) Have two IPv6 types, one 16 bytes aligned, and another one 1 byte
>    aligned. The main issue with that second approach is that users may
>    get confused about which one to use and when.
> 
> I would prefer to keep it simple at first and go with option 1). We can
> always revisit that later and introduce an aligned IPv6 type for certain
> use cases.
> 
> What do you think?

There's a 3rd option:
Have an IPv6 type that is simply an array of 16 bytes with no explicitly specified alignment:

struct rte_ipv6_addr {
	unsigned char addr_bytes[RTE_IPV6_ADDR_LEN];
};

Or:

typedef struct rte_ipv6_addr {
	unsigned char addr_bytes[RTE_IPV6_ADDR_LEN];
} rte_ipv6_addr_t;

If used as is, it will be unaligned.
And if alignment offers improved performance for some use cases, explicit alignment attributes can be added to the type in those use cases.

Not using an uint128_t type (or a union of other types than unsigned char) will also avoid byte order issues.

I guess Stephen was right to begin with. :-)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-19 10:46           ` Morten Brørup
@ 2024-07-19 11:09             ` Robin Jarry
  2024-07-19 15:47               ` Morten Brørup
  0 siblings, 1 reply; 21+ messages in thread
From: Robin Jarry @ 2024-07-19 11:09 UTC (permalink / raw)
  To: Morten Brørup, Vladimir Medvedkin, stephen
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Vladimir Medvedkin,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Bruce Richardson, Yipeng Wang,
	Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K, Satha Rao,
	Harman Kalra, Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj,
	Gagandeep Singh, Hemant Agrawal, Ajit Khaparde, Somnath Kotur,
	Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Ziyang Xuan, Xiaoyun Wang,
	Jie Hai, Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

Morten Brørup, Jul 19, 2024 at 12:46:
> When passing an IPv4 address as a value, alignment does matter; it 
> must be 4 byte aligned.

I was expecting the compiler to do what is necessary to copy the data to 
an aligned register before jumping to the function.

> On a CPU with 128 bit registers, I would probably also pass an IPv6 
> address as a value. With such a CPU, the parameter type should be 
> uint128_t or rte_be128_t, depending on byte order.

I don't think there is a portable/standard uint128_t yet. Everything 
I could find is either GCC or linux specific.

> There's a 3rd option:
> Have an IPv6 type that is simply an array of 16 bytes with no explicitly specified alignment:
>
> struct rte_ipv6_addr {
> 	unsigned char addr_bytes[RTE_IPV6_ADDR_LEN];
> };
>
> Or:
>
> typedef struct rte_ipv6_addr {
> 	unsigned char addr_bytes[RTE_IPV6_ADDR_LEN];
> } rte_ipv6_addr_t;
>
> If used as is, it will be unaligned.
>
> And if alignment offers improved performance for some use cases, 
> explicit alignment attributes can be added to the type in those use 
> cases.
>
> Not using an uint128_t type (or a union of other types than unsigned 
> char) will also avoid byte order issues.
>
> I guess Stephen was right to begin with. :-)

Having the type as a union (as is the POSIX type) makes casting to 
integers a lot less tedious and makes the structure overall more 
flexible.

We could completely add an unaligned be128 member to the union by the 
way. I don't see what is wrong with having sub union members.

About your concern with byte order, since the union members have 
explicit rte_be*_t types, I don't think confusion can happen. I have 
also renamed the members, replacing the "u" prefix with "a" so that it 
does not indicate that it should be used as a host integer.

        struct __rte_aligned(1) rte_ipv6_addr {
                union {
                        unsigned char a[16];
                        unaligned_be16_t a16[8];
                        unaligned_be32_t a32[4];
                        unaligned_be64_t a64[2];
                        unaligned_be128_t a128[1];
                };
        } __rte_packed;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: IPv6 APIs rework
  2024-07-19 11:09             ` Robin Jarry
@ 2024-07-19 15:47               ` Morten Brørup
  2024-07-19 17:07                 ` Stephen Hemminger
  0 siblings, 1 reply; 21+ messages in thread
From: Morten Brørup @ 2024-07-19 15:47 UTC (permalink / raw)
  To: Robin Jarry, Vladimir Medvedkin, stephen, bruce.richardson,
	Konstantin Ananyev
  Cc: dev, Sunil Kumar Kori, Rakesh Kudurumalla, Wisam Jaddo,
	Cristian Dumitrescu, Konstantin Ananyev, Akhil Goyal, Fan Zhang,
	Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K,
	Satha Rao, Harman Kalra, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal, Ajit Khaparde,
	Somnath Kotur, Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Xiaoyun Wang, Jie Hai,
	Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

> From: Robin Jarry [mailto:rjarry@redhat.com]
> 
> Morten Brørup, Jul 19, 2024 at 12:46:
> > When passing an IPv4 address as a value, alignment does matter; it
> > must be 4 byte aligned.
> 
> I was expecting the compiler to do what is necessary to copy the data to
> an aligned register before jumping to the function.

Yes, and hereby you have achieved 4-byte alignment of the parameter.

What I meant was: If the parameter's type makes the parameter explicitly unaligned, e.g. an unaligned array of 4 bytes or an unaligned_uint32_t, the code inside the function must also treat the parameter as unaligned, and cannot assume it has magically become 4-byte aligned.

Our functions taking an IPv4 address parameter (by value) passes the value as aligned.
Functions taking an IPv6 address parameter (by value) should behave exactly the same way: The compiler should do what is necessary to copy the data to an aligned register *before* jumping to the function. (Note: In 64 bit architectures, 128 bits requires two 64 bit registers.) The point remains: If conversion from unaligned to aligned is required, it is the responsibility of the code calling the function, not the function itself.

> 
> > On a CPU with 128 bit registers, I would probably also pass an IPv6
> > address as a value. With such a CPU, the parameter type should be
> > uint128_t or rte_be128_t, depending on byte order.
> 
> I don't think there is a portable/standard uint128_t yet. Everything
> I could find is either GCC or linux specific.

Agree. I am using uint128_t conceptually.

> 
> > There's a 3rd option:
> > Have an IPv6 type that is simply an array of 16 bytes with no
> explicitly specified alignment:
> >
> > struct rte_ipv6_addr {
> > 	unsigned char addr_bytes[RTE_IPV6_ADDR_LEN];
> > };
> >
> > Or:
> >
> > typedef struct rte_ipv6_addr {
> > 	unsigned char addr_bytes[RTE_IPV6_ADDR_LEN];
> > } rte_ipv6_addr_t;
> >
> > If used as is, it will be unaligned.
> >
> > And if alignment offers improved performance for some use cases,
> > explicit alignment attributes can be added to the type in those use
> > cases.
> >
> > Not using an uint128_t type (or a union of other types than unsigned
> > char) will also avoid byte order issues.
> >
> > I guess Stephen was right to begin with. :-)
> 
> Having the type as a union (as is the POSIX type) makes casting to
> integers a lot less tedious and makes the structure overall more
> flexible.

Maybe (probably?). However, if you explicitly make the type unaligned, how can the same type be used in an optimized way where the developer knows that it is 16 byte aligned?

NB: There's something in the C standard about type casting from char (and unsigned char) being less restricted than typecasting from uint8_t, so perhaps using unsigned char instead of uint8_t could solve the recasting issue your union is trying to solve. (Unfortunately, I cannot remember the source of this information.)

Generally I don't think that we should introduce complex types/structures/unions only to simplify type casting, if it is at the expense of performance or code readability.

> 
> We could completely add an unaligned be128 member to the union by the
> way. I don't see what is wrong with having sub union members.

(Not that I agree to using a union, but..)
Agree. If it's a union, and alignment is explicitly set, adding 64 bit and 128 bit sub union members should be perfectly acceptable, as it does not modify the alignment or anything else.

> 
> About your concern with byte order, since the union members have
> explicit rte_be*_t types, I don't think confusion can happen. I have
> also renamed the members, replacing the "u" prefix with "a" so that it
> does not indicate that it should be used as a host integer.
> 
>         struct __rte_aligned(1) rte_ipv6_addr {
>                 union {
>                         unsigned char a[16];
>                         unaligned_be16_t a16[8];
>                         unaligned_be32_t a32[4];
>                         unaligned_be64_t a64[2];
>                         unaligned_be128_t a128[1];
>                 };
>         } __rte_packed;

(Again, not that I'm accepting the structure, but...)
Yes, this would solve the byte order concern.

How do you write efficient code with this forcefully unaligned type?
Let's say an application has some structure with an IPv6 address field, which the developer has designed to be 16 byte aligned in the structure.
The compiler would need to always access this 16 byte aligned field as unaligned, because the rte_ipv6_addr type makes it explicitly unaligned.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-19 15:47               ` Morten Brørup
@ 2024-07-19 17:07                 ` Stephen Hemminger
  2024-07-20 17:43                   ` Robin Jarry
  0 siblings, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2024-07-19 17:07 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Robin Jarry, Vladimir Medvedkin, bruce.richardson,
	Konstantin Ananyev, dev, Sunil Kumar Kori, Rakesh Kudurumalla,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Yipeng Wang, Sameh Gobriel,
	Nithin Dabilpuram, Kiran Kumar K, Satha Rao, Harman Kalra,
	Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj, Gagandeep Singh,
	Hemant Agrawal, Ajit Khaparde, Somnath Kotur, Chas Williams,
	Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Xiaoyun Wang, Jie Hai,
	Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

On Fri, 19 Jul 2024 17:47:47 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Robin Jarry [mailto:rjarry@redhat.com]
> > 
> > Morten Brørup, Jul 19, 2024 at 12:46:  
> > > When passing an IPv4 address as a value, alignment does matter; it
> > > must be 4 byte aligned.  
> > 
> > I was expecting the compiler to do what is necessary to copy the data to
> > an aligned register before jumping to the function.  
> 
> Yes, and hereby you have achieved 4-byte alignment of the parameter.
> 
> What I meant was: If the parameter's type makes the parameter explicitly unaligned, e.g. an unaligned array of 4 bytes or an unaligned_uint32_t, the code inside the function must also treat the parameter as unaligned, and cannot assume it has magically become 4-byte aligned.
> 
> Our functions taking an IPv4 address parameter (by value) passes the value as aligned.
> Functions taking an IPv6 address parameter (by value) should behave exactly the same way: The compiler should do what is necessary to copy the data to an aligned register *before* jumping to the function. (Note: In 64 bit architectures, 128 bits requires two 64 bit registers.) The point remains: If conversion from unaligned to aligned is required, it is the responsibility of the code calling the function, not the function itself.
> 
> >   
> > > On a CPU with 128 bit registers, I would probably also pass an IPv6
> > > address as a value. With such a CPU, the parameter type should be
> > > uint128_t or rte_be128_t, depending on byte order.  
> > 
> > I don't think there is a portable/standard uint128_t yet. Everything
> > I could find is either GCC or linux specific.  
> 
> Agree. I am using uint128_t conceptually.
> 
> >   
> > > There's a 3rd option:
> > > Have an IPv6 type that is simply an array of 16 bytes with no  
> > explicitly specified alignment:  
> > >
> > > struct rte_ipv6_addr {
> > > 	unsigned char addr_bytes[RTE_IPV6_ADDR_LEN];
> > > };
> > >
> > > Or:
> > >
> > > typedef struct rte_ipv6_addr {
> > > 	unsigned char addr_bytes[RTE_IPV6_ADDR_LEN];
> > > } rte_ipv6_addr_t;
> > >
> > > If used as is, it will be unaligned.
> > >
> > > And if alignment offers improved performance for some use cases,
> > > explicit alignment attributes can be added to the type in those use
> > > cases.
> > >
> > > Not using an uint128_t type (or a union of other types than unsigned
> > > char) will also avoid byte order issues.
> > >
> > > I guess Stephen was right to begin with. :-)  
> > 
> > Having the type as a union (as is the POSIX type) makes casting to
> > integers a lot less tedious and makes the structure overall more
> > flexible.  
> 
> Maybe (probably?). However, if you explicitly make the type unaligned, how can the same type be used in an optimized way where the developer knows that it is 16 byte aligned?
> 
> NB: There's something in the C standard about type casting from char (and unsigned char) being less restricted than typecasting from uint8_t, so perhaps using unsigned char instead of uint8_t could solve the recasting issue your union is trying to solve. (Unfortunately, I cannot remember the source of this information.)
> 
> Generally I don't think that we should introduce complex types/structures/unions only to simplify type casting, if it is at the expense of performance or code readability.
> 
> > 
> > We could completely add an unaligned be128 member to the union by the
> > way. I don't see what is wrong with having sub union members.  
> 
> (Not that I agree to using a union, but..)
> Agree. If it's a union, and alignment is explicitly set, adding 64 bit and 128 bit sub union members should be perfectly acceptable, as it does not modify the alignment or anything else.
> 
> > 
> > About your concern with byte order, since the union members have
> > explicit rte_be*_t types, I don't think confusion can happen. I have
> > also renamed the members, replacing the "u" prefix with "a" so that it
> > does not indicate that it should be used as a host integer.
> > 
> >         struct __rte_aligned(1) rte_ipv6_addr {
> >                 union {
> >                         unsigned char a[16];
> >                         unaligned_be16_t a16[8];
> >                         unaligned_be32_t a32[4];
> >                         unaligned_be64_t a64[2];
> >                         unaligned_be128_t a128[1];
> >                 };
> >         } __rte_packed;  

Don't use packed, it makes the compiler generate very slow access since
it has to assume worst case alignment.

The intermediate forms are not actually big endian. Don't do that.
It also would cause checkers to think swaps are needed.

> 
> (Again, not that I'm accepting the structure, but...)
> Yes, this would solve the byte order concern.
> 
> How do you write efficient code with this forcefully unaligned type?
> Let's say an application has some structure with an IPv6 address field, which the developer has designed to be 16 byte aligned in the structure.
> The compiler would need to always access this 16 byte aligned field as unaligned, because the rte_ipv6_addr type makes it explicitly unaligned.

The force align is a bad idea.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-19 17:07                 ` Stephen Hemminger
@ 2024-07-20 17:43                   ` Robin Jarry
  2024-07-20 20:26                     ` Stephen Hemminger
  0 siblings, 1 reply; 21+ messages in thread
From: Robin Jarry @ 2024-07-20 17:43 UTC (permalink / raw)
  To: Stephen Hemminger, Morten Brørup
  Cc: Vladimir Medvedkin, bruce.richardson, Konstantin Ananyev, dev,
	Sunil Kumar Kori, Rakesh Kudurumalla, Wisam Jaddo,
	Cristian Dumitrescu, Konstantin Ananyev, Akhil Goyal, Fan Zhang,
	Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K,
	Satha Rao, Harman Kalra, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal, Ajit Khaparde,
	Somnath Kotur, Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Xiaoyun Wang, Jie Hai,
	Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

Cutting down the quoting a bit.

I understand the arguments against having an unaligned packed struct. 
But it is precisely what we have in the current code base. All IPv6 
addresses are uint8_t[16] arrays. And nobody ever complained about it.

Since this rework is already massive, could we proceed in steps?

First, I can replace all uint8_t[16] arrays by an unaligned packed 
struct which is 99% of the work.

Second, *if there is a real and measurable performance gain*, change 
that structure to remove explicit alignment.

Does this sound like a good plan to everyone?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-20 17:43                   ` Robin Jarry
@ 2024-07-20 20:26                     ` Stephen Hemminger
  2024-07-20 20:33                       ` Robin Jarry
  0 siblings, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2024-07-20 20:26 UTC (permalink / raw)
  To: Robin Jarry
  Cc: Morten Brørup, Vladimir Medvedkin, bruce.richardson,
	Konstantin Ananyev, dev, Sunil Kumar Kori, Rakesh Kudurumalla,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Yipeng Wang, Sameh Gobriel,
	Nithin Dabilpuram, Kiran Kumar K, Satha Rao, Harman Kalra,
	Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj, Gagandeep Singh,
	Hemant Agrawal, Ajit Khaparde, Somnath Kotur, Chas Williams,
	Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Xiaoyun Wang, Jie Hai,
	Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

On Sat, 20 Jul 2024 19:43:45 +0200
"Robin Jarry" <rjarry@redhat.com> wrote:

> Cutting down the quoting a bit.
> 
> I understand the arguments against having an unaligned packed struct. 
> But it is precisely what we have in the current code base. All IPv6 
> addresses are uint8_t[16] arrays. And nobody ever complained about it.
> 
> Since this rework is already massive, could we proceed in steps?
> 
> First, I can replace all uint8_t[16] arrays by an unaligned packed 
> struct which is 99% of the work.
> 
> Second, *if there is a real and measurable performance gain*, change 
> that structure to remove explicit alignment.
> 
> Does this sound like a good plan to everyone?
> 

There is no need for packing or alignment in in6_addr or current DPDK,
what would be the benefit?  Compilers generate worse code if a structure
is marked packed.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-20 20:26                     ` Stephen Hemminger
@ 2024-07-20 20:33                       ` Robin Jarry
  2024-07-21 16:12                         ` Morten Brørup
  0 siblings, 1 reply; 21+ messages in thread
From: Robin Jarry @ 2024-07-20 20:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Morten Brørup, Vladimir Medvedkin, bruce.richardson,
	Konstantin Ananyev, dev, Sunil Kumar Kori, Rakesh Kudurumalla,
	Wisam Jaddo, Cristian Dumitrescu, Konstantin Ananyev,
	Akhil Goyal, Fan Zhang, Yipeng Wang, Sameh Gobriel,
	Nithin Dabilpuram, Kiran Kumar K, Satha Rao, Harman Kalra,
	Ankur Dwivedi, Anoob Joseph, Tejasree Kondoj, Gagandeep Singh,
	Hemant Agrawal, Ajit Khaparde, Somnath Kotur, Chas Williams,
	Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Xiaoyun Wang, Jie Hai,
	Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

Stephen Hemminger, Jul 20, 2024 at 22:26:
> There is no need for packing or alignment in in6_addr or current DPDK, 
> what would be the benefit?  Compilers generate worse code if 
> a structure is marked packed.

The only benefit is to maintain current behaviour.

At first, I had not packed nor aligned anything and I had tons of test 
errors because the compiler added padding in structures that contained 
IPv6 addresses.

I don't want to mix things together. In my opinion, removing that 
alignof(1) constraint is an optimization which has nothing to do with 
the IPv6 API functional rework.

So my proposal is: add a structure *packed and unaligned* first so that 
*all tests are passing*.

And *then*, after the changes have been applied on the main branch and 
no critical issues have been reported, see if we need to remove these 
packed and unaligned constraints.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: IPv6 APIs rework
  2024-07-20 20:33                       ` Robin Jarry
@ 2024-07-21 16:12                         ` Morten Brørup
  2024-07-21 21:51                           ` Robin Jarry
  0 siblings, 1 reply; 21+ messages in thread
From: Morten Brørup @ 2024-07-21 16:12 UTC (permalink / raw)
  To: Robin Jarry, Stephen Hemminger
  Cc: Vladimir Medvedkin, bruce.richardson, Konstantin Ananyev, dev,
	Sunil Kumar Kori, Rakesh Kudurumalla, Wisam Jaddo,
	Cristian Dumitrescu, Konstantin Ananyev, Akhil Goyal, Fan Zhang,
	Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K,
	Satha Rao, Harman Kalra, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal, Ajit Khaparde,
	Somnath Kotur, Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Xiaoyun Wang, Jie Hai,
	Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

> From: Robin Jarry [mailto:rjarry@redhat.com]
> Sent: Saturday, 20 July 2024 22.33
> 
> Stephen Hemminger, Jul 20, 2024 at 22:26:
> > There is no need for packing or alignment in in6_addr or current DPDK,
> > what would be the benefit?  Compilers generate worse code if
> > a structure is marked packed.
> 
> The only benefit is to maintain current behaviour.
> 
> At first, I had not packed nor aligned anything and I had tons of test
> errors because the compiler added padding in structures that contained
> IPv6 addresses.

If the IPv6 address type you tested with was a struct containing a union of different types (other than an array of 16 bytes), then those sub-types made your IPv6 address type non-byte aligned, and caused padding when used in other structures.

Please try again with the simple array type:
struct rte_ipv6_addr { unsigned char addr_bytes[16]; };

This should not cause any padding when used in other structures, except if used with alignas().

> 
> I don't want to mix things together. In my opinion, removing that
> alignof(1) constraint is an optimization which has nothing to do with
> the IPv6 API functional rework.
> 
> So my proposal is: add a structure *packed and unaligned* first so that
> *all tests are passing*.
> 
> And *then*, after the changes have been applied on the main branch and
> no critical issues have been reported, see if we need to remove these
> packed and unaligned constraints.

If you are introducing an official IPv6 address type into DPDK, its scope it not just the FIB6 API.

Both Stephen and I can see that - in a broader perspective - the packed and unaligned constraints are unacceptable for performance.

It might not be a problem for the current FIB6 implementation, but it *will* be a problem in many other places, if converted to using the new IPv6 address type.

PS:
I do consider adding a dedicated IPv6 address type to DPDK an improvement over the current convention of using an uint8_t[16] array.
But we need to agree on the type, which must work optimally for a broad spectrum of use cases. Otherwise, the new type is not an improvement, but a deterioration of DPDK.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: IPv6 APIs rework
  2024-07-21 16:12                         ` Morten Brørup
@ 2024-07-21 21:51                           ` Robin Jarry
  2024-07-22  9:31                             ` Morten Brørup
  0 siblings, 1 reply; 21+ messages in thread
From: Robin Jarry @ 2024-07-21 21:51 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger
  Cc: Vladimir Medvedkin, bruce.richardson, Konstantin Ananyev, dev,
	Sunil Kumar Kori, Rakesh Kudurumalla, Wisam Jaddo,
	Cristian Dumitrescu, Konstantin Ananyev, Akhil Goyal, Fan Zhang,
	Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K,
	Satha Rao, Harman Kalra, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal, Ajit Khaparde,
	Somnath Kotur, Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Xiaoyun Wang, Jie Hai,
	Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

Hi Morten, Stephen,

Morten Brørup, Jul 21, 2024 at 18:12:
> If the IPv6 address type you tested with was a struct containing 
> a union of different types (other than an array of 16 bytes), then 
> those sub-types made your IPv6 address type non-byte aligned, and 
> caused padding when used in other structures.
>
> Please try again with the simple array type:
> struct rte_ipv6_addr { unsigned char addr_bytes[16]; };
>
> This should not cause any padding when used in other structures, 
> except if used with alignas().

Indeed removing the sub-types in the union removes the need for strict 
alignment and packing.

Too bad, I found these intermediate integers made the code a bit nicer 
but I can understand that it brings a lot of trouble down the line.

NB: I tried uint8_t vs unsigned char, it makes no difference with 
implicit casting to (uint16_t *) or (uint32_t *). Explicit casting is 
required anyway.

> If you are introducing an official IPv6 address type into DPDK, its 
> scope it not just the FIB6 API.
>
> Both Stephen and I can see that - in a broader perspective - the 
> packed and unaligned constraints are unacceptable for performance.
>
> It might not be a problem for the current FIB6 implementation, but it 
> *will* be a problem in many other places, if converted to using the 
> new IPv6 address type.
>
> PS:
> I do consider adding a dedicated IPv6 address type to DPDK an 
> improvement over the current convention of using an uint8_t[16] array.
> But we need to agree on the type, which must work optimally for 
> a broad spectrum of use cases. Otherwise, the new type is not an 
> improvement, but a deterioration of DPDK.

OK, I understand the stakes. I will comply and propose a simple struct 
without any packing nor explicit alignment.

    struct rte_ipv6_addr {
        union {
            unsigned char a[RTE_IPV6_ADDR_SIZE];
        };
    };

I have left the door open in order to ease adding sub-types in the 
future. Indeed, lpm6/fib6 tests rely on literal definitions of IPv6 
addresses and union types need an extra set of curly braces for literal 
definitions. If you think we will never need to add sub-types, I can get 
rid of this. It makes no difference at runtime.

About the timing: when should I send a patch to announce IPv6 API 
breakage for 24.11?

Thanks for taking the time.
Cheers.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: IPv6 APIs rework
  2024-07-21 21:51                           ` Robin Jarry
@ 2024-07-22  9:31                             ` Morten Brørup
  0 siblings, 0 replies; 21+ messages in thread
From: Morten Brørup @ 2024-07-22  9:31 UTC (permalink / raw)
  To: Robin Jarry, Stephen Hemminger
  Cc: Vladimir Medvedkin, bruce.richardson, Konstantin Ananyev, dev,
	Sunil Kumar Kori, Rakesh Kudurumalla, Wisam Jaddo,
	Cristian Dumitrescu, Konstantin Ananyev, Akhil Goyal, Fan Zhang,
	Yipeng Wang, Sameh Gobriel, Nithin Dabilpuram, Kiran Kumar K,
	Satha Rao, Harman Kalra, Ankur Dwivedi, Anoob Joseph,
	Tejasree Kondoj, Gagandeep Singh, Hemant Agrawal, Ajit Khaparde,
	Somnath Kotur, Chas Williams, Min Hu (Connor),
	Potnuri Bharat Teja, Sachin Saxena, Xiaoyun Wang, Jie Hai,
	Yisen Zhuang, Jingjing Wu, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad, Chaoyong He, Devendra Singh Rawat, Alok Prasad,
	Andrew Rybchenko, Jiawen Wu, Jian Wang, Thomas Monjalon,
	Ferruh Yigit, Jiayu Hu, Pavan Nikhilesh, Maxime Coquelin,
	Chenbo Xia

> From: Robin Jarry [mailto:rjarry@redhat.com]
> Sent: Sunday, 21 July 2024 23.51
> 
> Hi Morten, Stephen,
> 
> Morten Brørup, Jul 21, 2024 at 18:12:
> > If the IPv6 address type you tested with was a struct containing
> > a union of different types (other than an array of 16 bytes), then
> > those sub-types made your IPv6 address type non-byte aligned, and
> > caused padding when used in other structures.
> >
> > Please try again with the simple array type:
> > struct rte_ipv6_addr { unsigned char addr_bytes[16]; };
> >
> > This should not cause any padding when used in other structures,
> > except if used with alignas().
> 
> Indeed removing the sub-types in the union removes the need for strict
> alignment and packing.
> 
> Too bad, I found these intermediate integers made the code a bit nicer
> but I can understand that it brings a lot of trouble down the line.

Maybe some magical macros (or inline functions) can be used for pretty casting to larger integer types, using alignof() and/or the GCC assume_aligned attribute.
Such macros/functions can be added in later patches.
Perhaps they might even be generic, so they could be used on other byte array types too.

> 
> NB: I tried uint8_t vs unsigned char, it makes no difference with
> implicit casting to (uint16_t *) or (uint32_t *). Explicit casting is
> required anyway.

Unfortunately, I still cannot recall why unsigned char is better for type casting than uint8_t, so I cannot support my statement with a trustworthy source of reference.

> 
> > If you are introducing an official IPv6 address type into DPDK, its
> > scope it not just the FIB6 API.
> >
> > Both Stephen and I can see that - in a broader perspective - the
> > packed and unaligned constraints are unacceptable for performance.
> >
> > It might not be a problem for the current FIB6 implementation, but it
> > *will* be a problem in many other places, if converted to using the
> > new IPv6 address type.
> >
> > PS:
> > I do consider adding a dedicated IPv6 address type to DPDK an
> > improvement over the current convention of using an uint8_t[16] array.
> > But we need to agree on the type, which must work optimally for
> > a broad spectrum of use cases. Otherwise, the new type is not an
> > improvement, but a deterioration of DPDK.
> 
> OK, I understand the stakes. I will comply and propose a simple struct
> without any packing nor explicit alignment.
> 
>     struct rte_ipv6_addr {
>         union {
>             unsigned char a[RTE_IPV6_ADDR_SIZE];
>         };
>     };
> 
> I have left the door open in order to ease adding sub-types in the
> future. Indeed, lpm6/fib6 tests rely on literal definitions of IPv6
> addresses and union types need an extra set of curly braces for literal
> definitions. If you think we will never need to add sub-types, I can get
> rid of this. It makes no difference at runtime.

I think it is safe to start without the union.
If the anonymous union only has one member, it makes no difference if the union is there or not.
So, if we add other sub-types in the future, the union can be added at that time.

NB: I used "addr_bytes" as the name of the array in the structure, as in the rte_ether_addr structure [1]; but I support using "a" instead, it is shorter and it seems obvious that it is the same.

[1]: https://elixir.bootlin.com/dpdk/v24.07-rc2/source/lib/net/rte_ether.h#L74

<brainstorming>
Perhaps we could add an anonymous union to rte_ether_addr, to shorten its access name similarly:

struct __rte_aligned(2) rte_ether_addr {
+   __extension__
+   union {
        uint8_t addr_bytes[RTE_ETHER_ADDR_LEN]; /**< Addr bytes in tx order */
+       unsigned char a[RTE_ETHER_ADDR_LEN]; /**< Same, but shorter name */
+   }
};

This is not related to your patch in any way. Just thinking out loud.
</brainstorming>

> 
> About the timing: when should I send a patch to announce IPv6 API
> breakage for 24.11?

ASAP, I guess.
I suggest you describe it as an introduction of an IPv6 address type, and list the APIs that will be updated to use this new type.
The intention of introducing the new IPv6 address type with a broader scope than just the FIB6 APIs is to inspire others to use the new IPv6 address type too.

> 
> Thanks for taking the time.
> Cheers.

Thank you for listening.


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-07-22  9:31 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-18 15:03 IPv6 APIs rework Robin Jarry
2024-07-18 20:27 ` Morten Brørup
2024-07-18 21:15   ` Stephen Hemminger
2024-07-18 21:40     ` Robin Jarry
2024-07-18 21:25   ` Vladimir Medvedkin
2024-07-18 21:34     ` Robin Jarry
2024-07-19  8:25       ` Konstantin Ananyev
2024-07-19  9:12       ` Morten Brørup
2024-07-19 10:02         ` Robin Jarry
2024-07-19 10:09           ` Bruce Richardson
2024-07-19 10:46           ` Morten Brørup
2024-07-19 11:09             ` Robin Jarry
2024-07-19 15:47               ` Morten Brørup
2024-07-19 17:07                 ` Stephen Hemminger
2024-07-20 17:43                   ` Robin Jarry
2024-07-20 20:26                     ` Stephen Hemminger
2024-07-20 20:33                       ` Robin Jarry
2024-07-21 16:12                         ` Morten Brørup
2024-07-21 21:51                           ` Robin Jarry
2024-07-22  9:31                             ` Morten Brørup
2024-07-19 10:41         ` Medvedkin, Vladimir

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).