DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Robin Jarry" <rjarry@redhat.com>, <dev@dpdk.org>
Cc: "Sunil Kumar Kori" <skori@marvell.com>,
	"Rakesh Kudurumalla" <rkudurumalla@marvell.com>,
	"Vladimir Medvedkin" <vladimir.medvedkin@intel.com>,
	"Wisam Jaddo" <wisamm@nvidia.com>,
	"Cristian Dumitrescu" <cristian.dumitrescu@intel.com>,
	"Konstantin Ananyev" <konstantin.v.ananyev@yandex.ru>,
	"Akhil Goyal" <gakhil@marvell.com>,
	"Fan Zhang" <fanzhang.oss@gmail.com>,
	"Bruce Richardson" <bruce.richardson@intel.com>,
	"Yipeng Wang" <yipeng1.wang@intel.com>,
	"Sameh Gobriel" <sameh.gobriel@intel.com>,
	"Nithin Dabilpuram" <ndabilpuram@marvell.com>,
	"Kiran Kumar K" <kirankumark@marvell.com>,
	"Satha Rao" <skoteshwar@marvell.com>,
	"Harman Kalra" <hkalra@marvell.com>,
	"Ankur Dwivedi" <adwivedi@marvell.com>,
	"Anoob Joseph" <anoobj@marvell.com>,
	"Tejasree Kondoj" <ktejasree@marvell.com>,
	"Gagandeep Singh" <g.singh@nxp.com>,
	"Hemant Agrawal" <hemant.agrawal@nxp.com>,
	"Ajit Khaparde" <ajit.khaparde@broadcom.com>,
	"Somnath Kotur" <somnath.kotur@broadcom.com>,
	"Chas Williams" <chas3@att.com>,
	"Min Hu (Connor)" <humin29@huawei.com>,
	"Potnuri Bharat Teja" <bharat@chelsio.com>,
	"Sachin Saxena" <sachin.saxena@nxp.com>,
	"Ziyang Xuan" <xuanziyang2@huawei.com>,
	"Xiaoyun Wang" <cloud.wangxiaoyun@huawei.com>,
	"Jie Hai" <haijie1@huawei.com>,
	"Yisen Zhuang" <yisen.zhuang@huawei.com>,
	"Jingjing Wu" <jingjing.wu@intel.com>,
	"Dariusz Sosnowski" <dsosnowski@nvidia.com>,
	"Viacheslav Ovsiienko" <viacheslavo@nvidia.com>,
	"Bing Zhao" <bingz@nvidia.com>, "Ori Kam" <orika@nvidia.com>,
	"Suanming Mou" <suanmingm@nvidia.com>,
	"Matan Azrad" <matan@nvidia.com>,
	"Chaoyong He" <chaoyong.he@corigine.com>,
	"Devendra Singh Rawat" <dsinghrawat@marvell.com>,
	"Alok Prasad" <palok@marvell.com>,
	"Andrew Rybchenko" <andrew.rybchenko@oktetlabs.ru>,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	"Jiawen Wu" <jiawenwu@trustnetic.com>,
	"Jian Wang" <jianwang@trustnetic.com>,
	"Thomas Monjalon" <thomas@monjalon.net>,
	"Ferruh Yigit" <ferruh.yigit@amd.com>,
	"Jiayu Hu" <hujiayu.hu@foxmail.com>,
	"Pavan Nikhilesh" <pbhagavatula@marvell.com>,
	"Maxime Coquelin" <maxime.coquelin@redhat.com>,
	"Chenbo Xia" <chenbox@nvidia.com>
Subject: RE: IPv6 APIs rework
Date: Thu, 18 Jul 2024 22:27:03 +0200	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35E9F5AB@smartserver.smartshare.dk> (raw)
In-Reply-To: <D2SR8T1H39CJ.JRQFI6JEH0OX@redhat.com>

> From: Robin Jarry [mailto:rjarry@redhat.com]
> 
> Hi folks,
> 
> while working on IPv6 support for grout [1], I noticed that all DPDK
> IPv6 APIs used fixed sized arrays in the route lookup functions [2].
> 
>  int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
>                           uint8_t ips[][RTE_FIB6_IPV6_ADDR_SIZE],
>                           uint64_t *next_hops,
>                           int n);
> 
> If I'm not mistaken, using sized arrays in function signatures is only
> for documentation purposes and does not result in any specific compiler
> checks. In the above example, the ips parameter is considered as a plain
> old `uint8_t **` pointer.
> 
> Also, not having a dedicated type for IPv6 addresses requires obscure
> pointer arithmetic [3] and casting [4].
> 
> I'd like to introduce a real IPv6 address structure that has the same
> alignment than a dumb `uint8_t *` pointer but has an union to ease
> casting and most importantly presents the whole thing as an explicit
> typed structure:
> 
>     #define RTE_IPV6_ADDR_SIZE 16
> 
>     struct rte_ipv6_addr {
>         union {
>             uint8_t u8[RTE_IPV6_ADDR_SIZE];
>             uint16_t u16[RTE_IPV6_ADDR_SIZE / sizeof(uint16_t)];
>             uint32_t u32[RTE_IPV6_ADDR_SIZE / sizeof(uint32_t)];
>             uint64_t u64[RTE_IPV6_ADDR_SIZE / sizeof(uint64_t)];
>         };
>     } __rte_packed __rte_aligned(1);
> 
> This would require some breakage of the APIs but I think it would
> benefit code readability and maintainability in the long term.

In short: Although I like the idea of a unified IPv6 address type very much, I'm not sure consensus can be reached about the optimal alignment of such a type.

The long version:

Please consider this proposal in a broader perspective.

The IPv4 FIB lookup takes an uint32_t array, so the IPv4 address type here is 4 byte aligned: uint32_t *ips
Generally, uint32_t or rte_be32_t is used for IPv4 addresses, and both these types are 4 byte aligned. In other words: IPv4 addresses are considered 4 byte aligned by DPDK.

I don't think it is similarly simple for IPv6 addresses.

The alignment of IPv6 addresses may depend on how/where they are used, e.g.:
1. For the FIB library, it might be good for vector implementations to have the IPv6 addresses naturally aligned (i.e. 16 byte aligned), like the uint128_t/__int128/__m128i type (or the rte_xmm_t type [XMM]). Furthermore, a simple integer type (uint128_t equivalent) might be preferable in this API.
2. In the IPv6 packet header, the IPv6 addresses are not 16 byte aligned, they are 8 byte aligned. So we cannot make the IPv6 address type 16 byte aligned.

I fear that broadly dumbing down the IPv6 address type to always use 1 byte alignment could potentially introduce unwanted performance penalties (now or in the future). We didn't do it for IPv4 addresses, so let's not do it for IPv6 addresses.

Perhaps we could use the lowest "non-exotic" (considering the use of IPv6 addresses) alignment, which I would guess is 8 byte - as in the IPv6 packet header.
For reference, Ethernet addresses are defined as 2 byte aligned [ETH].

[XMM]: https://elixir.bootlin.com/dpdk/v24.03/source/lib/eal/x86/include/rte_vect.h#L42
[ETH]: https://elixir.bootlin.com/dpdk/v24.07-rc2/source/lib/net/rte_ether.h#L74

> 
>  int rte_fib6_lookup_bulk(struct rte_fib6 *fib,
>                           const struct rte_ipv6_addr *ips,
>                           uint64_t *next_hops,
>                           int n);
> 
> I already have a semi-working draft and am in the process of splitting
> the changes into small chunks to make them easier to review.
> 
> https://github.com/DPDK/dpdk/compare/main...rjarry:dpdk:ipv6-address-
> rework
> 
> Is that something that would be of interest? If yes, I would like to
> announce API breakage before the release of 24.07 so that the changes
> can be integrated into 24.11.
> 
> Cheers!
> 
> [1] https://github.com/rjarry/grout
> [2]
> https://doc.dpdk.org/api/rte__fib6_8h.html#a924678410ccb9551cda3e75d742a
> 11e3
> [3] https://git.dpdk.org/dpdk/tree/lib/fib/trie_avx512.c?h=v24.07-
> rc2#n340
> [4] https://git.dpdk.org/dpdk/tree/lib/hash/rte_thash.h?h=v24.07-
> rc2#n156
> 
> --
> Robin


  reply	other threads:[~2024-07-18 20:27 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-18 15:03 Robin Jarry
2024-07-18 20:27 ` Morten Brørup [this message]
2024-07-18 21:15   ` Stephen Hemminger
2024-07-18 21:40     ` Robin Jarry
2024-07-18 21:25   ` Vladimir Medvedkin
2024-07-18 21:34     ` Robin Jarry
2024-07-19  8:25       ` Konstantin Ananyev
2024-07-19  9:12       ` Morten Brørup
2024-07-19 10:02         ` Robin Jarry
2024-07-19 10:09           ` Bruce Richardson
2024-07-19 10:46           ` Morten Brørup
2024-07-19 11:09             ` Robin Jarry
2024-07-19 15:47               ` Morten Brørup
2024-07-19 17:07                 ` Stephen Hemminger
2024-07-20 17:43                   ` Robin Jarry
2024-07-20 20:26                     ` Stephen Hemminger
2024-07-20 20:33                       ` Robin Jarry
2024-07-21 16:12                         ` Morten Brørup
2024-07-21 21:51                           ` Robin Jarry
2024-07-22  9:31                             ` Morten Brørup
2024-07-19 10:41         ` Medvedkin, Vladimir

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98CBD80474FA8B44BF855DF32C47DC35E9F5AB@smartserver.smartshare.dk \
    --to=mb@smartsharesystems.com \
    --cc=adwivedi@marvell.com \
    --cc=ajit.khaparde@broadcom.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=anoobj@marvell.com \
    --cc=bharat@chelsio.com \
    --cc=bingz@nvidia.com \
    --cc=bruce.richardson@intel.com \
    --cc=chaoyong.he@corigine.com \
    --cc=chas3@att.com \
    --cc=chenbox@nvidia.com \
    --cc=cloud.wangxiaoyun@huawei.com \
    --cc=cristian.dumitrescu@intel.com \
    --cc=dev@dpdk.org \
    --cc=dsinghrawat@marvell.com \
    --cc=dsosnowski@nvidia.com \
    --cc=fanzhang.oss@gmail.com \
    --cc=ferruh.yigit@amd.com \
    --cc=g.singh@nxp.com \
    --cc=gakhil@marvell.com \
    --cc=haijie1@huawei.com \
    --cc=hemant.agrawal@nxp.com \
    --cc=hkalra@marvell.com \
    --cc=hujiayu.hu@foxmail.com \
    --cc=humin29@huawei.com \
    --cc=jianwang@trustnetic.com \
    --cc=jiawenwu@trustnetic.com \
    --cc=jingjing.wu@intel.com \
    --cc=kirankumark@marvell.com \
    --cc=konstantin.v.ananyev@yandex.ru \
    --cc=ktejasree@marvell.com \
    --cc=matan@nvidia.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=ndabilpuram@marvell.com \
    --cc=orika@nvidia.com \
    --cc=palok@marvell.com \
    --cc=pbhagavatula@marvell.com \
    --cc=rjarry@redhat.com \
    --cc=rkudurumalla@marvell.com \
    --cc=sachin.saxena@nxp.com \
    --cc=sameh.gobriel@intel.com \
    --cc=skori@marvell.com \
    --cc=skoteshwar@marvell.com \
    --cc=somnath.kotur@broadcom.com \
    --cc=stephen@networkplumber.org \
    --cc=suanmingm@nvidia.com \
    --cc=thomas@monjalon.net \
    --cc=viacheslavo@nvidia.com \
    --cc=vladimir.medvedkin@intel.com \
    --cc=wisamm@nvidia.com \
    --cc=xuanziyang2@huawei.com \
    --cc=yipeng1.wang@intel.com \
    --cc=yisen.zhuang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).