DPDK patches and discussions
* Sign changes through function signatures
@ 2023-02-02 19:23 Ben Magistro
  2023-02-02 20:26 ` Tyler Retzlaff
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Magistro @ 2023-02-02 19:23 UTC (permalink / raw)
  To: Olivier Matz, Thomas Monjalon, ferruh.yigit, andrew.rybchenko
  Cc: ben.magistro, dev, Stefan Baranoff


Hello,

While updating our code base for 22.11.1 to cover changes we missed in our
first pass, we hit the NUMA node change[1].  In the process, we noticed
that a couple of functions (rx/tx_queue_setup, maybe more that we aren't
using) state that they accept `SOCKET_ID_ANY`, but their signatures take an
unsigned integer while `SOCKET_ID_ANY` is `-1`.  Following the call through
the redirect to the "real" function, it also takes an unsigned integer,
which is then passed on to one or more functions taking a signed integer.
As an example using the i40e driver -- we would call
`rte_eth_tx_queue_setup`[2], which ultimately calls
`i40e_dev_tx_queue_setup`[3], which finally calls `rte_zmalloc_socket`[4]
and `rte_eth_dma_zone_reserve`[5].

I guess what I am looking for is clarification on whether this is
intentional, or whether it is additional cleanup that would be desirable so
that signedness is preserved through the call paths and sign-conversion
warnings are avoided.  From the very quick glance I took at the i40e driver,
it seems these values are just passed through to other functions and no
direct use/manipulation occurs (at least in the mentioned functions).
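
To make the concern concrete, here is a minimal, self-contained sketch of
the round trip described above.  It does not use DPDK headers:
`RT_SOCKET_ID_ANY` and the `rt_demo_*` functions are hypothetical stand-ins
that only mirror the parameter types described in this message (unsigned at
the API/driver level, signed at the allocator).  Note that converting the
large unsigned value back to `int` is implementation-defined in C; it gives
-1 again on common two's-complement ABIs, which is what the sketch assumes.

```c
#include <assert.h>
#include <limits.h>

/* Local stand-in for the DPDK macro, which this thread says is -1. */
#define RT_SOCKET_ID_ANY (-1)

/* Hypothetical innermost callee: takes a signed socket ID and echoes it
 * back so the caller can see which value actually arrived. */
int rt_demo_alloc_on_socket(int socket_id)
{
	return socket_id;
}

/* Hypothetical API/driver layer: takes an unsigned socket ID and passes it
 * on to a function that wants a signed one. */
int rt_demo_queue_setup(unsigned int socket_id)
{
	/* unsigned -> int: implementation-defined above INT_MAX, but -1 again
	 * on common two's-complement ABIs. */
	return rt_demo_alloc_on_socket((int)socket_id);
}

/* Application side: the explicit cast is what silences the warning. */
int rt_demo_app_call(void)
{
	int seen = rt_demo_queue_setup((unsigned int)RT_SOCKET_ID_ANY);

	assert(seen == RT_SOCKET_ID_ANY);
	return seen;
}
```

Building something like this with `-Wsign-conversion` and dropping the cast
in `rt_demo_app_call()` should produce the kind of warning mentioned above.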

[1] https://patches.dpdk.org/project/dpdk/patch/20221004145850.32331-1-olivier.matz@6wind.com/
[2] https://doc.dpdk.org/api/rte__ethdev_8h.html#a796c2f20778984c6f41b271e36bae50e
[3] https://github.com/DPDK/dpdk/blob/main/drivers/net/i40e/i40e_rxtx.c#L1949
[4] https://doc.dpdk.org/api/rte__malloc_8h.html#a7e9f76b7e0b0921a617c6ab8b28f53b3
[5] https://github.com/DPDK/dpdk/blob/1094dd940ec0cc4e3ce2c5cd94807350855a17f9/lib/ethdev/ethdev_driver.h#L1566


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Sign changes through function signatures
  2023-02-02 19:23 Sign changes through function signatures Ben Magistro
@ 2023-02-02 20:26 ` Tyler Retzlaff
  2023-02-02 20:45   ` Thomas Monjalon
  0 siblings, 1 reply; 8+ messages in thread
From: Tyler Retzlaff @ 2023-02-02 20:26 UTC (permalink / raw)
  To: Ben Magistro
  Cc: Olivier Matz, Thomas Monjalon, ferruh.yigit, andrew.rybchenko,
	ben.magistro, dev, Stefan Baranoff

On Thu, Feb 02, 2023 at 02:23:39PM -0500, Ben Magistro wrote:
> Hello,
> 
> While making some updates to our code base for 22.11.1 that were missed in
> our first pass through, we hit the numa node change[1].  In the process of
> updating our code, we noticed that a couple functions (rx/tx_queue_setup,
> maybe more that we aren't using) state they accept `SOCKET_ID_ANY` but the
> function signature then asks for an unsigned integer while `SOCKET_ID_ANY`
> is `-1`.  Following it through the redirect to the "real" function it also
> asks for an unsigned integer which is then passed on to one or more
> functions asking for an integer.  As an example using the i40e driver
> -- we would call `rte_eth_tx_queue_setup` [2] which ultimately calls
> `i40e_dev_tx_queue_setup`[3] which finally calls `rte_zmalloc_socket`[4]
> and `rte_eth_dma_zone_reserve`[5].
> 
> I guess what I am looking for is clarification on if this is intentional or
> if this is additional cleanup that may need to be completed/be desirable so
> that signs are maintained through the call paths and avoid potentially
> producing sign-conversion warnings.  From the very quick glance I took at
> the i40e driver, it seems these are just passed through to other functions
> and no direct use/manipulation occurs (at least in the mentioned functions).

i believe this is just sloppiness with sign in our api surface. i too
find it frustrating that use of these apis forces either explicit
casts or having to suppress warnings.

in the past examples of this have been cleaned up without full deprecation
notices but there are a lot of instances. i also feel (unpopular opinion)
that for some integer types like this that have constrained range / number
spaces it would be of value to introduce a typedef that can be used
consistently.

for now you'll just have to add the casts and hopefully in the future we
will fix the api making them unnecessary. of course feel free to submit
patches too, it would be great to have these cleaned up.
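
A minimal sketch of the typedef idea, under stated assumptions:
`td_socket_id_t`, `TD_SOCKET_ID_ANY` and the `td_demo_*` functions are
hypothetical names invented for illustration, not DPDK definitions.  The
point is only that when every layer uses the same signed type, the "any"
value travels through the call path without casts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dedicated type for socket IDs (not a DPDK type). */
typedef int32_t td_socket_id_t;

/* Hypothetical "any socket" value expressed in that type. */
#define TD_SOCKET_ID_ANY ((td_socket_id_t)-1)

/* Allocator stand-in: echoes the ID it was given. */
td_socket_id_t td_demo_alloc(td_socket_id_t socket_id)
{
	return socket_id;
}

/* Driver stand-in: passes the ID through unchanged, same type. */
td_socket_id_t td_demo_drv_queue_setup(td_socket_id_t socket_id)
{
	return td_demo_alloc(socket_id);
}

/* Public API stand-in: same type again, so no conversion anywhere. */
td_socket_id_t td_demo_api_queue_setup(td_socket_id_t socket_id)
{
	return td_demo_drv_queue_setup(socket_id);
}

/* Small self-check that the value survives the whole path. */
void td_demo_self_test(void)
{
	assert(td_demo_api_queue_setup(TD_SOCKET_ID_ANY) == TD_SOCKET_ID_ANY);
	assert(td_demo_api_queue_setup(1) == 1);
}
```

A plain typedef documents intent but does not make the compiler reject an
ordinary `int` or `unsigned int` argument; it is an alias, not a new type.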

> 
> 1)
> https://patches.dpdk.org/project/dpdk/patch/20221004145850.32331-1-olivier.matz@6wind.com/
> 2)
> https://doc.dpdk.org/api/rte__ethdev_8h.html#a796c2f20778984c6f41b271e36bae50e
> 3) https://github.com/DPDK/dpdk/blob/main/drivers/net/i40e/i40e_rxtx.c#L1949
> 4)
> https://doc.dpdk.org/api/rte__malloc_8h.html#a7e9f76b7e0b0921a617c6ab8b28f53b3
> 5)
> https://github.com/DPDK/dpdk/blob/1094dd940ec0cc4e3ce2c5cd94807350855a17f9/lib/ethdev/ethdev_driver.h#L1566


* Re: Sign changes through function signatures
  2023-02-02 20:26 ` Tyler Retzlaff
@ 2023-02-02 20:45   ` Thomas Monjalon
  2023-02-02 21:26     ` Morten Brørup
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Monjalon @ 2023-02-02 20:45 UTC (permalink / raw)
  To: Ben Magistro, Tyler Retzlaff
  Cc: Olivier Matz, ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, bruce.richardson,
	anatoly.burakov

02/02/2023 21:26, Tyler Retzlaff:
> i believe this is just sloppiness with sign in our api surface. i too
> find it frustrating that use of these apis forces either explicit
> casts or having to suppress warnings.
> 
> in the past examples of this have been cleaned up without full deprecation
> notices but there are a lot of instances. i also feel (unpopular opinion)
> that for some integer types like this that have constrained range / number
> spaces it would be of value to introduce a typedef that can be used
> consistently.
> 
> for now you'll just have to add the casts and hopefully in the future we
> will fix the api making them unnecessary. of course feel free to submit
> patches too, it would be great to have these cleaned up.

I agree it should be cleaned up.
Those IDs should accept negative values.
Not sure which type we should choose (int, int32_t, or a typedef).

Another thing to check is the name of the variable.
It should be a socket ID when talking about CPU,
and a NUMA node ID when talking about memory.

And last but not least,
how can we keep ABI compatibility?
I hope we can use function versioning to avoid deprecation and breaking.

Trials and suggestions are welcome.
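
One way to picture the compatibility idea, as a rough sketch only: keep an
entry point with the old unsigned signature that forwards to a new
implementation with a signed parameter.  The names `ver_demo_queue_setup_v1`
and `ver_demo_queue_setup_v2` are hypothetical.  Real function versioning
works at the symbol level in the shared library, so that old and new
binaries resolve the same exported name to different implementations; this
sketch does not attempt that and just uses two differently named functions.

```c
#include <assert.h>

/* Local stand-in for the "any socket" value (-1 per this thread). */
#define VER_SOCKET_ID_ANY (-1)

/* New implementation: signed socket ID, so -1 needs no cast. */
int ver_demo_queue_setup_v2(int socket_id)
{
	return socket_id; /* echo the ID the implementation saw */
}

/* Old entry point kept for callers built against the unsigned signature.
 * The conversion back to int is implementation-defined for large values,
 * but yields -1 on common two's-complement ABIs. */
int ver_demo_queue_setup_v1(unsigned int socket_id)
{
	return ver_demo_queue_setup_v2((int)socket_id);
}

/* Both call styles should reach the implementation with the same value. */
void ver_demo_self_test(void)
{
	int old_style = ver_demo_queue_setup_v1((unsigned int)VER_SOCKET_ID_ANY);
	int new_style = ver_demo_queue_setup_v2(VER_SOCKET_ID_ANY);

	assert(old_style == new_style);
}
```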




* RE: Sign changes through function signatures
  2023-02-02 20:45   ` Thomas Monjalon
@ 2023-02-02 21:26     ` Morten Brørup
  2023-02-03 12:05       ` Bruce Richardson
  0 siblings, 1 reply; 8+ messages in thread
From: Morten Brørup @ 2023-02-02 21:26 UTC (permalink / raw)
  To: Thomas Monjalon, Ben Magistro, Tyler Retzlaff, bruce.richardson
  Cc: Olivier Matz, ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, 2 February 2023 21.45
> 
> I agree it should be cleaned up.
> Those IDs should accept negative values.
> Not sure which type we should choose (int, int32_t, or a typedef).

Why would we use a signed socket ID? We don't use signed port IDs. To me, unsigned seems the way to go. (A minor detail: With unsigned we can use the entire range of values minus one (for the magic "any" value), whereas with signed we can only use the positive range of values. This detail is completely irrelevant when using 32 bit for socket ID, but could be relevant if using fewer bits.)
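
The range argument can be checked with a small worked example for a
hypothetical 8-bit socket ID (the `rng_demo_*` names are invented for
illustration): an unsigned field that reserves its all-ones value for "any"
leaves 255 usable IDs, while a signed field that reserves -1 leaves only the
128 non-negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 8-bit ID with UINT8_MAX reserved as "any":
 * usable IDs are 0 .. UINT8_MAX - 1, which is UINT8_MAX values. */
unsigned int rng_demo_usable_ids_unsigned8(void)
{
	return (unsigned int)UINT8_MAX;
}

/* Signed 8-bit ID with -1 reserved as "any":
 * usable IDs are 0 .. INT8_MAX, which is INT8_MAX + 1 values. */
unsigned int rng_demo_usable_ids_signed8(void)
{
	return (unsigned int)INT8_MAX + 1u;
}

/* Self-check of the two counts. */
void rng_demo_self_test(void)
{
	assert(rng_demo_usable_ids_unsigned8() == 255u);
	assert(rng_demo_usable_ids_signed8() == 128u);
}
```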

Also, we don't need 32 bit for socket ID. 8 or 16 bit should suffice, like port ID. But reducing from 32 bit would probably cause major ABI breakage.

> 
> Another thing to check is the name of the variable.
> It should be a socket ID when talking about CPU,
> and a NUMA node ID when talking about memory.
> 
> And last but not least,
> how can we keep ABI compatibility?
> I hope we can use function versioning to avoid deprecation and
> breaking.
> 
> Trials and suggestions are welcome.

Signedness is not the only problem with the socket ID. The meaning of SOCKET_ID_ANY is excessively overloaded. If we want to clean this up, we should consider the need for another magic value SOCKET_ID_NONE for devices connected to the chipset, as discussed in this other email thread [1]. And as discussed there, there are also size problems, because some device structures use 8 bit to hold the socket ID.
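
As an illustration of how such a size problem could show up (an assumption
about the failure mode, not a description of any specific DPDK structure):
if a hypothetical device structure stores the socket ID in an unsigned
8-bit field, then -1 is stored as 255 and no longer compares equal to the
"any" value when read back as an int.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the "any socket" value (-1 per this thread). */
#define TRUNC_SOCKET_ID_ANY (-1)

/* Hypothetical device structures with 8-bit socket ID fields. */
struct trunc_demo_dev_u8 { uint8_t socket_id; };
struct trunc_demo_dev_s8 { int8_t socket_id; };

/* Store in an unsigned 8-bit field and read back as int. */
int trunc_demo_roundtrip_u8(int socket_id)
{
	struct trunc_demo_dev_u8 dev;

	dev.socket_id = (uint8_t)socket_id; /* -1 becomes 255 */
	return dev.socket_id;               /* promoted to int: 255 */
}

/* Store in a signed 8-bit field and read back as int. */
int trunc_demo_roundtrip_s8(int socket_id)
{
	struct trunc_demo_dev_s8 dev;

	dev.socket_id = (int8_t)socket_id;  /* -1 stays -1 */
	return dev.socket_id;               /* sign-extended to int: -1 */
}

/* Self-check: only the signed field preserves the "any" value. */
void trunc_demo_self_test(void)
{
	assert(trunc_demo_roundtrip_u8(TRUNC_SOCKET_ID_ANY) == 255);
	assert(trunc_demo_roundtrip_s8(TRUNC_SOCKET_ID_ANY) == -1);
}
```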

And functions should always return -1, never SOCKET_ID_ANY, to indicate error.

[1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smartserver.smartshare.dk/

I only bring warnings and complications to the discussion here, no solutions. Sorry! :-(



* Re: Sign changes through function signatures
  2023-02-02 21:26     ` Morten Brørup
@ 2023-02-03 12:05       ` Bruce Richardson
  2023-02-03 22:12         ` Tyler Retzlaff
  0 siblings, 1 reply; 8+ messages in thread
From: Bruce Richardson @ 2023-02-03 12:05 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Thomas Monjalon, Ben Magistro, Tyler Retzlaff, Olivier Matz,
	ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

On Thu, Feb 02, 2023 at 10:26:48PM +0100, Morten Brørup wrote:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Thursday, 2 February 2023 21.45
> > 
> > I agree it should be cleaned up.
> > Those IDs should accept negative values.
> > Not sure which type we should choose (int, int32_t, or a typedef).
> 
> Why would we use a signed socket ID? We don't use signed port IDs. To me, unsigned seems the way to go. (A minor detail: With unsigned we can use the entire range of values minus one (for the magic "any" value), whereas with signed we can only use the positive range of values. This detail is completely irrelevant when using 32 bit for socket ID, but could be relevant if using fewer bits.)
> 
> Also, we don't need 32 bit for socket ID. 8 or 16 bit should suffice, like port ID. But reducing from 32 bit would probably cause major ABI breakage.
> 
> > 
> > Another thing to check is the name of the variable.
> > It should be a socket ID when talking about CPU,
> > and a NUMA node ID when talking about memory.
> > 
> > And last but not least,
> > how can we keep ABI compatibility?
> > I hope we can use function versioning to avoid deprecation and
> > breaking.
> > 
> > Trials and suggestions are welcome.
> 
> Signedness is not the only problem with the socket ID. The meaning of SOCKET_ID_ANY is excessively overloaded. If we want to clean this up, we should consider the need for another magic value SOCKET_ID_NONE for devices connected to the chipset, as discussed in this other email thread [1]. And as discussed there, there are also size problems, because some device structures use 8 bit to hold the socket ID.
> 
> And functions should always return -1, never SOCKET_ID_ANY, to indicate error.
> 
> [1]: http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D87684@smartserver.smartshare.dk/
> 
> I only bring warnings and complications to the discussion here, no solutions. Sorry! :-(
>

Personally, I think if we are going to change things, we should do things
properly, especially/even if we are going to have to break ABI or use ABI
compatibility.

I would suggest rather than a typedef, we should actually wrap the int
value in a struct - for two reasons:

* it means the compiler will actually error out for us if an int or
  unsigned int is used instead. This allows easier fixing at compile-time
  rather than hoping things are correctly specified in existing code.

* it allows us to do things like explicitly calling out flags, rather than
  just using magic values. While still keeping the size 32 bits, we can
  have the actual socket value as 16-bits and have flags to indicate:
  - ANY socket, NO socket, INVALID value socket. This could end up being
  useful in many cases, for example, when allocating memory we could
  specify a socket number with the ANY flag, indicating that any socket is
  ok, but we'd ideally prefer the number specified.
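
A minimal sketch of what such a wrapper could look like, assuming a 16-bit
ID plus 16 bits of flags.  Every name here (`struct flg_demo_socket`, the
`FLG_SOCKET_F_*` flags, `flg_demo_pick_node`) is hypothetical, and the
fallback policy is invented purely to show the "prefer this socket, but any
is ok" case from the second bullet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits replacing magic values. */
#define FLG_SOCKET_F_ANY     (1u << 0) /* any socket is acceptable */
#define FLG_SOCKET_F_NONE    (1u << 1) /* not attached to any socket */
#define FLG_SOCKET_F_INVALID (1u << 2) /* value is not valid */

/* Hypothetical wrapper: 16-bit ID + 16-bit flags, still 32 bits total. */
struct flg_demo_socket {
	uint16_t id;
	uint16_t flags;
};

_Static_assert(sizeof(struct flg_demo_socket) == 4, "wrapper should stay 32 bits");

/* Request exactly this socket. */
static inline struct flg_demo_socket flg_demo_exact(uint16_t id)
{
	struct flg_demo_socket s = { .id = id, .flags = 0 };
	return s;
}

/* Prefer this socket, but accept any other one. */
static inline struct flg_demo_socket flg_demo_prefer(uint16_t id)
{
	struct flg_demo_socket s = { .id = id, .flags = FLG_SOCKET_F_ANY };
	return s;
}

/* Allocator stand-in: returns the node chosen, or -1 on failure.
 * preferred_ok says whether the requested node can satisfy the request. */
int flg_demo_pick_node(struct flg_demo_socket s, int preferred_ok,
		uint16_t fallback)
{
	if (s.flags & FLG_SOCKET_F_INVALID)
		return -1;
	if (preferred_ok)
		return s.id;
	if (s.flags & FLG_SOCKET_F_ANY)
		return fallback;
	return -1;
}

/* Passing a bare int, e.g. flg_demo_pick_node(1, 1, 0), would not compile,
 * which is the first benefit listed above. */
```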

As for socket id, and numa id, I'm not sure we should have different
names/types for the two. For example, for PCI devices, do they need a third
type or are they associated with cores or with memory? The socket id for
the core only matters in terms of data locality, i.e. what memory or cache
location it is in. Therefore, for me, I'd pick one name and stick with it.

/Bruce


* Re: Sign changes through function signatures
  2023-02-03 12:05       ` Bruce Richardson
@ 2023-02-03 22:12         ` Tyler Retzlaff
  2023-02-04  8:09           ` Morten Brørup
  0 siblings, 1 reply; 8+ messages in thread
From: Tyler Retzlaff @ 2023-02-03 22:12 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Morten Brørup, Thomas Monjalon, Ben Magistro, Olivier Matz,
	ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

On Fri, Feb 03, 2023 at 12:05:04PM +0000, Bruce Richardson wrote:
> Personally, I think if we are going to change things, we should do things
> properly, especially/even if we are going to have to break ABI or use ABI
> compatibility.
> 
> I would suggest rather than a typedef, we should actually wrap the int
> value in a struct - for two reasons:

> 
> * it means the compiler will actually error out for us if an int or
>   unsigned int is used instead. This allows easier fixing at compile-time
>   rather than hoping things are correctly specified in existing code.
> 
> * it allows us to do things like explicitly calling out flags, rather than
>   just using magic values. While still keeping the size 32 bits, we can
>   have the actual socket value as 16-bits and have flags to indicate:
>   - ANY socket, NO socket, INVALID value socket. This could end up being
>   useful in many cases, for example, when allocating memory we could
>   specify a socket number with the ANY flag, indicating that any socket is
>   ok, but we'd ideally prefer the number specified.

i'm a fan of this where it makes sense. i did this with rte_thread_t for
exactly your first reason. but i did receive resistance from other
members of the community. personally i like compilation to fail when i
make a mistake.

it's definitely way easier to make the argument to do this when the
actual value is opaque. if it isn't i think then we need to provide
macro/inline accessors to allow applications to do whatever it is they do
with the value they carry.

i'll also note that this allows you a cheap way to sprinkle extra
integrity checking when running functional tests. if you have low
performance inline accessors you can do things like enforce the range of
values or that enumerations are part of a set for debug builds.
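
a minimal sketch of the accessor idea, assuming a struct-wrapped id.
`struct chk_demo_socket`, `CHK_MAX_SOCKETS` and the `chk_demo_*` accessors
are hypothetical names, and the limit of 8 is an arbitrary value chosen for
the example. the range checks use assert(), so they disappear when the code
is built with NDEBUG defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound on socket IDs, chosen only for this example. */
#define CHK_MAX_SOCKETS 8

/* Hypothetical opaque-ish wrapper around the raw value. */
struct chk_demo_socket {
	int32_t value;
};

/* Constructor: accepts -1 ("any") or 0 .. CHK_MAX_SOCKETS - 1.
 * The check is active in debug builds and compiled out with NDEBUG. */
static inline struct chk_demo_socket chk_demo_make(int32_t value)
{
	assert(value >= -1 && value < CHK_MAX_SOCKETS);
	struct chk_demo_socket s = { value };
	return s;
}

/* Accessor: re-validates on the way out, then hands back the raw value. */
static inline int32_t chk_demo_value(struct chk_demo_socket s)
{
	assert(s.value >= -1 && s.value < CHK_MAX_SOCKETS);
	return s.value;
}
```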

as an aside i would also caution while i suggested a typedef i don't mean
that everything should be typedef'd especially actual structs that are
used like structs. typedefs for things like socket id would
unquestionably convey more information and implied semantics to the user
of an api than just a standard `int' or whatever. consequently i have found
that this lowers mistakes with the use of the api.

> 
> As for socket id, and numa id, I'm not sure we should have different
> names/types for the two. For example, for PCI devices, do they need a third
> type or are they associated with cores or with memory? The socket id for
> the core only matters in terms of data locality, i.e. what memory or cache
> location it is in. Therefore, for me, I'd pick one name and stick with it.

i think the choice for more than one type vs one type is whether or not
they are "the same" number space as opposed to just coincidentally
overlapping number spaces.

> 
> /Bruce


* RE: Sign changes through function signatures
  2023-02-03 22:12         ` Tyler Retzlaff
@ 2023-02-04  8:09           ` Morten Brørup
  2023-02-06 15:57             ` Ben Magistro
  0 siblings, 1 reply; 8+ messages in thread
From: Morten Brørup @ 2023-02-04  8:09 UTC (permalink / raw)
  To: Tyler Retzlaff, Bruce Richardson
  Cc: Thomas Monjalon, Ben Magistro, Olivier Matz, ferruh.yigit,
	andrew.rybchenko, ben.magistro, dev, Stefan Baranoff,
	david.marchand, anatoly.burakov

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Friday, 3 February 2023 23.13
> 
> On Fri, Feb 03, 2023 at 12:05:04PM +0000, Bruce Richardson wrote:
> > Personally, I think if we are going to change things, we should do things
> > properly, especially/even if we are going to have to break ABI or use ABI
> > compatibility.
> >
> > I would suggest rather than a typedef, we should actually wrap the int
> > value in a struct - for two reasons:
> >
> > * it means the compiler will actually error out for us if an int or
> >   unsigned int is used instead. This allows easier fixing at compile-time
> >   rather than hoping things are correctly specified in existing code.
> >
> > * it allows us to do things like explicitly calling out flags, rather than
> >   just using magic values. While still keeping the size 32 bits, we can
> >   have the actual socket value as 16-bits and have flags to indicate:
> >   - ANY socket, NO socket, INVALID value socket. This could end up being
> >   useful in many cases, for example, when allocating memory we could
> >   specify a socket number with the ANY flag, indicating that any socket is
> >   ok, but we'd ideally prefer the number specified.
> 
> i'm a fan of this where it makes sense. i did this with rte_thread_t for
> exactly your first reason. but i did receive resistance from other
> members of the community. personally i like compilation to fail when i
> make a mistake.
>
> it's definitely way easier to make the argument to do this when the
> actual value is opaque. if it isn't i think then we need to provide
> macro/inline accessors to allow applications to do whatever it is they do
> with the value they carry.
>
> i'll also note that this allows you a cheap way to sprinkle extra
> integrity checking when running functional tests. if you have low
> performance inline accessors you can do things like enforce the range of
> values or that enumerations are part of a set for debug builds.
>
> as an aside i would also caution while i suggested a typedef i don't mean
> that everything should be typedef'd especially actual structs that are
> used like structs. typedefs for things like socket id would
> unquestionably convey more information and implied semantics to the user
> of an api than just a standard `int' or whatever. consequently i have found
> that this lowers mistakes with the use of the api.

Hiding the socket_id in a typedef'd structure seems like shooting sparrows with cannons.

DPDK is using a C coding style, where there is a convention for not using typedefs:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#typedefs

In the thread case, a typedef made sense because the underlying type can differ across operating systems and thus should be opaque, which is in line with the coding style.

But I don't think this is the case for socket_id. The socket_id is an enumeration type, and all we need is a magic number for the "chipset" pseudo-socket. And with that, perhaps some iterator macros to include/omit this pseudo-socket, like the lcore_id iterators with and without the main lcore.
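
As a rough illustration of the magic-number-plus-iterators idea, here is a minimal sketch in C. Everything in it is hypothetical: MAX_SOCKETS, SOCKET_ID_CHIPSET and the two SOCKET_FOREACH macros are names invented for this example and are not DPDK API, and placing the pseudo-socket one past the last real socket is only an assumption.

```c
#include <assert.h>

/* Hypothetical limit on real CPU sockets; not a DPDK constant. */
#define MAX_SOCKETS 4u
/* Hypothetical magic number for the "chipset" pseudo-socket:
 * one past the last real socket, so the ID can stay unsigned. */
#define SOCKET_ID_CHIPSET MAX_SOCKETS

/* Visit real sockets only (0 .. MAX_SOCKETS - 1). */
#define SOCKET_FOREACH(i) \
	for ((i) = 0; (i) < MAX_SOCKETS; (i)++)

/* Visit the real sockets and then the chipset pseudo-socket. */
#define SOCKET_FOREACH_WITH_CHIPSET(i) \
	for ((i) = 0; (i) <= SOCKET_ID_CHIPSET; (i)++)

/* Count how many IDs the selected iterator visits. */
static unsigned int count_sockets(int with_chipset)
{
	unsigned int i, n = 0;

	if (with_chipset) {
		SOCKET_FOREACH_WITH_CHIPSET(i)
			n++;
	} else {
		SOCKET_FOREACH(i)
			n++;
	}
	return n;
}

/* Usage example: the pseudo-socket adds exactly one extra iteration. */
static void socket_iter_example(void)
{
	assert(count_sockets(0) == 4);
	assert(count_sockets(1) == 5);
}
```

With this layout, code that only cares about CPU sockets never sees the pseudo-socket unless it explicitly asks for it.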

The mix of signed and unsigned in function signatures (and in the definition of SOCKET_ID_ANY) is pure sloppiness. This problem may also be present in other function signatures; we just happened to run into it for the socket_id.

The compiler has flags to warn about mixing signed and unsigned types (e.g. -Wsign-conversion in GCC and Clang), so we could enable them to reveal and fix those bugs.
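
To show what such a warning would catch, here is a minimal, self-contained sketch of the call path described at the top of the thread. The three functions are hypothetical stand-ins, not the real rte_eth/i40e/rte_malloc signatures; only the value -1 for SOCKET_ID_ANY is taken from the thread. Building it with -Wsign-conversion should flag both marked conversions.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for DPDK's SOCKET_ID_ANY, which this thread gives as -1. */
#define SOCKET_ID_ANY (-1)

/* Converting -1 to unsigned is well defined: it becomes UINT_MAX. */
static_assert((unsigned int)SOCKET_ID_ANY == UINT_MAX,
	      "-1 wraps to UINT_MAX when converted to unsigned");

/* Hypothetical allocator-like function with a signed socket parameter. */
static int inner_alloc(int socket)
{
	return socket;
}

/* Hypothetical driver hook with an unsigned socket parameter. */
static int driver_queue_setup(unsigned int socket_id)
{
	/* unsigned -> int: implicit sign change, warned by -Wsign-conversion */
	return inner_alloc(socket_id);
}

/* Hypothetical public API with an unsigned socket parameter. */
static int api_queue_setup(unsigned int socket_id)
{
	return driver_queue_setup(socket_id);
}

static void sign_roundtrip_example(void)
{
	/* int -> unsigned: implicit sign change, also warned */
	int seen = api_queue_setup(SOCKET_ID_ANY);

	/* The unsigned -> int step is implementation-defined before C23;
	 * GCC and Clang wrap modulo 2^N, so -1 survives the round trip. */
	assert(seen == SOCKET_ID_ANY);
}
```

The value only arrives intact because of how common compilers define the out-of-range conversion, which is exactly the kind of accident the warning flag makes visible.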

> 
> >
> > As for socket id, and numa id, I'm not sure we should have different
> > names/types for the two. For example, for PCI devices, do they need a third
> > type or are they associated with cores or with memory? The socket id for
> > the core only matters in terms of data locality, i.e. what memory or cache
> > location it is in. Therefore, for me, I'd pick one name and stick with it.
> 
> i think the choice for more than one type vs one type is whether or not
> they are "the same" number space as opposed to just coincidentally
> overlapping number spaces.
> 
> >
> > /Bruce


* Re: Sign changes through function signatures
  2023-02-04  8:09           ` Morten Brørup
@ 2023-02-06 15:57             ` Ben Magistro
  0 siblings, 0 replies; 8+ messages in thread
From: Ben Magistro @ 2023-02-06 15:57 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Tyler Retzlaff, Bruce Richardson, Thomas Monjalon, Olivier Matz,
	ferruh.yigit, andrew.rybchenko, ben.magistro, dev,
	Stefan Baranoff, david.marchand, anatoly.burakov

I'm a fan of "just rip the bandaid off" (especially when it's convenient
for me, however it's very possible I will also be the person to bring up
backwards compatibility).  Speaking of backwards compatibility, API/ABI
breakage was semi-recently discussed at the techboard [1].  From the notes
it was not clear to me what level of breakage is going to be acceptable
going forward.  This same question seems likely to apply to discussion
around specifying the C standard [2], though it is potentially less impactful
based on the recent discussion.  I am also beginning to see how a "-ng"
project happens to simplify making many large breaking changes.

To try and add my thoughts here: for practical use, I don't believe a
socket or core ID should ever be negative.  I don't believe you should need
more than 4 bits for socket id (personally only aware of 4-socket system
boards), but if we are saying we keep the same memory space (32 bits) there
is no practical reason not to allocate 8 bits, which gives you the remaining
24 bits for flags.  In this thread, we've already identified three(?) that
seem useful: flag_unset (possibly regretting this, but I can see this flag
being overloaded to indicate both unset and error setting), flag_any_okay,
and flag_none (not entirely clear to me yet how this would be used
differently than any_okay).  To me this really sounds like a struct makes
sense to manage the value + flags associated with it as a unit.
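
As a rough sketch of what such a value + flags unit could look like (all names here are hypothetical and not DPDK API; the 8-bit ID / 24-bit flag split follows the numbers above, and packing into a single uint32_t with inline accessors is just one possible layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits (24 available); names follow the email, not DPDK. */
#define SOCK_FLAG_UNSET    (UINT32_C(1) << 0) /* no socket assigned, or error */
#define SOCK_FLAG_ANY_OKAY (UINT32_C(1) << 1) /* ID is only a preference */
#define SOCK_FLAG_NONE     (UINT32_C(1) << 2) /* not attached to any socket */

/* Hypothetical wrapper: low 8 bits hold the ID, high 24 bits hold flags. */
struct sock_id {
	uint32_t raw;
};
static_assert(sizeof(struct sock_id) == sizeof(uint32_t),
	      "wrapper must stay 32 bits wide");

static inline struct sock_id sock_id_make(uint8_t id, uint32_t flags)
{
	struct sock_id s = { ((flags & UINT32_C(0xFFFFFF)) << 8) | id };
	return s;
}

static inline uint8_t sock_id_value(struct sock_id s)
{
	return (uint8_t)(s.raw & 0xFFu);
}

static inline bool sock_id_has(struct sock_id s, uint32_t flag)
{
	return ((s.raw >> 8) & flag) != 0;
}

/* Taking the wrapper as a parameter rejects a bare int at compile time. */
static inline bool sock_id_usable(struct sock_id s)
{
	return !sock_id_has(s, SOCK_FLAG_UNSET | SOCK_FLAG_NONE);
}

static void sock_id_example(void)
{
	/* "Prefer socket 1, but any socket is acceptable." */
	struct sock_id s = sock_id_make(1, SOCK_FLAG_ANY_OKAY);

	assert(sock_id_value(s) == 1);
	assert(sock_id_has(s, SOCK_FLAG_ANY_OKAY));
	assert(sock_id_usable(s));
	assert(!sock_id_usable(sock_id_make(0, SOCK_FLAG_UNSET)));
}
```

Because callers can only build and read the value through the accessors, a debug build could add range checks there without touching any call site.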

We are now venturing into areas I know I don't have enough knowledge about
to speak authoritatively on.  On the aspect of numa id and socket id, I had
to look this up, but it appears that one socket can have more than one numa
id (AMD Threadripper) associated with it.  I don't have easy access to an
AMD system I can run `lscpu` on to provide a sample/confirm.  I am also
not sure what, if any, implications there are for how it is used within this
code base.  From a practical standpoint, I believe memory is still associated
with a socket, so numa and socket may be able to be used interchangeably
for this purpose, in which case I agree: pick one and standardize on that
term/language throughout the code base, possibly adding a note for future
developers/users.

When talking about core id, I believe we need to utilize at least 16 bits of
space, as we can have systems with dual AMD 64C/128T, which I believe should
show as cores 0-255 today (that already fills an 8-bit range, leaving no room
for magic values).  I have not looked at that aspect of the code
but see it as closely related to the socket discussion.  If making changes
to one, it is probably worth reviewing the other at the same time.  Very
quickly looking at rte_lcore [3], it seems like we either have a model that
should be followed for sockets (as suggested by Morten) or another case
where a struct may also make more sense to wrap a value and provide flags
versus magic values.

Going back to the ABI/API breakage question...  When quickly looking at the
API today, we have a number of functions that return negative values to
indicate errors.  Using references and structs may simplify that to the
point of return == 0 on success and < 0 on error, possibly with no need to
utilize rte_errno for these functions, which would at least allow for
following the existing model/pattern.  I am probably oversimplifying this
aspect.
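
A minimal sketch of that "0 on success, < 0 on error, result via reference" shape might look like the following. The table and the port_socket_get() function are invented for illustration and do not correspond to any existing DPDK call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port-to-socket table standing in for real device data. */
static const uint8_t port_socket[] = { 0, 0, 1, 1 };
#define NUM_PORTS (sizeof(port_socket) / sizeof(port_socket[0]))

/*
 * Hypothetical lookup: returns 0 and fills *socket_id on success,
 * or a negative errno value on failure (leaving *socket_id untouched).
 */
static int port_socket_get(unsigned int port, uint8_t *socket_id)
{
	if (socket_id == NULL)
		return -EINVAL;
	if (port >= NUM_PORTS)
		return -ENODEV;
	*socket_id = port_socket[port];
	return 0;
}

static void port_socket_example(void)
{
	uint8_t sock = 0xFF;

	assert(port_socket_get(2, &sock) == 0);
	assert(sock == 1);
	/* An out-of-range port reports an error and leaves sock alone. */
	assert(port_socket_get(9, &sock) == -ENODEV);
	assert(sock == 1);
	assert(port_socket_get(0, NULL) == -EINVAL);
}
```

The return value carries only status, so the socket ID itself can stay unsigned and needs no reserved error value.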

I will say, in the case of TLDK, I've had to increase the return size of
some functions to int64_t to allow the return of the maximum value on
success and support returning a negative value on error.  Without looking,
I don't remember if that was in one of our wrappers, internal code, or
public APIs.  Regardless of where it actually is, I did not like this, as
there are functions that expect a uint32_t, so casts or warning suppression
may still be required in the code base.
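
For illustration only, the widened-return pattern and the cast it leaves behind can be sketched like this. The function names are hypothetical and this is not taken from TLDK or DPDK.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical function: a count up to UINT32_MAX on success,
 * or a negative errno value on error, hence the int64_t return. */
static int64_t bytes_pending(int valid_handle)
{
	if (!valid_handle)
		return -EBADF;
	return (int64_t)UINT32_MAX;
}

/* Hypothetical consumer that only accepts uint32_t. */
static uint32_t consume(uint32_t len)
{
	return len;
}

static void widened_return_example(void)
{
	int64_t rc = bytes_pending(1);

	assert(rc >= 0);
	/* The explicit narrowing cast is still needed at the call site. */
	assert(consume((uint32_t)rc) == UINT32_MAX);
	assert(bytes_pending(0) == -EBADF);
}
```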

1) http://mails.dpdk.org/archives/dev/2023-January/259811.html
2) http://mails.dpdk.org/archives/dev/2023-February/261097.html
3)
https://doc.dpdk.org/api/rte__lcore_8h.html#acbf23499dc0b2d223e4d311ad5f1b04e



end of thread, other threads:[~2023-02-06 15:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
2023-02-02 19:23 Sign changes through function signatures Ben Magistro
2023-02-02 20:26 ` Tyler Retzlaff
2023-02-02 20:45   ` Thomas Monjalon
2023-02-02 21:26     ` Morten Brørup
2023-02-03 12:05       ` Bruce Richardson
2023-02-03 22:12         ` Tyler Retzlaff
2023-02-04  8:09           ` Morten Brørup
2023-02-06 15:57             ` Ben Magistro
