DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Etelson, Gregory" <getelson@nvidia.com>
To: "Van Haaren, Harry" <harry.van.haaren@intel.com>
Cc: Gregory Etelson <getelson@nvidia.com>,
	 "Richardson, Bruce" <bruce.richardson@intel.com>,
	 "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [RFC] add Rust API for basic port operations
Date: Thu, 17 Apr 2025 19:19:21 +0300 (IDT)	[thread overview]
Message-ID: <6de4c566-080a-4d9f-733a-7604d30c53a0@nvidia.com> (raw)
In-Reply-To: <PH8PR11MB680361C556B7E2164363C5E3D7BC2@PH8PR11MB6803.namprd11.prod.outlook.com>

Hello Harry,

>
> Thanks for posting this RFC. I'll share my API RFC in the coming days too, for now
> some feedback (which is already integrated/existing in the API I've been working on)
>
> Bigger picture; the APIs below implement "DPDK C API thinking" in Rust. I'd like to rework the APIs to be "DPDK with Rust Ergonomics" instead:
>
> Example:
> 1) DpdkPortConf::new_from(0, c_dev_conf_struct, c_rx_conf_struct, c_tx_conf_struct, 8, 8, 512, 512, mempool (or None));
>    - the API here is not use case focussed. The values passed in are documented, but not easy to explore/understand.
>
> 2) let ports = dpdk::get_ports(); // returns a Vec<DpdkPort>
>   let p = ports.get(0);
>   p.rxqs(8, rx_mempool)?;
>   p.rx_crc_strip(true);
>   p.set_rss_hash(&[0,1,2,3..N])?;
>
>   // Notes: configure RX values: note the type-safe, and english-syntax functions.
>   // These functions prompt in modern IDEs, making "exploring" APIs easy/fast/intuitive (struct member bit-fields are not as easy to explore)
>   // The "rxqs()" function takes the parameters it needs (u16 count, and a Arc<Dpdk::mempool> object)
>   //   - this avoids exposing an "Option<Arc<Mempool>>" publically, hiding impl detail, giving better ergonomics
>   // Notice the "?" operator, this can be used "inline" to early-return and error (e.g. rss_hash array has incorrect length)
>

Agree.
That approach is much better.

>
> The ability to error check individual functions/APIs allow better error reporting to the user,
> specifically at the point where the error occurred. As these are not datapath APIs, allocations and
> overhead in the failure case should not be a problem - user-experience is the goal:
>
> 1) on error: -EINVAL returned from rte_eth_rxq_configure(), difficult to see what parameter caused the -EINVAL.
> 2) on error: "rss_hash has invalid lenght of 42" (or whatever we make the Debug/std::error::Error impl say)
>
> To summarize, "use-case focused APIs" are easier to read, debug, and maintain.
> The work here is to design/build these Rust ergonomic APIs: it takes DPDK knowledge, and some Rust API
> best-practices to figure these out. There's probably a "playbook" of like 3 rules/strategies, which will
> cover 90%+ of the API design.
>
> Hopefully this email is a starting point; some resources for future reader's reference;
> - https://burntsushi.net/rust-error-handling/
> - https://corrode.dev/blog/idiomatic-rust-resources/ (look for "error-handling" tag)
>

I'll definitely check these out

>
>> ```rust
>> /// Configuration details for a DPDK port.
>> ///
>> /// # Overview
>> ///
>> /// `DpdkPortConf` is used to initialize and configure ports in a DPDK environment. It includes:
>> /// - Device information (`dev_info`) pulled from DPDK.
>> /// - Transmit and receive configurations, including queue settings and memory pools.
>> /// - Details about the number of queues and descriptors for transmission and reception.
>> ///
>> /// This struct is typically instantiated using the [`DpdkPortConf::new_from`] method.
>> ///
>> /// # Example
>> ///
>> /// ```
>> /// let conf = DpdkPortConf::new_from(
>> ///     port_id,
>> ///     dev_conf,
>> ///     tx_conf,
>> ///     rx_conf,
>> ///     8,               // Number of RX queues
>> ///     8,               // Number of TX queues
>> ///     512,             // RX descriptors
>> ///     512,             // TX descriptors
>> ///     0,               // RX queue socket ID
>> ///     0,               // TX queue socket ID
>> ///     Some(rx_mempool) // RX queue mempool
>> /// ).unwrap();
>
> A few observations; The DpdkPortConf itself is a Rust struct, but internally it exposes a number
> of "pub" fields (rte_eth_dev_info, _eth_conf, rte_eth_rxconf, ...), which are directly bindgen-ed
> from the DPDK C API. While this works, it will not provide the ergonomics we'd like for Rust code.
>
> For example, the eth_dev_info->driver_name is a CString (aka, const char *), which needs to be
> converted to a safe Rust CString or CStr (see https://doc.rust-lang.org/beta/std/ffi/struct.CStr.html#method.from_ptr)
>
> A big value add in Rust development is the automatic "Debug" implementations, allowing one to just
>        println!("{dev_info:?}");
> a struct, and that the compiler is able to identify the contents, and present them automatically.
> That will not work if we have C structures with pointers, resulting in bad debug/logging for users.
>

What about explicitly implementing Debug for types that are referenced as pointers ?
For example:

```rust
#[derive(Copy, Clone)]
struct MbufPtr(*const rte_mbuf);

impl std::fmt::Debug for MbufPtr {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
         if self.0.is_null() {
             return write!(f, "MbufPtr(null)");
         }

         let mbuf = unsafe { &*self.0 };
         let data_off = unsafe { mbuf.__bindgen_anon_1.__bindgen_anon_1.data_off };
         let pkt_len = unsafe { mbuf.__bindgen_anon_2.__bindgen_anon_1.pkt_len };

         write!(f, "MbufPtr {{ address: {:p}, data_offset: {}, packet_length: {} }}",
             self.0,
             data_off,
             pkt_len
         )
     }
}
```

The activation is like this:
```rust
let mbuf_ptr: *const rte_mbuf = PTR;
println!("{:?}", MbufPtr(mbuf_ptr));
```

Output:
```
MbufPtr { address: 0x1033af140, data_offset: 128, packet_length: 62 }
```

>
>> ///
>> #[derive(Clone)]
>> pub struct DpdkPortConf {
>>     /// Information about the DPDK Ethernet device (e.g., driver, capabilities, etc.).
>>     pub dev_info: rte_eth_dev_info,
>>
>>     /// Configuration for the Ethernet device. Determines the overall behavior of the device.
>>     pub dev_conf: rte_eth_conf,
>>
>>     /// Configuration for transmitting (Tx) packets on the Ethernet device.
>>     pub tx_conf: rte_eth_txconf,
>>
>>     /// Configuration for receiving (Rx) packets on the Ethernet device.
>>     pub rx_conf: rte_eth_rxconf,
>>
>>     /// Number of receive (Rx) queues configured on this port.
>>     pub rxq_num: u16,
>>
>>     /// Number of transmit (Tx) queues configured on this port.
>>     pub txq_num: u16,
>>
>>     /// Number of descriptors for each transmit (Tx) queue.
>>     /// Descriptors represent items in a queue to handle packets.
>>     pub tx_desc_num: u16,
>>
>>     /// Number of descriptors for each receive (Rx) queue.
>>     pub rx_desc_num: u16,
>>
>>     /// NUMA socket ID associated with the memory used by receive (Rx) queues.
>>     pub rxq_socket_id: u32,
>>
>>     /// NUMA socket ID associated with the memory used by transmit (Tx) queues.
>>     pub txq_socket_id: u32,
>>
>>     /// Memory pool associated with receive (Rx) queues.
>>     /// This manages the buffers used for storing incoming packets.
>>     pub rxq_mempool: Option<Arc<DpdkMempool>>,
>> }
>
> Any reaons that rxq_mempool has an Option<>..? Is it for TX-only port configurations?
> That's the only thing I could imagine is a cause. See above API where RX and TX are
> configured independently - this avoids passing an "RX mempool" at all, if only TX is configured!
>

Right. The Option<> is for Tx only configuration.

>
>> /// A trait defining basic operations for managing and interacting with a DPDK port.
>> ///
>> /// # Overview
>> ///
>> /// The `DpdkPort` trait standardizes how to operate on DPDK ports, making it possible to:
>> /// - Configure the Ethernet device using [`configure`].
>> /// - Start the device using [`start`].
>> /// - Handle Rx and Tx bursts of packets using [`rx_burst`] and [`tx_burst`].
>> ///
>> pub trait DpdkPort: Send + Sync {
>
> This trait provides a very C API centric view. Unfortunately, that view does not include the
> Rust API semantics we need for RXQ and TXQ being Send/!Sync (see https://youtu.be/lb6xn2xQ-NQ?t=802)
>
> Example of INCORRECT usage of this RFC API (that Rust compiler cannot help with)
> (refer to https://github.com/getelson-at-mellanox/rdpdk/blob/main/port/mlx5/mlx5_port.rs#L30 for port creation)
> thread 1:
>   let port = Mlx5::from(0, &port_conf);
>   let pkts = &mut [*mut rte_mbuf; 32];
>   port.rx_burst(0, &mut pkts); // segfault simulateous access on (port/queue combo)
>
> thread 2:
>   let port = Mlx5::from(0, &port_conf);
>   let pkts = &mut [*mut rte_mbuf; 32];
>   port.rx_burst(0, &mut pkts); // segfault simulateous access on (port/queue combo)
>
> The problem here is that the user was able to create two "handles" to the same "port".
> And from that handle of a port, it was possible to call rx_burst(), (or tx_burst(), etc).
>
> The Rust compiler cannot identify that Mlx5::from(0) is not safe to call twice.
> My current preferred solution is to have a Dpdk::eth::get_ports() API, which returns
> all the port instances. This removes the user's ability to "create" ports at all, and hence
> it cannot be mis-used either.
>
> All in all, the method in which C DPDK APIs work (pass in integer ID for port/queue), is
> a very different to how safe Rust APIs will look when building in Send/Sync safety.
>
>

I see your point now.

>
>>     /// Returns the port ID of the DPDK port.
>>     ///
>>     /// # Return Value
>>     /// A `u16` that uniquely identifies the DPDK port.
>>     ///
>>     /// # Example
>>     /// ```
>>     /// let port_id = dpdk_port.port_id();
>>     /// println!("DPDK port ID: {}", port_id);
>>     /// ```
>>     fn port_id(&self) -> u16;
>>
>>     /// Returns a reference to the configuration object of the DPDK port.
>>     ///
>>     /// # Return Value
>>     /// A reference to [`DpdkPortConf`], which contains various settings like Rx/Tx queue configurations,
>>     /// memory pools, and NUMA socket IDs.
>>     ///
>>     /// # Example
>>     /// ```
>>     /// let port_config = dpdk_port.port_conf();
>>     /// println!("Rx queues: {}", port_config.rxq_num);
>>     /// ```
>>     fn port_conf(&self) -> &DpdkPortConf;
>>
>>     /// Configures the DPDK Ethernet device with the settings specified in the port configuration.
>>     ///
>>     /// This method is typically called before starting the port to ensure it is prepared for Rx and Tx operations.
>>     ///
>>     /// # Return Value
>>     /// - `Ok(())` if the configuration was applied successfully.
>>     /// - `Err(String)` with a descriptive error message if the configuration failed.
>>     ///
>>     /// # Example
>>     /// ```
>>     /// let result = dpdk_port.configure();
>>     /// if let Err(err) = result {
>>     ///     eprintln!("Failed to configure the port: {}", err);
>>     /// }
>>     /// ```
>>     fn configure(&mut self) -> Result<(), String>;
>>
>>     /// Starts the DPDK Ethernet device.
>>     ///
>>     /// This method initializes the Rx and Tx queues, making the port ready for data transmission
>>     /// and reception.
>>     ///
>>     /// # Return Value
>>     /// - `Ok(())` if the port was started successfully.
>>     /// - `Err(String)` if the startup process failed, with a descriptive error message.
>>     ///
>>     /// # Example
>>     /// ```
>>     /// let result = dpdk_port.start();
>>     /// if let Err(err) = result {
>>     ///     eprintln!("Failed to start the port: {}", err);
>>     /// }
>>     /// ```
>>     fn start(&mut self) -> Result<(), String>;
>>
>>     /// Receives a burst of packets on the specified Rx queue.
>>     ///
>>     /// # Parameters
>>     /// - `queue_id`: The ID of the Rx queue to receive packets from.
>>     /// - `pkts`: A mutable reference to an array of packet buffers (`*mut rte_mbuf`) where received packets
>>     ///   will be written.
>>     ///
>>     /// # Return Value
>>     /// - `Ok(u16)` containing the number of packets successfully received.
>>     /// - `Err(String)` if the operation failed.
>>     ///
>>     /// # Example
>>     /// ```
>>     /// let pkts: Vec<*mut rte_mbuf> = vec![std::ptr::null_mut(); 32];
>>     /// let received = dpdk_port.rx_burst(0, &pkts);
>>     /// match received {
>>     ///     Ok(count) => println!("Received {} packets", count),
>>     ///     Err(err) => eprintln!("Rx burst failed: {}", err),
>>     /// }
>>     /// ```
>>     fn rx_burst(&mut self, queue_id: u16, pkts: &[*mut rte_mbuf]) -> Result<u16, String>;
>>
>>     /// Sends a burst of packets on the specified Tx queue.
>>     ///
>>     /// # Parameters
>>     /// - `queue_id`: The ID of the Tx queue to send packets on.
>>     /// - `pkts`: A reference to an array of packet buffers (`*mut rte_mbuf`) to send.
>>     ///
>>     /// # Return Value
>>     /// - `Ok(u16)` containing the number of packets successfully sent.
>>     /// - `Err(String)` if the operation failed.
>>     ///
>>     /// # Example
>>     /// ```
>>     /// let pkts: Vec<*mut rte_mbuf> = vec![some_packet_ptr1, some_packet_ptr2];
>>     /// let sent = dpdk_port.tx_burst(0, &pkts);
>>     /// match sent {
>>     ///     Ok(count) => println!("Sent {} packets", count),
>>     ///     Err(err) => eprintln!("Tx burst failed: {}", err),
>>     /// }
>>     /// ```
>>     fn tx_burst(&mut self, queue_id: u16, pkts: &[*mut rte_mbuf]) -> Result<u16, String>;
>> }
>> ```
>
> Having a single trait with configure(), start() and rx_burst() makes it impossible to encode the
> Send/Sync attributes of the "Port" concept, and the "RxQueue" concept seperately.
> So perhaps a solution with two traits could work; I haven't investigated in detail yet.
>
>
>> The API uses raw FFI pointers in DpdkPort rx_burst() and tx_burst() to preserve DPDK performance.
>
> Cool, nice trick... but does it remove all virtual functions/function-pointer dispatch?
> - https://github.com/getelson-at-mellanox/rdpdk/blob/main/app/runpmd/runpmd.rs#L143
> This seems to accept a "&mut Vec<Box<dyn DpdkPort>>", which would mean the ".rx_burst()"
> is a virtual-function into the trait, and then a function-pointer into the PMD?
>

Mlx5 trait implementation calls PMD rx_burst() and tx_burst() functions directly.
This way I did not need to re-implement original DPDK static inlining for IO calls.
For that approach to work DPDK must be patched to expose PMD IO functions.
See 
https://github.com/getelson-at-mellanox/rdpdk/blob/main/dpdk-patches/0002-net-mlx5-refactor-Rx-Tx-functions-selection.patch


> It could be interesting to change "do_io" to a generic function, taking a generic DpdkPort instance? Then the first level
> trait-resolution would become compile-time (through Rust compiler monomorphisation of the generic do_io<Pmd: impl DpdkPort>() into a
> more specific  "do_io<Mlx5>()"  code?

For multi-hw setups that function declaration need to be `do_io<R: DpdkPort, T: DpkdPort>()`
But that will not compile for Rx-only calls.
Without generics the function works for any HW type and for all directions.

>
> I don't believe this to *actually* matter for performance. It might matter for an IO-forward, 0 packet data processing test.
> So micro-benchmark bragging, yes this has impact... but touch the packet data and it no longer matters!
>
>
>> The API implementation is here:
>>
>> API definition:
>> https://github.com/getelson-at-mellanox/rdpdk/blob/main/lib/port/port.rs
>>
>> API implementation with RTE function calls:
>> https://github.com/getelson-at-mellanox/rdpdk/blob/main/lib/port/raw_port.rs
>>
>> Implement direct calls to mlx5 Rx/Tx IO functions:
>> https://github.com/getelson-at-mellanox/rdpdk/blob/main/port/mlx5/mlx5_port.rs
>>
>>
>> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
>
> Thanks for sharing the links and RFC!
>
> I will share the RFC API that communicates the Send/Sync requirements to the Rust compiler for Ethdev Port Rxq as a sample,
> and we can try compare/converge the "trait DpdkPort" and the compile-time-thread-safe approaches into the best API?
>

== thumbs-up ==

Regards,
Gregory

      reply	other threads:[~2025-04-17 16:19 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-16 11:42 Gregory Etelson
2025-04-17 12:26 ` Van Haaren, Harry
2025-04-17 16:19   ` Etelson, Gregory [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6de4c566-080a-4d9f-733a-7604d30c53a0@nvidia.com \
    --to=getelson@nvidia.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=harry.van.haaren@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).