[PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq

DPDK patches and discussions

* [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
@ 2025-04-17 15:10 Harry van Haaren
  2025-04-17 18:58 ` Etelson, Gregory
  2025-04-18 13:23 ` [PATCH 1/3] " Harry van Haaren
  0 siblings, 2 replies; 20+ messages in thread
From: Harry van Haaren @ 2025-04-17 15:10 UTC
  To: dev; +Cc: getelson, bruce.richardson, owen.hilyard, Harry van Haaren

This patch is NOT to be considered for merge, it is a demo
of the Rust APIs for Ethdev. There is no actual implementation
of the APIs against the DPDK C functions, this is Rust API only.

To test/run the code (and uncomment things to see errors)
just apply this patch, cd "rust_api_example" and run
$ cargo run

This will compile the API, and spawn 2x threads to poll on
two Rxq instances. The comments in the code explain how the
"Send" and "Sync" traits are captured per instance of a
struct (e.g. how RxqHandle -> Rxq restricts thread movement).

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
---
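Note: the Send/!Send split described above can be sketched in isolation.
This is a minimal sketch, not the patch itself: the types are simplified
stand-ins, and port_queue(), assert_send() and poll_on_new_thread() are
hypothetical helpers added only for illustration. No DPDK call is made.

```rust
use std::marker::PhantomData;
use std::rc::Rc;
use std::thread;

// Simplified stand-in for the patch's RxqHandle: plain data, so it is Send.
#[derive(Debug)]
pub struct RxqHandle {
    port: u16,
    queue: u16,
}

// Simplified stand-in for the patch's Rxq.
pub struct Rxq {
    handle: RxqHandle,
    // Rc<()> is neither Send nor Sync, so this zero-sized marker makes
    // Rxq neither Send nor Sync as well.
    _not_send: PhantomData<Rc<()>>,
}

impl RxqHandle {
    pub fn new(port: u16, queue: u16) -> Self {
        RxqHandle { port, queue }
    }

    // Consumes the handle; from here on the queue is pinned to this thread.
    pub fn enable_polling(self) -> Rxq {
        Rxq {
            handle: self,
            _not_send: PhantomData,
        }
    }
}

impl Rxq {
    // Hypothetical accessor, used here instead of rx_burst().
    pub fn port_queue(&self) -> (u16, u16) {
        (self.handle.port, self.handle.queue)
    }
}

// Compiles only when T implements Send.
fn assert_send<T: Send>() {}

// Moves the handle to a new thread and enables polling there.
pub fn poll_on_new_thread(handle: RxqHandle) -> (u16, u16) {
    assert_send::<RxqHandle>(); // OK: the handle may cross threads
    // assert_send::<Rxq>();    // uncommenting this line fails to compile
    thread::spawn(move || {
        let rxq = handle.enable_polling();
        rxq.port_queue()
    })
    .join()
    .expect("polling thread panicked")
}
```

The marker costs nothing at runtime: PhantomData is zero-sized, so Rxq is
the same size as the RxqHandle it wraps.
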
 rust_api_example/Cargo.toml  |   6 ++
 rust_api_example/src/main.rs | 189 +++++++++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+)
 create mode 100644 rust_api_example/Cargo.toml
 create mode 100644 rust_api_example/src/main.rs

diff --git a/rust_api_example/Cargo.toml b/rust_api_example/Cargo.toml
new file mode 100644
index 0000000000..0137826340
--- /dev/null
+++ b/rust_api_example/Cargo.toml
@@ -0,0 +1,6 @@
+[package]
+name = "rust_api_example"
+version = "0.1.0"
+edition = "2021"
+
+[dependencies]
diff --git a/rust_api_example/src/main.rs b/rust_api_example/src/main.rs
new file mode 100644
index 0000000000..8d0de50c30
--- /dev/null
+++ b/rust_api_example/src/main.rs
@@ -0,0 +1,189 @@
+// Outline for safe DPDK API bindings
+//  - None of the APIs are actually implemented, this is API design only
+//  - This demo runs 2x threads on 2x Rxqs, and cannot accidentally poll incorrectly
+
+pub mod dpdk {
+    pub mod eth {
+        use super::Mempool;
+
+        #[derive(Debug)]
+        pub struct TxqHandle {/* todo: but same as Rxq */}
+
+        // Handle allows moving between threads, it's not polling!
+        #[derive(Debug)]
+        pub struct RxqHandle {
+            port: u16,
+            queue: u16,
+        }
+
+        impl RxqHandle {
+            pub(crate) fn new(port: u16, queue: u16) -> Self {
+                RxqHandle { port, queue }
+            }
+
+            // This function is the key to the API design: it ensures the rx_burst()
+            // function is only available via the Rxq struct, after enable_polling() has been called.
+            // It "consumes" (takes "self" as a parameter, not a '&' reference!) which essentially
+            // destroys/invalidates the handle from the Application level code.
+
+            // It returns an Rxq instance, which has the PhantomData to encode the threading requirements,
+            // and the Rxq has the rx_burst() function: this allows the application to receive packets.
+            pub fn enable_polling(self) -> Rxq {
+                Rxq {
+                    handle: self,
+                    _phantom: std::marker::PhantomData,
+                }
+            }
+        }
+
+        #[derive(Debug)]
+        pub struct Rxq {
+            handle: RxqHandle,
+            // This "PhantomData" tells the Rust compiler to pretend the Rc<()> is in this struct
+            // but in practice it is a Zero-Sized-Type, so takes up no space. It is a compile-time
+            // language technique to ensure the struct is not moved between threads. This encodes
+            // the API requirement "don't poll from multiple threads without synchronisation (e.g. Mutex)"
+            _phantom: std::marker::PhantomData<std::rc::Rc<()>>,
+        }
+
+        impl Rxq {
+            // TODO: datapath Error types should be lightweight, not String. Here we return ().
+            pub fn rx_burst(&mut self, _mbufs: &mut [u8]) -> Result<usize, ()> {
+                // TODO: Design the Mbuf struct wrapper, and how to best return a batch
+                //  e.g.: investigate "ArrayVec" crate for safe & fixed sized, stack allocated arrays
+                //
+                // There is work to do here, but I want to communicate the general DPDK/EAL/Eth/Rxq concepts
+                // now, this part is not done yet: it is likely the hardest/most performance critical.
+                //
+                // call rte_eth_rx_burst() here
+                println!(
+                    "[thread: {:?}] rx_burst: port {} queue {}",
+                    std::thread::current().id(),
+                    self.handle.port,
+                    self.handle.queue
+                );
+                Ok(0)
+            }
+        }
+
+        #[derive(Debug)]
+        pub struct Port {
+            id: u16,
+            rxqs: Vec<RxqHandle>,
+            txqs: Vec<TxqHandle>,
+        }
+
+        impl Port {
+            // pub(crate) here ensures outside this crate users cannot call this function
+            pub(crate) fn from_u16(id: u16) -> Self {
+                Port {
+                    id,
+                    rxqs: Vec::new(),
+                    txqs: Vec::new(),
+                }
+            }
+
+            pub fn rxqs(&mut self, rxq_count: u16, _mempool: Mempool) -> Result<(), String> {
+                for q in 0..rxq_count {
+                    // call rte_eth_rx_queue_setup() here
+                    self.rxqs.push(RxqHandle::new(self.id, q));
+                }
+                Ok(())
+            }
+
+            pub fn start(&mut self) -> (Vec<RxqHandle>, Vec<TxqHandle>) {
+                // call rte_eth_dev_start() here, then give ownership of Rxq/Txq to app
+                (
+                    std::mem::take(&mut self.rxqs),
+                    std::mem::take(&mut self.txqs),
+                )
+            }
+        }
+    }
+
+    #[derive(Debug, Clone)]
+    // Mempool is a long-life object, which many other DPDK things refer to (e.g. rxq config)
+    // Having a Rust lifetime attached to it (while technically correct) would complicate the
+    // code a LOT, and for little value. This is a tradeoff - happy to discuss more if we want.
+    // The choice here is to derive "Clone", allowing handing over multiple instances of the
+    // same Mempool, similar to how Arc<Mempool> would work, but without the reference counting.
+    pub struct Mempool {}
+
+    impl Mempool {
+        pub fn new(_size: usize) -> Self {
+            Self {}
+        }
+    }
+
+    #[derive(Debug)]
+    pub struct Eal {
+        eth_ports: Option<Vec<eth::Port>>,
+    }
+
+    impl Eal {
+        //  allow init once,
+        pub fn init() -> Result<Self, String> {
+            // EAL init() will do PCI probe and VDev enumeration will find/create eth ports.
+            // This code should loop over the ports, and build up Rust structs representing them
+            let eth_port = vec![eth::Port::from_u16(0)];
+            Ok(Eal {
+                eth_ports: Some(eth_port),
+            })
+        }
+
+        // API to get eth ports, taking ownership. It can be called once.
+        // The return will be None for future calls
+        pub fn take_eth_ports(&mut self) -> Option<Vec<eth::Port>> {
+            self.eth_ports.take()
+        }
+    }
+
+    impl Drop for Eal {
+        fn drop(&mut self) {
+            // todo: rte_eal_cleanup()
+        }
+    }
+} // DPDK mod
+
+fn main() {
+    let mut dpdk = dpdk::Eal::init().expect("dpdk must init ok");
+    let rx_mempool = dpdk::Mempool::new(4096);
+
+    let mut ports = dpdk.take_eth_ports().expect("take eth ports ok");
+    let mut p = ports.pop().unwrap();
+
+    p.rxqs(2, rx_mempool).expect("rxqs setup ok");
+    println!("{:?}", p);
+
+    let (mut rxqs, _txqs) = p.start();
+    println!("rxqs: {:?}", rxqs);
+
+    let rxq1 = rxqs.pop().unwrap();
+    let rxq2 = rxqs.pop().unwrap();
+
+    // spawn a new thread to use rxq1. This demonstrates that the RxqHandle
+    // type can move between threads - it is not tied to the thread that created it.
+    std::thread::spawn(move || {
+        // Uncomment this: it fails to compile!
+        //   - Rxq2 would be used by this newly-spawned thread
+        //     -- specifically the variable was "moved" into this thread
+        //   - it is also used below (by the main thread)
+        // "value used after move" is the error, on the below code
+        // let mut rxq = rxq2.enable_polling();
+
+        // see docs on enable_polling above to understand how the enable_polling()
+        // function helps to achieve the thread-safety-at-compile-time goal.
+        let mut rxq = rxq1.enable_polling();
+        loop {
+            let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
+            std::thread::sleep(std::time::Duration::from_millis(1000));
+        }
+    });
+
+    // main thread polling rxq2
+    let mut rxq = rxq2.enable_polling();
+    loop {
+        let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
+        std::thread::sleep(std::time::Duration::from_millis(1000));
+    }
+}
-- 
2.34.1



* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-04-17 15:10 [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq Harry van Haaren
@ 2025-04-17 18:58 ` Etelson, Gregory
  2025-04-18 11:40   ` Van Haaren, Harry
  2025-04-18 13:23 ` [PATCH 1/3] " Harry van Haaren
  1 sibling, 1 reply; 20+ messages in thread
From: Etelson, Gregory @ 2025-04-17 18:58 UTC
  To: Harry van Haaren; +Cc: dev, getelson, bruce.richardson, owen.hilyard

Hello Harry,

Thank you for sharing the API.
Please check out my comments below.

Regards,
Gregory

On Thu, 17 Apr 2025, Harry van Haaren wrote:

> This patch is NOT to be considered for merge, it is a demo
> of the Rust APIs for Ethdev. There is no actual implementation
> of the APIs against the DPDK C functions, this is Rust API only.
>
> To test/run the code (and uncomment things to see errors)
> just apply this patch, cd "rust_api_example" and run
> $ cargo run
>
> This will compile the API, and spawn 2x threads to poll on
> two Rxq instances. The comments in the code explain how the
> "Send" and "Sync" traits are captured per instance of a
> struct (e.g. how RxqHandle -> Rxq restricts thread movement).
>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> ---
> rust_api_example/Cargo.toml  |   6 ++
> rust_api_example/src/main.rs | 189 +++++++++++++++++++++++++++++++++++
> 2 files changed, 195 insertions(+)
> create mode 100644 rust_api_example/Cargo.toml
> create mode 100644 rust_api_example/src/main.rs
>
> diff --git a/rust_api_example/Cargo.toml b/rust_api_example/Cargo.toml
> new file mode 100644
> index 0000000000..0137826340
> --- /dev/null
> +++ b/rust_api_example/Cargo.toml
> @@ -0,0 +1,6 @@
> +[package]
> +name = "rust_api_example"
> +version = "0.1.0"
> +edition = "2021"
> +
> +[dependencies]
> diff --git a/rust_api_example/src/main.rs b/rust_api_example/src/main.rs
> new file mode 100644
> index 0000000000..8d0de50c30
> --- /dev/null
> +++ b/rust_api_example/src/main.rs
> @@ -0,0 +1,189 @@
> +// Outline for safe DPDK API bindings
> +//  - None of the APIs are actually implemented, this is API design only
> +//  - This demo runs 2x threads on 2x Rxqs, and cannot accidentally poll incorrectly
> +
> +pub mod dpdk {
> +    pub mod eth {
> +        use super::Mempool;
> +
> +        #[derive(Debug)]
> +        pub struct TxqHandle {/* todo: but same as Rxq */}
> +
> +        // Handle allows moving between threads, it's not polling!
> +        #[derive(Debug)]
> +        pub struct RxqHandle {
> +            port: u16,
> +            queue: u16,
> +        }
> +
> +        impl RxqHandle {
> +            pub(crate) fn new(port: u16, queue: u16) -> Self {
> +                RxqHandle { port, queue }
> +            }
> +
> +            // This function is the key to the API design: it ensures the rx_burst()
> +            // function is only available via the Rxq struct, after enable_polling() has been called.
> +            // It "consumes" (takes "self" as a parameter, not a '&' reference!) which essentially
> +            // destroys/invalidates the handle from the Application level code.
> +
> +            // It returns an Rxq instance, which has the PhantomData to encode the threading requirements,
> +            // and the Rxq has the rx_burst() function: this allows the application to receive packets.
> +            pub fn enable_polling(self) -> Rxq {
> +                Rxq {
> +                    handle: self,
> +                    _phantom: std::marker::PhantomData,
> +                }
> +            }
> +        }
> +
> +        #[derive(Debug)]
> +        pub struct Rxq {
> +            handle: RxqHandle,
> +            // This "PhantomData" tells the Rust compiler to pretend the Rc<()> is in this struct
> +            // but in practice it is a Zero-Sized-Type, so takes up no space. It is a compile-time
> +            // language technique to ensure the struct is not moved between threads. This encodes
> +            // the API requirement "don't poll from multiple threads without synchronisation (e.g. Mutex)"
> +            _phantom: std::marker::PhantomData<std::rc::Rc<()>>,
> +        }
> +
> +        impl Rxq {
> +            // TODO: datapath Error types should be lightweight, not String. Here we return ().
> +            pub fn rx_burst(&mut self, _mbufs: &mut [u8]) -> Result<usize, ()> {
> +                // TODO: Design the Mbuf struct wrapper, and how to best return a batch
> +                //  e.g.: investigate "ArrayVec" crate for safe & fixed sized, stack allocated arrays
> +                //
> +                // There is work to do here, but I want to communicate the general DPDK/EAL/Eth/Rxq concepts
> +                // now, this part is not done yet: it is likely the hardest/most performance critical.
> +                //
> +                // call rte_eth_rx_burst() here
> +                println!(
> +                    "[thread: {:?}] rx_burst: port {} queue {}",
> +                    std::thread::current().id(),
> +                    self.handle.port,
> +                    self.handle.queue
> +                );
> +                Ok(0)
> +            }
> +        }
> +
> +        #[derive(Debug)]
> +        pub struct Port {
> +            id: u16,
> +            rxqs: Vec<RxqHandle>,
> +            txqs: Vec<TxqHandle>,
> +        }
> +
> +        impl Port {
> +            // pub(crate) here ensures outside this crate users cannot call this function
> +            pub(crate) fn from_u16(id: u16) -> Self {
> +                Port {
> +                    id,
> +                    rxqs: Vec::new(),
> +                    txqs: Vec::new(),
> +                }
> +            }
> +
> +            pub fn rxqs(&mut self, rxq_count: u16, _mempool: Mempool) -> Result<(), String> {
> +                for q in 0..rxq_count {
> +                    // call rte_eth_rx_queue_setup() here
> +                    self.rxqs.push(RxqHandle::new(self.id, q));
> +                }
> +                Ok(())
> +            }
> +
> +            pub fn start(&mut self) -> (Vec<RxqHandle>, Vec<TxqHandle>) {
> +                // call rte_eth_dev_start() here, then give ownership of Rxq/Txq to app

After a call to Port::start, the Rx and Tx queues are detached from their port.
With that model, how can rte_eth_dev_stop() and a subsequent rte_eth_dev_start()
be implemented?

> +                (
> +                    std::mem::take(&mut self.rxqs),
> +                    std::mem::take(&mut self.txqs),
> +                )
> +            }
> +        }
> +    }
> +
> +    #[derive(Debug, Clone)]
> +    // Mempool is a long-life object, which many other DPDK things refer to (e.g. rxq config)
> +    // Having a Rust lifetime attached to it (while technically correct) would complicate the
> +    // code a LOT, and for little value. This is a tradeoff - happy to discuss more if we want.
> +    // The choice here is to derive "Clone", allowing handing over multiple instances of the
> +    // same Mempool, similar to how Arc<Mempool> would work, but without the reference counting.
> +    pub struct Mempool {}
> +
> +    impl Mempool {
> +        pub fn new(_size: usize) -> Self {
> +            Self {}
> +        }
> +    }
> +
> +    #[derive(Debug)]
> +    pub struct Eal {
> +        eth_ports: Option<Vec<eth::Port>>,
> +    }
> +
> +    impl Eal {
> +        //  allow init once,
> +        pub fn init() -> Result<Self, String> {
> +            // EAL init() will do PCI probe and VDev enumeration will find/create eth ports.
> +            // This code should loop over the ports, and build up Rust structs representing them
> +            let eth_port = vec![eth::Port::from_u16(0)];
> +            Ok(Eal {
> +                eth_ports: Some(eth_port),
> +            })
> +        }
> +
> +        // API to get eth ports, taking ownership. It can be called once.
> +        // The return will be None for future calls
> +        pub fn take_eth_ports(&mut self) -> Option<Vec<eth::Port>> {
> +            self.eth_ports.take()
> +        }
> +    }
> +
> +    impl Drop for Eal {
> +        fn drop(&mut self) {
> +            // todo: rte_eal_cleanup()
> +        }
> +    }
> +} // DPDK mod
> +
> +fn main() {
> +    let mut dpdk = dpdk::Eal::init().expect("dpdk must init ok");
> +    let rx_mempool = dpdk::Mempool::new(4096);
> +
> +    let mut ports = dpdk.take_eth_ports().expect("take eth ports ok");

Eal::take_eth_ports() resets EAL ports.
A call to rte_dev_probe() will either fail, because Eal::eth_ports is None,
or create another port-0, depending on implementation.

> +    let mut p = ports.pop().unwrap();
> +
> +    p.rxqs(2, rx_mempool).expect("rxqs setup ok");
> +    println!("{:?}", p);
> +
> +    let (mut rxqs, _txqs) = p.start();
> +    println!("rxqs: {:?}", rxqs);
> +
> +    let rxq1 = rxqs.pop().unwrap();
> +    let rxq2 = rxqs.pop().unwrap();
> +
> +    // spawn a new thread to use rxq1. This demonstrates that the RxqHandle
> +    // type can move between threads - it is not tied to the thread that created it.
> +    std::thread::spawn(move || {
> +        // Uncomment this: it fails to compile!
> +        //   - Rxq2 would be used by this newly-spawned thread
> +        //     -- specifically the variable was "moved" into this thread
> +        //   - it is also used below (by the main thread)
> +        // "value used after move" is the error, on the below code
> +        // let mut rxq = rxq2.enable_polling();
> +
> +        // see docs on enable_polling above to understand how the enable_polling()
> +        // function helps to achieve the thread-safety-at-compile-time goal.
> +        let mut rxq = rxq1.enable_polling();
> +        loop {
> +            let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
> +            std::thread::sleep(std::time::Duration::from_millis(1000));
> +        }
> +    });
> +
> +    // main thread polling rxq2
> +    let mut rxq = rxq2.enable_polling();
> +    loop {
> +        let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
> +        std::thread::sleep(std::time::Duration::from_millis(1000));
> +    }
> +}
> --
> 2.34.1
>
>


* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-04-17 18:58 ` Etelson, Gregory
@ 2025-04-18 11:40   ` Van Haaren, Harry
  2025-04-20  8:57     ` Gregory Etelson
  0 siblings, 1 reply; 20+ messages in thread
From: Van Haaren, Harry @ 2025-04-18 11:40 UTC
  To: Etelson, Gregory; +Cc: dev, Richardson, Bruce, owen.hilyard

> From: Etelson, Gregory
> Sent: Thursday, April 17, 2025 7:58 PM
> To: Van Haaren, Harry
> Cc: dev@dpdk.org; getelson@nvidia.com; Richardson, Bruce; owen.hilyard@unh.edu
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
> 
> Hello Harry,
> 
> Thank you for sharing the API.
> Please check out my comments below.

Thanks for reading & discussion!

<snip>

> > +
> > +            pub fn start(&mut self) -> (Vec<RxqHandle>, Vec<TxqHandle>) {
> > +                // call rte_eth_dev_start() here, then give ownership of Rxq/Txq to app
> 
> After a call to Port::start, the Rx and Tx queues are detached from their port.
> With that model, how can rte_eth_dev_stop() and a subsequent rte_eth_dev_start()
> be implemented?

Correct, the RxqHandle and TxqHandle don't have a "back reference" to the port.
There are a number of ways to ensure eth_dev_stop() cannot be called without the
Rxq/Txqs being "returned" to the Port instance first.

Eg: Use an Arc<T>. The port instance "owns" the Arc<T>, which means it is going to keep
   the Arc alive. Now give each Rxq/Txq a clone of this Arc. When the Drop impl of the
   Rxq/Txq runs, it will decrement the Arc. So just letting the Rxq/Txq go out of scope
   will be enough to have the Port understand that handle is now gone.

   The port itself can use Arc::into_inner function[1], which returns Option<T>. If the
   Some(T) is returned, then all instances of RxqHandle/TxqHandle have been dropped,
   meaning it is safe to eth_dev_stop(), as it is impossible to poll RXQs if there's no Rxq :)
   [1] https://doc.rust-lang.org/std/sync/struct.Arc.html#method.into_inner

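// Pseudo-code here (assumes handle_arc is stored as an Option<Arc<T>> field):
Dpdk::Eth::Port::stop(&mut self) -> Result<(), Error> {
    // Arc::into_inner() is an associated function that consumes the Arc, so take it
    // out of the field first. It returns Some(T) only if this was the last reference.
    let arc = self.handle_arc.take().ok_or("port is not started")?;
    if Arc::into_inner(arc).is_none() {
        return Err("an Rxq or Txq handle remains alive, cannot safely stop this port");
    }
    // TODO: call rte_eth_dev_stop() here
    Ok(())
}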

There are probably a few others, but that's the "idiomatic Rust" solution.
We'd have to pass the Arc from the RxqHandle into the Rxq instance itself too,
but that's fine.
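
The idea above can be sketched as compilable Rust. This is a minimal sketch
under stated assumptions, not the planned implementation: QueueToken and the
field names are hypothetical, no DPDK call is made, and it swaps in
Arc::try_unwrap() instead of Arc::into_inner() so that a failed stop() gets
its Arc back and can be retried once the handles are dropped.

```rust
use std::sync::Arc;

// Hypothetical token type: it carries no data, it only gets reference-counted.
#[derive(Debug)]
struct QueueToken;

// Each handle keeps one clone of the port's Arc alive until it is dropped.
pub struct RxqHandle {
    _port_alive: Arc<QueueToken>,
}

pub struct Port {
    token: Option<Arc<QueueToken>>,
    started: bool,
}

impl Port {
    pub fn new() -> Self {
        Port {
            token: Some(Arc::new(QueueToken)),
            started: false,
        }
    }

    pub fn start(&mut self, rxq_count: u16) -> Vec<RxqHandle> {
        let token = Arc::clone(self.token.as_ref().expect("token is always restored"));
        // rte_eth_dev_start() would be called here
        self.started = true;
        (0..rxq_count)
            .map(|_| RxqHandle {
                _port_alive: Arc::clone(&token),
            })
            .collect()
    }

    pub fn stop(&mut self) -> Result<(), &'static str> {
        let token = self.token.take().expect("token is always restored");
        match Arc::try_unwrap(token) {
            Ok(inner) => {
                // No handle is alive, so nothing can poll: safe to stop.
                // rte_eth_dev_stop() would be called here
                self.started = false;
                self.token = Some(Arc::new(inner));
                Ok(())
            }
            Err(still_shared) => {
                // Put the Arc back so stop() can be retried later.
                self.token = Some(still_shared);
                Err("an Rxq or Txq handle remains alive, cannot safely stop this port")
            }
        }
    }

    pub fn is_started(&self) -> bool {
        self.started
    }
}
```

With this shape, calling stop() while any handle is alive returns Err, and
the same call succeeds after the Vec of handles goes out of scope.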

<snip>

> > +fn main() {
> > +    let mut dpdk = dpdk::Eal::init().expect("dpdk must init ok");
> > +    let rx_mempool = dpdk::Mempool::new(4096);
> > +
> > +    let mut ports = dpdk.take_eth_ports().expect("take eth ports ok");
> 
> Eal::take_eth_ports() resets EAL ports.

I don't think it "resets" here. The "take eth ports" removes the Port instances from
the dpdk::Eal struct, but there's no "reset" behaviour.
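
A minimal sketch of that behaviour, assuming the same Option<Vec<Port>>
layout as the patch (Eal::init() is simplified to be infallible here, and
Port::id() is a hypothetical accessor): Option::take() moves the Vec out and
leaves None behind, and nothing else in the Eal struct is touched.

```rust
#[derive(Debug)]
pub struct Port {
    id: u16,
}

impl Port {
    // Hypothetical accessor, for illustration only.
    pub fn id(&self) -> u16 {
        self.id
    }
}

pub struct Eal {
    eth_ports: Option<Vec<Port>>,
}

impl Eal {
    // Simplified: the patch returns Result<Self, String>.
    pub fn init() -> Self {
        Eal {
            eth_ports: Some(vec![Port { id: 0 }]),
        }
    }

    // The first call hands ownership of the ports to the caller;
    // every later call returns None.
    pub fn take_eth_ports(&mut self) -> Option<Vec<Port>> {
        self.eth_ports.take()
    }
}
```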

> A call to rte_dev_probe() will either fail, because Eal::eth_ports is None,
> or create another port-0, depending on implementation.

I don't see how or why rte_dev_probe() would be called. The idea is not to allow Rust
apps to call DPDK C APIs "when they want". The safe Rust API provides the required abstraction.
So it's not possible to have another call to rte_dev_probe() after the first one under eal_init().

Similar topic: Hotplug. I have experience with designing C APIs around hotplug
use-cases (Music/DJ software, from before my DPDK/networking days!). I think DPDK has
an interesting "push hotplug" approach (aka, App makes a function call to "request" the device).
Then on successful return, we can call rte_eth_dev_get_port_by_name() to get the u16 port_id,
and build the Port instance from that. Outline API:

enum EalHotplugDev {
    EthDev(Dpdk::Eth::Port), // enums can have contents in Rust :)
    CryptoDev(Dpdk::Crypto),
    // Etc
}

Eal::hotplug_add(bus: String, dev: String, args: String) -> Result<EalHotplugDev, Error> {
    // TODO: call rte_eal_hotplug_add()
    // TODO: identify how to know if it's an Eth, Crypto, Dma, or other dev type?
    match dev_type {
        "eth" => {
            // The C API returns an int status and writes the id through a uint16_t out-pointer.
            let port_id = rte_eth_dev_get_port_by_name(dev)?;
            Ok(EalHotplugDev::EthDev(Dpdk::Eth::Port::new(port_id)))
        }
        _ => Err(/* unsupported device type */),
    }
}

Applications could then do:
  let Ok(dev) = eal.hotplug_add("pci", "02:00.0", "dev_option=true") else {
      // failed to hotplug, log error?
      return;
  };
  match dev {
      EalHotplugDev::EthDev(port) => {
          // handle the dev here, e.g. configure & spawn thread to poll Rxq like before.
      }
      _ => { /* other device types */ }
  }

I like having an outline of difficult to "bolt on" features (hotplug is typically hard to add later..)
but I recommend we focus on getting core APIs and such running before more detail/time/implementation here.
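
The hotplug outline above can be sketched as compilable Rust. This is a
minimal sketch, assuming a few things that are not in the outline: the
device type is faked from a "type=..." fragment in the args string (the
open TODO above), HotplugError, Crypto and describe() are hypothetical
names, and no DPDK function is called.

```rust
// Hypothetical device types; real bindings would wrap DPDK device ids.
#[derive(Debug, PartialEq)]
pub struct Port {
    id: u16,
}

#[derive(Debug, PartialEq)]
pub struct Crypto {
    id: u8,
}

// Each variant carries the device instance it represents.
#[derive(Debug, PartialEq)]
pub enum EalHotplugDev {
    EthDev(Port),
    CryptoDev(Crypto),
}

#[derive(Debug, PartialEq)]
pub enum HotplugError {
    UnknownDevice(String),
}

pub struct Eal {
    next_port_id: u16,
}

impl Eal {
    pub fn new() -> Self {
        Eal { next_port_id: 0 }
    }

    // Stand-in for a wrapper around rte_eal_hotplug_add(); the type
    // detection below is an assumption made purely for this sketch.
    pub fn hotplug_add(
        &mut self,
        _bus: &str,
        dev: &str,
        args: &str,
    ) -> Result<EalHotplugDev, HotplugError> {
        if args.contains("type=eth") {
            let id = self.next_port_id;
            self.next_port_id += 1;
            Ok(EalHotplugDev::EthDev(Port { id }))
        } else if args.contains("type=crypto") {
            Ok(EalHotplugDev::CryptoDev(Crypto { id: 0 }))
        } else {
            Err(HotplugError::UnknownDevice(dev.to_string()))
        }
    }
}

// The application matches on the variant to get the typed device back.
pub fn describe(dev: &EalHotplugDev) -> String {
    match dev {
        EalHotplugDev::EthDev(port) => format!("eth port {}", port.id),
        EalHotplugDev::CryptoDev(c) => format!("crypto dev {}", c.id),
    }
}
```

Because the match is exhaustive, adding a new device variant later forces
every application-side match to handle it or opt out with a wildcard arm.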

Regards, -Harry


* [PATCH 1/3] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-04-17 15:10 [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq Harry van Haaren
  2025-04-17 18:58 ` Etelson, Gregory
@ 2025-04-18 13:23 ` Harry van Haaren
  2025-04-18 13:23   ` [PATCH 2/3] rust: split main into example, refactor to lib.rs Harry van Haaren
  2025-04-18 13:23   ` [PATCH 3/3] rust: showcase port Rxq return for stop() and reconfigure Harry van Haaren
  1 sibling, 2 replies; 20+ messages in thread
From: Harry van Haaren @ 2025-04-18 13:23 UTC
  To: dev; +Cc: getelson, bruce.richardson, owen.hilyard, Harry van Haaren

This patch is NOT to be considered for merge, it is a demo
of the Rust APIs for Ethdev. There is no actual implementation
of the APIs against the DPDK C functions, this is Rust API only.

To test/run the code (and uncomment things to see errors)
just apply this patch, cd "rust_api_example" and run
$ cargo run

This will compile the API, and spawn 2x threads to poll on
two Rxq instances. The comments in the code explain how the
"Send" and "Sync" traits are captured per instance of a
struct (e.g. how RxqHandle -> Rxq restricts thread movement).

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
---
 rust_api_example/Cargo.toml  |   6 ++
 rust_api_example/src/main.rs | 189 +++++++++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+)
 create mode 100644 rust_api_example/Cargo.toml
 create mode 100644 rust_api_example/src/main.rs

diff --git a/rust_api_example/Cargo.toml b/rust_api_example/Cargo.toml
new file mode 100644
index 0000000000..0137826340
--- /dev/null
+++ b/rust_api_example/Cargo.toml
@@ -0,0 +1,6 @@
+[package]
+name = "rust_api_example"
+version = "0.1.0"
+edition = "2021"
+
+[dependencies]
diff --git a/rust_api_example/src/main.rs b/rust_api_example/src/main.rs
new file mode 100644
index 0000000000..8d0de50c30
--- /dev/null
+++ b/rust_api_example/src/main.rs
@@ -0,0 +1,189 @@
+// Outline for safe DPDK API bindings
+//  - None of the APIs are actually implemented, this is API design only
+//  - This demo runs 2x threads on 2x Rxqs, and cannot accidentally poll incorrectly
+
+pub mod dpdk {
+    pub mod eth {
+        use super::Mempool;
+
+        #[derive(Debug)]
+        pub struct TxqHandle {/* todo: but same as Rxq */}
+
+        // Handle allows moving between threads, it's not polling!
+        #[derive(Debug)]
+        pub struct RxqHandle {
+            port: u16,
+            queue: u16,
+        }
+
+        impl RxqHandle {
+            pub(crate) fn new(port: u16, queue: u16) -> Self {
+                RxqHandle { port, queue }
+            }
+
+            // This function is the key to the API design: it ensures the rx_burst()
+            // function is only available via the Rxq struct, after enable_polling() has been called.
+            // It "consumes" (takes "self" as a parameter, not a '&' reference!) which essentially
+            // destroys/invalidates the handle from the Application level code.
+
+            // It returns an Rxq instance, which has the PhantomData to encode the threading requirements,
+            // and the Rxq has the rx_burst() function: this allows the application to receive packets.
+            pub fn enable_polling(self) -> Rxq {
+                Rxq {
+                    handle: self,
+                    _phantom: std::marker::PhantomData,
+                }
+            }
+        }
+
+        #[derive(Debug)]
+        pub struct Rxq {
+            handle: RxqHandle,
+            // This "PhantomData" tells the Rust compiler to pretend the Rc<()> is in this struct
+            // but in practice it is a Zero-Sized-Type, so takes up no space. It is a compile-time
+            // language technique to ensure the struct is not moved between threads. This encodes
+            // the API requirement "don't poll from multiple threads without synchronisation (e.g. Mutex)"
+            _phantom: std::marker::PhantomData<std::rc::Rc<()>>,
+        }
+
+        impl Rxq {
+            // TODO: datapath Error types should be lightweight, not String. Here we return ().
+            pub fn rx_burst(&mut self, _mbufs: &mut [u8]) -> Result<usize, ()> {
+                // TODO: Design the Mbuf struct wrapper, and how to best return a batch
+                //  e.g.: investigate "ArrayVec" crate for safe & fixed sized, stack allocated arrays
+                //
+                // There is work to do here, but I want to communicate the general DPDK/EAL/Eth/Rxq concepts
+                // now, this part is not done yet: it is likely the hardest/most performance critical.
+                //
+                // call rte_eth_rx_burst() here
+                println!(
+                    "[thread: {:?}] rx_burst: port {} queue {}",
+                    std::thread::current().id(),
+                    self.handle.port,
+                    self.handle.queue
+                );
+                Ok(0)
+            }
+        }
+
+        #[derive(Debug)]
+        pub struct Port {
+            id: u16,
+            rxqs: Vec<RxqHandle>,
+            txqs: Vec<TxqHandle>,
+        }
+
+        impl Port {
+            // pub(crate) here ensures outside this crate users cannot call this function
+            pub(crate) fn from_u16(id: u16) -> Self {
+                Port {
+                    id,
+                    rxqs: Vec::new(),
+                    txqs: Vec::new(),
+                }
+            }
+
+            pub fn rxqs(&mut self, rxq_count: u16, _mempool: Mempool) -> Result<(), String> {
+                for q in 0..rxq_count {
+                    // call rte_eth_rx_queue_setup() here
+                    self.rxqs.push(RxqHandle::new(self.id, q));
+                }
+                Ok(())
+            }
+
+            pub fn start(&mut self) -> (Vec<RxqHandle>, Vec<TxqHandle>) {
+                // call rte_eth_dev_start() here, then give ownership of Rxq/Txq to app
+                (
+                    std::mem::take(&mut self.rxqs),
+                    std::mem::take(&mut self.txqs),
+                )
+            }
+        }
+    }
+
+    #[derive(Debug, Clone)]
+    // Mempool is a long-life object, which many other DPDK things refer to (e.g. rxq config)
+    // Having a Rust lifetime attached to it (while technically correct) would complicate the
+    // code a LOT, and for little value. This is a tradeoff - happy to discuss more if we want.
+    // The choice here is to derive "Clone", allowing handing over multiple instances of the
+    // same Mempool, similar to how Arc<Mempool> would work, but without the reference counting.
+    pub struct Mempool {}
+
+    impl Mempool {
+        pub fn new(_size: usize) -> Self {
+            Self {}
+        }
+    }
+
+    #[derive(Debug)]
+    pub struct Eal {
+        eth_ports: Option<Vec<eth::Port>>,
+    }
+
+    impl Eal {
+        //  allow init once,
+        pub fn init() -> Result<Self, String> {
+            // EAL init() will do PCI probe and VDev enumeration will find/create eth ports.
+            // This code should loop over the ports, and build up Rust structs representing them
+            let eth_port = vec![eth::Port::from_u16(0)];
+            Ok(Eal {
+                eth_ports: Some(eth_port),
+            })
+        }
+
+        // API to get eth ports, taking ownership. It can be called once.
+        // The return will be None for future calls
+        pub fn take_eth_ports(&mut self) -> Option<Vec<eth::Port>> {
+            self.eth_ports.take()
+        }
+    }
+
+    impl Drop for Eal {
+        fn drop(&mut self) {
+            // todo: rte_eal_cleanup()
+        }
+    }
+} // DPDK mod
+
+fn main() {
+    let mut dpdk = dpdk::Eal::init().expect("dpdk must init ok");
+    let rx_mempool = dpdk::Mempool::new(4096);
+
+    let mut ports = dpdk.take_eth_ports().expect("take eth ports ok");
+    let mut p = ports.pop().unwrap();
+
+    p.rxqs(2, rx_mempool).expect("rxqs setup ok");
+    println!("{:?}", p);
+
+    let (mut rxqs, _txqs) = p.start();
+    println!("rxqs: {:?}", rxqs);
+
+    let rxq1 = rxqs.pop().unwrap();
+    let rxq2 = rxqs.pop().unwrap();
+
+    // spawn a new thread to use rxq1. This demonstrates that the RxqHandle
+    // type can move between threads - it is not tied to the thread that created it.
+    std::thread::spawn(move || {
+        // Uncomment this: it fails to compile!
+        //   - Rxq2 would be used by this newly-spawned thread
+        //     -- specifically the variable was "moved" into this thread
+        //   - it is also used below (by the main thread)
+        // "value used after move" is the error, on the below code
+        // let mut rxq = rxq2.enable_polling();
+
+        // see docs on enable_polling above to understand how the enable_polling()
+        // function helps to achieve the thread-safety-at-compile-time goal.
+        let mut rxq = rxq1.enable_polling();
+        loop {
+            let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
+            std::thread::sleep(std::time::Duration::from_millis(1000));
+        }
+    });
+
+    // main thread polling rxq2
+    let mut rxq = rxq2.enable_polling();
+    loop {
+        let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
+        std::thread::sleep(std::time::Duration::from_millis(1000));
+    }
+}
-- 
2.34.1



* [PATCH 2/3] rust: split main into example, refactor to lib.rs
  2025-04-18 13:23 ` [PATCH 1/3] " Harry van Haaren
@ 2025-04-18 13:23   ` Harry van Haaren
  2025-04-18 13:23   ` [PATCH 3/3] rust: showcase port Rxq return for stop() and reconfigure Harry van Haaren
  1 sibling, 0 replies; 20+ messages in thread
From: Harry van Haaren @ 2025-04-18 13:23 UTC (permalink / raw)
  To: dev; +Cc: getelson, bruce.richardson, owen.hilyard, Harry van Haaren

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
---
 rust_api_example/examples/eth_poll.rs    | 35 +++++++++++++++++++
 rust_api_example/src/{main.rs => lib.rs} | 43 ------------------------
 2 files changed, 35 insertions(+), 43 deletions(-)
 create mode 100644 rust_api_example/examples/eth_poll.rs
 rename rust_api_example/src/{main.rs => lib.rs} (77%)

diff --git a/rust_api_example/examples/eth_poll.rs b/rust_api_example/examples/eth_poll.rs
new file mode 100644
index 0000000000..cde28df68d
--- /dev/null
+++ b/rust_api_example/examples/eth_poll.rs
@@ -0,0 +1,35 @@
+// Examples should not require any "unsafe" code.
+#![deny(unsafe_code)]
+
+use rust_api_example::dpdk::{self};
+
+fn main() {
+    let mut dpdk = dpdk::Eal::init().expect("dpdk must init ok");
+    let rx_mempool = dpdk::Mempool::new(4096);
+
+    let mut ports = dpdk.take_eth_ports().expect("take eth ports ok");
+    let mut p = ports.pop().unwrap();
+
+    p.rxqs(2, rx_mempool).expect("rxqs setup ok");
+    println!("{:?}", p);
+
+    let (mut rxqs, _txqs) = p.start();
+    println!("rxqs: {:?}", rxqs);
+
+    let rxq1 = rxqs.pop().unwrap();
+    let rxq2 = rxqs.pop().unwrap();
+
+    std::thread::spawn(move || {
+        let mut rxq = rxq1.enable_polling();
+        loop {
+            let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
+            std::thread::sleep(std::time::Duration::from_millis(1000));
+        }
+    });
+
+    let mut rxq = rxq2.enable_polling();
+    loop {
+        let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
+        std::thread::sleep(std::time::Duration::from_millis(1000));
+    }
+}
\ No newline at end of file
diff --git a/rust_api_example/src/main.rs b/rust_api_example/src/lib.rs
similarity index 77%
rename from rust_api_example/src/main.rs
rename to rust_api_example/src/lib.rs
index 8d0de50c30..0d13b06d85 100644
--- a/rust_api_example/src/main.rs
+++ b/rust_api_example/src/lib.rs
@@ -144,46 +144,3 @@ pub mod dpdk {
         }
     }
 } // DPDK mod
-
-fn main() {
-    let mut dpdk = dpdk::Eal::init().expect("dpdk must init ok");
-    let rx_mempool = dpdk::Mempool::new(4096);
-
-    let mut ports = dpdk.take_eth_ports().expect("take eth ports ok");
-    let mut p = ports.pop().unwrap();
-
-    p.rxqs(2, rx_mempool).expect("rxqs setup ok");
-    println!("{:?}", p);
-
-    let (mut rxqs, _txqs) = p.start();
-    println!("rxqs: {:?}", rxqs);
-
-    let rxq1 = rxqs.pop().unwrap();
-    let rxq2 = rxqs.pop().unwrap();
-
-    // spawn a new thread to use rxq1. This demonstrates that the RxqHandle
-    // type can move between threads - it is not tied to the thread that created it.
-    std::thread::spawn(move || {
-        // Uncomment this: it fails to compile!
-        //   - Rxq2 would be used by this newly-spawned thread
-        //     -- specifically the variable was "moved" into this thread
-        //   - it is also used below (by the main thread)
-        // "value used after move" is the error, on the below code
-        // let mut rxq = rxq2.enable_polling();
-
-        // see docs on enable_polling above to understand how the enable_polling()
-        // function helps to achieve the thread-safety-at-compile-time goal.
-        let mut rxq = rxq1.enable_polling();
-        loop {
-            let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
-            std::thread::sleep(std::time::Duration::from_millis(1000));
-        }
-    });
-
-    // main thread polling rxq2
-    let mut rxq = rxq2.enable_polling();
-    loop {
-        let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
-        std::thread::sleep(std::time::Duration::from_millis(1000));
-    }
-}
-- 
2.34.1



* [PATCH 3/3] rust: showcase port Rxq return for stop() and reconfigure
  2025-04-18 13:23 ` [PATCH 1/3] " Harry van Haaren
  2025-04-18 13:23   ` [PATCH 2/3] rust: split main into example, refactor to lib.rs Harry van Haaren
@ 2025-04-18 13:23   ` Harry van Haaren
  1 sibling, 0 replies; 20+ messages in thread
From: Harry van Haaren @ 2025-04-18 13:23 UTC (permalink / raw)
  To: dev; +Cc: getelson, bruce.richardson, owen.hilyard, Harry van Haaren

Since the refactor, use this command to run/test:
  cargo r --example eth_poll

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
---
 rust_api_example/examples/eth_poll.rs | 45 ++++++++++++++++++++---
 rust_api_example/src/lib.rs           | 52 ++++++++++++++++++++++++---
 2 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/rust_api_example/examples/eth_poll.rs b/rust_api_example/examples/eth_poll.rs
index cde28df68d..0ef0a28ab9 100644
--- a/rust_api_example/examples/eth_poll.rs
+++ b/rust_api_example/examples/eth_poll.rs
@@ -10,7 +10,7 @@ fn main() {
     let mut ports = dpdk.take_eth_ports().expect("take eth ports ok");
     let mut p = ports.pop().unwrap();
 
-    p.rxqs(2, rx_mempool).expect("rxqs setup ok");
+    p.rxqs(2, rx_mempool.clone()).expect("rxqs setup ok");
     println!("{:?}", p);
 
     let (mut rxqs, _txqs) = p.start();
@@ -21,15 +21,50 @@ fn main() {
 
     std::thread::spawn(move || {
         let mut rxq = rxq1.enable_polling();
-        loop {
+        for _ in 0..3 {
             let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
             std::thread::sleep(std::time::Duration::from_millis(1000));
         }
     });
 
-    let mut rxq = rxq2.enable_polling();
-    loop {
-        let _nb_mbufs = rxq.rx_burst(&mut [0; 32]);
+    // "shadowing" variables is a common pattern in Rust, and is used here to
+    // allow us to use the same variable name but for Rxq instead of RxqHandle.
+    let mut rxq2 = rxq2.enable_polling();
+    for _ in 0..2 {
+        let _nb_mbufs = rxq2.rx_burst(&mut [0; 32]);
         std::thread::sleep(std::time::Duration::from_millis(1000));
     }
+
+    // Important! Port::stop() relies on every RxqHandle being dropped to
+    // reduce the refcount: if the rxq is NOT dropped, the port will NOT be
+    // allowed to stop. This is a win for safety (no polling of stopped NIC ports)
+    // but also a potential bug/hiccup at application code level.
+    // Comment out this line to see the loop below stall forever (waiting for Arc ref count to drop from 2 to 1)
+    drop(rxq2);
+
+    loop {
+        let r = p.stop();
+        match r {
+            Ok(_v) => {
+                println!("stopping port");
+                break;
+            }
+            Err(e) => {
+                println!("stop() returns error: {}", e);
+            }
+        };
+        std::thread::sleep(std::time::Duration::from_millis(300));
+    }
+
+    // Reconfigure after stop()
+    p.rxqs(4, rx_mempool.clone()).expect("rxqs setup ok");
+    println!("{:?}", p);
+
+    // queues is a tuple of (rxqs, txqs) here
+    let queues = p.start();
+    println!("queues: {:?}", queues);
+    drop(queues);
+
+    p.stop().expect("stop() ok");
+    println!("stopped port");
 }
\ No newline at end of file
diff --git a/rust_api_example/src/lib.rs b/rust_api_example/src/lib.rs
index 0d13b06d85..6b795fc227 100644
--- a/rust_api_example/src/lib.rs
+++ b/rust_api_example/src/lib.rs
@@ -5,20 +5,47 @@
 pub mod dpdk {
     pub mod eth {
         use super::Mempool;
-
+        use std::sync::Arc;
+
+        // PortHandle here is used as a refcount of "Outstanding Rx/Tx queues".
+        // This is useful, but the "runstate" of the port is also useful. They are
+        // similar, but not identical. A more elegant solution is likely possible.
+        #[derive(Debug, Clone)]
+        #[allow(unused)]
+        pub(crate) struct PortHandle(Arc<()>);
+
+        impl PortHandle {
+            fn new() -> Self {
+                PortHandle(Arc::new(()))
+            }
+            fn stop(&mut self) -> Result<(), usize> {
+                // if the count is 1, only the Port itself has a handle left.
+                // In that case, the count cannot go up, so we can stop.
+                // The strange "Arc::<()>::function()" syntax here is "Fully qualified syntax":
+                //  - https://doc.rust-lang.org/std/sync/struct.Arc.html#deref-behavior
+                let sc = Arc::<()>::strong_count(&self.0);
+                if sc == 1 {
+                    Ok(())
+                } else {
+                    Err(sc)
+                }
+            }
+        }
+        
         #[derive(Debug)]
         pub struct TxqHandle {/* todo: but same as Rxq */}
 
         // Handle allows moving between threads, its not polling!
         #[derive(Debug)]
         pub struct RxqHandle {
+            _handle: PortHandle,
             port: u16,
             queue: u16,
         }
 
         impl RxqHandle {
-            pub(crate) fn new(port: u16, queue: u16) -> Self {
-                RxqHandle { port, queue }
+            pub(crate) fn new(handle: PortHandle, port: u16, queue: u16) -> Self {
+                RxqHandle { _handle: handle, port, queue }
             }
 
             // This function is the key to the API design: it ensures the rx_burst()
@@ -68,6 +95,7 @@ pub mod dpdk {
 
         #[derive(Debug)]
         pub struct Port {
+            handle: PortHandle,
             id: u16,
             rxqs: Vec<RxqHandle>,
             txqs: Vec<TxqHandle>,
@@ -77,6 +105,7 @@ pub mod dpdk {
             // pub(crate) here ensures outside this crate users cannot call this function
             pub(crate) fn from_u16(id: u16) -> Self {
                 Port {
+                    handle: PortHandle::new(),
                     id,
                     rxqs: Vec::new(),
                     txqs: Vec::new(),
@@ -84,10 +113,14 @@ pub mod dpdk {
             }
 
             pub fn rxqs(&mut self, rxq_count: u16, _mempool: Mempool) -> Result<(), String> {
+                // ensure no old Rxq handles remain
+                self.rxqs.clear();
+
                 for q in 0..rxq_count {
                     // call rte_eth_rx_queue_setup() here
-                    self.rxqs.push(RxqHandle::new(self.id, q));
+                    self.rxqs.push(RxqHandle::new(self.handle.clone(), self.id, q));
                 }
+                println!("{:?}", self.handle);
                 Ok(())
             }
 
@@ -98,6 +131,17 @@ pub mod dpdk {
                     std::mem::take(&mut self.txqs),
                 )
             }
+
+            pub fn stop(&mut self) -> Result<(), String> {
+                match self.handle.stop() {
+                    Ok(_v) => {
+                        // call rte_eth_dev_stop() here
+                        println!("stopping port {}", self.id);
+                        Ok(())
+                    }
+                    Err(e) => Err(format!("Port has {} Rxq/Txq handles outstanding", e)),
+                }
+            }
         }
     }
 
-- 
2.34.1



* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-04-18 11:40   ` Van Haaren, Harry
@ 2025-04-20  8:57     ` Gregory Etelson
  2025-04-24 16:06       ` Van Haaren, Harry
  0 siblings, 1 reply; 20+ messages in thread
From: Gregory Etelson @ 2025-04-20  8:57 UTC (permalink / raw)
  To: Van Haaren, Harry; +Cc: dev, Richardson, Bruce, owen.hilyard

[-- Attachment #1: Type: text/plain, Size: 6018 bytes --]

Hello Harry,

I implemented a working echo server with your API.
The code is here: https://github.com/getelson-at-mellanox/rdpdk/tree/safe-q

Several changes:

  * DPDK configuration is split to 3 mandatory steps:
     * port configuration in
       Port::configure(&mut self, rxq_num: u16, txq_num: u16) -> Result<(), String>
     * Rx queues configuration in
       Port::config_rxqs(&mut self, desc_num: u16, mempool: DpdkMempool) -> Result<(), String>
     * Tx queues configuration in
       Port::config_txqs(&mut self, desc_num: u16) -> Result<(), String>
  * In the IO thread, I renamed the `enable_polling()` to `activate()` for Rx/Tx symmetry.
  * I renamed `port` and `q` struct members to `port_id`, `queue_id`

Build steps:

  1. Apply https://github.com/getelson-at-mellanox/rdpdk/blob/safe-q/dpdk-patches/0001-rust-export-missing-port-objects.patch to DPDK source.
  2. Install DPDK
  3. Set PKG_CONFIG_PATH to DPDK installation

Activation:

# cargo run --example echo -- -a <port PCI address>

Regards,
Gregory

________________________________
From: Van Haaren, Harry <harry.van.haaren@intel.com>
Sent: Friday, April 18, 2025 14:40
To: Gregory Etelson <getelson@nvidia.com>
Cc: dev@dpdk.org <dev@dpdk.org>; Richardson, Bruce <bruce.richardson@intel.com>; owen.hilyard@unh.edu <owen.hilyard@unh.edu>
Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq

> From: Etelson, Gregory
> Sent: Thursday, April 17, 2025 7:58 PM
> To: Van Haaren, Harry
> Cc: dev@dpdk.org; getelson@nvidia.com; Richardson, Bruce; owen.hilyard@unh.edu
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
>
> Hello Harry,
>
> Thank you for sharing the API.
> Please check out my comments below.

Thanks for reading & discussion!

<snip>

> > +
> > +            pub fn start(&mut self) -> (Vec<RxqHandle>, Vec<TxqHandle>) {
> > +                // call rte_eth_dev_start() here, then give ownership of Rxq/Txq to app
>
> After a call to Port::start, Rx and Tx queues are detached from its port.
> With that model, how can rte_eth_dev_stop() and subsequent rte_eth_dev_start()
> DPDK calls be implemented?

Correct, the RxqHandle and TxqHandle don't have a "back reference" to the port.
There are a number of ways to ensure eth_dev_stop() cannot be called without the
Rxq/Txqs being "returned" to the Port instance first.

Eg: Use an Arc<T>. The port instance "owns" the Arc<T>, which means it is going to keep
   the Arc alive. Now give each Rxq/Txq a clone of this Arc. When the Drop impl of the
   Rxq/Txq runs, it will decrement the Arc. So just letting the Rxq/Txq go out of scope
   will be enough to have the Port understand that handle is now gone.

   The port itself can use Arc::into_inner function[1], which returns Option<T>. If the
   Some(T) is returned, then all instances of RxqHandle/TxqHandle have been dropped,
   meaning it is safe to eth_dev_stop(), as it is impossible to poll RXQs if there's no Rxq :)
   [1] https://doc.rust-lang.org/std/sync/struct.Arc.html#method.into_inner

// Pseudo-code here:
Dpdk::Eth::Port::stop(&mut self) -> Result<(), Error> {
    let handles_dropped = self.handle_arc.into_inner(); // returns "T" if its the only reference to the Arc
    if handles_dropped.is_none() {
        return Err("an Rxq or Txq handle remains alive, cannot safely stop this port");
    }
}

There are probably a few others, but that's the "idiomatic Rust" solution.
We'd have to pass the Arc from the RxqHandle into the Rxq instance itself too,
but that's fine.
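
A runnable sketch of the Arc idea above, under stated assumptions: the type and
field names are hypothetical stand-ins, not the patch's API. It also swaps
Arc::into_inner for Arc::try_unwrap. Arc::into_inner consumes the Port's own
Arc even when other handles are still alive, whereas Arc::try_unwrap hands the
Arc back on failure, so stop() can simply be retried later.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an Rx queue handle: it only carries the refcount.
pub struct RxqHandle {
    _port_ref: Arc<()>,
}

// Hypothetical stand-in for a port that tracks outstanding queue handles.
pub struct Port {
    handle: Option<Arc<()>>,
}

impl Port {
    pub fn new() -> Self {
        Port { handle: Some(Arc::new(())) }
    }

    // Hand out a queue handle holding a clone of the port's Arc.
    pub fn rxq(&self) -> RxqHandle {
        let arc = self.handle.as_ref().expect("handle is always restored");
        RxqHandle { _port_ref: Arc::clone(arc) }
    }

    pub fn stop(&mut self) -> Result<(), String> {
        let arc = self.handle.take().expect("handle is always restored");
        match Arc::try_unwrap(arc) {
            Ok(()) => {
                // Last reference: a real binding would call rte_eth_dev_stop() here.
                self.handle = Some(Arc::new(()));
                Ok(())
            }
            Err(arc) => {
                // Some queue handles are still alive: keep our Arc and report.
                let outstanding = Arc::strong_count(&arc) - 1;
                self.handle = Some(arc);
                Err(format!("{outstanding} Rxq/Txq handle(s) still alive"))
            }
        }
    }
}

fn main() {
    let mut port = Port::new();
    let rxq = port.rxq();
    assert!(port.stop().is_err()); // rxq is still alive
    drop(rxq);
    assert!(port.stop().is_ok()); // all handles returned
}
```

Letting the handle go out of scope is all the application has to do; no
explicit "return the queue to the port" call is needed.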

<snip>

> > +fn main() {
> > +    let mut dpdk = dpdk::Eal::init().expect("dpdk must init ok");
> > +    let rx_mempool = dpdk::Mempool::new(4096);
> > +
> > +    let mut ports = dpdk.take_eth_ports().expect("take eth ports ok");
>
> Eal::take_eth_ports() resets EAL ports.

I don't think it "resets" here. The "take eth ports" removes the Port instances from
the dpdk::Eal struct, but there's no "reset" behaviour.

> A call to rte_dev_probe() will either fail, because Eal::eth_ports is None
> or create another port-0, depending on implementation.

I don't see how or why rte_dev_probe() would be called. The idea is not to allow Rust
apps to call DPDK C APIs "when they want". The safe Rust API provides the required abstraction.
So it's not possible to have another call to rte_dev_probe() after the first one under eal_init().

Similar topic: Hotplug. I have experience with designing C APIs around hotplug
use-cases (Music/DJ software, from before my DPDK/networking days!). I think DPDK has
an interesting "push hotplug" approach (aka, App makes a function call to "request" the device).
Then on successful return, we can call rte_eth_dev_get_port_by_name() to get the u16 port_id,
and build the Port instance from that. Outline API:

enum EalHotplugDev {
    EthDev(Dpdk::Eth::Port), // enums can have contents in Rust :)
    CryptoDev(Dpdk::Crypto),
    // Etc
}

Eal::hotplug_add(bus: String, dev: String, args: String) -> Result<EalHotplugDev, Error> {
    // TODO: call rte_eal_hotplug_add()
    // TODO: identify how to know if it's an Eth, Crypto, Dma, or other dev type?
    match dev_type {
        "eth" => {
            let port_id = rte_eth_dev_get_port_by_name(dev);
            Ok(EalHotplugDev::EthDev(Dpdk::Eth::Port::new(port_id)))
        }
        // TODO: other device types
    }
}

Applications could then do:
  let Ok(dev) = eal.hotplug_add("pci", "02:00.0", "dev_option=true") else {
      // failed to hotplug, log error?
      return;
  };
  match dev {
      EalHotplugDev::EthDev(port) => {
          // handle the dev here, e.g. configure & spawn thread to poll Rxq like before.
      }
      _ => { /* other device types */ }
  }

I like having an outline of difficult to "bolt on" features (hotplug is typically hard to add later..)
but I recommend we focus on getting core APIs and such running before more detail/time/implementation here.

Regards, -Harry

[-- Attachment #2: Type: text/html, Size: 14136 bytes --]


* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-04-20  8:57     ` Gregory Etelson
@ 2025-04-24 16:06       ` Van Haaren, Harry
  2025-04-27 18:50         ` Etelson, Gregory
  0 siblings, 1 reply; 20+ messages in thread
From: Van Haaren, Harry @ 2025-04-24 16:06 UTC (permalink / raw)
  To: Gregory Etelson; +Cc: dev, Richardson, Bruce, owen.hilyard

[-- Attachment #1: Type: text/plain, Size: 2620 bytes --]

> From: Gregory Etelson
> Sent: Sunday, April 20, 2025 9:57 AM
> To: Van Haaren, Harry
> Cc: dev@dpdk.org; Richardson, Bruce; owen.hilyard@unh.edu
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
>
> Hello Harry,
>
> I implemented a working echo server with your API.
> The code is here: https://github.com/getelson-at-mellanox/rdpdk/tree/safe-q

Ah cool! Great to see the API working.

Reviewing the "echo.rs" code, the MbuffMempoolHandle ergonomics can perhaps be improved;
I'll try to work on that and send some API suggestions to the mailing list soon.

I see the echo.rs code uses a normal "std::thread::spawn" (not DPDK lcores). There is
some design to do here to ensure that best practices are used:
- any dataplane threads are registered as lcores (for best performance, mempool caches etc)
- registered lcores are also unregistered when a thread ends (potentially allowing lcore-id reuse??)
I haven't thought about this much, but had a brief discussion with Bruce (who is on holidays now).

Suggesting that mempools & lcores are the two next up API sets to "Rustify" :)


> Several changes:
>   * DPDK configuration is split to 3 mandatory steps:
>      * port configuration in
>        Port::configure(&mut self, rxq_num: u16, txq_num: u16) -> Result<(), String>
>      * Rx queues configuration in
>        Port::config_rxqs(&mut self, desc_num: u16, mempool: DpdkMempool) -> Result<(), String>
>      * Tx queues configuration in
>        Port::config_txqs(&mut self, desc_num: u16) -> Result<(), String>
>   * In the IO thread, I renamed the `enable_polling()` to `activate()` for Rx/Tx symmetry.
>   * I renamed `port` and `q` struct members to `port_id`, `queue_id`

Those seem reasonable changes; no particular concerns.
We can always do "more, more, more" type-safety to make it impossible to mis-configure (at compile time).
While type-safety is nice, it will complicate the code too: finding the right tradeoff is key.

For me, having the "Rxq" be pollable only from the correct thread (compile-time check) is the most valuable.
Type-safety for the configuration is "nice to have", but good/simple examples will help users start quickly too, particularly
if the APIs are simple.
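
A small sketch of that compile-time check, for future readers. It assumes the
"not Send" property is obtained with a PhantomData raw-pointer marker; the
mechanism and the simplified types here are illustrative assumptions and may
differ from what the RFC patch actually does.

```rust
use std::marker::PhantomData;
use std::thread;

// Configuration-time handle: plain data, so it is Send and can be moved to
// whichever thread will do the polling.
pub struct RxqHandle {
    port: u16,
    queue: u16,
}

// Polling object. Raw pointers are neither Send nor Sync, so the PhantomData
// marker makes Rxq impossible to move to (or share with) another thread.
pub struct Rxq {
    handle: RxqHandle,
    _not_send: PhantomData<*mut ()>,
}

impl RxqHandle {
    pub fn new(port: u16, queue: u16) -> Self {
        RxqHandle { port, queue }
    }

    // Consumes the handle: after this call only the calling thread can poll.
    pub fn enable_polling(self) -> Rxq {
        Rxq { handle: self, _not_send: PhantomData }
    }
}

impl Rxq {
    pub fn rx_burst(&mut self, mbufs: &mut [u64; 32]) -> usize {
        // A real binding would call rte_eth_rx_burst() with these values.
        let _ = (self.handle.port, self.handle.queue, mbufs);
        0
    }
}

fn main() {
    let handle = RxqHandle::new(0, 0);
    // Moving the *handle* into a thread is fine.
    let worker = thread::spawn(move || {
        let mut rxq = handle.enable_polling();
        let n = rxq.rx_burst(&mut [0; 32]);
        n
    });
    assert_eq!(worker.join().unwrap(), 0);

    // Moving the *Rxq* is rejected by the compiler (uncomment to see it):
    // let rxq = RxqHandle::new(0, 1).enable_polling();
    // thread::spawn(move || { let _rxq = rxq; });
}
```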


> Build steps:
>
>   1. Apply https://github.com/getelson-at-mellanox/rdpdk/blob/safe-q/dpdk-patches/0001-rust-export-missing-port-objects.patch to DPDK source.
>   2. Install DPDK
>   3. Set PKG_CONFIG_PATH to DPDK installation
>
> Activation:
>
> # cargo run --example echo -- -a <port PCI address>

I haven't tried these steps yet, sorry (lack of time at the moment).

> Regards,
> Gregory

Thanks again! -Harry

[-- Attachment #2: Type: text/html, Size: 13800 bytes --]


* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-04-24 16:06       ` Van Haaren, Harry
@ 2025-04-27 18:50         ` Etelson, Gregory
  2025-04-30 18:28           ` Gregory Etelson
  0 siblings, 1 reply; 20+ messages in thread
From: Etelson, Gregory @ 2025-04-27 18:50 UTC (permalink / raw)
  To: Van Haaren, Harry; +Cc: Gregory Etelson, dev, Richardson, Bruce, owen.hilyard

Hello Harry,

> > I implemented a working echo server with your API.
> > The code is here: https://github.com/getelson-at-mellanox/rdpdk/tree/safe-q
> 
> Ah cool! Great to see the API working.
> 
> Reviewing the "echo.rs" code, the MbuffMempoolHandle ergonomics can perhaps be improved;
> I'll try to work on that and send some API suggestions to the mailing list soon.
> 
> I see the echo.rs code uses a normal "std::thread::spawn" (not DPDK lcores). There is
> some design to do here to ensure that best practices are used:
> - any dataplane threads are registered as lcores (for best performance, mempool caches etc)
> - registered lcores are also unregistered when a thread ends (potentially allowing lcore-id reuse??)
> I haven't thought about this much, but had a brief discussion with Bruce (who is on holidays now).
> 
> Suggesting that mempools & lcores are the two next up API sets to "Rustify" :)
>

I see 2 issues with the DPDK lcore API:

An unsafe "extern" lcore callback is not treated as a new thread, so the compiler
will not run Send checks on its arguments.

Also, lcore arguments use a generic 'void *' pointer.

Maybe the Rust DPDK library needs a native lcore implementation.

Different thread argument types can be wrapped with a macro call.
Example is here: 
https://github.com/getelson-at-mellanox/rdpdk/blob/37494bcae1fcf06bb4338519f931c2130105e576/examples/echo.rs#L88
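
As an alternative to a macro, a generic wrapper can restore the Send check and
hide the 'void *' argument. This is only a sketch under assumptions: it makes
no real DPDK calls, stub_remote_launch() is a made-up stand-in for an lcore
launcher that runs the callback inline, and launch()/trampoline() are
hypothetical names.

```rust
use std::os::raw::{c_int, c_void};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Same shape as a C lcore callback: int (*)(void *).
type LcoreFn = unsafe extern "C" fn(*mut c_void) -> c_int;

// Stand-in for a C launcher. It calls the callback inline; a real launcher
// would run it on another core, which is why the Send bound below matters.
fn stub_remote_launch(f: LcoreFn, arg: *mut c_void) -> c_int {
    unsafe { f(arg) }
}

// One trampoline per closure type F: it turns the 'void *' back into the
// boxed closure and runs it.
unsafe extern "C" fn trampoline<F>(arg: *mut c_void) -> c_int
where
    F: FnOnce() -> c_int + Send + 'static,
{
    // SAFETY: `arg` comes from Box::into_raw in launch(), with the same F.
    let work: Box<F> = unsafe { Box::from_raw(arg as *mut F) };
    work()
}

// Safe entry point: the Send + 'static bounds make the compiler verify the
// captured arguments exactly as std::thread::spawn does.
pub fn launch<F>(work: F) -> c_int
where
    F: FnOnce() -> c_int + Send + 'static,
{
    let arg = Box::into_raw(Box::new(work)) as *mut c_void;
    stub_remote_launch(trampoline::<F>, arg)
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let c = Arc::clone(&counter);
    let rc = launch(move || {
        c.fetch_add(1, Ordering::SeqCst);
        0
    });
    assert_eq!(rc, 0);
    assert_eq!(counter.load(Ordering::SeqCst), 1);

    // Rejected at compile time, because Rc is not Send (uncomment to see it):
    // let local = std::rc::Rc::new(5);
    // launch(move || *local);
}
```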

Regards,
Gregory



* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-04-27 18:50         ` Etelson, Gregory
@ 2025-04-30 18:28           ` Gregory Etelson
  2025-05-01  7:44             ` Bruce Richardson
  0 siblings, 1 reply; 20+ messages in thread
From: Gregory Etelson @ 2025-04-30 18:28 UTC (permalink / raw)
  To: Van Haaren, Harry; +Cc: dev, Richardson, Bruce, owen.hilyard

[-- Attachment #1: Type: text/plain, Size: 2345 bytes --]

Hello Harry,

I've been experimenting with lcore workers.
Please check out the new helloworld example:  https://github.com/getelson-at-mellanox/rdpdk/blob/safe-q/examples/helloworld.rs

There are 2 options for the example configuration:

1. Start RDPDK workers on the same cores as EAL:
    cargo run --example helloworld -- -a <PCI address> -l 0,1,3,5

2. Start RDPDK workers on dedicated cores:
    cargo run --example helloworld -- -a 0000:43:00.0 -l 0,1,3,5 -- -l 2-8

Regards,
Gregory

________________________________
From: Gregory Etelson <getelson@nvidia.com>
Sent: Sunday, April 27, 2025 21:50
To: Van Haaren, Harry <harry.van.haaren@intel.com>
Cc: Gregory Etelson <getelson@nvidia.com>; dev@dpdk.org <dev@dpdk.org>; Richardson, Bruce <bruce.richardson@intel.com>; owen.hilyard@unh.edu <owen.hilyard@unh.edu>
Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq

Hello Harry,

> > I implemented a working echo server with your API.
> > The code is here: https://github.com/getelson-at-mellanox/rdpdk/tree/safe-q
>
> Ah cool! Great to see the API working.
>
> Reviewing the "echo.rs" code, the MbuffMempoolHandle ergonomics can perhaps be improved;
> I'll try to work on that and send some API suggestions to the mailing list soon.
>
> I see the echo.rs code uses a normal "std::thread::spawn" (not DPDK lcores). There is
> some design to do here to ensure that best practices are used:
> - any dataplane threads are registered as lcores (for best performance, mempool caches etc)
> - registered lcores are also unregistered when a thread ends (potentially allowing lcore-id reuse??)
> I haven't thought about this much, but had a brief discussion with Bruce (who is on holidays now).
>
> Suggesting that mempools & lcores are the two next up API sets to "Rustify" :)
>

I see 2 issues with the DPDK lcore API:

An unsafe "extern" lcore callback is not treated as a new thread, so the compiler
will not run Send checks on its arguments.

Also, lcore arguments use a generic 'void *' pointer.

Maybe the Rust DPDK library needs a native lcore implementation.

Different thread argument types can be wrapped with a macro call.
Example is here:
https://github.com/getelson-at-mellanox/rdpdk/blob/37494bcae1fcf06bb4338519f931c2130105e576/examples/echo.rs#L88

Regards,
Gregory



[-- Attachment #2: Type: text/html, Size: 5890 bytes --]


* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-04-30 18:28           ` Gregory Etelson
@ 2025-05-01  7:44             ` Bruce Richardson
  2025-05-02 12:46               ` Etelson, Gregory
  0 siblings, 1 reply; 20+ messages in thread
From: Bruce Richardson @ 2025-05-01  7:44 UTC (permalink / raw)
  To: Gregory Etelson; +Cc: Van Haaren, Harry, dev, owen.hilyard

On Wed, Apr 30, 2025 at 06:28:49PM +0000, Gregory Etelson wrote:
>    Hello Harry,
> 
>    I've been experimenting with lcore workers.
> 
>    Please check out the new helloworld example:
>    https://github.com/getelson-at-mellanox/rdpdk/blob/safe-q/examples/helloworld.rs
> 
>    There are 2 options for the example configuration:
> 
>    1 Start RDPDK workers on the same cores as EAL:
>        cargo run --example helloworld -- -a <PCI address> -l 0,1,3,5
>    2 Start RDPDK workers on dedicated cores:
>        cargo run --example helloworld -- -a 0000:43:00.0 -l 0,1,3,5 -- -l 2-8
> 

Thanks for sharing. However, IMHO using EAL for thread management in rust
is the wrong interface to expose. Instead, I believe we should be
encouraging native rust thread management, and not exposing any DPDK
threading APIs except those necessary to have rust threads work with DPDK,
i.e. with an lcore ID. Many years ago when DPDK started, and in the C
world, having DPDK as a runtime environment made sense, but times have
changed and for Rust, there is a whole ecosystem out there already that we
need to "play nice with", so having Rust (not DPDK) do all thread
management is the way to go (again IMHO).
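
To make that concrete, a rough sketch of "Rust owns the thread, DPDK only gets
told about it". The two stub functions stand in for DPDK's
rte_thread_register() / rte_thread_unregister(); LcoreGuard and
spawn_dataplane() are hypothetical names, and no real DPDK calls are made.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts "registered" threads so the sketch can be checked without DPDK.
static REGISTERED: AtomicUsize = AtomicUsize::new(0);

// Stub standing in for an FFI call to rte_thread_register().
fn stub_thread_register() -> Result<(), String> {
    REGISTERED.fetch_add(1, Ordering::SeqCst);
    Ok(())
}

// Stub standing in for an FFI call to rte_thread_unregister().
fn stub_thread_unregister() {
    REGISTERED.fetch_sub(1, Ordering::SeqCst);
}

// RAII guard: registers the current thread on creation and unregisters it on
// drop. The raw-pointer marker keeps the guard on the thread that created it.
pub struct LcoreGuard {
    _not_send: PhantomData<*mut ()>,
}

impl LcoreGuard {
    pub fn register_current_thread() -> Result<Self, String> {
        stub_thread_register()?;
        Ok(LcoreGuard { _not_send: PhantomData })
    }
}

impl Drop for LcoreGuard {
    fn drop(&mut self) {
        stub_thread_unregister();
    }
}

// Spawn a native Rust thread that is registered for the duration of `work`.
pub fn spawn_dataplane<F, T>(work: F) -> thread::JoinHandle<Result<T, String>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(move || -> Result<T, String> {
        let _guard = LcoreGuard::register_current_thread()?;
        Ok(work())
    })
}

fn main() {
    let worker = spawn_dataplane(|| REGISTERED.load(Ordering::SeqCst));
    let seen_inside = worker.join().unwrap().unwrap();
    assert_eq!(seen_inside, 1); // registered while the work ran
    assert_eq!(REGISTERED.load(Ordering::SeqCst), 0); // unregistered at thread exit
}
```

The application still chooses how threads are created and pinned; the only
DPDK-specific part is the guard.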

/Bruce


* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-05-01  7:44             ` Bruce Richardson
@ 2025-05-02 12:46               ` Etelson, Gregory
  2025-05-02 13:58                 ` Van Haaren, Harry
  0 siblings, 1 reply; 20+ messages in thread
From: Etelson, Gregory @ 2025-05-02 12:46 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Gregory Etelson, Van Haaren, Harry, dev, owen.hilyard

Hello Bruce,

> Thanks for sharing. However, IMHO using EAL for thread management in rust
> is the wrong interface to expose.

EAL is a singleton object in DPDK architecture.
I see it as a hub for other resources.
Following that idea, the EAL structure can be divided to hold the 
"original" resources inherited from librte_eal and new resources
introduced in Rust EAL.

> Instead, I believe we should be
> encouraging native rust thread management, and not exposing any DPDK
> threading APIs except those necessary to have rust threads work with DPDK,
> i.e. with an lcore ID. Many years ago when DPDK started, and in the C
> world, having DPDK as a runtime environment made sense, but times have
> changed and for Rust, there is a whole ecosystem out there already that we
> need to "play nice with", so having Rust (not DPDK) do all thread
> management is the way to go (again IMHO).
>

I'm not sure what exposed DPDK API you refer to.

Regards,
Gregory




* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-05-02 12:46               ` Etelson, Gregory
@ 2025-05-02 13:58                 ` Van Haaren, Harry
  2025-05-02 15:41                   ` Gregory Etelson
  2025-05-03 17:13                   ` Owen Hilyard
  0 siblings, 2 replies; 20+ messages in thread
From: Van Haaren, Harry @ 2025-05-02 13:58 UTC (permalink / raw)
  To: Etelson, Gregory, Richardson, Bruce; +Cc: dev, owen.hilyard

> From: Etelson, Gregory
> Sent: Friday, May 02, 2025 1:46 PM
> To: Richardson, Bruce
> Cc: Gregory Etelson; Van Haaren, Harry; dev@dpdk.org; owen.hilyard@unh.edu
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
> 
> Hello Bruce,

Hi All,

> > Thanks for sharing. However, IMHO using EAL for thread management in rust
> > is the wrong interface to expose.
> 
> EAL is a singleton object in DPDK architecture.
> I see it as a hub for other resources.

Yep, I tend to agree here; EAL is central to the rest of DPDK working correctly.
And given that EAL's implementation relies heavily on global static variables, it is
certainly a "singleton" instance, yes.

> Following that idea, the EAL structure can be divided to hold the
> "original" resources inherited from librte_eal and new resources
> introduced in Rust EAL.

Here we can look from different perspectives. Should a "Rust EAL" even exist?
If so, why? The DPDK C APIs were designed in the bare-metal/Linux days, when
certain "best practices" didn't exist yet and the Rust language was pre-1.0.

Of course, certain parts of the Rust API must depend on EAL being initialized.
There is a logical flow to DPDK initialization, and that ordering must be kept for correct functionality.

I guess I'm saying that perhaps we can do better than mirroring the concept of
"DPDK EAL in C" into "DPDK EAL in Rust".

> > Instead, I believe we should be
> > encouraging native rust thread management, and not exposing any DPDK
> > threading APIs except those necessary to have rust threads work with DPDK,
> > i.e. with an lcore ID. Many years ago when DPDK started, and in the C
> > world, having DPDK as a runtime environment made sense, but times have
> > changed and for Rust, there is a whole ecosystem out there already that we
> > need to "play nice with", so having Rust (not DPDK) do all thread
> > management is the way to go (again IMHO).
> >
> 
> I'm not sure what exposed DPDK API you refer to.

I think that's the point :) Perhaps the Rust application should decide how/when to
create threads, and how to schedule & pin them. Not the "DPDK crate for Rust".
To give a more concrete example, let's look at Tokio (or Monoio, or Glommio, or ...),
which are prominent players in the Rust ecosystem, particularly for networking workloads
where request/response patterns are well served by the "async" programming model (e.g. an HTTP server).

Let's focus on Tokio first: it is an "async runtime" (two links for future readers)
    https://corrode.dev/blog/async/
    https://rust-lang.github.io/async-book/08_ecosystem/00_chapter.html
So an async runtime can run "async" Rust functions (called Futures, or Tasks when run independently..)
There are lots of words/concepts, but I'll focus only on the thread creation/control aspect, given the DPDK EAL lcore context.

Tokio is a work-stealing scheduler. It spawns "worker" threads, and then gives these "tasks"
to various worker cores (similar to how Golang does its work-stealing scheduling). Some 
DPDK crate users might like this type of workflow, where e.g. RXQ polling is a task, and the
"tokio runtime" figures out which worker to run it on. "Spawning" a task causes the "Future"
to start executing. (technical Rust note: notice the "Send" bound on Future: https://docs.rs/tokio/latest/tokio/task/fn.spawn.html )

Other users might prefer the "thread-per-core" and CPU pinning approach (like DPDK itself would do).
Monoio and Glommio both serve these use cases (but in slightly different ways!). They both spawn threads and do CPU pinning.
Monoio and Glommio say "tasks will always remain on the local thread". In Rust techie terms: "Futures are !Send and !Sync"
    https://docs.rs/monoio/latest/monoio/fn.spawn.html 
    https://docs.rs/glommio/latest/glommio/fn.spawn_local.html
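
To make the thread-per-core idea concrete, here is a minimal sketch using only the standard library. It is not the RFC code: `RxqHandle` and `Rxq` follow the names used in the patch description, but `enable_polling`, `rx_burst` and `poll_thread_per_queue` are hypothetical, and the receive path is faked. CPU pinning is left out because std has no API for it.

```rust
use std::marker::PhantomData;
use std::thread;

/// Send-able token for one Rx queue (modelled on the RFC's RxqHandle).
pub struct RxqHandle {
    port: u16,
    queue: u16,
}

/// Thread-bound queue: the raw-pointer PhantomData makes it !Send and !Sync.
pub struct Rxq {
    port: u16,
    queue: u16,
    _not_send: PhantomData<*const ()>,
}

impl RxqHandle {
    pub fn new(port: u16, queue: u16) -> Self {
        Self { port, queue }
    }

    /// Consume the handle on the polling thread (hypothetical method name).
    /// The returned Rxq can no longer be moved to another thread.
    pub fn enable_polling(self) -> Rxq {
        Rxq { port: self.port, queue: self.queue, _not_send: PhantomData }
    }
}

impl Rxq {
    pub fn id(&self) -> (u16, u16) {
        (self.port, self.queue)
    }

    /// Stand-in for a burst receive; a real binding would call into DPDK here.
    /// Pretends queue N always has N + 1 packets waiting, capped by the budget.
    pub fn rx_burst(&mut self, budget: usize) -> usize {
        budget.min(self.queue as usize + 1)
    }
}

/// One OS thread per queue; each thread owns its Rxq for its whole lifetime.
pub fn poll_thread_per_queue(handles: Vec<RxqHandle>) -> Vec<usize> {
    let workers: Vec<_> = handles
        .into_iter()
        .map(|handle| {
            // Only the Send handle crosses the thread boundary.
            thread::spawn(move || {
                let mut rxq = handle.enable_polling();
                rxq.rx_burst(32)
            })
        })
        .collect();
    // Moving an `Rxq` into thread::spawn instead would be rejected by the compiler.
    workers.into_iter().map(|w| w.join().unwrap()).collect()
}
```

Calling `poll_thread_per_queue` with two handles runs two polling threads, and the application (not the DPDK crate) decided how those threads were created.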

So there are at least 3 different async runtimes (and I haven't even talked about async-std, smol, embassy, ...) which
all have different use-cases, and methods of running "tasks" on threads. These runtimes exist, and are widely used,
and applications make use of their thread-scheduling capabilities.

So "async runtimes" do thread creation (and optionally CPU pinning) for the user.
Other libraries like "Rayon" are thread-pool managers, those also have various CPU thread-create/pinning capabilities.
If DPDK *also* wants to do thread creation/management and CPU-thread-to-core pinning for the user, that creates tension.

> Bruce wrote: "so having Rust (not DPDK) do all thread management is the way to go (again IMHO)."

I think I agree here, in order to make the Rust DPDK crate usable from the Rust ecosystem,
it must align itself with the existing Rust networking ecosystem.

That means, the DPDK Rust crate should not FORCE the usage of lcore pinnings and mappings.
Allowing a Rust application to decide how to best handle threading (via Rayon, Tokio, Monoio, etc)
will allow much more "native" or "ergonomic" integration of DPDK into Rust applications.

> Regards,
> Gregory

Apologies for the long-form, "wall of text" email, but I hope it captures the nuance of threading and
async runtimes, which I believe in the long term will be very nice to capture "async offload" use-cases
for DPDK. To put it another way, lookaside processing can be hidden behind async functions & runtimes,
if we design the APIs right: and that would be really cool for making async-offload code easy to write correctly!
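
A rough sketch of what "lookaside processing hidden behind an async function" could look like, using only the standard library. Everything here is an assumption for illustration: `OffloadJob`, `encrypt` and `block_on` are made-up names, the "accelerator" is faked by inverting bytes after a fixed number of polls, and the busy-polling executor is only there so the example runs without an async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical lookaside job: "completes" after the device was polled a few times.
pub struct OffloadJob {
    polls_left: u32,
    input: Vec<u8>,
}

impl Future for OffloadJob {
    type Output = Vec<u8>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Vec<u8>> {
        if self.polls_left == 0 {
            // Pretend the accelerator finished and inverted every byte.
            Poll::Ready(self.input.iter().map(|b| b ^ 0xff).collect())
        } else {
            self.polls_left -= 1;
            cx.waker().wake_by_ref(); // ask the executor to poll us again
            Poll::Pending
        }
    }
}

/// What the application sees: an ordinary async call, with the
/// submit / poll-for-completion dance hidden inside the future.
pub async fn encrypt(packet: Vec<u8>) -> Vec<u8> {
    OffloadJob { polls_left: 3, input: packet }.await
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, standing in for a real runtime.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The caller of `encrypt` never sees the completion polling, which is the property that would make async-offload code easier to write correctly.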

Regards, -Harry

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-05-02 13:58                 ` Van Haaren, Harry
@ 2025-05-02 15:41                   ` Gregory Etelson
  2025-05-02 15:57                     ` Bruce Richardson
  2025-05-03 17:13                   ` Owen Hilyard
  1 sibling, 1 reply; 20+ messages in thread
From: Gregory Etelson @ 2025-05-02 15:41 UTC (permalink / raw)
  To: Van Haaren, Harry, Richardson, Bruce; +Cc: dev, owen.hilyard

Hello Bruce & Harry,

There is an aspect we've not discussed yet.

DPDK is a framework. It's integrated into a network application.
From the application's perspective, what is the ratio between "pure" application code and DPDK API calls?
The exact numbers differ, but it's clear that most of the application code is not about DPDK.

Another question to consider: which is more complicated,
rewriting an entire application from C to Rust or, once you have a Rust application, upgrading or even replacing the DPDK API?

DPDK provides a solid framework for both stability and performance.
In my opinion, binding DPDK as it is today with Rust can significantly improve application design.

Regards,
Gregory

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-05-02 15:41                   ` Gregory Etelson
@ 2025-05-02 15:57                     ` Bruce Richardson
  0 siblings, 0 replies; 20+ messages in thread
From: Bruce Richardson @ 2025-05-02 15:57 UTC (permalink / raw)
  To: Gregory Etelson; +Cc: Van Haaren, Harry, dev, owen.hilyard

On Fri, May 02, 2025 at 03:41:33PM +0000, Gregory Etelson wrote:
>    Hello Bruce & Harry,
> 
>    There is an aspect we've not discussed yet.
> 
>    DPDK is a framework. It's integrated into a network application.
> 
>    From the application perspective what is a ratio between "pure"
>    application code and DPDK API ?
>    The exact numbers differ, but it's clear that most of application code
>    is not about DPDK.
> 
>    Another question to consider - what is more complicated
> 
>    rewrite entire application from C to Rust or, while having Rust
>    application, upgrade or even replace DPDK API ?
> 
>    DPDK provides a solid framework for both stability and performance.
> 
>    In my opinion, binding DPDK as it is today with Rust can significantly
>    improve application design.
> 

I would have initially agreed with that assertion. However, "binding DPDK
as it is today with Rust" has already been done many times and never got
any real traction that I have seen. Just look at the number of crates
coming up when you search crates.io for DPDK[1] - and from a quick scan,
many of these are not crates using DPDK, but wrappers around DPDK as it is
now (or was a couple of years ago!).

Given that it's been attempted so many times before, I really don't see the
value in doing it "one more time". If we want to offer support for DPDK
through rust, we need to offer something different and better. Any rust
developer can already use bindgen to wrap DPDK themselves.

That's why I'm trying to see how we can offer something that will be longer
term maintainable and usable from rust - rather than just exposing the C
APIs. For maintainability we don't want to expose anything that's not
absolutely necessary, and for usability we don't want to expose anything
that may conflict with what is already there in rust, and for both
maintainability and usability we should only expose that which can't already
be done in rust itself or in an existing crate. So I'd view (almost) all
thread-management, and most of what EAL provides as not to be exposed to
Rust, because Rust already has other ways of doing all that. Similarly for
the non-device management libs (i.e. those not like ethdev), functionality
for rib/fib/packet reordering/etc. are all better handled by separate crates
than by wrapping DPDK.

Furthermore, I also tend to be skeptical of the longer-term maintenance of
anything that is outside the DPDK repo itself. That's why in my initial
RFC, I looked to add to the DPDK repo the minimum hooks needed to make the
repo itself a rust crate, rather than having a rust crate that pulls in and
wraps DPDK. (Again, there are already a handful of those for users to
choose from!)

Just my 2c., and where I am coming from.

/Bruce

[1] https://crates.io/search?q=dpdk

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-05-02 13:58                 ` Van Haaren, Harry
  2025-05-02 15:41                   ` Gregory Etelson
@ 2025-05-03 17:13                   ` Owen Hilyard
  2025-05-06 16:39                     ` Van Haaren, Harry
  1 sibling, 1 reply; 20+ messages in thread
From: Owen Hilyard @ 2025-05-03 17:13 UTC (permalink / raw)
  To: Van Haaren, Harry, Etelson, Gregory, Bruce Richardson; +Cc: dev

From: Van Haaren, Harry <harry.van.haaren@intel.com>
Sent: Friday, May 2, 2025 9:58 AM
To: Etelson, Gregory <getelson@nvidia.com>; Richardson, Bruce <bruce.richardson@intel.com>
Cc: dev@dpdk.org <dev@dpdk.org>; Owen Hilyard <owen.hilyard@unh.edu>
Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq

> From: Etelson, Gregory
> Sent: Friday, May 02, 2025 1:46 PM
> To: Richardson, Bruce
> Cc: Gregory Etelson; Van Haaren, Harry; dev@dpdk.org; owen.hilyard@unh.edu
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
>
> Hello Bruce,

Hi All,
Hi All,

> > Thanks for sharing. However, IMHO using EAL for thread management in rust
> > is the wrong interface to expose.
>
> EAL is a singleton object in DPDK architecture.
> I see it as a hub for other resources.

Yep, i tend to agree here; EAL is central to the rest of DPDK working correctly.
And given EALs implementation is heavily relying on global static variables, it is
certainly a "singleton" instance, yes.
I think a singleton is one way to implement this, but then you lose some of the RAII/automatic resource management behavior. It would, however, make some APIs inherently unsafe or very unergonomic unless we were to force rte_eal_cleanup to be run via atexit(3) or the platform equivalent and forbid the user from running it themselves. For a lot of Rust runtimes similar to the EAL (tokio, glommio, etc), once you spawn a runtime it's around until process exit. The other option is to have a handle which represents the state of the EAL on the Rust side and runs rte_eal_init on creation and rte_eal_cleanup on destruction. There are two ways we can make that safe. First, reference counting: once the handles are created, they can be passed around easily, and the last one runs rte_eal_cleanup when it gets dropped. This avoids having tons of complicated lifetimes and I think that, everywhere that it shouldn't affect fast path performance, we should use refcounting. The other option is to use lifetimes. This is doable, but is going to force people who are more likely to primarily be C or C++ developers to dive deep into Rust's type system if they want to build abstractions over it. If we add async into the mix, as many people are going to want to do, it's going to become much, much harder. As a result, I'd advocate for only using it for data path components where refcounting isn't an option.
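
To illustrate the refcounting option, here is a minimal sketch, not a proposed implementation. `Eal` and `EalInner` are hypothetical names, and the two `mock_*` functions are stand-ins that only count calls, so the example runs without linking DPDK. A real binding would call rte_eal_init and rte_eal_cleanup at those points and would also have to guard against a second `init`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Call counters for the mocked C functions (illustration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static CLEANUP_CALLS: AtomicUsize = AtomicUsize::new(0);

fn mock_rte_eal_init() {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
}

fn mock_rte_eal_cleanup() {
    CLEANUP_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Private state. Its Drop runs exactly once, when the last handle goes away.
struct EalInner;

impl Drop for EalInner {
    fn drop(&mut self) {
        mock_rte_eal_cleanup();
    }
}

/// Cheaply clonable handle to the initialized EAL.
#[derive(Clone)]
pub struct Eal {
    inner: Arc<EalInner>,
}

impl Eal {
    /// Initialize the (mock) EAL and return the first handle.
    pub fn init() -> Eal {
        mock_rte_eal_init();
        Eal { inner: Arc::new(EalInner) }
    }

    /// Number of live handles, i.e. the current reference count.
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }
}

pub fn init_calls() -> usize {
    INIT_CALLS.load(Ordering::SeqCst)
}

pub fn cleanup_calls() -> usize {
    CLEANUP_CALLS.load(Ordering::SeqCst)
}
```

Other objects (ports, queues, mempools) could each hold a clone of `Eal`, so cleanup cannot run while any of them is still alive, and no lifetime parameters appear in user code.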

> Following that idea, the EAL structure can be divided to hold the
> "original" resources inherited from librte_eal and new resources
> introduced in Rust EAL.

Here we can look from different perspectives. Should "Rust EAL" even exist?
If so, why? The DPDK C APIs were designed in baremetal/linux days, where
certain "best-practices" didn't exist yet, and Rust language was pre 1.0 release.

Of course, certain parts of Rust API must depend on EAL being initialized.
There is a logical flow to DPDK initialization, these must be kept for correct functionality.

I guess I'm saying, perhaps we can do better than mirroring the concept of
"DPDK EAL in C" in to "DPDK EAL in Rust".

I think that there will need to be some kind of runtime exposed by the library. A lot of the existing EAL abstractions may need to be reworked, especially those dealing with memory, but I think a lot of things can be layered on top of the C API. However, I think many of the invariants in the EAL could be enforced at compile time for free, which may mean the creation of a lot of "unchecked" function variants which skip over null checks and other validation.

As was mentioned before, it may also make sense for some abstractions in the C EAL to be lifted to compile time. I've spent a lot of time thinking about how to use something like Rust's traits for "it just works" capabilities where you can declare what features you want (ex: scatter/gather) and it will either be done in hardware or fall back to software, since you were going to need to do it anyway. This might lead to parameterizing a lot of user code on the devices they expect to interact with and then having some "dyn EthDev" as a fallback, which should be roughly equivalent to what we have now. I can explain that in more detail if there's interest.
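
A very small sketch of that trait idea, under heavy assumptions: the trait name `EthDev` is taken from the "dyn EthDev" mention above, but `tx_gather`, `Path`, `SgNic` and `BasicNic` are invented for illustration and no real device is touched. The default method is the software fallback; a device with the capability overrides it.

```rust
/// Which path handled the request (illustration only).
#[derive(Debug, PartialEq)]
pub enum Path {
    Hardware,
    Software,
}

pub trait EthDev {
    fn name(&self) -> &'static str;

    /// Send a packet given as several segments.
    /// Default: software fallback that linearises the segments first.
    fn tx_gather(&self, segments: &[&[u8]]) -> (Path, usize) {
        let linear: Vec<u8> = segments.concat();
        (Path::Software, linear.len())
    }
}

/// Pretend device with hardware scatter/gather.
pub struct SgNic;

/// Pretend device without it; relies on the default method.
pub struct BasicNic;

impl EthDev for SgNic {
    fn name(&self) -> &'static str {
        "sg-nic"
    }

    fn tx_gather(&self, segments: &[&[u8]]) -> (Path, usize) {
        // Hardware walks the segment list itself, so no copy is made.
        (Path::Hardware, segments.iter().map(|s| s.len()).sum::<usize>())
    }
}

impl EthDev for BasicNic {
    fn name(&self) -> &'static str {
        "basic-nic"
    }
}

/// Device type known at compile time: the call is monomorphised.
pub fn send_static<D: EthDev>(dev: &D, segments: &[&[u8]]) -> (Path, usize) {
    dev.tx_gather(segments)
}

/// Fallback when the device is only known at run time.
pub fn send_dyn(dev: &dyn EthDev, segments: &[&[u8]]) -> (Path, usize) {
    dev.tx_gather(segments)
}
```

User code written against `D: EthDev` gets the hardware or software path chosen at compile time, while `&dyn EthDev` keeps roughly today's run-time dispatch.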

> > Instead, I believe we should be
> > encouraging native rust thread management, and not exposing any DPDK
> > threading APIs except those necessary to have rust threads work with DPDK,
> > i.e. with an lcore ID. Many years ago when DPDK started, and in the C
> > world, having DPDK as a runtime environment made sense, but times have
> > changed and for Rust, there is a whole ecosystem out there already that we
> > need to "play nice with", so having Rust (not DPDK) do all thread
> > management is the way to go (again IMHO).
> >
>
> I'm not sure what exposed DPDK API you refer to.

I think that's the point :) Perhaps the Rust application should decide how/when to
create threads, and how to schedule & pin them. Not the "DPDK crate for Rust".
To give a more concrete examples, lets look at Tokio (or Monoio, or Glommio, or .. )
which are prominent players in the Rust ecosystem, particularly for networking workloads
where request/response patterns are well served by the "async" programming model (e.g HTTP server).
Rust doesn't really care about threads that much. Yes, it has std::thread as a pthread equivalent, but on Linux those literally call pthread. Enforcing the correctness of the Send and Sync traits (responsible for helping enforce thread safety) in APIs is left to library authors. I've used Rust with EAL threads and it's fine, although a slightly nicer API for launching based on a closure (which is a function pointer and a struct with the captured inputs) would be nice. In Rust, I'd say that async and threads are orthogonal concepts, except where runtimes force them to mix. Async is a way to write a state machine or (with some more abstraction) an execution graph, and Rust the language doesn't care whether a library decides to run some dependencies in parallel. What I think Rust is more likely to want is thread per core and then running either a single async runtime over all of them or an async runtime per core.
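
As a sketch of the closure-based launch mentioned above: the usual trick is a generic `extern "C"` trampoline plus a boxed closure passed through the `void *` argument. `LcoreFn` mirrors the shape of DPDK's lcore_function_t (`int (*)(void *)`), but `mock_remote_launch` is a stand-in that simply calls the function on the current thread, and `launch` is a hypothetical name. A real wrapper would hand the same two values to rte_eal_remote_launch together with a worker lcore id.

```rust
use std::os::raw::{c_int, c_void};

/// Same shape as DPDK's lcore_function_t: int (*)(void *).
type LcoreFn = extern "C" fn(*mut c_void) -> c_int;

/// Mock C-style launcher: runs f(arg) immediately on the calling thread.
fn mock_remote_launch(f: LcoreFn, arg: *mut c_void) -> c_int {
    f(arg)
}

/// Trampoline, instantiated once per closure type F.
extern "C" fn trampoline<F>(arg: *mut c_void) -> c_int
where
    F: FnOnce() -> c_int + Send + 'static,
{
    // SAFETY: `arg` comes from Box::into_raw in `launch` with the same F,
    // and the launcher calls this function exactly once.
    let closure: Box<F> = unsafe { Box::from_raw(arg as *mut F) };
    closure()
}

/// Launch a Rust closure through the C-style function-pointer interface.
/// The Send + 'static bounds are what make handing it to another thread sound.
pub fn launch<F>(f: F) -> c_int
where
    F: FnOnce() -> c_int + Send + 'static,
{
    // The closure (function pointer + captured state) is moved to the heap
    // and passed as the opaque argument.
    let arg = Box::into_raw(Box::new(f)) as *mut c_void;
    mock_remote_launch(trampoline::<F>, arg)
}
```

One thing a real version must decide is what happens if the closure panics, since unwinding out of an `extern "C"` function is not allowed.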

Lets focus on Tokio first: it is an "async runtime" (two links for future readers)
    <snip>
So an async runtime can run "async" Rust functions (called Futures, or Tasks when run independently..)
There are lots of words/concepts, but I'll focus only on the thread creation/control aspect, given the DPDK EAL lcore context.

Tokio is a work-stealing scheduler. It spawns "worker" threads, and then gives these "tasks"
to various worker cores (similar to how Golang does its work-stealing scheduling). Some
DPDK crate users might like this type of workflow, where e.g. RXQ polling is a task, and the
"tokio runtime" figures out which worker to run it on. "Spawning" a task causes the "Future"
to start executing. (technical Rust note: notice the "Send" bound on Future: https://docs.rs/tokio/latest/tokio/task/fn.spawn.html )
The work stealing aspect of Tokio has also led to some issues in the Rust ecosystem. What it effectively means is that every "await" is a place where you might get moved to another thread. This means that it would be unsound to, for example, have a queue handle on devices without MT-safe queues unless we want to put a mutex on top of all of the device queues. I personally think this is a lot of the source of people thinking that Rust async is hard, because Tokio forces you to be thread safe at really weird places in your code and has issues like not being able to hold a mutex over an await point.
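
The "every await is a possible thread move" point can be checked at compile time without any runtime. A minimal sketch with made-up names (`require_send`, `poll_shared`, `poll_local`, `block_on`, `demo`): `require_send` plays the role of the Send bound on a work-stealing spawn, and the tiny busy-polling executor only exists so the example runs with the standard library alone.

```rust
use std::cell::Cell;
use std::future::Future;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Accepts only futures that may migrate between threads.
fn require_send<F: Future + Send>(fut: F) -> F {
    fut
}

/// State behind Arc<Mutex<..>>: the future is Send.
pub async fn poll_shared(counter: Arc<Mutex<u32>>) -> u32 {
    std::future::ready(()).await; // possible migration point under work stealing
    let mut guard = counter.lock().unwrap(); // taken after the await, not held across it
    *guard += 1;
    *guard
}

/// Thread-local state behind Rc<Cell<..>>: the future is !Send.
pub async fn poll_local(counter: Rc<Cell<u32>>) -> u32 {
    std::future::ready(()).await;
    counter.set(counter.get() + 1);
    counter.get()
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny single-threaded executor, standing in for a real runtime.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

pub fn demo() -> (u32, u32) {
    let shared = require_send(poll_shared(Arc::new(Mutex::new(0))));
    // require_send(poll_local(Rc::new(Cell::new(0)))); // rejected: Rc is !Send
    let local = poll_local(Rc::new(Cell::new(10)));
    (block_on(shared), block_on(local))
}
```

A queue handle for a device without MT-safe queues would be in the `poll_local` position: fine on a thread-per-core runtime, rejected by a work-stealing spawn unless it is wrapped in a lock.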

Other users might prefer the "thread-per-core" and CPU pinning approach (like DPDK itself would do).
nit: Tokio also spawns a thread per core, it just freely moves tasks between cores. It doesn't pin because it's designed to interoperate with the normal kernel scheduler more nicely. I think that not needing pinned cores is nice, but we want the ability to pin for performance reasons, especially on NUMA/NUCA systems (NUCA = Non-Uniform Cache Architecture, almost every AMD EPYC above 8 cores, higher core count Intel Xeons for 3 generations, etc).
Monoio and Glommio both serve these use cases (but in slightly different ways!). They both spawn threads and do CPU pinning.
Monoio and Glommio say "tasks will always remain on the local thread". In Rust techie terms: "Futures are !Send and !Sync"
    https://docs.rs/monoio/latest/monoio/fn.spawn.html
    https://docs.rs/glommio/latest/glommio/fn.spawn_local.html
There is also another option, one which would eliminate "service cores". We provide both a work stealing pool of tasks that have to deal with being yanked between cores/EAL threads at any time, but aren't data plane tasks, and then a different API for spawning tasks onto the local thread/core for data plane tasks (ex: something to manage a particular HTTP connection). This might make writing the runtime harder, but it should provide the best of both worlds provided we can build in a feature (Rust provides a way to "ifdef out" code via features) to disable one or the other if someone doesn't want the overhead.

So there are at least 3 different async runtimes (and I haven't even talked about async-std, smol, embassy, ...) which
all have different use-cases, and methods of running "tasks" on threads. These runtimes exist, and are widely used,
and applications make use of their thread-scheduling capabilities.

So "async runtimes" do thread creation (and optionally CPU pinning) for the user.
Other libraries like "Rayon" are thread-pool managers, those also have various CPU thread-create/pinning capabilities.
If DPDK *also* wants to do thread creation/management and CPU-thread-to-core pinning for the user, that creates tension.
The other problem is that most of these async runtimes have IO very tightly integrated into them. A large portion of Tokio had to be forked and rewritten for io_uring support, and DPDK is a rather stark departure from what they were all designed for. I know that both Tokio and Glommio have "start a new async runtime on this thread" functions, and I think that Tokio has an "add this thread to a multithreaded runtime" somewhere.

I think the main thing that DPDK would need to be concerned about is that many of these runtimes use thread locals, and I'm not sure if that would be transparently handled by the EAL thread runtime since I've always used thread per core and then used the Rust runtime to multiplex between tasks instead of spawning more EAL threads.

Rayon should probably be thought of in a similar vein to OpenMP, since it's mainly designed for batch processing. Unless someone is doing some fairly heavy computation (the kind where "do we want a GPU to accelerate this?" becomes a question) inside of their DPDK application, I'm having trouble thinking of a use case that would want both DPDK and Rayon.

> Bruce wrote: "so having Rust (not DPDK) do all thread management is the way to go (again IMHO)."

I think I agree here, in order to make the Rust DPDK crate usable from the Rust ecosystem,
it must align itself with the existing Rust networking ecosystem.

That means, the DPDK Rust crate should not FORCE the usage of lcore pinnings and mappings.
Allowing a Rust application to decide how to best handle threading (via Rayon, Tokio, Monoio, etc)
will allow much more "native" or "ergonomic" integration of DPDK into Rust applications.
I'm not sure that using DPDK from Rust will be possible without either serious performance sacrifices or rewrites of a lot of the networking libraries. Tokio continues to mimic the BSD sockets API for IO, even with the io_uring version, as does glommio. The idea of the "recv" giving you a buffer without you passing one in isn't really used outside of some lower-level io_uring crates. At a bare minimum, even if DPDK managed to offer an API that works exactly the same ways as io_uring or epoll, we would still need to go to all of the async runtimes and get them to plumb DPDK support in or approve someone from the DPDK community maintaining support. If we don't offer that API, then we either need rewrites inside of the async runtimes or for individual libraries to provide DPDK support, which is going to be even more difficult.

I agree that forcing lcore pinnings and mappings isn't good, but I think that DPDK is well within its rights to build its own async runtime which exposes a standard API. For one thing, the first thing Rust users will ask for is a TCP stack, which the community has been discussing and debating for a long time. I think we should figure out whether the goal is to allow DPDK applications to be written in Rust, or to allow generic Rust applications to use DPDK. The former means that the audience would likely be Rust-fluent people who would have used DPDK regardless, and are fine dealing with mempools, mbufs, the eal, and ethdev configuration. The latter is a much larger audience who is likely going to be less tolerant of dpdk-rs exposing the true complexity of using DPDK. Yes, Rust can help make the abstractions better, but there's an amount of inherent complexity in "Your NIC can handle IPSec for you and can also direct all IPv6 traffic to one core" that I don't think we can remove.

I personally favor making an API for DPDK applications to be written in Rust, and then steadily adding abstractions on top of that until we arrive at something that someone who has never looked at a TCP header can use without too much confusion. That was part of the goal of the Iris project I pitched (and then had to go finish another project so the design is still WIP). I think that a move to DPDK is going to be as radical of a change as a move to io_uring; however, DPDK is fast enough that I think it may be possible to convince people to do a rewrite once we arrive at that high level API. "Swap out your sockets and rework the functions that do network IO for a 5x performance increase" is a very, very attractive offer, but for us to get there I think we need to have DPDK's full potential available in Rust, and then build as many zero-overhead (zero cost or you couldn't write it better yourself) abstractions as we can on top. I want to avoid a situation where we build up to the high-level APIs as fast as we can and then end up in a situation where you have "Easy Mode" and then "C DPDK written in Rust" as your two options.
> Regards,
> Gregory

Apologies for the long-form, "wall of text" email, but I hope it captures the nuance of threading and
async runtimes, which I believe in the long term will be very nice to capture "async offload" use-cases
for DPDK. To put it another way, lookaside processing can be hidden behind async functions & runtimes,
if we design the APIs right: and that would be really cool for making async-offload code easy to write correctly!

Regards, -Harry

Sorry for my own walls of text. As a consequence of working on Iris I've spent a lot of time thinking about how to make DPDK easier to use while keeping the performance intact, and I was already thinking in Rust since it provides one of the better options for these kinds of abstractions (the other option I see is Mojo, which isn't ready yet). I want to see DPDK become more accessible, but the performance and access to hardware are among the main things that make DPDK special, so I don't want to compromise that. I definitely agree that we need to force DPDK's existing APIs to justify themselves in the face of the new capabilities of Rust, but I think that starting from "How are Rust applications written today?" is a mistake.

Regards,
Owen

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-05-03 17:13                   ` Owen Hilyard
@ 2025-05-06 16:39                     ` Van Haaren, Harry
  2025-05-08 23:53                       ` Owen Hilyard
  0 siblings, 1 reply; 20+ messages in thread
From: Van Haaren, Harry @ 2025-05-06 16:39 UTC (permalink / raw)
  To: Owen Hilyard, Etelson, Gregory, Richardson, Bruce; +Cc: dev

> From: Owen Hilyard
> Sent: Saturday, May 03, 2025 6:13 PM
> To: Van Haaren, Harry; Etelson, Gregory; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
>
> From: Van Haaren, Harry <harry.van.haaren@intel.com>
> Sent: Friday, May 2, 2025 9:58 AM
> To: Etelson, Gregory <getelson@nvidia.com>; Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org <dev@dpdk.org>; Owen Hilyard <owen.hilyard@unh.edu>
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
>  
> > From: Etelson, Gregory
> > Sent: Friday, May 02, 2025 1:46 PM
> > To: Richardson, Bruce
> > Cc: Gregory Etelson; Van Haaren, Harry; dev@dpdk.org; owen.hilyard@unh.edu
> > Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
> >
> > Hello Bruce,
>
> Hi All,
> Hi All,

Hi All!

Great to see passionate & detailed replies & input!

Please folks - let's try to remember to send plain-text emails, and use  >  to indent each reply.
It's hard to identify what I wrote (1) compared to Owen's replies (2) in the archives otherwise.
(Adding some "Harry wrote" and "Owen wrote" annotations to try to help future readability.)

1) https://inbox.dpdk.org/dev/PH8PR11MB6803B2CD0BF276C6164C3D97D78D2@PH8PR11MB6803.namprd11.prod.outlook.com/
2) https://inbox.dpdk.org/dev/DM8P223MB038323681A4BEA771CF92A6D8D8D2@DM8P223MB0383.NAMP223.PROD.OUTLOOK.COM/


Maybe it will help to split the conversation into two threads, with one focussing on
"DPDK used through Safe Rust abstractions", and the other on "future cool use-cases".

Perhaps I jumped a bit too far ahead mentioning async runtimes, and while I like the enthusiasm for
designing "cool new stuff", it is probably better to be realistic around what will get "done": my bad.

I'll reply to the "DPDK via Safe Rust" topics below, and start a new thread (with same folks on CC)
for "future cool use-cases" when I've had a chance to clean up a little demo to showcase them.


> > > Thanks for sharing. However, IMHO using EAL for thread management in rust
> > > is the wrong interface to expose.
> >
> > EAL is a singleton object in DPDK architecture.
> > I see it as a hub for other resources.

Harry Wrote:
> Yep, i tend to agree here; EAL is central to the rest of DPDK working correctly.
> And given EALs implementation is heavily relying on global static variables, it is
> certainly a "singleton" instance, yes.

Owen wrote:
> I think a singleton one way to implement this, but then you lose some of the RAII/automatic resource management behavior. It would, however, make some APIs inherently unsafe or very unergonomic unless we were to force rte_eal_cleanup to be run via atexit(3) or the platform equivalent and forbid the user from running it themselves. For a lot of Rust runtimes similar to the EAL (tokio, glommio, etc), once you spawn a runtime it's around until process exit. The other option is to have a handle which represents the state of the EAL on the Rust side and runs rte_eal_init on creation and rte_eal_cleanup on destruction. There are two ways we can make that safe. First, reference counting, once the handles are created, they can be passed around easily, and the last one runs rte_eal_cleanup when it gets dropped.  This avoids having tons of complicated lifetimes and I think that, everywhere that it shouldn't affect fast path performance, we should use refcounting.

Agreed, refcounts for EAL "singleton" concept yes. For the record, the initial patch actually returns a
"dpdk" object from dpdk::Eal::init(), and Drop impl has a // TODO rte_eal_cleanup(), so well aligned on approach here.
https://patches.dpdk.org/project/dpdk/patch/20250418132324.4085336-1-harry.van.haaren@intel.com/
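(A minimal sketch of the refcounted handle idea, for illustration only. It assumes std::sync::Arc is an acceptable refcount; fake_rte_eal_init(), fake_rte_eal_cleanup(), handle_count() and cleanup_calls() are hypothetical stand-ins that only count calls, not the real DPDK bindings or the API in the patch.)

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical stand-ins for the C calls: they only count invocations.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);
static CLEANUP_CALLS: AtomicUsize = AtomicUsize::new(0);

fn fake_rte_eal_init() {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
}

fn fake_rte_eal_cleanup() {
    CLEANUP_CALLS.fetch_add(1, Ordering::SeqCst);
}

// Private inner state. Arc drops it exactly once, when the last handle goes away.
struct EalInner;

impl Drop for EalInner {
    fn drop(&mut self) {
        fake_rte_eal_cleanup();
    }
}

/// Cheaply clonable handle to the single EAL instance.
#[derive(Clone)]
pub struct Eal {
    inner: Arc<EalInner>,
}

impl Eal {
    /// Runs the (fake) init call and returns the first handle.
    pub fn init() -> Result<Eal, String> {
        fake_rte_eal_init();
        Ok(Eal { inner: Arc::new(EalInner) })
    }

    /// Number of live handles, exposed only to make the sketch observable.
    pub fn handle_count(&self) -> usize {
        Arc::strong_count(&self.inner)
    }
}

/// How many times the (fake) cleanup call has run so far.
pub fn cleanup_calls() -> usize {
    CLEANUP_CALLS.load(Ordering::SeqCst)
}
```

Cloning the handle is what other objects (ports, queues) would do to keep the EAL alive; cleanup then runs only when the last clone is dropped, with no lifetimes in the public API.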

Owen wrote:
> The other option is to use lifetimes. This is doable, but is going to force people who are more likely to primarily be C or C++ developers to dive deep into Rust's type system if they want to build abstractions over it. If we add async into the mix, as many people are going to want to do, it's going to become much, much harder. As a result, I'd advocate for only using it for data path components where refcounting isn't an option.

+1 to not using lifetimes here, it is not the right solution for this EAL / singleton type problem.


Gregory wrote:
> > Following that idea, the EAL structure can be divided to hold the
> > "original" resources inherited from librte_eal and new resources
> > introduced in Rust EAL.

Harry wrote:
> Here we can look from different perspectives. Should "Rust EAL" even exist?
> If so, why? The DPDK C APIs were designed in baremetal/linux days, where
> certain "best-practices" didn't exist yet, and Rust language was pre 1.0 release.
>
> Of course, certain parts of Rust API must depend on EAL being initialized.
> There is a logical flow to DPDK initialization, these must be kept for correct functionality.
>
> I guess I'm saying, perhaps we can do better than mirroring the concept of
> "DPDK EAL in C" in to "DPDK EAL in Rust".

Owen wrote:
> I think that there will need to be some kind of runtime exposed by the library. A lot of the existing EAL abstractions may need to be reworked, especially those dealing with memory, but I think a lot of things can be layered on top of the C API. However, I think many of the invariants in the EAL could be enforced at compile time for free, which may mean the creation of a lot of "unchecked" function variants which skip over null checks and other validation.

Agree that most (if not all?) things can be layered on top of the C API. Let's leave the "unchecked" function variants discussion until we have code to discuss; it's hard to know right now because we don't have an implementation to talk about.

Owen wrote:
> As was mentioned before, it may also make sense for some abstractions in the C EAL to be lifted to compile time. I've spent a lot of time thinking about how to use something like Rust's traits for "it just works" capabilities where you can declare what features you want (ex: scatter/gather) and it will either be done in hardware or fall back to software, since you were going to need to do it anyway. This might lead to parameterizing a lot of user code on the devices they expect to interact with and then having some "dyn EthDev" as a fallback, which should be roughly equivalent to what we have now. I can explain that in more detail if there's interest.

This goes into the "cool new stuff" category in my head: I agree these concepts are possible,
but I feel we must prioritize the "DPDK via Safe Rust" and achieve that first. We cannot put the
cherry on the cake, if the cake is still under construction :)

(Techie note: the description is for a "polyfill" of specific functionality. This is often done via
stacking or layering operations that all provide the same trait in Rust. This is very nice, as one
can provide a specific implementation of a functionality, and compose it with other functionalities.
For examples, look at the "tower" crate: "a library of modular and reusable components for building robust networking clients and servers".)

To be very clear - cool techie stuff, but we need to get the basics in place first, before looking at dyn Ethdev type concepts.
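(For future readers, a toy sketch of that layering pattern. Everything here is hypothetical and not part of the patch: Packet, PacketSource, RawQueue and SoftwareChecksum are made-up names, and the "checksum" is a deliberately trivial byte sum rather than any real protocol checksum. The point is only that the layer and the thing it wraps implement the same trait.)

```rust
/// Hypothetical packet: `checksum_ok` is None when nothing has validated it yet.
#[derive(Debug, Clone, PartialEq)]
pub struct Packet {
    pub payload: Vec<u8>,
    pub checksum_ok: Option<bool>,
}

/// The one trait that every layer provides.
pub trait PacketSource {
    fn recv(&mut self) -> Option<Packet>;
}

/// Innermost source: hands out queued packets untouched.
pub struct RawQueue(pub Vec<Packet>);

impl PacketSource for RawQueue {
    fn recv(&mut self) -> Option<Packet> {
        // Pops the most recently queued packet.
        self.0.pop()
    }
}

/// Layer that fills in `checksum_ok` in software, but only when the inner
/// source (think: hardware offload) has not already done so.
pub struct SoftwareChecksum<S>(pub S);

impl<S: PacketSource> PacketSource for SoftwareChecksum<S> {
    fn recv(&mut self) -> Option<Packet> {
        let mut pkt = self.0.recv()?;
        if pkt.checksum_ok.is_none() {
            pkt.checksum_ok = Some(toy_checksum_ok(&pkt.payload));
        }
        Some(pkt)
    }
}

/// Toy rule: the last byte must equal the wrapping sum of the bytes before it.
fn toy_checksum_ok(payload: &[u8]) -> bool {
    match payload.split_last() {
        Some((last, body)) => body.iter().fold(0u8, |acc, b| acc.wrapping_add(*b)) == *last,
        None => false,
    }
}
```

Application code written against PacketSource does not change whether the result came from the inner source or from the software fallback, which is the composability that tower-style layering gives.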


Harry/Gregory/Bruce wrote (in order of indentation):
> > > Instead, I believe we should be
> > > encouraging native rust thread management, and not exposing any DPDK
> > > threading APIs except those necessary to have rust threads work with DPDK,
> > > i.e. with an lcore ID. Many years ago when DPDK started, and in the C
> > > world, having DPDK as a runtime environment made sense, but times have
> > > changed and for Rust, there is a whole ecosystem out there already that we
> > > need to "play nice with", so having Rust (not DPDK) do all thread
> > > management is the way to go (again IMHO).
> > >
> >
> > I'm not sure what exposed DPDK API you refer to.
>
> I think that's the point :) Perhaps the Rust application should decide how/when to
> create threads, and how to schedule & pin them. Not the "DPDK crate for Rust".
> To give a more concrete examples, lets look at Tokio (or Monoio, or Glommio, or .. )
> which are prominent players in the Rust ecosystem, particularly for networking workloads
> where request/response patterns are well served by the "async" programming model (e.g HTTP server).

Owen wrote:
> Rust doesn't really care about threads that much. Yes, it has std::thread as a pthread equivalent, but on Linux those literally call pthread. Enforcing the correctness of the Send and Sync traits (responsible for helping enforce thread safety) in APIs is left to library authors. I've used Rust with EAL threads and it's fine, although a slightly nicer API for launching based on a closure (which is a function pointer and a struct with the captured inputs) would be nice. In Rust, I'd say that async and threads are orthogonal concepts, except where runtimes force them to mix. Async is a way to write a state machine or (with some more abstraction) an execution graph, and Rust the language doesn't care whether a library decides to run some dependencies in parallel. What I think Rust is more likely to want is thread per core and then running either a single async runtime over all of them or an async runtime per core.

The key point above is "except where runtimes force them to mix". The DPDK rxq concept (struct Rxq in the code linked above) is !Send.
As a result, it cannot be moved between threads. That allows per-lcore concepts to be used for performance.

The point I was trying to make is that we (the DPDK safe rust wrapper API) should not be prescriptive in how it is used.
In other words: we should allow the user to decide how to spawn/manage/run threads.

We must encode the DPDK requirements of e.g. "Rxq concept" with !Send, !Sync marker traits.
Then the Rust compiler will at compile-time ensure the user's code is correct.
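(A minimal sketch of encoding that requirement, assuming the RxqHandle -> Rxq split described in the patch. The method names enable_polling(), rx_burst(), id() and the helper poll_on_two_threads() are illustrative assumptions, and rx_burst() is a placeholder that receives nothing. PhantomData<*const ()> is used because raw pointers are neither Send nor Sync, so the struct inherits that on stable Rust.)

```rust
use std::marker::PhantomData;
use std::thread;

/// Not yet bound to a thread: plain data, so it is Send and can be moved.
pub struct RxqHandle {
    port: u16,
    queue: u16,
}

/// Bound to the thread that created it: the raw-pointer marker makes the
/// type !Send and !Sync without storing any actual pointer.
pub struct Rxq {
    port: u16,
    queue: u16,
    _not_send_sync: PhantomData<*const ()>,
}

impl RxqHandle {
    pub fn new(port: u16, queue: u16) -> Self {
        RxqHandle { port, queue }
    }

    /// Consumes the handle on the thread that will poll the queue.
    pub fn enable_polling(self) -> Rxq {
        Rxq { port: self.port, queue: self.queue, _not_send_sync: PhantomData }
    }
}

impl Rxq {
    /// Placeholder for the real receive call; always reports zero packets.
    pub fn rx_burst(&mut self, _max_packets: usize) -> usize {
        0
    }

    pub fn id(&self) -> (u16, u16) {
        (self.port, self.queue)
    }
}

/// Moves one handle into each thread, then creates the Rxq *inside* the thread.
pub fn poll_on_two_threads() -> Vec<(u16, u16)> {
    let handles = vec![RxqHandle::new(0, 0), RxqHandle::new(0, 1)];
    let workers: Vec<_> = handles
        .into_iter()
        .map(|handle| {
            thread::spawn(move || {
                let mut rxq = handle.enable_polling();
                let _received = rxq.rx_burst(32);
                rxq.id()
            })
        })
        .collect();
    workers.into_iter().map(|w| w.join().unwrap()).collect()
}
```

Calling enable_polling() before thread::spawn and moving the resulting Rxq into the closure is rejected by the compiler, because the closure would no longer be Send. The user still chooses how threads are created; std::thread is used here only to keep the sketch dependency-free.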

I don't believe that I can identify all use-cases, so we cannot design requirements around statements like "I think X is more likely than Y".


Harry wrote:
> Lets focus on Tokio first: it is an "async runtime" (two links for future readers)
>     <snip>
> So an async runtime can run "async" Rust functions (called Futures, or Tasks when run independently..)
> There are lots of words/concepts, but I'll focus only on the thread creation/control aspect, given the DPDK EAL lcore context.
>
> Tokio is a work-stealing scheduler. It spawns "worker" threads, and then gives these "tasks"
> to various worker cores (similar to how Golang does its work-stealing scheduling). Some
> DPDK crate users might like this type of workflow, where e.g. RXQ polling is a task, and the
> "tokio runtime" figures out which worker to run it on. "Spawning" a task causes the "Future"
> to start executing. (technical Rust note: notice the "Send" bound on Future: https://docs.rs/tokio/latest/tokio/task/fn.spawn.html )
> The work stealing aspect of Tokio has also led to some issues in the Rust ecosystem. What it effectively means is that every "await" is a place where you might get moved to another thread. This means that it would be unsound to, for example, have a queue handle on devices without MT-safe queues unless we want to put a mutex on top of all of the device queues. I personally think this is a lot of the source of people thinking that Rust async is hard, because Tokio forces you to be thread safe at really weird places in your code and has issues like not being able to hold a mutex over an await point.
>
> Other users might prefer the "thread-per-core" and CPU pinning approach (like DPDK itself would do).
> nit: Tokio also spawns a thread per core, it just freely moves tasks between cores. It doesn't pin because it's designed to interoperate with the normal kernel scheduler more nicely. I think that not needing pinned cores is nice, but we want the ability to pin for performance reasons, especially on NUMA/NUCA systems (NUCA = Non-Uniform Cache Architecture, almost every AMD EPYC above 8 cores, higher core count Intel Xeons for 3 generations, etc).
> Monoio and Glommio both serve these use cases (but in slightly different ways!). They both spawn threads and do CPU pinning.
> Monoio and Glommio say "tasks will always remain on the local thread". In Rust techie terms: "Futures are !Send and !Sync"
>     https://docs.rs/monoio/latest/monoio/fn.spawn.html
>     https://docs.rs/glommio/latest/glommio/fn.spawn_local.html

Owen wrote:
> There is also another option, one which would eliminate "service cores". We provide both a work stealing pool of tasks that have to deal with being yanked between cores/EAL threads at any time, but aren't data plane tasks, and then a different API for spawning tasks onto the local thread/core for data plane tasks (ex: something to manage a particular HTTP connection). This might make writing the runtime harder, but it should provide the best of both worlds provided we can build in a feature (Rust provides a way to "ifdef out" code via features) to disable one or the other if someone doesn't want the overhead.

Hah, yeah.. (as maintainer of service cores!) I'm aware that the "async Rust" cooperative scheduling is very similar.
That said, the problem service-cores set out to solve is a very different one to how "async Rust" came about.
The implementations, ergonomics, and the language it's written in are different too... so they're different beasts!

We don't want to start writing "dpdk-async-runtime". The goal is not to duplicate everything; we must integrate with what already exists.
I will try to provide some examples of integrating DPDK with other Rust networking projects, to prove that it can be done, and is useful.


Harry wrote:
> So there are at least 3 different async runtimes (and I haven't even talked about async-std, smol, embassy, ...) which
> all have different use-cases, and methods of running "tasks" on threads. These runtimes exist, and are widely used,
> and applications make use of their thread-scheduling capabilities.
>
> So "async runtimes" do thread creation (and optionally CPU pinning) for the user.
> Other libraries like "Rayon" are thread-pool managers, those also have various CPU thread-create/pinning capabilities.
> If DPDK *also* wants to do thread creation/management and CPU-thread-to-core pinning for the user, that creates tension.
> The other problem is that most of these async runtimes have IO very tightly integrated into them. A large portion of Tokio had to be forked and rewritten for io_uring support, and DPDK is a rather stark departure from what they were all designed for. I know that both Tokio and Glommio have "start a new async runtime on this thread" functions, and I think that Tokio has an "add this thread to a multithreaded runtime" somewhere.
>
> I think the main thing that DPDK would need to be concerned about is that many of these runtimes use thread locals, and I'm not sure if that would be transparently handled by the EAL thread runtime since I've always used thread per core and then used the Rust runtime to multiplex between tasks instead of spawning more EAL threads.
>
> Rayon should probably be thought of in a similar vein to OpenMP, since it's mainly designed for batch processing. Unless someone is doing some fairly heavy computation (the kind where "do we want a GPU to accelerate this?" becomes a question) inside of their DPDK application, I'm having trouble thinking of a use case that would want both DPDK and Rayon.
>
> > Bruce wrote: "so having Rust (not DPDK) do all thread management is the way to go (again IMHO)."
>
> I think I agree here, in order to make the Rust DPDK crate usable from the Rust ecosystem,
> it must align itself with the existing Rust networking ecosystem.
>
> That means, the DPDK Rust crate should not FORCE the usage of lcore pinnings and mappings.
> Allowing a Rust application to decide how to best handle threading (via Rayon, Tokio, Monoio, etc)
> will allow much more "native" or "ergonomic" integration of DPDK into Rust applications.

Owen wrote:
> I'm not sure that using DPDK from Rust will be possible without either serious performance sacrifices or rewrites of a lot of the networking libraries. Tokio continues to mimic the BSD sockets API for IO, even with the io_uring version, as does glommio. The idea of the "recv" giving you a buffer without you passing one in isn't really used outside of some lower-level io_uring crates. At a bare minimum, even if DPDK managed to offer an API that works exactly the same ways as io_uring or epoll, we would still need to go to all of the async runtimes and get them to plumb DPDK support in or approve someone from the DPDK community maintaining support. If we don't offer that API, then we either need rewrites inside of the async runtimes or for individual libraries to provide DPDK support, which is going to be even more difficult.

Regarding traits used for IO: correct, many are focussed on "recv" giving you a buffer, but not all. Look at Monoio, specifically the *Rent APIs:
https://docs.rs/monoio/latest/monoio/io/index.html#traits


Owen wrote:
> I agree that forcing lcore pinnings and mappings isn't good, but I think that DPDK is well within its rights to build its own async runtime which exposes a standard API. For one thing, the first thing Rust users will ask for is a TCP stack, which the community has been discussing and debating for a long time. I think we should figure out whether the goal is to allow DPDK applications to be written in Rust, or to allow generic Rust applications to use DPDK. The former means that the audience would likely be Rust-fluent people who would have used DPDK regardless, and are fine dealing with mempools, mbufs, the eal, and ethdev configuration. The latter is a much larger audience who is likely going to be less tolerant of dpdk-rs exposing the true complexity of using DPDK. Yes, Rust can help make the abstractions better, but there's an amount of inherent complexity in "Your NIC can handle IPSec for you and can also direct all IPv6 traffic to one core" that I don't think we can remove.

Ok, we're getting very far into future/conceptual design here.
For me, DPDK having its own async runtime and its own DPDK TCP stack is NOT the goal.
We should try to integrate DPDK with existing software environments - not rewrite the world.



Owen wrote:
> I personally think that making an API for DPDK applications to be written in Rust, and then steadily adding abstractions on top of that until we arrive at something that someone who has never looked at a TCP header can use without too much confusion. That was part of the goal of the Iris project I pitched (and then had to go finish another project so the design is still WIP). I think that a move to DPDK is going to be as radical of a change as a move to io_uring, however, DPDK is fast enough that I think it may be possible to convince people to do a rewrite once we arrive at that high level API.

I haven't heard of the Iris project you mentioned, is there something concrete to learn from, or is it too WIP to apply?


Owen wrote:
> "Swap out your sockets and rework the functions that do network IO for a 5x performance increase" is a very, very attractive offer, but for us to get there I think we need to have DPDK's full potential available in Rust, and then build as many zero-overhead (zero cost or you couldn't write it better yourself) abstractions as we can on top. I want to avoid a situation where we build up to the high-level APIs as fast as we can and then end up in a situation where you have "Easy Mode" and then "C DPDK written in Rust" as your two options.

My perspective is that we're carefully designing "Safe Rust" APIs, and will have "DPDK's full potential" as a result.
I'm not sure where the "easy mode" comment applies. But let's focus on code - and making concrete progress - over theoretical discussions.

I'll keep my input more concise in future, and try to get more patches on list for review.


> > Regards,
> > Gregory
>
> Apologies for the long-form, "wall of text" email, but I hope it captures the nuance of threading and
> async runtimes, which I believe in the long term will be very nice to capture "async offload" use-cases
> for DPDK. To put it another way, lookaside processing can be hidden behind async functions & runtimes,
> if we design the APIs right: and that would be really cool for making async-offload code easy to write correctly!
>
> Regards, -Harry
>
> Sorry for my own walls of text. As a consequence of working on Iris I've spent a lot of time thinking about how to make DPDK easier to use while keeping the performance intact, and I was already thinking in Rust since it provides one of the better options for these kinds of abstractions (the other option I see is Mojo, which isn't ready yet). I want to see DPDK become more accessible, but the performance and access to hardware is one of the main things that make DPDK special, so I don't want to compromise that. I definitely agree that we need to force DPDK's existing APIs to justify themselves in the face of the new capabilities of Rust, but I think that starting from "How are Rust applications written today?" is a mistake.
>
> Regards,
> Owen

Generally agree, but just this line stood out to me:
 > Owen wrote:   I think that starting from "How are Rust applications written today?" is a mistake.

We have to understand how applications are written today, in order to understand what it would take to move them to a DPDK backend.
In C, consuming DPDK is hard, as applications expect TCP via sockets, and DPDK provides mbuf*s: that's a large mismatch. (Yes I'm aware of various DPDK-aware TCP stacks etc.)

In Rust, applications expect a "let tcp_port = TcpListener::bind()", and then to "tcp_port.accept()" incoming requests.
Those requirements can be met by: std::net::TcpListener, tokio::net::TcpListener, and in future, some DPDK (SmolTCP?) based TcpListener.
- https://doc.rust-lang.org/std/net/struct.TcpListener.html
- https://docs.rs/tokio/latest/tokio/net/struct.TcpListener.html

The ability to move between abstractions is much easier in Rust. As a result, providing "normal looking APIs" is IMO the best way forward.
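(A rough sketch of what "moving between abstractions" can look like. The Listener trait, InMemoryListener and read_first_request() are hypothetical names invented for this illustration; the in-memory backend merely stands in for a future DPDK-based listener and involves no DPDK code. Only std::net::TcpListener and its accept() are real APIs here.)

```rust
use std::collections::VecDeque;
use std::io::{self, Read, Write};
use std::net::{TcpListener, TcpStream};

/// Hypothetical trait: anything that can hand out byte-stream connections.
pub trait Listener {
    type Conn: Read + Write;
    fn accept_conn(&mut self) -> io::Result<Self::Conn>;
}

/// Backend 1: the standard library's kernel-socket listener.
impl Listener for TcpListener {
    type Conn = TcpStream;
    fn accept_conn(&mut self) -> io::Result<TcpStream> {
        self.accept().map(|(stream, _peer_addr)| stream)
    }
}

/// Backend 2: an in-memory stand-in for some other (e.g. userspace) stack.
pub struct InMemoryListener {
    pending: VecDeque<Vec<u8>>,
}

impl InMemoryListener {
    /// Each queued byte vector plays the role of one incoming connection.
    pub fn new(requests: Vec<Vec<u8>>) -> Self {
        InMemoryListener { pending: requests.into() }
    }
}

impl Listener for InMemoryListener {
    type Conn = io::Cursor<Vec<u8>>;
    fn accept_conn(&mut self) -> io::Result<Self::Conn> {
        self.pending
            .pop_front()
            .map(io::Cursor::new)
            .ok_or_else(|| io::Error::new(io::ErrorKind::WouldBlock, "no pending connection"))
    }
}

/// Application logic, written once against the trait, not against a backend.
pub fn read_first_request<L: Listener>(listener: &mut L) -> io::Result<Vec<u8>> {
    let mut conn = listener.accept_conn()?;
    let mut buf = vec![0u8; 1024];
    let n = conn.read(&mut buf)?;
    buf.truncate(n);
    Ok(buf)
}
```

The same read_first_request() accepts a std::net::TcpListener obtained from TcpListener::bind(), so swapping the backend does not touch the application logic.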

Regards, and thanks for the input & discussion. -Harry


* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-05-06 16:39                     ` Van Haaren, Harry
@ 2025-05-08 23:53                       ` Owen Hilyard
  2025-05-09 16:24                         ` Van Haaren, Harry
  0 siblings, 1 reply; 20+ messages in thread
From: Owen Hilyard @ 2025-05-08 23:53 UTC (permalink / raw)
  To: Van Haaren, Harry, Etelson, Gregory, Richardson, Bruce; +Cc: dev

> From: Van Haaren, Harry <harry.van.haaren@intel.com>
> Sent: Tuesday, May 6, 2025 12:39 PM
> To: Owen Hilyard <Owen.Hilyard@unh.edu>; Etelson, Gregory <getelson@nvidia.com>; Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org <dev@dpdk.org>
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq

> > From: Owen Hilyard
> > Sent: Saturday, May 03, 2025 6:13 PM
> > To: Van Haaren, Harry; Etelson, Gregory; Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
> >
> > From: Van Haaren, Harry <harry.van.haaren@intel.com>
> > Sent: Friday, May 2, 2025 9:58 AM
> > To: Etelson, Gregory <getelson@nvidia.com>; Richardson, Bruce <bruce.richardson@intel.com>
> > Cc: dev@dpdk.org <dev@dpdk.org>; Owen Hilyard <owen.hilyard@unh.edu>
> > Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
> >
> > > From: Etelson, Gregory
> > > Sent: Friday, May 02, 2025 1:46 PM
> > > To: Richardson, Bruce
> > > Cc: Gregory Etelson; Van Haaren, Harry; dev@dpdk.org; owen.hilyard@unh.edu
> > > Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
> > >
> > > Hello Bruce,
> >
> > Hi All,
> > Hi All,
>
> Hi All!
>
> Great to see passionate & detailed replies & input!
>
> Please folks - lets try remember to send plain-text emails, and use  >  to indent each reply.
> Its hard to identify what I wrote (1) compared to Owen's replies (2) in the archives otherwise.
> (Adding some "Harry wrote" and "Owen wrote" annotations to try help future readability.)

My apologies, I'll be more careful with that.

> Maybe it will help to split the conversation into two threads, with one focussing on
> "DPDK used through Safe Rust abstractions", and the other on "future cool use-cases".

Agree. 

> Perhaps I jumped a bit too far ahead mentioning async runtimes, and while I like the enthusiasm for designing "cool new stuff", it is probably better to be realistic around what will get "done": my bad.
> 
> I'll reply to the "DPDK via Safe Rust" topics below, and start a new thread (with same folks on CC) for "future cool use-cases" when I've had a chance to clean up a little demo to showcase them.
>
>
> > > > Thanks for sharing. However, IMHO using EAL for thread management in rust
> > > > is the wrong interface to expose.
> > >
> > > EAL is a singleton object in DPDK architecture.
> > > I see it as a hub for other resources.
>
> Harry Wrote:
> > Yep, i tend to agree here; EAL is central to the rest of DPDK working correctly.
> > And given EALs implementation is heavily relying on global static variables, it is
> > certainly a "singleton" instance, yes.
> 
> Owen wrote:
> > I think a singleton one way to implement this, but then you lose some of the RAII/automatic resource management behavior. It would, however, make some APIs inherently unsafe or very unergonomic unless we were to force rte_eal_cleanup to be run via atexit(3) or the platform equivalent and forbid the user from running it themselves. For a lot of Rust runtimes similar to the EAL (tokio, glommio, etc), once you spawn a runtime it's around until process exit. The other option is to have a handle which represents the state of the EAL on the Rust side and runs rte_eal_init on creation and rte_eal_cleanup on destruction. There are two ways we can make that safe. First, reference counting, once the handles are created, they can be passed around easily, and the last one runs rte_eal_cleanup when it gets dropped.  This avoids having tons of complicated lifetimes and I think that, everywhere that it shouldn't affect fast path performance, we should use refcounting.
>
> Agreed, refcounts for EAL "singleton" concept yes. For the record, the initial patch actually returns a
> "dpdk" object from dpdk::Eal::init(), and Drop impl has a // TODO rte_eal_cleanup(), so well aligned on approach here.
> https://patches.dpdk.org/project/dpdk/patch/20250418132324.4085336-1-harry.van.haaren@intel.com/

One thing I think I'd like to see is using a "newtype" for important numbers (ex: "struct EthDevQueueId(pub u16)"). This prevents some classes of error but if we make the constructor public it's at most a minor inconvenience to anyone who has to do something a bit odd. 
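A minimal sketch of that newtype idea, for illustration. EthDevQueueId is the name suggested above; EthDevPortId and describe_queue() are hypothetical additions used only to show the effect, and describe_queue() is a placeholder rather than a real ethdev call.

```rust
/// Port identifier: a distinct type even though it wraps a plain u16.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EthDevPortId(pub u16);

/// Queue identifier: same representation, different type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EthDevQueueId(pub u16);

/// Placeholder for any API that takes both a port and a queue.
pub fn describe_queue(port: EthDevPortId, queue: EthDevQueueId) -> String {
    format!("port {} queue {}", port.0, queue.0)
}

// describe_queue(EthDevQueueId(3), EthDevPortId(1)) fails to compile with
// "mismatched types", whereas two bare u16 arguments could be swapped silently.
```

Because the field is public, anyone who genuinely needs to build an id from a raw number can still write EthDevQueueId(n) directly.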

> > Owen wrote:
> > The other option is to use lifetimes. This is doable, but is going to force people who are more likely to primarily be C or C++ developers to dive deep into Rust's type system if they want to build abstractions over it. If we add async into the mix, as many people are going to want to do, it's going to become much, much harder. As a result, I'd advocate for only using it for data path components where refcounting isn't an option.
>
> +1 to not using lifetimes here, it is not the right solution for this EAL / singleton type problem.

Having now looked over the initial patchset in more detail, I think we do have a question of how far down "it compiles it works" we want to go. For example, we could use typestates to make Eal::take_eth_ports impossible to call more than once, with something like this:

#[derive(Debug, Default)]
pub struct Eal<const HAS_ETHDEV_PORTS: bool> {
    eth_ports: Vec<eth::Port>,
}

impl Eal<true> {
    pub fn init() -> Result<Self, String> {
        // EAL init() will do PCI probe and VDev enumeration will find/create eth ports.
        // This code should loop over the ports, and build up Rust structs representing them
        let eth_ports = vec![eth::Port::from_u16(0)];
        Ok(Eal { eth_ports })
    }

    pub fn take_eth_ports(mut self) -> (Eal<false>, Vec<eth::Port>) {
        // Eal implements Drop, so the field cannot be moved out of self directly;
        // std::mem::take leaves an empty Vec behind instead.
        let eth_ports = std::mem::take(&mut self.eth_ports);
        // Skip Drop for the consumed Eal<true>, so cleanup only runs for the returned Eal<false>.
        std::mem::forget(self);
        (Eal::<false>::default(), eth_ports)
    }
}

impl<const HAS_ETHDEV_PORTS: bool> Drop for Eal<HAS_ETHDEV_PORTS> {
    fn drop(&mut self) {
        if HAS_ETHDEV_PORTS {
            // extra desired port cleanup
        }
        // todo: rte_eal_cleanup()
    }
}

This does add some noise to looking at the struct, but also lets the compiler enforce what state a struct should be in to call a given function. Taken to its logical extreme, we could create an API where many of the "resource in wrong state" errors should be impossible. However, it also requires more knowledge of Rust's type system on the part of the people making the API and can be a bit harder to understand without an LSP helping you along.

> Gregory wrote:
> > > Following that idea, the EAL structure can be divided to hold the
> > > "original" resources inherited from librte_eal and new resources
> > > introduced in Rust EAL.
> 
> Harry wrote:
> > Here we can look from different perspectives. Should "Rust EAL" even exist?
> > If so, why? The DPDK C APIs were designed in baremetal/linux days, where
> > certain "best-practices" didn't exist yet, and Rust language was pre 1.0 release.
> >
> > Of course, certain parts of Rust API must depend on EAL being initialized.
> > There is a logical flow to DPDK initialization, these must be kept for correct functionality.
> >
> > I guess I'm saying, perhaps we can do better than mirroring the concept of
> > "DPDK EAL in C" in to "DPDK EAL in Rust".
> 
> Owen wrote:
> > I think that there will need to be some kind of runtime exposed by the library. A lot of the existing EAL abstractions may need to be reworked, especially those dealing with memory, but I think a lot of things can be layered on top of the C API. However, I think many of the invariants in the EAL could be enforced at compile time for free, which may mean the creation of a lot of "unchecked" function variants which skip over null checks and other validation.
> 
> Agree that most (if not all?) things can be layered on top of the C API. Lets leave the "unchecked" function variants discussion until we have code to discuss, its hard to know right now because we don't have an implementation to talk about.

ack. I was mostly referring to eliminating null checks or "does this queue/port exist?" checks, but that can be a later discussion.

> Owen wrote:
> > As was mentioned before, it may also make sense for some abstractions in the C EAL to be lifted to compile time. I've spent a lot of time thinking about how to use something like Rust's traits for "it just works" capabilities where you can declare what features you want (ex: scatter/gather) and it will either be done in hardware or fall back to software, since you were going to need to do it anyway. This might lead to parameterizing a lot of user code on the devices they expect to interact with and then having some "dyn EthDev" as a fallback, which should be roughly equivalent to what we have now. I can explain that in more detail if there's interest.
> 
> This goes into the "cool new stuff" category in my head: I agree these concepts are possible, 
> but i feel we must prioritize the "DPDK via Safe Rust" and achieve that first. We cannot put the
> cherry on the cake, if the cake is still under construction :)
> 
> (Techie note, the description is for a "polyfill" of specific functionality. This is often done via
> stacking or layering operations that all provide the same trait in Rust. This is very nice, as one
> can provide a specific implementation of a functionality, and compose it with other functionalities.
> For examples look at how the "tower" crate: "a library of modular and reusable components for building robust networking clients and servers"
> 
> To be very clear - cool techie stuff, but we need to get the basics in place first, before looking at dyn Ethdev type concepts.
>
> Harry/Gregory/Bruce wrote (in order of indentation):
> > > > Instead, I believe we should be
> > > > encouraging native rust thread management, and not exposing any DPDK
> > > > threading APIs except those necessary to have rust threads work with DPDK,
> > > > i.e. with an lcore ID. Many years ago when DPDK started, and in the C
> > > > world, having DPDK as a runtime environment made sense, but times have
> > > > changed and for Rust, there is a whole ecosystem out there already that we
> > > > need to "play nice with", so having Rust (not DPDK) do all thread
> > > > management is the way to go (again IMHO).
> > > >
> > >
> > > I'm not sure what exposed DPDK API you refer to.
> >
> > I think that's the point :) Perhaps the Rust application should decide how/when to
> > create threads, and how to schedule & pin them. Not the "DPDK crate for Rust".
> To give a more concrete example, let's look at Tokio (or Monoio, or Glommio, or .. )
> which are prominent players in the Rust ecosystem, particularly for networking workloads
> where request/response patterns are well served by the "async" programming model (e.g HTTP server).
>
> Owen wrote:
> > Rust doesn't really care about threads that much. Yes, it has std::thread as a pthread equivalent, but on Linux those literally call pthread. Enforcing the correctness of the Send and Sync traits (responsible for helping enforce thread safety) in APIs is left to library authors. I've used Rust with EAL threads and it's fine, although a slightly nicer API for launching based on a closure (which is a function pointer and a struct with the captured inputs) would be nice. In Rust, I'd say that async and threads are orthogonal concepts, except where runtimes force them to mix. Async is a way to write a state machine or (with some more abstraction) an execution graph, and Rust the language doesn't care whether a library decides to run some dependencies in parallel. What I think Rust is more likely to want is thread per core and then running either a single async runtime over all of them or an async runtime per core.
> 
> The key point above is "except where runtimes force them to mix". The DPDK rxq concept (struct Rxq in the code linked above) is !Send.
> As a result, it cannot be moved between threads. That allows per-lcore concepts to be used for performance.

The problem is that, with Tokio, it also can't be held across an await point. I agree that !Send is correct, but the existence of !Send resources means that integration with Tokio is much, much harder. For PMDs with RTE_ETH_TX_OFFLOAD_MT_LOCKFREE, TX is fine, but as far as I am aware there is no equivalent for RX. And, to safely take advantage of the TX version, we'd need to know the capabilities of the target PMD at compile time, which is part of why my own bindings "devirtualize" the EAL and require a top-level function which dispatches based on the capabilities provided by the PMDs I make use of. Glommio was easily able to integrate safely (theoretically Monoio would be too, although I haven't used it), but I haven't found a safe way to mix Tokio and queue handles which doesn't make it nearly impossible to use async, even when taking that fairly extreme measure. 

> The point I was trying to make is that we (the DPDK safe rust wrapper API) should not be prescriptive in how it is used.
> In other words: we should allow the user to decide how to spawn/manage/run threads.
>
> We must encode the DPDK requirements of e.g. "Rxq concept" with !Send, !Sync marker traits.
> Then the Rust compiler will at compile-time ensure the users code is correct.

I agree that !Send and !Sync are likely correct for Rxqs, however, we also need to be very careful in documenting the WHY of !Send and !Sync in each context. For instance, how are we going to get the queue handles to the threads which run the data path if we get all of them from an Eal struct in a Vec on the main thread? We may need to have a way to "deactivate" them so the user can't use them for queue operations but they are Send, !Sync, emit a fence, and then when the user "activates" them it performs another fence to force anything the last thread did with the queue to be visible on the new core. I suspect we'll need to apply a similar pattern for other thread unsafe parts of DPDK in order to get them to where they need to be during execution. 
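
A rough sketch of that deactivate/activate idea, using only std. The names RxqHandle and Rxq follow the patch, but the fields, rx_burst() and deactivate() are assumptions made for illustration, and no DPDK call is made. Note that a fence on its own does not synchronize anything: in this demo std::thread::spawn and join already give the ordering, so the fences only mark where a real binding would need to pair with whatever mechanism hands the queue over.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::sync::atomic::{fence, Ordering};
use std::thread;

/// Parked queue: Send but !Sync, and it has no polling methods.
pub struct RxqHandle {
    port: u16,
    queue: u16,
    _not_sync: PhantomData<Cell<()>>,
}

/// Active queue: !Send and !Sync, so it stays on the activating thread.
pub struct Rxq {
    port: u16,
    queue: u16,
    _not_send: PhantomData<*mut ()>,
}

impl RxqHandle {
    pub fn new(port: u16, queue: u16) -> Self {
        RxqHandle { port, queue, _not_sync: PhantomData }
    }

    /// Consume the handle on the thread that will poll the queue.
    pub fn activate(self) -> Rxq {
        // Marks where the new owner must observe the previous owner's writes.
        fence(Ordering::Acquire);
        Rxq { port: self.port, queue: self.queue, _not_send: PhantomData }
    }
}

impl Rxq {
    /// Placeholder for a receive burst; always reports zero packets.
    pub fn rx_burst(&mut self) -> usize {
        0
    }

    pub fn id(&self) -> (u16, u16) {
        (self.port, self.queue)
    }

    /// Give up polling rights and get back a handle that may change threads.
    pub fn deactivate(self) -> RxqHandle {
        // Marks where this thread's queue writes must be published.
        fence(Ordering::Release);
        RxqHandle::new(self.port, self.queue)
    }
}

/// Poll on one thread, park the queue, then poll it again on a second thread.
pub fn poll_on_two_threads() -> (u16, u16) {
    let handle = RxqHandle::new(0, 1);
    let handle = thread::spawn(move || {
        let mut rxq = handle.activate();
        let _ = rxq.rx_burst();
        rxq.deactivate()
    })
    .join()
    .expect("first poller panicked");

    thread::spawn(move || {
        let mut rxq = handle.activate();
        let _ = rxq.rx_burst();
        rxq.id()
    })
    .join()
    .expect("second poller panicked")
}
```

Returning an Rxq from either closure instead of a handle would fail to compile, because thread::spawn requires the closure's result to be Send.
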

> I don't believe that I can identify all use-cases, so we cannot design requirements around statements like "I think X is more likely than Y".

I agree, this is why unsafe escape hatches will be necessary. Someone will have some weird edge-case like a CPU with no cache that makes it fine to move Rxqs around with abandon. 

> Harry wrote:
> > Let's focus on Tokio first: it is an "async runtime" (two links for future readers)
> >     <snip>
> > So an async runtime can run "async" Rust functions (called Futures, or Tasks when run independently..)
> > There are lots of words/concepts, but I'll focus only on the thread creation/control aspect, given the DPDK EAL lcore context.
> > 
> > Tokio is a work-stealing scheduler. It spawns "worker" threads, and then gives these "tasks"
> > to various worker cores (similar to how Golang does its work-stealing scheduling). Some
> > DPDK crate users might like this type of workflow, where e.g. RXQ polling is a task, and the
> > "tokio runtime" figures out which worker to run it on. "Spawning" a task causes the "Future"
> > to start executing. (technical Rust note: notice the "Send" bound on Future: https://docs.rs/tokio/latest/tokio/task/fn.spawn.html )
> > The work stealing aspect of Tokio has also led to some issues in the Rust ecosystem. What it effectively means is that every "await" is a place where you might get moved to another thread. This means that it would be unsound to, for example, have a queue handle on devices without MT-safe queues unless we want to put a mutex on top of all of the device queues. I personally think this is a lot of the source of people thinking that Rust async is hard, because Tokio forces you to be thread safe at really weird places in your code and has issues like not being able to hold a mutex over an await point.
> >
> > Other users might prefer the "thread-per-core" and CPU pinning approach (like DPDK itself would do).
> > nit: Tokio also spawns a thread per core, it just freely moves tasks between cores. It doesn't pin because it's designed to interoperate with the normal kernel scheduler more nicely. I think that not needing pinned cores is nice, but we want the ability to pin for performance reasons, especially on NUMA/NUCA systems (NUCA = Non-Uniform Cache Architecture, almost every AMD EPYC above 8 cores, higher core count Intel Xeons for 3 generations, etc).
> > Monoio and Glommio both serve these use cases (but in slightly different ways!). They both spawn threads and do CPU pinning.
> > Monoio and Glommio say "tasks will always remain on the local thread". In Rust techie terms: "Futures are !Send and !Sync"
> > https://docs.rs/monoio/latest/monoio/fn.spawn.html     
> > https://docs.rs/glommio/latest/glommio/fn.spawn_local.html
> 
> Owen wrote:
> > There is also another option, one which would eliminate "service cores". We provide both a work stealing pool of tasks that have to deal with being yanked between cores/EAL threads at any time, but aren't data plane tasks, and then a different API for spawning tasks onto the local thread/core for data plane tasks (ex: something to manage a particular HTTP connection). This might make writing the runtime harder, but it should provide the best of both worlds provided we can build in a feature (Rust provides a way to "ifdef out" code via features) to disable one or the other if someone doesn't want the overhead.
> 
> Hah, yeah.. (as maintainer of service cores!) I'm aware that the "async Rust" cooperative scheduling is very similar.
> That said, the problem service-cores set out to solve is a very different one to how "async Rust" came about.
> The implementations, ergonomics, and the language it's written in are different too... so they're different beasts!

I think we could still make use of the idea of separate pools of thread local and global tasks. 

> We don't want to start writing "dpdk-async-runtime". The goal is not to duplicate everything, we must integrate with existing.

What do you picture someone who picks up "dpdk-rs" seeing as the interface to DPDK when it's fully integrated? Do they enable a feature flag in their async runtime and the runtime handles it for them, do they set up DPDK and start the runtime? Most of the libraries I'm aware of assume the presence of an OS network stack. Yes, there are some like smoltcp which are capable of operating on top of the l2 interface provided by DPDK, but most are going to want a network stack to exist on top of. 

> I will try to provide some examples of integrating DPDK with other Rust networking projects, to prove that it can be done, and is useful.
>
> Harry wrote:
> > So there are at least 3 different async runtimes (and I haven't even talked about async-std, smol, embassy, ...) which
> > all have different use-cases, and methods of running "tasks" on threads. These runtimes exist, and are widely used,
> > and applications make use of their thread-scheduling capabilities.
> >
> > So "async runtimes" do thread creation (and optionally CPU pinning) for the user.
> > Other libraries like "Rayon" are thread-pool managers, those also have various CPU thread-create/pinning capabilities.
> > If DPDK *also* wants to do thread creation/management and CPU-thread-to-core pinning for the user, that creates tension.
> > The other problem is that most of these async runtimes have IO very tightly integrated into them. A large portion of Tokio had to be forked and rewritten for io_uring support, and DPDK is a rather stark departure from what they were all designed for. I know that both Tokio and Glommio have "start a new async runtime on this thread" functions, and I think that Tokio has an "add this thread to a multithreaded runtime" somewhere.
> >
> > I think the main thing that DPDK would need to be concerned about is that many of these runtimes use thread locals, and I'm not sure if that would be transparently handled by the EAL thread runtime since I've always used thread per core and then used the Rust runtime to multiplex between tasks instead of spawning more EAL threads.
> >
> > Rayon should probably be thought of in a similar vein to OpenMP, since it's mainly designed for batch processing. Unless someone is doing some fairly heavy computation (the kind where "do we want a GPU to accelerate this?" becomes a question) inside of their DPDK application, I'm having trouble thinking of a use case that would want both DPDK and Rayon.
> >
> > > Bruce wrote: "so having Rust (not DPDK) do all thread management is the way to go (again IMHO)."
> >
> > I think I agree here, in order to make the Rust DPDK crate usable from the Rust ecosystem,
> > it must align itself with the existing Rust networking ecosystem.
> > 
> > That means, the DPDK Rust crate should not FORCE the usage of lcore pinnings and mappings.
> > Allowing a Rust application to decide how to best handle threading (via Rayon, Tokio, Monoio, etc)
> > will allow much more "native" or "ergonomic" integration of DPDK into Rust applications.
> 
> Owen wrote:
> > I'm not sure that using DPDK from Rust will be possible without either serious performance sacrifices or rewrites of a lot of the networking libraries. Tokio continues to mimic the BSD sockets API for IO, even with the io_uring version, as does glommio. The idea of the "recv" giving you a buffer without you passing one in isn't really used outside of some lower-level io_uring crates. At a bare minimum, even if DPDK managed to offer an API that works exactly the same ways as io_uring or epoll, we would still need to go to all of the async runtimes and get them to plumb DPDK support in or approve someone from the DPDK community maintaining support. If we don't offer that API, then we either need rewrites inside of the async runtimes or for individual libraries to provide DPDK support, which is going to be even more difficult.
> 
> Regarding traits used for IO, correct, many are focussed on "recv" giving you a buffer, but not all. Look at Monoio, specifically the *Rent APIs:
> https://docs.rs/monoio/latest/monoio/io/index.html#traits

As far as I can tell, the *Rent APIs for Monoio have the same problem: they require you to pass in a buffer, and to satisfy that API we'd need to throw out zero copy, pass that buffer directly to the PMD, or do some weird thing where we use that API to recycle buffers back into the mempool. I see, in Monoio terms, a DPDK API looking more like TcpStream::read(&mut self) -> impl Future<Output = BufResult<usize, dpdk::PktMbuf>> or some equivalent abstraction on top. 
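
To make the difference between the two API shapes concrete, here is a small synchronous sketch (async is left out to keep it std-only). Mempool, PktMbuf, MockRxq and both traits are hypothetical stand-ins, not real DPDK or Monoio types; the "NIC" is a queue of byte vectors.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Stand-in for a mempool: a free list of fixed-size buffers.
pub struct Mempool {
    free: RefCell<Vec<Vec<u8>>>,
}

impl Mempool {
    pub fn new(count: usize, size: usize) -> Rc<Self> {
        Rc::new(Mempool { free: RefCell::new(vec![vec![0u8; size]; count]) })
    }

    pub fn available(&self) -> usize {
        self.free.borrow().len()
    }
}

/// Stand-in for an mbuf: owns its buffer and recycles it to the pool on drop.
pub struct PktMbuf {
    buf: Vec<u8>,
    len: usize,
    pool: Rc<Mempool>,
}

impl PktMbuf {
    pub fn data(&self) -> &[u8] {
        &self.buf[..self.len]
    }
}

impl Drop for PktMbuf {
    fn drop(&mut self) {
        let buf = std::mem::take(&mut self.buf);
        self.pool.free.borrow_mut().push(buf);
    }
}

/// Caller-provided buffer shape (BSD sockets style): data is copied out.
pub trait ReadInto {
    fn read_into(&mut self, buf: &mut [u8]) -> usize;
}

/// Owned-buffer shape: the queue hands out the buffer that was filled.
pub trait RecvOwned {
    fn recv(&mut self) -> Option<PktMbuf>;
}

pub struct MockRxq {
    pool: Rc<Mempool>,
    wire: VecDeque<Vec<u8>>,
}

impl MockRxq {
    pub fn new(pool: Rc<Mempool>, frames: Vec<Vec<u8>>) -> Self {
        MockRxq { pool, wire: frames.into() }
    }
}

impl RecvOwned for MockRxq {
    fn recv(&mut self) -> Option<PktMbuf> {
        // No free buffer: leave the frame queued and report nothing.
        let mut buf = self.pool.free.borrow_mut().pop()?;
        let Some(frame) = self.wire.pop_front() else {
            self.pool.free.borrow_mut().push(buf);
            return None;
        };
        // This copy models the device writing into the buffer.
        let len = frame.len().min(buf.len());
        buf[..len].copy_from_slice(&frame[..len]);
        Some(PktMbuf { buf, len, pool: Rc::clone(&self.pool) })
    }
}

impl ReadInto for MockRxq {
    fn read_into(&mut self, out: &mut [u8]) -> usize {
        match self.recv() {
            Some(pkt) => {
                // The extra copy that the caller-provided shape forces.
                let n = pkt.data().len().min(out.len());
                out[..n].copy_from_slice(&pkt.data()[..n]);
                n
            }
            None => 0,
        }
    }
}
```

With RecvOwned the caller works on the packet in place and the buffer goes back to the pool when the PktMbuf is dropped; with ReadInto the same data is copied a second time before the buffer is recycled.
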

> Owen wrote:
> > I agree that forcing lcore pinnings and mappings isn't good, but I think that DPDK is well within its rights to build its own async runtime which exposes a standard API. For one thing, the first thing Rust users will ask for is a TCP stack, which the community has been discussing and debating for a long time. I think we should figure out whether the goal is to allow DPDK applications to be written in Rust, or to allow generic Rust applications to use DPDK. The former means that the audience would likely be Rust-fluent people who would have used DPDK regardless, and are fine dealing with mempools, mbufs, the eal, and ethdev configuration. The latter is a much larger audience who is likely going to be less tolerant of dpdk-rs exposing the true complexity of using DPDK. Yes, Rust can help make the abstractions better, but there's an amount of inherent complexity in "Your NIC can handle IPSec for you and can also direct all IPv6 traffic to one core" that I don't think we can remove.
> 
> Ok, we're getting very far into future/conceptual design here.
> For me, DPDK having its own async runtime and its own DPDK TCP stack is NOT the goal.
> We should try to integrate DPDK with existing software environments - not rewrite the world.

Which existing software environments are you thinking of exactly? Most Rust applications that use networking are going to be using Axum, Tower, and the other crates that you've mentioned, and all of those rely on having a TCP stack to be useful. I have found vanishingly few Rust crates which handle integration with DPDK without me editing them to some degree. I'd like to know where you're finding existing Rust software environments which don't care about the presence of a network stack but are still networking oriented. If the goal is to take a DPDK application that would have been written in C/C++ and write it in Rust instead, that is very different than taking an application which would have happily used the OS network stack, such as an HTTP server which deals with normal (<1k RPS) amounts of traffic, and moving it onto DPDK, and it seems to me like you are suggesting that we should focus on the latter.

> Owen wrote:
> > I personally think the right approach is to make an API for DPDK applications to be written in Rust, and then steadily add abstractions on top of that until we arrive at something that someone who has never looked at a TCP header can use without too much confusion. That was part of the goal of the Iris project I pitched (and then had to go finish another project so the design is still WIP). I think that a move to DPDK is going to be as radical of a change as a move to io_uring, however, DPDK is fast enough that I think it may be possible to convince people to do a rewrite once we arrive at that high level API.
> 
> I haven't heard of the Iris project you mentioned, is there something concrete to learn from, or is it too WIP to apply?

I have some design docs, but nothing concrete. I got pulled back to another project which is still ongoing shortly after I gave the talk at the last DPDK summit. The main goal of Iris is to provide a DPDK-based alternative to something like a gRPC with a message-based API instead of a byte-based one, and to take advantage of the massive amount of extra breathing room under that new API (as compared to TCP) to plumb in the various accelerators integrated into DPDK alongside a network stack. It's based on observations that many developers aren't even working at a TCP or HTTP level any more, but are instead using "JSON RPC over HTTPS which is automatically converted into objects by their HTTP server framework" or something like gRPC to have a "send message to server" and "get message to server" API. Most of what I have for that is a lot of time spent thinking about a Rust-based API on top of DPDK as a foundation for building the rest of the network stack on top. 

> Owen wrote:
> > "Swap out your sockets and rework the functions that do network IO for a 5x performance increase" is a very, very attractive offer, but for us to get there I think we need to have DPDK's full potential available in Rust, and then build as many zero-overhead (zero cost or you couldn't write it better yourself) abstractions as we can on top. I want to avoid a situation where we build up to the high-level APIs as fast as we can and then end up in a situation where you have "Easy Mode" and then "C DPDK written in Rust" as your two options.
> 
> My perspective is that we're carefully designing "Safe Rust" APIs, and will have "DPDKs full potential" as a result.
> I'm not sure where the "easy mode" comment applies. But let's focus on code - and making concrete progress - over theoretical discussions.
> 
> I'll keep my input more concise in future, and try to get more patches on list for review.
> > > Regards,
> > > Gregory
> >
> > Apologies for the long-form, "wall of text" email, but I hope it captures the nuance of threading and
> > async runtimes, which I believe in the long term will be very nice to capture "async offload" use-cases
> > for DPDK. To put it another way, lookaside processing can be hidden behind async functions & runtimes,
> > if we design the APIs right: and that would be really cool for making async-offload code easy to write correctly!
> >
> > Regards, -Harry
> >
> > Sorry for my own walls of text. As a consequence of working on Iris I've spent a lot of time thinking about how to make DPDK easier to use while keeping the performance intact, and I was already thinking in Rust since it provides one of the better options for these kinds of abstractions (the other option I see is Mojo, which isn't ready yet). I want to see DPDK become more accessible, but the performance and access to hardware is one of the main things that make DPDK special, so I don't want to compromise that. I definitely agree that we need to force DPDK's existing APIs to justify themselves in the face of the new capabilities of Rust, but I think that starting from "How are Rust applications written today?" is a mistake.
> >
> > Regards,
> > Owen
> 
> Generally agree, but just this line stood out to me:
> > Owen wrote:   I think that starting from "How are Rust applications written today?" is a mistake.
> 
> We have to understand how applications are written today, in order to understand what it would take to move them to a DPDK backend.
> In C, consuming DPDK is hard, as applications expect TCP via sockets, and DPDK provides mbuf*s: that's a large mismatch. (Yes I'm aware of various DPDK-aware TCP stacks etc.)
>
> In Rust, applications expect a "let tcp_port = TcpListener::bind()", and then to "tcp_port.accept()" incoming requests.
> Those requirements can be met by: std::net::TcpListener, tokio::net::TcpListener, and in future, some DPDK (SmolTCP?) based TcpListener.
> - https://doc.rust-lang.org/std/net/struct.TcpListener.html
> - https://docs.rs/tokio/latest/tokio/net/struct.TcpListener.html
> 
> The ability to move between abstractions is much easier in Rust. As a result, providing "normal looking APIs" is IMO the best way forward.
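
For reference, the "bind, then accept" shape quoted above can be sketched as a trait that application code is generic over. The Listener trait, MockDpdkListener and accept_n are hypothetical names for illustration; only the std::net::TcpListener calls are real APIs, and the mock hands out connection ids instead of talking to any network stack.

```rust
use std::cell::Cell;
use std::io;
use std::net::{TcpListener, TcpStream};

/// Hypothetical trait capturing the "bind, then accept" shape.
pub trait Listener: Sized {
    type Conn;
    fn bind(addr: &str) -> io::Result<Self>;
    fn accept(&self) -> io::Result<Self::Conn>;
}

impl Listener for TcpListener {
    type Conn = TcpStream;

    fn bind(addr: &str) -> io::Result<Self> {
        // Resolves to the inherent std function, not this trait method.
        TcpListener::bind(addr)
    }

    fn accept(&self) -> io::Result<TcpStream> {
        TcpListener::accept(self).map(|(stream, _peer)| stream)
    }
}

/// Stand-in for a future DPDK-backed listener: two pending "connections".
pub struct MockDpdkListener {
    pending: Cell<u32>,
    next_id: Cell<u32>,
}

impl Listener for MockDpdkListener {
    type Conn = u32;

    fn bind(_addr: &str) -> io::Result<Self> {
        Ok(MockDpdkListener { pending: Cell::new(2), next_id: Cell::new(0) })
    }

    fn accept(&self) -> io::Result<u32> {
        if self.pending.get() == 0 {
            return Err(io::Error::new(io::ErrorKind::WouldBlock, "no pending connection"));
        }
        self.pending.set(self.pending.get() - 1);
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        Ok(id)
    }
}

/// Application code written once against the trait, not a concrete backend.
pub fn accept_n<L: Listener>(addr: &str, n: usize) -> io::Result<Vec<L::Conn>> {
    let listener = L::bind(addr)?;
    (0..n).map(|_| listener.accept()).collect()
}
```

Switching backends is then a one-type change at the call site, e.g. accept_n::<TcpListener>(..) versus accept_n::<MockDpdkListener>(..). The sketch says nothing about whether this shape costs performance.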

Yes, moving between abstractions is easier in Rust, but I think that the abstraction provided by std::net::TcpListener and tokio::net::TcpListener is flawed. I'm not sure there is a good way to provide a "normal" API without fairly serious performance compromises. For example, as I'm sure everyone here is aware, the traditional BSD sockets API requires double the memory bandwidth that a zero-copy one does on the rx path. Those APIs also ignore TLS, meaning that we would actually need to go look at a wrapper over rustls or some other TLS implementation as what users interact with. I can keep going up levels, but this is why I decided to put the highest level of abstraction in Iris, the one I intend most people to interact with at "get this blob of bytes over to that other server as a message, possibly encrypting it, compressing it, doing zero trust checks, etc". I'm not sure if applications expect a TcpListener, so much as an HttpListener, or a JsonRPCListener. I think it would be wise to determine what type of API people would want for a dpdk-rs, rather than making an assumption that they want something like BSD sockets. Even inside of the kernel io_uring has been breaking away from that API with an API that looks a lot more like what I would expect from DPDK, and providing ergonomics benefits to users while doing it. 

> Regards, and thanks for the input & discussion. -Harry

Thanks for the discussion, and I hope to continue to work with all of you on this,
Owen


* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-05-08 23:53                       ` Owen Hilyard
@ 2025-05-09 16:24                         ` Van Haaren, Harry
  2025-05-10 16:05                           ` Owen Hilyard
  0 siblings, 1 reply; 20+ messages in thread
From: Van Haaren, Harry @ 2025-05-09 16:24 UTC (permalink / raw)
  To: Owen Hilyard, Etelson, Gregory, Richardson, Bruce; +Cc: dev

> From: Owen Hilyard
> Sent: Friday, May 09, 2025 12:53 AM
> To: Van Haaren, Harry; Etelson, Gregory; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
> 
> > From: Van Haaren, Harry <harry.van.haaren@intel.com>
> > Sent: Tuesday, May 6, 2025 12:39 PM
> > To: Owen Hilyard <Owen.Hilyard@unh.edu>; Etelson, Gregory <getelson@nvidia.com>; Richardson, Bruce <bruce.richardson@intel.com>
> > Cc: dev@dpdk.org <dev@dpdk.org>
> > Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
<snip>
> > Hi All!
> >
> > Great to see passionate & detailed replies & input!
> >
> > Please folks - let's try to remember to send plain-text emails, and use  >  to indent each reply.
> > It's hard to identify what I wrote (1) compared to Owen's replies (2) in the archives otherwise.
> > (Adding some "Harry wrote" and "Owen wrote" annotations to try help future readability.)
> 
> My apologies, I'll be more careful with that.

Thanks! The reply here is perfect.


> > Maybe it will help to split the conversation into two threads, with one focussing on
> > "DPDK used through Safe Rust abstractions", and the other on "future cool use-cases".
> 
> Agree.
> 
> > Perhaps I jumped a bit too far ahead mentioning async runtimes, and while I like the enthusiasm for designing "cool new stuff", it is probably better to be realistic around what will get "done": my bad.
> >
> > I'll reply to the "DPDK via Safe Rust" topics below, and start a new thread (with same folks on CC) for "future cool use-cases" when I've had a chance to clean up a little demo to showcase them.
> >
> >
> > > > > Thanks for sharing. However, IMHO using EAL for thread management in rust
> > > > > is the wrong interface to expose.
> > > >
> > > > EAL is a singleton object in DPDK architecture.
> > > > I see it as a hub for other resources.
> >
> > Harry Wrote:
> > > Yep, I tend to agree here; EAL is central to the rest of DPDK working correctly.
> > > And given EAL's implementation relies heavily on global static variables, it is
> > > certainly a "singleton" instance, yes.
> >
> > Owen wrote:
> > > I think a singleton is one way to implement this, but then you lose some of the RAII/automatic resource management behavior. It would, however, make some APIs inherently unsafe or very unergonomic unless we were to force rte_eal_cleanup to be run via atexit(3) or the platform equivalent and forbid the user from running it themselves. For a lot of Rust runtimes similar to the EAL (tokio, glommio, etc), once you spawn a runtime it's around until process exit. The other option is to have a handle which represents the state of the EAL on the Rust side and runs rte_eal_init on creation and rte_eal_cleanup on destruction. There are two ways we can make that safe. First, reference counting: once the handles are created, they can be passed around easily, and the last one runs rte_eal_cleanup when it gets dropped.  This avoids having tons of complicated lifetimes and I think that, everywhere that it shouldn't affect fast path performance, we should use refcounting.
> >
> > Agreed, refcounts for EAL "singleton" concept yes. For the record, the initial patch actually returns a
> > "dpdk" object from dpdk::Eal::init(), and Drop impl has a // TODO rte_eal_cleanup(), so well aligned on approach here.
> > https://patches.dpdk.org/project/dpdk/patch/20250418132324.4085336-1-harry.van.haaren@intel.com/
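
A minimal sketch of the refcounted handle idea, assuming an Arc around a private inner struct whose Drop stands in for rte_eal_cleanup(). EalInner, handles(), CLEANUPS and cleanup_count() are illustrative names, not the patch's code, and the counter exists only so the demo can be checked. Preventing a second init() is not handled here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Demo-only instrumentation: counts simulated cleanup calls.
static CLEANUPS: AtomicUsize = AtomicUsize::new(0);

pub fn cleanup_count() -> usize {
    CLEANUPS.load(Ordering::SeqCst)
}

/// Private state shared by all handles; dropped exactly once.
struct EalInner;

impl Drop for EalInner {
    fn drop(&mut self) {
        // A real binding would call rte_eal_cleanup() here.
        CLEANUPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Cheap-to-clone handle; the last clone to drop triggers the cleanup.
#[derive(Clone)]
pub struct Eal {
    inner: Arc<EalInner>,
}

impl Eal {
    pub fn init() -> Result<Eal, String> {
        // A real binding would call rte_eal_init() here and map its errors.
        Ok(Eal { inner: Arc::new(EalInner) })
    }

    /// Number of live handles, including this one.
    pub fn handles(&self) -> usize {
        Arc::strong_count(&self.inner)
    }
}
```

Clones can be handed to worker threads freely; no lifetimes appear in user code, and cleanup runs only when the final handle goes away.
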
> 
> One thing I think I'd like to see is using a "newtype" for important numbers (ex: "struct EthDevQueueId(pub u16)"). This prevents some classes of error but if we make the constructor public it's at most a minor inconvenience to anyone who has to do something a bit odd.
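
A tiny illustration of the newtype suggestion. EthDevQueueId is the name used above; EthDevPortId and describe_rxq() are hypothetical additions for the example.

```rust
/// Port identifier; the public field keeps the constructor open to users.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EthDevPortId(pub u16);

/// Queue identifier; a distinct type from the port id despite both being u16.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EthDevQueueId(pub u16);

/// With bare u16 arguments, swapping port and queue would compile silently.
pub fn describe_rxq(port: EthDevPortId, queue: EthDevQueueId) -> String {
    format!("port {} rxq {}", port.0, queue.0)
}
```

Calling describe_rxq(EthDevQueueId(3), EthDevPortId(0)) is a type error, while anyone who really needs the raw number can still reach it through the .0 field.
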
> 
> > > Owen wrote:
> > > The other option is to use lifetimes. This is doable, but is going to force people who are more likely to primarily be C or C++ developers to dive deep into Rust's type system if they want to build abstractions over it. If we add async into the mix, as many people are going to want to do, it's going to become much, much harder. As a result, I'd advocate for only using it for data path components where refcounting isn't an option.
> >
> > +1 to not using lifetimes here, it is not the right solution for this EAL / singleton type problem.
> 
> Having now looked over the initial patchset in more detail, I think we do have a question of how far down "it compiles it works" we want to go. For example, using typestates to make Eal::take_eth_ports impossible to call more than once using something like this:
> 
> #[derive(Debug, Default)]
> pub struct Eal<const HAS_ETHDEV_PORTS: bool> {
>     eth_ports: Vec<eth::Port>,
> }
> 
> // init() lives on Eal<true>: a freshly initialized EAL still owns its ports.
> impl Eal<true> {
>     pub fn init() -> Result<Self, String> {
>         // EAL init() will do PCI probe and VDev enumeration will find/create eth ports.
>         // This code should loop over the ports, and build up Rust structs representing them
>         let eth_ports = vec![eth::Port::from_u16(0)];
>         Ok(Eal { eth_ports })
>     }
> 
>     // Consumes Eal<true>, so a second call does not compile.
>     pub fn take_eth_ports(mut self) -> (Eal<false>, Vec<eth::Port>) {
>         let ports = std::mem::take(&mut self.eth_ports);
>         // Skip Drop for the old handle so EAL cleanup is not run twice.
>         std::mem::forget(self);
>         (Eal::<false>::default(), ports)
>     }
> }
> 
> impl<const HAS_ETHDEV_PORTS: bool> Drop for Eal<HAS_ETHDEV_PORTS> {
>     fn drop(&mut self) {
>         if HAS_ETHDEV_PORTS {
>             // extra desired port cleanup
>         }
>         // todo: rte_eal_cleanup()
>     }
> }
> 
> This does add some noise to looking at the struct, but also lets the compiler enforce what state a struct should be in to call a given function. Taken to its logical extreme, we could create an API where many of the "resource in wrong state" errors should be impossible. However, it also requires more knowledge of Rust's type system on the part of the people making the API and can be a bit harder to understand without an LSP helping you along.

This is too much in my opinion. I know there's value, but the ergonomics suffer significantly if we have generics over Eal.
I'd like to not treat Ethdev "differently" to other Devs. And if we give Ethdev a generic for EAL, then the others would need one too, exploding the generic count & complexity.

Techie notes for eager readers: one can use this technique for compile-time enforcement of lock ordering (avoiding AB-BA deadlocks)!
Thanks to Fuchsia OS, and Joshua Liebow-Feeser https://lwn.net/Articles/995814/,
and Angus Morrison for the simpler demo at https://docs.rs/lock_tree/latest/lock_tree/

So this technique is really cool, but not the right tradeoff in this case.

<snip>

> > The key point above is "except where runtimes force them to mix". The DPDK rxq concept (struct Rxq in the code linked above) is !Send.
> > As a result, it cannot be moved between threads. That allows per-lcore concepts to be used for performance.
> 
> The problem is that, with Tokio, it also can't be held across an await point. I agree that !Send is correct, but the existence of !Send resources means that integration with Tokio is much, much harder. For PMDs with RTE_ETH_TX_OFFLOAD_MT_LOCKFREE, TX is fine, but as far as I am aware there is no equivalent for RX. And, to safely take advantage of the TX version, we'd need to know the capabilities of the target PMD at compile time, which is part of why my own bindings "devirtualize" the EAL and require a top-level function which dispatches based on the capabilities provided by the PMDs I make use of. Glommio was easily able to integrate safely (theoretically Monoio would be too, although I haven't used it), but I haven't found a safe way to mix Tokio and queue handles which doesn't make it nearly impossible to use async, even when taking that fairly extreme measure.
> 
> > The point I was trying to make is that we (the DPDK safe rust wrapper API) should not be prescriptive in how it is used.
> > In other words: we should allow the user to decide how to spawn/manage/run threads.
> >
> > We must encode the DPDK requirements of e.g. "Rxq concept" with !Send, !Sync marker traits.
> > Then the Rust compiler will at compile-time ensure the users code is correct.
> 
> I agree that !Send and !Sync are likely correct for Rxqs, however, we also need to be very careful in documenting the WHY of !Send and !Sync in each context. For instance, how are we going to get the queue handles to the threads which run the data path if we get all of them from an Eal struct in a Vec on the main thread? We may need to have a way to "deactivate" them so the user can't use them for queue operations but they are Send, !Sync, emit a fence, and then when the user "activates" them it performs another fence to force anything the last thread did with the queue to be visible on the new core. I suspect we'll need to apply a similar pattern for other thread unsafe parts of DPDK in order to get them to where they need to be during execution.


Look at the patch, the difference between a RxqHandle and Rxq encodes exactly what you're asking.
Gregory renamed the "change" function to .activate(), but the fundamental "consume struct and give back !Send pollable Rxq" is the same.
Agree we need things documented, but the C API docs should have that already, see the Rxq example as explained at Userspace: https://youtu.be/lb6xn2xQ-NQ?t=890.


> > I don't believe that I can identify all use-cases, so we cannot design requirements around statements like "I think X is more likely than Y".
> 
> I agree, this is why unsafe escape hatches will be necessary. Someone will have some weird edge-case like a CPU with no cache that makes it fine to move Rxqs around with abandon.

No need for unsafe: just don't be prescriptive in how threading "should work"; be flexible and allow the user to decide.
All the proposed DPDK-rs does is provide safe Rust structs that encode the correct Send/Sync requirements, nothing more.
After that, any user can correctly use our APIs, and if it compiles, then it's correct (from a threading POV).
Even users with "weird edge-cases like a CPU with no cache" will still work correctly.
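
As a sketch of what "encode the correct Send/Sync requirements" can look like on stable Rust: a PhantomData field opts a struct out of the auto traits without storing anything. The struct names follow the patch, but the fields and the probe helpers (Fallback, SendProbe, SyncProbe, marker_table) are assumptions for illustration. The probe relies on inherent associated consts taking priority over trait ones, which only works for concrete types, not inside generic code.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

/// Movable between threads, but not shareable by reference: Send + !Sync.
pub struct RxqHandle {
    _not_sync: PhantomData<Cell<()>>,
}

/// Tied to one thread: raw pointers are neither Send nor Sync.
pub struct Rxq {
    _not_send: PhantomData<*mut ()>,
}

/// The kind of bound thread::spawn and tokio::spawn place on what they accept.
pub fn require_send<T: Send>() {}

/// Fallback constants, used when the inherent impls below do not apply.
pub trait Fallback {
    const IS_SEND: bool = false;
    const IS_SYNC: bool = false;
}
impl<T: ?Sized> Fallback for T {}

pub struct SendProbe<T: ?Sized>(PhantomData<T>);
impl<T: ?Sized + Send> SendProbe<T> {
    pub const IS_SEND: bool = true;
}

pub struct SyncProbe<T: ?Sized>(PhantomData<T>);
impl<T: ?Sized + Sync> SyncProbe<T> {
    pub const IS_SYNC: bool = true;
}

/// (type name, is Send, is Sync) for the two queue types.
pub fn marker_table() -> [(&'static str, bool, bool); 2] {
    [
        ("RxqHandle", SendProbe::<RxqHandle>::IS_SEND, SyncProbe::<RxqHandle>::IS_SYNC),
        ("Rxq", SendProbe::<Rxq>::IS_SEND, SyncProbe::<Rxq>::IS_SYNC),
    ]
}
```

require_send::<RxqHandle>() compiles, while require_send::<Rxq>() is rejected by the compiler, which is the same check a user hits when trying to move an active Rxq into another thread.
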


> > Harry wrote:
> > > Let's focus on Tokio first: it is an "async runtime" (two links for future readers)
> > >     <snip>
> > > So an async runtime can run "async" Rust functions (called Futures, or Tasks when run independently..)
> > > There are lots of words/concepts, but I'll focus only on the thread creation/control aspect, given the DPDK EAL lcore context.
> > >
> > > Tokio is a work-stealing scheduler. It spawns "worker" threads, and then gives these "tasks"
> > > to various worker cores (similar to how Golang does its work-stealing scheduling). Some
> > > DPDK crate users might like this type of workflow, where e.g. RXQ polling is a task, and the
> > > "tokio runtime" figures out which worker to run it on. "Spawning" a task causes the "Future"
> > > to start executing. (technical Rust note: notice the "Send" bound on Future: https://docs.rs/tokio/latest/tokio/task/fn.spawn.html )
> > > The work stealing aspect of Tokio has also led to some issues in the Rust ecosystem. What it effectively means is that every "await" is a place where you might get moved to another thread. This means that it would be unsound to, for example, have a queue handle on devices without MT-safe queues unless we want to put a mutex on top of all of the device queues. I personally think this is a lot of the source of people thinking that Rust async is hard, because Tokio forces you to be thread safe at really weird places in your code and has issues like not being able to hold a mutex over an await point.
> > >
> > > Other users might prefer the "thread-per-core" and CPU pinning approach (like DPDK itself would do).
> > > nit: Tokio also spawns a thread per core, it just freely moves tasks between cores. It doesn't pin because it's designed to interoperate with the normal kernel scheduler more nicely. I think that not needing pinned cores is nice, but we want the ability to pin for performance reasons, especially on NUMA/NUCA systems (NUCA = Non-Uniform Cache Architecture, almost every AMD EPYC above 8 cores, higher core count Intel Xeons for 3 generations, etc).
> > > Monoio and Glommio both serve these use cases (but in slightly different ways!). They both spawn threads and do CPU pinning.
> > > Monoio and Glommio say "tasks will always remain on the local thread". In Rust techie terms: "Futures are !Send and !Sync"
> > > https://docs.rs/monoio/latest/monoio/fn.spawn.html    
> > > https://docs.rs/glommio/latest/glommio/fn.spawn_local.html
> >
> > Owen wrote:
> > > There is also another option, one which would eliminate "service cores". We provide both a work stealing pool of tasks that have to deal with being yanked between cores/EAL threads at any time, but aren't data plane tasks, and then a different API for spawning tasks onto the local thread/core for data plane tasks (ex: something to manage a particular HTTP connection). This might make writing the runtime harder, but it should provide the best of both worlds provided we can build in a feature (Rust provides a way to "ifdef out" code via features) to disable one or the other if someone doesn't want the overhead.
> >
> > Hah, yeah.. (as maintainer of service cores!) I'm aware that the "async Rust" cooperative scheduling is very similar.
> > That said, the problem service-cores set out to solve is a very different one to how "async Rust" came about.
> > The implementations, ergonomics, and the language it's written in are different too... so they're different beasts!
> 
> I think we could still make use of the idea of separate pools of thread local and global tasks.
> 
> > We don't want to start writing "dpdk-async-runtime". The goal is not to duplicate everything, we must integrate with existing.
> 
> What do you picture someone who picks up "dpdk-rs" seeing as the interface to DPDK when it's fully integrated? Do they enable a feature flag in their async runtime and the runtime handles it for them, do they set up DPDK and start the runtime? Most of the libraries I'm aware of assume the presence of an OS network stack. Yes, there are some like smoltcp which are capable of operating on top of the l2 interface provided by DPDK, but most are going to want a network stack to exist on top of.

DPDK-rs remains DPDK, and the Rust APIs remain at the same level of C APIs.
When I say "integrate with" I mean that DPDK-rs APIs should enable others to build on top of it.
I reference some examples (eg SmolTCP, Tokio etc) because knowledge of how they could consume DPDK gives good context.

I am NOT proposing that DPDK-rs includes more features than DPDK-via-C-API.
DPDK-rs is "just" a safe Rust interface to DPDK functionality.

I am advocating that we understand how things integrate and try to support (or at least be aware of) those usages,
primarily to ensure that topics like threading can be resolved well. Yes other libraries expect a TcpListener,
and libraries like SmolTCP (or the DemiKernel Netstack, or FuchsiaOS's netstack3, etc) may provide that bridge.

But DPDK-rs is just DPDK: as first priority, a high-performance L2 ethernet packet I/O library.
Due to Rust language features, we can build in safety via Send/Sync of structs, and nice API design.
To me, that's the goal for a minimal DPDK-rs release.


> > I will try provide some examples of integrating DPDK with other Rust networking projects, to prove that it can be done, and is useful.
> >
> > Harry wrote:
> > > So there are at least 3 different async runtimes (and I haven't even talked about async-std, smol, embassy, ...) which
> > > all have different use-cases, and methods of running "tasks" on threads. These runtimes exist, and are widely used,
> > > and applications make use of their thread-scheduling capabilities.
> > >
> > > So "async runtimes" do thread creation (and optionally CPU pinning) for the user.
> > > Other libraries like "Rayon" are thread-pool managers, those also have various CPU thread-create/pinning capabilities.
> > > If DPDK *also* wants to do thread creation/management and CPU-thread-to-core pinning for the user, that creates tension.
> > > The other problem is that most of these async runtimes have IO very tightly integrated into them. A large portion of Tokio had to be forked and rewritten for io_uring support, and DPDK is a rather stark departure from what they were all designed for. I know that both Tokio and Glommio have "start a new async runtime on this thread" functions, and I think that Tokio has an "add this thread to a multithreaded runtime" somewhere.
> > >
> > > I think the main thing that DPDK would need to be concerned about is that many of these runtimes use thread locals, and I'm not sure if that would be transparently handled by the EAL thread runtime since I've always used thread per core and then used the Rust runtime to multiplex between tasks instead of spawning more EAL threads.
> > >
> > > Rayon should probably be thought of in a similar vein to OpenMP, since it's mainly designed for batch processing. Unless someone is doing some fairly heavy computation (the kind where "do we want a GPU to accelerate this?" becomes a question) inside of their DPDK application, I'm having trouble thinking of a use case that would want both DPDK and Rayon.
> > >
> > > > Bruce wrote: "so having Rust (not DPDK) do all thread management is the way to go (again IMHO)."
> > >
> > > I think I agree here, in order to make the Rust DPDK crate usable from the Rust ecosystem,
> > > it must align itself with the existing Rust networking ecosystem.
> > >
> > > That means, the DPDK Rust crate should not FORCE the usage of lcore pinnings and mappings.
> > > Allowing a Rust application to decide how to best handle threading (via Rayon, Tokio, Monoio, etc)
> > > will allow much more "native" or "ergonomic" integration of DPDK into Rust applications.
> >
> > Owen wrote:
> > > I'm not sure that using DPDK from Rust will be possible without either serious performance sacrifices or rewrites of a lot of the networking libraries. Tokio continues to mimic the BSD sockets API for IO, even with the io_uring version, as does glommio. The idea of the "recv" giving you a buffer without you passing one in isn't really used outside of some lower-level io_uring crates. At a bare minimum, even if DPDK managed to offer an API that works exactly the same way as io_uring or epoll, we would still need to go to all of the async runtimes and get them to plumb DPDK support in or approve someone from the DPDK community maintaining support. If we don't offer that API, then we either need rewrites inside of the async runtimes or for individual libraries to provide DPDK support, which is going to be even more difficult.
> >
> > Regarding traits used for IO, correct many are focussed on "recv" giving you a buffer, but not all. Look at Monoio, specifically the *Rent APIs:
> > https://docs.rs/monoio/latest/monoio/io/index.html#traits
> 
> As far as I can tell, the *Rent APIs for Monoio have the same problem, they require you to pass in a buffer, and to satisfy that API we'd need to throw out zero copy, pass that buffer directly to the PMD, or do some weird thing where we use that API to recycle buffers back into the mempool. I see, in Monoio terms, a DPDK API looking more like TcpStream::read(&mut self) -> impl Future<Output = BufResult<usize, dpdk::PktMbuf>> or some equivalent abstraction on top.
> 
> > Owen wrote:
> > > I agree that forcing lcore pinnings and mappings isn't good, but I think that DPDK is well within its rights to build its own async runtime which exposes a standard API. For one thing, the first thing Rust users will ask for is a TCP stack, which the community has been discussing and debating for a long time. I think we should figure out whether the goal is to allow DPDK applications to be written in Rust, or to allow generic Rust applications to use DPDK. The former means that the audience would likely be Rust-fluent people who would have used DPDK regardless, and are fine dealing with mempools, mbufs, the eal, and ethdev configuration. The latter is a much larger audience who is likely going to be less tolerant of dpdk-rs exposing the true complexity of using DPDK. Yes, Rust can help make the abstractions better, but there's an amount of inherent complexity in "Your NIC can handle IPSec for you and can also direct all IPv6 traffic to one core" that I don't think we can remove.
> >
> > Ok, we're getting very far into future/conceptual design here.
> > For me, DPDK having its own async runtime and its own DPDK TCP stack is NOT the goal.
> > We should try to integrate DPDK with existing software environments - not rewrite the world.
> 
> Which existing software environments are you thinking of exactly? Most Rust applications that use networking are going to be using Axum, Tower, and the other crates that you've mentioned, and all of those rely on having a TCP stack to be useful. I have found vanishingly few Rust crates which handle integration with DPDK without me editing them to some degree. I'd like to know where you're finding existing Rust software environments which don't care about the presence of a network stack but are still networking oriented. If the goal is to take a DPDK application that would have been written in C/C++ and write it in Rust instead, that is very different than taking an application which would have happily used the OS network stack, such as an HTTP server which deals with normal (<1k RPS) amounts of traffic, and moving it onto DPDK, and it seems to me like you are suggesting that we should focus on the latter.

As above, DPDK-rs is for accelerated packet I/O. Perhaps with some offload features etc in future,
but fundamentally it's a high-speed packet I/O library.

Other libraries can build on top, I've done a small (sorry for the pun!) example with SmolTCP,
and integrating DPDK into the "phy" device abstraction: it is not difficult. This provides a route
to TCP with high performance I/O under the hood...

So you mention "HTTP is <1k RPS", that assumption is not correct in all cases.
Use-cases like Next-Gen-FireWall (NGFW) and Reverse-proxy require L7 HTTP processing.
Some even go as far as doing "TLS bumping" (aka MITM inspection; eg internally in a company network).

In these cases, the requirement for L7 HTTP(s) parsing, TLS decrypt/DPI/crypt is huge, with
DPDK levels of performance absolutely being required (or scaling to 100s of boxes doing <1k RPS each!)

I believe the above cases are not easily catered for, because the projects (e.g., Snort, Envoy)
were mostly designed in a pre-DPDK era, and hence expect kernel/FD based I/O. I believe that the lack
of clear C-API abstraction into L7/HTTP layers has stifled some of those projects from consuming DPDK.

So yes, DPDK-rs initially should focus on core priorities: L2 ethernet I/O.
But because the abstractions are more easily ported in Rust, ensuring we don't "design out" these
other use-cases is very important to me - I believe it can expand the potential use-cases for the
core DPDK functionality (Ethdev and the PMDs) a lot.


> > Owen wrote:
> > > I personally think that the right approach is making an API for DPDK applications to be written in Rust, and then steadily adding abstractions on top of that until we arrive at something that someone who has never looked at a TCP header can use without too much confusion. That was part of the goal of the Iris project I pitched (and then had to go finish another project so the design is still WIP). I think that a move to DPDK is going to be as radical of a change as a move to io_uring, however, DPDK is fast enough that I think it may be possible to convince people to do a rewrite once we arrive at that high level API.
> >
> > I haven't heard of the Iris project you mentioned, is there something concrete to learn from, or is it too WIP to apply?
> 
> I have some design docs, but nothing concrete. I got pulled back to another project which is still ongoing shortly after I gave the talk at the last DPDK summit. The main goal of Iris is to provide a DPDK-based alternative to something like a gRPC with a message-based API instead of a byte-based one, and to take advantage of the massive amount of extra breathing room under that new API (as compared to TCP) to plumb in the various accelerators integrated into DPDK alongside a network stack. It's based on observations that many developers aren't even working at a TCP or HTTP level any more, but are instead using "JSON RPC over HTTPS which is automatically converted into objects by their HTTP server framework" or something like gRPC to have a "send message to server" and "get message to server" API. Most of what I have for that is a lot of time spent thinking about a Rust-based API on top of DPDK as a foundation for building the rest of the network stack on top.

Wow, big project goals; interesting. (Techie note: check out Zenoh, and check how SmolTCP allocates its rx/tx buffers, which could be allocated in hugepages; lots of cool potential here!)

As above, I think DPDK-rs should focus on "Safe L2 packet I/O" for Rust. So while the above is "cool stuff", my focus is on a good/safe L2 API first and foremost.


> > Owen wrote:
> > > "Swap out your sockets and rework the functions that do network IO for a 5x performance increase" is a very, very attractive offer, but for us to get there I think we need to have DPDK's full potential available in Rust, and then build as many zero-overhead (zero cost or you couldn't write it better yourself) abstractions as we can on top. I want to avoid a situation where we build up to the high-level APIs as fast as we can and then end up in a situation where you have "Easy Mode" and then "C DPDK written in Rust" as your two options.
> >
> > My perspective is that we're carefully designing "Safe Rust" APIs, and will have "DPDK's full potential" as a result.
> > I'm not sure where the "easy mode" comment applies. But let's focus on code, and on making concrete progress, over theoretical discussions.
> >
> > I'll keep my input more concise in future, and try to get more patches on list for review.
> > > > Regards,
> > > > Gregory
> > >
> > > Apologies for the long-form, "wall of text" email, but I hope it captures the nuance of threading and
> > > async runtimes, which I believe in the long term will be very nice to capture "async offload" use-cases
> > > for DPDK. To put it another way, lookaside processing can be hidden behind async functions & runtimes,
> > > if we design the APIs right: and that would be really cool for making async-offload code easy to write correctly!
> > >
> > > Regards, -Harry
> > >
> > > Sorry for my own walls of text. As a consequence of working on Iris I've spent a lot of time thinking about how to make DPDK easier to use while keeping the performance intact, and I was already thinking in Rust since it provides one of the better options for these kinds of abstractions (the other option I see is Mojo, which isn't ready yet). I want to see DPDK become more accessible, but the performance and access to hardware is one of the main things that make DPDK special, so I don't want to compromise that. I definitely agree that we need to force DPDK's existing APIs to justify themselves in the face of the new capabilities of Rust, but I think that starting from "How are Rust applications written today?" is a mistake.
> > >
> > > Regards,
> > > Owen
> >
> > Generally agree, but just this line stood out to me:
> > > Owen wrote:   I think that starting from "How are Rust applications written today?" is a mistake.
> >
> > We have to understand how applications are written today, in order to understand what it would take to move them to a DPDK backend.
> > In C, consuming DPDK is hard, as applications expect TCP via sockets, and DPDK provides mbuf*s: that's a large mismatch. (Yes I'm aware of various DPDK-aware TCP stacks etc.)
> >
> > In Rust, applications expect a "let tcp_port = TcpListener::bind()", and then to "tcp_port.accept()" incoming requests.
> > Those requirements can be met by: std::net::TcpListener, tokio::net::TcpListener, and in future, some DPDK (SmolTCP?) based TcpListener.
> > - https://doc.rust-lang.org/std/net/struct.TcpListener.html
> > - https://docs.rs/tokio/latest/tokio/net/struct.TcpListener.html
> >
> > The ability to move between abstractions is much easier in Rust. As a result, providing "normal looking APIs" is IMO the best way forward.
> 
> Yes, moving between abstractions is easier in Rust, but I think that the abstraction provided by std::net::TcpListener and tokio::net::TcpListener is flawed. I'm not sure there is a good way to provide a "normal" API without fairly serious performance compromises. For example, as I'm sure everyone here is aware, the traditional BSD sockets API requires double the memory bandwidth that a zero-copy one does on the rx path. Those APIs also ignore TLS, meaning that we would actually need to go look at a wrapper over rustls or some other TLS implementation as what users interact with. I can keep going up levels, but this is why I decided to put the highest level of abstraction in Iris, the one I intend most people to interact with at "get this blob of bytes over to that other server as a message, possibly encrypting it, compressing it, doing zero trust checks, etc". I'm not sure if applications expect a TcpListener, so much as an HttpListener, or a JsonRPCListener. I think it would be wise to determine what type of API people would want for a dpdk-rs, rather than making an assumption that they want something like BSD sockets. Even inside of the kernel io_uring has been breaking away from that API with an API that looks a lot more like what I would expect from DPDK, and providing ergonomics benefits to users while doing it.
> 
> > Regards, and thanks for the input & discussion. -Harry
> 
> Thanks for the discussion, and I hope to continue to work with all of you on this,
> Owen

Thanks, good input! Regards, -Harry


* Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
  2025-05-09 16:24                         ` Van Haaren, Harry
@ 2025-05-10 16:05                           ` Owen Hilyard
  0 siblings, 0 replies; 20+ messages in thread
From: Owen Hilyard @ 2025-05-10 16:05 UTC (permalink / raw)
  To: Van Haaren, Harry, Etelson, Gregory, Richardson, Bruce; +Cc: dev

> From: Van Haaren, Harry <harry.van.haaren@intel.com>
> Sent: Friday, May 9, 2025 12:24 PM
> To: Owen Hilyard <Owen.Hilyard@unh.edu>; Etelson, Gregory <getelson@nvidia.com>; Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org <dev@dpdk.org>
> Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
> 
> > From: Owen Hilyard
> > Sent: Friday, May 09, 2025 12:53 AM
> > To: Van Haaren, Harry; Etelson, Gregory; Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq
> >
<snip>

> > > Maybe it will help to split the conversation into two threads, with one focussing on
> > > "DPDK used through Safe Rust abstractions", and the other on "future cool use-cases".
> >
> > Agree.
> > 
> > > Perhaps I jumped a bit too far ahead mentioning async runtimes, and while I like the enthusiasm for designing "cool new stuff", it is probably better to be realistic around what will get "done": my bad.
> > >
> > > I'll reply to the "DPDK via Safe Rust" topics below, and start a new thread (with same folks on CC) for "future cool use-cases" when I've had a chance to clean up a little demo to showcase them.
> > > 
> > >
> > > > > > Thanks for sharing. However, IMHO using EAL for thread management in rust
> > > > > > is the wrong interface to expose.
> > > > >
> > > > > EAL is a singleton object in DPDK architecture.
> > > > > I see it as a hub for other resources.
> > >
> > > Harry Wrote:
> > > > Yep, I tend to agree here; EAL is central to the rest of DPDK working correctly.
> > > > And given EAL's implementation is heavily relying on global static variables, it is
> > > > certainly a "singleton" instance, yes.
> > >
> > > Owen wrote:
> > > > I think a singleton is one way to implement this, but then you lose some of the RAII/automatic resource management behavior. It would, however, make some APIs inherently unsafe or very unergonomic unless we were to force rte_eal_cleanup to be run via atexit(3) or the platform equivalent and forbid the user from running it themselves. For a lot of Rust runtimes similar to the EAL (tokio, glommio, etc), once you spawn a runtime it's around until process exit. The other option is to have a handle which represents the state of the EAL on the Rust side and runs rte_eal_init on creation and rte_eal_cleanup on destruction. There are two ways we can make that safe. First, reference counting, once the handles are created, they can be passed around easily, and the last one runs rte_eal_cleanup when it gets dropped.  This avoids having tons of complicated lifetimes and I think that, everywhere that it shouldn't affect fast path performance, we should use refcounting.
> > >
> > > Agreed, refcounts for EAL "singleton" concept yes. For the record, the initial patch actually returns a
> > > "dpdk" object from dpdk::Eal::init(), and Drop impl has a // TODO rte_eal_cleanup(), so well aligned on approach here.
> > > https://patches.dpdk.org/project/dpdk/patch/20250418132324.4085336-1-harry.van.haaren@intel.com/
> >
> > One thing I think I'd like to see is using a "newtype" for important numbers (ex: "struct EthDevQueueId(pub u16)"). This prevents some classes of error but if we make the constructor public it's at most a minor inconvenience to anyone who has to do something a bit odd.
> >
> > > > Owen wrote:
> > > The other option is to use lifetimes. This is doable, but is going to force people who are more likely to primarily be C or C++ developers to dive deep into Rust's type system if they want to build abstractions over it. If we add async into the mix, as many people are going to want to do, it's going to become much, much harder. As a result, I'd advocate for only using it for data path components where refcounting isn't an option.
> > >
> > > +1 to not using lifetimes here, it is not the right solution for this EAL / singleton type problem.
> >
> > Having now looked over the initial patchset in more detail, I think we do have a question of how far down "it compiles it works" we want to go. For example, using typestates to make Eal::take_eth_ports impossible to call more than once using something like this:
> >
> > #[derive(Debug, Default)]
> > pub struct Eal<const HAS_ETHDEV_PORTS: bool> {
> >    eth_ports: Vec<eth::Port>,
> > }
> >
> > impl<const HAS_ETHDEV_PORTS: bool> Eal<HAS_ETHDEV_PORTS> {
> >    pub fn init() -> Result<Self, String> {
> >        // EAL init() will do PCI probe and VDev enumeration will find/create eth ports.
> >        // This code should loop over the ports, and build up Rust structs representing them
> >        let eth_ports = vec![eth::Port::from_u16(0)];
> >        Ok(Eal { eth_ports })
> >    }
> > }
> >
> > impl Eal<true> {
> >    pub fn take_eth_ports(mut self) -> (Eal<false>, Vec<eth::Port>) {
> >        // Eal implements Drop, so the Vec cannot be moved out of self directly;
> >        // swap it for an empty Vec instead.
> >        let eth_ports = std::mem::take(&mut self.eth_ports);
> >        // NOTE: self is dropped on return, so a real implementation must make
> >        // sure rte_eal_cleanup() is not run twice.
> >        (Eal::<false>::default(), eth_ports)
> >    }
> > }
> >
> > impl<const HAS_ETHDEV_PORTS: bool> Drop for Eal<HAS_ETHDEV_PORTS> {
> >    fn drop(&mut self) {
> >        if HAS_ETHDEV_PORTS {
> >            // extra desired port cleanup
> >        }
> >        // todo: rte_eal_cleanup()
> >    }
> > }
> >
> > This does add some noise to looking at the struct, but also lets the compiler enforce what state a struct should be in to call a given function. Taken to its logical extreme, we could create an API where many of the "resource in wrong state" errors should be impossible. However, it also requires more knowledge of Rust's type system on the part of the people making the API and can be a bit harder to understand without an LSP helping you along.
> 
> This is too much in my opinion. I know there's value, but the ergonomics suffer significantly if we have generics over Eal.
> I'd like to not treat Ethdev "differently" to other Devs. And if we give Ethdev a generic for EAL, then the others would too; exploding the generic counts & complexity.

Another option would be to have the EAL init return a tuple or struct you are meant to de-structure. Ex:

#[derive(Debug, Default)]
pub struct Eal {}

pub struct EalPartsHolder {
    pub eal: Eal,
    pub ethdev_ports: Vec<eth::Port>,
}

impl Eal {
    pub fn init() -> Result<EalPartsHolder, String> {
        // EAL init() will do PCI probe and VDev enumeration will find/create eth ports.
        // This code should loop over the ports, and build up Rust structs representing them
        let eth_ports = vec![eth::Port::from_u16(0)];
        Ok(EalPartsHolder {
            eal: Self {},
            ethdev_ports: eth_ports,
        })
    }
}

pub fn main() {
    let EalPartsHolder { eal, ethdev_ports } = Eal::init().unwrap();
}

DPDK is complex, especially for new people learning, so I want to create "pits of success" where by the time you make the code compile it should work. This destructuring API might be more ergonomic for users, since only state which is expected to be attached to the EAL is kept inside of the EAL struct, and users can use ".." to "don't care" about any resources they don't plan to use. We can add on other PMD types and resources over time.  
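
To make the ".." point concrete, here is a minimal sketch with a second resource type added. Port stands in for eth::Port, and CryptoDev and the cryptodevs field are hypothetical names used only for illustration:

```rust
#[derive(Debug)]
pub struct Port(pub u16); // stand-in for eth::Port

#[derive(Debug)]
pub struct CryptoDev(pub u16); // hypothetical second resource type

#[derive(Debug, Default)]
pub struct Eal {}

pub struct EalPartsHolder {
    pub eal: Eal,
    pub ethdev_ports: Vec<Port>,
    pub cryptodevs: Vec<CryptoDev>,
}

impl Eal {
    pub fn init() -> Result<EalPartsHolder, String> {
        // Placeholder for rte_eal_init() plus device enumeration.
        Ok(EalPartsHolder {
            eal: Eal {},
            ethdev_ports: vec![Port(0)],
            cryptodevs: vec![CryptoDev(0)],
        })
    }
}

fn main() {
    // ".." ignores cryptodevs, and any resource fields added in the future,
    // so this call site keeps compiling as the holder grows.
    let EalPartsHolder { eal, ethdev_ports, .. } = Eal::init().unwrap();
    println!("{eal:?} with {} eth port(s): {ethdev_ports:?}", ethdev_ports.len());
}
```

Each resource is moved out of the holder exactly once, so the "take the ports twice" mistake cannot be written, and no generic parameter is needed on Eal.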

However, I think this does warrant a discussion on how far we're willing to push Rust's type system in the name of "zero overhead abstractions". My own tolerance for extensive use of generics is fairly high, and I would advocate for leaving no stone unturned in the type system in order to avoid runtime overhead in a safe API. In the kernel, the Rust for Linux project has had to make extensive use of advanced type system features in order to encode some more complicated APIs correctly, especially inside of the DRM and filesystems abstractions. While I think most of DPDK's APIs are reasonable to encode with simple typestates, in some cases such as async rte_flow I think we will want to make heavier use of some of these design patterns. What I propose is that all efforts be made to simplify the API without sacrificing safety or performance, but if the API is impossible to encode in "simple Rust", then case-by-case determinations can be made as to whether the users of the API are better served by runtime overhead or cognitive overhead.
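
As a rough illustration of what I mean by a "simple typestate", here is a sketch using zero-sized marker types. Port, Stopped, Started and the method bodies are all hypothetical; the comments only name the C calls a real binding would presumably wrap:

```rust
use std::marker::PhantomData;

/// Zero-sized marker types for the two states (hypothetical names).
pub struct Stopped;
pub struct Started;

pub struct Port<State> {
    id: u16,
    _state: PhantomData<State>,
}

impl Port<Stopped> {
    pub fn new(id: u16) -> Self {
        Port { id, _state: PhantomData }
    }

    /// Placeholder for rte_eth_dev_start(); consumes the stopped port.
    pub fn start(self) -> Port<Started> {
        Port { id: self.id, _state: PhantomData }
    }
}

impl Port<Started> {
    /// Placeholder for rte_eth_rx_burst(); only exists on a started port.
    pub fn rx_burst(&mut self) -> usize {
        0
    }

    /// Placeholder for rte_eth_dev_stop(); consumes the started port.
    pub fn stop(self) -> Port<Stopped> {
        Port { id: self.id, _state: PhantomData }
    }
}

impl<State> Port<State> {
    /// Available in every state.
    pub fn id(&self) -> u16 {
        self.id
    }
}

fn main() {
    let port = Port::<Stopped>::new(0);
    // port.rx_burst(); // does not compile: Port<Stopped> has no rx_burst()
    let mut port = port.start();
    let n = port.rx_burst();
    let port = port.stop();
    println!("port {} received {n} packets and is stopped again", port.id());
}
```

The state lives only in the type, so nothing is checked at runtime; the cost is the extra type parameter that every user of Port has to read past.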

<snip>

> So this technique is really cool, but not the right tradeoff in this case.

<snip>

> > > The key point above is "except where runtimes force them to mix". The DPDK rxq concept (struct Rxq in the code linked above) is !Send.
> > > As a result, it cannot be moved between threads. That allows per-lcore concepts to be used for performance.
> >
> > The problem is that, with Tokio, it also can't be held across an await point. I agree that !Send is correct, but the existence of !Send resources means that integration with Tokio is much, much harder. For PMDs with RTE_ETH_TX_OFFLOAD_MT_LOCKFREE, TX is fine, but as far as I am aware there is no equivalent for RX. And, to safely take advantage of the TX version, we'd need to know the capabilities of the target PMD at compile time, which is part of why my own bindings "devirtualize" the EAL and require a top-level function which dispatches based on the capabilities provided by the PMDs I make use of. Glommio was easily able to integrate safely (theoretically Monoio would be too, although I haven't used it), but I haven't found a safe way to mix Tokio and queue handles which doesn't make it nearly impossible to use async, even when taking that fairly extreme measure.
> >
> > > The point I was trying to make is that we (the DPDK safe rust wrapper API) should not be prescriptive in how it is used.
> > > In other words: we should allow the user to decide how to spawn/manage/run threads.
> > >
> > > We must encode the DPDK requirements of e.g. "Rxq concept" with !Send, !Sync marker traits.
> > > Then the Rust compiler will at compile-time ensure the users code is correct.
> >
> > I agree that !Send and !Sync are likely correct for Rxqs, however, we also need to be very careful in documenting the WHY of !Send and !Sync in each context. For instance, how are we going to get the queue handles to the threads which run the data path if we get all of them from an Eal struct in a Vec on the main thread? We may need to have a way to "deactivate" them so the user can't use them for queue operations but they are Send, !Sync, emit a fence, and then when the user "activates" them it performs another fence to force anything the last thread did with the queue to be visible on the new core. I suspect we'll need to apply a similar pattern for other thread unsafe parts of DPDK in order to get them to where they need to be during execution.


> Look at the patch, the difference between a RxqHandle and Rxq encodes exactly what you're asking.
> Gregory renamed the "change" function to .activate(), but the fundamental "consume struct and give back !Send pollable Rxq" is the same.
> Agree we need things documented, but the C API docs should have that already, see the Rxq example as explained at Userspace: https://www.youtube.com/watch?t=890&v=lb6xn2xQ-NQ&feature=youtu.be

My mistake.

> > > I don't believe that I can identify all use-cases, so we cannot design requirements around statements like "I think X is more likely than Y".
> >
> > I agree, this is why unsafe escape hatches will be necessary. Someone will have some weird edge-case like a CPU with no cache that makes it fine to move Rxqs around with abandon.
> 
> No need for unsafe: just don't be prescriptive about how threading "should work"; be flexible and allow the user to decide.
> All the proposed DPDK-rs does is provide safe Rust structs that encode the correct Send/Sync requirements, nothing more.
> After that, any user can correctly use our APIs, and if it compiles, then it's correct (from a threading POV).
> Even users with "weird edge-cases like a CPU with no cache" will still work correctly.

I think that it is reasonable to have a guarded, easy to find and audit, "trust the developer" escape hatch, not just for threads but for large parts of DPDK's API surface. I think we've all had to do slightly odd things to meet a performance target or make a feature work in a codebase without a redesign, and an API which allows users to express that they have upheld the invariants of the C API in ways they cannot tell the compiler about is precisely what may be needed there. Ideally, we can provide flexible enough safe APIs that will work for everyone, but covering every use-case and scenario is impossible. Possibly this can be covered by the "raw" API and the ability to get the necessary identifiers or pointers out of various handles, and any team which wants to forbid that can stick a #![forbid(unsafe_code)] in their main file or lib file and be on their way. 
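
One possible shape for such an escape hatch, sketched under the assumption that Rxq is !Send as discussed above. AssumeSend is a hypothetical wrapper name, not something from the patch, and Rxq here is a stand-in:

```rust
use std::marker::PhantomData;
use std::thread;

/// Stand-in for a !Send queue type (hypothetical fields).
pub struct Rxq {
    queue: u16,
    _not_send: PhantomData<*const ()>,
}

impl Rxq {
    pub fn new(queue: u16) -> Self {
        Rxq { queue, _not_send: PhantomData }
    }

    /// Placeholder for rte_eth_rx_burst().
    pub fn rx_burst(&mut self) -> usize {
        0
    }

    pub fn queue_id(&self) -> u16 {
        self.queue
    }
}

/// "Trust the developer" wrapper: easy to grep for and audit.
pub struct AssumeSend<T>(T);

// SAFETY: construction is only possible through the unsafe `new`, whose
// caller takes on the obligation described there.
unsafe impl<T> Send for AssumeSend<T> {}

impl<T> AssumeSend<T> {
    /// # Safety
    /// The caller must guarantee that moving `inner` to another thread
    /// upholds the invariants of the underlying C API, e.g. that no other
    /// thread uses the queue and that prior accesses are visible there.
    pub unsafe fn new(inner: T) -> Self {
        AssumeSend(inner)
    }

    pub fn into_inner(self) -> T {
        self.0
    }
}

fn main() {
    let rxq = Rxq::new(3);
    // SAFETY: this thread never touches the queue again, and spawn/join
    // provide the required synchronisation.
    let movable = unsafe { AssumeSend::new(rxq) };
    let id = thread::spawn(move || {
        let mut rxq = movable.into_inner();
        rxq.rx_burst();
        rxq.queue_id()
    })
    .join()
    .unwrap();
    println!("polled queue {id} on another thread");
}
```

A crate that sets #![forbid(unsafe_code)] cannot write the unsafe block in main(), so teams that do not want the hatch are unaffected by its existence.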


> > > Harry wrote:
> > > > Lets focus on Tokio first: it is an "async runtime" (two links for future readers)
> > > >     <snip>
> > > > So an async runtime can run "async" Rust functions (called Futures, or Tasks when run independently..)
> > > > There are lots of words/concepts, but I'll focus only on the thread creation/control aspect, given the DPDK EAL lcore context.
> > > >
> > > > Tokio is a work-stealing scheduler. It spawns "worker" threads, and then gives these "tasks"
> > > > to various worker cores (similar to how Golang does its work-stealing scheduling). Some
> > > > DPDK crate users might like this type of workflow, where e.g. RXQ polling is a task, and the
> > > > "tokio runtime" figures out which worker to run it on. "Spawning" a task causes the "Future"
> > > > to start executing. (technical Rust note: notice the "Send" bound on Future: https://docs.rs/tokio/latest/tokio/task/fn.spawn.html )
> > > > The work stealing aspect of Tokio has also led to some issues in the Rust ecosystem. What it effectively means is that every "await" is a place where you might get moved to another thread. This means that it would be unsound to, for example, have a queue handle on devices without MT-safe queues unless we want to put a mutex on top of all of the device queues. I personally think this is a lot of the source of people thinking that Rust async is hard, because Tokio forces you to be thread safe at really weird places in your code and has issues like not being able to hold a mutex over an await point.
> > > >
> > > > Other users might prefer the "thread-per-core" and CPU pinning approach (like DPDK itself would do).
> > > > nit: Tokio also spawns a thread per core, it just freely moves tasks between cores. It doesn't pin because it's designed to interoperate with the normal kernel scheduler more nicely. I think that not needing pinned cores is nice, but we want the ability to pin for performance reasons, especially on NUMA/NUCA systems (NUCA = Non-Uniform Cache Architecture, almost every AMD EPYC above 8 cores, higher core count Intel Xeons for 3 generations, etc).
> > > > Monoio and Glommio both serve these use cases (but in slightly different ways!). They both spawn threads and do CPU pinning.
> > > > Monoio and Glommio say "tasks will always remain on the local thread". In Rust techie terms: "Futures are !Send and !Sync"
> > > > https://docs.rs/monoio/latest/monoio/fn.spawn.html
> > > > https://docs.rs/glommio/latest/glommio/fn.spawn_local.html
> > >
> > > Owen wrote:
> > > > There is also another option, one which would eliminate "service cores". We provide both a work stealing pool of tasks that have to deal with being yanked between cores/EAL threads at any time, but aren't data plane tasks, and then a different API for spawning tasks onto the local thread/core for data plane tasks (ex: something to manage a particular HTTP connection). This might make writing the runtime harder, but it should provide the best of both worlds provided we can build in a feature (Rust provides a way to "ifdef out" code via features) to disable one or the other if someone doesn't want the overhead.
> > >
> > > Hah, yeah.. (as maintainer of service cores!) I'm aware that the "async Rust" cooperative scheduling is very similar.
> > > That said, the problem service-cores set out to solve is a very different one to how "async Rust" came about.
> > > The implementations, ergonomics, and the language it's written in are different too... so they're different beasts!
> >
> > I think we could still make use of the idea of separate pools of thread local and global tasks.
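
A toy illustration of the two-pool idea, using only std (LocalPool, spawn_local and spawn_global are made-up names for this sketch, not Tokio, Monoio or Glommio APIs): the global path requires Send because a task may land on any thread, while the local path has no Send bound because tasks never leave the thread that queued them.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

/// Local tasks never leave the thread that queued them, so no Send bound.
pub type LocalTask = Box<dyn FnOnce() + 'static>;

/// Hypothetical thread-local task queue (stand-in for a data plane pool).
pub struct LocalPool {
    tasks: VecDeque<LocalTask>,
}

impl LocalPool {
    pub fn new() -> Self {
        LocalPool { tasks: VecDeque::new() }
    }

    pub fn spawn_local(&mut self, task: impl FnOnce() + 'static) {
        self.tasks.push_back(Box::new(task));
    }

    /// Run every queued task on the current thread.
    pub fn run(&mut self) {
        while let Some(task) = self.tasks.pop_front() {
            task();
        }
    }
}

/// Global tasks may run on any thread, so the closure must be Send.
/// A fresh thread stands in for a work-stealing pool here.
pub fn spawn_global(task: impl FnOnce() + Send + 'static) -> thread::JoinHandle<()> {
    thread::spawn(task)
}

pub fn demo() -> (u32, u32) {
    // Local side: Rc<RefCell<_>> is !Send, which is fine for spawn_local.
    let local_count = Rc::new(RefCell::new(0u32));
    let mut pool = LocalPool::new();
    for _ in 0..3 {
        let counter = Rc::clone(&local_count);
        pool.spawn_local(move || *counter.borrow_mut() += 1);
    }
    pool.run();

    // Global side: only Send state (a channel sender) may be captured.
    let (tx, rx) = mpsc::channel();
    let workers: Vec<_> = (0..2)
        .map(|_| {
            let tx = tx.clone();
            spawn_global(move || tx.send(1u32).unwrap())
        })
        .collect();
    drop(tx);
    for worker in workers {
        worker.join().unwrap();
    }
    let global_count: u32 = rx.iter().sum();

    let local = *local_count.borrow();
    (local, global_count)
}
```

Passing the Rc-capturing closure to spawn_global would fail to compile, which is the property a thread-bound queue handle relies on.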
> >
> > > We don't want to start writing "dpdk-async-runtime". The goal is not to duplicate everything, we must integrate with existing.
> >
> > What do you picture someone who picks up "dpdk-rs" seeing as the interface to DPDK when it's fully integrated? Do they enable a feature flag in their async runtime and the runtime handles it for them, do they set up DPDK and start the runtime? Most of the libraries I'm aware of assume the presence of an OS network stack. Yes, there are some like smoltcp which are capable of operating on top of the l2 interface provided by DPDK, but most are going to want a network stack to exist on top of.
> 
> DPDK-rs remains DPDK, and the Rust APIs remain at the same level of C APIs.
> When I say "integrate with" I mean that DPDK-rs APIs should enable others to build on top of it.
> I reference some examples (eg SmolTCP, Tokio etc) because knowledge of how they could consume DPDK gives good context.
> 
> I am NOT proposing that DPDK-rs includes more features than DPDK-via-C-API.
> DPDK-rs is "just" a safe Rust interface to DPDK functionality.
> 
> I am advocating that we understand how things integrate and try support/be-aware of those usages,
> primarily to ensure that topics like threading can be resolved well. Yes other libraries expect a TcpListener,
> and libraries like SmolTCP (or the DemiKernel Netstack, or FuchsiaOS's netstack3, etc) may provide that bridge.
> 
> But DPDK-rs is just DPDK: as first priority, a high-performance L2 ethernet packet I/O library.
> Due to Rust language features, we can build in safety via Send/Sync of structs, and nice API design.
> To me, that's the goal for a minimal DPDK-rs release.

That makes sense. I thought you were going in a different direction which confused me.  

> > > I will try provide some examples of integrating DPDK with other Rust networking projects, to prove that it can be done, and is useful.
> > >
> > > Harry wrote:
> > > > So there are at least 3 different async runtimes (and I haven't even talked about async-std, smol, embassy, ...) which
> > > > all have different use-cases, and methods of running "tasks" on threads. These runtimes exist, and are widely used,
> > > > and applications make use of their thread-scheduling capabilities.
> > > >
> > > > So "async runtimes" do thread creation (and optionally CPU pinning) for the user.
> > > > Other libraries like "Rayon" are thread-pool managers, those also have various CPU thread-create/pinning capabilities.
> > > > If DPDK *also* wants to do thread creation/management and CPU-thread-to-core pinning for the user, that creates tension.
> > > > The other problem is that most of these async runtimes have IO very tightly integrated into them. A large portion of Tokio had to be forked and rewritten for io_uring support, and DPDK is a rather stark departure from what they were all designed for. I know that both Tokio and Glommio have "start a new async runtime on this thread" functions, and I think that Tokio has an "add this thread to a multithreaded runtime" somewhere.
> > > >
> > > > I think the main thing that DPDK would need to be concerned about is that many of these runtimes use thread locals, and I'm not sure if that would be transparently handled by the EAL thread runtime since I've always used thread per core and then used the Rust runtime to multiplex between tasks instead of spawning more EAL threads.
> > > >
> > > > Rayon should probably be thought of in a similar vein to OpenMP, since it's mainly designed for batch processing. Unless someone is doing some fairly heavy computation (the kind where "do we want a GPU to accelerate this?" becomes a question) inside of their DPDK application, I'm having trouble thinking of a use case that would want both DPDK and Rayon.
> > > >
> > > > > Bruce wrote: "so having Rust (not DPDK) do all thread management is the way to go (again IMHO)."
> > > >
> > > > I think I agree here, in order to make the Rust DPDK crate usable from the Rust ecosystem,
> > > > it must align itself with the existing Rust networking ecosystem.
> > > >
> > > > That means, the DPDK Rust crate should not FORCE the usage of lcore pinnings and mappings.
> > > > Allowing a Rust application to decide how to best handle threading (via Rayon, Tokio, Monoio, etc)
> > > > will allow much more "native" or "ergonomic" integration of DPDK into Rust applications.
> > >
> > > Owen wrote:
> > > > I'm not sure that using DPDK from Rust will be possible without either serious performance sacrifices or rewrites of a lot of the networking libraries. Tokio continues to mimic the BSD sockets API for IO, even with the io_uring version, as does glommio. The idea of the "recv" giving you a buffer without you passing one in isn't really used outside of some lower-level io_uring crates. At a bare minimum, even if DPDK managed to offer an API that works exactly the same ways as io_uring or epoll, we would still need to go to all of the async runtimes and get them to plumb DPDK support in or approve someone from the DPDK community maintaining support. If we don't offer that API, then we either need rewrites inside of the async runtimes or for individual libraries to provide DPDK support, which is going to be even more difficult.
> > >
> > > Regarding traits used for IO, correct many are focussed on "recv" giving you a buffer, but not all. Look at Monoio, specifically the *Rent APIs:
> > https://docs.rs/monoio/latest/monoio/io/index.html#traits
> >
> > As far as I can tell, the *Rent APIs for Monoio have the same problem, they require you to pass in a buffer, and to satisfy that API we'd need to throw out zero copy, pass that buffer directly to the PMD, or do some weird thing where we use that API to recycle buffers back into the mempool. I see, in Monoio terms, a DPDK API looking more like TcpStream::read(&mut self) -> impl Future<Output = BufResult<usize, dpdk::PktMbuf>> or some equivalent abstraction on top.
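
The difference between the two API shapes can be sketched synchronously with std only. Everything below is hypothetical (Mempool, PktMbuf, FakeRxq, ReadInto and RecvOwned are stand-ins, not DPDK or Monoio types): read() fills a caller-supplied buffer and therefore copies, while recv() hands out an owned buffer that goes back to the pool when dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in for a mempool: a stack of reusable buffers.
pub struct Mempool {
    free: RefCell<Vec<Vec<u8>>>,
}

impl Mempool {
    pub fn new(count: usize, buf_len: usize) -> Rc<Self> {
        Rc::new(Mempool { free: RefCell::new(vec![vec![0u8; buf_len]; count]) })
    }

    pub fn available(&self) -> usize {
        self.free.borrow().len()
    }
}

/// Stand-in for an mbuf: owns its buffer and recycles it on drop.
pub struct PktMbuf {
    data: Vec<u8>,
    len: usize,
    pool: Rc<Mempool>,
}

impl PktMbuf {
    pub fn payload(&self) -> &[u8] {
        &self.data[..self.len]
    }
}

impl Drop for PktMbuf {
    fn drop(&mut self) {
        let buf = std::mem::take(&mut self.data);
        self.pool.free.borrow_mut().push(buf);
    }
}

/// Socket-style shape: the caller passes a buffer in.
pub trait ReadInto {
    fn read(&mut self, buf: &mut [u8]) -> usize;
}

/// DPDK-style shape: the receive call gives a buffer out.
pub trait RecvOwned {
    fn recv(&mut self) -> Option<PktMbuf>;
}

/// Fake queue whose "wire" is a list of pending frames.
pub struct FakeRxq {
    pool: Rc<Mempool>,
    wire: Vec<Vec<u8>>,
}

impl FakeRxq {
    pub fn new(pool: Rc<Mempool>, wire: Vec<Vec<u8>>) -> Self {
        FakeRxq { pool, wire }
    }
}

impl RecvOwned for FakeRxq {
    fn recv(&mut self) -> Option<PktMbuf> {
        if self.wire.is_empty() {
            return None;
        }
        let mut data = self.pool.free.borrow_mut().pop()?;
        let frame = self.wire.remove(0);
        // This copy simulates the NIC writing into the pool buffer.
        let len = frame.len().min(data.len());
        data[..len].copy_from_slice(&frame[..len]);
        Some(PktMbuf { data, len, pool: Rc::clone(&self.pool) })
    }
}

impl ReadInto for FakeRxq {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        match self.recv() {
            Some(mbuf) => {
                // Satisfying the socket-style API costs an extra copy.
                let payload = mbuf.payload();
                let n = payload.len().min(buf.len());
                buf[..n].copy_from_slice(&payload[..n]);
                n
            }
            None => 0,
        }
    }
}
```

An async version would wrap recv() in a future, but the ownership question (who supplies the buffer) is the same.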
> >
> > > Owen wrote:
> > > > I agree that forcing lcore pinnings and mappings isn't good, but I think that DPDK is well within its rights to build its own async runtime which exposes a standard API. For one thing, the first thing Rust users will ask for is a TCP stack, which the community has been discussing and debating for a long time. I think we should figure out whether the goal is to allow DPDK applications to be written in Rust, or to allow generic Rust applications to use DPDK. The former means that the audience would likely be Rust-fluent people who would have used DPDK regardless, and are fine dealing with mempools, mbufs, the eal, and ethdev configuration. The latter is a much larger audience who is likely going to be less tolerant of dpdk-rs exposing the true complexity of using DPDK. Yes, Rust can help make the abstractions better, but there's an amount of inherent complexity in "Your NIC can handle IPSec for you and can also direct all IPv6 traffic to one core" that I don't think we can remove.
> > >
> > > Ok, we're getting very far into future/conceptual design here.
> > > For me, DPDK having its own async runtime and its own DPDK TCP stack is NOT the goal.
> > > We should try to integrate DPDK with existing software environments - not rewrite the world.
> >
> > Which existing software environments are you thinking of exactly? Most Rust applications that use networking are going to be using Axum, Tower, and the other crates that you've mentioned, and all of those rely on having a TCP stack to be useful. I have found vanishingly few Rust crates which handle integration with DPDK without me editing them to some degree. I'd like to know where you're finding existing Rust software environments which don't care about the presence of a network stack but are still networking oriented. If the goal is to take a DPDK application that would have been written in C/C++ and write it in Rust instead, that is very different than taking an application which would have happily used the OS network stack, such as an HTTP server which deals with normal (<1k RPS) amounts of traffic, and moving it onto DPDK, and it seems to me like you are suggesting that we should focus on the latter.
>
> As above, DPDK-rs is for accelerated packet I/O. Perhaps with some offload features etc in future,
> but fundamentally it's a high-speed packet I/O library.
> 
> Other libraries can build on top, I've done a small (sorry for the pun!) example with SmolTCP,
> and integrating DPDK into the "phy" device abstraction: it is not difficult. This provides a route
> to TCP with high performance I/O under the hood...
> 
> So you mention "HTTP is <1k RPS", that assumption is not correct in all cases.
> Use-cases like Next-Gen-FireWall (NGFW) and Reverse-proxy require L7 HTTP processing.
> Some even go as far as doing "TLS bumping" (aka MITM inspection; eg internally in a company network).
>
> In these cases, the requirement for L7 HTTP(s) parsing, TLS decrypt/DPI/crypt is huge, with
> DPDK levels of performance absolutely being required (or scaling to 100s of boxes doing <1k RPS each!)

I must have spent too long away from DPDK, because when I think of a typical networked application, in Rust or other languages, I think of CRUD apps. I agree that NGFW and L7 proxies/load balancers are better use-cases for DPDK than low request rate HTTP servers. If DPDK ends where it does now or at a slightly higher level (ex: provide "IP sockets" and handle ARP/neighbor discovery to ease adoption), then I think there's a lot more space for applications to integrate DPDK without DPDK being forced to conform to legacy APIs. It also provides space for DPDK to provide integrated APIs that properly leverage the hardware.

> I believe the above cases are not easily catered for, because the projects (e.g, Snort, Envoy)
> were mostly designed in a pre-DPDK era, and hence expect kernel/FD based I/O. I believe that the lack
> of clear C-API abstraction into L7/HTTP layers has stifled some of those projects from consuming DPDK.

Strong agree, Rust should help with that abstraction. 

> So yes, DPDK-rs initially should focus on core priorities: L2 ethernet I/O.
> But because the abstractions are more easily ported in Rust, ensuring we don't "design out" these
> other use-cases is very important to me - I believe it can expand the potential use-cases for the
> core DPDK functionality (Ethdev and the PMDs) a lot.

I think that's good for an MVP. Once we've gotten the basics working, I also think it would be useful to provide abstractions for the security library and other things that DPDK can hardware accelerate, provided we can implement robust software fallbacks.

> > > Owen wrote:
> > > > I personally think that the way forward is making an API for DPDK applications to be written in Rust, and then steadily adding abstractions on top of that until we arrive at something that someone who has never looked at a TCP header can use without too much confusion. That was part of the goal of the Iris project I pitched (and then had to go finish another project so the design is still WIP). I think that a move to DPDK is going to be as radical of a change as a move to io_uring, however, DPDK is fast enough that I think it may be possible to convince people to do a rewrite once we arrive at that high level API.
> > >
> > > I haven't heard of the Iris project you mentioned, is there something concrete to learn from, or is it too WIP to apply?
> >
> > I have some design docs, but nothing concrete. I got pulled back to another project which is still ongoing shortly after I gave the talk at the last DPDK summit. The main goal of Iris is to provide a DPDK-based alternative to something like gRPC with a message-based API instead of a byte-based one, and to take advantage of the massive amount of extra breathing room under that new API (as compared to TCP) to plumb in the various accelerators integrated into DPDK alongside a network stack. It's based on observations that many developers aren't even working at a TCP or HTTP level any more, but are instead using "JSON RPC over HTTPS which is automatically converted into objects by their HTTP server framework" or something like gRPC to have a "send message to server" and "get message from server" API. Most of what I have for that is a lot of time spent thinking about a Rust-based API on top of DPDK as a foundation for building the rest of the network stack on top.

> Wow, big project goals; interesting. (Techie note, check out Zenoh, and check how SmolTCP allocates its rx/tx buffers in hugepages, lots of cool potential here!)

Well, large project goals are appropriate for a PhD dissertation project. Zenoh looks interesting, and is something that I'll take a closer look at. Iris is closer to a transport protocol than a pub/sub abstraction, and is designed with the idea of "What if I designed a transport protocol for DPDK, to sit on top of DPDK's APIs and make use of all DPDK has to offer?", but they seem to have some interesting ideas that I might use for handling "reliable" (as in TCP) multicast, something database people have increasing interest in. 

> As above, I think DPDK-rs should focus on "Safe L2 packet I/O" for Rust. So while "cool stuff" above, my focus is on a good/safe L2 API first and foremost.

That makes sense. 

> > Owen wrote:
> > > "Swap out your sockets and rework the functions that do network IO for a 5x performance increase" is a very, very attractive offer, but for us to get there I think we need to have DPDK's full potential available in Rust, and then build as many zero-overhead (zero cost or you couldn't write it better yourself) abstractions as we can on top. I want to avoid a situation where we build up to the high-level APIs as fast as we can and then end up in a situation where you have "Easy Mode" and then "C DPDK written in Rust" as your two options.
> >
> > My perspective is that we're carefully designing "Safe Rust" APIs, and will have "DPDKs full potential" as a result.
> > I'm not sure where the "easy mode" comment applies. But let's focus on code - and making concrete progress - over theoretical discussions.
> >
> > I'll keep my input more concise in future, and try to get more patches on list for review.
> > > > Regards,
> > > > Gregory
> > >
> > > Apologies for the long-form, "wall of text" email, but I hope it captures the nuance of threading and
> > > async runtimes, which I believe in the long term will be very nice to capture "async offload" use-cases
> > > for DPDK. To put it another way, lookaside processing can be hidden behind async functions & runtimes,
> > > if we design the APIs right: and that would be really cool for making async-offload code easy to write correctly!
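
One possible shape for "lookaside processing hidden behind async functions", as a std-only sketch: OffloadJob, submit, process_packet and block_on are hypothetical names, the "device" is simulated by a countdown, and a real runtime would replace the busy-polling block_on.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical lookaside job, e.g. a crypto op submitted to a device.
/// `polls_until_done` simulates device latency.
pub struct OffloadJob {
    polls_until_done: u32,
    input: u32,
}

impl Future for OffloadJob {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polls_until_done == 0 {
            // Pretend the device produced a result (here: input doubled).
            Poll::Ready(self.input * 2)
        } else {
            self.polls_until_done -= 1;
            // Ask to be polled again, like a poll-mode completion check.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Enqueue work on the (simulated) device and get a future for the result.
pub fn submit(input: u32) -> OffloadJob {
    OffloadJob { polls_until_done: 3, input }
}

/// Application code reads sequentially; the offload wait is behind `.await`.
pub async fn process_packet(len: u32) -> u32 {
    let digest = submit(len).await;
    digest + 1
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for this sketch only.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The caller of process_packet never sees the enqueue/dequeue split of the lookaside device; that is the ergonomic win being described.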
> > >
> > > Regards, -Harry
> > >
> > > Sorry for my own walls of text. As a consequence of working on Iris I've spent a lot of time thinking about how to make DPDK easier to use while keeping the performance intact, and I was already thinking in Rust since it provides one of the better options for these kinds of abstractions (the other option I see is Mojo, which isn't ready yet). I want to see DPDK become more accessible, but the performance and access to hardware is one of the main things that make DPDK special, so I don't want to compromise that. I definitely agree that we need to force DPDK's existing APIs to justify themselves in the face of the new capabilities of Rust, but I think that starting from "How are Rust applications written today?" is a mistake.
> > > 

<snip>

> Thanks, good input! Regards, -Harry

Happy to provide input, 
Owen


end of thread, other threads:[~2025-05-10 16:06 UTC | newest]

Thread overview: 20+ messages
2025-04-17 15:10 [PATCH] rust: RFC/demo of safe API for Dpdk Eal, Eth and Rxq Harry van Haaren
2025-04-17 18:58 ` Etelson, Gregory
2025-04-18 11:40   ` Van Haaren, Harry
2025-04-20  8:57     ` Gregory Etelson
2025-04-24 16:06       ` Van Haaren, Harry
2025-04-27 18:50         ` Etelson, Gregory
2025-04-30 18:28           ` Gregory Etelson
2025-05-01  7:44             ` Bruce Richardson
2025-05-02 12:46               ` Etelson, Gregory
2025-05-02 13:58                 ` Van Haaren, Harry
2025-05-02 15:41                   ` Gregory Etelson
2025-05-02 15:57                     ` Bruce Richardson
2025-05-03 17:13                   ` Owen Hilyard
2025-05-06 16:39                     ` Van Haaren, Harry
2025-05-08 23:53                       ` Owen Hilyard
2025-05-09 16:24                         ` Van Haaren, Harry
2025-05-10 16:05                           ` Owen Hilyard
2025-04-18 13:23 ` [PATCH 1/3] " Harry van Haaren
2025-04-18 13:23   ` [PATCH 2/3] rust: split main into example, refactor to lib.rs Harry van Haaren
2025-04-18 13:23   ` [PATCH 3/3] rust: showcase port Rxq return for stop() and reconfigure Harry van Haaren
