* Re: [dpdk-dev] proposal: raw packet send and receive API for PMD driver
@ 2015-05-27 4:18 Lin XU
0 siblings, 0 replies; 4+ messages in thread
From: Lin XU @ 2015-05-27 4:18 UTC (permalink / raw)
To: dev
I think it is very important to decouple the PMD driver from the DPDK framework.
(1) Currently, the rte_mbuf struct is too simple, and it is hard to support complex applications such as IPsec, flow control, etc. This key struct should be extensible to support customer-defined management headers and hardware offload features.
(2) To support more NICs.
So, I think it is time to add a new API for PMDs (a non-radical approach), so that developers can add initial callback functions in the PMD for various upper-layer protocol procedures.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [dpdk-dev] proposal: raw packet send and receive API for PMD driver
@ 2015-05-27 14:50 Wiles, Keith
2015-05-27 15:30 ` Venkatesan, Venky
0 siblings, 1 reply; 4+ messages in thread
From: Wiles, Keith @ 2015-05-27 14:50 UTC (permalink / raw)
To: Lin XU, dev
On 5/26/15, 11:18 PM, "Lin XU" <lxu@astri.org> wrote:
>I think it is very important to decouple the PMD driver from the DPDK framework.
> (1) Currently, the rte_mbuf struct is too simple, and it is hard to support
>complex applications such as IPsec, flow control, etc. This key struct
>should be extensible to support customer-defined management headers and
>hardware offload features.
I was wondering whether adding something like M_EXT support for external
storage to the DPDK mbuf would be more reasonable.
IMO, decoupling the PMDs from DPDK would likely impact performance, and I
would prefer not to let that happen. The drivers are written for performance,
but they did start out as normal FreeBSD/Linux drivers. Most of the core
code in the Intel drivers is shared with other systems.
> (2) To support more NICs.
>So, I think it is time to add a new API for PMDs (a non-radical
>approach), so that developers can add initial callback functions in the
>PMD for various upper-layer protocol procedures.
We have one callback now, I think, but what callbacks do you need?
The only callback I can think of is for a stack to know when it can
release its hold on the data, once it has been transmitted, for retry purposes.
>
>
* Re: [dpdk-dev] proposal: raw packet send and receive API for PMD driver
2015-05-27 14:50 Wiles, Keith
@ 2015-05-27 15:30 ` Venkatesan, Venky
0 siblings, 0 replies; 4+ messages in thread
From: Venkatesan, Venky @ 2015-05-27 15:30 UTC (permalink / raw)
To: Wiles, Keith, Lin XU, dev
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Wiles, Keith
> Sent: Wednesday, May 27, 2015 7:51 AM
> To: Lin XU; dev@dpdk.org
> Subject: Re: [dpdk-dev] proposal: raw packet send and receive API for PMD
> driver
>
>
>
> On 5/26/15, 11:18 PM, "Lin XU" <lxu@astri.org> wrote:
>
> >I think it is very important to decouple the PMD driver from the DPDK framework.
> > (1) Currently, the rte_mbuf struct is too simple, and it is hard to support
> >complex applications such as IPsec, flow control, etc. This key struct
> >should be extensible to support customer-defined management headers
> >and
> >hardware offload features.
>
> I was wondering if adding something like M_EXT support for external storage
> to DPDK MBUF would be more reasonable.
>
> IMO, decoupling the PMDs from DPDK would likely impact performance, and I
> would prefer not to let that happen. The drivers are written for performance,
> but they did start out as normal FreeBSD/Linux drivers. Most of the core
> code in the Intel drivers is shared with other systems.
>
This was an explicit design choice: keep the mbuf simple, yet sufficient to service volume NIC controllers and the limited offloads that they have. I would prefer not to burden rte_mbuf with everything a protocol stack needs - that would simply increase the size of the structure and penalize applications that need a lean structure (like security applications). Extending the mbuf to 128 bytes by itself caused a regression in some performance apps.
That said, extensibility - or, for that matter, a custom-defined header - is possible in at least two ways:
a) the userdata pointer field can be set to point to a data structure that contains more information;
b) you could simply embed the custom structure behind the rte_mbuf (like the pipeline code does).
Either can be used to pass through any information from NICs that support hardware offloads, as well as to carry areas for protocol-stack-specific information (e.g. complete IPsec offload).
> > (2) To support more NICs.
> >So, I think it is time to add a new API for PMDs (a non-radical
> >approach), so that developers can add initial callback functions in the
> >PMD for various upper-layer protocol procedures.
>
> We have one callback now I think, but what callbacks do you need?
>
> The only callback I can think of is for a stack to know when it can release its
> hold on the data as it has been transmitted for retry needs.
> >
> >
The one place where I do think we need a change is the memory allocation framework - allowing external memory allocators (buf alloc/free) so that the driver can run within a completely different memory allocator system. It can be done with the system we have in place today with specific overrides, but it isn't simple. I think there was another request along similar lines on the list. This would pretty much allow a TCP stack, for example, to allocate and manage memory (as long as the driver interface via an mbuf can be maintained). If this is something valuable, we could look at pursuing it for the next release.
-Venky
* [dpdk-dev] proposal: raw packet send and receive API for PMD driver
@ 2015-05-14 0:26 Guojiachun
0 siblings, 0 replies; 4+ messages in thread
From: Guojiachun @ 2015-05-14 0:26 UTC (permalink / raw)
To: dev; +Cc: Jiangtao (IP Arch), Liujun (Johnson)
Hello,
This is our proposal to introduce new PMD APIs; it would make it much easier to integrate DPDK into various applications.
There is a gap in hardware offload support when porting DPDK to a new platform that offers offload features such as a packet accelerator, buffer management, etc.
If we can supplement the API accordingly, it will be easy to port DPDK to a new NIC/platform.
1. Packet buffer management
The PMD driver currently uses the DPDK software mempool API to get/put packet buffers. In other cases, however, the hardware may provide a buffer management unit; we can use the hardware buffer unit in place of the DPDK software mempool to gain efficiency. So we need to register get/put hook APIs with each eth_dev, defined as follows:
/* <obj_table> entries carry both <data: virtual addr> and <phys_addr> */
typedef int (*rbuf_bulk_get_hook)(void *mempool, void **obj_table, unsigned n);
typedef int (*rbuf_bulk_free_hook)(void *memaddr);
typedef int (*eth_dev_init_t)(struct eth_driver *eth_drv,
                              struct rte_eth_dev *eth_dev,
                              rbuf_bulk_get_hook *rbuf_get,
                              rbuf_bulk_free_hook *rbuf_free);
If there is no hardware buffer unit, we can register the current rte_mempool APIs in eth_dev_init(). Each driver then uses these API hooks instead of calling rte_mempool_get_bulk()/rte_mempool_put() directly.
2. Recv/send API and raw_buf
Hardware offload features differ between NICs. The rx/tx offload fields currently defined in rte_mbuf cannot cover every NIC, and sometimes modifying rte_mbuf also requires modifying every PMD driver. We can define a union in rte_rbuf to resolve this:
struct rte_rbuf {
    void *buf_addr;           /**< Virtual address of segment buffer. */
    phys_addr_t buf_physaddr; /**< Physical address of segment buffer. */
    uint16_t buf_len;         /**< Length of segment buffer. */
    uint16_t data_off;
    uint8_t nb_segs;          /**< Number of segments. */
    union {
        struct {
            uint32_t rx_offload_data[8];
            uint32_t tx_offload_data[8];
        } offload_data;
        struct {              /* Intel NIC offload definition */
            uint32_t rss;
            uint64_t tx_offload;
            ...
        } intel_offload;
        /* other NIC offload definitions */
        ...
    };                        /* offload definition */
};
3. RTE_PKTMBUF_HEADROOM
Each PMD driver needs to fill rte_mbuf->data_off according to the macro RTE_PKTMBUF_HEADROOM. But different applications may need different values of RTE_PKTMBUF_HEADROOM, and changing the value requires recompiling all drivers. That means different applications would need different driver libs rather than sharing one.
So we can pass an argument to eth_dev_init() dynamically to replace the macro RTE_PKTMBUF_HEADROOM.
Therefore, we can add the following APIs (with struct rte_rbuf as defined above):
uint16_t rte_eth_tx_raw_burst(uint8_t port_id, uint16_t queue_id,
                              struct rte_rbuf **tx_pkts, uint16_t nb_pkts);
uint16_t rte_eth_rx_raw_burst(uint8_t port_id, uint16_t queue_id,
                              struct rte_rbuf **rx_pkts, uint16_t nb_pkts);
/* <obj_table> entries carry both <data: virtual addr> and <phys_addr> */
typedef int (*rbuf_bulk_get_hook)(void *mempool, void **obj_table, unsigned n);
typedef int (*rbuf_bulk_free_hook)(void *memaddr);
/* 'headroom_offset' replaces the compile-time macro CONFIG_RTE_PKTMBUF_HEADROOM */
typedef int (*eth_dev_init_t)(struct eth_driver *eth_drv,
                              struct rte_eth_dev *eth_dev,
                              rbuf_bulk_get_hook *rbuf_get,
                              rbuf_bulk_free_hook *rbuf_free,
                              uint16_t headroom_offset);
These are my ideas; I hope you can help me improve them.
Thank you!
Jiachun
end of thread, other threads:[~2015-05-27 15:30 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-27 4:18 [dpdk-dev] proposal: raw packet send and receive API for PMD driver Lin XU
-- strict thread matches above, loose matches on Subject: below --
2015-05-27 14:50 Wiles, Keith
2015-05-27 15:30 ` Venkatesan, Venky
2015-05-14 0:26 Guojiachun