DPDK patches and discussions
* [dpdk-dev] RFC: i40e xmit path HW limitation
From: Vlad Zolotarov @ 2015-07-30 14:57 UTC (permalink / raw)
  To: dev, Ananyev, Konstantin, Helin Zhang

Hi Konstantin, Helin,
there is a documented limitation of the xl710 controllers (i40e driver)
which is not handled in any way by the DPDK driver.
From the datasheet, chapter 8.4.1:

"• A single transmit packet may span up to 8 buffers (up to 8 data descriptors per packet including
both the header and payload buffers).
• The total number of data descriptors for the whole TSO (explained later on in this chapter) is
unlimited as long as each segment within the TSO obeys the previous rule (up to 8 data descriptors
per segment for both the TSO header and the segment payload buffers)."

This means that, for instance, a long cluster with many small fragments has
to be linearized before it may be placed on the HW ring.
In more standard environments like the Linux or FreeBSD drivers the solution
is straightforward: call skb_linearize()/m_collapse() respectively.
In a non-conformist environment like DPDK life is not that easy:
there is no easy way to collapse a cluster into a linear buffer from
inside the device driver,
since the device driver doesn't allocate memory on the fast path and uses
only the user-allocated pools.
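
To make the rule concrete, here is a minimal sketch (not existing driver
code; the constant and function names are invented for illustration) of the
check an application or PMD would have to perform, assuming the simple
non-TSO case where the whole packet is limited to 8 data descriptors:

#include <rte_mbuf.h>

#define XL710_TX_MAX_SEG 8	/* 8 data descriptors per packet, datasheet 8.4.1 */

/* Returns non-zero if the mbuf chain would violate the 8-descriptor rule
 * (each mbuf segment consumes one data descriptor). */
static inline int
xl710_pkt_needs_linearization(const struct rte_mbuf *m)
{
	/* Non-TSO: the whole packet may span at most 8 data descriptors. */
	if (!(m->ol_flags & PKT_TX_TCP_SEG))
		return m->nb_segs > XL710_TX_MAX_SEG;

	/*
	 * TSO: each transmitted segment (header + payload) is limited to
	 * 8 descriptors; a sliding-window walk over the chain against
	 * m->tso_segsz would be needed here and is omitted from this sketch.
	 */
	return 0;
}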

Here are two proposals for a solution:

 1. We may provide a callback that would return TRUE to the user if a given
    cluster has to be linearized, and it should always be called before
    rte_eth_tx_burst(). Alternatively it may be called from inside
    rte_eth_tx_burst(), and rte_eth_tx_burst() would be changed to return an
    error code for the case when one of the clusters it is given has to be
    linearized (a rough sketch of what this could look like follows the
    list).
 2. Another option is to allocate a mempool in the driver with the
    elements consuming a single page each (standard 2KB buffers would
    do). The number of elements in the pool should be the Tx ring length
    multiplied by "64KB / (linear data length of the buffer in the pool
    above)". Here I use 64KB as the maximum packet length and do not take
    into account esoteric things like the "Giant" TSO mentioned in the
    spec above. Then we may actually go and linearize the cluster if
    needed on top of the buffers from the pool above, post the buffer
    from that mempool on the HW ring, link the original cluster to the
    new cluster (using the private data) and release it when the send is
    done.
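
For option 1, a rough, hypothetical sketch of how an application could use
such a check before transmitting; rte_eth_tx_pkt_needs_linearize() and
app_linearize() do not exist in DPDK and are invented names for this
example only:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical ethdev-level query, implemented per PMD. */
int rte_eth_tx_pkt_needs_linearize(uint8_t port_id, uint16_t queue_id,
				   struct rte_mbuf *pkt);

/* Hypothetical application helper that collapses the chain. */
struct rte_mbuf *app_linearize(struct rte_mbuf *pkt);

static uint16_t
app_tx_burst(uint8_t port_id, uint16_t queue_id,
	     struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t i;

	/* Collapse any offending cluster before it reaches the HW ring. */
	for (i = 0; i < nb_pkts; i++) {
		if (rte_eth_tx_pkt_needs_linearize(port_id, queue_id, pkts[i]))
			pkts[i] = app_linearize(pkts[i]);
	}
	return rte_eth_tx_burst(port_id, queue_id, pkts, nb_pkts);
}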


The first is an API change and would require some additional handling
(linearization) from the application. The second would require some
additional memory but would keep all the dirty details inside the driver and
leave the rest of the code intact.
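
For the second option, a back-of-the-envelope sizing example (the numbers,
the pool name and the constants below are assumptions for illustration
only): with a 512-entry Tx ring and standard 2KB buffers, the worst case is
64KB / 2KB = 32 bounce buffers per in-flight packet, i.e. 512 * 32 = 16384
pool elements:

#include <rte_mbuf.h>
#include <rte_mempool.h>

#define TX_RING_SIZE      512
#define MAX_PKT_SIZE      (64 * 1024)
#define BOUNCE_DATA_SIZE  2048
#define BOUNCE_POOL_SIZE  (TX_RING_SIZE * (MAX_PKT_SIZE / BOUNCE_DATA_SIZE))

/* Driver-private pool of linear "bounce" buffers, used only for
 * linearizing offending clusters inside the PMD. */
static struct rte_mempool *
create_tx_bounce_pool(int socket_id)
{
	return rte_pktmbuf_pool_create("i40e_tx_bounce", BOUNCE_POOL_SIZE,
				       256 /* per-lcore cache */, 0 /* priv */,
				       BOUNCE_DATA_SIZE + RTE_PKTMBUF_HEADROOM,
				       socket_id);
}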

Please comment.

thanks,
vlad
