From: "Zhang, Helin" <helin.zhang@intel.com>
To: Vlad Zolotarov <vladz@cloudius-systems.com>,
	"Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] i40e xmit path HW limitation
Date: Thu, 30 Jul 2015 16:10:19 +0000	[thread overview]
Message-ID: <F35DEAC7BCE34641BA9FAC6BCA4A12E70A8B7B16@SHSMSX104.ccr.corp.intel.com> (raw)
In-Reply-To: <55BA3B5D.4020402@cloudius-systems.com>



> -----Original Message-----
> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
> Sent: Thursday, July 30, 2015 7:58 AM
> To: dev@dpdk.org; Ananyev, Konstantin; Zhang, Helin
> Subject: RFC: i40e xmit path HW limitation
> 
> Hi, Konstantin, Helin,
> there is a documented limitation of xl710 controllers (i40e driver) which is not
> handled in any way by a DPDK driver.
> From the datasheet, chapter 8.4.1:
> 
> "• A single transmit packet may span up to 8 buffers (up to 8 data descriptors per
> packet including both the header and payload buffers).
> • The total number of data descriptors for the whole TSO (explained later on in
> this chapter) is unlimited as long as each segment within the TSO obeys the
> previous rule (up to 8 data descriptors per segment for both the TSO header and
> the segment payload buffers)."
Yes, I remember the RX side supports only 5 segments per packet on receive.
But what is the possible issue you have in mind?
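To make the limit concrete: for a non-TSO packet, each mbuf segment consumes one
data descriptor, so a pre-flight check boils down to a segment count. A minimal
sketch (the macro and function names are made up for illustration, not taken
from the i40e driver):

#include <rte_mbuf.h>

#define XL710_TX_MAX_SEG 8  /* max data descriptors per packet (datasheet 8.4.1) */

/* Return 1 if the mbuf chain fits a single packet's descriptor budget. */
static inline int
tx_pkt_fits(const struct rte_mbuf *m)
{
        return m->nb_segs <= XL710_TX_MAX_SEG;
}

The TSO case is stricter: the 8-descriptor budget applies per MSS-sized segment
(header plus payload), so counting nb_segs alone is not sufficient there.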

> 
> This means that, for instance, a long cluster with small fragments has to be
> linearized before it can be placed on the HW ring.
What size are the small fragments? Basically, 2KB is the default mbuf size in most of the
example applications, and 2KB x 8 is bigger than 1.5KB, so it is enough for the maximum
packet size we support.
If 1KB mbufs are used, don't expect to transmit packets bigger than 8KB.
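Spelling that arithmetic out (8 is the per-packet descriptor limit from 8.4.1;
the buffer sizes are the usual example-app defaults, nothing mandated by the
hardware):

    8 segments x 2048B data room = 16KB  > 1518B max standard Ethernet frame
    8 segments x 1024B data room =  8KB  -> packets above 8KB cannot be sent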

> In more standard environments like Linux or FreeBSD drivers the solution is
> straightforward - call skb_linearize()/m_collapse() respectively.
> In a non-conformist environment like DPDK, life is not that easy - there is no
> easy way to collapse the cluster into a linear buffer from inside the device driver,
> since the device driver doesn't allocate memory on the fast path and utilizes
> only the user-allocated pools.
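For reference, a rough sketch of what an m_collapse()-style helper would look
like on top of the mbuf API, assuming the caller supplies a pool of
large-enough buffers; offload-metadata copying is elided and the function name
is hypothetical:

#include <rte_mbuf.h>
#include <rte_memcpy.h>

/* Copy a multi-segment chain into one flat mbuf taken from 'pool'.
 * Frees the original chain on success; returns NULL if the packet
 * does not fit into a single buffer or allocation fails. */
static struct rte_mbuf *
pktmbuf_collapse(struct rte_mempool *pool, struct rte_mbuf *m)
{
        struct rte_mbuf *flat = rte_pktmbuf_alloc(pool);
        const struct rte_mbuf *seg;

        if (flat == NULL)
                return NULL;
        if (rte_pktmbuf_tailroom(flat) < rte_pktmbuf_pkt_len(m)) {
                rte_pktmbuf_free(flat);
                return NULL;
        }
        for (seg = m; seg != NULL; seg = seg->next)
                rte_memcpy(rte_pktmbuf_append(flat, seg->data_len),
                           rte_pktmbuf_mtod(seg, const void *),
                           seg->data_len);
        rte_pktmbuf_free(m);    /* release the original cluster */
        return flat;
}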

> 
> Here are two proposals for a solution:
> 
>  1. We may provide a callback that would return TRUE to the user if a
>     given cluster has to be linearized, and it should always be called
>     before rte_eth_tx_burst(). Alternatively it may be called from
>     inside rte_eth_tx_burst(), and rte_eth_tx_burst() is changed to
>     return some error code for the case when one of the clusters it is
>     given has to be linearized.
>  2. Another option is to allocate a mempool in the driver with the
>     elements consuming a single page each (standard 2KB buffers would
>     do). The number of elements in the pool should be the Tx ring
>     length multiplied by "64KB/(linear data length of the buffer in
>     the pool above)". Here I use 64KB as the maximum packet length,
>     not taking into account esoteric things like the "Giant" TSO
>     mentioned in the spec above. Then we may actually go and linearize
>     the cluster if needed on top of the buffers from that pool, post
>     the buffer from the mempool on the HW ring, link the original
>     cluster to the new one (using the private data) and release it
>     when the send is done (see the sizing sketch after this list).
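Sizing that pool per the formula in option 2 comes out as follows (a minimal
sketch; the names are illustrative, while rte_pktmbuf_pool_create() is the
stock mbuf-pool helper):

#include <rte_mbuf.h>

#define LIN_BUF_LEN   2048                        /* standard 2KB buffers */
#define MAX_PKT_LEN   (64 * 1024)                 /* 64KB cap, ignoring "Giant" TSO */
#define BUFS_PER_PKT  (MAX_PKT_LEN / LIN_BUF_LEN) /* 32 buffers per worst-case packet */

/* E.g. a 512-entry Tx ring needs 512 * 32 = 16384 elements. */
static struct rte_mempool *
create_linearization_pool(unsigned tx_ring_len, int socket_id)
{
        return rte_pktmbuf_pool_create("i40e_lin_pool",
                                       tx_ring_len * BUFS_PER_PKT,
                                       0 /* cache size */, 0 /* priv size */,
                                       LIN_BUF_LEN + RTE_PKTMBUF_HEADROOM,
                                       socket_id);
}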
> 
> 
> The first is a change in the API and would require some additional
> handling (linearization) from the application. The second would require
> some additional memory but would keep all the dirty details inside the
> driver and leave the rest of the code intact.
> 
> Pls., comment.
> 
> thanks,
> vlad
> 

