From: Vlad Zolotarov <vladz@cloudius-systems.com>
To: "Zhang, Helin", "Ananyev, Konstantin"
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] i40e xmit path HW limitation
Date: Thu, 30 Jul 2015 19:44:24 +0300
Message-ID: <55BA5468.80109@cloudius-systems.com>
References: <55BA3B5D.4020402@cloudius-systems.com>

On 07/30/15 19:10, Zhang, Helin wrote:
>
>> -----Original Message-----
>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
>> Sent: Thursday, July 30, 2015 7:58 AM
>> To: dev@dpdk.org; Ananyev, Konstantin; Zhang, Helin
>> Subject: RFC: i40e xmit path HW limitation
>>
>> Hi Konstantin, Helin,
>> there is a documented limitation of the xl710 controllers (i40e driver)
>> which is not handled in any way by the DPDK driver.
>> From the datasheet, chapter 8.4.1:
>>
>> "• A single transmit packet may span up to 8 buffers (up to 8 data
>> descriptors per packet, including both the header and payload buffers).
>> • The total number of data descriptors for the whole TSO (explained
>> later on in this chapter) is unlimited as long as each segment within
>> the TSO obeys the previous rule (up to 8 data descriptors per segment
>> for both the TSO header and the segment payload buffers)."
> Yes, I remember the RX side just supports 5 segments per packet receiving.
> But what's the possible issue you thought about?

Note that it's the Tx side we are talking about.

See commit 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 in the Linux net-next
repo. If such a cluster arrives and you post it on the HW ring, the HW
will shut this ring down permanently. The application will see that its
ring is stuck.

>
>> This means that, for instance, a long cluster with small fragments has
>> to be linearized before it may be placed on the HW ring.
> What type of size of the small fragments? Basically 2KB is the default
> size of mbufs in most example applications. 2KB x 8 is bigger than 1.5KB,
> so it is enough for the maximum packet size we support.
> If 1KB mbufs are used, don't expect to transmit packets larger than 8KB.

I kinda lost you here. Again, we are talking about the Tx side here, and
the buffers are not necessarily completely filled. Namely, there may be a
cluster with 15 fragments of 100 bytes each.

>
>> In more standard environments like Linux or FreeBSD drivers the solution
>> is straightforward - call skb_linearize()/m_collapse() correspondingly.
>> In a non-conformist environment like DPDK life is not that easy - there
>> is no easy way to collapse the cluster into a linear buffer from inside
>> the device driver, since the device driver doesn't allocate memory in
>> the fast path and utilizes only the user-allocated pools.
>> Here are two proposals for a solution:
>>
>> 1. We may provide a callback that would return TRUE to the user if a
>>    given cluster has to be linearized; it should always be called
>>    before rte_eth_tx_burst(). Alternatively it may be called from
>>    inside rte_eth_tx_burst(), and rte_eth_tx_burst() is changed to
>>    return some error code for the case when one of the clusters it is
>>    given has to be linearized.
>> 2. Another option is to allocate a mempool in the driver with elements
>>    consuming a single page each (standard 2KB buffers would do). The
>>    number of elements in the pool should be the Tx ring length
>>    multiplied by "64KB / (linear data length of a buffer in the pool
>>    above)". Here I use 64KB as the maximum packet length and do not
>>    take into account esoteric things like the "Giant" TSO mentioned in
>>    the spec above. Then we may actually go and linearize the cluster,
>>    if needed, on top of the buffers from the pool above, post the
>>    buffer from that mempool on the HW ring, link the original cluster
>>    to that new buffer (using the private data) and release it when the
>>    send is done.
>>
>> The first is a change in the API and would require some additional
>> handling (linearization) from the application. The second would require
>> some additional memory, but would keep all the dirty details inside the
>> driver and leave the rest of the code intact.
>>
>> Please comment.
>>
>> thanks,
>> vlad
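
For illustration only, here is a minimal, untested sketch of what the
proposal 1 check and the proposal 2 copy step could look like. The names
i40e_tx_needs_linearize(), i40e_tx_linearize(), "lin_pool" and
I40E_MAX_TX_SEGS are hypothetical and are not existing i40e PMD symbols;
only the non-TSO case is covered (the TSO case would need a sliding-window
count over each MSS-sized segment, as the Linux commit referenced above
does):

#include <stdbool.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <rte_memcpy.h>

/* Hypothetical constant: the datasheet (8.4.1) allows at most 8 data
 * descriptors per packet (and per TSO segment, header included). */
#define I40E_MAX_TX_SEGS 8

/* Proposal 1, non-TSO case: does this mbuf chain need linearization? */
static inline bool
i40e_tx_needs_linearize(const struct rte_mbuf *m)
{
	return m->nb_segs > I40E_MAX_TX_SEGS;
}

/* Proposal 2, copy step: collapse a multi-segment chain into a single
 * buffer taken from a driver-owned pool. "lin_pool" is assumed to have a
 * data room of at least m->pkt_len bytes. Returns NULL on failure. */
static struct rte_mbuf *
i40e_tx_linearize(struct rte_mempool *lin_pool, struct rte_mbuf *m)
{
	struct rte_mbuf *flat = rte_pktmbuf_alloc(lin_pool);
	struct rte_mbuf *seg;
	char *dst;

	if (flat == NULL)
		return NULL;

	for (seg = m; seg != NULL; seg = seg->next) {
		dst = rte_pktmbuf_append(flat, seg->data_len);
		if (dst == NULL) {	/* not enough tailroom */
			rte_pktmbuf_free(flat);
			return NULL;
		}
		rte_memcpy(dst, rte_pktmbuf_mtod(seg, void *), seg->data_len);
	}

	/* The original chain "m" would be linked to "flat" via the private
	 * data and released on Tx completion, as described in proposal 2. */
	return flat;
}

Depending on which variant of proposal 1 is chosen, either the application
calls the check before rte_eth_tx_burst() and collapses the chain itself,
or the PMD performs the check internally and falls back to a copy like the
one above using its private pool.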