Date: Thu, 30 Jul 2015 09:17:53 -0700
From: Stephen Hemminger
To: Vlad Zolotarov
Cc: "dev@dpdk.org"
Message-ID: <20150730091753.1af6cc67@urahara>
In-Reply-To: <55BA3B5D.4020402@cloudius-systems.com>
References: <55BA3B5D.4020402@cloudius-systems.com>
Subject: Re: [dpdk-dev] RFC: i40e xmit path HW limitation

On Thu, 30 Jul 2015 17:57:33 +0300
Vlad Zolotarov wrote:

> Hi, Konstantin, Helin,
> there is a documented limitation of the xl710 controllers (i40e driver)
> which is not handled in any way by the DPDK driver.
> From the datasheet, chapter 8.4.1:
>
> "• A single transmit packet may span up to 8 buffers (up to 8 data
> descriptors per packet, including both the header and payload buffers).
> • The total number of data descriptors for the whole TSO (explained
> later on in this chapter) is unlimited as long as each segment within
> the TSO obeys the previous rule (up to 8 data descriptors per segment
> for both the TSO header and the segment payload buffers)."
>
> This means that, for instance, a long cluster with small fragments has
> to be linearized before it may be placed on the HW ring.
> In more standard environments like the Linux or FreeBSD drivers the
> solution is straightforward - call skb_linearize()/m_collapse()
> respectively.
> In a non-conformist environment like DPDK, life is not that easy -
> there is no easy way to collapse the cluster into a linear buffer from
> inside the device driver, since the device driver doesn't allocate
> memory on the fast path and uses only the user-allocated pools.
>
> Here are two proposals for a solution:
>
> 1. We may provide a callback that would return TRUE if a given
>    cluster has to be linearized, and it should always be called before
>    rte_eth_tx_burst(). Alternatively it may be called from inside
>    rte_eth_tx_burst(), with rte_eth_tx_burst() changed to return an
>    error code when one of the clusters it is given has to be
>    linearized.
> 2. Another option is to allocate a mempool in the driver with
>    elements consuming a single page each (standard 2KB buffers would
>    do). The number of elements in the pool should be the Tx ring
>    length multiplied by "64KB / (linear data length of a buffer in
>    the pool above)". Here I use 64KB as the maximum packet length,
>    not taking into account esoteric things like the "Giant" TSO
>    mentioned in the spec above. Then we may actually go and linearize
>    the cluster if needed on top of the buffers from that pool, post
>    the buffer from the mempool on the HW ring, link the original
>    cluster to the new cluster (using the private data) and release it
>    when the send is done.

Or just silently drop heavily scattered packets (and increment oerrors)
with a PMD_TX_LOG debug message.

I think a DPDK driver doesn't have to accept all possible mbufs and do
extra work. It seems reasonable to expect the caller to be well behaved
in this restricted ecosystem.
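The check behind proposal 1 (is a cluster too scattered for the HW?) can
be sketched in a few lines. This is a hedged illustration, not DPDK code:
`struct mbuf`, `needs_linearize()` and `I40E_TX_MAX_SEG` are stand-in
names modeling only the segment chain, not the real `rte_mbuf` API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one part of rte_mbuf that matters
 * here: a chain of segment buffers, one HW data descriptor each. */
struct mbuf {
    struct mbuf *next;   /* next segment in the chain, NULL at the end */
};

/* xl710 datasheet 8.4.1: a single (non-TSO) transmit packet may span
 * at most 8 data descriptors, header and payload buffers included. */
#define I40E_TX_MAX_SEG 8

/* Returns true when the chain exceeds the HW limit and would have to
 * be linearized (or dropped) before being posted on the Tx ring. */
static bool needs_linearize(const struct mbuf *m)
{
    unsigned int nb_segs = 0;

    for (; m != NULL; m = m->next)
        nb_segs++;
    return nb_segs > I40E_TX_MAX_SEG;
}
```

For TSO the same idea applies per MSS-sized segment rather than per
packet, which makes the walk more involved but the principle identical.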