From: Avi Kivity
To: Stephen Hemminger, Vlad Zolotarov
Cc: "dev@dpdk.org"
Date: Thu, 30 Jul 2015 19:20:22 +0300
Subject: Re: [dpdk-dev] RFC: i40e xmit path HW limitation

On 07/30/2015 07:17 PM, Stephen Hemminger wrote:
> On Thu, 30 Jul 2015 17:57:33 +0300
> Vlad Zolotarov wrote:
>
>> Hi, Konstantin, Helin,
>> there is a documented limitation of the xl710 controllers (i40e
>> driver) which is not handled in any way by the DPDK driver.
>> From the datasheet, chapter 8.4.1:
>>
>> "• A single transmit packet may span up to 8 buffers (up to 8 data
>> descriptors per packet including both the header and payload buffers).
>> • The total number of data descriptors for the whole TSO (explained
>> later on in this chapter) is unlimited as long as each segment within
>> the TSO obeys the previous rule (up to 8 data descriptors per segment
>> for both the TSO header and the segment payload buffers)."
>>
>> This means that, for instance, a long cluster with small fragments
>> has to be linearized before it may be placed on the HW ring.
>> In more standard environments like Linux or FreeBSD, the driver-side
>> solution is straightforward: call skb_linearize() or m_collapse(),
>> respectively. In a non-conformist environment like DPDK, life is not
>> that easy: there is no easy way to collapse the cluster into a linear
>> buffer from inside the device driver, since the device driver doesn't
>> allocate memory on the fast path and uses only the user-allocated
>> pools.
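
[For illustration, a minimal sketch of such a check on the caller's
side, covering only the non-TSO case in which every mbuf in a chain
consumes one data descriptor; I40E_TX_MAX_SEG and xmit_needs_linearize()
are hypothetical names, not existing DPDK API:]

#include <rte_mbuf.h>

/* Datasheet 8.4.1: a non-TSO packet may span at most 8 data
 * descriptors; each mbuf in the chain uses one descriptor. */
#define I40E_TX_MAX_SEG 8

static inline int
xmit_needs_linearize(const struct rte_mbuf *m)
{
        return m->nb_segs > I40E_TX_MAX_SEG;
}
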
>> Here are two proposals for a solution:
>>
>> 1. We may provide a callback that would return TRUE if a given
>>    cluster has to be linearized, and it should always be called
>>    before rte_eth_tx_burst(). Alternatively, it may be called from
>>    inside rte_eth_tx_burst(), and rte_eth_tx_burst() is changed to
>>    return an error code for the case when one of the clusters it's
>>    given has to be linearized.
>> 2. Another option is to allocate a mempool in the driver with
>>    elements consuming a single page each (standard 2KB buffers would
>>    do). The number of elements in the pool should be the Tx ring
>>    length multiplied by "64KB/(linear data length of a buffer in the
>>    pool above)". Here I use 64KB as the maximum packet length and am
>>    not taking into account esoteric things like the "Giant" TSO
>>    mentioned in the spec above. Then we may actually go and linearize
>>    the cluster if needed on top of the buffers from the pool above,
>>    post the buffer from that mempool on the HW ring, link the
>>    original cluster to the new one (using the private data) and
>>    release it when the send is done [a sketch of the copy step
>>    appears below the thread].
> Or just silently drop heavily scattered packets (and increment
> oerrors) with a PMD_TX_LOG debug message.
>
> I think a DPDK driver doesn't have to accept all possible mbufs and do
> extra work. It seems reasonable to expect the caller to be well
> behaved in this restricted ecosystem.

How can the caller know what's well behaved? It's device dependent.
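
[For reference, a simplified sketch of the copy step in option 2 above:
it collapses the chain into a single flat buffer rather than a chain of
2KB buffers, copies no mbuf metadata, and frees the original chain
immediately instead of linking it for release on send completion;
linearize_chain() and flat_pool are hypothetical names:]

#include <rte_mbuf.h>
#include <rte_memcpy.h>

static struct rte_mbuf *
linearize_chain(struct rte_mempool *flat_pool, struct rte_mbuf *m)
{
        struct rte_mbuf *flat = rte_pktmbuf_alloc(flat_pool);
        struct rte_mbuf *seg;
        char *dst;

        if (flat == NULL)
                return NULL;
        /* give up if the packet doesn't fit into one flat buffer */
        if (rte_pktmbuf_tailroom(flat) < rte_pktmbuf_pkt_len(m)) {
                rte_pktmbuf_free(flat);
                return NULL;
        }
        /* copy each fragment's payload into the flat buffer */
        for (seg = m; seg != NULL; seg = seg->next) {
                dst = rte_pktmbuf_append(flat, rte_pktmbuf_data_len(seg));
                rte_memcpy(dst, rte_pktmbuf_mtod(seg, void *),
                           rte_pktmbuf_data_len(seg));
        }
        rte_pktmbuf_free(m);    /* simplification; see note above */
        return flat;
}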