To: "dev@dpdk.org", "Ananyev, Konstantin", Helin Zhang
From: Vlad Zolotarov
Message-ID: <55BA3B5D.4020402@cloudius-systems.com>
Date: Thu, 30 Jul 2015 17:57:33 +0300
Subject: [dpdk-dev] RFC: i40e xmit path HW limitation
List-Id: patches and discussions about DPDK
Hi, Konstantin, Helin,

there is a documented limitation of the xl710 controllers (i40e driver) that is not handled in any way by the DPDK driver. From the datasheet, chapter 8.4.1:

"• A single transmit packet may span up to 8 buffers (up to 8 data descriptors per packet including both the header and payload buffers).
• The total number of data descriptors for the whole TSO (explained later on in this chapter) is unlimited as long as each segment within the TSO obeys the previous rule (up to 8 data descriptors per segment for both the TSO header and the segment payload buffers)."

This means that, for instance, a long cluster made of small fragments has to be linearized before it may be placed on the HW ring. In more standard environments such as the Linux or FreeBSD drivers the solution is straightforward: call skb_linearize()/m_collapse() respectively. In a non-conformist environment like DPDK life is not that easy: there is no easy way to collapse the cluster into a linear buffer from inside the device driver, since the device driver doesn't allocate memory on the fast path and uses only the user-allocated pools.

Here are two proposals for a solution:

1. We may provide a callback that returns TRUE if a given cluster has to be linearized, and it should always be called before rte_eth_tx_burst(). Alternatively it may be called from inside rte_eth_tx_burst(), with rte_eth_tx_burst() changed to return some error code when one of the clusters it is given has to be linearized.

2. Another option is to allocate a mempool in the driver whose elements each consume a single page (the standard 2KB buffers would do). The number of elements in the pool should be the Tx ring length multiplied by 64KB/(linear data length of a buffer in the pool above). Here I use 64KB as the maximum packet length, not taking into account esoteric things like the "Giant" TSO mentioned in the spec above.
Then we may actually linearize the cluster, if needed, on top of buffers from the pool above, post the buffer from that mempool on the HW ring, link the original cluster to the new one (using the private data) and release it when the send is done.

The first option is a change in the API and would require some additional handling (linearization) from the application. The second would require some additional memory, but would keep all the dirty details inside the driver and leave the rest of the code intact.

Pls., comment.

thanks,
vlad
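To make the two proposals concrete, here is a minimal sketch in C. It is not DPDK code: `struct mbuf` below is a simplified, hypothetical stand-in for a chained rte_mbuf, `needs_linearize()` is the kind of predicate proposal 1 describes (non-TSO case only: more than 8 chained segments means more than 8 data descriptors), and `linearize_pool_size()` is the sizing rule from proposal 2. Names and signatures are illustrative assumptions, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a chained rte_mbuf (hypothetical, not the DPDK struct). */
struct mbuf {
	struct mbuf *next;      /* next segment in the cluster, NULL-terminated */
	uint16_t data_len;      /* bytes of data in this segment */
};

/* xl710 limit from datasheet 8.4.1: up to 8 data descriptors per packet. */
#define I40E_TX_MAX_SEG 8

/* Proposal 1: predicate the application (or the PMD) would call before
 * rte_eth_tx_burst() for a non-TSO packet. Each segment costs one data
 * descriptor, so a chain longer than 8 segments must be linearized. */
static int needs_linearize(const struct mbuf *m)
{
	unsigned int nseg = 0;

	for (; m != NULL; m = m->next)
		nseg++;
	return nseg > I40E_TX_MAX_SEG;
}

/* Proposal 2: number of elements for the driver-private linearization
 * mempool: Tx ring length times ceil(64KB / element data length). */
static unsigned int linearize_pool_size(unsigned int tx_ring_len,
					unsigned int elem_data_len)
{
	const unsigned int max_pkt_len = 64 * 1024;

	return tx_ring_len * ((max_pkt_len + elem_data_len - 1) / elem_data_len);
}
```

The TSO case would need a sliding-window variant of the check (no MSS-worth of payload, plus header, may span more than 8 descriptors), which is omitted here for brevity.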