From: Vladislav Zolotarov
To: Helin Zhang
Cc: dev@dpdk.org
Date: Thu, 30 Jul 2015 22:25:06 +0300
Subject: Re: [dpdk-dev] i40e xmit path HW limitation

On Jul 30, 2015 22:00, "Zhang, Helin" wrote:
>
> > -----Original Message-----
> > From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
> > Sent: Thursday, July 30, 2015 10:56 AM
> > To: Zhang, Helin; Ananyev, Konstantin
> > Cc: dev@dpdk.org
> > Subject: Re: i40e xmit path HW limitation
> >
> > On 07/30/15 20:33, Zhang, Helin wrote:
> > >
> > >> -----Original Message-----
> > >> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
> > >> Sent: Thursday, July 30, 2015 9:44 AM
> > >> To: Zhang, Helin; Ananyev, Konstantin
> > >> Cc: dev@dpdk.org
> > >> Subject: Re: i40e xmit path HW limitation
> > >>
> > >> On 07/30/15 19:10, Zhang, Helin wrote:
> > >>>> -----Original Message-----
> > >>>> From: Vlad Zolotarov [mailto:vladz@cloudius-systems.com]
> > >>>> Sent: Thursday, July 30, 2015 7:58 AM
> > >>>> To: dev@dpdk.org; Ananyev, Konstantin; Zhang, Helin
> > >>>> Subject: RFC: i40e xmit path HW limitation
> > >>>>
> > >>>> Hi, Konstantin, Helin,
> > >>>> there is a documented limitation of the xl710 controllers (i40e driver)
> > >>>> which is not handled in any way by the DPDK driver.
> > >>>> From the datasheet, chapter 8.4.1:
> > >>>>
> > >>>> "• A single transmit packet may span up to 8 buffers (up to 8 data
> > >>>> descriptors per packet, including both the header and payload buffers).
> > >>>> • The total number of data descriptors for the whole TSO (explained
> > >>>> later on in this chapter) is unlimited as long as each segment
> > >>>> within the TSO obeys the previous rule (up to 8 data descriptors
> > >>>> per segment, for both the TSO header and the segment payload buffers)."
> > >>> Yes, I remember the RX side just supports 5 segments per packet receiving.
> > >>> But what's the possible issue you thought about?
> > >> Note that it's the Tx side we are talking about.
> > >>
> > >> See commit 30520831f058cd9d75c0f6b360bc5c5ae49b5f27 in the linux net-next repo.
> > >> If such a cluster arrives and you post it on the HW ring - HW will
> > >> shut this HW ring down permanently. The application will see that its ring is stuck.
> > > That issue was because of using more than 8 descriptors for a packet for TSO.
> >
> > There is no problem in transmitting a TSO packet with more than 8 fragments.
> > On the opposite - one can't transmit a non-TSO packet with more than 8
> > fragments.
> > One also can't transmit a TSO packet that would contain more than 8 fragments
> > in a single TSO segment, including the TSO headers.
> >
> > Pls., read the HW spec as I quoted above for more details.
> I meant a packet to be transmitted by the hardware, not the TSO packet in memory.
> It could be a segment in a TSO packet in memory.
> The linearize check in the kernel driver is not for TSO only, it is for both the TSO
> and non-TSO cases.

That's what I was trying to tell u. Great, we are on the same page at last... 😉
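For the plain, non-TSO case the check the HW asks for is as trivial as the
sketch below (the macro and the helper are made-up names for illustration
only - they are not existing DPDK or i40e definitions); the per-TSO-segment
flavour of the same rule is sketched further down in this mail:

    #include <stdbool.h>
    #include <rte_mbuf.h>

    /* Datasheet rule #1: a non-TSO packet may use at most 8 data
     * descriptors, and every mbuf segment consumes one data descriptor. */
    #define XL710_MAX_DATA_DESC_PER_PKT 8

    static inline bool
    xmit_pkt_needs_linearize(const struct rte_mbuf *m)
    {
            return m->nb_segs > XL710_MAX_DATA_DESC_PER_PKT;
    }

A chain of 15 mbufs carrying 100 bytes each trips this check even though the
packet itself is tiny - which is exactly the case described below.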
>
> >
> > >>>> This means that, for instance, a long cluster with small fragments
> > >>>> has to be linearized before it may be placed on the HW ring.
> > >>> What type of size of the small fragments? Basically 2KB is the
> > >>> default size of mbuf of most example applications. 2KB x 8 is bigger
> > >>> than 1.5KB. So it is enough for the maximum packet size we supported.
> > >>> If a 1KB mbuf is used, don't expect it can transmit more than 8KB size of packet.
> > >> I kinda lost u here. Again, we talk about the Tx side here and
> > >> buffers are not necessarily completely filled. Namely, there may be a
> > >> cluster with 15 fragments of 100 bytes each.
> > > The root cause is using more than 8 descriptors for a packet.
> >
> > That would be if u would like to SUPER simplify the HW limitation above.
> > In that case u would significantly limit the different packets that may be sent
> > without the linearization.
> >
> > > Linux driver can help
> > > on reducing the number of descriptors to be used by merging small pieces of
> > > payload together, right?
> > > It is not for TSO, it is just for packet transmitting. 2 options in my mind:
> > > 1. Users should ensure they will not use more than 8 descriptors per packet for
> > > transmitting.
> >
> > This requirement is too restricting. Pls., see above.
> >
> > > 2. The DPDK driver should try to merge small packets together for such a case, like
> > > the Linux kernel driver does.
> > > I prefer to use option 1, users should ensure that in the application
> > > or up layer software, and keep the PMD driver as simple as possible.
> >
> > The above statement is super confusing: on the one hand u suggest that the DPDK
> > driver merge the small packets (fragments?) together (how?) and then u
> > immediately propose the user application to do that. Could u, pls., clarify what
> > exactly u suggest here?
> > If that's to leave it to the application - note that it would demand patching all
> > existing DPDK applications that send TCP packets.
> Those are two obvious options. One is to do that in the PMD, the other one is to do
> that in the up layer. I did not mean it needs to do both!

Ok. I just didn't understand where the (2) description ends. Now I get u... 😉
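Just to show what pushing the check into every application would actually
mean, here is a conservative sketch of the TSO flavour of the rule. It
assumes the first mbuf of the chain carries only the headers and uses
made-up names; it is an approximation of the idea, not the exact algorithm
of the Linux driver or of any PMD:

    #include <stdbool.h>
    #include <rte_mbuf.h>

    #define XL710_MAX_DATA_DESC_PER_SEG 8

    /* A TSO segment may use at most 8 data descriptors, one of which we
     * assume is taken by the header buffer.  If every window of 6
     * consecutive payload buffers carries at least tso_segsz bytes, then
     * no MSS-sized segment can span more than 7 payload buffers and the
     * rule holds; otherwise we conservatively ask for linearization. */
    static bool
    tso_pkt_needs_linearize(const struct rte_mbuf *m)
    {
            const struct rte_mbuf *seg;
            uint32_t win[6] = { 0 };  /* sizes of the last 6 payload buffers */
            uint32_t sum = 0;
            unsigned int i = 0, cnt = 0;

            /* 8 buffers or fewer in the whole chain can never break the rule. */
            if (m->nb_segs <= XL710_MAX_DATA_DESC_PER_SEG)
                    return false;

            for (seg = m->next; seg != NULL; seg = seg->next) {
                    sum -= win[i];
                    sum += seg->data_len;
                    win[i] = seg->data_len;
                    i = (i + 1) % 6;
                    if (++cnt >= 6 && sum < m->tso_segsz)
                            return true;
            }
            return false;
    }

Every application sending TCP would have to carry something like this (plus
the actual linearization) in front of rte_eth_tx_burst() - which is exactly
the code duplication I'd like to avoid.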
>
> >
> > > But I have a thought that the maximum number of RX/TX descriptors
> > > should be able to be queried somewhere.
> >
> > There is no such thing as a maximum number of Tx fragments in a TSO case.
> > It's only limited by the Tx ring size.
> Again, it is not for the TSO case only. You are talking about how to implement it?

I understand that, and what I was trying to tell was that any limit we choose
that satisfies the non-TSO case would be too restricting for the TSO case.
Therefore I'd suggest going with the second option and implementing the
merging in the driver. Not only would it be the cleanest and most robust way,
but it would also prevent the tremendous code duplication across all
applications susceptible to this HW limitation.

> Anything missed can be added, as long as it is reasonable.
>
> Regards,
> Helin
>
> > > Regards,
> > > Helin
> > >>>> In more standard environments like Linux or FreeBSD drivers the
> > >>>> solution is straightforward - call skb_linearize()/m_collapse()
> > >>>> correspondingly.
> > >>>> In a non-conformist environment like DPDK life is not that easy -
> > >>>> there is no easy way to collapse the cluster into a linear buffer
> > >>>> from inside the device driver,
> > >>>> since the device driver doesn't allocate memory in the fast path and
> > >>>> utilizes the user-allocated pools only.
> > >>>> Here are two proposals for a solution:
> > >>>>
> > >>>> 1. We may provide a callback that would return TRUE to the user if a given
> > >>>> cluster has to be linearized, and it should always be called before
> > >>>> rte_eth_tx_burst(). Alternatively it may be called from inside
> > >>>> rte_eth_tx_burst(), and rte_eth_tx_burst() is changed to return some
> > >>>> error code for the case when one of the clusters it's given has to be
> > >>>> linearized.
> > >>>> 2. Another option is to allocate a mempool in the driver with the
> > >>>> elements consuming a single page each (standard 2KB buffers would
> > >>>> do). The number of elements in the pool should be the Tx ring length
> > >>>> multiplied by "64KB/(linear data length of the buffer in the pool
> > >>>> above)". Here I use 64KB as a maximum packet length and am not taking
> > >>>> into account esoteric things like the "Giant" TSO mentioned in the
> > >>>> spec above. Then we may actually go and linearize the cluster if
> > >>>> needed on top of the buffers from the pool above, post the buffers
> > >>>> from the mempool above on the HW ring, link the original cluster to
> > >>>> that new cluster (using the private data) and release it when the
> > >>>> send is done.
> > >>>>
> > >>>> The first is a change in the API and would require from the
> > >>>> application some additional handling (linearization). The second
> > >>>> would require some additional memory but would keep all the dirty
> > >>>> details inside the driver and would leave the rest of the code intact.
> > >>>>
> > >>>> Pls., comment.
> > >>>>
> > >>>> thanks,
> > >>>> vlad
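To make proposal (2) above a bit more concrete, below is a rough sketch of
the copy step only, assuming the driver owns a dedicated pool of full-sized
buffers; error handling and the "keep the original cluster linked and free
it on Tx completion" part are left out, and all the names are illustrative
rather than existing i40e PMD functions:

    #include <rte_common.h>
    #include <rte_mbuf.h>
    #include <rte_memcpy.h>

    /* Rebuild a badly fragmented chain on top of full-sized buffers taken
     * from a dedicated pool, so that the resulting chain satisfies the
     * 8-descriptor rules.  The original chain is left untouched here; the
     * caller decides when to release it (per the proposal - on Tx
     * completion). */
    static struct rte_mbuf *
    xmit_linearize_onto_pool(const struct rte_mbuf *m, struct rte_mempool *lin_pool)
    {
            struct rte_mbuf *head = NULL, *tail = NULL;
            const struct rte_mbuf *seg = m;
            uint32_t seg_off = 0;
            uint16_t nb_segs = 0;

            while (seg != NULL) {
                    struct rte_mbuf *dst = rte_pktmbuf_alloc(lin_pool);

                    if (dst == NULL) {
                            rte_pktmbuf_free(head); /* frees the whole new chain */
                            return NULL;
                    }
                    if (head == NULL)
                            head = dst;
                    else
                            tail->next = dst;
                    tail = dst;
                    nb_segs++;

                    /* Fill this buffer to capacity from the source chain. */
                    while (seg != NULL && rte_pktmbuf_tailroom(dst) > 0) {
                            uint32_t n = RTE_MIN((uint32_t)rte_pktmbuf_tailroom(dst),
                                                 (uint32_t)seg->data_len - seg_off);

                            rte_memcpy(rte_pktmbuf_append(dst, n),
                                       rte_pktmbuf_mtod(seg, const char *) + seg_off, n);
                            seg_off += n;
                            if (seg_off == seg->data_len) {
                                    seg = seg->next;
                                    seg_off = 0;
                            }
                    }
            }

            /* Fix up the chain-level metadata and carry over the Tx offload
             * context so that TSO/checksum offloads still apply. */
            head->nb_segs = nb_segs;
            head->pkt_len = m->pkt_len;
            head->ol_flags = m->ol_flags;
            head->tso_segsz = m->tso_segsz;
            head->l2_len = m->l2_len;
            head->l3_len = m->l3_len;
            head->l4_len = m->l4_len;
            return head;
    }

The PMD's xmit path would substitute the returned chain for the original one
on the ring; since every buffer taken from the pool is filled to its full
data room, even a 64KB TSO ends up with one buffer per ~2KB of data, so no
MSS-sized segment can come close to the 8-descriptor limit.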