From mboxrd@z Thu Jan 1 00:00:00 1970
To: "Ananyev, Konstantin", Avi Kivity, Thomas Monjalon, "didier.pallard"
References: <1439489195-31553-1-git-send-email-vladz@cloudius-systems.com> <55F2F6A9.6080405@cloudius-systems.com> <3734976.j9Azrvq6io@xps13> <55F313E4.2080300@cloudius-systems.com> <2601191342CEEE43887BDE71AB97725836A85E36@irsmsx105.ger.corp.intel.com>
From: Vlad Zolotarov
Message-ID: <55F56B1B.80606@cloudius-systems.com>
Date: Sun, 13 Sep 2015 15:24:59 +0300
In-Reply-To: <2601191342CEEE43887BDE71AB97725836A85E36@irsmsx105.ger.corp.intel.com>
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

On 09/13/15 14:47, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Avi Kivity
>> Sent: Friday, September 11, 2015 6:48 PM
>> To: Thomas Monjalon; Vladislav Zolotarov; didier.pallard
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598
>>
>> On 09/11/2015 07:08 PM, Thomas Monjalon wrote:
>>> 2015-09-11 18:43, Avi Kivity:
>>>> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote:
>>>>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" wrote:
>>>>>> 2015-09-11 17:47, Avi Kivity:
>>>>>>> On 09/11/2015 05:25 PM, didier.pallard wrote:
>>>>>>>> Hi vlad,
>>>>>>>>
>>>>>>>> Documentation states that a packet (or multiple packets in transmit
>>>>>>>> segmentation) can span any number of
>>>>>>>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH
>>>>>>>> minus 2.
>>>>>>>>
>>>>>>>> Shouldn't there be a test in transmit function that drops properly the
>>>>>>>> mbufs with a too large number of
>>>>>>>> segments, while incrementing a statistic; otherwise transmit function
>>>>>>>> may be locked by the faulty packet without
>>>>>>>> notification.
>>>>>>>>
>>>>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose to the
>>>>>>> application, an mbuf check function. This way applications that can
>>>>>>> generate complex packets can verify that the device will be able to
>>>>>>> process them, and applications that only generate simple mbufs can avoid
>>>>>>> the overhead by not calling the function.
>>>>>> More than a check, it should be exposed as a capability of the port.
>>>>>> Anyway, if the application sends too many segments, the driver must
>>>>>> drop it to avoid hang, and maintain a dedicated statistic counter to
>>>>>> allow easy debugging.
>>>>> I agree with Thomas - this should not be optional. Malformed packets
>>>>> should be dropped. In the ixgbe case it's a very simple test - it's a
>>>>> single branch per packet so I doubt that it could impose any
>>>>> measurable performance degradation.
>>>> A drop allows the application no chance to recover. The driver must
>>>> either provide the ability for the application to know that it cannot
>>>> accept the packet, or it must fix it up itself.
>>> I have the feeling that everybody agrees on the same thing:
>>> the application must be able to make a well formed packet by checking
>>> limitations of the port. What about a field rte_eth_dev_info.max_tx_segs?
>> It is not generic enough. i40e has a limit that it imposes post-TSO.
>>
>>
>>> In case the application fails in its checks, the driver must drop it and
>>> notify the user via a stat counter.
>>> The driver can also remove the hardware limitation by gathering the segments
>>> but it may be hard to implement and would be a slow operation.
>> I think that to satisfy both the 64b full line rate applications and the
>> more complicated full stack applications, this must be made optional.
>> In particular, an application that only forwards packets will never hit
>> a NIC's limits, so it need not take any action. That's why I think a
>> verification function is ideal; a forwarding application can ignore it,
>> and a complex application can call it, and if it fails the packet, it
>> can linearize it itself, removing complexity from dpdk itself.
> I think that's a good approach to that problem.
> As I remember we discussed something similar a while ago -
> A function (tx_prep() or something) that would check nb_segs and probably some other HW specific restrictions,
> calculate pseudo-header checksum, reset ip header len, etc.
>
> On the other hand we also can add two more fields into rte_eth_dev_info:
> 1) Max num of segs per TSO packet (tx_max_seg ?).
> 2) Max num of segs per single packet/TSO segment (tx_max_mtu_seg ?).
> So for ixgbe both will have value 40 - wthresh,
> while for i40e 1) would be UINT8_MAX and 2) will be 8.
> Then upper layer can use that information to select an optimal size for its TX buffers.

HW limitations differ from HW to HW not only by values but also by their nature - for instance for Qlogic bnx2x NICs the limitations may not be expressed in the values above so this must be a callback.

>
> Konstantin
>
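
To make the callback idea concrete, here is a minimal sketch in plain C of a per-device check that an application could call before transmit. Everything in it is an assumption made for illustration: struct pkt_buf stands in for the few rte_mbuf fields the check needs, and tx_check_fn, struct port, ixgbe_like_tx_check() and port_tx_check() are hypothetical names, not DPDK API. The ixgbe-style rule uses the "40 minus WTHRESH minus 2" limit quoted from the documentation earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_mbuf fields this check needs. */
struct pkt_buf {
    struct pkt_buf *next;   /* next segment, NULL on the last one */
    uint16_t data_len;      /* bytes held in this segment */
};

/* Hypothetical callback type: true if the device can send pkt as is. */
typedef bool (*tx_check_fn)(const struct pkt_buf *pkt, const void *dev_priv);

/* Hypothetical private data for an ixgbe-like device. */
struct ixgbe_like_limits {
    uint8_t wthresh;        /* TX write-back threshold */
};

/* Hypothetical port object: each driver installs its own check. */
struct port {
    tx_check_fn tx_check;   /* NULL means no known restriction */
    const void *priv;
};

/* Walk the segment chain and count its length. */
static unsigned count_segs(const struct pkt_buf *pkt)
{
    unsigned n = 0;

    assert(pkt != NULL);
    for (; pkt != NULL; pkt = pkt->next)
        n++;
    return n;
}

/* ixgbe-style rule from the thread: at most 40 - WTHRESH - 2 segments. */
bool ixgbe_like_tx_check(const struct pkt_buf *pkt, const void *dev_priv)
{
    const struct ixgbe_like_limits *lim = dev_priv;

    if (lim->wthresh >= 38)         /* no room left for any segment */
        return false;
    return count_segs(pkt) <= 40u - lim->wthresh - 2u;
}

/* Generic entry point: defer to the driver's callback if there is one. */
bool port_tx_check(const struct port *p, const struct pkt_buf *pkt)
{
    return p->tx_check == NULL || p->tx_check(pkt, p->priv);
}
```

A forwarding application would simply never call port_tx_check(), while a full stack application could call it and linearize the packet itself on failure. A device whose limits cannot be written as one or two numbers would install a different function behind the same pointer, which is the reason for preferring a callback over fixed fields.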