From: Sagi Grimberg
To: Yongseok Koh
Cc: adrien.mazarguil@6wind.com, nelio.laranjeiro@6wind.com, dev@dpdk.org
Date: Thu, 27 Jul 2017 14:12:41 +0300
In-Reply-To: <20170725074356.GA4034@minint-98vp2qg>
Subject: Re: [dpdk-dev] [PATCH] net/mlx5: poll completion queue once per a call

>> Yes I realize that, but can't the device still complete in a burst (of
>> unsuppressed completions)? I mean its not guaranteed that for every
>> txq_complete a signaled completion is pending right? What happens if
>> the device has inconsistent completion pacing? Can't the sw grow a
>> batch of completions if txq_complete will process a single completion
>> unconditionally?

> Speculation. First of all, device doesn't delay completion notifications for no
> reason. ASIC is not a SW running on top of a OS.

I'm sorry, but this statement is not correct. It might hold in a lab
environment, but in practice there are many things that can affect the
device's completion timing.

> If a completion comes up late,
> this means device really can't keep up the rate of posting descriptors.
> If so,
> tx_burst() should generate back-pressure by returning partial Tx, then app can
> make a decision between drop or retry. Retry on Tx means back-pressuring Rx side
> if app is forwarding packets.

Not arguing with that; I was simply suggesting that better heuristics
could be applied than "process one completion unconditionally".

> More serious problem I expected was a case that the THRESH is smaller than
> burst size. In that case, txq->elts[] will be short of slots all the time. But
> fortunately, in MLX PMD, we request one completion per a burst at most, not
> every THRESH of packets.
>
> If there's some SW jitter on Tx processiong, the Tx CQ can grow for sure.
> Question to myself was "when does it shrink?". It shrinks when Tx burst is light
> (burst size is smaller than THRESH) because mlx5_tx_complete() is always called
> every time tx_burst() is called. What if it keeps growing? Then, drop is
> necessary and natural like I mentioned above.
>
> It doesn't make sense for SW to absorb any possible SW jitters. Cost is high.
> It is usually done by increasing queue depth. Keeping steady state is more
> important.

Again, I agree jitters are bad, but with proper heuristics in place mlx5
can still keep a low jitter _and_ consume completions faster than
consecutive tx_burst invocations.

> Rather, this patch is helpful for reducing jitters. When I run a profiler, the
> most cycle-consuming part on Tx is still freeing buffers. If we allow loops on
> checking valid CQE, many buffers could be freed in a single call of
> mlx5_tx_complete() at some moment, then it would cause a long delay. This would
> aggravate jitter.

I didn't dispute that this patch addresses an issue, but mlx5 is a driver
that needs to serve applications which may behave differently from your
test case.

> Of course. I appreciate your time for the review. And keep in mind that nothing
> is impossible in an open source community. I always like to discuss about ideas
> with anyone.
> But I was just asking to hear more details about your suggestion if
> you wanted me to implement it, rather than giving me one-sentence question :-)

Good to know.

>>> Does "budget" mean the
>>> threshold? If so, calculation of stats for adaptive threshold can impact single
>>> core performance. With multiple cores, adjusting threshold doesn't affect much.
>>
>> If you look at mlx5e driver in the kernel, it maintains online stats on
>> its RX and TX queues. It maintain these stats mostly for adaptive
>> interrupt moderation control (but not only).
>>
>> I was suggesting maintaining per TX queue stats on average completions
>> consumed for each TX burst call, and adjust the stopping condition
>> according to a calculated stat.

> In case of interrupt mitigation, it could be beneficial because interrupt
> handling cost is too costly. But, the beauty of DPDK is polling, isn't it?

If you read my comment again, I didn't suggest applying the stats to
interrupt moderation; I only gave that as an example of a use case. I was
suggesting maintaining the online stats to adjust a threshold for how many
completions to process in a tx burst call (instead of processing one
unconditionally).

> And please remember to ack at the end of this discussion if you are okay so that
> this patch can gets merged. One data point is, single core performance (fwd) of
> vectorized PMD gets improved by more than 6% with this patch. 6% is never small.

Yeah, I don't mind merging it, given that I don't have time to come up
with anything better (or worse :))

Reviewed-by: Sagi Grimberg