From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx.bisdn.de (mx.bisdn.de [185.27.182.31]) by dpdk.org (Postfix) with ESMTP id AF4A4568E for ; Thu, 26 Feb 2015 13:56:51 +0100 (CET) Received: from [172.16.250.156] (unknown [172.16.250.156]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mx.bisdn.de (Postfix) with ESMTPSA id 69703A2601; Thu, 26 Feb 2015 13:56:51 +0100 (CET) Message-ID: <54EF1812.50307@bisdn.de> Date: Thu, 26 Feb 2015 13:56:50 +0100 From: Marc Sune User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.4.0 MIME-Version: 1.0 To: "Hemant@freescale.com" , Jay Rolette References: <14248648813214-git-send-email-Hemant@freescale.com> <54EDBC76.2050507@druidsoftware.com> <54EDC23A.2080302@bisdn.de> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: DPDK Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 26 Feb 2015 12:56:51 -0000 On 26/02/15 08:00, Hemant@freescale.com wrote: > >> -----Original Message----- >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jay Rolette >> Sent: 25/Feb/2015 7:00 PM >> To: Marc Sune >> Cc: DPDK >> Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst >> >> On Wed, Feb 25, 2015 at 6:38 AM, Marc Sune wrote: >> >>> On 25/02/15 13:24, Hemant@freescale.com wrote: >>> >>>> Hi OIivier >>>> Comments inline. >>>> Regards, >>>> Hemant >>>> >>>> -----Original Message----- >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Deme >>>>> Sent: 25/Feb/2015 5:44 PM >>>>> To: dev@dpdk.org >>>>> Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst >>>>> >>>>> Thank you Hemant, I think there might be one issue left with the >>>>> patch though. >>>>> The alloc_q must initially be filled with mbufs before getting mbuf >>>>> back on the tx_q. >>>>> >>>>> So the patch should allow rte_kni_rx_burst to check if alloc_q is empty. >>>>> If so, it should invoke kni_allocate_mbufs(kni, 0) (to fill the >>>>> alloc_q with MAX_MBUF_BURST_NUM mbufs) >>>>> >>>>> The patch for rte_kni_rx_burst would then look like: >>>>> >>>>> @@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct >>>>> rte_mbuf **mbufs, unsigned num) >>>>> >>>>> /* If buffers removed, allocate mbufs and then put them into >>>>> alloc_q */ >>>>> if (ret) >>>>> - kni_allocate_mbufs(kni); >>>>> + kni_allocate_mbufs(kni, ret); else if >>>>> + (unlikely(kni->alloc_q->write == kni->alloc_q->read)) >>>>> + kni_allocate_mbufs(kni, 0); >>>>> >>>>> [hemant] This will introduce a run-time check. >>>> I missed to include the other change in the patch. >>>> I am doing it in kni_alloc i.e. initiate the alloc_q with default >>>> burst size. >>>> kni_allocate_mbufs(ctx, 0); >>>> >>>> In a way, we are now suggesting to reduce the size of alloc_q to only >>>> default burst size. >>>> >>> As an aside comment here, I think that we should allow to tweak the >>> userspace <-> kernel queue sizes (rx_q, tx_q, free_q and alloc_q) . >>> Whether this should be a build configuration option or a parameter to >>> rte_kni_init(), it is not completely clear to me, but I guess >>> rte_kni_init() is a better option. >>> >> rte_kni_init() is definitely a better option. It allows things to be tuned based on >> individual system config rather than requiring different builds. >> >> >>> Having said that, the original mail from Hemant was describing that >>> KNI was giving an out-of-memory. This to me indicates that the pool is >>> incorrectly dimensioned. Even if KNI will not pre-allocate in the >>> alloc_q, or not completely, in the event of high load, you will get >>> this same "out of memory". >>> >>> We can reduce the usage of buffers by the KNI subsystem in kernel >>> space and in userspace, but the kernel will always need a small cache >>> of pre-allocated buffers (coming from user-space), since the KNI >>> kernel module does not know where to grab the packets from (which >>> pool). So my guess is that the dimensioning problem experienced by >>> Hemant would be the same, even with the proposed changes. >>> >>> >>>> Can we reach is situation, when the kernel is adding packets faster >>>> in tx_q than the application is able to dequeue? >>>> >>> I think so. We cannot control much how the kernel will schedule the >>> KNI thread(s), specially if the # of threads in relation to the cores >>> is incorrect (not enough), hence we need at least a reasonable amount >>> of buffering to prevent early dropping to those "internal" burst side effects. >>> >>> Marc >> >> Strongly agree with Marc here. We *really* don't want just a single burst worth >> of mbufs available to the kernel in alloc_q. That's just asking for congestion >> when there's no need for it. >> >> The original problem reported by Olivier is more of a resource tuning problem >> than anything else. The number of mbufs you need in the system has to take >> into account internal queue depths. > [hemant] Following are my suggestions for the time being. > 1. The existing code allocates X buffers and try to add them to alloc_q. If alloc_q is not having space, it frees them. This is not optimized at all. In the rx_burst, we shall only add the numbers of packets, as removed from tx_q. Agree > 2. During the kni_alloc, we can set kni_allocate_mbufs X*Y buffers initially for alloc_q. We can further improve it to make it configurable in future enhancements. Currently we can have the value of Y as 2. Provided that the dimensioning (X*Y), if defined in runtime it is set during rte_kni_init(), in principle I agree. However it is not clear to me if you wantg to call kni_allocate_mbufs(X*Y) for every kni_alloc or just in the first one (in other words, if X*Y == size of alloc_q). Since alloc_q is shared and assuming X*Y == size of alloc_q, I think doing it that in the first kni_alloc() would be sufficient, and then it will get refilled once RX/TX events happen. A different approach, that would require more refactor, since it changes slightly the current strategy, would be to pre-alloc the alloc_q based of the number of KNI interfaces created (kni_alloc). In this sense, rte_kni_init() would get then 2 parameters: the length of the entire shared alloc_q (actually all the queues in the KNI subsystem, with the current impl.) and the number of buffers / KNI interface. This approach could lower the mbuf consumption in certain configurations. > 3. kni_allocate_mbufs will allocate as many buffer are requested in function parameter. Agree Marc > >> Jay