From: Marc Sune
Date: Sat, 15 Nov 2014 00:44:37 +0100
To: dev@dpdk.org
Subject: Re: [dpdk-dev] Enhance KNI DPDK-app-side to be Multi-Producer/Consumer

On 14/11/14 22:05, Sanford, Robert wrote:
> Hello Thomas,
>
> I want to discuss a small enhancement to KNI that we developed. We wanted
> to send/receive mbufs between one KNI device and multiple cores, but we
> wanted to keep the changes simple. So, here were our requirements:
>
> 1. Don't use heavy synchronization when reading/writing the FIFOs in
> shared memory.
> 2. Don't make any one core (data or control plane) perform too much work
> shuffling mbufs to/from the FIFOs.
> 3. Don't use an intermediate RTE ring to drop off mbufs when another core
> is already accessing the same KNI FIFO.
> 4. Since (for our private use case) we don't need MP/MC on the kernel
> side, don't change the kernel KNI code at all.
> 5. Don't change the request/reply logic. It stays single-threaded on both
> sides.

I've also had a quick look during these last days, but I still need to
allocate some time to look at it carefully. I haven't done anything yet,
though, so I would be very much interested to see your patch.

With the current KNI implementation, and even with the very small patch
([PATCH] Adding RTE_KNI_PREEMPT configuration option) that I sent to the
list (btw, any volunteer to review it?), we are getting a very, very high
delay with a single-core KNI when comparing the direct kernel path (std.
driver) with IGB/IXGB->KNI->KERNEL->KNI->IGB/IXGB (I was expecting a
slightly increased delay, but not a factor of x100 or more, as I am
seeing). In addition, the jitter is really horrendous (buffering between
rte_kni and the kernel module?). Btw, we are using a single kthread
(kni_single).

We cannot confirm yet what I am about to say, but with a sequence of KNIs
(PHY->KNI1->KNI2->...->PHY) we have observed a huge decrease in
performance (even with the NO_PREEMPT patch) and, what worries me more,
I've seen reordering of packets. Of course, we first have to make sure
that the issues commented on above are not a problem of our DPDK-based
switch, but the code is similar to l3fwd (it is partially based on it).
Once I find some time, I will try to modify the l2fwd example to try this
out and isolate the problem.

One last thing we have observed in our application is the huge amount of
mbufs that KNI interfaces require. But this could be a product of the
buffering either in the rte_kni/kernel module or in the kernel itself to
handle the incoming skbs, or a combination. I also need to investigate
this further.

> Here is what we came up with:
>
> 1. Borrow heavily from the librte_ring implementation.
> 2. In librte_kni structure rte_kni, supplement each rte_kni_fifo (tx, rx,
> alloc, and free q) with another private, corresponding structure that
> contains a head, tail, mask, and size field.
> 3. Create kni_fifo_put_mp function with merged logic of kni_fifo_put and
> __rte_ring_mp_do_enqueue. After we update the tail index (which is private
> to the DPDK-app), we update the FIFO write index (shared between app and
> kernel).
> 4. Create kni_fifo_get_mc function with merged logic of kni_fifo_get and
> __rte_ring_mc_do_dequeue. After we update the tail index, update the FIFO
> read index.
> 5. In rte_kni_tx_burst and kni_alloc_mbufs, call kni_fifo_put_mp instead
> of kni_fifo_put.
> 6. In rte_kni_rx_burst and kni_free_bufs, call kni_fifo_get_mc instead of
> kni_fifo_get.
>
> We believe this is a common use case, and thus would like to contribute it
> to dpdk.org.
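To make points 2-4 above a bit more concrete, here is a rough, untested
sketch of what the producer side could look like. It is not the actual
patch: "struct kni_fifo_priv" is a hypothetical name for the app-private
head/tail/mask/size bookkeeping, the shared struct rte_kni_fifo is assumed
unchanged (write/read/len/buffer fields as in librte_kni), and the usual
EAL helpers (rte_atomic32_cmpset, rte_wmb, rte_pause) are assumed to be in
scope, as they would be inside librte_kni.

/*
 * Sketch only: app-private companion of one shared rte_kni_fifo
 * (one instance per tx, rx, alloc and free queue).
 */
struct kni_fifo_priv {
	volatile uint32_t head;  /* next slot to reserve (producer frontier) */
	volatile uint32_t tail;  /* last slot published to the kernel side   */
	uint32_t mask;           /* fifo->len - 1 (len is a power of two)    */
	uint32_t size;           /* fifo->len                                */
};

static inline unsigned
kni_fifo_put_mp(struct rte_kni_fifo *fifo, struct kni_fifo_priv *priv,
		void **data, unsigned n)
{
	const unsigned max = n;
	uint32_t head, next, read, free_entries;
	unsigned i;

	/* Reserve up to n slots by moving the private head with a CAS,
	 * as in the prod.head update of __rte_ring_mp_do_enqueue. */
	do {
		n = max;
		head = priv->head;
		read = fifo->read;         /* advanced by the kernel consumer */
		free_entries = (read - head - 1) & priv->mask;
		if (free_entries == 0)
			return 0;
		if (n > free_entries)
			n = free_entries;
		next = (head + n) & priv->mask;
	} while (!rte_atomic32_cmpset(&priv->head, head, next));

	/* Fill the reserved slots of the shared buffer. */
	for (i = 0; i < n; i++)
		fifo->buffer[(head + i) & priv->mask] = data[i];
	rte_wmb();

	/* Wait until producers that reserved earlier slots have published. */
	while (priv->tail != head)
		rte_pause();

	/* Update the private tail first, then the shared write index that
	 * the kernel polls (point 3 above). */
	priv->tail = next;
	fifo->write = next;
	return n;
}

For brevity the sketch keeps kni_fifo's masked position arithmetic; an
actual implementation may prefer librte_ring's free-running head/tail
counters (masking only when indexing the buffer and when publishing
fifo->write), which is closer to how __rte_ring_mp_do_enqueue works.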
> Questions/comments:
> 1. Are you interested for us to send the changes as an RFC?

I am, personally.

> 2. Do you agree with this approach, or would it be better, say, to rewrite
> both sides of the interface to be more like librte_ring?

I was thinking about something like this. But one of the original authors
can perhaps shed some light here.

> 3. Perhaps we could improve upon kni_allocate_mbufs, as it blindly
> attempts to allocate and enqueue a constant number of mbufs. We have not
> really focused on the kernel ==> app data path, because we were only
> interested in app ==> kernel.

Seen that, and I agree; at first glance it looks like a waste of cycles.
There is a small patch:

[PATCH] kni: optimizing the rte_kni_rx_burst

that seems to partially address this issue, but as far as I have seen it
only addresses the case where no pkts are being sent, not cases where, for
instance...

@Thomas: I rebased this patch to master HEAD, and I (quickly) tried it,
but I don't have results yet. AFAIU from the patch, it only optimizes the
case where no buffers are being sent on that KNI, so the benchmarking
setup actually seems to be slightly more complicated. In any case, perhaps
waiting for Robert's contribution and merging both would make sense (see
also the bounded-refill sketch appended at the end of this message).

Marc

>
> --
> Regards,
> Robert Sanford
>
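Regarding question 3 above and the rte_kni_rx_burst patch mentioned in the
reply: purely as an illustration (not the existing patch, nor a proposed
one), a bounded refill of the alloc queue could look roughly like the
sketch below. It is written as if it lived inside lib/librte_kni/rte_kni.c,
reusing MAX_MBUF_BURST_NUM, kni->alloc_q, kni->pktmbuf_pool and
kni_fifo_put from there; kni_fifo_free_count() is a hypothetical helper
added only for the sketch.

/* Hypothetical helper: free slots left in a KNI FIFO (one slot is
 * always kept empty to distinguish full from empty). */
static inline unsigned
kni_fifo_free_count(const struct rte_kni_fifo *fifo)
{
	return (fifo->read - fifo->write - 1) & (fifo->len - 1);
}

/* Sketch of a bounded variant of kni_allocate_mbufs: refill only the
 * room actually left in the alloc queue instead of a fixed burst. */
static void
kni_allocate_mbufs_bounded(struct rte_kni *kni)
{
	struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
	unsigned room, i, put;

	room = kni_fifo_free_count(kni->alloc_q);
	if (room > MAX_MBUF_BURST_NUM)
		room = MAX_MBUF_BURST_NUM;
	if (room == 0)
		return;              /* alloc_q already full, nothing to do */

	for (i = 0; i < room; i++) {
		pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
		if (pkts[i] == NULL)
			break;       /* out of mbufs, enqueue what we have */
	}

	put = kni_fifo_put(kni->alloc_q, (void **)pkts, i);
	/* Free anything that did not fit (e.g. another core refilled
	 * the alloc_q concurrently). */
	while (put < i)
		rte_pktmbuf_free(pkts[put++]);
}

The point is simply that sizing the allocation to the free space in
alloc_q avoids both allocating mbufs that are immediately freed again and
the fixed-burst behaviour discussed above; how that interacts with the
mbuf pressure mentioned earlier in the thread would still need measuring.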