From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <marc.sune@bisdn.de>
Received: from mx.bisdn.de (mx.bisdn.de [185.27.182.31])
 by dpdk.org (Postfix) with ESMTP id AF4A4568E
 for <dev@dpdk.org>; Thu, 26 Feb 2015 13:56:51 +0100 (CET)
Received: from [172.16.250.156] (unknown [172.16.250.156])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mx.bisdn.de (Postfix) with ESMTPSA id 69703A2601;
 Thu, 26 Feb 2015 13:56:51 +0100 (CET)
Message-ID: <54EF1812.50307@bisdn.de>
Date: Thu, 26 Feb 2015 13:56:50 +0100
From: Marc Sune <marc.sune@bisdn.de>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Icedove/31.4.0
MIME-Version: 1.0
To: "Hemant@freescale.com" <Hemant@freescale.com>, 
 Jay Rolette <rolette@infiniteio.com>
References: <14248648813214-git-send-email-Hemant@freescale.com>
 <54EDBC76.2050507@druidsoftware.com>
 <BY2PR0301MB069369AA0C7436E5F9A63AE9C2170@BY2PR0301MB0693.namprd03.prod.outlook.com>
 <54EDC23A.2080302@bisdn.de>
 <CADNuJVqw4hBD1kk1Y4A2+8-5yi0A0K9_5zBQaO3zOnnwXC0nog@mail.gmail.com>
 <BY2PR0301MB06939B503E040A8173F21408C2140@BY2PR0301MB0693.namprd03.prod.outlook.com>
In-Reply-To: <BY2PR0301MB06939B503E040A8173F21408C2140@BY2PR0301MB0693.namprd03.prod.outlook.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: DPDK <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 26 Feb 2015 12:56:51 -0000


On 26/02/15 08:00, Hemant@freescale.com wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jay Rolette
>> Sent: 25/Feb/2015 7:00 PM
>> To: Marc Sune
>> Cc: DPDK
>> Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst
>>
>> On Wed, Feb 25, 2015 at 6:38 AM, Marc Sune <marc.sune@bisdn.de> wrote:
>>
>>> On 25/02/15 13:24, Hemant@freescale.com wrote:
>>>
>>>> Hi OIivier
>>>>           Comments inline.
>>>> Regards,
>>>> Hemant
>>>>
>>>>   -----Original Message-----
>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Olivier Deme
>>>>> Sent: 25/Feb/2015 5:44 PM
>>>>> To: dev@dpdk.org
>>>>> Subject: Re: [dpdk-dev] [PATCH] kni:optimization of rte_kni_rx_burst
>>>>>
>>>>> Thank you Hemant, I think there might be one issue left with the
>>>>> patch though.
>>>>> The alloc_q must initially be filled with mbufs before getting mbuf
>>>>> back on the tx_q.
>>>>>
>>>>> So the patch should allow rte_kni_rx_burst to check if alloc_q is empty.
>>>>> If so, it should invoke kni_allocate_mbufs(kni, 0) (to fill the
>>>>> alloc_q with MAX_MBUF_BURST_NUM mbufs)
>>>>>
>>>>> The patch for rte_kni_rx_burst would then look like:
>>>>>
>>>>> @@ -575,7 +575,7 @@ rte_kni_rx_burst(struct rte_kni *kni, struct
>>>>> rte_mbuf **mbufs, unsigned num)
>>>>>
>>>>>         /* If buffers removed, allocate mbufs and then put them into
>>>>> alloc_q */
>>>>>         if (ret)
>>>>> -        kni_allocate_mbufs(kni);
>>>>> +      kni_allocate_mbufs(kni, ret);  else if
>>>>> + (unlikely(kni->alloc_q->write == kni->alloc_q->read))
>>>>> +      kni_allocate_mbufs(kni, 0);
>>>>>
>>>>>   [hemant]  This will introduce a run-time check.
>>>> I missed to include the other change in the patch.
>>>>    I am doing it in kni_alloc i.e. initiate the alloc_q with default
>>>> burst size.
>>>>          kni_allocate_mbufs(ctx, 0);
>>>>
>>>> In a way, we are now suggesting to reduce the size of alloc_q to only
>>>> default burst size.
>>>>
>>> As an aside comment here, I think that we should allow to tweak the
>>> userspace <-> kernel queue sizes (rx_q, tx_q, free_q and alloc_q) .
>>> Whether this should be a build configuration option or a parameter to
>>> rte_kni_init(), it is not completely clear to me, but I guess
>>> rte_kni_init() is a better option.
>>>
>> rte_kni_init() is definitely a better option. It allows things to be tuned based on
>> individual system config rather than requiring different builds.
>>
>>
>>> Having said that, the original mail from Hemant was describing that
>>> KNI was giving an out-of-memory. This to me indicates that the pool is
>>> incorrectly dimensioned. Even if KNI will not pre-allocate in the
>>> alloc_q, or not completely, in the event of high load, you will get
>>> this same "out of memory".
>>>
>>> We can reduce the usage of buffers by the KNI subsystem in kernel
>>> space and in userspace, but the kernel will always need a small cache
>>> of pre-allocated buffers (coming from user-space), since the KNI
>>> kernel module does not know where to grab the packets from (which
>>> pool). So my guess is that the dimensioning problem experienced by
>>> Hemant would be the same, even with the proposed changes.
>>>
>>>
>>>> Can we reach is situation, when the kernel is adding packets faster
>>>> in tx_q than the application is able to dequeue?
>>>>
>>> I think so. We cannot control much how the kernel will schedule the
>>> KNI thread(s), specially if the # of threads in relation to the cores
>>> is incorrect (not enough), hence we need at least a reasonable amount
>>> of buffering to prevent early dropping to those "internal" burst side effects.
>>>
>>> Marc
>>
>> Strongly agree with Marc here. We *really* don't want just a single burst worth
>> of mbufs available to the kernel in alloc_q. That's just asking for congestion
>> when there's no need for it.
>>
>> The original problem reported by Olivier is more of a resource tuning problem
>> than anything else. The number of mbufs you need in the system has to take
>> into account internal queue depths.
> [hemant]  Following are my suggestions for the time being.
> 1.  The existing code allocates X buffers and try to add them to alloc_q. If alloc_q is not having space, it frees them. This is not optimized at all.  In the rx_burst, we shall only add the numbers of packets, as removed from tx_q.

Agree

> 2. During the kni_alloc, we can set kni_allocate_mbufs X*Y buffers initially for alloc_q.  We can further improve it to make it configurable in future enhancements.  Currently we can have the value of Y as 2.

Provided that the dimensioning (X*Y), if defined in runtime it is set 
during rte_kni_init(), in principle I agree.

However it is not clear to me if you wantg to call 
kni_allocate_mbufs(X*Y) for every kni_alloc or just in the first one (in 
other words, if X*Y == size of alloc_q). Since alloc_q is shared and 
assuming X*Y == size of alloc_q, I think doing it that in the first 
kni_alloc() would be sufficient, and then it will get refilled once 
RX/TX events happen.

A different approach, that would require more refactor, since it changes 
slightly the current strategy, would be to pre-alloc the alloc_q based 
of the number of KNI interfaces created (kni_alloc). In this sense, 
rte_kni_init() would get then 2 parameters: the length of the entire 
shared alloc_q (actually all the queues in the KNI subsystem, with the 
current impl.) and the number of buffers / KNI interface. This approach 
could lower the mbuf consumption in certain configurations.

> 3. kni_allocate_mbufs will allocate as many buffer are requested in function parameter.

Agree

Marc

>
>> Jay