Date: Wed, 25 Feb 2015 11:02:28 +0000
From: Bruce Richardson
To: Vlad Zolotarov
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] ixgbe: why bulk allocation is not used for a scattered Rx flow?
Message-ID: <20150225110228.GA4896@bricha3-MOBL3>
In-Reply-To: <54ED9894.3050409@cloudius-systems.com>

On Wed, Feb 25, 2015 at 11:40:36AM +0200, Vlad Zolotarov wrote:
> Hi, I have a question about the "scattered Rx" feature: why does enabling
> it disable the "bulk allocation" feature?

The "bulk allocation" feature is one where a more optimized RX code path
is used. For the sake of performance, certain assumptions were made in
that code path, one of which is that each packet fits inside a single
mbuf. Dropping that assumption makes receiving packets much more
complicated and therefore slower. [For similar reasons, the optimized TX
routines, e.g. vector TX, are only used if it is guaranteed that no
hardware offload features are going to be used.]

Now, it is possible, though challenging, to write optimized code for the
more complicated cases, such as scattered RX, or TX with offloads or
scattered packets. In general, we will always want separate routines for
the simple case and the complicated cases, as the cost of checking for
offloads or multi-mbuf packets is significant enough to hurt performance
badly when those features are not needed. In the case of the vector PMD
for ixgbe - our highest-performance path right now - we do indeed have two
receive routines, one for the simple case and one for the scattered case.
For TX, we only have an optimized path for the simple case, but that is
not to say that someone may not provide one for the offload case at some
point too.

A final note on scattered packets in particular: if packets are too big to
fit in a single mbuf, then they are not small packets, and the processing
time available per packet is, by definition, larger than for packets that
fit in a single mbuf. For 64-byte packets, the inter-arrival time is 67 ns
at 10G, or approx. 200 cycles at 3 GHz. If we assume a standard 2k mbuf,
then a packet which spans two mbufs takes at least 1654 ns on the wire,
and a 3 GHz CPU therefore has nearly 5000 cycles to process that same
packet. Since the processing budget is so much bigger, the need to
optimize is much smaller, and it's more important to focus on the
small-packet case - which is what we have done.
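To make that concrete, here is a rough sketch of how a driver can select
its RX routine once at setup time, so the fast path never has to branch on
the configuration per burst. All names here are illustrative stand-ins,
not the actual ixgbe code:

    #include <stdbool.h>
    #include <stdint.h>

    struct rte_mbuf; /* opaque for this sketch */

    typedef uint16_t (*rx_burst_t)(void *rxq, struct rte_mbuf **pkts,
                                   uint16_t nb_pkts);

    /* Candidate receive routines; bodies omitted in this sketch. */
    uint16_t rx_burst_scattered(void *rxq, struct rte_mbuf **pkts, uint16_t n);
    uint16_t rx_burst_bulk_alloc(void *rxq, struct rte_mbuf **pkts, uint16_t n);
    uint16_t rx_burst_simple(void *rxq, struct rte_mbuf **pkts, uint16_t n);

    static rx_burst_t
    select_rx_burst(bool scattered_rx, bool bulk_alloc_preconds_met)
    {
            /*
             * Scattered RX breaks the one-packet-per-mbuf assumption
             * that the bulk-alloc path relies on, so it takes priority.
             */
            if (scattered_rx)
                    return rx_burst_scattered;
            if (bulk_alloc_preconds_met)
                    return rx_burst_bulk_alloc;
            return rx_burst_simple;
    }

The chosen function pointer is then what gets invoked on each RX burst,
with no per-packet checks for features that were never enabled.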
> There is some unclear comment in ixgbe_recv_scattered_pkts():
>
>    /*
>     * Descriptor done.
>     *
>     * Allocate a new mbuf to replenish the RX ring descriptor.
>     * If the allocation fails:
>     *    - arrange for that RX descriptor to be the first one
>     *      being parsed the next time the receive function is
>     *      invoked [on the same queue].
>     *
>     *    - Stop parsing the RX ring and return immediately.
>     *
>     * This policy does not drop the packet received in the RX
>     * descriptor for which the allocation of a new mbuf failed.
>     * Thus, it allows that packet to be later retrieved if
>     * mbuf have been freed in the mean time.
>     * As a side effect, holding RX descriptors instead of
>     * systematically giving them back to the NIC may lead to
>     * RX ring exhaustion situations.
>     * However, the NIC can gracefully prevent such situations
>     * to happen by sending specific "back-pressure" flow control
>     * frames to its peer(s).
>     */
>
> Why can't the same "policy" be applied in the bulk-allocation context? -
> Don't advance the RDT until you've refilled the ring. What am I missing
> here?

A lot of the optimizations done in other code paths, such as bulk alloc,
may well be applicable here; it's just that the work has not been done
yet, as the focus has been elsewhere. For vector PMD RX, we now have
routines that work on both regular and scattered packets, and both perform
much better than the scalar equivalents. Note also that in every RX (and
TX) routine, the NIC tail-pointer update is always done just once, at the
end of the function.

> Another question is about the LRO feature - is there a reason why it's
> not implemented? I've implemented LRO support in the ixgbe PMD to begin
> with - I used the "scattered Rx" code as a template and now I'm tuning it
> (things like the stuff above).
>
> Is there any philosophical reason why it hasn't been implemented in *any*
> PMD so far? ;)

I'm not aware of any philosophical reason why it hasn't been done. Patches
are welcome, as always. :-)

/Bruce
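P.S. To illustrate the tail-update point: a minimal sketch of the "hold
the descriptor on allocation failure, write RDT once at the end" policy
might look like the below. All names are illustrative stand-ins (e.g.
alloc_mbuf() stands in for the real mempool allocation call), not the
actual ixgbe code.

    #include <stddef.h>
    #include <stdint.h>

    struct mbuf;
    struct mbuf *alloc_mbuf(void); /* stand-in for the real mbuf allocator */

    struct rxq {
            struct mbuf      **sw_ring;     /* SW shadow of the HW ring */
            uint16_t           next_to_use; /* first desc needing an mbuf */
            uint16_t           ring_size;
            volatile uint32_t *rdt_reg;     /* NIC tail register (RDT) */
    };

    static void
    refill_and_update_tail(struct rxq *q, uint16_t nb_wanted)
    {
            uint16_t i, idx = q->next_to_use;

            for (i = 0; i < nb_wanted; i++) {
                    struct mbuf *m = alloc_mbuf();
                    if (m == NULL)
                            break; /* hold the descriptor; retry next call */
                    q->sw_ring[idx] = m;
                    /* ...program m's buffer address into the HW desc... */
                    idx = (uint16_t)((idx + 1) % q->ring_size);
            }

            if (i == 0)
                    return; /* nothing refilled: RDT stays put, so the
                             * NIC never sees an unfilled descriptor */

            q->next_to_use = idx;
            /*
             * One tail write per burst: RDT points at the last
             * descriptor handed back to the NIC, the one before idx.
             */
            *q->rdt_reg = (idx == 0) ? (uint32_t)(q->ring_size - 1)
                                     : (uint32_t)(idx - 1);
    }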