From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id A429912A8 for ; Thu, 21 Apr 2016 12:54:08 +0200 (CEST) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga101.fm.intel.com with ESMTP; 21 Apr 2016 03:54:08 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,512,1455004800"; d="scan'208";a="959723849" Received: from bricha3-mobl3.ger.corp.intel.com ([10.237.220.132]) by orsmga002.jf.intel.com with SMTP; 21 Apr 2016 03:54:05 -0700 Received: by (sSMTP sendmail emulation); Thu, 21 Apr 2016 11:54:04 +0025 Date: Thu, 21 Apr 2016 11:54:04 +0100 From: Bruce Richardson To: Jay Rolette Cc: DPDK , Andriy Berestovskyy Message-ID: <20160421105404.GA11224@bricha3-MOBL3> References: <20160420091046.GA4080@bricha3-MOBL3> <20160420140554.GA11780@bricha3-MOBL3> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Organization: Intel Shannon Ltd. User-Agent: Mutt/1.5.23 (2014-03-12) Subject: Re: [dpdk-dev] Couple of PMD questions X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 21 Apr 2016 10:54:09 -0000 On Wed, Apr 20, 2016 at 10:47:59AM -0500, Jay Rolette wrote: > On Wed, Apr 20, 2016 at 9:05 AM, Bruce Richardson < > bruce.richardson@intel.com> wrote: > > > On Wed, Apr 20, 2016 at 07:22:57AM -0500, Jay Rolette wrote: > > > On Wed, Apr 20, 2016 at 4:10 AM, Bruce Richardson < > > > bruce.richardson@intel.com> wrote: > > > > > > > On Tue, Apr 19, 2016 at 03:16:37PM -0500, Jay Rolette wrote: > > > > > In ixgbe_dev_rx_init(), there is this bit of code: > > > > > > > > > > /* > > > > > * Configure the RX buffer size in the BSIZEPACKET field of > > > > > * the SRRCTL register of the queue. > > > > > * The value is in 1 KB resolution. Valid values can be from > > > > > * 1 KB to 16 KB. > > > > > */ > > > > > buf_size = (uint16_t)(rte_pktmbuf_data_room_size(rxq->mb_pool) > > - > > > > > RTE_PKTMBUF_HEADROOM); > > > > > srrctl |= ((buf_size >> IXGBE_SRRCTL_BSIZEPKT_SHIFT) & > > > > > IXGBE_SRRCTL_BSIZEPKT_MASK); > > > > > > > > > > IXGBE_WRITE_REG(hw, IXGBE_SRRCTL(rxq->reg_idx), srrctl); > > > > > > > > > > buf_size = (uint16_t) ((srrctl & IXGBE_SRRCTL_BSIZEPKT_MASK) << > > > > > IXGBE_SRRCTL_BSIZEPKT_SHIFT); > > > > > > > > > > /* It adds dual VLAN length for supporting dual VLAN */ > > > > > if (dev->data->dev_conf.rxmode.max_rx_pkt_len + > > > > > 2 * IXGBE_VLAN_TAG_SIZE > buf_size) > > > > > dev->data->scattered_rx = 1; > > > > > > > > > > > > > > > Couple of questions I have about it: > > > > > > > > > > 1) If the caller configured the MBUF memory pool data room size to be > > > > > something other than a multiple of 1K (+ RTE_PKTMBUF_HEADROOM), then > > the > > > > > driver ends up silently programming the NIC to use a smaller max RX > > size > > > > > than the caller specified. > > > > > > > > > > Should the driver error out in that case instead of only "sort of" > > > > working? > > > > > If not, it seems like it should be logging an error message. > > > > > > > > Well, it does log at the end of the whole rx init process what RX > > function > > > > is > > > > being used, so there is that. > > > > However, looking at it now, given that there is an explicit setting in > > the > > > > config > > > > to request scattered mode, I think you may be right and that we should > > > > error out > > > > here if scattered mode is needed and not set. It could help prevent > > > > unexpected > > > > problems with these drivers. > > > > > > > > > > +1. A hard fail at init if I've misconfigured things is much preferred to > > > something that mostly works for typical test cases and only fails on > > > corner/limit testing. > > > > > > > > > > > 2) Second question is about the "/* It adds dual VLAN length for > > > > supporting > > > > > dual VLAN */" bit... > > > > > > > > > > As I understand it, dev_conf.rxmode.max_rx_pkt_len is supposed to be > > set > > > > to > > > > > the max frame size you support (although it says it's only used if > > jumbo > > > > > frame is enabled). That same value is generally used when > > calculating the > > > > > size that mbuf elements should be created at. > > > > > > > > > > The description for the data_room_size parameter of > > > > > rte_pktmbuf_pool_create(): > > > > > > > > > > "Size of data buffer in each mbuf, including RTE_PKTMBUF_HEADROOM." > > > > > > > > > > > > > > > If I support a max frame size of 9216 bytes (exactly a 1K multiple to > > > > make > > > > > the NIC happy), then max_rx_pkt_len is going to be 9216 and > > > > data_room_size > > > > > will be 9216 + RTE_PKTMBUF_HEADROOM. > > > > > > > > > > ixgbe_dev_rx_init() will calculate normalize that back to 9216, which > > > > will > > > > > fail the dual VLAN length check. The really nasty part about that is > > it > > > > has > > > > > a side-effect of enabling scattered RX regardless of the fact that I > > > > didn't > > > > > enable scattered RX in dev_conf.rxmode. > > > > > > > > > > The code in the e1000 PMD is similar, so nothing unique to ixgbe. > > > > > > > > > > Is that check correct? It seems wrong to be adding space for q-in-q > > on > > > > top > > > > > of your specified max frame size... > > > > > > > > The issue here is what the correct behaviour needs to be. If we have > > the > > > > user > > > > specify the maximum frame size including all vlan tags, then we hit the > > > > problem > > > > where we have to subtract the VLAN tag sizes when writing the value to > > the > > > > NIC. > > > > In that case, we will hit a problem where we get a e.g. 9210 byte > > frame - > > > > to > > > > reuse your example - without any VLAN tags, which will be rejected by > > the > > > > hardware as being oversized. If we don't do the subtraction, and we > > get the > > > > same 9210 byte packet with 2 VLAN tags on it, the hardware will accept > > it > > > > and > > > > then split it across multiple descriptors because the actual DMA size > > is > > > > 9218 bytes. > > > > > > > > > > As an app developer, I didn't realize the max frame size didn't include > > > VLAN tags. I expected max frame size to be the size of the ethernet frame > > > on the wire, which I would expect to include space used by any VLAN or > > MPLS > > > tags. > > > > > > Is there anything in the docs or example apps about that? I did some > > > digging as I was debugging this and didn't notice it, but entirely > > possible > > > I just missed it. > > > > > > > > > > I'm not sure there is a works-in-all-cases solution here. > > > > > > > > > > Andriy's suggestion seems like it points in the right direction. > > > > > > From an app developer point of view, I'd expect to have a single max > > frame > > > size value to track and the APIs should take care of any adjustments > > > required internally. Maybe have rte_pktmbuf_pool_create() add the > > > additional bytes when it calls rte_mempool_create() under the covers? > > Then > > > it's nice and clean for the API without unexpected side-effects. > > > > > > > It will still have unintended side-effects I think, depending on the > > resolution > > of the NIC buffer length paramters. For drivers like ixgbe or e1000, the > > mempool > > create call could potentially have to add an additional 1k to each buffer > > just > > to be able to store the extra eight bytes. > > > The comments in the ixgbe driver say that the value programmed into SRRCTL > must be on a 1K boundary. Based on your previous response, it sounded like > the NIC ignores that limit for VLAN tags, hence the check for the extra 8 > bytes on the mbuf element size. Are you worried about the size resolution > on mempool elements? > > Sounds like I've got to go spend some quality time in the NIC data > sheets... Maybe I should back up and just ask the higher level question: > > What's the right incantation in both the dev_conf structure and in creating > the mbuf pool to support jumbo frames of some particular size on the wire, > with or without VLAN tags, without requiring scattered_rx support in an app? > I think that the answer to that may be that it's hard to do, or impossible, depending on the NIC. From what I can see from the datasheet for some of the ixgbe NICs, the max-frame-size (MAXFRS) field that you specify always excludes VLAN tags. Therefore you can't tell the NIC to accept a max size of X irrespective of VLAN tags. The max size will always vary depending on the tags. /Bruce