From: "Ananyev, Konstantin"
To: "Hu, Jiayu"
CC: "dev@dpdk.org", "Wiles, Keith", "yuanhan.liu@linux.intel.com"
Date: Fri, 26 May 2017 23:10:21 +0000
Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Friday, May 26, 2017 8:26 AM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
>
> Hi Konstantin,
>
> On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> >
> > Hi Jiayu,
> >
> > >
> > > Hi Konstantin,
> > >
> > > Thanks for your comments. My replies/questions are below.
> > >
> > > BRs,
> > > Jiayu
> > >
> > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > Hi Jiayu,
> > > > My comments/questions below.
> > > > Konstantin
> > > >
> > > > >
> > > > > For applications, DPDK GRO provides three external functions to
> > > > > enable/disable GRO:
> > > > > - rte_gro_init: initialize the GRO environment;
> > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > initialize the GRO environment. After that, applications can call
> > > > > rte_gro_enable to enable GRO and rte_gro_disable to disable GRO for
> > > > > specific ports.
> > > >
> > > > I think this is too restrictive and wouldn't meet various users' needs.
> > > > A user might want to:
> > > > - enable/disable GRO for a particular RX queue;
> > > > - or even set up different GRO types for different RX queues,
> > > >   e.g. GRO over IPv4/TCP for queue 0 and GRO over IPv6/TCP for queue 1.
> > >
> > > The reason for enabling/disabling GRO per-port instead of per-queue is that Linux
> > > controls GRO per-port.
> > > Controlling GRO per-queue would indeed give applications more flexibility.
> > > But are there any scenarios where different queues of a port require
> > > different GRO control (i.e. GRO types and enabling/disabling GRO)?
> >
> > I think yes.
> >
> > > > - For various reasons, a user might prefer not to use RX callbacks,
> > > >   but invoke gro() manually at some point in his code.
> > >
> > > An application-level GRO library gives applications more flexibility. Besides,
> > > when GRO is performed in the ethdev layer or inside PMD drivers, it becomes unclear
> > > whether rte_eth_rx_burst should return the number of actually received packets or
> > > the number of GROed packets. The same issue happens in GSO, and even more seriously,
> > > because applications like VPP always rely on the return value of rte_eth_tx_burst
> > > to decide further operations. If applications can directly call the GRO/GSO
> > > libraries, this issue won't exist. And DPDK is a library, not a holistic system
> > > like Linux; we don't need to do the same as Linux. Therefore, maybe it's a better
> > > idea to directly provide SW segmentation/reassembly libraries to applications.
> > >
> > > > - Many users would like to control the size (number of flows/items per flow),
> > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > - A user would need a way to flush all, or only timed-out, packets from
> > > >   particular GRO tables.
> > > >
> > > > So I think the API needs to be extended to become much more fine-grained.
> > > > Something like that:
> > > >
> > > > struct rte_gro_tbl_param {
> > > >     int32_t socket_id;
> > > >     size_t max_flows;
> > > >     size_t max_items_per_flow;
> > > >     size_t max_pkt_size;
> > > >     uint64_t packet_timeout_cycles;
> > > >     ...
> > > > };
> > > >
> > > > struct rte_gro_tbl;
> > > > struct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > >
> > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > >
> > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > applications. But what is 'packet_timeout_cycles' used for? Is it for TCP packets?
> >
> > For any packet that sits in the gro_table for too long.
> >
> > > >
> > > > /*
> > > >  * Process packets; might store some packets inside the GRO table.
> > > >  * Returns the number of filled entries in pkt[].
> > > >  */
> > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > >
> > > > /*
> > > >  * Retrieves up to num timed-out packets from the table.
> > > >  */
> > > > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > >
> > > Currently, we implement GRO as an RX callback, whose processing logic is simple:
> > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > it finishes processing the received packets. If we provide rte_gro_tbl_timeout,
> > > when and by whom will it be called?
> >
> > I mean the following scenario:
> > We receive a packet, find it is eligible for GRO, and put it into the gro_table
> > in the expectation that there will be more packets for the same flow.
> > But it could happen that we never (or only after a long time) receive
> > any new packets for that flow.
> > So the first packet would never be delivered to the upper layer,
> > or would be delivered too late.
> > So we need a mechanism to extract such unmerged packets
> > and push them to the upper layer.
> > My thought is that it would be the application's responsibility to call it from
> > time to time. That's actually another reason why I think we shouldn't force the
> > application to always use RX callbacks for GRO.
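[Editor's note: to make the flush-by-timeout idea discussed above concrete, here is a minimal, self-contained C sketch. All names here (gro_tbl, gro_tbl_store, gro_tbl_timeout_flush) and the use of plain ints standing in for rte_mbuf pointers are hypothetical illustration, not the proposed DPDK API.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TBL_SIZE 8

/* A stored packet remembers when its first segment arrived. */
struct gro_item {
    int valid;             /* slot in use */
    uint64_t start_cycles; /* arrival time of the first segment */
    int pkt_id;            /* stand-in for the stored rte_mbuf */
};

struct gro_tbl {
    struct gro_item items[TBL_SIZE];
    uint64_t timeout_cycles; /* max residence time before flush */
};

/* Store a packet; returns 0 on success, -1 if the table is full. */
int gro_tbl_store(struct gro_tbl *tbl, int pkt_id, uint64_t now)
{
    for (size_t i = 0; i < TBL_SIZE; i++) {
        if (!tbl->items[i].valid) {
            tbl->items[i].valid = 1;
            tbl->items[i].start_cycles = now;
            tbl->items[i].pkt_id = pkt_id;
            return 0;
        }
    }
    return -1;
}

/* Extract up to num packets that have sat in the table longer than the
 * timeout, mirroring the semantics sketched for rte_gro_tbl_timeout().
 * Returns the number of entries written to out[]. */
uint32_t gro_tbl_timeout_flush(struct gro_tbl *tbl, uint64_t now,
                               int out[], uint32_t num)
{
    uint32_t n = 0;
    for (size_t i = 0; i < TBL_SIZE && n < num; i++) {
        if (tbl->items[i].valid &&
            now - tbl->items[i].start_cycles > tbl->timeout_cycles) {
            out[n++] = tbl->items[i].pkt_id;
            tbl->items[i].valid = 0;
        }
    }
    return n;
}
```

The point being argued: without some such flush call, a packet whose flow never sees a second segment would stay in the table indefinitely and never reach the upper layer.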
>
> Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> which merges N input packets at a time. After finishing processing these
> packets, it returns all of them and resets the hashing tables. Therefore, there
> are no packets left in the hashing tables after rte_gro_reassemble_burst returns.

Ok, sorry, I missed that part with rte_hash_reset().
So rte_gro_reassemble_burst() performs only inline processing on the current input packets
and doesn't try to save packets for future merging, correct?
Such an approach is indeed much more lightweight and doesn't require any extra timeouts and flush().
So my opinion: let's keep it like that - nice and simple.
BTW, I think in that case we don't need any hash tables (or any other persistent structures) at all.
What we need is just the set of GROs (tcp4, tcp6, etc.) we want to perform on a given array of packets.

>
> If we provide rte_gro_tbl_timeout, we also need to provide another reassembly
> function, like rte_gro_reassemble, which processes one given packet at a
> time and won't reset the hashing tables. Applications decide when to flush packets
> from the hashing tables, and rte_gro_tbl_timeout is one of the ways that can be used
> to flush them. Do you mean that?

Yes, that's what I meant, but as I said above - I think your approach is probably
preferable - it is much simpler and more lightweight.

Konstantin
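[Editor's note: the inline approach agreed on above - merge only within the current burst, keep no state across calls - can be sketched in a few lines of self-contained C. The toy_* names and the reduction of a packet to (flow, seq, len) are illustrative assumptions, not DPDK code; only forward continuations are merged, which keeps the sketch simple.]

```c
#include <assert.h>
#include <stdint.h>

/* A "packet" reduced to the fields that matter for TCP coalescing. */
struct toy_pkt {
    uint32_t flow; /* flow identifier (stand-in for the 4-tuple) */
    uint32_t seq;  /* TCP sequence number of the first payload byte */
    uint32_t len;  /* payload length */
};

/* In-place, burst-only reassembly: packet j is absorbed into packet i
 * when it belongs to the same flow and is contiguous in sequence space.
 * Returns the new packet count; no state survives the call. The O(n^2)
 * scan is acceptable for a typical RX burst of 32-64 packets. */
uint16_t toy_reassemble_burst(struct toy_pkt *pkts, uint16_t nb)
{
    for (uint16_t i = 0; i < nb; i++) {
        for (uint16_t j = (uint16_t)(i + 1); j < nb; ) {
            if (pkts[j].flow == pkts[i].flow &&
                pkts[j].seq == pkts[i].seq + pkts[i].len) {
                /* j continues i: grow i, drop j from the burst */
                pkts[i].len += pkts[j].len;
                pkts[j] = pkts[nb - 1];
                nb--;
            } else {
                j++;
            }
        }
    }
    return nb;
}
```

Because the function touches nothing but its input array, it needs no hash table, no timeout, and no flush - which is exactly why the burst-only variant is the lighter-weight design.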