From: "Hu, Jiayu"
To: "Richardson, Bruce"
Cc: "Ananyev, Konstantin", dev@dpdk.org, "Wiles, Keith", yuanhan.liu@linux.intel.com
Date: Tue, 30 May 2017 14:10:11 +0000
Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework

Hi Bruce,

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Monday, May 29, 2017 8:19 PM
> To: Hu, Jiayu
> Cc: Ananyev, Konstantin; dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework
>
> On Mon, May 29, 2017 at 10:22:57AM +0000, Hu, Jiayu wrote:
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Sunday, May 28, 2017 12:51 AM
> > > To: Hu, Jiayu
> > > Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> > > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > >
> > > Hi Jiayu,
> > >
> > > > Hi Konstantin,
> > > >
> > > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Hu, Jiayu
> > > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > > To: Ananyev, Konstantin
> > > > > > Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > > > > >
> > > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > > > > Hi Jiayu,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Hu, Jiayu
> > > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > > To: Ananyev, Konstantin
> > > > > > > > Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> > > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > > > > > > >
> > > > > > > > Hi Konstantin,
> > > > > > > >
> > > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > > > > > > > >
> > > > > > > > > Hi Jiayu,
> > > > > > > > >
> > > > > > > > > > Hi Konstantin,
> > > > > > > > > >
> > > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > > >
> > > > > > > > > > BRs,
> > > > > > > > > > Jiayu
> > > > > > > > > >
> > > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > > > > > > > Hi Jiayu,
> > > > > > > > > > > My comments/questions below.
> > > > > > > > > > > Konstantin
> > > > > > > > > > >
> > > > > > > > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > > > > > > > > initialize the GRO environment. After that, applications can call
> > > > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO
> > > > > > > > > > > > for specific ports.
> > > > > > > > > > >
> > > > > > > > > > > I think this is too restrictive and wouldn't meet various users' needs.
> > > > > > > > > > > A user might want to:
> > > > > > > > > > > - enable/disable GRO for a particular RX queue
> > > > > > > > > > > - or even set up different GRO types for different RX queues,
> > > > > > > > > > >   i.e., GRO over IPv4/TCP for queue 0, and GRO over IPv6/TCP for queue 1, etc.
> > > > > > > > > >
> > > > > > > > > > The reason for enabling/disabling GRO per-port instead of per-queue is
> > > > > > > > > > that Linux controls GRO per-port. Controlling GRO per-queue can indeed
> > > > > > > > > > provide more flexibility to applications. But are there any scenarios
> > > > > > > > > > where different queues of a port may require different GRO control
> > > > > > > > > > (i.e. GRO types and enabling/disabling GRO)?
> > > > > > > > >
> > > > > > > > > I think yes.
> > > > > > > > >
> > > > > > > > > > > - For various reasons, a user might prefer not to use RX callbacks,
> > > > > > > > > > >   but invoke gro() manually at some point in his code.
> > > > > > > > > >
> > > > > > > > > > An application-called GRO library can give applications more
> > > > > > > > > > flexibility. Besides, when GRO is performed in the ethdev layer or
> > > > > > > > > > inside PMD drivers, there is the issue of whether rte_eth_rx_burst
> > > > > > > > > > returns the number of packets actually received or the number of GROed
> > > > > > > > > > packets. The same issue happens in GSO, and even more seriously,
> > > > > > > > > > because applications, like VPP, always rely on the return value of
> > > > > > > > > > rte_eth_tx_burst to decide further operations. If applications can
> > > > > > > > > > directly call the GRO/GSO libraries, this issue won't exist. And DPDK
> > > > > > > > > > is a library, not a holistic system like Linux; we don't need to do
> > > > > > > > > > the same as Linux.
> > > > > > > > > > Therefore, maybe it's a better idea to directly provide SW
> > > > > > > > > > segmentation/reassembling libraries to applications.
> > > > > > > > > >
> > > > > > > > > > > - Many users would like to control the size (number of flows/items
> > > > > > > > > > >   per flow), max allowed packet size, max timeout, etc., for
> > > > > > > > > > >   different GRO tables.
> > > > > > > > > > > - A user would need a way to flush all, or only timed-out, packets
> > > > > > > > > > >   from particular GRO tables.
> > > > > > > > > > >
> > > > > > > > > > > So I think that the API needs to be extended to become much more
> > > > > > > > > > > fine-grained. Something like that:
> > > > > > > > > > >
> > > > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > > > >     int32_t socket_id;
> > > > > > > > > > >     size_t max_flows;
> > > > > > > > > > >     size_t max_items_per_flow;
> > > > > > > > > > >     size_t max_pkt_size;
> > > > > > > > > > >     uint64_t packet_timeout_cycles;
> > > > > > > > > > >     ...
> > > > > > > > > > > };
> > > > > > > > > > >
> > > > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > > > struct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > > > > > > > >
> > > > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > > > >
> > > > > > > > > > Yes, I agree with you. It's necessary to provide more fine-grained
> > > > > > > > > > control APIs to applications. But what's 'packet_timeout_cycles' used
> > > > > > > > > > for? Is it for TCP packets?
> > > > > > > > >
> > > > > > > > > For any packet that sits in the gro_table for too long.
> > > > > > > > >
> > > > > > > > > > > /*
> > > > > > > > > > >  * process packets, might store some packets inside the GRO table,
> > > > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > > > >  */
> > > > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > > >
> > > > > > > > > > > /*
> > > > > > > > > > >  * retrieves up to num timed-out packets from the table.
> > > > > > > > > > >  */
> > > > > > > > > > > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > >
> > > > > > > > > > Currently, we implement GRO as an RX callback, whose processing logic
> > > > > > > > > > is simple: receive burst packets -> perform GRO -> return to
> > > > > > > > > > application. GRO stops after finishing processing the received
> > > > > > > > > > packets. If we provide rte_gro_tbl_timeout, when and who will call it?
> > > > > > > > >
> > > > > > > > > I mean the following scenario:
> > > > > > > > > We receive a packet, find it is eligible for GRO and put it into the
> > > > > > > > > gro_table in the expectation that there will be more packets for the
> > > > > > > > > same flow.
> > > > > > > > > But it could happen that we never (or only after some long time) receive
> > > > > > > > > any new packets for that flow.
> > > > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > > > or delivered too late.
> > > > > > > > > So we need a mechanism to extract such unmerged packets
> > > > > > > > > and push them to the upper layer.
> > > > > > > > > My thought is that it would be the application's responsibility to
> > > > > > > > > call it from time to time.
> > > > > > > > > That's actually another reason why I think we shouldn't force the
> > > > > > > > > application to always use RX callbacks for GRO.
> > > > > > > > Currently, we only provide one reassembly function,
> > > > > > > > rte_gro_reassemble_burst, which merges N input packets at a time.
> > > > > > > > After finishing processing these packets, it returns all of them and
> > > > > > > > resets the hashing tables. Therefore, there are no packets left in
> > > > > > > > the hashing tables after rte_gro_reassemble_burst returns.
> > > > > > >
> > > > > > > Ok, sorry, I missed that part with rte_hash_reset().
> > > > > > > So gro_reassemble_burst() performs only inline processing on the
> > > > > > > current input packets and doesn't try to save packets for future
> > > > > > > merging, correct?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > Such an approach indeed is much more lightweight and doesn't require
> > > > > > > any extra timeouts and flush().
> > > > > > > So my opinion: let's keep it like that - nice and simple.
> > > > > > > BTW, I think in that case we don't need any hash tables (or any other
> > > > > > > persistent structures) at all.
> > > > > > > What we need is just a set of GROs (tcp4, tcp6, etc.) we want to
> > > > > > > perform on a given array of packets.
> > > > > >
> > > > > > Besides the GRO types that are desired to be performed, maybe it also
> > > > > > needs max_pkt_size and some GRO-type-specific information?
> > > > >
> > > > > Yes, but we don't need the actual hash tables, etc. inside.
> > > > > Passing something like struct gro_param seems enough.
> > > >
> > > > Yes, we can just pass gro_param and allocate hashing tables
> > > > inside rte_gro_reassemble_burst. If so, hashing tables of
> > > > the desired GRO types are created and freed in each invocation
> > > > of rte_gro_reassemble_burst. In the GRO library, hashing tables
> > > > are created by the GRO-type-specific gro_tbl_create_fn. These
> > > > gro_tbl_create_fn may allocate hashing table space via malloc
> > > > (or rte_malloc). Therefore, we may frequently call malloc/free
> > > > when using rte_gro_reassemble_burst. In my opinion, it will
> > > > degrade GRO performance greatly.
> > >
> > > I don't understand why we need to put/extract each packet into/from
> > > a hash table at all.
> > > We have N input packets that need to be grouped/sorted by some criteria.
> > > Surely that can be done without any hash table involved.
> > > What is the need for a hash table and all the overhead it brings here?
> >
> > In the current design, I assume all GRO types use hash tables to merge
> > packets. The key of the hash table is the criteria to merge packets.
> > So the main difference between different GRO types' hash tables is the
> > key definition.
> >
> > And the reason for using hash tables is to speed up reassembly. Given
> > N TCP packets as input, the simplest way to process packets[i]
> > is to traverse the already processed packets[0]~packets[i-1] and try
> > to find one to merge with. In the worst case, we need to check all of
> > packets[0~i-1]. In this case, the time complexity of processing N
> > packets is O(N^2).
> > If we use a hash table, whose key is the criteria to merge two packets,
> > the time to find a packet that may be merged with packets[i] is O(1).
> >
> > Do you think it's too complicated?
> >
> > Jiayu
> >
> How big is your expected burst size? If you are expecting 32 or 64
> packets per call, then N is small and the overhead of the hash table
> seems a bit much.

In the current design, if the packet burst size is always small (e.g. less
than 32 or 64), applications can use a small hash table to reduce the
overhead from the hash table. But it's not a good solution, since the cost
is still high when the burst size is small.

> Perhaps you need different code paths for bigger and
> smaller bursts?

Yes, if hash tables can bring much better performance for the scenario
where N is large, maybe we can consider using two code paths.

Jiayu

> /Bruce