From: "Ananyev, Konstantin"
To: "Hu, Jiayu"
Cc: "dev@dpdk.org", "Wiles, Keith", "yuanhan.liu@linux.intel.com"
Date: Tue, 30 May 2017 11:56:27 +0000
Message-ID: <2601191342CEEE43887BDE71AB9772583FB033F4@IRSMSX109.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API framework
List-Id: DPDK patches and discussions

Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Tuesday, May 30, 2017 6:29 AM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
>
> Hi Konstantin,
>
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Monday, May 29, 2017 8:52 PM
> > To: Hu, Jiayu
> > Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> >
> > Hi Jiayu,
> >
> > > -----Original Message-----
> > > From: Hu, Jiayu
> > > Sent: Monday, May 29, 2017 11:23 AM
> > > To: Ananyev, Konstantin
> > > Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> > > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > >
> > > Hi Konstantin,
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Sunday, May 28, 2017 12:51 AM
> > > > To: Hu, Jiayu
> > > > Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> > > > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > > >
> > > > Hi Jiayu,
> > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Hu, Jiayu
> > > > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > > > To: Ananyev, Konstantin
> > > > > > > Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > > > > > >
> > > > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > > > > > Hi Jiayu,
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Hu, Jiayu
> > > > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > > > To: Ananyev, Konstantin
> > > > > > > > > Cc: dev@dpdk.org; Wiles, Keith; yuanhan.liu@linux.intel.com
> > > > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > > > > > > > >
> > > > > > > > > Hi Konstantin,
> > > > > > > > >
> > > > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Jiayu,
> > > > > > > > > >
> > > > > > > > > > > Hi Konstantin,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > > > >
> > > > > > > > > > > BRs,
> > > > > > > > > > > Jiayu
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > > > > > > > > Hi Jiayu,
> > > > > > > > > > > > My comments/questions below.
> > > > > > > > > > > > Konstantin
> > > > > > > > > > > >
> > > > > > > > > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > > > Before using GRO, applications should explicitly call rte_gro_init
> > > > > > > > > > > > > to initialize the GRO environment. After that, applications can
> > > > > > > > > > > > > call rte_gro_enable to enable GRO and call rte_gro_disable to
> > > > > > > > > > > > > disable GRO for specific ports.
> > > > > > > > > > > >
> > > > > > > > > > > > I think this is too restrictive and wouldn't meet various users' needs.
> > > > > > > > > > > > User might want to:
> > > > > > > > > > > > - enable/disable GRO for a particular RX queue
> > > > > > > > > > > > - or even set up different GRO types for different RX queues,
> > > > > > > > > > > >   i.e. GRO over IPV4/TCP for queue 0, and GRO over IPV6/TCP for queue 1, etc.
> > > > > > > > > > >
> > > > > > > > > > > The reason for enabling/disabling GRO per-port instead of per-queue
> > > > > > > > > > > is that LINUX controls GRO per-port. Controlling GRO per-queue can
> > > > > > > > > > > indeed provide more flexibility to applications. But are there any
> > > > > > > > > > > scenarios where different queues of a port may require different GRO
> > > > > > > > > > > control (i.e. GRO types and enable/disable GRO)?
> > > > > > > > > >
> > > > > > > > > > I think yes.
> > > > > > > > > >
> > > > > > > > > > > > - For various reasons, user might prefer not to use RX callbacks,
> > > > > > > > > > > >   but invoke gro() manually at some point in his code.
> > > > > > > > > > >
> > > > > > > > > > > An application-used GRO library can give more flexibility to
> > > > > > > > > > > applications. Besides, when GRO is performed in the ethdev layer or
> > > > > > > > > > > inside PMD drivers, it is an issue whether rte_eth_rx_burst returns
> > > > > > > > > > > the actually received packet number or the GROed packet number.
> > > > > > > > > > > And the same issue happens in GSO, and even more seriously. This is
> > > > > > > > > > > because applications, like VPP, always rely on the return value of
> > > > > > > > > > > rte_eth_tx_burst to decide further operations. If applications can
> > > > > > > > > > > directly call GRO/GSO libraries, this issue won't exist. And DPDK is
> > > > > > > > > > > a library, which is not a holistic system like LINUX. We don't need
> > > > > > > > > > > to do the same as LINUX. Therefore, maybe it's a better idea to
> > > > > > > > > > > directly provide SW segmentation/reassembling libraries to
> > > > > > > > > > > applications.
> > > > > > > > > > >
> > > > > > > > > > > > - Many users would like to control size (number of flows/items per flow),
> > > > > > > > > > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > > > > > > > > > - User would need a way to flush all or only timed-out packets from
> > > > > > > > > > > >   particular GRO tables.
> > > > > > > > > > > >
> > > > > > > > > > > > So I think that the API needs to be extended to become much more
> > > > > > > > > > > > fine-grained. Something like that:
> > > > > > > > > > > >
> > > > > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > > > > >    int32_t socket_id;
> > > > > > > > > > > >    size_t max_flows;
> > > > > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > > > > >    size_t max_pkt_size;
> > > > > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > > > > >    ...
> > > > > > > > > > > > };
> > > > > > > > > > > >
> > > > > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > > > > struct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > > > > > > > > >
> > > > > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > > > > >
> > > > > > > > > > > Yes, I agree with you.
> > > > > > > > > > > It's necessary to provide more fine-grained control APIs to
> > > > > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is it for
> > > > > > > > > > > TCP packets?
> > > > > > > > > >
> > > > > > > > > > For any packets that sit in the gro_table for too long.
> > > > > > > > > >
> > > > > > > > > > > > /*
> > > > > > > > > > > >  * process packets, might store some packets inside the GRO table,
> > > > > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > > > > >  */
> > > > > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > > > >
> > > > > > > > > > > > /*
> > > > > > > > > > > >  * retrieves up to num timed-out packets from the table.
> > > > > > > > > > > >  */
> > > > > > > > > > > > uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > > >
> > > > > > > > > > > Currently, we implement GRO as an RX callback, whose processing
> > > > > > > > > > > logic is simple: receive burst packets -> perform GRO -> return to
> > > > > > > > > > > application. GRO stops after finishing processing the received
> > > > > > > > > > > packets. If we provide rte_gro_tbl_timeout, when and who will call it?
> > > > > > > > > >
> > > > > > > > > > I mean the following scenario:
> > > > > > > > > > We receive a packet, find it is eligible for GRO and put it into the
> > > > > > > > > > gro_table in the expectation that there would be more packets for
> > > > > > > > > > the same flow. But it could happen that we would never (or for some
> > > > > > > > > > long time) receive any new packets for that flow.
> > > > > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > > > > or delivered too late.
> > > > > > > > > > So we need a mechanism to extract such not-merged packets
> > > > > > > > > > and push them to the upper layer.
> > > > > > > > > > My thought is it would be the application's responsibility to call
> > > > > > > > > > it from time to time. That's actually another reason why I think we
> > > > > > > > > > shouldn't force applications to always use RX callbacks for GRO.
> > > > > > > > >
> > > > > > > > > Currently, we only provide one reassembly function,
> > > > > > > > > rte_gro_reassemble_burst, which merges N input packets at a time.
> > > > > > > > > After finishing processing these packets, it returns all of them and
> > > > > > > > > resets the hashing tables. Therefore, there are no packets in the
> > > > > > > > > hashing tables after rte_gro_reassemble_burst returns.
> > > > > > > >
> > > > > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > > > > So rte_gro_reassemble_burst() performs only inline processing on the
> > > > > > > > current input packets and doesn't try to save packets for future
> > > > > > > > merging, correct?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > Such an approach is indeed much more lightweight and doesn't require
> > > > > > > > any extra timeouts and flush().
> > > > > > > > So my opinion is let's keep it like that - nice and simple.
> > > > > > > > BTW, I think in that case we don't need any hashtables (or any other
> > > > > > > > persistent structures) at all.
> > > > > > > > What we need is just a set of GROs (tcp4, tcp6, etc.) we want to
> > > > > > > > perform on a given array of packets.
> > > > > > >
> > > > > > > Besides the GRO types that are desired to be performed, maybe it also
> > > > > > > needs max_pkt_size and some GRO-type-specific information?
> > > > > >
> > > > > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > > > > Passing something like struct gro_param seems enough.
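To make the stateless burst mode discussed above concrete, here is a rough sketch of how a reassemble-burst call driven only by a gro_param could look. This is a minimal illustration, not the actual DPDK code: struct pkt, struct gro_param, GRO_TCP4 and gro_reassemble_burst are simplified stand-ins for rte_mbuf, the discussed struct gro_param and rte_gro_reassemble_burst, and the flow key is reduced to a single id.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the discussed API. */
struct pkt {
    uint32_t flow_id;     /* stands in for the TCP/IP 4-tuple flow key */
    uint32_t tcp_seq;     /* TCP sequence number */
    uint32_t payload_len; /* payload bytes carried */
    int merged;           /* set when absorbed by an earlier packet */
};

struct gro_param {
    uint32_t desired_gro_types; /* bit mask of GRO types to run */
    uint32_t max_pkt_size;      /* stop merging beyond this size */
};

#define GRO_TCP4 0x1

/* Stateless burst reassembly: scan the burst and merge packet j into
 * packet i when they belong to the same flow and j is the in-order
 * continuation of i. No state survives the call, so no timeout or
 * flush API is needed. Returns the number of packets left after merging. */
static uint32_t
gro_reassemble_burst(struct pkt *p, uint32_t n, const struct gro_param *param)
{
    uint32_t out = n;

    if (!(param->desired_gro_types & GRO_TCP4))
        return n;

    for (uint32_t i = 0; i < n; i++) {
        if (p[i].merged)
            continue;
        for (uint32_t j = i + 1; j < n; j++) {
            if (p[j].merged || p[j].flow_id != p[i].flow_id)
                continue;
            /* merge only contiguous, in-order data within the size cap */
            if (p[j].tcp_seq == p[i].tcp_seq + p[i].payload_len &&
                p[i].payload_len + p[j].payload_len <= param->max_pkt_size) {
                p[i].payload_len += p[j].payload_len;
                p[j].merged = 1;
                out--;
            }
        }
    }
    return out;
}
```

Everything is computed from the current burst; once the function returns nothing is left behind, which is exactly the lightweight behavior agreed on above. The inner scan is the worst-case O(N^2) pairwise comparison discussed further below.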
> > > > > Yes, we can just pass gro_param and allocate hashing tables
> > > > > inside rte_gro_reassemble_burst. If so, hashing tables of
> > > > > desired GRO types are created and freed in each invocation
> > > > > of rte_gro_reassemble_burst. In the GRO library, hashing tables
> > > > > are created by GRO-type-specific gro_tbl_create_fn. These
> > > > > gro_tbl_create_fn may allocate hashing table space via malloc
> > > > > (or rte_malloc). Therefore, we may frequently call malloc/free
> > > > > when using rte_gro_reassemble_burst. In my opinion, it will
> > > > > degrade GRO performance greatly.
> > > >
> > > > I don't understand why we need to put/extract each packet into/from
> > > > a hash table at all.
> > > > We have N input packets that need to be grouped/sorted by some criteria.
> > > > Surely that can be done without any hash-table involved.
> > > > What is the need for a hash table and all the overhead it brings here?
> > >
> > > In the current design, I assume all GRO types use hash tables to merge
> > > packets. The key of the hash table is the criteria to merge packets.
> > > So the main difference between different GRO types' hash tables is the
> > > key definition.
> > >
> > > And the reason for using hash tables is to speed up reassembly. Given
> > > N TCP packets as input, the simplest way to process packets[i]
> > > is to traverse the processed packets[0]~packets[i-1] and try to find one
> > > to merge with. In the worst case, we need to check all of packets[0~i-1].
> > > In this case, the time complexity of processing N packets is O(N^2).
> > > If we use a hash table, whose key is the criteria to merge two packets,
> > > the time to find a packet that may be merged with packets[i] is O(1).
> >
> > I understand that, but add/search inside the hash table,
> > plus resetting it for every burst of packets, doesn't come for free either.
> > I think that for most common burst sizes (< 100 packets)
> > hash overhead would significantly outweigh the price of the
> > worst case O(N^2) scan.
>
> Yes, the packet burst size is always less than 100, which may amplify
> the cost of using hash tables. To avoid the high price, maybe a simpler
> structure, like rte_ip_frag_tbl in the IP reassembly library, is better. And
> actually, I have never tried other designs. In the next version, I will use
> a simpler structure to merge TCP packets and compare the performance
> results. Thanks.
>
> > Also, if such worst case really worries you, you can pre-sort the
> > input packets first before starting the actual reassembly for them.
>
> If the input N packets consist of N1 TCP packets and N2 UDP packets,
> and we pre-sort them, what should they look like after sorting?

My thought was something like that:

|------------------- N1 -------------------|------------------- N2 -------------------|

Konstantin

> > That might help to bring the price down.
> > Konstantin
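The pre-sort idea in the diagram can be implemented as a stable partition over the burst, so the N1 TCP packets end up in front and the N2 others follow. A minimal sketch with hypothetical types (not DPDK's rte_mbuf; the proto field stands in for real header parsing):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative packet stand-in: a protocol tag plus an id used only to
 * check that the relative order of packets is preserved. */
enum proto { PROTO_TCP, PROTO_UDP };

struct pkt {
    enum proto proto;
    uint32_t id;
};

/* Stable partition of the burst: TCP packets first, everything else after,
 * both groups keeping their original order. Returns N1, the number of TCP
 * packets now at the front of p[]. Assumes burst sizes of at most 64. */
static uint32_t
presort_burst(struct pkt *p, uint32_t n)
{
    struct pkt tmp[64]; /* scratch for the non-TCP tail */
    uint32_t n1 = 0, n2 = 0;

    for (uint32_t i = 0; i < n; i++) {
        if (p[i].proto == PROTO_TCP)
            p[n1++] = p[i];   /* compact TCP packets toward the front */
        else
            tmp[n2++] = p[i]; /* stash the rest, in order */
    }
    memcpy(p + n1, tmp, n2 * sizeof(*tmp));
    return n1;
}
```

After the partition, the reassembly routine only needs to scan the first N1 entries, which shrinks the worst-case pairwise comparison from O(N^2) to O(N1^2) and keeps the non-TCP packets out of the merge loop entirely.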