From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Trahe, Fiona"
To: "Verma, Shally", "dev@dpdk.org"
CC: "Daly, Lee", "Jozwiak, TomaszX", Akhil Goyal, "Sahu, Sunila", "Gupta, Ashish", "Trahe, Fiona"
Thread-Topic: compressdev: append dest data in PMDs instead of in application
Date: Sun, 16 Sep 2018 10:07:42 +0000
Message-ID: <348A99DA5F5B7549AA880327E580B4358961E13C@IRSMSX101.ger.corp.intel.com>
References: <348A99DA5F5B7549AA880327E580B43589619BC1@IRSMSX101.ger.corp.intel.com>
Subject: Re: [dpdk-dev] compressdev: append dest data in PMDs instead of in application
List-Id: DPDK patches and discussions

Hi Shally,

> -----Original Message-----
> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
> Sent: Saturday, September 15, 2018 12:32 PM
> To: Trahe, Fiona; dev@dpdk.org
> Cc: Daly, Lee; Jozwiak, TomaszX; Akhil Goyal; Sahu, Sunila; Gupta, Ashish
> Subject: RE: compressdev: append dest data in PMDs instead of in application
>
> Hi Fiona
>
> >-----Original Message-----
> >From: Trahe, Fiona
> >Sent: 11 September 2018 22:47
> >To: dev@dpdk.org
> >Cc: Trahe, Fiona; Verma, Shally; Daly, Lee; Jozwiak, TomaszX; Akhil Goyal
> >Subject: RFC: compressdev: append dest data in PMDs instead of in application
> >
> >External Email
> >
> >Proposal:
> >In compressdev, move the responsibility for appending data in the destination
> >mbuf from the application to the PMD.
> >
> >Why:
> >Compression operations are all out-of-place and the output size is unknown.
> >Source and destination mbufs are passed to the PMDs.
> >There's no problem with the src, but there is with the destination.
> >The application allocates the dest mbuf from the pool, guesses how much of
> >the buffer the PMD will write with output and calls rte_pktmbuf_append() for this amount.
> >No data is actually copied into the dest mbuf by the application.
> >
> >The PMD writes output data to the destination mbuf, and returns the amount written in the
> >op.produced parameter.
> >
> >Throughout this the mbuf is not consistent with expectations, i.e. a call to
> >rte_pktmbuf_data_len() will NOT return the actual amount of data in the buffer. A call to
> >rte_pktmbuf_append() will NOT add space at the end of existing data. rte_pktmbuf_tailroom()
> >will not return how much space is available at the end of the data.
> >Also, some PMDs, e.g. ISA-L, need scratch space at the end of the output buffer for checksum
> >calculation. So though the appl has appended a specific amount in the expectation that it
> >can be used for compressed data, the PMD needs to reduce this by 4 bytes to reserve
> >space for the checksum - even though there may be space to append another 4 bytes.
> >
> Agreed, this is an issue. However, before we start on this, I have a scenario and a question based
> on it (probably you've already thought of it, but I couldn't guess the limitations here).
>
> Given all the limitations of the mbuf library (especially on SGLs), the compression op behaviour, and the
> complexity it's going to add to the PMD, why can't we just input the length of the dst buffer from the app
> and leave all of the mbuf management to the app, *if it's using an mbuf data buffer*? I.e. consider this scenario:
>
> The app allocs an rte_mbuf and calls rte_pktmbuf_tailroom() to update dst.len with the amount of free space. The
> PMD sets up the HW according to the offset, data pointer and length given.
> When the op is done, the PMD updates op.produced. The app calls rte_pktmbuf_append() on the mbuf to update
> m->data/pkt_len.
>
> Now there are the following possibilities with op.produced:
> - if op.produced < tailroom size, the app calls tailroom again to find the leftover free space and passes
> offset += op.produced and the new length in the next call
> - if op.produced == tailroom size, the app allocs a new mbuf and chains it to the previous mbuf, calls tailroom
> on the new mbuf, and sets offset += op.produced and length = free space in the new mbuf. The PMD will now have
> a chained mbuf at input and the offset will end up in the newly chained buffer.
>
> This, of course, means the PMD can have a chained mbuf, but with only the last buffer in the chain available for use.
> This is a limitation, but only as long as we're working with the mbuf library, which was originally designed for
> other purposes and is not so well suited to compression-like operations.
[Fiona] I like this idea as it keeps all the mbuf mgmt. in the application.
But it removes the possibility of passing more than 64k-1 in one op, which I think is unacceptable for storage purposes.
As regards the mbuf alternatives you mention below, yes, that's still open. The mbuf external memory feature
which was recently added may suffice; we wanted to try that out before proposing alternatives.
I'm focussed on other work for 18.11 at the moment, so will get back to this topic after that release.
>
> Why I believe it is simpler to manage:
> - It keeps the PMD implementation simple and lives with the limitations/expectations of the mbuf library
> - Having the length at input will also be a requirement if we add any other buffer structure types.
>
> Although I am not sure if it can solve the checksum issue you mentioned, as I don't fully understand how ISA-L
> uses the dst buffer for checksum o/p. Does it write the checksum into the dst buffer for every op? Or just when
> the last chunk of data is processed?
> We already have a separate checksum variable at input/for output. So doesn't it just copy from the dst buffer
> to the op at output, with op.produced updated with only the data length?
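The tailroom/append/chain flow described above can be sketched with a minimal self-contained model. These are simplified stand-ins (the names toy_mbuf, toy_tailroom, toy_append and maybe_chain are hypothetical), NOT the real rte_mbuf struct or the rte_pktmbuf_tailroom()/rte_pktmbuf_append() API:

```c
/* Toy model of the app-managed flow: query free space, submit an
 * explicit length, account for op.produced afterwards, and chain a
 * fresh segment when the previous one is full. Simplified stand-ins,
 * not real DPDK structures or APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_mbuf {
    uint16_t data_len;        /* valid data currently in the buffer */
    uint16_t buf_len;         /* total buffer capacity */
    struct toy_mbuf *next;    /* chain link */
};

/* Stand-in for rte_pktmbuf_tailroom(): free space after the data. */
static uint16_t toy_tailroom(const struct toy_mbuf *m)
{
    return (uint16_t)(m->buf_len - m->data_len);
}

/* App-side step after the op completes: advance data_len by
 * op.produced so the mbuf is consistent again (stand-in for
 * rte_pktmbuf_append()). */
static void toy_append(struct toy_mbuf *m, uint16_t produced)
{
    assert(produced <= toy_tailroom(m));
    m->data_len = (uint16_t)(m->data_len + produced);
}

/* If the op filled the whole tailroom, chain a fresh mbuf so the
 * next op's offset lands in the new segment (second bullet above). */
static struct toy_mbuf *maybe_chain(struct toy_mbuf *m, struct toy_mbuf *fresh)
{
    if (toy_tailroom(m) == 0) {
        m->next = fresh;
        return fresh;      /* next dst segment to submit */
    }
    return m;              /* keep writing into the same segment */
}
```

With a 1024-byte dst segment and op.produced == 1024, toy_tailroom() drops to 0 and maybe_chain() attaches the fresh segment, matching the op.produced == tailroom case above.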
>
> I remember, initially you had started a discussion to add a new and more generic buffer structure type to use
> in place of rte_mbufs for the compression library, such as an iovec which simply inputs pointers to raw bufs
> with lengths.
> Aren't we thinking in that direction anymore? I believe rather than reworking a library/PMD based/around
> mbufs (unless we have a requirement to support mbufs), we should define a buffer variant to better suit
> compression and its users' needs. It would also allow us to define the lib to input multiple dst buffers
> either as an array or an SGL.
>
>
> Thanks
> Shally
>
> >It seems more logical that the PMD should take responsibility for appending.
> >I.e. the application should pass in an empty mbuf, the PMD uses rte_pktmbuf_tailroom()
> >to know what space is available, uses this space as it needs and appends the output data to
> >match the actual amount of data it writes.
> >This method caters for an mbuf already containing some data; in this case the PMD will
> >append the output AFTER the data already in the mbuf.
> >It also needs to cater for the SGL aka chained-mbuf case, though I'd propose in this case only
> >the first mbuf in the chain is allowed to already contain data; the rest must be empty.
> >An application can adjust a chain to satisfy this condition.
> >
> >Code impacts:
> > * rte_comp_op.dst.offset should be deprecated.
> > * comments on the m_dst would be changed to describe this usage
> > * applications should alloc destination mbuf(s) from a pool, but not append.
> > * PMDs should use append() to correctly reflect the amount of data returned.
> >
> >This turns out to be more complex than expected for SGLs as rte_mbuf chains DON'T support
> >empty mbuf chains, i.e. several macros assume there's only tailroom available in the last
> >segment of a chain.
> >So essentially if an application chains a bunch of empty mbufs
> >together, the only tailroom available is the buf_len of the last mbuf and the space in
> >all the other mbufs is "lost".
> >We've investigated several ways around this. I think either option 1 or 2 would be ok, but
> >am not keen on option 3. I'm looking for feedback please.
> >
> >Option 1: create an rte_comp_mbuf, identical to rte_mbuf - probably just cast to it - with its own set of
> > macros. Most of these would wrap existing mbuf macros, apart from those that need to cater for the
> > empty-chain case.
> >Option 2: pass in a pool on the API instead and the PMD allocs dest mbufs as needed and chains them.
> > Storage customers expect to use external buffers attached to mbufs, so this may not suit their use-case.
> > Although it may be an option if all mbufs in the pool are pre-attached to external buffers.
> >Option 3: appl does as today. PMD trims the space not used. Would need changes or additions to mbuf macros
> > so it could trim from more than just the last segment. PMD would need to free unused segments.
> > I'm not keen on this as it does the work in 2 places, and there's potential
> > to leak mbufs here, allocating them in the API and freeing in the PMD.
> >
> >Regards,
> >Fiona
> >
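P.S. For illustration only, a rough self-contained sketch of the PMD-side append idea from the RFC, including the checksum scratch reservation mentioned for ISA-L. Again these are simplified stand-ins (toy_mbuf, pmd_write_and_append are hypothetical names), not the real rte_mbuf struct or rte_pktmbuf_* API:

```c
/* Toy model of PMD-side append: the PMD discovers usable space via
 * tailroom, reserves checksum scratch at the buffer end, caps the
 * output, and appends exactly what it produced so data_len stays
 * consistent for the application. Not real DPDK code. */
#include <assert.h>
#include <stdint.h>

#define CKSUM_SCRATCH 4u   /* e.g. ISA-L checksum scratch at the buffer end */

struct toy_mbuf {
    uint16_t data_len;     /* valid data currently in the buffer */
    uint16_t buf_len;      /* total buffer capacity */
};

/* Returns the value that would go into op.produced. */
static uint16_t pmd_write_and_append(struct toy_mbuf *m, uint16_t out_bytes)
{
    uint16_t tail = (uint16_t)(m->buf_len - m->data_len);
    uint16_t usable = tail > CKSUM_SCRATCH ? (uint16_t)(tail - CKSUM_SCRATCH) : 0;
    uint16_t written = out_bytes < usable ? out_bytes : usable;

    m->data_len = (uint16_t)(m->data_len + written);  /* PMD does the append */
    return written;
}
```

So with a 100-byte buffer already holding 50 bytes, a further op can produce at most 46 bytes: the 4-byte scratch reservation is subtracted even though the raw tailroom is 50, which is exactly the inconsistency the RFC describes.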