From: "Verma, Shally" <Shally.Verma@cavium.com>
To: "Trahe, Fiona", "dev@dpdk.org"
CC: "Daly, Lee", "Jozwiak, TomaszX", Akhil Goyal, "Sahu, Sunila", "Gupta, Ashish"
Date: Mon, 17 Sep 2018 05:59:50 +0000
Subject: Re: [dpdk-dev] compressdev: append dest data in PMDs instead of in application
List-Id: DPDK patches and discussions

>-----Original Message-----
>From: Trahe, Fiona
>Sent: 16 September 2018 15:38
>To: Verma, Shally; dev@dpdk.org
>Cc: Daly, Lee; Jozwiak, TomaszX; Akhil Goyal; Sahu, Sunila; Gupta, Ashish; Trahe, Fiona
>Subject: RE: compressdev: append dest data in PMDs instead of in application
>
>External Email
>
>Hi Shally,
>
>> -----Original Message-----
>> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
>> Sent: Saturday, September 15, 2018 12:32 PM
>> To: Trahe, Fiona; dev@dpdk.org
>> Cc: Daly, Lee; Jozwiak, TomaszX; Akhil Goyal; Sahu, Sunila; Gupta, Ashish
>> Subject: RE: compressdev: append dest data in PMDs instead of in application
>>
>> Hi Fiona
>>
>> >-----Original Message-----
>> >From: Trahe, Fiona
>> >Sent: 11 September 2018 22:47
>> >To: dev@dpdk.org
>> >Cc: Trahe, Fiona; Verma, Shally; Daly, Lee; Jozwiak, TomaszX; Akhil Goyal
>> >Subject: RFC: compressdev: append dest data in PMDs instead of in application
>> >
>> >External Email
>> >
>> >Proposal:
>> >In compressdev, move the responsibility for appending data in the destination
>> >mbuf from the application to the PMD.
>> >
>> >Why:
>> >Compression operations are all out-of-place and the output size is unknown.
>> >Source and destination mbufs are passed to the PMDs.
>> >There's no problem with the src, but there is with the destination.
>> >The application allocates the dest mbuf from the pool, guesses how much of
>> >the buffer the PMD will write with output and calls rte_pktmbuf_append() for this amount.
>> >No data is actually copied into the dest mbuf by the application.
>> >
>> >The PMD writes output data to the destination mbuf, and returns the amount written in the
>> >op.produced parameter.
>> >
>> >Throughout this the mbuf is not consistent with expectations, i.e. a call to
>> >rte_pktmbuf_data_len() will NOT return the actual amount of data in the buffer. A call to
>> >rte_pktmbuf_append() will NOT add space at the end of the existing data. rte_pktmbuf_tailroom()
>> >will not return how much space is available at the end of the data.
>> >Also, some PMDs, e.g. ISA-L, need scratch space at the end of the output buffer for checksum
>> >calculation. So though the appl has appended a specific amount in the expectation that it
>> >can be used for compressed data, the PMD needs to reduce this by 4 bytes to reserve
>> >space for the checksum - even though there may be space to append another 4 bytes.
>> >
>> I agree this is an issue in the first place. However, before we start on this, I have a scenario and a
>> question based on it (you've probably already thought about it, but I couldn't guess the limitations here).
>>
>> Given all the limitations of the mbuf library (especially on SGLs) with respect to compression op
>> behaviour, and the complexity this is going to add to PMDs, why can't we just take the length of the
>> dst buffer as an input from the app and leave all of the mbuf management to the app, *if it's using an
>> mbuf data buffer*? I.e. consider this scenario:
>>
>> The app allocates an rte_mbuf and calls rte_pktmbuf_tailroom() to set dst.len to the amount of free
>> space. The PMD sets up the HW according to the offset, data pointer and length given.
>> When the op is done, the PMD updates op.produced. The app calls rte_pktmbuf_append() on the mbuf
>> to update m->data_len/pkt_len.
>>
>> Now there are the following possibilities with op.produced:
>> - if op.produced < tailroom size, the app calls rte_pktmbuf_tailroom() again to find the leftover free
>> space and passes offset += op.produced and the new length in the next call
>> - if op.produced == tailroom size, the app allocates a new mbuf and chains it to the previous mbuf,
>> calls rte_pktmbuf_tailroom() on the new mbuf, and sets offset += op.produced and length = free space
>> in the new mbuf. The PMD will now get a chained mbuf at its input and the offset will land in the newly
>> chained buffer.
>>
>> This does, of course, mean the PMD can get a chained mbuf with only the last buffer in the chain
>> available for use.
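A minimal sketch of the post-op bookkeeping in that scenario (illustrative only: it assumes the existing
rte_mbuf/rte_comp_op APIs, uses a plain dst_len variable to stand in for the proposed dst-length input,
and elides enqueue/dequeue and most error handling):

#include <rte_mbuf.h>
#include <rte_comp.h>

/*
 * After an op completes in the app-managed scheme: append op.produced into
 * the dest mbuf and work out the offset/length to pass on the next call.
 * "dst_len" stands in for the proposed (not yet existing) dst length input.
 */
static int
advance_dst(struct rte_comp_op *op, struct rte_mempool *mbuf_pool,
	    uint32_t *dst_len)
{
	struct rte_mbuf *last = rte_pktmbuf_lastseg(op->m_dst);

	/* Make data_len/pkt_len consistent with what the PMD actually wrote. */
	if (rte_pktmbuf_append(op->m_dst, (uint16_t)op->produced) == NULL)
		return -1;
	op->dst.offset += op->produced;

	if (rte_pktmbuf_tailroom(last) == 0) {
		/* op.produced == tailroom: chain a fresh, empty mbuf. */
		struct rte_mbuf *m_new = rte_pktmbuf_alloc(mbuf_pool);

		if (m_new == NULL || rte_pktmbuf_chain(op->m_dst, m_new) < 0)
			return -1;
		last = m_new;
	}
	/* Tell the PMD how much it may write next (leftover or fresh tailroom). */
	*dst_len = rte_pktmbuf_tailroom(last);
	return 0;
}
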
>> This is a limitation, but only as long as we're working with the mbuf library, which was originally
>> designed for other purposes and is not well suited to compression-like operations.
>[Fiona] I like this idea as it keeps all the mbuf mgmt. in the application.
>But it removes the possibility of passing more than 64k-1 in one op, which I think is unacceptable for
>storage purposes.
>As regards the mbuf alternatives you mention below, yes, that's still open. The mbuf external memory
>feature which was recently added may suffice; we wanted to try that out before proposing alternatives.
>I'm focussed on other work for 18.11 at the moment, so will get back to this topic after that release.
[Shally] For storage, in any case, an mbuf data buffer doesn't seem an apt data structure, as it will
always have some limitations.
External buffer attachment may be worth considering though. I would be curious to know whether it
behaves stably.

Thanks
Shally
>
>>
>> Why I believe this is simpler to manage:
>> - It keeps the PMD implementation simple and lives within the limitations/expectations of the mbuf library.
>> - Having a length at the input will also be a requirement if we add any other buffer structure types.
>>
>> Although I am not sure whether it can solve the checksum issue you mentioned, as I haven't fully
>> understood how ISA-L uses the dst buffer for checksum output. Does it write the checksum into the dst
>> buffer for every op, or just when the last chunk of data is processed?
>> We already have a separate checksum variable at input/output. So doesn't it just copy from the dst
>> buffer into the op at output, with op.produced updated with only the data length?
>>
>> I remember you initially started a discussion about adding a new, more generic buffer structure type to
>> use in place of rte_mbufs for the compression library, such as an iovec, which simply takes a pointer to
>> a raw buffer plus a length.
>> Aren't we thinking in that direction anymore? Rather than reworking a library/PMD based on/around
>> mbufs (unless we have a requirement to support mbufs as such), I believe we should define a buffer
>> variant better suited to compression and its users' needs. It would also allow us to define the lib to take
>> multiple dst buffers at the input, either as an array or an SGL.
>>
>>
>> Thanks
>> Shally
>>
>> >It seems more logical that the PMD should take responsibility for appending.
>> >I.e. the application should pass in an empty mbuf, the PMD uses rte_pktmbuf_tailroom()
>> >to know what space is available, uses this space as it needs and appends the output data to
>> >match the actual amount of data it writes.
>> >This method caters for an mbuf already containing some data; in this case the PMD will
>> >append the output AFTER the data already in the mbuf.
>> >It also needs to cater for the SGL aka chained-mbuf case, though I'd propose in this case only
>> >the first mbuf in the chain is allowed to already contain data, the rest must be empty.
>> >An application can adjust a chain to satisfy this condition.
>> >
>> >Code impacts:
>> > * rte_comp_op.dst.offset should be deprecated.
>> > * comments on the m_dst would be changed to describe this usage
>> > * applications should alloc destination mbuf(s) from a pool, but not append.
>> > * PMDs should use append() to correctly reflect the amount of data returned.
>> >
>> >This turns out to be more complex than expected for SGLs as rte_mbuf chains DON'T support
>> >empty mbuf chains, i.e. several macros assume there's only tailroom available in the last
>> >segment of a chain.
>> >So essentially if an application chains a bunch of empty mbufs
>> >together, the only tailroom available is the buf_len of the last mbuf and the space in
>> >all the other mbufs is "lost".
>> >We've investigated several ways around this. I think either option 1 or 2 would be OK, but I
>> >am not keen on option 3. I'm looking for feedback please.
>> >
>> >Option 1: create an rte_comp_mbuf, identical to rte_mbuf - probably just cast to it - with its own set of
>> > macros. Most of these wrap existing mbuf macros, apart from those that need to cater for the
>> > empty-chain case.
>> >Option 2: pass in a pool on the API instead and the PMD allocs dest mbufs as needed and chains them.
>> > Storage customers expect to use external buffers attached to mbufs, so this may not suit their use-case.
>> > Although it may be an option if all mbufs in the pool are pre-attached to external buffers.
>> >Option 3: the appl does as today; the PMD trims the space not used. Would need changes or additions
>> > to mbuf macros so it could trim from more than just the last segment. The PMD would need to free
>> > unused segments.
>> > I'm not keen on this as the work is done in two places, and there's potential
>> > to leak mbufs here, as they are allocated in the API and freed in the PMD.
>> >
>> >Regards,
>> >Fiona
>> >
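
For comparison, a rough sketch of what the PMD-side bookkeeping in the proposal quoted above could
look like for the single-segment case (the helper names and the 4-byte checksum scratch reservation are
illustrative assumptions, not any particular PMD's code):

#include <errno.h>
#include <rte_mbuf.h>
#include <rte_comp.h>

#define CHKSUM_SCRATCH_SZ 4	/* illustrative ISA-L-style scratch reservation */

/* Before submission: how much output space the engine may actually use,
 * derived purely from the tailroom of the (possibly non-empty) dest mbuf. */
static inline uint32_t
pmd_dst_avail(const struct rte_comp_op *op)
{
	uint32_t tailroom = rte_pktmbuf_tailroom(op->m_dst);

	return tailroom > CHKSUM_SCRATCH_SZ ? tailroom - CHKSUM_SCRATCH_SZ : 0;
}

/* On completion: the engine wrote `produced` bytes of output after any data
 * already in the mbuf; append makes data_len/pkt_len reflect that. */
static inline int
pmd_complete_dst(struct rte_comp_op *op, uint32_t produced)
{
	if (rte_pktmbuf_append(op->m_dst, (uint16_t)produced) == NULL)
		return -ENOSPC;	/* engine overran the space it was given */
	op->produced = produced;
	return 0;
}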