From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <fiona.trahe@intel.com>
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
 by dpdk.org (Postfix) with ESMTP id 16ED62BFA
 for <dev@dpdk.org>; Tue, 11 Sep 2018 19:17:14 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 11 Sep 2018 10:17:13 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.53,361,1531810800"; d="scan'208";a="85027904"
Received: from irsmsx102.ger.corp.intel.com ([163.33.3.155])
 by fmsmga002.fm.intel.com with ESMTP; 11 Sep 2018 10:16:57 -0700
Received: from irsmsx101.ger.corp.intel.com ([169.254.1.40]) by
 IRSMSX102.ger.corp.intel.com ([169.254.2.180]) with mapi id 14.03.0319.002;
 Tue, 11 Sep 2018 18:16:57 +0100
From: "Trahe, Fiona" <fiona.trahe@intel.com>
To: "dev@dpdk.org" <dev@dpdk.org>
CC: "Trahe, Fiona" <fiona.trahe@intel.com>, "Verma, Shally"
 <Shally.Verma@cavium.com>, "Daly, Lee" <lee.daly@intel.com>, "Jozwiak,
 TomaszX" <tomaszx.jozwiak@intel.com>, Akhil Goyal <akhil.goyal@nxp.com>
Thread-Topic: RFC: compressdev: append dest data in PMDs instead of in
 application
Thread-Index: AdRJ75jacUBhpLmbTpmqMj+SnYHmWw==
Date: Tue, 11 Sep 2018 17:16:55 +0000
Message-ID: <348A99DA5F5B7549AA880327E580B43589619BC1@IRSMSX101.ger.corp.intel.com>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiODc1YWYwODUtOTkxNS00NzMwLThlNDctOTYwYjYyMjUwZTE5IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiWDFYTHE3a2poMmxBdUR1ZWNcL0xsR3FhQjBZTTJsVlZ5akFOZnF2cU00cnR2a3BXMEZPbjRENTA3QWdFS3k5bXAifQ==
x-ctpclassification: CTP_NT
dlp-product: dlpe-windows
dlp-version: 11.0.400.15
dlp-reaction: no-action
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: [dpdk-dev] RFC: compressdev: append dest data in PMDs instead of in
	application
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Sep 2018 17:17:15 -0000

Proposal:
In compressdev, move the responsibility for appending data in the destinati=
on
mbuf from the application to the PMD.

Why:
Compression operations are all out-of-place and the output size is unknown.
Source and destination mbufs are passed to the PMDs.
There's no problem with the src, but there is with the destination.
The application allocates the dest mbuf from the pool, guesses how much of
the buffer the PMD will write with output and calls rte_pktmbuf_append() fo=
r this amount.=20
No data is actually copied into the dest mbuf by the application.
=20
The PMD writes output data to the destination mbuf, and returns the amount =
written in the=20
op.produced parameter.

Throughout this the mbuf is not consistent with expectations, i.e. a call t=
o
rte_pktmbuf_data_len() will NOT return the actual amount of data in the buf=
fer. A call to
rte_pktmbuf_append() will NOT add space at end of existing data. rte_pktmbu=
f_tailroom()
 will not return how much space is available at the end of the data.
Also, some PMDs, e.g. ISA-L, need scratch space at end of the output buffer=
 for checksum
calculation. So though the appl has appended a specific amount in the expec=
tation that it
can be used for compressed data, the PMD needs to reduce this by 4 bytes to=
 reserve
space for the checksum - even though there may be space to append another 4=
bytes.

It seems more logical that the PMD should take responsibility for appending=
.
I.e. the application should pass in an empty mbuf, the PMD uses the rte_pkt=
mbuf_tailroom()
to know what space is available, uses this space as it needs and appends th=
e output data to
match the actual amount of data it writes.
This method caters for an mbuf already containing some data, in this case t=
he PMD will
append the output AFTER the data already in the mbuf.
It also needs to cater for SGL aka chained_mbuf case, though I'd propose in=
 this case only
the first mbuf in the chain is allowed to already contain data, the rest mu=
st be empty.
An application can adjust a chain to satisfy this condition.

Code impacts:
 * rte_comp_op.dst.offset should be deprecated.
 * comments on the m_dst would be changed to describe this usage
 * applications should alloc destination mbuf(s) from a pool, but not appen=
d.
 * PMDs should use append() to correctly reflect the amount of data returne=
d.

This turns out to be more complex than expected for SGLs as rte_mbuf chains=
 DON'T support
empty mbuf chains, i.e. several macros assume there's only tailroom availab=
le in the last=20
segment of a chain. So essentially if an application chains a bunch of empt=
y mbufs
together the only tailroom available is the buf_len of the last mbuf and th=
e space in
all the other mbufs is "lost".
We've investigated several ways around this, I think either options 1 or 2 =
would be ok, but
am not keen on option 3. I'm looking for feedback please.

Option1: create an rte_comp_mbuf, identical to rte_mbuf - probably just cas=
t to it - with its own set of
   macros.  Most of these wrap existing mbuf macros, those that need to cat=
er for the empty chain case.
Option2: pass in a pool on the API instead and the PMD allocs dest mbufs as=
 needed and chains them.
   Storage customers expect to use external buffers attached to mbufs so th=
is may not suit their use-case.
   Although it may be an option if all mbufs in the pool are pre-attached t=
o external buffers.
Option3: appl does as today. PMD trims space not used. Would need changes o=
r additions to mbuf macros
   so it could trim from more than just the last segment. PMD would need to=
 free unused segments.=20
   I'm not keen on this as doing the work in 2 places and there's potential
   to leak mbufs here as allocating them in API and freeing in PMD.

Regards,
Fiona