DPDK patches and discussions
From: "Verma, Shally" <Shally.Verma@cavium.com>
To: "Trahe, Fiona" <fiona.trahe@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "Daly, Lee" <lee.daly@intel.com>,
	"Jozwiak, TomaszX" <tomaszx.jozwiak@intel.com>,
	Akhil Goyal <akhil.goyal@nxp.com>,
	"Sahu, Sunila" <Sunila.Sahu@cavium.com>,
	"Gupta, Ashish" <Ashish.Gupta@cavium.com>
Subject: Re: [dpdk-dev] compressdev: append dest data in PMDs instead of in application
Date: Mon, 17 Sep 2018 05:59:50 +0000	[thread overview]
Message-ID: <SN6PR07MB5152A673D4AFAA0E4BD46795F01E0@SN6PR07MB5152.namprd07.prod.outlook.com> (raw)
In-Reply-To: <348A99DA5F5B7549AA880327E580B4358961E13C@IRSMSX101.ger.corp.intel.com>



>-----Original Message-----
>From: Trahe, Fiona <fiona.trahe@intel.com>
>Sent: 16 September 2018 15:38
>To: Verma, Shally <Shally.Verma@cavium.com>; dev@dpdk.org
>Cc: Daly, Lee <lee.daly@intel.com>; Jozwiak, TomaszX <tomaszx.jozwiak@intel.com>; Akhil Goyal <akhil.goyal@nxp.com>; Sahu,
>Sunila <Sunila.Sahu@cavium.com>; Gupta, Ashish <Ashish.Gupta@cavium.com>; Trahe, Fiona <fiona.trahe@intel.com>
>Subject: RE: compressdev: append dest data in PMDs instead of in application
>
>Hi Shally,
>
>> -----Original Message-----
>> From: Verma, Shally [mailto:Shally.Verma@cavium.com]
>> Sent: Saturday, September 15, 2018 12:32 PM
>> To: Trahe, Fiona <fiona.trahe@intel.com>; dev@dpdk.org
>> Cc: Daly, Lee <lee.daly@intel.com>; Jozwiak, TomaszX <tomaszx.jozwiak@intel.com>; Akhil Goyal
>> <akhil.goyal@nxp.com>; Sahu, Sunila <Sunila.Sahu@cavium.com>; Gupta, Ashish
>> <Ashish.Gupta@cavium.com>
>> Subject: RE: compressdev: append dest data in PMDs instead of in application
>>
>> Hi Fiona,
>>
>> >-----Original Message-----
>> >From: Trahe, Fiona <fiona.trahe@intel.com>
>> >Sent: 11 September 2018 22:47
>> >To: dev@dpdk.org
>> >Cc: Trahe, Fiona <fiona.trahe@intel.com>; Verma, Shally <Shally.Verma@cavium.com>; Daly, Lee
>> <lee.daly@intel.com>; Jozwiak,
>> >TomaszX <tomaszx.jozwiak@intel.com>; Akhil Goyal <akhil.goyal@nxp.com>
>> >Subject: RFC: compressdev: append dest data in PMDs instead of in application
>> >
>> >Proposal:
>> >In compressdev, move the responsibility for appending data in the destination
>> >mbuf from the application to the PMD.
>> >
>> >Why:
>> >Compression operations are all out-of-place and the output size is unknown.
>> >Source and destination mbufs are passed to the PMDs.
>> >There's no problem with the src, but there is with the destination.
>> >The application allocates the dest mbuf from the pool, guesses how much of
>> >the buffer the PMD will write with output and calls rte_pktmbuf_append() for this amount.
>> >No data is actually copied into the dest mbuf by the application.
>> >
>> >The PMD writes output data to the destination mbuf, and returns the amount written in the
>> >op.produced parameter.
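>> >
>> >For illustration, on the application side the current contract looks roughly like this (a sketch only;
>> >mbuf_pool, guessed_out_len, dev_id, qp_id and the rest of the op setup are placeholders):
>> >
>> >    struct rte_mbuf *m_dst = rte_pktmbuf_alloc(mbuf_pool);
>> >    /* "append" the guessed output size - no data is copied, only
>> >     * data_len/pkt_len are advanced */
>> >    rte_pktmbuf_append(m_dst, guessed_out_len);
>> >    op->m_dst = m_dst;
>> >    rte_compressdev_enqueue_burst(dev_id, qp_id, &op, 1);
>> >    /* ... after dequeue, only op->produced bytes of m_dst hold
>> >     * valid output */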
>> >
>> >Throughout this, the mbuf is not consistent with expectations: a call to
>> >rte_pktmbuf_data_len() will NOT return the actual amount of data in the buffer, a call to
>> >rte_pktmbuf_append() will NOT add space at the end of the existing data, and rte_pktmbuf_tailroom()
>> >will NOT return how much space is available at the end of the data.
>> >Also, some PMDs, e.g. ISA-L, need scratch space at the end of the output buffer for checksum
>> >calculation. So although the application has appended a specific amount in the expectation that it
>> >can be used for compressed data, the PMD needs to reduce this by 4 bytes to reserve
>> >space for the checksum - even though there may be space to append another 4 bytes.
>> >
>> I agree this is an issue. However, before we start on this, I have a scenario and a question based on
>> it (probably you've already thought about it, but I couldn't see the limitations here).
>>
>> Given all the limitations of the mbuf library (especially around SGLs) with respect to compression-op
>> behaviour, and the complexity this is going to add to PMDs, why can't we just take the length of the dst
>> buffer as input from the app and leave all mbuf management to the app, *if it's using an mbuf data buffer*?
>> I.e. consider this scenario:
>>
>> The app allocates an rte_mbuf and calls rte_pktmbuf_tailroom() to set dst.len to the amount of free space.
>> The PMD sets up the HW according to the offset, data pointer and length given.
>> When the op is done, the PMD updates op.produced, and the app calls rte_pktmbuf_append() on the mbuf to
>> update m->data_len/pkt_len.
>>
>> Now there are the following possibilities with op.produced (sketched below):
>> - if op.produced < tailroom size, the app calls rte_pktmbuf_tailroom() again to find the leftover free
>> space and passes offset += op.produced and the new length in the next call
>> - if op.produced == tailroom size, the app allocates a new mbuf and chains it to the previous mbuf, calls
>> rte_pktmbuf_tailroom() on the new mbuf, and sets offset += op.produced and length = free space in the new
>> mbuf. The PMD will now get a chained mbuf as input, and the offset will land in the newly chained buffer.
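>>
>> A rough sketch of that flow, assuming a new dst.length field is added to rte_comp_op (it does not exist
>> today) and mbuf_pool is a placeholder:
>>
>>     struct rte_mbuf *m_dst = rte_pktmbuf_alloc(mbuf_pool);
>>     uint16_t room = rte_pktmbuf_tailroom(m_dst);
>>     op->m_dst = m_dst;
>>     op->dst.offset = 0;
>>     op->dst.length = room;                   /* proposed field */
>>     /* ... enqueue the op, dequeue it when done ... */
>>     rte_pktmbuf_append(m_dst, op->produced); /* fix up data_len/pkt_len */
>>     if (op->produced < room) {
>>         /* leftover space: reuse it in the next call */
>>         op->dst.offset += op->produced;
>>         op->dst.length = rte_pktmbuf_tailroom(m_dst);
>>     } else {
>>         /* dst full: chain a new mbuf; offset lands in the new segment */
>>         struct rte_mbuf *m_new = rte_pktmbuf_alloc(mbuf_pool);
>>         rte_pktmbuf_chain(m_dst, m_new);
>>         op->dst.offset += op->produced;
>>         op->dst.length = rte_pktmbuf_tailroom(m_new);
>>     }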
>>
>> This, of course, means the PMD can receive a chained mbuf, but with only the last buffer in the chain
>> available for use. That is a limitation, but only for as long as we're working with the mbuf library, which
>> was originally designed for other purposes and is not well suited to compression-like operations.
>[Fiona] I like this idea as it keeps all the mbuf mgmt. in the application.
>But it removes the possibility of passing more than 64k-1 bytes in one op (a single mbuf segment's
>data_len is a 16-bit field), which I think is unacceptable for storage purposes.
>As regards the mbuf alternatives you mention below, yes, that's still open. The mbuf external memory
>feature which was recently added may suffice; we wanted to try that out before proposing alternatives.
>I'm focussed on other work for 18.11 at the moment, so will get back to this topic after that release.

[Shally] For storage, in any case, an mbuf data buffer doesn't seem an apt data structure, as it will always
have some limitations. External buffer attachment may be worth considering though. I would be curious to know
whether it behaves stably.
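
For reference, attaching an external buffer (the rte_mbuf external-buffer API added in DPDK 18.05) looks
roughly like the sketch below; ext_addr, ext_iova, ext_len, ext_free_cb and mbuf_pool are placeholders for
the application's own buffer details:

    struct rte_mbuf *m = rte_pktmbuf_alloc(mbuf_pool);
    uint16_t buf_len = ext_len;
    /* shinfo (refcount + free callback) is placed at the end of the
     * external buffer; buf_len is reduced accordingly */
    struct rte_mbuf_ext_shared_info *shinfo =
        rte_pktmbuf_ext_shinfo_init_helper(ext_addr, &buf_len,
                                           ext_free_cb, NULL);
    rte_pktmbuf_attach_extbuf(m, ext_addr, ext_iova, buf_len, shinfo);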

Thanks
Shally

>
>>
>> Why I believe this is simpler to manage:
>> - It keeps the PMD implementation simple and lives within the limitations/expectations of the mbuf library.
>> - Having the length as an input will also be a requirement if we add any other buffer structure types.
>>
>> Although I am not sure if it can solve the checksum issue you mentioned, as I don't fully understand how
>> ISA-L uses the dst buffer for checksum output. Does it write the checksum into the dst buffer for every op,
>> or just when the last chunk of data is processed?
>> We already have a separate checksum variable at input/output. So doesn't it just get copied from the dst
>> buffer into the op at output, with op.produced updated with only the data length?
>>
>> I remember you initially started a discussion about adding a new, more generic buffer structure type to
>> use in place of rte_mbufs in the compression library, such as an iovec, which simply takes pointers to raw
>> buffers with lengths.
>> Aren't we thinking in that direction anymore? I believe that rather than reworking a library/PMDs based
>> on/around mbufs (unless we have a requirement to support mbufs), we should define a buffer variant better
>> suited to compression and its users' needs. It would also allow us to define the lib to take multiple dst
>> buffers as input, either as an array or an SGL.
>>
>>
>> Thanks
>> Shally
>>
>> >It seems more logical that the PMD should take responsibility for appending.
>> >I.e. the application should pass in an empty mbuf; the PMD uses rte_pktmbuf_tailroom()
>> >to know what space is available, uses this space as it needs and appends the output data to
>> >match the actual amount of data it writes.
>> >This method caters for an mbuf already containing some data; in this case the PMD will
>> >append the output AFTER the data already in the mbuf.
>> >It also needs to cater for the SGL, aka chained-mbuf, case, though I'd propose that in this case
>> >only the first mbuf in the chain is allowed to already contain data; the rest must be empty.
>> >An application can adjust a chain to satisfy this condition.
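>> >
>> >On the PMD side this would look roughly like the sketch below (do_compress() is a placeholder for the
>> >device-specific work):
>> >
>> >    struct rte_mbuf *m_dst = op->m_dst;
>> >    /* write output after any data already in the mbuf */
>> >    uint8_t *dst = rte_pktmbuf_mtod_offset(m_dst, uint8_t *,
>> >                                           rte_pktmbuf_data_len(m_dst));
>> >    uint16_t room = rte_pktmbuf_tailroom(m_dst);
>> >    uint32_t written = do_compress(op, dst, room);
>> >    /* append exactly what was produced so data_len/pkt_len are right */
>> >    rte_pktmbuf_append(m_dst, written);
>> >    op->produced = written;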
>> >
>> >Code impacts:
>> > * rte_comp_op.dst.offset should be deprecated.
>> > * comments on the m_dst would be changed to describe this usage
>> > * applications should alloc destination mbuf(s) from a pool, but not append.
>> > * PMDs should use append() to correctly reflect the amount of data returned.
>> >
>> >This turns out to be more complex than expected for SGLs, as rte_mbuf chains DON'T support
>> >empty mbuf chains, i.e. several macros assume there's only tailroom available in the last
>> >segment of a chain. So essentially, if an application chains a bunch of empty mbufs
>> >together, the only tailroom available is the buf_len of the last mbuf, and the space in
>> >all the other mbufs is "lost", as the sketch below illustrates.
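>> >
>> >For example, with two freshly allocated (empty) mbufs of 2KB each (a minimal illustration):
>> >
>> >    rte_pktmbuf_chain(m0, m1);              /* chain of two empty mbufs */
>> >    /* append() only looks at the last segment, so anything larger than
>> >     * m1's tailroom fails and m0's ~2KB of space is unusable */
>> >    char *p = rte_pktmbuf_append(m0, 3000); /* returns NULL */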
>> >We've investigated several ways around this. I think either option 1 or 2 would be OK, but
>> >I am not keen on option 3. I'm looking for feedback please.
>> >
>> >Option 1: create an rte_comp_mbuf, identical to rte_mbuf - probably just cast to it - with its own set of
>> >   macros. Most of these would wrap existing mbuf macros; only those that need to cater for the
>> >   empty-chain case would differ.
>> >Option 2: pass in a pool on the API instead and let the PMD alloc dest mbufs as needed and chain them.
>> >   Storage customers expect to use external buffers attached to mbufs, so this may not suit their
>> >   use-case, although it may be an option if all mbufs in the pool are pre-attached to external buffers.
>> >Option 3: the application does as today; the PMD trims the space not used. This would need changes or
>> >   additions to mbuf macros so it could trim from more than just the last segment, and the PMD would
>> >   need to free unused segments.
>> >   I'm not keen on this as it does the work in two places, and there's potential
>> >   to leak mbufs since they're allocated in the API and freed in the PMD.
>> >
>> >Regards,
>> >Fiona
>> >


Thread overview: 4+ messages
2018-09-11 17:16 [dpdk-dev] RFC: " Trahe, Fiona
2018-09-15 11:31 ` [dpdk-dev] " Verma, Shally
2018-09-16 10:07   ` Trahe, Fiona
2018-09-17  5:59     ` Verma, Shally [this message]
