DPDK patches and discussions
From: Andrew Rybchenko <arybchenko@solarflare.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	Yongseok Koh <yskoh@mellanox.com>
Cc: "Olivier Matz" <olivier.matz@6wind.com>,
	"Lu, Wenzhuo" <wenzhuo.lu@intel.com>,
	"Wu, Jingjing" <jingjing.wu@intel.com>,
	"Adrien Mazarguil" <adrien.mazarguil@6wind.com>,
	"Nélio Laranjeiro" <nelio.laranjeiro@6wind.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2 1/6] mbuf: add buffer offset field for flexible indirection
Date: Wed, 11 Apr 2018 17:02:50 +0300	[thread overview]
Message-ID: <44a32fef-9087-f0c1-b28f-47b2d6a06995@solarflare.com> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB977258AE913944@IRSMSX102.ger.corp.intel.com>

On 04/11/2018 02:39 PM, Ananyev, Konstantin wrote:
> Hi Yongseok,
>
>>>> On Mon, Apr 09, 2018 at 06:04:34PM +0200, Olivier Matz wrote:
>>>>> Hi Yongseok,
>>>>>
>>>>> On Tue, Apr 03, 2018 at 05:12:06PM -0700, Yongseok Koh wrote:
>>>>>> On Tue, Apr 03, 2018 at 10:26:15AM +0200, Olivier Matz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Mon, Apr 02, 2018 at 11:50:03AM -0700, Yongseok Koh wrote:
>>>>>>>> When attaching an mbuf, the indirect mbuf has to point to the start of the
>>>>>>>> direct mbuf's buffer. By adding a buf_off field to rte_mbuf, this becomes more
>>>>>>>> flexible. An indirect mbuf can point to any part of the direct mbuf by calling
>>>>>>>> rte_pktmbuf_attach_at().
>>>>>>>>
>>>>>>>> Possible use-cases could be:
>>>>>>>> - If a packet has multiple layers of encapsulation, multiple indirect
>>>>>>>>    buffers can reference different layers of the encapsulated packet.
>>>>>>>> - A large direct mbuf can even contain multiple packets in series and
>>>>>>>>    each packet can be referenced by multiple mbuf indirections.
>>>>>>>>
>>>>>>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
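
For illustration, a minimal sketch of how the proposed rte_pktmbuf_attach_at()
could be used for the encapsulation use-case above. The exact signature is
defined by the patch and not shown here, so the (indirect, direct, offset)
form, the "pool" mempool and the "hdr_len" offset are assumptions:

    struct rte_mbuf *outer = rte_pktmbuf_alloc(pool);
    struct rte_mbuf *inner = rte_pktmbuf_alloc(pool);

    /* both indirections reference the same direct mbuf m at different offsets */
    rte_pktmbuf_attach_at(outer, m, 0);        /* outermost encapsulation layer */
    rte_pktmbuf_attach_at(inner, m, hdr_len);  /* inner packet after the outer headers */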
>>>>>>> I think the current API is already able to do what you want.
>>>>>>>
>>>>>>> 1/ Here is a mbuf m with its data
>>>>>>>
>>>>>>>                 off
>>>>>>>                 <-->
>>>>>>>                        len
>>>>>>>            +----+   <---------->
>>>>>>>            |    |
>>>>>>>          +-|----v----------------------+
>>>>>>>          | |    -----------------------|
>>>>>>> m       | buf  |    XXXXXXXXXXX      ||
>>>>>>>          |      -----------------------|
>>>>>>>          +-----------------------------+
>>>>>>>
>>>>>>>
>>>>>>> 2/ clone m:
>>>>>>>
>>>>>>>    c = rte_pktmbuf_alloc(pool);
>>>>>>>    rte_pktmbuf_attach(c, m);
>>>>>>>
>>>>>>>    Note that c has its own offset and length fields.
>>>>>>>
>>>>>>>
>>>>>>>                 off
>>>>>>>                 <-->
>>>>>>>                        len
>>>>>>>            +----+   <---------->
>>>>>>>            |    |
>>>>>>>          +-|----v----------------------+
>>>>>>>          | |    -----------------------|
>>>>>>> m       | buf  |    XXXXXXXXXXX      ||
>>>>>>>          |      -----------------------|
>>>>>>>          +------^----------------------+
>>>>>>>                 |
>>>>>>>            +----+
>>>>>>> indirect  |
>>>>>>>          +-|---------------------------+
>>>>>>>          | |    -----------------------|
>>>>>>> c       | buf  |                     ||
>>>>>>>          |      -----------------------|
>>>>>>>          +-----------------------------+
>>>>>>>
>>>>>>>                  off    len
>>>>>>>                  <--><---------->
>>>>>>>
>>>>>>>
>>>>>>> 3/ remove some data from c without changing m
>>>>>>>
>>>>>>>     rte_pktmbuf_adj(c, 10)   // at head
>>>>>>>     rte_pktmbuf_trim(c, 10)  // at tail
>>>>>>>
>>>>>>>
>>>>>>> Please let me know if it fits your needs.
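
For completeness, a self-contained sketch of that sequence using only the
existing API; "pool" and "m" are assumed to exist already and error checks
are omitted:

    #include <rte_mbuf.h>

    struct rte_mbuf *c = rte_pktmbuf_alloc(pool);   /* will become the clone */

    rte_pktmbuf_attach(c, m);    /* c shares m's buffer, refcnt of m is incremented */
    rte_pktmbuf_adj(c, 10);      /* drop 10 bytes at the head of c only */
    rte_pktmbuf_trim(c, 10);     /* drop 10 bytes at the tail of c only */
    /* m's own data_off/data_len are left untouched */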
>>>>>> No, it doesn't.
>>>>>>
>>>>>> Trimming head and tail with the current APIs removes data and makes the space
>>>>>> available. Adjusting the packet head means giving more headroom, not shifting the
>>>>>> buffer itself. If m has two indirect mbufs (c1 and c2) and those point to
>>>>>> different offsets in m,
>>>>>>
>>>>>> rte_pktmbuf_adj(c1, 10);
>>>>>> rte_pktmbuf_adj(c2, 20);
>>>>>>
>>>>>> then the owner of c2 regards the first (off+20)B as available headroom. If it
>>>>>> wants to attach an outer header, it will overwrite the headroom even though the
>>>>>> owner of c1 is still accessing it. Instead, another mbuf (h1) for the outer
>>>>>> header should be linked by h1->next = c2.
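
A small sketch of that alternative, assuming c2 is an indirect clone of m and
that "pool" and "outer_hdr_len" are placeholders; rte_pktmbuf_chain() performs
the h1->next = c2 linking and updates nb_segs/pkt_len:

    struct rte_mbuf *h1 = rte_pktmbuf_alloc(pool);
    char *hdr = rte_pktmbuf_append(h1, outer_hdr_len);  /* writable room for the header */

    /* ... write the outer header into hdr ... */
    rte_pktmbuf_chain(h1, c2);   /* h1 -> c2, without touching the shared data of m */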
>>>>> Yes, after these operations c1, c2 and m should become read-only. So, to
>>>>> prepend headers, another mbuf has to be inserted before them, as you suggest.
>>>>> It is possible to wrap this in a function rte_pktmbuf_clone_area(m, offset,
>>>>> length) that will:
>>>>>    - alloc and attach an indirect mbuf for each segment of m that is
>>>>>      in the range [offset : offset+length].
>>>>>    - prepend an empty and writable mbuf for the headers
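
A rough sketch of what such a hypothetical rte_pktmbuf_clone_area() wrapper
could look like; the function does not exist in DPDK, a single-segment m is
assumed and error handling is omitted:

    static struct rte_mbuf *
    pktmbuf_clone_area(struct rte_mbuf *m, uint16_t off, uint16_t len,
                       struct rte_mempool *mp)
    {
            struct rte_mbuf *hdr = rte_pktmbuf_alloc(mp); /* empty, writable for headers */
            struct rte_mbuf *c = rte_pktmbuf_alloc(mp);

            rte_pktmbuf_attach(c, m);            /* share m's data (treated as read-only) */
            rte_pktmbuf_adj(c, off);             /* keep only [off : off+len] */
            rte_pktmbuf_trim(c, rte_pktmbuf_data_len(c) - len);
            rte_pktmbuf_chain(hdr, c);           /* writable head segment for new headers */
            return hdr;
    }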
>>>>>
>>>>>> If c1 and c2 are attached by shifting the buffer address via buf_off,
>>>>>> which actually shrinks the headroom, this case can be properly handled.
>>>>> What do you mean by properly handled?
>>>>>
>>>>> Yes, prepending or appending data in the indirect mbuf won't overwrite
>>>>> the direct mbuf. But prepending or appending data in the direct mbuf m
>>>>> won't be protected.
>>>>>
>>>>> From an application point of view, both indirect mbufs and direct mbufs
>>>>> that have refcnt != 1 should be considered read-only because they
>>>>> may share their data. How can an application know whether the data is
>>>>> shared or not?
>>>>>
>>>>> Maybe we need a flag to differentiate mbufs that are read-only
>>>>> (something like SHARED_DATA, or simply READONLY). In your case, if my
>>>>> understanding is correct, you want to have indirect mbufs with RW data.
>>>> Agreed that an indirect mbuf must be treated as read-only; then the current code
>>>> is enough to handle that use-case.
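
A tiny sketch of the heuristic suggested above for deciding whether an mbuf's
data may be shared, and should therefore be treated as read-only (the helper
name is made up):

    static inline int
    mbuf_data_is_shared(const struct rte_mbuf *m)
    {
            /* indirect mbufs and direct mbufs referenced more than once share data */
            return RTE_MBUF_INDIRECT(m) || rte_mbuf_refcnt_read(m) > 1;
    }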
>>>>
>>>>>> And another use-case (this is my actual use-case) is to make a large mbuf hold
>>>>>> multiple packets in series. AFAIK, this will also be helpful for some FPGA NICs
>>>>>> because they transfer multiple packets into a single large buffer to reduce PCIe
>>>>>> overhead for small-packet traffic, like the Multi-Packet Rx of mlx5 does.
>>>>>> Otherwise, packets would have to be memcpy'd to regular mbufs one by one instead
>>>>>> of being referenced indirectly.
>>> But just to make HW RX multiple packets into one mbuf,
>>> data_off inside the indirect mbuf should be enough, correct?
>> Right. The current max buffer length of an mbuf is 64kB (16 bits), but it is enough
>> for mlx5 to reach 100Gbps with 64B traffic (149Mpps). I made mlx5 HW put 16 packets
>> in a buffer, so it needs a ~32kB buffer. Having more bits in the length fields would
>> be better, but 16 bits is good enough to overcome the PCIe Gen3 bottleneck in order
>> to saturate the network link.
> There were a few complaints that the 64KB max is a limitation for some use-cases.
> I am not against increasing it, but I don't think we have free space on the first
> cache-line for that without another big rework of the mbuf layout, considering that
> we would need to increase the size of buf_len, data_off, data_len, and probably
> priv_size too.
>
>>> As I understand it, what you'd like to achieve with this new field is the
>>> ability to manipulate packet boundaries after RX, probably at an upper layer.
>>> As Olivier pointed out above, that doesn't sound like a safe approach, as you have
>>> multiple indirect mbufs trying to modify the same direct buffer.
>> I agree that there's an implication that an indirect mbuf, or an mbuf having
>> refcnt > 1, is read-only. What that means is that all the entities which own such
>> mbufs have to be aware of it and keep to the principle, as DPDK can't enforce the
>> rule and there can't be such a sanity check. In this sense, HW doesn't violate it
>> because the direct mbuf is injected into HW before indirection. When packets are
>> written by HW, the PMD attaches indirect mbufs to the direct mbuf and delivers
>> those to the application layer while freeing the original direct mbuf (decrementing
>> refcnt by 1). So, HW doesn't touch the direct buffer once it reaches the upper layer.
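
Purely for illustration (not the actual mlx5 code), the flow described above
could look roughly like this, with "big" being the large direct mbuf already
filled by HW, and pkt_off[]/pkt_size[] per-packet metadata the PMD is assumed
to get from the completion queue:

    for (i = 0; i < nb_pkts; i++) {
            struct rte_mbuf *seg = rte_pktmbuf_alloc(pool);

            rte_pktmbuf_attach(seg, big);     /* refcnt of big is incremented */
            seg->data_off = pkt_off[i];       /* point at the i-th packet */
            seg->data_len = pkt_size[i];
            seg->pkt_len = pkt_size[i];
            rx_pkts[i] = seg;
    }
    /* drop the PMD's own reference; the buffer goes back to the pool
     * only after every indirect mbuf above has been freed */
    rte_pktmbuf_free(big);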
> Yes, I understand that. But as far as I can see, you introduced functions to adjust
> head and tail, which implies that it should be possible for some entity (the upper
> layer?) to manipulate these indirect mbufs.
> And we don't know exactly how it will be done.
>
>> The direct buffer will be freed and become available for reuse when all the attached
>> indirect mbufs are freed.
>>
>>> Though if you really need to do that, why can't it be achieved by updating the
>>> buf_len and priv_size fields of the indirect mbufs, straight after attach()?
>> Good point.
>> Actually that was my draft (Mellanox internal) version of this patch :-) But I
>> had to consider a case where priv_size is really given by the user. Even though it
>> is less likely, if the original priv_size is quite big, it can't cover the entire
>> buf_len. For this, I would have had to increase priv_size to 32 bits, but adding
>> another 16-bit field (buf_off) looked more plausible.
> As I remember, we can't have mbufs bigger than 64K,
> so priv_size + buf_len should always be less than 64K, correct?

It sounds like the suggestion is to use/customize priv_size to limit the
indirect mbuf range within the direct one. That does not work out of the box,
since priv_size is used to find the direct mbuf from an indirect one
(see rte_mbuf_from_indirect()).
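
For reference, rte_mbuf_from_indirect() in this timeframe is roughly the
following, which is why a customized priv_size would break the reverse lookup:

    static inline struct rte_mbuf *
    rte_mbuf_from_indirect(struct rte_mbuf *mi)
    {
            /* the direct mbuf header is assumed to sit immediately before its
             * data buffer, at a fixed distance of sizeof(mbuf) + priv_size */
            return (struct rte_mbuf *)RTE_PTR_SUB(mi->buf_addr,
                                                  sizeof(*mi) + mi->priv_size);
    }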

Andrew.


Thread overview: 86+ messages
2018-03-10  1:25 [dpdk-dev] [PATCH v1 0/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 1/6] mbuf: add buffer offset field for flexible indirection Yongseok Koh
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 2/6] net/mlx5: separate filling Rx flags Yongseok Koh
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 3/6] net/mlx5: add a function to rdma-core glue Yongseok Koh
2018-03-12  9:13   ` Nélio Laranjeiro
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 4/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-03-12  9:20   ` Nélio Laranjeiro
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 5/6] net/mlx5: release Tx queue resource earlier than Rx Yongseok Koh
2018-03-10  1:25 ` [dpdk-dev] [PATCH v1 6/6] app/testpmd: conserve mbuf indirection flag Yongseok Koh
2018-04-02 18:50 ` [dpdk-dev] [PATCH v2 0/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 1/6] mbuf: add buffer offset field for flexible indirection Yongseok Koh
2018-04-03  8:26     ` Olivier Matz
2018-04-04  0:12       ` Yongseok Koh
2018-04-09 16:04         ` Olivier Matz
2018-04-10  1:59           ` Yongseok Koh
2018-04-11  0:25             ` Ananyev, Konstantin
2018-04-11  5:33               ` Yongseok Koh
2018-04-11 11:39                 ` Ananyev, Konstantin
2018-04-11 14:02                   ` Andrew Rybchenko [this message]
2018-04-11 17:18                     ` Yongseok Koh
2018-04-11 17:08                   ` Yongseok Koh
2018-04-12 16:34                     ` Ananyev, Konstantin
2018-04-12 18:58                       ` Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 2/6] net/mlx5: separate filling Rx flags Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 3/6] net/mlx5: add a function to rdma-core glue Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 4/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 5/6] net/mlx5: release Tx queue resource earlier than Rx Yongseok Koh
2018-04-02 18:50   ` [dpdk-dev] [PATCH v2 6/6] app/testpmd: conserve mbuf indirection flag Yongseok Koh
2018-04-19  1:11 ` [dpdk-dev] [PATCH v3 1/2] mbuf: support attaching external buffer to mbuf Yongseok Koh
2018-04-19  1:11   ` [dpdk-dev] [PATCH v3 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-23 11:53   ` [dpdk-dev] [PATCH v3 1/2] mbuf: support attaching external buffer to mbuf Ananyev, Konstantin
2018-04-24  2:04     ` Yongseok Koh
2018-04-25 13:16       ` Ananyev, Konstantin
2018-04-25 16:44         ` Yongseok Koh
2018-04-25 18:05           ` Ananyev, Konstantin
2018-04-23 16:18   ` Olivier Matz
2018-04-24  1:29     ` Yongseok Koh
2018-04-24 15:36       ` Olivier Matz
2018-04-24  1:38 ` [dpdk-dev] [PATCH v4 " Yongseok Koh
2018-04-24  1:38   ` [dpdk-dev] [PATCH v4 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-24  5:01   ` [dpdk-dev] [PATCH v4 1/2] mbuf: support attaching external buffer to mbuf Stephen Hemminger
2018-04-24 11:47     ` Yongseok Koh
2018-04-24 12:28   ` Andrew Rybchenko
2018-04-24 16:02     ` Olivier Matz
2018-04-24 18:21       ` [dpdk-dev] ***Spam*** " Andrew Rybchenko
2018-04-24 19:15         ` [dpdk-dev] " Olivier Matz
2018-04-24 20:22           ` Thomas Monjalon
2018-04-24 21:53             ` Yongseok Koh
2018-04-24 22:15               ` Thomas Monjalon
2018-04-25  8:21               ` Olivier Matz
2018-04-25 15:06             ` Stephen Hemminger
2018-04-24 23:34           ` Yongseok Koh
2018-04-25 14:45             ` Andrew Rybchenko
2018-04-25 17:40               ` Yongseok Koh
2018-04-25  8:28       ` Olivier Matz
2018-04-25  9:08         ` Yongseok Koh
2018-04-25  9:19           ` Yongseok Koh
2018-04-25 20:00             ` Olivier Matz
2018-04-25 22:54               ` Yongseok Koh
2018-04-24 22:30     ` Yongseok Koh
2018-04-25  2:53 ` [dpdk-dev] [PATCH v5 " Yongseok Koh
2018-04-25  2:53   ` [dpdk-dev] [PATCH v5 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-25 13:31   ` [dpdk-dev] [PATCH v5 1/2] mbuf: support attaching external buffer to mbuf Ananyev, Konstantin
2018-04-25 17:06     ` Yongseok Koh
2018-04-25 17:23       ` Ananyev, Konstantin
2018-04-25 18:02         ` Yongseok Koh
2018-04-25 18:22           ` Yongseok Koh
2018-04-25 18:30             ` Yongseok Koh
2018-04-26  1:10 ` [dpdk-dev] [PATCH v6 " Yongseok Koh
2018-04-26  1:10   ` [dpdk-dev] [PATCH v6 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-26 11:39   ` [dpdk-dev] [PATCH v6 1/2] mbuf: support attaching external buffer to mbuf Ananyev, Konstantin
2018-04-26 16:05   ` Andrew Rybchenko
2018-04-26 16:10     ` Thomas Monjalon
2018-04-26 19:42       ` Olivier Matz
2018-04-26 19:58         ` Thomas Monjalon
2018-04-26 20:07           ` Olivier Matz
2018-04-26 20:24             ` Thomas Monjalon
2018-04-26 17:18     ` Yongseok Koh
2018-04-26 19:45       ` Olivier Matz
2018-04-27  0:01 ` [dpdk-dev] [PATCH v7 " Yongseok Koh
2018-04-27  0:01   ` [dpdk-dev] [PATCH v7 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-27  8:00     ` Andrew Rybchenko
2018-04-27  7:22   ` [dpdk-dev] [PATCH v7 1/2] mbuf: support attaching external buffer to mbuf Andrew Rybchenko
2018-04-27 17:22 ` [dpdk-dev] [PATCH v8 " Yongseok Koh
2018-04-27 17:22   ` [dpdk-dev] [PATCH v8 2/2] app/testpmd: conserve offload flags of mbuf Yongseok Koh
2018-04-27 18:09   ` [dpdk-dev] [PATCH v8 1/2] mbuf: support attaching external buffer to mbuf Thomas Monjalon
