Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
@ 2019-07-12 11:37 Jerin Jacob Kollanukkaran
  2019-07-12 12:09 ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-12 11:37 UTC (permalink / raw)
  To: Burakov, Anatoly, Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Friday, July 12, 2019 4:19 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru
> <vattunuru@marvell.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com
> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> On 12-Jul-19 11:26 AM, Jerin Jacob Kollanukkaran wrote:
> >>>> What do you think?
> >>>
> >>> IMO, If possible we can avoid extra indirection of new config. In
> >>> worst case We can add it. How about following to not have new config
> >>>
> >>> 1) Make MEMPOOL_F_NO_PAGE_BOUND  as default
> >>> http://patches.dpdk.org/patch/55277/
> >>> There is absolutely zero overhead of this flag considering the huge
> >>> page size are minimum 2MB. Typically 512MB or 1GB.
> >>> Any one has any objection?
> >>
> >> Pretty much zero overhead in hugepage case, not so in non-hugepage
> case.
> >> It's rare, but since we support it, we have to account for it.
> >
> > That is a fair concern.
> > How about enable the flag in mempool ONLY when
> rte_eal_has_hugepages()
> > In the common layer?
> 
> Perhaps it's better to check page size of the underlying memory, because 4K
> pages are not necessarily no-huge mode - they could also be external
> memory. That's going to be a bit hard because there may not be a way to
> know which memory we're allocating from in advance, aside from simple
> checks like `(rte_eal_has_hugepages() ||
> rte_malloc_heap_socket_is_external(socket_id))` - but maybe those would
> be sufficient.

Yes.


> 
> >
> >> (also, i don't really like the name NO_PAGE_BOUND since in memzone
> >> API there's a "bounded memzone" allocation API, and this flag's name
> >> reads like objects would not be bounded by page size, not that they
> >> won't cross page
> >> boundary)
> >
> > No strong opinion for the name. What name you suggest?
> 
> How about something like MEMPOOL_F_NO_PAGE_SPLIT?

Looks good to me.

In summary, the changes wrt the existing patch:
- Rename NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
- Set this flag in rte_pktmbuf_pool_create() when
  (rte_eal_has_hugepages() ||
   rte_malloc_heap_socket_is_external(socket_id))

Olivier, any objection?
Ref: http://patches.dpdk.org/patch/55277/
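
As a rough illustration of the overhead concern quoted above, the sketch
below computes how much of each page is left unused when objects are not
allowed to cross a page boundary. The helper names and the 2432-byte object
size are assumptions made for the example; they are not DPDK APIs or values
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes left unused at the end of each page when objects of obj_sz bytes
 * are packed back to back and never cross a page boundary.
 * Hypothetical helper for illustration only. */
static size_t
wasted_bytes_per_page(size_t pg_sz, size_t obj_sz)
{
	assert(obj_sz > 0 && obj_sz <= pg_sz);
	return pg_sz % obj_sz;
}

/* The same waste expressed in parts per thousand of the page size. */
static size_t
wasted_permille(size_t pg_sz, size_t obj_sz)
{
	return wasted_bytes_per_page(pg_sz, obj_sz) * 1000 / pg_sz;
}
```

With the assumed 2432-byte object, a 4K page loses 1664 bytes (about 40% of
the page), while a 2MB page loses 768 bytes, which is negligible. That is
the gap between the hugepage and non-hugepage cases discussed above.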

> 
> >
> >>
> >>>
> >>> 2) Introduce rte_kni_mempool_create() API in kni lib to abstract the
> >>> Mempool requirement for KNI. This will enable portable KNI
> applications.
> >>
> >> This means that using KNI is not a drop-in replacement for any other
> >> PMD. If maintainers of KNI are OK with this then sure :)
> >
> > The PMD  don’t have any dependency on NO_PAGE_BOUND flag. Right?
> > If KNI app is using rte_kni_mempool_create() to create the mempool, In
> > what case do you see problem with specific PMD?
> 
> I'm not saying the PMD's have a dependency on the flag, i'm saying that the
> same code cannot be used with and without KNI because you need to call a
> separate API for mempool creation if you want to use it with KNI.

Yes. Applications would need to call the newly introduced API from 19.08
onwards if we do not choose the first approach above. It can be documented
under "API changes" in the release notes. I prefer the first solution if
there is no downside.


> For KNI, the underlying memory must abide by certain constraints that are
> not there for other PMD's, so either you fix all memory to these constraints,
> or you lose the ability to reuse the code with other PMD's as is.
> 
> That is, unless i'm grossly misunderstanding what you're suggesting here :)
> 
> >
> >>
> >> --
> >> Thanks,
> >> Anatoly
> 
> 
> --
> Thanks,
> Anatoly

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-12 11:37 [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI Jerin Jacob Kollanukkaran
@ 2019-07-12 12:09 ` Burakov, Anatoly
  2019-07-12 12:28   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2019-07-12 12:09 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

On 12-Jul-19 12:37 PM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Friday, July 12, 2019 4:19 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
>> <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru
>> <vattunuru@marvell.com>; dev@dpdk.org
>> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com
>> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>> On 12-Jul-19 11:26 AM, Jerin Jacob Kollanukkaran wrote:
>>>>>> What do you think?
>>>>>
>>>>> IMO, If possible we can avoid extra indirection of new config. In
>>>>> worst case We can add it. How about following to not have new config
>>>>>
>>>>> 1) Make MEMPOOL_F_NO_PAGE_BOUND  as default
>>>>> http://patches.dpdk.org/patch/55277/
>>>>> There is absolutely zero overhead of this flag considering the huge
>>>>> page size are minimum 2MB. Typically 512MB or 1GB.
>>>>> Any one has any objection?
>>>>
>>>> Pretty much zero overhead in hugepage case, not so in non-hugepage
>> case.
>>>> It's rare, but since we support it, we have to account for it.
>>>
>>> That is a fair concern.
>>> How about enable the flag in mempool ONLY when
>> rte_eal_has_hugepages()
>>> In the common layer?
>>
>> Perhaps it's better to check page size of the underlying memory, because 4K
>> pages are not necessarily no-huge mode - they could also be external
>> memory. That's going to be a bit hard because there may not be a way to
>> know which memory we're allocating from in advance, aside from simple
>> checks like `(rte_eal_has_hugepages() ||
>> rte_malloc_heap_socket_is_external(socket_id))` - but maybe those would
>> be sufficient.
> 
> Yes.
> 
> 
>>
>>>
>>>> (also, i don't really like the name NO_PAGE_BOUND since in memzone
>>>> API there's a "bounded memzone" allocation API, and this flag's name
>>>> reads like objects would not be bounded by page size, not that they
>>>> won't cross page
>>>> boundary)
>>>
>>> No strong opinion for the name. What name you suggest?
>>
>> How about something like MEMPOOL_F_NO_PAGE_SPLIT?
> 
> Looks good to me.
> 
> In summary, Change wrt existing patch"
> - Change NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
> - Set this flag in  rte_pktmbuf_pool_create() when rte_eal_has_hugepages() ||
>   rte_malloc_heap_socket_is_external(socket_id))

If we are to have a special KNI allocation API, would we even need that?

> 
> Olivier, Any objection?
> Ref: http://patches.dpdk.org/patch/55277/
> 



-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-12 12:09 ` Burakov, Anatoly
@ 2019-07-12 12:28   ` Jerin Jacob Kollanukkaran
  2019-07-15  4:54     ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 23+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-12 12:28 UTC (permalink / raw)
  To: Burakov, Anatoly, Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Friday, July 12, 2019 5:40 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru
> <vattunuru@marvell.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com
> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 12-Jul-19 12:37 PM, Jerin Jacob Kollanukkaran wrote:
> >> -----Original Message-----
> >> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> >> Sent: Friday, July 12, 2019 4:19 PM
> >> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
> >> <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru
> >> <vattunuru@marvell.com>; dev@dpdk.org
> >> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com
> >> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in
> >> KNI On 12-Jul-19 11:26 AM, Jerin Jacob Kollanukkaran wrote:
> >>>>>> What do you think?
> >>>>>
> >>>>> IMO, If possible we can avoid extra indirection of new config. In
> >>>>> worst case We can add it. How about following to not have new
> >>>>> config
> >>>>>
> >>>>> 1) Make MEMPOOL_F_NO_PAGE_BOUND  as default
> >>>>> http://patches.dpdk.org/patch/55277/
> >>>>> There is absolutely zero overhead of this flag considering the
> >>>>> huge page size are minimum 2MB. Typically 512MB or 1GB.
> >>>>> Any one has any objection?
> >>>>
> >>>> Pretty much zero overhead in hugepage case, not so in non-hugepage
> >> case.
> >>>> It's rare, but since we support it, we have to account for it.
> >>>
> >>> That is a fair concern.
> >>> How about enable the flag in mempool ONLY when
> >> rte_eal_has_hugepages()
> >>> In the common layer?
> >>
> >> Perhaps it's better to check page size of the underlying memory,
> >> because 4K pages are not necessarily no-huge mode - they could also
> >> be external memory. That's going to be a bit hard because there may
> >> not be a way to know which memory we're allocating from in advance,
> >> aside from simple checks like `(rte_eal_has_hugepages() ||
> >> rte_malloc_heap_socket_is_external(socket_id))` - but maybe those
> >> would be sufficient.
> >
> > Yes.
> >
> >
> >>
> >>>
> >>>> (also, i don't really like the name NO_PAGE_BOUND since in memzone
> >>>> API there's a "bounded memzone" allocation API, and this flag's
> >>>> name reads like objects would not be bounded by page size, not that
> >>>> they won't cross page
> >>>> boundary)
> >>>
> >>> No strong opinion for the name. What name you suggest?
> >>
> >> How about something like MEMPOOL_F_NO_PAGE_SPLIT?
> >
> > Looks good to me.
> >
> > In summary, Change wrt existing patch"
> > - Change NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
> > - Set this flag in  rte_pktmbuf_pool_create () when
> rte_eal_has_hugepages() ||
> >   rte_malloc_heap_socket_is_external(socket_id))
> 
> If we are to have a special KNI allocation API, would we even need that?

This change in rte_pktmbuf_pool_create() is not needed if we introduce
a new rte_kni_pktmbuf_pool_create() API.

> 
> >
> > Olivier, Any objection?
> > Ref: http://patches.dpdk.org/patch/55277/
> >
> 
> 
> 
> --
> Thanks,
> Anatoly

* Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-12 12:28   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
@ 2019-07-15  4:54     ` Jerin Jacob Kollanukkaran
  2019-07-15  9:38       ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-15  4:54 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Burakov, Anatoly, Ferruh Yigit,
	Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

> > >>>> (also, i don't really like the name NO_PAGE_BOUND since in
> > >>>> memzone API there's a "bounded memzone" allocation API, and this
> > >>>> flag's name reads like objects would not be bounded by page size,
> > >>>> not that they won't cross page
> > >>>> boundary)
> > >>>
> > >>> No strong opinion for the name. What name you suggest?
> > >>
> > >> How about something like MEMPOOL_F_NO_PAGE_SPLIT?
> > >
> > > Looks good to me.
> > >
> > > In summary, Change wrt existing patch"
> > > - Change NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
> > > - Set this flag in  rte_pktmbuf_pool_create () when
> > rte_eal_has_hugepages() ||
> > >   rte_malloc_heap_socket_is_external(socket_id))
> >
> > If we are to have a special KNI allocation API, would we even need that?
> 
> Not need this change in rte_pktmbuf_pool_create () if we introduce a new
> rte_kni_pktmbuf_pool_create () API.

Ferruh, Olivier, Anatoly,

Any objection to creating a new rte_kni_pktmbuf_pool_create() API that
embeds the MEMPOOL_F_NO_PAGE_SPLIT flag requirement for KNI + IOVA as VA?



> 
> >
> > >
> > > Olivier, Any objection?
> > > Ref: http://patches.dpdk.org/patch/55277/
> > >
> >
> >
> >
> > --
> > Thanks,
> > Anatoly

* Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-15  4:54     ` Jerin Jacob Kollanukkaran
@ 2019-07-15  9:38       ` Burakov, Anatoly
  2019-07-16  8:46         ` Olivier Matz
  0 siblings, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2019-07-15  9:38 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

On 15-Jul-19 5:54 AM, Jerin Jacob Kollanukkaran wrote:
>>>>>>> (also, i don't really like the name NO_PAGE_BOUND since in
>>>>>>> memzone API there's a "bounded memzone" allocation API, and this
>>>>>>> flag's name reads like objects would not be bounded by page size,
>>>>>>> not that they won't cross page
>>>>>>> boundary)
>>>>>>
>>>>>> No strong opinion for the name. What name you suggest?
>>>>>
>>>>> How about something like MEMPOOL_F_NO_PAGE_SPLIT?
>>>>
>>>> Looks good to me.
>>>>
>>>> In summary, Change wrt existing patch"
>>>> - Change NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
>>>> - Set this flag in  rte_pktmbuf_pool_create () when
>>> rte_eal_has_hugepages() ||
>>>>    rte_malloc_heap_socket_is_external(socket_id))
>>>
>>> If we are to have a special KNI allocation API, would we even need that?
>>
>> Not need this change in rte_pktmbuf_pool_create () if we introduce a new
>> rte_kni_pktmbuf_pool_create () API.
> 
> Ferruh, Olivier, Anatoly,
> 
> Any objection to create new rte_kni_pktmbuf_pool_create () API
> to embedded MEMPOOL_F_NO_PAGE_SPLIT flag requirement for KNI + IOVA as VA
> 
> 

As long as we all are aware of what that means and agree with that 
consequence (namely, separate codepaths for KNI and other PMD's) then i 
have no specific objections.

-- 
Thanks,
Anatoly

* Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-15  9:38       ` Burakov, Anatoly
@ 2019-07-16  8:46         ` Olivier Matz
  2019-07-16  9:40           ` Vamsi Krishna Attunuru
  0 siblings, 1 reply; 23+ messages in thread
From: Olivier Matz @ 2019-07-16  8:46 UTC (permalink / raw)
  To: Burakov, Anatoly
  Cc: Jerin Jacob Kollanukkaran, Ferruh Yigit, Vamsi Krishna Attunuru,
	dev, arybchenko

Hi,

On Mon, Jul 15, 2019 at 10:38:53AM +0100, Burakov, Anatoly wrote:
> On 15-Jul-19 5:54 AM, Jerin Jacob Kollanukkaran wrote:
> > > > > > > > (also, i don't really like the name NO_PAGE_BOUND since in
> > > > > > > > memzone API there's a "bounded memzone" allocation API, and this
> > > > > > > > flag's name reads like objects would not be bounded by page size,
> > > > > > > > not that they won't cross page
> > > > > > > > boundary)
> > > > > > > 
> > > > > > > No strong opinion for the name. What name you suggest?
> > > > > > 
> > > > > > How about something like MEMPOOL_F_NO_PAGE_SPLIT?
> > > > > 
> > > > > Looks good to me.
> > > > > 
> > > > > In summary, Change wrt existing patch"
> > > > > - Change NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
> > > > > - Set this flag in  rte_pktmbuf_pool_create () when
> > > > rte_eal_has_hugepages() ||
> > > > >    rte_malloc_heap_socket_is_external(socket_id))
> > > > 
> > > > If we are to have a special KNI allocation API, would we even need that?
> > > 
> > > Not need this change in rte_pktmbuf_pool_create () if we introduce a new
> > > rte_kni_pktmbuf_pool_create () API.
> > 
> > Ferruh, Olivier, Anatoly,
> > 
> > Any objection to create new rte_kni_pktmbuf_pool_create () API
> > to embedded MEMPOOL_F_NO_PAGE_SPLIT flag requirement for KNI + IOVA as VA
> > 
> > 
> 
> As long as we all are aware of what that means and agree with that
> consequence (namely, separate codepaths for KNI and other PMD's) then i have
> no specific objections.

Sorry for the late feedback.

I think we can change the default behavior of mempool populate(), to
prevent objects from being across 2 pages, except if the size of the
object is bigger than the size of the page. This is already what is done
in rte_mempool_op_calc_mem_size_default() when we want to estimate the
amount of memory needed to allocate N objects.
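
A minimal sketch of that placement rule, assuming a single contiguous
region and a power-of-two page size. The function name and signature are
hypothetical; this is not the DPDK populate code, only the idea of skipping
to the next page when an object would not fit in the current one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lay out up to max_objs objects of obj_sz bytes in [start, start + len).
 * An object that would straddle a page boundary is moved to the start of
 * the next page. Objects larger than a page are packed back to back,
 * since they cannot avoid crossing. Returns the number of objects placed.
 * Hypothetical helper for illustration only. */
static size_t
layout_no_page_split(uintptr_t start, size_t len, size_t obj_sz,
		     size_t pg_sz, uintptr_t *addrs, size_t max_objs)
{
	uintptr_t cur = start, end = start + len;
	size_t n = 0;

	assert(obj_sz > 0 && pg_sz > 0 && (pg_sz & (pg_sz - 1)) == 0);

	while (n < max_objs) {
		if (obj_sz <= pg_sz) {
			uintptr_t pg_end =
				(cur & ~(uintptr_t)(pg_sz - 1)) + pg_sz;

			if (pg_end - cur < obj_sz)
				cur = pg_end; /* skip the tail of this page */
		}
		if (cur > end || end - cur < obj_sz)
			break; /* region exhausted */
		addrs[n++] = cur;
		cur += obj_sz;
	}
	return n;
}
```

For two 4K pages starting at 0x1000 and 1500-byte objects, this places two
objects per page and leaves the last 1096 bytes of each page unused.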

This would avoid the introduction of a specific API to allocate packets
for kni, and a specific mempool flag.

About the problem of 9K mbuf mentioned by Anatoly, could we imagine a
check in kni code, that just returns an error "does not work with
size(mbuf) > size(page)" ?

Thanks,
Olivier

* Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-16  8:46         ` Olivier Matz
@ 2019-07-16  9:40           ` Vamsi Krishna Attunuru
  2019-07-16  9:55             ` Olivier Matz
  0 siblings, 1 reply; 23+ messages in thread
From: Vamsi Krishna Attunuru @ 2019-07-16  9:40 UTC (permalink / raw)
  To: Olivier Matz, Burakov, Anatoly
  Cc: Jerin Jacob Kollanukkaran, Ferruh Yigit, dev, arybchenko



> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Tuesday, July 16, 2019 2:17 PM
> To: Burakov, Anatoly <anatoly.burakov@intel.com>
> Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru <vattunuru@marvell.com>;
> dev@dpdk.org; arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
> 
> Hi,
> 
> On Mon, Jul 15, 2019 at 10:38:53AM +0100, Burakov, Anatoly wrote:
> > On 15-Jul-19 5:54 AM, Jerin Jacob Kollanukkaran wrote:
> > > > > > > > > (also, i don't really like the name NO_PAGE_BOUND since
> > > > > > > > > in memzone API there's a "bounded memzone" allocation
> > > > > > > > > API, and this flag's name reads like objects would not
> > > > > > > > > be bounded by page size, not that they won't cross page
> > > > > > > > > boundary)
> > > > > > > >
> > > > > > > > No strong opinion for the name. What name you suggest?
> > > > > > >
> > > > > > > How about something like MEMPOOL_F_NO_PAGE_SPLIT?
> > > > > >
> > > > > > Looks good to me.
> > > > > >
> > > > > > In summary, Change wrt existing patch"
> > > > > > - Change NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
> > > > > > - Set this flag in  rte_pktmbuf_pool_create () when
> > > > > rte_eal_has_hugepages() ||
> > > > > >    rte_malloc_heap_socket_is_external(socket_id))
> > > > >
> > > > > If we are to have a special KNI allocation API, would we even need that?
> > > >
> > > > Not need this change in rte_pktmbuf_pool_create () if we introduce
> > > > a new rte_kni_pktmbuf_pool_create () API.
> > >
> > > Ferruh, Olivier, Anatoly,
> > >
> > > Any objection to create new rte_kni_pktmbuf_pool_create () API to
> > > embedded MEMPOOL_F_NO_PAGE_SPLIT flag requirement for KNI + IOVA
> as
> > > VA
> > >
> > >
> >
> > As long as we all are aware of what that means and agree with that
> > consequence (namely, separate codepaths for KNI and other PMD's) then
> > i have no specific objections.
> 
> Sorry for the late feedback.
> 
> I think we can change the default behavior of mempool populate(), to prevent
> objects from being accross 2 pages, except if the size of the object is bigger than
> the size of the page. This is already what is done in
> rte_mempool_op_calc_mem_size_default() when we want to estimate the
> amount of memory needed to allocate N objects.
> 
> This would avoid the introduction of a specific API to allocate packets for kni,
> and a specific mempool flag.
> 
> About the problem of 9K mbuf mentionned by Anatoly, could we imagine a
> check in kni code, that just returns an error "does not work with
> size(mbuf) > size(page)" ?
> 

Yes, changing the default behavior avoids new APIs or flags.
Two minor changes on top of the above suggestions:
1) Can the flag (NO_PAGE_SPLIT) be retained? The sequence would be: the flag
is set by default in rte_mempool_populate_default() and can later be cleared
based on obj_per_page in rte_mempool_op_calc_mem_size_default(). I do not see
a specific requirement for this flag apart from handling the above sequence.
2) For the 9K mbuf problem, I think the check could be done in the KNI lib
(in rte_kni_init(), returning an error).

> Thanks,
> Olivier

* Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-16  9:40           ` Vamsi Krishna Attunuru
@ 2019-07-16  9:55             ` Olivier Matz
  2019-07-16 10:07               ` Vamsi Krishna Attunuru
  0 siblings, 1 reply; 23+ messages in thread
From: Olivier Matz @ 2019-07-16  9:55 UTC (permalink / raw)
  To: Vamsi Krishna Attunuru
  Cc: Burakov, Anatoly, Jerin Jacob Kollanukkaran, Ferruh Yigit, dev,
	arybchenko

Hi,

On Tue, Jul 16, 2019 at 09:40:59AM +0000, Vamsi Krishna Attunuru wrote:
> 
> 
> > -----Original Message-----
> > From: Olivier Matz <olivier.matz@6wind.com>
> > Sent: Tuesday, July 16, 2019 2:17 PM
> > To: Burakov, Anatoly <anatoly.burakov@intel.com>
> > Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
> > <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru <vattunuru@marvell.com>;
> > dev@dpdk.org; arybchenko@solarflare.com
> > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
> > 
> > Hi,
> > 
> > On Mon, Jul 15, 2019 at 10:38:53AM +0100, Burakov, Anatoly wrote:
> > > On 15-Jul-19 5:54 AM, Jerin Jacob Kollanukkaran wrote:
> > > > > > > > > > (also, i don't really like the name NO_PAGE_BOUND since
> > > > > > > > > > in memzone API there's a "bounded memzone" allocation
> > > > > > > > > > API, and this flag's name reads like objects would not
> > > > > > > > > > be bounded by page size, not that they won't cross page
> > > > > > > > > > boundary)
> > > > > > > > >
> > > > > > > > > No strong opinion for the name. What name you suggest?
> > > > > > > >
> > > > > > > > How about something like MEMPOOL_F_NO_PAGE_SPLIT?
> > > > > > >
> > > > > > > Looks good to me.
> > > > > > >
> > > > > > > In summary, Change wrt existing patch"
> > > > > > > - Change NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
> > > > > > > - Set this flag in  rte_pktmbuf_pool_create () when
> > > > > > rte_eal_has_hugepages() ||
> > > > > > >    rte_malloc_heap_socket_is_external(socket_id))
> > > > > >
> > > > > > If we are to have a special KNI allocation API, would we even need that?
> > > > >
> > > > > Not need this change in rte_pktmbuf_pool_create () if we introduce
> > > > > a new rte_kni_pktmbuf_pool_create () API.
> > > >
> > > > Ferruh, Olivier, Anatoly,
> > > >
> > > > Any objection to create new rte_kni_pktmbuf_pool_create () API to
> > > > embedded MEMPOOL_F_NO_PAGE_SPLIT flag requirement for KNI + IOVA
> > as
> > > > VA
> > > >
> > > >
> > >
> > > As long as we all are aware of what that means and agree with that
> > > consequence (namely, separate codepaths for KNI and other PMD's) then
> > > i have no specific objections.
> > 
> > Sorry for the late feedback.
> > 
> > I think we can change the default behavior of mempool populate(), to prevent
> > objects from being accross 2 pages, except if the size of the object is bigger than
> > the size of the page. This is already what is done in
> > rte_mempool_op_calc_mem_size_default() when we want to estimate the
> > amount of memory needed to allocate N objects.
> > 
> > This would avoid the introduction of a specific API to allocate packets for kni,
> > and a specific mempool flag.
> > 
> > About the problem of 9K mbuf mentionned by Anatoly, could we imagine a
> > check in kni code, that just returns an error "does not work with
> > size(mbuf) > size(page)" ?
> > 
> 
> Yes, change in default behavior avoids new APIs or flags.
> Two minor changes on top of  above suggestions.
> 1) Can flag(NO_PAGE_SPLIT) be retained.?,  sequence is like,  flag is set by default in rte_mempool_populate_default()
> and later it can be cleared based on obj_per_page in rte_mempool_op_calc_mem_size_default(). I do not see specific
> requirement of these flag apart from handling above sequence.

Sorry, I don't get why you want to keep this flag. Is it to facilitate
the error check in kni code?

The flags are used by the mempool user to ask for a specific behavior,
so if we change the default behavior, there is nothing to change to the
user API.

> 2) For problems of 9k mbuf, I think that check could be addressed in kni lib(in rte_kni_init and return error).

You can use rte_mempool_obj_iter() to iterate the objects (mbufs) in the
mempool, to ensure that none of them is across 2 pages.
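
A sketch of the per-object test such an iteration could apply, written over
a plain array of addresses so that it stands alone. Both helper names are
hypothetical; in real code the check would presumably sit in a callback
passed to rte_mempool_obj_iter(), which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if [addr, addr + size) touches more than one page, else 0.
 * pg_sz must be a power of two. Hypothetical helper. */
static int
obj_crosses_page(uintptr_t addr, size_t size, size_t pg_sz)
{
	assert(size > 0 && pg_sz > 0 && (pg_sz & (pg_sz - 1)) == 0);

	uintptr_t mask = ~(uintptr_t)(pg_sz - 1);

	return (addr & mask) != ((addr + size - 1) & mask);
}

/* Return the index of the first object that crosses a page boundary,
 * or -1 if none does. Hypothetical stand-in for an iterator callback. */
static long
first_split_object(const uintptr_t *addrs, size_t n, size_t obj_sz,
		   size_t pg_sz)
{
	for (size_t i = 0; i < n; i++)
		if (obj_crosses_page(addrs[i], obj_sz, pg_sz))
			return (long)i;
	return -1;
}
```

An object larger than the page (the 9K mbuf case with 4K pages) always
fails this test, so the same check also covers size(mbuf) > size(page).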

* Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-16  9:55             ` Olivier Matz
@ 2019-07-16 10:07               ` Vamsi Krishna Attunuru
  0 siblings, 0 replies; 23+ messages in thread
From: Vamsi Krishna Attunuru @ 2019-07-16 10:07 UTC (permalink / raw)
  To: Olivier Matz
  Cc: Burakov, Anatoly, Jerin Jacob Kollanukkaran, Ferruh Yigit, dev,
	arybchenko



> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Tuesday, July 16, 2019 3:26 PM
> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>
> Cc: Burakov, Anatoly <anatoly.burakov@intel.com>; Jerin Jacob Kollanukkaran
> <jerinj@marvell.com>; Ferruh Yigit <ferruh.yigit@intel.com>; dev@dpdk.org;
> arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA support in KNI
> 
> Hi,
> 
> On Tue, Jul 16, 2019 at 09:40:59AM +0000, Vamsi Krishna Attunuru wrote:
> >
> >
> > > -----Original Message-----
> > > From: Olivier Matz <olivier.matz@6wind.com>
> > > Sent: Tuesday, July 16, 2019 2:17 PM
> > > To: Burakov, Anatoly <anatoly.burakov@intel.com>
> > > Cc: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
> > > <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru
> > > <vattunuru@marvell.com>; dev@dpdk.org; arybchenko@solarflare.com
> > > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v6 0/4] add IOVA = VA
> > > support in KNI
> > >
> > > Hi,
> > >
> > > On Mon, Jul 15, 2019 at 10:38:53AM +0100, Burakov, Anatoly wrote:
> > > > On 15-Jul-19 5:54 AM, Jerin Jacob Kollanukkaran wrote:
> > > > > > > > > > > (also, i don't really like the name NO_PAGE_BOUND
> > > > > > > > > > > since in memzone API there's a "bounded memzone"
> > > > > > > > > > > allocation API, and this flag's name reads like
> > > > > > > > > > > objects would not be bounded by page size, not that
> > > > > > > > > > > they won't cross page
> > > > > > > > > > > boundary)
> > > > > > > > > >
> > > > > > > > > > No strong opinion for the name. What name you suggest?
> > > > > > > > >
> > > > > > > > > How about something like MEMPOOL_F_NO_PAGE_SPLIT?
> > > > > > > >
> > > > > > > > Looks good to me.
> > > > > > > >
> > > > > > > > In summary, Change wrt existing patch"
> > > > > > > > - Change NO_PAGE_BOUND to MEMPOOL_F_NO_PAGE_SPLIT
> > > > > > > > - Set this flag in  rte_pktmbuf_pool_create () when
> > > > > > > rte_eal_has_hugepages() ||
> > > > > > > >    rte_malloc_heap_socket_is_external(socket_id))
> > > > > > >
> > > > > > > If we are to have a special KNI allocation API, would we even need
> that?
> > > > > >
> > > > > > Not need this change in rte_pktmbuf_pool_create () if we
> > > > > > introduce a new rte_kni_pktmbuf_pool_create () API.
> > > > >
> > > > > Ferruh, Olivier, Anatoly,
> > > > >
> > > > > Any objection to create new rte_kni_pktmbuf_pool_create () API
> > > > > to embedded MEMPOOL_F_NO_PAGE_SPLIT flag requirement for KNI +
> > > > > IOVA
> > > as
> > > > > VA
> > > > >
> > > > >
> > > >
> > > > As long as we all are aware of what that means and agree with that
> > > > consequence (namely, separate codepaths for KNI and other PMD's)
> > > > then i have no specific objections.
> > >
> > > Sorry for the late feedback.
> > >
> > > I think we can change the default behavior of mempool populate(), to
> > > prevent objects from being accross 2 pages, except if the size of
> > > the object is bigger than the size of the page. This is already what
> > > is done in
> > > rte_mempool_op_calc_mem_size_default() when we want to estimate the
> > > amount of memory needed to allocate N objects.
> > >
> > > This would avoid the introduction of a specific API to allocate
> > > packets for kni, and a specific mempool flag.
> > >
> > > About the problem of 9K mbuf mentionned by Anatoly, could we imagine
> > > a check in kni code, that just returns an error "does not work with
> > > size(mbuf) > size(page)" ?
> > >
> >
> > Yes, change in default behavior avoids new APIs or flags.
> > Two minor changes on top of  above suggestions.
> > 1) Can flag(NO_PAGE_SPLIT) be retained.?,  sequence is like,  flag is
> > set by default in rte_mempool_populate_default() and later it can be
> > cleared based on obj_per_page in rte_mempool_op_calc_mem_size_default().
> I do not see specific requirement of these flag apart from handling above
> sequence.
> 
> Sorry, I don't get why you want to keep this flag. Is it to facilitate the error check
> in kni code?
Yes, I thought it would only be needed for the error check.

> 
> The flags are used by the mempool user to ask for a specific behavior, so if we
> change the default behavior, there is nothing to change to the user API.

Correct, the flags are meant for mempool users. As you suggested, changing
the default behavior removes the need for new APIs or flags.

> 
> > 2) For problems of 9k mbuf, I think that check could be addressed in kni lib(in
> rte_kni_init and return error).
> 
> You can use rte_mempool_obj_iter() to iterate the objects (mbufs) in the
> mempool, to ensure that none of them is accross 2 pages.

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
@ 2019-07-12 10:26 Jerin Jacob Kollanukkaran
  2019-07-12 10:48 ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-12 10:26 UTC (permalink / raw)
  To: Burakov, Anatoly, Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Friday, July 12, 2019 3:28 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru
> <vattunuru@marvell.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com
> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 12-Jul-19 10:17 AM, Jerin Jacob Kollanukkaran wrote:
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Sent: Thursday, July 11, 2019 9:52 PM
> >> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Vamsi Krishna
> >> Attunuru <vattunuru@marvell.com>; dev@dpdk.org
> >> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com; Burakov,
> >> Anatoly <anatoly.burakov@intel.com>
> >> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in
> >> KNI
> >>
> >> External Email
> >>
> >> ---------------------------------------------------------------------
> >> - On 7/4/2019 10:48 AM, Jerin Jacob Kollanukkaran wrote:
> >>>> From: Vamsi Krishna Attunuru
> >>>> Sent: Thursday, July 4, 2019 12:13 PM
> >>>> To: dev@dpdk.org
> >>>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> >>>> arybchenko@solarflare.com; Jerin Jacob Kollanukkaran
> >>>> <jerinj@marvell.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
> >>>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> >>>>
> >>>> Hi All,
> >>>>
> >>>> Just to summarize, below items have arisen from the initial review.
> >>>> 1) Can the new mempool flag be made default to all the pools and
> >>>> will
> >> there be case that new flag functionality would fail  for some page sizes.?
> >>>
> >>> If the minimum huge page size is 2MB and normal huge page size is
> >>> 512MB or 1G. So I think, new flags can be default as skipping the
> >>> page
> >> boundaries for Mempool objects has nearly zero overhead. But I leave
> >> decision to maintainers.
> >>>
> >>>> 2) Adding HW device info(pci dev info) to KNI device structure,
> >>>> will it
> >> break KNI on virtual devices in VA or PA mode.?
> >>>
> >>> Iommu_domain will be created only for PCI devices and the system
> >>> runs in IOVA_VA mode. Virtual devices(IOVA_DC(don't care) or
> IOVA_PA
> >>> devices still it works without PCI device structure)
> >>>
> >>> It is  a useful feature where KNI can run without root privilege and
> >>> it is pending for long time. Request to review and close this
> >>
> >> I support the idea to remove 'kni' forcing to the IOVA=PA mode, but
> >> also not sure about forcing all KNI users to update their code to
> >> allocate mempool in a very specific way.
> >>
> >> What about giving more control to the user on this?
> >>
> >> Any user want to use IOVA=VA and KNI together can update application
> >> to justify memory allocation of the KNI and give an explicit "kni
> iova_mode=1"
> >> config.
> >
> > Where this config comes, eal or kni sample app or KNI public API?
> >
> >
> >> Who want to use existing KNI implementation can continue to use it
> >> with IOVA=PA mode which is current case, or for this case user may
> >> need to force the DPDK application to IOVA=PA but at least there is a
> workaround.
> >>
> >> And kni sample application should have sample for both case, although
> >> this increases the testing and maintenance cost, I hope we can get
> >> support from you on the iova_mode=1 usecase.
> >>
> >> What do you think?
> >
> > IMO, If possible we can avoid extra indirection of new config. In
> > worst case We can add it. How about following to not have new config
> >
> > 1) Make MEMPOOL_F_NO_PAGE_BOUND  as default
> > http://patches.dpdk.org/patch/55277/
> > There is absolutely zero overhead of this flag considering the huge
> > page size are minimum 2MB. Typically 512MB or 1GB.
> > Any one has any objection?
> 
> Pretty much zero overhead in hugepage case, not so in non-hugepage case.
> It's rare, but since we support it, we have to account for it.

That is a fair concern.
How about enabling the flag in mempool ONLY when rte_eal_has_hugepages()
is true, in the common layer?

> (also, i don't really like the name NO_PAGE_BOUND since in memzone API
> there's a "bounded memzone" allocation API, and this flag's name reads like
> objects would not be bounded by page size, not that they won't cross page
> boundary)

No strong opinion on the name. What name do you suggest?

> 
> >
> > 2) Introduce rte_kni_mempool_create() API in kni lib to abstract the
> > Mempool requirement for KNI. This will enable portable KNI applications.
> 
> This means that using KNI is not a drop-in replacement for any other
> PMD. If maintainers of KNI are OK with this then sure :)

The PMDs don’t have any dependency on the NO_PAGE_BOUND flag, right?
If a KNI app uses rte_kni_mempool_create() to create the mempool,
in what case do you see a problem with a specific PMD?

> 
> --
> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-12 10:26 [dpdk-dev] " Jerin Jacob Kollanukkaran
@ 2019-07-12 10:48 ` Burakov, Anatoly
  0 siblings, 0 replies; 23+ messages in thread
From: Burakov, Anatoly @ 2019-07-12 10:48 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

On 12-Jul-19 11:26 AM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Friday, July 12, 2019 3:28 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Ferruh Yigit
>> <ferruh.yigit@intel.com>; Vamsi Krishna Attunuru
>> <vattunuru@marvell.com>; dev@dpdk.org
>> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com
>> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>>
>> On 12-Jul-19 10:17 AM, Jerin Jacob Kollanukkaran wrote:
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Sent: Thursday, July 11, 2019 9:52 PM
>>>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Vamsi Krishna
>>>> Attunuru <vattunuru@marvell.com>; dev@dpdk.org
>>>> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com; Burakov,
>>>> Anatoly <anatoly.burakov@intel.com>
>>>> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in
>>>> KNI
>>>>
>>>> On 7/4/2019 10:48 AM, Jerin Jacob Kollanukkaran wrote:
>>>>>> From: Vamsi Krishna Attunuru
>>>>>> Sent: Thursday, July 4, 2019 12:13 PM
>>>>>> To: dev@dpdk.org
>>>>>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
>>>>>> arybchenko@solarflare.com; Jerin Jacob Kollanukkaran
>>>>>> <jerinj@marvell.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
>>>>>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Just to summarize, below items have arisen from the initial review.
>>>>>> 1) Can the new mempool flag be made default to all the pools and
>>>>>> will
>>>> there be case that new flag functionality would fail  for some page sizes.?
>>>>>
>>>>> If the minimum huge page size is 2MB and normal huge page size is
>>>>> 512MB or 1G. So I think, new flags can be default as skipping the
>>>>> page
>>>> boundaries for Mempool objects has nearly zero overhead. But I leave
>>>> decision to maintainers.
>>>>>
>>>>>> 2) Adding HW device info(pci dev info) to KNI device structure,
>>>>>> will it
>>>> break KNI on virtual devices in VA or PA mode.?
>>>>>
>>>>> Iommu_domain will be created only for PCI devices and the system
>>>>> runs in IOVA_VA mode. Virtual devices(IOVA_DC(don't care) or
>> IOVA_PA
>>>>> devices still it works without PCI device structure)
>>>>>
>>>>> It is  a useful feature where KNI can run without root privilege and
>>>>> it is pending for long time. Request to review and close this
>>>>
>>>> I support the idea to remove 'kni' forcing to the IOVA=PA mode, but
>>>> also not sure about forcing all KNI users to update their code to
>>>> allocate mempool in a very specific way.
>>>>
>>>> What about giving more control to the user on this?
>>>>
>>>> Any user want to use IOVA=VA and KNI together can update application
>>>> to justify memory allocation of the KNI and give an explicit "kni
>> iova_mode=1"
>>>> config.
>>>
>>> Where this config comes, eal or kni sample app or KNI public API?
>>>
>>>
>>>> Who want to use existing KNI implementation can continue to use it
>>>> with IOVA=PA mode which is current case, or for this case user may
>>>> need to force the DPDK application to IOVA=PA but at least there is a
>> workaround.
>>>>
>>>> And kni sample application should have sample for both case, although
>>>> this increases the testing and maintenance cost, I hope we can get
>>>> support from you on the iova_mode=1 usecase.
>>>>
>>>> What do you think?
>>>
>>> IMO, If possible we can avoid extra indirection of new config. In
>>> worst case We can add it. How about following to not have new config
>>>
>>> 1) Make MEMPOOL_F_NO_PAGE_BOUND  as default
>>> http://patches.dpdk.org/patch/55277/
>>> There is absolutely zero overhead of this flag considering the huge
>>> page size are minimum 2MB. Typically 512MB or 1GB.
>>> Any one has any objection?
>>
>> Pretty much zero overhead in hugepage case, not so in non-hugepage case.
>> It's rare, but since we support it, we have to account for it.
> 
> That is a fair concern.
> How about enable the flag in mempool ONLY when rte_eal_has_hugepages()
> In the common layer?

Perhaps it's better to check page size of the underlying memory, because 
4K pages are not necessarily no-huge mode - they could also be external 
memory. That's going to be a bit hard because there may not be a way to 
know which memory we're allocating from in advance, aside from simple 
checks like `(rte_eal_has_hugepages() || 
rte_malloc_heap_socket_is_external(socket_id))` - but maybe those would 
be sufficient.

> 
>> (also, i don't really like the name NO_PAGE_BOUND since in memzone API
>> there's a "bounded memzone" allocation API, and this flag's name reads like
>> objects would not be bounded by page size, not that they won't cross page
>> boundary)
> 
> No strong opinion for the name. What name you suggest?

How about something like MEMPOOL_F_NO_PAGE_SPLIT?

> 
>>
>>>
>>> 2) Introduce rte_kni_mempool_create() API in kni lib to abstract the
>>> Mempool requirement for KNI. This will enable portable KNI applications.
>>
>> This means that using KNI is not a drop-in replacement for any other
>> PMD. If maintainers of KNI are OK with this then sure :)
> 
> The PMD  don’t have any dependency on NO_PAGE_BOUND flag. Right?
> If KNI app is using rte_kni_mempool_create() to create the mempool,
> In what case do you see problem with specific PMD?

I'm not saying the PMDs have a dependency on the flag, i'm saying that
the same code cannot be used with and without KNI because you need to
call a separate API for mempool creation if you want to use it with KNI.
For KNI, the underlying memory must abide by certain constraints that
are not there for other PMDs, so either you fix all memory to these
constraints, or you lose the ability to reuse the code with other PMDs
as is.

That is, unless i'm grossly misunderstanding what you're suggesting here :)

> 
>>
>> --
>> Thanks,
>> Anatoly


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
@ 2019-07-12  9:17 Jerin Jacob Kollanukkaran
  2019-07-12  9:58 ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-12  9:17 UTC (permalink / raw)
  To: Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko, Burakov, Anatoly


> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Thursday, July 11, 2019 9:52 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Vamsi Krishna Attunuru
> <vattunuru@marvell.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com; Burakov, Anatoly
> <anatoly.burakov@intel.com>
> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> 
> On 7/4/2019 10:48 AM, Jerin Jacob Kollanukkaran wrote:
> >> From: Vamsi Krishna Attunuru
> >> Sent: Thursday, July 4, 2019 12:13 PM
> >> To: dev@dpdk.org
> >> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> >> arybchenko@solarflare.com; Jerin Jacob Kollanukkaran
> >> <jerinj@marvell.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> >>
> >> Hi All,
> >>
> >> Just to summarize, below items have arisen from the initial review.
> >> 1) Can the new mempool flag be made default to all the pools and will
> >> there be case that new flag functionality would fail for some page sizes?
> >
> > The minimum huge page size is 2MB and the normal huge page size is
> > 512MB or 1G, so I think the new flag can be the default, as skipping
> > page boundaries for mempool objects has nearly zero overhead. But I
> > leave the decision to the maintainers.
> >
> >> 2) Adding HW device info(pci dev info) to KNI device structure, will it
> >> break KNI on virtual devices in VA or PA mode?
> >
> > An iommu_domain will be created only for PCI devices when the system
> > runs in IOVA_VA mode. Virtual devices (IOVA_DC (don't care) or IOVA_PA
> > devices) still work without the PCI device structure.
> >
> > It is  a useful feature where KNI can run without root privilege and
> > it is pending for long time. Request to review and close this
> 
> I support the idea to remove 'kni' forcing to the IOVA=PA mode, but also not
> sure about forcing all KNI users to update their code to allocate mempool in a
> very specific way.
> 
> What about giving more control to the user on this?
> 
> Any user want to use IOVA=VA and KNI together can update application to
> justify memory allocation of the KNI and give an explicit "kni iova_mode=1"
> config.

Where would this config come from: EAL, the kni sample app, or the KNI public API?


> Who want to use existing KNI implementation can continue to use it with
> IOVA=PA mode which is current case, or for this case user may need to force
> the DPDK application to IOVA=PA but at least there is a workaround.
> 
> And kni sample application should have sample for both case, although this
> increases the testing and maintenance cost, I hope we can get support from
> you on the iova_mode=1 usecase.
> 
> What do you think?

IMO, if possible we should avoid the extra indirection of a new config; in the
worst case we can add it. How about the following, to avoid a new config?

1) Make MEMPOOL_F_NO_PAGE_BOUND the default
http://patches.dpdk.org/patch/55277/
There is absolutely zero overhead from this flag, considering that the huge
page size is at minimum 2MB, and typically 512MB or 1GB.
Does anyone have any objection?

2) Introduce an rte_kni_mempool_create() API in the kni lib to abstract the
mempool requirement for KNI. This will enable portable KNI applications.

Thoughts?

> 
> 
> 
> >
> >>
> >> Can someone suggest if any changes required to address above issues.
> > ________________________________________
> > From: dev <dev-bounces@dpdk.org> on behalf of Vamsi Krishna
> > Attunuru <vattunuru@marvell.com>
> > Sent: Monday, July 1, 2019 7:21:22 PM
> > To: Jerin Jacob Kollanukkaran; Burakov, Anatoly; dev@dpdk.org
> > Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> > arybchenko@solarflare.com
> > Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in
> > KNI
> >
> > ping..
> >
> > ________________________________
> > From: Jerin Jacob Kollanukkaran
> > Sent: Thursday, June 27, 2019 3:04:58 PM
> > To: Burakov, Anatoly; Vamsi Krishna Attunuru; dev@dpdk.org
> > Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> > arybchenko@solarflare.com
> > Subject: RE: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> >
> >> -----Original Message-----
> >> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> >> Sent: Tuesday, June 25, 2019 7:09 PM
> >> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Vamsi
> >> Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org
> >> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> >> arybchenko@solarflare.com
> >> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> >>
> >> On 25-Jun-19 12:30 PM, Burakov, Anatoly wrote:
> >>> On 25-Jun-19 12:15 PM, Jerin Jacob Kollanukkaran wrote:
> >>>>> -----Original Message-----
> >>>>> From: dev <dev-bounces@dpdk.org> On Behalf Of Burakov,
> >>>>> Anatoly
> >>>>> Sent: Tuesday, June 25, 2019 3:30 PM
> >>>>> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>;
> >>>>> dev@dpdk.org
> >>>>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> >>>>> arybchenko@solarflare.com
> >>>>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in
> >>>>> KNI
> >>>>>
> >>>>> On 25-Jun-19 4:56 AM, vattunuru@marvell.com wrote:
> >>>>>> From: Vamsi Attunuru <vattunuru@marvell.com>
> >>>>>>
> >>>>>> ----
> >>>>>> V6 Changes:
> >>>>>> * Added new mempool flag to ensure mbuf memory is not scattered
> >>>>>> across page boundaries.
> >>>>>> * Added KNI kernel module required PCI device information.
> >>>>>> * Modified KNI example application to create mempool with new
> >>>>>> mempool flag.
> >>>>>>
> >>>>> Others can chime in, but my 2 cents: this reduces the usefulness
> >>>>> of KNI because it limits the kinds of mempools one can use them
> >>>>> with, and makes it so that the code that works with every other
> >>>>> PMD requires changes to work with KNI.
> >>>>
> >>>> # One option is to make this flag the default only for packet
> >>>> mempools (do not allow allocation across a page boundary).
> >>>> In the real world the overhead will be minimal, considering the
> >>>> huge page size is 1G or 512M.
> >>>> # Enable this flag explicitly only for IOVA = VA mode in the
> >>>> library. No need to expose it to the application.
> >>>> # I don't think there needs to be any PMD-specific change to make
> >>>> KNI work with IOVA = VA mode.
> >>>> # No preference on whether the flag is passed by the application or
> >>>> set in the library. But IMO this change would be needed in mempool
> >>>> to support KNI in IOVA = VA mode.
> >>>>
> >>>
> >>> I would be OK to just make it default behavior to not cross page
> >>> boundaries when allocating buffers. This would solve the problem for
> >>> KNI and for any other use case that would rely on PA-contiguous
> >>> buffers in face of IOVA as VA mode.
> >>>
> >>> We could also add a flag to explicitly allow page crossing without
> >>> also making mbufs IOVA-non-contiguous, but i'm not sure if there are
> >>> use cases that would benefit from this.
> >>
> >> On another thought, such a default would break 4K pages in case for
> >> packets bigger than page size (i.e. jumbo frames). Should we care?
> >
> > The hugepage size will not be 4K. Right?
> >
> > Olivier,
> >
> > As a maintainer, any thoughts on exposing/not exposing the new mempool
> > flag to skip the page boundaries?
> >
> > All,
> > Either option is fine. Asking for feedback to proceed further.
> >


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-12  9:17 Jerin Jacob Kollanukkaran
@ 2019-07-12  9:58 ` Burakov, Anatoly
  0 siblings, 0 replies; 23+ messages in thread
From: Burakov, Anatoly @ 2019-07-12  9:58 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Ferruh Yigit, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko

On 12-Jul-19 10:17 AM, Jerin Jacob Kollanukkaran wrote:
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Thursday, July 11, 2019 9:52 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Vamsi Krishna Attunuru
>> <vattunuru@marvell.com>; dev@dpdk.org
>> Cc: olivier.matz@6wind.com; arybchenko@solarflare.com; Burakov, Anatoly
>> <anatoly.burakov@intel.com>
>> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>>
>> On 7/4/2019 10:48 AM, Jerin Jacob Kollanukkaran wrote:
>>>> From: Vamsi Krishna Attunuru
>>>> Sent: Thursday, July 4, 2019 12:13 PM
>>>> To: dev@dpdk.org
>>>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
>>>> arybchenko@solarflare.com; Jerin Jacob Kollanukkaran
>>>> <jerinj@marvell.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>>>>
>>>> Hi All,
>>>>
>>>> Just to summarize, below items have arisen from the initial review.
>>>> 1) Can the new mempool flag be made default to all the pools and will
>> there be case that new flag functionality would fail  for some page sizes.?
>>>
>>> If the minimum huge page size is 2MB and normal huge page size is
>>> 512MB or 1G. So I think, new flags can be default as skipping the page
>> boundaries for Mempool objects has nearly zero overhead. But I leave
>> decision to maintainers.
>>>
>>>> 2) Adding HW device info(pci dev info) to KNI device structure, will it
>> break KNI on virtual devices in VA or PA mode.?
>>>
>>> Iommu_domain will be created only for PCI devices and the system runs
>>> in IOVA_VA mode. Virtual devices(IOVA_DC(don't care) or IOVA_PA
>>> devices still it works without PCI device structure)
>>>
>>> It is  a useful feature where KNI can run without root privilege and
>>> it is pending for long time. Request to review and close this
>>
>> I support the idea to remove 'kni' forcing to the IOVA=PA mode, but also not
>> sure about forcing all KNI users to update their code to allocate mempool in a
>> very specific way.
>>
>> What about giving more control to the user on this?
>>
>> Any user want to use IOVA=VA and KNI together can update application to
>> justify memory allocation of the KNI and give an explicit "kni iova_mode=1"
>> config.
> 
> Where this config comes, eal or kni sample app or KNI public API?
> 
> 
>> Who want to use existing KNI implementation can continue to use it with
>> IOVA=PA mode which is current case, or for this case user may need to force
>> the DPDK application to IOVA=PA but at least there is a workaround.
>>
>> And kni sample application should have sample for both case, although this
>> increases the testing and maintenance cost, I hope we can get support from
>> you on the iova_mode=1 usecase.
>>
>> What do you think?
> 
> IMO, If possible we can avoid extra indirection of new config. In worst case
> We can add it. How about following to not have new config
> 
> 1) Make MEMPOOL_F_NO_PAGE_BOUND  as default
> http://patches.dpdk.org/patch/55277/
> There is absolutely zero overhead of this flag considering the huge page size are minimum
> 2MB. Typically 512MB or 1GB.
> Any one has any objection?

Pretty much zero overhead in hugepage case, not so in non-hugepage case. 
It's rare, but since we support it, we have to account for it.
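
Rough arithmetic behind the hugepage versus non-hugepage distinction, as a sketch under simplifying assumptions: objects are packed from the start of each page, no object may straddle a page boundary, and per-chunk headers are ignored. The function names and the 2304-byte object size used in the note below are illustrative only and are not taken from the mempool library.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of objects that fit in one page when none may straddle a
 * page boundary. Returns 0 when the object is larger than the page
 * (for example a 9K mbuf on 4K pages).
 */
static size_t
objs_per_page(size_t pg_sz, size_t obj_sz)
{
	assert(pg_sz != 0 && obj_sz != 0);
	return pg_sz / obj_sz;
}

/*
 * Bytes left unused at the end of each page under the same layout.
 * When obj_sz > pg_sz the remainder equals pg_sz, i.e. the whole
 * page is unusable for such objects.
 */
static size_t
wasted_per_page(size_t pg_sz, size_t obj_sz)
{
	assert(pg_sz != 0 && obj_sz != 0);
	return pg_sz % obj_sz;
}
```

With a hypothetical 2304-byte object, a 4K page holds one object and leaves 1792 bytes unused (about 44% of the page), while a 2MB page holds 910 objects and leaves 512 bytes unused (about 0.02%). An object larger than the page, such as a 9K mbuf on 4K pages, does not fit at all.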

(also, i don't really like the name NO_PAGE_BOUND since in memzone API 
there's a "bounded memzone" allocation API, and this flag's name reads 
like objects would not be bounded by page size, not that they won't 
cross page boundary)

> 
> 2) Introduce rte_kni_mempool_create() API in kni lib to abstract the
> Mempool requirement for KNI. This will enable portable KNI applications.

This means that using KNI is not a drop-in replacement for any other 
PMD. If maintainers of KNI are OK with this then sure :)

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH v5] kni: add IOVA va support for kni
@ 2019-04-22  6:15 kirankumark
  2019-06-25  3:56 ` [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI vattunuru
  0 siblings, 1 reply; 23+ messages in thread
From: kirankumark @ 2019-04-22  6:15 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev, Kiran Kumar K

From: Kiran Kumar K <kirankumark@marvell.com>

With the current KNI implementation, the kernel module works only in
IOVA=PA mode. This patch adds support for the kernel module to work
in IOVA=VA mode.

The idea is to obtain the physical address from the IOVA address using
the iommu_iova_to_phys API, and then use the phys_to_virt API to
convert the physical address to a kernel virtual address.
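
The translation chain can be modelled in user space to show what the kernel module relies on. This is a toy sketch, not kernel code: `toy_domain`, `toy_iova_to_phys`, `toy_phys_to_virt` and `toy_iova_to_kva` are hypothetical stand-ins for the IOMMU domain, iommu_iova_to_phys and phys_to_virt, and the page table is assumed to be a four-entry array of 4K pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT      12
#define TOY_PAGE_SIZE       ((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_NB_PAGES        4
#define TOY_DIRECT_MAP_BASE 0xffff000000000000ULL

/*
 * Toy stand-in for an IOMMU domain: phys_page[i] is the physical page
 * backing IOVA page i, counted from iova_base. 0 means "not mapped".
 */
struct toy_domain {
	uint64_t iova_base;
	uint64_t phys_page[TOY_NB_PAGES];
};

/* Models the role of iommu_iova_to_phys(): 0 when no mapping exists. */
static uint64_t
toy_iova_to_phys(const struct toy_domain *d, uint64_t iova)
{
	uint64_t off, idx;

	assert(d != NULL);
	if (iova < d->iova_base)
		return 0;
	off = iova - d->iova_base;
	idx = off >> TOY_PAGE_SHIFT;
	if (idx >= TOY_NB_PAGES || d->phys_page[idx] == 0)
		return 0;
	/* Each page is looked up on its own; only the in-page offset carries over. */
	return d->phys_page[idx] + (off & (TOY_PAGE_SIZE - 1));
}

/* Models the role of phys_to_virt(): a fixed-offset direct mapping. */
static uint64_t
toy_phys_to_virt(uint64_t phys)
{
	return TOY_DIRECT_MAP_BASE + phys;
}

/* IOVA -> kernel virtual address; returns 0 when the lookup fails. */
static uint64_t
toy_iova_to_kva(const struct toy_domain *d, uint64_t iova)
{
	uint64_t phys = toy_iova_to_phys(d, iova);

	return phys ? toy_phys_to_virt(phys) : 0;
}
```

Because each IOVA page is looked up separately, two neighbouring IOVA pages can resolve to unrelated physical pages, so a single translated pointer is only valid up to the end of its page. A result of 0 in this model means that no mapping exists and should not be passed on to the next step.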

We compared the performance of this approach against IOVA=PA mode and
observed no difference. It seems the kernel itself is the dominant
overhead.

This approach does not work with kernel versions older than 4.4.0
because of API compatibility issues.

Signed-off-by: Kiran Kumar K <kirankumark@marvell.com>
---
V5 changes:
* Fixed build issue with 32b build

V4 changes:
* Fixed build issues with older kernel versions
* This approach will only work with kernel above 4.4.0

V3 Changes:
* Added a new approach to make KNI work in IOVA=VA mode using the
iommu_iova_to_phys API.

 kernel/linux/kni/kni_dev.h                    |  4 +
 kernel/linux/kni/kni_misc.c                   | 63 ++++++++++++---
 kernel/linux/kni/kni_net.c                    | 76 +++++++++++++++----
 lib/librte_eal/linux/eal/eal.c                |  9 ---
 .../linux/eal/include/rte_kni_common.h        |  1 +
 lib/librte_kni/rte_kni.c                      |  2 +
 6 files changed, 122 insertions(+), 33 deletions(-)

diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index df46aa70e..9c4944921 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -23,6 +23,7 @@
 #include <linux/netdevice.h>
 #include <linux/spinlock.h>
 #include <linux/list.h>
+#include <linux/iommu.h>

 #include <rte_kni_common.h>
 #define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
@@ -39,6 +40,9 @@ struct kni_dev {
 	/* kni list */
 	struct list_head list;

+	uint8_t iova_mode;
+	struct iommu_domain *domain;
+
 	struct net_device_stats stats;
 	int status;
 	uint16_t group_id;           /* Group ID of a group of KNI devices */
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 31845e10f..9e90af31b 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -306,10 +306,12 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	struct rte_kni_device_info dev_info;
 	struct net_device *net_dev = NULL;
 	struct kni_dev *kni, *dev, *n;
+	struct pci_dev *pci = NULL;
+	struct iommu_domain *domain = NULL;
+	phys_addr_t phys_addr;
 #ifdef RTE_KNI_KMOD_ETHTOOL
 	struct pci_dev *found_pci = NULL;
 	struct net_device *lad_dev = NULL;
-	struct pci_dev *pci = NULL;
 #endif

 	pr_info("Creating kni...\n");
@@ -368,15 +370,56 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);

 	/* Translate user space info into kernel space info */
-	kni->tx_q = phys_to_virt(dev_info.tx_phys);
-	kni->rx_q = phys_to_virt(dev_info.rx_phys);
-	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
-	kni->free_q = phys_to_virt(dev_info.free_phys);
-
-	kni->req_q = phys_to_virt(dev_info.req_phys);
-	kni->resp_q = phys_to_virt(dev_info.resp_phys);
-	kni->sync_va = dev_info.sync_va;
-	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
+
+	if (dev_info.iova_mode) {
+#if KERNEL_VERSION(4, 4, 0) > LINUX_VERSION_CODE
+		(void)pci;
+		pr_err("Kernel version is not supported\n");
+		return -EINVAL;
+#else
+		pci = pci_get_device(dev_info.vendor_id,
+				     dev_info.device_id, NULL);
+		while (pci) {
+			if ((pci->bus->number == dev_info.bus) &&
+			    (PCI_SLOT(pci->devfn) == dev_info.devid) &&
+			    (PCI_FUNC(pci->devfn) == dev_info.function)) {
+				domain = iommu_get_domain_for_dev(&pci->dev);
+				break;
+			}
+			pci = pci_get_device(dev_info.vendor_id,
+					     dev_info.device_id, pci);
+		}
+#endif
+		kni->domain = domain;
+		phys_addr = iommu_iova_to_phys(domain, dev_info.tx_phys);
+		kni->tx_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.rx_phys);
+		kni->rx_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.alloc_phys);
+		kni->alloc_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.free_phys);
+		kni->free_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.req_phys);
+		kni->req_q = phys_to_virt(phys_addr);
+		phys_addr = iommu_iova_to_phys(domain, dev_info.resp_phys);
+		kni->resp_q = phys_to_virt(phys_addr);
+		kni->sync_va = dev_info.sync_va;
+		phys_addr = iommu_iova_to_phys(domain, dev_info.sync_phys);
+		kni->sync_kva = phys_to_virt(phys_addr);
+		kni->iova_mode = 1;
+
+	} else {
+		kni->tx_q = phys_to_virt(dev_info.tx_phys);
+		kni->rx_q = phys_to_virt(dev_info.rx_phys);
+		kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
+		kni->free_q = phys_to_virt(dev_info.free_phys);
+
+		kni->req_q = phys_to_virt(dev_info.req_phys);
+		kni->resp_q = phys_to_virt(dev_info.resp_phys);
+		kni->sync_va = dev_info.sync_va;
+		kni->sync_kva = phys_to_virt(dev_info.sync_phys);
+		kni->iova_mode = 0;
+	}

 	kni->mbuf_size = dev_info.mbuf_size;

diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index be9e6b0b9..e77a28066 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -35,6 +35,22 @@ static void kni_net_rx_normal(struct kni_dev *kni);
 /* kni rx function pointer, with default to normal rx */
 static kni_net_rx_t kni_net_rx_func = kni_net_rx_normal;

+/* iova to kernel virtual address */
+static void *
+iova2kva(struct kni_dev *kni, void *pa)
+{
+	return phys_to_virt(iommu_iova_to_phys(kni->domain,
+				(uintptr_t)pa));
+}
+
+static void *
+iova2data_kva(struct kni_dev *kni, struct rte_kni_mbuf *m)
+{
+	return phys_to_virt((iommu_iova_to_phys(kni->domain,
+					(uintptr_t)m->buf_physaddr) +
+			     m->data_off));
+}
+
 /* physical address to kernel virtual address */
 static void *
 pa2kva(void *pa)
@@ -186,7 +202,10 @@ kni_fifo_trans_pa2va(struct kni_dev *kni,
 			return;

 		for (i = 0; i < num_rx; i++) {
-			kva = pa2kva(kni->pa[i]);
+			if (likely(kni->iova_mode == 1))
+				kva = iova2kva(kni, kni->pa[i]);
+			else
+				kva = pa2kva(kni->pa[i]);
 			kni->va[i] = pa2va(kni->pa[i], kva);
 		}

@@ -263,8 +282,13 @@ kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 	if (likely(ret == 1)) {
 		void *data_kva;

-		pkt_kva = pa2kva(pkt_pa);
-		data_kva = kva2data_kva(pkt_kva);
+		if (likely(kni->iova_mode == 1)) {
+			pkt_kva = iova2kva(kni, pkt_pa);
+			data_kva = iova2data_kva(kni, pkt_kva);
+		} else {
+			pkt_kva = pa2kva(pkt_pa);
+			data_kva = kva2data_kva(pkt_kva);
+		}
 		pkt_va = pa2va(pkt_pa, pkt_kva);

 		len = skb->len;
@@ -335,9 +359,14 @@ kni_net_rx_normal(struct kni_dev *kni)

 	/* Transfer received packets to netif */
 	for (i = 0; i < num_rx; i++) {
-		kva = pa2kva(kni->pa[i]);
+		if (likely(kni->iova_mode == 1)) {
+			kva = iova2kva(kni, kni->pa[i]);
+			data_kva = iova2data_kva(kni, kva);
+		} else {
+			kva = pa2kva(kni->pa[i]);
+			data_kva = kva2data_kva(kva);
+		}
 		len = kva->pkt_len;
-		data_kva = kva2data_kva(kva);
 		kni->va[i] = pa2va(kni->pa[i], kva);

 		skb = dev_alloc_skb(len + 2);
@@ -434,13 +463,20 @@ kni_net_rx_lo_fifo(struct kni_dev *kni)
 		num = ret;
 		/* Copy mbufs */
 		for (i = 0; i < num; i++) {
-			kva = pa2kva(kni->pa[i]);
+
+			if (likely(kni->iova_mode == 1)) {
+				kva = iova2kva(kni, kni->pa[i]);
+				data_kva = iova2data_kva(kni, kva);
+				alloc_kva = iova2kva(kni, kni->alloc_pa[i]);
+				alloc_data_kva = iova2data_kva(kni, alloc_kva);
+			} else {
+				kva = pa2kva(kni->pa[i]);
+				data_kva = kva2data_kva(kva);
+				alloc_kva = pa2kva(kni->alloc_pa[i]);
+				alloc_data_kva = kva2data_kva(alloc_kva);
+			}
 			len = kva->pkt_len;
-			data_kva = kva2data_kva(kva);
 			kni->va[i] = pa2va(kni->pa[i], kva);
-
-			alloc_kva = pa2kva(kni->alloc_pa[i]);
-			alloc_data_kva = kva2data_kva(alloc_kva);
 			kni->alloc_va[i] = pa2va(kni->alloc_pa[i], alloc_kva);

 			memcpy(alloc_data_kva, data_kva, len);
@@ -507,9 +543,15 @@ kni_net_rx_lo_fifo_skb(struct kni_dev *kni)

 	/* Copy mbufs to sk buffer and then call tx interface */
 	for (i = 0; i < num; i++) {
-		kva = pa2kva(kni->pa[i]);
+
+		if (likely(kni->iova_mode == 1)) {
+			kva = iova2kva(kni, kni->pa[i]);
+			data_kva = iova2data_kva(kni, kva);
+		} else {
+			kva = pa2kva(kni->pa[i]);
+			data_kva = kva2data_kva(kva);
+		}
 		len = kva->pkt_len;
-		data_kva = kva2data_kva(kva);
 		kni->va[i] = pa2va(kni->pa[i], kva);

 		skb = dev_alloc_skb(len + 2);
@@ -545,8 +587,14 @@ kni_net_rx_lo_fifo_skb(struct kni_dev *kni)
 				if (!kva->next)
 					break;

-				kva = pa2kva(va2pa(kva->next, kva));
-				data_kva = kva2data_kva(kva);
+				if (likely(kni->iova_mode == 1)) {
+					kva = iova2kva(kni,
+						       va2pa(kva->next, kva));
+					data_kva = iova2data_kva(kni, kva);
+				} else {
+					kva = pa2kva(va2pa(kva->next, kva));
+					data_kva = kva2data_kva(kva);
+				}
 			}
 		}

diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index f7ae62d7b..8fac6707d 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -1040,15 +1040,6 @@ rte_eal_init(int argc, char **argv)
 		/* autodetect the IOVA mapping mode (default is RTE_IOVA_PA) */
 		rte_eal_get_configuration()->iova_mode =
 			rte_bus_get_iommu_class();
-
-		/* Workaround for KNI which requires physical address to work */
-		if (rte_eal_get_configuration()->iova_mode == RTE_IOVA_VA &&
-				rte_eal_check_module("rte_kni") == 1) {
-			rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
-			RTE_LOG(WARNING, EAL,
-				"Some devices want IOVA as VA but PA will be used because.. "
-				"KNI module inserted\n");
-		}
 	} else {
 		rte_eal_get_configuration()->iova_mode =
 			internal_config.iova_mode;
diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h b/lib/librte_eal/linux/eal/include/rte_kni_common.h
index 5afa08713..79ee4bc5a 100644
--- a/lib/librte_eal/linux/eal/include/rte_kni_common.h
+++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h
@@ -128,6 +128,7 @@ struct rte_kni_device_info {
 	unsigned mbuf_size;
 	unsigned int mtu;
 	char mac_addr[6];
+	uint8_t iova_mode;
 };

 #define KNI_DEVICE "kni"
diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 946459c79..ec8f23694 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -304,6 +304,8 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
 	kni->group_id = conf->group_id;
 	kni->mbuf_size = conf->mbuf_size;

+	dev_info.iova_mode = (rte_eal_iova_mode() == RTE_IOVA_VA) ? 1 : 0;
+
 	ret = ioctl(kni_fd, RTE_KNI_IOCTL_CREATE, &dev_info);
 	if (ret < 0)
 		goto ioctl_fail;
--
2.17.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-04-22  6:15 [dpdk-dev] [PATCH v5] kni: add IOVA va support for kni kirankumark
@ 2019-06-25  3:56 ` vattunuru
  2019-06-25 10:00   ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: vattunuru @ 2019-06-25  3:56 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, olivier.matz, arybchenko, Vamsi Attunuru

From: Vamsi Attunuru <vattunuru@marvell.com>

----
V6 Changes:
* Added new mempool flag to ensure mbuf memory is not scattered
across page boundaries.
* Added KNI kernel module required PCI device information.
* Modified KNI example application to create mempool with new
mempool flag.

V5 changes:
* Fixed build issue with 32b build

V4 changes:
* Fixed build issues with older kernel versions
* This approach will only work with kernel above 4.4.0

V3 Changes:
* Add new approach to work kni with IOVA=VA mode using
iommu_iova_to_phys API.

Kiran Kumar K (1):
  kernel/linux/kni: add IOVA support in kni module

Vamsi Attunuru (3):
  lib/mempool: skip populating mempool objs that falls on page
    boundaries
  lib/kni: add PCI related information
  example/kni: add IOVA support for kni application

 examples/kni/main.c                               | 53 +++++++++++++++-
 kernel/linux/kni/kni_dev.h                        |  3 +
 kernel/linux/kni/kni_misc.c                       | 62 +++++++++++++++---
 kernel/linux/kni/kni_net.c                        | 76 +++++++++++++++++++----
 lib/librte_eal/linux/eal/eal.c                    |  8 ---
 lib/librte_eal/linux/eal/include/rte_kni_common.h |  8 +++
 lib/librte_kni/rte_kni.c                          |  7 +++
 lib/librte_mempool/rte_mempool.c                  |  2 +-
 lib/librte_mempool/rte_mempool.h                  |  2 +
 lib/librte_mempool/rte_mempool_ops_default.c      | 30 +++++++++
 10 files changed, 219 insertions(+), 32 deletions(-)

-- 
2.8.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-06-25  3:56 ` [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI vattunuru
@ 2019-06-25 10:00   ` Burakov, Anatoly
  2019-06-25 11:15     ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2019-06-25 10:00 UTC (permalink / raw)
  To: vattunuru, dev; +Cc: ferruh.yigit, olivier.matz, arybchenko

On 25-Jun-19 4:56 AM, vattunuru@marvell.com wrote:
> From: Vamsi Attunuru <vattunuru@marvell.com>
> 
> ----
> V6 Changes:
> * Added new mempool flag to ensure mbuf memory is not scattered
> across page boundaries.
> * Added KNI kernel module required PCI device information.
> * Modified KNI example application to create mempool with new
> mempool flag.
> 
Others can chime in, but my 2 cents: this reduces the usefulness of KNI 
because it limits the kinds of mempools one can use them with, and makes 
it so that the code that works with every other PMD requires changes to 
work with KNI.

-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-06-25 10:00   ` Burakov, Anatoly
@ 2019-06-25 11:15     ` Jerin Jacob Kollanukkaran
  2019-06-25 11:30       ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-25 11:15 UTC (permalink / raw)
  To: Burakov, Anatoly, Vamsi Krishna Attunuru, dev
  Cc: ferruh.yigit, olivier.matz, arybchenko

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Burakov, Anatoly
> Sent: Tuesday, June 25, 2019 3:30 PM
> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org
> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> 
> On 25-Jun-19 4:56 AM, vattunuru@marvell.com wrote:
> > From: Vamsi Attunuru <vattunuru@marvell.com>
> >
> > ----
> > V6 Changes:
> > * Added new mempool flag to ensure mbuf memory is not scattered across
> > page boundaries.
> > * Added KNI kernel module required PCI device information.
> > * Modified KNI example application to create mempool with new mempool
> > flag.
> >
> Others can chime in, but my 2 cents: this reduces the usefulness of KNI because
> it limits the kinds of mempools one can use them with, and makes it so that the
> code that works with every other PMD requires changes to work with KNI.

# One option is to make this flag the default only for packet mempools (i.e. not allow an object to be allocated across a page boundary).
In the real world the overhead will be minimal, considering the hugepage size is 1G or 512M.
# Another is to enable this flag explicitly only for IOVA = VA mode in the library, with no need to expose it to the application.
# I don't think there needs to be any PMD-specific change to make KNI work with IOVA = VA mode.
# No preference on whether the flag is passed by the application or set in the library. But IMO this change is
needed in mempool to support KNI in IOVA = VA mode.
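
As a rough illustration of what "not allowing an object to cross a page boundary" means for placement, here is a minimal sketch in C. It is not the code from the patch: place_within_page() and wasted_per_page() are hypothetical helpers, and the page size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): given the address where the next
 * object would start, store in *out an address at which an object of
 * obj_sz bytes lies entirely within one page of pg_sz bytes.
 * Returns 0 on success, -1 if the object can never fit in a single page.
 */
static inline int
place_within_page(uintptr_t addr, size_t obj_sz, size_t pg_sz, uintptr_t *out)
{
	/* pg_sz must be a non-zero power of two */
	assert(pg_sz != 0 && (pg_sz & (pg_sz - 1)) == 0);

	if (obj_sz == 0 || obj_sz > pg_sz)
		return -1;

	uintptr_t mask = ~((uintptr_t)pg_sz - 1);
	uintptr_t first_page = addr & mask;
	uintptr_t last_page = (addr + obj_sz - 1) & mask;

	/* Object stays on one page: keep it where it is. */
	if (first_page == last_page)
		*out = addr;
	else
		/* Object would straddle a boundary: start at the next page. */
		*out = first_page + pg_sz;
	return 0;
}

/*
 * Bytes left unused at the end of each page when objects of obj_sz bytes
 * are packed back to back from the start of the page.
 */
static inline size_t
wasted_per_page(size_t obj_sz, size_t pg_sz)
{
	if (obj_sz == 0 || obj_sz > pg_sz)
		return pg_sz;
	return pg_sz % obj_sz;
}
```

With an illustrative 2176-byte object and a 2MB page, wasted_per_page() returns 1664 bytes, which is under 0.1% of the page. An object larger than the page cannot be placed at all, which is the -1 case.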



> 
> --
> Thanks,
> Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-06-25 11:15     ` Jerin Jacob Kollanukkaran
@ 2019-06-25 11:30       ` Burakov, Anatoly
  2019-06-25 13:38         ` Burakov, Anatoly
  0 siblings, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2019-06-25 11:30 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Vamsi Krishna Attunuru, dev
  Cc: ferruh.yigit, olivier.matz, arybchenko

On 25-Jun-19 12:15 PM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Burakov, Anatoly
>> Sent: Tuesday, June 25, 2019 3:30 PM
>> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org
>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
>> arybchenko@solarflare.com
>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>>
>> On 25-Jun-19 4:56 AM, vattunuru@marvell.com wrote:
>>> From: Vamsi Attunuru <vattunuru@marvell.com>
>>>
>>> ----
>>> V6 Changes:
>>> * Added new mempool flag to ensure mbuf memory is not scattered across
>>> page boundaries.
>>> * Added KNI kernel module required PCI device information.
>>> * Modified KNI example application to create mempool with new mempool
>>> flag.
>>>
>> Others can chime in, but my 2 cents: this reduces the usefulness of KNI because
>> it limits the kinds of mempools one can use them with, and makes it so that the
>> code that works with every other PMD requires changes to work with KNI.
> 
> # One option to make this flag as default only for packet mempool(not allow allocate on page boundary).
> In real world the overhead will be very minimal considering Huge page size is 1G or 512M
> # Enable this flag explicitly only IOVA = VA mode in library. Not  need to expose to application
> # I don’t think, there needs to be any PMD specific change to make KNI with IOVA = VA mode
> # No preference on flags to be passed by application vs in library. But IMO this change would be
> needed in mempool support KNI in IOVA = VA mode.
> 

I would be OK with just making it the default behavior to not cross page
boundaries when allocating buffers. This would solve the problem for KNI
and for any other use case that relies on PA-contiguous buffers in the
face of IOVA as VA mode.

We could also add a flag to explicitly allow page crossing without also
making mbufs IOVA-non-contiguous, but I'm not sure if there are use
cases that would benefit from this.
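
To see why PA-contiguity matters when IOVA is VA, consider the toy model below. It is only a sketch: struct toy_iommu and both functions are made up for illustration and stand in for a per-page IOVA-to-PA lookup. Pages are assumed to be 4K, and a PA of 0 is treated as "not mapped". A buffer that stays inside one page always translates to one contiguous physical range, while a buffer that crosses a page boundary does so only if the neighbouring pages happen to be physically adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PG_SHIFT 12u                          /* model 4K pages */
#define PG_SIZE  ((uint64_t)1 << PG_SHIFT)

/* Toy stand-in for an IOMMU mapping: page_pa[n] is the PA of IOVA page n. */
struct toy_iommu {
	const uint64_t *page_pa;
	size_t nb_pages;
};

/* Translate a single IOVA to a PA; returns 0 if the page is not mapped. */
static inline uint64_t
toy_iova_to_phys(const struct toy_iommu *m, uint64_t iova)
{
	uint64_t pfn = iova >> PG_SHIFT;

	if (pfn >= m->nb_pages || m->page_pa[pfn] == 0)
		return 0;
	return m->page_pa[pfn] + (iova & (PG_SIZE - 1));
}

/* Returns 1 if [iova, iova + len) is also contiguous in physical memory. */
static inline int
toy_range_is_pa_contig(const struct toy_iommu *m, uint64_t iova, size_t len)
{
	assert(len > 0);

	uint64_t start_pa = toy_iova_to_phys(m, iova);
	uint64_t end = iova + len - 1;

	if (start_pa == 0)
		return 0;

	/* Check the first byte of every further page touched by the range. */
	for (uint64_t pg = (iova | (PG_SIZE - 1)) + 1; pg <= end; pg += PG_SIZE) {
		uint64_t pa = toy_iova_to_phys(m, pg);

		if (pa == 0 || pa != start_pa + (pg - iova))
			return 0;
	}
	return 1;
}
```

In this model, translating only the start address of a buffer and then using the result as if the whole buffer followed it is safe exactly when toy_range_is_pa_contig() returns 1, which is always true for a buffer that does not cross a page boundary.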

> 
> 
>>
>> --
>> Thanks,
>> Anatoly


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-06-25 11:30       ` Burakov, Anatoly
@ 2019-06-25 13:38         ` Burakov, Anatoly
  2019-06-27  9:34           ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 23+ messages in thread
From: Burakov, Anatoly @ 2019-06-25 13:38 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Vamsi Krishna Attunuru, dev
  Cc: ferruh.yigit, olivier.matz, arybchenko

On 25-Jun-19 12:30 PM, Burakov, Anatoly wrote:
> On 25-Jun-19 12:15 PM, Jerin Jacob Kollanukkaran wrote:
>>> -----Original Message-----
>>> From: dev <dev-bounces@dpdk.org> On Behalf Of Burakov, Anatoly
>>> Sent: Tuesday, June 25, 2019 3:30 PM
>>> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org
>>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
>>> arybchenko@solarflare.com
>>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>>>
>>> On 25-Jun-19 4:56 AM, vattunuru@marvell.com wrote:
>>>> From: Vamsi Attunuru <vattunuru@marvell.com>
>>>>
>>>> ----
>>>> V6 Changes:
>>>> * Added new mempool flag to ensure mbuf memory is not scattered across
>>>> page boundaries.
>>>> * Added KNI kernel module required PCI device information.
>>>> * Modified KNI example application to create mempool with new mempool
>>>> flag.
>>>>
>>> Others can chime in, but my 2 cents: this reduces the usefulness of 
>>> KNI because
>>> it limits the kinds of mempools one can use them with, and makes it 
>>> so that the
>>> code that works with every other PMD requires changes to work with KNI.
>>
>> # One option to make this flag as default only for packet mempool(not 
>> allow allocate on page boundary).
>> In real world the overhead will be very minimal considering Huge page 
>> size is 1G or 512M
>> # Enable this flag explicitly only IOVA = VA mode in library. Not  
>> need to expose to application
>> # I don’t think, there needs to be any PMD specific change to make KNI 
>> with IOVA = VA mode
>> # No preference on flags to be passed by application vs in library. 
>> But IMO this change would be
>> needed in mempool support KNI in IOVA = VA mode.
>>
> 
> I would be OK to just make it default behavior to not cross page 
> boundaries when allocating buffers. This would solve the problem for KNI 
> and for any other use case that would rely on PA-contiguous buffers in 
> face of IOVA as VA mode.
> 
> We could also add a flag to explicitly allow page crossing without also 
> making mbufs IOVA-non-contiguous, but i'm not sure if there are use 
> cases that would benefit from this.

On second thought, such a default would break the 4K-page case for
packets bigger than the page size (i.e. jumbo frames). Should we care?

> 
>>
>>
>>>
>>> -- 
>>> Thanks,
>>> Anatoly
> 
> 


-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-06-25 13:38         ` Burakov, Anatoly
@ 2019-06-27  9:34           ` Jerin Jacob Kollanukkaran
  2019-07-01 13:51             ` Vamsi Krishna Attunuru
  0 siblings, 1 reply; 23+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-27  9:34 UTC (permalink / raw)
  To: Burakov, Anatoly, Vamsi Krishna Attunuru, dev
  Cc: ferruh.yigit, olivier.matz, arybchenko

> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Tuesday, June 25, 2019 7:09 PM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Vamsi Krishna Attunuru
> <vattunuru@marvell.com>; dev@dpdk.org
> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> arybchenko@solarflare.com
> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> 
> On 25-Jun-19 12:30 PM, Burakov, Anatoly wrote:
> > On 25-Jun-19 12:15 PM, Jerin Jacob Kollanukkaran wrote:
> >>> -----Original Message-----
> >>> From: dev <dev-bounces@dpdk.org> On Behalf Of Burakov, Anatoly
> >>> Sent: Tuesday, June 25, 2019 3:30 PM
> >>> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org
> >>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
> >>> arybchenko@solarflare.com
> >>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> >>>
> >>> On 25-Jun-19 4:56 AM, vattunuru@marvell.com wrote:
> >>>> From: Vamsi Attunuru <vattunuru@marvell.com>
> >>>>
> >>>> ----
> >>>> V6 Changes:
> >>>> * Added new mempool flag to ensure mbuf memory is not scattered
> >>>> across page boundaries.
> >>>> * Added KNI kernel module required PCI device information.
> >>>> * Modified KNI example application to create mempool with new
> >>>> mempool flag.
> >>>>
> >>> Others can chime in, but my 2 cents: this reduces the usefulness of
> >>> KNI because it limits the kinds of mempools one can use them with,
> >>> and makes it so that the code that works with every other PMD
> >>> requires changes to work with KNI.
> >>
> >> # One option to make this flag as default only for packet mempool(not
> >> allow allocate on page boundary).
> >> In real world the overhead will be very minimal considering Huge page
> >> size is 1G or 512M # Enable this flag explicitly only IOVA = VA mode
> >> in library. Not need to expose to application # I don’t think, there
> >> needs to be any PMD specific change to make KNI with IOVA = VA mode #
> >> No preference on flags to be passed by application vs in library.
> >> But IMO this change would be
> >> needed in mempool support KNI in IOVA = VA mode.
> >>
> >
> > I would be OK to just make it default behavior to not cross page
> > boundaries when allocating buffers. This would solve the problem for
> > KNI and for any other use case that would rely on PA-contiguous
> > buffers in face of IOVA as VA mode.
> >
> > We could also add a flag to explicitly allow page crossing without
> > also making mbufs IOVA-non-contiguous, but i'm not sure if there are
> > use cases that would benefit from this.
> 
> On another thought, such a default would break 4K pages in case for packets
> bigger than page size (i.e. jumbo frames). Should we care?

The hugepage size will not be 4K. Right?

Olivier,

As a maintainer, any thoughts on exposing/not exposing the new mempool flag to
skip the page boundaries?

All,
Either option is fine; asking for feedback so we can proceed further.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-06-27  9:34           ` Jerin Jacob Kollanukkaran
@ 2019-07-01 13:51             ` Vamsi Krishna Attunuru
  2019-07-04  6:42               ` Vamsi Krishna Attunuru
  0 siblings, 1 reply; 23+ messages in thread
From: Vamsi Krishna Attunuru @ 2019-07-01 13:51 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Burakov, Anatoly, dev
  Cc: ferruh.yigit, olivier.matz, arybchenko

ping..


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-01 13:51             ` Vamsi Krishna Attunuru
@ 2019-07-04  6:42               ` Vamsi Krishna Attunuru
  2019-07-04  9:48                 ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 23+ messages in thread
From: Vamsi Krishna Attunuru @ 2019-07-04  6:42 UTC (permalink / raw)
  To: dev
  Cc: ferruh.yigit, olivier.matz, arybchenko,
	Jerin Jacob Kollanukkaran, Burakov, Anatoly

Hi All,


Just to summarize, the items below have arisen from the initial review.

1) Can the new mempool flag be made the default for all pools, and are there cases where the new flag's functionality would fail for some page sizes?

2) Adding HW device info (PCI device info) to the KNI device structure: will it break KNI on virtual devices in VA or PA mode?


Can someone suggest whether any changes are required to address the above issues?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-04  6:42               ` Vamsi Krishna Attunuru
@ 2019-07-04  9:48                 ` Jerin Jacob Kollanukkaran
  2019-07-11 16:21                   ` Ferruh Yigit
  0 siblings, 1 reply; 23+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-07-04  9:48 UTC (permalink / raw)
  To: Vamsi Krishna Attunuru, dev
  Cc: ferruh.yigit, olivier.matz, arybchenko, Burakov, Anatoly

>From: Vamsi Krishna Attunuru 
>Sent: Thursday, July 4, 2019 12:13 PM
>To: dev@dpdk.org
>Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com; arybchenko@solarflare.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
>Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>
>Hi All,
>
>Just to summarize, below items have arisen from the initial review.
>1) Can the new mempool flag be made default to all the pools and will there be case that new flag functionality would fail  for some page sizes.?

The minimum hugepage size is 2MB, and the usual hugepage size is 512MB or 1G. So I think the new flag can be the default, as skipping page boundaries for
mempool objects has nearly zero overhead. But I leave the decision to the maintainers.

>2) Adding HW device info(pci dev info) to KNI device structure, will it break KNI on virtual devices in VA or PA mode.?

An iommu_domain will be created only for PCI devices and only when the system runs in IOVA_VA mode. Virtual devices (IOVA_DC (don't care) or
IOVA_PA devices) still work without the PCI device structure.

It is a useful feature that lets KNI run without root privilege, and it has been pending for a long time. Requesting review so this can be closed.

>
>Can someone suggest if any changes required to address above issues. 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
  2019-07-04  9:48                 ` Jerin Jacob Kollanukkaran
@ 2019-07-11 16:21                   ` Ferruh Yigit
  0 siblings, 0 replies; 23+ messages in thread
From: Ferruh Yigit @ 2019-07-11 16:21 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran, Vamsi Krishna Attunuru, dev
  Cc: olivier.matz, arybchenko, Burakov, Anatoly

On 7/4/2019 10:48 AM, Jerin Jacob Kollanukkaran wrote:
>> From: Vamsi Krishna Attunuru 
>> Sent: Thursday, July 4, 2019 12:13 PM
>> To: dev@dpdk.org
>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com; arybchenko@solarflare.com; Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Burakov, Anatoly <anatoly.burakov@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>>
>> Hi All,
>>
>> Just to summarize, below items have arisen from the initial review.
>> 1) Can the new mempool flag be made default to all the pools and will there be case that new flag functionality would fail  for some page sizes.?
> 
> If the minimum huge page size is 2MB and normal huge page size is 512MB or 1G. So I think, new flags can be default as skipping the page boundaries for 
> Mempool objects has nearly zero overhead. But I leave decision to maintainers.
> 
>> 2) Adding HW device info(pci dev info) to KNI device structure, will it break KNI on virtual devices in VA or PA mode.?
> 
> Iommu_domain will be created only for PCI devices and the system runs in IOVA_VA mode. Virtual devices(IOVA_DC(don't care) or
> IOVA_PA devices still it works without PCI device structure)
> 
> It is  a useful feature where KNI can run without root privilege and it is pending for long time. Request to review and close this

I support the idea to remove 'kni' forcing to the IOVA=PA mode, but also not
sure about forcing all KNI users to update their code to allocate mempool in a
very specific way.

What about giving more control to the user on this?

Any user who wants to use IOVA=VA and KNI together can update the application to
adjust the memory allocation for KNI and give an explicit "kni iova_mode=1" config.
Anyone who wants the existing KNI implementation can continue to use it in IOVA=PA
mode, which is the current case; for this the user may need to force the DPDK
application to IOVA=PA, but at least there is a workaround.
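
Roughly, the selection I have in mind could look like the sketch below. It is
standalone C with hypothetical names only: the enums and kni_pick_mode() are
not the DPDK enum or any existing KNI API, and I am assuming the opt-in is
simply ignored when the application already runs in IOVA=PA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the DPDK enum or any KNI API. */
enum iova_mode { IOVA_PA, IOVA_VA };

enum kni_decision {
	KNI_USE_PA,        /* existing behaviour, nothing to change */
	KNI_USE_VA,        /* user opted in with "kni iova_mode=1" */
	KNI_NEED_FORCE_PA  /* VA selected without opt-in: force IOVA=PA */
};

/* Decide how KNI should run, given the IOVA mode the application ended up
 * in and whether the user gave the explicit opt-in. */
static enum kni_decision
kni_pick_mode(enum iova_mode eal_mode, bool user_opt_in)
{
	assert(eal_mode == IOVA_PA || eal_mode == IOVA_VA);

	if (eal_mode == IOVA_PA)
		return KNI_USE_PA; /* assumption: opt-in is ignored here */
	return user_opt_in ? KNI_USE_VA : KNI_NEED_FORCE_PA;
}
```

The point of the third outcome is that an application which did not opt in is
never silently switched to the new path; it has to force IOVA=PA, which is the
workaround mentioned above.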

And the kni sample application should have a sample for both cases. Although this
increases the testing and maintenance cost, I hope we can get support from you
on the iova_mode=1 use case.

What do you think?



> 
>>
>> Can someone suggest whether any changes are required to address the above issues?
> ________________________________________
> From: dev <dev-bounces@dpdk.org> on behalf of Vamsi Krishna Attunuru <vattunuru@marvell.com>
> Sent: Monday, July 1, 2019 7:21:22 PM
> To: Jerin Jacob Kollanukkaran; Burakov, Anatoly; dev@dpdk.org
> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com; arybchenko@solarflare.com
> Subject: [EXT] Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI 
>  
> ping..
> 
> ________________________________
> From: Jerin Jacob Kollanukkaran
> Sent: Thursday, June 27, 2019 3:04:58 PM
> To: Burakov, Anatoly; Vamsi Krishna Attunuru; dev@dpdk.org
> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com; arybchenko@solarflare.com
> Subject: RE: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
> 
>> -----Original Message-----
>> From: Burakov, Anatoly <anatoly.burakov@intel.com>
>> Sent: Tuesday, June 25, 2019 7:09 PM
>> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; Vamsi Krishna Attunuru
>> <vattunuru@marvell.com>; dev@dpdk.org
>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
>> arybchenko@solarflare.com
>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>>
>> On 25-Jun-19 12:30 PM, Burakov, Anatoly wrote:
>>> On 25-Jun-19 12:15 PM, Jerin Jacob Kollanukkaran wrote:
>>>>> -----Original Message-----
>>>>> From: dev <dev-bounces@dpdk.org> On Behalf Of Burakov, Anatoly
>>>>> Sent: Tuesday, June 25, 2019 3:30 PM
>>>>> To: Vamsi Krishna Attunuru <vattunuru@marvell.com>; dev@dpdk.org
>>>>> Cc: ferruh.yigit@intel.com; olivier.matz@6wind.com;
>>>>> arybchenko@solarflare.com
>>>>> Subject: Re: [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI
>>>>>
>>>>> On 25-Jun-19 4:56 AM, vattunuru@marvell.com wrote:
>>>>>> From: Vamsi Attunuru <vattunuru@marvell.com>
>>>>>>
>>>>>> ----
>>>>>> V6 Changes:
>>>>>> * Added new mempool flag to ensure mbuf memory is not scattered
>>>>>> across page boundaries.
>>>>>> * Added KNI kernel module required PCI device information.
>>>>>> * Modified KNI example application to create mempool with new
>>>>>> mempool flag.
>>>>>>
>>>>> Others can chime in, but my 2 cents: this reduces the usefulness of
>>>>> KNI because it limits the kinds of mempools one can use them with,
>>>>> and makes it so that the code that works with every other PMD
>>>>> requires changes to work with KNI.
>>>>
>>>> # One option is to make this flag the default only for packet mempools
>>>> (do not allow allocation across a page boundary).
>>>> In the real world the overhead will be very minimal, considering the huge
>>>> page size is 1G or 512M.
>>>> # Enable this flag explicitly only in IOVA = VA mode, in the library.
>>>> No need to expose it to the application.
>>>> # I don't think there needs to be any PMD-specific change to make KNI
>>>> work with IOVA = VA mode.
>>>> # No preference on the flag being passed by the application vs. set in
>>>> the library. But IMO this change would be
>>>> needed in mempool to support KNI in IOVA = VA mode.
>>>>
>>>
>>> I would be OK to just make it default behavior to not cross page
>>> boundaries when allocating buffers. This would solve the problem for
>>> KNI and for any other use case that would rely on PA-contiguous
>>> buffers in face of IOVA as VA mode.
>>>
>>> We could also add a flag to explicitly allow page crossing without
>>> also making mbufs IOVA-non-contiguous, but i'm not sure if there are
>>> use cases that would benefit from this.
>>
>> On second thought, such a default would break with 4K pages for packets
>> bigger than the page size (i.e. jumbo frames). Should we care?
> 
> The hugepage size will not be 4K. Right?
> 
> Olivier,
> 
> As a maintainer, any thoughts on exposing/not exposing the new mempool flag to
> skip the page boundaries?
> 
> All,
> Either option is fine. Asking for feedback so we can proceed further.
> 



end of thread, other threads:[~2019-07-16 10:07 UTC | newest]

Thread overview: 23+ messages
2019-07-12 11:37 [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI Jerin Jacob Kollanukkaran
2019-07-12 12:09 ` Burakov, Anatoly
2019-07-12 12:28   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-15  4:54     ` Jerin Jacob Kollanukkaran
2019-07-15  9:38       ` Burakov, Anatoly
2019-07-16  8:46         ` Olivier Matz
2019-07-16  9:40           ` Vamsi Krishna Attunuru
2019-07-16  9:55             ` Olivier Matz
2019-07-16 10:07               ` Vamsi Krishna Attunuru
  -- strict thread matches above, loose matches on Subject: below --
2019-07-12 10:26 [dpdk-dev] " Jerin Jacob Kollanukkaran
2019-07-12 10:48 ` Burakov, Anatoly
2019-07-12  9:17 Jerin Jacob Kollanukkaran
2019-07-12  9:58 ` Burakov, Anatoly
2019-04-22  6:15 [dpdk-dev] [PATCH v5] kni: add IOVA va support for kni kirankumark
2019-06-25  3:56 ` [dpdk-dev] [PATCH v6 0/4] add IOVA = VA support in KNI vattunuru
2019-06-25 10:00   ` Burakov, Anatoly
2019-06-25 11:15     ` Jerin Jacob Kollanukkaran
2019-06-25 11:30       ` Burakov, Anatoly
2019-06-25 13:38         ` Burakov, Anatoly
2019-06-27  9:34           ` Jerin Jacob Kollanukkaran
2019-07-01 13:51             ` Vamsi Krishna Attunuru
2019-07-04  6:42               ` Vamsi Krishna Attunuru
2019-07-04  9:48                 ` Jerin Jacob Kollanukkaran
2019-07-11 16:21                   ` Ferruh Yigit
