DPDK usage discussions
* Shared memory between two primary DPDK processes
@ 2022-04-08 12:31 Antonio Di Bacco
  2022-04-08 13:26 ` Dmitry Kozlyuk
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Di Bacco @ 2022-04-08 12:31 UTC (permalink / raw)
  To: users

I know that it is possible to share memory between a primary and secondary
process using rte_memzone_reserve_aligned to allocate memory in primary
that is "seen" also by the secondary. If we have two primary processes
(started with different file-prefix) the same approach is not feasible. I
wonder how to share a chunk of memory hosted on a hugepage between two
primaries.

Regards.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-08 12:31 Shared memory between two primary DPDK processes Antonio Di Bacco
@ 2022-04-08 13:26 ` Dmitry Kozlyuk
  2022-04-08 14:36   ` Ferruh Yigit
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Dmitry Kozlyuk @ 2022-04-08 13:26 UTC (permalink / raw)
  To: Antonio Di Bacco; +Cc: users

2022-04-08 14:31 (UTC+0200), Antonio Di Bacco:
> I know that it is possible to share memory between a primary and secondary
> process using rte_memzone_reserve_aligned to allocate memory in primary
> that is "seen" also by the secondary. If we have two primary processes
> (started with different file-prefix) the same approach is not feasible. I
> wonder how to share a chunk of memory hosted on a hugepage between two
> primaries.
> 
> Regards.

Hi Antonio,

Correction: all hugepages allocated by DPDK are shared
between primary and secondary processes, not only memzones.

I assume we're talking about processes within one host,
because your previous similar question was about sharing memory between hosts
(as we have discussed offline), which is out of scope for DPDK.

As for the question directly, you need to map the same part of the same file
in the second primary as the hugepage is mapped from in the first primary.
I don't recommend working with file paths, because their management
is not straightforward (--single-file-segments, for one) and is undocumented.

There is a way to share DPDK memory segment file descriptors.
Although public, this DPDK API is dangerous in the sense that you must
clearly understand what you're doing and how DPDK works.
Hence the question: what is the task you need this sharing for?
Maybe there is a simpler way.

1. In the first primary:

	mz = rte_memzone_reserve(name, len, socket_id, flags)
	ms = rte_mem_virt2memseg(mz->addr, NULL)
	fd = rte_memseg_get_fd(ms)
	rte_memseg_get_fd_offset(ms, &offset)

2. Use Unix domain sockets with SCM_RIGHTS
   to send "fd" and "offset" to the second primary.

3. In the second primary, after receiving "fd" and "offset":

	flags = MAP_SHARED | MAP_HUGETLB | (30 << MAP_HUGE_SHIFT)
	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, offset)

Note that "mz" may consist of multiple "ms" depending on the sizes
of the zone and hugepages, and on the zone alignment.
Also "addr" may (and probably will) differ from "mz->addr".
It is possible to pass "mz->addr" and try to force it,
like DPDK does for primary/secondary.
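
Step 2 can be sketched in plain C without DPDK. The following is a minimal sketch under stated assumptions, not DPDK code: a temporary file stands in for the hugepage file, `struct seg_info`, `send_seg()`, `recv_seg()` and `demo_fd_passing()` are hypothetical names, and the length is sent along with the offset because the receiver needs it for mmap().

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical metadata that travels with the descriptor. */
struct seg_info {
    uint64_t offset; /* where the segment starts in the file */
    uint64_t len;    /* how many bytes to map */
};

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send "fd" and its metadata over a connected Unix domain socket. */
static int send_seg(int sock, int fd, const struct seg_info *info)
{
    union fd_cmsg u;
    struct iovec iov = { .iov_base = (void *)info, .iov_len = sizeof(*info) };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(*info) ? 0 : -1;
}

/* Receive the metadata; returns a new local descriptor, or -1 on error. */
static int recv_seg(int sock, struct seg_info *info)
{
    union fd_cmsg u;
    struct iovec iov = { .iov_base = info, .iov_len = sizeof(*info) };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *cmsg;
    int fd = -1;

    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*info))
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Parent = first primary (owns the file); child = second primary. */
static int demo_fd_passing(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    struct seg_info info = { .offset = 0, .len = len };
    FILE *tmp = tmpfile(); /* stand-in for a hugepage file */
    int sv[2], status = -1;
    uint64_t *mine;
    pid_t pid;

    if (tmp == NULL || ftruncate(fileno(tmp), (off_t)len) != 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    mine = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fileno(tmp), 0);
    pid = (mine == MAP_FAILED) ? -1 : fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct seg_info got;
        uint64_t *theirs;
        int fd;
        close(sv[0]);
        close(fileno(tmp)); /* drop the inherited fd: use only the received one */
        fd = recv_seg(sv[1], &got);
        if (fd < 0)
            _exit(1);
        theirs = mmap(NULL, got.len, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, (off_t)got.offset);
        if (theirs == MAP_FAILED)
            _exit(1);
        *theirs = 0x0101010102020202ULL; /* visible to the parent via "mine" */
        _exit(0);
    }
    close(sv[1]);
    if (send_seg(sv[0], fileno(tmp), &info) != 0)
        return -1;
    waitpid(pid, &status, 0);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
            *mine == 0x0101010102020202ULL) ? 0 : -1;
}
```

With real hugepages the receiver would add MAP_HUGETLB to the mmap() flags as in step 3, and "fd" and "offset" would come from rte_memseg_get_fd() and rte_memseg_get_fd_offset(). Two unrelated primaries would connect over a named AF_UNIX socket rather than a socketpair().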


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-08 13:26 ` Dmitry Kozlyuk
@ 2022-04-08 14:36   ` Ferruh Yigit
  2022-04-08 21:14     ` Antonio Di Bacco
  2022-04-08 21:08   ` Antonio Di Bacco
  2022-07-06 22:14   ` Antonio Di Bacco
  2 siblings, 1 reply; 17+ messages in thread
From: Ferruh Yigit @ 2022-04-08 14:36 UTC (permalink / raw)
  To: Dmitry Kozlyuk, Antonio Di Bacco; +Cc: users

On 4/8/2022 2:26 PM, Dmitry Kozlyuk wrote:
> [...]

Also 'net/memif' driver can be used:
https://doc.dpdk.org/guides/nics/memif.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-08 13:26 ` Dmitry Kozlyuk
  2022-04-08 14:36   ` Ferruh Yigit
@ 2022-04-08 21:08   ` Antonio Di Bacco
  2022-04-11 13:03     ` Antonio Di Bacco
  2022-07-06 22:14   ` Antonio Di Bacco
  2 siblings, 1 reply; 17+ messages in thread
From: Antonio Di Bacco @ 2022-04-08 21:08 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: users

On Fri, 8 Apr 2022 at 15:26, Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> [...]


Thank you Dmitry, it is really incredible how deep your knowledge is. I
will give it a try.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-08 14:36   ` Ferruh Yigit
@ 2022-04-08 21:14     ` Antonio Di Bacco
  0 siblings, 0 replies; 17+ messages in thread
From: Antonio Di Bacco @ 2022-04-08 21:14 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Dmitry Kozlyuk, users

On Fri, 8 Apr 2022 at 16:36, Ferruh Yigit <ferruh.yigit@xilinx.com> wrote:

> On 4/8/2022 2:26 PM, Dmitry Kozlyuk wrote:
> > [...]
>
> Also 'net/memif' driver can be used:
> https://doc.dpdk.org/guides/nics/memif.html


Yes, I know about memif. Our application currently uses a chunk of
shared memory: a primary process writes to it and a secondary reads from
it. Now the secondary will become a primary, a sort of promotion. MEMIF
would be fine, but the paradigm would have to change a little compared to
a shared memory approach.
MEMIF is an interface over shared memory; we would need the opposite,
shared memory over a network interface.

Thank you.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-08 21:08   ` Antonio Di Bacco
@ 2022-04-11 13:03     ` Antonio Di Bacco
  2022-04-11 17:30       ` Dmitry Kozlyuk
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Di Bacco @ 2022-04-11 13:03 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: users

I wrote a short program in which a primary (--file-prefix=p1) allocates a
memzone and generates a file descriptor that is passed to another primary
(--file-prefix=p2).
Process P2 tries to mmap the memory, but I get an error (Bad file
descriptor):

        uint64_t *addr = mmap(NULL, 1024 * 1024 * 1024, PROT_READ, flags,
                              mem_fd, 0);
        if (addr == MAP_FAILED)
            perror("mmap");

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-11 13:03     ` Antonio Di Bacco
@ 2022-04-11 17:30       ` Dmitry Kozlyuk
  2022-04-14  8:20         ` Antonio Di Bacco
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Kozlyuk @ 2022-04-11 17:30 UTC (permalink / raw)
  To: Antonio Di Bacco; +Cc: users

2022-04-11 15:03 (UTC+0200), Antonio Di Bacco:
> I wrote a short program in which a primary (--file-prefix=p1) allocates a
> memzone and generates a file descriptor that is passed to another primary
> (--file-prefix=p2).
> Process P2 tries to mmap the memory, but I get an error (Bad file
> descriptor):
> 
>         uint64_t *addr = mmap(NULL, 1024 * 1024 * 1024, PROT_READ, flags,
>                               mem_fd, 0);
>         if (addr == MAP_FAILED)
>             perror("mmap");

How do you pass the FD?
Memif does the same thing under the hood, so you should be able to as well.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-11 17:30       ` Dmitry Kozlyuk
@ 2022-04-14  8:20         ` Antonio Di Bacco
  2022-04-14 19:01           ` Dmitry Kozlyuk
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Di Bacco @ 2022-04-14  8:20 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: users

On Mon, 11 Apr 2022 at 19:30, Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> 2022-04-11 15:03 (UTC+0200), Antonio Di Bacco:
> > [...]
>
> How do you pass the FD?
> Memif does the same thing under the hood, so you should be able to as well.
>


OK, after having a look at memif I managed to exchange the fd between the
two processes, and it works.
Anyway, the procedure seems a little bit clunky, and I think I'm going to
use the new pidfd_getfd syscall to achieve the same result. In your
opinion, could this method (pidfd_getfd) also work if the two DPDK
processes are inside different docker containers?
Or is there another mechanism, like using handles to hugepages present in
the filesystem, to share between two different containers?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-14  8:20         ` Antonio Di Bacco
@ 2022-04-14 19:01           ` Dmitry Kozlyuk
  2022-04-14 19:51             ` Antonio Di Bacco
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Kozlyuk @ 2022-04-14 19:01 UTC (permalink / raw)
  To: Antonio Di Bacco; +Cc: users

2022-04-14 10:20 (UTC+0200), Antonio Di Bacco:
[...]
> OK, after having a look at memif I managed to exchange the fd between the
> two processes, and it works.
> Anyway, the procedure seems a little bit clunky, and I think I'm going to
> use the new pidfd_getfd syscall to achieve the same result. In your
> opinion, could this method (pidfd_getfd) also work if the two DPDK
> processes are inside different docker containers?

Honestly, I've just learned about pidfd_getfd() from you.
But I know that containers use PID namespaces, so the question is
how you will obtain the pidfd of a process in another container.

In general, any method of sharing FD will work.
Remember that you also need offset and size.
Given that some channel is required to share those,
I think Unix domain socket is still the preferred way.

> Or is there another mechanism, like using handles to hugepages present in
> the filesystem, to share between two different containers?

FD is needed for mmap().
You need to either pass the FD or open() the same hugepage file by path.
I advise against using paths because they are not part of the DPDK API contract.
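
For reference, the pidfd_getfd() route discussed above can be sketched as follows. This is a minimal sketch under stated assumptions: `dup_fd_from_pid()` and `demo_pidfd_getfd()` are hypothetical names, the fallback syscall numbers are the ones used on x86-64 and most other architectures, and to stay self-contained the demo duplicates a descriptor from its own process, whereas in the scenario above the PID would be that of the other primary. pidfd_open() needs Linux 5.3 or later and pidfd_getfd() needs Linux 5.6 or later, and the caller must have ptrace-attach permission over the target.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback numbers for older headers (assumption: x86-64 or another
 * architecture that uses the unified syscall numbering). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/* Duplicate descriptor "remote_fd" of process "pid" into this process. */
static int dup_fd_from_pid(pid_t pid, int remote_fd)
{
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    int local_fd, saved_errno;

    if (pidfd < 0)
        return -1;
    local_fd = (int)syscall(SYS_pidfd_getfd, pidfd, remote_fd, 0);
    saved_errno = errno;
    close(pidfd);
    errno = saved_errno;
    return local_fd;
}

/* Returns 0 on success or when the kernel/sandbox lacks the syscalls. */
static int demo_pidfd_getfd(void)
{
    int p[2], copy, ok;
    char buf[8] = { 0 };

    if (pipe(p) != 0)
        return -1;
    copy = dup_fd_from_pid(getpid(), p[0]);
    if (copy < 0) {
        /* Old kernel or restrictive seccomp policy: nothing to show. */
        int skip = (errno == ENOSYS || errno == EPERM);
        close(p[0]);
        close(p[1]);
        return skip ? 0 : -1;
    }
    /* Data written to the pipe is readable through the duplicate. */
    ok = write(p[1], "hello", 5) == 5 &&
         read(copy, buf, sizeof(buf)) == 5 &&
         memcmp(buf, "hello", 5) == 0;
    close(copy);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

Even with this mechanism, the offset and size of the segment still have to reach the second process through some other channel, which is the argument made above for keeping the Unix domain socket.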

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-14 19:01           ` Dmitry Kozlyuk
@ 2022-04-14 19:51             ` Antonio Di Bacco
  2022-04-18 17:34               ` Antonio Di Bacco
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Di Bacco @ 2022-04-14 19:51 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: users

On Thu, 14 Apr 2022 at 21:01, Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> [...]

Thank you very much, Dmitry, your answers are always enlightening.
I'm going to ask a separate question on dpdk.org about the best
practice for sharing memory between two DPDK processes running in different
containers.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-14 19:51             ` Antonio Di Bacco
@ 2022-04-18 17:34               ` Antonio Di Bacco
  2022-04-18 17:53                 ` Antonio Di Bacco
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Di Bacco @ 2022-04-18 17:34 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: users

In the end I tried the pidfd_getfd syscall, which works really well and
gives me back a "clone" of an fd that was opened by another process.
I tested it by opening a text file in the first process; after cloning
the fd, I could read the file in the second process too.
Now the weird thing:
1) In the first process I allocate a huge page, then get the fd
2) In the second process I get my "clone" fd and do an mmap. It works, but
if I write to that memory, the first process cannot see what I wrote

int second_process(int remote_pid, int remote_mem_fd) {

        printf("remote_pid %d remote_mem_fd %d\n", remote_pid, remote_mem_fd);
        int pidfd = syscall(__NR_pidfd_open, remote_pid, 0);

        /* 438 is the pidfd_getfd syscall number on x86-64 */
        int my_mem_fd = syscall(438, pidfd, remote_mem_fd, 0);
        printf("my_mem_fd %d\n", my_mem_fd);   // This is nice

        int flags = MAP_SHARED | MAP_HUGETLB | (30 << MAP_HUGE_SHIFT);
        uint64_t *addr = (uint64_t *) mmap(NULL, 1024 * 1024 * 1024,
                        PROT_READ | PROT_WRITE, flags, my_mem_fd, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            return -1;
        }
        *addr = 0x0101010102020202;
        return 0;
}


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-18 17:34               ` Antonio Di Bacco
@ 2022-04-18 17:53                 ` Antonio Di Bacco
  2022-04-18 19:08                   ` Dmitry Kozlyuk
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Di Bacco @ 2022-04-18 17:53 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: users

One more piece of information:

The process that allocates the 1GB page has this map:
antodib@Ubuntu-20.04-5:: /proc> sudo cat /proc/27812/maps | grep huge
140000000-180000000 rw-s 00000000 00:46 97193    /dev/huge1G/rtemap_0

while the process that maps the 1GB page (--file-prefix p2) has these maps.
Is it stealing a new page?
antodib@Ubuntu-20.04-5:: /proc> sudo cat /proc/27906/maps | grep huge
140000000-180000000 rw-s 00000000 00:46 113170   /dev/huge1G/p2map_0
7f7bc0000000-7f7c00000000 rw-s 00000000 00:46 97193    /dev/huge1G/rtemap_0

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-18 17:53                 ` Antonio Di Bacco
@ 2022-04-18 19:08                   ` Dmitry Kozlyuk
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Kozlyuk @ 2022-04-18 19:08 UTC (permalink / raw)
  To: Antonio Di Bacco; +Cc: users

2022-04-18 19:53 (UTC+0200), Antonio Di Bacco:
> One more piece of information:
> 
> The process that allocates the 1GB page has this map:
> antodib@Ubuntu-20.04-5:: /proc> sudo cat /proc/27812/maps | grep huge
> 140000000-180000000 rw-s 00000000 00:46 97193    /dev/huge1G/rtemap_0
> 
> while the process that maps the 1GB page (--file-prefix p2) has these maps.
> Is it stealing a new page?
> antodib@Ubuntu-20.04-5:: /proc> sudo cat /proc/27906/maps | grep huge
> 140000000-180000000 rw-s 00000000 00:46 113170   /dev/huge1G/p2map_0
> 7f7bc0000000-7f7c00000000 rw-s 00000000 00:46 97193    /dev/huge1G/rtemap_0
> 
> [...]

I don't quite understand what you mean by "stealing a new page",
but it shows that the second process has mapped the same hugepage file
as the first process (just confirms that mmap indeed works).
rte_mem_virt2phy() can tell if the physical address
is really the same in both processes.
Are you sure that the 2nd process writes before the 1st one reads?
Are you sure read/write is not optimized out?
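
One way to rule out both ordering and compiler optimization is to go through
C11 atomics and poll on the reader side. The following is only a minimal
sketch under assumptions: an anonymous MAP_SHARED page and fork() stand in
for the shared hugepage and the two primaries, and shared_write_visible()
is a hypothetical helper name, not a DPDK API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the reader observes the writer's store, 0 if it times out,
 * -1 on setup failure. `pattern` must be nonzero: zero means "not written". */
int shared_write_visible(uint64_t pattern)
{
    assert(pattern != 0);

    /* Stand-in for the shared hugepage: one anonymous MAP_SHARED page. */
    _Atomic uint64_t *cell = mmap(NULL, sizeof(*cell), PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (cell == MAP_FAILED)
        return -1;
    atomic_store(cell, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)cell, sizeof(*cell));
        return -1;
    }
    if (pid == 0) {
        /* Writer: an atomic store cannot be elided by the compiler. */
        atomic_store_explicit(cell, pattern, memory_order_release);
        _exit(0);
    }

    /* Reader: poll with atomic loads instead of assuming the write
     * has already happened; give up after about five seconds. */
    int seen = 0;
    for (int i = 0; i < 5000 && !seen; i++) {
        if (atomic_load_explicit(cell, memory_order_acquire) == pattern)
            seen = 1;
        else
            usleep(1000);
    }
    waitpid(pid, NULL, 0);
    munmap((void *)cell, sizeof(*cell));
    return seen;
}
```

If the same pattern of atomic store and polled atomic load still shows
different values on the real hugepage mapping, the two processes are most
likely not looking at the same physical page.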

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-04-08 13:26 ` Dmitry Kozlyuk
  2022-04-08 14:36   ` Ferruh Yigit
  2022-04-08 21:08   ` Antonio Di Bacco
@ 2022-07-06 22:14   ` Antonio Di Bacco
  2022-07-07  0:26     ` Dmitry Kozlyuk
  2 siblings, 1 reply; 17+ messages in thread
From: Antonio Di Bacco @ 2022-07-06 22:14 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: users

Dear Dmitry,

I tried to follow this approach and if I allocate 1GB on primary
process number 1, then I can mmap that memory on the primary process
number 2.
I also tried to convert the virt addr of the allocation made in
primary 1 to phys and then I converted the virt addr returned by mmap
in primary 2 and I got the same phys addr.

Unfortunately, if I try to allocate only 10 MB, for example, in primary
1, then mmap in primary 2 succeeds but it seems that this virt addr
doesn't correspond to the same phys memory as in primary 1.

In the primary 2, the mmap is used like this:

    int flags = MAP_SHARED | MAP_HUGETLB;

    uint64_t *addr = (uint64_t *) mmap(NULL, sz, PROT_READ | PROT_WRITE,
                                       flags, my_mem_fd, off);

On Fri, Apr 8, 2022 at 3:26 PM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
>
> 2022-04-08 14:31 (UTC+0200), Antonio Di Bacco:
> > I know that it is possible to share memory between a primary and secondary
> > process using rte_memzone_reserve_aligned to allocate memory in primary
> > that is "seen" also by the secondary. If we have two primary processes
> > (started with different file-prefix) the same approach is not feasible. I
> > wonder how to share a chunk of memory hosted on a hugepage between two
> > primaries.
> >
> > Regards.
>
> Hi Antonio,
>
> Correction: all hugepages allocated by DPDK are shared
> between primary and secondary processes, not only memzones.
>
> I assume we're talking about processes within one host,
> because your previous similar question was about sharing memory between hosts
> (as we have discussed offline), which is out of scope for DPDK.
>
> As for the question directly, you need to map the same part of the same file
> in the second primary as the hugepage is mapped from in the first primary.
> I don't recommend working with file paths, because their management
> is not straightforward (--single-file-segments, for one) and is undocumented.
>
> There is a way to share DPDK memory segment file descriptors.
> Although public, this DPDK API is dangerous in the sense that you must
> clearly understand what you're doing and how DPDK works.
> Hence the question: what is the task you need this sharing for?
> Maybe there is a simpler way.
>
> 1. In the first primary:
>
>         mz = rte_memzone_reserve(name, len, socket_id, flags);
>         ms = rte_mem_virt2memseg(mz->addr, NULL);
>         fd = rte_memseg_get_fd(ms);
>         rte_memseg_get_fd_offset(ms, &offset);
>
> 2. Use Unix domain sockets with SCM_RIGHTS
>    to send "fd" and "offset" to the second primary.
>
> 3. In the second primary, after receiving "fd" and "offset":
>
>         flags = MAP_SHARED | MAP_HUGETLB | (30 << MAP_HUGE_SHIFT);
>         addr = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, offset);
>
> Note that "mz" may consist of multiple "ms" depending on the sizes
> of the zone and hugepages, and on the zone alignment.
> Also "addr" may (and probably will) differ from "mz->addr".
> It is possible to pass "mz->addr" and try to force it,
> like DPDK does for primary/secondary.
>
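
Step 2 of the quoted recipe (sending "fd" and "offset" over a Unix domain
socket with SCM_RIGHTS) can be sketched as below. This is a hedged
illustration, not code from DPDK: send_fd() and recv_fd() are hypothetical
helper names, the offset travels as the regular message payload, and the
descriptor travels in the ancillary data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send `fd` plus a 64-bit offset over a Unix domain socket. 0 on success. */
int send_fd(int sock, int fd, uint64_t offset)
{
    assert(fd >= 0);
    union fd_ctl ctl;
    memset(&ctl, 0, sizeof(ctl));
    struct iovec iov = { .iov_base = &offset, .iov_len = sizeof(offset) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(offset) ? 0 : -1;
}

/* Receive a descriptor and its offset. Returns the new fd, or -1 on error. */
int recv_fd(int sock, uint64_t *offset)
{
    assert(offset != NULL);
    union fd_ctl ctl;
    memset(&ctl, 0, sizeof(ctl));
    struct iovec iov = { .iov_base = offset, .iov_len = sizeof(*offset) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf),
    };
    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*offset))
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file,
so the second primary can pass it straight to mmap() together with the
received offset.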

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-07-06 22:14   ` Antonio Di Bacco
@ 2022-07-07  0:26     ` Dmitry Kozlyuk
  2022-07-07  8:48       ` Antonio Di Bacco
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Kozlyuk @ 2022-07-07  0:26 UTC (permalink / raw)
  To: Antonio Di Bacco; +Cc: users

2022-07-07 00:14 (UTC+0200), Antonio Di Bacco:
> Dear Dmitry,
> 
> I tried to follow this approach and if I allocate 1GB on primary
> process number 1, then I can mmap that memory on the primary process
> number 2.
> I also tried to convert the virt addr of the allocation made in
> primary 1 to phys and then I converted the virt addr returned by mmap
> in primary 2 and I got the same phys addr.
> 
> Unfortunately, if I try to allocate only 10 MB, for example, in primary
> 1, then mmap in primary 2 succeeds but it seems that this virt addr
> doesn't correspond to the same phys memory as in primary 1.
> 
> In the primary 2, the mmap is used like this:
> 
>     int flags = MAP_SHARED | MAP_HUGETLB ;
> 
>     uint64_t* addr = (uint64_t*) mmap(NULL, sz, PROT_READ|PROT_WRITE,
> flags, my_mem_fd, off);

Hi Antonio,

From `man 2 mmap`:

   Huge page (Huge TLB) mappings
       For  mappings that employ huge pages, the requirements for the
       arguments of mmap() and munmap() differ somewhat from the requirements
       for mappings that use the native system page size.

       For mmap(), offset must be a multiple of the underlying huge page
       size.  The system automatically aligns length to be a  multiple  of
       the underlying huge page size.

       For munmap(), addr, and length must both be a multiple of the
       underlying huge page size.

Process 1 probably maps a 1 GB hugepage:
DPDK does so if 1 GB hugepages are used, even if you only allocate 10 MB.
You can examine the memseg (not the memzone!) to see what size it is.
Hugepage size is a property of each mounted HugeTLB filesystem.
It determines which kernel pool to use.
Pools cover different sets of physical pages.
This means that the kernel doesn't allow mapping given page frames
as 1 GB and 2 MB hugepages at the same time via hugetlbfs.
I'm surprised mmap() works at all in your case
and suspect that it is mapping 2 MB hugepages in process 2.

The solution may be, in process 2:

base_offset = RTE_ALIGN_FLOOR(offset, hugepage_size);
map_addr = mmap(NULL, hugepage_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                fd, base_offset);
addr = RTE_PTR_ADD(map_addr, offset - base_offset);

Note that if [offset; offset+size) crosses a hugepage boundary,
you have to map more than one page.
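
The offset arithmetic above, including the case where the range crosses a
hugepage boundary, can be sketched in plain C without DPDK macros. The
struct hp_window and hp_window_for() names are made up for this example;
the hugepage size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* What to pass to mmap() and how to find the data in the result. */
struct hp_window {
    uint64_t base_offset; /* file offset for mmap(), hugepage aligned */
    uint64_t map_len;     /* length to map, a whole number of hugepages */
    uint64_t delta;       /* add to the mmap() result to reach the data */
};

/* Compute the hugepage-aligned window covering [offset, offset + size). */
struct hp_window hp_window_for(uint64_t offset, uint64_t size,
                               uint64_t hugepage_sz)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert(hugepage_sz != 0 && (hugepage_sz & (hugepage_sz - 1)) == 0);
    assert(size > 0);

    struct hp_window w;
    uint64_t mask = hugepage_sz - 1;

    w.base_offset = offset & ~mask;          /* align down */
    w.delta = offset - w.base_offset;
    uint64_t end = (offset + size + mask) & ~mask; /* align end up */
    w.map_len = end - w.base_offset;
    return w;
}
```

With 1 GB hugepages, a 10 MB range that starts 4096 bytes into the second
page yields base_offset = 1 GB, delta = 4096 and map_len = 1 GB; a range
that straddles the first page boundary yields map_len = 2 GB.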

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-07-07  0:26     ` Dmitry Kozlyuk
@ 2022-07-07  8:48       ` Antonio Di Bacco
  2022-07-07  9:26         ` Dmitry Kozlyuk
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Di Bacco @ 2022-07-07  8:48 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: users

You are right, process 1 always allocates 1GB even if I request
only 10MB, and memseg->hugepage_sz is 1GB.
When I use rte_memseg_get_fd_offset I get an FD and the offset is 0
(correct), because that's the memseg offset, not the memzone offset.
To access the memory allocated by process 1 I also need to take into
account the offset between memzone and memseg.
So I need to add (memzone->iova -  memseg->iova) to the address
returned by mmap.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Shared memory between two primary DPDK processes
  2022-07-07  8:48       ` Antonio Di Bacco
@ 2022-07-07  9:26         ` Dmitry Kozlyuk
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Kozlyuk @ 2022-07-07  9:26 UTC (permalink / raw)
  To: Antonio Di Bacco; +Cc: users

On Thu, Jul 7, 2022, 11:48 Antonio Di Bacco <a.dibacco.ks@gmail.com> wrote:

> And then I need to add (memzone->iova -  memseg->iova) to the address
> returned by mmap.

This bit sounds suspicious.
mmap() always returns a virtual address.
IOVA is VA or PA depending on DPDK mode.
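
A variant that does not depend on the IOVA mode is to compute the delta from
virtual addresses in process 1 and apply it to the mmap() result in
process 2. The sketch below is an assumption-laden illustration: zone_view
and seg_view are stand-ins that mirror only the addr and len fields of
rte_memzone and rte_memseg, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the relevant fields of rte_memzone and rte_memseg. */
struct zone_view { void *addr; size_t len; };
struct seg_view  { void *addr; size_t len; };

/* Process 1: byte offset of the zone inside its segment, from VAs only. */
size_t zone_offset_in_seg(const struct zone_view *mz,
                          const struct seg_view *ms)
{
    uintptr_t z = (uintptr_t)mz->addr;
    uintptr_t s = (uintptr_t)ms->addr;

    /* The zone must start inside this segment. */
    assert(z >= s && z - s < ms->len);
    return (size_t)(z - s);
}

/* Process 2: apply that offset to the address returned by mmap(). */
void *zone_ptr_in_peer(void *peer_map_addr, size_t off)
{
    return (char *)peer_map_addr + off;
}
```

Process 1 would send the result of zone_offset_in_seg() along with the fd and
the segment's file offset; process 2 maps the segment and calls
zone_ptr_in_peer() on the mapping.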

^ permalink raw reply	[flat|nested] 17+ messages in thread
