DPDK patches and discussions
* [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys
@ 2020-04-20 11:09 Li Feng
  2020-04-20 12:16 ` Burakov, Anatoly
  0 siblings, 1 reply; 9+ messages in thread
From: Li Feng @ 2020-04-20 11:09 UTC
  To: Anatoly Burakov, Bruce Richardson; +Cc: lifeng1519, dev, Li Feng

Use pread() in place of lseek() + read().
Also add new APIs that cut the number of open/close/lseek system calls when
the user needs to convert a large range of virtual addresses:
    - rte_mem_virt2iova_with_fd
    - rte_mem_virt2phy_with_fd

Currently it will be used by spdk in spdk_mem_register.

Signed-off-by: Li Feng <fengli@smartx.com>
---
 lib/librte_eal/freebsd/eal_memory.c | 18 ++++++++++++++
 lib/librte_eal/include/rte_memory.h | 36 +++++++++++++++++++++++++++
 lib/librte_eal/linux/eal_memory.c   | 49 +++++++++++++++++++++++--------------
 lib/librte_eal/rte_eal_version.map  |  3 +++
 4 files changed, 88 insertions(+), 18 deletions(-)

diff --git a/lib/librte_eal/freebsd/eal_memory.c b/lib/librte_eal/freebsd/eal_memory.c
index a97d8f0f0..fc0debf23 100644
--- a/lib/librte_eal/freebsd/eal_memory.c
+++ b/lib/librte_eal/freebsd/eal_memory.c
@@ -44,12 +44,30 @@ rte_mem_virt2phy(const void *virtaddr)
 	(void)virtaddr;
 	return RTE_BAD_IOVA;
 }
+
 rte_iova_t
 rte_mem_virt2iova(const void *virtaddr)
 {
 	return rte_mem_virt2phy(virtaddr);
 }
 
+phys_addr_t
+rte_mem_virt2phy_with_fd(int fd, const void *virtaddr)
+{
+	/*
+	 * XXX not implemented. This function is only used by
+	 * rte_mempool_virt2iova_with_fd() when hugepages are disabled.
+	 */
+	(void)virtaddr;
+	return RTE_BAD_IOVA;
+}
+
+rte_iova_t
+rte_mem_virt2iova_with_fd(int fd, const void *virtaddr)
+{
+	return rte_mem_virt2phy_with_fd(fd, virtaddr);
+}
+
 int
 rte_eal_hugepage_init(void)
 {
diff --git a/lib/librte_eal/include/rte_memory.h b/lib/librte_eal/include/rte_memory.h
index 3d8d0bd69..c75782fa7 100644
--- a/lib/librte_eal/include/rte_memory.h
+++ b/lib/librte_eal/include/rte_memory.h
@@ -108,6 +108,23 @@ int rte_mem_lock_page(const void *virt);
 phys_addr_t rte_mem_virt2phy(const void *virt);
 
 /**
+ * Get physical address of any mapped virtual address in the current process.
+ * It is found by reading fd which is the opened /proc/self/pagemap special file
+ * descriptor. This is an optimization of rte_mem_virt2phy for callers that
+ * need to translate many addresses.
+ * The page must be locked.
+ *
+ * @param fd
+ *   The opened fd of /proc/self/pagemap.
+ * @param virt
+ *   The virtual address.
+ * @return
+ *   The physical address or RTE_BAD_IOVA on error.
+ */
+__rte_experimental
+phys_addr_t rte_mem_virt2phy_with_fd(int fd, const void *virt);
+
+/**
  * Get IO virtual address of any mapped virtual address in the current process.
  *
  * @note This function will not check internal page table. Instead, in IOVA as
@@ -123,6 +140,25 @@ phys_addr_t rte_mem_virt2phy(const void *virt);
 rte_iova_t rte_mem_virt2iova(const void *virt);
 
 /**
+ * Get IO virtual address of any mapped virtual address in the current process.
+ *
+ * @note This function will not check internal page table. Instead, in IOVA as
+ *       PA mode, it will fall back to getting real physical address (which may
+ *       not match the expected IOVA, such as what was specified for external
+ *       memory).
+ *
+ * @param fd
+ *   The opened fd of /proc/self/pagemap.
+ * @param virt
+ *   The virtual address.
+ * @return
+ *   The IO address or RTE_BAD_IOVA on error.
+ */
+__rte_experimental
+rte_iova_t rte_mem_virt2iova_with_fd(int fd, const void *virt);
+
+
+/**
  * Get virtual memory address corresponding to iova address.
  *
  * @note This function read-locks the memory hotplug subsystem, and thus cannot
diff --git a/lib/librte_eal/linux/eal_memory.c b/lib/librte_eal/linux/eal_memory.c
index 7a9c97ff8..918796700 100644
--- a/lib/librte_eal/linux/eal_memory.c
+++ b/lib/librte_eal/linux/eal_memory.c
@@ -91,11 +91,11 @@ uint64_t eal_get_baseaddr(void)
 
 /*
  * Get physical address of any mapped virtual address in the current process.
+ * The fd is passed in to avoid opening and closing pagemap repeatedly.
  */
 phys_addr_t
-rte_mem_virt2phy(const void *virtaddr)
-{
-	int fd, retval;
+rte_mem_virt2phy_with_fd(int fd, const void *virtaddr) {
+	int retval;
 	uint64_t page, physaddr;
 	unsigned long virt_pfn;
 	int page_size;
@@ -107,24 +107,10 @@ rte_mem_virt2phy(const void *virtaddr)
 	/* standard page size */
 	page_size = getpagesize();
 
-	fd = open("/proc/self/pagemap", O_RDONLY);
-	if (fd < 0) {
-		RTE_LOG(INFO, EAL, "%s(): cannot open /proc/self/pagemap: %s\n",
-			__func__, strerror(errno));
-		return RTE_BAD_IOVA;
-	}
-
 	virt_pfn = (unsigned long)virtaddr / page_size;
 	offset = sizeof(uint64_t) * virt_pfn;
-	if (lseek(fd, offset, SEEK_SET) == (off_t) -1) {
-		RTE_LOG(INFO, EAL, "%s(): seek error in /proc/self/pagemap: %s\n",
-				__func__, strerror(errno));
-		close(fd);
-		return RTE_BAD_IOVA;
-	}
 
-	retval = read(fd, &page, PFN_MASK_SIZE);
-	close(fd);
+	retval = pread(fd, &page, PFN_MASK_SIZE, offset);
 	if (retval < 0) {
 		RTE_LOG(INFO, EAL, "%s(): cannot read /proc/self/pagemap: %s\n",
 				__func__, strerror(errno));
@@ -149,6 +135,33 @@ rte_mem_virt2phy(const void *virtaddr)
 	return physaddr;
 }
 
+/*
+ * Get physical address of any mapped virtual address in the current process.
+ */
+phys_addr_t
+rte_mem_virt2phy(const void *virtaddr)
+{
+	uint64_t physaddr;
+	int fd;
+	fd = open("/proc/self/pagemap", O_RDONLY);
+	if (fd < 0) {
+		RTE_LOG(INFO, EAL, "%s(): cannot open /proc/self/pagemap: %s\n",
+			__func__, strerror(errno));
+		return RTE_BAD_IOVA;
+	}
+	physaddr = rte_mem_virt2phy_with_fd(fd, virtaddr);
+	close(fd);
+	return physaddr;
+}
+
+rte_iova_t
+rte_mem_virt2iova_with_fd(int fd, const void *virtaddr)
+{
+	if (rte_eal_iova_mode() == RTE_IOVA_VA)
+		return (uintptr_t)virtaddr;
+	return rte_mem_virt2phy_with_fd(fd, virtaddr);
+}
+
 rte_iova_t
 rte_mem_virt2iova(const void *virtaddr)
 {
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index f9ede5b41..fc3a436e7 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -338,4 +338,7 @@ EXPERIMENTAL {
 
 	# added in 20.05
 	rte_log_can_log;
+
+	rte_mem_virt2iova_with_fd;
+	rte_mem_virt2phy_with_fd;
 };
-- 
2.11.0
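
The saving described in the commit message can be illustrated outside DPDK. What follows is only a minimal sketch, not the code from the patch: pagemap_virt2phys(), pagemap_translate_range() and BAD_PHYS_ADDR are hypothetical names, and the caller is assumed to have opened /proc/self/pagemap once and to pass the same fd for every page in the range.

```c
#define _XOPEN_SOURCE 700 /* exposes pread() under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define PAGEMAP_PFN_MASK 0x7fffffffffffffULL /* bits 0-54: page frame number */
#define BAD_PHYS_ADDR    UINT64_MAX          /* hypothetical stand-in for RTE_BAD_IOVA */

/* Translate one address by reading its 64-bit pagemap entry from an open fd. */
static uint64_t
pagemap_virt2phys(int fd, uintptr_t vaddr, uint64_t page_size)
{
	uint64_t entry;
	off_t offset = (off_t)((vaddr / page_size) * sizeof(entry));

	/* pread() takes the offset directly: no lseek(), no shared file position. */
	if (pread(fd, &entry, sizeof(entry), offset) != (ssize_t)sizeof(entry))
		return BAD_PHYS_ADDR;
	if ((entry & PAGEMAP_PFN_MASK) == 0)
		return BAD_PHYS_ADDR; /* no PFN: page not present or PFN hidden */
	return (entry & PAGEMAP_PFN_MASK) * page_size + (vaddr % page_size);
}

/* Translate npages consecutive pages with a single fd; returns how many succeeded. */
static size_t
pagemap_translate_range(int fd, uintptr_t start, size_t npages,
			uint64_t page_size, uint64_t *out)
{
	size_t i, ok = 0;

	for (i = 0; i < npages; i++) {
		out[i] = pagemap_virt2phys(fd, start + i * page_size, page_size);
		if (out[i] != BAD_PHYS_ADDR)
			ok++;
	}
	return ok;
}
```

With this shape, a range of N pages costs one open(), N pread() calls and one close(), instead of N rounds of open/lseek/read/close.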


-- 
The SmartX email address is only for business purpose. Any sent message 
that is not related to the business is not authorized or permitted by 
SmartX.
This is the work mailbox of 北京志凌海纳科技有限公司 (SmartX). Any message sent from it that is unrelated to work carries no express or implied authorization from the company.



* Re: [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys
  2020-04-20 11:09 [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys Li Feng
@ 2020-04-20 12:16 ` Burakov, Anatoly
  2020-04-20 13:07   ` Dmitry Kozlyuk
  0 siblings, 1 reply; 9+ messages in thread
From: Burakov, Anatoly @ 2020-04-20 12:16 UTC
  To: Li Feng, Bruce Richardson; +Cc: lifeng1519, dev

On 20-Apr-20 12:09 PM, Li Feng wrote:
> Using pread to replace lseek + read.
> And add new APIs to reduce open/close/lseek system call frequency when the
> user needs to convert a large range of virtual address space.
>      - rte_mem_virt2iova_with_fd
>      - rte_mem_virt2phy_with_fd
> 
> Currently it will be used by spdk in spdk_mem_register.
> 
> Signed-off-by: Li Feng <fengli@smartx.com>
> ---

These APIs are IMO already on the verge of what's acceptable because of 
the differences between PA, DPDK IOVA and external memory IOVA. I'm not 
sure building on top of them is a good idea. It's also quite platform 
specific: rte_mem_virt2phy could potentially work with Windows (and in 
fact there was an RFC for it), but would this API work with Windows, 
given that Windows doesn't have fds? Should we perhaps replace fds 
with an opaque structure pointer, so that each platform-specific 
implementation could dereference it the way it needs to, without 
exposing internal details of the platform?

-- 
Thanks,
Anatoly
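
The opaque-pointer suggestion above could look roughly like the following. This is only a sketch under stated assumptions, not anything proposed in the thread: struct virt2phys_ctx and the three virt2phys_ctx_*() functions are hypothetical names, and the path argument to the open function is added here purely so the sketch can be pointed at a file other than /proc/self/pagemap.

```c
#define _XOPEN_SOURCE 700 /* exposes pread() under strict -std= modes */
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define BAD_PHYS_ADDR UINT64_MAX /* hypothetical stand-in for RTE_BAD_IOVA */

/* Public side: callers only ever see an incomplete type. */
struct virt2phys_ctx;
struct virt2phys_ctx *virt2phys_ctx_open(const char *pagemap_path);
uint64_t virt2phys_ctx_lookup(struct virt2phys_ctx *ctx, const void *virt);
void virt2phys_ctx_close(struct virt2phys_ctx *ctx);

/* Linux side: here the handle happens to wrap a file descriptor. */
struct virt2phys_ctx {
	int fd;
	uint64_t page_size;
};

struct virt2phys_ctx *
virt2phys_ctx_open(const char *pagemap_path)
{
	struct virt2phys_ctx *ctx = malloc(sizeof(*ctx));

	if (ctx == NULL)
		return NULL;
	ctx->fd = open(pagemap_path, O_RDONLY);
	if (ctx->fd < 0) {
		free(ctx);
		return NULL;
	}
	ctx->page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	return ctx;
}

uint64_t
virt2phys_ctx_lookup(struct virt2phys_ctx *ctx, const void *virt)
{
	uintptr_t vaddr = (uintptr_t)virt;
	uint64_t entry;
	off_t offset = (off_t)((vaddr / ctx->page_size) * sizeof(entry));

	if (pread(ctx->fd, &entry, sizeof(entry), offset) != (ssize_t)sizeof(entry))
		return BAD_PHYS_ADDR;
	entry &= 0x7fffffffffffffULL; /* bits 0-54: page frame number */
	if (entry == 0)
		return BAD_PHYS_ADDR;
	return entry * ctx->page_size + (vaddr % ctx->page_size);
}

void
virt2phys_ctx_close(struct virt2phys_ctx *ctx)
{
	if (ctx != NULL) {
		close(ctx->fd);
		free(ctx);
	}
}
```

Because the struct layout stays private to the platform file, another OS could keep whatever state it needs behind the same three calls without changing the caller.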

* Re: [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys
  2020-04-20 12:16 ` Burakov, Anatoly
@ 2020-04-20 13:07   ` Dmitry Kozlyuk
  2020-04-20 14:13     ` Li Feng
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Kozlyuk @ 2020-04-20 13:07 UTC
  To: Burakov, Anatoly; +Cc: Li Feng, Bruce Richardson, lifeng1519, dev

> On 20-Apr-20 12:09 PM, Li Feng wrote:
> > Using pread to replace lseek + read.
> > And add new APIs to reduce open/close/lseek system call frequency when the
> > user needs to convert a large range of virtual address space.
> >      - rte_mem_virt2iova_with_fd
> >      - rte_mem_virt2phy_with_fd
> > 
> > Currently it will be used by spdk in spdk_mem_register.
> > 
> > Signed-off-by: Li Feng <fengli@smartx.com>
> > ---  
> 
> These API's are IMO already on the verge of what's acceptable because of 
> the differences between PA, DPDK IOVA and external memory IOVA. I'm not 
> sure building on top of them is a good idea. It's also quite platform 
> specific - rte_mem_virt2phy could potentially work with Windows (and in 
> fact there was an RFC for it), but would this API work with Windows, 
> given that Windows doesn't have fd's? Should we perhaps replace fd's 
> with an opaque structure pointer, so that each platform-specific 
> implementation could dereference it the way it needs to, without 
> exposing internal details of the platform?

These new APIs are, in fact, Linux-specific. I doubt Windows will ever
benefit from them, even with the fd abstracted away, though I can't speak
for FreeBSD. Given the linked suggestion to move
rte_vfio_container_dma_map/unmap to some include/linux header, maybe these
APIs could land there too?

	http://mails.dpdk.org/archives/dev/2020-April/164404.html

-- 
Dmitry Kozlyuk

* Re: [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys
  2020-04-20 13:07   ` Dmitry Kozlyuk
@ 2020-04-20 14:13     ` Li Feng
  2021-03-25 13:32       ` David Marchand
  0 siblings, 1 reply; 9+ messages in thread
From: Li Feng @ 2020-04-20 14:13 UTC
  To: Dmitry Kozlyuk
  Cc: Burakov, Anatoly, Bruce Richardson, lifeng1519, dev, Kyle Zhang,
	Yang Fan

Cool, thank you, Anatoly and Kozlyuk.

I haven't found how Windows implements rte_mem_virt2phy.

Using an opaque structure pointer as the first argument is a good idea.

Thanks,

Feng Li

On Mon, Apr 20, 2020 at 9:07 PM Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
>
> > On 20-Apr-20 12:09 PM, Li Feng wrote:
> > > Using pread to replace lseek + read.
> > > And add new APIs to reduce open/close/lseek system call frequency when the
> > > user needs to convert a large range of virtual address space.
> > >      - rte_mem_virt2iova_with_fd
> > >      - rte_mem_virt2phy_with_fd
> > >
> > > Currently it will be used by spdk in spdk_mem_register.
> > >
> > > Signed-off-by: Li Feng <fengli@smartx.com>
> > > ---
> >
> > These API's are IMO already on the verge of what's acceptable because of
> > the differences between PA, DPDK IOVA and external memory IOVA. I'm not
> > sure building on top of them is a good idea. It's also quite platform
> > specific - rte_mem_virt2phy could potentially work with Windows (and in
> > fact there was an RFC for it), but would this API work with Windows,
> > given that Windows doesn't have fd's? Should we perhaps replace fd's
> > with an opaque structure pointer, so that each platform-specific
> > implementation could dereference it the way it needs to, without
> > exposing internal details of the platform?
>
> These new APIs are, in fact, Linux-specific. Doubtfully will Windows ever
> benefit from it even with fd abstracted, though I can't say for FreeBSD. Given
> the linked suggestion to move rte_vfio_container_dma_map/unmap to some
> include/linux header, maybe these APIs could land there too?
>
>         http://mails.dpdk.org/archives/dev/2020-April/164404.html
>
> --
> Dmitry Kozlyuk


* Re: [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys
  2020-04-20 14:13     ` Li Feng
@ 2021-03-25 13:32       ` David Marchand
  2021-03-29  6:26         ` Li Feng
  2021-04-01 10:38         ` Burakov, Anatoly
  0 siblings, 2 replies; 9+ messages in thread
From: David Marchand @ 2021-03-25 13:32 UTC
  To: Li Feng
  Cc: Dmitry Kozlyuk, Burakov, Anatoly, Bruce Richardson, Feng Li, dev,
	Kyle Zhang, Yang Fan

Hello,

On Mon, Apr 20, 2020 at 4:13 PM Li Feng <fengli@smartx.com> wrote:
>
> Cool, thank you, Anatoly and Kozlyuk.
>
> I haven't found how Windows implements the rte_mem_virt2phy.
>
> Using an opaque structure pointer as the first argument is a good idea.

I pinged about this patch status 6 months ago but got no reply.
Trying again in public.

From the thread, I understand that at best it would have to be done differently.


-- 
David Marchand


* [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys
  2021-03-25 13:32       ` David Marchand
@ 2021-03-29  6:26         ` Li Feng
  2021-04-01 10:38         ` Burakov, Anatoly
  1 sibling, 0 replies; 9+ messages in thread
From: Li Feng @ 2021-03-29  6:26 UTC
  To: David Marchand
  Cc: Dmitry Kozlyuk, Burakov, Anatoly, Bruce Richardson, Feng Li, dev,
	Kyle Zhang, Yang Fan

Hi David,

Sorry for the late response.
I only just saw your mail while travelling.

I will update this patch if anyone is interested in this feature.
Currently it lives in my own repo.

On Thursday, March 25, 2021, David Marchand <david.marchand@redhat.com> wrote:

> Hello,
>
> On Mon, Apr 20, 2020 at 4:13 PM Li Feng <fengli@smartx.com> wrote:
> >
> > Cool, thank you, Anatoly and Kozlyuk.
> >
> > I haven't found how Windows implements the rte_mem_virt2phy.
> >
> > Using an opaque structure pointer as the first argument is a good idea.
>
> I pinged about this patch status 6 months ago but got no reply.
> Trying again in public.
>
> From the thread, I understand that at best it would have to be done
> differently.
>
>
> --
> David Marchand
>
>

-- 
Thanks,
Feng Li

* Re: [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys
  2021-03-25 13:32       ` David Marchand
  2021-03-29  6:26         ` Li Feng
@ 2021-04-01 10:38         ` Burakov, Anatoly
  2021-04-06 10:40           ` Feng Li
  1 sibling, 1 reply; 9+ messages in thread
From: Burakov, Anatoly @ 2021-04-01 10:38 UTC
  To: David Marchand, Li Feng
  Cc: Dmitry Kozlyuk, Bruce Richardson, Feng Li, dev, Kyle Zhang, Yang Fan

On 25-Mar-21 1:32 PM, David Marchand wrote:
> Hello,
> 
> On Mon, Apr 20, 2020 at 4:13 PM Li Feng <fengli@smartx.com> wrote:
>>
>> Cool, thank you, Anatoly and Kozlyuk.
>>
>> I haven't found how Windows implements the rte_mem_virt2phy.
>>
>> Using an opaque structure pointer as the first argument is a good idea.
> 
> I pinged about this patch status 6 months ago but got no reply.
> Trying again in public.
> 
>  From the thread, I understand that at best it would have to be done differently.
> 

I would agree with the latter. Like I said in my original response, the 
fd-less APIs are already on the verge of what's acceptable, and in a 
perfect world we wouldn't have them in the first place. I don't like 
the fact that they exist and would wholly discourage their use, mainly 
because of the very confusing semantics of real physical address vs. 
DPDK's IOVA vs. user IOVA, and the potential for errors when trying to 
resolve the IOVA of something that doesn't even have one.

Given the above, I certainly don't like the idea of building on top of 
these APIs.

-- 
Thanks,
Anatoly
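
The semantic confusion described above can be made concrete with a small sketch. It is modelled loosely on the rte_mem_virt2iova_with_fd() body in the patch but is not DPDK code: enum iova_mode, virt2iova(), lookup_phys() and BAD_IOVA are hypothetical names. The same pointer yields a different number in each mode, and neither path consults an IOVA that a user may have registered for external memory.

```c
#define _XOPEN_SOURCE 700 /* exposes pread() under strict -std= modes */
#include <stdint.h>
#include <unistd.h>

#define BAD_IOVA UINT64_MAX /* hypothetical stand-in for RTE_BAD_IOVA */

/* Hypothetical mirror of the two IOVA modes mentioned in the patch. */
enum iova_mode { IOVA_MODE_PA, IOVA_MODE_VA };

/* Real physical address, as seen through a pagemap-style file. */
static uint64_t
lookup_phys(int pagemap_fd, uintptr_t vaddr, uint64_t page_size)
{
	uint64_t entry;
	off_t offset = (off_t)((vaddr / page_size) * sizeof(entry));

	if (pread(pagemap_fd, &entry, sizeof(entry), offset) != (ssize_t)sizeof(entry))
		return BAD_IOVA;
	entry &= 0x7fffffffffffffULL; /* bits 0-54: page frame number */
	if (entry == 0)
		return BAD_IOVA;
	return entry * page_size + (vaddr % page_size);
}

/* What the caller gets back depends entirely on the mode, not on the memory. */
static uint64_t
virt2iova(enum iova_mode mode, int pagemap_fd, const void *virt, uint64_t page_size)
{
	if (mode == IOVA_MODE_VA)
		return (uintptr_t)virt; /* the pagemap fd is never consulted */
	return lookup_phys(pagemap_fd, (uintptr_t)virt, page_size);
}
```

In VA mode the function succeeds for any pointer at all, mapped for DMA or not, which is one way an IOVA can be "resolved" for something that does not have one.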

* Re: [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys
  2021-04-01 10:38         ` Burakov, Anatoly
@ 2021-04-06 10:40           ` Feng Li
  2021-04-06 11:23             ` David Marchand
  0 siblings, 1 reply; 9+ messages in thread
From: Feng Li @ 2021-04-06 10:40 UTC
  To: Burakov, Anatoly
  Cc: David Marchand, Li Feng, Dmitry Kozlyuk, Bruce Richardson, dev,
	Kyle Zhang, Yang Fan

On Thu, Apr 1, 2021 at 6:39 PM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
>
> On 25-Mar-21 1:32 PM, David Marchand wrote:
> > Hello,
> >
> > On Mon, Apr 20, 2020 at 4:13 PM Li Feng <fengli@smartx.com> wrote:
> >>
> >> Cool, thank you, Anatoly and Kozlyuk.
> >>
> >> I haven't found how Windows implements the rte_mem_virt2phy.
> >>
> >> Using an opaque structure pointer as the first argument is a good idea.
> >
> > I pinged about this patch status 6 months ago but got no reply.
> > Trying again in public.
> >
> >  From the thread, I understand that at best it would have to be done differently.
> >
>
> I would agree with the latter. Like i said in my original response, the
> fd-less API's are already on the verge of what's acceptable and in the
> perfect world we wouldn't have them in the first place, and i don't like
> the fact that they exist and would wholly discourage their use, mainly
> because of very confusing semantics of real physical address vs. DPDK's
> IOVA vs. user IOVA, and potential for errors due to trying to resolve an
> IOVA address of something that doesn't even have it.
>
> Given the above, I certainly don't like the idea of building on top of
> these API's.

Got it. Let's drop it.

>
> --
> Thanks,
> Anatoly

* Re: [dpdk-dev] [PATCH] librte_eal: add APIs to speedup virt2iova/phys
  2021-04-06 10:40           ` Feng Li
@ 2021-04-06 11:23             ` David Marchand
  0 siblings, 0 replies; 9+ messages in thread
From: David Marchand @ 2021-04-06 11:23 UTC
  To: Feng Li
  Cc: Burakov, Anatoly, Li Feng, Dmitry Kozlyuk, Bruce Richardson, dev,
	Kyle Zhang, Yang Fan

On Tue, Apr 6, 2021 at 12:40 PM Feng Li <lifeng1519@gmail.com> wrote:
>
> On Thu, Apr 1, 2021 at 6:39 PM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
> >
> > On 25-Mar-21 1:32 PM, David Marchand wrote:
> > > Hello,
> > >
> > > On Mon, Apr 20, 2020 at 4:13 PM Li Feng <fengli@smartx.com> wrote:
> > >>
> > >> Cool, thank you, Anatoly and Kozlyuk.
> > >>
> > >> I haven't found how Windows implements the rte_mem_virt2phy.
> > >>
> > >> Using an opaque structure pointer as the first argument is a good idea.
> > >
> > > I pinged about this patch status 6 months ago but got no reply.
> > > Trying again in public.
> > >
> > >  From the thread, I understand that at best it would have to be done differently.
> > >
> >
> > I would agree with the latter. Like i said in my original response, the
> > fd-less API's are already on the verge of what's acceptable and in the
> > perfect world we wouldn't have them in the first place, and i don't like
> > the fact that they exist and would wholly discourage their use, mainly
> > because of very confusing semantics of real physical address vs. DPDK's
> > IOVA vs. user IOVA, and potential for errors due to trying to resolve an
> > IOVA address of something that doesn't even have it.
> >
> > Given the above, I certainly don't like the idea of building on top of
> > these API's.
>
> Got it. Let's drop it.

I marked it as rejected in patchwork.
Thanks.


-- 
David Marchand

