DPDK patches and discussions
From: "XU Liang" <liang.xu@cinfotech.cn>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages
Date: Tue, 11 Nov 2014 11:53:22 +0800
Message-ID: <eef2d8b8-f2a1-4421-a0b2-88d22c51df5f@cinfotech.cn>
In-Reply-To: C6ECDF3AB251BE4894318F4E4512369780C07EEB@IRSMSX109.ger.corp.intel.com

I have finished some tests. The patch works fine. My tests included:

* single process + uio + vfio
* single process + uio + vfio + base-virtaddr
* multiple processes + uio + vfio
* multiple processes + uio + vfio + base-virtaddr

Without base-virtaddr, my unlucky multi-process application still got an error when initializing hugepages. See the attachments primary.txt and secondary.txt.

With base-virtaddr the patch worked: both hugepages and PCI resources were mapped at base-virtaddr, and my application is happy. See the attachments base-virtaddr_primary.txt and base-virtaddr_secondary.txt.

------------------------------------------------------------------
From: Burakov, Anatoly <anatoly.burakov@intel.com>
Time: 2014 Nov 10 (Mon) 21:34
To: Burakov, Anatoly <anatoly.burakov@intel.com>, dev@dpdk.org <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages

Nak, there are issues with the patch. There is another patch already, but I'll submit it whenever Liang verifies it works with his setup.

Thanks,
Anatoly

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Anatoly Burakov
Sent: Monday, November 10, 2014 11:35 AM
To: dev@dpdk.org
Subject: [dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages

A multi-process DPDK application must mmap hugepages and PCI resources
at the same virtual addresses in every process. By default, the virtual
addresses are chosen automatically when the primary process calls mmap.
But sometimes the chosen virtual addresses aren't usable in a secondary
process - for example, when the secondary process is linked with more
libraries than the primary process, and one of those libraries occupies
the address range that the primary process has requested for PCI mappings.

This patch makes EAL map PCI BARs right after the hugepages in virtual
memory (instead of at a location chosen by mmap).

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: Liang Xu <liang.xu@cinfotech.cn>
---
 lib/librte_eal/linuxapp/eal/eal_pci.c              | 19 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_pci_uio.c          |  9 ++++++++-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c         | 13 +++++++++++--
 lib/librte_eal/linuxapp/eal/include/eal_pci_init.h |  6 ++++++
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c
index 5fe3961..dae8739 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci.c
@@ -97,6 +97,25 @@ error:
 	return -1;
 }
 
+void *
+pci_find_max_end_va(void)
+{
+	const struct rte_memseg *seg = rte_eal_get_physmem_layout();
+	const struct rte_memseg *last = seg;
+	unsigned i = 0;
+
+	for (i = 0; i < RTE_MAX_MEMSEG; i++, seg++) {
+		if (seg->addr == NULL)
+			break;
+
+		if (seg->addr > last->addr)
+			last = seg;
+
+	}
+	return RTE_PTR_ADD(last->addr, last->len);
+}
+
+
 /* map a particular resource from a file */
 void *
 pci_map_resource(void *requested_addr, int fd, off_t offset, size_t size)
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
index 7e62266..5090bf1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_uio.c
@@ -48,6 +48,8 @@
 
 static int pci_parse_sysfs_value(const char *filename, uint64_t *val);
 
+void *pci_map_addr = NULL;
+
 
 #define OFF_MAX              ((uint64_t)(off_t)-1)
 static int
@@ -371,10 +373,15 @@ pci_uio_map_resource(struct rte_pci_device *dev)
 			if (maps[j].addr != NULL)
 				fail = 1;
 			else {
-				mapaddr = pci_map_resource(NULL, fd, (off_t)offset,
+				if (pci_map_addr == NULL)
+					pci_map_addr = pci_find_max_end_va();
+
+				mapaddr = pci_map_resource(pci_map_addr, fd, (off_t)offset,
 						(size_t)maps[j].size);
 				if (mapaddr == NULL)
 					fail = 1;
+
+				pci_map_addr = RTE_PTR_ADD(pci_map_addr, maps[j].size);
 			}
 
 			if (fail) {
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index c776ddc..fb6ee7a 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -720,8 +720,17 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
 		if (i == msix_bar)
 			continue;
 
-		bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
-				reg.size);
+		if (internal_config.process_type == RTE_PROC_PRIMARY) {
+			if (pci_map_addr == NULL)
+				pci_map_addr = pci_find_max_end_va();
+
+			bar_addr = pci_map_resource(pci_map_addr, vfio_dev_fd, reg.offset,
+					reg.size);
+			pci_map_addr = RTE_PTR_ADD(pci_map_addr, reg.size);
+		} else {
+			bar_addr = pci_map_resource(maps[i].addr, vfio_dev_fd, reg.offset,
+					reg.size);
+		}
 
 		if (bar_addr == NULL) {
 			RTE_LOG(ERR, EAL, "  %s mapping BAR%i failed: %s\n", pci_addr, i,
diff --git a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
index d758bee..1070eb8 100644
--- a/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/include/eal_pci_init.h
@@ -59,6 +59,12 @@ struct mapped_pci_resource {
 TAILQ_HEAD(mapped_pci_res_list, mapped_pci_resource);
 extern struct mapped_pci_res_list *pci_res_list;
 
+/*
+ * Helper function to map PCI resources right after hugepages in virtual memory
+ */
+extern void *pci_map_addr;
+void *pci_find_max_end_va(void);
+
 void *pci_map_resource(void *requested_addr, int fd, off_t offset,
 		size_t size);
 
-- 
1.8.1.4
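
As a rough illustration of the approach the patch describes (find the end of the highest hugepage segment, pass that address to mmap as a hint, and advance it after each BAR), here is a minimal standalone sketch in C. It is not the EAL code: `struct seg`, `find_max_end_va()` and `map_next()` are hypothetical names, anonymous memory stands in for the UIO/VFIO file descriptors, and advancing the cursor only on success is a choice made for this sketch rather than something taken from the patch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical stand-in for a memory segment; not the DPDK rte_memseg type. */
struct seg {
	void *addr;
	size_t len;
};

/*
 * Return the end address of the segment that starts highest in memory.
 * Scanning stops at the first unused slot (addr == NULL).
 * Returns NULL if there are no segments at all.
 */
static void *
find_max_end_va(const struct seg *segs, size_t n)
{
	const struct seg *last;
	size_t i;

	assert(segs != NULL);
	if (n == 0 || segs[0].addr == NULL)
		return NULL;

	last = &segs[0];
	for (i = 0; i < n && segs[i].addr != NULL; i++) {
		if ((uintptr_t)segs[i].addr > (uintptr_t)last->addr)
			last = &segs[i];
	}
	return (void *)((uintptr_t)last->addr + last->len);
}

/*
 * Map `size` bytes using *cursor as the address hint, then move the
 * cursor past the requested range so the next mapping is asked to
 * follow it. Without MAP_FIXED the hint is only advisory: the kernel
 * may place the mapping elsewhere, so callers that need an exact
 * address must compare the result with the hint themselves.
 * Returns NULL on failure and leaves the cursor untouched.
 */
static void *
map_next(void **cursor, size_t size)
{
	void *p = mmap(*cursor, size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	*cursor = (void *)((uintptr_t)*cursor + size);
	return p;
}
```

A caller would initialize the cursor once from `find_max_end_va()` and then call `map_next()` for each resource in turn. Whether a secondary process ends up at the same addresses still depends on that range being free in its own address space, which is the situation the base-virtaddr tests in this thread were exercising.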

From: "Liu, Jijiang" <jijiang.liu@intel.com>
To: Olivier MATZ <olivier.matz@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum offload
Date: Tue, 11 Nov 2014 05:29:18 +0000
Message-ID: <1ED644BD7E0A5F4091CF203DAFB8E4CC01D8F7A7@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <5460E512.1050609@6wind.com>



> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, November 11, 2014 12:17 AM
> To: Liu, Jijiang
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v8 10/10] app/testpmd:test VxLAN Tx checksum
> offload
> 
> Hi Jijiang,
> 
> On 11/10/2014 07:03 AM, Liu, Jijiang wrote:
> >> Another thing is surprising me.
> >>
> >> - if PKT_TX_VXLAN_CKSUM is not set (legacy use case), then the
> >>    driver uses l2_len and l3_len to offload inner IP/UDP/TCP checksums.
> > If the flag is not set, that implies it is not a VXLAN packet, and TX
> > checksum offload is done as for a regular packet.
> >
> >> - if PKT_TX_VXLAN_CKSUM is set, then the driver has to use
> >>    inner_l{23}_len instead of l{23}_len for the same operation.
> > Your understanding is not fully correct.
> > The l{23}_len is still used for TX checksum offload, please refer to
> > i40e_txd_enable_checksum() implementation.
> 
> These fields are part of the public mbuf API. You cannot tell people to refer to
> the i40e PMD code to understand how to use them.
> 
> >> Adding PKT_TX_VXLAN_CKSUM changes the semantic of l2_len and l3_len.
> >> To fix this, I suggest to remove the new fields inner_l{23}_len then
> >> add outer_l{23}_len instead. Therefore, the semantic of l2_len and
> >> l3_len would not change, and a driver would always use the same field for a
> >> specific offload.
> > Oh...
> 
> Does it mean you agree?

I don't agree with renaming inner_l{23}_len.
The reason is that the word "inner" indicates that the incoming packet is a tunneling (encapsulated) packet.
If we add outer_l{2,3}_len instead, it will cause confusion when processing non-tunneling packets.


> >> For my TSO development, I will follow the current semantic.
> > For TSO, you still can use l{2,3} _len .
> > When I develop tunneling TSO, I will use inner_l3_len/inner_l4_len.
> 
> I've just submitted a first version, please feel free to comment it.
> 
> 
> Regards,
> Olivier

Thread overview: 38+ messages
2014-11-05 13:25 [dpdk-dev] [PATCH] eal: map uio resources after hugepages when the base_virtaddr is configured lxu
2014-11-05 15:10 ` Burakov, Anatoly
2014-11-05 15:49 ` [dpdk-dev] Re: " XU Liang
2014-11-05 15:59   ` Burakov, Anatoly
2014-11-05 16:10   ` [dpdk-dev] Re: Re: " XU Liang
2014-11-26  1:46     ` Qiu, Michael
2014-11-26  9:58       ` Burakov, Anatoly
2014-11-06 14:11 ` [dpdk-dev] [PATCH v2] " lxu
2014-11-06 14:27   ` Burakov, Anatoly
2014-11-06 14:48   ` [dpdk-dev] Re: [PATCH " 徐亮
2014-11-06 14:47 ` [dpdk-dev] [PATCH v3] " lxu
2014-11-06 15:06   ` De Lara Guarch, Pablo
2014-11-06 15:07 ` [dpdk-dev] [PATCH v4] " lxu
2014-11-06 15:12   ` Thomas Monjalon
2014-11-06 15:11 ` lxu
2014-11-06 15:32 ` [dpdk-dev] [PATCH v5] " lxu
2014-11-06 15:41   ` Burakov, Anatoly
2014-11-06 15:58     ` Thomas Monjalon
2014-11-06 16:10       ` Burakov, Anatoly
2014-11-06 17:30         ` Bruce Richardson
2014-11-07  8:01 ` [dpdk-dev] [PATCH v6] " lxu
2014-11-07  9:42   ` Bruce Richardson
2014-11-07  9:47   ` Burakov, Anatoly
2014-11-07  9:57   ` XU Liang
2014-11-07 14:37     ` XU Liang
2014-11-10 11:34   ` [dpdk-dev] [PATCH v7] eal: map PCI memory resources after hugepages Anatoly Burakov
2014-11-10 13:33     ` Burakov, Anatoly
2014-11-11  3:53     ` XU Liang [this message]
2014-11-11 10:09     ` [dpdk-dev] [PATCH v8] " Anatoly Burakov
2014-11-13 11:34       ` Burakov, Anatoly
2014-11-13 12:58         ` Bruce Richardson
2014-11-13 13:44           ` Burakov, Anatoly
2014-11-13 13:46       ` Bruce Richardson
2014-11-25 17:17         ` Thomas Monjalon
2014-11-07 14:57 ` [dpdk-dev] [PATCH v7] eal: map uio " lxu
2014-11-07 15:14   ` Burakov, Anatoly
2014-11-07 15:15   ` Thomas Monjalon
2014-11-07 15:19   ` XU Liang
