* [dpdk-dev] [PATCH v2] doc/linux_gsg: fix numa lib name error
@ 2018-11-07 2:40 4% Yong Wang
0 siblings, 0 replies; 200+ results
From: Yong Wang @ 2018-11-07 2:40 UTC (permalink / raw)
To: anatoly.burakov; +Cc: dev, Yong Wang
The library for handling NUMA is not libnuma-devel, but numactl-devel
in Red Hat/Fedora and libnuma-dev in Debian/Ubuntu.
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
---
v2:
* Add lib name in Ubuntu.
---
doc/guides/linux_gsg/sys_reqs.rst | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index e2230f3..fbc9d54 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -64,7 +64,11 @@ Compilation of the DPDK
x86_x32 ABI is currently supported with distribution packages only on Ubuntu
higher than 13.10 or recent Debian distribution. The only supported compiler is gcc 4.9+.
-* libnuma-devel - library for handling NUMA (Non Uniform Memory Access).
+* Library for handling NUMA (Non Uniform Memory Access).
+
+ * numactl-devel in Red Hat/Fedora;
+
+ * libnuma-dev in Debian/Ubuntu;
* Python, version 2.7+ or 3.2+, to use various helper scripts included in the DPDK package.
--
1.8.3.1
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH v2 2/2] ip_frag: use key length for key comparision
2018-11-06 10:53 3% ` Burakov, Anatoly
@ 2018-11-06 11:41 0% ` Ananyev, Konstantin
0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-11-06 11:41 UTC (permalink / raw)
To: Burakov, Anatoly, dev; +Cc: stable, Hall, Ryan E, Gutkin, Alexander V
> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Tuesday, November 6, 2018 10:54 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org
> Cc: stable@dpdk.org; Hall, Ryan E <ryan.e.hall@intel.com>; Gutkin, Alexander V <alexander.v.gutkin@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 2/2] ip_frag: use key length for key comparision
>
> On 05-Nov-18 12:18 PM, Konstantin Ananyev wrote:
> > Right now reassembly code relies on src_dst[] being all zeroes to
> > determine is it free/occupied entry in the fragments table.
> > This is suboptimal and error prone - user can crash DPDK ip_reassembly
> > app by something like the following scapy script:
> > x=Ether(src=...,dst=...)/IP(dst='0.0.0.0',src='0.0.0.0',id=0)/('X'*1000)
> > frags=fragment(x, fragsize=500)
> > sendp(frags, iface=...)
> > To overcome that issue and reduce overhead of
> > 'key invalidate' and 'key is empty' operations -
> > add key_len into keys comparision procedure.
> >
> > Fixes: 4f1a8f633862 ("ip_frag: add IPv6 reassembly")
> > Cc: stable@dpdk.org
> >
> > Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
> > Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
>
>
>
> > @@ -44,9 +44,17 @@ struct ip_frag {
> >
> > /** @internal <src addr, dst_addr, id> to uniquely identify fragmented datagram. */
> > struct ip_frag_key {
> > - uint64_t src_dst[4]; /**< src address, first 8 bytes used for IPv4 */
> > - uint32_t id; /**< dst address */
> > - uint32_t key_len; /**< src/dst key length */
> > + uint64_t src_dst[4];
> > + /**< src and dst address, only first 8 bytes used for IPv4 */
> > + RTE_STD_C11
> > + union {
> > + uint64_t id_key_len; /**< combined for easy fetch */
> > + __extension__
> > + struct {
> > + uint32_t id; /**< packet id */
> > + uint32_t key_len; /**< src/dst key length */
> > + };
> > + };
> > };
>
> Would that break ABI?
No, size and layout of the structure remains the same.
Konstantin
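
The layout claim above can be checked at compile time. The following is a minimal sketch, not part of the patch: the two structs are copied from the diff with hypothetical _old/_new suffixes (RTE_STD_C11 and __extension__ are dropped, assuming a C11 compiler), and ip_frag_key_layout_unchanged() is a helper invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch (the _old suffix exists only in this sketch). */
struct ip_frag_key_old {
	uint64_t src_dst[4];
	uint32_t id;
	uint32_t key_len;
};

/* Layout after the patch: id and key_len overlaid by one 64-bit word. */
struct ip_frag_key_new {
	uint64_t src_dst[4];
	union {
		uint64_t id_key_len;
		struct {
			uint32_t id;
			uint32_t key_len;
		};
	};
};

/* Compile-time checks: same size, same member offsets. */
static_assert(sizeof(struct ip_frag_key_old) ==
	sizeof(struct ip_frag_key_new), "size changed");
static_assert(offsetof(struct ip_frag_key_old, id) ==
	offsetof(struct ip_frag_key_new, id), "id moved");
static_assert(offsetof(struct ip_frag_key_old, key_len) ==
	offsetof(struct ip_frag_key_new, key_len), "key_len moved");

/* Hypothetical runtime helper: returns 1 when both layouts agree. */
static int
ip_frag_key_layout_unchanged(void)
{
	return sizeof(struct ip_frag_key_old) ==
			sizeof(struct ip_frag_key_new) &&
		offsetof(struct ip_frag_key_old, id) ==
			offsetof(struct ip_frag_key_new, id) &&
		offsetof(struct ip_frag_key_old, key_len) ==
			offsetof(struct ip_frag_key_new, key_len);
}
```

If any of the static_assert lines failed to compile, the union would have changed the size or member offsets, which is the kind of change the ABI question is about.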
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2 2/2] ip_frag: use key length for key comparision
@ 2018-11-06 10:53 3% ` Burakov, Anatoly
2018-11-06 11:41 0% ` Ananyev, Konstantin
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-11-06 10:53 UTC (permalink / raw)
To: Konstantin Ananyev, dev; +Cc: stable, ryan.e.hall, alexander.v.gutkin
On 05-Nov-18 12:18 PM, Konstantin Ananyev wrote:
> Right now reassembly code relies on src_dst[] being all zeroes to
> determine is it free/occupied entry in the fragments table.
> This is suboptimal and error prone - user can crash DPDK ip_reassembly
> app by something like the following scapy script:
> x=Ether(src=...,dst=...)/IP(dst='0.0.0.0',src='0.0.0.0',id=0)/('X'*1000)
> frags=fragment(x, fragsize=500)
> sendp(frags, iface=...)
> To overcome that issue and reduce overhead of
> 'key invalidate' and 'key is empty' operations -
> add key_len into keys comparision procedure.
>
> Fixes: 4f1a8f633862 ("ip_frag: add IPv6 reassembly")
> Cc: stable@dpdk.org
>
> Reported-by: Ryan E Hall <ryan.e.hall@intel.com>
> Reported-by: Alexander V Gutkin <alexander.v.gutkin@intel.com>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
> @@ -44,9 +44,17 @@ struct ip_frag {
>
> /** @internal <src addr, dst_addr, id> to uniquely identify fragmented datagram. */
> struct ip_frag_key {
> - uint64_t src_dst[4]; /**< src address, first 8 bytes used for IPv4 */
> - uint32_t id; /**< dst address */
> - uint32_t key_len; /**< src/dst key length */
> + uint64_t src_dst[4];
> + /**< src and dst address, only first 8 bytes used for IPv4 */
> + RTE_STD_C11
> + union {
> + uint64_t id_key_len; /**< combined for easy fetch */
> + __extension__
> + struct {
> + uint32_t id; /**< packet id */
> + uint32_t key_len; /**< src/dst key length */
> + };
> + };
> };
Would that break ABI?
>
> /**
>
--
Thanks,
Anatoly
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH] doc/linux_gsg: fix numa lib name error
@ 2018-11-06 6:15 4% Yong Wang
0 siblings, 0 replies; 200+ results
From: Yong Wang @ 2018-11-06 6:15 UTC (permalink / raw)
To: bruce.richardson; +Cc: dev, Yong Wang
The library for handling NUMA is numactl-devel, not libnuma-devel.
Signed-off-by: Yong Wang <wang.yong19@zte.com.cn>
---
doc/guides/linux_gsg/sys_reqs.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index e2230f3..1cb14b5 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -64,7 +64,7 @@ Compilation of the DPDK
x86_x32 ABI is currently supported with distribution packages only on Ubuntu
higher than 13.10 or recent Debian distribution. The only supported compiler is gcc 4.9+.
-* libnuma-devel - library for handling NUMA (Non Uniform Memory Access).
+* numactl-devel - library for handling NUMA (Non Uniform Memory Access).
* Python, version 2.7+ or 3.2+, to use various helper scripts included in the DPDK package.
--
1.8.3.1
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH] doc: update release notes for default KNI carries status
@ 2018-11-05 17:28 4% Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-11-05 17:28 UTC (permalink / raw)
To: John McNamara, Marko Kovacevic
Cc: dev, Ferruh Yigit, Thomas Monjalon, Dan Gora
Commit 89397a01ce4a ("kni: set default carrier state of interface")
changes the KNI interface default carrier status. Which prevents traffic
flow by default and may break some existing usage / testing.
Document this behavior change in release notes.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Dan Gora <dg@adax.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ce276b22..69c4d1bf6 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -383,6 +383,13 @@ API Changes
This means ``ethtool "-a|--show-pause", "-s|--change"`` won't work, and
``ethtool <iface>`` output will have less information.
+* KNI, by default interface carrier status is ``off`` which means there won't be any traffic.
+ It can be set to ``on`` via ``rte_kni_update_link()`` API or via ``sysfs`` interface:
+ ``echo 1 > /sys/class/net/vEth0/carrier``. Note interface should be ``up`` to be able
+ to read/write sysfs interface.
+ When KNI sample application is used ``-m`` parameter can be used to automatically update
+ the carrier status for the interface.
+
ABI Changes
-----------
--
2.17.2
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH] doc: document KNI limitation in release notes
@ 2018-11-05 17:09 4% Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-11-05 17:09 UTC (permalink / raw)
To: John McNamara, Marko Kovacevic; +Cc: dev, Ferruh Yigit, Thomas Monjalon
Commit a9460a0b2efb ("kni: fix build on Linux 4.19") disables some
ethtool commands because they are removed in newer (4.19) kernels.
This patch documents removed functionality.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 27b67e0fd..6ce276b22 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -377,6 +377,12 @@ API Changes
* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
has been changed from uint8_t to uint16_t.
+* KNI, when ethtool support enabled (``CONFIG_RTE_KNI_KMOD_ETHTOOL=y``)
+ ethtool commands ``ETHTOOL_GSET & ETHTOOL_SSET`` are no more supported for the
+ kernels that has ``ETHTOOL_GLINKSETTINGS & ETHTOOL_SLINKSETTINGS`` support.
+ This means ``ethtool "-a|--show-pause", "-s|--change"`` won't work, and
+ ``ethtool <iface>`` output will have less information.
+
ABI Changes
-----------
--
2.17.2
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH v2 0/4] hash: deprecate lock ellision and read/write concurreny flags
@ 2018-11-02 17:38 3% ` Honnappa Nagarahalli
0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-11-02 17:38 UTC (permalink / raw)
To: Bruce Richardson
Cc: pablo.de.lara.guarch, dev, Gavin Hu (Arm Technology China),
Dharmik Thakkar, nd, yipeng1.wang, sameh.gobriel, nd
>
> On Thu, Nov 01, 2018 at 06:25:18PM -0500, Honnappa Nagarahalli wrote:
> > Various configuration flags in rte_hash library result in increase of
> > number of test cases. Configuration flags for enabling transactional
> > memory use and read/write concurrency are not required. These features
> > should be supported by default. Please refer to [1] for more context.
> >
> > This patch marks these flags for deprecation in 19.02 release and
> > cleans up the test cases.
> >
> > [1] http://mails.dpdk.org/archives/dev/2018-October/117268.html
> >
> > Honnappa Nagarahalli (4): hash: prepare for deprecation of flags hash:
> > deprecate lock ellision and read/write concurreny flags test/hash:
> > stop using lock ellision and read/write concurreny flags doc/hash:
> > deprecate lock ellision and read/write concurreny flags
> >
> While I'd like to reduce the flags and do cleanup, I'm a little concerned about
> putting this scope of changes in so late in the release. I wonder if less drastic
> changes could work as well for this release, and do the cleanup later.
Thank you, Bruce, for the review. This patch series does not fix any user-facing problems, so let us skip it for 18.11. That will also give us time to think this through and get it right.
> For example, rather than deprecating the flags now, how about just change
> the default for when no flags are set? If user has set flags, follow the existing
> path - if flags is set to zero, then have the defaults be to use RW concurrency
> or TSX.
This changes the behavior of the library and the meaning of the flags, and still requires an ABI change, but does not need deprecation of the flags (I guess this is what you meant). However, it will not solve the problem of losing the capability to disable TSX.
>
> /Bruce
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] check-symbol-change: fix regex to match on end of map file
2018-11-01 22:53 3% ` Thomas Monjalon
@ 2018-11-02 11:50 0% ` Neil Horman
0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2018-11-02 11:50 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, doucette
On Thu, Nov 01, 2018 at 11:53:00PM +0100, Thomas Monjalon wrote:
> 01/11/2018 14:54, Neil Horman:
> > the regex to determine the end of the map file chunk in a patch seems to
> > be wrong, It was using perl regex syntax, which awk doesn't appear to
> > support (I'm still not sure how it was working previously). Regardless,
> > it wasn't triggering and as a result symbols were getting added to the
> > mapdb that shouldn't be there.
> >
> > Fix it by converting the regex to use traditional posix syntax, matching
> > only on the negation of the character class [^map]
> >
> > Tested and shown to be working on the ip_frag patch set provided by
> > doucette@bu.edu
> >
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > CC: thomas@monjalon.net
> > CC: doucette@bu.edu
> > Reported-by: doucette@bu.edu
>
> You could use these lines:
>
> Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
>
> Reported-by: Cody Doucette <doucette@bu.edu>
>
I'm fine with the second line, and the first is fine I guess, but I'm not sure
there is an exact correlation
> > --- a/devtools/check-symbol-change.sh
> > +++ b/devtools/check-symbol-change.sh
> > - /[-+] a\/.*\.^(map)/ {in_map=0}
> > + /[-+] a\/.*\.[^map]/ {in_map=0}
>
> Not sure this is what you intend:
> [^map] means any character except "m", "a" and "p".
>
It's not 100%, but it's pretty close. The regex for an exact match on "not a
specific string" is pretty large and complex. Since we have no files that
end in .m, .a or .p, this should give us what we want for the foreseeable future.
> I don't know whether awk supports this syntax: (?!foo)
>
It unfortunately doesn't; that's perl syntax, and while I think grep supports it,
awk is more strictly POSIX compliant.
Neil
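
To make the [^map] point concrete, here is a small sketch using POSIX regcomp()/regexec() with extended syntax as a stand-in for awk's regex engine (an assumption; awk itself is not invoked). The pattern is the one from the patch without the awk slash delimiters; line_matches() and end_of_map_re are names made up for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Pattern from the patch, minus the awk slash delimiters. */
static const char end_of_map_re[] = "[-+] a/.*\\.[^map]";

/*
 * Return 1 if line matches the POSIX extended regex pattern,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
static int
line_matches(const char *pattern, const char *line)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

With this pattern, "--- a/lib/librte_ip_frag/rte_ip_frag.h" matches (so in_map would be reset), while "--- a/lib/librte_ip_frag/rte_ip_frag_version.map" does not. A hypothetical "--- a/usertools/example.py" does not match either, because the character after the dot is "p"; that is the imprecision discussed above.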
>
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v4 2/2] ring: move the atomic load of head above the loop
2018-11-02 9:36 0% ` Thomas Monjalon
@ 2018-11-02 11:23 0% ` Gavin Hu (Arm Technology China)
0 siblings, 0 replies; 200+ results
From: Gavin Hu (Arm Technology China) @ 2018-11-02 11:23 UTC (permalink / raw)
To: Thomas Monjalon, Honnappa Nagarahalli
Cc: Stephen Hemminger, dev, olivier.matz, chaozhu, bruce.richardson,
konstantin.ananyev, jerin.jacob, stable, nd
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, November 2, 2018 5:37 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Cc: Stephen Hemminger <stephen@networkplumber.org>; dev@dpdk.org;
> olivier.matz@6wind.com; chaozhu@linux.vnet.ibm.com;
> bruce.richardson@intel.com; konstantin.ananyev@intel.com;
> jerin.jacob@caviumnetworks.com; stable@dpdk.org; nd <nd@arm.com>
> Subject: Re: [PATCH v4 2/2] ring: move the atomic load of head above the
> loop
>
> 02/11/2018 08:15, Gavin Hu (Arm Technology China):
> >
> > > -----Original Message-----
> > > From: Honnappa Nagarahalli
> > > Sent: Friday, November 2, 2018 12:31 PM
> > > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Stephen
> > > Hemminger <stephen@networkplumber.org>
> > > Cc: dev@dpdk.org; thomas@monjalon.net; olivier.matz@6wind.com;
> > > chaozhu@linux.vnet.ibm.com; bruce.richardson@intel.com;
> > > konstantin.ananyev@intel.com; jerin.jacob@caviumnetworks.com;
> > > stable@dpdk.org; nd <nd@arm.com>
> > > Subject: RE: [PATCH v4 2/2] ring: move the atomic load of head above
> > > the loop
> > >
> > > <Fixing this to make the reply inline, making email plain text>
> > >
> > >
> > > On Thu, 1 Nov 2018 17:53:51 +0800
> > > Gavin Hu <mailto:gavin.hu@arm.com> wrote:
> > >
> > > > +* **Updated the ring library with C11 memory model.**
> > > > +
> > > > + Updated the ring library with C11 memory model including the
> > > > + following
> > > changes:
> > > > +
> > > > + * Synchronize the load and store of the tail
> > > > + * Move the atomic load of head above the loop
> > > > +
> > >
> > > Does this really need to be in the release notes? Is it a user
> > > visible change or just an internal/optimization and fix.
> > >
> > > [Gavin] There is no api changes, but this is a significant change as
> > > ring is fundamental and widely used, it decreases latency by 25% in
> > > our tests, it may do even better for cases with more contending
> > > producers/consumers or deeper depth of rings.
> > >
> > > [Honnappa] I agree with Stephen. Release notes should be written
> > > from DPDK user perspective. In the rte_ring case, the user has the
> > > option of choosing between c11 and non-c11 algorithms. Performance
> > > would be one of the criteria to choose between these 2 algorithms.
> > > IMO, it probably makes sense to indicate that the performance of c11
> > > based algorithm has been improved. However, I do not know what DPDK
> > > has followed historically regarding performance optimizations. I
> > > would prefer to follow whatever has been followed so far.
> > > I do not think that we need to document the details of the internal
> > > changes since it does not help the user make a decision.
> >
> > I read through the online guidelines for release notes, besides API and new
> features, resolved issues which are significant and not newly introduced in
> this release cycle, should also be included.
> > In this case, the resolved issue existed for long time, across multiple
> release cycles and ring is a core lib, so it should be a candidate for release
> notes.
> >
> > https://doc.dpdk.org/guides-18.08/contributing/patches.html
> > section 5.5 says:
> > Important changes will require an addition to the release notes in
> doc/guides/rel_notes/.
> > See the Release Notes section of the Documentation Guidelines for details.
> > https://doc.dpdk.org/guides-18.08/contributing/documentation.html#doc-
> > guidelines "Developers should include updates to the Release Notes
> > with patch sets that relate to any of the following sections:
> > New Features
> > Resolved Issues (see below)
> > Known Issues
> > API Changes
> > ABI Changes
> > Shared Library Versions
> > Resolved Issues should only include issues from previous releases that
> have been resolved in the current release. Issues that are introduced and
> then fixed within a release cycle do not have to be included here."
> >
> > Suggested order in release notes items:
> > * Core libs (EAL, mempool, ring, mbuf, buses)
> > * Device abstraction libs and PMDs
>
> I agree with Honnappa.
> You don't need to give details, but can explain that performance of
> C11 version is improved.
>
V5 was submitted to indicate the improvement from the change without giving more technical details. Please have a review, thanks!
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v4 2/2] ring: move the atomic load of head above the loop
2018-11-02 7:15 3% ` Gavin Hu (Arm Technology China)
@ 2018-11-02 9:36 0% ` Thomas Monjalon
2018-11-02 11:23 0% ` Gavin Hu (Arm Technology China)
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-11-02 9:36 UTC (permalink / raw)
To: Gavin Hu (Arm Technology China), Honnappa Nagarahalli
Cc: Stephen Hemminger, dev, olivier.matz, chaozhu, bruce.richardson,
konstantin.ananyev, jerin.jacob, stable, nd
02/11/2018 08:15, Gavin Hu (Arm Technology China):
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli
> > Sent: Friday, November 2, 2018 12:31 PM
> > To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Stephen
> > Hemminger <stephen@networkplumber.org>
> > Cc: dev@dpdk.org; thomas@monjalon.net; olivier.matz@6wind.com;
> > chaozhu@linux.vnet.ibm.com; bruce.richardson@intel.com;
> > konstantin.ananyev@intel.com; jerin.jacob@caviumnetworks.com;
> > stable@dpdk.org; nd <nd@arm.com>
> > Subject: RE: [PATCH v4 2/2] ring: move the atomic load of head above the
> > loop
> >
> > <Fixing this to make the reply inline, making email plain text>
> >
> >
> > On Thu, 1 Nov 2018 17:53:51 +0800
> > Gavin Hu <mailto:gavin.hu@arm.com> wrote:
> >
> > > +* **Updated the ring library with C11 memory model.**
> > > +
> > > + Updated the ring library with C11 memory model including the following
> > changes:
> > > +
> > > + * Synchronize the load and store of the tail
> > > + * Move the atomic load of head above the loop
> > > +
> >
> > Does this really need to be in the release notes? Is it a user visible change or
> > just an internal/optimization and fix.
> >
> > [Gavin] There is no api changes, but this is a significant change as ring is
> > fundamental and widely used, it decreases latency by 25% in our tests, it may
> > do even better for cases with more contending producers/consumers or
> > deeper depth of rings.
> >
> > [Honnappa] I agree with Stephen. Release notes should be written from
> > DPDK user perspective. In the rte_ring case, the user has the option of
> > choosing between c11 and non-c11 algorithms. Performance would be one
> > of the criteria to choose between these 2 algorithms. IMO, it probably makes
> > sense to indicate that the performance of c11 based algorithm has been
> > improved. However, I do not know what DPDK has followed historically
> > regarding performance optimizations. I would prefer to follow whatever has
> > been followed so far.
> > I do not think that we need to document the details of the internal changes
> > since it does not help the user make a decision.
>
> I read through the online guidelines for release notes, besides API and new features, resolved issues which are significant and not newly introduced in this release cycle, should also be included.
> In this case, the resolved issue existed for long time, across multiple release cycles and ring is a core lib, so it should be a candidate for release notes.
>
> https://doc.dpdk.org/guides-18.08/contributing/patches.html
> section 5.5 says:
> Important changes will require an addition to the release notes in doc/guides/rel_notes/.
> See the Release Notes section of the Documentation Guidelines for details.
> https://doc.dpdk.org/guides-18.08/contributing/documentation.html#doc-guidelines
> "Developers should include updates to the Release Notes with patch sets that relate to any of the following sections:
> New Features
> Resolved Issues (see below)
> Known Issues
> API Changes
> ABI Changes
> Shared Library Versions
> Resolved Issues should only include issues from previous releases that have been resolved in the current release. Issues that are introduced and then fixed within a release cycle do not have to be included here."
>
> Suggested order in release notes items:
> * Core libs (EAL, mempool, ring, mbuf, buses)
> * Device abstraction libs and PMDs
I agree with Honnappa.
You don't need to give details, but can explain that performance of
C11 version is improved.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v4 2/2] ring: move the atomic load of head above the loop
@ 2018-11-02 7:15 3% ` Gavin Hu (Arm Technology China)
2018-11-02 9:36 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Gavin Hu (Arm Technology China) @ 2018-11-02 7:15 UTC (permalink / raw)
To: Honnappa Nagarahalli, Stephen Hemminger
Cc: dev, thomas, olivier.matz, chaozhu, bruce.richardson,
konstantin.ananyev, jerin.jacob, stable, nd
> -----Original Message-----
> From: Honnappa Nagarahalli
> Sent: Friday, November 2, 2018 12:31 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Stephen
> Hemminger <stephen@networkplumber.org>
> Cc: dev@dpdk.org; thomas@monjalon.net; olivier.matz@6wind.com;
> chaozhu@linux.vnet.ibm.com; bruce.richardson@intel.com;
> konstantin.ananyev@intel.com; jerin.jacob@caviumnetworks.com;
> stable@dpdk.org; nd <nd@arm.com>
> Subject: RE: [PATCH v4 2/2] ring: move the atomic load of head above the
> loop
>
> <Fixing this to make the reply inline, making email plain text>
>
>
> On Thu, 1 Nov 2018 17:53:51 +0800
> Gavin Hu <mailto:gavin.hu@arm.com> wrote:
>
> > +* **Updated the ring library with C11 memory model.**
> > +
> > + Updated the ring library with C11 memory model including the following
> changes:
> > +
> > + * Synchronize the load and store of the tail
> > + * Move the atomic load of head above the loop
> > +
>
> Does this really need to be in the release notes? Is it a user visible change or
> just an internal/optimization and fix.
>
> [Gavin] There is no api changes, but this is a significant change as ring is
> fundamental and widely used, it decreases latency by 25% in our tests, it may
> do even better for cases with more contending producers/consumers or
> deeper depth of rings.
>
> [Honnappa] I agree with Stephen. Release notes should be written from
> DPDK user perspective. In the rte_ring case, the user has the option of
> choosing between c11 and non-c11 algorithms. Performance would be one
> of the criteria to choose between these 2 algorithms. IMO, it probably makes
> sense to indicate that the performance of c11 based algorithm has been
> improved. However, I do not know what DPDK has followed historically
> regarding performance optimizations. I would prefer to follow whatever has
> been followed so far.
> I do not think that we need to document the details of the internal changes
> since it does not help the user make a decision.
I read through the online guidelines for release notes: besides API changes and new features, resolved issues that are significant and were not newly introduced in this release cycle should also be included.
In this case, the resolved issue has existed for a long time, across multiple release cycles, and ring is a core lib, so it should be a candidate for the release notes.
https://doc.dpdk.org/guides-18.08/contributing/patches.html
section 5.5 says:
Important changes will require an addition to the release notes in doc/guides/rel_notes/.
See the Release Notes section of the Documentation Guidelines for details.
https://doc.dpdk.org/guides-18.08/contributing/documentation.html#doc-guidelines
"Developers should include updates to the Release Notes with patch sets that relate to any of the following sections:
New Features
Resolved Issues (see below)
Known Issues
API Changes
ABI Changes
Shared Library Versions
Resolved Issues should only include issues from previous releases that have been resolved in the current release. Issues that are introduced and then fixed within a release cycle do not have to be included here."
Suggested order in release notes items:
* Core libs (EAL, mempool, ring, mbuf, buses)
* Device abstraction libs and PMDs
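
For readers wondering what "move the atomic load of head above the loop" can look like, here is a minimal sketch in plain C11 atomics. It is not the rte_ring code: struct toy_ring and toy_ring_move_head() are hypothetical, and the free-space check and the memory ordering against the tail are left out. It relies only on the C11 rule that a failed compare-exchange writes the value it observed back into the "expected" argument, so the head does not have to be loaded again on each retry.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a ring's producer head; not the rte_ring layout. */
struct toy_ring {
	_Atomic uint32_t head;
};

/*
 * Reserve n slots by advancing head and return the previous head value.
 * The head is loaded once, before the loop. When the compare-exchange
 * fails, it stores the current head into old_head, so the next iteration
 * already works with a fresh value.
 */
static uint32_t
toy_ring_move_head(struct toy_ring *r, uint32_t n)
{
	uint32_t old_head, new_head;

	old_head = atomic_load_explicit(&r->head, memory_order_relaxed);
	do {
		new_head = old_head + n;
	} while (!atomic_compare_exchange_weak_explicit(&r->head,
			&old_head, new_head,
			memory_order_relaxed, memory_order_relaxed));

	return old_head;
}
```

Relaxed ordering is used here only to keep the sketch short; a real ring needs acquire/release ordering between head and tail.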
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH] check-symbol-change: fix regex to match on end of map file
@ 2018-11-01 22:53 3% ` Thomas Monjalon
2018-11-02 11:50 0% ` Neil Horman
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-11-01 22:53 UTC (permalink / raw)
To: Neil Horman; +Cc: dev, doucette
01/11/2018 14:54, Neil Horman:
> the regex to determine the end of the map file chunk in a patch seems to
> be wrong, It was using perl regex syntax, which awk doesn't appear to
> support (I'm still not sure how it was working previously). Regardless,
> it wasn't triggering and as a result symbols were getting added to the
> mapdb that shouldn't be there.
>
> Fix it by converting the regex to use traditional posix syntax, matching
> only on the negation of the character class [^map]
>
> Tested and shown to be working on the ip_frag patch set provided by
> doucette@bu.edu
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: thomas@monjalon.net
> CC: doucette@bu.edu
> Reported-by: doucette@bu.edu
You could use these lines:
Fixes: 4bec48184e33 ("devtools: add checks for ABI symbol addition")
Reported-by: Cody Doucette <doucette@bu.edu>
> --- a/devtools/check-symbol-change.sh
> +++ b/devtools/check-symbol-change.sh
> - /[-+] a\/.*\.^(map)/ {in_map=0}
> + /[-+] a\/.*\.[^map]/ {in_map=0}
Not sure this is what you intend:
[^map] means any character except "m", "a" and "p".
I don't know whether awk supports this syntax: (?!foo)
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v5 3/3] ip_frag: extend IPv6 fragment header retrieval
@ 2018-10-31 19:56 3% ` Ananyev, Konstantin
0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-10-31 19:56 UTC (permalink / raw)
To: Cody Doucette, Dumitrescu, Cristian; +Cc: dev, Qiaobin Fu
Hi Cody,
>
> Add the ability to parse IPv6 extenders to find the
> IPv6 fragment header, and update callers.
>
> According to RFC 8200, there is no guarantee that the IPv6
> Fragment extension header will come before any other extension
> header, even though it is recommended.
>
> Signed-off-by: Cody Doucette <doucette@bu.edu>
> Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
> Reviewed-by: Michel Machado <michel@digirati.com.br>
> ---
> examples/ip_reassembly/main.c | 6 ++--
> lib/librte_ip_frag/rte_ip_frag.h | 23 ++++++-------
> lib/librte_ip_frag/rte_ip_frag_version.map | 1 +
> lib/librte_ip_frag/rte_ipv6_fragmentation.c | 38 +++++++++++++++++++++
> lib/librte_ip_frag/rte_ipv6_reassembly.c | 4 +--
> lib/librte_port/rte_port_ras.c | 6 ++--
> 6 files changed, 59 insertions(+), 19 deletions(-)
>
> diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
> index 17b55d4c7..3a827bd6c 100644
> --- a/examples/ip_reassembly/main.c
> +++ b/examples/ip_reassembly/main.c
> @@ -365,12 +365,14 @@ reassemble(struct rte_mbuf *m, uint16_t portid, uint32_t queue,
> eth_hdr->ether_type = rte_be_to_cpu_16(ETHER_TYPE_IPv4);
> } else if (RTE_ETH_IS_IPV6_HDR(m->packet_type)) {
> /* if packet is IPv6 */
> - struct ipv6_extension_fragment *frag_hdr;
> + const struct ipv6_extension_fragment *frag_hdr;
> + struct ipv6_extension_fragment frag_hdr_buf;
> struct ipv6_hdr *ip_hdr;
>
> ip_hdr = (struct ipv6_hdr *)(eth_hdr + 1);
>
> - frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(ip_hdr);
> + frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(m,
> + ip_hdr, &frag_hdr_buf);
I looked at the patch once again, and it seems incomplete to me.
Sorry for late comments.
Yes, rte_ipv6_frag_get_ipv6_fragment_header can now properly
retrieve the IPv6 fragment info, but that is not enough to make things work
when the fragment header is not the first and only extension header
in the packet.
In the same function, few lines below, we setup l3_len based on that assumption:
m->l3_len = sizeof(*ip_hdr) + sizeof(*frag_hdr);
mo = rte_ipv6_frag_reassemble_packet(tbl, dr, m, tms, ip_hdr, frag_hdr);
And inside rte_ipv6_frag_reassemble_packet() we still assume the same:
...
frag_hdr = (struct ipv6_extension_fragment *) (ip_hdr + 1);
ip_hdr->proto = frag_hdr->next_header;
I think we need a function that would allow us to get offset of frag_hdr.
Actually probably we can have a generic one here, that can return offset for
any requested ext header or total length of ipv6 header.
Something like that:
struct rte_ipv6_get_xhdr_ofs {
	uint16_t find_proto; /* header proto to find */
	uint16_t next_proto; /* next header proto */
	uint32_t ofs; /* offset to start search */
};
int
rte_ipv6_get_xhdr_ofs(struct rte_mbuf *pkt, struct rte_ipv6_get_xhdr_ofs *find);
that would go through ipv6 ext headers till either requested proto is found, or end of IPv6 header.
Then user can do something like that:
/* find fragment extension */
struct rte_ipv6_get_xhdr_ofs ofs = {
	.find_proto = IPPROTO_FRAGMENT,
	.next_proto = ipv6_hdr->proto,
	.ofs = sizeof(struct ipv6_hdr),
};
rc = rte_ipv6_get_xhdr_ofs(pkt, &ofs);
if (rc == 0)
	frag_hdr = rte_pktmbuf_mtod_offset(m, .., ofs.ofs);
...
/* get size of IPv6 header plus all known extensions */
struct rte_ipv6_get_xhdr_ofs ofs = {
	.find_proto = IPPROTO_MAX,
	.next_proto = ipv6_hdr->proto,
	.ofs = sizeof(struct ipv6_hdr),
};
rc = rte_ipv6_get_xhdr_ofs(pkt, &ofs);
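
A self-contained sketch of how such a walk could work, under several assumptions: it operates on a flat byte buffer instead of an rte_mbuf, all names (struct xhdr_ofs, ipv6_get_xhdr_ofs, the XHDR_* constants) are hypothetical, and only the hop-by-hop, routing, destination options and fragment headers are treated as known extensions, with lengths as defined in RFC 8200. XHDR_FIND_NONE plays the role IPPROTO_MAX has in the example above.

```c
#include <stddef.h>
#include <stdint.h>

#define XHDR_HOPOPTS	0	/* hop-by-hop options */
#define XHDR_ROUTING	43	/* routing header */
#define XHDR_FRAGMENT	44	/* fragment header */
#define XHDR_DSTOPTS	60	/* destination options */
#define XHDR_FIND_NONE	0xffff	/* matches no header: walk the whole chain */

struct xhdr_ofs {
	uint16_t find_proto;	/* header proto to find */
	uint16_t next_proto;	/* in: proto of header at ofs; out: proto where the walk stopped */
	uint32_t ofs;		/* in: offset to start search; out: offset where the walk stopped */
};

/*
 * Walk the extension headers in buf[0..len).
 * Returns 0 if find_proto was found (find->ofs is its offset),
 * 1 if the chain ended at a header this sketch does not parse
 * (find->ofs is then the length of the IPv6 header plus the parsed
 * extensions), or -1 if the buffer is too short.
 */
static int
ipv6_get_xhdr_ofs(const uint8_t *buf, size_t len, struct xhdr_ofs *find)
{
	uint16_t proto = find->next_proto;
	size_t ofs = find->ofs;
	int rc;

	for (;;) {
		size_t hlen;

		if (proto == find->find_proto) {
			rc = 0;
			break;
		}
		if (proto == XHDR_FRAGMENT) {
			hlen = 8;	/* fragment header has a fixed size */
		} else if (proto == XHDR_HOPOPTS || proto == XHDR_ROUTING ||
				proto == XHDR_DSTOPTS) {
			if (ofs + 2 > len)
				return -1;
			/* Hdr Ext Len: 8-octet units, not counting the first 8 */
			hlen = ((size_t)buf[ofs + 1] + 1) * 8;
		} else {
			rc = 1;		/* upper-layer or unparsed header */
			break;
		}
		if (ofs + hlen > len)
			return -1;
		proto = buf[ofs];	/* first octet is the Next Header field */
		ofs += hlen;
	}

	find->next_proto = proto;
	find->ofs = (uint32_t)ofs;
	return rc;
}
```

With something like this, the caller could set m->l3_len from the returned offset instead of assuming sizeof(*ip_hdr) + sizeof(*frag_hdr). The sketch does not verify that a found header fits completely in the buffer; the caller would still have to check that.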
>
> if (frag_hdr != NULL) {
> struct rte_mbuf *mo;
> diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
> index 7f425f610..6fc8106bc 100644
> --- a/lib/librte_ip_frag/rte_ip_frag.h
> +++ b/lib/librte_ip_frag/rte_ip_frag.h
> @@ -211,28 +211,25 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
> struct rte_mbuf *rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
> struct rte_ip_frag_death_row *dr,
> struct rte_mbuf *mb, uint64_t tms, struct ipv6_hdr *ip_hdr,
> - struct ipv6_extension_fragment *frag_hdr);
> + const struct ipv6_extension_fragment *frag_hdr);
>
> /**
> * Return a pointer to the packet's fragment header, if found.
> - * It only looks at the extension header that's right after the fixed IPv6
> - * header, and doesn't follow the whole chain of extension headers.
> *
> - * @param hdr
> + * @param pkt
> + * Pointer to the mbuf of the packet.
> + * @param ip_hdr
> * Pointer to the IPv6 header.
> + * @param frag_hdr
> + * A pointer to the buffer where the fragment header
> + * will be copied if it is not contiguous in mbuf data.
> * @return
> * Pointer to the IPv6 fragment extension header, or NULL if it's not
> * present.
> */
> -static inline struct ipv6_extension_fragment *
> -rte_ipv6_frag_get_ipv6_fragment_header(struct ipv6_hdr *hdr)
> -{
> - if (hdr->proto == IPPROTO_FRAGMENT) {
> - return (struct ipv6_extension_fragment *) ++hdr;
> - }
> - else
> - return NULL;
> -}
> +const struct ipv6_extension_fragment *rte_ipv6_frag_get_ipv6_fragment_header(
> + struct rte_mbuf *pkt, const struct ipv6_hdr *ip_hdr,
> + struct ipv6_extension_fragment *frag_hdr);
Another thing - wouldn't it be an API/ABI breakage?
One more question - making it non-inline - how much would it affect performance?
My guess - no big difference, but did you check?
Konstantin
>
> /**
> * IPv4 fragmentation.
> diff --git a/lib/librte_ip_frag/rte_ip_frag_version.map b/lib/librte_ip_frag/rte_ip_frag_version.map
> index d40d5515f..8b4c82d08 100644
> --- a/lib/librte_ip_frag/rte_ip_frag_version.map
> +++ b/lib/librte_ip_frag/rte_ip_frag_version.map
> @@ -8,6 +8,7 @@ DPDK_2.0 {
> rte_ipv4_fragment_packet;
> rte_ipv6_frag_reassemble_packet;
> rte_ipv6_fragment_packet;
> + rte_ipv6_frag_get_ipv6_fragment_header;
>
> local: *;
> };
> diff --git a/lib/librte_ip_frag/rte_ipv6_fragmentation.c b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> index 62a7e4e83..bd847dd3d 100644
> --- a/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> +++ b/lib/librte_ip_frag/rte_ipv6_fragmentation.c
> @@ -176,3 +176,41 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
>
> return out_pkt_pos;
> }
> +
> +const struct ipv6_extension_fragment *
> +rte_ipv6_frag_get_ipv6_fragment_header(struct rte_mbuf *pkt,
> + const struct ipv6_hdr *ip_hdr,
> + struct ipv6_extension_fragment *frag_hdr)
> +{
> + size_t offset = sizeof(struct ipv6_hdr);
> + uint8_t nexthdr = ip_hdr->proto;
> +
> + while (ipv6_ext_hdr(nexthdr)) {
> + struct ipv6_opt_hdr opt;
> + const struct ipv6_opt_hdr *popt = rte_pktmbuf_read(pkt,
> + offset, sizeof(opt), &opt);
> + if (popt == NULL)
> + return NULL;
> +
> + switch (nexthdr) {
> + case IPPROTO_NONE:
> + return NULL;
> +
> + case IPPROTO_FRAGMENT:
> + return rte_pktmbuf_read(pkt, offset,
> + sizeof(*frag_hdr), frag_hdr);
> +
> + case IPPROTO_AH:
> + offset += (popt->hdrlen + 2) << 2;
> + break;
> +
> + default:
> + offset += (popt->hdrlen + 1) << 3;
> + break;
> + }
> +
> + nexthdr = popt->nexthdr;
> + }
> +
> + return NULL;
> +}
> diff --git a/lib/librte_ip_frag/rte_ipv6_reassembly.c b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> index db249fe60..b2d67a3f0 100644
> --- a/lib/librte_ip_frag/rte_ipv6_reassembly.c
> +++ b/lib/librte_ip_frag/rte_ipv6_reassembly.c
> @@ -135,8 +135,8 @@ ipv6_frag_reassemble(struct ip_frag_pkt *fp)
> #define FRAG_OFFSET(x) (rte_cpu_to_be_16(x) >> 3)
> struct rte_mbuf *
> rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
> - struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint64_t tms,
> - struct ipv6_hdr *ip_hdr, struct ipv6_extension_fragment *frag_hdr)
> + struct rte_ip_frag_death_row *dr, struct rte_mbuf *mb, uint64_t tms,
> + struct ipv6_hdr *ip_hdr, const struct ipv6_extension_fragment *frag_hdr)
> {
> struct ip_frag_pkt *fp;
> struct ip_frag_key key;
> diff --git a/lib/librte_port/rte_port_ras.c b/lib/librte_port/rte_port_ras.c
> index c8b2e19bf..28764f744 100644
> --- a/lib/librte_port/rte_port_ras.c
> +++ b/lib/librte_port/rte_port_ras.c
> @@ -184,9 +184,11 @@ process_ipv6(struct rte_port_ring_writer_ras *p, struct rte_mbuf *pkt)
> /* Assume there is no ethernet header */
> struct ipv6_hdr *pkt_hdr = rte_pktmbuf_mtod(pkt, struct ipv6_hdr *);
>
> - struct ipv6_extension_fragment *frag_hdr;
> + const struct ipv6_extension_fragment *frag_hdr;
> + struct ipv6_extension_fragment frag_hdr_buf;
> uint16_t frag_data = 0;
> - frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt_hdr);
> + frag_hdr = rte_ipv6_frag_get_ipv6_fragment_header(pkt, pkt_hdr,
> + &frag_hdr_buf);
> if (frag_hdr != NULL)
> frag_data = rte_be_to_cpu_16(frag_hdr->frag_data);
>
> --
> 2.17.1
* Re: [dpdk-dev] [PATCH v1 3/3] test/hash: add readwrite test for ext table
2018-10-26 10:12 3% ` Bruce Richardson
2018-10-29 5:54 0% ` Honnappa Nagarahalli
@ 2018-10-31 4:21 3% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-10-31 4:21 UTC (permalink / raw)
To: Bruce Richardson; +Cc: Yipeng Wang, stephen, dev, sameh.gobriel, nd, nd
> On Fri, Oct 26, 2018 at 12:23:56AM +0000, Honnappa Nagarahalli wrote:
> > >
> > > On Wed, Oct 10, 2018 at 02:48:05PM -0700, Yipeng Wang wrote:
> > > > This commit improves the readwrite test to consider extendable
> > > > table feature and add more functional tests to cover more corner cases.
> > > >
> > > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> ---
> > > > test/test/test_hash_readwrite.c | 70
> > > > ++++++++++++++++++++++++++++++++++------- 1 file changed, 58
> > > > insertions(+), 12 deletions(-)
> > > >
> > > With the extension of this test case, and the addition of other test
> > > cases by Honnappa in the other patch sets in this release, we are
> > > building up quite a large set of hash table autotests, some of whose
> > > meaning and use is a bit obscure. Are there any hash tests that you
> > > feel could be removed at this point, to simplify things?
> > >
> > (this comment does not apply to this patch) Looks like your concern is
> > about maintenance of the test code.
> > IMO, we need to reduce the number of configuration flags in this library
> which should reduce the number of test cases.
> > The flags I think that are not necessary are:
> > RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT - The tests prove that
> this gives significant performance boost. IMO, if the platform supports it, it
> should be enabled without user consent (I am not an expert on TSX).
> > RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY - Most use cases require this
> support. Only use case where this is not required is a single thread doing both
> inserts and lookups. Even if such a use case is valid, the lock over head should
> be small.
> >
> I agree with the idea. What I suggest is that only a single flag should be
> needed, and that only for the uncommon case, i.e. where we do not need any
> locking of the hash-table. Otherwise the hash should be thread safe by default
> and using the most effective locking mechanism for the platform.
>
> Unfortunately, doing this requires an ABI change, but since it only should
> affect the create function, it should be doable with function versioning to
> keep backward compatibility.
>
I have made the changes. It seems to be working fine. I will post it once internal review completes.
We made this change (SHA: 9d033dac7d7cacca9559e0381f99b4c730e80979) to support 'no free on delete'. This was done by introducing another configuration flag, 'RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL'. IMO, it makes sense to always keep delete and free as two different operations and to deprecate 'free during delete' support. We can provide backward compatibility by making an ABI change instead of introducing another configuration flag.
> /Bruce
* Re: [dpdk-dev] [PATCH v1 3/3] test/hash: add readwrite test for ext table
2018-10-26 10:12 3% ` Bruce Richardson
@ 2018-10-29 5:54 0% ` Honnappa Nagarahalli
2018-10-31 4:21 3% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-10-29 5:54 UTC (permalink / raw)
To: Bruce Richardson; +Cc: Yipeng Wang, stephen, dev, sameh.gobriel, nd
> > > On Wed, Oct 10, 2018 at 02:48:05PM -0700, Yipeng Wang wrote:
> > > > This commit improves the readwrite test to consider extendable
> > > > table feature and add more functional tests to cover more corner cases.
> > > >
> > > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com> ---
> > > > test/test/test_hash_readwrite.c | 70
> > > > ++++++++++++++++++++++++++++++++++------- 1 file changed, 58
> > > > insertions(+), 12 deletions(-)
> > > >
> > > With the extension of this test case, and the addition of other test
> > > cases by Honnappa in the other patch sets in this release, we are
> > > building up quite a large set of hash table autotests, some of whose
> > > meaning and use is a bit obscure. Are there any hash tests that you
> > > feel could be removed at this point, to simplify things?
> > >
> > (this comment does not apply to this patch) Looks like your concern is
> > about maintenance of the test code.
> > IMO, we need to reduce the number of configuration flags in this library
> which should reduce the number of test cases.
> > The flags I think that are not necessary are:
> > RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT - The tests prove that
> this gives significant performance boost. IMO, if the platform supports it, it
> should be enabled without user consent (I am not an expert on TSX).
> > RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY - Most use cases require this
> support. Only use case where this is not required is a single thread doing both
> inserts and lookups. Even if such a use case is valid, the lock over head should
> be small.
> >
> I agree with the idea. What I suggest is that only a single flag should be
> needed, and that only for the uncommon case, i.e. where we do not need any
> locking of the hash-table. Otherwise the hash should be thread safe by default
> and using the most effective locking mechanism for the platform.
>
RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF - should take care of this.
> Unfortunately, doing this requires an ABI change, but since it only should
> affect the create function, it should be doable with function versioning to
> keep backward compatibility.
>
Looks simple enough. Version the rte_hash_create function and map the existing function to 18.08. The new version of the function will always enable hw_trans_mem_support and rw_concurrency. Should we check to see if these flags are set by the user and print a warning message about deprecation of these flags in the newer version of the function?
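A rough sketch of the flag handling I have in mind for the new version of the create function is below. It is standalone C, not the library code: the flag bit values are local placeholders rather than the definitions from rte_hash.h, create_flags_v2 is a made-up helper name, and the platform check for transactional memory support is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder flag bits for illustration only; not taken from rte_hash.h. */
#define FLAG_TRANS_MEM_SUPPORT	0x01u
#define FLAG_RW_CONCURRENCY	0x04u
#define FLAG_IMPLIED		(FLAG_TRANS_MEM_SUPPORT | FLAG_RW_CONCURRENCY)

/*
 * Hypothetical helper for the new create path: returns the effective flags
 * and sets *deprecated_used when the caller passed a flag that is now implied.
 */
static uint32_t
create_flags_v2(uint32_t user_flags, int *deprecated_used)
{
	assert(deprecated_used != NULL);

	*deprecated_used = (user_flags & FLAG_IMPLIED) != 0;
	if (*deprecated_used)
		fprintf(stderr,
			"warning: trans_mem/rw_concurrency flags are deprecated and always enabled\n");

	/* The new version always turns both features on, whatever the caller asked for. */
	return user_flags | FLAG_IMPLIED;
}
```

The old version of the function would keep passing user_flags through untouched, so existing binaries see no change in behaviour.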
> /Bruce
* Re: [dpdk-dev] [PATCH v4 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
2018-10-02 16:28 0% ` Bruce Richardson
2018-10-05 16:00 0% ` Timothy Redaelli
@ 2018-10-27 21:19 0% ` Thomas Monjalon
2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-27 21:19 UTC (permalink / raw)
To: Luca Boccassi
Cc: dev, bruce.richardson, tredaelli, christian.ehrhardt, mvarlese
02/10/2018 18:20, Luca Boccassi:
> As part of the effort of consolidating the DPDK installation bits and
> pieces across distros, set the default directory of lib/ where PMDs get
> installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
> subdirectory as multiple ABI revisions might be installed at the same
> time, so having a fixed name will cause trouble with the autoload
> feature.
> Small refactor with parsing and saving the major version to a variable,
> since it's now used in 3 different places.
>
> Signed-off-by: Luca Boccassi <bluca@debian.org>
Series applied, thanks
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-26 10:15 0% ` Bruce Richardson
@ 2018-10-26 11:28 0% ` Olivier Matz
0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-10-26 11:28 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev
On Fri, Oct 26, 2018 at 11:15:14AM +0100, Bruce Richardson wrote:
> On Fri, Oct 26, 2018 at 09:20:15AM +0200, Olivier Matz wrote:
> > Hi,
> >
> > On Wed, Oct 24, 2018 at 05:39:09PM +0100, Bruce Richardson wrote:
> > > On Wed, Oct 24, 2018 at 10:18:19AM +0200, Olivier Matz wrote:
> > > > This RFC targets 19.02.
> > > >
> > > > The rte_net headers conflict with the libc headers, because
> > > > some definitions are duplicated, sometimes with few differences.
> > > > This was discussed in [1], and more recently at the techboard.
> > > >
> > > > Before sending the deprecation notice (target for this is 18.11),
> > > > here is a draft that can be discussed.
> > > >
> > > > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > > > and defines in rte_net library. This is a big changeset, that will
> > > > break the API of many functions, but not the ABI.
> > > >
> > > > One question I'm asking is how can we manage the transition.
> > > > Initially, I hoped it was possible to have a compat layer during
> > > > one release (supporting both prefixed and unprefixed names), but
> > > > now that the patch is done, it seems the impact is too big, and
> > > > impacts too many libraries.
> > > >
> > > > Few examples:
> > > > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > > > - many rte_flow structures use the rte_ prefixed net structures
> > > > - the mac field of virtio_net structure is rte_ether_addr
> > > > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > > > ...
> > > >
> > > > Therefore, it is clear that doing this would break the compilation
> > > > of many external applications.
> > > >
> > >
> > > Can you clarify a bit as to why we can't keep around compatibility versions
> > > of the headers, alongside the new versions? I'm not following the logic
> > > above. Can we not introduce completely new headers with the replacements
> > > while leaving the old ones intact?
> >
> > This is something I tried to do, it is not in the RFC because it was
> > not satisfying, but you can find it there:
> >
> > http://git.droids-corp.org/?p=dpdk.git;a=commitdiff;h=ba1e8e498306
> >
> > With this patch, the usage of unprefixed structures, defines and
> > functions in rte net is still possible by an external application,
> > except if RTE_NET_NO_COMPAT is defined.
> >
> > However, functions and structures that are not in librte_net (the
> > examples from my previous mail, quoted above) use the rte_ prefixed
> > structures in their prototypes. For instance, an application that use
> > rte_eth_macaddr_get() will no compile anymore because it will pass
> > a (struct ether_addr *) instead of a (struct rte_ether_addr *).
> >
> > I don't see any good mean to fix this. Maybe we can do something with
> > defines, but I don't think it is possible to provide both APIs for
> > functions like rte_eth_macaddr_get(). I'm also not convinced it will be
> > that helpful. At the end, if the patchset is applied, we want the
> > applications to switch to the new API. To ease the transition, we can
> > provide a script to patch an application, very similar to the one I use
> > to generate the patchset.
> >
>
> Out of interest, about how many non rte_net functions are we talking about here?
I didn't count, but many. And not only functions, also structures.
To give an idea, here is the output of:
git diff origin/master..HEAD lib/ | filterdiff -i '*.h' -x 'a/lib/librte_net/*'
diff --git a/lib/librte_ethdev/rte_eth_ctrl.h b/lib/librte_ethdev/rte_eth_ctrl.h
index 5ea8ae24c..821d971cd 100644
--- a/lib/librte_ethdev/rte_eth_ctrl.h
+++ b/lib/librte_ethdev/rte_eth_ctrl.h
@@ -110,7 +110,7 @@ struct rte_eth_mac_filter {
uint8_t is_vf; /**< 1 for VF, 0 for port dev */
uint16_t dst_id; /**< VF ID, available when is_vf is 1*/
enum rte_mac_filter_type filter_type; /**< MAC filter type */
- struct ether_addr mac_addr;
+ struct rte_ether_addr mac_addr;
};
/**
@@ -126,7 +126,7 @@ struct rte_eth_mac_filter {
* RTE_ETH_FILTER_DELETE and RTE_ETH_FILTER_GET operations.
*/
struct rte_eth_ethertype_filter {
- struct ether_addr mac_addr; /**< Mac address to match. */
+ struct rte_ether_addr mac_addr; /**< Mac address to match. */
uint16_t ether_type; /**< Ether type to match */
uint16_t flags; /**< Flags from RTE_ETHTYPE_FLAGS_* */
uint16_t queue; /**< Queue assigned to when match*/
@@ -265,8 +265,8 @@ enum rte_tunnel_iptype {
* Tunneling Packet filter configuration.
*/
struct rte_eth_tunnel_filter_conf {
- struct ether_addr outer_mac; /**< Outer MAC address to match. */
- struct ether_addr inner_mac; /**< Inner MAC address to match. */
+ struct rte_ether_addr outer_mac; /**< Outer MAC address to match. */
+ struct rte_ether_addr inner_mac; /**< Inner MAC address to match. */
uint16_t inner_vlan; /**< Inner VLAN to match. */
enum rte_tunnel_iptype ip_type; /**< IP address type. */
/** Outer destination IP address to match if ETH_TUNNEL_FILTER_OIP
@@ -473,7 +473,7 @@ struct rte_eth_sctpv6_flow {
* A structure used to define the input for MAC VLAN flow
*/
struct rte_eth_mac_vlan_flow {
- struct ether_addr mac_addr; /**< Mac address to match. */
+ struct rte_ether_addr mac_addr; /**< Mac address to match. */
};
/**
@@ -493,7 +493,7 @@ struct rte_eth_tunnel_flow {
enum rte_eth_fdir_tunnel_type tunnel_type; /**< Tunnel type to match. */
/** Tunnel ID to match. TNI, VNI... in big endian. */
uint32_t tunnel_id;
- struct ether_addr mac_addr; /**< Mac address to match. */
+ struct rte_ether_addr mac_addr; /**< Mac address to match. */
};
/**
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index fb40c89e0..5deb4e38e 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -2159,7 +2159,7 @@ int rte_eth_dev_set_rx_queue_stats_mapping(uint16_t port_id,
* A pointer to a structure of type *ether_addr* to be filled with
* the Ethernet address of the Ethernet device.
*/
-void rte_eth_macaddr_get(uint16_t port_id, struct ether_addr *mac_addr);
+void rte_eth_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr);
/**
* Retrieve the contextual information of an Ethernet device.
@@ -2843,7 +2843,7 @@ int rte_eth_dev_priority_flow_ctrl_set(uint16_t port_id,
* - (-ENOSPC) if no more MAC addresses can be added.
* - (-EINVAL) if MAC address is invalid.
*/
-int rte_eth_dev_mac_addr_add(uint16_t port_id, struct ether_addr *mac_addr,
+int rte_eth_dev_mac_addr_add(uint16_t port_id, struct rte_ether_addr *mac_addr,
uint32_t pool);
/**
@@ -2859,7 +2859,7 @@ int rte_eth_dev_mac_addr_add(uint16_t port_id, struct ether_addr *mac_addr,
* - (-ENODEV) if *port* invalid.
* - (-EADDRINUSE) if attempting to remove the default MAC address
*/
-int rte_eth_dev_mac_addr_remove(uint16_t port_id, struct ether_addr *mac_addr);
+int rte_eth_dev_mac_addr_remove(uint16_t port_id, struct rte_ether_addr *mac_addr);
/**
* Set the default MAC address.
@@ -2875,7 +2875,7 @@ int rte_eth_dev_mac_addr_remove(uint16_t port_id, struct ether_addr *mac_addr);
* - (-EINVAL) if MAC address is invalid.
*/
int rte_eth_dev_default_mac_addr_set(uint16_t port_id,
- struct ether_addr *mac_addr);
+ struct rte_ether_addr *mac_addr);
/**
* Update Redirection Table(RETA) of Receive Side Scaling of Ethernet device.
@@ -2936,7 +2936,7 @@ int rte_eth_dev_rss_reta_query(uint16_t port_id,
* - (-EIO) if device is removed.
* - (-EINVAL) if bad parameter.
*/
-int rte_eth_dev_uc_hash_table_set(uint16_t port_id, struct ether_addr *addr,
+int rte_eth_dev_uc_hash_table_set(uint16_t port_id, struct rte_ether_addr *addr,
uint8_t on);
/**
@@ -3479,7 +3479,7 @@ rte_eth_dev_get_module_eeprom(uint16_t port_id,
* - (-ENOSPC) if *port_id* has not enough multicast filtering resources.
*/
int rte_eth_dev_set_mc_addr_list(uint16_t port_id,
- struct ether_addr *mc_addr_set,
+ struct rte_ether_addr *mc_addr_set,
uint32_t nb_mc_addr);
/**
diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
index 0d28fd902..fa518620e 100644
--- a/lib/librte_ethdev/rte_ethdev_core.h
+++ b/lib/librte_ethdev/rte_ethdev_core.h
@@ -250,17 +250,17 @@ typedef void (*eth_mac_addr_remove_t)(struct rte_eth_dev *dev, uint32_t index);
/**< @internal Remove MAC address from receive address register */
typedef int (*eth_mac_addr_add_t)(struct rte_eth_dev *dev,
- struct ether_addr *mac_addr,
+ struct rte_ether_addr *mac_addr,
uint32_t index,
uint32_t vmdq);
/**< @internal Set a MAC address into Receive Address Address Register */
typedef int (*eth_mac_addr_set_t)(struct rte_eth_dev *dev,
- struct ether_addr *mac_addr);
+ struct rte_ether_addr *mac_addr);
/**< @internal Set a MAC address into Receive Address Address Register */
typedef int (*eth_uc_hash_table_set_t)(struct rte_eth_dev *dev,
- struct ether_addr *mac_addr,
+ struct rte_ether_addr *mac_addr,
uint8_t on);
/**< @internal Set a Unicast Hash bitmap */
@@ -292,7 +292,7 @@ typedef int (*eth_udp_tunnel_port_del_t)(struct rte_eth_dev *dev,
/**< @internal Delete tunneling UDP port */
typedef int (*eth_set_mc_addr_list_t)(struct rte_eth_dev *dev,
- struct ether_addr *mc_addr_set,
+ struct rte_ether_addr *mc_addr_set,
uint32_t nb_mc_addr);
/**< @internal set the list of multicast addresses on an Ethernet device */
@@ -597,10 +597,10 @@ struct rte_eth_dev_data {
/**< Common rx buffer size handled by all queues */
uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */
- struct ether_addr* mac_addrs;/**< Device Ethernet Link address. */
+ struct rte_ether_addr* mac_addrs;/**< Device Ethernet Link address. */
uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
/** bitmap array of associating Ethernet MAC addresses to pools */
- struct ether_addr* hash_mac_addrs;
+ struct rte_ether_addr* hash_mac_addrs;
/** Device Ethernet MAC addresses of hash filtering. */
uint16_t port_id; /**< Device [external] port identifier. */
__extension__
diff --git a/lib/librte_ethdev/rte_flow.h b/lib/librte_ethdev/rte_flow.h
index 26e2fcfa0..c27d590a1 100644
--- a/lib/librte_ethdev/rte_flow.h
+++ b/lib/librte_ethdev/rte_flow.h
@@ -577,8 +577,8 @@ static const struct rte_flow_item_raw rte_flow_item_raw_mask = {
* same order as on the wire.
*/
struct rte_flow_item_eth {
- struct ether_addr dst; /**< Destination MAC. */
- struct ether_addr src; /**< Source MAC. */
+ struct rte_ether_addr dst; /**< Destination MAC. */
+ struct rte_ether_addr src; /**< Source MAC. */
rte_be16_t type; /**< EtherType or TPID. */
};
@@ -597,7 +597,7 @@ static const struct rte_flow_item_eth rte_flow_item_eth_mask = {
* Matches an 802.1Q/ad VLAN tag.
*
* The corresponding standard outer EtherType (TPID) values are
- * ETHER_TYPE_VLAN or ETHER_TYPE_QINQ. It can be overridden by the preceding
+ * RTE_ETHER_TYPE_VLAN or RTE_ETHER_TYPE_QINQ. It can be overridden by the preceding
* pattern item.
*/
struct rte_flow_item_vlan {
@@ -621,7 +621,7 @@ static const struct rte_flow_item_vlan rte_flow_item_vlan_mask = {
* Note: IPv4 options are handled by dedicated pattern items.
*/
struct rte_flow_item_ipv4 {
- struct ipv4_hdr hdr; /**< IPv4 header definition. */
+ struct rte_ipv4_hdr hdr; /**< IPv4 header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_IPV4. */
@@ -643,7 +643,7 @@ static const struct rte_flow_item_ipv4 rte_flow_item_ipv4_mask = {
* RTE_FLOW_ITEM_TYPE_IPV6_EXT.
*/
struct rte_flow_item_ipv6 {
- struct ipv6_hdr hdr; /**< IPv6 header definition. */
+ struct rte_ipv6_hdr hdr; /**< IPv6 header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_IPV6. */
@@ -666,7 +666,7 @@ static const struct rte_flow_item_ipv6 rte_flow_item_ipv6_mask = {
* Matches an ICMP header.
*/
struct rte_flow_item_icmp {
- struct icmp_hdr hdr; /**< ICMP header definition. */
+ struct rte_icmp_hdr hdr; /**< ICMP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_ICMP. */
@@ -685,7 +685,7 @@ static const struct rte_flow_item_icmp rte_flow_item_icmp_mask = {
* Matches a UDP header.
*/
struct rte_flow_item_udp {
- struct udp_hdr hdr; /**< UDP header definition. */
+ struct rte_udp_hdr hdr; /**< UDP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_UDP. */
@@ -704,7 +704,7 @@ static const struct rte_flow_item_udp rte_flow_item_udp_mask = {
* Matches a TCP header.
*/
struct rte_flow_item_tcp {
- struct tcp_hdr hdr; /**< TCP header definition. */
+ struct rte_tcp_hdr hdr; /**< TCP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_TCP. */
@@ -723,7 +723,7 @@ static const struct rte_flow_item_tcp rte_flow_item_tcp_mask = {
* Matches a SCTP header.
*/
struct rte_flow_item_sctp {
- struct sctp_hdr hdr; /**< SCTP header definition. */
+ struct rte_sctp_hdr hdr; /**< SCTP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_SCTP. */
@@ -761,7 +761,7 @@ static const struct rte_flow_item_vxlan rte_flow_item_vxlan_mask = {
* Matches a E-tag header.
*
* The corresponding standard outer EtherType (TPID) value is
- * ETHER_TYPE_ETAG. It can be overridden by the preceding pattern item.
+ * RTE_ETHER_TYPE_ETAG. It can be overridden by the preceding pattern item.
*/
struct rte_flow_item_e_tag {
/**
@@ -908,7 +908,7 @@ static const struct rte_flow_item_gtp rte_flow_item_gtp_mask = {
* Matches an ESP header.
*/
struct rte_flow_item_esp {
- struct esp_hdr hdr; /**< ESP header definition. */
+ struct rte_esp_hdr hdr; /**< ESP header definition. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_ESP. */
@@ -974,9 +974,9 @@ struct rte_flow_item_arp_eth_ipv4 {
uint8_t hln; /**< Hardware address length, normally 6. */
uint8_t pln; /**< Protocol address length, normally 4. */
rte_be16_t op; /**< Opcode (1 for request, 2 for reply). */
- struct ether_addr sha; /**< Sender hardware address. */
+ struct rte_ether_addr sha; /**< Sender hardware address. */
rte_be32_t spa; /**< Sender IPv4 address. */
- struct ether_addr tha; /**< Target hardware address. */
+ struct rte_ether_addr tha; /**< Target hardware address. */
rte_be32_t tpa; /**< Target IPv4 address. */
};
@@ -1120,7 +1120,7 @@ rte_flow_item_icmp6_nd_opt_mask = {
struct rte_flow_item_icmp6_nd_opt_sla_eth {
uint8_t type; /**< ND option type, normally 1. */
uint8_t length; /**< ND option length, normally 1. */
- struct ether_addr sla; /**< Source Ethernet LLA. */
+ struct rte_ether_addr sla; /**< Source Ethernet LLA. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_ICMP6_ND_OPT_SLA_ETH. */
@@ -1145,7 +1145,7 @@ rte_flow_item_icmp6_nd_opt_sla_eth_mask = {
struct rte_flow_item_icmp6_nd_opt_tla_eth {
uint8_t type; /**< ND option type, normally 2. */
uint8_t length; /**< ND option length, normally 1. */
- struct ether_addr tla; /**< Target Ethernet LLA. */
+ struct rte_ether_addr tla; /**< Target Ethernet LLA. */
};
/** Default mask for RTE_FLOW_ITEM_TYPE_ICMP6_ND_OPT_TLA_ETH. */
@@ -2036,7 +2036,7 @@ struct rte_flow_action_set_ttl {
* Set MAC address from the matched flow
*/
struct rte_flow_action_set_mac {
- uint8_t mac_addr[ETHER_ADDR_LEN];
+ uint8_t mac_addr[RTE_ETHER_ADDR_LEN];
};
/*
diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
index 6bb30cdb9..63f06bec4 100644
--- a/lib/librte_gro/gro_tcp4.h
+++ b/lib/librte_gro/gro_tcp4.h
@@ -19,8 +19,8 @@
/* Header fields representing a TCP/IPv4 flow */
struct tcp4_flow_key {
- struct ether_addr eth_saddr;
- struct ether_addr eth_daddr;
+ struct rte_ether_addr eth_saddr;
+ struct rte_ether_addr eth_daddr;
uint32_t ip_src_addr;
uint32_t ip_dst_addr;
@@ -182,8 +182,8 @@ uint32_t gro_tcp4_tbl_pkt_count(void *tbl);
static inline int
is_same_tcp4_flow(struct tcp4_flow_key k1, struct tcp4_flow_key k2)
{
- return (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) &&
- is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) &&
+ return (rte_is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) &&
+ rte_is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) &&
(k1.ip_src_addr == k2.ip_src_addr) &&
(k1.ip_dst_addr == k2.ip_dst_addr) &&
(k1.recv_ack == k2.recv_ack) &&
@@ -255,7 +255,7 @@ merge_two_tcp4_packets(struct gro_tcp4_item *item,
*/
static inline int
check_seq_option(struct gro_tcp4_item *item,
- struct tcp_hdr *tcph,
+ struct rte_tcp_hdr *tcph,
uint32_t sent_seq,
uint16_t ip_id,
uint16_t tcp_hl,
@@ -264,17 +264,17 @@ check_seq_option(struct gro_tcp4_item *item,
uint8_t is_atomic)
{
struct rte_mbuf *pkt_orig = item->firstseg;
- struct ipv4_hdr *iph_orig;
- struct tcp_hdr *tcph_orig;
+ struct rte_ipv4_hdr *iph_orig;
+ struct rte_tcp_hdr *tcph_orig;
uint16_t len, tcp_hl_orig;
- iph_orig = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_orig, char *) +
+ iph_orig = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt_orig, char *) +
l2_offset + pkt_orig->l2_len);
- tcph_orig = (struct tcp_hdr *)((char *)iph_orig + pkt_orig->l3_len);
+ tcph_orig = (struct rte_tcp_hdr *)((char *)iph_orig + pkt_orig->l3_len);
tcp_hl_orig = pkt_orig->l4_len;
/* Check if TCP option fields equal */
- len = RTE_MAX(tcp_hl, tcp_hl_orig) - sizeof(struct tcp_hdr);
+ len = RTE_MAX(tcp_hl, tcp_hl_orig) - sizeof(struct rte_tcp_hdr);
if ((tcp_hl != tcp_hl_orig) || ((len > 0) &&
(memcmp(tcph + 1, tcph_orig + 1,
len) != 0)))
diff --git a/lib/librte_gro/gro_vxlan_tcp4.h b/lib/librte_gro/gro_vxlan_tcp4.h
index 0cafb9211..7832942a6 100644
--- a/lib/librte_gro/gro_vxlan_tcp4.h
+++ b/lib/librte_gro/gro_vxlan_tcp4.h
@@ -12,10 +12,10 @@
/* Header fields representing a VxLAN flow */
struct vxlan_tcp4_flow_key {
struct tcp4_flow_key inner_key;
- struct vxlan_hdr vxlan_hdr;
+ struct rte_vxlan_hdr vxlan_hdr;
- struct ether_addr outer_eth_saddr;
- struct ether_addr outer_eth_daddr;
+ struct rte_ether_addr outer_eth_saddr;
+ struct rte_ether_addr outer_eth_daddr;
uint32_t outer_ip_src_addr;
uint32_t outer_ip_dst_addr;
diff --git a/lib/librte_gso/gso_common.h b/lib/librte_gso/gso_common.h
index 6cd764ff5..48ad1686f 100644
--- a/lib/librte_gso/gso_common.h
+++ b/lib/librte_gso/gso_common.h
@@ -12,8 +12,8 @@
#include <rte_tcp.h>
#include <rte_udp.h>
-#define IS_FRAGMENTED(frag_off) (((frag_off) & IPV4_HDR_OFFSET_MASK) != 0 \
- || ((frag_off) & IPV4_HDR_MF_FLAG) == IPV4_HDR_MF_FLAG)
+#define IS_FRAGMENTED(frag_off) (((frag_off) & RTE_IPV4_HDR_OFFSET_MASK) != 0 \
+ || ((frag_off) & RTE_IPV4_HDR_MF_FLAG) == RTE_IPV4_HDR_MF_FLAG)
#define TCP_HDR_PSH_MASK ((uint8_t)0x08)
#define TCP_HDR_FIN_MASK ((uint8_t)0x01)
@@ -46,9 +46,9 @@
static inline void
update_udp_header(struct rte_mbuf *pkt, uint16_t udp_offset)
{
- struct udp_hdr *udp_hdr;
+ struct rte_udp_hdr *udp_hdr;
- udp_hdr = (struct udp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+ udp_hdr = (struct rte_udp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
udp_offset);
udp_hdr->dgram_len = rte_cpu_to_be_16(pkt->pkt_len - udp_offset);
}
@@ -71,9 +71,9 @@ static inline void
update_tcp_header(struct rte_mbuf *pkt, uint16_t l4_offset, uint32_t sent_seq,
uint8_t non_tail)
{
- struct tcp_hdr *tcp_hdr;
+ struct rte_tcp_hdr *tcp_hdr;
- tcp_hdr = (struct tcp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+ tcp_hdr = (struct rte_tcp_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
l4_offset);
tcp_hdr->sent_seq = rte_cpu_to_be_32(sent_seq);
if (likely(non_tail))
@@ -98,9 +98,9 @@ update_tcp_header(struct rte_mbuf *pkt, uint16_t l4_offset, uint32_t sent_seq,
static inline void
update_ipv4_header(struct rte_mbuf *pkt, uint16_t l3_offset, uint16_t id)
{
- struct ipv4_hdr *ipv4_hdr;
+ struct rte_ipv4_hdr *ipv4_hdr;
- ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
+ ipv4_hdr = (struct rte_ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
l3_offset);
ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len - l3_offset);
ipv4_hdr->packet_id = rte_cpu_to_be_16(id);
diff --git a/lib/librte_gso/rte_gso.h b/lib/librte_gso/rte_gso.h
index a626a11e3..3aab297f4 100644
--- a/lib/librte_gso/rte_gso.h
+++ b/lib/librte_gso/rte_gso.h
@@ -18,12 +18,12 @@ extern "C" {
#include <rte_mbuf.h>
/* Minimum GSO segment size for TCP based packets. */
-#define RTE_GSO_SEG_SIZE_MIN (sizeof(struct ether_hdr) + \
- sizeof(struct ipv4_hdr) + sizeof(struct tcp_hdr) + 1)
+#define RTE_GSO_SEG_SIZE_MIN (sizeof(struct rte_ether_hdr) + \
+ sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_tcp_hdr) + 1)
/* Minimum GSO segment size for UDP based packets. */
-#define RTE_GSO_UDP_SEG_SIZE_MIN (sizeof(struct ether_hdr) + \
- sizeof(struct ipv4_hdr) + sizeof(struct udp_hdr) + 1)
+#define RTE_GSO_UDP_SEG_SIZE_MIN (sizeof(struct rte_ether_hdr) + \
+ sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_udp_hdr) + 1)
/* GSO flags for rte_gso_ctx. */
#define RTE_GSO_FLAG_IPID_FIXED (1ULL << 0)
diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
index a6ddb7bf7..adbaf8f70 100644
--- a/lib/librte_hash/rte_thash.h
+++ b/lib/librte_hash/rte_thash.h
@@ -168,7 +168,7 @@ rte_convert_rss_key(const uint32_t *orig, uint32_t *targ, int len)
* Pointer to rte_ipv6_tuple structure
*/
static inline void
-rte_thash_load_v6_addrs(const struct ipv6_hdr *orig, union rte_thash_tuple *targ)
+rte_thash_load_v6_addrs(const struct rte_ipv6_hdr *orig, union rte_thash_tuple *targ)
{
#ifdef RTE_ARCH_X86
__m128i ipv6 = _mm_loadu_si128((const __m128i *)orig->src_addr);
diff --git a/lib/librte_ip_frag/rte_ip_frag.h b/lib/librte_ip_frag/rte_ip_frag.h
index 7f425f610..28ba33dac 100644
--- a/lib/librte_ip_frag/rte_ip_frag.h
+++ b/lib/librte_ip_frag/rte_ip_frag.h
@@ -210,7 +210,7 @@ rte_ipv6_fragment_packet(struct rte_mbuf *pkt_in,
*/
struct rte_mbuf *rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
struct rte_ip_frag_death_row *dr,
- struct rte_mbuf *mb, uint64_t tms, struct ipv6_hdr *ip_hdr,
+ struct rte_mbuf *mb, uint64_t tms, struct rte_ipv6_hdr *ip_hdr,
struct ipv6_extension_fragment *frag_hdr);
/**
@@ -225,7 +225,7 @@ struct rte_mbuf *rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
* present.
*/
static inline struct ipv6_extension_fragment *
-rte_ipv6_frag_get_ipv6_fragment_header(struct ipv6_hdr *hdr)
+rte_ipv6_frag_get_ipv6_fragment_header(struct rte_ipv6_hdr *hdr)
{
if (hdr->proto == IPPROTO_FRAGMENT) {
return (struct ipv6_extension_fragment *) ++hdr;
@@ -284,7 +284,7 @@ int32_t rte_ipv4_fragment_packet(struct rte_mbuf *pkt_in,
*/
struct rte_mbuf * rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
struct rte_ip_frag_death_row *dr,
- struct rte_mbuf *mb, uint64_t tms, struct ipv4_hdr *ip_hdr);
+ struct rte_mbuf *mb, uint64_t tms, struct rte_ipv4_hdr *ip_hdr);
/**
* Check if the IPv4 packet is fragmented
@@ -295,12 +295,12 @@ struct rte_mbuf * rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
* 1 if fragmented, 0 if not fragmented
*/
static inline int
-rte_ipv4_frag_pkt_is_fragmented(const struct ipv4_hdr * hdr) {
+rte_ipv4_frag_pkt_is_fragmented(const struct rte_ipv4_hdr * hdr) {
uint16_t flag_offset, ip_flag, ip_ofs;
flag_offset = rte_be_to_cpu_16(hdr->fragment_offset);
- ip_ofs = (uint16_t)(flag_offset & IPV4_HDR_OFFSET_MASK);
- ip_flag = (uint16_t)(flag_offset & IPV4_HDR_MF_FLAG);
+ ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
+ ip_flag = (uint16_t)(flag_offset & RTE_IPV4_HDR_MF_FLAG);
return ip_flag != 0 || ip_ofs != 0;
}
diff --git a/lib/librte_kni/rte_kni.h b/lib/librte_kni/rte_kni.h
index 601abdfc6..ce86b19a2 100644
--- a/lib/librte_kni/rte_kni.h
+++ b/lib/librte_kni/rte_kni.h
@@ -68,7 +68,7 @@ struct rte_kni_conf {
__extension__
uint8_t force_bind : 1; /* Flag to bind kernel thread */
- char mac_addr[ETHER_ADDR_LEN]; /* MAC address assigned to KNI */
+ char mac_addr[RTE_ETHER_ADDR_LEN]; /* MAC address assigned to KNI */
uint16_t mtu;
};
diff --git a/lib/librte_pipeline/rte_table_action.h b/lib/librte_pipeline/rte_table_action.h
index c96061291..400bd5e2c 100644
--- a/lib/librte_pipeline/rte_table_action.h
+++ b/lib/librte_pipeline/rte_table_action.h
@@ -384,8 +384,8 @@ enum rte_table_action_encap_type {
/** Pre-computed Ethernet header fields for encapsulation action. */
struct rte_table_action_ether_hdr {
- struct ether_addr da; /**< Destination address. */
- struct ether_addr sa; /**< Source address. */
+ struct rte_ether_addr da; /**< Destination address. */
+ struct rte_ether_addr sa; /**< Source address. */
};
/** Pre-computed VLAN header fields for encapsulation action. */
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index b4abad30c..064ebb951 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -356,7 +356,7 @@ struct virtio_net {
uint64_t log_size;
uint64_t log_base;
uint64_t log_addr;
- struct ether_addr mac;
+ struct rte_ether_addr mac;
uint16_t mtu;
struct vhost_device_ops const *notify_ops;
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-26 7:20 0% ` Olivier Matz
@ 2018-10-26 10:15 0% ` Bruce Richardson
2018-10-26 11:28 0% ` Olivier Matz
0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-26 10:15 UTC (permalink / raw)
To: Olivier Matz; +Cc: dev
On Fri, Oct 26, 2018 at 09:20:15AM +0200, Olivier Matz wrote:
> Hi,
>
> On Wed, Oct 24, 2018 at 05:39:09PM +0100, Bruce Richardson wrote:
> > On Wed, Oct 24, 2018 at 10:18:19AM +0200, Olivier Matz wrote:
> > > This RFC targets 19.02.
> > >
> > > The rte_net headers conflict with the libc headers, because
> > > some definitions are duplicated, sometimes with few differences.
> > > This was discussed in [1], and more recently at the techboard.
> > >
> > > Before sending the deprecation notice (target for this is 18.11),
> > > here is a draft that can be discussed.
> > >
> > > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > > and defines in rte_net library. This is a big changeset, that will
> > > break the API of many functions, but not the ABI.
> > >
> > > One question I'm asking is how can we manage the transition.
> > > Initially, I hoped it was possible to have a compat layer during
> > > one release (supporting both prefixed and unprefixed names), but
> > > now that the patch is done, it seems the impact is too big, and
> > > impacts too many libraries.
> > >
> > > Few examples:
> > > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > > - many rte_flow structures use the rte_ prefixed net structures
> > > - the mac field of virtio_net structure is rte_ether_addr
> > > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > > ...
> > >
> > > Therefore, it is clear that doing this would break the compilation
> > > of many external applications.
> > >
> >
> > Can you clarify a bit as to why we can't keep around compatibility versions
> > of the headers, alongside the new versions? I'm not following the logic
> > above. Can we not introduce completely new headers with the replacements
> > while leaving the old ones intact?
>
> This is something I tried to do. It is not in the RFC because it was
> not satisfactory, but you can find it here:
>
> http://git.droids-corp.org/?p=dpdk.git;a=commitdiff;h=ba1e8e498306
>
> With this patch, the usage of unprefixed structures, defines and
> functions in rte net is still possible by an external application,
> except if RTE_NET_NO_COMPAT is defined.
>
> However, functions and structures that are not in librte_net (the
> examples from my previous mail, quoted above) use the rte_ prefixed
> structures in their prototypes. For instance, an application that uses
> rte_eth_macaddr_get() will not compile anymore because it will pass
> a (struct ether_addr *) instead of a (struct rte_ether_addr *).
>
> I don't see any good way to fix this. Maybe we can do something with
> defines, but I don't think it is possible to provide both APIs for
> functions like rte_eth_macaddr_get(). I'm also not convinced it would be
> that helpful. In the end, if the patchset is applied, we want the
> applications to switch to the new API. To ease the transition, we can
> provide a script to patch an application, very similar to the one I use
> to generate the patchset.
>
Out of interest, about how many non rte_net functions are we talking about here?
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v1 3/3] test/hash: add readwrite test for ext table
@ 2018-10-26 10:12 3% ` Bruce Richardson
2018-10-29 5:54 0% ` Honnappa Nagarahalli
2018-10-31 4:21 3% ` Honnappa Nagarahalli
0 siblings, 2 replies; 200+ results
From: Bruce Richardson @ 2018-10-26 10:12 UTC (permalink / raw)
To: Honnappa Nagarahalli; +Cc: Yipeng Wang, stephen, dev, sameh.gobriel, nd
On Fri, Oct 26, 2018 at 12:23:56AM +0000, Honnappa Nagarahalli wrote:
> >
> > On Wed, Oct 10, 2018 at 02:48:05PM -0700, Yipeng Wang wrote:
> > > This commit improves the readwrite test to consider extendable table
> > > feature and add more functional tests to cover more corner cases.
> > >
> > > Signed-off-by: Yipeng Wang <yipeng1.wang@intel.com>
> > > ---
> > > test/test/test_hash_readwrite.c | 70 ++++++++++++++++++++++++++++++++++-------
> > > 1 file changed, 58 insertions(+), 12 deletions(-)
> > >
> > With the extension of this test case, and the addition of other test cases by
> > Honnappa in the other patch sets in this release, we are building up quite a
> > large set of hash table autotests, some of whose meaning and use is a bit
> > obscure. Are there any hash tests that you feel could be removed at this point,
> > to simplify things?
> >
> (this comment does not apply to this patch)
> Looks like your concern is about maintenance of the test code.
> IMO, we need to reduce the number of configuration flags in this library which should reduce the number of test cases.
> The flags I think that are not necessary are:
> RTE_HASH_EXTRA_FLAGS_TRANS_MEM_SUPPORT - The tests prove that this gives significant performance boost. IMO, if the platform supports it, it should be enabled without user consent (I am not an expert on TSX).
> RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY - Most use cases require this support. The only use case where this is not required is a single thread doing both inserts and lookups. Even if such a use case is valid, the lock overhead should be small.
>
I agree with the idea. What I suggest is that only a single flag should be
needed, and that only for the uncommon case, i.e. where we do not need any
locking of the hash-table. Otherwise the hash should be thread safe by
default and using the most effective locking mechanism for the platform.
Unfortunately, doing this requires an ABI change, but since it should only
affect the create function, it should be doable with function versioning to
keep backward compatibility.
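To make the idea concrete, here is a rough sketch of the flag semantics described above. Everything in it is hypothetical: DEMO_HASH_FLAG_NO_LOCKING, enum demo_lock_mode and demo_pick_lock_mode() are made-up names for illustration, not part of librte_hash, and the real selection logic may well differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag: the only opt-out an application would need. */
#define DEMO_HASH_FLAG_NO_LOCKING (1u << 0)

/* Hypothetical locking strategies a hash table could pick from. */
enum demo_lock_mode {
	DEMO_LOCK_NONE,		/* caller opted out, single-threaded use */
	DEMO_LOCK_TM,		/* transactional memory, if the platform has it */
	DEMO_LOCK_RWLOCK	/* portable fallback */
};

/*
 * Thread safety is the default. The table is left unprotected only when
 * the application explicitly asks for it; otherwise the most effective
 * mechanism available on the platform is chosen.
 */
static enum demo_lock_mode
demo_pick_lock_mode(uint32_t flags, int platform_has_tm)
{
	/* Reject flags this sketch does not know about. */
	assert((flags & ~DEMO_HASH_FLAG_NO_LOCKING) == 0);

	if (flags & DEMO_HASH_FLAG_NO_LOCKING)
		return DEMO_LOCK_NONE;
	return platform_has_tm ? DEMO_LOCK_TM : DEMO_LOCK_RWLOCK;
}
```

With this shape, an application passing no flags gets a thread-safe table, and the create function is the only place where the flag is interpreted.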
/Bruce
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 18:38 0% ` Stephen Hemminger
@ 2018-10-26 7:56 0% ` Olivier Matz
0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-10-26 7:56 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
Hi Stephen,
On Wed, Oct 24, 2018 at 11:38:12AM -0700, Stephen Hemminger wrote:
> On Wed, 24 Oct 2018 10:18:19 +0200
> Olivier Matz <olivier.matz@6wind.com> wrote:
>
> > This RFC targets 19.02.
> >
> > The rte_net headers conflict with the libc headers, because
> > some definitions are duplicated, sometimes with few differences.
> > This was discussed in [1], and more recently at the techboard.
> >
> > Before sending the deprecation notice (target for this is 18.11),
> > here is a draft that can be discussed.
> >
> > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > and defines in rte_net library. This is a big changeset, that will
> > break the API of many functions, but not the ABI.
> >
> > One question I'm asking is how can we manage the transition.
> > Initially, I hoped it was possible to have a compat layer during
> > one release (supporting both prefixed and unprefixed names), but
> > now that the patch is done, it seems the impact is too big, and
> > impacts too many libraries.
> >
> > Few examples:
> > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > - many rte_flow structures use the rte_ prefixed net structures
> > - the mac field of virtio_net structure is rte_ether_addr
> > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > ...
> >
> > Therefore, it is clear that doing this would break the compilation
> > of many external applications.
> >
> > Another drawback we need to take in account: it will make the
> > backport of patches more difficult, although this is something
> > that could be tempered by a script.
> >
> > While it is obviously better to have a good namespace convention,
> > we need to identify the issues we have today before deciding it's
> > worth doing the change.
> >
> > Comments?
> >
> >
> > Things that are missing in RFC:
> > - test with FreeBSD
> > - manually fix some indentation issues
> >
> >
> > Olivier Matz (14):
> > net: add rte prefix to arp structures
> > net: add rte prefix to arp defines
> > net: add rte prefix to ether structures
> > net: add rte prefix to ether functions
> > net: add rte prefix to ether defines
> > net: add rte prefix to esp structure
> > net: add rte prefix to gre structure
> > net: add rte prefix to icmp structure
> > net: add rte prefix to icmp defines
> > net: add rte prefix to ip structure
> > net: add rte prefix to ip defines
> > net: add rte prefix to sctp structure
> > net: add rte prefix to tcp structure
> > net: add rte prefix to udp structure
> >
>
> Since BSD structures are available on Linux and BSD why is DPDK reinventing?
> There is no value in doing that.
From what I see, some structures or defines are a bit different in Linux
and FreeBSD. Examples:
/* Linux */
struct ether_addr
{
u_int8_t ether_addr_octet[ETH_ALEN];
} __attribute__ ((__packed__));
/* FreeBSD */
struct ether_addr {
u_char octet[ETHER_ADDR_LEN];
} __packed;
It's true that the compatibility between Linux and FreeBSD is better than
before in glibc, for instance since 7011c2622fe3 ("Remove __FAVOR_BSD."),
added in 2013 (glibc 2.19). It seems that musl also supports BSD network
structures.
So, I agree that using BSD structures looks possible, at least for
ip/ip6/tcp/udp/icmp/... structures and defines. I think we would still
need to provide some network structures for less usual protocols.
The question is: are we confident that the support of network BSD
struct/defines/funcs is good enough in all the libcs we (will) want to use?
Since DPDK is a network software, it is not that odd to provide our
own network structures, because we will have control over them.
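To illustrate the difference shown above, here is a minimal sketch. The demo_ names are hypothetical and only exist so that both flavours can live in one file; uint8_t stands in for u_int8_t and u_char. The layouts are byte-identical, but the member names differ, so code that accesses the field directly is not portable at source level.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Linux (glibc) flavour, renamed for this sketch. */
struct demo_linux_ether_addr {
	uint8_t ether_addr_octet[DEMO_ETH_ALEN];
} __attribute__((__packed__));

/* FreeBSD flavour: same bytes, different member name. */
struct demo_bsd_ether_addr {
	uint8_t octet[DEMO_ETH_ALEN];
} __attribute__((__packed__));

/* Library-owned structure: the field name is under our control. */
struct demo_rte_ether_addr {
	uint8_t addr_bytes[DEMO_ETH_ALEN];
} __attribute__((__packed__));

/* Each libc flavour needs its own accessor because of the member name. */
static void
demo_from_linux(struct demo_rte_ether_addr *dst,
		const struct demo_linux_ether_addr *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst->addr_bytes, src->ether_addr_octet, DEMO_ETH_ALEN);
}

static void
demo_from_bsd(struct demo_rte_ether_addr *dst,
		const struct demo_bsd_ether_addr *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst->addr_bytes, src->octet, DEMO_ETH_ALEN);
}
```

Code written against the library-owned structure only ever touches addr_bytes, whatever the libc underneath calls its own field.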
Olivier
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 14:56 0% ` Wiles, Keith
@ 2018-10-26 7:22 0% ` Olivier Matz
0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-10-26 7:22 UTC (permalink / raw)
To: Wiles, Keith; +Cc: dpdk-dev
Hi,
On Wed, Oct 24, 2018 at 02:56:14PM +0000, Wiles, Keith wrote:
>
>
> > On Oct 24, 2018, at 1:18 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > This RFC targets 19.02.
> >
> > The rte_net headers conflict with the libc headers, because
> > some definitions are duplicated, sometimes with few differences.
> > This was discussed in [1], and more recently at the techboard.
> >
> > Before sending the deprecation notice (target for this is 18.11),
> > here is a draft that can be discussed.
> >
> > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > and defines in rte_net library. This is a big changeset, that will
> > break the API of many functions, but not the ABI.
> >
> > One question I'm asking is how can we manage the transition.
> > Initially, I hoped it was possible to have a compat layer during
> > one release (supporting both prefixed and unprefixed names), but
> > now that the patch is done, it seems the impact is too big, and
> > impacts too many libraries.
> >
> > Few examples:
> > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > - many rte_flow structures use the rte_ prefixed net structures
> > - the mac field of virtio_net structure is rte_ether_addr
> > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > ...
> >
> > Therefore, it is clear that doing this would break the compilation
> > of many external applications.
> >
> > Another drawback we need to take in account: it will make the
> > backport of patches more difficult, although this is something
> > that could be tempered by a script.
> >
> > While it is obviously better to have a good namespace convention,
> > we need to identify the issues we have today before deciding it's
> > worth doing the change.
> >
> > Comments?
>
> I did not see the deprecation notice in the patches below, but I could have missed it.
I will send it only if we reach a consensus about the need to
apply the patchset.
Regards
Olivier
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 16:39 0% ` Bruce Richardson
@ 2018-10-26 7:20 0% ` Olivier Matz
2018-10-26 10:15 0% ` Bruce Richardson
0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2018-10-26 7:20 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev
Hi,
On Wed, Oct 24, 2018 at 05:39:09PM +0100, Bruce Richardson wrote:
> On Wed, Oct 24, 2018 at 10:18:19AM +0200, Olivier Matz wrote:
> > This RFC targets 19.02.
> >
> > The rte_net headers conflict with the libc headers, because
> > some definitions are duplicated, sometimes with few differences.
> > This was discussed in [1], and more recently at the techboard.
> >
> > Before sending the deprecation notice (target for this is 18.11),
> > here is a draft that can be discussed.
> >
> > This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> > and defines in rte_net library. This is a big changeset, that will
> > break the API of many functions, but not the ABI.
> >
> > One question I'm asking is how can we manage the transition.
> > Initially, I hoped it was possible to have a compat layer during
> > one release (supporting both prefixed and unprefixed names), but
> > now that the patch is done, it seems the impact is too big, and
> > impacts too many libraries.
> >
> > Few examples:
> > - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> > - many rte_flow structures use the rte_ prefixed net structures
> > - the mac field of virtio_net structure is rte_ether_addr
> > - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> > ...
> >
> > Therefore, it is clear that doing this would break the compilation
> > of many external applications.
> >
>
> Can you clarify a bit as to why we can't keep around compatibility versions
> of the headers, alongside the new versions? I'm not following the logic
> above. Can we not introduce completely new headers with the replacements
> while leaving the old ones intact?
This is something I tried to do. It is not in the RFC because it was
not satisfactory, but you can find it here:
http://git.droids-corp.org/?p=dpdk.git;a=commitdiff;h=ba1e8e498306
With this patch, the usage of unprefixed structures, defines and
functions in rte net is still possible by an external application,
except if RTE_NET_NO_COMPAT is defined.
However, functions and structures that are not in librte_net (the
examples from my previous mail, quoted above) use the rte_ prefixed
structures in their prototypes. For instance, an application that uses
rte_eth_macaddr_get() will not compile anymore because it will pass
a (struct ether_addr *) instead of a (struct rte_ether_addr *).
I don't see any good way to fix this. Maybe we can do something with
defines, but I don't think it is possible to provide both APIs for
functions like rte_eth_macaddr_get(). I'm also not convinced it would be
that helpful. In the end, if the patchset is applied, we want the
applications to switch to the new API. To ease the transition, we can
provide a script to patch an application, very similar to the one I use
to generate the patchset.
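For reference, the "something with defines" idea could look roughly like the sketch below. It is an illustration under assumptions, not necessarily what the patch linked above does: demo_macaddr_get() is a hypothetical stand-in for rte_eth_macaddr_get(), and the two compat defines are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RTE_ETHER_ADDR_LEN 6

/* New, prefixed structure (same layout as the old unprefixed one). */
struct rte_ether_addr {
	uint8_t addr_bytes[RTE_ETHER_ADDR_LEN];
} __attribute__((__packed__));

/*
 * Hypothetical compat layer: map the old names onto the new ones unless
 * the application opts out. This only works if no libc header defining
 * its own struct ether_addr is included in the same translation unit.
 */
#ifndef RTE_NET_NO_COMPAT
#define ether_addr rte_ether_addr
#define ETHER_ADDR_LEN RTE_ETHER_ADDR_LEN
#endif

/* Stand-in for a non-rte_net API such as rte_eth_macaddr_get(). */
static void
demo_macaddr_get(uint16_t port_id, struct rte_ether_addr *mac_addr)
{
	static const uint8_t fake[RTE_ETHER_ADDR_LEN] = {
		0x02, 0x00, 0x00, 0x00, 0x00, 0x00
	};

	assert(mac_addr != NULL);
	memcpy(mac_addr->addr_bytes, fake, sizeof(fake));
	/* Make the address depend on the port so callers can check it. */
	mac_addr->addr_bytes[RTE_ETHER_ADDR_LEN - 1] = (uint8_t)port_id;
}

/* Legacy application code, written against the unprefixed names. */
static uint8_t
legacy_last_octet(uint16_t port_id)
{
	struct ether_addr mac;	/* expands to struct rte_ether_addr */

	demo_macaddr_get(port_id, &mac);
	return mac.addr_bytes[ETHER_ADDR_LEN - 1];
}
```

The legacy function compiles only because the preprocessor rewrites the old name. As soon as the application also includes a libc header with its own struct ether_addr, the define renames that one too, which brings back the conflict this RFC is trying to remove.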
Olivier
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 6/6] doc: remove internal libs from release notes
@ 2018-10-25 0:07 4% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-25 0:07 UTC (permalink / raw)
To: Ferruh Yigit
Cc: dev, Shreyansh Jain, John McNamara, Marko Kovacevic,
yipeng1.wang, pablo.de.lara.guarch
16/10/2018 13:52, Shreyansh Jain:
> On Monday 15 October 2018 08:20 PM, Ferruh Yigit wrote:
> > These libraries has exported functions but the target of those functions
> > are not user but other libraries.
> >
> > The version of these libraries doesn't mean much to the user so can be
> > dropped from release notes.
> >
> > Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > ---
> > Indeed this is more a question, should we keep them or remove them?
>
> +1 for removing them.
> At least for dpaa/fslmc perspective, I don't see any additional benefit
> in release note. These libraries (dpaa/fslmc) are not actually
> 'libraries' in true (read, plugability) sense :)
> > --- a/doc/guides/rel_notes/release_18_11.rst
> > +++ b/doc/guides/rel_notes/release_18_11.rst
> > - + librte_bus_dpaa.so.2
> > - + librte_bus_fslmc.so.2
> > - + librte_bus_ifpga.so.2
> > - + librte_bus_pci.so.2
> > - + librte_bus_vdev.so.2
> > - + librte_bus_vmbus.so.2
The ABI of bus libraries is important if you want to plug a PMD
into an older DPDK: if the bus ABI has changed, you cannot.
I am for keeping them.
> > - librte_pci.so.1
This is a true library!
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
` (2 preceding siblings ...)
2018-10-24 16:39 0% ` Bruce Richardson
@ 2018-10-24 18:38 0% ` Stephen Hemminger
2018-10-26 7:56 0% ` Olivier Matz
3 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-10-24 18:38 UTC (permalink / raw)
To: Olivier Matz; +Cc: dev
On Wed, 24 Oct 2018 10:18:19 +0200
Olivier Matz <olivier.matz@6wind.com> wrote:
> This RFC targets 19.02.
>
> The rte_net headers conflict with the libc headers, because
> some definitions are duplicated, sometimes with few differences.
> This was discussed in [1], and more recently at the techboard.
>
> Before sending the deprecation notice (target for this is 18.11),
> here is a draft that can be discussed.
>
> This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> and defines in rte_net library. This is a big changeset, that will
> break the API of many functions, but not the ABI.
>
> One question I'm asking is how can we manage the transition.
> Initially, I hoped it was possible to have a compat layer during
> one release (supporting both prefixed and unprefixed names), but
> now that the patch is done, it seems the impact is too big, and
> impacts too many libraries.
>
> Few examples:
> - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> - many rte_flow structures use the rte_ prefixed net structures
> - the mac field of virtio_net structure is rte_ether_addr
> - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> ...
>
> Therefore, it is clear that doing this would break the compilation
> of many external applications.
>
> Another drawback we need to take in account: it will make the
> backport of patches more difficult, although this is something
> that could be tempered by a script.
>
> While it is obviously better to have a good namespace convention,
> we need to identify the issues we have today before deciding it's
> worth doing the change.
>
> Comments?
>
>
> Things that are missing in RFC:
> - test with FreeBSD
> - manually fix some indentation issues
>
>
> Olivier Matz (14):
> net: add rte prefix to arp structures
> net: add rte prefix to arp defines
> net: add rte prefix to ether structures
> net: add rte prefix to ether functions
> net: add rte prefix to ether defines
> net: add rte prefix to esp structure
> net: add rte prefix to gre structure
> net: add rte prefix to icmp structure
> net: add rte prefix to icmp defines
> net: add rte prefix to ip structure
> net: add rte prefix to ip defines
> net: add rte prefix to sctp structure
> net: add rte prefix to tcp structure
> net: add rte prefix to udp structure
>
> app/pdump/main.c | 2 +-
> app/test-eventdev/test_perf_common.c | 2 +-
> app/test-eventdev/test_pipeline_common.c | 2 +-
> app/test-pmd/cmdline.c | 66 ++---
> app/test-pmd/cmdline_flow.c | 10 +-
> app/test-pmd/config.c | 34 +--
> app/test-pmd/csumonly.c | 156 +++++-----
> app/test-pmd/flowgen.c | 30 +-
> app/test-pmd/icmpecho.c | 120 ++++----
> app/test-pmd/ieee1588fwd.c | 18 +-
> app/test-pmd/macfwd.c | 12 +-
> app/test-pmd/macswap.c | 16 +-
> app/test-pmd/parameters.c | 6 +-
> app/test-pmd/testpmd.c | 22 +-
> app/test-pmd/testpmd.h | 18 +-
> app/test-pmd/txonly.c | 36 +--
> app/test-pmd/util.c | 34 +--
> doc/guides/prog_guide/bbdev.rst | 6 +-
> .../prog_guide/packet_classif_access_ctrl.rst | 18 +-
> doc/guides/prog_guide/rte_flow.rst | 4 +-
> doc/guides/sample_app_ug/flow_classify.rst | 28 +-
> doc/guides/sample_app_ug/flow_filtering.rst | 6 +-
> doc/guides/sample_app_ug/ip_frag.rst | 16 +-
> doc/guides/sample_app_ug/ip_reassembly.rst | 16 +-
> doc/guides/sample_app_ug/ipv4_multicast.rst | 16 +-
> doc/guides/sample_app_ug/l2_forward_job_stats.rst | 6 +-
> .../sample_app_ug/l2_forward_real_virtual.rst | 6 +-
> doc/guides/sample_app_ug/l3_forward.rst | 12 +-
> doc/guides/sample_app_ug/link_status_intr.rst | 6 +-
> doc/guides/sample_app_ug/ptpclient.rst | 6 +-
> doc/guides/sample_app_ug/rxtx_callbacks.rst | 2 +-
> doc/guides/sample_app_ug/server_node_efd.rst | 12 +-
> doc/guides/sample_app_ug/skeleton.rst | 4 +-
> doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 4 +-
> drivers/bus/dpaa/base/fman/fman.c | 2 +-
> drivers/bus/dpaa/base/fman/fman_hw.c | 2 +-
> drivers/bus/dpaa/include/fman.h | 2 +-
> drivers/bus/dpaa/include/netcfg.h | 4 +-
> drivers/net/af_packet/rte_eth_af_packet.c | 2 +-
> drivers/net/ark/ark_ethdev.c | 16 +-
> drivers/net/ark/ark_ext.h | 4 +-
> drivers/net/ark/ark_global.h | 4 +-
> drivers/net/atlantic/atl_ethdev.c | 20 +-
> drivers/net/atlantic/hw_atl/hw_atl_utils.c | 8 +-
> drivers/net/atlantic/hw_atl/hw_atl_utils_fw2x.c | 4 +-
> drivers/net/avf/avf.h | 4 +-
> drivers/net/avf/avf_ethdev.c | 50 ++--
> drivers/net/avf/avf_rxtx.c | 14 +-
> drivers/net/avf/avf_vchnl.c | 8 +-
> drivers/net/avf/base/avf_adminq_cmd.h | 4 +-
> drivers/net/avf/base/avf_common.c | 12 +-
> drivers/net/avf/base/avf_prototype.h | 4 +-
> drivers/net/avp/avp_ethdev.c | 20 +-
> drivers/net/avp/rte_avp_common.h | 2 +-
> drivers/net/axgbe/axgbe_dev.c | 4 +-
> drivers/net/axgbe/axgbe_ethdev.c | 10 +-
> drivers/net/axgbe/axgbe_ethdev.h | 4 +-
> drivers/net/axgbe/axgbe_rxtx.c | 2 +-
> drivers/net/bnx2x/bnx2x.c | 16 +-
> drivers/net/bnx2x/bnx2x_ethdev.c | 4 +-
> drivers/net/bnx2x/bnx2x_ethdev.h | 2 +-
> drivers/net/bnx2x/bnx2x_vfpf.c | 8 +-
> drivers/net/bnx2x/bnx2x_vfpf.h | 2 +-
> drivers/net/bnx2x/ecore_sp.h | 2 +-
> drivers/net/bnxt/bnxt.h | 4 +-
> drivers/net/bnxt/bnxt_ethdev.c | 70 ++---
> drivers/net/bnxt/bnxt_filter.c | 4 +-
> drivers/net/bnxt/bnxt_filter.h | 8 +-
> drivers/net/bnxt/bnxt_flow.c | 26 +-
> drivers/net/bnxt/bnxt_hwrm.c | 40 +--
> drivers/net/bnxt/bnxt_hwrm.h | 2 +-
> drivers/net/bnxt/bnxt_ring.c | 8 +-
> drivers/net/bnxt/bnxt_rxq.c | 2 +-
> drivers/net/bnxt/bnxt_rxr.c | 2 +-
> drivers/net/bnxt/bnxt_vnic.c | 2 +-
> drivers/net/bnxt/rte_pmd_bnxt.c | 14 +-
> drivers/net/bnxt/rte_pmd_bnxt.h | 4 +-
> drivers/net/bonding/rte_eth_bond.h | 2 +-
> drivers/net/bonding/rte_eth_bond_8023ad.c | 26 +-
> drivers/net/bonding/rte_eth_bond_8023ad.h | 10 +-
> drivers/net/bonding/rte_eth_bond_alb.c | 78 ++---
> drivers/net/bonding/rte_eth_bond_alb.h | 10 +-
> drivers/net/bonding/rte_eth_bond_api.c | 2 +-
> drivers/net/bonding/rte_eth_bond_args.c | 2 +-
> drivers/net/bonding/rte_eth_bond_pmd.c | 194 ++++++-------
> drivers/net/bonding/rte_eth_bond_private.h | 6 +-
> drivers/net/cxgbe/base/adapter.h | 6 +-
> drivers/net/cxgbe/base/t4_hw.c | 8 +-
> drivers/net/cxgbe/cxgbe.h | 4 +-
> drivers/net/cxgbe/cxgbe_ethdev.c | 14 +-
> drivers/net/cxgbe/cxgbe_filter.h | 2 +-
> drivers/net/cxgbe/cxgbe_flow.c | 10 +-
> drivers/net/cxgbe/cxgbe_main.c | 4 +-
> drivers/net/cxgbe/cxgbe_pfvf.h | 2 +-
> drivers/net/cxgbe/cxgbevf_main.c | 2 +-
> drivers/net/cxgbe/l2t.c | 8 +-
> drivers/net/cxgbe/l2t.h | 2 +-
> drivers/net/cxgbe/mps_tcam.c | 14 +-
> drivers/net/cxgbe/mps_tcam.h | 4 +-
> drivers/net/cxgbe/sge.c | 8 +-
> drivers/net/dpaa/dpaa_ethdev.c | 20 +-
> drivers/net/dpaa/dpaa_rxtx.c | 22 +-
> drivers/net/dpaa2/dpaa2_ethdev.c | 36 +--
> drivers/net/e1000/e1000_ethdev.h | 2 +-
> drivers/net/e1000/em_ethdev.c | 34 +--
> drivers/net/e1000/em_rxtx.c | 22 +-
> drivers/net/e1000/igb_ethdev.c | 70 ++---
> drivers/net/e1000/igb_flow.c | 12 +-
> drivers/net/e1000/igb_pf.c | 16 +-
> drivers/net/e1000/igb_rxtx.c | 18 +-
> drivers/net/ena/ena_ethdev.c | 16 +-
> drivers/net/ena/ena_ethdev.h | 2 +-
> drivers/net/enetc/base/enetc_hw.h | 4 +-
> drivers/net/enetc/enetc_ethdev.c | 6 +-
> drivers/net/enic/enic.h | 2 +-
> drivers/net/enic/enic_clsf.c | 40 +--
> drivers/net/enic/enic_ethdev.c | 4 +-
> drivers/net/enic/enic_flow.c | 100 +++----
> drivers/net/enic/enic_main.c | 2 +-
> drivers/net/enic/enic_res.c | 4 +-
> drivers/net/failsafe/failsafe.c | 6 +-
> drivers/net/failsafe/failsafe_args.c | 4 +-
> drivers/net/failsafe/failsafe_ether.c | 6 +-
> drivers/net/failsafe/failsafe_ops.c | 6 +-
> drivers/net/failsafe/failsafe_private.h | 4 +-
> drivers/net/fm10k/fm10k.h | 2 +-
> drivers/net/fm10k/fm10k_ethdev.c | 18 +-
> drivers/net/i40e/base/i40e_adminq_cmd.h | 4 +-
> drivers/net/i40e/base/i40e_common.c | 12 +-
> drivers/net/i40e/base/i40e_prototype.h | 4 +-
> drivers/net/i40e/i40e_ethdev.c | 134 ++++-----
> drivers/net/i40e/i40e_ethdev.h | 22 +-
> drivers/net/i40e/i40e_ethdev_vf.c | 60 ++--
> drivers/net/i40e/i40e_fdir.c | 126 ++++----
> drivers/net/i40e/i40e_flow.c | 58 ++--
> drivers/net/i40e/i40e_pf.c | 18 +-
> drivers/net/i40e/i40e_rxtx.c | 28 +-
> drivers/net/i40e/i40e_vf_representor.c | 2 +-
> drivers/net/i40e/rte_pmd_i40e.c | 30 +-
> drivers/net/i40e/rte_pmd_i40e.h | 8 +-
> drivers/net/ixgbe/ixgbe_ethdev.c | 94 +++---
> drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
> drivers/net/ixgbe/ixgbe_flow.c | 22 +-
> drivers/net/ixgbe/ixgbe_pf.c | 14 +-
> drivers/net/ixgbe/ixgbe_rxtx.c | 14 +-
> drivers/net/ixgbe/ixgbe_vf_representor.c | 4 +-
> drivers/net/ixgbe/rte_pmd_ixgbe.c | 10 +-
> drivers/net/ixgbe/rte_pmd_ixgbe.h | 2 +-
> drivers/net/kni/rte_eth_kni.c | 4 +-
> drivers/net/liquidio/lio_ethdev.c | 22 +-
> drivers/net/mlx4/mlx4.c | 4 +-
> drivers/net/mlx4/mlx4.h | 8 +-
> drivers/net/mlx4/mlx4_ethdev.c | 8 +-
> drivers/net/mlx4/mlx4_flow.c | 14 +-
> drivers/net/mlx4/mlx4_rxtx.c | 2 +-
> drivers/net/mlx5/mlx5.c | 4 +-
> drivers/net/mlx5/mlx5.h | 14 +-
> drivers/net/mlx5/mlx5_flow.c | 22 +-
> drivers/net/mlx5/mlx5_flow_tcf.c | 40 +--
> drivers/net/mlx5/mlx5_flow_verbs.c | 26 +-
> drivers/net/mlx5/mlx5_mac.c | 18 +-
> drivers/net/mlx5/mlx5_nl.c | 28 +-
> drivers/net/mlx5/mlx5_rxtx.c | 6 +-
> drivers/net/mlx5/mlx5_rxtx.h | 2 +-
> drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 +-
> drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 10 +-
> drivers/net/mlx5/mlx5_trigger.c | 6 +-
> drivers/net/mvneta/mvneta_ethdev.c | 22 +-
> drivers/net/mvneta/mvneta_ethdev.h | 2 +-
> drivers/net/mvpp2/mrvl_ethdev.c | 22 +-
> drivers/net/mvpp2/mrvl_ethdev.h | 2 +-
> drivers/net/mvpp2/mrvl_flow.c | 4 +-
> drivers/net/netvsc/hn_ethdev.c | 4 +-
> drivers/net/netvsc/hn_nvs.c | 2 +-
> drivers/net/netvsc/hn_rndis.c | 2 +-
> drivers/net/netvsc/hn_rxtx.c | 12 +-
> drivers/net/netvsc/hn_var.h | 4 +-
> drivers/net/netvsc/hn_vf.c | 12 +-
> drivers/net/nfp/nfp_net.c | 20 +-
> drivers/net/nfp/nfp_net_pmd.h | 2 +-
> drivers/net/null/rte_eth_null.c | 6 +-
> drivers/net/octeontx/octeontx_ethdev.c | 8 +-
> drivers/net/octeontx/octeontx_ethdev.h | 2 +-
> drivers/net/pcap/rte_eth_pcap.c | 22 +-
> drivers/net/qede/base/bcm_osal.h | 2 +-
> drivers/net/qede/base/ecore_dev.c | 4 +-
> drivers/net/qede/qede_ethdev.c | 58 ++--
> drivers/net/qede/qede_ethdev.h | 6 +-
> drivers/net/qede/qede_filter.c | 66 ++---
> drivers/net/qede/qede_if.h | 4 +-
> drivers/net/qede/qede_main.c | 6 +-
> drivers/net/qede/qede_rxtx.c | 32 +-
> drivers/net/qede/qede_rxtx.h | 2 +-
> drivers/net/ring/rte_eth_ring.c | 4 +-
> drivers/net/sfc/sfc.h | 2 +-
> drivers/net/sfc/sfc_ef10_tx.c | 8 +-
> drivers/net/sfc/sfc_ethdev.c | 20 +-
> drivers/net/sfc/sfc_flow.c | 12 +-
> drivers/net/sfc/sfc_port.c | 8 +-
> drivers/net/sfc/sfc_tso.c | 8 +-
> drivers/net/softnic/parser.c | 18 +-
> drivers/net/softnic/parser.h | 2 +-
> drivers/net/softnic/rte_eth_softnic.c | 2 +-
> drivers/net/softnic/rte_eth_softnic_pipeline.c | 40 +--
> drivers/net/szedata2/rte_eth_szedata2.c | 8 +-
> drivers/net/tap/rte_eth_tap.c | 58 ++--
> drivers/net/tap/rte_eth_tap.h | 2 +-
> drivers/net/tap/tap_bpf_program.c | 14 +-
> drivers/net/tap/tap_flow.c | 12 +-
> drivers/net/thunderx/base/nicvf_mbox.c | 4 +-
> drivers/net/thunderx/base/nicvf_plat.h | 2 +-
> drivers/net/thunderx/nicvf_ethdev.c | 18 +-
> drivers/net/thunderx/nicvf_struct.h | 2 +-
> drivers/net/vdev_netvsc/vdev_netvsc.c | 16 +-
> drivers/net/vhost/rte_eth_vhost.c | 12 +-
> drivers/net/virtio/virtio_ethdev.c | 70 ++---
> drivers/net/virtio/virtio_pci.h | 4 +-
> drivers/net/virtio/virtio_rxtx.c | 28 +-
> drivers/net/virtio/virtio_user/vhost_kernel_tap.c | 2 +-
> drivers/net/virtio/virtio_user/virtio_user_dev.c | 6 +-
> drivers/net/virtio/virtio_user/virtio_user_dev.h | 2 +-
> drivers/net/virtio/virtio_user_ethdev.c | 8 +-
> drivers/net/virtio/virtqueue.h | 2 +-
> drivers/net/vmxnet3/vmxnet3_ethdev.c | 12 +-
> drivers/net/vmxnet3/vmxnet3_ethdev.h | 2 +-
> drivers/net/vmxnet3/vmxnet3_rxtx.c | 44 +--
> examples/bbdev_app/main.c | 40 +--
> examples/bond/main.c | 78 ++---
> examples/distributor/main.c | 4 +-
> examples/ethtool/ethtool-app/ethapp.c | 8 +-
> examples/ethtool/ethtool-app/main.c | 10 +-
> examples/ethtool/lib/rte_ethtool.c | 8 +-
> examples/ethtool/lib/rte_ethtool.h | 6 +-
> examples/eventdev_pipeline/main.c | 4 +-
> examples/eventdev_pipeline/pipeline_common.h | 10 +-
> examples/flow_classify/flow_classify.c | 30 +-
> examples/flow_filtering/main.c | 10 +-
> examples/ip_fragmentation/main.c | 62 ++--
> examples/ip_pipeline/cli.c | 2 +-
> examples/ip_pipeline/kni.c | 2 +-
> examples/ip_pipeline/parser.c | 18 +-
> examples/ip_pipeline/parser.h | 2 +-
> examples/ip_pipeline/pipeline.c | 40 +--
> examples/ip_reassembly/main.c | 50 ++--
> examples/ipsec-secgw/esp.c | 42 +--
> examples/ipsec-secgw/ipsec-secgw.c | 38 +--
> examples/ipsec-secgw/sa.c | 6 +-
> examples/ipv4_multicast/main.c | 58 ++--
> examples/kni/main.c | 14 +-
> examples/l2fwd-cat/l2fwd-cat.c | 4 +-
> examples/l2fwd-crypto/main.c | 26 +-
> examples/l2fwd-jobstats/main.c | 8 +-
> examples/l2fwd-keepalive/main.c | 8 +-
> examples/l2fwd/main.c | 8 +-
> examples/l3fwd-acl/main.c | 102 +++----
> examples/l3fwd-power/main.c | 100 +++----
> examples/l3fwd-vf/main.c | 68 ++---
> examples/l3fwd/l3fwd.h | 8 +-
> examples/l3fwd/l3fwd_altivec.h | 14 +-
> examples/l3fwd/l3fwd_common.h | 4 +-
> examples/l3fwd/l3fwd_em.c | 44 +--
> examples/l3fwd/l3fwd_em.h | 20 +-
> examples/l3fwd/l3fwd_em_hlm.h | 16 +-
> examples/l3fwd/l3fwd_em_hlm_neon.h | 16 +-
> examples/l3fwd/l3fwd_em_hlm_sse.h | 16 +-
> examples/l3fwd/l3fwd_em_sequential.h | 16 +-
> examples/l3fwd/l3fwd_lpm.c | 50 ++--
> examples/l3fwd/l3fwd_lpm.h | 20 +-
> examples/l3fwd/l3fwd_lpm_altivec.h | 20 +-
> examples/l3fwd/l3fwd_lpm_neon.h | 30 +-
> examples/l3fwd/l3fwd_lpm_sse.h | 20 +-
> examples/l3fwd/l3fwd_neon.h | 14 +-
> examples/l3fwd/l3fwd_sse.h | 14 +-
> examples/l3fwd/main.c | 20 +-
> examples/link_status_interrupt/main.c | 8 +-
> examples/load_balancer/runtime.c | 6 +-
> .../client_server_mp/mp_server/main.c | 2 +-
> examples/packet_ordering/main.c | 2 +-
> examples/performance-thread/l3fwd-thread/main.c | 322 ++++++++++-----------
> examples/ptpclient/ptpclient.c | 32 +-
> examples/qos_meter/main.c | 4 +-
> examples/qos_sched/init.c | 2 +-
> examples/quota_watermark/qw/main.c | 8 +-
> examples/rxtx_callbacks/main.c | 4 +-
> examples/server_node_efd/node/node.c | 6 +-
> examples/server_node_efd/server/main.c | 8 +-
> examples/skeleton/basicfwd.c | 4 +-
> examples/tep_termination/main.c | 2 +-
> examples/tep_termination/main.h | 2 +-
> examples/tep_termination/vxlan.c | 108 +++----
> examples/tep_termination/vxlan.h | 8 +-
> examples/tep_termination/vxlan_setup.c | 30 +-
> examples/tep_termination/vxlan_setup.h | 2 +-
> examples/vhost/main.c | 40 +--
> examples/vhost/main.h | 2 +-
> examples/vm_power_manager/channel_monitor.c | 2 +-
> .../guest_cli/vm_power_cli_guest.c | 2 +-
> examples/vm_power_manager/main.c | 6 +-
> examples/vmdq/main.c | 12 +-
> examples/vmdq_dcb/main.c | 12 +-
> lib/librte_cmdline/cmdline_parse_etheraddr.c | 33 +--
> lib/librte_ethdev/rte_eth_ctrl.h | 12 +-
> lib/librte_ethdev/rte_ethdev.c | 56 ++--
> lib/librte_ethdev/rte_ethdev.h | 12 +-
> lib/librte_ethdev/rte_ethdev_core.h | 12 +-
> lib/librte_ethdev/rte_flow.h | 32 +-
> lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 +-
> lib/librte_gro/gro_tcp4.c | 26 +-
> lib/librte_gro/gro_tcp4.h | 20 +-
> lib/librte_gro/gro_vxlan_tcp4.c | 64 ++--
> lib/librte_gro/gro_vxlan_tcp4.h | 6 +-
> lib/librte_gso/gso_common.h | 16 +-
> lib/librte_gso/gso_tcp4.c | 12 +-
> lib/librte_gso/gso_tunnel_tcp4.c | 14 +-
> lib/librte_gso/gso_udp4.c | 8 +-
> lib/librte_gso/rte_gso.h | 8 +-
> lib/librte_hash/rte_thash.h | 2 +-
> lib/librte_ip_frag/rte_ip_frag.h | 12 +-
> lib/librte_ip_frag/rte_ipv4_fragmentation.c | 42 +--
> lib/librte_ip_frag/rte_ipv4_reassembly.c | 14 +-
> lib/librte_ip_frag/rte_ipv6_fragmentation.c | 26 +-
> lib/librte_ip_frag/rte_ipv6_reassembly.c | 6 +-
> lib/librte_kni/rte_kni.c | 4 +-
> lib/librte_kni/rte_kni.h | 2 +-
> lib/librte_net/rte_arp.c | 32 +-
> lib/librte_net/rte_arp.h | 36 +--
> lib/librte_net/rte_esp.h | 2 +-
> lib/librte_net/rte_ether.h | 178 ++++++------
> lib/librte_net/rte_gre.h | 2 +-
> lib/librte_net/rte_icmp.h | 6 +-
> lib/librte_net/rte_ip.h | 70 ++---
> lib/librte_net/rte_net.c | 90 +++---
> lib/librte_net/rte_net.h | 22 +-
> lib/librte_net/rte_sctp.h | 2 +-
> lib/librte_net/rte_tcp.h | 2 +-
> lib/librte_net/rte_udp.h | 2 +-
> lib/librte_pipeline/rte_table_action.c | 210 +++++++-------
> lib/librte_pipeline/rte_table_action.h | 4 +-
> lib/librte_port/rte_port_ras.c | 8 +-
> lib/librte_port/rte_port_source_sink.c | 6 +-
> lib/librte_vhost/vhost.h | 2 +-
> lib/librte_vhost/virtio_net.c | 42 +--
> test/test-acl/main.c | 2 +-
> test/test-pipeline/pipeline_acl.c | 16 +-
> test/test-pipeline/pipeline_hash.c | 12 +-
> test/test/packet_burst_generator.c | 126 ++++----
> test/test/packet_burst_generator.h | 26 +-
> test/test/test_acl.c | 8 +-
> test/test/test_acl.h | 122 ++++----
> test/test/test_cmdline_etheraddr.c | 16 +-
> test/test/test_efd.c | 20 +-
> test/test/test_event_eth_rx_adapter.c | 2 +-
> test/test/test_event_eth_tx_adapter.c | 2 +-
> test/test/test_flow_classify.c | 68 ++---
> test/test/test_hash.c | 20 +-
> test/test/test_link_bonding.c | 284 +++++++++---------
> test/test/test_link_bonding_mode4.c | 116 ++++----
> test/test/test_link_bonding_rssconf.c | 6 +-
> test/test/test_lpm.c | 76 ++---
> test/test/test_lpm_perf.c | 10 +-
> test/test/test_member.c | 20 +-
> test/test/test_pmd_perf.c | 20 +-
> test/test/test_sched.c | 20 +-
> test/test/test_table_acl.c | 8 +-
> test/test/test_thash.c | 12 +-
> test/test/virtual_pmd.c | 6 +-
> test/test/virtual_pmd.h | 2 +-
> 367 files changed, 3906 insertions(+), 3913 deletions(-)
>
Since the BSD structures are available on both Linux and BSD, why is DPDK
reinventing them? There is no value in doing that.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
2018-10-24 14:56 0% ` Wiles, Keith
2018-10-24 16:09 0% ` Stephen Hemminger
@ 2018-10-24 16:39 0% ` Bruce Richardson
2018-10-26 7:20 0% ` Olivier Matz
2018-10-24 18:38 0% ` Stephen Hemminger
3 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-24 16:39 UTC (permalink / raw)
To: Olivier Matz; +Cc: dev
On Wed, Oct 24, 2018 at 10:18:19AM +0200, Olivier Matz wrote:
> This RFC targets 19.02.
>
> The rte_net headers conflict with the libc headers, because
> some definitions are duplicated, sometimes with few differences.
> This was discussed in [1], and more recently at the techboard.
>
> Before sending the deprecation notice (target for this is 18.11),
> here is a draft that can be discussed.
>
> This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> and defines in rte_net library. This is a big changeset, that will
> break the API of many functions, but not the ABI.
>
> One question I'm asking is how can we manage the transition.
> Initially, I hoped it was possible to have a compat layer during
> one release (supporting both prefixed and unprefixed names), but
> now that the patch is done, it seems the impact is too big, and
> impacts too many libraries.
>
> A few examples:
> - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> - many rte_flow structures use the rte_ prefixed net structures
> - the mac field of virtio_net structure is rte_ether_addr
> - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> ...
>
> Therefore, it is clear that doing this would break the compilation
> of many external applications.
>
Can you clarify a bit as to why we can't keep around compatibility versions
of the headers, alongside the new versions? I'm not following the logic
above. Can we not introduce completely new headers with the replacements
while leaving the old ones intact?
/Bruce
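
For reference, the kind of compatibility layer being asked about could look
roughly like the sketch below. It is only an illustration under stated
assumptions: the prefixed names are modelled on the RFC's naming scheme, while
the addr_bytes field, the function rte_is_same_ether_addr(), the
RTE_NET_NO_COMPAT switch and legacy_app_same_mac() are hypothetical and not
taken from the patches.

```c
#include <stdint.h>
#include <string.h>

/* What a new, prefixed header might contain (names assumed, not from the patches). */
#define RTE_ETHER_ADDR_LEN 6

struct rte_ether_addr {
	uint8_t addr_bytes[RTE_ETHER_ADDR_LEN];	/* field name is an assumption */
};

/* Return 1 if both addresses are equal, 0 otherwise. */
static inline int
rte_is_same_ether_addr(const struct rte_ether_addr *a,
		       const struct rte_ether_addr *b)
{
	return memcmp(a->addr_bytes, b->addr_bytes, RTE_ETHER_ADDR_LEN) == 0;
}

/* Hypothetical compat header: the old names become aliases of the new ones. */
#ifndef RTE_NET_NO_COMPAT	/* hypothetical opt-out switch */
#define ETHER_ADDR_LEN      RTE_ETHER_ADDR_LEN
#define ether_addr          rte_ether_addr
#define is_same_ether_addr  rte_is_same_ether_addr
#endif

/* Legacy application code, unchanged, still compiles through the aliases. */
int
legacy_app_same_mac(void)
{
	struct ether_addr a = { .addr_bytes = {0x02, 0, 0, 0, 0, 0x01} };
	struct ether_addr b = a;

	return is_same_ether_addr(&a, &b);
}
```

One caveat with this approach: a macro such as "#define ether_addr
rte_ether_addr" rewrites the token everywhere it appears afterwards, including
in any libc header included later that declares its own struct ether_addr, so
the aliases can reintroduce the very conflict the prefix is meant to remove.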
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
2018-10-24 14:56 0% ` Wiles, Keith
@ 2018-10-24 16:09 0% ` Stephen Hemminger
2018-10-24 16:39 0% ` Bruce Richardson
2018-10-24 18:38 0% ` Stephen Hemminger
3 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2018-10-24 16:09 UTC (permalink / raw)
To: Olivier Matz; +Cc: dev
On Wed, 24 Oct 2018 10:18:19 +0200
Olivier Matz <olivier.matz@6wind.com> wrote:
> This RFC targets 19.02.
>
> The rte_net headers conflict with the libc headers, because
> some definitions are duplicated, sometimes with few differences.
> This was discussed in [1], and more recently at the techboard.
>
> Before sending the deprecation notice (target for this is 18.11),
> here is a draft that can be discussed.
>
> This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> and defines in rte_net library. This is a big changeset, that will
> break the API of many functions, but not the ABI.
>
> One question I'm asking is how can we manage the transition.
> Initially, I hoped it was possible to have a compat layer during
> one release (supporting both prefixed and unprefixed names), but
> now that the patch is done, it seems the impact is too big, and
> impacts too many libraries.
>
> A few examples:
> - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> - many rte_flow structures use the rte_ prefixed net structures
> - the mac field of virtio_net structure is rte_ether_addr
> - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> ...
>
> Therefore, it is clear that doing this would break the compilation
> of many external applications.
>
> Another drawback we need to take into account: it will make the
> backport of patches more difficult, although this is something
> that could be tempered by a script.
>
> While it is obviously better to have a good namespace convention,
> we need to identify the issues we have today before deciding it's
> worth doing the change.
>
> Comments?
>
>
> Things that are missing in RFC:
> - test with FreeBSD
> - manually fix some indentation issues
>
>
> Olivier Matz (14):
> net: add rte prefix to arp structures
> net: add rte prefix to arp defines
> net: add rte prefix to ether structures
> net: add rte prefix to ether functions
> net: add rte prefix to ether defines
> net: add rte prefix to esp structure
> net: add rte prefix to gre structure
> net: add rte prefix to icmp structure
> net: add rte prefix to icmp defines
> net: add rte prefix to ip structure
> net: add rte prefix to ip defines
> net: add rte prefix to sctp structure
> net: add rte prefix to tcp structure
> net: add rte prefix to udp structure
>
> app/pdump/main.c | 2 +-
> app/test-eventdev/test_perf_common.c | 2 +-
> app/test-eventdev/test_pipeline_common.c | 2 +-
> app/test-pmd/cmdline.c | 66 ++---
> app/test-pmd/cmdline_flow.c | 10 +-
> app/test-pmd/config.c | 34 +--
> app/test-pmd/csumonly.c | 156 +++++-----
> app/test-pmd/flowgen.c | 30 +-
> app/test-pmd/icmpecho.c | 120 ++++----
> app/test-pmd/ieee1588fwd.c | 18 +-
> app/test-pmd/macfwd.c | 12 +-
> app/test-pmd/macswap.c | 16 +-
> app/test-pmd/parameters.c | 6 +-
> app/test-pmd/testpmd.c | 22 +-
> app/test-pmd/testpmd.h | 18 +-
> app/test-pmd/txonly.c | 36 +--
> app/test-pmd/util.c | 34 +--
> doc/guides/prog_guide/bbdev.rst | 6 +-
> .../prog_guide/packet_classif_access_ctrl.rst | 18 +-
> doc/guides/prog_guide/rte_flow.rst | 4 +-
> doc/guides/sample_app_ug/flow_classify.rst | 28 +-
> doc/guides/sample_app_ug/flow_filtering.rst | 6 +-
> doc/guides/sample_app_ug/ip_frag.rst | 16 +-
> doc/guides/sample_app_ug/ip_reassembly.rst | 16 +-
> doc/guides/sample_app_ug/ipv4_multicast.rst | 16 +-
> doc/guides/sample_app_ug/l2_forward_job_stats.rst | 6 +-
> .../sample_app_ug/l2_forward_real_virtual.rst | 6 +-
> doc/guides/sample_app_ug/l3_forward.rst | 12 +-
> doc/guides/sample_app_ug/link_status_intr.rst | 6 +-
> doc/guides/sample_app_ug/ptpclient.rst | 6 +-
> doc/guides/sample_app_ug/rxtx_callbacks.rst | 2 +-
> doc/guides/sample_app_ug/server_node_efd.rst | 12 +-
> doc/guides/sample_app_ug/skeleton.rst | 4 +-
> doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 4 +-
> drivers/bus/dpaa/base/fman/fman.c | 2 +-
> drivers/bus/dpaa/base/fman/fman_hw.c | 2 +-
> drivers/bus/dpaa/include/fman.h | 2 +-
> drivers/bus/dpaa/include/netcfg.h | 4 +-
> drivers/net/af_packet/rte_eth_af_packet.c | 2 +-
> drivers/net/ark/ark_ethdev.c | 16 +-
> drivers/net/ark/ark_ext.h | 4 +-
> drivers/net/ark/ark_global.h | 4 +-
> drivers/net/atlantic/atl_ethdev.c | 20 +-
> drivers/net/atlantic/hw_atl/hw_atl_utils.c | 8 +-
> drivers/net/atlantic/hw_atl/hw_atl_utils_fw2x.c | 4 +-
> drivers/net/avf/avf.h | 4 +-
> drivers/net/avf/avf_ethdev.c | 50 ++--
> drivers/net/avf/avf_rxtx.c | 14 +-
> drivers/net/avf/avf_vchnl.c | 8 +-
> drivers/net/avf/base/avf_adminq_cmd.h | 4 +-
> drivers/net/avf/base/avf_common.c | 12 +-
> drivers/net/avf/base/avf_prototype.h | 4 +-
> drivers/net/avp/avp_ethdev.c | 20 +-
> drivers/net/avp/rte_avp_common.h | 2 +-
> drivers/net/axgbe/axgbe_dev.c | 4 +-
> drivers/net/axgbe/axgbe_ethdev.c | 10 +-
> drivers/net/axgbe/axgbe_ethdev.h | 4 +-
> drivers/net/axgbe/axgbe_rxtx.c | 2 +-
> drivers/net/bnx2x/bnx2x.c | 16 +-
> drivers/net/bnx2x/bnx2x_ethdev.c | 4 +-
> drivers/net/bnx2x/bnx2x_ethdev.h | 2 +-
> drivers/net/bnx2x/bnx2x_vfpf.c | 8 +-
> drivers/net/bnx2x/bnx2x_vfpf.h | 2 +-
> drivers/net/bnx2x/ecore_sp.h | 2 +-
> drivers/net/bnxt/bnxt.h | 4 +-
> drivers/net/bnxt/bnxt_ethdev.c | 70 ++---
> drivers/net/bnxt/bnxt_filter.c | 4 +-
> drivers/net/bnxt/bnxt_filter.h | 8 +-
> drivers/net/bnxt/bnxt_flow.c | 26 +-
> drivers/net/bnxt/bnxt_hwrm.c | 40 +--
> drivers/net/bnxt/bnxt_hwrm.h | 2 +-
> drivers/net/bnxt/bnxt_ring.c | 8 +-
> drivers/net/bnxt/bnxt_rxq.c | 2 +-
> drivers/net/bnxt/bnxt_rxr.c | 2 +-
> drivers/net/bnxt/bnxt_vnic.c | 2 +-
> drivers/net/bnxt/rte_pmd_bnxt.c | 14 +-
> drivers/net/bnxt/rte_pmd_bnxt.h | 4 +-
> drivers/net/bonding/rte_eth_bond.h | 2 +-
> drivers/net/bonding/rte_eth_bond_8023ad.c | 26 +-
> drivers/net/bonding/rte_eth_bond_8023ad.h | 10 +-
> drivers/net/bonding/rte_eth_bond_alb.c | 78 ++---
> drivers/net/bonding/rte_eth_bond_alb.h | 10 +-
> drivers/net/bonding/rte_eth_bond_api.c | 2 +-
> drivers/net/bonding/rte_eth_bond_args.c | 2 +-
> drivers/net/bonding/rte_eth_bond_pmd.c | 194 ++++++-------
> drivers/net/bonding/rte_eth_bond_private.h | 6 +-
> drivers/net/cxgbe/base/adapter.h | 6 +-
> drivers/net/cxgbe/base/t4_hw.c | 8 +-
> drivers/net/cxgbe/cxgbe.h | 4 +-
> drivers/net/cxgbe/cxgbe_ethdev.c | 14 +-
> drivers/net/cxgbe/cxgbe_filter.h | 2 +-
> drivers/net/cxgbe/cxgbe_flow.c | 10 +-
> drivers/net/cxgbe/cxgbe_main.c | 4 +-
> drivers/net/cxgbe/cxgbe_pfvf.h | 2 +-
> drivers/net/cxgbe/cxgbevf_main.c | 2 +-
> drivers/net/cxgbe/l2t.c | 8 +-
> drivers/net/cxgbe/l2t.h | 2 +-
> drivers/net/cxgbe/mps_tcam.c | 14 +-
> drivers/net/cxgbe/mps_tcam.h | 4 +-
> drivers/net/cxgbe/sge.c | 8 +-
> drivers/net/dpaa/dpaa_ethdev.c | 20 +-
> drivers/net/dpaa/dpaa_rxtx.c | 22 +-
> drivers/net/dpaa2/dpaa2_ethdev.c | 36 +--
> drivers/net/e1000/e1000_ethdev.h | 2 +-
> drivers/net/e1000/em_ethdev.c | 34 +--
> drivers/net/e1000/em_rxtx.c | 22 +-
> drivers/net/e1000/igb_ethdev.c | 70 ++---
> drivers/net/e1000/igb_flow.c | 12 +-
> drivers/net/e1000/igb_pf.c | 16 +-
> drivers/net/e1000/igb_rxtx.c | 18 +-
> drivers/net/ena/ena_ethdev.c | 16 +-
> drivers/net/ena/ena_ethdev.h | 2 +-
> drivers/net/enetc/base/enetc_hw.h | 4 +-
> drivers/net/enetc/enetc_ethdev.c | 6 +-
> drivers/net/enic/enic.h | 2 +-
> drivers/net/enic/enic_clsf.c | 40 +--
> drivers/net/enic/enic_ethdev.c | 4 +-
> drivers/net/enic/enic_flow.c | 100 +++----
> drivers/net/enic/enic_main.c | 2 +-
> drivers/net/enic/enic_res.c | 4 +-
> drivers/net/failsafe/failsafe.c | 6 +-
> drivers/net/failsafe/failsafe_args.c | 4 +-
> drivers/net/failsafe/failsafe_ether.c | 6 +-
> drivers/net/failsafe/failsafe_ops.c | 6 +-
> drivers/net/failsafe/failsafe_private.h | 4 +-
> drivers/net/fm10k/fm10k.h | 2 +-
> drivers/net/fm10k/fm10k_ethdev.c | 18 +-
> drivers/net/i40e/base/i40e_adminq_cmd.h | 4 +-
> drivers/net/i40e/base/i40e_common.c | 12 +-
> drivers/net/i40e/base/i40e_prototype.h | 4 +-
> drivers/net/i40e/i40e_ethdev.c | 134 ++++-----
> drivers/net/i40e/i40e_ethdev.h | 22 +-
> drivers/net/i40e/i40e_ethdev_vf.c | 60 ++--
> drivers/net/i40e/i40e_fdir.c | 126 ++++----
> drivers/net/i40e/i40e_flow.c | 58 ++--
> drivers/net/i40e/i40e_pf.c | 18 +-
> drivers/net/i40e/i40e_rxtx.c | 28 +-
> drivers/net/i40e/i40e_vf_representor.c | 2 +-
> drivers/net/i40e/rte_pmd_i40e.c | 30 +-
> drivers/net/i40e/rte_pmd_i40e.h | 8 +-
> drivers/net/ixgbe/ixgbe_ethdev.c | 94 +++---
> drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
> drivers/net/ixgbe/ixgbe_flow.c | 22 +-
> drivers/net/ixgbe/ixgbe_pf.c | 14 +-
> drivers/net/ixgbe/ixgbe_rxtx.c | 14 +-
> drivers/net/ixgbe/ixgbe_vf_representor.c | 4 +-
> drivers/net/ixgbe/rte_pmd_ixgbe.c | 10 +-
> drivers/net/ixgbe/rte_pmd_ixgbe.h | 2 +-
> drivers/net/kni/rte_eth_kni.c | 4 +-
> drivers/net/liquidio/lio_ethdev.c | 22 +-
> drivers/net/mlx4/mlx4.c | 4 +-
> drivers/net/mlx4/mlx4.h | 8 +-
> drivers/net/mlx4/mlx4_ethdev.c | 8 +-
> drivers/net/mlx4/mlx4_flow.c | 14 +-
> drivers/net/mlx4/mlx4_rxtx.c | 2 +-
> drivers/net/mlx5/mlx5.c | 4 +-
> drivers/net/mlx5/mlx5.h | 14 +-
> drivers/net/mlx5/mlx5_flow.c | 22 +-
> drivers/net/mlx5/mlx5_flow_tcf.c | 40 +--
> drivers/net/mlx5/mlx5_flow_verbs.c | 26 +-
> drivers/net/mlx5/mlx5_mac.c | 18 +-
> drivers/net/mlx5/mlx5_nl.c | 28 +-
> drivers/net/mlx5/mlx5_rxtx.c | 6 +-
> drivers/net/mlx5/mlx5_rxtx.h | 2 +-
> drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 +-
> drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 10 +-
> drivers/net/mlx5/mlx5_trigger.c | 6 +-
> drivers/net/mvneta/mvneta_ethdev.c | 22 +-
> drivers/net/mvneta/mvneta_ethdev.h | 2 +-
> drivers/net/mvpp2/mrvl_ethdev.c | 22 +-
> drivers/net/mvpp2/mrvl_ethdev.h | 2 +-
> drivers/net/mvpp2/mrvl_flow.c | 4 +-
> drivers/net/netvsc/hn_ethdev.c | 4 +-
> drivers/net/netvsc/hn_nvs.c | 2 +-
> drivers/net/netvsc/hn_rndis.c | 2 +-
> drivers/net/netvsc/hn_rxtx.c | 12 +-
> drivers/net/netvsc/hn_var.h | 4 +-
> drivers/net/netvsc/hn_vf.c | 12 +-
> drivers/net/nfp/nfp_net.c | 20 +-
> drivers/net/nfp/nfp_net_pmd.h | 2 +-
> drivers/net/null/rte_eth_null.c | 6 +-
> drivers/net/octeontx/octeontx_ethdev.c | 8 +-
> drivers/net/octeontx/octeontx_ethdev.h | 2 +-
> drivers/net/pcap/rte_eth_pcap.c | 22 +-
> drivers/net/qede/base/bcm_osal.h | 2 +-
> drivers/net/qede/base/ecore_dev.c | 4 +-
> drivers/net/qede/qede_ethdev.c | 58 ++--
> drivers/net/qede/qede_ethdev.h | 6 +-
> drivers/net/qede/qede_filter.c | 66 ++---
> drivers/net/qede/qede_if.h | 4 +-
> drivers/net/qede/qede_main.c | 6 +-
> drivers/net/qede/qede_rxtx.c | 32 +-
> drivers/net/qede/qede_rxtx.h | 2 +-
> drivers/net/ring/rte_eth_ring.c | 4 +-
> drivers/net/sfc/sfc.h | 2 +-
> drivers/net/sfc/sfc_ef10_tx.c | 8 +-
> drivers/net/sfc/sfc_ethdev.c | 20 +-
> drivers/net/sfc/sfc_flow.c | 12 +-
> drivers/net/sfc/sfc_port.c | 8 +-
> drivers/net/sfc/sfc_tso.c | 8 +-
> drivers/net/softnic/parser.c | 18 +-
> drivers/net/softnic/parser.h | 2 +-
> drivers/net/softnic/rte_eth_softnic.c | 2 +-
> drivers/net/softnic/rte_eth_softnic_pipeline.c | 40 +--
> drivers/net/szedata2/rte_eth_szedata2.c | 8 +-
> drivers/net/tap/rte_eth_tap.c | 58 ++--
> drivers/net/tap/rte_eth_tap.h | 2 +-
> drivers/net/tap/tap_bpf_program.c | 14 +-
> drivers/net/tap/tap_flow.c | 12 +-
> drivers/net/thunderx/base/nicvf_mbox.c | 4 +-
> drivers/net/thunderx/base/nicvf_plat.h | 2 +-
> drivers/net/thunderx/nicvf_ethdev.c | 18 +-
> drivers/net/thunderx/nicvf_struct.h | 2 +-
> drivers/net/vdev_netvsc/vdev_netvsc.c | 16 +-
> drivers/net/vhost/rte_eth_vhost.c | 12 +-
> drivers/net/virtio/virtio_ethdev.c | 70 ++---
> drivers/net/virtio/virtio_pci.h | 4 +-
> drivers/net/virtio/virtio_rxtx.c | 28 +-
> drivers/net/virtio/virtio_user/vhost_kernel_tap.c | 2 +-
> drivers/net/virtio/virtio_user/virtio_user_dev.c | 6 +-
> drivers/net/virtio/virtio_user/virtio_user_dev.h | 2 +-
> drivers/net/virtio/virtio_user_ethdev.c | 8 +-
> drivers/net/virtio/virtqueue.h | 2 +-
> drivers/net/vmxnet3/vmxnet3_ethdev.c | 12 +-
> drivers/net/vmxnet3/vmxnet3_ethdev.h | 2 +-
> drivers/net/vmxnet3/vmxnet3_rxtx.c | 44 +--
> examples/bbdev_app/main.c | 40 +--
> examples/bond/main.c | 78 ++---
> examples/distributor/main.c | 4 +-
> examples/ethtool/ethtool-app/ethapp.c | 8 +-
> examples/ethtool/ethtool-app/main.c | 10 +-
> examples/ethtool/lib/rte_ethtool.c | 8 +-
> examples/ethtool/lib/rte_ethtool.h | 6 +-
> examples/eventdev_pipeline/main.c | 4 +-
> examples/eventdev_pipeline/pipeline_common.h | 10 +-
> examples/flow_classify/flow_classify.c | 30 +-
> examples/flow_filtering/main.c | 10 +-
> examples/ip_fragmentation/main.c | 62 ++--
> examples/ip_pipeline/cli.c | 2 +-
> examples/ip_pipeline/kni.c | 2 +-
> examples/ip_pipeline/parser.c | 18 +-
> examples/ip_pipeline/parser.h | 2 +-
> examples/ip_pipeline/pipeline.c | 40 +--
> examples/ip_reassembly/main.c | 50 ++--
> examples/ipsec-secgw/esp.c | 42 +--
> examples/ipsec-secgw/ipsec-secgw.c | 38 +--
> examples/ipsec-secgw/sa.c | 6 +-
> examples/ipv4_multicast/main.c | 58 ++--
> examples/kni/main.c | 14 +-
> examples/l2fwd-cat/l2fwd-cat.c | 4 +-
> examples/l2fwd-crypto/main.c | 26 +-
> examples/l2fwd-jobstats/main.c | 8 +-
> examples/l2fwd-keepalive/main.c | 8 +-
> examples/l2fwd/main.c | 8 +-
> examples/l3fwd-acl/main.c | 102 +++----
> examples/l3fwd-power/main.c | 100 +++----
> examples/l3fwd-vf/main.c | 68 ++---
> examples/l3fwd/l3fwd.h | 8 +-
> examples/l3fwd/l3fwd_altivec.h | 14 +-
> examples/l3fwd/l3fwd_common.h | 4 +-
> examples/l3fwd/l3fwd_em.c | 44 +--
> examples/l3fwd/l3fwd_em.h | 20 +-
> examples/l3fwd/l3fwd_em_hlm.h | 16 +-
> examples/l3fwd/l3fwd_em_hlm_neon.h | 16 +-
> examples/l3fwd/l3fwd_em_hlm_sse.h | 16 +-
> examples/l3fwd/l3fwd_em_sequential.h | 16 +-
> examples/l3fwd/l3fwd_lpm.c | 50 ++--
> examples/l3fwd/l3fwd_lpm.h | 20 +-
> examples/l3fwd/l3fwd_lpm_altivec.h | 20 +-
> examples/l3fwd/l3fwd_lpm_neon.h | 30 +-
> examples/l3fwd/l3fwd_lpm_sse.h | 20 +-
> examples/l3fwd/l3fwd_neon.h | 14 +-
> examples/l3fwd/l3fwd_sse.h | 14 +-
> examples/l3fwd/main.c | 20 +-
> examples/link_status_interrupt/main.c | 8 +-
> examples/load_balancer/runtime.c | 6 +-
> .../client_server_mp/mp_server/main.c | 2 +-
> examples/packet_ordering/main.c | 2 +-
> examples/performance-thread/l3fwd-thread/main.c | 322 ++++++++++-----------
> examples/ptpclient/ptpclient.c | 32 +-
> examples/qos_meter/main.c | 4 +-
> examples/qos_sched/init.c | 2 +-
> examples/quota_watermark/qw/main.c | 8 +-
> examples/rxtx_callbacks/main.c | 4 +-
> examples/server_node_efd/node/node.c | 6 +-
> examples/server_node_efd/server/main.c | 8 +-
> examples/skeleton/basicfwd.c | 4 +-
> examples/tep_termination/main.c | 2 +-
> examples/tep_termination/main.h | 2 +-
> examples/tep_termination/vxlan.c | 108 +++----
> examples/tep_termination/vxlan.h | 8 +-
> examples/tep_termination/vxlan_setup.c | 30 +-
> examples/tep_termination/vxlan_setup.h | 2 +-
> examples/vhost/main.c | 40 +--
> examples/vhost/main.h | 2 +-
> examples/vm_power_manager/channel_monitor.c | 2 +-
> .../guest_cli/vm_power_cli_guest.c | 2 +-
> examples/vm_power_manager/main.c | 6 +-
> examples/vmdq/main.c | 12 +-
> examples/vmdq_dcb/main.c | 12 +-
> lib/librte_cmdline/cmdline_parse_etheraddr.c | 33 +--
> lib/librte_ethdev/rte_eth_ctrl.h | 12 +-
> lib/librte_ethdev/rte_ethdev.c | 56 ++--
> lib/librte_ethdev/rte_ethdev.h | 12 +-
> lib/librte_ethdev/rte_ethdev_core.h | 12 +-
> lib/librte_ethdev/rte_flow.h | 32 +-
> lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 +-
> lib/librte_gro/gro_tcp4.c | 26 +-
> lib/librte_gro/gro_tcp4.h | 20 +-
> lib/librte_gro/gro_vxlan_tcp4.c | 64 ++--
> lib/librte_gro/gro_vxlan_tcp4.h | 6 +-
> lib/librte_gso/gso_common.h | 16 +-
> lib/librte_gso/gso_tcp4.c | 12 +-
> lib/librte_gso/gso_tunnel_tcp4.c | 14 +-
> lib/librte_gso/gso_udp4.c | 8 +-
> lib/librte_gso/rte_gso.h | 8 +-
> lib/librte_hash/rte_thash.h | 2 +-
> lib/librte_ip_frag/rte_ip_frag.h | 12 +-
> lib/librte_ip_frag/rte_ipv4_fragmentation.c | 42 +--
> lib/librte_ip_frag/rte_ipv4_reassembly.c | 14 +-
> lib/librte_ip_frag/rte_ipv6_fragmentation.c | 26 +-
> lib/librte_ip_frag/rte_ipv6_reassembly.c | 6 +-
> lib/librte_kni/rte_kni.c | 4 +-
> lib/librte_kni/rte_kni.h | 2 +-
> lib/librte_net/rte_arp.c | 32 +-
> lib/librte_net/rte_arp.h | 36 +--
> lib/librte_net/rte_esp.h | 2 +-
> lib/librte_net/rte_ether.h | 178 ++++++------
> lib/librte_net/rte_gre.h | 2 +-
> lib/librte_net/rte_icmp.h | 6 +-
> lib/librte_net/rte_ip.h | 70 ++---
> lib/librte_net/rte_net.c | 90 +++---
> lib/librte_net/rte_net.h | 22 +-
> lib/librte_net/rte_sctp.h | 2 +-
> lib/librte_net/rte_tcp.h | 2 +-
> lib/librte_net/rte_udp.h | 2 +-
> lib/librte_pipeline/rte_table_action.c | 210 +++++++-------
> lib/librte_pipeline/rte_table_action.h | 4 +-
> lib/librte_port/rte_port_ras.c | 8 +-
> lib/librte_port/rte_port_source_sink.c | 6 +-
> lib/librte_vhost/vhost.h | 2 +-
> lib/librte_vhost/virtio_net.c | 42 +--
> test/test-acl/main.c | 2 +-
> test/test-pipeline/pipeline_acl.c | 16 +-
> test/test-pipeline/pipeline_hash.c | 12 +-
> test/test/packet_burst_generator.c | 126 ++++----
> test/test/packet_burst_generator.h | 26 +-
> test/test/test_acl.c | 8 +-
> test/test/test_acl.h | 122 ++++----
> test/test/test_cmdline_etheraddr.c | 16 +-
> test/test/test_efd.c | 20 +-
> test/test/test_event_eth_rx_adapter.c | 2 +-
> test/test/test_event_eth_tx_adapter.c | 2 +-
> test/test/test_flow_classify.c | 68 ++---
> test/test/test_hash.c | 20 +-
> test/test/test_link_bonding.c | 284 +++++++++---------
> test/test/test_link_bonding_mode4.c | 116 ++++----
> test/test/test_link_bonding_rssconf.c | 6 +-
> test/test/test_lpm.c | 76 ++---
> test/test/test_lpm_perf.c | 10 +-
> test/test/test_member.c | 20 +-
> test/test/test_pmd_perf.c | 20 +-
> test/test/test_sched.c | 20 +-
> test/test/test_table_acl.c | 8 +-
> test/test/test_thash.c | 12 +-
> test/test/virtual_pmd.c | 6 +-
> test/test/virtual_pmd.h | 2 +-
> 367 files changed, 3906 insertions(+), 3913 deletions(-)
>
The Linux network developers and Glibc have already agreed on how to handle
overlap. Perhaps that policy could be used/extended rather than breaking
every userspace application.
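
As a rough illustration of that style of coordination: as far as I know, the
kernel UAPI headers use guard macros (the __UAPI_DEF_* family in
libc-compat.h) so that a definition is skipped when libc has already provided
it. The sketch below mimics the idea in a single file. The macro names
LIBC_HAS_ETHER_ADDR and DPDK_DEF_ETHER_ADDR and the helper
first_octet_is_multicast() are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Stand-in for the libc header: it defines the type and announces that it did. */
#define LIBC_HAS_ETHER_ADDR 1	/* hypothetical marker macro */
struct ether_addr {
	uint8_t ether_addr_octet[6];
};

/* Stand-in for the second header: it checks the marker before defining anything. */
#ifdef LIBC_HAS_ETHER_ADDR
#define DPDK_DEF_ETHER_ADDR 0	/* hypothetical: libc already defined the struct */
#else
#define DPDK_DEF_ETHER_ADDR 1
#endif

#if DPDK_DEF_ETHER_ADDR
struct ether_addr {
	uint8_t ether_addr_octet[6];
};
#endif

/* Code using the shared name works no matter which header supplied the type. */
int
first_octet_is_multicast(const struct ether_addr *ea)
{
	/* The least significant bit of the first octet marks a group address. */
	return (ea->ether_addr_octet[0] & 0x01) != 0;
}
```

This only works when both sides agree on the same layout and member names for
the shared type, which is the part that would need to be checked against the
existing rte_net definitions.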
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [RFC 00/14] prefix network structures
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
@ 2018-10-24 14:56 0% ` Wiles, Keith
2018-10-26 7:22 0% ` Olivier Matz
2018-10-24 16:09 0% ` Stephen Hemminger
` (2 subsequent siblings)
3 siblings, 1 reply; 200+ results
From: Wiles, Keith @ 2018-10-24 14:56 UTC (permalink / raw)
To: Olivier Matz; +Cc: dpdk-dev
> On Oct 24, 2018, at 1:18 AM, Olivier Matz <olivier.matz@6wind.com> wrote:
>
> This RFC targets 19.02.
>
> The rte_net headers conflict with the libc headers, because
> some definitions are duplicated, sometimes with few differences.
> This was discussed in [1], and more recently at the techboard.
>
> Before sending the deprecation notice (target for this is 18.11),
> here is a draft that can be discussed.
>
> This RFC adds the rte_ (or RTE_) prefix to all structures, functions
> and defines in rte_net library. This is a big changeset, that will
> break the API of many functions, but not the ABI.
>
> One question I'm asking is how can we manage the transition.
> Initially, I hoped it was possible to have a compat layer during
> one release (supporting both prefixed and unprefixed names), but
> now that the patch is done, it seems the impact is too big, and
> impacts too many libraries.
>
> A few examples:
> - rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
> - many rte_flow structures use the rte_ prefixed net structures
> - the mac field of virtio_net structure is rte_ether_addr
> - the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
> ...
>
> Therefore, it is clear that doing this would break the compilation
> of many external applications.
>
> Another drawback we need to take into account: it will make the
> backport of patches more difficult, although this is something
> that could be tempered by a script.
>
> While it is obviously better to have a good namespace convention,
> we need to identify the issues we have today before deciding it's
> worth doing the change.
>
> Comments?
I did not see the deprecation notice in the patches below, but I could have missed it.
>
>
> Things that are missing in RFC:
> - test with FreeBSD
> - manually fix some indentation issues
>
>
Regards,
Keith
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [RFC 00/14] prefix network structures
@ 2018-10-24 8:18 1% Olivier Matz
2018-10-24 14:56 0% ` Wiles, Keith
` (3 more replies)
0 siblings, 4 replies; 200+ results
From: Olivier Matz @ 2018-10-24 8:18 UTC (permalink / raw)
To: dev
This RFC targets 19.02.
The rte_net headers conflict with the libc headers because
some definitions are duplicated, sometimes with slight differences.
This was discussed in [1], and more recently at the techboard.
Before sending the deprecation notice (target for this is 18.11),
here is a draft that can be discussed.
This RFC adds the rte_ (or RTE_) prefix to all structures, functions
and defines in the rte_net library. This is a big changeset that will
break the API of many functions, but not the ABI.
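To illustrate the "API but not ABI" point, here is a minimal sketch. Both structures are local stand-ins written for this example (the real definitions live in rte_ether.h, and old_ether_addr is a made-up name so that both versions fit in one file). It assumes the rename changes only the identifier, not the layout:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the unprefixed definition (hypothetical name, so that
 * both versions can coexist in this file). */
struct old_ether_addr {
	uint8_t addr_bytes[6];
};

/* Stand-in for the prefixed definition proposed by the RFC. */
struct rte_ether_addr {
	uint8_t addr_bytes[6];
};

/* Same size and same member offset: the binary layout is unchanged. */
_Static_assert(sizeof(struct old_ether_addr) ==
	       sizeof(struct rte_ether_addr), "size differs");
_Static_assert(offsetof(struct old_ether_addr, addr_bytes) ==
	       offsetof(struct rte_ether_addr, addr_bytes), "offset differs");

/* Bytes written through the old type are read back unchanged through
 * the new type. Returns 1 when both views agree. */
int same_layout(void)
{
	struct old_ether_addr o = { { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 } };
	struct rte_ether_addr n;

	memcpy(&n, &o, sizeof(n));
	return memcmp(n.addr_bytes, o.addr_bytes, sizeof(n.addr_bytes)) == 0;
}
```

Source code that spells the old name has to be edited to compile again, while an already-built binary would see the same bytes at the same offsets.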
One question is how we can manage the transition.
Initially, I hoped it would be possible to keep a compat layer for
one release (supporting both prefixed and unprefixed names), but
now that the patch is done, the impact seems too big: it touches
too many libraries.
A few examples:
- rte_eth_macaddr_get/add/remove() use a (struct rte_ether_addr *)
- many rte_flow structures use the rte_ prefixed net structures
- the mac field of virtio_net structure is rte_ether_addr
- the first arg of rte_thash_load_v6_addrs is (struct rte_ipv6_hdr *)
...
Therefore, it is clear that doing this would break the compilation
of many external applications.
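For discussion, a compat layer of the kind mentioned above could look roughly like the sketch below. Everything in it is an assumption made for illustration: NET_NO_COMPAT_NAMES, get_mac() and legacy_caller() are hypothetical names, and the structure is a local stand-in rather than the real header.

```c
#include <stdint.h>
#include <string.h>

#define RTE_ETHER_ADDR_LEN 6

/* Local stand-in for the prefixed structure. */
struct rte_ether_addr {
	uint8_t addr_bytes[RTE_ETHER_ADDR_LEN];
};

/* Hypothetical transition switch, not an existing build option:
 * map the old names onto the new ones unless the application opts out. */
#ifndef NET_NO_COMPAT_NAMES
#define ether_addr     rte_ether_addr
#define ETHER_ADDR_LEN RTE_ETHER_ADDR_LEN
#endif

/* Hypothetical API function using the prefixed type in its prototype. */
void get_mac(struct rte_ether_addr *mac)
{
	static const uint8_t demo[RTE_ETHER_ADDR_LEN] = {
		0x02, 0x00, 0x00, 0x00, 0x00, 0x01
	};

	memcpy(mac->addr_bytes, demo, sizeof(demo));
}

/* Legacy caller, still written against the unprefixed names. */
int legacy_caller(void)
{
	struct ether_addr mac; /* expands to struct rte_ether_addr */

	get_mac(&mac);
	return mac.addr_bytes[ETHER_ADDR_LEN - 1];
}
```

A macro alias like this keeps old callers compiling, but it also rewrites every later use of the token ether_addr, including the one in the libc headers, so it would likely bring back the very conflict the prefix is meant to remove. That is one way to see why a clean compat layer is hard here.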
Another drawback we need to take into account: it will make
backporting patches more difficult, although this is something
that could be mitigated by a script.
While it is obviously better to have a good namespace convention,
we need to identify the issues we have today before deciding whether
the change is worth doing.
Comments?
Things that are missing in this RFC:
- test with FreeBSD
- manually fix some indentation issues
Olivier Matz (14):
net: add rte prefix to arp structures
net: add rte prefix to arp defines
net: add rte prefix to ether structures
net: add rte prefix to ether functions
net: add rte prefix to ether defines
net: add rte prefix to esp structure
net: add rte prefix to gre structure
net: add rte prefix to icmp structure
net: add rte prefix to icmp defines
net: add rte prefix to ip structure
net: add rte prefix to ip defines
net: add rte prefix to sctp structure
net: add rte prefix to tcp structure
net: add rte prefix to udp structure
app/pdump/main.c | 2 +-
app/test-eventdev/test_perf_common.c | 2 +-
app/test-eventdev/test_pipeline_common.c | 2 +-
app/test-pmd/cmdline.c | 66 ++---
app/test-pmd/cmdline_flow.c | 10 +-
app/test-pmd/config.c | 34 +--
app/test-pmd/csumonly.c | 156 +++++-----
app/test-pmd/flowgen.c | 30 +-
app/test-pmd/icmpecho.c | 120 ++++----
app/test-pmd/ieee1588fwd.c | 18 +-
app/test-pmd/macfwd.c | 12 +-
app/test-pmd/macswap.c | 16 +-
app/test-pmd/parameters.c | 6 +-
app/test-pmd/testpmd.c | 22 +-
app/test-pmd/testpmd.h | 18 +-
app/test-pmd/txonly.c | 36 +--
app/test-pmd/util.c | 34 +--
doc/guides/prog_guide/bbdev.rst | 6 +-
.../prog_guide/packet_classif_access_ctrl.rst | 18 +-
doc/guides/prog_guide/rte_flow.rst | 4 +-
doc/guides/sample_app_ug/flow_classify.rst | 28 +-
doc/guides/sample_app_ug/flow_filtering.rst | 6 +-
doc/guides/sample_app_ug/ip_frag.rst | 16 +-
doc/guides/sample_app_ug/ip_reassembly.rst | 16 +-
doc/guides/sample_app_ug/ipv4_multicast.rst | 16 +-
doc/guides/sample_app_ug/l2_forward_job_stats.rst | 6 +-
.../sample_app_ug/l2_forward_real_virtual.rst | 6 +-
doc/guides/sample_app_ug/l3_forward.rst | 12 +-
doc/guides/sample_app_ug/link_status_intr.rst | 6 +-
doc/guides/sample_app_ug/ptpclient.rst | 6 +-
doc/guides/sample_app_ug/rxtx_callbacks.rst | 2 +-
doc/guides/sample_app_ug/server_node_efd.rst | 12 +-
doc/guides/sample_app_ug/skeleton.rst | 4 +-
doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst | 4 +-
drivers/bus/dpaa/base/fman/fman.c | 2 +-
drivers/bus/dpaa/base/fman/fman_hw.c | 2 +-
drivers/bus/dpaa/include/fman.h | 2 +-
drivers/bus/dpaa/include/netcfg.h | 4 +-
drivers/net/af_packet/rte_eth_af_packet.c | 2 +-
drivers/net/ark/ark_ethdev.c | 16 +-
drivers/net/ark/ark_ext.h | 4 +-
drivers/net/ark/ark_global.h | 4 +-
drivers/net/atlantic/atl_ethdev.c | 20 +-
drivers/net/atlantic/hw_atl/hw_atl_utils.c | 8 +-
drivers/net/atlantic/hw_atl/hw_atl_utils_fw2x.c | 4 +-
drivers/net/avf/avf.h | 4 +-
drivers/net/avf/avf_ethdev.c | 50 ++--
drivers/net/avf/avf_rxtx.c | 14 +-
drivers/net/avf/avf_vchnl.c | 8 +-
drivers/net/avf/base/avf_adminq_cmd.h | 4 +-
drivers/net/avf/base/avf_common.c | 12 +-
drivers/net/avf/base/avf_prototype.h | 4 +-
drivers/net/avp/avp_ethdev.c | 20 +-
drivers/net/avp/rte_avp_common.h | 2 +-
drivers/net/axgbe/axgbe_dev.c | 4 +-
drivers/net/axgbe/axgbe_ethdev.c | 10 +-
drivers/net/axgbe/axgbe_ethdev.h | 4 +-
drivers/net/axgbe/axgbe_rxtx.c | 2 +-
drivers/net/bnx2x/bnx2x.c | 16 +-
drivers/net/bnx2x/bnx2x_ethdev.c | 4 +-
drivers/net/bnx2x/bnx2x_ethdev.h | 2 +-
drivers/net/bnx2x/bnx2x_vfpf.c | 8 +-
drivers/net/bnx2x/bnx2x_vfpf.h | 2 +-
drivers/net/bnx2x/ecore_sp.h | 2 +-
drivers/net/bnxt/bnxt.h | 4 +-
drivers/net/bnxt/bnxt_ethdev.c | 70 ++---
drivers/net/bnxt/bnxt_filter.c | 4 +-
drivers/net/bnxt/bnxt_filter.h | 8 +-
drivers/net/bnxt/bnxt_flow.c | 26 +-
drivers/net/bnxt/bnxt_hwrm.c | 40 +--
drivers/net/bnxt/bnxt_hwrm.h | 2 +-
drivers/net/bnxt/bnxt_ring.c | 8 +-
drivers/net/bnxt/bnxt_rxq.c | 2 +-
drivers/net/bnxt/bnxt_rxr.c | 2 +-
drivers/net/bnxt/bnxt_vnic.c | 2 +-
drivers/net/bnxt/rte_pmd_bnxt.c | 14 +-
drivers/net/bnxt/rte_pmd_bnxt.h | 4 +-
drivers/net/bonding/rte_eth_bond.h | 2 +-
drivers/net/bonding/rte_eth_bond_8023ad.c | 26 +-
drivers/net/bonding/rte_eth_bond_8023ad.h | 10 +-
drivers/net/bonding/rte_eth_bond_alb.c | 78 ++---
drivers/net/bonding/rte_eth_bond_alb.h | 10 +-
drivers/net/bonding/rte_eth_bond_api.c | 2 +-
drivers/net/bonding/rte_eth_bond_args.c | 2 +-
drivers/net/bonding/rte_eth_bond_pmd.c | 194 ++++++-------
drivers/net/bonding/rte_eth_bond_private.h | 6 +-
drivers/net/cxgbe/base/adapter.h | 6 +-
drivers/net/cxgbe/base/t4_hw.c | 8 +-
drivers/net/cxgbe/cxgbe.h | 4 +-
drivers/net/cxgbe/cxgbe_ethdev.c | 14 +-
drivers/net/cxgbe/cxgbe_filter.h | 2 +-
drivers/net/cxgbe/cxgbe_flow.c | 10 +-
drivers/net/cxgbe/cxgbe_main.c | 4 +-
drivers/net/cxgbe/cxgbe_pfvf.h | 2 +-
drivers/net/cxgbe/cxgbevf_main.c | 2 +-
drivers/net/cxgbe/l2t.c | 8 +-
drivers/net/cxgbe/l2t.h | 2 +-
drivers/net/cxgbe/mps_tcam.c | 14 +-
drivers/net/cxgbe/mps_tcam.h | 4 +-
drivers/net/cxgbe/sge.c | 8 +-
drivers/net/dpaa/dpaa_ethdev.c | 20 +-
drivers/net/dpaa/dpaa_rxtx.c | 22 +-
drivers/net/dpaa2/dpaa2_ethdev.c | 36 +--
drivers/net/e1000/e1000_ethdev.h | 2 +-
drivers/net/e1000/em_ethdev.c | 34 +--
drivers/net/e1000/em_rxtx.c | 22 +-
drivers/net/e1000/igb_ethdev.c | 70 ++---
drivers/net/e1000/igb_flow.c | 12 +-
drivers/net/e1000/igb_pf.c | 16 +-
drivers/net/e1000/igb_rxtx.c | 18 +-
drivers/net/ena/ena_ethdev.c | 16 +-
drivers/net/ena/ena_ethdev.h | 2 +-
drivers/net/enetc/base/enetc_hw.h | 4 +-
drivers/net/enetc/enetc_ethdev.c | 6 +-
drivers/net/enic/enic.h | 2 +-
drivers/net/enic/enic_clsf.c | 40 +--
drivers/net/enic/enic_ethdev.c | 4 +-
drivers/net/enic/enic_flow.c | 100 +++----
drivers/net/enic/enic_main.c | 2 +-
drivers/net/enic/enic_res.c | 4 +-
drivers/net/failsafe/failsafe.c | 6 +-
drivers/net/failsafe/failsafe_args.c | 4 +-
drivers/net/failsafe/failsafe_ether.c | 6 +-
drivers/net/failsafe/failsafe_ops.c | 6 +-
drivers/net/failsafe/failsafe_private.h | 4 +-
drivers/net/fm10k/fm10k.h | 2 +-
drivers/net/fm10k/fm10k_ethdev.c | 18 +-
drivers/net/i40e/base/i40e_adminq_cmd.h | 4 +-
drivers/net/i40e/base/i40e_common.c | 12 +-
drivers/net/i40e/base/i40e_prototype.h | 4 +-
drivers/net/i40e/i40e_ethdev.c | 134 ++++-----
drivers/net/i40e/i40e_ethdev.h | 22 +-
drivers/net/i40e/i40e_ethdev_vf.c | 60 ++--
drivers/net/i40e/i40e_fdir.c | 126 ++++----
drivers/net/i40e/i40e_flow.c | 58 ++--
drivers/net/i40e/i40e_pf.c | 18 +-
drivers/net/i40e/i40e_rxtx.c | 28 +-
drivers/net/i40e/i40e_vf_representor.c | 2 +-
drivers/net/i40e/rte_pmd_i40e.c | 30 +-
drivers/net/i40e/rte_pmd_i40e.h | 8 +-
drivers/net/ixgbe/ixgbe_ethdev.c | 94 +++---
drivers/net/ixgbe/ixgbe_ethdev.h | 2 +-
drivers/net/ixgbe/ixgbe_flow.c | 22 +-
drivers/net/ixgbe/ixgbe_pf.c | 14 +-
drivers/net/ixgbe/ixgbe_rxtx.c | 14 +-
drivers/net/ixgbe/ixgbe_vf_representor.c | 4 +-
drivers/net/ixgbe/rte_pmd_ixgbe.c | 10 +-
drivers/net/ixgbe/rte_pmd_ixgbe.h | 2 +-
drivers/net/kni/rte_eth_kni.c | 4 +-
drivers/net/liquidio/lio_ethdev.c | 22 +-
drivers/net/mlx4/mlx4.c | 4 +-
drivers/net/mlx4/mlx4.h | 8 +-
drivers/net/mlx4/mlx4_ethdev.c | 8 +-
drivers/net/mlx4/mlx4_flow.c | 14 +-
drivers/net/mlx4/mlx4_rxtx.c | 2 +-
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/mlx5/mlx5.h | 14 +-
drivers/net/mlx5/mlx5_flow.c | 22 +-
drivers/net/mlx5/mlx5_flow_tcf.c | 40 +--
drivers/net/mlx5/mlx5_flow_verbs.c | 26 +-
drivers/net/mlx5/mlx5_mac.c | 18 +-
drivers/net/mlx5/mlx5_nl.c | 28 +-
drivers/net/mlx5/mlx5_rxtx.c | 6 +-
drivers/net/mlx5/mlx5_rxtx.h | 2 +-
drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 +-
drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 10 +-
drivers/net/mlx5/mlx5_trigger.c | 6 +-
drivers/net/mvneta/mvneta_ethdev.c | 22 +-
drivers/net/mvneta/mvneta_ethdev.h | 2 +-
drivers/net/mvpp2/mrvl_ethdev.c | 22 +-
drivers/net/mvpp2/mrvl_ethdev.h | 2 +-
drivers/net/mvpp2/mrvl_flow.c | 4 +-
drivers/net/netvsc/hn_ethdev.c | 4 +-
drivers/net/netvsc/hn_nvs.c | 2 +-
drivers/net/netvsc/hn_rndis.c | 2 +-
drivers/net/netvsc/hn_rxtx.c | 12 +-
drivers/net/netvsc/hn_var.h | 4 +-
drivers/net/netvsc/hn_vf.c | 12 +-
drivers/net/nfp/nfp_net.c | 20 +-
drivers/net/nfp/nfp_net_pmd.h | 2 +-
drivers/net/null/rte_eth_null.c | 6 +-
drivers/net/octeontx/octeontx_ethdev.c | 8 +-
drivers/net/octeontx/octeontx_ethdev.h | 2 +-
drivers/net/pcap/rte_eth_pcap.c | 22 +-
drivers/net/qede/base/bcm_osal.h | 2 +-
drivers/net/qede/base/ecore_dev.c | 4 +-
drivers/net/qede/qede_ethdev.c | 58 ++--
drivers/net/qede/qede_ethdev.h | 6 +-
drivers/net/qede/qede_filter.c | 66 ++---
drivers/net/qede/qede_if.h | 4 +-
drivers/net/qede/qede_main.c | 6 +-
drivers/net/qede/qede_rxtx.c | 32 +-
drivers/net/qede/qede_rxtx.h | 2 +-
drivers/net/ring/rte_eth_ring.c | 4 +-
drivers/net/sfc/sfc.h | 2 +-
drivers/net/sfc/sfc_ef10_tx.c | 8 +-
drivers/net/sfc/sfc_ethdev.c | 20 +-
drivers/net/sfc/sfc_flow.c | 12 +-
drivers/net/sfc/sfc_port.c | 8 +-
drivers/net/sfc/sfc_tso.c | 8 +-
drivers/net/softnic/parser.c | 18 +-
drivers/net/softnic/parser.h | 2 +-
drivers/net/softnic/rte_eth_softnic.c | 2 +-
drivers/net/softnic/rte_eth_softnic_pipeline.c | 40 +--
drivers/net/szedata2/rte_eth_szedata2.c | 8 +-
drivers/net/tap/rte_eth_tap.c | 58 ++--
drivers/net/tap/rte_eth_tap.h | 2 +-
drivers/net/tap/tap_bpf_program.c | 14 +-
drivers/net/tap/tap_flow.c | 12 +-
drivers/net/thunderx/base/nicvf_mbox.c | 4 +-
drivers/net/thunderx/base/nicvf_plat.h | 2 +-
drivers/net/thunderx/nicvf_ethdev.c | 18 +-
drivers/net/thunderx/nicvf_struct.h | 2 +-
drivers/net/vdev_netvsc/vdev_netvsc.c | 16 +-
drivers/net/vhost/rte_eth_vhost.c | 12 +-
drivers/net/virtio/virtio_ethdev.c | 70 ++---
drivers/net/virtio/virtio_pci.h | 4 +-
drivers/net/virtio/virtio_rxtx.c | 28 +-
drivers/net/virtio/virtio_user/vhost_kernel_tap.c | 2 +-
drivers/net/virtio/virtio_user/virtio_user_dev.c | 6 +-
drivers/net/virtio/virtio_user/virtio_user_dev.h | 2 +-
drivers/net/virtio/virtio_user_ethdev.c | 8 +-
drivers/net/virtio/virtqueue.h | 2 +-
drivers/net/vmxnet3/vmxnet3_ethdev.c | 12 +-
drivers/net/vmxnet3/vmxnet3_ethdev.h | 2 +-
drivers/net/vmxnet3/vmxnet3_rxtx.c | 44 +--
examples/bbdev_app/main.c | 40 +--
examples/bond/main.c | 78 ++---
examples/distributor/main.c | 4 +-
examples/ethtool/ethtool-app/ethapp.c | 8 +-
examples/ethtool/ethtool-app/main.c | 10 +-
examples/ethtool/lib/rte_ethtool.c | 8 +-
examples/ethtool/lib/rte_ethtool.h | 6 +-
examples/eventdev_pipeline/main.c | 4 +-
examples/eventdev_pipeline/pipeline_common.h | 10 +-
examples/flow_classify/flow_classify.c | 30 +-
examples/flow_filtering/main.c | 10 +-
examples/ip_fragmentation/main.c | 62 ++--
examples/ip_pipeline/cli.c | 2 +-
examples/ip_pipeline/kni.c | 2 +-
examples/ip_pipeline/parser.c | 18 +-
examples/ip_pipeline/parser.h | 2 +-
examples/ip_pipeline/pipeline.c | 40 +--
examples/ip_reassembly/main.c | 50 ++--
examples/ipsec-secgw/esp.c | 42 +--
examples/ipsec-secgw/ipsec-secgw.c | 38 +--
examples/ipsec-secgw/sa.c | 6 +-
examples/ipv4_multicast/main.c | 58 ++--
examples/kni/main.c | 14 +-
examples/l2fwd-cat/l2fwd-cat.c | 4 +-
examples/l2fwd-crypto/main.c | 26 +-
examples/l2fwd-jobstats/main.c | 8 +-
examples/l2fwd-keepalive/main.c | 8 +-
examples/l2fwd/main.c | 8 +-
examples/l3fwd-acl/main.c | 102 +++----
examples/l3fwd-power/main.c | 100 +++----
examples/l3fwd-vf/main.c | 68 ++---
examples/l3fwd/l3fwd.h | 8 +-
examples/l3fwd/l3fwd_altivec.h | 14 +-
examples/l3fwd/l3fwd_common.h | 4 +-
examples/l3fwd/l3fwd_em.c | 44 +--
examples/l3fwd/l3fwd_em.h | 20 +-
examples/l3fwd/l3fwd_em_hlm.h | 16 +-
examples/l3fwd/l3fwd_em_hlm_neon.h | 16 +-
examples/l3fwd/l3fwd_em_hlm_sse.h | 16 +-
examples/l3fwd/l3fwd_em_sequential.h | 16 +-
examples/l3fwd/l3fwd_lpm.c | 50 ++--
examples/l3fwd/l3fwd_lpm.h | 20 +-
examples/l3fwd/l3fwd_lpm_altivec.h | 20 +-
examples/l3fwd/l3fwd_lpm_neon.h | 30 +-
examples/l3fwd/l3fwd_lpm_sse.h | 20 +-
examples/l3fwd/l3fwd_neon.h | 14 +-
examples/l3fwd/l3fwd_sse.h | 14 +-
examples/l3fwd/main.c | 20 +-
examples/link_status_interrupt/main.c | 8 +-
examples/load_balancer/runtime.c | 6 +-
.../client_server_mp/mp_server/main.c | 2 +-
examples/packet_ordering/main.c | 2 +-
examples/performance-thread/l3fwd-thread/main.c | 322 ++++++++++-----------
examples/ptpclient/ptpclient.c | 32 +-
examples/qos_meter/main.c | 4 +-
examples/qos_sched/init.c | 2 +-
examples/quota_watermark/qw/main.c | 8 +-
examples/rxtx_callbacks/main.c | 4 +-
examples/server_node_efd/node/node.c | 6 +-
examples/server_node_efd/server/main.c | 8 +-
examples/skeleton/basicfwd.c | 4 +-
examples/tep_termination/main.c | 2 +-
examples/tep_termination/main.h | 2 +-
examples/tep_termination/vxlan.c | 108 +++----
examples/tep_termination/vxlan.h | 8 +-
examples/tep_termination/vxlan_setup.c | 30 +-
examples/tep_termination/vxlan_setup.h | 2 +-
examples/vhost/main.c | 40 +--
examples/vhost/main.h | 2 +-
examples/vm_power_manager/channel_monitor.c | 2 +-
.../guest_cli/vm_power_cli_guest.c | 2 +-
examples/vm_power_manager/main.c | 6 +-
examples/vmdq/main.c | 12 +-
examples/vmdq_dcb/main.c | 12 +-
lib/librte_cmdline/cmdline_parse_etheraddr.c | 33 +--
lib/librte_ethdev/rte_eth_ctrl.h | 12 +-
lib/librte_ethdev/rte_ethdev.c | 56 ++--
lib/librte_ethdev/rte_ethdev.h | 12 +-
lib/librte_ethdev/rte_ethdev_core.h | 12 +-
lib/librte_ethdev/rte_flow.h | 32 +-
lib/librte_eventdev/rte_event_eth_rx_adapter.c | 32 +-
lib/librte_gro/gro_tcp4.c | 26 +-
lib/librte_gro/gro_tcp4.h | 20 +-
lib/librte_gro/gro_vxlan_tcp4.c | 64 ++--
lib/librte_gro/gro_vxlan_tcp4.h | 6 +-
lib/librte_gso/gso_common.h | 16 +-
lib/librte_gso/gso_tcp4.c | 12 +-
lib/librte_gso/gso_tunnel_tcp4.c | 14 +-
lib/librte_gso/gso_udp4.c | 8 +-
lib/librte_gso/rte_gso.h | 8 +-
lib/librte_hash/rte_thash.h | 2 +-
lib/librte_ip_frag/rte_ip_frag.h | 12 +-
lib/librte_ip_frag/rte_ipv4_fragmentation.c | 42 +--
lib/librte_ip_frag/rte_ipv4_reassembly.c | 14 +-
lib/librte_ip_frag/rte_ipv6_fragmentation.c | 26 +-
lib/librte_ip_frag/rte_ipv6_reassembly.c | 6 +-
lib/librte_kni/rte_kni.c | 4 +-
lib/librte_kni/rte_kni.h | 2 +-
lib/librte_net/rte_arp.c | 32 +-
lib/librte_net/rte_arp.h | 36 +--
lib/librte_net/rte_esp.h | 2 +-
lib/librte_net/rte_ether.h | 178 ++++++------
lib/librte_net/rte_gre.h | 2 +-
lib/librte_net/rte_icmp.h | 6 +-
lib/librte_net/rte_ip.h | 70 ++---
lib/librte_net/rte_net.c | 90 +++---
lib/librte_net/rte_net.h | 22 +-
lib/librte_net/rte_sctp.h | 2 +-
lib/librte_net/rte_tcp.h | 2 +-
lib/librte_net/rte_udp.h | 2 +-
lib/librte_pipeline/rte_table_action.c | 210 +++++++-------
lib/librte_pipeline/rte_table_action.h | 4 +-
lib/librte_port/rte_port_ras.c | 8 +-
lib/librte_port/rte_port_source_sink.c | 6 +-
lib/librte_vhost/vhost.h | 2 +-
lib/librte_vhost/virtio_net.c | 42 +--
test/test-acl/main.c | 2 +-
test/test-pipeline/pipeline_acl.c | 16 +-
test/test-pipeline/pipeline_hash.c | 12 +-
test/test/packet_burst_generator.c | 126 ++++----
test/test/packet_burst_generator.h | 26 +-
test/test/test_acl.c | 8 +-
test/test/test_acl.h | 122 ++++----
test/test/test_cmdline_etheraddr.c | 16 +-
test/test/test_efd.c | 20 +-
test/test/test_event_eth_rx_adapter.c | 2 +-
test/test/test_event_eth_tx_adapter.c | 2 +-
test/test/test_flow_classify.c | 68 ++---
test/test/test_hash.c | 20 +-
test/test/test_link_bonding.c | 284 +++++++++---------
test/test/test_link_bonding_mode4.c | 116 ++++----
test/test/test_link_bonding_rssconf.c | 6 +-
test/test/test_lpm.c | 76 ++---
test/test/test_lpm_perf.c | 10 +-
test/test/test_member.c | 20 +-
test/test/test_pmd_perf.c | 20 +-
test/test/test_sched.c | 20 +-
test/test/test_table_acl.c | 8 +-
test/test/test_thash.c | 12 +-
test/test/virtual_pmd.c | 6 +-
test/test/virtual_pmd.h | 2 +-
367 files changed, 3906 insertions(+), 3913 deletions(-)
--
2.11.0
^ permalink raw reply [relevance 1%]
* Re: [dpdk-dev] [PATCH] doc: show internal functions in doxygen
2018-10-19 7:39 3% ` Ferruh Yigit
@ 2018-10-22 6:15 0% ` Shreyansh Jain
0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-10-22 6:15 UTC (permalink / raw)
To: Ferruh Yigit, Thomas Monjalon; +Cc: dev, john.mcnamara, marko.kovacevic
On Friday 19 October 2018 01:09 PM, Ferruh Yigit wrote:
> On 10/18/2018 6:04 PM, Thomas Monjalon wrote:
>> 18/10/2018 18:22, Ferruh Yigit:
>>> On 10/18/2018 5:08 PM, Thomas Monjalon wrote:
>>>> Not sure we want to show the internal functions to users.
>>>> It may be useful only for PMD developers.
>>>> Do we vote? +1 / -1 welcome!
>>>
>>> What is affected by this setting? Can you give an example of what was not
>>> shown before and will be shown now?
>>
>> For instance, most of the things in rte_ethdev_core.h.
>> All the doxygen with @internal tag are affected.
>
> rte_ethdev_core.h is not part of API documentation but I randomly checked
> rte_lpm.h which has some @internal structures.
>
> But those in the lpm header are the ones for ABI versioning. I think it is
> confusing to expose them to the user, and the documentation doesn't highlight
> that they are internal.
>
> So not a strong opinion, but from my side -1
>
-1 from me as well.
I also think it would overload the Doxygen output with information. In addition,
some places might need re-documenting to clean up the internal markers.
My opinion: reading the code directly would help more than Doxygen in these cases.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] doc: show internal functions in doxygen
@ 2018-10-19 7:39 3% ` Ferruh Yigit
2018-10-22 6:15 0% ` Shreyansh Jain
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-19 7:39 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, john.mcnamara, marko.kovacevic
On 10/18/2018 6:04 PM, Thomas Monjalon wrote:
> 18/10/2018 18:22, Ferruh Yigit:
>> On 10/18/2018 5:08 PM, Thomas Monjalon wrote:
>>> Not sure we want to show the internal functions to users.
>>> It may be useful only for PMD developers.
>>> Do we vote? +1 / -1 welcome!
>>
>> What is affected by this setting? Can you give an example of something that
>> was not shown before but will be shown now?
>
> For instance, most of the things in rte_ethdev_core.h.
> All the doxygen with @internal tag are affected.
rte_ethdev_core.h is not part of API documentation but I randomly checked
rte_lpm.h which has some @internal structures.
But those in the lpm header are the ones for ABI versioning. I think it is
confusing to expose them to the user, and the documentation doesn't highlight
that they are internal.
So not a strong opinion, but from my side -1
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v2 1/2] eal: add API that sleeps while waiting for threads
@ 2018-10-16 8:42 3% ` Ananyev, Konstantin
0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-10-16 8:42 UTC (permalink / raw)
To: Yigit, Ferruh, Richardson, Bruce; +Cc: dev, Yigit, Ferruh, stephen
Hi Ferruh,
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Monday, October 15, 2018 11:21 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; Yigit, Ferruh <ferruh.yigit@intel.com>; stephen@networkplumber.org
> Subject: [dpdk-dev] [PATCH v2 1/2] eal: add API that sleeps while waiting for threads
>
> It is common that sample applications call rte_eal_wait_lcore() while
> waiting for worker threads to be terminated.
> Mostly master lcore keeps waiting in this function.
>
> Waiting for termination is not a time-critical task, so an app can
> prefer a sleeping version of the wait that consumes fewer cycles.
>
> A sleeping version of the API, rte_eal_wait_lcore_sleep(), has been
> added which uses pthread conditions.
>
> Sample applications will be updated later to use this API.
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> v2:
> * use pthread cond instead of usleep
> ---
> lib/librte_eal/bsdapp/eal/eal.c | 3 +++
> lib/librte_eal/bsdapp/eal/eal_thread.c | 7 ++++++
> lib/librte_eal/common/eal_common_launch.c | 22 ++++++++++++++++++
> lib/librte_eal/common/include/rte_launch.h | 26 ++++++++++++++++++++++
> lib/librte_eal/common/include/rte_lcore.h | 3 +++
> lib/librte_eal/linuxapp/eal/eal.c | 3 +++
> lib/librte_eal/linuxapp/eal/eal_thread.c | 7 ++++++
> lib/librte_eal/rte_eal_version.map | 1 +
> 8 files changed, 72 insertions(+)
>
> diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
> index 7735194a3..e7d676657 100644
> --- a/lib/librte_eal/bsdapp/eal/eal.c
> +++ b/lib/librte_eal/bsdapp/eal/eal.c
> @@ -756,6 +756,9 @@ rte_eal_init(int argc, char **argv)
> snprintf(thread_name, sizeof(thread_name),
> "lcore-slave-%d", i);
> rte_thread_setname(lcore_config[i].thread_id, thread_name);
> +
> + pthread_mutex_init(&rte_eal_thread_mutex[i], NULL);
> + pthread_cond_init(&rte_eal_thread_cond[i], NULL);
> }
>
> /*
> diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c
> index 309b58726..60db32d57 100644
> --- a/lib/librte_eal/bsdapp/eal/eal_thread.c
> +++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
> @@ -28,6 +28,9 @@ RTE_DEFINE_PER_LCORE(unsigned, _lcore_id) = LCORE_ID_ANY;
> RTE_DEFINE_PER_LCORE(unsigned, _socket_id) = (unsigned)SOCKET_ID_ANY;
> RTE_DEFINE_PER_LCORE(rte_cpuset_t, _cpuset);
>
> +pthread_cond_t rte_eal_thread_cond[RTE_MAX_LCORE];
> +pthread_mutex_t rte_eal_thread_mutex[RTE_MAX_LCORE];
I think it would be better to put the cond and mutex into struct lcore_config itself;
that would probably help to avoid false sharing.
Though yes, it would mean ABI breakage, I suppose.
> +
> /*
> * Send a message to a slave lcore identified by slave_id to call a
> * function f with argument arg. Once the execution is done, the
> @@ -154,6 +157,10 @@ eal_thread_loop(__attribute__((unused)) void *arg)
> lcore_config[lcore_id].ret = ret;
> rte_wmb();
> lcore_config[lcore_id].state = FINISHED;
> +
> + pthread_mutex_lock(&rte_eal_thread_mutex[lcore_id]);
> + pthread_cond_signal(&rte_eal_thread_cond[lcore_id]);
> + pthread_mutex_unlock(&rte_eal_thread_mutex[lcore_id]);
I understand it would work that way too, but if you introduce mutex and cond around
the state, then it is better to manipulate/access the state after grabbing the mutex.
BTW in that case we don't need wmb:
lcore_config[lcore_id].ret = ret;
pthread_mutex_lock(...);
lcore_config[lcore_id].state = FINISHED;
pthread_cond_signal(..);
pthread_mutex_unlock(...);
Konstantin
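For illustration, a minimal self-contained sketch of the ordering suggested
above: the state is changed and tested only while holding the mutex, so no
explicit write barrier is needed. This is not the patch code. struct worker,
worker_main(), wait_worker_sleep() and run_demo() are hypothetical names that
stand in for lcore_config and the proposed rte_eal_wait_lcore_sleep(), and the
value 42 is an arbitrary placeholder result.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum lcore_state { WAIT, RUNNING, FINISHED };

/* Hypothetical stand-in for the per-lcore fields discussed above. */
struct worker {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	enum lcore_state state;
	int ret;
};

/* Worker side: publish the result, then change the state under the mutex. */
static void *worker_main(void *arg)
{
	struct worker *w = arg;

	w->ret = 42; /* placeholder for the return value of the lcore function */
	pthread_mutex_lock(&w->lock);
	w->state = FINISHED;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Waiter side: sleep on the condition instead of spinning on the state. */
static int wait_worker_sleep(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	while (w->state != FINISHED) /* loop guards against spurious wakeups */
		pthread_cond_wait(&w->cond, &w->lock);
	assert(w->state == FINISHED);
	pthread_mutex_unlock(&w->lock);
	/* The mutex hand-over orders the worker's write to ret before this read. */
	return w->ret;
}

/* Start one worker, wait for it, and return its result (-1 on error). */
static int run_demo(void)
{
	struct worker w = { .state = RUNNING, .ret = -1 };
	pthread_t tid;
	int ret = -1;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.cond, NULL);
	if (pthread_create(&tid, NULL, worker_main, &w) == 0) {
		ret = wait_worker_sleep(&w);
		pthread_join(tid, NULL);
	}
	pthread_cond_destroy(&w.cond);
	pthread_mutex_destroy(&w.lock);
	return ret;
}
```

Because the waiter re-checks the state in a loop while holding the mutex, it
does not matter whether the worker signals before or after the waiter starts
waiting: a signal that arrives early is not lost, since the state is already
FINISHED when the waiter takes the lock.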
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH v17 2/6] eal: enable hotplug on multi-process
2018-10-16 0:16 1% ` [dpdk-dev] [PATCH v17 0/6] " Qi Zhang
@ 2018-10-16 0:16 2% ` Qi Zhang
0 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-10-16 0:16 UTC (permalink / raw)
To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
benjamin.h.shelton, narender.vangati, Qi Zhang
This patch introduces a solution for handling hotplug in
multi-process. It covers the scenarios below:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, we assume devices are shared
by default. That means attaching or detaching a device in any process
is broadcast to all other processes through the mp channel, so that
device information is synchronized across all processes.
Any failure during the attach/detach process will leave the processes in
an inconsistent state, so a proper rollback action should be considered.
This patch covers the implementation of cases 1 and 2.
Cases 3 and 4 will be implemented in a separate patch.
IPC scenario for cases 1 and 2:
attach a device
a) primary attaches the new device; if this fails, goto h).
b) primary sends an attach sync request to all secondaries.
c) each secondary receives the request, attaches the device and sends a reply.
d) primary checks the replies; if all succeeded, goto i).
e) primary sends an attach rollback sync request to all secondaries.
f) each secondary receives the request, detaches the device and sends a reply.
g) primary receives the replies and detaches the device as the rollback action.
h) attach fail
i) attach success
detach a device
a) primary sends a detach sync request to all secondaries.
b) each secondary detaches the device and sends a reply.
c) primary checks the replies; if all succeeded, goto f).
d) primary sends a detach rollback sync request to all secondaries.
e) each secondary receives the request and attaches the device back; goto g).
f) primary detaches the device; if this succeeds goto h), else goto d).
g) detach fail.
h) detach success.
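The attach sequence above can be sketched as the following control flow. This
is only a toy model under stated assumptions, not the code of this patch:
struct proc, proc_attach(), proc_detach() and attach_all() are hypothetical
names, each process is reduced to a flag, IPC is replaced by direct function
calls, and the -EEXIST special case handled by the real implementation is
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one process: is the device attached there? */
struct proc {
	bool attached;
	bool fail_attach; /* test knob: force the attach to fail */
};

static int proc_attach(struct proc *p)
{
	if (p->fail_attach)
		return -EIO;
	p->attached = true;
	return 0;
}

static void proc_detach(struct proc *p)
{
	p->attached = false;
}

/* Steps a) to i) of the attach scenario, driven by the primary. */
static int attach_all(struct proc *primary, struct proc *sec, size_t n)
{
	size_t i;
	int ret;

	assert(primary != NULL && (sec != NULL || n == 0));

	/* a) primary attaches first; on failure go straight to h) */
	ret = proc_attach(primary);
	if (ret != 0)
		return ret;

	/* b) to d) every secondary attaches; remember the first failure */
	for (i = 0; i < n; i++) {
		int r = proc_attach(&sec[i]);

		if (r != 0 && ret == 0)
			ret = r;
	}
	if (ret == 0)
		return 0; /* i) attach success */

	/* e) and f) roll back on every secondary */
	for (i = 0; i < n; i++)
		proc_detach(&sec[i]);

	/* g) primary rolls back itself */
	proc_detach(primary);

	return ret; /* h) attach fail */
}
```

The point of the model is the invariant: after attach_all() returns, the
device is either attached in every process or in none of them.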
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 13 ++
lib/librte_eal/bsdapp/eal/Makefile | 1 +
lib/librte_eal/common/eal_common_dev.c | 254 +++++++++++++++++++++++++++++---
lib/librte_eal/common/eal_private.h | 22 +++
lib/librte_eal/common/hotplug_mp.c | 221 +++++++++++++++++++++++++++
lib/librte_eal/common/hotplug_mp.h | 46 ++++++
lib/librte_eal/common/include/rte_dev.h | 12 ++
lib/librte_eal/common/include/rte_eal.h | 9 ++
lib/librte_eal/common/meson.build | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 1 +
lib/librte_eal/linuxapp/eal/eal.c | 6 +
11 files changed, 567 insertions(+), 19 deletions(-)
create mode 100644 lib/librte_eal/common/hotplug_mp.c
create mode 100644 lib/librte_eal/common/hotplug_mp.h
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 436b20e2b..da2236fea 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -159,6 +159,13 @@ New Features
this application doesn't need to launch dedicated worker threads for vhost
enqueue/dequeue operations.
+* **Support device multi-process hotplug.**
+
+ Hotplug and hot-unplug for devices will now be supported in multiprocessing
+ scenario. Any ethdev devices created in the primary process will be regarded
+ as shared and will be available for all DPDK processes. Synchronization
+ between processes will be done using DPDK IPC.
+
API Changes
-----------
@@ -213,6 +220,12 @@ API Changes
* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
has been changed from uint8_t to uint16_t.
+* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
+
+ In primary-secondary process model, ``rte_eal_hotplug_add`` will guarantee
+ that device be attached on all processes, while ``rte_eal_hotplug_remove``
+ will guarantee device be detached on all processes.
+
ABI Changes
-----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index 97bff4852..6e9bc02c5 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -62,6 +62,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_proc.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_fbarray.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_uuid.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += hotplug_mp.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_elem.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_heap.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_mp.c
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 7663eaa3f..2209f8843 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -19,8 +19,10 @@
#include <rte_log.h>
#include <rte_spinlock.h>
#include <rte_malloc.h>
+#include <rte_string_fns.h>
#include "eal_private.h"
+#include "hotplug_mp.h"
/**
* The device event callback description.
@@ -127,37 +129,61 @@ int rte_eal_dev_detach(struct rte_device *dev)
return ret;
}
-int
-rte_eal_hotplug_add(const char *busname, const char *devname,
- const char *drvargs)
+/* helper function to build devargs, caller should free the memory */
+static int
+build_devargs(const char *busname, const char *devname,
+ const char *drvargs, char **devargs)
{
- int ret;
- char *devargs = NULL;
int length;
+ char *da;
length = snprintf(NULL, 0, "%s:%s,%s", busname, devname, drvargs);
+
if (length < 0)
return -EINVAL;
- devargs = malloc(length + 1);
- if (devargs == NULL)
+
+ da = malloc(length + 1);
+ if (da == NULL)
return -ENOMEM;
- ret = snprintf(devargs, length + 1, "%s:%s,%s", busname, devname, drvargs);
- if (ret < 0)
+
+ if (snprintf(da, length + 1, "%s:%s,%s",
+ busname, devname, drvargs) < 0) {
+ free(da);
return -EINVAL;
+ }
- ret = rte_dev_probe(devargs);
+ *devargs = da;
+ return 0;
+}
+int
+rte_eal_hotplug_add(const char *busname, const char *devname,
+ const char *drvargs)
+{
+
+ char *devargs;
+ int ret;
+
+ ret = build_devargs(busname, devname, drvargs, &devargs);
+
+ if (ret != 0)
+ return ret;
+
+ ret = rte_dev_probe(devargs);
free(devargs);
+
return ret;
}
-int __rte_experimental
-rte_dev_probe(const char *devargs)
+/* probe device at local process. */
+int
+local_dev_probe(const char *devargs, struct rte_device **new_dev)
{
struct rte_device *dev;
struct rte_devargs *da;
int ret;
+ *new_dev = NULL;
da = calloc(1, sizeof(*da));
if (da == NULL)
return -ENOMEM;
@@ -174,11 +200,11 @@ rte_dev_probe(const char *devargs)
}
ret = rte_devargs_insert(da);
- if (ret)
+ if (ret != 0)
goto err_devarg;
ret = da->bus->scan();
- if (ret)
+ if (ret != 0)
goto err_devarg;
dev = da->bus->find_device(NULL, cmp_dev_name, da->name);
@@ -195,11 +221,13 @@ rte_dev_probe(const char *devargs)
}
ret = dev->bus->plug(dev);
- if (ret) {
+ if (ret != 0) {
RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
dev->name);
goto err_devarg;
}
+
+ *new_dev = dev;
return 0;
err_devarg:
@@ -231,8 +259,9 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
return rte_dev_remove(dev);
}
-int __rte_experimental
-rte_dev_remove(struct rte_device *dev)
+/* remove device at local process. */
+int
+local_dev_remove(struct rte_device *dev)
{
int ret;
@@ -248,10 +277,197 @@ rte_dev_remove(struct rte_device *dev)
}
ret = dev->bus->unplug(dev);
- if (ret)
+ if (ret != 0)
RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n",
dev->name);
- rte_devargs_remove(dev->devargs);
+ else
+ rte_devargs_remove(dev->devargs);
+
+ return ret;
+}
+
+int __rte_experimental
+rte_dev_probe(const char *devargs)
+{
+ struct eal_dev_mp_req req;
+ struct rte_device *dev;
+ int ret;
+
+ memset(&req, 0, sizeof(req));
+ req.t = EAL_DEV_REQ_TYPE_ATTACH;
+ strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ /**
+ * If in secondary process, just send IPC request to
+ * primary process.
+ */
+ ret = eal_dev_hotplug_request_to_primary(&req);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug request to primary\n");
+ return -ENOMSG;
+ }
+ if (req.result != 0)
+ RTE_LOG(ERR, EAL,
+ "Failed to hotplug add device\n");
+ return req.result;
+ }
+
+ /* attach a shared device from primary start from here: */
+
+ /* primary attach the new device itself. */
+ ret = local_dev_probe(devargs, &dev);
+
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to attach device on primary process\n");
+
+ /**
+ * it is possible that a secondary process failed to attach a
+ * device that the primary process has had since initialization,
+ * so for the -EEXIST case, we still need to sync with the secondary
+ * processes.
+ */
+ if (ret != -EEXIST)
+ return ret;
+ }
+
+ /* primary send attach sync request to secondary. */
+ ret = eal_dev_hotplug_request_to_secondary(&req);
+
+ /* if any communication error, we need to rollback. */
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug add request to secondary\n");
+ ret = -ENOMSG;
+ goto rollback;
+ }
+
+ /**
+ * if any secondary failed to attach, we need to consider if rollback
+ * is necessary.
+ */
+ if (req.result != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to attach device on secondary process\n");
+ ret = req.result;
+
+ /* for -EEXIST, we don't need to rollback. */
+ if (ret == -EEXIST)
+ return ret;
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
+
+ /* primary send rollback request to secondary. */
+ if (eal_dev_hotplug_request_to_secondary(&req) != 0)
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device attach on secondary. "
+ "Devices in secondary may not sync with primary\n");
+
+ /* primary rollback itself. */
+ if (local_dev_remove(dev) != 0)
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device attach on primary. "
+ "Devices in secondary may not sync with primary\n");
+
+ return ret;
+}
+
+int __rte_experimental
+rte_dev_remove(struct rte_device *dev)
+{
+ struct eal_dev_mp_req req;
+ char *devargs;
+ int ret;
+
+ ret = build_devargs(dev->devargs->bus->name, dev->name, "", &devargs);
+ if (ret != 0)
+ return ret;
+
+ memset(&req, 0, sizeof(req));
+ req.t = EAL_DEV_REQ_TYPE_DETACH;
+ strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+ free(devargs);
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ /**
+ * If in secondary process, just send IPC request to
+ * primary process.
+ */
+ ret = eal_dev_hotplug_request_to_primary(&req);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug request to primary\n");
+ return -ENOMSG;
+ }
+ if (req.result != 0)
+ RTE_LOG(ERR, EAL,
+ "Failed to hotplug remove device\n");
+ return req.result;
+ }
+
+ /* detach a device from primary start from here: */
+
+ /* primary send detach sync request to secondary */
+ ret = eal_dev_hotplug_request_to_secondary(&req);
+
+ /**
+ * if communication error, we need to rollback, because it is possible
+ * part of the secondary processes still detached it successfully.
+ */
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send device detach request to secondary\n");
+ ret = -ENOMSG;
+ goto rollback;
+ }
+
+ /**
+ * if any secondary failed to detach, we need to consider if rollback
+ * is necessary.
+ */
+ if (req.result != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to detach device on secondary process\n");
+ ret = req.result;
+ /**
+ * if -ENOENT, we don't need to rollback, since devices is
+ * already detached on secondary process.
+ */
+ if (ret != -ENOENT)
+ goto rollback;
+ }
+
+ /* primary detach the device itself. */
+ ret = local_dev_remove(dev);
+
+ /* if primary failed, still need to consider if rollback is necessary */
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL,
+ "Failed to detach device on primary process\n");
+ /* if -ENOENT, we don't need to rollback */
+ if (ret == -ENOENT)
+ return ret;
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
+
+ /* primary send rollback request to secondary. */
+ if (eal_dev_hotplug_request_to_secondary(&req) != 0)
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device detach on secondary. "
+ "Devices in secondary may not sync with primary\n");
+
return ret;
}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a83c..2ad94e435 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,26 @@ int
rte_devargs_layers_parse(struct rte_devargs *devargs,
const char *devstr);
+/*
+ * probe a device at local process.
+ *
+ * @param devargs
+ * Device arguments including bus, class and driver properties.
+ * @param new_dev
+ * new device be probed as output.
+ * @return
+ * 0 on success, negative on error.
+ */
+int local_dev_probe(const char *devargs, struct rte_device **new_dev);
+
+/**
+ * Hotplug remove a given device from a specific bus at local process.
+ *
+ * @param dev
+ * Data structure of the device to remove.
+ * @return
+ * 0 on success, negative on error.
+ */
+int local_dev_remove(struct rte_device *dev);
+
#endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
new file mode 100644
index 000000000..92d8f50d3
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.c
@@ -0,0 +1,221 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#include <string.h>
+
+#include <rte_eal.h>
+#include <rte_alarm.h>
+#include <rte_string_fns.h>
+#include <rte_devargs.h>
+
+#include "hotplug_mp.h"
+#include "eal_private.h"
+
+#define MP_TIMEOUT_S 5 /**< 5 seconds timeouts */
+
+static int cmp_dev_name(const struct rte_device *dev, const void *_name)
+{
+ const char *name = _name;
+
+ return strcmp(dev->name, name);
+}
+
+struct mp_reply_bundle {
+ struct rte_mp_msg msg;
+ void *peer;
+};
+
+static int
+handle_secondary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+ RTE_SET_USED(msg);
+ RTE_SET_USED(peer);
+ return -ENOTSUP;
+}
+
+static void __handle_primary_request(void *param)
+{
+ struct mp_reply_bundle *bundle = param;
+ struct rte_mp_msg *msg = &bundle->msg;
+ const struct eal_dev_mp_req *req =
+ (const struct eal_dev_mp_req *)msg->param;
+ struct rte_mp_msg mp_resp;
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_resp.param;
+ struct rte_devargs *da;
+ struct rte_device *dev;
+ struct rte_bus *bus;
+ int ret = 0;
+
+ memset(&mp_resp, 0, sizeof(mp_resp));
+
+ switch (req->t) {
+ case EAL_DEV_REQ_TYPE_ATTACH:
+ case EAL_DEV_REQ_TYPE_DETACH_ROLLBACK:
+ ret = local_dev_probe(req->devargs, &dev);
+ break;
+ case EAL_DEV_REQ_TYPE_DETACH:
+ case EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK:
+ da = calloc(1, sizeof(*da));
+ if (da == NULL) {
+ ret = -ENOMEM;
+ goto quit;
+ }
+
+ ret = rte_devargs_parse(da, req->devargs);
+ if (ret != 0)
+ goto quit;
+
+ bus = rte_bus_find_by_name(da->bus->name);
+ if (bus == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n", da->bus->name);
+ ret = -ENOENT;
+ goto quit;
+ }
+
+ dev = bus->find_device(NULL, cmp_dev_name, da->name);
+ if (dev == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot find plugged device (%s)\n", da->name);
+ ret = -ENOENT;
+ goto quit;
+ }
+
+ ret = local_dev_remove(dev);
+quit:
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+ mp_resp.len_param = sizeof(*req);
+ memcpy(resp, req, sizeof(*resp));
+ resp->result = ret;
+ if (rte_mp_reply(&mp_resp, bundle->peer) < 0)
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+
+ free(bundle->peer);
+ free(bundle);
+}
+
+static int
+handle_primary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+ struct rte_mp_msg mp_resp;
+ const struct eal_dev_mp_req *req =
+ (const struct eal_dev_mp_req *)msg->param;
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_resp.param;
+ struct mp_reply_bundle *bundle;
+ int ret = 0;
+
+ memset(&mp_resp, 0, sizeof(mp_resp));
+ strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+ mp_resp.len_param = sizeof(*req);
+ memcpy(resp, req, sizeof(*resp));
+
+ bundle = calloc(1, sizeof(*bundle));
+ if (bundle == NULL) {
+ resp->result = -ENOMEM;
+ ret = rte_mp_reply(&mp_resp, peer);
+ if (ret)
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+ return ret;
+ }
+
+ bundle->msg = *msg;
+ /**
+ * We need to send reply on interrupt thread, but peer can't be
+ * parsed directly, so this is a temporal hack, need to be fixed
+ * when it is ready.
+ */
+ bundle->peer = (void *)strdup(peer);
+
+ /**
+ * We are at IPC callback thread, sync IPC is not allowed due to
+ * dead lock, so we delegate the task to interrupt thread.
+ */
+ ret = rte_eal_alarm_set(1, __handle_primary_request, bundle);
+ if (ret != 0) {
+ resp->result = ret;
+ ret = rte_mp_reply(&mp_resp, peer);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+ return ret;
+ }
+ }
+ return 0;
+}
+
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req)
+{
+ RTE_SET_USED(req);
+ return -ENOTSUP;
+}
+
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req)
+{
+ struct rte_mp_msg mp_req;
+ struct rte_mp_reply mp_reply;
+ struct timespec ts = {.tv_sec = MP_TIMEOUT_S, .tv_nsec = 0};
+ int ret;
+ int i;
+
+ memset(&mp_req, 0, sizeof(mp_req));
+ memcpy(mp_req.param, req, sizeof(*req));
+ mp_req.len_param = sizeof(*req);
+ strlcpy(mp_req.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_req.name));
+
+ ret = rte_mp_request_sync(&mp_req, &mp_reply, &ts);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL, "rte_mp_request_sync failed\n");
+ return ret;
+ }
+
+ if (mp_reply.nb_sent != mp_reply.nb_received) {
+ RTE_LOG(ERR, EAL, "not all secondary reply\n");
+ return -1;
+ }
+
+ req->result = 0;
+ for (i = 0; i < mp_reply.nb_received; i++) {
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_reply.msgs[i].param;
+ if (resp->result != 0) {
+ req->result = resp->result;
+ if (req->t == EAL_DEV_REQ_TYPE_ATTACH &&
+ req->result != -EEXIST)
+ break;
+ if (req->t == EAL_DEV_REQ_TYPE_DETACH &&
+ req->result != -ENOENT)
+ break;
+ }
+ }
+
+ return 0;
+}
+
+int rte_mp_dev_hotplug_init(void)
+{
+ int ret;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+ handle_secondary_request);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+ EAL_DEV_MP_ACTION_REQUEST);
+ return ret;
+ }
+ } else {
+ ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+ handle_primary_request);
+ if (ret != 0) {
+ RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+ EAL_DEV_MP_ACTION_REQUEST);
+ return ret;
+ }
+ }
+
+ return 0;
+}
diff --git a/lib/librte_eal/common/hotplug_mp.h b/lib/librte_eal/common/hotplug_mp.h
new file mode 100644
index 000000000..597fde3d7
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _HOTPLUG_MP_H_
+#define _HOTPLUG_MP_H_
+
+#include "rte_dev.h"
+#include "rte_bus.h"
+
+#define EAL_DEV_MP_ACTION_REQUEST "eal_dev_mp_request"
+#define EAL_DEV_MP_ACTION_RESPONSE "eal_dev_mp_response"
+
+#define EAL_DEV_MP_DEV_NAME_MAX_LEN RTE_DEV_NAME_MAX_LEN
+#define EAL_DEV_MP_BUS_NAME_MAX_LEN 32
+#define EAL_DEV_MP_DEV_ARGS_MAX_LEN 128
+
+enum eal_dev_req_type {
+ EAL_DEV_REQ_TYPE_ATTACH,
+ EAL_DEV_REQ_TYPE_DETACH,
+ EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK,
+ EAL_DEV_REQ_TYPE_DETACH_ROLLBACK,
+};
+
+struct eal_dev_mp_req {
+ enum eal_dev_req_type t;
+ char devargs[EAL_DEV_MP_DEV_ARGS_MAX_LEN];
+ int result;
+};
+
+/**
+ * This is a synchronous wrapper for a secondary process to send a
+ * request to the primary process. It is invoked when an attach
+ * or detach request is issued from a secondary process.
+ */
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req);
+
+/**
+ * This is a synchronous wrapper for the primary process to send a
+ * request to the secondary processes. It is invoked when an attach
+ * or detach request is issued from the primary process.
+ */
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req);
+
+
+#endif /* _HOTPLUG_MP_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 036180ff3..696cf7bbe 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -192,6 +192,9 @@ int rte_eal_dev_detach(struct rte_device *dev);
/**
* Hotplug add a given device to a specific bus.
*
+ * In multi-process, it will request other processes to add the same device.
+ * A failure, in any process, will rollback the action
+ *
* @param busname
* The bus name the device is added to.
* @param devname
@@ -211,6 +214,9 @@ int rte_eal_hotplug_add(const char *busname, const char *devname,
*
* Add matching devices.
*
+ * In multi-process, it will request other processes to add the same device.
+ * A failure, in any process, will rollback the action
+ *
* @param devargs
* Device arguments including bus, class and driver properties.
* @return
@@ -221,6 +227,9 @@ int __rte_experimental rte_dev_probe(const char *devargs);
/**
* Hotplug remove a given device from a specific bus.
*
+ * In multi-process, it will request other processes to remove the same device.
+ * A failure, in any process, will rollback the action
+ *
* @param busname
* The bus name the device is removed from.
* @param devname
@@ -236,6 +245,9 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
*
* Remove one device.
*
+ * In multi-process, it will request other processes to remove the same device.
+ * A failure, in any process, will rollback the action
+ *
* @param dev
* Data structure of the device to remove.
* @return
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index e114dcbdc..3ee897c1d 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -378,6 +378,15 @@ int __rte_experimental
rte_mp_reply(struct rte_mp_msg *msg, const char *peer);
/**
+ * Register all mp action callbacks for hotplug.
+ *
+ * @return
+ * 0 on success, negative on error.
+ */
+int __rte_experimental
+rte_mp_dev_hotplug_init(void);
+
+/**
* Usage function typedef used by the application usage function.
*
* Use this function typedef to define and call rte_set_application_usage_hook()
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index b7fc98499..04c414356 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -28,6 +28,7 @@ common_sources = files(
'eal_common_thread.c',
'eal_common_timer.c',
'eal_common_uuid.c',
+ 'hotplug_mp.c',
'malloc_elem.c',
'malloc_heap.c',
'malloc_mp.c',
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 5c16bc40f..736bc6569 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -70,6 +70,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_proc.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_fbarray.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_uuid.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += hotplug_mp.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_elem.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_heap.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_mp.c
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 950f33f2c..d342a04f0 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -888,6 +888,12 @@ rte_eal_init(int argc, char **argv)
}
}
+ /* register multi-process action callbacks for hotplug */
+ if (rte_mp_dev_hotplug_init() < 0) {
+ rte_eal_init_alert("failed to register mp callback for hotplug\n");
+ return -1;
+ }
+
if (rte_bus_scan()) {
rte_eal_init_alert("Cannot scan the buses for devices\n");
rte_errno = ENODEV;
--
2.13.6
^ permalink raw reply [relevance 2%]
* [dpdk-dev] [PATCH v17 0/6] enable hotplug on multi-process
2018-09-28 4:23 1% ` [dpdk-dev] [PATCH v16 0/6] " Qi Zhang
@ 2018-10-16 0:16 1% ` Qi Zhang
2018-10-16 0:16 2% ` [dpdk-dev] [PATCH v17 2/6] eal: " Qi Zhang
1 sibling, 1 reply; 200+ results
From: Qi Zhang @ 2018-10-16 0:16 UTC (permalink / raw)
To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
benjamin.h.shelton, narender.vangati, Qi Zhang
v17:
- fix format in release notes
- rework build_devargs
- fix devargs memory leak in rte_dev_hotplug_add
- always use an explicit if (<check> != 0)
- rename rte_dev_hotplug_mp_init to rte_mp_dev_hotplug_init and move the
function declaration to rte_dev.h
- reword comments
v16:
- rebase onto the patch "simplify parameters of hotplug functions"
http://patchwork.dpdk.org/patch/45463/ which includes:
* keep rte_eal_hotplug_add/rte_eal_hotplug_remove unchanged.
* the IPC sync logic is moved to rte_dev_probe/rte_dev_remove.
* simplify the IPC message by removing busname and devname from
eal_dev_mp_req, since the devargs string already encodes that
information.
- combine release notes with the related code changes.
- replace the do_ prefix with local_ for the local-process-only probe/remove functions.
- improve comments
v15:
- fix missing return in rte_eth_dev_pci_release.
- minor fix and more detail comments for patch 5/7.
- update release notes for v18.11.
v14:
- rebase.
- All of the following changes belong to patch 1/6.
1) rename rte_eth_dev_release_port_private to rte_eth_dev_release_port_secondary
since it is only used by secondary processes.
2) in rte_eth_dev_pci_generic_remove, even in a secondary process,
I think it's better to call rte_eth_dev_release_port_secondary after
dev_uninit, since a secondary process may need to release
some local resources in dev_uninit before releasing the port and returning.
This also does not break any existing users of rte_eth_dev_pci_generic_remove,
because none of the existing dev_uninit implementations has special handling
for secondary processes.
3) add rte_eth_dev_release_port_secondary to rte_eth_dev_destroy as a
general step, so we don't need patches for i40e and ixgbe.
4) fix missing update of rte_ethdev_version.map.
- improve error handling for -EEXIST when attaching a device and -ENOENT
when detaching a device. A device may be out of sync in some
situations, so attaching an existing device in the primary still needs to sync
with the secondaries. Also, it's not necessary to roll back if we fail to
attach an existing device or detach a non-existent device on a secondary.
- fix potential NULL pointer dereference in handle_primary_request.
- merge all vdev driver patches into one patch.
- merge all pci driver patches into one patch.
v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
modify the sample code to use rte_eal_hotplug_add and
rte_eal_hotplug_remove to attach/detach devices.
v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error logs in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove:
any failure due to an IPC error now returns -ENOMSG instead of -1.
- remove unnecessary changes from previous rework.
v11:
- move out common code from pci_vfio_unmap_secondary and
pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
to patch 5/19 to improve code readability.
v10:
- Since hotplug add/remove of a vdev on a secondary process now syncs on
all processes, it is not necessary to support a private vdev for
a secondary process, which was identified by a non-NULL devargs in
"--vdev". So rework all vdev driver changes to simplify the device
probe scenario on a secondary process; devargs are now ignored on
secondary processes.
- fix license header in example/multi-process/hotplug_mp/Makefile.
v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
eal_dev_hotplug_add and eal_dev_hotplug_remove, now all kinds of
devices will be synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
1) The device can't be detached cleanly in a secondary process, which
also means it can't be attached again, due to the error that
/dev/vfio/<group_fd> is still busy. (see patch 3/19 and 4/19)
2) repeated detach/attach of a device will cause "cannot find TAILQ entry
for PCI device" due to an incorrect PCI address comparison.
(see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue
v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.
NOTE:
Some issues that are not related to this patchset are expected when
playing with the hotplug_mp sample, as below.
- Attaching a PCI device twice may leave the device unable to be detached;
the fix below is required:
https://patches.dpdk.org/patch/42030/
- An ixgbe device can't be detached; the fix below is required:
https://patches.dpdk.org/patch/42031/
v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks fails.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.
NOTE: the current patchset depends on the IPC fix below; without it,
attaching a shared vdev may fail.
https://patches.dpdk.org/patch/41647/
v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
secondary process, so we don't need to patch a PMD if the PMD uses the
default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request in a
separate thread.
v5:
- since we will keep the mp thread separate from the interrupt thread,
it is not necessary to use a temporary thread; we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
way to prevent rte_eth_dev_release_port from being called after
rte_eth_dev_release_port_private.
- fix the issue that the lock does not take effect on secondary due to
previous rework.
- fix the issue when the first attached device is a private device from
secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is
still open and under discussion, see below.
https://mails.dpdk.org/archives/dev/2018-June/105359.html
v4:
- since the mp thread will be merged into the interrupt thread, the fix in v3
for the sync IPC deadlock will not work. The new version enables a
mechanism to invoke an mp action callback in a temporary thread to
avoid the IPC deadlock. With this, the secondary-to-primary request
implementation is also simplified, since we can use a sync request
directly in a separate thread.
v3:
- enable mp init callback registration to help non-EAL modules initialize
the mp channel during rte_eal_init
- fix attaching a shared device from secondary.
1) deadlock due to sync IPC being invoked in rte_malloc in the primary
process when handling a secondary request to attach a device; the
solution is for the primary process to issue the shared device attach/detach
in the interrupt thread.
2) returned port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling in attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
1) rename sample to "hotplug_mp", and move to example/multi-process.
2) cleanup header include.
3) call rte_eal_cleanup before exit.
v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling
- fix meson.build.
- improve comments.
Background:
===========
Currently a secondary process only syncs ethdev from the primary
process at init stage; it is not aware of devices
attached/detached on the primary process at runtime.
However, some applications use the primary-secondary process model
as follows: the primary process works as a
resource management process that creates/destroys virtual devices
at runtime, while the secondary processes deal with the network traffic
on these devices.
Solution:
=========
The original intention was to fix this gap, but beyond that
the patch set provides a more comprehensive solution to handle
different hotplug cases in the multi-process situation. It covers the
scenarios below:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, so that device
information is synchronized on all processes.
Any failure while attaching or detaching will cause inconsistent
status between processes, so a proper rollback action should be considered.
Scenario for Case 1, 2:
attach device from primary
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach device and send reply.
d) primary check the reply if all success go to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach device and send reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success
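The attach steps above can be sketched as a small single-process simulation.
This is only an illustration under stated assumptions: struct proc_view,
local_attach, local_detach and primary_attach are hypothetical names, and the
IPC round trips of steps b) to f) are replaced by direct function calls.

```c
#include <stdbool.h>

/* Hypothetical per-process view of one device (not a DPDK structure). */
struct proc_view {
	bool attached;		/* device currently attached in this process */
	bool fail_attach;	/* test hook: force the local attach to fail */
};

static int local_attach(struct proc_view *p)
{
	if (p->fail_attach)
		return -1;
	p->attached = true;
	return 0;
}

static void local_detach(struct proc_view *p)
{
	p->attached = false;
}

/* Returns 0 on success (step i), -1 on failure after rollback (step h). */
static int primary_attach(struct proc_view *primary,
			  struct proc_view *sec, int nb_sec)
{
	bool failed = false;
	int i;

	if (local_attach(primary) != 0)		/* a) */
		return -1;			/* h) */

	for (i = 0; i < nb_sec; i++)		/* b), c) */
		if (local_attach(&sec[i]) != 0)
			failed = true;

	if (!failed)				/* d) */
		return 0;			/* i) */

	for (i = 0; i < nb_sec; i++)		/* e), f) */
		local_detach(&sec[i]);
	local_detach(primary);			/* g) */
	return -1;				/* h) */
}
```

The rollback is sent to every secondary, including the one that failed, which
keeps the primary's logic simple at the cost of a redundant detach.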
detach device from primary
a) primary perform pre-detach check, if device is locked, goto i).
b) primary send pre-detach sync request to all secondary.
c) secondary perform pre-detach check and send reply.
d) primary check the reply if any fail goto i).
e) primary send detach sync request to all secondary
f) secondary detach the device and send reply (assume no fail)
g) primary detach the device.
h) detach success
i) detach failed
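The detach flow is a two-phase operation: check everywhere first, then detach
everywhere. A minimal sketch, assuming hypothetical names (struct dev_view,
pre_detach_check, primary_detach) and direct calls in place of the sync IPC
requests:

```c
#include <stdbool.h>

/* Hypothetical per-process view of one device (not a DPDK structure). */
struct dev_view {
	bool attached;
	bool busy;	/* assumption: the pre-detach check fails while set */
};

static int pre_detach_check(const struct dev_view *p)
{
	return p->busy ? -1 : 0;
}

/* Returns 0 on success (step h), -1 if any pre-detach check fails (step i). */
static int primary_detach(struct dev_view *primary,
			  struct dev_view *sec, int nb_sec)
{
	int i;

	if (pre_detach_check(primary) != 0)	/* a) */
		return -1;			/* i) */

	for (i = 0; i < nb_sec; i++)		/* b), c), d) */
		if (pre_detach_check(&sec[i]) != 0)
			return -1;		/* i) */

	for (i = 0; i < nb_sec; i++)		/* e), f) */
		sec[i].attached = false;
	primary->attached = false;		/* g) */
	return 0;				/* h) */
}
```

Because nothing is detached until every check has passed, a failed check
needs no rollback.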
Scenario for case 3, 4:
attach device from secondary:
a) secondary send async request to primary and wait on a condition
which will be released by matched response from primary.
b) primary receive the request and attach the new device if failed
goto i).
c) primary forward attach request to all secondary as async request
(because this runs in mp thread context, using a sync request would deadlock;
the same reason applies to all following async requests.)
d) secondary receive request and attach device and send reply.
e) primary check the reply if all success go to j).
f) primary send attach rollback async request to all secondary.
g) secondary receive the request and detach device and send reply.
h) primary receive the reply and detach device as rollback action.
i) send fail response to secondary, goto k).
j) send success response to secondary.
k) secondary process receive response and return.
detach device from secondary:
a) secondary send async request to primary and wait on a condition
which will be released by matched response from primary.
b) primary receive the request and perform pre-detach check, if device
is locked, goto j).
c) primary send pre-detach async request to all secondary.
d) secondary perform pre-detach check and send reply.
e) primary check the reply if any fail goto j).
f) primary send detach async request to all secondary
g) secondary detach the device and send reply
h) primary detach the device.
i) send success response to secondary, goto k).
j) send fail response to secondary.
k) secondary process receive response and return.
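Step a) and step k) of both flows rely on the secondary being released only by
the response that matches its own request. A minimal sketch of that matching,
assuming hypothetical types (struct hp_msg, struct hp_pending) and leaving out
the actual blocking; a real implementation would wait on a condition variable
or similar until done is set:

```c
#include <stdbool.h>
#include <string.h>

enum hp_req_type { HP_REQ_ATTACH, HP_REQ_DETACH };

/* Hypothetical message layout; the devargs string identifies the device. */
struct hp_msg {
	enum hp_req_type type;
	char devargs[64];
	int result;		/* filled in by the primary in a response */
};

/* State kept by the secondary while it waits for the primary's answer. */
struct hp_pending {
	struct hp_msg req;
	bool done;		/* set once the matching response arrives */
	int result;
};

static bool hp_match(const struct hp_msg *a, const struct hp_msg *b)
{
	return a->type == b->type && strcmp(a->devargs, b->devargs) == 0;
}

/* Returns true if this response released the pending request. */
static bool hp_on_response(struct hp_pending *p, const struct hp_msg *resp)
{
	if (p->done || !hp_match(&p->req, resp))
		return false;
	p->result = resp->result;
	p->done = true;
	return true;
}
```

Responses for other requests, or duplicates of one already handled, are
ignored and leave the waiter untouched.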
API changes:
============
The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees the device is detached on all processes.
PMD Impact:
===========
Currently device removal is not handled well in the secondary process by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process, since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_secondary, which only resets the ethdev's state to
unused and does not touch shared data, so other processes are not impacted.
Since not all device drivers aim to support the primary-secondary
process model, the patch set only fixes this for PCI devices whose drivers use
rte_eth_dev_pci_generic_remove or rte_eth_dev_destroy, and for all
vdevs that support secondary processes. It can be used as a reference by other
drivers when an equivalent fix is required.
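The difference between the two release paths can be illustrated with a toy
model. This is a sketch under assumptions, not the ethdev code: struct
shared_port_data, struct local_port, release_port and release_port_secondary
are hypothetical stand-ins for data in shared memory, the per-process device
structure, and the two release functions.

```c
#include <string.h>

enum port_state { PORT_UNUSED, PORT_ATTACHED };

/* Stand-in for data that lives in memory shared by all processes. */
struct shared_port_data {
	char name[32];
};

/* Stand-in for the per-process device structure. */
struct local_port {
	enum port_state state;
	struct shared_port_data *data;
};

/* Full release: also wipes the shared data, so every process is affected. */
static void release_port(struct local_port *port)
{
	memset(port->data, 0, sizeof(*port->data));
	port->state = PORT_UNUSED;
}

/* Secondary release: only the local state changes; shared data is kept. */
static void release_port_secondary(struct local_port *port)
{
	port->state = PORT_UNUSED;
}
```

With two local_port instances pointing at the same shared_port_data, calling
release_port_secondary on one leaves the other fully usable, which is the
property the new API is meant to provide.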
Example:
========
The patchset also contains an example to demonstrate device hotplug
in the multi-process model; detailed instructions are below.
/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto
Command Line Example:
>help
>list
/* attach a pci device */
> attach 0000:81:00.0
/* detach the pci device */
> detach 0000:81:00.0
/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0
/* detach the vdev af_packet device */
> detach net_af_packet
Qi Zhang (6):
ethdev: add function to release port in secondary process
eal: enable hotplug on multi-process
eal: support attach or detach share device from secondary
drivers/net: enable hotplug on secondary process
drivers/net: enable device detach on secondary
examples/multi_process: add hotplug sample
doc/guides/rel_notes/release_18_11.rst | 13 +
drivers/net/af_packet/rte_eth_af_packet.c | 6 +-
drivers/net/bnxt/bnxt_ethdev.c | 6 +-
drivers/net/bonding/rte_eth_bond_pmd.c | 6 +-
drivers/net/ena/ena_ethdev.c | 2 +-
drivers/net/kni/rte_eth_kni.c | 6 +-
drivers/net/liquidio/lio_ethdev.c | 2 +-
drivers/net/null/rte_eth_null.c | 6 +-
drivers/net/octeontx/octeontx_ethdev.c | 8 +
drivers/net/pcap/rte_eth_pcap.c | 6 +-
drivers/net/tap/rte_eth_tap.c | 8 +-
drivers/net/vhost/rte_eth_vhost.c | 6 +-
drivers/net/virtio/virtio_ethdev.c | 2 +-
examples/multi_process/Makefile | 1 +
examples/multi_process/hotplug_mp/Makefile | 23 ++
examples/multi_process/hotplug_mp/commands.c | 214 ++++++++++++++
examples/multi_process/hotplug_mp/commands.h | 10 +
examples/multi_process/hotplug_mp/main.c | 41 +++
lib/librte_eal/bsdapp/eal/Makefile | 1 +
lib/librte_eal/common/eal_common_dev.c | 254 ++++++++++++++--
lib/librte_eal/common/eal_private.h | 22 ++
lib/librte_eal/common/hotplug_mp.c | 426 +++++++++++++++++++++++++++
lib/librte_eal/common/hotplug_mp.h | 46 +++
lib/librte_eal/common/include/rte_dev.h | 12 +
lib/librte_eal/common/include/rte_eal.h | 9 +
lib/librte_eal/common/meson.build | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 1 +
lib/librte_eal/linuxapp/eal/eal.c | 6 +
lib/librte_ethdev/rte_ethdev.c | 17 +-
lib/librte_ethdev/rte_ethdev_driver.h | 16 +-
lib/librte_ethdev/rte_ethdev_pci.h | 10 +-
lib/librte_ethdev/rte_ethdev_version.map | 7 +
32 files changed, 1151 insertions(+), 43 deletions(-)
create mode 100644 examples/multi_process/hotplug_mp/Makefile
create mode 100644 examples/multi_process/hotplug_mp/commands.c
create mode 100644 examples/multi_process/hotplug_mp/commands.h
create mode 100644 examples/multi_process/hotplug_mp/main.c
create mode 100644 lib/librte_eal/common/hotplug_mp.c
create mode 100644 lib/librte_eal/common/hotplug_mp.h
--
2.13.6
^ permalink raw reply [relevance 1%]
* Re: [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
2018-10-12 9:42 0% ` Shreyansh Jain
@ 2018-10-12 10:16 0% ` Thomas Monjalon
1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-12 10:16 UTC (permalink / raw)
To: Shreyansh Jain; +Cc: dev, ferruh.yigit
12/10/2018 11:32, Shreyansh Jain:
> On Wednesday 26 September 2018 11:34 PM, Shreyansh Jain wrote:
> > About the series:
> >
> > This series of patches upgrades the DPAA2 driver firmware to
> > v10.10.10 (MC Firmware).
> > As the bus/fslmc is modified, it is a dependent object for other
> > drivers like net/crypto/qdma. Also, the changes are mostly tightly
> > linked - thus, the patches include upgrade as well as sequential
> > changes to driver.
> > Once done, it would imply that DPAA2 driver won't work with any MC
> > FW lower than 10.10.10.
> >
> > Support for this new firmware is available in publicly available
> > LSDK (Layerscape SDK) release [1].
> >
> > Besides the FW change, there are other subtle changes as well:
> > - Support reading the MAC address from NIC device, rather than
> > using a default MAC
> > - Adding support for QBMan 5.0 FW APIs
> > - Some patches for NXP's LX2 platform specific features
> > - And some bug fixes.
> >
> > Dependency:
> >
> > * These patches are based on net-next/master 58c3b609699a8c
> > * Series [2] is logically related to this, but has no git/patch
> > related dependency. It is the series for the upgrade of DPAA.
> >
> > [1] https://lsdk.github.io/index.html
> > [2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
> >
> > Version History:
> > v1->v2:
> > - Bumped up the version of the libraries (pmd/bus/crypto/event) as the
> > first set of patches (MC firmware update) breaks the internal ABI
> > - Added support for ordered processing APIs. These APIs are expected
> > to be used in subsequent feature updates on DPAA2 ethernet driver.
> > - Some internal bug fixes.
> > (Patches increased from 11~15)
> >
>
> Hi Thomas,
>
> Would you be taking this series for RC1?
Yes
> (Ideally being driver code, this should have been with Ferruh but
> patchwork is showing your name).
Ferruh is taking patches for drivers/net/ and related.
This series is touching a lot more.
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v3 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 " Shreyansh Jain
@ 2018-10-12 10:04 2% ` Shreyansh Jain
0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-10-12 10:04 UTC (permalink / raw)
To: thomas; +Cc: ferruh.yigit, dev, Hemant Agrawal
From: Hemant Agrawal <hemant.agrawal@nxp.com>
This patch adds support for the new Management Complex
Firmware version 10.1x.x. One of the main changes in
the APIs is ordered queue support.
The fslmc bus lib ABI will need to be bumped to reflect
the MC FW API and structure changes.
This will also result in bumping the ABI version of all dependent
libs, as they internally use the MC FW APIs and structures.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
drivers/bus/fslmc/mc/dpbp.c | 10 +
drivers/bus/fslmc/mc/dpci.c | 197 ++++++++++++++++++++
drivers/bus/fslmc/mc/dpcon.c | 30 +++
drivers/bus/fslmc/mc/dpdmai.c | 14 ++
drivers/bus/fslmc/mc/dpio.c | 9 +
drivers/bus/fslmc/mc/fsl_dpbp.h | 1 +
drivers/bus/fslmc/mc/fsl_dpbp_cmd.h | 16 +-
drivers/bus/fslmc/mc/fsl_dpci.h | 47 ++++-
drivers/bus/fslmc/mc/fsl_dpci_cmd.h | 62 +++++-
drivers/bus/fslmc/mc/fsl_dpcon.h | 19 ++
drivers/bus/fslmc/mc/fsl_dpdmai.h | 5 +
drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h | 20 +-
drivers/bus/fslmc/mc/fsl_dpmng.h | 2 +-
drivers/bus/fslmc/mc/fsl_dpopr.h | 85 +++++++++
drivers/bus/fslmc/rte_bus_fslmc_version.map | 10 +
drivers/crypto/dpaa2_sec/Makefile | 2 +-
drivers/crypto/dpaa2_sec/meson.build | 2 +
drivers/event/dpaa2/Makefile | 2 +-
drivers/event/dpaa2/meson.build | 2 +
drivers/mempool/dpaa2/Makefile | 2 +-
drivers/mempool/dpaa2/meson.build | 2 +
drivers/net/dpaa2/Makefile | 2 +-
drivers/net/dpaa2/meson.build | 2 +
drivers/raw/dpaa2_cmdif/Makefile | 2 +-
drivers/raw/dpaa2_cmdif/meson.build | 2 +
drivers/raw/dpaa2_qdma/Makefile | 2 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.c | 14 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.h | 6 +-
drivers/raw/dpaa2_qdma/meson.build | 2 +
29 files changed, 538 insertions(+), 33 deletions(-)
create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h
diff --git a/drivers/bus/fslmc/mc/dpbp.c b/drivers/bus/fslmc/mc/dpbp.c
index 0215d22da..d9103409c 100644
--- a/drivers/bus/fslmc/mc/dpbp.c
+++ b/drivers/bus/fslmc/mc/dpbp.c
@@ -248,6 +248,16 @@ int dpbp_reset(struct fsl_mc_io *mc_io,
/* send command to mc*/
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpbp_get_attributes - Retrieve DPBP attributes.
+ *
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPBP object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpbp_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
diff --git a/drivers/bus/fslmc/mc/dpci.c b/drivers/bus/fslmc/mc/dpci.c
index ff366bfa9..95edae9d9 100644
--- a/drivers/bus/fslmc/mc/dpci.c
+++ b/drivers/bus/fslmc/mc/dpci.c
@@ -265,6 +265,15 @@ int dpci_reset(struct fsl_mc_io *mc_io,
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpci_get_attributes() - Retrieve DPCI attributes.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpci_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
@@ -292,6 +301,94 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
return 0;
}
+/**
+ * dpci_get_peer_attributes() - Retrieve peer DPCI attributes.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @attr: Returned peer attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpci_peer_attr *attr)
+{
+ struct dpci_rsp_get_peer_attr *rsp_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_PEER_ATTR,
+ cmd_flags,
+ token);
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_peer_attr *)cmd.params;
+ attr->peer_id = le32_to_cpu(rsp_params->id);
+ attr->num_of_priorities = rsp_params->num_of_priorities;
+
+ return 0;
+}
+
+/**
+ * dpci_get_link_state() - Retrieve the DPCI link state.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @up: Returned link state; returns '1' if link is up, '0' otherwise
+ *
+ * DPCI can be connected to another DPCI, together they
+ * create a 'link'. In order to use the DPCI Tx and Rx queues,
+ * both objects must be enabled.
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ int *up)
+{
+ struct dpci_rsp_get_link_state *rsp_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_LINK_STATE,
+ cmd_flags,
+ token);
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_link_state *)cmd.params;
+ *up = dpci_get_field(rsp_params->up, UP);
+
+ return 0;
+}
+
+/**
+ * dpci_set_rx_queue() - Set Rx queue configuration
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @priority: Select the queue relative to number of
+ * priorities configured at DPCI creation; use
+ * DPCI_ALL_QUEUES to configure all Rx queues
+ * identically.
+ * @cfg: Rx queue configuration
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
@@ -314,6 +411,9 @@ int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
dpci_set_field(cmd_params->dest_type,
DEST_TYPE,
cfg->dest_cfg.dest_type);
+ dpci_set_field(cmd_params->dest_type,
+ ORDER_PRESERVATION,
+ cfg->order_preservation_en);
/* send command to mc*/
return mc_send_command(mc_io, &cmd);
@@ -438,3 +538,100 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
return 0;
}
+
+/**
+ * dpci_set_opr() - Set Order Restoration configuration.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @index: The queue index
+ * @options: Configuration mode options
+ * can be OPR_OPT_CREATE or OPR_OPT_RETIRE
+ * @cfg: Configuration options for the OPR
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ uint8_t options,
+ struct opr_cfg *cfg)
+{
+ struct dpci_cmd_set_opr *cmd_params;
+ struct mc_command cmd = { 0 };
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_SET_OPR,
+ cmd_flags,
+ token);
+ cmd_params = (struct dpci_cmd_set_opr *)cmd.params;
+ cmd_params->index = index;
+ cmd_params->options = options;
+ cmd_params->oloe = cfg->oloe;
+ cmd_params->oeane = cfg->oeane;
+ cmd_params->olws = cfg->olws;
+ cmd_params->oa = cfg->oa;
+ cmd_params->oprrws = cfg->oprrws;
+
+ /* send command to mc*/
+ return mc_send_command(mc_io, &cmd);
+}
+
+/**
+ * dpci_get_opr() - Retrieve Order Restoration config and query.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @index: The queue index
+ * @cfg: Returned OPR configuration
+ * @qry: Returned OPR query
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ struct opr_cfg *cfg,
+ struct opr_qry *qry)
+{
+ struct dpci_rsp_get_opr *rsp_params;
+ struct dpci_cmd_get_opr *cmd_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_OPR,
+ cmd_flags,
+ token);
+ cmd_params = (struct dpci_cmd_get_opr *)cmd.params;
+ cmd_params->index = index;
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_opr *)cmd.params;
+ cfg->oloe = rsp_params->oloe;
+ cfg->oeane = rsp_params->oeane;
+ cfg->olws = rsp_params->olws;
+ cfg->oa = rsp_params->oa;
+ cfg->oprrws = rsp_params->oprrws;
+ qry->rip = dpci_get_field(rsp_params->flags, RIP);
+ qry->enable = dpci_get_field(rsp_params->flags, OPR_ENABLE);
+ qry->nesn = le16_to_cpu(rsp_params->nesn);
+ qry->ndsn = le16_to_cpu(rsp_params->ndsn);
+ qry->ea_tseq = le16_to_cpu(rsp_params->ea_tseq);
+ qry->tseq_nlis = dpci_get_field(rsp_params->tseq_nlis, TSEQ_NLIS);
+ qry->ea_hseq = le16_to_cpu(rsp_params->ea_hseq);
+ qry->hseq_nlis = dpci_get_field(rsp_params->hseq_nlis, HSEQ_NLIS);
+ qry->ea_hptr = le16_to_cpu(rsp_params->ea_hptr);
+ qry->ea_tptr = le16_to_cpu(rsp_params->ea_tptr);
+ qry->opr_vid = le16_to_cpu(rsp_params->opr_vid);
+ qry->opr_id = le16_to_cpu(rsp_params->opr_id);
+
+ return 0;
+}
diff --git a/drivers/bus/fslmc/mc/dpcon.c b/drivers/bus/fslmc/mc/dpcon.c
index 3f6e04b97..92bd26512 100644
--- a/drivers/bus/fslmc/mc/dpcon.c
+++ b/drivers/bus/fslmc/mc/dpcon.c
@@ -295,6 +295,36 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
return 0;
}
+/**
+ * dpcon_set_notification() - Set DPCON notification destination
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCON object
+ * @cfg: Notification parameters
+ *
+ * Return: '0' on Success; Error code otherwise
+ */
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpcon_notification_cfg *cfg)
+{
+ struct dpcon_cmd_set_notification *dpcon_cmd;
+ struct mc_command cmd = { 0 };
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCON_CMDID_SET_NOTIFICATION,
+ cmd_flags,
+ token);
+ dpcon_cmd = (struct dpcon_cmd_set_notification *)cmd.params;
+ dpcon_cmd->dpio_id = cpu_to_le32(cfg->dpio_id);
+ dpcon_cmd->priority = cfg->priority;
+ dpcon_cmd->user_ctx = cpu_to_le64(cfg->user_ctx);
+
+ /* send command to mc*/
+ return mc_send_command(mc_io, &cmd);
+}
+
/**
* dpcon_get_api_version - Get Data Path Concentrator API version
* @mc_io: Pointer to MC portal's DPCON object
diff --git a/drivers/bus/fslmc/mc/dpdmai.c b/drivers/bus/fslmc/mc/dpdmai.c
index 528889df3..dcb9d516a 100644
--- a/drivers/bus/fslmc/mc/dpdmai.c
+++ b/drivers/bus/fslmc/mc/dpdmai.c
@@ -113,6 +113,7 @@ int dpdmai_create(struct fsl_mc_io *mc_io,
cmd_flags,
dprc_token);
cmd_params = (struct dpdmai_cmd_create *)cmd.params;
+ cmd_params->num_queues = cfg->num_queues;
cmd_params->priorities[0] = cfg->priorities[0];
cmd_params->priorities[1] = cfg->priorities[1];
@@ -297,6 +298,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
rsp_params = (struct dpdmai_rsp_get_attr *)cmd.params;
attr->id = le32_to_cpu(rsp_params->id);
attr->num_of_priorities = rsp_params->num_of_priorities;
+ attr->num_of_queues = rsp_params->num_of_queues;
return 0;
}
@@ -306,6 +308,8 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation; use
* DPDMAI_ALL_QUEUES to configure all Rx queues
@@ -317,6 +321,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
const struct dpdmai_rx_queue_cfg *cfg)
{
@@ -331,6 +336,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
cmd_params->dest_id = cpu_to_le32(cfg->dest_cfg.dest_id);
cmd_params->dest_priority = cfg->dest_cfg.priority;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
cmd_params->user_ctx = cpu_to_le64(cfg->user_ctx);
cmd_params->options = cpu_to_le32(cfg->options);
dpdmai_set_field(cmd_params->dest_type,
@@ -346,6 +352,8 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation
* @attr: Returned Rx queue attributes
@@ -355,6 +363,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_rx_queue_attr *attr)
{
@@ -369,6 +378,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
token);
cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
/* send command to mc*/
err = mc_send_command(mc_io, &cmd);
@@ -392,6 +402,8 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Tx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation
* @attr: Returned Tx queue attributes
@@ -401,6 +413,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_tx_queue_attr *attr)
{
@@ -415,6 +428,7 @@ int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
token);
cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
/* send command to mc*/
err = mc_send_command(mc_io, &cmd);
diff --git a/drivers/bus/fslmc/mc/dpio.c b/drivers/bus/fslmc/mc/dpio.c
index 966277cc6..a3382ed14 100644
--- a/drivers/bus/fslmc/mc/dpio.c
+++ b/drivers/bus/fslmc/mc/dpio.c
@@ -268,6 +268,15 @@ int dpio_reset(struct fsl_mc_io *mc_io,
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpio_get_attributes() - Retrieve DPIO attributes
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPIO object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise
+ */
int dpio_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp.h b/drivers/bus/fslmc/mc/fsl_dpbp.h
index 111836261..9d405b42c 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp.h
@@ -82,6 +82,7 @@ int dpbp_get_attributes(struct fsl_mc_io *mc_io,
/**
* BPSCN write will attempt to allocate into a cache (coherent write)
*/
+#define DPBP_NOTIF_OPT_COHERENT_WRITE 0x00000001
int dpbp_get_api_version(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
index 18402cedf..55c9fc9b4 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
@@ -9,13 +9,15 @@
/* DPBP Version */
#define DPBP_VER_MAJOR 3
-#define DPBP_VER_MINOR 3
+#define DPBP_VER_MINOR 4
/* Command versioning */
#define DPBP_CMD_BASE_VERSION 1
+#define DPBP_CMD_VERSION_2 2
#define DPBP_CMD_ID_OFFSET 4
#define DPBP_CMD(id) ((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_BASE_VERSION)
+#define DPBP_CMD_V2(id) ((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_VERSION_2)
/* Command IDs */
#define DPBP_CMDID_CLOSE DPBP_CMD(0x800)
@@ -37,8 +39,8 @@
#define DPBP_CMDID_GET_IRQ_STATUS DPBP_CMD(0x016)
#define DPBP_CMDID_CLEAR_IRQ_STATUS DPBP_CMD(0x017)
-#define DPBP_CMDID_SET_NOTIFICATIONS DPBP_CMD(0x1b0)
-#define DPBP_CMDID_GET_NOTIFICATIONS DPBP_CMD(0x1b1)
+#define DPBP_CMDID_SET_NOTIFICATIONS DPBP_CMD_V2(0x1b0)
+#define DPBP_CMDID_GET_NOTIFICATIONS DPBP_CMD_V2(0x1b1)
#define DPBP_CMDID_GET_FREE_BUFFERS_NUM DPBP_CMD(0x1b2)
@@ -68,8 +70,8 @@ struct dpbp_cmd_set_notifications {
uint32_t depletion_exit;
uint32_t surplus_entry;
uint32_t surplus_exit;
- uint16_t options;
- uint16_t pad[3];
+ uint32_t options;
+ uint16_t pad[2];
uint64_t message_ctx;
uint64_t message_iova;
};
@@ -79,8 +81,8 @@ struct dpbp_rsp_get_notifications {
uint32_t depletion_exit;
uint32_t surplus_entry;
uint32_t surplus_exit;
- uint16_t options;
- uint16_t pad[3];
+ uint32_t options;
+ uint16_t pad[2];
uint64_t message_ctx;
uint64_t message_iova;
};
diff --git a/drivers/bus/fslmc/mc/fsl_dpci.h b/drivers/bus/fslmc/mc/fsl_dpci.h
index f69ed3f33..9af9097e5 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci.h
@@ -6,6 +6,8 @@
#ifndef __FSL_DPCI_H
#define __FSL_DPCI_H
+#include <fsl_dpopr.h>
+
/* Data Path Communication Interface API
* Contains initialization APIs and runtime control APIs for DPCI
*/
@@ -17,7 +19,7 @@ struct fsl_mc_io;
/**
* Maximum number of Tx/Rx priorities per DPCI object
*/
-#define DPCI_PRIO_NUM 2
+#define DPCI_PRIO_NUM 4
/**
* Indicates an invalid frame queue
@@ -106,6 +108,27 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
uint16_t token,
struct dpci_attr *attr);
+/**
+ * struct dpci_peer_attr - Structure representing the peer DPCI attributes
+ * @peer_id: DPCI peer id; if no peer is connected returns (-1)
+ * @num_of_priorities: The peer's number of receive priorities; determines the
+ * number of transmit priorities for the local DPCI object
+ */
+struct dpci_peer_attr {
+ int peer_id;
+ uint8_t num_of_priorities;
+};
+
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpci_peer_attr *attr);
+
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ int *up);
+
/**
* enum dpci_dest - DPCI destination types
* @DPCI_DEST_NONE: Unassigned destination; The queue is set in parked mode
@@ -153,6 +176,11 @@ struct dpci_dest_cfg {
*/
#define DPCI_QUEUE_OPT_DEST 0x00000002
+/**
+ * Set the queue to hold active mode.
+ */
+#define DPCI_QUEUE_OPT_HOLD_ACTIVE 0x00000004
+
/**
* struct dpci_rx_queue_cfg - Structure representing RX queue configuration
* @options: Flags representing the suggested modifications to the queue;
@@ -163,11 +191,14 @@ struct dpci_dest_cfg {
* 'options'
* @dest_cfg: Queue destination parameters;
* valid only if 'DPCI_QUEUE_OPT_DEST' is contained in 'options'
+ * @order_preservation_en: order preservation configuration for the rx queue
+ * valid only if 'DPCI_QUEUE_OPT_HOLD_ACTIVE' is contained in 'options'
*/
struct dpci_rx_queue_cfg {
uint32_t options;
uint64_t user_ctx;
struct dpci_dest_cfg dest_cfg;
+ int order_preservation_en;
};
int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
@@ -217,4 +248,18 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
uint16_t *major_ver,
uint16_t *minor_ver);
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ uint8_t options,
+ struct opr_cfg *cfg);
+
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ struct opr_cfg *cfg,
+ struct opr_qry *qry);
+
#endif /* __FSL_DPCI_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
index 634248ac0..92b85a820 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
@@ -8,7 +8,7 @@
/* DPCI Version */
#define DPCI_VER_MAJOR 3
-#define DPCI_VER_MINOR 3
+#define DPCI_VER_MINOR 4
#define DPCI_CMD_BASE_VERSION 1
#define DPCI_CMD_BASE_VERSION_V2 2
@@ -35,6 +35,8 @@
#define DPCI_CMDID_GET_PEER_ATTR DPCI_CMD_V1(0x0e2)
#define DPCI_CMDID_GET_RX_QUEUE DPCI_CMD_V1(0x0e3)
#define DPCI_CMDID_GET_TX_QUEUE DPCI_CMD_V1(0x0e4)
+#define DPCI_CMDID_SET_OPR DPCI_CMD_V1(0x0e5)
+#define DPCI_CMDID_GET_OPR DPCI_CMD_V1(0x0e6)
/* Macros for accessing command fields smaller than 1byte */
#define DPCI_MASK(field) \
@@ -90,6 +92,8 @@ struct dpci_rsp_get_link_state {
#define DPCI_DEST_TYPE_SHIFT 0
#define DPCI_DEST_TYPE_SIZE 4
+#define DPCI_ORDER_PRESERVATION_SHIFT 4
+#define DPCI_ORDER_PRESERVATION_SIZE 1
struct dpci_cmd_set_rx_queue {
uint32_t dest_id;
@@ -128,5 +132,61 @@ struct dpci_rsp_get_api_version {
uint16_t minor;
};
+struct dpci_cmd_set_opr {
+ uint16_t pad0;
+ uint8_t index;
+ uint8_t options;
+ uint8_t pad1[7];
+ uint8_t oloe;
+ uint8_t oeane;
+ uint8_t olws;
+ uint8_t oa;
+ uint8_t oprrws;
+};
+
+struct dpci_cmd_get_opr {
+ uint16_t pad;
+ uint8_t index;
+};
+
+#define DPCI_RIP_SHIFT 0
+#define DPCI_RIP_SIZE 1
+#define DPCI_OPR_ENABLE_SHIFT 1
+#define DPCI_OPR_ENABLE_SIZE 1
+#define DPCI_TSEQ_NLIS_SHIFT 0
+#define DPCI_TSEQ_NLIS_SIZE 1
+#define DPCI_HSEQ_NLIS_SHIFT 0
+#define DPCI_HSEQ_NLIS_SIZE 1
+
+struct dpci_rsp_get_opr {
+ uint64_t pad0;
+ /* from LSB: rip:1 enable:1 */
+ uint8_t flags;
+ uint16_t pad1;
+ uint8_t oloe;
+ uint8_t oeane;
+ uint8_t olws;
+ uint8_t oa;
+ uint8_t oprrws;
+ uint16_t nesn;
+ uint16_t pad8;
+ uint16_t ndsn;
+ uint16_t pad2;
+ uint16_t ea_tseq;
+ /* only the LSB */
+ uint8_t tseq_nlis;
+ uint8_t pad3;
+ uint16_t ea_hseq;
+ /* only the LSB */
+ uint8_t hseq_nlis;
+ uint8_t pad4;
+ uint16_t ea_hptr;
+ uint16_t pad5;
+ uint16_t ea_tptr;
+ uint16_t pad6;
+ uint16_t opr_vid;
+ uint16_t pad7;
+ uint16_t opr_id;
+};
#pragma pack(pop)
#endif /* _FSL_DPCI_CMD_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpcon.h b/drivers/bus/fslmc/mc/fsl_dpcon.h
index 36dd5f3c1..fc0430dc1 100644
--- a/drivers/bus/fslmc/mc/fsl_dpcon.h
+++ b/drivers/bus/fslmc/mc/fsl_dpcon.h
@@ -81,6 +81,25 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
uint16_t token,
struct dpcon_attr *attr);
+/**
+ * struct dpcon_notification_cfg - Structure representing notification params
+ * @dpio_id: DPIO object ID; must be configured with a notification channel;
+ * to disable notifications set it to 'DPCON_INVALID_DPIO_ID';
+ * @priority: Priority selection within the DPIO channel; valid values
+ * are 0-7, depending on the number of priorities in that channel
+ * @user_ctx: User context value provided with each CDAN message
+ */
+struct dpcon_notification_cfg {
+ int dpio_id;
+ uint8_t priority;
+ uint64_t user_ctx;
+};
+
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpcon_notification_cfg *cfg);
+
int dpcon_get_api_version(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai.h b/drivers/bus/fslmc/mc/fsl_dpdmai.h
index 03e46ec14..40469cc13 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai.h
@@ -39,6 +39,7 @@ int dpdmai_close(struct fsl_mc_io *mc_io,
* should be configured with 0
*/
struct dpdmai_cfg {
+ uint8_t num_queues;
uint8_t priorities[DPDMAI_PRIO_NUM];
};
@@ -78,6 +79,7 @@ int dpdmai_reset(struct fsl_mc_io *mc_io,
struct dpdmai_attr {
int id;
uint8_t num_of_priorities;
+ uint8_t num_of_queues;
};
int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
@@ -149,6 +151,7 @@ struct dpdmai_rx_queue_cfg {
int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
const struct dpdmai_rx_queue_cfg *cfg);
@@ -168,6 +171,7 @@ struct dpdmai_rx_queue_attr {
int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_rx_queue_attr *attr);
@@ -183,6 +187,7 @@ struct dpdmai_tx_queue_attr {
int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_tx_queue_attr *attr);
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
index 618e19eae..7e122de4e 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
@@ -7,30 +7,32 @@
/* DPDMAI Version */
#define DPDMAI_VER_MAJOR 3
-#define DPDMAI_VER_MINOR 2
+#define DPDMAI_VER_MINOR 3
/* Command versioning */
#define DPDMAI_CMD_BASE_VERSION 1
+#define DPDMAI_CMD_VERSION_2 2
#define DPDMAI_CMD_ID_OFFSET 4
#define DPDMAI_CMD(id) ((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_BASE_VERSION)
+#define DPDMAI_CMD_V2(id) ((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_VERSION_2)
/* Command IDs */
#define DPDMAI_CMDID_CLOSE DPDMAI_CMD(0x800)
#define DPDMAI_CMDID_OPEN DPDMAI_CMD(0x80E)
-#define DPDMAI_CMDID_CREATE DPDMAI_CMD(0x90E)
+#define DPDMAI_CMDID_CREATE DPDMAI_CMD_V2(0x90E)
#define DPDMAI_CMDID_DESTROY DPDMAI_CMD(0x98E)
#define DPDMAI_CMDID_GET_API_VERSION DPDMAI_CMD(0xa0E)
#define DPDMAI_CMDID_ENABLE DPDMAI_CMD(0x002)
#define DPDMAI_CMDID_DISABLE DPDMAI_CMD(0x003)
-#define DPDMAI_CMDID_GET_ATTR DPDMAI_CMD(0x004)
+#define DPDMAI_CMDID_GET_ATTR DPDMAI_CMD_V2(0x004)
#define DPDMAI_CMDID_RESET DPDMAI_CMD(0x005)
#define DPDMAI_CMDID_IS_ENABLED DPDMAI_CMD(0x006)
-#define DPDMAI_CMDID_SET_RX_QUEUE DPDMAI_CMD(0x1A0)
-#define DPDMAI_CMDID_GET_RX_QUEUE DPDMAI_CMD(0x1A1)
-#define DPDMAI_CMDID_GET_TX_QUEUE DPDMAI_CMD(0x1A2)
+#define DPDMAI_CMDID_SET_RX_QUEUE DPDMAI_CMD_V2(0x1A0)
+#define DPDMAI_CMDID_GET_RX_QUEUE DPDMAI_CMD_V2(0x1A1)
+#define DPDMAI_CMDID_GET_TX_QUEUE DPDMAI_CMD_V2(0x1A2)
/* Macros for accessing command fields smaller than 1byte */
#define DPDMAI_MASK(field) \
@@ -47,7 +49,7 @@ struct dpdmai_cmd_open {
};
struct dpdmai_cmd_create {
- uint8_t pad;
+ uint8_t num_queues;
uint8_t priorities[2];
};
@@ -66,6 +68,7 @@ struct dpdmai_rsp_is_enabled {
struct dpdmai_rsp_get_attr {
uint32_t id;
uint8_t num_of_priorities;
+ uint8_t num_of_queues;
};
#define DPDMAI_DEST_TYPE_SHIFT 0
@@ -77,7 +80,7 @@ struct dpdmai_cmd_set_rx_queue {
uint8_t priority;
/* from LSB: dest_type:4 */
uint8_t dest_type;
- uint8_t pad;
+ uint8_t queue_idx;
uint64_t user_ctx;
uint32_t options;
};
@@ -85,6 +88,7 @@ struct dpdmai_cmd_set_rx_queue {
struct dpdmai_cmd_get_queue {
uint8_t pad[5];
uint8_t priority;
+ uint8_t queue_idx;
};
struct dpdmai_rsp_get_rx_queue {
diff --git a/drivers/bus/fslmc/mc/fsl_dpmng.h b/drivers/bus/fslmc/mc/fsl_dpmng.h
index afaf9b711..8559bef87 100644
--- a/drivers/bus/fslmc/mc/fsl_dpmng.h
+++ b/drivers/bus/fslmc/mc/fsl_dpmng.h
@@ -18,7 +18,7 @@ struct fsl_mc_io;
* Management Complex firmware version information
*/
#define MC_VER_MAJOR 10
-#define MC_VER_MINOR 3
+#define MC_VER_MINOR 10
/**
* struct mc_version
diff --git a/drivers/bus/fslmc/mc/fsl_dpopr.h b/drivers/bus/fslmc/mc/fsl_dpopr.h
new file mode 100644
index 000000000..fd727e011
--- /dev/null
+++ b/drivers/bus/fslmc/mc/fsl_dpopr.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: (BSD-3-Clause OR GPL-2.0)
+ *
+ * Copyright 2013-2015 Freescale Semiconductor Inc.
+ * Copyright 2018 NXP
+ *
+ */
+#ifndef __FSL_DPOPR_H_
+#define __FSL_DPOPR_H_
+
+/** @addtogroup dpopr Data Path Order Restoration API
+ * Contains initialization APIs and runtime APIs for the Order Restoration
+ * @{
+ */
+
+/** Order Restoration properties */
+
+/**
+ * Create a new Order Point Record option
+ */
+#define OPR_OPT_CREATE 0x1
+/**
+ * Retire an existing Order Point Record option
+ */
+#define OPR_OPT_RETIRE 0x2
+
+/**
+ * struct opr_cfg - Structure representing OPR configuration
+ * @oprrws: Order point record (OPR) restoration window size (0 to 5)
+ * 0 - Window size is 32 frames.
+ * 1 - Window size is 64 frames.
+ * 2 - Window size is 128 frames.
+ * 3 - Window size is 256 frames.
+ * 4 - Window size is 512 frames.
+ * 5 - Window size is 1024 frames.
+ *@oa: OPR auto advance NESN window size (0 disabled, 1 enabled)
+ *@olws: OPR acceptable late arrival window size (0 to 3)
+ * 0 - Disabled. Late arrivals are always rejected.
+ * 1 - Window size is 32 frames.
+ * 2 - Window size is the same as the OPR restoration
+ * window size configured in the OPRRWS field.
+ * 3 - Window size is 8192 frames.
+ * Late arrivals are always accepted.
+ *@oeane: Order restoration list (ORL) resource exhaustion
+ * advance NESN enable (0 disabled, 1 enabled)
+ *@oloe: OPR loose ordering enable (0 disabled, 1 enabled)
+ */
+struct opr_cfg {
+ uint8_t oprrws;
+ uint8_t oa;
+ uint8_t olws;
+ uint8_t oeane;
+ uint8_t oloe;
+};
+
+/**
+ * struct opr_qry - Structure representing OPR configuration
+ * @enable: Enabled state
+ * @rip: Retirement In Progress
+ * @ndsn: Next dispensed sequence number
+ * @nesn: Next expected sequence number
+ * @ea_hseq: Early arrival head sequence number
+ * @hseq_nlis: HSEQ not last in sequence
+ * @ea_tseq: Early arrival tail sequence number
+ * @tseq_nlis: TSEQ not last in sequence
+ * @ea_tptr: Early arrival tail pointer
+ * @ea_hptr: Early arrival head pointer
+ * @opr_id: Order Point Record ID
+ * @opr_vid: Order Point Record Virtual ID
+ */
+struct opr_qry {
+ char enable;
+ char rip;
+ uint16_t ndsn;
+ uint16_t nesn;
+ uint16_t ea_hseq;
+ char hseq_nlis;
+ uint16_t ea_tseq;
+ char tseq_nlis;
+ uint16_t ea_tptr;
+ uint16_t ea_hptr;
+ uint16_t opr_id;
+ uint16_t opr_vid;
+};
+
+#endif /* __FSL_DPOPR_H_ */
diff --git a/drivers/bus/fslmc/rte_bus_fslmc_version.map b/drivers/bus/fslmc/rte_bus_fslmc_version.map
index b4a881704..8717373dd 100644
--- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
+++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
@@ -117,3 +117,13 @@ DPDK_18.05 {
rte_dpaa2_memsegs;
} DPDK_18.02;
+
+DPDK_18.11 {
+ global:
+
+ dpci_get_link_state;
+ dpci_get_opr;
+ dpci_get_peer_attributes;
+ dpci_set_opr;
+
+} DPDK_18.05;
diff --git a/drivers/crypto/dpaa2_sec/Makefile b/drivers/crypto/dpaa2_sec/Makefile
index da3d8f84f..a61be49db 100644
--- a/drivers/crypto/dpaa2_sec/Makefile
+++ b/drivers/crypto/dpaa2_sec/Makefile
@@ -41,7 +41,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_pmd_dpaa2_sec_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
# library source files
SRCS-$(CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC) += dpaa2_sec_dpseci.c
diff --git a/drivers/crypto/dpaa2_sec/meson.build b/drivers/crypto/dpaa2_sec/meson.build
index 01afc5877..8fa4827ed 100644
--- a/drivers/crypto/dpaa2_sec/meson.build
+++ b/drivers/crypto/dpaa2_sec/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/event/dpaa2/Makefile b/drivers/event/dpaa2/Makefile
index 5e1a63200..3f85dd2be 100644
--- a/drivers/event/dpaa2/Makefile
+++ b/drivers/event/dpaa2/Makefile
@@ -27,7 +27,7 @@ CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2/mc
# versioning export map
EXPORT_MAP := rte_pmd_dpaa2_event_version.map
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/event/dpaa2/meson.build b/drivers/event/dpaa2/meson.build
index de7a46155..c46b39e9d 100644
--- a/drivers/event/dpaa2/meson.build
+++ b/drivers/event/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/mempool/dpaa2/Makefile b/drivers/mempool/dpaa2/Makefile
index 9e4c87d79..4996a2cd1 100644
--- a/drivers/mempool/dpaa2/Makefile
+++ b/drivers/mempool/dpaa2/Makefile
@@ -19,7 +19,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_mempool_dpaa2_version.map
# Lbrary version
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/mempool/dpaa2/meson.build b/drivers/mempool/dpaa2/meson.build
index 90bab6069..6b6ead617 100644
--- a/drivers/mempool/dpaa2/meson.build
+++ b/drivers/mempool/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/net/dpaa2/Makefile b/drivers/net/dpaa2/Makefile
index 9b0b14331..1d46f7f25 100644
--- a/drivers/net/dpaa2/Makefile
+++ b/drivers/net/dpaa2/Makefile
@@ -25,7 +25,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_pmd_dpaa2_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/net/dpaa2/meson.build b/drivers/net/dpaa2/meson.build
index 213f0d72f..b34595258 100644
--- a/drivers/net/dpaa2/meson.build
+++ b/drivers/net/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/raw/dpaa2_cmdif/Makefile b/drivers/raw/dpaa2_cmdif/Makefile
index 9b863dda2..0dbe5c821 100644
--- a/drivers/raw/dpaa2_cmdif/Makefile
+++ b/drivers/raw/dpaa2_cmdif/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_rawdev
EXPORT_MAP := rte_pmd_dpaa2_cmdif_version.map
-LIBABIVER := 1
+LIBABIVER := 2
#
# all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_cmdif/meson.build b/drivers/raw/dpaa2_cmdif/meson.build
index 1d146872e..37bb24a1b 100644
--- a/drivers/raw/dpaa2_cmdif/meson.build
+++ b/drivers/raw/dpaa2_cmdif/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
deps += ['rawdev', 'mempool_dpaa2', 'bus_vdev']
sources = files('dpaa2_cmdif.c')
diff --git a/drivers/raw/dpaa2_qdma/Makefile b/drivers/raw/dpaa2_qdma/Makefile
index d88809ead..645220772 100644
--- a/drivers/raw/dpaa2_qdma/Makefile
+++ b/drivers/raw/dpaa2_qdma/Makefile
@@ -25,7 +25,7 @@ LDLIBS += -lrte_ring
EXPORT_MAP := rte_pmd_dpaa2_qdma_version.map
-LIBABIVER := 1
+LIBABIVER := 2
#
# all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
index 2787d3028..44503331e 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
@@ -805,7 +805,7 @@ dpaa2_dpdmai_dev_uninit(struct rte_rawdev *rawdev)
DPAA2_QDMA_ERR("dmdmai disable failed");
/* Set up the DQRR storage for Rx */
- for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
struct dpaa2_queue *rxq = &(dpdmai_dev->rx_queue[i]);
if (rxq->q_storage) {
@@ -856,17 +856,17 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
ret);
goto init_err;
}
- dpdmai_dev->num_queues = attr.num_of_priorities;
+ dpdmai_dev->num_queues = attr.num_of_queues;
/* Set up Rx Queues */
- for (i = 0; i < attr.num_of_priorities; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
struct dpaa2_queue *rxq;
memset(&rx_queue_cfg, 0, sizeof(struct dpdmai_rx_queue_cfg));
ret = dpdmai_set_rx_queue(&dpdmai_dev->dpdmai,
CMD_PRI_LOW,
dpdmai_dev->token,
- i, &rx_queue_cfg);
+ i, 0, &rx_queue_cfg);
if (ret) {
DPAA2_QDMA_ERR("Setting Rx queue failed with err: %d",
ret);
@@ -893,9 +893,9 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
}
/* Get Rx and Tx queues FQID's */
- for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
ret = dpdmai_get_rx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
- dpdmai_dev->token, i, &rx_attr);
+ dpdmai_dev->token, i, 0, &rx_attr);
if (ret) {
DPAA2_QDMA_ERR("Reading device failed with err: %d",
ret);
@@ -904,7 +904,7 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
dpdmai_dev->rx_queue[i].fqid = rx_attr.fqid;
ret = dpdmai_get_tx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
- dpdmai_dev->token, i, &tx_attr);
+ dpdmai_dev->token, i, 0, &tx_attr);
if (ret) {
DPAA2_QDMA_ERR("Reading device failed with err: %d",
ret);
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
index c6a057806..0cbe90255 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
@@ -11,6 +11,8 @@ struct qdma_io_meta;
#define DPAA2_QDMA_MAX_FLE 3
#define DPAA2_QDMA_MAX_SDD 2
+#define DPAA2_DPDMAI_MAX_QUEUES 8
+
/** FLE pool size: 3 Frame list + 2 source/destination descriptor */
#define QDMA_FLE_POOL_SIZE (sizeof(struct qdma_io_meta) + \
sizeof(struct qbman_fle) * DPAA2_QDMA_MAX_FLE + \
@@ -142,9 +144,9 @@ struct dpaa2_dpdmai_dev {
/** Number of queue in this DPDMAI device */
uint8_t num_queues;
/** RX queues */
- struct dpaa2_queue rx_queue[DPDMAI_PRIO_NUM];
+ struct dpaa2_queue rx_queue[DPAA2_DPDMAI_MAX_QUEUES];
/** TX queues */
- struct dpaa2_queue tx_queue[DPDMAI_PRIO_NUM];
+ struct dpaa2_queue tx_queue[DPAA2_DPDMAI_MAX_QUEUES];
};
#endif /* __DPAA2_QDMA_H__ */
diff --git a/drivers/raw/dpaa2_qdma/meson.build b/drivers/raw/dpaa2_qdma/meson.build
index b6a081f11..2a4b69c16 100644
--- a/drivers/raw/dpaa2_qdma/meson.build
+++ b/drivers/raw/dpaa2_qdma/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
deps += ['rawdev', 'mempool_dpaa2', 'ring']
sources = files('dpaa2_qdma.c')
--
2.17.1
^ permalink raw reply [relevance 2%]
* [dpdk-dev] [PATCH v3 00/15] Upgrade DPAA2 FW and other feature/bug fixes
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
@ 2018-10-12 10:04 2% ` Shreyansh Jain
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
2 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-10-12 10:04 UTC (permalink / raw)
To: thomas; +Cc: ferruh.yigit, dev, Shreyansh Jain
About the series:
This series of patches upgrades the DPAA2 driver firmware to
v10.10.10 (MC Firmware).
bus/fslmc is modified, and it is a dependency of other
drivers like net/crypto/qdma. The changes are also tightly
linked, so the patches include the upgrade as well as the
follow-on changes to the drivers.
Once applied, the DPAA2 driver won't work with any MC
FW lower than 10.10.10.
Support for this new firmware is available in the publicly available
LSDK (Layerscape SDK) release [1].
Besides the FW change, there are other subtle changes as well:
- Support reading the MAC address from NIC device, rather than
using a default MAC
- Adding support for QBMan 5.0 FW APIs
- Some patches for NXP's LX2 platform specific features
- And some bug fixes.
Dependency:
* These patches are based on net-next/master 58c3b609699a8c
* Series [2] is logically related to this, but has no git/patch
related dependency. It is the series for the upgrade of DPAA.
[1] https://lsdk.github.io/index.html
[2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
Version History:
v2->v3:
- Rebased over master (662e382244)
v1->v2:
- Bumped up the version of the libraries (pmd/bus/crypto/event) as the
first set of patches (MC firmware update) breaks the internal ABI
- Added support for ordered processing APIs. These APIs are expected
to be used in subsequent feature updates on the DPAA2 ethernet driver.
- Some internal bug fixes.
(Patches increased from 11 to 15)
Hemant Agrawal (9):
net/dpaa2: fix VLAN filter enablement
bus/fslmc: upgrade mc FW APIs to 10.10.0
net/dpaa2: upgrade dpni to mc FW APIs to 10.10.0
crypto/dpaa2_sec: upgarde mc FW APIs to 10.10.0
net/dpaa2: update RSS value in mbuf for lx2 platform
net/dpaa2: optimize the fd reset in Tx path
net/dpaa2: enhance the queue memory cleanup routines
net/dpaa2: support MBUF VLAN tci population from HW parser
net/dpaa2: support Rx checksum offload in slow parsing
Nipun Gupta (4):
net/dpaa2: fix IOVA conversion for congestion memory
bus/fslmc: support memory backed portals with QBMAN 5.0
bus/fslmc: support 32 enq and deq for LX2 platform
bus/fslmc: disable annotation prefetch for LX2
Shreyansh Jain (2):
net/dpaa2: read hardware provided MAC for DPNI devices
net/dpaa2: add per queue stats get and reset support
drivers/bus/fslmc/mc/dpbp.c | 10 +
drivers/bus/fslmc/mc/dpci.c | 197 +++++
drivers/bus/fslmc/mc/dpcon.c | 30 +
drivers/bus/fslmc/mc/dpdmai.c | 14 +
drivers/bus/fslmc/mc/dpio.c | 9 +
drivers/bus/fslmc/mc/fsl_dpbp.h | 1 +
drivers/bus/fslmc/mc/fsl_dpbp_cmd.h | 16 +-
drivers/bus/fslmc/mc/fsl_dpci.h | 47 +-
drivers/bus/fslmc/mc/fsl_dpci_cmd.h | 62 +-
drivers/bus/fslmc/mc/fsl_dpcon.h | 19 +
drivers/bus/fslmc/mc/fsl_dpdmai.h | 5 +
drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h | 20 +-
drivers/bus/fslmc/mc/fsl_dpmng.h | 2 +-
drivers/bus/fslmc/mc/fsl_dpopr.h | 85 ++
drivers/bus/fslmc/portal/dpaa2_hw_dpio.c | 197 +++--
drivers/bus/fslmc/portal/dpaa2_hw_dpio.h | 4 +
drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 32 +-
drivers/bus/fslmc/qbman/include/compat.h | 3 +-
.../fslmc/qbman/include/fsl_qbman_portal.h | 33 +-
drivers/bus/fslmc/qbman/qbman_portal.c | 764 +++++++++++++++---
drivers/bus/fslmc/qbman/qbman_portal.h | 30 +-
drivers/bus/fslmc/qbman/qbman_sys.h | 100 ++-
drivers/bus/fslmc/qbman/qbman_sys_decl.h | 4 +
drivers/bus/fslmc/rte_bus_fslmc_version.map | 12 +
drivers/crypto/dpaa2_sec/Makefile | 2 +-
drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c | 8 +-
drivers/crypto/dpaa2_sec/mc/dpseci.c | 128 ++-
drivers/crypto/dpaa2_sec/mc/fsl_dpseci.h | 25 +-
drivers/crypto/dpaa2_sec/mc/fsl_dpseci_cmd.h | 73 +-
drivers/crypto/dpaa2_sec/meson.build | 2 +
drivers/event/dpaa2/Makefile | 2 +-
drivers/event/dpaa2/dpaa2_eventdev.c | 4 +-
drivers/event/dpaa2/meson.build | 2 +
drivers/mempool/dpaa2/Makefile | 2 +-
drivers/mempool/dpaa2/meson.build | 2 +
drivers/net/dpaa2/Makefile | 2 +-
drivers/net/dpaa2/base/dpaa2_hw_dpni_annot.h | 40 +
drivers/net/dpaa2/dpaa2_ethdev.c | 173 +++-
drivers/net/dpaa2/dpaa2_rxtx.c | 95 ++-
drivers/net/dpaa2/mc/dpni.c | 134 ++-
drivers/net/dpaa2/mc/fsl_dpkg.h | 71 +-
drivers/net/dpaa2/mc/fsl_dpni.h | 378 +++++----
drivers/net/dpaa2/mc/fsl_dpni_cmd.h | 87 +-
drivers/net/dpaa2/mc/fsl_net.h | 2 +-
drivers/net/dpaa2/meson.build | 2 +
drivers/raw/dpaa2_cmdif/Makefile | 2 +-
drivers/raw/dpaa2_cmdif/meson.build | 2 +
drivers/raw/dpaa2_qdma/Makefile | 2 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.c | 14 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.h | 6 +-
drivers/raw/dpaa2_qdma/meson.build | 2 +
51 files changed, 2374 insertions(+), 584 deletions(-)
create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h
--
2.17.1
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
@ 2018-10-12 9:42 0% ` Shreyansh Jain
2018-10-12 10:16 0% ` Thomas Monjalon
1 sibling, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-10-12 9:42 UTC (permalink / raw)
To: thomas; +Cc: dev, ferruh.yigit
On Friday 12 October 2018 03:02 PM, Shreyansh Jain wrote:
> On Wednesday 26 September 2018 11:34 PM, Shreyansh Jain wrote:
>> About the series:
>>
>> This series of patches upgrades the DPAA2 driver firmware to
>> v10.10.10 (MC Firmware).
>> As the bus/fslmc is modified, it is a dependent object for other
>> drivers like net/crypto/qdma. Also, the changes are mostly tightly
>> linked - thus, the patches include upgrade as well as sequential
>> changes to driver.
>> Once done, it would imply that DPAA2 driver won't work with any MC
>> FW lower than 10.10.10.
>>
>> Support for this new firmware is available in publicly available
>> LSDK (Layerscape SDK) release [1].
>>
>> Besides the FW change, there are other subtle changes as well:
>> - Support reading the MAC address from NIC device, rather than
>> using a default MAC
>> - Adding support for QBMan 5.0 FW APIs
>> - Some patches for NXP's LX2 platform specific features
>> - And some bug fixes.
>>
>> Dependency:
>>
>> * These patches are based on net-next/master 58c3b609699a8c
>> * Series [1] is logically related to this, but has no git/patch
>> related dependency. It is series for upgrade of DPAA.
>>
>> [1] https://lsdk.github.io/index.html
>> [2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
>>
>> Version History:
>> v1->v2:
>> - Bumped up the version of the libraries (pmd/bus/crypto/event) as the
>> first set of patches (MC firmware update) breaks the internal ABI
>> - Added support for ordered processing APIs. These APIs are expected
>> to be used in subsequent feature updates on DPAA2 ethernet driver.
>> - Some internal bug fixes.
>> (Patches increased from 11~15)
>>
>
> Hi Thomas,
>
> Would you be taking this series for RC1?
> (Ideally being driver code, this should have been with Ferruh but
> patchwork is showing your name).
Thomas,
I will send a v3; the v2 patches no longer apply because of some version
bumps done for the buses on master.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
@ 2018-10-12 9:32 0% ` Shreyansh Jain
2018-10-12 9:42 0% ` Shreyansh Jain
2018-10-12 10:16 0% ` Thomas Monjalon
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 " Shreyansh Jain
2 siblings, 2 replies; 200+ results
From: Shreyansh Jain @ 2018-10-12 9:32 UTC (permalink / raw)
To: thomas; +Cc: dev, ferruh.yigit
On Wednesday 26 September 2018 11:34 PM, Shreyansh Jain wrote:
> About the series:
>
> This series of patches upgrades the DPAA2 driver firmware to
> v10.10.10 (MC Firmware).
> As the bus/fslmc is modified, it is a dependent object for other
> drivers like net/crypto/qdma. Also, the changes are mostly tightly
> linked - thus, the patches include upgrade as well as sequential
> changes to driver.
> Once done, it would imply that DPAA2 driver won't work with any MC
> FW lower than 10.10.10.
>
> Support for this new firmware is available in publicly available
> LSDK (Layerscape SDK) release [1].
>
> Besides the FW change, there are other subtle changes as well:
> - Support reading the MAC address from NIC device, rather than
> using a default MAC
> - Adding support for QBMan 5.0 FW APIs
> - Some patches for NXP's LX2 platform specific features
> - And some bug fixes.
>
> Dependency:
>
> * These patches are based on net-next/master 58c3b609699a8c
> * Series [1] is logically related to this, but has no git/patch
> related dependency. It is series for upgrade of DPAA.
>
> [1] https://lsdk.github.io/index.html
> [2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
>
> Version History:
> v1->v2:
> - Bumped up the version of the libraries (pmd/bus/crypto/event) as the
> first set of patches (MC firmware update) breaks the internal ABI
> - Added support for ordered processing APIs. These APIs are expected
> to be used in subsequent feature updates on DPAA2 ethernet driver.
> - Some internal bug fixes.
> (Patches increased from 11~15)
>
Hi Thomas,
Would you be taking this series for RC1?
(Ideally being driver code, this should have been with Ferruh but
patchwork is showing your name).
-
Shreyansh
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes
@ 2018-10-11 14:20 4% Konstantin Ananyev
0 siblings, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-10-11 14:20 UTC (permalink / raw)
To: dev; +Cc: Konstantin Ananyev
Below are the details of, and reasoning for, the proposed changes.
1. rte_cryptodev_sym_session_init()/rte_cryptodev_sym_session_clear()
operate based on the cryptodev device id, though inside
rte_cryptodev_sym_session the device specific data is addressed
by driver id (not device id).
That creates a problem with the current implementation when
two or more devices with the same driver are used by the same session.
Consider the following example:
struct rte_cryptodev_sym_session *sess;
rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
rte_cryptodev_sym_session_clear(dev_id=X, sess);
After that point, if X and Y use the same driver,
then sess can't be used by device Y any more.
The reason is that the session holds driver specific (not device specific)
data, and there is no information about
how many device instances use that data.
Probably the simplest way to deal with that issue is to
add a reference counter for each driver's data.
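The reference counting idea in item 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the sketch_* names, the fixed SKETCH_MAX_DRIVERS bound and the way driver data is passed in are hypothetical and are not part of the cryptodev API.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_DRIVERS 4 /* hypothetical fixed bound, illustration only */

/* Hypothetical stand-in for the per-driver slot inside a session. */
struct sketch_drv_slot {
	void *data;      /* driver specific session data */
	uint16_t refcnt; /* number of devices of this driver using the data */
};

struct sketch_session {
	struct sketch_drv_slot slot[SKETCH_MAX_DRIVERS];
};

/* Init for one device: store data on first use, otherwise take a reference. */
static int
sketch_session_init(struct sketch_session *s, uint8_t drv_id, void *drv_data)
{
	if (s == NULL || drv_id >= SKETCH_MAX_DRIVERS)
		return -1;
	if (s->slot[drv_id].refcnt == UINT16_MAX)
		return -1; /* counter would overflow */
	if (s->slot[drv_id].refcnt == 0)
		s->slot[drv_id].data = drv_data;
	s->slot[drv_id].refcnt++;
	return 0;
}

/* Clear for one device: release the data only when the last user is gone. */
static int
sketch_session_clear(struct sketch_session *s, uint8_t drv_id)
{
	if (s == NULL || drv_id >= SKETCH_MAX_DRIVERS ||
			s->slot[drv_id].refcnt == 0)
		return -1;
	if (--s->slot[drv_id].refcnt == 0)
		s->slot[drv_id].data = NULL;
	return 0;
}
```

With this scheme, clearing the session for device X leaves the data in place as long as device Y (same driver) still holds a reference.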
2.rte_cryptodev_sym_session_set_user_data() and
rte_cryptodev_sym_session_get_user_data() -
with the current implementation there is no defined way for the user to
determine the max allowed size of the private data.
rte_cryptodev_sym_session_set_user_data() just blindly copies
user provided data without checking for memory boundary violations.
To overcome that issue, I propose to add 'uint16_t priv_size' into
the rte_cryptodev_sym_session structure.
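A bounded copy using such a priv_size field might look like the sketch below. The structure, the helper names and the choice of error codes are assumptions made for illustration; the real session layout and return values may differ.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical session header: size is recorded once at allocation time. */
struct sketch_sess_hdr {
	uint16_t priv_size; /* bytes reserved for user private data */
	uint8_t priv[];     /* user private data area */
};

/* Allocate a header plus priv_size bytes of private data. */
static struct sketch_sess_hdr *
sketch_sess_alloc(uint16_t priv_size)
{
	struct sketch_sess_hdr *sess = calloc(1, sizeof(*sess) + priv_size);

	if (sess != NULL)
		sess->priv_size = priv_size;
	return sess;
}

/* Copy user data only if it fits into the reserved area. */
static int
sketch_set_user_data(struct sketch_sess_hdr *sess, const void *data,
		uint16_t size)
{
	if (sess == NULL || (data == NULL && size != 0))
		return -EINVAL;
	if (size > sess->priv_size)
		return -ENOMEM;
	if (size != 0)
		memcpy(sess->priv, data, size);
	return 0;
}
```

The caller can also read priv_size directly to learn the maximum size it is allowed to store.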
3.rte_cryptodev_sym_session contains an array of variable size for
driver specific data.
Though number of elements in that array is determined by static
variable nb_drivers, that could be modified by
rte_cryptodev_allocate_driver().
That construction seems to work ok so far, as right now users register
all their PMDs at startup, though it doesn't mean that it would always
remain like that.
To make it less error prone, I propose to add 'uint16_t nb_drivers'
into the rte_cryptodev_sym_session structure.
At least that allows related functions to check that the provided
driver id wouldn't overrun the variable array boundaries;
it also allows determining the size of an already allocated session
without accessing a global variable.
4.#2 and #3 above imply that each struct rte_cryptodev_sym_session
would now have some read-only data (initialized once at allocation time,
kept unmodified through the session lifetime).
That requires more changes in the current cryptodev implementation:
Right now inside the cryptodev framework, rte_cryptodev_sym_session
and driver specific session data are two completely different structures
(e.g. struct cryptodev_sym_session and struct null_crypto_session).
However, the current cryptodev implementation implicitly assumes that the driver
will allocate both of them from within the same mempool.
Plus this is done in a manner that they override each other's fields
(reuse the same space - sort of an implicit C union).
That's probably not the best programming practice,
plus it makes it impossible to have read-only fields inside both of them.
To overcome that situation, I propose to change the API a bit, to allow
the use of two different mempools for these two distinct data structures.
5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
I suppose that is self-explanatory, and it might be used in a lot of places
(would be quite useful for the ipsec library we develop).
The new proposed layout for rte_cryptodev_sym_session:
struct rte_cryptodev_sym_session {
uint64_t userdata;
/**< Can be used for external metadata */
uint16_t nb_drivers;
/**< number of elements in sess_data array */
uint16_t priv_size;
/**< session private data will be placed after sess_data */
__extension__ struct {
void *data;
uint16_t refcnt;
} sess_data[0];
/**< Driver specific session material, variable size */
};
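Given the layout above, the size of a session and the validity of a driver id could be derived from the session itself, along these lines. The sketch_* names are hypothetical, and a C99 flexible array member stands in for the zero-length sess_data[0] array; this is a sketch of the idea, not the proposed implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the proposed per-driver element. */
struct sketch_sess_data {
	void *data;
	uint16_t refcnt;
};

/* Hypothetical mirror of the proposed session layout. */
struct sketch_sym_session {
	uint64_t userdata;
	uint16_t nb_drivers; /* number of elements in sess_data */
	uint16_t priv_size;  /* private data placed after sess_data */
	struct sketch_sess_data sess_data[]; /* flexible array member */
};

/* Total bytes needed: header + per-driver slots + private data. */
static size_t
sketch_session_size(uint16_t nb_drivers, uint16_t priv_size)
{
	return sizeof(struct sketch_sym_session) +
		(size_t)nb_drivers * sizeof(struct sketch_sess_data) +
		priv_size;
}

/* Bounds check that needs no global variable, only the session itself. */
static int
sketch_driver_id_valid(const struct sketch_sym_session *sess,
		uint8_t driver_id)
{
	return sess != NULL && driver_id < sess->nb_drivers;
}
```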
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
doc/guides/rel_notes/deprecation.rst | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index d2aec64d1..998a0d92c 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -74,3 +74,12 @@ Deprecation Notices
This is due to a lack of flexibility and reliance on a type unusable with
C++ programs (struct rte_flow_desc).
+
+* cryptodev: several API and ABI changes are planned for rte_cryptodev
+ in v19.02:
+
+ - The size and layout of ``rte_cryptodev_sym_session`` will change
+ to fix existing issues.
+ - The size and layout of ``rte_cryptodev_qp_conf`` and syntax of
+ ``rte_cryptodev_queue_pair_setup`` will change to allow the use of
+ two different mempools for crypto and device private sessions.
--
2.13.6
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions
2018-10-08 21:45 0% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Stephen Hemminger
@ 2018-10-11 12:10 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-11 12:10 UTC (permalink / raw)
To: dev
Cc: Stephen Hemminger, gaetan.rivet, ophirmu, qi.z.zhang,
ferruh.yigit, ktraynor
08/10/2018 23:45, Stephen Hemminger:
> On Sun, 7 Oct 2018 11:32:39 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
> > This is a follow-up of an idea presented at Dublin
> > during the "hotplug talk".
> >
> > Instead of changing the existing hotplug functions, as in the RFC,
> > some new experimental functions are added.
> > The old functions lose their experimental status in order to provide
> > a non-experimental replacement for deprecated attach/detach functions.
> >
> > It has been discussed briefly in the latest technical board meeting.
> >
> >
> > Changes in v6 - after Gaetan's review:
> > - bump ABI version of all buses (because of rte_device change)
> > - unroll snprintf loop in rte_eal_hotplug_add
> >
> > Changes in v5:
> > - rte_devargs_remove is fixed in case of null devargs (patch 2)
> > - a pointer to the bus is added in rte_device (patch 3)
> > - rte_dev_remove is fixed in case of no devargs (patch 5)
> >
> > Changes in v4 - after Andrew's review:
> > - add API changes in release notes (patches 1 & 2)
> > - fix memory leak in rte_eal_hotplug_add (patch 4)
> >
> > Change in v3:
> > - fix null dereferencing in error path (patch 2)
> >
> >
> > Thomas Monjalon (5):
> > devargs: remove deprecated functions
> > devargs: simplify parameters of removal function
> > eal: add bus pointer in device structure
> > eal: remove experimental flag of hotplug functions
> > eal: simplify parameters of hotplug functions
>
> I like these changes.
>
> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Applied
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
2018-10-10 8:56 0% ` Tu, Lijuan
@ 2018-10-11 9:26 0% ` Alejandro Lucero
0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-10-11 9:26 UTC (permalink / raw)
To: lijuan.tu; +Cc: dev
On Wed, Oct 10, 2018 at 10:00 AM Tu, Lijuan <lijuan.tu@intel.com> wrote:
> Hi
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alejandro Lucero
> > Sent: Friday, October 5, 2018 8:45 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking
> > memsegs IOVAs addresses
> >
> > A device can suffer addressing limitations. This function checks memsegs
> > have iovas within the supported range based on dma mask.
> >
> > PMDs should use this function during initialization if device suffers
> > addressing limitations, returning an error if this function returns
> memsegs
> > out of range.
> >
> > Another usage is for emulated IOMMU hardware with addressing limitations.
> >
> > It is necessary to save the most restricted dma mask for checking out
> > memory allocated dynamically after initialization.
> >
> > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> > Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> > doc/guides/rel_notes/release_18_11.rst | 10 ++++
> > lib/librte_eal/common/eal_common_memory.c | 60
> > +++++++++++++++++++++++
> > lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
> > lib/librte_eal/common/include/rte_memory.h | 3 ++
> > lib/librte_eal/common/malloc_heap.c | 12 +++++
> > lib/librte_eal/linuxapp/eal/eal.c | 2 +
> > lib/librte_eal/rte_eal_version.map | 1 +
> > 7 files changed, 91 insertions(+)
> >
> > diff --git a/doc/guides/rel_notes/release_18_11.rst
> > b/doc/guides/rel_notes/release_18_11.rst
> > index 2133a5b..c806dc6 100644
> > --- a/doc/guides/rel_notes/release_18_11.rst
> > +++ b/doc/guides/rel_notes/release_18_11.rst
> > @@ -104,6 +104,14 @@ New Features
> > the specified port. The port must be stopped before the command call
> in
> > order
> > to reconfigure queues.
> >
> > +* **Added check for ensuring allocated memory addressable by devices.**
> > +
> > + Some devices can have addressing limitations so a new function,
> > + ``rte_eal_check_dma_mask``, has been added for checking allocated
> > + memory is not out of the device range. Because now memory can be
> > + dynamically allocated after initialization, a dma mask is kept and
> > + any new allocated memory will be checked out against that dma mask
> > + and rejected if out of range. If more than one device has addressing
> > limitations, the dma mask is the more restricted one.
> >
> > API Changes
> > -----------
> > @@ -156,6 +164,8 @@ ABI Changes
> > ``rte_config`` structure on account of improving DPDK usability
> > when
> > using either ``--legacy-mem`` or ``--single-file-segments``
> flags.
> >
> > +* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more
> > restricted
> > + dma mask based on devices addressing limitations.
> >
> > Removed Items
> > -------------
> > diff --git a/lib/librte_eal/common/eal_common_memory.c
> > b/lib/librte_eal/common/eal_common_memory.c
> > index 0b69804..c482f0d 100644
> > --- a/lib/librte_eal/common/eal_common_memory.c
> > +++ b/lib/librte_eal/common/eal_common_memory.c
> > @@ -385,6 +385,66 @@ struct virtiova {
> > rte_memseg_walk(dump_memseg, f);
> > }
> >
> > +static int
> > +check_iova(const struct rte_memseg_list *msl __rte_unused,
> > + const struct rte_memseg *ms, void *arg) {
> > + uint64_t *mask = arg;
> > + rte_iova_t iova;
> > +
> > + /* higher address within segment */
> > + iova = (ms->iova + ms->len) - 1;
> > + if (!(iova & *mask))
> > + return 0;
> > +
> > + RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of
> > range\n",
> > + ms->iova, ms->len);
> > +
> > + RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
> > + return 1;
> > +}
> > +
> > +#if defined(RTE_ARCH_64)
> > +#define MAX_DMA_MASK_BITS 63
> > +#else
> > +#define MAX_DMA_MASK_BITS 31
> > +#endif
> > +
> > +/* check memseg iovas are within the required range based on dma mask
> > +*/ int __rte_experimental rte_eal_check_dma_mask(uint8_t maskbits) {
> > + struct rte_mem_config *mcfg =
> > rte_eal_get_configuration()->mem_config;
> > + uint64_t mask;
> > +
> > + /* sanity check */
> > + if (maskbits > MAX_DMA_MASK_BITS) {
> > + RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
> > + maskbits, MAX_DMA_MASK_BITS);
> > + return -1;
> > + }
> > +
> > + /* create dma mask */
> > + mask = ~((1ULL << maskbits) - 1);
> > +
> > + if (rte_memseg_walk(check_iova, &mask))
>
> [Lijuan]In my environment, testpmd halts at rte_memseg_walk() when
> maskbits is 0.
>
>
Can you explain this further?
Who is calling rte_eal_check_dma_mask with mask 0? Is this an x86_64 system?
The only explanation I can find is the IOMMU hardware reporting mgaw=0, which
I would say is completely wrong.
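For reference, the mask arithmetic from the quoted patch can be reproduced in isolation. The sketch below uses hypothetical sketch_* helpers rather than EAL functions and only shows what maskbits=0 means for the range check: the mask covers all 64 bits, so any segment whose highest IOVA is non-zero is reported as out of range. It does not explain why testpmd would halt.

```c
#include <stdint.h>

/*
 * Bits that a device limited to 'maskbits' address bits cannot reach.
 * Caller must keep maskbits <= 63, as the patch's sanity check does.
 */
static uint64_t
sketch_dma_mask(uint8_t maskbits)
{
	return ~((1ULL << maskbits) - 1);
}

/* Same test as check_iova(): look at the highest address in the segment. */
static int
sketch_iova_out_of_range(uint64_t iova, uint64_t len, uint8_t maskbits)
{
	uint64_t last = (iova + len) - 1;

	return (last & sketch_dma_mask(maskbits)) != 0;
}
```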
> > + /*
> > + * Dma mask precludes hugepage usage.
> > + * This device can not be used and we do not need to keep
> > + * the dma mask.
> > + */
> > + return 1;
> > +
> > + /*
> > + * we need to keep the more restricted maskbit for checking
> > + * potential dynamic memory allocation in the future.
> > + */
> > + mcfg->dma_maskbits = mcfg->dma_maskbits == 0 ? maskbits :
> > + RTE_MIN(mcfg->dma_maskbits, maskbits);
> > +
> > + return 0;
> > +}
> > +
> > /* return the number of memory channels */ unsigned
> > rte_memory_get_nchannel(void) { diff --git
> > a/lib/librte_eal/common/include/rte_eal_memconfig.h
> > b/lib/librte_eal/common/include/rte_eal_memconfig.h
> > index 62a21c2..b5dff70 100644
> > --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> > +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> > @@ -81,6 +81,9 @@ struct rte_mem_config {
> > /* legacy mem and single file segments options are shared */
> > uint32_t legacy_mem;
> > uint32_t single_file_segments;
> > +
> > + /* keeps the more restricted dma mask */
> > + uint8_t dma_maskbits;
> > } __attribute__((__packed__));
> >
> >
> > diff --git a/lib/librte_eal/common/include/rte_memory.h
> > b/lib/librte_eal/common/include/rte_memory.h
> > index 14bd277..c349d6c 100644
> > --- a/lib/librte_eal/common/include/rte_memory.h
> > +++ b/lib/librte_eal/common/include/rte_memory.h
> > @@ -454,6 +454,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct
> > rte_memseg_list *msl,
> > */
> > unsigned rte_memory_get_nrank(void);
> >
> > +/* check memsegs iovas are within a range based on dma mask */ int
> > +rte_eal_check_dma_mask(uint8_t maskbits);
> > +
> > /**
> > * Drivers based on uio will not load unless physical
> > * addresses are obtainable. It is only possible to get diff --git
> > a/lib/librte_eal/common/malloc_heap.c
> > b/lib/librte_eal/common/malloc_heap.c
> > index ac7bbb3..3b5b2b6 100644
> > --- a/lib/librte_eal/common/malloc_heap.c
> > +++ b/lib/librte_eal/common/malloc_heap.c
> > @@ -259,11 +259,13 @@ struct malloc_elem *
> > int socket, unsigned int flags, size_t align, size_t bound,
> > bool contig, struct rte_memseg **ms, int n_segs) {
> > + struct rte_mem_config *mcfg =
> > rte_eal_get_configuration()->mem_config;
> > struct rte_memseg_list *msl;
> > struct malloc_elem *elem = NULL;
> > size_t alloc_sz;
> > int allocd_pages;
> > void *ret, *map_addr;
> > + uint64_t mask;
> >
> > alloc_sz = (size_t)pg_sz * n_segs;
> >
> > @@ -291,6 +293,16 @@ struct malloc_elem *
> > goto fail;
> > }
> >
> > + if (mcfg->dma_maskbits) {
> > + mask = ~((1ULL << mcfg->dma_maskbits) - 1);
> > + if (rte_eal_check_dma_mask(mask)) {
> > + RTE_LOG(ERR, EAL,
> > + "%s(): couldn't allocate memory due to DMA
> mask\n",
> > + __func__);
> > + goto fail;
> > + }
> > + }
> > +
> > /* add newly minted memsegs to malloc heap */
> > elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
> >
> > diff --git a/lib/librte_eal/linuxapp/eal/eal.c
> > b/lib/librte_eal/linuxapp/eal/eal.c
> > index 4a55d3b..dfe1b8c 100644
> > --- a/lib/librte_eal/linuxapp/eal/eal.c
> > +++ b/lib/librte_eal/linuxapp/eal/eal.c
> > @@ -263,6 +263,8 @@ enum rte_iova_mode
> > * processes could later map the config into this exact location */
> > rte_config.mem_config->mem_cfg_addr = (uintptr_t)
> > rte_mem_cfg_addr;
> >
> > + rte_config.mem_config->dma_maskbits = 0;
> > +
> > }
> >
> > /* attach to an existing shared memory config */ diff --git
> > a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> > index 73282bb..2baefce 100644
> > --- a/lib/librte_eal/rte_eal_version.map
> > +++ b/lib/librte_eal/rte_eal_version.map
> > @@ -291,6 +291,7 @@ EXPERIMENTAL {
> > rte_devargs_parsef;
> > rte_devargs_remove;
> > rte_devargs_type_count;
> > + rte_eal_check_dma_mask;
> > rte_eal_cleanup;
> > rte_eal_hotplug_add;
> > rte_eal_hotplug_remove;
> > --
> > 1.9.1
>
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
2018-10-10 16:10 0% ` Adrien Mazarguil
@ 2018-10-11 8:48 0% ` Ori Kam
0 siblings, 0 replies; 200+ results
From: Ori Kam @ 2018-10-11 8:48 UTC (permalink / raw)
To: Adrien Mazarguil
Cc: Andrew Rybchenko, Ferruh Yigit, stephen, Declan Doherty, dev,
Dekel Peled, Thomas Monjalon, Nélio Laranjeiro,
Yongseok Koh, Shahaf Shuler, Ori Kam
Hi Adrien,
Thanks for your comments; please see my answers below and inline.
Due to a very short time limit and the fact that we have more than
4 patches that are based on this, we need to close it fast.
As I see it, there are a number of options:
* The old approach, which neither of us likes, and which means that for
every tunnel we create a new command.
* My proposed suggestion as is, which is easier for at least a number of applications
to implement and faster in most cases.
* My suggestion with a different name, but then we also need to find a name
for the decap and a name for decap_l3. This approach is also problematic
since we would have 2 APIs that do the same thing. For example, in test-pmd encap
vxlan, which API shall we use?
* A combination of my suggestion and the current one, replacing the raw
buffer with a list of items. Less code duplication and easier validation (though I
don't think we need to validate the encap data), but we lose insertion rate.
* Your suggestion of a list of actions where each action is one item. The main problems
are speed, complexity on the application side, and time to implement.
> -----Original Message-----
> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Sent: Wednesday, October 10, 2018 7:10 PM
> To: Ori Kam <orika@mellanox.com>
> Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; stephen@networkplumber.org; Declan Doherty
> <declan.doherty@intel.com>; dev@dpdk.org; Dekel Peled
> <dekelp@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>; Nélio
> Laranjeiro <nelio.laranjeiro@6wind.com>; Yongseok Koh
> <yskoh@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation
> actions
>
> On Wed, Oct 10, 2018 at 01:17:01PM +0000, Ori Kam wrote:
> <snip>
> > > -----Original Message-----
> > > From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> <snip>
> > > On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
> > > <snip>
> > > > > On 10/7/2018 1:57 PM, Ori Kam wrote:
> <snip>
> > > > > In addition the parameter to the encap action is a list of rte items,
> > > > > this results in 2 extra translations, between the application to the action
> > > > > and from the action to the NIC. This results in negative impact on the
> > > > > insertion performance.
> > >
> > > Not sure it's a valid concern since in this proposal, PMD is still expected
> > > to interpret the opaque buffer contents regardless for validation and to
> > > convert it to its internal format.
> > >
> > This is the action to take, we should assume
> > that the pattern is valid and not parse it at all.
> > Another issue, we have a lot of complains about the time we take
> > for validation, I know that currently we must validate the rule when creating
> it,
> > but this can change, why should a rule that was validate and the only change
> > is the IP dest of the encap data?
> > virtual switch after creating the first flow are just modifying it so why force
> > them into revalidating it? (but this issue is a different topic)
>
> Did you measure what proportion of time is spent on validation when creating
> a flow rule?
>
> Based on past experience with mlx4/mlx5, creation used to involve a number
> of expensive system calls while validation was basically a single logic loop
> checking individual items/actions while performing conversion to HW
> format (mandatory for creation). Context switches related to kernel
> involvement are the true performance killers.
>
I'm sorry to say I don't have the numbers, but I can tell you
that in the new API in most cases there will be just one system call.
In addition, any extra time is wasted time; again, this is a request we got from a number
of customers.
> I'm not sure this is a valid argument in favor of this approach since flow
> rule validation still needs to happen regardless.
>
> By the way, applications are not supposed to call rte_flow_validate() before
> rte_flow_create(). The former can be helpful in some cases (e.g. to get a
> rough idea of PMD capabilities during initialization) but they should in
> practice only rely on rte_flow_create(), then fall back to software
> processing if that fails.
>
First, I don't think we need to validate the encapsulation data; if the data is wrong,
then the packet will be dropped. Just as you are saying with the restriction
of the flow items, it is the responsibility of the application.
Also, as I said, there is demand from customers and there is no reason not to do it,
but in any case this is not relevant for the current patch.
> > > Worse, it will require a packet parser to iterate over enclosed headers
> > > instead of a list of convenient rte_flow_whatever objects. It won't be
> > > faster without the convenience of pointers to properly aligned structures
> > > that only contain relevant data fields.
> > >
> > Also in the rte_item we are not aligned so there is no difference in
> performance,
> > between the two approaches, In the rte_item actually we have unused
> pointer which
> > are just a waste.
>
> Regarding unused pointers: right, VXLAN/NVGRE encap actions shouldn't have
> relied on _pattern item_ structures, the room for their "last" pointer is
> arguably wasted. On the other hand, the "mask" pointer allows masking
> relevant fields that matter to the application (e.g. source/destination
> addresses as opposed to IPv4 length, version and other irrelevant fields for
> encap).
>
At least according to my testing, the NIC can't use masks, and it works based
on the offloads configured for any packet (like checksum).
> Not sure why you think it's not aligned. We're comparing an array of
> rte_flow_item objects with raw packet data. The latter requires
> interpretation of each protocol header to jump to the next offset. This is
> more complex on both sides: to build such a buffer for the application, then
> to have it processed by the PMD.
>
Maybe I am missing something, but in the buffer approach all the data will likely be in the
cache and, if allocated, will also be aligned. On the other hand, the rte_items
are not guaranteed to be in the same cache line, so each access to an item may result
in a cache miss. Also, accessing individual members is just like accessing them in
a raw buffer.
> > Also needs to consider how application are using it. They are already have it
> in raw buffer
> > so it saves the conversation time for the application.
>
> I don't think so. Applications typically know where some traffic is supposed
> to go and what VNI it should use. They don't have a prefabricated packet
> handy to prepend to outgoing traffic. If that was the case they'd most
> likely do so themselves through a extra packet segment and not bother with
> PMD offloads.
>
Contrail V-Router has such a buffer and it just changes the specific fields.
This is one of the things we want to offload; as of my last check, OVS also uses
such a buffer.
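To illustrate what "just changes the specific fields" could mean for a prebuilt encapsulation header, here is a minimal sketch. It assumes an untagged Ethernet header followed by an IPv4 header without options; the sketch_* names and constants are hypothetical, and the IPv4 checksum is deliberately left untouched (the assumption being that it is fixed up elsewhere, e.g. by offloads).

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_HDR_LEN  14 /* untagged Ethernet header */
#define SKETCH_IPV4_DST_OFF 16 /* destination address offset in IPv4 header */
#define SKETCH_ENCAP_LEN    50 /* eth 14 + ipv4 20 + udp 8 + vxlan 8 */

/* Overwrite only the outer IPv4 destination in a prebuilt header buffer. */
static int
sketch_patch_ipv4_dst(uint8_t *buf, size_t len, const uint8_t dst[4])
{
	size_t off = SKETCH_ETH_HDR_LEN + SKETCH_IPV4_DST_OFF;

	if (buf == NULL || dst == NULL || len < off + 4)
		return -1;
	memcpy(buf + off, dst, 4);
	return 0;
}
```

An application holding such a template would patch it per flow and hand the whole buffer to the encap action, instead of building a list of items each time.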
> <snip>
> > > From a usability standpoint I'm not a fan of the current interface to
> > > perform NVGRE/VXLAN encap, however this proposal adds another layer of
> > > opaqueness in the name of making things more generic than rte_flow
> already
> > > is.
> > >
> > I'm sorry but I don't understand why it is more opaqueness, as I see it is very
> simple
> > just give the encapsulation data and that's it. For example on system that
> support number of
> > encapsulations they don't need to call to a different function just to change
> the buffer.
>
> I'm saying it's opaque from an API standpoint if you expect the PMD to
> interpret that buffer's contents in order to prepend it in a smart way.
>
> Since this generic encap does not support masks, there is no way for an
> application to at least tell a PMD what data matters and what doesn't in the
> provided buffer. This means invalid checksums, lengths and so on must be
> sent as is to the wire. What's the use case for such a behavior?
>
The NIC treats the packet as a normal packet that goes through all normal offloading.
> > > Assuming they are not to be interpreted by PMDs, maybe there's a case for
> > > prepending arbitrary buffers to outgoing traffic and removing them from
> > > incoming traffic. However this feature should not be named "generic tunnel
> > > encap/decap" as it's misleading.
> > >
> > > Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be
> > > more
> > > appropriate. I think on the "pop" side, only the size would matter.
> > >
> > Maybe the name can be change but again the application does encapsulation
> so it will
> > be more intuitive for it.
> >
> > > Another problem is that you must not require actions to rely on specific
> > > pattern content:
> > >
> > I don't think this can be true anymore since for example what do you expect
> > to happen when you place an action for example modify ip to packet with no
> ip?
> >
> > This may raise issues in the NIC.
> > Same goes for decap: after the flow is in the NIC it must assume that it can
> > remove, otherwise really unexpected behavior can occur.
>
> Right, that's why it must be documented as undefined behavior. The API is
> not supposed to enforce the relationship. A PMD may require the presence of
> some pattern item in order to perform some action, but this is a PMD
> limitation, not a limitation of the API itself.
>
Agree
> <snip>
> > For maximum flexibility, all actions should be usable on their own on empty
> > > pattern. On the other hand, you can document undefined behavior when
> > > performing some action on traffic that doesn't contain something.
> > >
> >
> > Like I said and like it is already defined for VXLAN_encap we must know
> > the pattern, otherwise the rule can be declined in Kernel / crash when trying
> > to decap a packet without outer tunnel.
>
> Right, PMD limitation then. You are free to document it in the PMD.
>
Agree
> <snip>
> > > My opinion is that the best generic approach to perform encap/decap with
> > > rte_flow would use one dedicated action per protocol header to
> > > add/remove/modify. This is the suggestion I originally made for
> > > VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
> > > matters [3].
> >
> > I agree that your approach make a lot of sense, but there are number of
> issues with it
> > * it is harder and takes more time from the application point of view.
> > * it is slower when compared to the raw buffer.
>
> I'm convinced of the opposite :) We could try to implement your raw buffer
> approach as well as mine in testpmd (one action per layer, not the current
> VXLAN/NVGRE encap mess mind you) in order to determine which is the most
> convenient on the application side.
>
There are 2 different implementations: one for test-pmd and one for a normal application.
Writing the code in test-pmd with a raw buffer is simpler but less flexible;
writing the code in a real application is, I think, simpler with the buffer approach,
since they already have a buffer.
> <snip>
> > > Except for raw push/pop of uninterpreted headers, tunnel encapsulations
> not
> > > explicitly supported by rte_flow shouldn't be possible. Who will expect
> > > something that isn't defined by the API to work and rely on it in their
> > > application? I don't see it happening.
> > >
> > Some of our customers are working with private tunnel type, and they can
> configure it using kernel
> > or just new FW this is a real use case.
>
> You can already use negative types to quickly address HW and
> customer-specific needs by the way. Could this [6] perhaps address the
> issue?
>
> PMDs can expose public APIs. You could devise one that spits new negative
> item/action types based on some data, to be subsequently used by flow
> rules with that PMD only.
>
> > > Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
> > > that the only alternative is a generic API to work around me :)
> > >
> >
> > Yes, but like I said, when a customer asks for an encap and I can give it to him,
> > why wait for the next DPDK release?
>
> I don't know, is rte_flow held to a special standard compared to other DPDK
> features in this regard? Engineering patches can always be provided,
> backported and whatnot.
>
> Customer applications will have to be modified and recompiled to benefit
> from any new FW capabilities regardless, it's extremely unlikely to be just
> a matter of installing a new FW image.
>
In some cases this is what happens 😊
> <snip>
> > > Pattern does not necessarily match the full stack of outer layers.
> > >
> > > Decap action must be able to determine what to do on its own, possibly in
> > > conjunction with other actions in the list but that's all.
> > >
> > Decap removes the outer headers.
> > Some tunnels don't have inner L2 and it must be added after the decap
> > this is what L3 decap means, and the user must supply the valid L2 header.
>
> My point is that any data required to perform decap must be provided by the
> decap action itself, not through a pattern item, whose only purpose is to
> filter traffic and may not be present. Precisely what you did for L3 decap.
>
Agree, we remove the limitation and just say unpredictable results may occur.
> <snip>
> > > > I think the reasons I gave are very good motivation to change the
> approach
> > > > please also consider that there is no implementation yet that supports the
> > > > old approach.
> > >
> > > Well, although the existing API made this painful, I did submit one [4] and
> > > there's an updated version from Slava [5] for mlx5.
> > >
> > > > while we do have code that uses the new approach.
> > >
> > > If you need the ability to prepend a raw buffer, please consider a different
> > > name for the related actions, redefine them without reliance on specific
> > > pattern items and leave NVGRE/VXLAN encap/decap as is for the time
> > > being. They can deprecated anytime without ABI impact.
> > >
> > > On the other hand if that raw buffer is to be interpreted by the PMD for
> > > more intelligent tunnel encap/decap handling, I do not agree with the
> > > proposed approach for usability reasons.
>
> I'm still not convinced by your approach. If these new actions *must* be
> included unmodified right now to prevent some customer cataclysm, then fine
> as an experiment but please leave VXLAN/NVGRE encaps alone for the time
> being.
>
> > > [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
> > >     https://mails.dpdk.org/archives/dev/2018-April/096418.html
> > >
> > > [3] ethdev: alter behavior of flow API actions
> > >     https://git.dpdk.org/dpdk/commit/?id=cc17feb90413
> > >
> > > [4] net/mlx5: add VXLAN encap support to switch flow rules
> > >     https://mails.dpdk.org/archives/dev/2018-August/110598.html
> > >
> > > [5] net/mlx5: e-switch VXLAN flow validation routine
> > >     https://mails.dpdk.org/archives/dev/2018-October/113782.html
>
> [6] "9.2.9. Negative types"
>     http://doc.dpdk.org/guides-18.08/prog_guide/rte_flow.html#negative-types
>
> On an unrelated note, is there a way to prevent Outlook from mangling URLs
> on your side? (those emea01.safelinks things)
>
I will try to find a solution. I didn't find one so far.
> --
> Adrien Mazarguil
> 6WIND
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
2018-10-10 13:17 0% ` Ori Kam
@ 2018-10-10 16:10 0% ` Adrien Mazarguil
2018-10-11 8:48 0% ` Ori Kam
0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-10-10 16:10 UTC (permalink / raw)
To: Ori Kam
Cc: Andrew Rybchenko, Ferruh Yigit, stephen, Declan Doherty, dev,
Dekel Peled, Thomas Monjalon, Nélio Laranjeiro,
Yongseok Koh, Shahaf Shuler
On Wed, Oct 10, 2018 at 01:17:01PM +0000, Ori Kam wrote:
<snip>
> > -----Original Message-----
> > From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
<snip>
> > On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
> > <snip>
> > > > On 10/7/2018 1:57 PM, Ori Kam wrote:
<snip>
> > > > In addition, the parameter to the encap action is a list of rte items,
> > > > which results in 2 extra translations: from the application to the action
> > > > and from the action to the NIC. This has a negative impact on
> > > > insertion performance.
> >
> > Not sure it's a valid concern since in this proposal, PMD is still expected
> > to interpret the opaque buffer contents regardless for validation and to
> > convert it to its internal format.
> >
> This is the action to take; we should assume
> that the pattern is valid and not parse it at all.
> Another issue: we get a lot of complaints about the time we take
> for validation. I know that currently we must validate the rule when creating it,
> but this can change. Why should a rule be revalidated when it was already validated and the only change
> is the IP dest of the encap data?
> Virtual switches, after creating the first flow, are just modifying it, so why force
> them into revalidating it? (But this issue is a different topic.)
Did you measure what proportion of time is spent on validation when creating
a flow rule?
Based on past experience with mlx4/mlx5, creation used to involve a number
of expensive system calls while validation was basically a single logic loop
checking individual items/actions while performing conversion to HW
format (mandatory for creation). Context switches related to kernel
involvement are the true performance killers.
I'm not sure this is a valid argument in favor of this approach since flow
rule validation still needs to happen regardless.
By the way, applications are not supposed to call rte_flow_validate() before
rte_flow_create(). The former can be helpful in some cases (e.g. to get a
rough idea of PMD capabilities during initialization) but they should in
practice only rely on rte_flow_create(), then fall back to software
processing if that fails.
> > Worse, it will require a packet parser to iterate over enclosed headers
> > instead of a list of convenient rte_flow_whatever objects. It won't be
> > faster without the convenience of pointers to properly aligned structures
> > that only contain relevant data fields.
> >
> Also, in the rte_item we are not aligned, so there is no difference in performance
> between the two approaches. In the rte_item we actually have unused pointers, which
> are just a waste.
Regarding unused pointers: right, VXLAN/NVGRE encap actions shouldn't have
relied on _pattern item_ structures, the room for their "last" pointer is
arguably wasted. On the other hand, the "mask" pointer allows masking
relevant fields that matter to the application (e.g. source/destination
addresses as opposed to IPv4 length, version and other irrelevant fields for
encap).
Not sure why you think it's not aligned. We're comparing an array of
rte_flow_item objects with raw packet data. The latter requires
interpretation of each protocol header to jump to the next offset. This is
more complex on both sides: to build such a buffer for the application, then
to have it processed by the PMD.
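To make the comparison concrete, here is a minimal sketch (plain C, not DPDK code; every demo_* name is hypothetical, and the raw layout is assumed to be untagged Ethernet / IPv4 / UDP / VXLAN) of fetching the VNI from an item array versus from a raw header buffer:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item types, loosely modeled on the idea of a typed item list. */
enum demo_item_type {
	DEMO_ITEM_END,
	DEMO_ITEM_ETH,
	DEMO_ITEM_IPV4,
	DEMO_ITEM_UDP,
	DEMO_ITEM_VXLAN,
};

struct demo_item {
	enum demo_item_type type;
	const void *spec; /* points to a per-protocol structure, may be NULL */
};

/* VXLAN header layout: flags, 3 reserved bytes, 24-bit VNI, 1 reserved byte. */
struct demo_vxlan {
	uint8_t flags;
	uint8_t rsvd0[3];
	uint8_t vni[3];
	uint8_t rsvd1;
};

/* Item list: find the VXLAN entry by its type, no header parsing needed. */
static int demo_vni_from_items(const struct demo_item *items, uint32_t *vni)
{
	for (; items->type != DEMO_ITEM_END; items++) {
		if (items->type == DEMO_ITEM_VXLAN && items->spec != NULL) {
			const struct demo_vxlan *v = items->spec;

			*vni = (uint32_t)v->vni[0] << 16 |
			       (uint32_t)v->vni[1] << 8 | v->vni[2];
			return 0;
		}
	}
	return -1;
}

/* Raw buffer: each header must be interpreted to locate the next one. */
static int demo_vni_from_raw(const uint8_t *buf, size_t len, uint32_t *vni)
{
	size_t off = 14; /* untagged Ethernet header */
	size_t ihl;

	if (len < off || buf[12] != 0x08 || buf[13] != 0x00)
		return -1; /* too short or EtherType is not IPv4 */
	if (len < off + 20 || (buf[off] >> 4) != 4)
		return -1; /* too short or not IP version 4 */
	ihl = (size_t)(buf[off] & 0x0f) * 4; /* IPv4 header length in bytes */
	if (ihl < 20 || len < off + ihl || buf[off + 9] != 17)
		return -1; /* bad length or next protocol is not UDP */
	off += ihl;
	if (len < off + 8 + 8)
		return -1; /* need UDP (8 bytes) plus VXLAN (8 bytes) */
	off += 8; /* skip UDP, now at the VXLAN header */
	*vni = (uint32_t)buf[off + 4] << 16 |
	       (uint32_t)buf[off + 5] << 8 | buf[off + 6];
	return 0;
}
```

The raw variant already needs several bounds and protocol checks for a single fixed stack; VLAN tags, IPv6 or other tunnel types would each add branches of their own.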
> We also need to consider how applications are using it. They already have it in a raw buffer,
> so it saves the conversion time for the application.
I don't think so. Applications typically know where some traffic is supposed
to go and what VNI it should use. They don't have a prefabricated packet
handy to prepend to outgoing traffic. If that was the case they'd most
likely do so themselves through an extra packet segment and not bother with
PMD offloads.
<snip>
> > From a usability standpoint I'm not a fan of the current interface to
> > perform NVGRE/VXLAN encap, however this proposal adds another layer of
> > opaqueness in the name of making things more generic than rte_flow already
> > is.
> >
> I'm sorry but I don't understand why it is more opaque. As I see it, it is very simple:
> just give the encapsulation data and that's it. For example, on a system that supports a number of
> encapsulations, they don't need to call a different function just to change the buffer.
I'm saying it's opaque from an API standpoint if you expect the PMD to
interpret that buffer's contents in order to prepend it in a smart way.
Since this generic encap does not support masks, there is no way for an
application to at least tell a PMD what data matters and what doesn't in the
provided buffer. This means invalid checksums, lengths and so on must be
sent as is to the wire. What's the use case for such a behavior?
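As an illustration of what a mask buys (a sketch only, not how rte_flow or any PMD implements it; demo_apply_mask and its parameters are hypothetical names), bytes covered by the mask come from the application while the remaining bytes are left for the driver to fill in:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the header actually sent on the wire.
 * Bits set in mask are taken from the application's spec; all other
 * bits come from fill, i.e. values the driver computes itself
 * (lengths, checksums and so on).
 */
static void demo_apply_mask(uint8_t *out, const uint8_t *spec,
			    const uint8_t *mask, const uint8_t *fill,
			    size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		out[i] = (uint8_t)((spec[i] & mask[i]) |
				   (fill[i] & (uint8_t)~mask[i]));
}
```

Without a mask, every byte of the buffer carries the same weight, so the driver cannot tell a meaningful destination address from a placeholder length field.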
> > Assuming they are not to be interpreted by PMDs, maybe there's a case for
> > prepending arbitrary buffers to outgoing traffic and removing them from
> > incoming traffic. However this feature should not be named "generic tunnel
> > encap/decap" as it's misleading.
> >
> > Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be
> > more
> > appropriate. I think on the "pop" side, only the size would matter.
> >
> Maybe the name can be changed, but again the application does encapsulation, so it will
> be more intuitive for it.
>
> > Another problem is that you must not require actions to rely on specific
> > pattern content:
> >
> I don't think this can be true anymore. For example, what do you expect
> to happen when you place an action such as modify IP on a packet with no IP?
>
> This may raise issues in the NIC.
> The same goes for decap: once the flow is in the NIC, it must assume that it can remove the outer headers, otherwise
> really unexpected behavior can occur.
Right, that's why it must be documented as undefined behavior. The API is
not supposed to enforce the relationship. A PMD may require the presence of
some pattern item in order to perform some action, but this is a PMD
limitation, not a limitation of the API itself.
<snip>
> > For maximum flexibility, all actions should be usable on their own on empty
> > pattern. On the other hand, you can document undefined behavior when
> > performing some action on traffic that doesn't contain something.
> >
>
> Like I said, and as is already defined for VXLAN_ENCAP, we must know
> the pattern, otherwise the rule can be declined in the kernel, or crash when trying to decap
> a packet without an outer tunnel.
Right, PMD limitation then. You are free to document it in the PMD.
<snip>
> > My opinion is that the best generic approach to perform encap/decap with
> > rte_flow would use one dedicated action per protocol header to
> > add/remove/modify. This is the suggestion I originally made for
> > VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
> > matters [3].
>
> I agree that your approach makes a lot of sense, but there are a number of issues with it:
> * it is harder and takes more time from the application point of view.
> * it is slower when compared to the raw buffer.
I'm convinced of the opposite :) We could try to implement your raw buffer
approach as well as mine in testpmd (one action per layer, not the current
VXLAN/NVGRE encap mess mind you) in order to determine which is the most
convenient on the application side.
<snip>
> > Except for raw push/pop of uninterpreted headers, tunnel encapsulations not
> > explicitly supported by rte_flow shouldn't be possible. Who will expect
> > something that isn't defined by the API to work and rely on it in their
> > application? I don't see it happening.
> >
> Some of our customers are working with private tunnel types, and they can configure them using the kernel
> or just new FW. This is a real use case.
You can already use negative types to quickly address HW and
customer-specific needs by the way. Could this [6] perhaps address the
issue?
PMDs can expose public APIs. You could devise one that spits new negative
item/action types based on some data, to be subsequently used by flow
rules with that PMD only.
> > Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
> > that the only alternative is a generic API to work around me :)
> >
>
> Yes, but like I said, when a customer asks for an encap and I can give it to him, why wait for the next DPDK release?
I don't know, is rte_flow held to a special standard compared to other DPDK
features in this regard? Engineering patches can always be provided,
backported and whatnot.
Customer applications will have to be modified and recompiled to benefit
from any new FW capabilities regardless, it's extremely unlikely to be just
a matter of installing a new FW image.
<snip>
> > Pattern does not necessarily match the full stack of outer layers.
> >
> > Decap action must be able to determine what to do on its own, possibly in
> > conjunction with other actions in the list but that's all.
> >
> Decap removes the outer headers.
> Some tunnels don't have an inner L2 header, and it must be added after the decap.
> This is what L3 decap means, and the user must supply a valid L2 header.
My point is that any data required to perform decap must be provided by the
decap action itself, not through a pattern item, whose only purpose is to
filter traffic and may not be present. Precisely what you did for L3 decap.
<snip>
> > > I think the reasons I gave are very good motivation to change the approach
> > > please also consider that there is no implementation yet that supports the
> > > old approach.
> >
> > Well, although the existing API made this painful, I did submit one [4] and
> > there's an updated version from Slava [5] for mlx5.
> >
> > > while we do have code that uses the new approach.
> >
> > If you need the ability to prepend a raw buffer, please consider a different
> > name for the related actions, redefine them without reliance on specific
> > pattern items and leave NVGRE/VXLAN encap/decap as is for the time
> > being. They can be deprecated anytime without ABI impact.
> >
> > On the other hand if that raw buffer is to be interpreted by the PMD for
> > more intelligent tunnel encap/decap handling, I do not agree with the
> > proposed approach for usability reasons.
I'm still not convinced by your approach. If these new actions *must* be
included unmodified right now to prevent some customer cataclysm, then fine
as an experiment but please leave VXLAN/NVGRE encaps alone for the time
being.
> > [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
> >
> > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> > pdk.org%2Farchives%2Fdev%2F2018-
> > April%2F096418.html&data=02%7C01%7Corika%40mellanox.com%7C7b9
> > 9c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b%
> > 7C0%7C0%7C636747697489048905&sdata=prABlYixGAkdnyZ2cetpgz5%2F
> > vkMmiC66T3ZNE%2FewkQ4%3D&reserved=0
> >
> > [3] ethdev: alter behavior of flow API actions
> >
> > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.dpdk
> > .org%2Fdpdk%2Fcommit%2F%3Fid%3Dcc17feb90413&data=02%7C01%7C
> > orika%40mellanox.com%7C7b99c5f781424ba7950608d62ea83efa%7Ca652971
> > c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636747697489058915&sdata
> > =VavsHXeQ3SgMzaTlklBWdkKSEBjELMp9hwUHBlLQlVA%3D&reserved=0
> >
> > [4] net/mlx5: add VXLAN encap support to switch flow rules
> >
> > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> > pdk.org%2Farchives%2Fdev%2F2018-
> > August%2F110598.html&data=02%7C01%7Corika%40mellanox.com%7C7b
> > 99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b
> > %7C0%7C0%7C636747697489058915&sdata=lpfDWp9oBN8AFNYZ6VL5BjI
> > 38SDFt91iuU7pvhbC%2F0E%3D&reserved=0
> >
> > [5] net/mlx5: e-switch VXLAN flow validation routine
> >
> > https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> > pdk.org%2Farchives%2Fdev%2F2018-
> > October%2F113782.html&data=02%7C01%7Corika%40mellanox.com%7C7
> > b99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461
> > b%7C0%7C0%7C636747697489058915&sdata=8GCbYk6uB2ahZaHaqWX4
> > OOq%2B7ZLwxiApcs%2FyRAT9qOw%3D&reserved=0
[6] "9.2.9. Negative types"
http://doc.dpdk.org/guides-18.08/prog_guide/rte_flow.html#negative-types
On an unrelated note, is there a way to prevent Outlook from mangling URLs
on your side? (those emea01.safelinks things)
--
Adrien Mazarguil
6WIND
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
2018-10-10 12:02 2% ` Adrien Mazarguil
@ 2018-10-10 13:17 0% ` Ori Kam
2018-10-10 16:10 0% ` Adrien Mazarguil
0 siblings, 1 reply; 200+ results
From: Ori Kam @ 2018-10-10 13:17 UTC (permalink / raw)
To: Adrien Mazarguil
Cc: Andrew Rybchenko, Ferruh Yigit, stephen, Declan Doherty, dev,
Dekel Peled, Thomas Monjalon, Nélio Laranjeiro,
Yongseok Koh, Shahaf Shuler
Hi
PSB.
> -----Original Message-----
> From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Sent: Wednesday, October 10, 2018 3:02 PM
> To: Ori Kam <orika@mellanox.com>
> Cc: Andrew Rybchenko <arybchenko@solarflare.com>; Ferruh Yigit
> <ferruh.yigit@intel.com>; stephen@networkplumber.org; Declan Doherty
> <declan.doherty@intel.com>; dev@dpdk.org; Dekel Peled
> <dekelp@mellanox.com>; Thomas Monjalon <thomas@monjalon.net>; Nélio
> Laranjeiro <nelio.laranjeiro@6wind.com>; Yongseok Koh
> <yskoh@mellanox.com>; Shahaf Shuler <shahafs@mellanox.com>
> Subject: Re: [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation
> actions
>
> Sorry if I'm a bit late to the discussion, please see below.
>
> On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
> <snip>
> > > On 10/7/2018 1:57 PM, Ori Kam wrote:
> > > This series implements the generic L2/L3 tunnel encapsulation actions
> > > and is based on rfc [1] "add generic L2/L3 tunnel encapsulation actions"
> > >
> > > Currently the encap/decap actions only support encapsulation
> > > of VXLAN and NVGRE L2 packets (L2 encapsulation is where
> > > the inner packet has a valid Ethernet header, while L3 encapsulation
> > > is where the inner packet doesn't have the Ethernet header).
> > > In addition, the parameter to the encap action is a list of rte items,
> > > which results in 2 extra translations: from the application to the action
> > > and from the action to the NIC. This has a negative impact on
> > > insertion performance.
>
> Not sure it's a valid concern since in this proposal, PMD is still expected
> to interpret the opaque buffer contents regardless for validation and to
> convert it to its internal format.
>
This is the action to take; we should assume
that the pattern is valid and not parse it at all.
Another issue: we get a lot of complaints about the time we take
for validation. I know that currently we must validate the rule when creating it,
but this can change. Why should a rule be revalidated when it was already validated and the only change
is the IP dest of the encap data?
Virtual switches, after creating the first flow, are just modifying it, so why force
them into revalidating it? (But this issue is a different topic.)
> Worse, it will require a packet parser to iterate over enclosed headers
> instead of a list of convenient rte_flow_whatever objects. It won't be
> faster without the convenience of pointers to properly aligned structures
> that only contain relevant data fields.
>
Also, in the rte_item we are not aligned, so there is no difference in performance
between the two approaches. In the rte_item we actually have unused pointers, which
are just a waste.
We also need to consider how applications are using it. They already have it in a raw buffer,
so it saves the conversion time for the application.
> > > Looking forward, there is going to be a need to support many more tunnel
> > > encapsulations, for example MPLSoGRE and MPLSoUDP.
> > > Adding each new encapsulation will result in duplication of code.
> > > For example, the code for handling NVGRE and VXLAN is exactly the same,
> > > and each new tunnel will have the same exact structure.
> > >
> > > This series introduces a generic encapsulation for L2 tunnel types, and
> > > a generic encapsulation for L3 tunnel types. In addition, the new
> > > encapsulation commands use a raw buffer in order to save the
> > > conversion time, both for the application and the PMD.
>
> From a usability standpoint I'm not a fan of the current interface to
> perform NVGRE/VXLAN encap, however this proposal adds another layer of
> opaqueness in the name of making things more generic than rte_flow already
> is.
>
I'm sorry but I don't understand why it is more opaque. As I see it, it is very simple:
just give the encapsulation data and that's it. For example, on a system that supports a number of
encapsulations, they don't need to call a different function just to change the buffer.
> Assuming they are not to be interpreted by PMDs, maybe there's a case for
> prepending arbitrary buffers to outgoing traffic and removing them from
> incoming traffic. However this feature should not be named "generic tunnel
> encap/decap" as it's misleading.
>
> Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be
> more
> appropriate. I think on the "pop" side, only the size would matter.
>
Maybe the name can be changed, but again the application does encapsulation, so it will
be more intuitive for it.
> Another problem is that you must not require actions to rely on specific
> pattern content:
>
I don't think this can be true anymore. For example, what do you expect
to happen when you place an action such as modify IP on a packet with no IP?
This may raise issues in the NIC.
The same goes for decap: once the flow is in the NIC, it must assume that it can remove the outer headers, otherwise
really unexpected behavior can occur.
> [...]
> *
> * Decapsulate outer most tunnel from matched flow.
> *
> * The flow pattern must have a valid tunnel header
> */
> RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP,
>
> For maximum flexibility, all actions should be usable on their own on empty
> pattern. On the other hand, you can document undefined behavior when
> performing some action on traffic that doesn't contain something.
>
Like I said, and as is already defined for VXLAN_ENCAP, we must know
the pattern, otherwise the rule can be declined in the kernel, or crash when trying to decap
a packet without an outer tunnel.
> Reason is that invalid traffic may have already been removed by other flow
> rules (or whatever happens) before such a rule is reached; it's a user's
> responsibility to provide such guarantee.
>
> When parsing an action, a PMD is not supposed to look at the pattern. Action
> list should contain all the needed info, otherwise it means the API is badly
> defined.
>
> I'm aware the above makes it tough to implement something like
> RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP as defined in this series, but that's
> unfortunately why I think it must not be defined like that.
>
> My opinion is that the best generic approach to perform encap/decap with
> rte_flow would use one dedicated action per protocol header to
> add/remove/modify. This is the suggestion I originally made for
> VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
> matters [3].
I agree that your approach makes a lot of sense, but there are a number of issues with it:
* it is harder and takes more time from the application point of view.
* it is slower when compared to the raw buffer.
>
> Remember that whatever is provided, be it an opaque buffer (like you did), a
> separate list of items (like VXLAN/NVGRE) or the rte_flow action list itself
> (what I'm suggesting to do), PMDs have to process it. There will be a CPU
> cost. Keep in mind odd use cases that involve QinQinQinQinQ.
>
> > > I like the idea to generalize encap/decap actions. It makes it a bit harder
> > > for the reader to find which encap/decap actions are supported in fact,
> > > but it changes nothing for automated usage in the code - just try it
> > > (as a generic way used in rte_flow).
> > >
> >
> > Even now the user doesn't know which encapsulation is supported, since
> > it is PMD and sometimes kernel related. On the other hand, it simplifies adding
> > encapsulation for specific customers, sometimes with just a FW update.
>
> Except for raw push/pop of uninterpreted headers, tunnel encapsulations not
> explicitly supported by rte_flow shouldn't be possible. Who will expect
> something that isn't defined by the API to work and rely on it in their
> application? I don't see it happening.
>
Some of our customers are working with private tunnel types, and they can configure them using the kernel
or just new FW. This is a real use case.
> Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
> that the only alternative is a generic API to work around me :)
>
Yes, but like I said, when a customer asks for an encap and I can give it to him, why wait for the next DPDK release?
> > > Arguments about a way of encap/decap headers specification (flow items
> > > vs raw) sound sensible, but I'm not sure about it.
> > > It would be simpler if the tunnel header is added appended or removed
> > > as is, but as I understand it is not true. For example, IPv4 ID will be
> > > different in incoming packets to be decapsulated and different values
> > > should be used on encapsulation. Checksums will be different (but
> > > offloaded in any case).
> > >
> >
> > I'm not sure I understand your comment.
> > Decapsulation is independent of encapsulation, for example if we decap
> > L2 tunnel type then there is no parameter at all the NIC just removes
> > the outer layers.
>
> According to the pattern? As described above, you can't rely on that.
> Pattern does not necessarily match the full stack of outer layers.
>
> Decap action must be able to determine what to do on its own, possibly in
> conjunction with other actions in the list but that's all.
>
Decap removes the outer headers.
Some tunnels don't have an inner L2 header, and it must be added after the decap.
This is what L3 decap means, and the user must supply a valid L2 header.
> > > Current way allows to specify which fields do not matter and which one
> > > must match. It allows to say that, for example, VNI match is sufficient
> > > to decapsulate.
> > >
> >
> > The encapsulation according to definition, is a list of headers that should
> > encapsulate the packet. So I don't understand your comment about matching
> > fields. The matching is based on the flow and the encapsulation is just data
> > that should be added on top of the packet.
> >
> > > Also arguments assume that action input is accepted as is by the HW.
> > > It could be true, but could be obviously false and HW interface may
> > > require parsed input (i.e. driver must parse the input buffer and extract
> > > required fields of packet headers).
> > >
> >
> > You are correct, there are some PMDs, even Mellanox (for the E-Switch), that require
> > parsed input.
> > There is no driver that knows the rte_flow structure, so in any case there should be
> > some translation between the encapsulation data and the NIC data.
> > I agree that writing the code for translation can be harder in this approach,
> > but the code is only written once and the insertion speed is much higher this way.
>
> Avoiding code duplication is enough of a reason to do something. Yes NVGRE and
> VXLAN encap/decap should be redefined because of that. But IMO, they should
> prepend a single VXLAN or NVGRE header and be followed by other actions that
> in turn prepend a UDP header, an IPv4/IPv6 one, any number of VLAN headers
> and finally an Ethernet header.
>
> > Also, like I said, some virtual switches already store this data in a raw buffer
> > (they update only specific fields), so this will also save time for the application
> > when creating a rule.
> >
> > > So, I'd say no. It should be better motivated if we change existing
> > > approach (even advertised as experimental).
> >
> > I think the reasons I gave are very good motivation to change the approach
> > please also consider that there is no implementation yet that supports the
> > old approach.
>
> Well, although the existing API made this painful, I did submit one [4] and
> there's an updated version from Slava [5] for mlx5.
>
> > while we do have code that uses the new approach.
>
> If you need the ability to prepend a raw buffer, please consider a different
> name for the related actions, redefine them without reliance on specific
> pattern items and leave NVGRE/VXLAN encap/decap as is for the time
> being. They can be deprecated anytime without ABI impact.
>
> On the other hand if that raw buffer is to be interpreted by the PMD for
> more intelligent tunnel encap/decap handling, I do not agree with the
> proposed approach for usability reasons.
>
> [2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> pdk.org%2Farchives%2Fdev%2F2018-
> April%2F096418.html&data=02%7C01%7Corika%40mellanox.com%7C7b9
> 9c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b%
> 7C0%7C0%7C636747697489048905&sdata=prABlYixGAkdnyZ2cetpgz5%2F
> vkMmiC66T3ZNE%2FewkQ4%3D&reserved=0
>
> [3] ethdev: alter behavior of flow API actions
>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.dpdk
> .org%2Fdpdk%2Fcommit%2F%3Fid%3Dcc17feb90413&data=02%7C01%7C
> orika%40mellanox.com%7C7b99c5f781424ba7950608d62ea83efa%7Ca652971
> c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636747697489058915&sdata
> =VavsHXeQ3SgMzaTlklBWdkKSEBjELMp9hwUHBlLQlVA%3D&reserved=0
>
> [4] net/mlx5: add VXLAN encap support to switch flow rules
>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> pdk.org%2Farchives%2Fdev%2F2018-
> August%2F110598.html&data=02%7C01%7Corika%40mellanox.com%7C7b
> 99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461b
> %7C0%7C0%7C636747697489058915&sdata=lpfDWp9oBN8AFNYZ6VL5BjI
> 38SDFt91iuU7pvhbC%2F0E%3D&reserved=0
>
> [5] net/mlx5: e-switch VXLAN flow validation routine
>
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.d
> pdk.org%2Farchives%2Fdev%2F2018-
> October%2F113782.html&data=02%7C01%7Corika%40mellanox.com%7C7
> b99c5f781424ba7950608d62ea83efa%7Ca652971c7d2e4d9ba6a4d149256f461
> b%7C0%7C0%7C636747697489058915&sdata=8GCbYk6uB2ahZaHaqWX4
> OOq%2B7ZLwxiApcs%2FyRAT9qOw%3D&reserved=0
>
> --
> Adrien Mazarguil
> 6WIND
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions
@ 2018-10-10 12:02 2% ` Adrien Mazarguil
2018-10-10 13:17 0% ` Ori Kam
0 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-10-10 12:02 UTC (permalink / raw)
To: Ori Kam
Cc: Andrew Rybchenko, Ferruh Yigit, stephen, Declan Doherty, dev,
Dekel Peled, Thomas Monjalon, Nélio Laranjeiro,
Yongseok Koh, Shahaf Shuler
Sorry if I'm a bit late to the discussion, please see below.
On Wed, Oct 10, 2018 at 09:00:52AM +0000, Ori Kam wrote:
<snip>
> > On 10/7/2018 1:57 PM, Ori Kam wrote:
> > This series implements the generic L2/L3 tunnel encapsulation actions
> > and is based on rfc [1] "add generic L2/L3 tunnel encapsulation actions"
> >
> > Currently the encap/decap actions only support encapsulation
> > of VXLAN and NVGRE L2 packets (L2 encapsulation is where
> > the inner packet has a valid Ethernet header, while L3 encapsulation
> > is where the inner packet doesn't have the Ethernet header).
> > In addition, the parameter to the encap action is a list of rte items,
> > which results in 2 extra translations: from the application to the action
> > and from the action to the NIC. This has a negative impact on
> > insertion performance.
Not sure it's a valid concern since in this proposal, PMD is still expected
to interpret the opaque buffer contents regardless for validation and to
convert it to its internal format.
Worse, it will require a packet parser to iterate over enclosed headers
instead of a list of convenient rte_flow_whatever objects. It won't be
faster without the convenience of pointers to properly aligned structures
that only contain relevant data fields.
> > Looking forward, there is going to be a need to support many more tunnel
> > encapsulations, for example MPLSoGRE and MPLSoUDP.
> > Adding each new encapsulation will result in duplication of code.
> > For example, the code for handling NVGRE and VXLAN is exactly the same,
> > and each new tunnel will have the same exact structure.
> >
> > This series introduces a generic encapsulation for L2 tunnel types, and
> > a generic encapsulation for L3 tunnel types. In addition, the new
> > encapsulation commands use a raw buffer in order to save the
> > conversion time, both for the application and the PMD.
From a usability standpoint I'm not a fan of the current interface to
perform NVGRE/VXLAN encap, however this proposal adds another layer of
opaqueness in the name of making things more generic than rte_flow already
is.
Assuming they are not to be interpreted by PMDs, maybe there's a case for
prepending arbitrary buffers to outgoing traffic and removing them from
incoming traffic. However this feature should not be named "generic tunnel
encap/decap" as it's misleading.
Something like RTE_FLOW_ACTION_TYPE_HEADER_(PUSH|POP) would be more
appropriate. I think on the "pop" side, only the size would matter.
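A rough sketch of what such uninterpreted push/pop semantics could look like on a packet buffer (plain C for illustration only; struct demo_pkt and the demo_header_* functions are hypothetical and are not part of rte_flow or rte_mbuf):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet with headroom in front of the valid data. */
struct demo_pkt {
	uint8_t buf[256];
	size_t off; /* offset of the first valid byte in buf */
	size_t len; /* number of valid bytes starting at off */
};

/* Push: prepend hdr_len opaque bytes; fails if headroom is too small. */
static int demo_header_push(struct demo_pkt *p, const uint8_t *hdr,
			    size_t hdr_len)
{
	if (hdr_len > p->off)
		return -1;
	p->off -= hdr_len;
	p->len += hdr_len;
	memcpy(p->buf + p->off, hdr, hdr_len);
	return 0;
}

/* Pop: strip size bytes from the front; the contents are never examined. */
static int demo_header_pop(struct demo_pkt *p, size_t size)
{
	if (size > p->len)
		return -1;
	p->off += size;
	p->len -= size;
	return 0;
}
```

Note that the pop side takes nothing but a size, which is the point made above: once headers are treated as opaque bytes, there is nothing left to match or validate.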
Another problem is that you must not require actions to rely on specific
pattern content:
[...]
*
* Decapsulate outer most tunnel from matched flow.
*
* The flow pattern must have a valid tunnel header
*/
RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP,
For maximum flexibility, all actions should be usable on their own on empty
pattern. On the other hand, you can document undefined behavior when
performing some action on traffic that doesn't contain something.
Reason is that invalid traffic may have already been removed by other flow
rules (or whatever happens) before such a rule is reached; it's a user's
responsibility to provide such guarantee.
When parsing an action, a PMD is not supposed to look at the pattern. Action
list should contain all the needed info, otherwise it means the API is badly
defined.
I'm aware the above makes it tough to implement something like
RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP as defined in this series, but that's
unfortunately why I think it must not be defined like that.
My opinion is that the best generic approach to perform encap/decap with
rte_flow would use one dedicated action per protocol header to
add/remove/modify. This is the suggestion I originally made for
VXLAN/NVGRE [2] and this is one of the reasons the order of actions now
matters [3].
Remember that whatever is provided, be it an opaque buffer (like you did), a
separate list of items (like VXLAN/NVGRE) or the rte_flow action list itself
(what I'm suggesting to do), PMDs have to process it. There will be a CPU
cost. Keep in mind odd use cases that involve QinQinQinQinQ.
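For illustration, the "one action per protocol header" idea can be sketched as an ordered, END-terminated list that is walked once (plain C; the DEMO_ACTION_* names are hypothetical and do not exist in rte_flow; header sizes assume IPv4 without options and plain 802.1Q tags):

```c
#include <stddef.h>

/* Hypothetical per-header push actions, applied in list order. */
enum demo_action_type {
	DEMO_ACTION_END,
	DEMO_ACTION_PUSH_VXLAN,
	DEMO_ACTION_PUSH_UDP,
	DEMO_ACTION_PUSH_IPV4,
	DEMO_ACTION_PUSH_VLAN,
	DEMO_ACTION_PUSH_ETH,
};

/* Size in bytes of the header each action prepends. */
static size_t demo_header_size(enum demo_action_type type)
{
	switch (type) {
	case DEMO_ACTION_PUSH_VXLAN: return 8;
	case DEMO_ACTION_PUSH_UDP:   return 8;
	case DEMO_ACTION_PUSH_IPV4:  return 20;
	case DEMO_ACTION_PUSH_VLAN:  return 4;
	case DEMO_ACTION_PUSH_ETH:   return 14;
	default:                     return 0;
	}
}

/*
 * Walk the list in order (innermost header first, outermost last) and
 * return the total number of bytes prepended. The header count is
 * stored in n_headers when it is not NULL.
 */
static size_t demo_encap_overhead(const enum demo_action_type *actions,
				  size_t *n_headers)
{
	size_t total = 0;
	size_t count = 0;

	for (; *actions != DEMO_ACTION_END; actions++) {
		total += demo_header_size(*actions);
		count++;
	}
	if (n_headers != NULL)
		*n_headers = count;
	return total;
}
```

Stacked VLAN tags are simply the same action repeated, so a QinQ-style stack needs no dedicated action type; the processing cost grows with the length of the list.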
> > I like the idea to generalize encap/decap actions. It makes it a bit harder
> > for the reader to find which encap/decap actions are supported in fact,
> > but it changes nothing for automated usage in the code - just try it
> > (as a generic way used in rte_flow).
> >
>
> Even now the user doesn't know which encapsulation is supported, since
> it is PMD and sometimes kernel related. On the other hand, it simplifies adding
> encapsulation for specific customers, sometimes with just a FW update.
Except for raw push/pop of uninterpreted headers, tunnel encapsulations not
explicitly supported by rte_flow shouldn't be possible. Who will expect
something that isn't defined by the API to work and rely on it in their
application? I don't see it happening.
Come on, adding new encap/decap actions to DPDK shouldn't be such a pain
that the only alternative is a generic API to work around me :)
> > Arguments about a way of encap/decap headers specification (flow items
> > vs raw) sound sensible, but I'm not sure about it.
> > It would be simpler if the tunnel header is added appended or removed
> > as is, but as I understand it is not true. For example, IPv4 ID will be
> > different in incoming packets to be decapsulated and different values
> > should be used on encapsulation. Checksums will be different (but
> > offloaded in any case).
> >
>
> I'm not sure I understand your comment.
> Decapsulation is independent of encapsulation, for example if we decap
> L2 tunnel type then there is no parameter at all the NIC just removes
> the outer layers.
According to the pattern? As described above, you can't rely on that.
Pattern does not necessarily match the full stack of outer layers.
Decap action must be able to determine what to do on its own, possibly in
conjunction with other actions in the list but that's all.
> > Current way allows to specify which fields do not matter and which one
> > must match. It allows to say that, for example, VNI match is sufficient
> > to decapsulate.
> >
>
> The encapsulation according to definition, is a list of headers that should
> encapsulate the packet. So I don't understand your comment about matching
> fields. The matching is based on the flow and the encapsulation is just data
> that should be added on top of the packet.
>
> > Also arguments assume that action input is accepted as is by the HW.
> > It could be true, but could be obviously false and HW interface may
> > require parsed input (i.e. driver must parse the input buffer and extract
> > required fields of packet headers).
> >
>
> You are correct there some PMD even Mellanox (for the E-Switch) require to parsed input
> There is no driver that knows rte_flow structure so in any case there should be
> Some translation between the encapsulation data and the NIC data.
> I agree that writing the code for translation can be harder in this approach,
> but the code is only written once is the insertion speed is much higher this way.
Avoiding code duplication is enough of a reason to do something. Yes NVGRE and
VXLAN encap/decap should be redefined because of that. But IMO, they should
prepend a single VXLAN or NVGRE header and be followed by other actions that
in turn prepend a UDP header, an IPv4/IPv6 one, any number of VLAN headers
and finally an Ethernet header.
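To make the ordering concrete, here is a minimal self-contained sketch of
such a stacked action list. Every name in it (the sketch_* enum and
functions) is hypothetical and exists only for illustration; none of it is
part of rte_flow. It assumes fixed header sizes without IPv4 options or
IPv6 extension headers.

```c
#include <stddef.h>

/* Hypothetical per-header "push" actions; these are NOT rte_flow types. */
enum sketch_push {
	SKETCH_PUSH_VXLAN,
	SKETCH_PUSH_UDP,
	SKETCH_PUSH_IPV4,
	SKETCH_PUSH_IPV6,
	SKETCH_PUSH_VLAN,
	SKETCH_PUSH_ETH,
	SKETCH_PUSH_END,
};

/* Bytes prepended by each action (no IPv4 options, no IPv6 extensions). */
size_t
sketch_hdr_len(enum sketch_push a)
{
	switch (a) {
	case SKETCH_PUSH_VXLAN: return 8;
	case SKETCH_PUSH_UDP:   return 8;
	case SKETCH_PUSH_IPV4:  return 20;
	case SKETCH_PUSH_IPV6:  return 40;
	case SKETCH_PUSH_VLAN:  return 4;
	case SKETCH_PUSH_ETH:   return 14;
	default:                return 0;
	}
}

/*
 * Walk an END-terminated action list in order. Each action prepends one
 * header, so the last action ends up outermost on the wire.
 * Fills wire[] outermost-first, stores the header count in *wire_len and
 * returns the total encapsulation overhead in bytes (0 if wire[] is too
 * small).
 */
size_t
sketch_encap(const enum sketch_push *actions, enum sketch_push *wire,
	     size_t wire_max, size_t *wire_len)
{
	size_t n = 0, total = 0, i;

	while (actions[n] != SKETCH_PUSH_END)
		n++;
	if (n > wire_max)
		return 0;
	for (i = 0; i < n; i++) {
		wire[i] = actions[n - 1 - i];
		total += sketch_hdr_len(actions[i]);
	}
	*wire_len = n;
	return total;
}
```

With the list VXLAN, UDP, IPV4, VLAN, ETH, END the resulting wire order is
ETH / VLAN / IPV4 / UDP / VXLAN and the overhead is 54 bytes. The point is
only that the action list alone determines the result, without looking at
the pattern.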
> Also like I said some Virtual Switches are already store this data in raw buffer
> (they update only specific fields) so this will also save time for the application when
> creating a rule.
>
> > So, I'd say no. It should be better motivated if we change existing
> > approach (even advertised as experimental).
>
> I think the reasons I gave are very good motivation to change the approach
> please also consider that there is no implementation yet that supports the
> old approach.
Well, although the existing API made this painful, I did submit one [4] and
there's an updated version from Slava [5] for mlx5.
> while we do have code that uses the new approach.
If you need the ability to prepend a raw buffer, please consider a different
name for the related actions, redefine them without reliance on specific
pattern items and leave NVGRE/VXLAN encap/decap as is for the time
being. They can be deprecated anytime without ABI impact.
On the other hand if that raw buffer is to be interpreted by the PMD for
more intelligent tunnel encap/decap handling, I do not agree with the
proposed approach for usability reasons.
[2] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions
https://mails.dpdk.org/archives/dev/2018-April/096418.html
[3] ethdev: alter behavior of flow API actions
https://git.dpdk.org/dpdk/commit/?id=cc17feb90413
[4] net/mlx5: add VXLAN encap support to switch flow rules
https://mails.dpdk.org/archives/dev/2018-August/110598.html
[5] net/mlx5: e-switch VXLAN flow validation routine
https://mails.dpdk.org/archives/dev/2018-October/113782.html
--
Adrien Mazarguil
6WIND
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
2018-10-05 12:45 4% ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
@ 2018-10-10 8:56 0% ` Tu, Lijuan
2018-10-11 9:26 0% ` Alejandro Lucero
0 siblings, 1 reply; 200+ results
From: Tu, Lijuan @ 2018-10-10 8:56 UTC (permalink / raw)
To: Alejandro Lucero, dev
Hi
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Alejandro Lucero
> Sent: Friday, October 5, 2018 8:45 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v3 1/6] mem: add function for checking
> memsegs IOVAs addresses
>
> A device can suffer addressing limitations. This function checks memsegs
> have iovas within the supported range based on dma mask.
>
> PMDs should use this function during initialization if device suffers
> addressing limitations, returning an error if this function returns memsegs
> out of range.
>
> Another usage is for emulated IOMMU hardware with addressing limitations.
>
> It is necessary to save the most restricted dma mask for checking out
> memory allocated dynamically after initialization.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> doc/guides/rel_notes/release_18_11.rst | 10 ++++
> lib/librte_eal/common/eal_common_memory.c | 60
> +++++++++++++++++++++++
> lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
> lib/librte_eal/common/include/rte_memory.h | 3 ++
> lib/librte_eal/common/malloc_heap.c | 12 +++++
> lib/librte_eal/linuxapp/eal/eal.c | 2 +
> lib/librte_eal/rte_eal_version.map | 1 +
> 7 files changed, 91 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst
> b/doc/guides/rel_notes/release_18_11.rst
> index 2133a5b..c806dc6 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -104,6 +104,14 @@ New Features
> the specified port. The port must be stopped before the command call in
> order
> to reconfigure queues.
>
> +* **Added check for ensuring allocated memory addressable by devices.**
> +
> + Some devices can have addressing limitations so a new function,
> + ``rte_eal_check_dma_mask``, has been added for checking allocated
> + memory is not out of the device range. Because now memory can be
> + dynamically allocated after initialization, a dma mask is kept and
> + any new allocated memory will be checked out against that dma mask
> + and rejected if out of range. If more than one device has addressing
> limitations, the dma mask is the more restricted one.
>
> API Changes
> -----------
> @@ -156,6 +164,8 @@ ABI Changes
> ``rte_config`` structure on account of improving DPDK usability
> when
> using either ``--legacy-mem`` or ``--single-file-segments`` flags.
>
> +* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more
> restricted
> + dma mask based on devices addressing limitations.
>
> Removed Items
> -------------
> diff --git a/lib/librte_eal/common/eal_common_memory.c
> b/lib/librte_eal/common/eal_common_memory.c
> index 0b69804..c482f0d 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -385,6 +385,66 @@ struct virtiova {
> rte_memseg_walk(dump_memseg, f);
> }
>
> +static int
> +check_iova(const struct rte_memseg_list *msl __rte_unused,
> + const struct rte_memseg *ms, void *arg) {
> + uint64_t *mask = arg;
> + rte_iova_t iova;
> +
> + /* higher address within segment */
> + iova = (ms->iova + ms->len) - 1;
> + if (!(iova & *mask))
> + return 0;
> +
> + RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of
> range\n",
> + ms->iova, ms->len);
> +
> + RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
> + return 1;
> +}
> +
> +#if defined(RTE_ARCH_64)
> +#define MAX_DMA_MASK_BITS 63
> +#else
> +#define MAX_DMA_MASK_BITS 31
> +#endif
> +
> +/* check memseg iovas are within the required range based on dma mask
> +*/ int __rte_experimental rte_eal_check_dma_mask(uint8_t maskbits) {
> + struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> + uint64_t mask;
> +
> + /* sanity check */
> + if (maskbits > MAX_DMA_MASK_BITS) {
> + RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
> + maskbits, MAX_DMA_MASK_BITS);
> + return -1;
> + }
> +
> + /* create dma mask */
> + mask = ~((1ULL << maskbits) - 1);
> +
> + if (rte_memseg_walk(check_iova, &mask))
[Lijuan] In my environment, testpmd halts at rte_memseg_walk() when maskbits is 0.
> + /*
> + * Dma mask precludes hugepage usage.
> + * This device can not be used and we do not need to keep
> + * the dma mask.
> + */
> + return 1;
> +
> + /*
> + * we need to keep the more restricted maskbit for checking
> + * potential dynamic memory allocation in the future.
> + */
> + mcfg->dma_maskbits = mcfg->dma_maskbits == 0 ? maskbits :
> + RTE_MIN(mcfg->dma_maskbits, maskbits);
> +
> + return 0;
> +}
> +
> /* return the number of memory channels */ unsigned
> rte_memory_get_nchannel(void) { diff --git
> a/lib/librte_eal/common/include/rte_eal_memconfig.h
> b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index 62a21c2..b5dff70 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -81,6 +81,9 @@ struct rte_mem_config {
> /* legacy mem and single file segments options are shared */
> uint32_t legacy_mem;
> uint32_t single_file_segments;
> +
> + /* keeps the more restricted dma mask */
> + uint8_t dma_maskbits;
> } __attribute__((__packed__));
>
>
> diff --git a/lib/librte_eal/common/include/rte_memory.h
> b/lib/librte_eal/common/include/rte_memory.h
> index 14bd277..c349d6c 100644
> --- a/lib/librte_eal/common/include/rte_memory.h
> +++ b/lib/librte_eal/common/include/rte_memory.h
> @@ -454,6 +454,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct
> rte_memseg_list *msl,
> */
> unsigned rte_memory_get_nrank(void);
>
> +/* check memsegs iovas are within a range based on dma mask */ int
> +rte_eal_check_dma_mask(uint8_t maskbits);
> +
> /**
> * Drivers based on uio will not load unless physical
> * addresses are obtainable. It is only possible to get diff --git
> a/lib/librte_eal/common/malloc_heap.c
> b/lib/librte_eal/common/malloc_heap.c
> index ac7bbb3..3b5b2b6 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -259,11 +259,13 @@ struct malloc_elem *
> int socket, unsigned int flags, size_t align, size_t bound,
> bool contig, struct rte_memseg **ms, int n_segs) {
> + struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> struct rte_memseg_list *msl;
> struct malloc_elem *elem = NULL;
> size_t alloc_sz;
> int allocd_pages;
> void *ret, *map_addr;
> + uint64_t mask;
>
> alloc_sz = (size_t)pg_sz * n_segs;
>
> @@ -291,6 +293,16 @@ struct malloc_elem *
> goto fail;
> }
>
> + if (mcfg->dma_maskbits) {
> + mask = ~((1ULL << mcfg->dma_maskbits) - 1);
> + if (rte_eal_check_dma_mask(mask)) {
> + RTE_LOG(ERR, EAL,
> + "%s(): couldn't allocate memory due to DMA mask\n",
> + __func__);
> + goto fail;
> + }
> + }
> +
> /* add newly minted memsegs to malloc heap */
> elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal.c
> b/lib/librte_eal/linuxapp/eal/eal.c
> index 4a55d3b..dfe1b8c 100644
> --- a/lib/librte_eal/linuxapp/eal/eal.c
> +++ b/lib/librte_eal/linuxapp/eal/eal.c
> @@ -263,6 +263,8 @@ enum rte_iova_mode
> * processes could later map the config into this exact location */
> rte_config.mem_config->mem_cfg_addr = (uintptr_t)
> rte_mem_cfg_addr;
>
> + rte_config.mem_config->dma_maskbits = 0;
> +
> }
>
> /* attach to an existing shared memory config */ diff --git
> a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 73282bb..2baefce 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -291,6 +291,7 @@ EXPERIMENTAL {
> rte_devargs_parsef;
> rte_devargs_remove;
> rte_devargs_type_count;
> + rte_eal_check_dma_mask;
> rte_eal_cleanup;
> rte_eal_hotplug_add;
> rte_eal_hotplug_remove;
> --
> 1.9.1
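For reference, the mask arithmetic from the quoted rte_eal_check_dma_mask()
and check_iova() can be exercised in isolation. This is a standalone sketch,
not the patch code: the sketch_* names are made up, and it only reproduces
the two expressions `~((1ULL << maskbits) - 1)` and
`(iova + len - 1) & mask`. It says nothing about locking or anything else
inside rte_memseg_walk().

```c
#include <stdint.h>

/*
 * Same expression as in the quoted patch: every bit at or above
 * 'maskbits' is set. Callers must keep maskbits <= 63, as the quoted
 * sanity check does, to avoid an undefined shift.
 */
uint64_t
sketch_dma_mask(uint8_t maskbits)
{
	return ~((1ULL << maskbits) - 1);
}

/*
 * Mirrors the quoted check_iova(): returns 1 when the last byte of the
 * segment has any bit set above the addressable range, 0 otherwise.
 */
int
sketch_out_of_range(uint64_t iova, uint64_t len, uint8_t maskbits)
{
	uint64_t last = iova + len - 1;

	return (last & sketch_dma_mask(maskbits)) != 0;
}
```

Note that maskbits == 0 yields an all-ones mask, so with this arithmetic any
segment whose last byte is at a non-zero address is reported as out of
range.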
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v4 2/2] hash table: add an iterator over conflicting entries
@ 2018-10-09 19:29 2% ` Qiaobin Fu
0 siblings, 0 replies; 200+ results
From: Qiaobin Fu @ 2018-10-09 19:29 UTC (permalink / raw)
To: bruce.richardson, pablo.de.lara.guarch
Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
nd, honnappa.nagarahalli, yipeng1.wang, michel, qiaobinf
Function rte_hash_iterate_conflict_entries_with_hash() iterates
over the entries that conflict with an incoming entry.
Iterating over conflicting entries enables one to decide
if the incoming entry is more valuable than the entries already
in the hash table. This is particularly useful after
an insertion failure.
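A toy model of the idea, independent of librte_hash, may help reviewers.
All toy_* names are hypothetical, the secondary hash is a stand-in (not the
function the library uses), and there is no locking. It only illustrates
walking the primary bucket and then the secondary bucket of a signature
while skipping empty slots.

```c
#include <stdint.h>

#define TOY_BUCKET_ENTRIES 4	/* must be a power of two */
#define TOY_NUM_BUCKETS    8	/* must be a power of two */
#define TOY_EMPTY          0	/* slot value meaning "no entry" */

struct toy_table {
	/* 0 means empty, any other value is an entry id */
	uint32_t slot[TOY_NUM_BUCKETS][TOY_BUCKET_ENTRIES];
};

struct toy_conflict_iter {
	uint32_t vnext;		/* 0 .. 2 * TOY_BUCKET_ENTRIES */
	uint32_t primary;	/* primary bucket index */
	uint32_t secondary;	/* secondary bucket index */
};

/* Stand-in secondary hash: rotate by 16 bits. Illustration only. */
uint32_t
toy_secondary_hash(uint32_t sig)
{
	return (sig >> 16) | (sig << 16);
}

/*
 * Return 0 and store the next conflicting entry in *entry, or -1 when
 * both candidate buckets have been exhausted. The iterator state must be
 * zeroed before the first call.
 */
int
toy_iterate_conflicts(const struct toy_table *t, uint32_t sig,
		      struct toy_conflict_iter *it, uint32_t *entry)
{
	if (it->vnext == 0) {
		it->primary = sig & (TOY_NUM_BUCKETS - 1);
		it->secondary = toy_secondary_hash(sig) &
				(TOY_NUM_BUCKETS - 1);
	}

	while (it->vnext < 2 * TOY_BUCKET_ENTRIES) {
		uint32_t b = it->vnext < TOY_BUCKET_ENTRIES ?
				it->primary : it->secondary;
		uint32_t i = it->vnext & (TOY_BUCKET_ENTRIES - 1);

		it->vnext++;
		if (t->slot[b][i] == TOY_EMPTY)
			continue;
		*entry = t->slot[b][i];
		return 0;
	}
	return -1;
}
```

A caller that failed to insert a key with signature sig would zero a
toy_conflict_iter, loop until -1 is returned, and compare each returned
entry against the incoming one to pick a victim.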
v4:
* Fix the style issue
* Follow the ABI updates
v3:
* Make the rte_hash_iterate() API similar to
rte_hash_iterate_conflict_entries()
v2:
* Fix the style issue
* Make the API more universal
Signed-off-by: Qiaobin Fu <qiaobinf@bu.edu>
Reviewed-by: Cody Doucette <doucette@bu.edu>
Reviewed-by: Michel Machado <michel@digirati.com.br>
Reviewed-by: Keith Wiles <keith.wiles@intel.com>
Reviewed-by: Yipeng Wang <yipeng1.wang@intel.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Gaëtan Rivet <gaetan.rivet@6wind.com>
---
MAINTAINERS | 2 +-
lib/librte_hash/rte_cuckoo_hash.c | 134 ++++++++++++++++++++++++++-
lib/librte_hash/rte_cuckoo_hash.h | 11 +++
lib/librte_hash/rte_hash.h | 71 +++++++++++++-
lib/librte_hash/rte_hash_version.map | 14 +++
test/test/test_hash.c | 6 +-
test/test/test_hash_multiwriter.c | 8 +-
test/test/test_hash_readwrite.c | 14 ++-
8 files changed, 246 insertions(+), 14 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index 9fd258fad..e8c81656f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1055,7 +1055,7 @@ F: test/test/test_efd*
F: examples/server_node_efd/
F: doc/guides/sample_app_ug/server_node_efd.rst
-Hashes
+Hashes - EXPERIMENTAL
M: Bruce Richardson <bruce.richardson@intel.com>
M: Pablo de Lara <pablo.de.lara.guarch@intel.com>
F: lib/librte_hash/
diff --git a/lib/librte_hash/rte_cuckoo_hash.c b/lib/librte_hash/rte_cuckoo_hash.c
index a3e76684d..439251a7f 100644
--- a/lib/librte_hash/rte_cuckoo_hash.c
+++ b/lib/librte_hash/rte_cuckoo_hash.c
@@ -1301,7 +1301,10 @@ rte_hash_lookup_bulk_data(const struct rte_hash *h, const void **keys,
}
int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next)
+rte_hash_iterate_v1808(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ uint32_t *next)
{
uint32_t bucket_idx, idx, position;
struct rte_hash_key *next_key;
@@ -1344,3 +1347,132 @@ rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32
return position - 1;
}
+VERSION_SYMBOL(rte_hash_iterate, _v1808, 18.08);
+
+int32_t
+rte_hash_iterate_v1811(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ struct rte_hash_iterator_state *state)
+{
+ struct rte_hash_iterator_priv *it;
+ uint32_t bucket_idx, idx, position;
+ struct rte_hash_key *next_key;
+
+ RETURN_IF_TRUE(((h == NULL) || (key == NULL) ||
+ (data == NULL) || (state == NULL)), -EINVAL);
+
+ RTE_BUILD_BUG_ON(sizeof(struct rte_hash_iterator_priv) >
+ sizeof(struct rte_hash_iterator_state));
+
+ it = (struct rte_hash_iterator_priv *)state;
+ if (it->next == 0)
+ it->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
+
+ /* Out of bounds */
+ if (it->next >= it->total_entries)
+ return -ENOENT;
+
+ /* Calculate bucket and index of current iterator */
+ bucket_idx = it->next / RTE_HASH_BUCKET_ENTRIES;
+ idx = it->next % RTE_HASH_BUCKET_ENTRIES;
+
+ __hash_rw_reader_lock(h);
+ /* If current position is empty, go to the next one */
+ while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
+ it->next++;
+ /* End of table */
+ if (it->next == it->total_entries) {
+ __hash_rw_reader_unlock(h);
+ return -ENOENT;
+ }
+ bucket_idx = it->next / RTE_HASH_BUCKET_ENTRIES;
+ idx = it->next % RTE_HASH_BUCKET_ENTRIES;
+ }
+ /* Get position of entry in key table */
+ position = h->buckets[bucket_idx].key_idx[idx];
+ next_key = (struct rte_hash_key *) ((char *)h->key_store +
+ position * h->key_entry_size);
+ /* Return key and data */
+ *key = next_key->key;
+ *data = next_key->pdata;
+
+ __hash_rw_reader_unlock(h);
+
+ /* Increment iterator */
+ it->next++;
+
+ return position - 1;
+}
+MAP_STATIC_SYMBOL(int32_t rte_hash_iterate(const struct rte_hash *h,
+ const void **key, void **data, struct rte_hash_iterator_state *state),
+ rte_hash_iterate_v1811);
+
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries_with_hash(struct rte_hash *h,
+ const void **key,
+ void **data,
+ hash_sig_t sig,
+ struct rte_hash_iterator_state *state)
+{
+ struct rte_hash_iterator_conflict_entries_priv *it;
+
+ RETURN_IF_TRUE(((h == NULL) || (key == NULL) ||
+ (data == NULL) || (state == NULL)), -EINVAL);
+
+ RTE_BUILD_BUG_ON(sizeof(
+ struct rte_hash_iterator_conflict_entries_priv) >
+ sizeof(struct rte_hash_iterator_state));
+
+ it = (struct rte_hash_iterator_conflict_entries_priv *)state;
+ if (it->vnext == 0) {
+ /*
+ * Get the primary bucket index given
+ * the precomputed hash value.
+ */
+ it->primary_bidx = sig & h->bucket_bitmask;
+ /*
+ * Get the secondary bucket index given
+ * the precomputed hash value.
+ */
+ it->secondary_bidx =
+ rte_hash_secondary_hash(sig) & h->bucket_bitmask;
+ }
+
+ while (it->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
+ uint32_t bidx = it->vnext < RTE_HASH_BUCKET_ENTRIES ?
+ it->primary_bidx : it->secondary_bidx;
+ uint32_t next = it->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
+ uint32_t position;
+ struct rte_hash_key *next_key;
+
+ RTE_BUILD_BUG_ON(!RTE_IS_POWER_OF_2(RTE_HASH_BUCKET_ENTRIES));
+ __hash_rw_reader_lock(h);
+ position = h->buckets[bidx].key_idx[next];
+
+ /* Increment iterator. */
+ it->vnext++;
+
+ /*
+ * The test below is unlikely because this iterator is meant
+ * to be used after a failed insert.
+ */
+ if (unlikely(position == EMPTY_SLOT)) {
+ __hash_rw_reader_unlock(h);
+ continue;
+ }
+
+ /* Get the entry in key table. */
+ next_key = (struct rte_hash_key *) ((char *)h->key_store +
+ position * h->key_entry_size);
+ /* Return key and data. */
+ *key = next_key->key;
+ *data = next_key->pdata;
+
+ __hash_rw_reader_unlock(h);
+
+ return position - 1;
+ }
+
+ return -ENOENT;
+}
diff --git a/lib/librte_hash/rte_cuckoo_hash.h b/lib/librte_hash/rte_cuckoo_hash.h
index b43f467d5..70297b16d 100644
--- a/lib/librte_hash/rte_cuckoo_hash.h
+++ b/lib/librte_hash/rte_cuckoo_hash.h
@@ -195,4 +195,15 @@ struct queue_node {
int prev_slot; /* Parent(slot) in search path */
};
+struct rte_hash_iterator_priv {
+ uint32_t next;
+ uint32_t total_entries;
+};
+
+struct rte_hash_iterator_conflict_entries_priv {
+ uint32_t vnext;
+ uint32_t primary_bidx;
+ uint32_t secondary_bidx;
+};
+
#endif
diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
index 9e7d9315f..43f6d8b88 100644
--- a/lib/librte_hash/rte_hash.h
+++ b/lib/librte_hash/rte_hash.h
@@ -14,6 +14,8 @@
#include <stdint.h>
#include <stddef.h>
+#include <rte_compat.h>
+
#ifdef __cplusplus
extern "C" {
#endif
@@ -37,6 +39,9 @@ extern "C" {
/** Flag to support reader writer concurrency */
#define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY 0x04
+/** Size of the hash table iterator state structure */
+#define RTE_HASH_ITERATOR_STATE_SIZE 64
+
/** Signature of key that is stored internally. */
typedef uint32_t hash_sig_t;
@@ -64,6 +69,16 @@ struct rte_hash_parameters {
/** @internal A hash table structure. */
struct rte_hash;
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @internal A hash table iterator state structure.
+ */
+struct rte_hash_iterator_state {
+ uint8_t space[RTE_HASH_ITERATOR_STATE_SIZE];
+} __rte_cache_aligned;
+
/**
* Create a new hash table.
*
@@ -443,6 +458,9 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
uint32_t num_keys, int32_t *positions);
/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
* Iterate through the hash table, returning key-value pairs.
*
* @param h
@@ -453,16 +471,61 @@ rte_hash_lookup_bulk(const struct rte_hash *h, const void **keys,
* @param data
* Output containing the data associated with key.
* Returns NULL if data was not stored.
- * @param next
- * Pointer to iterator. Should be 0 to start iterating the hash table.
- * Iterator is incremented after each call of this function.
+ * @param state
+ * Pointer to the iterator state.
* @return
* Position where key was stored, if successful.
* - -EINVAL if the parameters are invalid.
* - -ENOENT if end of the hash table.
*/
int32_t
-rte_hash_iterate(const struct rte_hash *h, const void **key, void **data, uint32_t *next);
+rte_hash_iterate(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ struct rte_hash_iterator_state *state);
+
+int32_t
+rte_hash_iterate_v1808(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ uint32_t *next);
+
+int32_t
+rte_hash_iterate_v1811(const struct rte_hash *h,
+ const void **key,
+ void **data,
+ struct rte_hash_iterator_state *state);
+BIND_DEFAULT_SYMBOL(rte_hash_iterate, _v1811, 18.11);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Iterate over entries that conflict with a given hash.
+ *
+ * @param h
+ * Hash table to iterate.
+ * @param key
+ * Output containing the key at where the iterator is currently pointing.
+ * @param data
+ * Output containing the data associated with key.
+ * Returns NULL if data was not stored.
+ * @param sig
+ * Precomputed hash value for the conflict entry.
+ * @param state
+ * Pointer to the iterator state.
+ * @return
+ * Position where key was stored, if successful.
+ * - -EINVAL if the parameters are invalid.
+ * - -ENOENT if there is no more conflicting entries.
+ */
+int32_t __rte_experimental
+rte_hash_iterate_conflict_entries_with_hash(struct rte_hash *h,
+ const void **key,
+ void **data,
+ hash_sig_t sig,
+ struct rte_hash_iterator_state *state);
+
#ifdef __cplusplus
}
#endif
diff --git a/lib/librte_hash/rte_hash_version.map b/lib/librte_hash/rte_hash_version.map
index e216ac8e2..b1bb5cb02 100644
--- a/lib/librte_hash/rte_hash_version.map
+++ b/lib/librte_hash/rte_hash_version.map
@@ -53,3 +53,17 @@ DPDK_18.08 {
rte_hash_count;
} DPDK_16.07;
+
+DPDK_18.11 {
+ global:
+
+ rte_hash_iterate;
+
+} DPDK_18.08;
+
+EXPERIMENTAL {
+ global:
+
+ rte_hash_iterate_conflict_entries_with_hash;
+
+};
diff --git a/test/test/test_hash.c b/test/test/test_hash.c
index b3db9fd10..a9691e5d5 100644
--- a/test/test/test_hash.c
+++ b/test/test/test_hash.c
@@ -1170,8 +1170,8 @@ static int test_hash_iteration(void)
void *next_data;
void *data[NUM_ENTRIES];
unsigned added_keys;
- uint32_t iter = 0;
int ret = 0;
+ struct rte_hash_iterator_state state;
ut_params.entries = NUM_ENTRIES;
ut_params.name = "test_hash_iteration";
@@ -1190,8 +1190,10 @@ static int test_hash_iteration(void)
break;
}
+ memset(&state, 0, sizeof(state));
+
/* Iterate through the hash table */
- while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+ while (rte_hash_iterate(handle, &next_key, &next_data, &state) >= 0) {
/* Search for the key in the list of keys added */
for (i = 0; i < NUM_ENTRIES; i++) {
if (memcmp(next_key, keys[i], ut_params.key_len) == 0) {
diff --git a/test/test/test_hash_multiwriter.c b/test/test/test_hash_multiwriter.c
index 6a3eb10bd..63c0cd8d0 100644
--- a/test/test/test_hash_multiwriter.c
+++ b/test/test/test_hash_multiwriter.c
@@ -4,6 +4,7 @@
#include <inttypes.h>
#include <locale.h>
+#include <string.h>
#include <rte_cycles.h>
#include <rte_hash.h>
@@ -125,12 +126,15 @@ test_hash_multiwriter(void)
const void *next_key;
void *next_data;
- uint32_t iter = 0;
uint32_t duplicated_keys = 0;
uint32_t lost_keys = 0;
uint32_t count;
+ struct rte_hash_iterator_state state;
+
+ memset(&state, 0, sizeof(state));
+
snprintf(name, 32, "test%u", calledCount++);
hash_params.name = name;
@@ -203,7 +207,7 @@ test_hash_multiwriter(void)
goto err3;
}
- while (rte_hash_iterate(handle, &next_key, &next_data, &iter) >= 0) {
+ while (rte_hash_iterate(handle, &next_key, &next_data, &state) >= 0) {
/* Search for the key in the list of keys added .*/
i = *(const uint32_t *)next_key;
tbl_multiwriter_test_params.found[i]++;
diff --git a/test/test/test_hash_readwrite.c b/test/test/test_hash_readwrite.c
index 55ae33d80..f9279e21e 100644
--- a/test/test/test_hash_readwrite.c
+++ b/test/test/test_hash_readwrite.c
@@ -166,18 +166,21 @@ test_hash_readwrite_functional(int use_htm)
unsigned int i;
const void *next_key;
void *next_data;
- uint32_t iter = 0;
uint32_t duplicated_keys = 0;
uint32_t lost_keys = 0;
int use_jhash = 1;
+ struct rte_hash_iterator_state state;
+
rte_atomic64_init(&gcycles);
rte_atomic64_clear(&gcycles);
rte_atomic64_init(&ginsertions);
rte_atomic64_clear(&ginsertions);
+ memset(&state, 0, sizeof(state));
+
if (init_params(use_htm, use_jhash) != 0)
goto err;
@@ -196,7 +199,7 @@ test_hash_readwrite_functional(int use_htm)
rte_eal_mp_wait_lcore();
while (rte_hash_iterate(tbl_rw_test_param.h, &next_key,
- &next_data, &iter) >= 0) {
+ &next_data, &state) >= 0) {
/* Search for the key in the list of keys added .*/
i = *(const uint32_t *)next_key;
tbl_rw_test_param.found[i]++;
@@ -315,9 +318,10 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
const void *next_key;
void *next_data;
- uint32_t iter = 0;
int use_jhash = 0;
+ struct rte_hash_iterator_state state;
+
uint32_t duplicated_keys = 0;
uint32_t lost_keys = 0;
@@ -333,6 +337,8 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
rte_atomic64_init(&gwrite_cycles);
rte_atomic64_clear(&gwrite_cycles);
+ memset(&state, 0, sizeof(state));
+
if (init_params(use_htm, use_jhash) != 0)
goto err;
@@ -485,7 +491,7 @@ test_hash_readwrite_perf(struct perf *perf_results, int use_htm,
rte_eal_mp_wait_lcore();
while (rte_hash_iterate(tbl_rw_test_param.h,
- &next_key, &next_data, &iter) >= 0) {
+ &next_key, &next_data, &state) >= 0) {
/* Search for the key in the list of keys added .*/
i = *(const uint32_t *)next_key;
tbl_rw_test_param.found[i]++;
--
2.17.1
^ permalink raw reply [relevance 2%]
* [dpdk-dev] [RFC v2 0/9] ipsec: new library for IPsec data-path processing
@ 2018-10-09 18:23 2% ` Konstantin Ananyev
0 siblings, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-10-09 18:23 UTC (permalink / raw)
To: dev; +Cc: Konstantin Ananyev
This RFC targets 19.02 release.
This RFC introduces a new library within DPDK: librte_ipsec.
The aim is to provide a DPDK-native, high-performance library for IPsec
data-path processing.
The library is supposed to utilize the existing DPDK crypto-dev and
security APIs to provide applications with a transparent IPsec processing API.
The library concentrates on data-path protocol processing (ESP and AH);
IKE protocol implementation is out of scope for this library,
though hook/callback mechanisms might be defined in the future to allow
integrating it with existing IKE implementations.
Due to the quite complex nature of the IPsec protocol suite and the variety
of user requirements and usage scenarios, a few API levels will be provided:
1) Security Association (SA-level) API
Operates at SA level, provides functions to:
- initialize/teardown SA object
- process inbound/outbound ESP/AH packets associated with the given SA
(decrypt/encrypt, authenticate, check integrity,
add/remove ESP/AH related headers and data, etc.).
2) Security Association Database (SAD) API
API to create/manage/destroy IPsec SAD.
While DPDK IPsec library plans to have its own implementation,
the intention is to keep it as independent from the other parts
of IPsec library as possible.
That is supposed to give users the ability to provide their own
implementation of the SAD compatible with the other parts of the
IPsec library.
3) IPsec Context (CTX) API
This is supposed to be a high-level API, where each IPsec CTX is an
abstraction of 'independent copy of the IPsec stack'.
A CTX owns a set of SAs, SADs, the crypto-dev queues assigned to it, etc.
and provides:
- de-multiplexing stream of inbound packets to particular SAs and
further IPsec related processing.
- IPsec related processing for the outbound packets.
- SA add/delete/update functionality
Current RFC concentrates on SA-level API only (1),
detailed discussion for 2) and 3) will be subjects for separate RFC(s).
SA (low) level API
==================
API described below operates on SA level.
It provides functionality that allows the user to process
inbound and outbound IPsec packets for a given SA.
To be more specific:
- for inbound ESP/AH packets perform decryption, authentication,
integrity checking, remove ESP/AH related headers
- for outbound packets perform payload encryption, attach ICV,
update/add IP headers, add ESP/AH headers/trailers,
setup related mbuf fields (ol_flags, tx_offloads, etc.).
- initialize/un-initialize given SA based on user provided parameters.
The following functionality:
- match inbound/outbound packets to particular SA
- manage crypto/security devices
- provide SAD/SPD related functionality
- determine what crypto/security device has to be used
for given packet(s)
is out of scope for SA-level API.
SA-level API is based on top of crypto-dev/security API and relies on them
to perform actual cipher and integrity checking.
To make it easy to map crypto/security sessions to the related
IPsec SA, an opaque userdata field was added to the
rte_cryptodev_sym_session and rte_security_session structures.
That implies an ABI change for both librte_cryptodev and librte_security.
Due to the nature of the crypto-dev API (enqueue/dequeue model) we use
an asynchronous API for IPsec packets destined to be processed
by a crypto-device.
Expected API call sequence would be:
/* enqueue for processing by crypto-device */
rte_ipsec_crypto_prepare(...);
rte_cryptodev_enqueue_burst(...);
/* dequeue from crypto-device and do final processing (if any) */
rte_cryptodev_dequeue_burst(...);
rte_ipsec_crypto_group(...); /* optional */
rte_ipsec_process(...);
For packets destined for inline processing, however, no extra overhead
is required and a synchronous API call, rte_ipsec_process(),
is sufficient for that case.
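The asynchronous ordering can be modelled with a toy pipeline. This is only
a sketch of the call sequence: the toy_* functions stand in for
rte_ipsec_crypto_prepare(), rte_cryptodev_enqueue_burst(),
rte_cryptodev_dequeue_burst() and rte_ipsec_process(), but their signatures
and the structures below are invented for illustration and do not reflect
the proposed API. The "device" here simply completes an op when it is
dequeued.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_QLEN 8

struct toy_pkt {
	uint32_t sa_id;		/* SA this packet belongs to */
	int prepared;		/* crypto op filled in */
	int crypto_done;	/* completed by the "device" */
	int processed;		/* final IPsec step done */
};

struct toy_cryptoq {
	struct toy_pkt *ring[TOY_QLEN];
	size_t head, tail;	/* free-running indexes */
};

/* Stand-in for the prepare step: fill in per-packet crypto op data. */
size_t
toy_prepare(struct toy_pkt *pkts[], size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		pkts[i]->prepared = 1;
	return n;
}

/* Stand-in for enqueue: accept packets until the ring is full. */
size_t
toy_enqueue(struct toy_cryptoq *q, struct toy_pkt *pkts[], size_t n)
{
	size_t i;

	for (i = 0; i < n && q->tail - q->head < TOY_QLEN; i++) {
		q->ring[q->tail % TOY_QLEN] = pkts[i];
		q->tail++;
	}
	return i;
}

/* Stand-in for dequeue: hand back up to n completed packets. */
size_t
toy_dequeue(struct toy_cryptoq *q, struct toy_pkt *out[], size_t n)
{
	size_t i;

	for (i = 0; i < n && q->head != q->tail; i++) {
		out[i] = q->ring[q->head % TOY_QLEN];
		q->head++;
		out[i]->crypto_done = 1;
	}
	return i;
}

/* Stand-in for the final step: only valid after prepare and crypto. */
size_t
toy_process(struct toy_pkt *pkts[], size_t n)
{
	size_t i, ok = 0;

	for (i = 0; i < n; i++) {
		if (pkts[i]->prepared && pkts[i]->crypto_done) {
			pkts[i]->processed = 1;
			ok++;
		}
	}
	return ok;
}
```

In this model the inline case corresponds to calling only the final step,
with the prepare and crypto work assumed to have happened elsewhere.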
The current implementation supports all four currently defined rte_security types.
To accommodate future custom implementations, though, a function-pointer model is
used for both rte_ipsec_crypto_prepare() and rte_ipsec_process().
Implemented:
------------
- ESP tunnel mode support (both IPv4/IPv6)
- Supported algorithms: AES-CBC, AES-GCM, HMAC-SHA1, NULL
- Anti-Replay window and ESN support
- Unit Test (few basic cases for now)
TODO list:
----------
- ESP transport mode support (both IPv4/IPv6)
- update examples/ipsec-secgw to use librte_ipsec
- extend Unit Test
Konstantin Ananyev (9):
cryptodev: add opaque userdata pointer into crypto sym session
security: add opaque userdata pointer into security session
net: add ESP trailer structure definition
lib: introduce ipsec library
ipsec: add SA data-path API
ipsec: implement SA data-path API
ipsec: rework SA replay window/SQN for MT environment
ipsec: helper functions to group completed crypto-ops
test/ipsec: introduce functional test
config/common_base | 5 +
lib/Makefile | 2 +
lib/librte_cryptodev/rte_cryptodev.h | 2 +
lib/librte_ipsec/Makefile | 27 +
lib/librte_ipsec/crypto.h | 74 ++
lib/librte_ipsec/ipsec_sqn.h | 315 ++++++++
lib/librte_ipsec/meson.build | 10 +
lib/librte_ipsec/pad.h | 45 ++
lib/librte_ipsec/rte_ipsec.h | 156 ++++
lib/librte_ipsec/rte_ipsec_group.h | 151 ++++
lib/librte_ipsec/rte_ipsec_sa.h | 166 ++++
lib/librte_ipsec/rte_ipsec_version.map | 15 +
lib/librte_ipsec/rwl.h | 68 ++
lib/librte_ipsec/sa.c | 1005 ++++++++++++++++++++++++
lib/librte_ipsec/sa.h | 92 +++
lib/librte_ipsec/ses.c | 45 ++
lib/librte_net/rte_esp.h | 10 +-
lib/librte_security/rte_security.h | 2 +
lib/meson.build | 2 +
mk/rte.app.mk | 2 +
test/test/Makefile | 3 +
test/test/meson.build | 3 +
test/test/test_ipsec.c | 1329 ++++++++++++++++++++++++++++++++
23 files changed, 3528 insertions(+), 1 deletion(-)
create mode 100644 lib/librte_ipsec/Makefile
create mode 100644 lib/librte_ipsec/crypto.h
create mode 100644 lib/librte_ipsec/ipsec_sqn.h
create mode 100644 lib/librte_ipsec/meson.build
create mode 100644 lib/librte_ipsec/pad.h
create mode 100644 lib/librte_ipsec/rte_ipsec.h
create mode 100644 lib/librte_ipsec/rte_ipsec_group.h
create mode 100644 lib/librte_ipsec/rte_ipsec_sa.h
create mode 100644 lib/librte_ipsec/rte_ipsec_version.map
create mode 100644 lib/librte_ipsec/rwl.h
create mode 100644 lib/librte_ipsec/sa.c
create mode 100644 lib/librte_ipsec/sa.h
create mode 100644 lib/librte_ipsec/ses.c
create mode 100644 test/test/test_ipsec.c
--
2.13.6
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH v2 00/10] introduce telemetry library
2018-10-09 10:33 3% ` Van Haaren, Harry
@ 2018-10-09 11:41 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-09 11:41 UTC (permalink / raw)
To: Van Haaren, Harry
Cc: Laatz, Kevin, dev, stephen, gaetan.rivet, shreyansh.jain,
Richardson, Bruce
09/10/2018 12:33, Van Haaren, Harry:
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > 04/10/2018 15:25, Van Haaren, Harry:
> > > From: Van Haaren, Harry
> > > > From: Laatz, Kevin
> > > > >
> > > > > This patchset introduces a Telemetry library for DPDK Service
> > Assurance.
> > > > > This library provides an easy way to query DPDK Ethdev metrics.
> > > >
> > > > <snip>
> > > >
> > > > > Note: We are aware that the --telemetry flag is not working for meson
> > > > > builds, we are working on it for a future patch. Despite opterr being
> > set
> > > > > to 0, --telemetry said to be 'unrecognized' as a startup print. This
> > is a
> > > > > cosmetic issue and will also be addressed.
> > > > >
> > > > > ---
> > > > > v2:
> > > > > - Reworked telemetry as part of EAL instead of using vdev (Gaetan)
> > > > > - Refactored rte_telemetry_command (Gaetan)
> > > > > - Added MAINTAINERS file entry (Stephen)
> > > > > - Updated docs to reflect vdev to eal rework
> > > > > - Removed collectd patch from patchset (Thomas)
> > > > > - General code clean up from v1 feedback
> > > >
> > > >
> > > > Hi Gaetan, Thomas, Stephen and Shreyansh!
> > > >
> > > >
> > > > goto TL_DR; // if time is short :)
> > > >
> > > >
> > > > In this v2 patchset, we've reworked the Telemetry to no longer use the
> > vdev
> > > > infrastructure, instead having EAL enable it directly. This was
> > requested as
> > > > feedback to the v1 patchset. I'll detail the approach below, and
> > highlight
> > > > some issues we identified while implementing it.
> > > >
> > > > Currently, EAL does not depend on any "DPDK" libraries (ignore kvargs
> > etc
> > > > for a minute).
> > > > Telemetry is a DPDK library, so it depends on EAL. In order to have EAL
> > > > initialize
> > > > Telemetry, it must depend on it - this causes a circular dependency.
> > > >
> > > > This patchset resolves that circular dependency by using a "weak" symbol
> > for
> > > > telemetry init, and then the "strong" version of telemetry init will
> > replace
> > > > it when the library is compiled in.
> > >
> > > Correction: we attempted this approach - but ended up adding a TAILQ of
> > library
> > > initializers functions to EAL, which was then iterated during
> > rte_eal_init().
> > > This also resolved the circular-dependency, but the same linking issue as
> > > described below still exists.
> > >
> > > So - the same question still stands - what is the best solution for 18.11?
> > >
> > >
> > > > Although this *technically* works, it
> > > > requires
> > > > that applications *LINK* against Telemetry library explicitly - as EAL
> > won't
> > > > pull
> > > > in the Telemetry .so automatically... This means application-level
> > build-
> > > > system
> > > > changes to enable --telemetry on the DPDK EAL command line.
> > > >
> > > > Given the complexity in enabling EAL to handle the Telemetry init() and
> > its
> > > > dependencies, I'd like to ask you folks for input on how to better solve
> > > > this?
> >
> > First, the telemetry feature must be enabled via a public function (API).
> > The application can decide to enable the feature at any time, right?
> > If the application wants to enable the feature at initialization
> > (and considers user input from the command line),
> > then the init function has a dependency on telemetry.
> > Your dependency concern is that the init function (which is high level)
> > is in EAL (which is the lowest layer in DPDK).
>
> Yes, and this has been resolved by allowing components to register
> with EAL to have their _init() function called later. A v3 with this
> approach is coming up; it seems to cover the required use cases.
>
>
> > I think the command line should not be managed directly by EAL.
> > My suggestion in last summit was to move this code in a different library.
> > We should also move the init function(s) to a new high level library.
> >
> > This is my proposal to solve cyclic dependency: move rte_eal_init in a lib
> > which depends on everything.
>
> I have prototyped this approach, and it is not really clean. It means
> splitting EAL into two halves, and due to meson library naming we have
> to move all eal files to eal_impl or something, and then eal.so keeps rte_eal_init().
>
> Removing functions from the .map files is also technically an ABI break,
> at which point I didn't think it was the right solution.
>
>
> > About the linking issue, I don't understand the problem.
> > If you use the DPDK makefiles, rte.app.mk should manage it.
> > If you use the DPDK meson, all libs are linked.
> > If you use your own system, of course you need to add telemetry lib.
>
> Yes, agreed: in principle it should be exactly like this. In reality
> it can be harder to get the exact dependencies right with
> both static and shared builds, constructors, etc.
>
> I believe the current approach of registering an _init() function
> will be acceptable, let's wait for v3 to hit the mailing list.
I think it is not clean.
We should really split EAL in two parts:
- low level routines
- high level init.
About telemetry, you can find any workaround, but it must be temporary.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2 00/10] introduce telemetry library
@ 2018-10-09 10:33 3% ` Van Haaren, Harry
2018-10-09 11:41 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2018-10-09 10:33 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Laatz, Kevin, dev, stephen, gaetan.rivet, shreyansh.jain,
Richardson, Bruce
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, October 4, 2018 4:54 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>
> Cc: Laatz, Kevin <kevin.laatz@intel.com>; dev@dpdk.org;
> stephen@networkplumber.org; gaetan.rivet@6wind.com; shreyansh.jain@nxp.com;
> Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH v2 00/10] introduce telemetry library
>
> 04/10/2018 15:25, Van Haaren, Harry:
> > From: Van Haaren, Harry
> > > From: Laatz, Kevin
> > > >
> > > > This patchset introduces a Telemetry library for DPDK Service
> Assurance.
> > > > This library provides an easy way to query DPDK Ethdev metrics.
> > >
> > > <snip>
> > >
> > > > Note: We are aware that the --telemetry flag is not working for meson
> > > > builds, we are working on it for a future patch. Despite opterr being
> set
> > > > to 0, --telemetry said to be 'unrecognized' as a startup print. This
> is a
> > > > cosmetic issue and will also be addressed.
> > > >
> > > > ---
> > > > v2:
> > > > - Reworked telemetry as part of EAL instead of using vdev (Gaetan)
> > > > - Refactored rte_telemetry_command (Gaetan)
> > > > - Added MAINTAINERS file entry (Stephen)
> > > > - Updated docs to reflect vdev to eal rework
> > > > - Removed collectd patch from patchset (Thomas)
> > > > - General code clean up from v1 feedback
> > >
> > >
> > > Hi Gaetan, Thomas, Stephen and Shreyansh!
> > >
> > >
> > > goto TL_DR; // if time is short :)
> > >
> > >
> > > In this v2 patchset, we've reworked the Telemetry to no longer use the
> vdev
> > > infrastructure, instead having EAL enable it directly. This was
> requested as
> > > feedback to the v1 patchset. I'll detail the approach below, and
> highlight
> > > some issues we identified while implementing it.
> > >
> > > Currently, EAL does not depend on any "DPDK" libraries (ignore kvargs
> etc
> > > for a minute).
> > > Telemetry is a DPDK library, so it depends on EAL. In order to have EAL
> > > initialize
> > > Telemetry, it must depend on it - this causes a circular dependency.
> > >
> > > This patchset resolves that circular dependency by using a "weak" symbol
> for
> > > telemetry init, and then the "strong" version of telemetry init will
> replace
> > > it when the library is compiled in.
> >
> > Correction: we attempted this approach - but ended up adding a TAILQ of
> library
> > initializers functions to EAL, which was then iterated during
> rte_eal_init().
> > This also resolved the circular-dependency, but the same linking issue as
> > described below still exists.
> >
> > So - the same question still stands - what is the best solution for 18.11?
> >
> >
> > > Although this *technically* works, it
> > > requires
> > > that applications *LINK* against Telemetry library explicitly - as EAL
> won't
> > > pull
> > > in the Telemetry .so automatically... This means application-level
> build-
> > > system
> > > changes to enable --telemetry on the DPDK EAL command line.
> > >
> > > Given the complexity in enabling EAL to handle the Telemetry init() and
> its
> > > dependencies, I'd like to ask you folks for input on how to better solve
> > > this?
>
> First, the telemetry feature must be enabled via a public function (API).
> The application can decide to enable the feature at any time, right?
> If the application wants to enable the feature at initialization
> (and considers user input from the command line),
> then the init function has a dependency on telemetry.
> Your dependency concern is that the init function (which is high level)
> is in EAL (which is the lowest layer in DPDK).
Yes, and this has been resolved by allowing components to register
with EAL to have their _init() function called later. A v3 with this
approach is coming up; it seems to cover the required use cases.
> I think the command line should not be managed directly by EAL.
> My suggestion in last summit was to move this code in a different library.
> We should also move the init function(s) to a new high level library.
>
> This is my proposal to solve cyclic dependency: move rte_eal_init in a lib
> which depends on everything.
I have prototyped this approach, and it is not really clean. It means
splitting EAL into two halves, and due to meson library naming we have
to move all eal files to eal_impl or something, and then eal.so keeps rte_eal_init().
Removing functions from the .map files is also technically an ABI break,
at which point I didn't think it was the right solution.
> About the linking issue, I don't understand the problem.
> If you use the DPDK makefiles, rte.app.mk should manage it.
> If you use the DPDK meson, all libs are linked.
> If you use your own system, of course you need to add telemetry lib.
Yes, agreed: in principle it should be exactly like this. In reality
it can be harder to get the exact dependencies right with
both static and shared builds, constructors, etc.
I believe the current approach of registering an _init() function
will be acceptable, let's wait for v3 to hit the mailing list.
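
For readers following the thread, the "register an _init() function" idea can
be sketched roughly as below. This is only an illustration under assumptions,
not the code from the patchset: lib_init_register(), core_init() and the
telemetry_* names are hypothetical, and the constructor attribute is a
GCC/Clang extension. The low-level layer keeps a TAILQ of callbacks and never
names the higher-level library, which is what breaks the circular dependency.

```c
#include <stddef.h>
#include <sys/queue.h>

/* ---- low-level side (stands in for EAL; hypothetical names) ---- */

typedef int (*lib_init_fn)(void);

struct lib_init_entry {
	TAILQ_ENTRY(lib_init_entry) next;
	const char *name;
	lib_init_fn init;
};

TAILQ_HEAD(lib_init_list, lib_init_entry);
static struct lib_init_list init_list = TAILQ_HEAD_INITIALIZER(init_list);

/* Called by higher-level libraries, typically from a constructor. */
void
lib_init_register(struct lib_init_entry *e)
{
	TAILQ_INSERT_TAIL(&init_list, e, next);
}

/*
 * Walk the registered initializers in registration order.
 * Returns the number of initializers run, or -1 on the first failure.
 */
int
core_init(void)
{
	struct lib_init_entry *e;
	int count = 0;

	TAILQ_FOREACH(e, &init_list, next) {
		if (e->init() != 0)
			return -1;
		count++;
	}
	return count;
}

/* ---- higher-level side (stands in for the telemetry library) ---- */

static int telemetry_started;

static int
telemetry_init(void)
{
	telemetry_started = 1;
	return 0;
}

static struct lib_init_entry telemetry_entry = {
	.name = "telemetry",
	.init = telemetry_init,
};

/* Runs before main() when this object is part of the final link. */
__attribute__((constructor))
static void
telemetry_register(void)
{
	lib_init_register(&telemetry_entry);
}
```

Note that the constructor only runs if the object is actually linked into the
application, which is the linking concern raised earlier in this thread.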
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions
2018-10-07 9:32 3% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Thomas Monjalon
2018-10-07 9:32 4% ` [dpdk-dev] [PATCH v6 3/5] eal: add bus pointer in device structure Thomas Monjalon
@ 2018-10-08 21:45 0% ` Stephen Hemminger
2018-10-11 12:10 0% ` Thomas Monjalon
1 sibling, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-10-08 21:45 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, gaetan.rivet, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
On Sun, 7 Oct 2018 11:32:39 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> This is a follow-up of an idea presented at Dublin
> during the "hotplug talk".
>
> Instead of changing the existing hotplug functions, as in the RFC,
> some new experimental functions are added.
> The old functions lose their experimental status in order to provide
> a non-experimental replacement for deprecated attach/detach functions.
>
> It has been discussed briefly in the latest technical board meeting.
>
>
> Changes in v6 - after Gaetan's review:
> - bump ABI version of all buses (because of rte_device change)
> - unroll snprintf loop in rte_eal_hotplug_add
>
> Changes in v5:
> - rte_devargs_remove is fixed in case of null devargs (patch 2)
> - a pointer to the bus is added in rte_device (patch 3)
> - rte_dev_remove is fixed in case of no devargs (patch 5)
>
> Changes in v4 - after Andrew's review:
> - add API changes in release notes (patches 1 & 2)
> - fix memory leak in rte_eal_hotplug_add (patch 4)
>
> Change in v3:
> - fix null dereferencing in error path (patch 2)
>
>
> Thomas Monjalon (5):
> devargs: remove deprecated functions
> devargs: simplify parameters of removal function
> eal: add bus pointer in device structure
> eal: remove experimental flag of hotplug functions
> eal: simplify parameters of hotplug functions
>
> doc/guides/rel_notes/release_18_11.rst | 23 ++++--
> drivers/bus/dpaa/Makefile | 2 +-
> drivers/bus/dpaa/dpaa_bus.c | 2 +
> drivers/bus/dpaa/meson.build | 2 +
> drivers/bus/fslmc/Makefile | 2 +-
> drivers/bus/fslmc/fslmc_bus.c | 2 +
> drivers/bus/fslmc/meson.build | 2 +
> drivers/bus/ifpga/Makefile | 2 +-
> drivers/bus/ifpga/ifpga_bus.c | 6 +-
> drivers/bus/ifpga/meson.build | 2 +
> drivers/bus/pci/Makefile | 2 +-
> drivers/bus/pci/bsd/pci.c | 2 +
> drivers/bus/pci/linux/pci.c | 1 +
> drivers/bus/pci/meson.build | 2 +
> drivers/bus/pci/private.h | 2 +
> drivers/bus/vdev/Makefile | 2 +-
> drivers/bus/vdev/meson.build | 2 +
> drivers/bus/vdev/vdev.c | 9 +--
> drivers/bus/vmbus/Makefile | 2 +-
> drivers/bus/vmbus/linux/vmbus_bus.c | 1 +
> drivers/bus/vmbus/meson.build | 2 +
> drivers/bus/vmbus/private.h | 3 +
> drivers/net/failsafe/failsafe_eal.c | 3 +-
> drivers/net/failsafe/failsafe_ether.c | 3 +-
> lib/librte_eal/common/eal_common_dev.c | 90 +++++++++++++--------
> lib/librte_eal/common/eal_common_devargs.c | 41 ++--------
> lib/librte_eal/common/include/rte_dev.h | 36 +++++++--
> lib/librte_eal/common/include/rte_devargs.h | 81 +------------------
> lib/librte_eal/rte_eal_version.map | 10 +--
> 29 files changed, 155 insertions(+), 184 deletions(-)
>
I like these changes.
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
I noticed there are only a few places where devargs appear in the documentation.
The relationship between whitelist and devargs is not obvious to new users.
The one place is in the documentation of the documentation! So you want to pull
rte_eth_dev_attach from documentation.rst.
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v6 3/5] eal: add bus pointer in device structure
2018-10-07 9:32 3% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Thomas Monjalon
@ 2018-10-07 9:32 4% ` Thomas Monjalon
2018-10-08 21:45 0% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Stephen Hemminger
1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-07 9:32 UTC (permalink / raw)
To: dev; +Cc: gaetan.rivet, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
When a device is added with devargs (hotplug or whitelist),
the bus pointer can be retrieved via its devargs.
But there is no such devargs.bus in case of a standard scan.
A pointer to the rte_bus handle is added to rte_device.
When a device is allocated (during a scan),
the pointer to its bus is assigned.
This makes it possible to remove an rte_device
using the function pointer from its bus.
The function rte_bus_find_by_device() becomes useless
and may be removed later.
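
The effect of the new back-pointer can be sketched as follows. This is a
simplified illustration, not the EAL code: struct bus, struct device,
demo_scan_one() and device_remove() are hypothetical stand-ins with only the
fields needed to show the idea, namely that a device which records its bus at
scan time can be removed without any lookup through devargs.

```c
#include <stddef.h>

struct device;

/* Hypothetical, trimmed-down bus: only the removal callback is shown. */
struct bus {
	const char *name;
	int (*unplug)(struct device *dev);
};

/* Hypothetical, trimmed-down device carrying a pointer to its bus. */
struct device {
	const char *name;
	const struct bus *bus; /* assigned when the device is scanned */
	int plugged;
};

static int
demo_unplug(struct device *dev)
{
	dev->plugged = 0;
	return 0;
}

static const struct bus demo_bus = {
	.name = "demo",
	.unplug = demo_unplug,
};

/* Scan path: the bus fills in the back-pointer when it creates the device. */
void
demo_scan_one(struct device *dev, const char *name)
{
	dev->name = name;
	dev->bus = &demo_bus;
	dev->plugged = 1;
}

/* Removal needs only the device: the bus callback is reached through it. */
int
device_remove(struct device *dev)
{
	if (dev == NULL || dev->bus == NULL || dev->bus->unplug == NULL)
		return -1;
	return dev->bus->unplug(dev);
}
```

A device that was found by a standard scan and has no devargs can still be
removed this way, since the callback is reached through dev->bus.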
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
---
doc/guides/rel_notes/release_18_11.rst | 15 ++++++++++-----
drivers/bus/dpaa/Makefile | 2 +-
drivers/bus/dpaa/dpaa_bus.c | 2 ++
drivers/bus/dpaa/meson.build | 2 ++
drivers/bus/fslmc/Makefile | 2 +-
drivers/bus/fslmc/fslmc_bus.c | 2 ++
drivers/bus/fslmc/meson.build | 2 ++
drivers/bus/ifpga/Makefile | 2 +-
drivers/bus/ifpga/ifpga_bus.c | 1 +
drivers/bus/ifpga/meson.build | 2 ++
drivers/bus/pci/Makefile | 2 +-
drivers/bus/pci/bsd/pci.c | 2 ++
drivers/bus/pci/linux/pci.c | 1 +
drivers/bus/pci/meson.build | 2 ++
drivers/bus/pci/private.h | 2 ++
drivers/bus/vdev/Makefile | 2 +-
drivers/bus/vdev/meson.build | 2 ++
drivers/bus/vdev/vdev.c | 1 +
drivers/bus/vmbus/Makefile | 2 +-
drivers/bus/vmbus/linux/vmbus_bus.c | 1 +
drivers/bus/vmbus/meson.build | 2 ++
drivers/bus/vmbus/private.h | 3 +++
lib/librte_eal/common/include/rte_dev.h | 1 +
23 files changed, 44 insertions(+), 11 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index d534bb71c..c87522f27 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -164,6 +164,10 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+* eal: The structure ``rte_device`` got a new field to reference a ``rte_bus``.
+ It is changing the size of the ``struct rte_device`` and the inherited
+ device structures of all buses.
+
Removed Items
-------------
@@ -199,11 +203,12 @@ The libraries prepended with a plus sign were incremented in this version.
librte_bbdev.so.1
librte_bitratestats.so.2
librte_bpf.so.1
- librte_bus_dpaa.so.1
- librte_bus_fslmc.so.1
- librte_bus_pci.so.1
- librte_bus_vdev.so.1
- + librte_bus_vmbus.so.1
+ + librte_bus_dpaa.so.2
+ + librte_bus_fslmc.so.2
+ + librte_bus_ifpga.so.2
+ + librte_bus_pci.so.2
+ + librte_bus_vdev.so.2
+ + librte_bus_vmbus.so.2
librte_cfgfile.so.2
librte_cmdline.so.2
librte_common_octeontx.so.1
diff --git a/drivers/bus/dpaa/Makefile b/drivers/bus/dpaa/Makefile
index bffaa9d92..9337b5f92 100644
--- a/drivers/bus/dpaa/Makefile
+++ b/drivers/bus/dpaa/Makefile
@@ -24,7 +24,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/common/include
# versioning export map
EXPORT_MAP := rte_bus_dpaa_version.map
-LIBABIVER := 1
+LIBABIVER := 2
# all source are stored in SRCS-y
#
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index 49cd04dbb..138e0f98d 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -165,6 +165,8 @@ dpaa_create_device_list(void)
goto cleanup;
}
+ dev->device.bus = &rte_dpaa_bus.bus;
+
cfg = &dpaa_netcfg->port_cfg[i];
fman_intf = cfg->fman_if;
diff --git a/drivers/bus/dpaa/meson.build b/drivers/bus/dpaa/meson.build
index d10b62c03..5e7705571 100644
--- a/drivers/bus/dpaa/meson.build
+++ b/drivers/bus/dpaa/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/bus/fslmc/Makefile b/drivers/bus/fslmc/Makefile
index 515d0f534..e95551980 100644
--- a/drivers/bus/fslmc/Makefile
+++ b/drivers/bus/fslmc/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_ethdev
EXPORT_MAP := rte_bus_fslmc_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += \
qbman/qbman_portal.c \
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index bfe81e236..960f55071 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -161,6 +161,8 @@ scan_one_fslmc_device(char *dev_name)
return -ENOMEM;
}
+ dev->device.bus = &rte_fslmc_bus.bus;
+
/* Parse the device name and ID */
t_ptr = strtok(dup_dev_name, ".");
if (!t_ptr) {
diff --git a/drivers/bus/fslmc/meson.build b/drivers/bus/fslmc/meson.build
index 22a56a6fc..54ca92d0c 100644
--- a/drivers/bus/fslmc/meson.build
+++ b/drivers/bus/fslmc/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/bus/ifpga/Makefile b/drivers/bus/ifpga/Makefile
index 3ff3bdb81..514452b39 100644
--- a/drivers/bus/ifpga/Makefile
+++ b/drivers/bus/ifpga/Makefile
@@ -19,7 +19,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := rte_bus_ifpga_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
SRCS-$(CONFIG_RTE_LIBRTE_IFPGA_BUS) += ifpga_bus.c
SRCS-$(CONFIG_RTE_LIBRTE_IFPGA_BUS) += ifpga_common.c
diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
index 3ef035b7e..80663328a 100644
--- a/drivers/bus/ifpga/ifpga_bus.c
+++ b/drivers/bus/ifpga/ifpga_bus.c
@@ -142,6 +142,7 @@ ifpga_scan_one(struct rte_rawdev *rawdev,
if (!afu_dev)
goto end;
+ afu_dev->device.bus = &rte_ifpga_bus;
afu_dev->device.devargs = devargs;
afu_dev->device.numa_node = SOCKET_ID_ANY;
afu_dev->device.name = devargs->name;
diff --git a/drivers/bus/ifpga/meson.build b/drivers/bus/ifpga/meson.build
index c9b08c862..0b5c38d54 100644
--- a/drivers/bus/ifpga/meson.build
+++ b/drivers/bus/ifpga/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2010-2018 Intel Corporation
+version = 2
+
deps += ['pci', 'kvargs', 'rawdev']
install_headers('rte_bus_ifpga.h')
sources = files('ifpga_common.c', 'ifpga_bus.c')
diff --git a/drivers/bus/pci/Makefile b/drivers/bus/pci/Makefile
index 4de953f8f..f33e0120f 100644
--- a/drivers/bus/pci/Makefile
+++ b/drivers/bus/pci/Makefile
@@ -4,7 +4,7 @@
include $(RTE_SDK)/mk/rte.vars.mk
LIB = librte_bus_pci.a
-LIBABIVER := 1
+LIBABIVER := 2
EXPORT_MAP := rte_bus_pci_version.map
CFLAGS := -I$(SRCDIR) $(CFLAGS)
diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 655b34b7e..40641cad4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -223,6 +223,8 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
}
memset(dev, 0, sizeof(*dev));
+ dev->device.bus = &rte_pci_bus.bus;
+
dev->addr.domain = conf->pc_sel.pc_domain;
dev->addr.bus = conf->pc_sel.pc_bus;
dev->addr.devid = conf->pc_sel.pc_dev;
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..e31bbb370 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -228,6 +228,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
return -1;
memset(dev, 0, sizeof(*dev));
+ dev->device.bus = &rte_pci_bus.bus;
dev->addr = *addr;
/* get vendor id */
diff --git a/drivers/bus/pci/meson.build b/drivers/bus/pci/meson.build
index 23d6a5fec..ef9492bb8 100644
--- a/drivers/bus/pci/meson.build
+++ b/drivers/bus/pci/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2017 Intel Corporation
+version = 2
+
deps += ['pci']
install_headers('rte_bus_pci.h')
sources = files('pci_common.c',
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 0e689fa74..04bffa6e7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -15,6 +15,8 @@ extern struct rte_pci_bus rte_pci_bus;
struct rte_pci_driver;
struct rte_pci_device;
+extern struct rte_pci_bus rte_pci_bus;
+
/**
* Probe the PCI bus
*
diff --git a/drivers/bus/vdev/Makefile b/drivers/bus/vdev/Makefile
index 1f9cd7ebe..803b8ea7b 100644
--- a/drivers/bus/vdev/Makefile
+++ b/drivers/bus/vdev/Makefile
@@ -16,7 +16,7 @@ CFLAGS += -DALLOW_EXPERIMENTAL_API
EXPORT_MAP := rte_bus_vdev_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
SRCS-y += vdev.c
SRCS-y += vdev_params.c
diff --git a/drivers/bus/vdev/meson.build b/drivers/bus/vdev/meson.build
index 12605e5c7..803785f10 100644
--- a/drivers/bus/vdev/meson.build
+++ b/drivers/bus/vdev/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2017 Intel Corporation
+version = 2
+
sources = files('vdev.c',
'vdev_params.c')
install_headers('rte_bus_vdev.h')
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index efca962f7..0142fb2c8 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -456,6 +456,7 @@ vdev_scan(void)
continue;
}
+ dev->device.bus = &rte_vdev_bus;
dev->device.devargs = devargs;
dev->device.numa_node = SOCKET_ID_ANY;
dev->device.name = devargs->name;
diff --git a/drivers/bus/vmbus/Makefile b/drivers/bus/vmbus/Makefile
index deee9dd10..e54c557c6 100644
--- a/drivers/bus/vmbus/Makefile
+++ b/drivers/bus/vmbus/Makefile
@@ -3,7 +3,7 @@
include $(RTE_SDK)/mk/rte.vars.mk
LIB = librte_bus_vmbus.a
-LIBABIVER := 1
+LIBABIVER := 2
EXPORT_MAP := rte_bus_vmbus_version.map
CFLAGS += -I$(SRCDIR)
diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c b/drivers/bus/vmbus/linux/vmbus_bus.c
index 527a6a39f..a4755a387 100644
--- a/drivers/bus/vmbus/linux/vmbus_bus.c
+++ b/drivers/bus/vmbus/linux/vmbus_bus.c
@@ -229,6 +229,7 @@ vmbus_scan_one(const char *name)
if (dev == NULL)
return -1;
+ dev->device.bus = &rte_vmbus_bus.bus;
dev->device.name = strdup(name);
if (!dev->device.name)
goto error;
diff --git a/drivers/bus/vmbus/meson.build b/drivers/bus/vmbus/meson.build
index 18daabecc..0e4d058ee 100644
--- a/drivers/bus/vmbus/meson.build
+++ b/drivers/bus/vmbus/meson.build
@@ -1,5 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
+version = 2
+
allow_experimental_apis = true
install_headers('rte_bus_vmbus.h','rte_vmbus_reg.h')
diff --git a/drivers/bus/vmbus/private.h b/drivers/bus/vmbus/private.h
index f2022a68c..211127dd8 100644
--- a/drivers/bus/vmbus/private.h
+++ b/drivers/bus/vmbus/private.h
@@ -10,11 +10,14 @@
#include <sys/uio.h>
#include <rte_log.h>
#include <rte_vmbus_reg.h>
+#include <rte_bus_vmbus.h>
#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif
+extern struct rte_vmbus_bus rte_vmbus_bus;
+
extern int vmbus_logtype_bus;
#define VMBUS_LOG(level, fmt, args...) \
rte_log(RTE_LOG_ ## level, vmbus_logtype_bus, "%s(): " fmt "\n", \
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a80598..d82cba847 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -157,6 +157,7 @@ struct rte_device {
TAILQ_ENTRY(rte_device) next; /**< Next device */
const char *name; /**< Device name */
const struct rte_driver *driver;/**< Associated driver */
+ const struct rte_bus *bus; /**< Bus handle assigned on scan */
int numa_node; /**< NUMA node connection */
struct rte_devargs *devargs; /**< Device user arguments */
};
--
2.19.0
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions
@ 2018-10-07 9:32 3% ` Thomas Monjalon
2018-10-07 9:32 4% ` [dpdk-dev] [PATCH v6 3/5] eal: add bus pointer in device structure Thomas Monjalon
2018-10-08 21:45 0% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Stephen Hemminger
1 sibling, 2 replies; 200+ results
From: Thomas Monjalon @ 2018-10-07 9:32 UTC (permalink / raw)
To: dev; +Cc: gaetan.rivet, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
This is a follow-up of an idea presented at Dublin
during the "hotplug talk".
Instead of changing the existing hotplug functions, as in the RFC,
some new experimental functions are added.
The old functions lose their experimental status in order to provide
a non-experimental replacement for deprecated attach/detach functions.
It has been discussed briefly in the latest technical board meeting.
Changes in v6 - after Gaetan's review:
- bump ABI version of all buses (because of rte_device change)
- unroll snprintf loop in rte_eal_hotplug_add
Changes in v5:
- rte_devargs_remove is fixed in case of null devargs (patch 2)
- a pointer to the bus is added in rte_device (patch 3)
- rte_dev_remove is fixed in case of no devargs (patch 5)
Changes in v4 - after Andrew's review:
- add API changes in release notes (patches 1 & 2)
- fix memory leak in rte_eal_hotplug_add (patch 4)
Change in v3:
- fix null dereferencing in error path (patch 2)
Thomas Monjalon (5):
devargs: remove deprecated functions
devargs: simplify parameters of removal function
eal: add bus pointer in device structure
eal: remove experimental flag of hotplug functions
eal: simplify parameters of hotplug functions
doc/guides/rel_notes/release_18_11.rst | 23 ++++--
drivers/bus/dpaa/Makefile | 2 +-
drivers/bus/dpaa/dpaa_bus.c | 2 +
drivers/bus/dpaa/meson.build | 2 +
drivers/bus/fslmc/Makefile | 2 +-
drivers/bus/fslmc/fslmc_bus.c | 2 +
drivers/bus/fslmc/meson.build | 2 +
drivers/bus/ifpga/Makefile | 2 +-
drivers/bus/ifpga/ifpga_bus.c | 6 +-
drivers/bus/ifpga/meson.build | 2 +
drivers/bus/pci/Makefile | 2 +-
drivers/bus/pci/bsd/pci.c | 2 +
drivers/bus/pci/linux/pci.c | 1 +
drivers/bus/pci/meson.build | 2 +
drivers/bus/pci/private.h | 2 +
drivers/bus/vdev/Makefile | 2 +-
drivers/bus/vdev/meson.build | 2 +
drivers/bus/vdev/vdev.c | 9 +--
drivers/bus/vmbus/Makefile | 2 +-
drivers/bus/vmbus/linux/vmbus_bus.c | 1 +
drivers/bus/vmbus/meson.build | 2 +
drivers/bus/vmbus/private.h | 3 +
drivers/net/failsafe/failsafe_eal.c | 3 +-
drivers/net/failsafe/failsafe_ether.c | 3 +-
lib/librte_eal/common/eal_common_dev.c | 90 +++++++++++++--------
lib/librte_eal/common/eal_common_devargs.c | 41 ++--------
lib/librte_eal/common/include/rte_dev.h | 36 +++++++--
lib/librte_eal/common/include/rte_devargs.h | 81 +------------------
lib/librte_eal/rte_eal_version.map | 10 +--
29 files changed, 155 insertions(+), 184 deletions(-)
--
2.19.0
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v4 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
2018-10-02 16:28 0% ` Bruce Richardson
@ 2018-10-05 16:00 0% ` Timothy Redaelli
2018-10-27 21:19 0% ` Thomas Monjalon
2 siblings, 0 replies; 200+ results
From: Timothy Redaelli @ 2018-10-05 16:00 UTC (permalink / raw)
To: Luca Boccassi; +Cc: dev, bruce.richardson, christian.ehrhardt, mvarlese
On Tue, 2 Oct 2018 17:20:45 +0100
Luca Boccassi <bluca@debian.org> wrote:
> As part of the effort of consolidating the DPDK installation bits and
> pieces across distros, set the default directory of lib/ where PMDs get
> installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
> subdirectory as multiple ABI revisions might be installed at the same
> time, so having a fixed name will cause trouble with the autoload
> feature.
> Small refactor with parsing and saving the major version to a variable,
> since it's now used in 3 different places.
>
> Signed-off-by: Luca Boccassi <bluca@debian.org>
> ---
Acked-by: Timothy Redaelli <tredaelli@redhat.com>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] ethdev: add field for device data per process
2018-10-05 13:26 4% ` Ferruh Yigit
@ 2018-10-05 14:47 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-05 14:47 UTC (permalink / raw)
To: Ferruh Yigit, Alejandro Lucero; +Cc: Andrew Rybchenko, dev, rasland
05/10/2018 15:26, Ferruh Yigit:
> On 10/5/2018 2:17 PM, Alejandro Lucero wrote:
> > On Fri, Oct 5, 2018 at 2:01 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >> Will there be new version?
> >>
> >> Do we agree on the name?
> >>
> >> Should the LIBABIVER increase be done in this patch, or will another
> >> patch already be doing it?
> >>
> >
> > I'm not familiar with LIBABIVER but just tell me to send it again with that
> > change if you consider that is the right thing to do.
>
> ABI breakage process:
> - Increase LIBABIVER in library Makefile/meson.build
> - Update lib in release notes "Shared Library Versions" section, with a "+" to
> indicate change
> - Remove deprecation notice (does not seem to apply to this one)
>
> Thomas mentioned there is another patch breaking the ABI for ethdev, I wonder
> which patch will do the above process.
There will be a patch to remove the attach/detach function.
But the patch for data per process will probably be applied first.
Please do the LIBABIVER bump as described by Ferruh.
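For context, the per-process field this patch adds can be pictured with the sketch below. The struct and function names are hypothetical stand-ins, not the real ethdev definitions; the only point is that two processes share one data pointer while each keeps its own private pointer.

```c
/* Hypothetical stand-ins, not the real ethdev definitions. */
struct dev_data_sketch {
	int port_id;	/* would live in shared memory, seen by all processes */
};

struct eth_dev_sketch {
	struct dev_data_sketch *data;	/* shared by primary and secondary */
	void *process_private;		/* allocated by the PMD, one per process */
};

/* Fill in one process's view of the device. */
static void
dev_attach(struct eth_dev_sketch *dev, struct dev_data_sketch *shared,
	   void *priv)
{
	dev->data = shared;
	dev->process_private = priv;
}
```

Adding the pointer changes the size and layout of the device structure, which is why the LIBABIVER bump is needed.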
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] ethdev: add field for device data per process
@ 2018-10-05 13:26 4% ` Ferruh Yigit
2018-10-05 14:47 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-05 13:26 UTC (permalink / raw)
To: Alejandro Lucero; +Cc: Thomas Monjalon, Andrew Rybchenko, dev, rasland
On 10/5/2018 2:17 PM, Alejandro Lucero wrote:
> On Fri, Oct 5, 2018 at 2:01 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
>> On 10/3/2018 9:44 PM, Thomas Monjalon wrote:
>>> + Cc more people
>>>
>>> 27/09/2018 13:26, Alejandro Lucero:
>>>> Primary and secondary processes share a per-device private data. With
>>>> current design it is not possible to have data per-device per-process.
>>>> This is required for handling properly the CPP interface inside the NFP
>>>> PMD with multiprocess support.
>>>>
>>>> There is also at least another PMD driver, tap, with similar
>>>> requirements for per-process device data.
>>>
>>> Yes, it is required to fix tap PMD for multi-process usage.
>>>
>>> I am in favor of accepting this change in 18.11.
>>>
>>> [...]
>>>> @@ -539,7 +539,13 @@ struct rte_eth_dev {
>>>> eth_rx_burst_t rx_pkt_burst; /**< Pointer to PMD receive function.
>> */
>>>> eth_tx_burst_t tx_pkt_burst; /**< Pointer to PMD transmit
>> function. */
>>>> eth_tx_prep_t tx_pkt_prepare; /**< Pointer to PMD transmit prepare
>> function. */
>>>> - struct rte_eth_dev_data *data; /**< Pointer to device data */
>>>> + /**
>>>> + * Next two fields are per-device data but *data is shared between
>>>
>>> All fields in rte_eth_dev are per-device.
>>>
>>>> + * primary and secondary processes and *process_private is
>> per-process
>>>> + * private.
>>>> + */
>>>> + struct rte_eth_dev_data *data; /**< Pointer to device data. */
>>>> + void *process_private; /**< Pointer to per-process device data. */
>>>
>>> We could explain here that this memory is allocated by the PMD.
>>
>> Will there be new version?
>>
>> Are we agree on name?
>>
>> Is LIBABIVER increase should be done in this patch, or will there be other
>> patch
>> already doing it?
>>
>
> I'm not familiar with LIBABIVER but just tell me to send it again with that
> change if you consider that is the right thing to do.
ABI breakage process:
- Increase LIBABIVER in library Makefile/meson.build
- Update lib in release notes "Shared Library Versions" section, with a "+" to
indicate change
- Remove deprecation notice (does not seem to apply to this one)
Thomas mentioned there is another patch breaking the ABI for ethdev, I wonder
which patch will do the above process.
> About the name, I will let other to tell.
>
> Thanks
>
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses
2018-10-05 12:45 3% [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
@ 2018-10-05 12:45 4% ` Alejandro Lucero
2018-10-10 8:56 0% ` Tu, Lijuan
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-10-05 12:45 UTC (permalink / raw)
To: dev
A device can suffer from addressing limitations. This function checks
that memsegs have IOVAs within the supported range based on the DMA mask.
PMDs should use this function during initialization if the device
suffers from addressing limitations, returning an error if this function
reports memsegs out of range.
Another usage is for emulated IOMMU hardware with addressing
limitations.
It is necessary to save the most restrictive DMA mask for checking
memory allocated dynamically after initialization.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 10 ++++
lib/librte_eal/common/eal_common_memory.c | 60 +++++++++++++++++++++++
lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_memory.h | 3 ++
lib/librte_eal/common/malloc_heap.c | 12 +++++
lib/librte_eal/linuxapp/eal/eal.c | 2 +
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 91 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2133a5b..c806dc6 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -104,6 +104,14 @@ New Features
the specified port. The port must be stopped before the command call in order
to reconfigure queues.
+* **Added check for ensuring allocated memory addressable by devices.**
+
+ Some devices can have addressing limitations so a new function,
+ ``rte_eal_check_dma_mask``, has been added for checking allocated memory is
+ not out of the device range. Because now memory can be dynamically allocated
+ after initialization, a dma mask is kept and any new allocated memory will be
+ checked out against that dma mask and rejected if out of range. If more than
+ one device has addressing limitations, the dma mask is the more restricted one.
API Changes
-----------
@@ -156,6 +164,8 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more restricted
+ dma mask based on devices addressing limitations.
Removed Items
-------------
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804..c482f0d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -385,6 +385,66 @@ struct virtiova {
rte_memseg_walk(dump_memseg, f);
}
+static int
+check_iova(const struct rte_memseg_list *msl __rte_unused,
+ const struct rte_memseg *ms, void *arg)
+{
+ uint64_t *mask = arg;
+ rte_iova_t iova;
+
+ /* higher address within segment */
+ iova = (ms->iova + ms->len) - 1;
+ if (!(iova & *mask))
+ return 0;
+
+ RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
+ ms->iova, ms->len);
+
+ RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
+ return 1;
+}
+
+#if defined(RTE_ARCH_64)
+#define MAX_DMA_MASK_BITS 63
+#else
+#define MAX_DMA_MASK_BITS 31
+#endif
+
+/* check memseg iovas are within the required range based on dma mask */
+int __rte_experimental
+rte_eal_check_dma_mask(uint8_t maskbits)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint64_t mask;
+
+ /* sanity check */
+ if (maskbits > MAX_DMA_MASK_BITS) {
+ RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
+ maskbits, MAX_DMA_MASK_BITS);
+ return -1;
+ }
+
+ /* create dma mask */
+ mask = ~((1ULL << maskbits) - 1);
+
+ if (rte_memseg_walk(check_iova, &mask))
+ /*
+ * Dma mask precludes hugepage usage.
+ * This device can not be used and we do not need to keep
+ * the dma mask.
+ */
+ return 1;
+
+ /*
+ * we need to keep the more restricted maskbit for checking
+ * potential dynamic memory allocation in the future.
+ */
+ mcfg->dma_maskbits = mcfg->dma_maskbits == 0 ? maskbits :
+ RTE_MIN(mcfg->dma_maskbits, maskbits);
+
+ return 0;
+}
+
/* return the number of memory channels */
unsigned rte_memory_get_nchannel(void)
{
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 62a21c2..b5dff70 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -81,6 +81,9 @@ struct rte_mem_config {
/* legacy mem and single file segments options are shared */
uint32_t legacy_mem;
uint32_t single_file_segments;
+
+ /* keeps the more restricted dma mask */
+ uint8_t dma_maskbits;
} __attribute__((__packed__));
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277..c349d6c 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -454,6 +454,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
*/
unsigned rte_memory_get_nrank(void);
+/* check memsegs iovas are within a range based on dma mask */
+int rte_eal_check_dma_mask(uint8_t maskbits);
+
/**
* Drivers based on uio will not load unless physical
* addresses are obtainable. It is only possible to get
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3..3b5b2b6 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -259,11 +259,13 @@ struct malloc_elem *
int socket, unsigned int flags, size_t align, size_t bound,
bool contig, struct rte_memseg **ms, int n_segs)
{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *msl;
struct malloc_elem *elem = NULL;
size_t alloc_sz;
int allocd_pages;
void *ret, *map_addr;
+ uint64_t mask;
alloc_sz = (size_t)pg_sz * n_segs;
@@ -291,6 +293,16 @@ struct malloc_elem *
goto fail;
}
+ if (mcfg->dma_maskbits) {
+ mask = ~((1ULL << mcfg->dma_maskbits) - 1);
+ if (rte_eal_check_dma_mask(mcfg->dma_maskbits)) {
+ RTE_LOG(ERR, EAL,
+ "%s(): couldn't allocate memory due to DMA mask\n",
+ __func__);
+ goto fail;
+ }
+ }
+
/* add newly minted memsegs to malloc heap */
elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 4a55d3b..dfe1b8c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -263,6 +263,8 @@ enum rte_iova_mode
* processes could later map the config into this exact location */
rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
+ rte_config.mem_config->dma_maskbits = 0;
+
}
/* attach to an existing shared memory config */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..2baefce 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -291,6 +291,7 @@ EXPERIMENTAL {
rte_devargs_parsef;
rte_devargs_remove;
rte_devargs_type_count;
+ rte_eal_check_dma_mask;
rte_eal_cleanup;
rte_eal_hotplug_add;
rte_eal_hotplug_remove;
--
1.9.1
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask
@ 2018-10-05 12:45 3% Alejandro Lucero
2018-10-05 12:45 4% ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-10-05 12:45 UTC (permalink / raw)
To: dev
I sent a patchset about this to be applied on 17.11 stable. The memory
code has changed significantly since that version, so here is the patchset
adjusted to the current master repo.
This patchset adds, mainly, a check for ensuring IOVAs are within a
restricted range due to addressing limitations with some devices. There
are two known cases: NFP and IOMMU VT-d emulation.
With this check, IOVAs out of range are detected and PMDs can abort
initialization. For the VT-d case, IOVA VA mode is allowed as long as
IOVAs are within the supported range, which avoids forbidding IOVA VA by
default.
For the known cases of addressing limitations, there are just 40 (NFP) or
39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough for
most systems. With machines using more memory, the added check will
ensure IOVAs are within the range.
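The arithmetic behind those limits can be sketched as follows. This is a standalone illustration, not the EAL code: the helper names are made up, and only the mask expression mirrors the one used in the patch. A device with N address bits can reach addresses below 2^N, so 40 bits gives 1 TiB and 39 bits gives 512 GiB.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the mask of bits a device limited to maskbits address bits
 * cannot drive: every bit at position maskbits or above is set.
 * Hypothetical helper name; the expression matches the patch.
 */
static uint64_t
dma_forbidden_mask(uint8_t maskbits)
{
	assert(maskbits < 64);	/* shifting by 64 would be undefined */
	return ~((UINT64_C(1) << maskbits) - 1);
}

/*
 * Return 1 if the segment [iova, iova + len) contains an address the
 * device cannot reach, 0 otherwise. len must be non-zero.
 */
static int
iova_out_of_range(uint64_t iova, uint64_t len, uint8_t maskbits)
{
	uint64_t last = iova + len - 1;	/* highest address in the segment */

	return (last & dma_forbidden_mask(maskbits)) != 0;
}
```

Only the last byte of a segment needs testing, because a segment that starts in range and ends in range cannot leave the range in between.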
With IOVA VA, and because of the way the Linux kernel serves mmap calls
on 64-bit systems, 39 or 40 bits are not enough. It is possible to
give an address hint with a lower starting address than the default one
used by the kernel, and then ensure the mmap uses that hint or the hint plus
some offset. On 64-bit systems, the process virtual address space is
large enough for mmapping the hugepages within the supported range
when those addressing limitations exist. This patchset also adds a change
for using such a hint, making the use of IOVA VA a more than likely
possibility when there are those addressing limitations.
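The hint idea can be sketched like this. It is an illustration with plain anonymous memory rather than hugepages, the function names are hypothetical, and a 64-bit Linux system is assumed. The kernel is free to ignore the hint, so the returned address has to be checked.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Return 1 if [addr, addr + len) lies entirely below 1 << maskbits. */
static int
va_within_mask(const void *addr, size_t len, uint8_t maskbits)
{
	uint64_t last = (uint64_t)(uintptr_t)addr + len - 1;

	assert(maskbits < 64);
	return (last >> maskbits) == 0;
}

/*
 * Ask the kernel to map len bytes at or near hint. The hint is only a
 * suggestion, so the result is verified; NULL is returned if the
 * mapping failed or landed outside the addressable range.
 */
static void *
map_below_limit(void *hint, size_t len, uint8_t maskbits)
{
	void *va = mmap(hint, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (va == MAP_FAILED)
		return NULL;
	if (!va_within_mask(va, len, maskbits)) {
		munmap(va, len);
		return NULL;
	}
	return va;
}
```

A caller that gets NULL back could retry with the hint advanced by some offset, which is the "hint plus some offset" case mentioned above.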
The check is not done by default but only when it is required. This
patchset adds the check for NFP initialization and for setting the IOVA
mode if an emulated VT-d is detected. Also, because of the recent patchset
adding dynamic memory allocation, the check is also invoked for ensuring
the new memsegs are within the required range.
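How the remembered mask serves later allocations can be sketched as below. The struct is a hypothetical stand-in for the field added to rte_mem_config, and the function names are made up; the rule shown (keep the smallest bit count seen, 0 meaning no restriction) follows the description in patch 1/6.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the field added to rte_mem_config. */
struct mem_cfg_sketch {
	uint8_t dma_maskbits;	/* 0 means no restriction recorded yet */
};

/* Remember the most restrictive (smallest) mask width seen so far. */
static void
record_dma_maskbits(struct mem_cfg_sketch *cfg, uint8_t maskbits)
{
	assert(maskbits > 0 && maskbits < 64);
	if (cfg->dma_maskbits == 0 || maskbits < cfg->dma_maskbits)
		cfg->dma_maskbits = maskbits;
}

/*
 * Decide whether a newly allocated segment must be rejected.
 * Returns 0 if no restriction is recorded or the segment fits,
 * 1 if its last byte is beyond what the recorded mask allows.
 */
static int
new_segment_rejected(const struct mem_cfg_sketch *cfg, uint64_t iova,
		     uint64_t len)
{
	if (cfg->dma_maskbits == 0)
		return 0;
	return ((iova + len - 1) >> cfg->dma_maskbits) != 0;
}
```

Keeping a bit count rather than the expanded 64-bit mask lets the value fit in a single byte of the shared configuration.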
This patchset could be applied to stable 18.05.
v2:
- change logs from INFO to DEBUG
- only keeps dma mask if device capable of addressing allocated memory
- add ABI changes
- change hint address increment to page size
- split pci/bus commit in two
- fix commits
v3:
- remove previous code about keeping dma mask
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-05 11:30 3% ` Neil Horman
@ 2018-10-05 12:35 0% ` Ferruh Yigit
0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-10-05 12:35 UTC (permalink / raw)
To: Neil Horman
Cc: Bruce Richardson, Thomas Monjalon, dev, Luca Boccassi,
Christian Ehrhardt
On 10/5/2018 12:30 PM, Neil Horman wrote:
> On Fri, Oct 05, 2018 at 11:17:30AM +0100, Ferruh Yigit wrote:
>> On 10/5/2018 10:13 AM, Bruce Richardson wrote:
>>> On Thu, Oct 04, 2018 at 05:55:34PM +0200, Thomas Monjalon wrote:
>>>> 04/10/2018 17:28, Ferruh Yigit:
>>>>> On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
>>>>>> 04/10/2018 17:48, Ferruh Yigit:
>>>>>>> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
>>>>>>> the current release and these APIs are targeted for further release.
>>>>>>
>>>>>> It seems nobody is using it in last releases.
>>>>>>
>>>>>>> RTE_NEXT_ABI shouldn't be enabled by default.
>>>>>>
>>>>>> The reason for having it enabled by default is that when you build DPDK
>>>>>> yourself, you probably want the latest features.
>>>>>> If packaged properly for stability, it is easy to disable it in
>>>>>> the package recipe.
>>>>>
>>>>> My concern was (if this has been used), user may get unstable APIs and without
>>>>> explicitly being aware of it.
>>>>
>>>> I am OK with both defaults (enabled or disabled).
>>>>
>>> I'd keep it as is. As said, I'm not sure it's being used right now anyway.
>>
>> No, not used right now.
>> But I think we can use it, did you able to find chance to check:
>>
>> https://mails.dpdk.org/archives/dev/2018-October/114372.html
>>
>> Option D.
>>
>
> Just to propose something else, We also have the ALLOW_EXPERIMENTAL_API flag
> that we IIRC default to on. Would it be worth consolidating these two
> mechanisms into one? Currently ALLOW_EXPERIMENTAL_API lets us flag symbols that
> are not yet stable, and it seems to work well. It does not however let us
> simply define out structures/variables that might adversely affect the ABI.
> Would it be worth considering adding a macro (something like
> __rte_experimental_symbol()), that allows a variable/struct to be defined if
> ALLOW_EXPERIMENTAL_API is set, and squashed otherwise?
RTE_NEXT_ABI is not just for symbols.
If there is a new API foo(), __rte_experimental works fine to mark it experimental.
But if there is an _existing API_
"bar(char)",
and we plan to change it to
"bar(int, int)",
to publish the change early in this release we need RTE_NEXT_ABI ifdef since
both can't exist together, so it will be used as:
Release N:
#ifdef RTE_NEXT_ABI
bar(int, int);
#else
bar(char);
#endif
Release N + 1:
bar(int, int);
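A compilable version of the same idea, as a rough sketch: the bodies of bar() and the call_bar() wrapper are invented for illustration, and RTE_NEXT_ABI is defined by hand here instead of coming from the build config.

```c
/* Comment this out to build the current-ABI variant instead. */
#define RTE_NEXT_ABI

#ifdef RTE_NEXT_ABI
/* Signature planned for the next release. */
static int
bar(int a, int b)
{
	return a + b;
}
#else
/* Signature shipped in the current release. */
static int
bar(char c)
{
	return c;
}
#endif

/* Callers inside the tree have to follow the same switch. */
static int
call_bar(void)
{
#ifdef RTE_NEXT_ABI
	return bar(40, 2);
#else
	return bar('*');	/* '*' is 42 in ASCII */
#endif
}
```

Only one of the two definitions exists in any given build, which is the point: the two signatures cannot coexist under the same name.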
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v2 1/6] mem: add function for checking memsegs IOVAs addresses
2018-10-05 12:06 3% [dpdk-dev] [PATCH v2 0/6] use IOVAs check based on DMA mask Alejandro Lucero
@ 2018-10-05 12:06 4% ` Alejandro Lucero
0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-10-05 12:06 UTC (permalink / raw)
To: dev
A device can suffer from addressing limitations. This function checks
that memsegs have IOVAs within the supported range based on the DMA mask.
PMDs should use this function during initialization if the device
suffers from addressing limitations, returning an error if this function
reports memsegs out of range.
Another usage is for emulated IOMMU hardware with addressing
limitations.
It is necessary to save the most restrictive DMA mask for checking
memory allocated dynamically after initialization.
Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
doc/guides/rel_notes/release_18_11.rst | 10 ++++
lib/librte_eal/common/eal_common_memory.c | 64 +++++++++++++++++++++++
lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_memory.h | 3 ++
lib/librte_eal/common/malloc_heap.c | 12 +++++
lib/librte_eal/linuxapp/eal/eal.c | 2 +
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 95 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2133a5b..c806dc6 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -104,6 +104,14 @@ New Features
the specified port. The port must be stopped before the command call in order
to reconfigure queues.
+* **Added check for ensuring allocated memory addressable by devices.**
+
+ Some devices can have addressing limitations so a new function,
+ ``rte_eal_check_dma_mask``, has been added for checking allocated memory is
+ not out of the device range. Because now memory can be dynamically allocated
+ after initialization, a dma mask is kept and any new allocated memory will be
+ checked out against that dma mask and rejected if out of range. If more than
+ one device has addressing limitations, the dma mask is the more restricted one.
API Changes
-----------
@@ -156,6 +164,8 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+* eal: added ``dma_maskbits`` to ``rte_mem_config`` for keeping more restricted
+ dma mask based on devices addressing limitations.
Removed Items
-------------
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804..7555e76 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -385,6 +385,70 @@ struct virtiova {
rte_memseg_walk(dump_memseg, f);
}
+static int
+check_iova(const struct rte_memseg_list *msl __rte_unused,
+ const struct rte_memseg *ms, void *arg)
+{
+ uint64_t *mask = arg;
+ rte_iova_t iova;
+
+ /* higher address within segment */
+ iova = (ms->iova + ms->len) - 1;
+ if (!(iova & *mask))
+ return 0;
+
+ RTE_LOG(DEBUG, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
+ ms->iova, ms->len);
+
+ RTE_LOG(DEBUG, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
+ return 1;
+}
+
+#if defined(RTE_ARCH_64)
+#define MAX_DMA_MASK_BITS 63
+#else
+#define MAX_DMA_MASK_BITS 31
+#endif
+
+/* check memseg iovas are within the required range based on dma mask */
+int __rte_experimental
+rte_eal_check_dma_mask(uint8_t maskbits)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint64_t mask;
+
+ /* sanity check */
+ if (maskbits > MAX_DMA_MASK_BITS) {
+ RTE_LOG(ERR, EAL, "wrong dma mask size %u (Max: %u)\n",
+ maskbits, MAX_DMA_MASK_BITS);
+ return -1;
+ }
+
+ /* keep the more restricted maskbit */
+ if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
+ mcfg->dma_maskbits = maskbits;
+
+ /* create dma mask */
+ mask = ~((1ULL << maskbits) - 1);
+
+ if (rte_memseg_walk(check_iova, &mask))
+ /*
+ * Dma mask precludes hugepage usage.
+ * This device can not be used and we do not need to keep
+ * the dma mask.
+ */
+ return 1;
+
+ /*
+ * we need to keep the more restricted maskbit for checking
+ * potential dynamic memory allocation in the future.
+ */
+ mcfg->dma_maskbits = mcfg->dma_maskbits == 0 ? maskbits :
+ RTE_MIN(mcfg->dma_maskbits, maskbits);
+
+ return 0;
+}
+
/* return the number of memory channels */
unsigned rte_memory_get_nchannel(void)
{
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 62a21c2..b5dff70 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -81,6 +81,9 @@ struct rte_mem_config {
/* legacy mem and single file segments options are shared */
uint32_t legacy_mem;
uint32_t single_file_segments;
+
+ /* keeps the more restricted dma mask */
+ uint8_t dma_maskbits;
} __attribute__((__packed__));
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277..c349d6c 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -454,6 +454,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
*/
unsigned rte_memory_get_nrank(void);
+/* check memsegs iovas are within a range based on dma mask */
+int rte_eal_check_dma_mask(uint8_t maskbits);
+
/**
* Drivers based on uio will not load unless physical
* addresses are obtainable. It is only possible to get
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3..3b5b2b6 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -259,11 +259,13 @@ struct malloc_elem *
int socket, unsigned int flags, size_t align, size_t bound,
bool contig, struct rte_memseg **ms, int n_segs)
{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *msl;
struct malloc_elem *elem = NULL;
size_t alloc_sz;
int allocd_pages;
void *ret, *map_addr;
+ uint64_t mask;
alloc_sz = (size_t)pg_sz * n_segs;
@@ -291,6 +293,16 @@ struct malloc_elem *
goto fail;
}
+ if (mcfg->dma_maskbits) {
+ mask = ~((1ULL << mcfg->dma_maskbits) - 1);
+ if (rte_eal_check_dma_mask(mcfg->dma_maskbits)) {
+ RTE_LOG(ERR, EAL,
+ "%s(): couldn't allocate memory due to DMA mask\n",
+ __func__);
+ goto fail;
+ }
+ }
+
/* add newly minted memsegs to malloc heap */
elem = malloc_heap_add_memory(heap, msl, map_addr, alloc_sz);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 4a55d3b..dfe1b8c 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -263,6 +263,8 @@ enum rte_iova_mode
* processes could later map the config into this exact location */
rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
+ rte_config.mem_config->dma_maskbits = 0;
+
}
/* attach to an existing shared memory config */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 73282bb..2baefce 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -291,6 +291,7 @@ EXPERIMENTAL {
rte_devargs_parsef;
rte_devargs_remove;
rte_devargs_type_count;
+ rte_eal_check_dma_mask;
rte_eal_cleanup;
rte_eal_hotplug_add;
rte_eal_hotplug_remove;
--
1.9.1
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v2 0/6] use IOVAs check based on DMA mask
@ 2018-10-05 12:06 3% Alejandro Lucero
2018-10-05 12:06 4% ` [dpdk-dev] [PATCH v2 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-10-05 12:06 UTC (permalink / raw)
To: dev
I sent a patchset about this to be applied on 17.11 stable. The memory
code has changed significantly since that version, so here is the patchset
adjusted to the current master repo.
This patchset adds, mainly, a check for ensuring IOVAs are within a
restricted range due to addressing limitations with some devices. There
are two known cases: NFP and IOMMU VT-d emulation.
With this check, IOVAs out of range are detected and PMDs can abort
initialization. For the VT-d case, IOVA VA mode is allowed as long as
IOVAs are within the supported range, which avoids forbidding IOVA VA by
default.
For the known cases of addressing limitations, there are just 40 (NFP) or
39 (VT-d) bits for handling IOVAs. When using IOVA PA, those limitations
imply 1TB (NFP) or 512GB (VT-d) as upper limits, which is likely enough for
most systems. With machines using more memory, the added check will
ensure IOVAs are within the range.
With IOVA VA, and because of the way the Linux kernel serves mmap calls
on 64-bit systems, 39 or 40 bits are not enough. It is possible to
give an address hint with a lower starting address than the default one
used by the kernel, and then ensure the mmap uses that hint or the hint plus
some offset. On 64-bit systems, the process virtual address space is
large enough for mmapping the hugepages within the supported range
when those addressing limitations exist. This patchset also adds a change
for using such a hint, making the use of IOVA VA a more than likely
possibility when there are those addressing limitations.
The check is not done by default but only when it is required. This
patchset adds the check for NFP initialization and for setting the IOVA
mode if an emulated VT-d is detected. Also, because of the recent patchset
adding dynamic memory allocation, the check is also invoked for ensuring
the new memsegs are within the required range.
This patchset could be applied to stable 18.05.
v2:
- change logs from INFO to DEBUG
- only keeps dma mask if device capable of addressing allocated memory
- add ABI changes
- change hint address increment to page size
- split pci/bus commit in two
- fix commits
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-05 10:17 0% ` Ferruh Yigit
@ 2018-10-05 11:30 3% ` Neil Horman
2018-10-05 12:35 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-10-05 11:30 UTC (permalink / raw)
To: Ferruh Yigit
Cc: Bruce Richardson, Thomas Monjalon, dev, Luca Boccassi,
Christian Ehrhardt
On Fri, Oct 05, 2018 at 11:17:30AM +0100, Ferruh Yigit wrote:
> On 10/5/2018 10:13 AM, Bruce Richardson wrote:
> > On Thu, Oct 04, 2018 at 05:55:34PM +0200, Thomas Monjalon wrote:
> >> 04/10/2018 17:28, Ferruh Yigit:
> >>> On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
> >>>> 04/10/2018 17:48, Ferruh Yigit:
> >>>>> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> >>>>> the current release and these APIs are targeted for further release.
> >>>>
> >>>> It seems nobody is using it in last releases.
> >>>>
> >>>>> RTE_NEXT_ABI shouldn't be enabled by default.
> >>>>
> >>>> The reason for having it enabled by default is that when you build DPDK
> >>>> yourself, you probably want the latest features.
> >>>> If packaged properly for stability, it is easy to disable it in
> >>>> the package recipe.
> >>>
> >>> My concern was (if this has been used), user may get unstable APIs and without
> >>> explicitly being aware of it.
> >>
> >> I am OK with both defaults (enabled or disabled).
> >>
> > I'd keep it as is. As said, I'm not sure it's being used right now anyway.
>
> No, not used right now.
> But I think we can use it, did you able to find chance to check:
>
> https://mails.dpdk.org/archives/dev/2018-October/114372.html
>
> Option D.
>
Just to propose something else, We also have the ALLOW_EXPERIMENTAL_API flag
that we IIRC default to on. Would it be worth consolidating these two
mechanisms into one? Currently ALLOW_EXPERIMENTAL_API lets us flag symbols that
are not yet stable, and it seems to work well. It does not however let us
simply define out structures/variables that might adversely affect the ABI.
Would it be worth considering adding a macro (something like
__rte_experimental_symbol()), that allows a variable/struct to be defined if
ALLOW_EXPERIMENTAL_API is set, and squashed otherwise?
Neil
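A rough sketch of what such a macro could look like. Nothing like this exists in DPDK today as far as this thread shows; the macro body and the example struct are hypothetical, and ALLOW_EXPERIMENTAL_API is defined by hand for the illustration.

```c
#include <stddef.h>

/* Normally passed by the build; defined here for the illustration. */
#define ALLOW_EXPERIMENTAL_API

#ifdef ALLOW_EXPERIMENTAL_API
/* Keep the wrapped declaration. */
#define __rte_experimental_symbol(decl) decl
#else
/* Squash the wrapped declaration entirely. */
#define __rte_experimental_symbol(decl)
#endif

/* Hypothetical structure with one stable and one experimental member. */
struct example_cfg {
	int stable_field;
	__rte_experimental_symbol(int experimental_field;)
};
```

One caveat with this approach: the struct layout then depends on the flag, so the library and every application would have to be built with the same setting.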
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session
@ 2018-10-05 11:05 0% ` Ananyev, Konstantin
0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-10-05 11:05 UTC (permalink / raw)
To: dev
Cc: De Lara Guarch, Pablo, Akhil Goyal, Doherty, Declan, Ravi Kumar,
Jerin Jacob, Zhang, Roy Fan, Trahe, Fiona, Tomasz Duszynski,
Hemant Agrawal, Natalie Samsonov, Dmitri Epshtein, Jay Zhou
Hi everyone,
> This RFC proposes several changes inside rte_cryptodev_sym_session.
> Note that this is just an RFC, not a complete patch, so for now
> I modified only the librte_cryptodev itself,
> some cryptodev PMD, test-crypto-perf and ipsec-secgw example.
> The proposed changes mean ABI/API breakage inside cryptodev,
> so I am looking for feedback from cryptodev lib and crypto PMD maintainers.
> Below are details and reasoning for proposed changes.
>
> 1.rte_cryptodev_sym_session_init()/ rte_cryptodev_sym_session_clear()
> operate based on cryptodev device id, though inside
> rte_cryptodev_sym_session device specific data is addressed
> by driver id (not device id).
> That creates a problem with current implementation when we have
> two or more devices with the same driver used by the same session.
> Consider the following example:
>
> struct rte_cryptodev_sym_session *sess;
> rte_cryptodev_sym_session_init(dev_id=X, sess, ...);
> rte_cryptodev_sym_session_init(dev_id=Y, sess, ...);
> rte_cryptodev_sym_session_clear(dev_id=X, sess);
>
> After that point if X and Y use the same driver,
> then sess can't be used by device Y any more.
> The reason for that - driver specific (not device specific)
> data per session, plus there is no information
> how many device instances use that data.
> Probably the simplest way to deal with that issue -
> add a reference counter per each driver data.
>
> 2.rte_cryptodev_sym_session_set_user_data() and
> rte_cryptodev_sym_session_get_user_data() -
> with current implementation there is no defined way for the user to
> determine what is the max allowed size of the private data.
> Even within rte_cryptodev_sym_session_set_user_data() we just blindly
> copy user provided data without checking for memory boundary violations.
> To overcome that issue I added 'uint16_t priv_size' into
> rte_cryptodev_sym_session structure.
>
> 3.rte_cryptodev_sym_session contains an array of variable size for
> driver specific data.
> Though number of elements in that array is determined by static
> variable nb_drivers, that could be modified by
> rte_cryptodev_allocate_driver().
> That construction seems to work ok so far, as right now users register
> all their PMDs at startup, though it doesn't mean that it would always
> remain like that.
> To make it less error prone I added 'uint16_t nb_drivers' into the
> rte_cryptodev_sym_session structure.
> At least that allows related functions to check that provided
> driver id wouldn't overrun variable array boundaries,
> again it allows to determine size of already allocated session
> without accessing global variable.
>
> 4.#2 and #3 above imply that now each struct rte_cryptodev_sym_session
> would have sort of readonly type data (init once at allocation time,
> keep unmodified through session life-time).
> That requires more changes in current cryptodev implementation:
> Right now inside cryptodev framework both rte_cryptodev_sym_session
> and driver specific session data are two completely different structures
> (e.g. struct rte_cryptodev_sym_session and struct null_crypto_session).
> Though current cryptodev implementation implicitly assumes that driver
> will allocate both of them from within the same mempool.
> Plus this is done in a manner that they override each other's fields
> (reuse the same space - sort of implicit C union).
> That's probably not the best programming practice,
> plus it makes it impossible to have readonly fields inside both of them.
> So to overcome that situation I changed an API a bit, to allow
> to use two different mempools for these two distinct data structures.
>
> 5. Add 'uint64_t userdata' inside struct rte_cryptodev_sym_session.
> I suppose that self-explanatory, and might be used in a lot of places
> (would be quite useful for ipsec library we develop).
>
> So the new proposed layout for rte_cryptodev_sym_session:
> struct rte_cryptodev_sym_session {
> uint64_t userdata;
> /**< Can be used for external metadata */
> uint16_t nb_drivers;
> /**< number of elements in sess_data array */
> uint16_t priv_size;
> /**< session private data will be placed after sess_data */
> __extension__ struct {
> void *data;
> uint16_t refcnt;
> } sess_data[0];
> /**< Driver specific session material, variable size */
> };
>
>
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
OK, I haven't heard any objections so far,
so I suppose everyone is fine in general with the proposed changes.
I will go ahead with the deprecation notice for 18.11 then.
Konstantin
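
For readers following the proposal, here is a minimal sketch of how the
extra 'nb_drivers' and 'priv_size' fields could be used. It is not the
cryptodev implementation: struct sym_session and the session_*() helpers
are hypothetical names invented for illustration, and plain calloc() stands
in for the mempool allocation. It only shows that the session size and the
driver id bounds check can be derived from the session itself, without
reading a global variable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the proposed layout; not the DPDK definition. */
struct sym_session {
	uint64_t userdata;   /* can be used for external metadata */
	uint16_t nb_drivers; /* number of elements in sess_data[] */
	uint16_t priv_size;  /* private data is placed after sess_data[] */
	struct {
		void *data;
		uint16_t refcnt;
	} sess_data[];       /* C99 flexible array member, variable size */
};

/* Allocate a session and set its read-only fields once (calloc stands in
 * for a mempool here, which is an assumption of this sketch). */
struct sym_session *
session_alloc(uint16_t nb_drivers, uint16_t priv_size)
{
	struct sym_session *s;
	size_t sz = sizeof(*s) + (size_t)nb_drivers * sizeof(s->sess_data[0]) +
		priv_size;

	s = calloc(1, sz);
	if (s == NULL)
		return NULL;
	s->nb_drivers = nb_drivers;
	s->priv_size = priv_size;
	return s;
}

/* Size of an already allocated session, computed from its own fields. */
size_t
session_size(const struct sym_session *s)
{
	return sizeof(*s) + (size_t)s->nb_drivers * sizeof(s->sess_data[0]) +
		s->priv_size;
}

/* Store driver data; refuse a driver id that would overrun sess_data[]. */
int
session_set_driver_data(struct sym_session *s, uint16_t driver_id, void *data)
{
	if (driver_id >= s->nb_drivers)
		return -1;
	s->sess_data[driver_id].data = data;
	return 0;
}

/* Fetch driver data, or NULL when the driver id is out of range. */
void *
session_get_driver_data(const struct sym_session *s, uint16_t driver_id)
{
	if (driver_id >= s->nb_drivers)
		return NULL;
	return s->sess_data[driver_id].data;
}
```

With this shape, a driver registered after the session was allocated gets
an error from the accessors instead of writing past the end of the array.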
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-05 9:13 0% ` Bruce Richardson
@ 2018-10-05 10:17 0% ` Ferruh Yigit
2018-10-05 11:30 3% ` Neil Horman
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-05 10:17 UTC (permalink / raw)
To: Bruce Richardson, Thomas Monjalon
Cc: dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
On 10/5/2018 10:13 AM, Bruce Richardson wrote:
> On Thu, Oct 04, 2018 at 05:55:34PM +0200, Thomas Monjalon wrote:
>> 04/10/2018 17:28, Ferruh Yigit:
>>> On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
>>>> 04/10/2018 17:48, Ferruh Yigit:
>>>>> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
>>>>> the current release and these APIs are targeted for further release.
>>>>
>>>> It seems nobody is using it in last releases.
>>>>
>>>>> RTE_NEXT_ABI shouldn't be enabled by default.
>>>>
>>>> The reason for having it enabled by default is that when you build DPDK
>>>> yourself, you probably want the latest features.
>>>> If packaged properly for stability, it is easy to disable it in
>>>> the package recipe.
>>>
>>> My concern was (if this has been used), user may get unstable APIs and without
>>> explicitly being aware of it.
>>
>> I am OK with both defaults (enabled or disabled).
>>
> I'd keep it as is. As said, I'm not sure it's being used right now anyway.
No, not used right now.
But I think we can use it. Were you able to find a chance to check:
https://mails.dpdk.org/archives/dev/2018-October/114372.html
Option D.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:55 0% ` Thomas Monjalon
@ 2018-10-05 9:13 0% ` Bruce Richardson
2018-10-05 10:17 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-05 9:13 UTC (permalink / raw)
To: Thomas Monjalon
Cc: Ferruh Yigit, dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
On Thu, Oct 04, 2018 at 05:55:34PM +0200, Thomas Monjalon wrote:
> 04/10/2018 17:28, Ferruh Yigit:
> > On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
> > > 04/10/2018 17:48, Ferruh Yigit:
> > >> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> > >> the current release and these APIs are targeted for further release.
> > >
> > > It seems nobody is using it in last releases.
> > >
> > >> RTE_NEXT_ABI shouldn't be enabled by default.
> > >
> > > The reason for having it enabled by default is that when you build DPDK
> > > yourself, you probably want the latest features.
> > > If packaged properly for stability, it is easy to disable it in
> > > the package recipe.
> >
> > My concern was (if this has been used), user may get unstable APIs and without
> > explicitly being aware of it.
>
> I am OK with both defaults (enabled or disabled).
>
I'd keep it as is. As said, I'm not sure it's being used right now anyway.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-10-04 10:46 0% ` Ferruh Yigit
@ 2018-10-05 9:04 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-10-05 9:04 UTC (permalink / raw)
To: Ferruh Yigit, Thomas Monjalon; +Cc: dev, John McNamara, Marko Kovacevic
On 04-Oct-18 11:46 AM, Ferruh Yigit wrote:
> On 10/4/2018 10:18 AM, Thomas Monjalon wrote:
>> 04/10/2018 11:17, Burakov, Anatoly:
>>> On 03-Oct-18 11:05 PM, Thomas Monjalon wrote:
>>>> 20/09/2018 17:41, Anatoly Burakov:
>>>>> Currently, command-line switches for legacy mem mode or single-file
>>>>> segments mode are only stored in internal config. This leads to a
>>>>> situation where these flags have to always match between primary
>>>>> and secondary, which is bad for usability.
>>>>>
>>>>> Fix this by storing these flags in the shared config as well, so
>>>>> that secondary process can know if the primary was launched in
>>>>> single-file segments or legacy mem mode.
>>>>>
>>>>> This bumps the EAL ABI, however there's an EAL deprecation notice
>>>>> already in place[1] for a different feature, so that's OK.
>>>>>
>>>>> [1] http://patches.dpdk.org/patch/43502/
>>>>>
>>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>> ---
>>>>>
>>>>> Notes:
>>>>> v2:
>>>>> - Added documentation on ABI break
>>>>>
>>>>> doc/guides/rel_notes/rel_description.rst | 5 +++++
>>>>
>>>> Removed change in this file (dup of release note).
>>>>
>>>>> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
>>>>> .../common/include/rte_eal_memconfig.h | 4 ++++
>>>>> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
>>>>> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
>>>>> lib/librte_eal/meson.build | 2 +-
>>>>> 6 files changed, 36 insertions(+), 3 deletions(-)
>>>>
>>>> Applied (without extra note), thanks.
>>>>
>>>
>>> This will probably break external mem patches due to conflict in release
>>> notes. Should i respin?
>>
>> No, conflicts in release notes are usual. I manage such conflict myself.
>
> It is common to have conflict in release notes and as Thomas said we resolve it
> manually but now this is causing problem in automated per patch tests because
> patch can't be applied.
>
> We should think about a way to prevent these conflicts.
>
How about just ignoring them? 'git status' will show you which particular
files cause conflicts. If it's anything in the doc/ directory, it's safe
to 'git add' those files and proceed with rebase/apply, no?
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-04 3:54 0% ` Honnappa Nagarahalli
@ 2018-10-04 19:16 0% ` Wang, Yipeng1
0 siblings, 0 replies; 200+ results
From: Wang, Yipeng1 @ 2018-10-04 19:16 UTC (permalink / raw)
To: Honnappa Nagarahalli, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, October 3, 2018 8:54 PM
>To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Wang, Yipeng1 <yipeng1.wang@intel.com>; Van Haaren, Harry
><harry.van.haaren@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
>Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Gavin Hu (Arm Technology China)
><Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel,
>Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>>
>> > >-----Original Message-----
>> > >From: Van Haaren, Harry
>> > >> > > > > /**
>> > >> > > > > * Add a key to an existing hash table.
>> > >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash
>> > >> > > > >*h, const void
>> > >> > > *key);
>> > >> > > > > * array of user data. This value is unique for this key.
>> > >> > > > > */
>> > >> > > > > int32_t
>> > >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
>> > >> > > > >void *key,
>> > >> > > hash_sig_t sig);
>> > >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
>> > >> > > > >+*key,
>> > >> > > hash_sig_t sig);
>> > >> > > > >
>> > >> > > > > /
>> > >> > > >
>> > >> > > > I think the above changes will break ABI by changing the
>> > >> > > > parameter
>> > >> type?
>> > >> > > Other people may know better on this.
>> > >> > >
>> > >> > > Just removing a const should not change the ABI, I believe,
>> > >> > > since the const is just advisory hint to the compiler. Actual
>> > >> > > parameter size and count remains unchanged so I don't believe there
>> is an issue.
>> > >> > > [ABI experts, please correct me if I'm wrong on this]
>> > >> >
>> > >> >
>> > >> > [Certainly no ABI expert, but...]
>> > >> >
>> > >> > I think this is an API break, not ABI break.
>> > >> >
>> > >> > Given application code as follows, it will fail to compile - even
>> > >> > though
>> > >> running
>> > >> > the new code as a .so wouldn't cause any issues (AFAIK).
>> > >> >
>> > >> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> > >> > /* parameter passed in is const, but updated function
>> > >> > prototype is
>> > >> non-
>> > >> > const */
>> > >> > rte_hash_add_key_with_hash(h, ...); }
>> > >> >
>> > >> > This means that we can't recompile apps against latest patch
>> > >> > without application code changes, if the app was passing a const
>> > >> > rte_hash struct
>> > >> as
>> > >> > the first parameter.
>> > >> >
>> > >> Agree. Do we need to do anything for this?
>> > >
>> > >I think we should try to avoid breaking API wherever possible.
>> > >If we must, then I suppose we could follow the ABI process of a
>> > >deprecation notice.
>> > >
>> > >From my reading of the versioning docs, it doesn't document this case:
>> > >https://doc.dpdk.org/guides/contributing/versioning.html
>> > >
>> > >I don't recall a similar situation in DPDK previously - so I suggest
>> > >you ask Tech board for input here.
>> > >
>> > >Hope that helps! -Harry
>> > [Wang, Yipeng]
>> > Honnappa, how about use a pointer to the counter in the rte_hash
>> > struct instead of the counter? Will this avoid API change?
>> I think it defeats the purpose of 'const' parameter to the API and provides
>> incorrect information to the user.
>Yipeng, I think I have misunderstood your comment. I believe you meant; we could allocate memory to the counter and store the
>pointer in the structure. Please correct me if I am wrong.
>This could be a solution, though it will be another cache line access. It might be ok given that it is a single cache line for entire hash
>table.
[Wang, Yipeng] Yeah, that is what I meant. It is an additional memory access, but it will probably be in the local cache.
Since time is tight, it could be a simple workaround for this version, and in the future you can extend this pointed-to counter to a counter array, as Ola suggested and as the
Cuckoo switch paper did for the scaling issue.
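
A minimal sketch of the workaround discussed above, under stated
assumptions: struct mini_hash, change_cnt and the mini_hash_*() functions
are hypothetical names, not the rte_hash API, and the increment is left
non-atomic for brevity (the real patch is about read-write concurrency and
would need proper atomics). The point is only that a counter reached
through a pointer can be updated while the table parameter stays const, so
the public prototype does not have to change.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical miniature table; not the real struct rte_hash. */
struct mini_hash {
	uint32_t nb_entries;
	uint32_t *change_cnt; /* counter is allocated outside the struct */
};

struct mini_hash *
mini_hash_create(void)
{
	struct mini_hash *h = calloc(1, sizeof(*h));

	if (h == NULL)
		return NULL;
	h->change_cnt = calloc(1, sizeof(*h->change_cnt));
	if (h->change_cnt == NULL) {
		free(h);
		return NULL;
	}
	return h;
}

void
mini_hash_free(struct mini_hash *h)
{
	if (h == NULL)
		return;
	free(h->change_cnt);
	free(h);
}

/*
 * The prototype keeps 'const struct mini_hash *'. Through a const table
 * the pointer member itself is read-only, but the uint32_t it points to
 * is not, so the counter can still be incremented.
 * Returns the counter value after the update.
 */
int32_t
mini_hash_add_key(const struct mini_hash *h, const void *key)
{
	(void)key; /* key handling is omitted in this sketch */
	(*h->change_cnt)++;
	return (int32_t)*h->change_cnt;
}
```

Application code that holds a 'const struct mini_hash *' keeps compiling
unchanged, at the cost of one extra pointer dereference per update.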
>
>> IMO, DPDK should have guidelines on how to handle the API compatibility
>> breaks. I will send an email to tech board on this.
>> We can also solve this by having counters on the bucket. I was planning to do
>> this little bit later. I will look at the effort involved and may be do it now.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:28 0% ` Ferruh Yigit
@ 2018-10-04 15:55 0% ` Thomas Monjalon
2018-10-05 9:13 0% ` Bruce Richardson
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-04 15:55 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
04/10/2018 17:28, Ferruh Yigit:
> On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
> > 04/10/2018 17:48, Ferruh Yigit:
> >> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> >> the current release and these APIs are targeted for further release.
> >
> > It seems nobody is using it in last releases.
> >
> >> RTE_NEXT_ABI shouldn't be enabled by default.
> >
> > The reason for having it enabled by default is that when you build DPDK
> > yourself, you probably want the latest features.
> > If packaged properly for stability, it is easy to disable it in
> > the package recipe.
>
> My concern was (if this has been used), user may get unstable APIs and without
> explicitly being aware of it.
I am OK with both defaults (enabled or disabled).
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:10 0% ` Thomas Monjalon
@ 2018-10-04 15:28 0% ` Ferruh Yigit
2018-10-04 15:55 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-04 15:28 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
On 10/4/2018 4:10 PM, Thomas Monjalon wrote:
> 04/10/2018 17:48, Ferruh Yigit:
>> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
>> the current release and these APIs are targeted for further release.
>
> It seems nobody is using it in last releases.
>
>> RTE_NEXT_ABI shouldn't be enabled by default.
>
> The reason for having it enabled by default is that when you build DPDK
> yourself, you probably want the latest features.
> If packaged properly for stability, it is easy to disable it in
> the package recipe.
My concern was (if this has been used), user may get unstable APIs and without
explicitly being aware of it.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:48 9% ` [dpdk-dev] [PATCH v2] " Ferruh Yigit
@ 2018-10-04 15:10 0% ` Thomas Monjalon
2018-10-04 15:28 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-04 15:10 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: dev, Neil Horman, Luca Boccassi, Christian Ehrhardt
04/10/2018 17:48, Ferruh Yigit:
> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> the current release and these APIs are targeted for further release.
It seems nobody is using it in last releases.
> RTE_NEXT_ABI shouldn't be enabled by default.
The reason for having it enabled by default is that when you build DPDK
yourself, you probably want the latest features.
If packaged properly for stability, it is easy to disable it in
the package recipe.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default
2018-10-04 15:43 9% ` [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default Ferruh Yigit
@ 2018-10-04 14:49 0% ` Luca Boccassi
2018-10-04 15:48 9% ` [dpdk-dev] [PATCH v2] " Ferruh Yigit
1 sibling, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-04 14:49 UTC (permalink / raw)
To: Ferruh Yigit, Thomas Monjalon; +Cc: dev, Neil Horman, Christian Ehrhardt
On Thu, 2018-10-04 at 16:43 +0100, Ferruh Yigit wrote:
> Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
> the current release and these APIs are targetted for further release.
>
> RTE_NEXT_ABI shouldn't be enabled by default.
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> ---
> config/common_base | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/config/common_base b/config/common_base
> index 2e888d13b..dbd0c9ae9 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -43,7 +43,7 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
> #
> # Use newest code breaking previous ABI
> #
> -CONFIG_RTE_NEXT_ABI=y
> +CONFIG_RTE_NEXT_ABI=n
>
> #
> # Major ABI to overwrite library specific LIBABIVER
Acked-by: Luca Boccassi <bluca@debian.org>
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v2] config: disable RTE_NEXT_ABI by default
2018-10-04 15:43 9% ` [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default Ferruh Yigit
2018-10-04 14:49 0% ` Luca Boccassi
@ 2018-10-04 15:48 9% ` Ferruh Yigit
2018-10-04 15:10 0% ` Thomas Monjalon
1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-04 15:48 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Ferruh Yigit, Neil Horman, Luca Boccassi, Christian Ehrhardt
Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
the current release and these APIs are targeted for further release.
RTE_NEXT_ABI shouldn't be enabled by default.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
v2:
* fix typo in commit log
---
config/common_base | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/config/common_base b/config/common_base
index 2e888d13b..dbd0c9ae9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -43,7 +43,7 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
#
# Use newest code breaking previous ABI
#
-CONFIG_RTE_NEXT_ABI=y
+CONFIG_RTE_NEXT_ABI=n
#
# Major ABI to overwrite library specific LIBABIVER
--
2.17.1
^ permalink raw reply [relevance 9%]
* [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default
@ 2018-10-04 15:43 9% ` Ferruh Yigit
2018-10-04 14:49 0% ` Luca Boccassi
2018-10-04 15:48 9% ` [dpdk-dev] [PATCH v2] " Ferruh Yigit
0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2018-10-04 15:43 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Ferruh Yigit, Neil Horman, Luca Boccassi, Christian Ehrhardt
Enabling RTE_NEXT_ABI means to enable APIs that break the ABI for
the current release and these APIs are targetted for further release.
RTE_NEXT_ABI shouldn't be enabled by default.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
config/common_base | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/config/common_base b/config/common_base
index 2e888d13b..dbd0c9ae9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -43,7 +43,7 @@ CONFIG_RTE_BUILD_SHARED_LIB=n
#
# Use newest code breaking previous ABI
#
-CONFIG_RTE_NEXT_ABI=y
+CONFIG_RTE_NEXT_ABI=n
#
# Major ABI to overwrite library specific LIBABIVER
--
2.17.1
^ permalink raw reply [relevance 9%]
* [dpdk-dev] Fwd: [PATCH v2 1/5] mem: add function for checking memsegs IOVAs addresses
[not found] ` <CAD+H991m6qauwX+P=muKe6bAjNLUrcBaGbxFXkMV60OVNvRgPg@mail.gmail.com>
@ 2018-10-04 12:59 0% ` Alejandro Lucero
0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-10-04 12:59 UTC (permalink / raw)
To: dev
I sent this email only to Anatoly. Sending it again to mailing list.
On Wed, Oct 3, 2018 at 1:43 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:
> On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> > A device can suffer addressing limitations. This functions checks
> > memsegs have iovas within the supported range based on dma mask.
> >
> > PMD should use this during initialization if supported devices
> > suffer addressing limitations, returning an error if this function
> > returns memsegs out of range.
> >
> > Another potential usage is for emulated IOMMU hardware with addressing
> > limitations.
> >
> > It is necessary to save the most restricted dma mask for checking
> > memory allocated dynamically after initialization.
> >
> > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> > ---
> > lib/librte_eal/common/eal_common_memory.c | 56
> +++++++++++++++++++++++
> > lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
> > lib/librte_eal/common/include/rte_memory.h | 3 ++
> > lib/librte_eal/common/malloc_heap.c | 12 +++++
> > lib/librte_eal/linuxapp/eal/eal.c | 2 +
> > lib/librte_eal/rte_eal_version.map | 1 +
> > 6 files changed, 77 insertions(+)
> >
> > diff --git a/lib/librte_eal/common/eal_common_memory.c
> b/lib/librte_eal/common/eal_common_memory.c
> > index fbfb1b0..bdd8f44 100644
> > --- a/lib/librte_eal/common/eal_common_memory.c
> > +++ b/lib/librte_eal/common/eal_common_memory.c
> > @@ -383,6 +383,62 @@ struct virtiova {
> > rte_memseg_walk(dump_memseg, f);
> > }
> >
> > +static int
> > +check_iova(const struct rte_memseg_list *msl __rte_unused,
> > + const struct rte_memseg *ms, void *arg)
> > +{
> > + uint64_t *mask = arg;
> > + rte_iova_t iova;
> > +
> > + /* higher address within segment */
> > + iova = (ms->iova + ms->len) - 1;
> > + if (!(iova & *mask))
> > + return 0;
> > +
> > + RTE_LOG(INFO, EAL, "memseg iova %"PRIx64", len %zx, out of
> range\n",
> > + ms->iova, ms->len);
> > +
> > + RTE_LOG(INFO, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
>
> IMO putting these as INFO is overkill. I'd prefer not to spam the output
> unless it's really important. Can this go under DEBUG?
>
>
This check comes from a device or from alloc_pages_on_heap when
expanding memory. If the check discovers an address outside the mask, the
device cannot be used or the new memory cannot be allocated. I think having
this info will help to understand why the device initialization or the
memory allocation is failing.
> Also, the message is misleading. You stop before you have a chance to
> check other masks, which may restrict them even further. You're
> outputting the message about using DMA mask XXX but this may not be the
> final DMA mask.
>
Well, this is the first triggering, and it is enough for reporting the
problem and avoiding the device or the new memory to be used.
Note that the mask is per device, and for the memory allocation case, it is
the most restrictive dma mask. So there are no other masks to try.
>
> > + /* Stop the walk and change mask */
> > + *mask = 0;
> > + return 1;
> > +}
> > +
> > +#if defined(RTE_ARCH_64)
> > +#define MAX_DMA_MASK_BITS 63
> > +#else
> > +#define MAX_DMA_MASK_BITS 31
> > +#endif
> > +
> > +/* check memseg iovas are within the required range based on dma mask */
> > +int __rte_experimental
> > +rte_eal_check_dma_mask(uint8_t maskbits)
> > +{
> > + struct rte_mem_config *mcfg =
> rte_eal_get_configuration()->mem_config;
> > + uint64_t mask;
> > +
> > + /* sanity check */
> > + if (maskbits > MAX_DMA_MASK_BITS) {
> > + RTE_LOG(INFO, EAL, "wrong dma mask size %u (Max: %u)\n",
> > + maskbits, MAX_DMA_MASK_BITS);
>
> Should be ERR, not INFO.
>
>
Right. I will change it.
> > + return -1;
> > + }
> > +
> > + /* keep the more restricted maskbit */
> > + if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
> > + mcfg->dma_maskbits = maskbits;
>
> Do we need to modify mcfg->dma_maskbits before we know if we're going to
> fail? Suggest using a local variable maybe?
>
>
Yes, that's true. If the check fails, the device will not be used therefore
we do not need to keep that dma mask at all.
I will change the order here.
Thanks!
> Also, i think it's a good case for ternary:
>
> bits = mcfg->dma_maskbits == 0 ?
> maskbits :
> RTE_MIN(maskbits, mcfg->dma_maskbits);
>
> IMO the intention looks much clearer.
>
>
Agree.
> > +
> > + /* create dma mask */
> > + mask = ~((1ULL << maskbits) - 1);
> > +
> > + rte_memseg_walk(check_iova, &mask);
> > +
> > + if (!mask)
> > + return -1;
> > +
> > + return 0;
> > +}
> > +
> > /* return the number of memory channels */
> > unsigned rte_memory_get_nchannel(void)
> > {
> > diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h
> b/lib/librte_eal/common/include/rte_eal_memconfig.h
> > index aff0688..aea44cb 100644
> > --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> > +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> > @@ -77,6 +77,9 @@ struct rte_mem_config {
> > * exact same address the primary process maps it.
> > */
> > uint64_t mem_cfg_addr;
> > +
> > + /* keeps the more restricted dma mask */
> > + uint8_t dma_maskbits;
>
> This needs to be documented as an ABI break in the 18.11 release notes.
>
>
Ok. I'll add that in the next version.
Thanks
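
To make the arithmetic discussed in this review concrete, here is a small
standalone sketch. It does not use the EAL: dma_reject_mask(),
segment_fits() and merge_maskbits() are hypothetical helpers written for
this note, and the 63-bit limit is the 64-bit value of MAX_DMA_MASK_BITS
from the patch. It shows the mask construction, the check on the highest
address of a segment, and the "keep the more restricted mask" rule written
as the suggested ternary.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DMA_MASK_BITS 63 /* value used by the patch on 64-bit targets */

/* Bits that a device limited to 'maskbits' address bits cannot drive.
 * The caller must pass maskbits <= MAX_DMA_MASK_BITS. */
uint64_t
dma_reject_mask(uint8_t maskbits)
{
	return ~((UINT64_C(1) << maskbits) - 1);
}

/*
 * Return 1 if the segment [iova, iova + len) is addressable with
 * 'maskbits' bits, 0 if it is out of range, -1 if maskbits is invalid.
 * Assumes len > 0.
 */
int
segment_fits(uint64_t iova, size_t len, uint8_t maskbits)
{
	uint64_t last;

	if (maskbits > MAX_DMA_MASK_BITS)
		return -1;
	/* highest address within the segment */
	last = iova + len - 1;
	return (last & dma_reject_mask(maskbits)) == 0;
}

/* Keep the more restrictive mask; 0 means no mask has been stored yet. */
uint8_t
merge_maskbits(uint8_t stored, uint8_t requested)
{
	return stored == 0 ? requested :
		(requested < stored ? requested : stored);
}
```

Computing the merged value into a local variable first and storing it only
after the walk succeeds would match the ordering agreed on above.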
>
> --
> Thanks,
> Anatoly
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-10-04 9:18 0% ` Thomas Monjalon
@ 2018-10-04 10:46 0% ` Ferruh Yigit
2018-10-05 9:04 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-10-04 10:46 UTC (permalink / raw)
To: Thomas Monjalon, Burakov, Anatoly; +Cc: dev, John McNamara, Marko Kovacevic
On 10/4/2018 10:18 AM, Thomas Monjalon wrote:
> 04/10/2018 11:17, Burakov, Anatoly:
>> On 03-Oct-18 11:05 PM, Thomas Monjalon wrote:
>>> 20/09/2018 17:41, Anatoly Burakov:
>>>> Currently, command-line switches for legacy mem mode or single-file
>>>> segments mode are only stored in internal config. This leads to a
>>>> situation where these flags have to always match between primary
>>>> and secondary, which is bad for usability.
>>>>
>>>> Fix this by storing these flags in the shared config as well, so
>>>> that secondary process can know if the primary was launched in
>>>> single-file segments or legacy mem mode.
>>>>
>>>> This bumps the EAL ABI, however there's an EAL deprecation notice
>>>> already in place[1] for a different feature, so that's OK.
>>>>
>>>> [1] http://patches.dpdk.org/patch/43502/
>>>>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> ---
>>>>
>>>> Notes:
>>>> v2:
>>>> - Added documentation on ABI break
>>>>
>>>> doc/guides/rel_notes/rel_description.rst | 5 +++++
>>>
>>> Removed change in this file (dup of release note).
>>>
>>>> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
>>>> .../common/include/rte_eal_memconfig.h | 4 ++++
>>>> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
>>>> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
>>>> lib/librte_eal/meson.build | 2 +-
>>>> 6 files changed, 36 insertions(+), 3 deletions(-)
>>>
>>> Applied (without extra note), thanks.
>>>
>>
>> This will probably break external mem patches due to conflict in release
>> notes. Should i respin?
>
> No, conflicts in release notes are usual. I manage such conflict myself.
It is common to have conflicts in release notes and, as Thomas said, we resolve
them manually, but now this is causing problems in automated per-patch tests
because the patch can't be applied.
We should think about a way to prevent these conflicts.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure
2018-10-04 9:31 4% ` Gaëtan Rivet
@ 2018-10-04 9:48 3% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-10-04 9:48 UTC (permalink / raw)
To: Gaëtan Rivet
Cc: dev, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor, Rosen Xu,
Hemant Agrawal, Shreyansh Jain, Stephen Hemminger
04/10/2018 11:31, Gaëtan Rivet:
> On Thu, Oct 04, 2018 at 01:10:44AM +0200, Thomas Monjalon wrote:
> > When a device is added with a devargs (hotplug or whitelist),
> > the bus pointer can be retrieved via its devargs.
> > But there is no such devargs.bus in case of standard scan.
> >
> > A pointer to the rte_bus handle is added to rte_device.
> > When a device is allocated (during a scan),
> > the pointer to its bus is assigned.
> >
> > It will make possible to remove a rte_device,
> > using the function pointer from its bus.
> >
> > The function rte_bus_find_by_device() becomes useless,
> > and may be removed later.
> >
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
>
> I agree with this change, but I think this can break ABI of
> buses defining their structure by composition with rte_device (e.g. PCI
> bus). Have you checked ABI?
Yes I forgot it changes the size of the bus structures.
I can spin a v6 with a bump of ABI version of the bus drivers,
and an additional note in release notes.
> Personally I don't care, I prefer a clean framework to a littered lib.
>
> Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
Adding bus drivers maintainers to get more opinions.
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure
2018-10-03 23:10 4% ` [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure Thomas Monjalon
@ 2018-10-04 9:31 4% ` Gaëtan Rivet
2018-10-04 9:48 3% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Gaëtan Rivet @ 2018-10-04 9:31 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
On Thu, Oct 04, 2018 at 01:10:44AM +0200, Thomas Monjalon wrote:
> When a device is added with a devargs (hotplug or whitelist),
> the bus pointer can be retrieved via its devargs.
> But there is no such devargs.bus in case of standard scan.
>
> A pointer to the rte_bus handle is added to rte_device.
> When a device is allocated (during a scan),
> the pointer to its bus is assigned.
>
> It will make possible to remove a rte_device,
> using the function pointer from its bus.
>
> The function rte_bus_find_by_device() becomes useless,
> and may be removed later.
>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
I agree with this change, but I think this can break ABI of
buses defining their structure by composition with rte_device (e.g. PCI
bus). Have you checked ABI?
Personally I don't care, I prefer a clean framework to a littered lib.
Acked-by: Gaetan Rivet <gaetan.rivet@6wind.com>
--
Gaëtan Rivet
6WIND
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-10-04 9:17 0% ` Burakov, Anatoly
@ 2018-10-04 9:18 0% ` Thomas Monjalon
2018-10-04 10:46 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-04 9:18 UTC (permalink / raw)
To: Burakov, Anatoly; +Cc: dev, John McNamara, Marko Kovacevic
04/10/2018 11:17, Burakov, Anatoly:
> On 03-Oct-18 11:05 PM, Thomas Monjalon wrote:
> > 20/09/2018 17:41, Anatoly Burakov:
> >> Currently, command-line switches for legacy mem mode or single-file
> >> segments mode are only stored in internal config. This leads to a
> >> situation where these flags have to always match between primary
> >> and secondary, which is bad for usability.
> >>
> >> Fix this by storing these flags in the shared config as well, so
> >> that secondary process can know if the primary was launched in
> >> single-file segments or legacy mem mode.
> >>
> >> This bumps the EAL ABI, however there's an EAL deprecation notice
> >> already in place[1] for a different feature, so that's OK.
> >>
> >> [1] http://patches.dpdk.org/patch/43502/
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >>
> >> Notes:
> >> v2:
> >> - Added documentation on ABI break
> >>
> >> doc/guides/rel_notes/rel_description.rst | 5 +++++
> >
> > Removed change in this file (dup of release note).
> >
> >> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
> >> .../common/include/rte_eal_memconfig.h | 4 ++++
> >> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
> >> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
> >> lib/librte_eal/meson.build | 2 +-
> >> 6 files changed, 36 insertions(+), 3 deletions(-)
> >
> > Applied (without extra note), thanks.
> >
>
> This will probably break external mem patches due to conflict in release
> notes. Should i respin?
No, conflicts in release notes are usual. I manage such conflict myself.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-10-03 22:05 0% ` Thomas Monjalon
@ 2018-10-04 9:17 0% ` Burakov, Anatoly
2018-10-04 9:18 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-10-04 9:17 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev, John McNamara, Marko Kovacevic
On 03-Oct-18 11:05 PM, Thomas Monjalon wrote:
> 20/09/2018 17:41, Anatoly Burakov:
>> Currently, command-line switches for legacy mem mode or single-file
>> segments mode are only stored in internal config. This leads to a
>> situation where these flags have to always match between primary
>> and secondary, which is bad for usability.
>>
>> Fix this by storing these flags in the shared config as well, so
>> that secondary process can know if the primary was launched in
>> single-file segments or legacy mem mode.
>>
>> This bumps the EAL ABI, however there's an EAL deprecation notice
>> already in place[1] for a different feature, so that's OK.
>>
>> [1] http://patches.dpdk.org/patch/43502/
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>> v2:
>> - Added documentation on ABI break
>>
>> doc/guides/rel_notes/rel_description.rst | 5 +++++
>
> Removed change in this file (dup of release note).
>
>> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
>> .../common/include/rte_eal_memconfig.h | 4 ++++
>> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
>> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
>> lib/librte_eal/meson.build | 2 +-
>> 6 files changed, 36 insertions(+), 3 deletions(-)
>
> Applied (without extra note), thanks.
>
This will probably break external mem patches due to a conflict in the
release notes. Should I respin?
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:32 0% ` Honnappa Nagarahalli
2018-10-03 17:56 3% ` Wang, Yipeng1
@ 2018-10-04 3:54 0% ` Honnappa Nagarahalli
2018-10-04 19:16 0% ` Wang, Yipeng1
1 sibling, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-10-04 3:54 UTC (permalink / raw)
To: Honnappa Nagarahalli, Wang, Yipeng1, Van Haaren, Harry,
Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>
> > >-----Original Message-----
> > >From: Van Haaren, Harry
> > >> > > > > /**
> > >> > > > > * Add a key to an existing hash table.
> > >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash
> > >> > > > >*h, const void
> > >> > > *key);
> > >> > > > > * array of user data. This value is unique for this key.
> > >> > > > > */
> > >> > > > > int32_t
> > >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
> > >> > > > >void *key,
> > >> > > hash_sig_t sig);
> > >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
> > >> > > > >+*key,
> > >> > > hash_sig_t sig);
> > >> > > > >
> > >> > > > > /
> > >> > > >
> > >> > > > I think the above changes will break ABI by changing the
> > >> > > > parameter
> > >> type?
> > >> > > Other people may know better on this.
> > >> > >
> > >> > > Just removing a const should not change the ABI, I believe,
> > >> > > since the const is just advisory hint to the compiler. Actual
> > >> > > parameter size and count remains unchanged so I don't believe there
> is an issue.
> > >> > > [ABI experts, please correct me if I'm wrong on this]
> > >> >
> > >> >
> > >> > [Certainly no ABI expert, but...]
> > >> >
> > >> > I think this is an API break, not ABI break.
> > >> >
> > >> > Given application code as follows, it will fail to compile - even
> > >> > though
> > >> running
> > >> > the new code as a .so wouldn't cause any issues (AFAIK).
> > >> >
> > >> > void do_hash_stuff(const struct rte_hash *h, ...) {
> > >> > /* parameter passed in is const, but updated function
> > >> > prototype is
> > >> non-
> > >> > const */
> > >> > rte_hash_add_key_with_hash(h, ...); }
> > >> >
> > >> > This means that we can't recompile apps against latest patch
> > >> > without application code changes, if the app was passing a const
> > >> > rte_hash struct
> > >> as
> > >> > the first parameter.
> > >> >
> > >> Agree. Do we need to do anything for this?
> > >
> > >I think we should try to avoid breaking API wherever possible.
> > >If we must, then I suppose we could follow the ABI process of a
> > >deprecation notice.
> > >
> > >From my reading of the versioning docs, it doesn't document this case:
> > >https://doc.dpdk.org/guides/contributing/versioning.html
> > >
> > >I don't recall a similar situation in DPDK previously - so I suggest
> > >you ask Tech board for input here.
> > >
> > >Hope that helps! -Harry
> > [Wang, Yipeng]
> > Honnappa, how about use a pointer to the counter in the rte_hash
> > struct instead of the counter? Will this avoid API change?
> I think it defeats the purpose of 'const' parameter to the API and provides
> incorrect information to the user.
Yipeng, I think I misunderstood your comment. I believe you meant that we could allocate memory for the counter and store the pointer in the structure. Please correct me if I am wrong.
This could be a solution, though it adds another cache line access. It might be OK given that it is a single cache line for the entire hash table.
> IMO, DPDK should have guidelines on how to handle the API compatibility
> breaks. I will send an email to tech board on this.
> We can also solve this by having counters on the bucket. I was planning to do
> this little bit later. I will look at the effort involved and may be do it now.
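
A minimal sketch of the pointer-to-counter idea, assuming a made-up
`struct demo_hash` (this is not the real rte_hash layout, and every
`demo_*` name is hypothetical). It only shows why the table pointer can
stay const: the qualifier covers the pointer member itself, not the
counter it points to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table: the change counter lives outside the struct. */
struct demo_hash {
	uint32_t entries;       /* stand-in for the real table state */
	uint32_t *tbl_chng_cnt; /* separately allocated change counter */
};

/* Allocate the table and its counter; returns NULL on failure. */
static struct demo_hash *
demo_hash_create(void)
{
	struct demo_hash *h = calloc(1, sizeof(*h));

	if (h == NULL)
		return NULL;
	h->tbl_chng_cnt = calloc(1, sizeof(*h->tbl_chng_cnt));
	if (h->tbl_chng_cnt == NULL) {
		free(h);
		return NULL;
	}
	return h;
}

static void
demo_hash_free(struct demo_hash *h)
{
	if (h == NULL)
		return;
	free(h->tbl_chng_cnt);
	free(h);
}

/*
 * The table pointer stays const, so the prototype is unchanged.
 * Only the pointee of tbl_chng_cnt is written, which the const
 * qualifier on *h does not forbid.
 */
static uint32_t
demo_hash_note_change(const struct demo_hash *h)
{
	assert(h != NULL && h->tbl_chng_cnt != NULL);
	return ++(*h->tbl_chng_cnt);
}
```

The counter sits in its own allocation, which is the extra cache line
access mentioned above. A concurrent implementation would also need
atomic accesses, which this sketch leaves out.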
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:56 3% ` Wang, Yipeng1
2018-10-03 23:05 5% ` Ola Liljedahl
@ 2018-10-04 3:32 0% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-10-04 3:32 UTC (permalink / raw)
To: Wang, Yipeng1, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh,
Honnappa Nagarahalli
> >
> >> >-----Original Message-----
> >> >From: Van Haaren, Harry
> >> >> > > > > /**
> >> >> > > > > * Add a key to an existing hash table.
> >> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash
> >> >> > > > >*h, const void
> >> >> > > *key);
> >> >> > > > > * array of user data. This value is unique for this key.
> >> >> > > > > */
> >> >> > > > > int32_t
> >> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
> >> >> > > > >void *key,
> >> >> > > hash_sig_t sig);
> >> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
> >> >> > > > >+*key,
> >> >> > > hash_sig_t sig);
> >> >> > > > >
> >> >> > > > > /
> >> >> > > >
> >> >> > > > I think the above changes will break ABI by changing the
> >> >> > > > parameter
> >> >> type?
> >> >> > > Other people may know better on this.
> >> >> > >
> >> >> > > Just removing a const should not change the ABI, I believe,
> >> >> > > since the const is just advisory hint to the compiler. Actual
> >> >> > > parameter size and count remains unchanged so I don't believe
> there is an issue.
> >> >> > > [ABI experts, please correct me if I'm wrong on this]
> >> >> >
> >> >> >
> >> >> > [Certainly no ABI expert, but...]
> >> >> >
> >> >> > I think this is an API break, not ABI break.
> >> >> >
> >> >> > Given application code as follows, it will fail to compile -
> >> >> > even though
> >> >> running
> >> >> > the new code as a .so wouldn't cause any issues (AFAIK).
> >> >> >
> >> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
> >> >> > /* parameter passed in is const, but updated function
> >> >> > prototype is
> >> >> non-
> >> >> > const */
> >> >> > rte_hash_add_key_with_hash(h, ...); }
> >> >> >
> >> >> > This means that we can't recompile apps against latest patch
> >> >> > without application code changes, if the app was passing a const
> >> >> > rte_hash struct
> >> >> as
> >> >> > the first parameter.
> >> >> >
> >> >> Agree. Do we need to do anything for this?
> >> >
> >> >I think we should try to avoid breaking API wherever possible.
> >> >If we must, then I suppose we could follow the ABI process of a
> >> >deprecation notice.
> >> >
> >> >From my reading of the versioning docs, it doesn't document this case:
> >> >https://doc.dpdk.org/guides/contributing/versioning.html
> >> >
> >> >I don't recall a similar situation in DPDK previously - so I suggest
> >> >you ask Tech board for input here.
> >> >
> >> >Hope that helps! -Harry
> >> [Wang, Yipeng]
> >> Honnappa, how about use a pointer to the counter in the rte_hash
> >> struct instead of the counter? Will this avoid API change?
> >I think it defeats the purpose of 'const' parameter to the API and provides
> incorrect information to the user.
> >IMO, DPDK should have guidelines on how to handle the API compatibility
> breaks. I will send an email to tech board on this.
> >We can also solve this by having counters on the bucket. I was planning
> >to do this little bit later. I will look at the effort involved and may be do it
> now.
> [Wang, Yipeng]
> I think with ABI/API change, you might need to announce it one release cycle
> ahead.
>
> In the cuckoo switch paper: Scalable, High Performance Ethernet Forwarding
> with CUCKOOSWITCH it separates the version counter array and the hash
> table. You can strike a balance between granularity of the version counter and
> the cache/memory requirement.
> Is it a better way?
This will introduce another cache line access. It would be good to stay within a single cache line.
>
> Another consideration is current bucket is 64-byte exactly with the partial-
> key-hashing.
> To add another counter, we need to think about changing certain variables to
> still align cache line.
The 'flags' structure member is not being used, so I plan to remove it. That will give us 8B, of which I will use 4B for the counter.
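
To illustrate the size arithmetic, here is a sketch of a bucket that
stays at 64 bytes after an 8-byte field is replaced by a 4-byte counter
plus 4 spare bytes. The layout, the field names and `struct demo_bucket`
are assumptions made for illustration, not the actual rte_hash bucket
definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUCKET_ENTRIES 8

/* Hypothetical bucket, not the actual rte_hash definition. */
struct demo_bucket {
	uint16_t sig_current[DEMO_BUCKET_ENTRIES]; /* 16 B of partial signatures */
	uint32_t key_idx[DEMO_BUCKET_ENTRIES];     /* 32 B of key indexes */
	uint32_t chng_cnt;                         /* 4 B counter from the freed 8 B */
	uint32_t reserved;                         /* remaining 4 B of the freed 8 B */
	struct demo_bucket *next;                  /* 8 B on 64-bit targets */
};

/* Compile-time check; only meaningful where pointers are 8 bytes wide. */
static_assert(sizeof(void *) != 8 || sizeof(struct demo_bucket) == 64,
	      "bucket must stay within one 64-byte cache line");
```

A compile-time assertion like this would catch any later field change
that pushes the bucket past one cache line.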
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure
@ 2018-10-03 23:10 4% ` Thomas Monjalon
2018-10-04 9:31 4% ` Gaëtan Rivet
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-03 23:10 UTC (permalink / raw)
To: dev; +Cc: gaetan.rivet, ophirmu, qi.z.zhang, ferruh.yigit, ktraynor
When a device is added with a devargs (hotplug or whitelist),
the bus pointer can be retrieved via its devargs.
But there is no such devargs.bus in case of standard scan.
A pointer to the rte_bus handle is added to rte_device.
When a device is allocated (during a scan),
the pointer to its bus is assigned.
This makes it possible to remove an rte_device
using the function pointer from its bus.
The function rte_bus_find_by_device() becomes useless,
and may be removed later.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
doc/guides/rel_notes/release_18_11.rst | 2 ++
drivers/bus/dpaa/dpaa_bus.c | 2 ++
drivers/bus/fslmc/fslmc_bus.c | 2 ++
drivers/bus/ifpga/ifpga_bus.c | 1 +
drivers/bus/pci/bsd/pci.c | 2 ++
drivers/bus/pci/linux/pci.c | 1 +
drivers/bus/pci/private.h | 2 ++
drivers/bus/vdev/vdev.c | 1 +
drivers/bus/vmbus/linux/vmbus_bus.c | 1 +
drivers/bus/vmbus/private.h | 3 +++
lib/librte_eal/common/include/rte_dev.h | 1 +
11 files changed, 18 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index d534bb71c..2c6791e5e 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -164,6 +164,8 @@ ABI Changes
``rte_config`` structure on account of improving DPDK usability when
using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+* eal: The structure ``rte_device`` got a new field to reference a ``rte_bus``.
+
Removed Items
-------------
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index 49cd04dbb..138e0f98d 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -165,6 +165,8 @@ dpaa_create_device_list(void)
goto cleanup;
}
+ dev->device.bus = &rte_dpaa_bus.bus;
+
cfg = &dpaa_netcfg->port_cfg[i];
fman_intf = cfg->fman_if;
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index bfe81e236..960f55071 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -161,6 +161,8 @@ scan_one_fslmc_device(char *dev_name)
return -ENOMEM;
}
+ dev->device.bus = &rte_fslmc_bus.bus;
+
/* Parse the device name and ID */
t_ptr = strtok(dup_dev_name, ".");
if (!t_ptr) {
diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
index 3ef035b7e..80663328a 100644
--- a/drivers/bus/ifpga/ifpga_bus.c
+++ b/drivers/bus/ifpga/ifpga_bus.c
@@ -142,6 +142,7 @@ ifpga_scan_one(struct rte_rawdev *rawdev,
if (!afu_dev)
goto end;
+ afu_dev->device.bus = &rte_ifpga_bus;
afu_dev->device.devargs = devargs;
afu_dev->device.numa_node = SOCKET_ID_ANY;
afu_dev->device.name = devargs->name;
diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index 655b34b7e..40641cad4 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -223,6 +223,8 @@ pci_scan_one(int dev_pci_fd, struct pci_conf *conf)
}
memset(dev, 0, sizeof(*dev));
+ dev->device.bus = &rte_pci_bus.bus;
+
dev->addr.domain = conf->pc_sel.pc_domain;
dev->addr.bus = conf->pc_sel.pc_bus;
dev->addr.devid = conf->pc_sel.pc_dev;
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..e31bbb370 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -228,6 +228,7 @@ pci_scan_one(const char *dirname, const struct rte_pci_addr *addr)
return -1;
memset(dev, 0, sizeof(*dev));
+ dev->device.bus = &rte_pci_bus.bus;
dev->addr = *addr;
/* get vendor id */
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 0e689fa74..04bffa6e7 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -15,6 +15,8 @@ extern struct rte_pci_bus rte_pci_bus;
struct rte_pci_driver;
struct rte_pci_device;
+extern struct rte_pci_bus rte_pci_bus;
+
/**
* Probe the PCI bus
*
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index efca962f7..0142fb2c8 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -456,6 +456,7 @@ vdev_scan(void)
continue;
}
+ dev->device.bus = &rte_vdev_bus;
dev->device.devargs = devargs;
dev->device.numa_node = SOCKET_ID_ANY;
dev->device.name = devargs->name;
diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c b/drivers/bus/vmbus/linux/vmbus_bus.c
index 527a6a39f..a4755a387 100644
--- a/drivers/bus/vmbus/linux/vmbus_bus.c
+++ b/drivers/bus/vmbus/linux/vmbus_bus.c
@@ -229,6 +229,7 @@ vmbus_scan_one(const char *name)
if (dev == NULL)
return -1;
+ dev->device.bus = &rte_vmbus_bus.bus;
dev->device.name = strdup(name);
if (!dev->device.name)
goto error;
diff --git a/drivers/bus/vmbus/private.h b/drivers/bus/vmbus/private.h
index f2022a68c..211127dd8 100644
--- a/drivers/bus/vmbus/private.h
+++ b/drivers/bus/vmbus/private.h
@@ -10,11 +10,14 @@
#include <sys/uio.h>
#include <rte_log.h>
#include <rte_vmbus_reg.h>
+#include <rte_bus_vmbus.h>
#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif
+extern struct rte_vmbus_bus rte_vmbus_bus;
+
extern int vmbus_logtype_bus;
#define VMBUS_LOG(level, fmt, args...) \
rte_log(RTE_LOG_ ## level, vmbus_logtype_bus, "%s(): " fmt "\n", \
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index b80a80598..d82cba847 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -157,6 +157,7 @@ struct rte_device {
TAILQ_ENTRY(rte_device) next; /**< Next device */
const char *name; /**< Device name */
const struct rte_driver *driver;/**< Associated driver */
+ const struct rte_bus *bus; /**< Bus handle assigned on scan */
int numa_node; /**< NUMA node connection */
struct rte_devargs *devargs; /**< Device user arguments */
};
--
2.19.0
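
For readers following the series, a self-contained sketch of what the
back-pointer enables. All `demo_*` types and functions are hypothetical
stand-ins, not the EAL definitions. The point is only that removal can
go through `dev->bus` directly instead of searching for the owning bus.

```c
#include <stddef.h>

struct demo_device;

/* Hypothetical bus with a single unplug callback. */
struct demo_bus {
	const char *name;
	int (*unplug)(struct demo_device *dev);
};

/* Hypothetical device carrying a back-pointer to its bus. */
struct demo_device {
	const char *name;
	const struct demo_bus *bus; /* assigned when the device is scanned */
	int plugged;
};

static int
demo_pci_unplug(struct demo_device *dev)
{
	dev->plugged = 0;
	return 0;
}

static const struct demo_bus demo_pci_bus = {
	.name = "pci",
	.unplug = demo_pci_unplug,
};

/* Scan step: fill in the device and record which bus owns it. */
static void
demo_scan_one(struct demo_device *dev, const char *name)
{
	dev->name = name;
	dev->bus = &demo_pci_bus;
	dev->plugged = 1;
}

/* Removal needs no bus lookup: the device already points at its bus. */
static int
demo_remove(struct demo_device *dev)
{
	if (dev == NULL || dev->bus == NULL || dev->bus->unplug == NULL)
		return -1;
	return dev->bus->unplug(dev);
}
```

A device that was never scanned has a NULL bus pointer, so the sketch
rejects it rather than dereferencing it.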
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:56 3% ` Wang, Yipeng1
@ 2018-10-03 23:05 5% ` Ola Liljedahl
2018-10-04 3:32 0% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Ola Liljedahl @ 2018-10-03 23:05 UTC (permalink / raw)
To: Wang, Yipeng1, Honnappa Nagarahalli, Van Haaren, Harry,
Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, nd, Gobriel, Sameh
On 03/10/2018, 20:00, "Wang, Yipeng1" <yipeng1.wang@intel.com> wrote:
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, October 3, 2018 10:33 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>
>Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Gavin Hu (Arm Technology China)
><Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel,
>Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>> >-----Original Message-----
>> >From: Van Haaren, Harry
>> >> > > > > /**
>> >> > > > > * Add a key to an existing hash table.
>> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
>> >> > > > >const void
>> >> > > *key);
>> >> > > > > * array of user data. This value is unique for this key.
>> >> > > > > */
>> >> > > > > int32_t
>> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
>> >> > > > >void *key,
>> >> > > hash_sig_t sig);
>> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
>> >> > > > >+*key,
>> >> > > hash_sig_t sig);
>> >> > > > >
>> >> > > > > /
>> >> > > >
>> >> > > > I think the above changes will break ABI by changing the
>> >> > > > parameter
>> >> type?
>> >> > > Other people may know better on this.
>> >> > >
>> >> > > Just removing a const should not change the ABI, I believe, since
>> >> > > the const is just advisory hint to the compiler. Actual parameter
>> >> > > size and count remains unchanged so I don't believe there is an issue.
>> >> > > [ABI experts, please correct me if I'm wrong on this]
>> >> >
>> >> >
>> >> > [Certainly no ABI expert, but...]
>> >> >
>> >> > I think this is an API break, not ABI break.
>> >> >
>> >> > Given application code as follows, it will fail to compile - even
>> >> > though
>> >> running
>> >> > the new code as a .so wouldn't cause any issues (AFAIK).
>> >> >
>> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> >> > /* parameter passed in is const, but updated function prototype
>> >> > is
>> >> non-
>> >> > const */
>> >> > rte_hash_add_key_with_hash(h, ...); }
>> >> >
>> >> > This means that we can't recompile apps against latest patch
>> >> > without application code changes, if the app was passing a const
>> >> > rte_hash struct
>> >> as
>> >> > the first parameter.
>> >> >
>> >> Agree. Do we need to do anything for this?
>> >
>> >I think we should try to avoid breaking API wherever possible.
>> >If we must, then I suppose we could follow the ABI process of a
>> >deprecation notice.
>> >
>> >From my reading of the versioning docs, it doesn't document this case:
>> >https://doc.dpdk.org/guides/contributing/versioning.html
>> >
>> >I don't recall a similar situation in DPDK previously - so I suggest
>> >you ask Tech board for input here.
>> >
>> >Hope that helps! -Harry
>> [Wang, Yipeng]
>> Honnappa, how about use a pointer to the counter in the rte_hash struct
>> instead of the counter? Will this avoid API change?
>I think it defeats the purpose of 'const' parameter to the API and provides incorrect information to the user.
>IMO, DPDK should have guidelines on how to handle the API compatibility breaks. I will send an email to tech board on this.
>We can also solve this by having counters on the bucket. I was planning to do this little bit later. I will look at the effort involved and
>may be do it now.
[Wang, Yipeng]
I think with ABI/API change, you might need to announce it one release cycle ahead.
In the cuckoo switch paper: Scalable, High Performance Ethernet Forwarding with
CUCKOOSWITCH
it separates the version counter array and the hash table. You can strike a balance
between granularity of the version counter and the cache/memory requirement.
Is it a better way?
[Ola] Having only a single generation counter can easily become a scalability bottleneck due to write contention on the cache line.
Ideally you want each gen counter to be located in its own cache line (multiple gen counters in the same cache line will experience the same write contention). But that seems a bit wasteful of memory.
Ideally (in order to avoid accessing more cache lines), the gen counter should be located in the hash bucket. But as you write below, this would create problems for the layout of the hash bucket, since there isn't any room for another field.
So I propose an array of gen counters, indexed by a hash (of some kind) of the primary and alternate hashes (signatures) of the element (modulo the array size). So another hash table.
Another consideration is current bucket is 64-byte exactly with the partial-key-hashing.
To add another counter, we need to think about changing certain variables to still align
cache line.
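
A rough sketch of the counter-array proposal, under several assumptions:
the array size, the mixing function and all `demo_*` names are invented
for illustration, and only the counter accesses are atomic (the
ordering of the actual key moves is out of scope here).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_GEN_CNT_SIZE 64u /* hypothetical array size */

/* Static storage, so every counter starts at zero. */
static _Atomic uint32_t demo_gen_cnt[DEMO_GEN_CNT_SIZE];

/*
 * Map an element to a counter slot from both of its signatures.
 * XOR makes the index independent of argument order, so the element
 * uses the same slot whichever of its two buckets it currently sits in.
 */
static uint32_t
demo_gen_idx(uint32_t prim_sig, uint32_t alt_sig)
{
	uint32_t x = prim_sig ^ alt_sig;

	x ^= x >> 16;
	x *= 0x45d9f3bu; /* arbitrary odd multiplier for mixing */
	x ^= x >> 16;
	return x % DEMO_GEN_CNT_SIZE;
}

/* Reader: snapshot the counter before the lookup. */
static uint32_t
demo_gen_read(uint32_t idx)
{
	return atomic_load(&demo_gen_cnt[idx]);
}

/* Writer: bump the counter when an element mapped to idx is moved. */
static void
demo_gen_bump(uint32_t idx)
{
	atomic_fetch_add(&demo_gen_cnt[idx], 1);
}

/* Reader: the lookup result is only trusted if no move happened meanwhile. */
static bool
demo_gen_unchanged(uint32_t idx, uint32_t snapshot)
{
	return atomic_load(&demo_gen_cnt[idx]) == snapshot;
}
```

The array size trades memory for contention: more slots mean fewer
unrelated elements share a counter, at the cost of one more cache line
touched per lookup.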
^ permalink raw reply [relevance 5%]
* Re: [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-09-20 15:41 17% ` [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config Anatoly Burakov
@ 2018-10-03 22:05 0% ` Thomas Monjalon
2018-10-04 9:17 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-10-03 22:05 UTC (permalink / raw)
To: Anatoly Burakov; +Cc: dev, John McNamara, Marko Kovacevic
20/09/2018 17:41, Anatoly Burakov:
> Currently, command-line switches for legacy mem mode or single-file
> segments mode are only stored in internal config. This leads to a
> situation where these flags have to always match between primary
> and secondary, which is bad for usability.
>
> Fix this by storing these flags in the shared config as well, so
> that secondary process can know if the primary was launched in
> single-file segments or legacy mem mode.
>
> This bumps the EAL ABI, however there's an EAL deprecation notice
> already in place[1] for a different feature, so that's OK.
>
> [1] http://patches.dpdk.org/patch/43502/
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
> v2:
> - Added documentation on ABI break
>
> doc/guides/rel_notes/rel_description.rst | 5 +++++
Removed change in this file (dup of release note).
> doc/guides/rel_notes/release_18_11.rst | 6 +++++-
> .../common/include/rte_eal_memconfig.h | 4 ++++
> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
> lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
> lib/librte_eal/meson.build | 2 +-
> 6 files changed, 36 insertions(+), 3 deletions(-)
Applied (without extra note), thanks.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-03 17:32 0% ` Honnappa Nagarahalli
@ 2018-10-03 17:56 3% ` Wang, Yipeng1
2018-10-03 23:05 5% ` Ola Liljedahl
2018-10-04 3:32 0% ` Honnappa Nagarahalli
2018-10-04 3:54 0% ` Honnappa Nagarahalli
1 sibling, 2 replies; 200+ results
From: Wang, Yipeng1 @ 2018-10-03 17:56 UTC (permalink / raw)
To: Honnappa Nagarahalli, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
>Sent: Wednesday, October 3, 2018 10:33 AM
>To: Wang, Yipeng1 <yipeng1.wang@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>; Richardson, Bruce
><bruce.richardson@intel.com>
>Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; Gavin Hu (Arm Technology China)
><Gavin.Hu@arm.com>; Steve Capper <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd <nd@arm.com>; Gobriel,
>Sameh <sameh.gobriel@intel.com>
>Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>> >-----Original Message-----
>> >From: Van Haaren, Harry
>> >> > > > > /**
>> >> > > > > * Add a key to an existing hash table.
>> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
>> >> > > > >const void
>> >> > > *key);
>> >> > > > > * array of user data. This value is unique for this key.
>> >> > > > > */
>> >> > > > > int32_t
>> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
>> >> > > > >void *key,
>> >> > > hash_sig_t sig);
>> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
>> >> > > > >+*key,
>> >> > > hash_sig_t sig);
>> >> > > > >
>> >> > > > > /
>> >> > > >
>> >> > > > I think the above changes will break ABI by changing the
>> >> > > > parameter
>> >> type?
>> >> > > Other people may know better on this.
>> >> > >
>> >> > > Just removing a const should not change the ABI, I believe, since
>> >> > > the const is just advisory hint to the compiler. Actual parameter
>> >> > > size and count remains unchanged so I don't believe there is an issue.
>> >> > > [ABI experts, please correct me if I'm wrong on this]
>> >> >
>> >> >
>> >> > [Certainly no ABI expert, but...]
>> >> >
>> >> > I think this is an API break, not ABI break.
>> >> >
>> >> > Given application code as follows, it will fail to compile - even
>> >> > though
>> >> running
>> >> > the new code as a .so wouldn't cause any issues (AFAIK).
>> >> >
>> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> >> > /* parameter passed in is const, but updated function prototype
>> >> > is
>> >> non-
>> >> > const */
>> >> > rte_hash_add_key_with_hash(h, ...); }
>> >> >
>> >> > This means that we can't recompile apps against latest patch
>> >> > without application code changes, if the app was passing a const
>> >> > rte_hash struct
>> >> as
>> >> > the first parameter.
>> >> >
>> >> Agree. Do we need to do anything for this?
>> >
>> >I think we should try to avoid breaking API wherever possible.
>> >If we must, then I suppose we could follow the ABI process of a
>> >deprecation notice.
>> >
>> >From my reading of the versioning docs, it doesn't document this case:
>> >https://doc.dpdk.org/guides/contributing/versioning.html
>> >
>> >I don't recall a similar situation in DPDK previously - so I suggest
>> >you ask Tech board for input here.
>> >
>> >Hope that helps! -Harry
>> [Wang, Yipeng]
>> Honnappa, how about use a pointer to the counter in the rte_hash struct
>> instead of the counter? Will this avoid API change?
>I think it defeats the purpose of 'const' parameter to the API and provides incorrect information to the user.
>IMO, DPDK should have guidelines on how to handle the API compatibility breaks. I will send an email to tech board on this.
>We can also solve this by having counters on the bucket. I was planning to do this little bit later. I will look at the effort involved and
>may be do it now.
[Wang, Yipeng]
I think with ABI/API change, you might need to announce it one release cycle ahead.
In the cuckoo switch paper: Scalable, High Performance Ethernet Forwarding with
CUCKOOSWITCH
it separates the version counter array and the hash table. You can strike a balance
between granularity of the version counter and the cache/memory requirement.
Is it a better way?
Another consideration is that the current bucket is exactly 64 bytes with the partial-key-hashing.
To add another counter, we need to think about changing certain variables to keep it aligned
to the cache line.
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-02 23:58 0% ` Wang, Yipeng1
@ 2018-10-03 17:32 0% ` Honnappa Nagarahalli
2018-10-03 17:56 3% ` Wang, Yipeng1
2018-10-04 3:54 0% ` Honnappa Nagarahalli
0 siblings, 2 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-10-03 17:32 UTC (permalink / raw)
To: Wang, Yipeng1, Van Haaren, Harry, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
> >-----Original Message-----
> >From: Van Haaren, Harry
> >> > > > > /**
> >> > > > > * Add a key to an existing hash table.
> >> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
> >> > > > >const void
> >> > > *key);
> >> > > > > * array of user data. This value is unique for this key.
> >> > > > > */
> >> > > > > int32_t
> >> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const
> >> > > > >void *key,
> >> > > hash_sig_t sig);
> >> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void
> >> > > > >+*key,
> >> > > hash_sig_t sig);
> >> > > > >
> >> > > > > /
> >> > > >
> >> > > > I think the above changes will break ABI by changing the
> >> > > > parameter
> >> type?
> >> > > Other people may know better on this.
> >> > >
> >> > > Just removing a const should not change the ABI, I believe, since
> >> > > the const is just advisory hint to the compiler. Actual parameter
> >> > > size and count remains unchanged so I don't believe there is an issue.
> >> > > [ABI experts, please correct me if I'm wrong on this]
> >> >
> >> >
> >> > [Certainly no ABI expert, but...]
> >> >
> >> > I think this is an API break, not ABI break.
> >> >
> >> > Given application code as follows, it will fail to compile - even
> >> > though
> >> running
> >> > the new code as a .so wouldn't cause any issues (AFAIK).
> >> >
> >> > void do_hash_stuff(const struct rte_hash *h, ...) {
> >> > /* parameter passed in is const, but updated function prototype
> >> > is
> >> non-
> >> > const */
> >> > rte_hash_add_key_with_hash(h, ...); }
> >> >
> >> > This means that we can't recompile apps against latest patch
> >> > without application code changes, if the app was passing a const
> >> > rte_hash struct
> >> as
> >> > the first parameter.
> >> >
> >> Agree. Do we need to do anything for this?
> >
> >I think we should try to avoid breaking API wherever possible.
> >If we must, then I suppose we could follow the ABI process of a
> >deprecation notice.
> >
> >From my reading of the versioning docs, it doesn't document this case:
> >https://doc.dpdk.org/guides/contributing/versioning.html
> >
> >I don't recall a similar situation in DPDK previously - so I suggest
> >you ask Tech board for input here.
> >
> >Hope that helps! -Harry
> [Wang, Yipeng]
> Honnappa, how about use a pointer to the counter in the rte_hash struct
> instead of the counter? Will this avoid API change?
I think it defeats the purpose of the 'const' parameter to the API and provides incorrect information to the user.
IMO, DPDK should have guidelines on how to handle API compatibility breaks. I will send an email to the tech board on this.
We can also solve this by having counters on the bucket. I was planning to do this a little bit later. I will look at the effort involved and maybe do it now.
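
To make the API-versus-ABI point concrete, a small sketch with made-up
`demo_*` names (not the rte_hash API). In C, passing a const-qualified
pointer to a non-const parameter discards a qualifier: compilers
diagnose it, and it becomes a hard failure in builds that treat
warnings as errors. The caller's source has to change even though the
calling convention does not.

```c
#include <stdint.h>

/* Hypothetical table with the counter embedded in the struct. */
struct demo_hash {
	uint32_t tbl_chng_cnt;
	uint32_t nb_keys;
};

/* New-style prototype: the table is written, so it is no longer const. */
static int32_t
demo_add_key(struct demo_hash *h, uint32_t key)
{
	(void)key;
	h->tbl_chng_cnt++;
	return (int32_t)h->nb_keys++;
}

/*
 * Caller written against the old const prototype. Passing h directly
 * would discard the qualifier, so the source must change: either drop
 * const from this wrapper or cast it away as done here. The cast is
 * only safe if the underlying object was not defined const.
 */
static int32_t
demo_do_hash_stuff(const struct demo_hash *h, uint32_t key)
{
	return demo_add_key((struct demo_hash *)(uintptr_t)h, key);
}
```

An already-built binary keeps working because the argument is still a
single pointer. Only recompilation against the new header is affected.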
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
2018-09-25 14:34 0% ` Ananyev, Konstantin
@ 2018-10-03 16:18 0% ` Luca Boccassi
0 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-03 16:18 UTC (permalink / raw)
To: Ananyev, Konstantin, Thomas Monjalon; +Cc: dev
On Tue, 2018-09-25 at 14:34 +0000, Ananyev, Konstantin wrote:
> Hi Luca,
>
> >
> > On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> > > 24/08/2018 18:47, Konstantin Ananyev:
> > > > If user specifies priority=0 for some of ACL rules
> > > > that can cause rte_acl_classify to return wrong results.
> > > > The reason is that priority zero is used internally for no-
> > > > match
> > > > nodes.
> > > > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > > > The simplest way to overcome the issue is just not allow zero
> > > > to be a valid priority for the rule.
> > > >
> > > > Fixes: dc276b5780c2 ("acl: new library")
> > > >
> > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com
> > > > >
> > >
> > > Cc: stable@dpdk.org
> > >
> > > Applied with below title, thanks
> > > acl: forbid rule with priority zero
> >
> > Hi,
> >
> > This patch is marked for stable, but it changes an enum in a public
> > header
>
> Yes it does.
>
> > so it looks like an ABI breakage? Have I got it wrong?
>
> Strictly speaking - yes, but priority=0 is invalid value with current
> implementation.
> I don't think someone uses it - as in that case acl library simply
> wouldn't work
> correctly.
> Konstantin
Ok, I'll include this patch in 16.11.9 then, thanks for clarifying.
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2 1/5] mem: add function for checking memsegs IOVAs addresses
@ 2018-10-03 12:43 3% ` Burakov, Anatoly
[not found] ` <CAD+H991m6qauwX+P=muKe6bAjNLUrcBaGbxFXkMV60OVNvRgPg@mail.gmail.com>
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-10-03 12:43 UTC (permalink / raw)
To: Alejandro Lucero, dev; +Cc: stable
On 31-Aug-18 1:50 PM, Alejandro Lucero wrote:
> A device can suffer addressing limitations. This function checks
> memsegs have iovas within the supported range based on dma mask.
>
> PMD should use this during initialization if supported devices
> suffer addressing limitations, returning an error if this function
> returns memsegs out of range.
>
> Another potential usage is for emulated IOMMU hardware with addressing
> limitations.
>
> It is necessary to save the most restricted dma mask for checking
> memory allocated dynamically after initialization.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> ---
> lib/librte_eal/common/eal_common_memory.c | 56 +++++++++++++++++++++++
> lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++
> lib/librte_eal/common/include/rte_memory.h | 3 ++
> lib/librte_eal/common/malloc_heap.c | 12 +++++
> lib/librte_eal/linuxapp/eal/eal.c | 2 +
> lib/librte_eal/rte_eal_version.map | 1 +
> 6 files changed, 77 insertions(+)
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index fbfb1b0..bdd8f44 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -383,6 +383,62 @@ struct virtiova {
> rte_memseg_walk(dump_memseg, f);
> }
>
> +static int
> +check_iova(const struct rte_memseg_list *msl __rte_unused,
> + const struct rte_memseg *ms, void *arg)
> +{
> + uint64_t *mask = arg;
> + rte_iova_t iova;
> +
> + /* higher address within segment */
> + iova = (ms->iova + ms->len) - 1;
> + if (!(iova & *mask))
> + return 0;
> +
> + RTE_LOG(INFO, EAL, "memseg iova %"PRIx64", len %zx, out of range\n",
> + ms->iova, ms->len);
> +
> + RTE_LOG(INFO, EAL, "\tusing dma mask %"PRIx64"\n", *mask);
IMO putting these as INFO is overkill. I'd prefer not to spam the output
unless it's really important. Can this go under DEBUG?
Also, the message is misleading. You stop before you have a chance to
check other masks, which may restrict them even further. You're
outputting the message about using DMA mask XXX but this may not be the
final DMA mask.
> + /* Stop the walk and change mask */
> + *mask = 0;
> + return 1;
> +}
> +
> +#if defined(RTE_ARCH_64)
> +#define MAX_DMA_MASK_BITS 63
> +#else
> +#define MAX_DMA_MASK_BITS 31
> +#endif
> +
> +/* check memseg iovas are within the required range based on dma mask */
> +int __rte_experimental
> +rte_eal_check_dma_mask(uint8_t maskbits)
> +{
> + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> + uint64_t mask;
> +
> + /* sanity check */
> + if (maskbits > MAX_DMA_MASK_BITS) {
> + RTE_LOG(INFO, EAL, "wrong dma mask size %u (Max: %u)\n",
> + maskbits, MAX_DMA_MASK_BITS);
Should be ERR, not INFO.
> + return -1;
> + }
> +
> + /* keep the more restricted maskbit */
> + if (!mcfg->dma_maskbits || maskbits < mcfg->dma_maskbits)
> + mcfg->dma_maskbits = maskbits;
Do we need to modify mcfg->dma_maskbits before we know if we're going to
fail? Suggest using a local variable maybe?
Also, I think this is a good case for a ternary:
bits = mcfg->dma_maskbits == 0 ?
maskbits :
RTE_MIN(maskbits, mcfg->dma_maskbits);
IMO the intention looks much clearer.
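To make the two suggestions above concrete, here is a minimal standalone sketch. Everything in it is hypothetical: `toy_seg`, `toy_check_dma_mask` and `stored_maskbits` merely stand in for the memseg walk and `mcfg->dma_maskbits`, and it assumes the segments are checked against the requested `maskbits` as in the patch. The point is only that the stored value is left untouched until the check has passed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DMA_MASK_BITS 63

/* Hypothetical stand-in for a memseg: IOVA start and length in bytes. */
struct toy_seg {
	uint64_t iova;
	uint64_t len;
};

/* Stand-in for mcfg->dma_maskbits; 0 means "nothing recorded yet". */
static uint8_t stored_maskbits;

/*
 * Return 0 if every segment fits below 2^maskbits, -1 otherwise.
 * stored_maskbits is only updated once the whole check has passed.
 */
static int
toy_check_dma_mask(const struct toy_seg *segs, size_t n, uint8_t maskbits)
{
	uint64_t mask;
	uint8_t bits;
	size_t i;

	assert(segs != NULL || n == 0);

	if (maskbits > TOY_MAX_DMA_MASK_BITS)
		return -1;

	/* bits that must be clear in every address of a segment */
	mask = ~((UINT64_C(1) << maskbits) - 1);

	for (i = 0; i < n; i++) {
		/* highest address within the segment */
		uint64_t last = segs[i].iova + segs[i].len - 1;

		if (last & mask)
			return -1;
	}

	/* keep the more restrictive of the old and the new value */
	bits = stored_maskbits == 0 ? maskbits :
		(maskbits < stored_maskbits ? maskbits : stored_maskbits);
	stored_maskbits = bits;
	return 0;
}
```

With this shape a failed check has no side effects, so a PMD that bails out during init does not leave a tighter mask behind for everyone else.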
> +
> + /* create dma mask */
> + mask = ~((1ULL << maskbits) - 1);
> +
> + rte_memseg_walk(check_iova, &mask);
> +
> + if (!mask)
> + return -1;
> +
> + return 0;
> +}
> +
> /* return the number of memory channels */
> unsigned rte_memory_get_nchannel(void)
> {
> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index aff0688..aea44cb 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -77,6 +77,9 @@ struct rte_mem_config {
> * exact same address the primary process maps it.
> */
> uint64_t mem_cfg_addr;
> +
> + /* keeps the more restricted dma mask */
> + uint8_t dma_maskbits;
This needs to be documented as an ABI break in the 18.11 release notes.
--
Thanks,
Anatoly
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-10-02 13:17 3% ` Van Haaren, Harry
@ 2018-10-02 23:58 0% ` Wang, Yipeng1
2018-10-03 17:32 0% ` Honnappa Nagarahalli
0 siblings, 1 reply; 200+ results
From: Wang, Yipeng1 @ 2018-10-02 23:58 UTC (permalink / raw)
To: Van Haaren, Harry, Honnappa Nagarahalli, Richardson, Bruce
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd, Gobriel, Sameh
>-----Original Message-----
>From: Van Haaren, Harry
>> > > > > /**
>> > > > > * Add a key to an existing hash table.
>> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
>> > > > >const void
>> > > *key);
>> > > > > * array of user data. This value is unique for this key.
>> > > > > */
>> > > > > int32_t
>> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void
>> > > > >*key,
>> > > hash_sig_t sig);
>> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
>> > > hash_sig_t sig);
>> > > > >
>> > > > > /
>> > > >
>> > > > I think the above changes will break ABI by changing the parameter
>> type?
>> > > Other people may know better on this.
>> > >
>> > > Just removing a const should not change the ABI, I believe, since the
>> > > const is just advisory hint to the compiler. Actual parameter size and
>> > > count remains unchanged so I don't believe there is an issue.
>> > > [ABI experts, please correct me if I'm wrong on this]
>> >
>> >
>> > [Certainly no ABI expert, but...]
>> >
>> > I think this is an API break, not ABI break.
>> >
>> > Given application code as follows, it will fail to compile - even though
>> running
>> > the new code as a .so wouldn't cause any issues (AFAIK).
>> >
>> > void do_hash_stuff(const struct rte_hash *h, ...) {
>> > /* parameter passed in is const, but updated function prototype is
>> non-
>> > const */
>> > rte_hash_add_key_with_hash(h, ...);
>> > }
>> >
>> > This means that we can't recompile apps against latest patch without
>> > application code changes, if the app was passing a const rte_hash struct
>> as
>> > the first parameter.
>> >
>> Agree. Do we need to do anything for this?
>
>I think we should try to avoid breaking API wherever possible.
>If we must, then I suppose we could follow the ABI process of
>a deprecation notice.
>
>From my reading of the versioning docs, it doesn't document this case:
>https://doc.dpdk.org/guides/contributing/versioning.html
>
>I don't recall a similar situation in DPDK previously - so I suggest
>you ask Tech board for input here.
>
>Hope that helps! -Harry
[Wang, Yipeng]
Honnappa, how about using a pointer to the counter in the rte_hash struct instead of the counter itself? Would this avoid
the API change?
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v4 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
@ 2018-10-02 16:28 0% ` Bruce Richardson
2018-10-05 16:00 0% ` Timothy Redaelli
2018-10-27 21:19 0% ` Thomas Monjalon
2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-10-02 16:28 UTC (permalink / raw)
To: Luca Boccassi; +Cc: dev, tredaelli, christian.ehrhardt, mvarlese
On Tue, Oct 02, 2018 at 05:20:45PM +0100, Luca Boccassi wrote:
> As part of the effort of consolidating the DPDK installation bits and
> pieces across distros, set the default directory of lib/ where PMDs get
> installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
> subdirectory as multiple ABI revisions might be installed at the same
> time, so having a fixed name will cause trouble with the autoload
> feature.
> Small refactor with parsing and saving the major version to a variable,
> since it's now used in 3 different places.
>
> Signed-off-by: Luca Boccassi <bluca@debian.org>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v4 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
` (2 preceding siblings ...)
2018-10-02 15:25 3% ` [dpdk-dev] [PATCH v3 " Luca Boccassi
@ 2018-10-02 16:20 3% ` Luca Boccassi
2018-10-02 16:28 0% ` Bruce Richardson
` (2 more replies)
3 siblings, 3 replies; 200+ results
From: Luca Boccassi @ 2018-10-02 16:20 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, tredaelli, christian.ehrhardt, mvarlese, Luca Boccassi
As part of the effort of consolidating the DPDK installation bits and
pieces across distros, set the default directory of lib/ where PMDs get
installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
subdirectory as multiple ABI revisions might be installed at the same
time, so having a fixed name will cause trouble with the autoload
feature.
Small refactor with parsing and saving the major version to a variable,
since it's now used in 3 different places.
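The patch itself does this in Meson. Purely as an illustration of the naming scheme, the same derivation can be sketched in C. The helper `toy_pmd_path` is hypothetical and not part of DPDK; it assumes a version string of the form XX.YY or XX.YY.Z.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<libdir>/dpdk/pmds-XX.YY" from a version such as "18.11.0".
 * Returns 0 on success, -1 if the version has no minor part or the
 * output buffer is too small.
 */
static int
toy_pmd_path(const char *libdir, const char *version, char *out, size_t outlen)
{
	const char *first_dot, *second_dot;
	int major_len, n;

	assert(libdir != NULL && version != NULL && out != NULL);

	first_dot = strchr(version, '.');
	if (first_dot == NULL)
		return -1;

	/* keep only the first two components, i.e. "XX.YY" */
	second_dot = strchr(first_dot + 1, '.');
	major_len = second_dot != NULL ?
		(int)(second_dot - version) : (int)strlen(version);

	n = snprintf(out, outlen, "%s/dpdk/pmds-%.*s", libdir, major_len, version);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

Two installs with different XX.YY values end up in different directories, which is what keeps the autoload feature from picking up drivers built for another ABI revision.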
Signed-off-by: Luca Boccassi <bluca@debian.org>
---
drivers/meson.build | 6 ++----
lib/meson.build | 6 ++----
meson.build | 8 +++++++-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/meson.build b/drivers/meson.build
index 47b4215a30..3a6c4bf656 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -98,10 +98,8 @@ foreach class:driver_classes
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- pver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(pver.get(0),
- pver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# now build the static driver
diff --git a/lib/meson.build b/lib/meson.build
index 3acc67e6ed..bed492a4ec 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -88,10 +88,8 @@ foreach l:libraries
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- prj_ver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(
- prj_ver.get(0), prj_ver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# first build static lib
diff --git a/meson.build b/meson.build
index c9af33532d..4bd04b9de3 100644
--- a/meson.build
+++ b/meson.build
@@ -15,7 +15,13 @@ dpdk_libraries = []
dpdk_drivers = []
dpdk_extra_ldflags = []
-driver_install_path = join_paths(get_option('libdir'), 'dpdk/drivers')
+# set the major version, which might be used by drivers and libraries
+# depending on the configuration options
+pver = meson.project_version().split('.')
+major_version = '@0@.@1@'.format(pver.get(0), pver.get(1))
+
+driver_install_path = join_paths(get_option('libdir'), 'dpdk',
+ 'pmds-' + major_version)
eal_pmd_path = join_paths(get_option('prefix'), driver_install_path)
# configure the build, and make sure configs here and in config folder are
--
2.19.0
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH v3 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
2018-10-02 13:06 3% ` [dpdk-dev] [PATCH v2 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY Luca Boccassi
@ 2018-10-02 15:25 3% ` Luca Boccassi
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
3 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-02 15:25 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, tredaelli, christian.ehrhardt, mvarlese, Luca Boccassi
As part of the effort of consolidating the DPDK installation bits and
pieces across distros, set the default directory of lib/ where PMDs get
installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
subdirectory as multiple ABI revisions might be installed at the same
time, so having a fixed name will cause trouble with the autoload
feature.
Small refactor with parsing and saving the major version to a variable,
since it's now used in 3 different places.
Signed-off-by: Luca Boccassi <bluca@debian.org>
---
drivers/meson.build | 6 ++----
lib/meson.build | 6 ++----
meson.build | 8 +++++++-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/meson.build b/drivers/meson.build
index 47b4215a30..3a6c4bf656 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -98,10 +98,8 @@ foreach class:driver_classes
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- pver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(pver.get(0),
- pver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# now build the static driver
diff --git a/lib/meson.build b/lib/meson.build
index 3acc67e6ed..bed492a4ec 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -88,10 +88,8 @@ foreach l:libraries
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- prj_ver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(
- prj_ver.get(0), prj_ver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# first build static lib
diff --git a/meson.build b/meson.build
index c9af33532d..4bd04b9de3 100644
--- a/meson.build
+++ b/meson.build
@@ -15,7 +15,13 @@ dpdk_libraries = []
dpdk_drivers = []
dpdk_extra_ldflags = []
-driver_install_path = join_paths(get_option('libdir'), 'dpdk/drivers')
+# set the major version, which might be used by drivers and libraries
+# depending on the configuration options
+pver = meson.project_version().split('.')
+major_version = '@0@.@1@'.format(pver.get(0), pver.get(1))
+
+driver_install_path = join_paths(get_option('libdir'), 'dpdk',
+ 'pmds-' + major_version)
eal_pmd_path = join_paths(get_option('prefix'), driver_install_path)
# configure the build, and make sure configs here and in config folder are
--
2.19.0
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH v9 11/21] malloc: allow creating malloc heaps
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
` (4 preceding siblings ...)
2018-10-02 13:34 8% ` [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-10-02 13:34 7% ` Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
This breaks the ABI, so document the change.
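For readers skimming the diff, the slot search and socket ID assignment can be sketched in isolation as below. This is a simplified, hypothetical model (`toy_heap`, `toy_heap_create` and the `TOY_*` constants are not DPDK names), it omits locking entirely, and unlike the real API it returns the assigned socket ID directly. It also scans the whole table for duplicates, which is a choice made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_MAX_HEAPS 4
#define TOY_NAME_LEN 32
/* external socket IDs start well above any real NUMA node ID */
#define TOY_EXT_MIN_SOCKET_ID 256

struct toy_heap {
	char name[TOY_NAME_LEN]; /* an empty name marks a free slot */
	int socket_id;
};

static struct toy_heap toy_heaps[TOY_MAX_HEAPS];
static int toy_next_socket_id = TOY_EXT_MIN_SOCKET_ID;

/* Return the new heap's socket ID, or a negative errno-style value. */
static int
toy_heap_create(const char *name)
{
	struct toy_heap *slot = NULL;
	size_t len;
	int i;

	if (name == NULL)
		return -EINVAL;
	len = strlen(name);
	if (len == 0 || len >= TOY_NAME_LEN)
		return -EINVAL;

	for (i = 0; i < TOY_MAX_HEAPS; i++) {
		/* a heap with this name already exists */
		if (strcmp(toy_heaps[i].name, name) == 0)
			return -EEXIST;
		/* remember the first free slot, keep scanning for duplicates */
		if (slot == NULL && toy_heaps[i].name[0] == '\0')
			slot = &toy_heaps[i];
	}
	if (slot == NULL)
		return -ENOSPC;

	assert(toy_next_socket_id >= TOY_EXT_MIN_SOCKET_ID);

	memcpy(slot->name, name, len + 1);
	slot->socket_id = toy_next_socket_id++;
	return slot->socket_id;
}
```

Because the IDs are handed out from a counter that starts above the NUMA node range, an external heap can never be mistaken for an internal one by its socket ID alone.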
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 754c41755..e7674adb9 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -177,6 +177,8 @@ ABI Changes
- structure ``rte_mem_config`` has had its ``malloc_heaps`` array
resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
- structure ``rte_malloc_heap`` now has a ``heap_name`` member
+ - structure ``rte_eal_memconfig`` has been extended to contain next
+ socket ID for externally allocated segments
Removed Items
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 7634bff5d..fc44c4e5f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b28905817..00fdf54f7 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
^ permalink raw reply [relevance 7%]
* [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
2018-10-02 13:34 3% ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
@ 2018-10-02 13:34 13% ` Anatoly Burakov
2018-10-02 13:34 10% ` [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, Bruce Richardson,
laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Previously, to calculate length of memory area covered by a memseg
list, we would've needed to multiply page size by length of fbarray
backing that memseg list. This is not obvious and unnecessarily
low level, so store length in the memseg list itself.
This breaks ABI, so bump the EAL ABI version and document the
change. Also, while we're breaking ABI, pack the members a little
better.
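As a minimal illustration of the idea (a toy model, not the real structure: `toy_msl`, `n_pages` and both helpers are hypothetical, with `n_pages` standing in for the fbarray length), the length is computed once when the list is set up and range checks then use it directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down memseg list. */
struct toy_msl {
	void *base_va;
	uint64_t page_sz;
	unsigned int n_pages; /* stands in for memseg_arr.len */
	size_t len;           /* cached page_sz * n_pages */
};

/* Compute the covered length once, at setup time. */
static void
toy_msl_init(struct toy_msl *msl, void *base, uint64_t page_sz,
		unsigned int n_pages)
{
	assert(msl != NULL);

	msl->base_va = base;
	msl->page_sz = page_sz;
	msl->n_pages = n_pages;
	msl->len = (size_t)page_sz * n_pages;
}

/* Range check that no longer needs to know about pages at all. */
static int
toy_msl_contains(const struct toy_msl *msl, const void *addr)
{
	uintptr_t start = (uintptr_t)msl->base_va;
	uintptr_t end = start + msl->len;
	uintptr_t a = (uintptr_t)addr;

	return a >= start && a < end;
}
```

Callers such as the address lookups touched by this patch then only need the base address and `len`.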
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
doc/guides/rel_notes/release_18_11.rst | 8 +++++++-
drivers/bus/pci/linux/pci.c | 2 +-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal_memory.c | 2 ++
lib/librte_eal/common/eal_common_memory.c | 5 ++---
lib/librte_eal/common/include/rte_eal_memconfig.h | 3 ++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 3 ++-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +++-
lib/librte_eal/meson.build | 2 +-
10 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index a8327ea77..58bb79022 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -153,6 +153,12 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK:
+ - structure ``rte_memseg_list`` now has a new field indicating length
+ of memory addressed by the segment list
+
+
Removed Items
-------------
@@ -198,7 +204,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
+ librte_eventdev.so.6
librte_flow_classify.so.1
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
static int
find_max_end_va(const struct rte_memseg_list *msl, void *arg)
{
- size_t sz = msl->memseg_arr.len * msl->page_sz;
+ size_t sz = msl->len;
void *end_va = RTE_PTR_ADD(msl->base_va, sz);
void **max_va = arg;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
}
msl->base_va = addr;
msl->page_sz = page_sz;
+ msl->len = internal_config.memory;
msl->socket_id = 0;
/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;
return 0;
}
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
/* a memseg list was specified, check if it's the right one */
start = msl->base_va;
- end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);
if (addr < start || addr >= end)
return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
msl = &mcfg->memsegs[msl_idx];
start = msl->base_va;
- end = RTE_PTR_ADD(start,
- (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);
if (addr >= start && addr < end)
break;
}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d2362985 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,9 +30,10 @@ struct rte_memseg_list {
uint64_t addr_64;
/**< Makes sure addr is always 64-bits */
};
- int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ int socket_id; /**< Socket ID for all memsegs in this list. */
volatile uint32_t version; /**< version number for multiprocess sync. */
+ size_t len; /**< Length of memory area covered by this memseg list. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
int msl_idx, seg_idx, ret, dir_fd = -1;
start_addr = (uintptr_t) msl->base_va;
- end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+ end_addr = start_addr + msl->len;
if ((uintptr_t)wa->ms->addr < start_addr ||
(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
return -1;
}
local_msl->base_va = primary_msl->base_va;
+ local_msl->len = primary_msl->len;
return 0;
}
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;
return 0;
}
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
msl->base_va = addr;
msl->page_sz = page_sz;
msl->socket_id = 0;
+ msl->len = internal_config.memory;
/* populate memsegs. each memseg is one page long */
for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
if (msl->memseg_arr.count > 0)
continue;
/* this is an unused list, deallocate it */
- mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+ mem_sz = msl->len;
munmap(msl->base_va, mem_sz);
msl->base_va = NULL;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
--
2.17.1
^ permalink raw reply [relevance 13%]
* [dpdk-dev] [PATCH v9 00/21] Support externally allocated memory in DPDK
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
@ 2018-10-02 13:34 3% ` Anatoly Burakov
2018-10-02 13:34 13% ` [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list Anatoly Burakov
` (4 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done here:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better: we don't
want creation of an external heap in the primary to fail because
something in the secondary has failed, when in fact we may not even
have wanted this memory to be accessible in the secondary in the
first place.
Using external memory in multiprocess is *hard*: not only does memory
space need to be preallocated, it also needs to be attached in each
process to allow other processes to access the page table. The attach
API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v9 -> v8 changes:
- Rebase on latest master
- Minor cosmetic testpmd changes as per Bernard's feedback
- Pack structures better (Stephen's suggestion)
- Touch pages before finding their IOVA address
v8 -> v7 changes:
- Rebase on latest master
- More documentation on ABI changes
v7 -> v6 changes:
- Fixed missing IOVA address setup in testpmd
- Fixed MLX drivers as per Yongseok's comments
- Added a check for invalid heap idx on adding memory to heap
v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 325 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 36 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 13 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/virtio/virtio_user/vhost_kernel.c | 3 +
.../net/virtio/virtio_user/virtio_user_dev.c | 6 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 11 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 320 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
45 files changed, 1936 insertions(+), 138 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
* [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
` (3 preceding siblings ...)
2018-10-02 13:34 5% ` [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-10-02 13:34 8% ` Anatoly Burakov
2018-10-02 13:34 7% ` [dpdk-dev] [PATCH v9 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.
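To make the naming scheme concrete, here is a small standalone sketch of
how default heaps could be named after their socket and then found again
by name. It is only a model: sketch_named_heap, sketch_name_default_heap
and sketch_heap_socket_by_name are hypothetical names, not the functions
added by this series, and the lookup is a plain linear search with no
locking.

```c
#include <stdio.h>
#include <string.h>

#define SKETCH_HEAP_NAME_MAX_LEN 32 /* same value as RTE_HEAP_NAME_MAX_LEN */

/* Hypothetical reduced heap record; not struct malloc_heap. */
struct sketch_named_heap {
	unsigned int socket_id;
	char name[SKETCH_HEAP_NAME_MAX_LEN];
};

/* Name a default heap after its socket, e.g. "socket_0". */
static void
sketch_name_default_heap(struct sketch_named_heap *heap, int socket_id)
{
	snprintf(heap->name, sizeof(heap->name), "socket_%i", socket_id);
	heap->socket_id = socket_id;
}

/* Return the socket ID of the heap with the given name, or -1 if absent. */
static int
sketch_heap_socket_by_name(const struct sketch_named_heap *heaps, size_t n,
		const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strncmp(heaps[i].name, name,
				SKETCH_HEAP_NAME_MAX_LEN) == 0)
			return (int)heaps[i].socket_id;
	return -1;
}
```

A caller would look up the socket ID once by name and then pass it to the
usual socket-aware allocation calls.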
This breaks the ABI, so document the change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 ++
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 172c42f71..754c41755 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -176,6 +176,8 @@ ABI Changes
ID the malloc heap belongs to
- structure ``rte_mem_config`` has had its ``malloc_heaps`` array
resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
+ - structure ``rte_malloc_heap`` now has a ``heap_name`` member
+
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d432cef88..4a7e0eb1d 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
unsigned int socket_id;
size_t total_size;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 09b06061d..b28905817 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1024,6 +1023,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
* [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
2018-10-02 13:34 3% ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
2018-10-02 13:34 13% ` [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-02 13:34 10% ` Anatoly Burakov
2018-10-02 13:34 5% ` [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
` (2 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
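The pattern applied to those walk callbacks can be sketched in isolation
as follows. This is a standalone model, not DPDK code: sketch_msl is a
reduced stand-in for struct rte_memseg_list, and sketch_list_walk and
sketch_internal_size are hypothetical names. The callback returns 0 for
external lists so that the walk continues without counting them.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for struct rte_memseg_list; only fields used here. */
struct sketch_msl {
	uint64_t page_sz;
	unsigned int count;    /* number of pages in use */
	unsigned int external; /* 1 if this list points to external memory */
};

typedef int (*sketch_walk_t)(const struct sketch_msl *msl, void *arg);

/* Call func on each list; stop early on a non-zero return value. */
static int
sketch_list_walk(const struct sketch_msl *lists, size_t n,
		sketch_walk_t func, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback: add up internal memory only. */
static int
sketch_internal_size(const struct sketch_msl *msl, void *arg)
{
	uint64_t *total = arg;

	if (msl->external)
		return 0; /* skip this list, but keep walking */

	*total += (uint64_t)msl->count * msl->page_sz;
	return 0;
}
```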
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
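The selection rule can be modelled outside of DPDK roughly as below. This
is a sketch under assumptions: sketch_pg_list and sketch_min_page_size are
hypothetical names, the lists are passed in as a plain array instead of
being walked under the memory hotplug lock, and the fallback page size is
a parameter rather than the result of getpagesize().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* mirrors DPDK's SOCKET_ID_ANY */

/* Reduced stand-in for struct rte_memseg_list. */
struct sketch_pg_list {
	int socket_id;
	size_t page_sz;
	unsigned int external;
};

/*
 * Smallest page size usable for the requested socket. An exact socket ID
 * match counts for both internal and external lists; with
 * SKETCH_SOCKET_ID_ANY only internal lists are considered.
 */
static size_t
sketch_min_page_size(const struct sketch_pg_list *lists, size_t n,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		bool valid = lists[i].socket_id == socket_id;

		valid |= socket_id == SKETCH_SOCKET_ID_ANY &&
				lists[i].external == 0;

		if (valid && lists[i].page_sz < min)
			min = lists[i].page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

With one 2M list on socket 0, one 1G list on socket 1 and one external 4K
list on some other socket ID, a request for SKETCH_SOCKET_ID_ANY yields 2M
rather than 4K, because the external list is ignored.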
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 8 +++++
drivers/bus/fslmc/fslmc_vfio.c | 6 +++-
drivers/net/mlx5/mlx5.c | 4 ++-
drivers/net/virtio/virtio_user/vhost_kernel.c | 3 ++
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
19 files changed, 119 insertions(+), 38 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 58bb79022..bc1d56130 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -118,6 +118,12 @@ API Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* eal: ``rte_memseg_list`` structure now has an additional flag indicating
+ whether the memseg list is externally allocated. This will have implications
+ for any users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
+
* mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
functions were deprecated since 17.05 and are replaced by
``rte_mbuf_raw_free()`` and ``rte_pktmbuf_prefree_seg()``.
@@ -157,6 +163,8 @@ ABI Changes
supporting external memory in DPDK:
- structure ``rte_memseg_list`` now has a new field indicating length
of memory addressed by the segment list
+ - structure ``rte_memseg_list`` now has a new flag indicating whether
+ the memseg list refers to external memory
Removed Items
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
static int
fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+ const struct rte_memseg *ms, void *arg)
{
int *n_segs = arg;
int ret;
+ /* if IOVA address is invalid, skip */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fd89e2af3..af4a78ce9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b3bfcb76f..990ce80ce 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -78,6 +78,9 @@ add_memseg_list(const struct rte_memseg_list *msl, void *arg)
void *start_addr;
uint64_t len;
+ if (msl->external)
+ return 0;
+
if (vm->nregions >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d2362985..645288b02 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -34,6 +34,7 @@ struct rte_memseg_list {
int socket_id; /**< Socket ID for all memsegs in this list. */
volatile uint32_t version; /**< version number for multiprocess sync. */
size_t len; /**< Length of memory area covered by this memseg list. */
+ unsigned int external; /**< 1 if this list points to external memory */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
* [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
` (2 preceding siblings ...)
2018-10-02 13:34 10% ` [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-10-02 13:34 5% ` Anatoly Burakov
2018-10-02 13:34 8% ` [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-02 13:34 7% ` [dpdk-dev] [PATCH v9 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-02 13:34 UTC (permalink / raw)
To: dev
Cc: Thomas Monjalon, Bruce Richardson, John McNamara,
Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, shreyansh.jain, shahafs, arybchenko,
alejandro.lucero
Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be the order of their creation.
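The distinction between socket ID and heap ID can be shown with a small
standalone model of the lookup. The names sketch_heap and
sketch_socket_to_heap_id are hypothetical, and the heap array is passed in
explicitly instead of being read from the shared memory config.

```c
#include <stddef.h>

#define SKETCH_MAX_HEAPS 32 /* same value as RTE_MAX_HEAPS in this patch */

/* Hypothetical reduced heap record; not struct malloc_heap. */
struct sketch_heap {
	unsigned int socket_id;
	size_t total_size;
};

/*
 * Translate a socket ID into a heap ID, i.e. an index into the heap array.
 * Returns -1 if no heap is associated with that socket ID.
 */
static int
sketch_socket_to_heap_id(const struct sketch_heap *heaps,
		unsigned int socket_id)
{
	int i;

	for (i = 0; i < SKETCH_MAX_HEAPS; i++)
		if (heaps[i].socket_id == socket_id)
			return i;
	return -1;
}
```

On a machine where only NUMA nodes 0 and 2 are detected, socket 2 would
map to heap ID 1, which is why the heap array can no longer be indexed
directly by socket ID.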
This breaks the ABI, so document the changes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
config/common_base | 1 +
config/rte_config.h | 1 +
doc/guides/rel_notes/release_18_11.rst | 5 +-
.../common/include/rte_eal_memconfig.h | 4 +-
.../common/include/rte_malloc_heap.h | 1 +
lib/librte_eal/common/malloc_heap.c | 102 +++++++++++++-----
lib/librte_eal/common/malloc_heap.h | 3 +
lib/librte_eal/common/rte_malloc.c | 41 ++++---
8 files changed, 114 insertions(+), 44 deletions(-)
diff --git a/config/common_base b/config/common_base
index acc5211bc..83350e0b1 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
CONFIG_RTE_LIBRTE_EAL=y
CONFIG_RTE_MAX_LCORE=128
CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
CONFIG_RTE_MAX_MEMSEG_LISTS=64
# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
# or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 20c58dff1..816e6f879 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
#define RTE_BUILD_SHARED_LIB
/* EAL defines */
+#define RTE_MAX_HEAPS 32
#define RTE_MAX_MEMSEG_LISTS 128
#define RTE_MAX_MEMSEG_PER_LIST 8192
#define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc1d56130..0607a3980 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -165,7 +165,10 @@ ABI Changes
of memory addressed by the segment list
- structure ``rte_memseg_list`` now has a new flag indicating whether
the memseg list refers to external memory
-
+ - structure ``rte_malloc_heap`` now has a new field indicating socket
+ ID the malloc heap belongs to
+ - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
+ resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 645288b02..7634bff5d 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
- /* Heaps of Malloc per socket */
- struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+ /* Heaps of Malloc */
+ struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..d432cef88 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -26,6 +26,7 @@ struct malloc_heap {
struct malloc_elem *volatile last;
unsigned alloc_count;
+ unsigned int socket_id;
size_t total_size;
} __rte_cache_aligned;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..a9cfa423f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
return check_flag & flags;
}
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int i;
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+ if (heap->socket_id == socket_id)
+ return i;
+ }
+ return -1;
+}
+
/*
* Expand the heap with a memory area.
*/
@@ -93,12 +108,17 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *found_msl;
struct malloc_heap *heap;
- int msl_idx;
+ int msl_idx, heap_idx;
if (msl->external)
return 0;
- heap = &mcfg->malloc_heaps[msl->socket_id];
+ heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+ if (heap_idx < 0) {
+ RTE_LOG(ERR, EAL, "Memseg list has invalid socket id\n");
+ return -1;
+ }
+ heap = &mcfg->malloc_heaps[heap_idx];
/* msl is const, so find it */
msl_idx = msl - mcfg->memsegs;
@@ -111,6 +131,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
+ heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -561,12 +582,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
/* this will try lower page sizes first */
static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
- unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+ unsigned int heap_id, unsigned int flags, size_t align,
+ size_t bound, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+ int socket_id;
void *ret;
rte_spinlock_lock(&(heap->lock));
@@ -584,12 +607,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
* we may still be able to allocate memory from appropriate page sizes,
* we just need to request more memory first.
*/
+
+ socket_id = rte_socket_id_by_idx(heap_id);
+ /*
+ * if socket ID is negative, we cannot find a socket ID for this heap -
+ * which means it's an external heap. those can have unexpected page
+ * sizes, so if the user asked to allocate from there - assume user
+ * knows what they're doing, and allow allocating from there with any
+ * page size flags.
+ */
+ if (socket_id < 0)
+ size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
if (ret != NULL)
goto alloc_unlock;
- if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
- contig)) {
+ /* if socket ID is invalid, this is an external heap */
+ if (socket_id < 0)
+ goto alloc_unlock;
+
+ if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+ bound, contig)) {
ret = heap_alloc(heap, type, size, flags, align, bound, contig);
/* this should have succeeded */
@@ -605,7 +644,7 @@ void *
malloc_heap_alloc(const char *type, size_t size, int socket_arg,
unsigned int flags, size_t align, size_t bound, bool contig)
{
- int socket, i, cur_socket;
+ int socket, heap_id, i;
void *ret;
/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +659,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
else
socket = socket_arg;
- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;
- ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
- contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+ bound, contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;
- /* try other heaps */
+ /* try other heaps. we are only iterating through native DPDK sockets,
+ * so external heaps won't be included.
+ */
for (i = 0; i < (int) rte_socket_count(); i++) {
- cur_socket = rte_socket_id_by_idx(i);
- if (cur_socket == socket)
+ if (i == heap_id)
continue;
- ret = heap_alloc_on_socket(type, size, cur_socket, flags,
- align, bound, contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+ bound, contig);
if (ret != NULL)
return ret;
}
@@ -643,11 +685,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
}
static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
- size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+ unsigned int flags, size_t align, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
void *ret;
rte_spinlock_lock(&(heap->lock));
@@ -665,7 +707,7 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
size_t align, bool contig)
{
- int socket, i, cur_socket;
+ int socket, i, cur_socket, heap_id;
void *ret;
/* return NULL if align is not power-of-2 */
@@ -680,11 +722,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
else
socket = socket_arg;
- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;
- ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+ ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;
@@ -694,8 +738,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
cur_socket = rte_socket_id_by_idx(i);
if (cur_socket == socket)
continue;
- ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
- align, contig);
+ ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+ contig);
if (ret != NULL)
return ret;
}
@@ -760,7 +804,7 @@ malloc_heap_free(struct malloc_elem *elem)
/* ...of which we can't avail if we are in legacy mode, or if this is an
* externally allocated segment.
*/
- if (internal_config.legacy_mem || msl->external)
+ if (internal_config.legacy_mem || (msl->external > 0))
goto free_unlock;
/* check if we can free any memory back to the system */
@@ -917,7 +961,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
}
/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
int
malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +999,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
}
/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f);
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
int
rte_eal_malloc_heap_init(void);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int heap_idx, ret = -1;
- if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
- return -1;
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
- return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+ heap_idx = malloc_socket_to_heap_id(socket);
+ if (heap_idx < 0)
+ goto unlock;
+
+ ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+ socket_stats);
+unlock:
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
}
/*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
unsigned int idx;
- for (idx = 0; idx < rte_socket_count(); idx++) {
- unsigned int socket = rte_socket_id_by_idx(idx);
- fprintf(f, "Heap on socket %i:\n", socket);
- malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+ for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+ fprintf(f, "Heap id: %u\n", idx);
+ malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
}
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
}
/*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
void
rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
{
- unsigned int socket;
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int heap_id;
struct rte_malloc_socket_stats sock_stats;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
/* Iterate through all initialised heaps */
- for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
- if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
- continue;
+ for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
- fprintf(f, "Socket:%u\n", socket);
+ malloc_heap_get_stats(heap, &sock_stats);
+
+ fprintf(f, "Heap id:%u\n", heap_id);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
}
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
return;
}
--
2.17.1
^ permalink raw reply [relevance 5%]
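The socket-ID-to-heap-ID lookup that the patch above introduces can be
sketched in isolation. This is a minimal illustration, not DPDK code: the
toy_* names, the TOY_SOCKET_UNUSED sentinel and the init helper are all
assumptions made for the example (the excerpt does not show how unused heap
slots are initialised), and only the linear scan mirrors
malloc_socket_to_heap_id().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_MAX_HEAPS 32            /* mirrors RTE_MAX_HEAPS in the patch */
#define TOY_SOCKET_UNUSED UINT_MAX  /* hypothetical marker for an empty slot */

/* Hypothetical stand-in for struct malloc_heap. */
struct toy_heap {
	unsigned int socket_id;
	size_t total_size;
};

struct toy_heap toy_heaps[TOY_MAX_HEAPS];

/* Fill the first n_sockets slots in detection order, mark the rest unused. */
void
toy_heaps_init(const unsigned int *sockets, int n_sockets)
{
	int i;

	assert(n_sockets >= 0 && n_sockets <= TOY_MAX_HEAPS);
	for (i = 0; i < TOY_MAX_HEAPS; i++) {
		toy_heaps[i].socket_id =
			i < n_sockets ? sockets[i] : TOY_SOCKET_UNUSED;
		toy_heaps[i].total_size = 0;
	}
}

/* Linear scan: heap ID is the index of the heap whose socket ID matches. */
int
toy_socket_to_heap_id(unsigned int socket_id)
{
	int i;

	if (socket_id == TOY_SOCKET_UNUSED)
		return -1;
	for (i = 0; i < TOY_MAX_HEAPS; i++) {
		if (toy_heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}
```

With detected sockets {0, 2}, socket 2 maps to heap 1 rather than heap 2,
which is why indexing the heap array directly by socket ID no longer works
once heap IDs and socket IDs are decoupled.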
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-30 22:33 0% ` Honnappa Nagarahalli
@ 2018-10-02 13:17 3% ` Van Haaren, Harry
2018-10-02 23:58 0% ` Wang, Yipeng1
0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2018-10-02 13:17 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, Wang, Yipeng1
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd
> -----Original Message-----
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Sunday, September 30, 2018 11:33 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Wang, Yipeng1 <yipeng1.wang@intel.com>
> Cc: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org;
> Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Steve Capper
> <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd
> <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
>
>
> > > > >
> > > > >Reader-writer concurrency issue, caused by moving the keys to their
> > > > >alternative locations during key insert, is solved by introducing a
> > > > >global counter(tbl_chng_cnt) indicating a change in table.
> >
> > <snip>
> >
> > > > > /**
> > > > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct
> > > > >rte_hash
> > > *h, const void *key,
> > > > > * array of user data. This value is unique for this key.
> > > > > */
> > > > > int32_t
> > > > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > > > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > > > >
> > > > > /**
> > > > > * Add a key to an existing hash table.
> > > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
> > > > >const void
> > > *key);
> > > > > * array of user data. This value is unique for this key.
> > > > > */
> > > > > int32_t
> > > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void
> > > > >*key,
> > > hash_sig_t sig);
> > > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> > > hash_sig_t sig);
> > > > >
> > > > > /
> > > >
> > > > I think the above changes will break ABI by changing the parameter
> type?
> > > Other people may know better on this.
> > >
> > > Just removing a const should not change the ABI, I believe, since the
> > > const is just advisory hint to the compiler. Actual parameter size and
> > > count remains unchanged so I don't believe there is an issue.
> > > [ABI experts, please correct me if I'm wrong on this]
> >
> >
> > [Certainly no ABI expert, but...]
> >
> > I think this is an API break, not ABI break.
> >
> > Given application code as follows, it will fail to compile - even though
> running
> > the new code as a .so wouldn't cause any issues (AFAIK).
> >
> > void do_hash_stuff(const struct rte_hash *h, ...) {
> > /* parameter passed in is const, but updated function prototype is
> non-
> > const */
> > rte_hash_add_key_with_hash(h, ...);
> > }
> >
> > This means that we can't recompile apps against latest patch without
> > application code changes, if the app was passing a const rte_hash struct
> as
> > the first parameter.
> >
> Agree. Do we need to do anything for this?
I think we should try to avoid breaking API wherever possible.
If we must, then I suppose we could follow the ABI process of
a deprecation notice.
From my reading of the versioning docs, it doesn't document this case:
https://doc.dpdk.org/guides/contributing/versioning.html
I don't recall a similar situation in DPDK previously - so I suggest
you ask Tech board for input here.
Hope that helps! -Harry
^ permalink raw reply [relevance 3%]
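The API-versus-ABI point made in the message above can be sketched with a
toy type. This is only an illustration under stated assumptions: struct
toy_hash and toy_hash_add_key() are hypothetical stand-ins, not the
rte_hash API. Dropping const from the parameter leaves the calling
convention unchanged, but a caller that only holds a const pointer needs a
source change; passing it directly discards the qualifier, which compilers
typically diagnose (a warning in C, an error with -Werror or in C++).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_hash; not the real DPDK type. */
struct toy_hash {
	unsigned int tbl_chng_cnt; /* change counter bumped by the writer */
	int last_key;
};

/* New-style prototype: the table pointer is no longer const. */
int
toy_hash_add_key(struct toy_hash *h, int key)
{
	assert(h != NULL);
	h->last_key = key;
	h->tbl_chng_cnt++;
	return 0;
}

/* Application code written against the old, const-taking prototype. */
int
do_hash_stuff(const struct toy_hash *h, int key)
{
	/*
	 * return toy_hash_add_key(h, key);
	 *
	 * The call above would discard the const qualifier. The cast below
	 * is the kind of source change the application now has to make; it
	 * is only safe because the underlying object is not itself const.
	 */
	return toy_hash_add_key((struct toy_hash *)h, key);
}
```

An application already built against the old prototype would keep working
with the new shared object, since the argument is passed the same way in
both cases; only recompilation exposes the difference.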
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-02 12:23 0% ` Bruce Richardson
@ 2018-10-02 13:07 0% ` Luca Boccassi
0 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-02 13:07 UTC (permalink / raw)
To: Bruce Richardson, Marco Varlese; +Cc: Timothy Redaelli, dev, christian.ehrhardt
On Tue, 2018-10-02 at 13:23 +0100, Bruce Richardson wrote:
> On Tue, Oct 02, 2018 at 01:02:26PM +0200, Marco Varlese wrote:
> > On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote:
> > > On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> > > > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli
> > > > wrote:
> > > > > On Mon, 01 Oct 2018 10:46:02 +0100
> > > > > Luca Boccassi <bluca@debian.org> wrote:
> > > > >
> > > > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce
> > > > > > > Richardson
> > > > > > > wrote:
> > > > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > > > > wrote:
> > > > > > > > > Allow users and packagers to override the default
> > > > > > > > > dpdk/drivers
> > > > > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > > > > > ---
> > > > > > > >
> > > > > > > > I'm ok with this change, but what is the current
> > > > > > > > location
> > > > > > > > used by
> > > > > > > > distro's
> > > > > > > > right now? I mistakenly never checked what was done
> > > > > > > > before I
> > > > > > > > used
> > > > > > > > dpdk/drivers as a default value, and I'd like the
> > > > > > > > default to
> > > > > > > > match
> > > > > > > > the
> > > > > > > > common option if possible.
> > > > > > > >
> > > > > > > > /Bruce
> > > > > > > >
> > > > > > >
> > > > > > > Replying to my own question, I've just checked on CentOS
> > > > > > > and
> > > > > > > Debian,
> > > > > > > and it
> > > > > > > appears both are using directory "dpdk-pmds" as the
> > > > > > > subdir
> > > > > > > name.
> > > > > > > Therefore,
> > > > > > > let's just make that the default. [Does it need to be
> > > > > > > configurable in
> > > > > > > that
> > > > > > > case?]
> > > > > > >
> > > > > > > /Bruce
> > > > > >
> > > > > > If the default is the one I expect then I'm fine without
> > > > > > having
> > > > > > an
> > > > > > option (actually happier - less things to configure).
> > > > > >
> > > > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > > > > January :-)
> > > > > > We changed because using a single directory creates
> > > > > > problems when
> > > > > > multiple different ABI versions are installed, due to the
> > > > > > EAL
> > > > > > autoload
> > > > > > from that directory. So we need a different subdirectory
> > > > > > per ABI
> > > > > > revision.
> > > > > >
> > > > > > We were actually talking with Timothy a while ago to make
> > > > > > this
> > > > > > consistent across our distros, and perhaps Marco can chip
> > > > > > in as
> > > > > > well.
> > > > > >
> > > > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for
> > > > > > you? I'm
> > > > > > not
> > > > > > too fussy on $something, it can be drivers or pmds or
> > > > > > something
> > > > > > else.
> > > > > >
> > > > >
> > > > > LGTM.
> > > > > If needed, we can just do a compatibility symlink using the
> > > > > current
> > > > > dpdk-pmds path
> > > > >
> > > >
> > > > One suggestion/comment. Would using a unique directory per
> > > > release
> > > > not lead
> > > > to clobbering up the lib directory unnecessarily? How about
> > > > having a
> > > > single
> > > > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as
> > > > a
> > > > subdir
> > > > under that?
> > > >
> > > > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> > > > dpdk-pmds/18.11
> > > >
> > > > [The former of the above would be my preference, since I don't
> > > > like
> > > > having
> > > > hyphenated names, and like having "dpdk" alone as a folder name
> > > > :-)]
> > > >
> > > > /Bruce
> > >
> > > dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
> >
> > That would work for us.
> > However, I would suggest to have the path to be configurable
> > (feature to be
> > dropped in maybe next release). Just to make sure the transition
> > can happen
> > without pain in the remote circumstance that something goes wrong
> > with
> > packaging...
> > >
> >
> > --
> > Marco V
> >
>
> Yes, I think it needs to be configurable for the foreseeable future.
> If the
> DPDK version is to be put in the path then we either need to always
> use a
> configurable version, since we can't hardcode a version number in the
> default, or else we need to put logic in the meson.build file to
> always
> insert a version number.
>
> /Bruce
Ok, in v2 I added a small bit of logic to set the default to the major
version number (and also the override option).
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v2 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY
@ 2018-10-02 13:06 3% ` Luca Boccassi
2018-10-02 15:25 3% ` [dpdk-dev] [PATCH v3 " Luca Boccassi
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
3 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-10-02 13:06 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, tredaelli, christian.ehrhardt, mvarlese, Luca Boccassi
As part of the effort of consolidating the DPDK installation bits and
pieces across distros, set the default directory of lib/ where PMDs get
installed to dpdk/pmds-XX.YY. It's necessary to have a versioned
subdirectory as multiple ABI revisions might be installed at the same
time, so having a fixed name will cause trouble with the autoload
feature.
Small refactor with parsing and saving the major version to a variable,
since it's now used in 3 different places.
Signed-off-by: Luca Boccassi <bluca@debian.org>
---
drivers/meson.build | 6 ++----
lib/meson.build | 6 ++----
meson.build | 8 +++++++-
3 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/meson.build b/drivers/meson.build
index 47b4215a30..3a6c4bf656 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -98,10 +98,8 @@ foreach class:driver_classes
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- pver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(pver.get(0),
- pver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# now build the static driver
diff --git a/lib/meson.build b/lib/meson.build
index 3acc67e6ed..bed492a4ec 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -88,10 +88,8 @@ foreach l:libraries
lib_version = '@0@.1'.format(version)
so_version = '@0@'.format(version)
else
- prj_ver = meson.project_version().split('.')
- lib_version = '@0@.@1@'.format(
- prj_ver.get(0), prj_ver.get(1))
- so_version = lib_version
+ lib_version = major_version
+ so_version = major_version
endif
# first build static lib
diff --git a/meson.build b/meson.build
index c9af33532d..4bd04b9de3 100644
--- a/meson.build
+++ b/meson.build
@@ -15,7 +15,13 @@ dpdk_libraries = []
dpdk_drivers = []
dpdk_extra_ldflags = []
-driver_install_path = join_paths(get_option('libdir'), 'dpdk/drivers')
+# set the major version, which might be used by drivers and libraries
+# depending on the configuration options
+pver = meson.project_version().split('.')
+major_version = '@0@.@1@'.format(pver.get(0), pver.get(1))
+
+driver_install_path = join_paths(get_option('libdir'), 'dpdk',
+ 'pmds-' + major_version)
eal_pmd_path = join_paths(get_option('prefix'), driver_install_path)
# configure the build, and make sure configs here and in config folder are
--
2.19.0
^ permalink raw reply [relevance 3%]
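The version handling in the meson.build hunk above (split the project
version on '.', keep the first two components, prefix with dpdk/pmds-) can
be mirrored in C for illustration. This is a sketch, not part of the build:
toy_pmd_subdir() is a hypothetical helper, and it assumes the version
string has at least two dot-separated components, as the Meson code does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "dpdk/pmds-XX.YY" into out, where XX.YY are the first two
 * dot-separated components of project_version.
 * Returns 0 on success, -1 on a malformed version or a too-small buffer.
 */
int
toy_pmd_subdir(const char *project_version, char *out, size_t out_len)
{
	const char *first, *second;
	size_t len;
	int n;

	assert(project_version != NULL && out != NULL);

	first = strchr(project_version, '.');
	if (first == NULL)
		return -1; /* no minor component at all */

	/* Keep everything up to the second dot, or the whole string. */
	second = strchr(first + 1, '.');
	len = second != NULL ? (size_t)(second - project_version)
			     : strlen(project_version);

	n = snprintf(out, out_len, "dpdk/pmds-%.*s", (int)len,
			project_version);
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	return 0;
}
```

For a project version of 18.11.0 this yields dpdk/pmds-18.11, so two
installed releases with different ABI revisions end up in separate driver
directories.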
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-02 11:02 0% ` Marco Varlese
@ 2018-10-02 12:23 0% ` Bruce Richardson
2018-10-02 13:07 0% ` Luca Boccassi
0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-02 12:23 UTC (permalink / raw)
To: Marco Varlese; +Cc: Luca Boccassi, Timothy Redaelli, dev, christian.ehrhardt
On Tue, Oct 02, 2018 at 01:02:26PM +0200, Marco Varlese wrote:
> On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote:
> > On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> > > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> > > > On Mon, 01 Oct 2018 10:46:02 +0100
> > > > Luca Boccassi <bluca@debian.org> wrote:
> > > >
> > > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson
> > > > > > wrote:
> > > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > > > wrote:
> > > > > > > > Allow users and packagers to override the default
> > > > > > > > dpdk/drivers
> > > > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > > > >
> > > > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > > > > ---
> > > > > > >
> > > > > > > I'm ok with this change, but what is the current location
> > > > > > > used by
> > > > > > > distro's
> > > > > > > right now? I mistakenly never checked what was done before I
> > > > > > > used
> > > > > > > dpdk/drivers as a default value, and I'd like the default to
> > > > > > > match
> > > > > > > the
> > > > > > > common option if possible.
> > > > > > >
> > > > > > > /Bruce
> > > > > > >
> > > > > >
> > > > > > Replying to my own question, I've just checked on CentOS and
> > > > > > Debian,
> > > > > > and it
> > > > > > appears both are using directory "dpdk-pmds" as the subdir
> > > > > > name.
> > > > > > Therefore,
> > > > > > let's just make that the default. [Does it need to be
> > > > > > configurable in
> > > > > > that
> > > > > > case?]
> > > > > >
> > > > > > /Bruce
> > > > >
> > > > > If the default is the one I expect then I'm fine without having
> > > > > an
> > > > > option (actually happier - less things to configure).
> > > > >
> > > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > > > January :-)
> > > > > We changed because using a single directory creates problems when
> > > > > multiple different ABI versions are installed, due to the EAL
> > > > > autoload
> > > > > from that directory. So we need a different subdirectory per ABI
> > > > > revision.
> > > > >
> > > > > We were actually talking with Timothy a while ago to make this
> > > > > consistent across our distros, and perhaps Marco can chip in as
> > > > > well.
> > > > >
> > > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm
> > > > > not
> > > > > too fussy on $something, it can be drivers or pmds or something
> > > > > else.
> > > > >
> > > >
> > > > LGTM.
> > > > If needed, we can just do a compatibility symlink using the current
> > > > dpdk-pmds path
> > > >
> > >
> > > One suggestion/comment. Would using a unique directory per release
> > > not lead
> > > to clobbering up the lib directory unnecessarily? How about having a
> > > single
> > > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a
> > > subdir
> > > under that?
> > >
> > > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> > > dpdk-pmds/18.11
> > >
> > > [The former of the above would be my preference, since I don't like
> > > having
> > > hyphenated names, and like having "dpdk" alone as a folder name :-)]
> > >
> > > /Bruce
> >
> > dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
> That would work for us.
> However, I would suggest to have the path to be configurable (feature to be
> dropped in maybe next release). Just to make sure the transition can happen
> without pain in the remote circumstance that something goes wrong with
> packaging...
> >
> --
> Marco V
>
Yes, I think it needs to be configurable for the foreseeable future. If the
DPDK version is to be put in the path then we either need to always use a
configurable version, since we can't hardcode a version number in the
default, or else we need to put logic in the meson.build file to always
insert a version number.
/Bruce
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 11:24 0% ` Luca Boccassi
@ 2018-10-02 11:02 0% ` Marco Varlese
2018-10-02 12:23 0% ` Bruce Richardson
0 siblings, 1 reply; 200+ results
From: Marco Varlese @ 2018-10-02 11:02 UTC (permalink / raw)
To: Luca Boccassi, Bruce Richardson, Timothy Redaelli; +Cc: dev, christian.ehrhardt
On Mon, 2018-10-01 at 12:24 +0100, Luca Boccassi wrote:
> On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> > On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> > > On Mon, 01 Oct 2018 10:46:02 +0100
> > > Luca Boccassi <bluca@debian.org> wrote:
> > >
> > > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson
> > > > > wrote:
> > > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > > wrote:
> > > > > > > Allow users and packagers to override the default
> > > > > > > dpdk/drivers
> > > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > > >
> > > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > > > ---
> > > > > >
> > > > > > I'm ok with this change, but what is the current location
> > > > > > used by
> > > > > > distro's
> > > > > > right now? I mistakenly never checked what was done before I
> > > > > > used
> > > > > > dpdk/drivers as a default value, and I'd like the default to
> > > > > > match
> > > > > > the
> > > > > > common option if possible.
> > > > > >
> > > > > > /Bruce
> > > > > >
> > > > >
> > > > > Replying to my own question, I've just checked on CentOS and
> > > > > Debian,
> > > > > and it
> > > > > appears both are using directory "dpdk-pmds" as the subdir
> > > > > name.
> > > > > Therefore,
> > > > > let's just make that the default. [Does it need to be
> > > > > configurable in
> > > > > that
> > > > > case?]
> > > > >
> > > > > /Bruce
> > > >
> > > > If the default is the one I expect then I'm fine without having
> > > > an
> > > > option (actually happier - less things to configure).
> > > >
> > > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > > January :-)
> > > > We changed because using a single directory creates problems when
> > > > multiple different ABI versions are installed, due to the EAL
> > > > autoload
> > > > from that directory. So we need a different subdirectory per ABI
> > > > revision.
> > > >
> > > > We were actually talking with Timothy a while ago to make this
> > > > consistent across our distros, and perhaps Marco can chip in as
> > > > well.
> > > >
> > > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm
> > > > not
> > > > too fussy on $something, it can be drivers or pmds or something
> > > > else.
> > > >
> > >
> > > LGTM.
> > > If needed, we can just do a compatibility symlink using the current
> > > dpdk-pmds path
> > >
> >
> > One suggestion/comment. Would using a unique directory per release
> > not lead
> > to clobbering up the lib directory unnecessarily? How about having a
> > single
> > "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a
> > subdir
> > under that?
> >
> > E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> > dpdk-pmds/18.11
> >
> > [The former of the above would be my preference, since I don't like
> > having
> > hyphenated names, and like having "dpdk" alone as a folder name :-)]
> >
> > /Bruce
>
> dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
That would work for us.
However, I would suggest to have the path to be configurable (feature to be
dropped in maybe next release). Just to make sure the transition can happen
without pain in the remote circumstance that something goes wrong with
packaging...
>
--
Marco V
SUSE LINUX GmbH | GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg) Maxfeldstr. 5, D-90409, Nürnberg
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
2018-10-01 17:01 3% ` Stephen Hemminger
@ 2018-10-02 9:03 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-10-02 9:03 UTC (permalink / raw)
To: Stephen Hemminger
Cc: dev, John McNamara, Marko Kovacevic, Bruce Richardson,
laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
On 01-Oct-18 6:01 PM, Stephen Hemminger wrote:
> On Mon, 1 Oct 2018 13:56:09 +0100
> Anatoly Burakov <anatoly.burakov@intel.com> wrote:
>
>> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
>> index aff0688dd..1d8b0a6fe 100644
>> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
>> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
>> @@ -30,6 +30,7 @@ struct rte_memseg_list {
>> uint64_t addr_64;
>> /**< Makes sure addr is always 64-bits */
>> };
>> + size_t len; /**< Length of memory area covered by this memseg list. */
>> int socket_id; /**< Socket ID for all memsegs in this list. */
>> uint64_t page_sz; /**< Page size for all memsegs in this list. */
>> volatile uint32_t version; /**< version number for multiprocess sync. */
>
> If you are going to break ABI, why not try and rearrange to eliminate holes:
>
> Output of pahole (on x86 64 bit):
>
> struct rte_memseg_list {
> union {
> void * base_va; /* 0 8 */
> uint64_t addr_64; /* 0 8 */
> }; /* 0 8 */
> size_t len; /* 8 8 */
> int socket_id; /* 16 4 */
>
> /* XXX 4 bytes hole, try to pack */
>
> uint64_t page_sz; /* 24 8 */
> volatile uint32_t version; /* 32 4 */
>
> /* XXX 4 bytes hole, try to pack */
>
> struct rte_fbarray memseg_arr; /* 40 96 */
>
> /* XXX last struct has 4 bytes of padding */
>
> /* size: 136, cachelines: 3, members: 6 */
> /* sum members: 128, holes: 2, sum holes: 8 */
> /* paddings: 1, sum paddings: 4 */
> /* last cacheline: 8 bytes */
> };
>
Hi Stephen,
This data structure isn't remotely performance-critical, but
sure, I can do that.
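For illustration, here is a minimal standalone sketch of the kind of reordering being suggested. It is not the layout that ended up in the tree: `struct fbarray_stub` is a hypothetical 96-byte stand-in for `struct rte_fbarray` (the size is taken from the pahole output quoted above), and the `msl_*` names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_fbarray: 96 bytes, as pahole reports. */
struct fbarray_stub {
	uint64_t opaque[12];
};
static_assert(sizeof(struct fbarray_stub) == 96, "stub must be 96 bytes");

/* Member order as posted: each 4-byte member is followed by an
 * 8-byte-aligned member, which is where the two holes come from. */
struct msl_as_posted {
	union {
		void *base_va;
		uint64_t addr_64;
	};
	size_t len;
	int socket_id;
	uint64_t page_sz;
	volatile uint32_t version;
	struct fbarray_stub memseg_arr;
};

/* One possible reordering: the two 4-byte members sit back to back. */
struct msl_reordered {
	union {
		void *base_va;
		uint64_t addr_64;
	};
	size_t len;
	uint64_t page_sz;
	int socket_id;
	volatile uint32_t version;
	struct fbarray_stub memseg_arr;
};

/* Sum of member sizes; assumes pointers are at most 64 bits wide, so the
 * anonymous union is as large as its uint64_t member. */
#define MEMBER_BYTES (sizeof(uint64_t) + sizeof(size_t) + sizeof(int) + \
		sizeof(uint64_t) + sizeof(uint32_t) + \
		sizeof(struct fbarray_stub))

/* Bytes lost to holes and tail padding in the posted layout. */
static size_t
holes_as_posted(void)
{
	return sizeof(struct msl_as_posted) - MEMBER_BYTES;
}

/* Bytes lost to holes and tail padding in the reordered layout. */
static size_t
holes_reordered(void)
{
	return sizeof(struct msl_reordered) - MEMBER_BYTES;
}
```

On a typical x86-64 (LP64) build the posted order should come out at 136 bytes, matching the pahole output, and the reordered one at 128 bytes; other ABIs may differ, which is why the sketch only compares the two rather than hard-coding sizes.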
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
2018-10-01 12:56 12% ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-01 17:01 3% ` Stephen Hemminger
2018-10-02 9:03 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-10-01 17:01 UTC (permalink / raw)
To: Anatoly Burakov
Cc: dev, John McNamara, Marko Kovacevic, Bruce Richardson,
laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
On Mon, 1 Oct 2018 13:56:09 +0100
Anatoly Burakov <anatoly.burakov@intel.com> wrote:
> diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
> index aff0688dd..1d8b0a6fe 100644
> --- a/lib/librte_eal/common/include/rte_eal_memconfig.h
> +++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
> @@ -30,6 +30,7 @@ struct rte_memseg_list {
> uint64_t addr_64;
> /**< Makes sure addr is always 64-bits */
> };
> + size_t len; /**< Length of memory area covered by this memseg list. */
> int socket_id; /**< Socket ID for all memsegs in this list. */
> uint64_t page_sz; /**< Page size for all memsegs in this list. */
> volatile uint32_t version; /**< version number for multiprocess sync. */
If you are going to break ABI, why not try and rearrange to eliminate holes:
Output of pahole (on x86 64 bit):
struct rte_memseg_list {
union {
void * base_va; /* 0 8 */
uint64_t addr_64; /* 0 8 */
}; /* 0 8 */
size_t len; /* 8 8 */
int socket_id; /* 16 4 */
/* XXX 4 bytes hole, try to pack */
uint64_t page_sz; /* 24 8 */
volatile uint32_t version; /* 32 4 */
/* XXX 4 bytes hole, try to pack */
struct rte_fbarray memseg_arr; /* 40 96 */
/* XXX last struct has 4 bytes of padding */
/* size: 136, cachelines: 3, members: 6 */
/* sum members: 128, holes: 2, sum holes: 8 */
/* paddings: 1, sum paddings: 4 */
/* last cacheline: 8 bytes */
};
* [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
2018-10-01 12:56 12% ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
@ 2018-10-01 12:56 13% ` Anatoly Burakov
2018-10-01 12:56 5% ` [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
` (2 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so document the change in release notes.
This also breaks a few internal assumptions about memory
contiguousness, so adjust malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
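The page size selection described above can be modelled outside of DPDK roughly as follows. This is only a sketch under stated assumptions: `struct toy_msl`, `toy_list_walk()` and `ANY_SOCKET` are hypothetical stand-ins for `struct rte_memseg_list`, `rte_memseg_list_walk()` and `SOCKET_ID_ANY`, and the fallback value plays the role of `getpagesize()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_SOCKET (-1) /* hypothetical stand-in for SOCKET_ID_ANY */

/* Trimmed-down model of a memseg list: only the fields used here. */
struct toy_msl {
	int socket_id;
	uint64_t page_sz;
	unsigned int external; /* 1 if the list describes external memory */
};

struct pagesz_arg {
	int socket_id;
	size_t min;
};

typedef int (*toy_walk_t)(const struct toy_msl *msl, void *arg);

/* Minimal walker: call func on every list, stop on a non-zero return. */
static int
toy_list_walk(const struct toy_msl *lists, size_t n, toy_walk_t func,
		void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Accept a list on an exact socket match (internal or external), or, when
 * any socket is acceptable, only if the list is internal memory. */
static int
toy_find_min_pagesz(const struct toy_msl *msl, void *arg)
{
	struct pagesz_arg *wa = arg;
	int valid;

	assert(wa != NULL);
	valid = msl->socket_id == wa->socket_id;
	valid |= wa->socket_id == ANY_SOCKET && !msl->external;

	if (valid && msl->page_sz < wa->min)
		wa->min = msl->page_sz;
	return 0;
}

/* Smallest eligible page size, or fallback if no list qualified. */
static size_t
toy_min_page_size(const struct toy_msl *lists, size_t n, int socket_id,
		size_t fallback)
{
	struct pagesz_arg wa = { .socket_id = socket_id, .min = SIZE_MAX };

	toy_list_walk(lists, n, toy_find_min_pagesz, &wa);
	return wa.min == SIZE_MAX ? fallback : wa.min;
}
```

With one internal 1G list on socket 0, one internal 2M list on socket 1 and one external 4K list on a made-up socket 256, asking for socket 256 yields 4K, while asking for any socket yields 2M because the external list is ignored.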
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 9 ++++-
drivers/bus/fslmc/fslmc_vfio.c | 6 +++-
drivers/net/mlx5/mlx5.c | 4 ++-
drivers/net/virtio/virtio_user/vhost_kernel.c | 3 ++
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
19 files changed, 119 insertions(+), 39 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 9c17762a5..d55e12a27 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -102,6 +102,12 @@ API Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* eal: ``rte_memseg_list`` structure now has an additional flag indicating
+ whether the memseg list is externally allocated. This will have implications
+ for any users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
+
* mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
functions were deprecated since 17.05 and are replaced by
``rte_mbuf_raw_free()`` and ``rte_pktmbuf_prefree_seg()``.
@@ -118,7 +124,6 @@ API Changes
To request keeping CRC, application should set ``DEV_RX_OFFLOAD_KEEP_CRC`` Rx
offload.
-
ABI Changes
-----------
@@ -138,6 +143,8 @@ ABI Changes
supporting external memory in DPDK:
- structure ``rte_memseg_list`` now has a new field indicating length
of memory addressed by the segment list
+ - structure ``rte_memseg_list`` now has a new flag indicating whether
+ the memseg list refers to external memory
Removed Items
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
static int
fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+ const struct rte_memseg *ms, void *arg)
{
int *n_segs = arg;
int ret;
+ /* if IOVA address is invalid, skip */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index fd89e2af3..af4a78ce9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b3bfcb76f..990ce80ce 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -78,6 +78,9 @@ add_memseg_list(const struct rte_memseg_list *msl, void *arg)
void *start_addr;
uint64_t len;
+ if (msl->external)
+ return 0;
+
if (vm->nregions >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
* [dpdk-dev] [PATCH v8 00/21] Support externally allocated memory in DPDK
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
@ 2018-10-01 12:56 3% ` Anatoly Burakov
2018-10-02 13:34 3% ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
` (5 more replies)
2018-10-01 12:56 12% ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
` (4 subsequent siblings)
5 siblings, 6 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done here:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
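As a rough standalone illustration of the IOVA handling in the list above (and not the actual EAL code), the sketch below records per-page IOVA addresses for an external area and falls back to a "bad IOVA" marker when none are supplied. `struct toy_ext_area`, `toy_area_init()`, `toy_virt2iova()` and `TOY_BAD_IOVA` are hypothetical names introduced for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BAD_IOVA UINT64_MAX /* hypothetical stand-in for RTE_BAD_IOVA */
#define TOY_MAX_PAGES 16

/* Toy record of an externally allocated, page-granular memory area. */
struct toy_ext_area {
	uintptr_t va;         /* start address of the area */
	size_t page_sz;       /* page size, same for all pages */
	unsigned int n_pages; /* number of pages in the area */
	uint64_t iova[TOY_MAX_PAGES];
};

/* Record the area; if no IOVA table is given, mark every page as bad. */
static int
toy_area_init(struct toy_ext_area *a, uintptr_t va, size_t page_sz,
		unsigned int n_pages, const uint64_t *iova_addrs)
{
	unsigned int i;

	if (a == NULL || page_sz == 0 || n_pages == 0 ||
			n_pages > TOY_MAX_PAGES)
		return -1;

	a->va = va;
	a->page_sz = page_sz;
	a->n_pages = n_pages;
	for (i = 0; i < n_pages; i++)
		a->iova[i] = iova_addrs != NULL ? iova_addrs[i] : TOY_BAD_IOVA;
	return 0;
}

/* Translate an address inside the area; never assume IOVA == VA here. */
static uint64_t
toy_virt2iova(const struct toy_ext_area *a, uintptr_t addr)
{
	size_t off, page;

	assert(a != NULL);
	if (addr < a->va)
		return TOY_BAD_IOVA;
	off = addr - a->va;
	page = off / a->page_sz;
	if (page >= a->n_pages || a->iova[page] == TOY_BAD_IOVA)
		return TOY_BAD_IOVA;
	return a->iova[page] + off % a->page_sz;
}
```

The point of the per-page table is that external pages need not be IOVA-contiguous, so a lookup has to go through the page that actually contains the address.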
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.
Using external memory in multiprocess is *hard*, because not only does the
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v8 -> v7 changes:
- Rebase on latest master
- More documentation on ABI changes
v7 -> v6 changes:
- Fixed missing IOVA address setup in testpmd
- Fixed MLX drivers as per Yongseok's comments
- Added a check for invalid heap idx on adding memory to heap
v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 320 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 37 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 13 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/virtio/virtio_user/vhost_kernel.c | 3 +
.../net/virtio/virtio_user/virtio_user_dev.c | 6 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 9 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 320 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
45 files changed, 1930 insertions(+), 138 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
* [dpdk-dev] [PATCH v8 11/21] malloc: allow creating malloc heaps
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
` (4 preceding siblings ...)
2018-10-01 12:56 8% ` [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-10-01 12:56 7% ` Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
This breaks the ABI, so document the change.
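To make the socket ID scheme concrete, here is a toy model of heap creation that can be built without DPDK. It is a sketch, not the code in this patch: all `toy_*` names and the `TOY_*` limits are hypothetical, the model returns negative errno-style codes instead of setting `rte_errno`, and it assumes the first external heap gets a socket ID equal to the number of NUMA nodes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_NUMA_NODES 8 /* hypothetical stand-in for RTE_MAX_NUMA_NODES */
#define TOY_MAX_HEAPS 12     /* hypothetical stand-in for RTE_MAX_HEAPS */
#define TOY_HEAP_NAME_LEN 32

struct toy_heap {
	char name[TOY_HEAP_NAME_LEN]; /* an empty name marks a free slot */
	int socket_id;
};

struct toy_mem_config {
	struct toy_heap heaps[TOY_MAX_HEAPS];
	int next_socket_id; /* next socket ID handed to an external heap */
};

/* Internal heaps take the first slots, one per NUMA node. */
static void
toy_config_init(struct toy_mem_config *cfg)
{
	int i;

	assert(cfg != NULL);
	memset(cfg, 0, sizeof(*cfg));
	for (i = 0; i < TOY_MAX_NUMA_NODES; i++) {
		snprintf(cfg->heaps[i].name, TOY_HEAP_NAME_LEN,
				"socket_%d", i);
		cfg->heaps[i].socket_id = i;
	}
	cfg->next_socket_id = TOY_MAX_NUMA_NODES;
}

/* Returns 0 on success, or -EINVAL, -EEXIST, -ENOSPC on failure. */
static int
toy_heap_create(struct toy_mem_config *cfg, const char *name)
{
	struct toy_heap *slot = NULL;
	size_t len;
	int i;

	if (cfg == NULL || name == NULL)
		return -EINVAL;
	len = strlen(name);
	if (len == 0 || len >= TOY_HEAP_NAME_LEN)
		return -EINVAL;

	for (i = 0; i < TOY_MAX_HEAPS; i++) {
		struct toy_heap *h = &cfg->heaps[i];

		if (strcmp(h->name, name) == 0)
			return -EEXIST;
		if (slot == NULL && h->name[0] == '\0')
			slot = h;
	}
	if (slot == NULL)
		return -ENOSPC;

	memcpy(slot->name, name, len + 1);
	slot->socket_id = cfg->next_socket_id++;
	return 0;
}

/* Returns the socket ID of a named heap, or -ENOENT if there is none. */
static int
toy_heap_get_socket(const struct toy_mem_config *cfg, const char *name)
{
	int i;

	if (cfg == NULL || name == NULL || name[0] == '\0')
		return -EINVAL;
	for (i = 0; i < TOY_MAX_HEAPS; i++)
		if (strcmp(cfg->heaps[i].name, name) == 0)
			return cfg->heaps[i].socket_id;
	return -ENOENT;
}
```

Because external socket IDs only ever count upwards from the NUMA node limit, they cannot collide with the IDs of internal heaps, which is the property the commit message relies on.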
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index a6bddaaf4..cb6308b1f 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -157,6 +157,8 @@ ABI Changes
- structure ``rte_mem_config`` has had its ``malloc_heaps`` array
resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
- structure ``rte_malloc_heap`` now has a ``heap_name`` member
+ - structure ``rte_eal_memconfig`` has been extended to contain next
+ socket ID for externally allocated segments
Removed Items
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b28905817..00fdf54f7 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
* [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
` (2 preceding siblings ...)
2018-10-01 12:56 13% ` [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-10-01 12:56 5% ` Anatoly Burakov
2018-10-01 12:56 8% ` [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 12:56 7% ` [dpdk-dev] [PATCH v8 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: Thomas Monjalon, Bruce Richardson, John McNamara,
Marko Kovacevic, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, shreyansh.jain, shahafs, arybchenko,
alejandro.lucero
Switch all parts of EAL over to using heap ID instead of NUMA node
ID to identify heaps. The heap ID of a DPDK-internal heap is the
NUMA node's index within the detected NUMA node list. The heap ID of
an external heap will be determined by the order of its creation.
This breaks the ABI, so document the changes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
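As an illustration of the socket ID to heap ID translation described
above, here is a small standalone sketch. It is not the patch code: the
sketch_* names are hypothetical, and the number of populated heaps is
passed explicitly rather than scanning a fixed-size array in shared
config.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced heap descriptor: only the field the lookup needs. */
struct sketch_heap {
	unsigned int socket_id;
};

/*
 * Linear search for the heap that owns socket_id.
 * Returns the heap's index (its heap ID), or -1 if no heap matches.
 */
static int
sketch_socket_to_heap_id(const struct sketch_heap *heaps, size_t n_heaps,
		unsigned int socket_id)
{
	size_t i;

	assert(heaps != NULL);
	for (i = 0; i < n_heaps; i++) {
		if (heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}
```

For example, on a machine where only NUMA nodes 0 and 2 are detected,
the heaps for those nodes would sit at heap IDs 0 and 1, and a socket ID
of 1 would not resolve to any heap.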
config/common_base | 1 +
config/rte_config.h | 1 +
doc/guides/rel_notes/release_18_11.rst | 5 +-
.../common/include/rte_eal_memconfig.h | 4 +-
.../common/include/rte_malloc_heap.h | 1 +
lib/librte_eal/common/malloc_heap.c | 102 +++++++++++++-----
lib/librte_eal/common/malloc_heap.h | 3 +
lib/librte_eal/common/rte_malloc.c | 41 ++++---
8 files changed, 114 insertions(+), 44 deletions(-)
diff --git a/config/common_base b/config/common_base
index 155c7d40e..b52770b27 100644
--- a/config/common_base
+++ b/config/common_base
@@ -61,6 +61,7 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
CONFIG_RTE_LIBRTE_EAL=y
CONFIG_RTE_MAX_LCORE=128
CONFIG_RTE_MAX_NUMA_NODES=8
+CONFIG_RTE_MAX_HEAPS=32
CONFIG_RTE_MAX_MEMSEG_LISTS=64
# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
# or RTE_MAX_MEM_MB_PER_LIST megabytes worth of memory, whichever is smaller
diff --git a/config/rte_config.h b/config/rte_config.h
index 567051b9c..5dd2ac1ad 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -24,6 +24,7 @@
#define RTE_BUILD_SHARED_LIB
/* EAL defines */
+#define RTE_MAX_HEAPS 32
#define RTE_MAX_MEMSEG_LISTS 128
#define RTE_MAX_MEMSEG_PER_LIST 8192
#define RTE_MAX_MEM_MB_PER_LIST 32768
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index d55e12a27..c627c1e88 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -145,7 +145,10 @@ ABI Changes
of memory addressed by the segment list
- structure ``rte_memseg_list`` now has a new flag indicating whether
the memseg list refers to external memory
-
+ - structure ``rte_malloc_heap`` now has a new field indicating socket
+ ID the malloc heap belongs to
+ - structure ``rte_mem_config`` has had its ``malloc_heaps`` array
+ resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 6baa6854f..d7920a4e0 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -72,8 +72,8 @@ struct rte_mem_config {
struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */
- /* Heaps of Malloc per socket */
- struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
+ /* Heaps of Malloc */
+ struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index d43fa9097..e7ac32d42 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -27,6 +27,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
+ unsigned int socket_id;
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 3c8e2063b..a9cfa423f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -66,6 +66,21 @@ check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
return check_flag & flags;
}
+int
+malloc_socket_to_heap_id(unsigned int socket_id)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int i;
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+
+ if (heap->socket_id == socket_id)
+ return i;
+ }
+ return -1;
+}
+
/*
* Expand the heap with a memory area.
*/
@@ -93,12 +108,17 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
struct rte_memseg_list *found_msl;
struct malloc_heap *heap;
- int msl_idx;
+ int msl_idx, heap_idx;
if (msl->external)
return 0;
- heap = &mcfg->malloc_heaps[msl->socket_id];
+ heap_idx = malloc_socket_to_heap_id(msl->socket_id);
+ if (heap_idx < 0) {
+ RTE_LOG(ERR, EAL, "Memseg list has invalid socket id\n");
+ return -1;
+ }
+ heap = &mcfg->malloc_heaps[heap_idx];
/* msl is const, so find it */
msl_idx = msl - mcfg->memsegs;
@@ -111,6 +131,7 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
+ heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -561,12 +582,14 @@ alloc_more_mem_on_socket(struct malloc_heap *heap, size_t size, int socket,
/* this will try lower page sizes first */
static void *
-heap_alloc_on_socket(const char *type, size_t size, int socket,
- unsigned int flags, size_t align, size_t bound, bool contig)
+malloc_heap_alloc_on_heap_id(const char *type, size_t size,
+ unsigned int heap_id, unsigned int flags, size_t align,
+ size_t bound, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
unsigned int size_flags = flags & ~RTE_MEMZONE_SIZE_HINT_ONLY;
+ int socket_id;
void *ret;
rte_spinlock_lock(&(heap->lock));
@@ -584,12 +607,28 @@ heap_alloc_on_socket(const char *type, size_t size, int socket,
* we may still be able to allocate memory from appropriate page sizes,
* we just need to request more memory first.
*/
+
+ socket_id = rte_socket_id_by_idx(heap_id);
+ /*
+ * if socket ID is negative, we cannot find a socket ID for this heap -
+ * which means it's an external heap. those can have unexpected page
+ * sizes, so if the user asked to allocate from there - assume user
+ * knows what they're doing, and allow allocating from there with any
+ * page size flags.
+ */
+ if (socket_id < 0)
+ size_flags |= RTE_MEMZONE_SIZE_HINT_ONLY;
+
ret = heap_alloc(heap, type, size, size_flags, align, bound, contig);
if (ret != NULL)
goto alloc_unlock;
- if (!alloc_more_mem_on_socket(heap, size, socket, flags, align, bound,
- contig)) {
+ /* if socket ID is invalid, this is an external heap */
+ if (socket_id < 0)
+ goto alloc_unlock;
+
+ if (!alloc_more_mem_on_socket(heap, size, socket_id, flags, align,
+ bound, contig)) {
ret = heap_alloc(heap, type, size, flags, align, bound, contig);
/* this should have succeeded */
@@ -605,7 +644,7 @@ void *
malloc_heap_alloc(const char *type, size_t size, int socket_arg,
unsigned int flags, size_t align, size_t bound, bool contig)
{
- int socket, i, cur_socket;
+ int socket, heap_id, i;
void *ret;
/* return NULL if size is 0 or alignment is not power-of-2 */
@@ -620,22 +659,25 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
else
socket = socket_arg;
- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;
- ret = heap_alloc_on_socket(type, size, socket, flags, align, bound,
- contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, heap_id, flags, align,
+ bound, contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;
- /* try other heaps */
+ /* try other heaps. we are only iterating through native DPDK sockets,
+ * so external heaps won't be included.
+ */
for (i = 0; i < (int) rte_socket_count(); i++) {
- cur_socket = rte_socket_id_by_idx(i);
- if (cur_socket == socket)
+ if (i == heap_id)
continue;
- ret = heap_alloc_on_socket(type, size, cur_socket, flags,
- align, bound, contig);
+ ret = malloc_heap_alloc_on_heap_id(type, size, i, flags, align,
+ bound, contig);
if (ret != NULL)
return ret;
}
@@ -643,11 +685,11 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
}
static void *
-heap_alloc_biggest_on_socket(const char *type, int socket, unsigned int flags,
- size_t align, bool contig)
+heap_alloc_biggest_on_heap_id(const char *type, unsigned int heap_id,
+ unsigned int flags, size_t align, bool contig)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
- struct malloc_heap *heap = &mcfg->malloc_heaps[socket];
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
void *ret;
rte_spinlock_lock(&(heap->lock));
@@ -665,7 +707,7 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
size_t align, bool contig)
{
- int socket, i, cur_socket;
+ int socket, i, cur_socket, heap_id;
void *ret;
/* return NULL if align is not power-of-2 */
@@ -680,11 +722,13 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
else
socket = socket_arg;
- /* Check socket parameter */
- if (socket >= RTE_MAX_NUMA_NODES)
+ /* turn socket ID into heap ID */
+ heap_id = malloc_socket_to_heap_id(socket);
+ /* if heap id is negative, socket ID was invalid */
+ if (heap_id < 0)
return NULL;
- ret = heap_alloc_biggest_on_socket(type, socket, flags, align,
+ ret = heap_alloc_biggest_on_heap_id(type, heap_id, flags, align,
contig);
if (ret != NULL || socket_arg != SOCKET_ID_ANY)
return ret;
@@ -694,8 +738,8 @@ malloc_heap_alloc_biggest(const char *type, int socket_arg, unsigned int flags,
cur_socket = rte_socket_id_by_idx(i);
if (cur_socket == socket)
continue;
- ret = heap_alloc_biggest_on_socket(type, cur_socket, flags,
- align, contig);
+ ret = heap_alloc_biggest_on_heap_id(type, i, flags, align,
+ contig);
if (ret != NULL)
return ret;
}
@@ -760,7 +804,7 @@ malloc_heap_free(struct malloc_elem *elem)
/* ...of which we can't avail if we are in legacy mode, or if this is an
* externally allocated segment.
*/
- if (internal_config.legacy_mem || msl->external)
+ if (internal_config.legacy_mem || (msl->external > 0))
goto free_unlock;
/* check if we can free any memory back to the system */
@@ -917,7 +961,7 @@ malloc_heap_resize(struct malloc_elem *elem, size_t size)
}
/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
int
malloc_heap_get_stats(struct malloc_heap *heap,
@@ -955,7 +999,7 @@ malloc_heap_get_stats(struct malloc_heap *heap,
}
/*
- * Function to retrieve data for heap on given socket
+ * Function to retrieve data for a given heap
*/
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f)
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index f52cb5559..61b844b6f 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -46,6 +46,9 @@ malloc_heap_get_stats(struct malloc_heap *heap,
void
malloc_heap_dump(struct malloc_heap *heap, FILE *f);
+int
+malloc_socket_to_heap_id(unsigned int socket_id);
+
int
rte_eal_malloc_heap_init(void);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 47ca5a742..73d6df31d 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -152,11 +152,20 @@ rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ int heap_idx, ret = -1;
- if (socket >= RTE_MAX_NUMA_NODES || socket < 0)
- return -1;
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
- return malloc_heap_get_stats(&mcfg->malloc_heaps[socket], socket_stats);
+ heap_idx = malloc_socket_to_heap_id(socket);
+ if (heap_idx < 0)
+ goto unlock;
+
+ ret = malloc_heap_get_stats(&mcfg->malloc_heaps[heap_idx],
+ socket_stats);
+unlock:
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
}
/*
@@ -168,12 +177,14 @@ rte_malloc_dump_heaps(FILE *f)
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
unsigned int idx;
- for (idx = 0; idx < rte_socket_count(); idx++) {
- unsigned int socket = rte_socket_id_by_idx(idx);
- fprintf(f, "Heap on socket %i:\n", socket);
- malloc_heap_dump(&mcfg->malloc_heaps[socket], f);
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
+ for (idx = 0; idx < RTE_MAX_HEAPS; idx++) {
+ fprintf(f, "Heap id: %u\n", idx);
+ malloc_heap_dump(&mcfg->malloc_heaps[idx], f);
}
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
}
/*
@@ -182,14 +193,19 @@ rte_malloc_dump_heaps(FILE *f)
void
rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
{
- unsigned int socket;
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int heap_id;
struct rte_malloc_socket_stats sock_stats;
+
+ rte_rwlock_read_lock(&mcfg->memory_hotplug_lock);
+
/* Iterate through all initialised heaps */
- for (socket=0; socket< RTE_MAX_NUMA_NODES; socket++) {
- if ((rte_malloc_get_socket_stats(socket, &sock_stats) < 0))
- continue;
+ for (heap_id = 0; heap_id < RTE_MAX_HEAPS; heap_id++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[heap_id];
- fprintf(f, "Socket:%u\n", socket);
+ malloc_heap_get_stats(heap, &sock_stats);
+
+ fprintf(f, "Heap id:%u\n", heap_id);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
@@ -198,6 +214,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
fprintf(f, "\tAlloc_count:%u,\n",sock_stats.alloc_count);
fprintf(f, "\tFree_count:%u,\n", sock_stats.free_count);
}
+ rte_rwlock_read_unlock(&mcfg->memory_hotplug_lock);
return;
}
--
2.17.1
* [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
@ 2018-10-01 12:56 12% ` Anatoly Burakov
2018-10-01 17:01 3% ` Stephen Hemminger
2018-10-01 12:56 13% ` [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
5 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, Bruce Richardson,
laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Previously, calculating the length of the memory area covered by a
memseg list required multiplying the page size by the length of the
fbarray backing that memseg list. This is not obvious and
unnecessarily low level, so store the length in the memseg list itself.
This breaks the ABI, so bump the EAL ABI version and document the
change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
---
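To make the before/after concrete, here is a standalone sketch of the
bounds check with the length stored up front. The sketch_msl structure
and its n_pages field are hypothetical stand-ins (n_pages plays the role
of the fbarray length); only the arithmetic is taken from the
description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced memseg list: just what the range check needs. */
struct sketch_msl {
	void *base_va;        /* start of the covered address range */
	uint64_t page_sz;     /* page size of every segment in the list */
	unsigned int n_pages; /* stands in for the backing fbarray length */
	size_t len;           /* total bytes covered, computed once */
};

/* Fill in the list and cache page_sz * n_pages in len. */
static void
sketch_msl_init(struct sketch_msl *msl, void *base, uint64_t page_sz,
		unsigned int n_pages)
{
	assert(msl != NULL);
	msl->base_va = base;
	msl->page_sz = page_sz;
	msl->n_pages = n_pages;
	msl->len = (size_t)page_sz * n_pages;
}

/* Nonzero if addr lies in [base_va, base_va + len). */
static int
sketch_msl_contains(const struct sketch_msl *msl, const void *addr)
{
	uintptr_t start = (uintptr_t)msl->base_va;
	uintptr_t end = start + msl->len;
	uintptr_t p = (uintptr_t)addr;

	return p >= start && p < end;
}
```

Callers then only read len and no longer need to know that the list is
backed by an fbarray.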
doc/guides/rel_notes/release_18_11.rst | 8 +++++++-
drivers/bus/pci/linux/pci.c | 2 +-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal_memory.c | 2 ++
lib/librte_eal/common/eal_common_memory.c | 5 ++---
lib/librte_eal/common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 3 ++-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +++-
lib/librte_eal/meson.build | 2 +-
10 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 9c00e33cc..9c17762a5 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -134,6 +134,12 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK:
+ - structure ``rte_memseg_list`` now has a new field indicating length
+ of memory addressed by the segment list
+
+
Removed Items
-------------
@@ -179,7 +185,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index 04648ac93..d6e1027ab 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -119,7 +119,7 @@ rte_pci_unmap_device(struct rte_pci_device *dev)
static int
find_max_end_va(const struct rte_memseg_list *msl, void *arg)
{
- size_t sz = msl->memseg_arr.len * msl->page_sz;
+ size_t sz = msl->len;
void *end_va = RTE_PTR_ADD(msl->base_va, sz);
void **max_va = arg;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 16d2bc7c3..65ea670f9 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -79,6 +79,7 @@ rte_eal_hugepage_init(void)
}
msl->base_va = addr;
msl->page_sz = page_sz;
+ msl->len = internal_config.memory;
msl->socket_id = 0;
/* populate memsegs. each memseg is 1 page long */
@@ -370,6 +371,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;
return 0;
}
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0b69804ff..30d018209 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -171,7 +171,7 @@ virt2memseg(const void *addr, const struct rte_memseg_list *msl)
/* a memseg list was specified, check if it's the right one */
start = msl->base_va;
- end = RTE_PTR_ADD(start, (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);
if (addr < start || addr >= end)
return NULL;
@@ -194,8 +194,7 @@ virt2memseg_list(const void *addr)
msl = &mcfg->memsegs[msl_idx];
start = msl->base_va;
- end = RTE_PTR_ADD(start,
- (size_t)msl->page_sz * msl->memseg_arr.len);
+ end = RTE_PTR_ADD(start, msl->len);
if (addr >= start && addr < end)
break;
}
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..1d8b0a6fe 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -30,6 +30,7 @@ struct rte_memseg_list {
uint64_t addr_64;
/**< Makes sure addr is always 64-bits */
};
+ size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
volatile uint32_t version; /**< version number for multiprocess sync. */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index b2e2a9599..71a6e0fd9 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -986,7 +986,7 @@ free_seg_walk(const struct rte_memseg_list *msl, void *arg)
int msl_idx, seg_idx, ret, dir_fd = -1;
start_addr = (uintptr_t) msl->base_va;
- end_addr = start_addr + msl->memseg_arr.len * (size_t)msl->page_sz;
+ end_addr = start_addr + msl->len;
if ((uintptr_t)wa->ms->addr < start_addr ||
(uintptr_t)wa->ms->addr >= end_addr)
@@ -1472,6 +1472,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
return -1;
}
local_msl->base_va = primary_msl->base_va;
+ local_msl->len = primary_msl->len;
return 0;
}
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index e3ac24815..897d94179 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -861,6 +861,7 @@ alloc_va_space(struct rte_memseg_list *msl)
return -1;
}
msl->base_va = addr;
+ msl->len = mem_sz;
return 0;
}
@@ -1369,6 +1370,7 @@ eal_legacy_hugepage_init(void)
msl->base_va = addr;
msl->page_sz = page_sz;
msl->socket_id = 0;
+ msl->len = internal_config.memory;
/* populate memsegs. each memseg is one page long */
for (cur_seg = 0; cur_seg < n_segs; cur_seg++) {
@@ -1615,7 +1617,7 @@ eal_legacy_hugepage_init(void)
if (msl->memseg_arr.count > 0)
continue;
/* this is an unused list, deallocate it */
- mem_sz = (size_t)msl->page_sz * msl->memseg_arr.len;
+ mem_sz = msl->len;
munmap(msl->base_va, mem_sz);
msl->base_va = NULL;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
--
2.17.1
* [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
` (3 preceding siblings ...)
2018-10-01 12:56 5% ` [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
@ 2018-10-01 12:56 8% ` Anatoly Burakov
2018-10-01 12:56 7% ` [dpdk-dev] [PATCH v8 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 12:56 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will need to refer to external heaps in some way. While we use
heap IDs internally, the external API needs something more
user-friendly, so use a string to uniquely identify a heap.
This breaks the ABI, so document the change.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
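The default heaps get names derived from their socket ID. A standalone
sketch of that naming step follows; sketch_default_heap_name is a
hypothetical helper, and the explicit truncation check is an addition
for the sketch rather than something the patch does.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write the default name for the heap on socket_id into buf.
 * Returns 0 on success, -1 if the name did not fit or formatting failed.
 */
static int
sketch_default_heap_name(char *buf, size_t buf_len, int socket_id)
{
	int n;

	assert(buf != NULL && buf_len > 0);
	n = snprintf(buf, buf_len, "socket_%i", socket_id);
	if (n < 0 || (size_t)n >= buf_len)
		return -1;
	return 0;
}
```

Since snprintf() already reserves room for the terminator, passing the
full buffer size is sufficient.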
doc/guides/rel_notes/release_18_11.rst | 2 ++
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 9583f3eda..a6bddaaf4 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -156,6 +156,8 @@ ABI Changes
ID the malloc heap belongs to
- structure ``rte_mem_config`` has had its ``malloc_heaps`` array
resized from ``RTE_MAX_NUMA_NODES`` to ``RTE_MAX_HEAPS`` value
+ - structure ``rte_malloc_heap`` now has a ``heap_name`` member
+
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
unsigned int socket_id;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 09b06061d..b28905817 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1024,6 +1023,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
^ permalink raw reply [relevance 8%]
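For readers following the patch above, here is a minimal, self-contained sketch of the naming scheme it introduces: each default heap is named "socket_<id>", and a heap can then be looked up by that string. This is not the DPDK implementation. The names `demo_heap`, `demo_heap_set_default_name`, `demo_heap_find` and `DEMO_HEAP_NAME_MAX_LEN` are hypothetical stand-ins, and the sketch writes the name with a single bounded snprintf() call rather than the snprintf() plus strlcpy() pair used in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: mirrors the value of RTE_HEAP_NAME_MAX_LEN from the patch. */
#define DEMO_HEAP_NAME_MAX_LEN 32

/* Hypothetical, stripped-down stand-in for struct malloc_heap. */
struct demo_heap {
	unsigned int socket_id;
	char name[DEMO_HEAP_NAME_MAX_LEN];
};

/*
 * Name a default heap after its socket, e.g. "socket_0".
 * Returns 0 on success, -1 if the name did not fit in the buffer.
 */
static int
demo_heap_set_default_name(struct demo_heap *heap, int socket_id)
{
	int n;

	assert(heap != NULL);

	/* snprintf() always NUL-terminates when the size is non-zero */
	n = snprintf(heap->name, sizeof(heap->name), "socket_%i", socket_id);
	if (n < 0 || (size_t)n >= sizeof(heap->name))
		return -1;

	heap->socket_id = (unsigned int)socket_id;
	return 0;
}

/* Find a heap by name. Returns its index, or -1 if no heap matches. */
static int
demo_heap_find(const struct demo_heap *heaps, size_t n_heaps,
		const char *name)
{
	size_t i;

	for (i = 0; i < n_heaps; i++) {
		if (strncmp(heaps[i].name, name,
				DEMO_HEAP_NAME_MAX_LEN) == 0)
			return (int)i;
	}
	return -1;
}
```

Passing the full buffer size to snprintf() is enough to guarantee termination, so the sketch does not subtract one from the size as the patch does.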
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 11:06 0% ` Bruce Richardson
@ 2018-10-01 11:24 0% ` Luca Boccassi
2018-10-02 11:02 0% ` Marco Varlese
0 siblings, 1 reply; 200+ results
From: Luca Boccassi @ 2018-10-01 11:24 UTC (permalink / raw)
To: Bruce Richardson, Timothy Redaelli; +Cc: dev, mvarlese, christian.ehrhardt
On Mon, 2018-10-01 at 12:06 +0100, Bruce Richardson wrote:
> On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> > On Mon, 01 Oct 2018 10:46:02 +0100
> > Luca Boccassi <bluca@debian.org> wrote:
> >
> > > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson
> > > > wrote:
> > > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi
> > > > > wrote:
> > > > > > Allow users and packagers to override the default
> > > > > > dpdk/drivers
> > > > > > subdirectory where the PMDs get installed under $lib.
> > > > > >
> > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > > ---
> > > > >
> > > > > I'm ok with this change, but what is the current location
> > > > > used by
> > > > > distro's
> > > > > right now? I mistakenly never checked what was done before I
> > > > > used
> > > > > dpdk/drivers as a default value, and I'd like the default to
> > > > > match
> > > > > the
> > > > > common option if possible.
> > > > >
> > > > > /Bruce
> > > > >
> > > >
> > > > Replying to my own question, I've just checked on CentOS and
> > > > Debian,
> > > > and it
> > > > appears both are using directory "dpdk-pmds" as the subdir
> > > > name.
> > > > Therefore,
> > > > let's just make that the default. [Does it need to be
> > > > configurable in
> > > > that
> > > > case?]
> > > >
> > > > /Bruce
> > >
> > > If the default is the one I expect then I'm fine without having
> > > an
> > > option (actually happier - less things to configure).
> > >
> > > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last
> > > January :-)
> > > We changed because using a single directory creates problems when
> > > multiple different ABI versions are installed, due to the EAL
> > > autoload
> > > from that directory. So we need a different subdirectory per ABI
> > > revision.
> > >
> > > We were actually talking with Timothy a while ago to make this
> > > consistent across our distros, and perhaps Marco can chip in as
> > > well.
> > >
> > > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm
> > > not
> > > too fussy on $something, it can be drivers or pmds or something
> > > else.
> > >
> >
> > LGTM.
> > If needed, we can just do a compatibility symlink using the current
> > dpdk-pmds path
> >
>
> One suggestion/comment. Would using a unique directory per release
> not lead
> to clobbering up the lib directory unnecessarily? How about having a
> single
> "dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a
> subdir
> under that?
>
> E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
> dpdk-pmds/18.11
>
> [The former of the above would be my preference, since I don't like
> having
> hyphenated names, and like having "dpdk" alone as a folder name :-)]
>
> /Bruce
dpdk/pmds-XX.YY/ would work for me. Timothy and Marco?
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 0%]
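To make the layout agreed on in this message concrete, the sketch below builds a per-release driver directory of the form <libdir>/dpdk/pmds-XX.YY. It only illustrates the naming convention from the thread. The helper name `demo_pmd_dir` and its parameters are hypothetical and are not part of DPDK or of the meson build files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<libdir>/dpdk/pmds-<year>.<month>" into buf.
 * Returns 0 on success, -1 if libdir is empty or the result is truncated.
 */
static int
demo_pmd_dir(char *buf, size_t len, const char *libdir,
		unsigned int year, unsigned int month)
{
	int n;

	assert(buf != NULL && libdir != NULL);

	if (strlen(libdir) == 0)
		return -1;

	/* two-digit fields, so release 18.08 keeps its leading zero */
	n = snprintf(buf, len, "%s/dpdk/pmds-%02u.%02u", libdir, year, month);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

With one subdirectory per release under a single "dpdk" directory, several ABI versions can be installed side by side without the EAL autoload picking up drivers built for another version, which is the concern raised earlier in the thread.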
* [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
@ 2018-10-01 11:04 16% ` Anatoly Burakov
2018-10-01 11:04 4% ` [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID Anatoly Burakov
` (2 subsequent siblings)
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.
All current callers of memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Yongseok Koh <yskoh@mellanox.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 13 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 6 +++-
drivers/net/mlx5/mlx5.c | 4 ++-
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
22 files changed, 127 insertions(+), 43 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -107,6 +114,10 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+ a new flag indicating whether the memseg list refers to external memory.
+
Removed Items
-------------
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..cb33dd891 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -318,11 +318,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
static int
fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+ const struct rte_memseg *ms, void *arg)
{
int *n_segs = arg;
int ret;
+ /* if IOVA address is invalid, skip */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..fc3cb1b49 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,13 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
^ permalink raw reply [relevance 16%]
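The mempool change in the patch above is the least obvious part, so here is a small, self-contained sketch of the same selection rule: a memseg list counts if its socket ID matches exactly (native or external), or if any socket was requested and the list is native memory. This is a model under stated assumptions, not DPDK code. `demo_msl`, `demo_pagesz_arg`, `demo_find_min_pagesz`, `demo_min_page_size` and `DEMO_SOCKET_ID_ANY` are hypothetical stand-ins, a plain loop replaces rte_memseg_list_walk(), and a caller-supplied fallback replaces getpagesize().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for DPDK's SOCKET_ID_ANY. */
#define DEMO_SOCKET_ID_ANY (-1)

/* Hypothetical subset of struct rte_memseg_list used by the walk. */
struct demo_msl {
	int socket_id;
	uint64_t page_sz;
	unsigned int external; /* 1 if this list points to external memory */
};

struct demo_pagesz_arg {
	int socket_id;
	size_t min;
};

/* Walk callback: track the smallest page size usable for a socket. */
static int
demo_find_min_pagesz(const struct demo_msl *msl, void *arg)
{
	struct demo_pagesz_arg *wa = arg;
	int valid;

	/* exact socket match covers both native and external memory */
	valid = msl->socket_id == wa->socket_id;
	/* "any socket" only ever considers native memory */
	valid |= wa->socket_id == DEMO_SOCKET_ID_ANY && !msl->external;

	if (valid && msl->page_sz < wa->min)
		wa->min = (size_t)msl->page_sz;

	return 0; /* 0 means "keep walking" */
}

/* Walk all lists; return fallback if no list matched the socket. */
static size_t
demo_min_page_size(const struct demo_msl *lists, size_t n_lists,
		int socket_id, size_t fallback)
{
	struct demo_pagesz_arg wa = { .socket_id = socket_id,
			.min = SIZE_MAX };
	size_t i;

	assert(lists != NULL || n_lists == 0);

	for (i = 0; i < n_lists; i++) {
		if (demo_find_min_pagesz(&lists[i], &wa) != 0)
			break;
	}

	return wa.min == SIZE_MAX ? fallback : wa.min;
}
```

Note that this callback cannot simply return early on `external` like the other walk callbacks touched by the patch, because a mempool may be requested on the socket ID of an external heap, in which case the external list's page size is the one that matters.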
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 10:42 0% ` Timothy Redaelli
@ 2018-10-01 11:06 0% ` Bruce Richardson
2018-10-01 11:24 0% ` Luca Boccassi
0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-10-01 11:06 UTC (permalink / raw)
To: Timothy Redaelli; +Cc: Luca Boccassi, dev, mvarlese, christian.ehrhardt
On Mon, Oct 01, 2018 at 12:42:09PM +0200, Timothy Redaelli wrote:
> On Mon, 01 Oct 2018 10:46:02 +0100
> Luca Boccassi <bluca@debian.org> wrote:
>
> > On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson wrote:
> > > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi wrote:
> > > > > Allow users and packagers to override the default dpdk/drivers
> > > > > subdirectory where the PMDs get installed under $lib.
> > > > >
> > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > > ---
> > > >
> > > > I'm ok with this change, but what is the current location used by
> > > > distro's
> > > > right now? I mistakenly never checked what was done before I used
> > > > dpdk/drivers as a default value, and I'd like the default to match
> > > > the
> > > > common option if possible.
> > > >
> > > > /Bruce
> > > >
> > >
> > > Replying to my own question, I've just checked on CentOS and Debian,
> > > and it
> > > appears both are using directory "dpdk-pmds" as the subdir name.
> > > Therefore,
> > > let's just make that the default. [Does it need to be configurable in
> > > that
> > > case?]
> > >
> > > /Bruce
> >
> > If the default is the one I expect then I'm fine without having an
> > option (actually happier - less things to configure).
> >
> > But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last January :-)
> > We changed because using a single directory creates problems when
> > multiple different ABI versions are installed, due to the EAL autoload
> > from that directory. So we need a different subdirectory per ABI
> > revision.
> >
> > We were actually talking with Timothy a while ago to make this
> > consistent across our distros, and perhaps Marco can chip in as well.
> >
> > Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm not
> > too fussy on $something, it can be drivers or pmds or something else.
> >
>
> LGTM.
> If needed, we can just do a compatibility symlink using the current
> dpdk-pmds path
>
One suggestion/comment. Would using a unique directory per release not lead
to cluttering up the lib directory unnecessarily? How about having a single
"dpdk" or "dpdk-pmds" directory in lib, and having $MAJORVER as a subdir
under that?
E.g. dpdk/pmds-18.08/, dpdk/pmds-18.11/, or dpdk-pmds/18.08/
dpdk-pmds/18.11
[The former of the above would be my preference, since I don't like having
hyphenated names, and like having "dpdk" alone as a folder name :-)]
/Bruce
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
` (2 preceding siblings ...)
2018-10-01 11:04 4% ` [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-10-01 11:04 9% ` Anatoly Burakov
2018-10-01 11:05 4% ` [dpdk-dev] [PATCH v7 11/21] malloc: allow creating " Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will need to refer to external heaps in some way. While we use
heap IDs internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 1 +
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
* eal: EAL library ABI version was changed due to previously announced work on
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
+ Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
unsigned int socket_id;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 09b06061d..b28905817 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -131,7 +131,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1024,6 +1023,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
^ permalink raw reply [relevance 9%]
* [dpdk-dev] [PATCH v7 11/21] malloc: allow creating malloc heaps
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
` (3 preceding siblings ...)
2018-10-01 11:04 9% ` [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-10-01 11:05 4% ` Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:05 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
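
For illustration only, the slot search and socket ID assignment described
above can be modelled outside of DPDK roughly as follows. This is a minimal
sketch, not the EAL code: every demo_* name and DEMO_* constant is
hypothetical, locking is omitted, and unlike rte_malloc_heap_create() the
helper returns the assigned socket ID (or a negative errno value) instead of
setting rte_errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the build-time DPDK limits. */
#define DEMO_MAX_HEAPS 8
#define DEMO_MAX_NUMA_NODES 4
#define DEMO_NAME_MAX_LEN 32
/* External socket IDs start above any possible NUMA node ID. */
#define DEMO_EXT_MIN_SOCKET_ID \
	((1 << 8) > DEMO_MAX_NUMA_NODES ? (1 << 8) : DEMO_MAX_NUMA_NODES)

struct demo_heap {
	char name[DEMO_NAME_MAX_LEN];
	int socket_id;
};

struct demo_config {
	struct demo_heap heaps[DEMO_MAX_HEAPS];
	int next_socket_id;
};

/* Length of s, but never scanning more than max bytes. */
static size_t
demo_bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* Empty the heap table and reserve the external socket ID range. */
static void
demo_config_init(struct demo_config *cfg)
{
	assert(cfg != NULL);
	memset(cfg, 0, sizeof(*cfg));
	cfg->next_socket_id = DEMO_EXT_MIN_SOCKET_ID;
}

/* Returns the new heap's socket ID, or a negative errno value. */
static int
demo_heap_create(struct demo_config *cfg, const char *name)
{
	size_t len;
	int i;

	assert(cfg != NULL);
	if (name == NULL)
		return -EINVAL;
	len = demo_bounded_len(name, DEMO_NAME_MAX_LEN);
	if (len == 0 || len == DEMO_NAME_MAX_LEN)
		return -EINVAL; /* empty, or no room for the terminator */

	for (i = 0; i < DEMO_MAX_HEAPS; i++) {
		struct demo_heap *h = &cfg->heaps[i];

		if (strncmp(name, h->name, DEMO_NAME_MAX_LEN) == 0)
			return -EEXIST;
		if (h->name[0] == '\0') {
			/* first unnamed slot is free: claim it */
			memcpy(h->name, name, len + 1);
			h->socket_id = cfg->next_socket_id++;
			return h->socket_id;
		}
	}
	return -ENOSPC;
}
```

With these illustrative limits, the first external heap gets socket ID 256
and each later heap gets the next ID, so external IDs cannot collide with
NUMA node IDs.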
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+ Structure ``rte_eal_memconfig`` has been extended to contain next socket
+ ID for externally allocated memory segments.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index b28905817..00fdf54f7 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1019,6 +1023,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1026,6 +1060,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
2018-10-01 11:04 16% ` [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-10-01 11:04 4% ` Anatoly Burakov
2018-10-01 11:04 9% ` [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 11:05 4% ` [dpdk-dev] [PATCH v7 11/21] malloc: allow creating " Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will be assigning "invalid" socket ID's to external heaps, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index a9cfa423f..09b06061d 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -651,7 +651,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v7 00/21] Support externally allocated memory in DPDK
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
@ 2018-10-01 11:04 2% ` Anatoly Burakov
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
` (5 more replies)
2018-10-01 11:04 16% ` [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
4 siblings, 6 replies; 200+ results
From: Anatoly Burakov @ 2018-10-01 11:04 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done here:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.
Using external memory in multiprocess is *hard*, because not only does
memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v7 -> v6 changes:
- Fixed missing IOVA address setup in testpmd
- Fixed MLX drivers as per Yongseok's comments
- Added a check for invalid heap idx on adding memory to heap
v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 318 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 28 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 13 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx5/mlx5.c | 4 +-
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +-
.../net/virtio/virtio_user/virtio_user_dev.c | 8 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 9 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 320 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
45 files changed, 1923 insertions(+), 138 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 9:46 4% ` Luca Boccassi
2018-10-01 10:01 0% ` Bruce Richardson
@ 2018-10-01 10:42 0% ` Timothy Redaelli
2018-10-01 11:06 0% ` Bruce Richardson
1 sibling, 1 reply; 200+ results
From: Timothy Redaelli @ 2018-10-01 10:42 UTC (permalink / raw)
To: Luca Boccassi; +Cc: Bruce Richardson, dev, mvarlese, christian.ehrhardt
On Mon, 01 Oct 2018 10:46:02 +0100
Luca Boccassi <bluca@debian.org> wrote:
> On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson wrote:
> > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi wrote:
> > > > Allow users and packagers to override the default dpdk/drivers
> > > > subdirectory where the PMDs get installed under $lib.
> > > >
> > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > ---
> > >
> > > I'm ok with this change, but what is the current location used by
> > > distro's
> > > right now? I mistakenly never checked what was done before I used
> > > dpdk/drivers as a default value, and I'd like the default to match
> > > the
> > > common option if possible.
> > >
> > > /Bruce
> > >
> >
> > Replying to my own question, I've just checked on CentOS and Debian,
> > and it
> > appears both are using directory "dpdk-pmds" as the subdir name.
> > Therefore,
> > let's just make that the default. [Does it need to be configurable in
> > that
> > case?]
> >
> > /Bruce
>
> If the default is the one I expect then I'm fine without having an
> option (actually happier - less things to configure).
>
> But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last January :-)
> We changed because using a single directory creates problems when
> multiple different ABI versions are installed, due to the EAL autoload
> from that directory. So we need a different subdirectory per ABI
> revision.
>
> We were actually talking with Timothy a while ago to make this
> consistent across our distros, and perhaps Marco can chip in as well.
>
> Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm not
> too fussy on $something, it can be drivers or pmds or something else.
>
LGTM.
If needed, we can just do a compatibility symlink using the current
dpdk-pmds path.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
2018-10-01 9:46 4% ` Luca Boccassi
@ 2018-10-01 10:01 0% ` Bruce Richardson
2018-10-01 10:42 0% ` Timothy Redaelli
1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2018-10-01 10:01 UTC (permalink / raw)
To: Luca Boccassi; +Cc: dev, tredaelli, mvarlese, christian.ehrhardt
On Mon, Oct 01, 2018 at 10:46:02AM +0100, Luca Boccassi wrote:
> On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> > On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson wrote:
> > > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi wrote:
> > > > Allow users and packagers to override the default dpdk/drivers
> > > > subdirectory where the PMDs get installed under $lib.
> > > >
> > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > ---
> > >
> > > I'm ok with this change, but what is the current location used by
> > > distro's
> > > right now? I mistakenly never checked what was done before I used
> > > dpdk/drivers as a default value, and I'd like the default to match
> > > the
> > > common option if possible.
> > >
> > > /Bruce
> > >
> >
> > Replying to my own question, I've just checked on CentOS and Debian,
> > and it
> > appears both are using directory "dpdk-pmds" as the subdir name.
> > Therefore,
> > let's just make that the default. [Does it need to be configurable in
> > that
> > case?]
> >
> > /Bruce
>
> If the default is the one I expect then I'm fine without having an
> option (actually happier - less things to configure).
>
> But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last January :-)
> We changed because using a single directory creates problems when
> multiple different ABI versions are installed, due to the EAL autoload
> from that directory. So we need a different subdirectory per ABI
> revision.
>
> We were actually talking with Timothy a while ago to make this
> consistent across our distros, and perhaps Marco can chip in as well.
>
> Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm not
> too fussy on $something, it can be drivers or pmds or something else.
>
Sounds like it needs to be configurable, just in case.
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option
@ 2018-10-01 9:46 4% ` Luca Boccassi
2018-10-01 10:01 0% ` Bruce Richardson
2018-10-01 10:42 0% ` Timothy Redaelli
0 siblings, 2 replies; 200+ results
From: Luca Boccassi @ 2018-10-01 9:46 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev, tredaelli, mvarlese, christian.ehrhardt
On Mon, 2018-10-01 at 10:25 +0100, Bruce Richardson wrote:
> On Mon, Oct 01, 2018 at 10:17:14AM +0100, Bruce Richardson wrote:
> > On Fri, Sep 28, 2018 at 06:58:03PM +0100, Luca Boccassi wrote:
> > > Allow users and packagers to override the default dpdk/drivers
> > > subdirectory where the PMDs get installed under $lib.
> > >
> > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > ---
> >
> > I'm ok with this change, but what is the current location used by
> > distro's
> > right now? I mistakenly never checked what was done before I used
> > dpdk/drivers as a default value, and I'd like the default to match
> > the
> > common option if possible.
> >
> > /Bruce
> >
>
> Replying to my own question, I've just checked on CentOS and Debian,
> and it
> appears both are using directory "dpdk-pmds" as the subdir name.
> Therefore,
> let's just make that the default. [Does it need to be configurable in
> that
> case?]
>
> /Bruce
If the default is the one I expect then I'm fine without having an
option (actually happier - less things to configure).
But in Debian/Ubuntu it's dpdk-MAJORVER-drivers since last January :-)
We changed because using a single directory creates problems when
multiple different ABI versions are installed, due to the EAL autoload
from that directory. So we need a different subdirectory per ABI
revision.
We were actually talking with Timothy a while ago to make this
consistent across our distros, and perhaps Marco can chip in as well.
Timothy, Marco, is using dpdk-MAJORVER-$something ok for you? I'm not
too fussy on $something, it can be drivers or pmds or something else.
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device
2018-09-29 6:15 3% ` Jeff Guo
@ 2018-10-01 7:51 3% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-10-01 7:51 UTC (permalink / raw)
To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
bernard.iremonger, arybchenko
Cc: jblunck, shreyansh.jain, dev, helin.zhang
On 29-Sep-18 7:15 AM, Jeff Guo wrote:
>
> On 9/26/2018 8:22 PM, Burakov, Anatoly wrote:
>> On 17-Aug-18 11:51 AM, Jeff Guo wrote:
>>> There are some extended interrupt types in vfio pci device except
>>> from the
>>> existing interrupts, such as err and req notifier, it could be useful
>>> for
>>> device error monitoring. And these corresponding interrupt handler is
>>> different from the other interrupt handler that register in PMDs, so
>>> a new
>>> interrupt handler should be added. This patch will add specific req
>>> handler
>>> in generic pci device.
>>>
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>> ---
>>> drivers/bus/pci/rte_bus_pci.h | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/bus/pci/rte_bus_pci.h
>>> b/drivers/bus/pci/rte_bus_pci.h
>>> index 0d1955f..c45a820 100644
>>> --- a/drivers/bus/pci/rte_bus_pci.h
>>> +++ b/drivers/bus/pci/rte_bus_pci.h
>>> @@ -66,6 +66,7 @@ struct rte_pci_device {
>>> uint16_t max_vfs; /**< sriov enable if not
>>> zero */
>>> enum rte_kernel_driver kdrv; /**< Kernel driver
>>> passthrough */
>>> char name[PCI_PRI_STR_SIZE+1]; /**< PCI location (ASCII) */
>>> + struct rte_intr_handle req_notifier_handler;/**< Req notifier
>>> handle */
>>> };
>>> /**
>>>
>>
>> Does this break ABI?
>>
>
> If add a variable in struct would break ABI, it does.
>
>
Then it probably does. So, should probably bump PCI driver ABI version?
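
To make the concern concrete, here is a small standalone sketch (it does not
use the real rte_pci_device definition; all demo_* types are hypothetical)
of why adding a member changes the binary layout seen by code compiled
against the old header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an interrupt handle. */
struct demo_handle {
	int fd;
	uint32_t type;
};

/* Device struct before and after the new member is added. */
struct demo_dev_v1 {
	uint16_t max_vfs;
	char name[16];
};

struct demo_dev_v2 {
	uint16_t max_vfs;
	char name[16];
	struct demo_handle req_notifier; /* newly added member */
};

/* A driver that embeds the device struct sees its own fields move. */
struct demo_driver_v1 {
	struct demo_dev_v1 dev;
	int private_state;
};

struct demo_driver_v2 {
	struct demo_dev_v2 dev;
	int private_state;
};

/* The struct grows, so old binaries allocate too little (C11 check). */
static_assert(sizeof(struct demo_dev_v2) > sizeof(struct demo_dev_v1),
	"adding a member must grow the struct");

/* Returns 1 if either the size or an embedded offset differs. */
static int
demo_layout_changed(void)
{
	return sizeof(struct demo_dev_v1) != sizeof(struct demo_dev_v2) ||
		offsetof(struct demo_driver_v1, private_state) !=
		offsetof(struct demo_driver_v2, private_state);
}
```

Any application or driver built against the v1 layout that allocates,
embeds or copies the struct would disagree with a library built against
v2, which is the usual reason for bumping the ABI version.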
--
Thanks,
Anatoly
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 1:00 3% ` Wang, Yipeng1
2018-09-28 8:26 4% ` Bruce Richardson
@ 2018-09-30 23:05 0% ` Honnappa Nagarahalli
1 sibling, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-09-30 23:05 UTC (permalink / raw)
To: Wang, Yipeng1, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, Gavin Hu (Arm Technology China), Steve Capper, Ola Liljedahl, nd
> >
> >Reader-writer concurrency issue, caused by moving the keys to their
> >alternative locations during key insert, is solved by introducing a
> >global counter(tbl_chng_cnt) indicating a change in table.
> >
> >@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct
> rte_hash *h,
> > curr_bkt = curr_node->bkt;
> > }
> >
> >+ /* Inform the previous move. The current move need
> >+ * not be informed now as the current bucket entry
> >+ * is present in both primary and secondary.
> >+ * Since there is one writer, load acquires on
> >+ * tbl_chng_cnt are not required.
> >+ */
> >+ __atomic_store_n(&h->tbl_chng_cnt,
> >+ h->tbl_chng_cnt + 1,
> >+ __ATOMIC_RELEASE);
> >+ /* The stores to sig_alt and sig_current should not
> >+ * move above the store to tbl_chng_cnt.
> >+ */
> >+ __atomic_thread_fence(__ATOMIC_RELEASE);
> >+
> [Wang, Yipeng] I believe for X86 this fence should not be compiled to any
> code, otherwise we need macros for the compile time check.
'__atomic_thread_fence(__ATOMIC_RELEASE)' provides load-store and store-store fence [1]. Hence, it should not add any barriers for x86.
[1] https://preshing.com/20130922/acquire-and-release-fences/
>
> >@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct
> rte_hash *h, const void *key,
> > uint32_t bucket_idx;
> > hash_sig_t alt_hash;
> > struct rte_hash_bucket *bkt;
> >+ uint32_t cnt_b, cnt_a;
> > int ret;
> >
> >- bucket_idx = sig & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >-
> > __hash_rw_reader_lock(h);
> >
> >- /* Check if key is in primary location */
> >- ret = search_one_bucket(h, key, sig, data, bkt);
> >- if (ret != -1) {
> >- __hash_rw_reader_unlock(h);
> >- return ret;
> >- }
> >- /* Calculate secondary hash */
> >- alt_hash = rte_hash_secondary_hash(sig);
> >- bucket_idx = alt_hash & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >+ do {
> [Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and
> Concurrent MemCache with Dumber Caching and Smarter Hashing"
> as well as OvS cmap uses similar version counter to implement read-write
> concurrency for hash table, but one difference is reader checks even/odd of
> the version counter to make sure there is no concurrent writer. Could you just
> double check and confirm that this is not needed for your implementation?
>
I looked at the paper again. My patch relies on the fact that, while a key is being moved, it is present in both the primary and secondary buckets. The check for an odd version counter is not required because the full key comparison identifies any false signature matches.
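For reference, the reader side can be sketched roughly as below. The loop body is not visible in the quoted hunk, so this shape is an assumption; 'toy_hash' and 'toy_lookup' are hypothetical names, and each "bucket" is reduced to a single key slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table with one slot per bucket, for illustration only. */
struct toy_hash {
	uint32_t tbl_chng_cnt;  /* incremented by the writer when it moves a key */
	uint32_t primary_key;   /* key stored in the primary bucket slot */
	uint32_t secondary_key; /* key stored in the secondary bucket slot */
};

/* Returns 1 if the key is found in either slot, 0 otherwise. */
static int
toy_lookup(struct toy_hash *h, uint32_t key)
{
	uint32_t cnt_b, cnt_a;

	assert(h != NULL);

	do {
		/* Counter value before the search starts. */
		cnt_b = __atomic_load_n(&h->tbl_chng_cnt, __ATOMIC_ACQUIRE);

		/* Full key comparison: no even/odd check on the counter. */
		if (__atomic_load_n(&h->primary_key, __ATOMIC_RELAXED) == key)
			return 1;
		if (__atomic_load_n(&h->secondary_key, __ATOMIC_RELAXED) == key)
			return 1;

		/* Keep the slot loads above from moving below the re-read. */
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		cnt_a = __atomic_load_n(&h->tbl_chng_cnt, __ATOMIC_ACQUIRE);

		/* A changed counter means a key may have moved: search again. */
	} while (cnt_b != cnt_a);

	return 0;
}
```

A miss is only reported when the counter was stable across the whole search, which is the property the retry loop is there to provide.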
> >--- a/lib/librte_hash/rte_hash.h
> >+++ b/lib/librte_hash/rte_hash.h
> >@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int
> >-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void
> >*data);
> >+rte_hash_add_key_data(struct rte_hash *h, const void *key, void
> >+*data);
> >
> > /**
> > * Add a key-value pair with a pre-computed hash value @@ -180,7
> >+180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void
> *key, void *data);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void
> >*key,
> >+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> > hash_sig_t sig, void *data);
> >
> > /**
> >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct
> rte_hash *h, const void *key,
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> >+rte_hash_add_key(struct rte_hash *h, const void *key);
> >
> > /**
> > * Add a key to an existing hash table.
> >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const
> void *key);
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> >hash_sig_t sig);
> >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> >+hash_sig_t sig);
> >
> > /
>
> I think the above changes will break ABI by changing the parameter type?
> Other people may know better on this.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 8:55 4% ` Van Haaren, Harry
@ 2018-09-30 22:33 0% ` Honnappa Nagarahalli
2018-10-02 13:17 3% ` Van Haaren, Harry
0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-30 22:33 UTC (permalink / raw)
To: Van Haaren, Harry, Richardson, Bruce, Wang, Yipeng1
Cc: De Lara Guarch, Pablo, dev, Gavin Hu (Arm Technology China),
Steve Capper, Ola Liljedahl, nd
> > > >
> > > >Reader-writer concurrency issue, caused by moving the keys to their
> > > >alternative locations during key insert, is solved by introducing a
> > > >global counter(tbl_chng_cnt) indicating a change in table.
>
> <snip>
>
> > > > /**
> > > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct
> > > >rte_hash
> > *h, const void *key,
> > > > * array of user data. This value is unique for this key.
> > > > */
> > > > int32_t
> > > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > > >
> > > > /**
> > > > * Add a key to an existing hash table.
> > > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h,
> > > >const void
> > *key);
> > > > * array of user data. This value is unique for this key.
> > > > */
> > > > int32_t
> > > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void
> > > >*key,
> > hash_sig_t sig);
> > > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> > hash_sig_t sig);
> > > >
> > > > /
> > >
> > > I think the above changes will break ABI by changing the parameter type?
> > Other people may know better on this.
> >
> > Just removing a const should not change the ABI, I believe, since the
> > const is just advisory hint to the compiler. Actual parameter size and
> > count remains unchanged so I don't believe there is an issue.
> > [ABI experts, please correct me if I'm wrong on this]
>
>
> [Certainly no ABI expert, but...]
>
> I think this is an API break, not ABI break.
>
> Given application code as follows, it will fail to compile - even though running
> the new code as a .so wouldn't cause any issues (AFAIK).
>
> void do_hash_stuff(const struct rte_hash *h, ...) {
> /* parameter passed in is const, but updated function prototype is non-
> const */
> rte_hash_add_key_with_hash(h, ...);
> }
>
> This means that we can't recompile apps against latest patch without
> application code changes, if the app was passing a const rte_hash struct as
> the first parameter.
>
Agree. Do we need to do anything for this?
>
> -Harry
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device
2018-09-26 12:22 3% ` Burakov, Anatoly
@ 2018-09-29 6:15 3% ` Jeff Guo
2018-10-01 7:51 3% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Jeff Guo @ 2018-09-29 6:15 UTC (permalink / raw)
To: Burakov, Anatoly, stephen, bruce.richardson, ferruh.yigit,
konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
bernard.iremonger, arybchenko
Cc: jblunck, shreyansh.jain, dev, helin.zhang
On 9/26/2018 8:22 PM, Burakov, Anatoly wrote:
> On 17-Aug-18 11:51 AM, Jeff Guo wrote:
>> There are some extended interrupt types in vfio pci device except
>> from the
>> existing interrupts, such as err and req notifier, it could be useful
>> for
>> device error monitoring. And these corresponding interrupt handler is
>> different from the other interrupt handler that register in PMDs, so
>> a new
>> interrupt handler should be added. This patch will add specific req
>> handler
>> in generic pci device.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>> drivers/bus/pci/rte_bus_pci.h | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/bus/pci/rte_bus_pci.h
>> b/drivers/bus/pci/rte_bus_pci.h
>> index 0d1955f..c45a820 100644
>> --- a/drivers/bus/pci/rte_bus_pci.h
>> +++ b/drivers/bus/pci/rte_bus_pci.h
>> @@ -66,6 +66,7 @@ struct rte_pci_device {
>> uint16_t max_vfs; /**< sriov enable if not
>> zero */
>> enum rte_kernel_driver kdrv; /**< Kernel driver
>> passthrough */
>> char name[PCI_PRI_STR_SIZE+1]; /**< PCI location (ASCII) */
>> + struct rte_intr_handle req_notifier_handler;/**< Req notifier
>> handle */
>> };
>> /**
>>
>
> Does this break ABI?
>
If adding a field to the struct breaks the ABI, then yes, it does.
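To make the concern concrete, here is a small sketch with hypothetical stand-in structs (not the real rte_pci_device). It only shows that appending a field changes the structure size, which matters when an already-built application allocates or embeds the structure itself, or when other fields follow it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "old" layout, as an already-built application sees it. */
struct toy_dev_v1 {
	unsigned short max_vfs;
	char name[16];
};

/* Hypothetical "new" layout with one extra trailing field. */
struct toy_dev_v2 {
	unsigned short max_vfs;
	char name[16];
	int req_notifier_fd; /* stand-in for the added interrupt handle */
};

/* Bytes the new library expects beyond what an old binary allocated. */
static size_t
toy_size_gap(void)
{
	assert(sizeof(struct toy_dev_v2) >= sizeof(struct toy_dev_v1));
	return sizeof(struct toy_dev_v2) - sizeof(struct toy_dev_v1);
}
```

Existing field offsets stay the same when the field is appended at the end, so code that only reads the old fields through a library-provided pointer may keep working; the size change is what has to be assessed.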
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-27 11:03 0% ` Shreyansh Jain
@ 2018-09-29 0:09 0% ` Yongseok Koh
1 sibling, 0 replies; 200+ results
From: Yongseok Koh @ 2018-09-29 0:09 UTC (permalink / raw)
To: Anatoly Burakov
Cc: dev, Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, Thomas Monjalon, alejandro.lucero
On Thu, Sep 27, 2018 at 11:40:59AM +0100, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
>
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes. This also breaks a few
> internal assumptions about memory contiguousness, so adjust
> malloc code in a few places.
>
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
>
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
>
> Notes:
> v3:
> - Add comment to explain the process of picking up minimum
> page sizes for mempool
>
> v2:
> - Add documentation changes and ABI break
>
> v1:
> - Adjust all calls to memseg walk functions to ignore external
> segments where it made sense to do so
>
> doc/guides/rel_notes/deprecation.rst | 15 --------
> doc/guides/rel_notes/release_18_11.rst | 13 ++++++-
> drivers/bus/fslmc/fslmc_vfio.c | 7 ++--
> drivers/net/mlx4/mlx4_mr.c | 3 ++
> drivers/net/mlx5/mlx5.c | 5 ++-
> drivers/net/mlx5/mlx5_mr.c | 3 ++
> drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
> lib/librte_eal/bsdapp/eal/Makefile | 2 +-
> lib/librte_eal/bsdapp/eal/eal.c | 3 ++
> lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
> lib/librte_eal/common/eal_common_memory.c | 3 ++
> .../common/include/rte_eal_memconfig.h | 1 +
> lib/librte_eal/common/include/rte_memory.h | 9 +++++
> lib/librte_eal/common/malloc_elem.c | 10 ++++--
> lib/librte_eal/common/malloc_heap.c | 9 +++--
> lib/librte_eal/common/rte_malloc.c | 2 +-
> lib/librte_eal/linuxapp/eal/Makefile | 2 +-
> lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
> lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
> lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
> lib/librte_eal/meson.build | 2 +-
> lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
> test/test/test_malloc.c | 3 ++
> test/test/test_memzone.c | 3 ++
> 24 files changed, 134 insertions(+), 44 deletions(-)
>
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 138335dfb..d2aec64d1 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
> Deprecation Notices
> -------------------
>
> -* eal: certain structures will change in EAL on account of upcoming external
> - memory support. Aside from internal changes leading to an ABI break, the
> - following externally visible changes will also be implemented:
> -
> - - ``rte_memseg_list`` will change to include a boolean flag indicating
> - whether a particular memseg list is externally allocated. This will have
> - implications for any users of memseg-walk-related functions, as they will
> - now have to skip externally allocated segments in most cases if the intent
> - is to only iterate over internal DPDK memory.
> - - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
> - as some socket ID's will now be representing externally allocated memory. No
> - changes will be required for existing code as backwards compatibility will
> - be kept, and those who do not use this feature will not see these extra
> - socket ID's.
> -
> * eal: both declaring and identifying devices will be streamlined in v18.11.
> New functions will appear to query a specific port from buses, classes of
> device and device drivers. Device declaration will be made coherent with the
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index bc9b74ec4..5fc71e208 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -91,6 +91,13 @@ API Changes
> flag the MAC can be properly configured in any case. This is particularly
> important for bonding.
>
> +* eal: The following API changes were made in 18.11:
> +
> + - ``rte_memseg_list`` structure now has an additional flag indicating whether
> + the memseg list is externally allocated. This will have implications for any
> + users of memseg-walk-related functions, as they will now have to skip
> + externally allocated segments in most cases if the intent is to only iterate
> + over internal DPDK memory.
>
> ABI Changes
> -----------
> @@ -107,6 +114,10 @@ ABI Changes
> =========================================================
>
>
> +* eal: EAL library ABI version was changed due to previously announced work on
> + supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
> + a new flag indicating whether the memseg list refers to external memory.
> +
> Removed Items
> -------------
>
> @@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
> librte_compressdev.so.1
> librte_cryptodev.so.5
> librte_distributor.so.1
> - librte_eal.so.8
> + + librte_eal.so.9
> librte_ethdev.so.10
> librte_eventdev.so.4
> librte_flow_classify.so.1
> diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
> index 4c2cd2a87..2e9244fb7 100644
> --- a/drivers/bus/fslmc/fslmc_vfio.c
> +++ b/drivers/bus/fslmc/fslmc_vfio.c
> @@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
> }
>
> static int
> -fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
> - const struct rte_memseg *ms, void *arg)
> +fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
> + void *arg)
> {
> int *n_segs = arg;
> int ret;
>
> + if (msl->external)
> + return 0;
> +
> ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
> if (ret)
> DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
> diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
> index d23d3c613..9f5d790b6 100644
> --- a/drivers/net/mlx4/mlx4_mr.c
> +++ b/drivers/net/mlx4/mlx4_mr.c
> @@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
> {
> struct mr_find_contig_memsegs_data *data = arg;
>
> + if (msl->external)
> + return 0;
> +
Because a memory free event is available for external memory, the current design of
mlx4/mlx5 memory management can accommodate the new external memory support. So,
please remove this check so that the PMD can traverse external memory as well.
> if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
> return 0;
> /* Found, save it and stop walking. */
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index 30d4e70a7..c90e1d8ce 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
> static void *uar_base;
>
> static int
> -find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
> +find_lower_va_bound(const struct rte_memseg_list *msl,
> const struct rte_memseg *ms, void *arg)
> {
> void **addr = arg;
>
> + if (msl->external)
> + return 0;
> +
This one is fine.
But can you please remove the blank line?
That's a rule by former maintainers. :-)
> if (*addr == NULL)
> *addr = ms->addr;
> else
> diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
> index 1d1bcb5fe..fd4345f9c 100644
> --- a/drivers/net/mlx5/mlx5_mr.c
> +++ b/drivers/net/mlx5/mlx5_mr.c
> @@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
> {
> struct mr_find_contig_memsegs_data *data = arg;
>
> + if (msl->external)
> + return 0;
> +
Like I mentioned, please remove it.
If those two changes in mlx4/5_mr.c are removed, for the whole patch,
Acked-by: Yongseok Koh <yskoh@mellanox.com>
Thanks
Yongseok
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 8:26 4% ` Bruce Richardson
@ 2018-09-28 8:55 4% ` Van Haaren, Harry
2018-09-30 22:33 0% ` Honnappa Nagarahalli
0 siblings, 1 reply; 200+ results
From: Van Haaren, Harry @ 2018-09-28 8:55 UTC (permalink / raw)
To: Richardson, Bruce, Wang, Yipeng1
Cc: Honnappa Nagarahalli, De Lara Guarch, Pablo, dev, gavin.hu,
steve.capper, ola.liljedahl, nd
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Friday, September 28, 2018 9:26 AM
> To: Wang, Yipeng1 <yipeng1.wang@intel.com>
> Cc: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>; De Lara Guarch,
> Pablo <pablo.de.lara.guarch@intel.com>; dev@dpdk.org; gavin.hu@arm.com;
> steve.capper@arm.com; ola.liljedahl@arm.com; nd@arm.com
> Subject: Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
>
> On Fri, Sep 28, 2018 at 02:00:00AM +0100, Wang, Yipeng1 wrote:
> > Reply inlined:
> >
> > >-----Original Message-----
> > >From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
> > >Sent: Thursday, September 6, 2018 10:12 AM
> > >To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> > >Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com;
> steve.capper@arm.com; ola.liljedahl@arm.com;
> > >nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > >Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving
> keys
> > >
> > >Reader-writer concurrency issue, caused by moving the keys
> > >to their alternative locations during key insert, is solved
> > >by introducing a global counter(tbl_chng_cnt) indicating a
> > >change in table.
<snip>
> > > /**
> > >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash
> *h, const void *key,
> > > * array of user data. This value is unique for this key.
> > > */
> > > int32_t
> > >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> > >+rte_hash_add_key(struct rte_hash *h, const void *key);
> > >
> > > /**
> > > * Add a key to an existing hash table.
> > >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void
> *key);
> > > * array of user data. This value is unique for this key.
> > > */
> > > int32_t
> > >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key,
> hash_sig_t sig);
> > >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key,
> hash_sig_t sig);
> > >
> > > /
> >
> > I think the above changes will break ABI by changing the parameter type?
> Other people may know better on this.
>
> Just removing a const should not change the ABI, I believe, since the const
> is just advisory hint to the compiler. Actual parameter size and count
> remains unchanged so I don't believe there is an issue.
> [ABI experts, please correct me if I'm wrong on this]
[Certainly no ABI expert, but...]
I think this is an API break, not ABI break.
Given application code as follows, it will fail to compile - even though
running the new code as a .so wouldn't cause any issues (AFAIK).
void do_hash_stuff(const struct rte_hash *h, ...)
{
/* parameter passed in is const, but updated function prototype is non-const */
rte_hash_add_key_with_hash(h, ...);
}
This means that we can't recompile apps against latest patch without application
code changes, if the app was passing a const rte_hash struct as the first parameter.
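A self-contained sketch of the same point, using hypothetical stand-ins ('toy_hash', 'toy_add_key') rather than the real rte_hash API. With the non-const prototype, a caller that still holds a 'const struct toy_hash *' gets a discarded-qualifier diagnostic (a warning by default in C with GCC and Clang, an error under -Werror or in C++), so the caller's own signature has to change:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the hash table. */
struct toy_hash {
	int n_keys;
};

/* New-style prototype: the table pointer is no longer const. */
static int
toy_add_key(struct toy_hash *h, int key)
{
	assert(h != NULL);
	(void)key; /* the key itself is irrelevant to this illustration */
	return ++h->n_keys;
}

/*
 * Application code after the fix. The old form,
 *     do_hash_stuff(const struct toy_hash *h, int key)
 * would be diagnosed at the call below, because const is discarded.
 */
static int
do_hash_stuff(struct toy_hash *h, int key)
{
	return toy_add_key(h, key);
}
```

The compiled call sequence is the same either way, which is why this is a source-level (API) change and not a binary-level (ABI) one.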
-Harry
^ permalink raw reply [relevance 4%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
2018-09-28 1:00 3% ` Wang, Yipeng1
@ 2018-09-28 8:26 4% ` Bruce Richardson
2018-09-28 8:55 4% ` Van Haaren, Harry
2018-09-30 23:05 0% ` Honnappa Nagarahalli
1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2018-09-28 8:26 UTC (permalink / raw)
To: Wang, Yipeng1
Cc: Honnappa Nagarahalli, De Lara Guarch, Pablo, dev, gavin.hu,
steve.capper, ola.liljedahl, nd
On Fri, Sep 28, 2018 at 02:00:00AM +0100, Wang, Yipeng1 wrote:
> Reply inlined:
>
> >-----Original Message-----
> >From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
> >Sent: Thursday, September 6, 2018 10:12 AM
> >To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> >Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
> >nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> >Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
> >
> >Reader-writer concurrency issue, caused by moving the keys
> >to their alternative locations during key insert, is solved
> >by introducing a global counter(tbl_chng_cnt) indicating a
> >change in table.
> >
> >@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> > curr_bkt = curr_node->bkt;
> > }
> >
> >+ /* Inform the previous move. The current move need
> >+ * not be informed now as the current bucket entry
> >+ * is present in both primary and secondary.
> >+ * Since there is one writer, load acquires on
> >+ * tbl_chng_cnt are not required.
> >+ */
> >+ __atomic_store_n(&h->tbl_chng_cnt,
> >+ h->tbl_chng_cnt + 1,
> >+ __ATOMIC_RELEASE);
> >+ /* The stores to sig_alt and sig_current should not
> >+ * move above the store to tbl_chng_cnt.
> >+ */
> >+ __atomic_thread_fence(__ATOMIC_RELEASE);
> >+
> [Wang, Yipeng] I believe for X86 this fence should not be compiled to any code, otherwise
> we need macros for the compile time check.
>
> >@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> > uint32_t bucket_idx;
> > hash_sig_t alt_hash;
> > struct rte_hash_bucket *bkt;
> >+ uint32_t cnt_b, cnt_a;
> > int ret;
> >
> >- bucket_idx = sig & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >-
> > __hash_rw_reader_lock(h);
> >
> >- /* Check if key is in primary location */
> >- ret = search_one_bucket(h, key, sig, data, bkt);
> >- if (ret != -1) {
> >- __hash_rw_reader_unlock(h);
> >- return ret;
> >- }
> >- /* Calculate secondary hash */
> >- alt_hash = rte_hash_secondary_hash(sig);
> >- bucket_idx = alt_hash & h->bucket_bitmask;
> >- bkt = &h->buckets[bucket_idx];
> >+ do {
> [Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and Concurrent
> MemCache with Dumber Caching and Smarter Hashing"
> as well as OvS cmap uses similar version counter to implement read-write concurrency for hash table,
> but one difference is reader checks even/odd of the version counter to make sure there is no
> concurrent writer. Could you just double check and confirm that this is not needed for your implementation?
>
> >--- a/lib/librte_hash/rte_hash.h
> >+++ b/lib/librte_hash/rte_hash.h
> >@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int
> >-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> >+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
> >
> > /**
> > * Add a key-value pair with a pre-computed hash value
> >@@ -180,7 +180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> > * - -ENOSPC if there is no space in the hash for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> >+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> > hash_sig_t sig, void *data);
> >
> > /**
> >@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key(const struct rte_hash *h, const void *key);
> >+rte_hash_add_key(struct rte_hash *h, const void *key);
> >
> > /**
> > * Add a key to an existing hash table.
> >@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
> > * array of user data. This value is unique for this key.
> > */
> > int32_t
> >-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
> >+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
> >
> > /
>
> I think the above changes will break ABI by changing the parameter type? Other people may know better on this.
Just removing a const should not change the ABI, I believe, since the const
is just an advisory hint to the compiler. The actual parameter size and count
remain unchanged, so I don't believe there is an issue.
[ABI experts, please correct me if I'm wrong on this]
/Bruce
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v16 2/6] eal: enable hotplug on multi-process
2018-09-28 4:23 1% ` [dpdk-dev] [PATCH v16 0/6] " Qi Zhang
@ 2018-09-28 4:23 2% ` Qi Zhang
0 siblings, 0 replies; 200+ results
From: Qi Zhang @ 2018-09-28 4:23 UTC (permalink / raw)
To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
benjamin.h.shelton, narender.vangati, Qi Zhang
This series introduces a solution for handling hotplug in
multi-process. It covers the following scenarios:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, devices are assumed to be shared
by default. That means attaching or detaching a device in any process
is broadcast to all other processes through the mp channel, so that device
information is synchronized across all processes.
Any failure while attaching or detaching leaves the processes in an
inconsistent state, so a proper rollback action is required.
This patch implements cases 1 and 2.
Cases 3 and 4 will be implemented in a separate patch.
IPC scenario for cases 1 and 2:
Attach a device:
a) primary attaches the new device; if this fails, goto h).
b) primary sends an attach sync request to all secondaries.
c) each secondary receives the request, attaches the device and sends a reply.
d) primary checks the replies; if all succeeded, goto i).
e) primary sends an attach rollback sync request to all secondaries.
f) each secondary receives the request, detaches the device and sends a reply.
g) primary receives the replies and detaches the device as rollback action.
h) attach fail.
i) attach success.
Detach a device:
a) primary sends a detach sync request to all secondaries.
b) each secondary detaches the device and sends a reply.
c) primary checks the replies; if all succeeded, goto f).
d) primary sends a detach rollback sync request to all secondaries.
e) each secondary receives the request and re-attaches the device; goto g).
f) primary detaches the device; on success goto h), else goto d).
g) detach fail.
h) detach success.
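The attach sequence above can be sketched as follows. This is a toy model under stated assumptions, not the patch implementation: 'toy_proc', 'toy_attach', 'toy_detach' and 'toy_attach_all' are hypothetical names, and the IPC requests are replaced by direct function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process state; fail_attach simulates a probe failure. */
struct toy_proc {
	bool attached;
	bool fail_attach;
};

static int
toy_attach(struct toy_proc *p)
{
	if (p->fail_attach)
		return -1;
	p->attached = true;
	return 0;
}

static void
toy_detach(struct toy_proc *p)
{
	p->attached = false;
}

/* Attach on the primary, then on every secondary; undo everything on error. */
static int
toy_attach_all(struct toy_proc *primary, struct toy_proc *sec, int n_sec)
{
	int i, failed = 0;

	assert(primary != NULL);

	/* a) primary attaches first; on failure there is nothing to undo. */
	if (toy_attach(primary) != 0)
		return -1; /* h) attach fail */

	/* b)-d) every secondary attaches and its result is collected. */
	for (i = 0; i < n_sec; i++)
		if (toy_attach(&sec[i]) != 0)
			failed = 1;

	if (!failed)
		return 0; /* i) attach success */

	/* e)-f) roll back on all secondaries, g) then on the primary. */
	for (i = 0; i < n_sec; i++)
		toy_detach(&sec[i]);
	toy_detach(primary);
	return -1; /* h) attach fail */
}
```

The rollback is sent to every secondary, including those that attached successfully, so that all processes end up in the same state.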
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 11 ++
lib/librte_eal/bsdapp/eal/Makefile | 1 +
lib/librte_eal/common/eal_common_dev.c | 225 ++++++++++++++++++++++++++++++--
lib/librte_eal/common/eal_private.h | 30 +++++
lib/librte_eal/common/hotplug_mp.c | 221 +++++++++++++++++++++++++++++++
lib/librte_eal/common/hotplug_mp.h | 46 +++++++
lib/librte_eal/common/include/rte_dev.h | 9 ++
lib/librte_eal/common/meson.build | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 1 +
lib/librte_eal/linuxapp/eal/eal.c | 6 +
10 files changed, 542 insertions(+), 9 deletions(-)
create mode 100644 lib/librte_eal/common/hotplug_mp.c
create mode 100644 lib/librte_eal/common/hotplug_mp.h
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..f88910c7f 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -67,6 +67,12 @@ New Features
SR-IOV option in Hyper-V and Azure. This is an alternative to the previous
vdev_netvsc, tap, and failsafe drivers combination.
+* **Support device multi-process hotplug.**
+
+ Hotplug and hot-unplug for devices will now be supported in multiprocessing
+ scenario. Any ethdev devices created in the primary process will be regarded
+ as shared and will be available for all DPDK processes. Synchronization
+ between processes will be done using DPDK IPC.
API Changes
-----------
@@ -91,6 +97,11 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
+
+ In primary-secondary process model, ``rte_eal_hotplug_add`` will guarantee
+ that device be attached on all processes, while ``rte_eal_hotplug_remove``
+ will guarantee device be detached on all processes.
ABI Changes
-----------
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..4351c6a20 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -62,6 +62,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_proc.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_fbarray.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += eal_common_uuid.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += hotplug_mp.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_elem.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_heap.c
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) += malloc_mp.c
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 85eb1569f..314266041 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -19,8 +19,10 @@
#include <rte_log.h>
#include <rte_spinlock.h>
#include <rte_malloc.h>
+#include <rte_string_fns.h>
#include "eal_private.h"
+#include "hotplug_mp.h"
/**
* The device event callback description.
@@ -127,9 +129,10 @@ int rte_eal_dev_detach(struct rte_device *dev)
return ret;
}
-int
-rte_eal_hotplug_add(const char *busname, const char *devname,
- const char *drvargs)
+/* helper function to build devargs, caller should free the memory */
+static char *
+build_devargs(const char *busname, const char *devname,
+ const char *drvargs)
{
char *devargs = NULL;
int size, length = -1;
@@ -140,19 +143,33 @@ rte_eal_hotplug_add(const char *busname, const char *devname,
if (length >= size)
devargs = malloc(length + 1);
if (devargs == NULL)
- return -ENOMEM;
+ break;
} while (size == 0);
+ return devargs;
+}
+
+int
+rte_eal_hotplug_add(const char *busname, const char *devname,
+ const char *drvargs)
+{
+ char *devargs = build_devargs(busname, devname, drvargs);
+
+ if (devargs == NULL)
+ return -ENOMEM;
+
return rte_dev_probe(devargs);
}
-int __rte_experimental
-rte_dev_probe(const char *devargs)
+/* probe device at local process. */
+int
+local_dev_probe(const char *devargs, struct rte_device **new_dev)
{
struct rte_device *dev;
struct rte_devargs *da;
int ret;
+ *new_dev = NULL;
da = calloc(1, sizeof(*da));
if (da == NULL)
return -ENOMEM;
@@ -195,6 +212,8 @@ rte_dev_probe(const char *devargs)
dev->name);
goto err_devarg;
}
+
+ *new_dev = dev;
return 0;
err_devarg:
@@ -226,8 +245,9 @@ rte_eal_hotplug_remove(const char *busname, const char *devname)
return rte_dev_remove(dev);
}
-int __rte_experimental
-rte_dev_remove(struct rte_device *dev)
+/* remove device at local process. */
+int
+local_dev_remove(struct rte_device *dev)
{
struct rte_bus *bus;
int ret;
@@ -248,7 +268,194 @@ rte_dev_remove(struct rte_device *dev)
if (ret)
RTE_LOG(ERR, EAL, "Driver cannot detach the device (%s)\n",
dev->name);
- rte_devargs_remove(dev->devargs);
+ else
+ rte_devargs_remove(dev->devargs);
+
+ return ret;
+}
+
+int __rte_experimental
+rte_dev_probe(const char *devargs)
+{
+ struct eal_dev_mp_req req;
+ struct rte_device *dev;
+ int ret;
+
+ memset(&req, 0, sizeof(req));
+ req.t = EAL_DEV_REQ_TYPE_ATTACH;
+ strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ /**
+ * If in secondary process, just send IPC request to
+ * primary process.
+ */
+ ret = eal_dev_hotplug_request_to_primary(&req);
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug request to primary\n");
+ return -ENOMSG;
+ }
+ if (req.result)
+ RTE_LOG(ERR, EAL,
+ "Failed to hotplug add device\n");
+ return req.result;
+ }
+
+ /* attach a shared device from primary start from here: */
+
+ /* primary attach the new device itself. */
+ ret = local_dev_probe(devargs, &dev);
+
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to attach device on primary process\n");
+
+ /**
+ * it is possible that a secondary process failed to attach a
+ * device that the primary process has had since initialization,
+ * so for -EEXIST case, we still need to sync with secondary
+ * process.
+ */
+ if (ret != -EEXIST)
+ return ret;
+ }
+
+ /* primary send attach sync request to secondary. */
+ ret = eal_dev_hotplug_request_to_secondary(&req);
+
+ /* on any communication error, we need to rollback. */
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug add request to secondary\n");
+ ret = -ENOMSG;
+ goto rollback;
+ }
+
+ /**
+ * if any secondary failed to attach, we need to consider if rollback
+ * is necessary.
+ */
+ if (req.result) {
+ RTE_LOG(ERR, EAL,
+ "Failed to attach device on secondary process\n");
+ ret = req.result;
+
+ /* for -EEXIST, we don't need to rollback. */
+ if (ret == -EEXIST)
+ return ret;
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
+
+ /* primary send rollback request to secondary. */
+ if (eal_dev_hotplug_request_to_secondary(&req))
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device attach on secondary. "
+ "Devices in secondary may not sync with primary\n");
+
+ /* primary rollback itself. */
+ if (local_dev_remove(dev))
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device attach on primary. "
+ "Devices in secondary may not sync with primary\n");
+
+ return ret;
+}
+
+int __rte_experimental
+rte_dev_remove(struct rte_device *dev)
+{
+ struct eal_dev_mp_req req;
+ char *devargs;
+ int ret;
+
+ devargs = build_devargs(dev->devargs->bus->name, dev->name, "");
+ if (devargs == NULL)
+ return -ENOMEM;
+
+ memset(&req, 0, sizeof(req));
+ req.t = EAL_DEV_REQ_TYPE_DETACH;
+ strlcpy(req.devargs, devargs, EAL_DEV_MP_DEV_ARGS_MAX_LEN);
+ free(devargs);
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
+ /**
+ * If in secondary process, just send IPC request to
+ * primary process.
+ */
+ ret = eal_dev_hotplug_request_to_primary(&req);
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send hotplug request to primary\n");
+ return -ENOMSG;
+ }
+ if (req.result)
+ RTE_LOG(ERR, EAL,
+ "Failed to hotplug remove device\n");
+ return req.result;
+ }
+
+ /* detach a device from primary start from here: */
+
+ /* primary send detach sync request to secondary */
+ ret = eal_dev_hotplug_request_to_secondary(&req);
+
+ /**
+ * on a communication error we need to rollback, because it is possible
+ * that some of the secondary processes still detached it successfully.
+ */
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to send device detach request to secondary\n");
+ ret = -ENOMSG;
+ goto rollback;
+ }
+
+ /**
+ * if any secondary failed to detach, we need to consider if rollback
+ * is necessary.
+ */
+ if (req.result) {
+ RTE_LOG(ERR, EAL,
+ "Failed to detach device on secondary process\n");
+ ret = req.result;
+ /**
+ * if -ENOENT, we don't need to rollback, since the device is
+ * already detached on the secondary process.
+ */
+ if (ret != -ENOENT)
+ goto rollback;
+ }
+
+ /* primary detach the device itself. */
+ ret = local_dev_remove(dev);
+
+ /* if primary failed, still need to consider if rollback is necessary */
+ if (ret) {
+ RTE_LOG(ERR, EAL,
+ "Failed to detach device on primary process\n");
+ /* if -ENOENT, we don't need to rollback */
+ if (ret == -ENOENT)
+ return ret;
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
+
+ /* primary send rollback request to secondary. */
+ if (eal_dev_hotplug_request_to_secondary(&req))
+ RTE_LOG(WARNING, EAL,
+ "Failed to rollback device detach on secondary. "
+ "Devices in secondary may not sync with primary\n");
+
return ret;
}
diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
index 4f809a83c..83f10a9f8 100644
--- a/lib/librte_eal/common/eal_private.h
+++ b/lib/librte_eal/common/eal_private.h
@@ -304,4 +304,34 @@ int
rte_devargs_layers_parse(struct rte_devargs *devargs,
const char *devstr);
+/*
+ * probe a device at local process.
+ *
+ * @param devargs
+ * Device arguments including bus, class and driver properties.
+ * @param new_dev
+ * new device be probed as output.
+ * @return
+ * 0 on success, negative on error.
+ */
+int local_dev_probe(const char *devargs, struct rte_device **new_dev);
+
+/**
+ * Hotplug remove a given device from a specific bus at local process.
+ *
+ * @param dev
+ * Data structure of the device to remove.
+ * @return
+ * 0 on success, negative on error.
+ */
+int local_dev_remove(struct rte_device *dev);
+
+/**
+ * Register all mp action callbacks for hotplug.
+ *
+ * @return
+ * 0 on success, negative on error.
+ */
+int rte_dev_hotplug_mp_init(void);
+
#endif /* _EAL_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
new file mode 100644
index 000000000..1c92e44cb
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.c
@@ -0,0 +1,221 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+#include <string.h>
+
+#include <rte_eal.h>
+#include <rte_alarm.h>
+#include <rte_string_fns.h>
+#include <rte_devargs.h>
+
+#include "hotplug_mp.h"
+#include "eal_private.h"
+
+#define MP_TIMEOUT_S 5 /**< 5 seconds timeouts */
+
+static int cmp_dev_name(const struct rte_device *dev, const void *_name)
+{
+ const char *name = _name;
+
+ return strcmp(dev->name, name);
+}
+
+struct mp_reply_bundle {
+ struct rte_mp_msg msg;
+ void *peer;
+};
+
+static int
+handle_secondary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+ RTE_SET_USED(msg);
+ RTE_SET_USED(peer);
+ return -ENOTSUP;
+}
+
+static void __handle_primary_request(void *param)
+{
+ struct mp_reply_bundle *bundle = param;
+ struct rte_mp_msg *msg = &bundle->msg;
+ const struct eal_dev_mp_req *req =
+ (const struct eal_dev_mp_req *)msg->param;
+ struct rte_mp_msg mp_resp;
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_resp.param;
+ struct rte_devargs *da;
+ struct rte_device *dev;
+ struct rte_bus *bus;
+ int ret = 0;
+
+ memset(&mp_resp, 0, sizeof(mp_resp));
+
+ switch (req->t) {
+ case EAL_DEV_REQ_TYPE_ATTACH:
+ case EAL_DEV_REQ_TYPE_DETACH_ROLLBACK:
+ ret = local_dev_probe(req->devargs, &dev);
+ break;
+ case EAL_DEV_REQ_TYPE_DETACH:
+ case EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK:
+ da = calloc(1, sizeof(*da));
+ if (da == NULL) {
+ ret = -ENOMEM;
+ goto quit;
+ }
+
+ ret = rte_devargs_parse(da, req->devargs);
+ if (ret)
+ goto quit;
+
+ bus = rte_bus_find_by_name(da->bus->name);
+ if (bus == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n", da->bus->name);
+ ret = -ENOENT;
+ goto quit;
+ }
+
+ dev = bus->find_device(NULL, cmp_dev_name, da->name);
+ if (dev == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot find plugged device (%s)\n", da->name);
+ ret = -ENOENT;
+ goto quit;
+ }
+
+ ret = local_dev_remove(dev);
+quit:
+ break;
+ default:
+ ret = -EINVAL;
+ }
+
+ strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+ mp_resp.len_param = sizeof(*req);
+ memcpy(resp, req, sizeof(*resp));
+ resp->result = ret;
+ if (rte_mp_reply(&mp_resp, bundle->peer) < 0)
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+
+ free(bundle->peer);
+ free(bundle);
+}
+
+static int
+handle_primary_request(const struct rte_mp_msg *msg, const void *peer)
+{
+ struct rte_mp_msg mp_resp;
+ const struct eal_dev_mp_req *req =
+ (const struct eal_dev_mp_req *)msg->param;
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_resp.param;
+ struct mp_reply_bundle *bundle;
+ int ret = 0;
+
+ memset(&mp_resp, 0, sizeof(mp_resp));
+ strlcpy(mp_resp.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_resp.name));
+ mp_resp.len_param = sizeof(*req);
+ memcpy(resp, req, sizeof(*resp));
+
+ bundle = calloc(1, sizeof(*bundle));
+ if (bundle == NULL) {
+ resp->result = -ENOMEM;
+ ret = rte_mp_reply(&mp_resp, peer);
+ if (ret)
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+ return ret;
+ }
+
+ bundle->msg = *msg;
+ /**
+ * We need to send the reply on the interrupt thread, but peer can't be
+ * passed directly, so this is a temporary hack that needs to be fixed
+ * when a proper solution is ready.
+ */
+ bundle->peer = (void *)strdup(peer);
+
+ /**
+ * We are on the IPC callback thread, where sync IPC is not allowed
+ * because it would deadlock, so we delegate the task to the interrupt thread.
+ */
+ ret = rte_eal_alarm_set(1, __handle_primary_request, bundle);
+ if (ret) {
+ resp->result = ret;
+ ret = rte_mp_reply(&mp_resp, peer);
+ if (ret) {
+ RTE_LOG(ERR, EAL, "failed to send reply to primary request\n");
+ return ret;
+ }
+ }
+ return 0;
+}
+
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req)
+{
+ RTE_SET_USED(req);
+ return -ENOTSUP;
+}
+
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req)
+{
+ struct rte_mp_msg mp_req;
+ struct rte_mp_reply mp_reply;
+ struct timespec ts = {.tv_sec = MP_TIMEOUT_S, .tv_nsec = 0};
+ int ret;
+ int i;
+
+ memset(&mp_req, 0, sizeof(mp_req));
+ memcpy(mp_req.param, req, sizeof(*req));
+ mp_req.len_param = sizeof(*req);
+ strlcpy(mp_req.name, EAL_DEV_MP_ACTION_REQUEST, sizeof(mp_req.name));
+
+ ret = rte_mp_request_sync(&mp_req, &mp_reply, &ts);
+ if (ret) {
+ RTE_LOG(ERR, EAL, "rte_mp_request_sync failed\n");
+ return ret;
+ }
+
+ if (mp_reply.nb_sent != mp_reply.nb_received) {
+ RTE_LOG(ERR, EAL, "not all secondary reply\n");
+ return -1;
+ }
+
+ req->result = 0;
+ for (i = 0; i < mp_reply.nb_received; i++) {
+ struct eal_dev_mp_req *resp =
+ (struct eal_dev_mp_req *)mp_reply.msgs[i].param;
+ if (resp->result) {
+ req->result = resp->result;
+ if (req->t == EAL_DEV_REQ_TYPE_ATTACH &&
+ req->result != -EEXIST)
+ break;
+ if (req->t == EAL_DEV_REQ_TYPE_DETACH &&
+ req->result != -ENOENT)
+ break;
+ }
+ }
+
+ return 0;
+}
+
+int rte_dev_hotplug_mp_init(void)
+{
+ int ret;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+ handle_secondary_request);
+ if (ret) {
+ RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+ EAL_DEV_MP_ACTION_REQUEST);
+ return ret;
+ }
+ } else {
+ ret = rte_mp_action_register(EAL_DEV_MP_ACTION_REQUEST,
+ handle_primary_request);
+ if (ret) {
+ RTE_LOG(ERR, EAL, "Couldn't register '%s' action\n",
+ EAL_DEV_MP_ACTION_REQUEST);
+ return ret;
+ }
+ }
+
+ return 0;
+}
diff --git a/lib/librte_eal/common/hotplug_mp.h b/lib/librte_eal/common/hotplug_mp.h
new file mode 100644
index 000000000..c95c8f1fb
--- /dev/null
+++ b/lib/librte_eal/common/hotplug_mp.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _HOTPLUG_MP_H_
+#define _HOTPLUG_MP_H_
+
+#include <rte_dev.h>
+#include <rte_bus.h>
+
+#define EAL_DEV_MP_ACTION_REQUEST "eal_dev_mp_request"
+#define EAL_DEV_MP_ACTION_RESPONSE "eal_dev_mp_response"
+
+#define EAL_DEV_MP_DEV_NAME_MAX_LEN RTE_DEV_NAME_MAX_LEN
+#define EAL_DEV_MP_BUS_NAME_MAX_LEN 32
+#define EAL_DEV_MP_DEV_ARGS_MAX_LEN 128
+
+enum eal_dev_req_type {
+ EAL_DEV_REQ_TYPE_ATTACH,
+ EAL_DEV_REQ_TYPE_DETACH,
+ EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK,
+ EAL_DEV_REQ_TYPE_DETACH_ROLLBACK,
+};
+
+struct eal_dev_mp_req {
+ enum eal_dev_req_type t;
+ char devargs[EAL_DEV_MP_DEV_ARGS_MAX_LEN];
+ int result;
+};
+
+/**
+ * This is a synchronous wrapper for a secondary process to send a
+ * request to the primary process. It is invoked when an attach
+ * or detach request is issued from a secondary process.
+ */
+int eal_dev_hotplug_request_to_primary(struct eal_dev_mp_req *req);
+
+/**
+ * This is a synchronous wrapper for the primary process to send a
+ * request to the secondary processes. It is invoked when an attach
+ * or detach request is issued from the primary process.
+ */
+int eal_dev_hotplug_request_to_secondary(struct eal_dev_mp_req *req);
+
+
+#endif /* _HOTPLUG_MP_H_ */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 7a30362c0..266331acd 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -190,6 +190,9 @@ int rte_eal_dev_detach(struct rte_device *dev);
/**
* Hotplug add a given device to a specific bus.
+ * In multi-process, this function will inform all other processes
+ * to hotplug add the same device. Any failure on other process
+ * will rollback the action.
*
* @param busname
* The bus name the device is added to.
@@ -219,6 +222,9 @@ int __rte_experimental rte_dev_probe(const char *devargs);
/**
* Hotplug remove a given device from a specific bus.
+ * In multi-process, this function will inform all other processes
+ * to hotplug remove the same device. Any failure on other process
+ * will rollback the action.
*
* @param busname
* The bus name the device is removed from.
@@ -234,6 +240,9 @@ int rte_eal_hotplug_remove(const char *busname, const char *devname);
* @b EXPERIMENTAL: this API may change without prior notice
*
* Remove one device.
+ * In multi-process, this function will inform all other processes
+ * to hotplug remove the same device. Any failure on other process
+ * will rollback the action.
*
* @param dev
* Data structure of the device to remove.
diff --git a/lib/librte_eal/common/meson.build b/lib/librte_eal/common/meson.build
index b7fc98499..04c414356 100644
--- a/lib/librte_eal/common/meson.build
+++ b/lib/librte_eal/common/meson.build
@@ -28,6 +28,7 @@ common_sources = files(
'eal_common_thread.c',
'eal_common_timer.c',
'eal_common_uuid.c',
+ 'hotplug_mp.c',
'malloc_elem.c',
'malloc_heap.c',
'malloc_mp.c',
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..58455c1a6 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -70,6 +70,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_proc.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_fbarray.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal_common_uuid.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += rte_malloc.c
+SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += hotplug_mp.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_elem.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_heap.c
SRCS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += malloc_mp.c
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..f2c90c528 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -865,6 +865,12 @@ rte_eal_init(int argc, char **argv)
}
}
+ /* register mp action callbacks for hotplug */
+ if (rte_dev_hotplug_mp_init() < 0) {
+ rte_eal_init_alert("failed to register mp callback for hotplug\n");
+ return -1;
+ }
+
if (rte_bus_scan()) {
rte_eal_init_alert("Cannot scan the buses for devices\n");
rte_errno = ENODEV;
--
2.13.6
^ permalink raw reply [relevance 2%]
* [dpdk-dev] [PATCH v16 0/6] enable hotplug on multi-process
@ 2018-09-28 4:23 1% ` Qi Zhang
2018-09-28 4:23 2% ` [dpdk-dev] [PATCH v16 2/6] eal: " Qi Zhang
2018-10-16 0:16 1% ` [dpdk-dev] [PATCH v17 0/6] " Qi Zhang
1 sibling, 1 reply; 200+ results
From: Qi Zhang @ 2018-09-28 4:23 UTC (permalink / raw)
To: thomas, gaetan.rivet, anatoly.burakov, arybchenko
Cc: konstantin.ananyev, dev, bruce.richardson, ferruh.yigit,
benjamin.h.shelton, narender.vangati, Qi Zhang
v16:
- rebase to patch "simplify parameters of hotplug functions"
http://patchwork.dpdk.org/patch/45463/ this includes:
* keep rte_eal_hotplug_add/rte_eal_hotplug_remove unchanged.
* the IPC sync logic is moved to rte_dev_probe/rte_dev_remove.
* simplify the IPC message by removing busname and devname from
eal_dev_mp_req, since the devargs string already encodes that
information.
- combined release notes with related code changes.
- replace do_ prefix with local_ for local-process-only probe/remove functions.
- improve comments
v15:
- fix missing return in rte_eth_dev_pci_release.
- minor fix and more detailed comments for patch 5/7.
- update release notes for v18.11.
v14:
- rebase.
- All changes belong to patch 1/6.
1) rename rte_eth_dev_release_port_private to rte_eth_dev_release_port_secondary
since it is only used by secondary process.
2) in rte_eth_dev_pci_generic_remove, even on the secondary process,
I think it's better to call rte_eth_dev_release_port_secondary after
dev_uninit, since it is possible that a secondary process needs to release
some local resources in dev_uninit before releasing the port and returning.
Also this does not break any existing users of rte_eth_dev_pci_generic_remove,
because there is no special handling for secondary processes in any
existing dev_uninit.
3) add rte_eth_dev_release_port_secondary into rte_eth_dev_destroy as a
general step, so we don't need patches for i40e and ixgbe.
4) fix missing update on rte_ethdev_version.map.
- improve error handling for -EEXIST when attaching a device and -ENOENT
when detaching a device. It is possible that a device is not synced in
some situations, so attaching an existing device in primary still needs to sync
with secondary. Also, it's not necessary to rollback if we fail to
attach an existing device or detach a nonexistent device on secondary.
- fix potential NULL pointer dereference in handle_primary_request.
- merge all vdev driver patches into one patch.
- merge all pci driver patches into one patch.
v13:
- Since rte_eth_dev_attach/rte_eth_dev_detach will be deprecated,
so, modify the sample code to use rte_eal_hotplug_add and
rte_eal_hotplug_remove to attach/detach device.
v12:
- fix return value in eal_dev_hotplug_request_to_primary.
- add more error log in rte_eal_hotplug_add.
- fix return value in rte_eal_hotplug_add and rte_eal_hotplug_remove:
any failure due to an IPC error now returns -ENOMSG instead of -1.
- remove unnecessary changes from previous rework.
v11:
- move out common code from pci_vfio_unmap_secondary and
pci_vfio_unmap_primary.
- move RTE_BUS_NAME_MAX_LEN and RTE_DEV_ARGS_MAX_LEN into hotplug_mp.h
- fix reply check in eal_dev_hotplug_request_to_primary.
- move skeleton code for attaching device from secondary from patch 6/19
to patch 5/19 to improve code readability.
v10:
- Since hotplug add/remove of a vdev on a secondary process now syncs on
all processes, it is not necessary to support a private vdev for
a secondary process, which is identified by a non-NULL devargs in
"--vdev". So rework all vdev driver changes to simplify the device
probe scenario on a secondary process; devargs are now ignored on
secondary processes.
- fix license header in example/multi-process/hotplug_mp/Makefile.
v9:
- Move hotplug IPC from rte_eth_dev_attach/rte_eth_dev_detach to
eal_dev_hotplug_add and eal_dev_hotplug_remove, now all kinds of
devices will be synced in multi-process.
- Fix a couple of issues when a device is bound to vfio.
1) The device can't be detached cleanly in a secondary process, which
also means it can't be attached again, due to the error that
/dev/vfio/<group_fd> is still busy. (see patch 3/19 and 4/19)
2) repeated detach/attach of a device causes "cannot find TAILQ entry
for PCI device" due to an incorrect PCI address comparison.
(see patch 2/19).
- Removed device lock.
- Removed private device support.
- Fix commit log grammar issue
v8:
- update rte_eal_version.map due to new API added.
- minor reword on release note.
- minor fix on commit log and code style.
NOTE:
Some issues which are not related to this patchset are expected when
playing with the hotplug_mp sample, as listed below.
- Attaching a PCI device twice may leave the device unable to be detached;
the fix below is required:
https://patches.dpdk.org/patch/42030/
- An ixgbe device can't be detached; the fix below is required:
https://patches.dpdk.org/patch/42031/
v7:
- update rte_ethdev_version.map for new APIs.
- improve code readability in __handle_secondary_request by using goto.
- add comments to explain why rte_eal_alarm_set needs to be called.
- add error log when process_mp_init_callbacks failed.
- reword release notes based on Anatoly's suggestion.
- add back previous "Acked-by" and "Reviewed-by" in commit log.
NOTE: current patchset depends on below IPC fix, or it may not be able
to attach a shared vdev.
https://patches.dpdk.org/patch/41647/
v6:
- remove bus->scan_one, since ABI break is not necessary.
- remove patch for failsafe PMD since it will not support secondary.
- fix wrong implementation on ixgbe.
- add rte_eth_dev_release_port_private into rte_eth_dev_pci_generic_remove for
secondary process, so we don't need to patch a PMD if it uses the
default remove function.
- add release notes update.
- agreed to use strdup(peer) as a workaround for replying to a sync request in a separate
thread.
v5:
- since we will keep mp thread separate from interrupt thread,
it is not necessary to use a temporary thread; we use rte_eal_alarm_set.
- remove the change in rte_eth_dev_release_port, since there is a better
way to prevent rte_eth_dev_release_port from being called after
rte_eth_dev_release_port_private.
- fix the issue that lock does not take effect on secondary due to
previous re-work
- fix the issue when the first attached device is a private device from
secondary. (patch 8/24)
- workaround for replying to a sync request in a separate thread; this is still
open and under discussion, see below.
https://mails.dpdk.org/archives/dev/2018-June/105359.html
v4:
- since mp thread will be merged to interrupt thread, the fix in v3
for the sync IPC deadlock will not work. The new version enables a
mechanism to invoke an mp action callback in a temporary thread to
avoid the IPC deadlock. With this, the secondary-to-primary request
implementation is also simplified, since we can use a sync request
directly in a separate thread.
v3:
- enable mp init callback registration to help non-EAL modules initialize
the mp channel during rte_eal_init
- fix attaching a shared device from secondary.
1) deadlock due to sync IPC being invoked in rte_malloc in the primary
process when handling a secondary request to attach a device; the
solution is for the primary process to issue the shared device attach/detach
in the interrupt thread.
2) returned port_id not correct.
- check nb_sent and nb_received in sync IPC.
- fix memory leak during error handling at attach_on_secondary.
- improve clean_lock_callback to only lock/unlock spinlock once
- improve error code return in check-reply during async IPC.
- remove rte_ prefix of internal function in ethdev_mp.c
- sample code improvement.
1) rename sample to "hotplug_mp", and move to example/multi-process.
2) cleanup header include.
3) call rte_eal_cleanup before exit.
v2:
- rename rte_ethdev_mp.* to ethdev_mp.*
- rename rte_ethdev_lock.* to ethdev_lock.*
- move internal function to ethdev_private.h
- separate rte_eth_dev_[un]lock into rte_eth_dev_[un]lock and
rte_eth_dev_[un]lock_with_callback
- lock callbacks will be removed automatically after device is detached.
- add experimental tag for all new APIs.
- fix coding style issue.
- fix wrong license header in sample code.
- fix spelling
- fix meson.build.
- improve comments.
Background:
===========
Currently a secondary process only syncs ethdev from the primary
process at the init stage; it is not aware of devices that are
attached or detached on the primary process at runtime.
However, some applications use the primary-secondary process model
in which the primary process works as a resource management process
that creates and destroys virtual devices at runtime, while the
secondary processes handle network traffic using these devices.
Solution:
=========
The original intention was to fix this gap, but beyond that
the patch set provides a more comprehensive solution that handles
the different hotplug cases in a multi-process situation. It covers
the scenarios below:
1. Attach a device from the primary
2. Detach a device from the primary
3. Attach a device from a secondary
4. Detach a device from a secondary
In the primary-secondary process model, we assume ethernet devices are
shared by default. That means attaching or detaching a device on any process
is broadcast to all other processes through the mp channel, so that device
information is synchronized on all processes.
Any failure while attaching or detaching can leave the processes in an
inconsistent state, so a proper rollback action should be considered.
Scenario for Case 1, 2:
attach device from primary
a) primary attach the new device if failed goto h).
b) primary send attach sync request to all secondary.
c) secondary receive request and attach device and send reply.
d) primary check the reply if all success go to i).
e) primary send attach rollback sync request to all secondary.
f) secondary receive the request and detach device and send reply.
g) primary receive the reply and detach device as rollback action.
h) attach fail
i) attach success
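The attach steps a) to i) above can be sketched as a small single-process
simulation. This is only an illustration under stated assumptions: struct
proc_state, local_attach(), local_detach() and attach_from_primary() are
hypothetical names, the IPC round trips are replaced by direct calls, and the
-EEXIST special case from the patch is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process view of one device; index 0 is the primary. */
struct proc_state {
	bool attached;      /* device currently attached in this process */
	int attach_result;  /* injected result of the local attach (0 = ok) */
};

/* Stand-in for a local probe: succeeds or returns the injected error. */
static int local_attach(struct proc_state *p)
{
	if (p->attach_result != 0)
		return p->attach_result;
	p->attached = true;
	return 0;
}

/* Stand-in for a local remove. */
static void local_detach(struct proc_state *p)
{
	p->attached = false;
}

/* Walks steps a) to i); returns 0 on success or the first error seen. */
static int attach_from_primary(struct proc_state procs[], size_t n)
{
	size_t i, j;
	int ret;

	assert(n >= 1);

	/* a) primary attaches first; on failure go straight to h) */
	ret = local_attach(&procs[0]);
	if (ret != 0)
		return ret;

	/* b), c) each secondary attaches and "replies" with its result */
	for (i = 1; i < n; i++) {
		ret = local_attach(&procs[i]);
		if (ret != 0)
			break;
	}

	/* d) every reply was a success: i) attach success */
	if (ret == 0)
		return 0;

	/* e), f) rollback on the secondaries that had attached */
	for (j = 1; j < i; j++)
		local_detach(&procs[j]);

	/* g) primary rolls itself back, then h) attach fail */
	local_detach(&procs[0]);
	return ret;
}
```

With three simulated processes where the last secondary fails, no process is
left with the device attached, which is the consistency property the rollback
is meant to give.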
detach device from primary
a) primary perform pre-detach check, if device is locked, goto i).
b) primary send pre-detach sync request to all secondary.
c) secondary perform pre-detach check and send reply.
d) primary check the reply if any fail goto i).
e) primary send detach sync request to all secondary
f) secondary detach the device and send reply (assume no fail)
g) primary detach the device.
h) detach success
i) detach failed
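The detach steps above amount to a check phase followed by a commit phase. A
minimal sketch of that ordering, assuming a hypothetical struct dev_view with a
"locked" flag standing in for the pre-detach check (the names and the -EBUSY
return value are illustrative and not taken from the patch):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process view of one device; index 0 is the primary. */
struct dev_view {
	bool attached;
	bool locked;   /* a locked device must not be detached */
};

/* Stand-in for the pre-detach check run in every process. */
static int pre_detach_check(const struct dev_view *v)
{
	return v->locked ? -EBUSY : 0;
}

/* Walks steps a) to i); nothing is detached unless every check passes. */
static int detach_from_primary(struct dev_view views[], size_t n)
{
	size_t i;

	assert(n >= 1);

	/* a) to d): primary first, then every secondary, must agree */
	for (i = 0; i < n; i++) {
		if (pre_detach_check(&views[i]) != 0)
			return -EBUSY;          /* i) detach failed */
	}

	/* e), f): secondaries detach (assumed not to fail) */
	for (i = 1; i < n; i++)
		views[i].attached = false;

	/* g) primary detaches last, h) detach success */
	views[0].attached = false;
	return 0;
}
```

Because the check phase touches no state, a refusal from any process leaves
the device attached everywhere and no rollback is needed in this sketch.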
Scenario for case 3, 4:
attach device from secondary:
a) secondary send async request to primary and wait on a condition
which will be released by matched response from primary.
b) primary receive the request and attach the new device if failed
goto i).
c) primary forward attach request to all secondary as async request
(because this in mp thread context, use sync request will deadlock,
same reason for all following async request.)
d) secondary receive request and attach device and send reply.
e) primary check the reply if all success go to j).
f) primary send attach rollback async request to all secondary.
g) secondary receive the request and detach device and send reply.
h) primary receive the reply and detach device as rollback action.
i) send fail response to secondary, goto k).
j) send success response to secondary.
k) secondary process receive response and return.
detach device from secondary:
a) secondary send async request to primary and wait on a condition
which will be released by matched response from primary.
b) primary receive the request and perform pre-detach check, if device
is locked, goto j).
c) primary send pre-detach async request to all secondary.
d) secondary perform pre-detach check and send reply.
e) primary check the reply if any fail goto j).
f) primary send detach async request to all secondary
g) secondary detach the device and send reply
h) primary detach the device.
i) send success response to secondary, goto k).
j) send fail response to secondary.
k) secondary process receive response and return.
API changes:
============
The scope of rte_eal_hotplug_add and rte_eal_hotplug_remove is extended.
In the primary-secondary process model, rte_eal_hotplug_add guarantees
that the device is attached on all processes, while rte_eal_hotplug_remove
guarantees that the device is detached on all processes.
PMD Impact:
===========
Currently device removal is not handled well in secondary processes by
most PMDs: rte_eth_dev_release_port is invoked and messes up the
primary process, since it resets all shared data. So we introduce a new API,
rte_eth_dev_release_port_secondary, which only resets the ethdev's state to unused
without touching shared data, so other processes are not impacted.
Since not all device drivers aim to support the primary-secondary
process model, the patch set only fixes this for PCI devices whose driver uses
rte_eth_dev_pci_generic_remove or rte_eth_dev_destroy, and for all
vdevs that support secondary processes. It can be used as a reference by other drivers
when an equivalent fix is required.
Example:
========
The patchset also contains an example to demonstrate device hotplug
in the multi-process model; detailed instructions are below.
/* start sample code as primary then secondary */
./hotplug_mp --proc-type=auto
Command Line Example:
>help
>list
/* attach a pci device */
> attach 0000:81:00.0
/* detach the pci device */
> detach 0000:81:00.0
/* attach a vdev af_packet device */
> attach net_af_packet,iface=eth0
/* detach the vdev af_packet device */
> detach net_af_packet
Qi Zhang (6):
ethdev: add function to release port in secondary process
eal: enable hotplug on multi-process
eal: support attach or detach share device from secondary
drivers/net: enable hotplug on secondary process
drivers/net: enable device detach on secondary
examples/multi_process: add hotplug sample
doc/guides/rel_notes/release_18_11.rst | 11 +
drivers/net/af_packet/rte_eth_af_packet.c | 6 +-
drivers/net/bnxt/bnxt_ethdev.c | 6 +-
drivers/net/bonding/rte_eth_bond_pmd.c | 6 +-
drivers/net/ena/ena_ethdev.c | 2 +-
drivers/net/kni/rte_eth_kni.c | 6 +-
drivers/net/liquidio/lio_ethdev.c | 2 +-
drivers/net/null/rte_eth_null.c | 6 +-
drivers/net/octeontx/octeontx_ethdev.c | 8 +
drivers/net/pcap/rte_eth_pcap.c | 6 +-
drivers/net/tap/rte_eth_tap.c | 8 +-
drivers/net/vhost/rte_eth_vhost.c | 6 +-
drivers/net/virtio/virtio_ethdev.c | 2 +-
examples/multi_process/Makefile | 1 +
examples/multi_process/hotplug_mp/Makefile | 23 ++
examples/multi_process/hotplug_mp/commands.c | 214 ++++++++++++++
examples/multi_process/hotplug_mp/commands.h | 10 +
examples/multi_process/hotplug_mp/main.c | 41 +++
lib/librte_eal/bsdapp/eal/Makefile | 1 +
lib/librte_eal/common/eal_common_dev.c | 225 +++++++++++++-
lib/librte_eal/common/eal_private.h | 30 ++
lib/librte_eal/common/hotplug_mp.c | 426 +++++++++++++++++++++++++++
lib/librte_eal/common/hotplug_mp.h | 46 +++
lib/librte_eal/common/include/rte_dev.h | 9 +
lib/librte_eal/common/meson.build | 1 +
lib/librte_eal/linuxapp/eal/Makefile | 1 +
lib/librte_eal/linuxapp/eal/eal.c | 6 +
lib/librte_ethdev/rte_ethdev.c | 17 +-
lib/librte_ethdev/rte_ethdev_driver.h | 16 +-
lib/librte_ethdev/rte_ethdev_pci.h | 10 +-
lib/librte_ethdev/rte_ethdev_version.map | 7 +
31 files changed, 1126 insertions(+), 33 deletions(-)
create mode 100644 examples/multi_process/hotplug_mp/Makefile
create mode 100644 examples/multi_process/hotplug_mp/commands.c
create mode 100644 examples/multi_process/hotplug_mp/commands.h
create mode 100644 examples/multi_process/hotplug_mp/main.c
create mode 100644 lib/librte_eal/common/hotplug_mp.c
create mode 100644 lib/librte_eal/common/hotplug_mp.h
--
2.13.6
^ permalink raw reply [relevance 1%]
* Re: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
@ 2018-09-28 1:00 3% ` Wang, Yipeng1
2018-09-28 8:26 4% ` Bruce Richardson
2018-09-30 23:05 0% ` Honnappa Nagarahalli
0 siblings, 2 replies; 200+ results
From: Wang, Yipeng1 @ 2018-09-28 1:00 UTC (permalink / raw)
To: Honnappa Nagarahalli, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, gavin.hu, steve.capper, ola.liljedahl, nd
Reply inlined:
>-----Original Message-----
>From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Honnappa Nagarahalli
>Sent: Thursday, September 6, 2018 10:12 AM
>To: Richardson, Bruce <bruce.richardson@intel.com>; De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
>Cc: dev@dpdk.org; honnappa.nagarahalli@dpdk.org; gavin.hu@arm.com; steve.capper@arm.com; ola.liljedahl@arm.com;
>nd@arm.com; Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
>Subject: [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys
>
>Reader-writer concurrency issue, caused by moving the keys
>to their alternative locations during key insert, is solved
>by introducing a global counter(tbl_chng_cnt) indicating a
>change in table.
>
>@@ -662,6 +679,20 @@ rte_hash_cuckoo_move_insert_mw(const struct rte_hash *h,
> curr_bkt = curr_node->bkt;
> }
>
>+ /* Inform the previous move. The current move need
>+ * not be informed now as the current bucket entry
>+ * is present in both primary and secondary.
>+ * Since there is one writer, load acquires on
>+ * tbl_chng_cnt are not required.
>+ */
>+ __atomic_store_n(&h->tbl_chng_cnt,
>+ h->tbl_chng_cnt + 1,
>+ __ATOMIC_RELEASE);
>+ /* The stores to sig_alt and sig_current should not
>+ * move above the store to tbl_chng_cnt.
>+ */
>+ __atomic_thread_fence(__ATOMIC_RELEASE);
>+
[Wang, Yipeng] I believe for X86 this fence should not be compiled to any code, otherwise
we need macros for the compile time check.
>@@ -926,30 +957,56 @@ __rte_hash_lookup_with_hash(const struct rte_hash *h, const void *key,
> uint32_t bucket_idx;
> hash_sig_t alt_hash;
> struct rte_hash_bucket *bkt;
>+ uint32_t cnt_b, cnt_a;
> int ret;
>
>- bucket_idx = sig & h->bucket_bitmask;
>- bkt = &h->buckets[bucket_idx];
>-
> __hash_rw_reader_lock(h);
>
>- /* Check if key is in primary location */
>- ret = search_one_bucket(h, key, sig, data, bkt);
>- if (ret != -1) {
>- __hash_rw_reader_unlock(h);
>- return ret;
>- }
>- /* Calculate secondary hash */
>- alt_hash = rte_hash_secondary_hash(sig);
>- bucket_idx = alt_hash & h->bucket_bitmask;
>- bkt = &h->buckets[bucket_idx];
>+ do {
[Wang, Yipeng] As far as I know, the MemC3 paper "MemC3: Compact and Concurrent
MemCache with Dumber Caching and Smarter Hashing"
and the OvS cmap both use a similar version counter to implement read-write concurrency for a hash table,
but one difference is that the reader checks whether the version counter is even or odd to make sure there is no
concurrent writer. Could you double-check and confirm that this is not needed for your implementation?
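For reference, a minimal sketch of the even/odd scheme I mean, assuming a single
writer and C11 atomics. The names (versioned_slot, slot_init, slot_write, slot_read)
are hypothetical and only for illustration; this is not the rte_hash code, nor
a copy of the MemC3 or OvS implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical single entry guarded by a version counter; not rte_hash. */
struct versioned_slot {
	_Atomic uint32_t version; /* odd while the writer is mid-update */
	_Atomic uint32_t key;
	_Atomic uint32_t value;
};

static void
slot_init(struct versioned_slot *s)
{
	atomic_init(&s->version, 0);
	atomic_init(&s->key, 0);
	atomic_init(&s->value, 0);
}

/* Single writer: make the counter odd, update, then make it even again. */
static void
slot_write(struct versioned_slot *s, uint32_t key, uint32_t value)
{
	uint32_t v = atomic_load_explicit(&s->version, memory_order_relaxed);

	assert((v & 1) == 0); /* one writer only, so no update is in flight */
	atomic_store_explicit(&s->version, v + 1, memory_order_relaxed);
	/* Keep the data stores below from moving above the odd store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->key, key, memory_order_relaxed);
	atomic_store_explicit(&s->value, value, memory_order_relaxed);
	/* Publish the data before the counter becomes even again. */
	atomic_store_explicit(&s->version, v + 2, memory_order_release);
}

/* Reader: retry if the counter was odd or changed while reading. */
static int
slot_read(struct versioned_slot *s, uint32_t key, uint32_t *value)
{
	uint32_t before, after, k, val;

	do {
		before = atomic_load_explicit(&s->version, memory_order_acquire);
		k = atomic_load_explicit(&s->key, memory_order_relaxed);
		val = atomic_load_explicit(&s->value, memory_order_relaxed);
		/* Keep the data loads above from moving below the re-check. */
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&s->version, memory_order_relaxed);
	} while ((before & 1) != 0 || before != after);

	if (k != key)
		return -1;
	*value = val;
	return 0;
}
```

The odd state lets a reader detect a write that is still in progress, not only
one that completed between its two counter loads. Whether that matters here
depends on whether a reader can ever observe a half-moved entry.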
>--- a/lib/librte_hash/rte_hash.h
>+++ b/lib/librte_hash/rte_hash.h
>@@ -156,7 +156,7 @@ rte_hash_count(const struct rte_hash *h);
> * - -ENOSPC if there is no space in the hash for this key.
> */
> int
>-rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
>+rte_hash_add_key_data(struct rte_hash *h, const void *key, void *data);
>
> /**
> * Add a key-value pair with a pre-computed hash value
>@@ -180,7 +180,7 @@ rte_hash_add_key_data(const struct rte_hash *h, const void *key, void *data);
> * - -ENOSPC if there is no space in the hash for this key.
> */
> int32_t
>-rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
>+rte_hash_add_key_with_hash_data(struct rte_hash *h, const void *key,
> hash_sig_t sig, void *data);
>
> /**
>@@ -200,7 +200,7 @@ rte_hash_add_key_with_hash_data(const struct rte_hash *h, const void *key,
> * array of user data. This value is unique for this key.
> */
> int32_t
>-rte_hash_add_key(const struct rte_hash *h, const void *key);
>+rte_hash_add_key(struct rte_hash *h, const void *key);
>
> /**
> * Add a key to an existing hash table.
>@@ -222,7 +222,7 @@ rte_hash_add_key(const struct rte_hash *h, const void *key);
> * array of user data. This value is unique for this key.
> */
> int32_t
>-rte_hash_add_key_with_hash(const struct rte_hash *h, const void *key, hash_sig_t sig);
>+rte_hash_add_key_with_hash(struct rte_hash *h, const void *key, hash_sig_t sig);
>
> /
[Wang, Yipeng] I think the above changes will break the ABI by changing the parameter type? Others may know better on this.
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-27 13:42 0% ` Alejandro Lucero
@ 2018-09-27 14:04 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-09-27 14:04 UTC (permalink / raw)
To: Alejandro Lucero
Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko
On 27-Sep-18 2:42 PM, Alejandro Lucero wrote:
>
>
> On Thu, Sep 27, 2018 at 2:22 PM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
>
> On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> > On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <anatoly.burakov@intel.com>
> > wrote:
> >
> >> We will be assigning "invalid" socket ID's to external heap, and
> >> malloc will now be able to verify if a supplied socket ID is in
> >> fact a valid one, rendering parameter checks for sockets
> >> obsolete.
> >>
> >> This changes the semantics of what we understand by "socket ID",
> >> so document the change in the release notes.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >> doc/guides/rel_notes/release_18_11.rst | 7 +++++++
> >> lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
> >> lib/librte_eal/common/malloc_heap.c | 2 +-
> >> lib/librte_eal/common/rte_malloc.c | 4 ----
> >> 4 files changed, 13 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/doc/guides/rel_notes/release_18_11.rst
> >> b/doc/guides/rel_notes/release_18_11.rst
> >> index 5fc71e208..6ee236302 100644
> >> --- a/doc/guides/rel_notes/release_18_11.rst
> >> +++ b/doc/guides/rel_notes/release_18_11.rst
> >> @@ -98,6 +98,13 @@ API Changes
> >> users of memseg-walk-related functions, as they will now
> have to skip
> >> externally allocated segments in most cases if the intent
> is to only
> >> iterate
> >> over internal DPDK memory.
> >> + - ``socket_id`` parameter across the entire DPDK has gained
> additional
> >> + meaning, as some socket ID's will now be representing
> externally
> >> allocated
> >> + memory. No changes will be required for existing code as
> backwards
> >> + compatibility will be kept, and those who do not use this
> feature
> >> will not
> >> + see these extra socket ID's. Any new API's must not check
> socket ID
> >> + parameters themselves, and must instead leave it to the memory
> >> subsystem to
> >> + decide whether socket ID is a valid one.
> >>
> >> ABI Changes
> >> -----------
> >> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> >> b/lib/librte_eal/common/eal_common_memzone.c
> >> index 7300fe05d..b7081afbf 100644
> >> --- a/lib/librte_eal/common/eal_common_memzone.c
> >> +++ b/lib/librte_eal/common/eal_common_memzone.c
> >> @@ -120,13 +120,15 @@
> memzone_reserve_aligned_thread_unsafe(const char
> >> *name, size_t len,
> >> return NULL;
> >> }
> >>
> >> - if ((socket_id != SOCKET_ID_ANY) &&
> >> - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> >> + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
> >>
> >
> > Would it not be better to use RTE_MAX_HEAPS instead of removing
> the check?
>
> First of all, maximum number of heaps should not concern the rest of
> the
> code - this is purely internal detail of rte_malloc.
>
>
> In a previous patch you say that:
>
> "Switch over all parts of EAL to use heap ID instead of NUMA node
> ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
> node's index within the detected NUMA node list. Heap ID for
> external heaps will be order of their creation."
>
> If I understand this right, heaps linked to physical sockets get a heap
> ID, and then external heaps will get IDs starting from the highest
> socket/heap ID + 1.
Yes and no.
Socket ID is an externally visible identification of "where to allocate
from" (a heap). Heap ID is used internally. Normally, there is a 1:1
correspondence of NUMA node to heap ID, but there may be cases where
e.g. only NUMA nodes 0 and 7 are detected, so you'll have socket 0 and 7
as valid socket ID's. However, these socket ID's will be internally
resolved into heap ID's 0 and 1, not 0 and 7.
So, in *most* cases, socket ID for an internal heap is equivalent to its
heap ID, but it is by accident. Heap ID is an internal identifier used
by the malloc heap, and it is not visible externally - it is only known
to malloc itself. Even memzone knows nothing about heap ID's - only
socket ID's.
> So, assuming RTE_MAX_HEAPS is really the maximum number of allowed heaps
> (which does not seem so reading your next paragraph), it would be a good
> sanity check to use RTE_MAX_HEAPS for the socket id.
>
> More importantly, socket ID is completely independent from number of
> heaps. Socket ID is incremented each time a new heap is created, and
> they are not reused. If you create and destroy a heap 100 times -
> you'll
> get 100 different socket ID's, even though max number of heaps is less
> than that.
>
>
> I do not understand this. It is true there is no check regarding
> RTE_MAX_HEAPS when creating new heaps,
There is one :) RTE_MAX_HEAPS is length of malloc heaps array (shared in
memory). If we cannot find a vacant spot in heaps array, the heap will
not be created.
However, *socket ID* is indeed limited only to INT_MAX. Socket ID is not
heap ID - socket ID is an externally visible identifier. Multiple socket
ID's can resolve to the same heap ID.
For example, if you create and destroy a heap 5 times one after the
other, you'll get 5 different socket ID's, but all of them would have
pointed to the same heap ID (but not at the same time).
So, semantically speaking, heap ID isn't really "an ID" as such, it's an
index into heap array. Unlike socket ID, it has no meaning.
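To make the distinction concrete, here is a rough sketch of the idea, not the
actual malloc code. Everything in it is hypothetical and for illustration only:
MAX_HEAPS stands in for RTE_MAX_HEAPS, FIRST_EXT_SOCKET is an arbitrary starting
value for external socket ID's, and the struct and function names are made up.

```c
#include <assert.h>

#define MAX_HEAPS 4          /* hypothetical stand-in for RTE_MAX_HEAPS */
#define FIRST_EXT_SOCKET 256 /* hypothetical first external socket ID */

struct heap_slot {
	int in_use;
	int socket_id; /* externally visible identifier */
};

struct heap_table {
	struct heap_slot heaps[MAX_HEAPS]; /* heap ID == index in this array */
	int next_socket_id;                /* only grows, ID's are not reused */
};

static void
heap_table_init(struct heap_table *t)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		t->heaps[i].in_use = 0;
		t->heaps[i].socket_id = -1;
	}
	t->next_socket_id = FIRST_EXT_SOCKET;
}

/* Put a socket ID into the first vacant slot; return heap ID or -1. */
static int
heap_add(struct heap_table *t, int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (!t->heaps[i].in_use) {
			t->heaps[i].in_use = 1;
			t->heaps[i].socket_id = socket_id;
			return i;
		}
	}
	return -1; /* heap array is full, heap cannot be created */
}

/* Resolve an external socket ID into the internal heap ID, or -1. */
static int
socket_to_heap_idx(const struct heap_table *t, int socket_id)
{
	int i;

	for (i = 0; i < MAX_HEAPS; i++) {
		if (t->heaps[i].in_use && t->heaps[i].socket_id == socket_id)
			return i;
	}
	return -1;
}

/* Create an external heap; return its new socket ID or -1. */
static int
heap_create_external(struct heap_table *t)
{
	int socket_id = t->next_socket_id;

	if (heap_add(t, socket_id) < 0)
		return -1;
	t->next_socket_id++;
	return socket_id;
}

/* Destroy a heap: the slot becomes vacant, the socket ID is retired. */
static void
heap_destroy(struct heap_table *t, int socket_id)
{
	int idx = socket_to_heap_idx(t, socket_id);

	if (idx >= 0) {
		t->heaps[idx].in_use = 0;
		t->heaps[idx].socket_id = -1;
	}
}
```

With NUMA nodes 0 and 7 added first, they resolve to heap ID's 0 and 1. Creating
and destroying an external heap repeatedly then yields a new socket ID every
time, while the slot it lands in stays the same.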
> then not sure what the limit
> refers to. And then there is code like dumping heaps info or getting
> info from the heap based on socket id that will not work.
It is probably unclear because the ordering of this patchset is not
ideal (and I'm not sure how to make it any better).
The code for dumping or getting heap info accepts a socket ID, but it
translates it into a heap ID, because that's what malloc uses internally
to differentiate between the heaps. Heap ID is there to break the dependency
between NUMA node ID and position in the malloc heap array.
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-27 13:21 0% ` Burakov, Anatoly
@ 2018-09-27 13:42 0% ` Alejandro Lucero
2018-09-27 14:04 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-09-27 13:42 UTC (permalink / raw)
To: Burakov, Anatoly
Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko
On Thu, Sep 27, 2018 at 2:22 PM Burakov, Anatoly <anatoly.burakov@intel.com>
wrote:
> On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> > On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <
> anatoly.burakov@intel.com>
> > wrote:
> >
> >> We will be assigning "invalid" socket ID's to external heap, and
> >> malloc will now be able to verify if a supplied socket ID is in
> >> fact a valid one, rendering parameter checks for sockets
> >> obsolete.
> >>
> >> This changes the semantics of what we understand by "socket ID",
> >> so document the change in the release notes.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> >> ---
> >> doc/guides/rel_notes/release_18_11.rst | 7 +++++++
> >> lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
> >> lib/librte_eal/common/malloc_heap.c | 2 +-
> >> lib/librte_eal/common/rte_malloc.c | 4 ----
> >> 4 files changed, 13 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/doc/guides/rel_notes/release_18_11.rst
> >> b/doc/guides/rel_notes/release_18_11.rst
> >> index 5fc71e208..6ee236302 100644
> >> --- a/doc/guides/rel_notes/release_18_11.rst
> >> +++ b/doc/guides/rel_notes/release_18_11.rst
> >> @@ -98,6 +98,13 @@ API Changes
> >> users of memseg-walk-related functions, as they will now have to
> skip
> >> externally allocated segments in most cases if the intent is to
> only
> >> iterate
> >> over internal DPDK memory.
> >> + - ``socket_id`` parameter across the entire DPDK has gained
> additional
> >> + meaning, as some socket ID's will now be representing externally
> >> allocated
> >> + memory. No changes will be required for existing code as backwards
> >> + compatibility will be kept, and those who do not use this feature
> >> will not
> >> + see these extra socket ID's. Any new API's must not check socket ID
> >> + parameters themselves, and must instead leave it to the memory
> >> subsystem to
> >> + decide whether socket ID is a valid one.
> >>
> >> ABI Changes
> >> -----------
> >> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> >> b/lib/librte_eal/common/eal_common_memzone.c
> >> index 7300fe05d..b7081afbf 100644
> >> --- a/lib/librte_eal/common/eal_common_memzone.c
> >> +++ b/lib/librte_eal/common/eal_common_memzone.c
> >> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
> >> *name, size_t len,
> >> return NULL;
> >> }
> >>
> >> - if ((socket_id != SOCKET_ID_ANY) &&
> >> - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> >> + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
> >>
> >
> > Would it not be better to use RTE_MAX_HEAPS instead of removing the
> check?
>
> First of all, maximum number of heaps should not concern the rest of the
> code - this is purely internal detail of rte_malloc.
>
>
In a previous patch you say that:
"Switch over all parts of EAL to use heap ID instead of NUMA node
ID to identify heaps. Heap ID for DPDK-internal heaps is NUMA
node's index within the detected NUMA node list. Heap ID for
external heaps will be order of their creation."
If I understand this right, heaps linked to physical sockets get a heap ID,
and then external heaps will get IDs starting from the highest socket/heap
ID + 1.
So, assuming RTE_MAX_HEAPS is really the maximum number of allowed heaps
(which does not seem so reading your next paragraph), it would be a good
sanity check to use RTE_MAX_HEAPS for the socket id.
> More importantly, socket ID is completely independent from number of
> heaps. Socket ID is incremented each time a new heap is created, and
> they are not reused. If you create and destroy a heap 100 times - you'll
> get 100 different socket ID's, even though max number of heaps is less
> than that.
>
>
I do not understand this. It is true there is no check regarding
RTE_MAX_HEAPS when creating new heaps, so I am not sure what the limit refers
to. And then there is code, like dumping heap info or getting info from a
heap based on socket ID, that will not work.
> >
> >
> >
> >> rte_errno = EINVAL;
> >> return NULL;
> >> }
> >>
> >> - if (!rte_eal_has_hugepages())
> >> + /* only set socket to SOCKET_ID_ANY if we aren't allocating for
> an
> >> + * external heap.
> >> + */
> >> + if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
> >> socket_id = SOCKET_ID_ANY;
> >>
> >> contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
> >> diff --git a/lib/librte_eal/common/malloc_heap.c
> >> b/lib/librte_eal/common/malloc_heap.c
> >> index 1d1e35708..73e478076 100644
> >> --- a/lib/librte_eal/common/malloc_heap.c
> >> +++ b/lib/librte_eal/common/malloc_heap.c
> >> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
> >> socket_arg,
> >> if (size == 0 || (align && !rte_is_power_of_2(align)))
> >> return NULL;
> >>
> >> - if (!rte_eal_has_hugepages())
> >> + if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
> >> socket_arg = SOCKET_ID_ANY;
> >>
> >> if (socket_arg == SOCKET_ID_ANY)
> >> diff --git a/lib/librte_eal/common/rte_malloc.c
> >> b/lib/librte_eal/common/rte_malloc.c
> >> index 73d6df31d..9ba1472c3 100644
> >> --- a/lib/librte_eal/common/rte_malloc.c
> >> +++ b/lib/librte_eal/common/rte_malloc.c
> >> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
> >> unsigned int align,
> >> if (!rte_eal_has_hugepages())
> >> socket_arg = SOCKET_ID_ANY;
> >>
> >> - /* Check socket parameter */
> >> - if (socket_arg >= RTE_MAX_NUMA_NODES)
> >> - return NULL;
> >> -
> >>
> >
> > Same as before. Better to keep the sanity check using RTE_MAX_HEAPS.
>
> same as above :)
>
>
> --
> Thanks,
> Anatoly
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-27 13:14 0% ` Alejandro Lucero
@ 2018-09-27 13:21 0% ` Burakov, Anatoly
2018-09-27 13:42 0% ` Alejandro Lucero
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-09-27 13:21 UTC (permalink / raw)
To: Alejandro Lucero
Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko
On 27-Sep-18 2:14 PM, Alejandro Lucero wrote:
> On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <anatoly.burakov@intel.com>
> wrote:
>
>> We will be assigning "invalid" socket ID's to external heap, and
>> malloc will now be able to verify if a supplied socket ID is in
>> fact a valid one, rendering parameter checks for sockets
>> obsolete.
>>
>> This changes the semantics of what we understand by "socket ID",
>> so document the change in the release notes.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>> doc/guides/rel_notes/release_18_11.rst | 7 +++++++
>> lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
>> lib/librte_eal/common/malloc_heap.c | 2 +-
>> lib/librte_eal/common/rte_malloc.c | 4 ----
>> 4 files changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/doc/guides/rel_notes/release_18_11.rst
>> b/doc/guides/rel_notes/release_18_11.rst
>> index 5fc71e208..6ee236302 100644
>> --- a/doc/guides/rel_notes/release_18_11.rst
>> +++ b/doc/guides/rel_notes/release_18_11.rst
>> @@ -98,6 +98,13 @@ API Changes
>> users of memseg-walk-related functions, as they will now have to skip
>> externally allocated segments in most cases if the intent is to only
>> iterate
>> over internal DPDK memory.
>> + - ``socket_id`` parameter across the entire DPDK has gained additional
>> + meaning, as some socket ID's will now be representing externally
>> allocated
>> + memory. No changes will be required for existing code as backwards
>> + compatibility will be kept, and those who do not use this feature
>> will not
>> + see these extra socket ID's. Any new API's must not check socket ID
>> + parameters themselves, and must instead leave it to the memory
>> subsystem to
>> + decide whether socket ID is a valid one.
>>
>> ABI Changes
>> -----------
>> diff --git a/lib/librte_eal/common/eal_common_memzone.c
>> b/lib/librte_eal/common/eal_common_memzone.c
>> index 7300fe05d..b7081afbf 100644
>> --- a/lib/librte_eal/common/eal_common_memzone.c
>> +++ b/lib/librte_eal/common/eal_common_memzone.c
>> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
>> *name, size_t len,
>> return NULL;
>> }
>>
>> - if ((socket_id != SOCKET_ID_ANY) &&
>> - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
>> + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>>
>
> Would it not be better to use RTE_MAX_HEAPS instead of removing the check?
First of all, maximum number of heaps should not concern the rest of the
code - this is purely internal detail of rte_malloc.
More importantly, socket ID is completely independent from number of
heaps. Socket ID is incremented each time a new heap is created, and
they are not reused. If you create and destroy a heap 100 times - you'll
get 100 different socket ID's, even though max number of heaps is less
than that.
>
>
>
>> rte_errno = EINVAL;
>> return NULL;
>> }
>>
>> - if (!rte_eal_has_hugepages())
>> + /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
>> + * external heap.
>> + */
>> + if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
>> socket_id = SOCKET_ID_ANY;
>>
>> contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
>> diff --git a/lib/librte_eal/common/malloc_heap.c
>> b/lib/librte_eal/common/malloc_heap.c
>> index 1d1e35708..73e478076 100644
>> --- a/lib/librte_eal/common/malloc_heap.c
>> +++ b/lib/librte_eal/common/malloc_heap.c
>> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
>> socket_arg,
>> if (size == 0 || (align && !rte_is_power_of_2(align)))
>> return NULL;
>>
>> - if (!rte_eal_has_hugepages())
>> + if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
>> socket_arg = SOCKET_ID_ANY;
>>
>> if (socket_arg == SOCKET_ID_ANY)
>> diff --git a/lib/librte_eal/common/rte_malloc.c
>> b/lib/librte_eal/common/rte_malloc.c
>> index 73d6df31d..9ba1472c3 100644
>> --- a/lib/librte_eal/common/rte_malloc.c
>> +++ b/lib/librte_eal/common/rte_malloc.c
>> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
>> unsigned int align,
>> if (!rte_eal_has_hugepages())
>> socket_arg = SOCKET_ID_ANY;
>>
>> - /* Check socket parameter */
>> - if (socket_arg >= RTE_MAX_NUMA_NODES)
>> - return NULL;
>> -
>>
>
> Same as before. Better to keep the sanity check using RTE_MAX_HEAPS.
same as above :)
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-27 13:14 0% ` Alejandro Lucero
2018-09-27 13:21 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Alejandro Lucero @ 2018-09-27 13:14 UTC (permalink / raw)
To: Burakov, Anatoly
Cc: dev, Mcnamara, John, marko.kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
Ajit Khaparde, Wiles, Keith, Bruce Richardson, Thomas Monjalon,
Shreyansh Jain, Shahaf Shuler, Andrew Rybchenko
On Thu, Sep 27, 2018 at 11:41 AM Anatoly Burakov <anatoly.burakov@intel.com>
wrote:
> We will be assigning "invalid" socket ID's to external heap, and
> malloc will now be able to verify if a supplied socket ID is in
> fact a valid one, rendering parameter checks for sockets
> obsolete.
>
> This changes the semantics of what we understand by "socket ID",
> so document the change in the release notes.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> doc/guides/rel_notes/release_18_11.rst | 7 +++++++
> lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
> lib/librte_eal/common/malloc_heap.c | 2 +-
> lib/librte_eal/common/rte_malloc.c | 4 ----
> 4 files changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst
> b/doc/guides/rel_notes/release_18_11.rst
> index 5fc71e208..6ee236302 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -98,6 +98,13 @@ API Changes
> users of memseg-walk-related functions, as they will now have to skip
> externally allocated segments in most cases if the intent is to only
> iterate
> over internal DPDK memory.
> + - ``socket_id`` parameter across the entire DPDK has gained additional
> + meaning, as some socket ID's will now be representing externally
> allocated
> + memory. No changes will be required for existing code as backwards
> + compatibility will be kept, and those who do not use this feature
> will not
> + see these extra socket ID's. Any new API's must not check socket ID
> + parameters themselves, and must instead leave it to the memory
> subsystem to
> + decide whether socket ID is a valid one.
>
> ABI Changes
> -----------
> diff --git a/lib/librte_eal/common/eal_common_memzone.c
> b/lib/librte_eal/common/eal_common_memzone.c
> index 7300fe05d..b7081afbf 100644
> --- a/lib/librte_eal/common/eal_common_memzone.c
> +++ b/lib/librte_eal/common/eal_common_memzone.c
> @@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char
> *name, size_t len,
> return NULL;
> }
>
> - if ((socket_id != SOCKET_ID_ANY) &&
> - (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
> + if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
>
Would it not be better to use RTE_MAX_HEAPS instead of removing the check?
> rte_errno = EINVAL;
> return NULL;
> }
>
> - if (!rte_eal_has_hugepages())
> + /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
> + * external heap.
> + */
> + if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
> socket_id = SOCKET_ID_ANY;
>
> contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
> diff --git a/lib/librte_eal/common/malloc_heap.c
> b/lib/librte_eal/common/malloc_heap.c
> index 1d1e35708..73e478076 100644
> --- a/lib/librte_eal/common/malloc_heap.c
> +++ b/lib/librte_eal/common/malloc_heap.c
> @@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int
> socket_arg,
> if (size == 0 || (align && !rte_is_power_of_2(align)))
> return NULL;
>
> - if (!rte_eal_has_hugepages())
> + if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
> socket_arg = SOCKET_ID_ANY;
>
> if (socket_arg == SOCKET_ID_ANY)
> diff --git a/lib/librte_eal/common/rte_malloc.c
> b/lib/librte_eal/common/rte_malloc.c
> index 73d6df31d..9ba1472c3 100644
> --- a/lib/librte_eal/common/rte_malloc.c
> +++ b/lib/librte_eal/common/rte_malloc.c
> @@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size,
> unsigned int align,
> if (!rte_eal_has_hugepages())
> socket_arg = SOCKET_ID_ANY;
>
> - /* Check socket parameter */
> - if (socket_arg >= RTE_MAX_NUMA_NODES)
> - return NULL;
> -
>
Same as before. Better to keep the sanity check using RTE_MAX_HEAPS.
> return malloc_heap_alloc(type, size, socket_arg, 0,
> align == 0 ? 1 : align, 0, false);
> }
> --
> 2.17.1
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 11:12 0% ` Shreyansh Jain
@ 2018-09-27 11:29 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-09-27 11:29 UTC (permalink / raw)
To: Shreyansh Jain, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
On 27-Sep-18 12:12 PM, Shreyansh Jain wrote:
> On Thursday 27 September 2018 04:38 PM, Burakov, Anatoly wrote:
>> On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
>>> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>>>> When we allocate and use DPDK memory, we need to be able to
>>>> differentiate between DPDK hugepage segments and segments that
>>>> were made part of DPDK but are externally allocated. Add such
>>>> a property to memseg lists.
>>>>
>>>> This breaks the ABI, so bump the EAL library ABI version and
>>>> document the change in release notes. This also breaks a few
>>>> internal assumptions about memory contiguousness, so adjust
>>>> malloc code in a few places.
>>>>
>>>> All current calls for memseg walk functions were adjusted to
>>>> ignore external segments where it made sense.
>>>>
>>>> Mempools is a special case, because we may be asked to allocate
>>>> a mempool on a specific socket, and we need to ignore all page
>>>> sizes on other heaps or other sockets. Previously, this
>>>> assumption of knowing all page sizes was not a problem, but it
>>>> will be now, so we have to match socket ID with page size when
>>>> calculating minimum page size for a mempool.
>>>>
>>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>>> ---
>>>>
>>>
>>> Specifically for bus/fslmc perspective and generically for others:
>>>
>>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>>
>>>
>>
>> Actually, this patch may need some further adjustment, since it makes
>> assumption about not wanting to map external memory for DMA.
>>
>> Specifically - there's an fslmc dma map function that now skips
>> external memory segments. Are you sure that's how it's supposed to be?
>>
>
> I thought over that.
> For now yes. If we need to map external memory, and there is an event
> that would be called back, it should be handled separately. So, for
> example, a PMD level API to handle such requests from applications.
Well, technically such an event is already available, now that external
memory allocations trigger mem events :)
>
> The point is that how the external memory is handled is use-case
> specific - the need to have its events reported back is definitely
> there, but its handling is still a grey area.
>
> Once the patches make their way in, I can always come back and tune that.
>
OK, fair enough.
--
Thanks,
Anatoly
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 11:08 0% ` Burakov, Anatoly
@ 2018-09-27 11:12 0% ` Shreyansh Jain
2018-09-27 11:29 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-09-27 11:12 UTC (permalink / raw)
To: Burakov, Anatoly, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
On Thursday 27 September 2018 04:38 PM, Burakov, Anatoly wrote:
> On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
>> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>>> When we allocate and use DPDK memory, we need to be able to
>>> differentiate between DPDK hugepage segments and segments that
>>> were made part of DPDK but are externally allocated. Add such
>>> a property to memseg lists.
>>>
>>> This breaks the ABI, so bump the EAL library ABI version and
>>> document the change in release notes. This also breaks a few
>>> internal assumptions about memory contiguousness, so adjust
>>> malloc code in a few places.
>>>
>>> All current calls for memseg walk functions were adjusted to
>>> ignore external segments where it made sense.
>>>
>>> Mempools is a special case, because we may be asked to allocate
>>> a mempool on a specific socket, and we need to ignore all page
>>> sizes on other heaps or other sockets. Previously, this
>>> assumption of knowing all page sizes was not a problem, but it
>>> will be now, so we have to match socket ID with page size when
>>> calculating minimum page size for a mempool.
>>>
>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>> ---
>>>
>>
>> Specifically for bus/fslmc perspective and generically for others:
>>
>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>>
>>
>
> Actually, this patch may need some further adjustment, since it makes
> assumption about not wanting to map external memory for DMA.
>
> Specifically - there's an fslmc dma map function that now skips external
> memory segments. Are you sure that's how it's supposed to be?
>
I thought over that.
For now yes. If we need to map external memory, and there is an event
that would be called back, it should be handled separately. So, for
example, a PMD level API to handle such requests from applications.
The point is that how the external memory is handled is use-case
specific - the need to have its events reported back is definitely
there, but its handling is still a grey area.
Once the patches make their way in, I can always come back and tune that.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 11:03 0% ` Shreyansh Jain
@ 2018-09-27 11:08 0% ` Burakov, Anatoly
2018-09-27 11:12 0% ` Shreyansh Jain
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-09-27 11:08 UTC (permalink / raw)
To: Shreyansh Jain, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
On 27-Sep-18 12:03 PM, Shreyansh Jain wrote:
> On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
>> When we allocate and use DPDK memory, we need to be able to
>> differentiate between DPDK hugepage segments and segments that
>> were made part of DPDK but are externally allocated. Add such
>> a property to memseg lists.
>>
>> This breaks the ABI, so bump the EAL library ABI version and
>> document the change in release notes. This also breaks a few
>> internal assumptions about memory contiguousness, so adjust
>> malloc code in a few places.
>>
>> All current calls for memseg walk functions were adjusted to
>> ignore external segments where it made sense.
>>
>> Mempools are a special case, because we may be asked to allocate
>> a mempool on a specific socket, and we need to ignore all page
>> sizes on other heaps or other sockets. Previously, this
>> assumption of knowing all page sizes was not a problem, but it
>> will be now, so we have to match socket ID with page size when
>> calculating minimum page size for a mempool.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>> ---
>>
>
> Specifically for bus/fslmc perspective and generically for others:
>
> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>
>
Actually, this patch may need some further adjustment, since it makes
an assumption about not wanting to map external memory for DMA.
Specifically - there's an fslmc dma map function that now skips external
memory segments. Are you sure that's how it's supposed to be?
--
Thanks,
Anatoly
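
For readers following the question above, the trade-off can be sketched
outside of DPDK. The snippet below is only an illustration under stated
assumptions: mock_msl, mock_list_walk, map_arg and count_dma_mappable are
hypothetical names, not DPDK or fslmc code, and "mapping" is reduced to
counting segments. It shows a walk callback that skips external lists by
default, with a flag to include them if a driver decides external memory
should be mapped for DMA after all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_memseg_list (illustration only). */
struct mock_msl {
	unsigned int external; /* 1 if this list points to external memory */
	size_t n_segs;         /* number of segments in this list */
};

typedef int (*mock_walk_t)(const struct mock_msl *msl, void *arg);

/*
 * Minimal walker. Assumed convention: a non-zero return value from the
 * callback stops the walk and is passed back to the caller.
 */
int
mock_list_walk(const struct mock_msl *lists, size_t n, mock_walk_t func,
		void *arg)
{
	size_t i;

	assert(func != NULL);
	for (i = 0; i < n; i++) {
		int ret = func(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Hypothetical walk argument: a policy flag and a running total. */
struct map_arg {
	int map_external; /* non-zero: also count external lists */
	size_t mapped;    /* segments that would have been DMA-mapped */
};

/* Callback: skip external lists unless the caller asked for them. */
int
count_dma_mappable(const struct mock_msl *msl, void *arg)
{
	struct map_arg *ma = arg;

	if (msl->external && !ma->map_external)
		return 0;

	ma->mapped += msl->n_segs;
	return 0;
}
```

With map_external left at zero the callback behaves like the early return
added by the patch; setting it to one models the alternative raised in the
question, where external segments are mapped as well.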
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-27 11:03 0% ` Shreyansh Jain
2018-09-27 11:08 0% ` Burakov, Anatoly
2018-09-29 0:09 0% ` Yongseok Koh
1 sibling, 1 reply; 200+ results
From: Shreyansh Jain @ 2018-09-27 11:03 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Matan Azrad, Shahaf Shuler, Yongseok Koh, Maxime Coquelin,
Tiwei Bie, Zhihong Wang, Bruce Richardson, Olivier Matz,
Andrew Rybchenko, laszlo.madarassy, laszlo.vadkerti,
andras.kovacs, winnie.tian, daniel.andrasi, janos.kobor,
geza.koblo, srinath.mannam, scott.branden, ajit.khaparde,
keith.wiles, thomas, alejandro.lucero
On Thursday 27 September 2018 04:10 PM, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
>
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes. This also breaks a few
> internal assumptions about memory contiguousness, so adjust
> malloc code in a few places.
>
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
>
> Mempools are a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
>
Specifically for bus/fslmc perspective and generically for others:
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
^ permalink raw reply [relevance 0%]
* [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
@ 2018-09-27 10:40 16% ` Anatoly Burakov
2018-09-27 11:03 0% ` Shreyansh Jain
2018-09-29 0:09 0% ` Yongseok Koh
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
` (2 subsequent siblings)
4 siblings, 2 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:40 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, thomas, alejandro.lucero
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 13 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 7 ++--
drivers/net/mlx4/mlx4_mr.c | 3 ++
drivers/net/mlx5/mlx5.c | 5 ++-
drivers/net/mlx5/mlx5_mr.c | 3 ++
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
24 files changed, 134 insertions(+), 44 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -107,6 +114,10 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+ a new flag indicating whether the memseg list refers to external memory.
+
Removed Items
-------------
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
}
static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *n_segs = arg;
int ret;
+ if (msl->external)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
+
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
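
The mempool page size rule described in the commit message can be tried
in isolation with the sketch below. It is not the patch's code: pgsz_msl
and mock_min_page_size are hypothetical names, the memseg list walk is
replaced by a plain array loop, and MOCK_SOCKET_ID_ANY is assumed to
mirror DPDK's SOCKET_ID_ANY (-1). Only the selection logic is meant to
match find_min_pagesz() above: an exact socket ID match accepts native
and external lists alike, while "any socket" considers native lists only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_SOCKET_ID_ANY (-1) /* assumption: same value as SOCKET_ID_ANY */

/* Hypothetical stand-in for the memseg list fields used by the rule. */
struct pgsz_msl {
	int socket_id;         /* socket ID of all segments in the list */
	uint64_t page_sz;      /* page size of all segments in the list */
	unsigned int external; /* 1 if the list points to external memory */
};

/*
 * Return the smallest page size usable for socket_id, or fallback if no
 * list qualifies (the patch falls back to getpagesize()).
 */
size_t
mock_min_page_size(const struct pgsz_msl *lists, size_t n, int socket_id,
		size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	assert(lists != NULL || n == 0);
	for (i = 0; i < n; i++) {
		const struct pgsz_msl *msl = &lists[i];
		int valid;

		/* exact match: native or external memory both qualify */
		valid = msl->socket_id == socket_id;
		/* any socket: only native memory qualifies */
		valid |= socket_id == MOCK_SOCKET_ID_ANY && !msl->external;

		if (valid && msl->page_sz < min)
			min = (size_t)msl->page_sz;
	}
	return min == SIZE_MAX ? fallback : min;
}
```

The point of keying the search on socket ID is that an external heap with
small pages no longer drags down the page size chosen for a mempool that
lives on a native socket.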
^ permalink raw reply [relevance 16%]
* [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
` (2 preceding siblings ...)
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-27 10:41 9% ` Anatoly Burakov
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating " Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 1 +
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
* eal: EAL library ABI version was changed due to previously announced work on
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
+ Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
unsigned int socket_id;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..ac89d15a4 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1020,6 +1019,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
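
As a rough illustration of the naming scheme introduced above, the sketch
below builds "socket_<id>" names and looks a heap up by name. All
identifiers here (mock_heap, mock_default_heap_name, mock_heap_find) are
hypothetical and exist only for this example; the only things taken from
the patch are the "socket_%i" format and the 32-byte name limit. Unlike
the patch, the sketch reports truncation instead of silently shortening
the name, which is a design choice of the example, not of DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MOCK_HEAP_NAME_MAX_LEN 32 /* same value as RTE_HEAP_NAME_MAX_LEN */

/* Hypothetical stand-in for struct malloc_heap; only the name is kept. */
struct mock_heap {
	char name[MOCK_HEAP_NAME_MAX_LEN];
};

/*
 * Write the default heap name for a socket into dst.
 * Returns 0 on success, -1 if the name did not fit into len bytes.
 */
int
mock_default_heap_name(char *dst, size_t len, int socket_id)
{
	int n;

	assert(dst != NULL && len > 0);
	n = snprintf(dst, len, "socket_%i", socket_id);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return the index of the heap called name, or -1 if there is none. */
int
mock_heap_find(const struct mock_heap *heaps, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(heaps[i].name, name) == 0)
			return (int)i;
	}
	return -1;
}
```

A string key like this is what lets later patches in the series refer to
a heap by name rather than by its internal index.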
^ permalink raw reply [relevance 9%]
* [dpdk-dev] [PATCH v6 11/21] malloc: allow creating malloc heaps
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
` (3 preceding siblings ...)
2018-09-27 10:41 9% ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-27 10:41 4% ` Anatoly Burakov
4 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+ Structure ``rte_eal_memconfig`` has been extended to contain next socket
+ ID for externally allocated memory segments.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac89d15a4..987b83fb8 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1015,6 +1019,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1022,6 +1056,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-27 10:41 4% ` Anatoly Burakov
2018-09-27 13:14 0% ` Alejandro Lucero
2018-09-27 10:41 9% ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating " Anatoly Burakov
4 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:41 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko, alejandro.lucero
We will be assigning "invalid" socket ID's to external heaps, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
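
As a rough illustration of the relaxed checks, the two rules can be written as small standalone C helpers. This is only a sketch under stated assumptions: the ``model_*`` names and the NUMA node limit are hypothetical, and ``MODEL_SOCKET_ID_ANY`` mirrors the value of ``SOCKET_ID_ANY`` (-1).

```c
#include <stdbool.h>

#define MODEL_SOCKET_ID_ANY (-1)  /* mirrors SOCKET_ID_ANY */
#define MODEL_MAX_NUMA_NODES 8    /* hypothetical build-time limit */

/*
 * Front-end parameter check: only negative IDs other than "any" are
 * rejected. IDs at or above the NUMA node limit are passed through,
 * and the allocator decides whether a heap with that ID exists.
 */
static bool
model_socket_arg_ok(int socket_id)
{
	return socket_id == MODEL_SOCKET_ID_ANY || socket_id >= 0;
}

/*
 * Without hugepages, requests for a NUMA socket fall back to "any",
 * but a request for an external heap keeps its socket ID.
 */
static int
model_effective_socket(int socket_id, bool has_hugepages)
{
	if (!has_hugepages && socket_id < MODEL_MAX_NUMA_NODES)
		return MODEL_SOCKET_ID_ANY;
	return socket_id;
}
```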
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v6 00/21] Support externally allocated memory in DPDK
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
@ 2018-09-27 10:40 2% ` Anatoly Burakov
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
` (4 more replies)
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
4 siblings, 5 replies; 200+ results
From: Anatoly Burakov @ 2018-09-27 10:40 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko, alejandro.lucero
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done here:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
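
To make the socket-ID-based heap selection in the list above more concrete, here is a minimal standalone sketch. It is not the series' implementation: the ``model_*`` names, the table layout, and the NUMA node limit are assumptions made for illustration, and "external" is taken to mean a known heap whose socket ID is at or above that limit.

```c
#include <stddef.h>

#define MODEL_MAX_NUMA_NODES 8 /* hypothetical build-time limit */

struct model_heap {
	const char *name;
	int socket_id;
};

/* Return the index of the heap that owns socket_id, or -1 if none. */
static int
model_socket_to_heap(const struct model_heap *heaps, size_t n, int socket_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (heaps[i].socket_id == socket_id)
			return (int)i;
	}
	return -1;
}

/* Return 1 if socket_id belongs to a known external heap, 0 otherwise. */
static int
model_socket_is_external(const struct model_heap *heaps, size_t n,
		int socket_id)
{
	int idx = model_socket_to_heap(heaps, n, socket_id);

	if (idx < 0)
		return 0;
	return heaps[idx].socket_id >= MODEL_MAX_NUMA_NODES;
}
```

In this model, internal heaps sit at the front of the table and carry real NUMA node IDs, while externally created heaps follow with IDs from the reserved high range, so one lookup serves both kinds.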
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create heap and add memory into it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better because we
don't want to fail to create external heap in the primary because
something in the secondary has failed when in fact we may not even have
wanted this memory to be accessible in the secondary in the first
place.
Using external memory in multiprocess is *hard*, because not only
does memory space need to be preallocated, but it also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v6 -> v5 changes:
- Fixed documentation formatting as per Marko's comments
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 305 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 28 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 14 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx4/mlx4_mr.c | 3 +
drivers/net/mlx5/mlx5.c | 5 +-
drivers/net/mlx5/mlx5_mr.c | 3 +
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +-
.../net/virtio/virtio_user/virtio_user_dev.c | 8 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 9 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 316 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
47 files changed, 1913 insertions(+), 139 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
^ permalink raw reply [relevance 2%]
* [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
@ 2018-09-26 18:04 2% ` Shreyansh Jain
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 " Shreyansh Jain
2 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-09-26 18:04 UTC (permalink / raw)
To: dev, ferruh.yigit; +Cc: thomas, Hemant Agrawal
From: Hemant Agrawal <hemant.agrawal@nxp.com>
This patch adds support for the new Management Complex
firmware version 10.1x.x. One of the main changes in
the APIs is ordered queue support.
The fslmc bus lib ABI will need to be bumped to reflect
the MC FW API and structure changes.
This will also result in bumping of ABI version of all dependent
libs as they internally use the MC FW APIs and structures.
Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
drivers/bus/fslmc/Makefile | 2 +-
drivers/bus/fslmc/mc/dpbp.c | 10 +
drivers/bus/fslmc/mc/dpci.c | 197 ++++++++++++++++++++
drivers/bus/fslmc/mc/dpcon.c | 30 +++
drivers/bus/fslmc/mc/dpdmai.c | 14 ++
drivers/bus/fslmc/mc/dpio.c | 9 +
drivers/bus/fslmc/mc/fsl_dpbp.h | 1 +
drivers/bus/fslmc/mc/fsl_dpbp_cmd.h | 16 +-
drivers/bus/fslmc/mc/fsl_dpci.h | 47 ++++-
drivers/bus/fslmc/mc/fsl_dpci_cmd.h | 62 +++++-
drivers/bus/fslmc/mc/fsl_dpcon.h | 19 ++
drivers/bus/fslmc/mc/fsl_dpdmai.h | 5 +
drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h | 20 +-
drivers/bus/fslmc/mc/fsl_dpmng.h | 2 +-
drivers/bus/fslmc/mc/fsl_dpopr.h | 85 +++++++++
drivers/bus/fslmc/meson.build | 2 +
drivers/bus/fslmc/rte_bus_fslmc_version.map | 10 +
drivers/crypto/dpaa2_sec/Makefile | 2 +-
drivers/crypto/dpaa2_sec/meson.build | 2 +
drivers/event/dpaa2/Makefile | 2 +-
drivers/event/dpaa2/meson.build | 2 +
drivers/mempool/dpaa2/Makefile | 2 +-
drivers/mempool/dpaa2/meson.build | 2 +
drivers/net/dpaa2/Makefile | 2 +-
drivers/net/dpaa2/meson.build | 2 +
drivers/raw/dpaa2_cmdif/Makefile | 2 +-
drivers/raw/dpaa2_cmdif/meson.build | 2 +
drivers/raw/dpaa2_qdma/Makefile | 2 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.c | 14 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.h | 6 +-
drivers/raw/dpaa2_qdma/meson.build | 2 +
31 files changed, 541 insertions(+), 34 deletions(-)
create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h
diff --git a/drivers/bus/fslmc/Makefile b/drivers/bus/fslmc/Makefile
index 515d0f534..e95551980 100644
--- a/drivers/bus/fslmc/Makefile
+++ b/drivers/bus/fslmc/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_ethdev
EXPORT_MAP := rte_bus_fslmc_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
SRCS-$(CONFIG_RTE_LIBRTE_FSLMC_BUS) += \
qbman/qbman_portal.c \
diff --git a/drivers/bus/fslmc/mc/dpbp.c b/drivers/bus/fslmc/mc/dpbp.c
index 0215d22da..d9103409c 100644
--- a/drivers/bus/fslmc/mc/dpbp.c
+++ b/drivers/bus/fslmc/mc/dpbp.c
@@ -248,6 +248,16 @@ int dpbp_reset(struct fsl_mc_io *mc_io,
/* send command to mc*/
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpbp_get_attributes - Retrieve DPBP attributes.
+ *
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPBP object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpbp_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
diff --git a/drivers/bus/fslmc/mc/dpci.c b/drivers/bus/fslmc/mc/dpci.c
index ff366bfa9..95edae9d9 100644
--- a/drivers/bus/fslmc/mc/dpci.c
+++ b/drivers/bus/fslmc/mc/dpci.c
@@ -265,6 +265,15 @@ int dpci_reset(struct fsl_mc_io *mc_io,
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpci_get_attributes() - Retrieve DPCI attributes.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpci_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
@@ -292,6 +301,94 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
return 0;
}
+/**
+ * dpci_get_peer_attributes() - Retrieve peer DPCI attributes.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @attr: Returned peer attributes
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpci_peer_attr *attr)
+{
+ struct dpci_rsp_get_peer_attr *rsp_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_PEER_ATTR,
+ cmd_flags,
+ token);
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_peer_attr *)cmd.params;
+ attr->peer_id = le32_to_cpu(rsp_params->id);
+ attr->num_of_priorities = rsp_params->num_of_priorities;
+
+ return 0;
+}
+
+/**
+ * dpci_get_link_state() - Retrieve the DPCI link state.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @up: Returned link state; returns '1' if link is up, '0' otherwise
+ *
+ * DPCI can be connected to another DPCI, together they
+ * create a 'link'. In order to use the DPCI Tx and Rx queues,
+ * both objects must be enabled.
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ int *up)
+{
+ struct dpci_rsp_get_link_state *rsp_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_LINK_STATE,
+ cmd_flags,
+ token);
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_link_state *)cmd.params;
+ *up = dpci_get_field(rsp_params->up, UP);
+
+ return 0;
+}
+
+/**
+ * dpci_set_rx_queue() - Set Rx queue configuration
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @priority: Select the queue relative to number of
+ * priorities configured at DPCI creation; use
+ * DPCI_ALL_QUEUES to configure all Rx queues
+ * identically.
+ * @cfg: Rx queue configuration
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
@@ -314,6 +411,9 @@ int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
dpci_set_field(cmd_params->dest_type,
DEST_TYPE,
cfg->dest_cfg.dest_type);
+ dpci_set_field(cmd_params->dest_type,
+ ORDER_PRESERVATION,
+ cfg->order_preservation_en);
/* send command to mc*/
return mc_send_command(mc_io, &cmd);
@@ -438,3 +538,100 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
return 0;
}
+
+/**
+ * dpci_set_opr() - Set Order Restoration configuration.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @index: The queue index
+ * @options: Configuration mode options
+ * can be OPR_OPT_CREATE or OPR_OPT_RETIRE
+ * @cfg: Configuration options for the OPR
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ uint8_t options,
+ struct opr_cfg *cfg)
+{
+ struct dpci_cmd_set_opr *cmd_params;
+ struct mc_command cmd = { 0 };
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_SET_OPR,
+ cmd_flags,
+ token);
+ cmd_params = (struct dpci_cmd_set_opr *)cmd.params;
+ cmd_params->index = index;
+ cmd_params->options = options;
+ cmd_params->oloe = cfg->oloe;
+ cmd_params->oeane = cfg->oeane;
+ cmd_params->olws = cfg->olws;
+ cmd_params->oa = cfg->oa;
+ cmd_params->oprrws = cfg->oprrws;
+
+ /* send command to mc*/
+ return mc_send_command(mc_io, &cmd);
+}
+
+/**
+ * dpci_get_opr() - Retrieve Order Restoration config and query.
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCI object
+ * @index: The queue index
+ * @cfg: Returned OPR configuration
+ * @qry: Returned OPR query
+ *
+ * Return: '0' on Success; Error code otherwise.
+ */
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ struct opr_cfg *cfg,
+ struct opr_qry *qry)
+{
+ struct dpci_rsp_get_opr *rsp_params;
+ struct dpci_cmd_get_opr *cmd_params;
+ struct mc_command cmd = { 0 };
+ int err;
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCI_CMDID_GET_OPR,
+ cmd_flags,
+ token);
+ cmd_params = (struct dpci_cmd_get_opr *)cmd.params;
+ cmd_params->index = index;
+
+ /* send command to mc*/
+ err = mc_send_command(mc_io, &cmd);
+ if (err)
+ return err;
+
+ /* retrieve response parameters */
+ rsp_params = (struct dpci_rsp_get_opr *)cmd.params;
+ cfg->oloe = rsp_params->oloe;
+ cfg->oeane = rsp_params->oeane;
+ cfg->olws = rsp_params->olws;
+ cfg->oa = rsp_params->oa;
+ cfg->oprrws = rsp_params->oprrws;
+ qry->rip = dpci_get_field(rsp_params->flags, RIP);
+ qry->enable = dpci_get_field(rsp_params->flags, OPR_ENABLE);
+ qry->nesn = le16_to_cpu(rsp_params->nesn);
+ qry->ndsn = le16_to_cpu(rsp_params->ndsn);
+ qry->ea_tseq = le16_to_cpu(rsp_params->ea_tseq);
+ qry->tseq_nlis = dpci_get_field(rsp_params->tseq_nlis, TSEQ_NLIS);
+ qry->ea_hseq = le16_to_cpu(rsp_params->ea_hseq);
+ qry->hseq_nlis = dpci_get_field(rsp_params->hseq_nlis, HSEQ_NLIS);
+ qry->ea_hptr = le16_to_cpu(rsp_params->ea_hptr);
+ qry->ea_tptr = le16_to_cpu(rsp_params->ea_tptr);
+ qry->opr_vid = le16_to_cpu(rsp_params->opr_vid);
+ qry->opr_id = le16_to_cpu(rsp_params->opr_id);
+
+ return 0;
+}
diff --git a/drivers/bus/fslmc/mc/dpcon.c b/drivers/bus/fslmc/mc/dpcon.c
index 3f6e04b97..92bd26512 100644
--- a/drivers/bus/fslmc/mc/dpcon.c
+++ b/drivers/bus/fslmc/mc/dpcon.c
@@ -295,6 +295,36 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
return 0;
}
+/**
+ * dpcon_set_notification() - Set DPCON notification destination
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPCON object
+ * @cfg: Notification parameters
+ *
+ * Return: '0' on Success; Error code otherwise
+ */
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpcon_notification_cfg *cfg)
+{
+ struct dpcon_cmd_set_notification *dpcon_cmd;
+ struct mc_command cmd = { 0 };
+
+ /* prepare command */
+ cmd.header = mc_encode_cmd_header(DPCON_CMDID_SET_NOTIFICATION,
+ cmd_flags,
+ token);
+ dpcon_cmd = (struct dpcon_cmd_set_notification *)cmd.params;
+ dpcon_cmd->dpio_id = cpu_to_le32(cfg->dpio_id);
+ dpcon_cmd->priority = cfg->priority;
+ dpcon_cmd->user_ctx = cpu_to_le64(cfg->user_ctx);
+
+ /* send command to mc*/
+ return mc_send_command(mc_io, &cmd);
+}
+
/**
* dpcon_get_api_version - Get Data Path Concentrator API version
* @mc_io: Pointer to MC portal's DPCON object
diff --git a/drivers/bus/fslmc/mc/dpdmai.c b/drivers/bus/fslmc/mc/dpdmai.c
index 528889df3..dcb9d516a 100644
--- a/drivers/bus/fslmc/mc/dpdmai.c
+++ b/drivers/bus/fslmc/mc/dpdmai.c
@@ -113,6 +113,7 @@ int dpdmai_create(struct fsl_mc_io *mc_io,
cmd_flags,
dprc_token);
cmd_params = (struct dpdmai_cmd_create *)cmd.params;
+ cmd_params->num_queues = cfg->num_queues;
cmd_params->priorities[0] = cfg->priorities[0];
cmd_params->priorities[1] = cfg->priorities[1];
@@ -297,6 +298,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
rsp_params = (struct dpdmai_rsp_get_attr *)cmd.params;
attr->id = le32_to_cpu(rsp_params->id);
attr->num_of_priorities = rsp_params->num_of_priorities;
+ attr->num_of_queues = rsp_params->num_of_queues;
return 0;
}
@@ -306,6 +308,8 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation; use
* DPDMAI_ALL_QUEUES to configure all Rx queues
@@ -317,6 +321,7 @@ int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
const struct dpdmai_rx_queue_cfg *cfg)
{
@@ -331,6 +336,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
cmd_params->dest_id = cpu_to_le32(cfg->dest_cfg.dest_id);
cmd_params->dest_priority = cfg->dest_cfg.priority;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
cmd_params->user_ctx = cpu_to_le64(cfg->user_ctx);
cmd_params->options = cpu_to_le32(cfg->options);
dpdmai_set_field(cmd_params->dest_type,
@@ -346,6 +352,8 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Rx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation
* @attr: Returned Rx queue attributes
@@ -355,6 +363,7 @@ int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_rx_queue_attr *attr)
{
@@ -369,6 +378,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
token);
cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
/* send command to mc*/
err = mc_send_command(mc_io, &cmd);
@@ -392,6 +402,8 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
* @mc_io: Pointer to MC portal's I/O object
* @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
* @token: Token of DPDMAI object
+ * @queue_idx: Tx queue index. Accepted values are from 0 to num_queues
+ * parameter provided in dpdmai_create
* @priority: Select the queue relative to number of
* priorities configured at DPDMAI creation
* @attr: Returned Tx queue attributes
@@ -401,6 +413,7 @@ int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_tx_queue_attr *attr)
{
@@ -415,6 +428,7 @@ int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
token);
cmd_params = (struct dpdmai_cmd_get_queue *)cmd.params;
cmd_params->priority = priority;
+ cmd_params->queue_idx = queue_idx;
/* send command to mc*/
err = mc_send_command(mc_io, &cmd);
diff --git a/drivers/bus/fslmc/mc/dpio.c b/drivers/bus/fslmc/mc/dpio.c
index 966277cc6..a3382ed14 100644
--- a/drivers/bus/fslmc/mc/dpio.c
+++ b/drivers/bus/fslmc/mc/dpio.c
@@ -268,6 +268,15 @@ int dpio_reset(struct fsl_mc_io *mc_io,
return mc_send_command(mc_io, &cmd);
}
+/**
+ * dpio_get_attributes() - Retrieve DPIO attributes
+ * @mc_io: Pointer to MC portal's I/O object
+ * @cmd_flags: Command flags; one or more of 'MC_CMD_FLAG_'
+ * @token: Token of DPIO object
+ * @attr: Returned object's attributes
+ *
+ * Return: '0' on Success; Error code otherwise
+ */
int dpio_get_attributes(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp.h b/drivers/bus/fslmc/mc/fsl_dpbp.h
index 111836261..9d405b42c 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp.h
@@ -82,6 +82,7 @@ int dpbp_get_attributes(struct fsl_mc_io *mc_io,
/**
* BPSCN write will attempt to allocate into a cache (coherent write)
*/
+#define DPBP_NOTIF_OPT_COHERENT_WRITE 0x00000001
int dpbp_get_api_version(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
index 18402cedf..55c9fc9b4 100644
--- a/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpbp_cmd.h
@@ -9,13 +9,15 @@
/* DPBP Version */
#define DPBP_VER_MAJOR 3
-#define DPBP_VER_MINOR 3
+#define DPBP_VER_MINOR 4
/* Command versioning */
#define DPBP_CMD_BASE_VERSION 1
+#define DPBP_CMD_VERSION_2 2
#define DPBP_CMD_ID_OFFSET 4
#define DPBP_CMD(id) ((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_BASE_VERSION)
+#define DPBP_CMD_V2(id) ((id << DPBP_CMD_ID_OFFSET) | DPBP_CMD_VERSION_2)
/* Command IDs */
#define DPBP_CMDID_CLOSE DPBP_CMD(0x800)
@@ -37,8 +39,8 @@
#define DPBP_CMDID_GET_IRQ_STATUS DPBP_CMD(0x016)
#define DPBP_CMDID_CLEAR_IRQ_STATUS DPBP_CMD(0x017)
-#define DPBP_CMDID_SET_NOTIFICATIONS DPBP_CMD(0x1b0)
-#define DPBP_CMDID_GET_NOTIFICATIONS DPBP_CMD(0x1b1)
+#define DPBP_CMDID_SET_NOTIFICATIONS DPBP_CMD_V2(0x1b0)
+#define DPBP_CMDID_GET_NOTIFICATIONS DPBP_CMD_V2(0x1b1)
#define DPBP_CMDID_GET_FREE_BUFFERS_NUM DPBP_CMD(0x1b2)
@@ -68,8 +70,8 @@ struct dpbp_cmd_set_notifications {
uint32_t depletion_exit;
uint32_t surplus_entry;
uint32_t surplus_exit;
- uint16_t options;
- uint16_t pad[3];
+ uint32_t options;
+ uint16_t pad[2];
uint64_t message_ctx;
uint64_t message_iova;
};
@@ -79,8 +81,8 @@ struct dpbp_rsp_get_notifications {
uint32_t depletion_exit;
uint32_t surplus_entry;
uint32_t surplus_exit;
- uint16_t options;
- uint16_t pad[3];
+ uint32_t options;
+ uint16_t pad[2];
uint64_t message_ctx;
uint64_t message_iova;
};
diff --git a/drivers/bus/fslmc/mc/fsl_dpci.h b/drivers/bus/fslmc/mc/fsl_dpci.h
index f69ed3f33..9af9097e5 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci.h
@@ -6,6 +6,8 @@
#ifndef __FSL_DPCI_H
#define __FSL_DPCI_H
+#include <fsl_dpopr.h>
+
/* Data Path Communication Interface API
* Contains initialization APIs and runtime control APIs for DPCI
*/
@@ -17,7 +19,7 @@ struct fsl_mc_io;
/**
* Maximum number of Tx/Rx priorities per DPCI object
*/
-#define DPCI_PRIO_NUM 2
+#define DPCI_PRIO_NUM 4
/**
* Indicates an invalid frame queue
@@ -106,6 +108,27 @@ int dpci_get_attributes(struct fsl_mc_io *mc_io,
uint16_t token,
struct dpci_attr *attr);
+/**
+ * struct dpci_peer_attr - Structure representing the peer DPCI attributes
+ * @peer_id: DPCI peer id; if no peer is connected returns (-1)
+ * @num_of_priorities: The peer's number of receive priorities; determines the
+ * number of transmit priorities for the local DPCI object
+ */
+struct dpci_peer_attr {
+ int peer_id;
+ uint8_t num_of_priorities;
+};
+
+int dpci_get_peer_attributes(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpci_peer_attr *attr);
+
+int dpci_get_link_state(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ int *up);
+
/**
* enum dpci_dest - DPCI destination types
* @DPCI_DEST_NONE: Unassigned destination; The queue is set in parked mode
@@ -153,6 +176,11 @@ struct dpci_dest_cfg {
*/
#define DPCI_QUEUE_OPT_DEST 0x00000002
+/**
+ * Set the queue to hold active mode.
+ */
+#define DPCI_QUEUE_OPT_HOLD_ACTIVE 0x00000004
+
/**
* struct dpci_rx_queue_cfg - Structure representing RX queue configuration
* @options: Flags representing the suggested modifications to the queue;
@@ -163,11 +191,14 @@ struct dpci_dest_cfg {
* 'options'
* @dest_cfg: Queue destination parameters;
* valid only if 'DPCI_QUEUE_OPT_DEST' is contained in 'options'
+ * @order_preservation_en: order preservation configuration for the rx queue
+ * valid only if 'DPCI_QUEUE_OPT_HOLD_ACTIVE' is contained in 'options'
*/
struct dpci_rx_queue_cfg {
uint32_t options;
uint64_t user_ctx;
struct dpci_dest_cfg dest_cfg;
+ int order_preservation_en;
};
int dpci_set_rx_queue(struct fsl_mc_io *mc_io,
@@ -217,4 +248,18 @@ int dpci_get_api_version(struct fsl_mc_io *mc_io,
uint16_t *major_ver,
uint16_t *minor_ver);
+int dpci_set_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ uint8_t options,
+ struct opr_cfg *cfg);
+
+int dpci_get_opr(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ uint8_t index,
+ struct opr_cfg *cfg,
+ struct opr_qry *qry);
+
#endif /* __FSL_DPCI_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
index 634248ac0..92b85a820 100644
--- a/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpci_cmd.h
@@ -8,7 +8,7 @@
/* DPCI Version */
#define DPCI_VER_MAJOR 3
-#define DPCI_VER_MINOR 3
+#define DPCI_VER_MINOR 4
#define DPCI_CMD_BASE_VERSION 1
#define DPCI_CMD_BASE_VERSION_V2 2
@@ -35,6 +35,8 @@
#define DPCI_CMDID_GET_PEER_ATTR DPCI_CMD_V1(0x0e2)
#define DPCI_CMDID_GET_RX_QUEUE DPCI_CMD_V1(0x0e3)
#define DPCI_CMDID_GET_TX_QUEUE DPCI_CMD_V1(0x0e4)
+#define DPCI_CMDID_SET_OPR DPCI_CMD_V1(0x0e5)
+#define DPCI_CMDID_GET_OPR DPCI_CMD_V1(0x0e6)
/* Macros for accessing command fields smaller than 1byte */
#define DPCI_MASK(field) \
@@ -90,6 +92,8 @@ struct dpci_rsp_get_link_state {
#define DPCI_DEST_TYPE_SHIFT 0
#define DPCI_DEST_TYPE_SIZE 4
+#define DPCI_ORDER_PRESERVATION_SHIFT 4
+#define DPCI_ORDER_PRESERVATION_SIZE 1
struct dpci_cmd_set_rx_queue {
uint32_t dest_id;
@@ -128,5 +132,61 @@ struct dpci_rsp_get_api_version {
uint16_t minor;
};
+struct dpci_cmd_set_opr {
+ uint16_t pad0;
+ uint8_t index;
+ uint8_t options;
+ uint8_t pad1[7];
+ uint8_t oloe;
+ uint8_t oeane;
+ uint8_t olws;
+ uint8_t oa;
+ uint8_t oprrws;
+};
+
+struct dpci_cmd_get_opr {
+ uint16_t pad;
+ uint8_t index;
+};
+
+#define DPCI_RIP_SHIFT 0
+#define DPCI_RIP_SIZE 1
+#define DPCI_OPR_ENABLE_SHIFT 1
+#define DPCI_OPR_ENABLE_SIZE 1
+#define DPCI_TSEQ_NLIS_SHIFT 0
+#define DPCI_TSEQ_NLIS_SIZE 1
+#define DPCI_HSEQ_NLIS_SHIFT 0
+#define DPCI_HSEQ_NLIS_SIZE 1
+
+struct dpci_rsp_get_opr {
+ uint64_t pad0;
+ /* from LSB: rip:1 enable:1 */
+ uint8_t flags;
+ uint16_t pad1;
+ uint8_t oloe;
+ uint8_t oeane;
+ uint8_t olws;
+ uint8_t oa;
+ uint8_t oprrws;
+ uint16_t nesn;
+ uint16_t pad8;
+ uint16_t ndsn;
+ uint16_t pad2;
+ uint16_t ea_tseq;
+ /* only the LSB */
+ uint8_t tseq_nlis;
+ uint8_t pad3;
+ uint16_t ea_hseq;
+ /* only the LSB */
+ uint8_t hseq_nlis;
+ uint8_t pad4;
+ uint16_t ea_hptr;
+ uint16_t pad5;
+ uint16_t ea_tptr;
+ uint16_t pad6;
+ uint16_t opr_vid;
+ uint16_t pad7;
+ uint16_t opr_id;
+};
#pragma pack(pop)
#endif /* _FSL_DPCI_CMD_H */
diff --git a/drivers/bus/fslmc/mc/fsl_dpcon.h b/drivers/bus/fslmc/mc/fsl_dpcon.h
index 36dd5f3c1..fc0430dc1 100644
--- a/drivers/bus/fslmc/mc/fsl_dpcon.h
+++ b/drivers/bus/fslmc/mc/fsl_dpcon.h
@@ -81,6 +81,25 @@ int dpcon_get_attributes(struct fsl_mc_io *mc_io,
uint16_t token,
struct dpcon_attr *attr);
+/**
+ * struct dpcon_notification_cfg - Structure representing notification params
+ * @dpio_id: DPIO object ID; must be configured with a notification channel;
+ * to disable notifications set it to 'DPCON_INVALID_DPIO_ID';
+ * @priority: Priority selection within the DPIO channel; valid values
+ * are 0-7, depending on the number of priorities in that channel
+ * @user_ctx: User context value provided with each CDAN message
+ */
+struct dpcon_notification_cfg {
+ int dpio_id;
+ uint8_t priority;
+ uint64_t user_ctx;
+};
+
+int dpcon_set_notification(struct fsl_mc_io *mc_io,
+ uint32_t cmd_flags,
+ uint16_t token,
+ struct dpcon_notification_cfg *cfg);
+
int dpcon_get_api_version(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t *major_ver,
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai.h b/drivers/bus/fslmc/mc/fsl_dpdmai.h
index 03e46ec14..40469cc13 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai.h
@@ -39,6 +39,7 @@ int dpdmai_close(struct fsl_mc_io *mc_io,
* should be configured with 0
*/
struct dpdmai_cfg {
+ uint8_t num_queues;
uint8_t priorities[DPDMAI_PRIO_NUM];
};
@@ -78,6 +79,7 @@ int dpdmai_reset(struct fsl_mc_io *mc_io,
struct dpdmai_attr {
int id;
uint8_t num_of_priorities;
+ uint8_t num_of_queues;
};
int dpdmai_get_attributes(struct fsl_mc_io *mc_io,
@@ -149,6 +151,7 @@ struct dpdmai_rx_queue_cfg {
int dpdmai_set_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
const struct dpdmai_rx_queue_cfg *cfg);
@@ -168,6 +171,7 @@ struct dpdmai_rx_queue_attr {
int dpdmai_get_rx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_rx_queue_attr *attr);
@@ -183,6 +187,7 @@ struct dpdmai_tx_queue_attr {
int dpdmai_get_tx_queue(struct fsl_mc_io *mc_io,
uint32_t cmd_flags,
uint16_t token,
+ uint8_t queue_idx,
uint8_t priority,
struct dpdmai_tx_queue_attr *attr);
diff --git a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
index 618e19eae..7e122de4e 100644
--- a/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
+++ b/drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h
@@ -7,30 +7,32 @@
/* DPDMAI Version */
#define DPDMAI_VER_MAJOR 3
-#define DPDMAI_VER_MINOR 2
+#define DPDMAI_VER_MINOR 3
/* Command versioning */
#define DPDMAI_CMD_BASE_VERSION 1
+#define DPDMAI_CMD_VERSION_2 2
#define DPDMAI_CMD_ID_OFFSET 4
#define DPDMAI_CMD(id) ((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_BASE_VERSION)
+#define DPDMAI_CMD_V2(id) ((id << DPDMAI_CMD_ID_OFFSET) | DPDMAI_CMD_VERSION_2)
/* Command IDs */
#define DPDMAI_CMDID_CLOSE DPDMAI_CMD(0x800)
#define DPDMAI_CMDID_OPEN DPDMAI_CMD(0x80E)
-#define DPDMAI_CMDID_CREATE DPDMAI_CMD(0x90E)
+#define DPDMAI_CMDID_CREATE DPDMAI_CMD_V2(0x90E)
#define DPDMAI_CMDID_DESTROY DPDMAI_CMD(0x98E)
#define DPDMAI_CMDID_GET_API_VERSION DPDMAI_CMD(0xa0E)
#define DPDMAI_CMDID_ENABLE DPDMAI_CMD(0x002)
#define DPDMAI_CMDID_DISABLE DPDMAI_CMD(0x003)
-#define DPDMAI_CMDID_GET_ATTR DPDMAI_CMD(0x004)
+#define DPDMAI_CMDID_GET_ATTR DPDMAI_CMD_V2(0x004)
#define DPDMAI_CMDID_RESET DPDMAI_CMD(0x005)
#define DPDMAI_CMDID_IS_ENABLED DPDMAI_CMD(0x006)
-#define DPDMAI_CMDID_SET_RX_QUEUE DPDMAI_CMD(0x1A0)
-#define DPDMAI_CMDID_GET_RX_QUEUE DPDMAI_CMD(0x1A1)
-#define DPDMAI_CMDID_GET_TX_QUEUE DPDMAI_CMD(0x1A2)
+#define DPDMAI_CMDID_SET_RX_QUEUE DPDMAI_CMD_V2(0x1A0)
+#define DPDMAI_CMDID_GET_RX_QUEUE DPDMAI_CMD_V2(0x1A1)
+#define DPDMAI_CMDID_GET_TX_QUEUE DPDMAI_CMD_V2(0x1A2)
/* Macros for accessing command fields smaller than 1byte */
#define DPDMAI_MASK(field) \
@@ -47,7 +49,7 @@ struct dpdmai_cmd_open {
};
struct dpdmai_cmd_create {
- uint8_t pad;
+ uint8_t num_queues;
uint8_t priorities[2];
};
@@ -66,6 +68,7 @@ struct dpdmai_rsp_is_enabled {
struct dpdmai_rsp_get_attr {
uint32_t id;
uint8_t num_of_priorities;
+ uint8_t num_of_queues;
};
#define DPDMAI_DEST_TYPE_SHIFT 0
@@ -77,7 +80,7 @@ struct dpdmai_cmd_set_rx_queue {
uint8_t priority;
/* from LSB: dest_type:4 */
uint8_t dest_type;
- uint8_t pad;
+ uint8_t queue_idx;
uint64_t user_ctx;
uint32_t options;
};
@@ -85,6 +88,7 @@ struct dpdmai_cmd_set_rx_queue {
struct dpdmai_cmd_get_queue {
uint8_t pad[5];
uint8_t priority;
+ uint8_t queue_idx;
};
struct dpdmai_rsp_get_rx_queue {
diff --git a/drivers/bus/fslmc/mc/fsl_dpmng.h b/drivers/bus/fslmc/mc/fsl_dpmng.h
index afaf9b711..8559bef87 100644
--- a/drivers/bus/fslmc/mc/fsl_dpmng.h
+++ b/drivers/bus/fslmc/mc/fsl_dpmng.h
@@ -18,7 +18,7 @@ struct fsl_mc_io;
* Management Complex firmware version information
*/
#define MC_VER_MAJOR 10
-#define MC_VER_MINOR 3
+#define MC_VER_MINOR 10
/**
* struct mc_version
diff --git a/drivers/bus/fslmc/mc/fsl_dpopr.h b/drivers/bus/fslmc/mc/fsl_dpopr.h
new file mode 100644
index 000000000..fd727e011
--- /dev/null
+++ b/drivers/bus/fslmc/mc/fsl_dpopr.h
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: (BSD-3-Clause OR GPL-2.0)
+ *
+ * Copyright 2013-2015 Freescale Semiconductor Inc.
+ * Copyright 2018 NXP
+ *
+ */
+#ifndef __FSL_DPOPR_H_
+#define __FSL_DPOPR_H_
+
+/** @addtogroup dpopr Data Path Order Restoration API
+ * Contains initialization APIs and runtime APIs for the Order Restoration
+ * @{
+ */
+
+/** Order Restoration properties */
+
+/**
+ * Create a new Order Point Record option
+ */
+#define OPR_OPT_CREATE 0x1
+/**
+ * Retire an existing Order Point Record option
+ */
+#define OPR_OPT_RETIRE 0x2
+
+/**
+ * struct opr_cfg - Structure representing OPR configuration
+ * @oprrws: Order point record (OPR) restoration window size (0 to 5)
+ * 0 - Window size is 32 frames.
+ * 1 - Window size is 64 frames.
+ * 2 - Window size is 128 frames.
+ * 3 - Window size is 256 frames.
+ * 4 - Window size is 512 frames.
+ * 5 - Window size is 1024 frames.
+ *@oa: OPR auto advance NESN window size (0 disabled, 1 enabled)
+ *@olws: OPR acceptable late arrival window size (0 to 3)
+ * 0 - Disabled. Late arrivals are always rejected.
+ * 1 - Window size is 32 frames.
+ * 2 - Window size is the same as the OPR restoration
+ * window size configured in the OPRRWS field.
+ * 3 - Window size is 8192 frames.
+ * Late arrivals are always accepted.
+ *@oeane: Order restoration list (ORL) resource exhaustion
+ * advance NESN enable (0 disabled, 1 enabled)
+ *@oloe: OPR loose ordering enable (0 disabled, 1 enabled)
+ */
+struct opr_cfg {
+ uint8_t oprrws;
+ uint8_t oa;
+ uint8_t olws;
+ uint8_t oeane;
+ uint8_t oloe;
+};
+
+/**
+ * struct opr_qry - Structure representing OPR query results
+ * @enable: Enabled state
+ * @rip: Retirement In Progress
+ * @ndsn: Next dispensed sequence number
+ * @nesn: Next expected sequence number
+ * @ea_hseq: Early arrival head sequence number
+ * @hseq_nlis: HSEQ not last in sequence
+ * @ea_tseq: Early arrival tail sequence number
+ * @tseq_nlis: TSEQ not last in sequence
+ * @ea_tptr: Early arrival tail pointer
+ * @ea_hptr: Early arrival head pointer
+ * @opr_id: Order Point Record ID
+ * @opr_vid: Order Point Record Virtual ID
+ */
+struct opr_qry {
+ char enable;
+ char rip;
+ uint16_t ndsn;
+ uint16_t nesn;
+ uint16_t ea_hseq;
+ char hseq_nlis;
+ uint16_t ea_tseq;
+ char tseq_nlis;
+ uint16_t ea_tptr;
+ uint16_t ea_hptr;
+ uint16_t opr_id;
+ uint16_t opr_vid;
+};
+
+#endif /* __FSL_DPOPR_H_ */
diff --git a/drivers/bus/fslmc/meson.build b/drivers/bus/fslmc/meson.build
index 22a56a6fc..54ca92d0c 100644
--- a/drivers/bus/fslmc/meson.build
+++ b/drivers/bus/fslmc/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/bus/fslmc/rte_bus_fslmc_version.map b/drivers/bus/fslmc/rte_bus_fslmc_version.map
index b4a881704..8717373dd 100644
--- a/drivers/bus/fslmc/rte_bus_fslmc_version.map
+++ b/drivers/bus/fslmc/rte_bus_fslmc_version.map
@@ -117,3 +117,13 @@ DPDK_18.05 {
rte_dpaa2_memsegs;
} DPDK_18.02;
+
+DPDK_18.11 {
+ global:
+
+ dpci_get_link_state;
+ dpci_get_opr;
+ dpci_get_peer_attributes;
+ dpci_set_opr;
+
+} DPDK_18.05;
diff --git a/drivers/crypto/dpaa2_sec/Makefile b/drivers/crypto/dpaa2_sec/Makefile
index da3d8f84f..a61be49db 100644
--- a/drivers/crypto/dpaa2_sec/Makefile
+++ b/drivers/crypto/dpaa2_sec/Makefile
@@ -41,7 +41,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_pmd_dpaa2_sec_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
# library source files
SRCS-$(CONFIG_RTE_LIBRTE_PMD_DPAA2_SEC) += dpaa2_sec_dpseci.c
diff --git a/drivers/crypto/dpaa2_sec/meson.build b/drivers/crypto/dpaa2_sec/meson.build
index 01afc5877..8fa4827ed 100644
--- a/drivers/crypto/dpaa2_sec/meson.build
+++ b/drivers/crypto/dpaa2_sec/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/event/dpaa2/Makefile b/drivers/event/dpaa2/Makefile
index 5e1a63200..3f85dd2be 100644
--- a/drivers/event/dpaa2/Makefile
+++ b/drivers/event/dpaa2/Makefile
@@ -27,7 +27,7 @@ CFLAGS += -I$(RTE_SDK)/drivers/net/dpaa2/mc
# versioning export map
EXPORT_MAP := rte_pmd_dpaa2_event_version.map
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/event/dpaa2/meson.build b/drivers/event/dpaa2/meson.build
index de7a46155..c46b39e9d 100644
--- a/drivers/event/dpaa2/meson.build
+++ b/drivers/event/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/mempool/dpaa2/Makefile b/drivers/mempool/dpaa2/Makefile
index 9e4c87d79..4996a2cd1 100644
--- a/drivers/mempool/dpaa2/Makefile
+++ b/drivers/mempool/dpaa2/Makefile
@@ -19,7 +19,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_mempool_dpaa2_version.map
# Lbrary version
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/mempool/dpaa2/meson.build b/drivers/mempool/dpaa2/meson.build
index 90bab6069..6b6ead617 100644
--- a/drivers/mempool/dpaa2/meson.build
+++ b/drivers/mempool/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/net/dpaa2/Makefile b/drivers/net/dpaa2/Makefile
index 9b0b14331..1d46f7f25 100644
--- a/drivers/net/dpaa2/Makefile
+++ b/drivers/net/dpaa2/Makefile
@@ -25,7 +25,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_eal/linuxapp/eal
EXPORT_MAP := rte_pmd_dpaa2_version.map
# library version
-LIBABIVER := 1
+LIBABIVER := 2
# depends on fslmc bus which uses experimental API
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/drivers/net/dpaa2/meson.build b/drivers/net/dpaa2/meson.build
index 213f0d72f..b34595258 100644
--- a/drivers/net/dpaa2/meson.build
+++ b/drivers/net/dpaa2/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
if host_machine.system() != 'linux'
build = false
endif
diff --git a/drivers/raw/dpaa2_cmdif/Makefile b/drivers/raw/dpaa2_cmdif/Makefile
index 9b863dda2..0dbe5c821 100644
--- a/drivers/raw/dpaa2_cmdif/Makefile
+++ b/drivers/raw/dpaa2_cmdif/Makefile
@@ -24,7 +24,7 @@ LDLIBS += -lrte_rawdev
EXPORT_MAP := rte_pmd_dpaa2_cmdif_version.map
-LIBABIVER := 1
+LIBABIVER := 2
#
# all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_cmdif/meson.build b/drivers/raw/dpaa2_cmdif/meson.build
index 1d146872e..37bb24a1b 100644
--- a/drivers/raw/dpaa2_cmdif/meson.build
+++ b/drivers/raw/dpaa2_cmdif/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
deps += ['rawdev', 'mempool_dpaa2', 'bus_vdev']
sources = files('dpaa2_cmdif.c')
diff --git a/drivers/raw/dpaa2_qdma/Makefile b/drivers/raw/dpaa2_qdma/Makefile
index d88809ead..645220772 100644
--- a/drivers/raw/dpaa2_qdma/Makefile
+++ b/drivers/raw/dpaa2_qdma/Makefile
@@ -25,7 +25,7 @@ LDLIBS += -lrte_ring
EXPORT_MAP := rte_pmd_dpaa2_qdma_version.map
-LIBABIVER := 1
+LIBABIVER := 2
#
# all source are stored in SRCS-y
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
index 2787d3028..44503331e 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.c
@@ -805,7 +805,7 @@ dpaa2_dpdmai_dev_uninit(struct rte_rawdev *rawdev)
DPAA2_QDMA_ERR("dmdmai disable failed");
/* Set up the DQRR storage for Rx */
- for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
struct dpaa2_queue *rxq = &(dpdmai_dev->rx_queue[i]);
if (rxq->q_storage) {
@@ -856,17 +856,17 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
ret);
goto init_err;
}
- dpdmai_dev->num_queues = attr.num_of_priorities;
+ dpdmai_dev->num_queues = attr.num_of_queues;
/* Set up Rx Queues */
- for (i = 0; i < attr.num_of_priorities; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
struct dpaa2_queue *rxq;
memset(&rx_queue_cfg, 0, sizeof(struct dpdmai_rx_queue_cfg));
ret = dpdmai_set_rx_queue(&dpdmai_dev->dpdmai,
CMD_PRI_LOW,
dpdmai_dev->token,
- i, &rx_queue_cfg);
+ i, 0, &rx_queue_cfg);
if (ret) {
DPAA2_QDMA_ERR("Setting Rx queue failed with err: %d",
ret);
@@ -893,9 +893,9 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
}
/* Get Rx and Tx queues FQID's */
- for (i = 0; i < DPDMAI_PRIO_NUM; i++) {
+ for (i = 0; i < dpdmai_dev->num_queues; i++) {
ret = dpdmai_get_rx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
- dpdmai_dev->token, i, &rx_attr);
+ dpdmai_dev->token, i, 0, &rx_attr);
if (ret) {
DPAA2_QDMA_ERR("Reading device failed with err: %d",
ret);
@@ -904,7 +904,7 @@ dpaa2_dpdmai_dev_init(struct rte_rawdev *rawdev, int dpdmai_id)
dpdmai_dev->rx_queue[i].fqid = rx_attr.fqid;
ret = dpdmai_get_tx_queue(&dpdmai_dev->dpdmai, CMD_PRI_LOW,
- dpdmai_dev->token, i, &tx_attr);
+ dpdmai_dev->token, i, 0, &tx_attr);
if (ret) {
DPAA2_QDMA_ERR("Reading device failed with err: %d",
ret);
diff --git a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
index c6a057806..0cbe90255 100644
--- a/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
+++ b/drivers/raw/dpaa2_qdma/dpaa2_qdma.h
@@ -11,6 +11,8 @@ struct qdma_io_meta;
#define DPAA2_QDMA_MAX_FLE 3
#define DPAA2_QDMA_MAX_SDD 2
+#define DPAA2_DPDMAI_MAX_QUEUES 8
+
/** FLE pool size: 3 Frame list + 2 source/destination descriptor */
#define QDMA_FLE_POOL_SIZE (sizeof(struct qdma_io_meta) + \
sizeof(struct qbman_fle) * DPAA2_QDMA_MAX_FLE + \
@@ -142,9 +144,9 @@ struct dpaa2_dpdmai_dev {
/** Number of queue in this DPDMAI device */
uint8_t num_queues;
/** RX queues */
- struct dpaa2_queue rx_queue[DPDMAI_PRIO_NUM];
+ struct dpaa2_queue rx_queue[DPAA2_DPDMAI_MAX_QUEUES];
/** TX queues */
- struct dpaa2_queue tx_queue[DPDMAI_PRIO_NUM];
+ struct dpaa2_queue tx_queue[DPAA2_DPDMAI_MAX_QUEUES];
};
#endif /* __DPAA2_QDMA_H__ */
diff --git a/drivers/raw/dpaa2_qdma/meson.build b/drivers/raw/dpaa2_qdma/meson.build
index b6a081f11..2a4b69c16 100644
--- a/drivers/raw/dpaa2_qdma/meson.build
+++ b/drivers/raw/dpaa2_qdma/meson.build
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright 2018 NXP
+version = 2
+
build = dpdk_conf.has('RTE_LIBRTE_DPAA2_MEMPOOL')
deps += ['rawdev', 'mempool_dpaa2', 'ring']
sources = files('dpaa2_qdma.c')
--
2.17.1
^ permalink raw reply [relevance 2%]
* [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes
@ 2018-09-26 18:04 2% ` Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
` (2 more replies)
0 siblings, 3 replies; 200+ results
From: Shreyansh Jain @ 2018-09-26 18:04 UTC (permalink / raw)
To: dev, ferruh.yigit; +Cc: thomas, Shreyansh Jain
About the series:
This series of patches upgrades the DPAA2 driver firmware to
v10.10.10 (MC Firmware).
bus/fslmc is modified, and other drivers (net, crypto, qdma) depend
on it. The changes are also tightly linked, so the patches include
the firmware upgrade as well as the follow-on changes to the
drivers.
Once done, it would imply that DPAA2 driver won't work with any MC
FW lower than 10.10.10.
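The compatibility note above can be pictured as a minimum-version check.
This is only a sketch under stated assumptions: the helper
mc_fw_is_supported() and struct mc_fw_version are hypothetical names used
for illustration, the revision field is ignored, and the real check in the
bus driver may differ. Only the MC_VER_MAJOR/MC_VER_MINOR values come from
this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as set in fsl_dpmng.h by this series. */
#define MC_VER_MAJOR 10
#define MC_VER_MINOR 10

/* Hypothetical container for the version reported by the firmware. */
struct mc_fw_version {
	uint32_t major;
	uint32_t minor;
	uint32_t revision; /* not considered in this sketch */
};

/*
 * Hypothetical helper: returns true when the reported firmware is at
 * least MC_VER_MAJOR.MC_VER_MINOR, false for anything older.
 */
static bool
mc_fw_is_supported(const struct mc_fw_version *v)
{
	if (v->major != MC_VER_MAJOR)
		return v->major > MC_VER_MAJOR;
	return v->minor >= MC_VER_MINOR;
}
```

With such a check, a board still running MC FW 10.3.x would be rejected
at probe time rather than failing later on a command with a changed layout.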
Support for this new firmware is available in the publicly available
LSDK (Layerscape SDK) release [1].
Besides the FW change, there are other subtle changes as well:
- Support reading the MAC address from NIC device, rather than
using a default MAC
- Adding support for QBMan 5.0 FW APIs
- Some patches for NXP's LX2 platform specific features
- And some bug fixes.
Dependency:
* These patches are based on net-next/master 58c3b609699a8c
* Series [2] is logically related to this, but has no git/patch
related dependency. It is the series for the upgrade of DPAA.
[1] https://lsdk.github.io/index.html
[2] http://patches.dpdk.org/project/dpdk/list/?series=1090&state=*
Version History:
v1->v2:
- Bumped up the version of the libraries (pmd/bus/crypto/event) as the
first set of patches (MC firmware update) breaks the internal ABI
- Added support for ordered processing APIs. These APIs are expected
to be used in subsequent feature updates to the DPAA2 ethernet driver.
- Some internal bug fixes.
(Patches increased from 11 to 15)
Hemant Agrawal (9):
net/dpaa2: fix VLAN filter enablement
bus/fslmc: upgrade mc FW APIs to 10.10.0
net/dpaa2: upgrade dpni to mc FW APIs to 10.10.0
crypto/dpaa2_sec: upgarde mc FW APIs to 10.10.0
net/dpaa2: update RSS value in mbuf for lx2 platform
net/dpaa2: optimize the fd reset in Tx path
net/dpaa2: enhance the queue memory cleanup routines
net/dpaa2: support MBUF VLAN tci population from HW parser
net/dpaa2: support Rx checksum offload in slow parsing
Nipun Gupta (4):
net/dpaa2: fix IOVA conversion for congestion memory
bus/fslmc: support memory backed portals with QBMAN 5.0
bus/fslmc: support 32 enq and deq for LX2 platform
bus/fslmc: disable annotation prefetch for LX2
Shreyansh Jain (2):
net/dpaa2: read hardware provided MAC for DPNI devices
net/dpaa2: add per queue stats get and reset support
drivers/bus/fslmc/Makefile | 2 +-
drivers/bus/fslmc/mc/dpbp.c | 10 +
drivers/bus/fslmc/mc/dpci.c | 197 +++++
drivers/bus/fslmc/mc/dpcon.c | 30 +
drivers/bus/fslmc/mc/dpdmai.c | 14 +
drivers/bus/fslmc/mc/dpio.c | 9 +
drivers/bus/fslmc/mc/fsl_dpbp.h | 1 +
drivers/bus/fslmc/mc/fsl_dpbp_cmd.h | 16 +-
drivers/bus/fslmc/mc/fsl_dpci.h | 47 +-
drivers/bus/fslmc/mc/fsl_dpci_cmd.h | 62 +-
drivers/bus/fslmc/mc/fsl_dpcon.h | 19 +
drivers/bus/fslmc/mc/fsl_dpdmai.h | 5 +
drivers/bus/fslmc/mc/fsl_dpdmai_cmd.h | 20 +-
drivers/bus/fslmc/mc/fsl_dpmng.h | 2 +-
drivers/bus/fslmc/mc/fsl_dpopr.h | 85 ++
drivers/bus/fslmc/meson.build | 2 +
drivers/bus/fslmc/portal/dpaa2_hw_dpio.c | 197 +++--
drivers/bus/fslmc/portal/dpaa2_hw_dpio.h | 4 +
drivers/bus/fslmc/portal/dpaa2_hw_pvt.h | 32 +-
drivers/bus/fslmc/qbman/include/compat.h | 3 +-
.../fslmc/qbman/include/fsl_qbman_portal.h | 33 +-
drivers/bus/fslmc/qbman/qbman_portal.c | 764 +++++++++++++++---
drivers/bus/fslmc/qbman/qbman_portal.h | 30 +-
drivers/bus/fslmc/qbman/qbman_sys.h | 100 ++-
drivers/bus/fslmc/qbman/qbman_sys_decl.h | 4 +
drivers/bus/fslmc/rte_bus_fslmc_version.map | 12 +
drivers/crypto/dpaa2_sec/Makefile | 2 +-
drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c | 8 +-
drivers/crypto/dpaa2_sec/mc/dpseci.c | 128 ++-
drivers/crypto/dpaa2_sec/mc/fsl_dpseci.h | 25 +-
drivers/crypto/dpaa2_sec/mc/fsl_dpseci_cmd.h | 73 +-
drivers/crypto/dpaa2_sec/meson.build | 2 +
drivers/event/dpaa2/Makefile | 2 +-
drivers/event/dpaa2/dpaa2_eventdev.c | 4 +-
drivers/event/dpaa2/meson.build | 2 +
drivers/mempool/dpaa2/Makefile | 2 +-
drivers/mempool/dpaa2/meson.build | 2 +
drivers/net/dpaa2/Makefile | 2 +-
drivers/net/dpaa2/base/dpaa2_hw_dpni_annot.h | 40 +
drivers/net/dpaa2/dpaa2_ethdev.c | 173 +++-
drivers/net/dpaa2/dpaa2_rxtx.c | 95 ++-
drivers/net/dpaa2/mc/dpni.c | 134 ++-
drivers/net/dpaa2/mc/fsl_dpkg.h | 71 +-
drivers/net/dpaa2/mc/fsl_dpni.h | 378 +++++----
drivers/net/dpaa2/mc/fsl_dpni_cmd.h | 87 +-
drivers/net/dpaa2/mc/fsl_net.h | 2 +-
drivers/net/dpaa2/meson.build | 2 +
drivers/raw/dpaa2_cmdif/Makefile | 2 +-
drivers/raw/dpaa2_cmdif/meson.build | 2 +
drivers/raw/dpaa2_qdma/Makefile | 2 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.c | 14 +-
drivers/raw/dpaa2_qdma/dpaa2_qdma.h | 6 +-
drivers/raw/dpaa2_qdma/meson.build | 2 +
53 files changed, 2377 insertions(+), 585 deletions(-)
create mode 100644 drivers/bus/fslmc/mc/fsl_dpopr.h
--
2.17.1
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device
@ 2018-09-26 12:22 3% ` Burakov, Anatoly
2018-09-29 6:15 3% ` Jeff Guo
0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-09-26 12:22 UTC (permalink / raw)
To: Jeff Guo, stephen, bruce.richardson, ferruh.yigit,
konstantin.ananyev, gaetan.rivet, jingjing.wu, thomas, motih,
matan, harry.van.haaren, qi.z.zhang, shaopeng.he,
bernard.iremonger, arybchenko
Cc: jblunck, shreyansh.jain, dev, helin.zhang
On 17-Aug-18 11:51 AM, Jeff Guo wrote:
> There are some extended interrupt types in vfio pci device except from the
> existing interrupts, such as err and req notifier, it could be useful for
> device error monitoring. And these corresponding interrupt handler is
> different from the other interrupt handler that register in PMDs, so a new
> interrupt handler should be added. This patch will add specific req handler
> in generic pci device.
>
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
> drivers/bus/pci/rte_bus_pci.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
> index 0d1955f..c45a820 100644
> --- a/drivers/bus/pci/rte_bus_pci.h
> +++ b/drivers/bus/pci/rte_bus_pci.h
> @@ -66,6 +66,7 @@ struct rte_pci_device {
> uint16_t max_vfs; /**< sriov enable if not zero */
> enum rte_kernel_driver kdrv; /**< Kernel driver passthrough */
> char name[PCI_PRI_STR_SIZE+1]; /**< PCI location (ASCII) */
> + struct rte_intr_handle req_notifier_handler;/**< Req notifier handle */
> };
>
> /**
>
Does this break ABI?
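
For context, a minimal sketch of why the question comes up. The struct
names below are hypothetical stand-ins, not the real rte_pci_device or
rte_intr_handle. The sketch only shows that appending a member keeps the
offsets of the existing members but changes the size of the struct, which
matters to any code that allocates or embeds it and was built against the
old layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an interrupt handle. */
struct sketch_intr_handle {
	int fd;
	uint32_t type;
};

/* Layout that an already-built application was compiled against. */
struct sketch_dev_old {
	uint16_t max_vfs;
	char name[14];
};

/* Same struct after appending one member at the end. */
struct sketch_dev_new {
	uint16_t max_vfs;
	char name[14];
	struct sketch_intr_handle req_notifier_handler;
};

/* Non-zero when the two layouts differ in size. */
static int
sketch_size_changed(void)
{
	return sizeof(struct sketch_dev_new) != sizeof(struct sketch_dev_old);
}

/* Non-zero when members present in both layouts keep their offsets. */
static int
sketch_old_offsets_kept(void)
{
	return offsetof(struct sketch_dev_new, name) ==
	       offsetof(struct sketch_dev_old, name);
}
```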
--
Thanks,
Anatoly
^ permalink raw reply [relevance 3%]
* [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
@ 2018-09-26 11:22 16% ` Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
` (2 subsequent siblings)
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, thomas
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
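
The socket-aware minimum page size selection described above can be
sketched roughly as follows. This is a simplified illustration, not the
code in the patch: struct sketch_msl, SKETCH_SOCKET_ID_ANY and
sketch_min_pagesz() are hypothetical names, and the handling of the
"any socket" case (only native memory is considered) is an assumption
made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical equivalent of "no particular socket requested". */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Hypothetical, simplified stand-in for a memseg list. */
struct sketch_msl {
	size_t page_sz;        /* page size backing this list */
	int socket_id;         /* socket (or heap) the list belongs to */
	unsigned int external; /* non-zero if externally allocated */
};

/*
 * Return the smallest page size among the lists that are relevant for
 * socket_id, or SIZE_MAX when no list matches.
 */
static size_t
sketch_min_pagesz(const struct sketch_msl *lists, size_t n, int socket_id)
{
	size_t min = SIZE_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct sketch_msl *msl = &lists[i];
		int valid;

		/* an exact socket match covers native and external memory */
		valid = msl->socket_id == socket_id;
		/* assumption: "any socket" looks at native memory only */
		valid |= socket_id == SKETCH_SOCKET_ID_ANY && !msl->external;

		if (valid && msl->page_sz < min)
			min = msl->page_sz;
	}
	return min;
}
```

The point of the filter is that a small page size registered for an
external heap no longer drags down the minimum computed for a mempool
placed on an unrelated socket.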
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 13 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 7 ++--
drivers/net/mlx4/mlx4_mr.c | 3 ++
drivers/net/mlx5/mlx5.c | 5 ++-
drivers/net/mlx5/mlx5_mr.c | 3 ++
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
24 files changed, 134 insertions(+), 44 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..5fc71e208 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -107,6 +114,10 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
+ a new flag indicating whether the memseg list refers to external memory.
+
Removed Items
-------------
@@ -152,7 +163,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
}
static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *n_segs = arg;
int ret;
+ if (msl->external)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
+
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
^ permalink raw reply [relevance 16%]
* [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID
` (2 preceding siblings ...)
2018-09-26 11:22 16% ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-26 11:22 4% ` Anatoly Burakov
2018-09-26 11:22 9% ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko
We will be assigning "invalid" socket ID's to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
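
The relaxed check can be sketched as below. This is only an illustration, modelled on the memzone change in this patch: `sketch_resolve_socket()` is a hypothetical helper, and `SKETCH_SOCKET_ID_ANY` and `SKETCH_MAX_NUMA_NODES` are stand-ins for `SOCKET_ID_ANY` and `RTE_MAX_NUMA_NODES` (the value 8 is an assumption, the real value is build configuration).

```c
#include <stdbool.h>

#define SKETCH_SOCKET_ID_ANY (-1)  /* stand-in for SOCKET_ID_ANY */
#define SKETCH_MAX_NUMA_NODES 8    /* assumed value of RTE_MAX_NUMA_NODES */

/*
 * Decide which socket ID to pass on to the allocator.
 * Returns 0 and stores the effective socket ID in *out, or returns -1
 * if the ID is rejected up front. IDs at or above the NUMA node limit
 * are no longer rejected here: they may refer to external heaps, and
 * the memory subsystem decides later whether such a heap exists.
 */
static int
sketch_resolve_socket(int socket_id, bool has_hugepages, int *out)
{
	/* only negative IDs other than "any" are invalid at this point */
	if (socket_id != SKETCH_SOCKET_ID_ANY && socket_id < 0)
		return -1;

	/* without hugepages, internal socket IDs collapse to "any", but
	 * an external heap's socket ID is kept as requested.
	 */
	if (!has_hugepages && socket_id < SKETCH_MAX_NUMA_NODES)
		socket_id = SKETCH_SOCKET_ID_ANY;

	*out = socket_id;
	return 0;
}
```

The point of the design is that the upper bound check disappears from the callers; only the heap lookup knows which socket ID's are actually backed by a heap.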
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5fc71e208..6ee236302 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
^ permalink raw reply [relevance 4%]
* [dpdk-dev] [PATCH v5 11/21] malloc: allow creating malloc heaps
` (4 preceding siblings ...)
2018-09-26 11:22 9% ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
@ 2018-09-26 11:22 4% ` Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko
Add API to allow creating new malloc heaps. They will be created
with socket ID's going above RTE_MAX_NUMA_NODES, to avoid clashing
with internal heaps.
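
A minimal sketch of the socket ID numbering, under stated assumptions: `struct sketch_mem_config` stands in for the shared memory config, `SKETCH_MAX_NUMA_NODES` is an assumed value for `RTE_MAX_NUMA_NODES`, and the exhaustion check is simplified compared to the overflow check in the patch. Locking is left out; the patch relies on the memory hotplug write lock being held.

```c
#include <limits.h>

#define SKETCH_MAX_NUMA_NODES 8 /* assumed value of RTE_MAX_NUMA_NODES */

/* compile-time max, with arguments parenthesized */
#define SKETCH_CONST_MAX(a, b) ((a) > (b) ? (a) : (b))

/* first socket ID handed out to an external heap */
#define SKETCH_EXT_MIN_SOCKET_ID \
	SKETCH_CONST_MAX((1 << 8), SKETCH_MAX_NUMA_NODES)

/* Hypothetical stand-in for the relevant part of struct rte_mem_config. */
struct sketch_mem_config {
	int next_socket_id; /* next socket ID for an external heap */
};

/* Done once at init time by the primary process. */
static void
sketch_socket_id_init(struct sketch_mem_config *mcfg)
{
	mcfg->next_socket_id = SKETCH_EXT_MIN_SOCKET_ID;
}

/*
 * Hand out the next external socket ID, or -1 when the ID space is
 * exhausted. The caller is assumed to hold the lock protecting mcfg.
 */
static int
sketch_socket_id_assign(struct sketch_mem_config *mcfg)
{
	if (mcfg->next_socket_id == INT_MAX)
		return -1;
	return mcfg->next_socket_id++;
}
```

Starting at the larger of 256 and the NUMA node limit keeps external socket ID's clear of any ID that can name a real NUMA node.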
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 2 +
.../common/include/rte_eal_memconfig.h | 3 ++
lib/librte_eal/common/include/rte_malloc.h | 19 +++++++
lib/librte_eal/common/malloc_heap.c | 37 +++++++++++++
lib/librte_eal/common/malloc_heap.h | 3 ++
lib/librte_eal/common/rte_malloc.c | 52 +++++++++++++++++++
lib/librte_eal/rte_eal_version.map | 1 +
7 files changed, 117 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 5a80e1122..5065ec1af 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -125,6 +125,8 @@ ABI Changes
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
+ Structure ``rte_eal_memconfig`` has been extended to contain next socket
+ ID for externally allocated memory segments.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index d7920a4e0..98da58771 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -75,6 +75,9 @@ struct rte_mem_config {
/* Heaps of Malloc */
struct malloc_heap malloc_heaps[RTE_MAX_HEAPS];
+ /* next socket ID for external malloc heap */
+ int next_socket_id;
+
/* address of mem_config in primary process. used to map shared config into
* exact same address the primary process maps it.
*/
diff --git a/lib/librte_eal/common/include/rte_malloc.h b/lib/librte_eal/common/include/rte_malloc.h
index 403271ddc..e326529d0 100644
--- a/lib/librte_eal/common/include/rte_malloc.h
+++ b/lib/librte_eal/common/include/rte_malloc.h
@@ -263,6 +263,25 @@ int
rte_malloc_get_socket_stats(int socket,
struct rte_malloc_socket_stats *socket_stats);
+/**
+ * Creates a new empty malloc heap with a specified name.
+ *
+ * @note Heaps created via this call will automatically get assigned a unique
+ * socket ID, which can be found using ``rte_malloc_heap_get_socket()``
+ *
+ * @param heap_name
+ * Name of the heap to create.
+ *
+ * @return
+ * - 0 on successful creation
+ * - -1 in case of error, with rte_errno set to one of the following:
+ * EINVAL - ``heap_name`` was NULL, empty or too long
+ * EEXIST - heap by name of ``heap_name`` already exists
+ * ENOSPC - no more space in internal config to store a new heap
+ */
+int __rte_experimental
+rte_malloc_heap_create(const char *heap_name);
+
/**
* Find socket ID corresponding to a named heap.
*
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac89d15a4..987b83fb8 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -29,6 +29,10 @@
#include "malloc_heap.h"
#include "malloc_mp.h"
+/* start external socket ID's at a very high number */
+#define CONST_MAX(a, b) (a > b ? a : b) /* RTE_MAX is not a constant */
+#define EXTERNAL_HEAP_MIN_SOCKET_ID (CONST_MAX((1 << 8), RTE_MAX_NUMA_NODES))
+
static unsigned
check_hugepage_sz(unsigned flags, uint64_t hugepage_sz)
{
@@ -1015,6 +1019,36 @@ malloc_heap_dump(struct malloc_heap *heap, FILE *f)
rte_spinlock_unlock(&heap->lock);
}
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ uint32_t next_socket_id = mcfg->next_socket_id;
+
+ /* prevent overflow. did you really create 2 billion heaps??? */
+ if (next_socket_id > INT32_MAX) {
+ RTE_LOG(ERR, EAL, "Cannot assign new socket ID's\n");
+ rte_errno = ENOSPC;
+ return -1;
+ }
+
+ /* initialize empty heap */
+ heap->alloc_count = 0;
+ heap->first = NULL;
+ heap->last = NULL;
+ LIST_INIT(heap->free_head);
+ rte_spinlock_init(&heap->lock);
+ heap->total_size = 0;
+ heap->socket_id = next_socket_id;
+
+ /* we hold a global mem hotplug writelock, so it's safe to increment */
+ mcfg->next_socket_id++;
+
+ /* set up name */
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ return 0;
+}
+
int
rte_eal_malloc_heap_init(void)
{
@@ -1022,6 +1056,9 @@ rte_eal_malloc_heap_init(void)
unsigned int i;
if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign min socket ID to external heaps */
+ mcfg->next_socket_id = EXTERNAL_HEAP_MIN_SOCKET_ID;
+
/* assign names to default DPDK heaps */
for (i = 0; i < rte_socket_count(); i++) {
struct malloc_heap *heap = &mcfg->malloc_heaps[i];
diff --git a/lib/librte_eal/common/malloc_heap.h b/lib/librte_eal/common/malloc_heap.h
index 61b844b6f..eebee16dc 100644
--- a/lib/librte_eal/common/malloc_heap.h
+++ b/lib/librte_eal/common/malloc_heap.h
@@ -33,6 +33,9 @@ void *
malloc_heap_alloc_biggest(const char *type, int socket, unsigned int flags,
size_t align, bool contig);
+int
+malloc_heap_create(struct malloc_heap *heap, const char *heap_name);
+
int
malloc_heap_free(struct malloc_elem *elem);
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index fa81d7862..25967a7cb 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -13,6 +13,7 @@
#include <rte_memory.h>
#include <rte_eal.h>
#include <rte_eal_memconfig.h>
+#include <rte_errno.h>
#include <rte_branch_prediction.h>
#include <rte_debug.h>
#include <rte_launch.h>
@@ -311,3 +312,54 @@ rte_malloc_virt2iova(const void *addr)
return ms->iova + RTE_PTR_DIFF(addr, ms->addr);
}
+
+int
+rte_malloc_heap_create(const char *heap_name)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ struct malloc_heap *heap = NULL;
+ int i, ret;
+
+ if (heap_name == NULL ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) == 0 ||
+ strnlen(heap_name, RTE_HEAP_NAME_MAX_LEN) ==
+ RTE_HEAP_NAME_MAX_LEN) {
+ rte_errno = EINVAL;
+ return -1;
+ }
+ /* check if there is space in the heap list, or if heap with this name
+ * already exists.
+ */
+ rte_rwlock_write_lock(&mcfg->memory_hotplug_lock);
+
+ for (i = 0; i < RTE_MAX_HEAPS; i++) {
+ struct malloc_heap *tmp = &mcfg->malloc_heaps[i];
+ /* existing heap */
+ if (strncmp(heap_name, tmp->name,
+ RTE_HEAP_NAME_MAX_LEN) == 0) {
+ RTE_LOG(ERR, EAL, "Heap %s already exists\n",
+ heap_name);
+ rte_errno = EEXIST;
+ ret = -1;
+ goto unlock;
+ }
+ /* empty heap */
+ if (strnlen(tmp->name, RTE_HEAP_NAME_MAX_LEN) == 0) {
+ heap = tmp;
+ break;
+ }
+ }
+ if (heap == NULL) {
+ RTE_LOG(ERR, EAL, "Cannot create new heap: no space\n");
+ rte_errno = ENOSPC;
+ ret = -1;
+ goto unlock;
+ }
+
+ /* we're sure that we can create a new heap, so do it */
+ ret = malloc_heap_create(heap, heap_name);
+unlock:
+ rte_rwlock_write_unlock(&mcfg->memory_hotplug_lock);
+
+ return ret;
+}
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index bd60506af..376f33bbb 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -318,6 +318,7 @@ EXPERIMENTAL {
rte_fbarray_set_used;
rte_log_register_type_and_pick_level;
rte_malloc_dump_heaps;
+ rte_malloc_heap_create;
rte_malloc_heap_get_socket;
rte_malloc_heap_socket_is_external;
rte_mem_alloc_validator_register;
--
2.17.1
* [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK
@ 2018-09-26 11:21 2% ` Anatoly Burakov
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
` (4 more replies)
2018-09-26 11:22 16% ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
` (3 subsequent siblings)
5 siblings, 5 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:21 UTC (permalink / raw)
To: dev
Cc: laszlo.madarassy, laszlo.vadkerti, andras.kovacs, winnie.tian,
daniel.andrasi, janos.kobor, geza.koblo, srinath.mannam,
scott.branden, ajit.khaparde, keith.wiles, bruce.richardson,
thomas, shreyansh.jain, shahafs, arybchenko
This is a proposal to enable using externally allocated memory
in DPDK.
In a nutshell, here is what is being done here:
- Index internal malloc heaps by NUMA node index, rather than NUMA
node itself (external heaps will have ID's in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by
allocator to decide from which heap (internal or external) to
allocate requested amount of memory
- Allow creating named heaps and add/remove memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
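
The naming and lookup rules listed above can be sketched with a small
self-contained model. This is not the DPDK implementation:
toy_heap_table, toy_heap_create(), toy_heap_get_socket(), the table
size of 8 and the starting socket ID of 256 are all assumptions made
for illustration. Only the 32-byte name limit is taken from the series.
Unlike the patch, the sketch scans the whole table for duplicates before
picking the first free slot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_HEAP_NAME_MAX_LEN 32  /* same limit as RTE_HEAP_NAME_MAX_LEN */
#define TOY_MAX_HEAPS 8           /* assumption: illustrative table size */
#define TOY_FIRST_EXT_SOCKET 256  /* assumption: first external socket ID */

struct toy_heap {
	char name[TOY_HEAP_NAME_MAX_LEN]; /* empty name marks a free slot */
	int socket_id;
};

struct toy_heap_table {
	struct toy_heap heaps[TOY_MAX_HEAPS];
	int next_socket_id;
};

void
toy_table_init(struct toy_heap_table *t)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
	t->next_socket_id = TOY_FIRST_EXT_SOCKET;
}

/* Length of s, scanning at most max bytes (returns max if no NUL found). */
static size_t
toy_bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/* Create a named heap; returns its socket ID or a negative errno value. */
int
toy_heap_create(struct toy_heap_table *t, const char *name)
{
	struct toy_heap *slot = NULL;
	size_t len;
	int i;

	assert(t != NULL);
	if (name == NULL)
		return -EINVAL;
	len = toy_bounded_len(name, TOY_HEAP_NAME_MAX_LEN);
	if (len == 0 || len == TOY_HEAP_NAME_MAX_LEN)
		return -EINVAL; /* empty, or too long to NUL-terminate */

	for (i = 0; i < TOY_MAX_HEAPS; i++) {
		struct toy_heap *h = &t->heaps[i];

		if (strncmp(name, h->name, TOY_HEAP_NAME_MAX_LEN) == 0)
			return -EEXIST; /* names must be unique */
		if (slot == NULL && h->name[0] == '\0')
			slot = h; /* remember the first free slot */
	}
	if (slot == NULL)
		return -ENOSPC;

	memcpy(slot->name, name, len + 1); /* copy including the NUL */
	slot->socket_id = t->next_socket_id++;
	return slot->socket_id;
}

/* Find a heap by name; returns its socket ID or -ENOENT. */
int
toy_heap_get_socket(const struct toy_heap_table *t, const char *name)
{
	int i;

	assert(t != NULL && name != NULL);
	for (i = 0; i < TOY_MAX_HEAPS; i++) {
		const struct toy_heap *h = &t->heaps[i];

		if (h->name[0] != '\0' &&
				strncmp(name, h->name, TOY_HEAP_NAME_MAX_LEN) == 0)
			return h->socket_id;
	}
	return -ENOENT;
}
```

In this model a caller would create a heap once, then pass the returned
socket ID to whatever allocation routine selects a heap by socket.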
The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).
The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better: we don't
want creating an external heap in the primary to fail because
something in the secondary has failed, when in fact we may not even
have wanted this memory to be accessible in the secondary in the
first place.
Using external memory in multiprocess is *hard*, because the memory
space not only needs to be preallocated, but also needs to be attached
in each process to allow other processes to access the page table. The
attach API call may or may not succeed, depending on memory layout, for
reasons similar to other multiprocess failures. This is treated as a
"known issue" for this release.
v5 -> v4 changes:
- All processes are now able to create and destroy malloc heaps
- Memory is automatically mapped for DMA on adding it to heap
- Mem event callbacks are triggered on adding/removing memory
- Fixed compile issues on FreeBSD
- Better documentation on API/ABI changes
v4 -> v3 changes:
- Dropped sample application in favor of new testpmd flag
- Added new flag to testpmd, with four options of mempool allocation
- Added new API to check if a socket ID belongs to an external heap
- Adjusted malloc and mempool code to not make any assumptions about
IOVA-contiguousness when dealing with externally allocated memory
v3 -> v2 changes:
- Rebase on top of latest master
- Clarifications added to mempool code as per Andrew Rybchenko's
comments
v2 -> v1 changes:
- Fixed NULL dereference on heap socket ID lookup
- Fixed memseg offset calculation on adding memory to heap
- Improved unit test to test for above bugfixes
- Restricted heap creation to primary processes only
- Added sample application
- Added documentation
RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements
Anatoly Burakov (21):
mem: add length to memseg list
mem: allow memseg lists to be marked as external
malloc: index heaps using heap ID rather than NUMA node
mem: do not check for invalid socket ID
flow_classify: do not check for invalid socket ID
pipeline: do not check for invalid socket ID
sched: do not check for invalid socket ID
malloc: add name to malloc heaps
malloc: add function to query socket ID of named heap
malloc: add function to check if socket is external
malloc: allow creating malloc heaps
malloc: allow destroying heaps
malloc: allow adding memory to named heaps
malloc: allow removing memory from named heaps
malloc: allow attaching to external memory chunks
malloc: allow detaching from external memory
malloc: enable event callbacks for external memory
test: add unit tests for external memory support
app/testpmd: add support for external memory
doc: add external memory feature to the release notes
doc: add external memory feature to programmer's guide
app/test-pmd/config.c | 21 +-
app/test-pmd/parameters.c | 23 +-
app/test-pmd/testpmd.c | 305 ++++++++++++-
app/test-pmd/testpmd.h | 13 +-
config/common_base | 1 +
config/rte_config.h | 1 +
.../prog_guide/env_abstraction_layer.rst | 37 ++
doc/guides/rel_notes/deprecation.rst | 15 -
doc/guides/rel_notes/release_18_11.rst | 28 +-
doc/guides/testpmd_app_ug/run_app.rst | 12 +
drivers/bus/fslmc/fslmc_vfio.c | 14 +-
drivers/bus/pci/linux/pci.c | 2 +-
drivers/net/mlx4/mlx4_mr.c | 3 +
drivers/net/mlx5/mlx5.c | 5 +-
drivers/net/mlx5/mlx5_mr.c | 3 +
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 +-
.../net/virtio/virtio_user/virtio_user_dev.c | 8 +
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 +
lib/librte_eal/bsdapp/eal/eal_memory.c | 9 +-
lib/librte_eal/common/eal_common_memory.c | 8 +-
lib/librte_eal/common/eal_common_memzone.c | 8 +-
.../common/include/rte_eal_memconfig.h | 9 +-
lib/librte_eal/common/include/rte_malloc.h | 192 ++++++++
.../common/include/rte_malloc_heap.h | 3 +
lib/librte_eal/common/include/rte_memory.h | 9 +
lib/librte_eal/common/malloc_elem.c | 10 +-
lib/librte_eal/common/malloc_heap.c | 316 +++++++++++--
lib/librte_eal/common/malloc_heap.h | 17 +
lib/librte_eal/common/rte_malloc.c | 429 +++++++++++++++++-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 12 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 27 +-
lib/librte_eal/meson.build | 2 +-
lib/librte_eal/rte_eal_version.map | 8 +
lib/librte_flow_classify/rte_flow_classify.c | 3 +-
lib/librte_mempool/rte_mempool.c | 57 ++-
lib/librte_pipeline/rte_pipeline.c | 3 +-
lib/librte_sched/rte_sched.c | 2 +-
test/test/Makefile | 1 +
test/test/autotest_data.py | 14 +-
test/test/meson.build | 1 +
test/test/test_external_mem.c | 389 ++++++++++++++++
test/test/test_malloc.c | 3 +
test/test/test_memzone.c | 3 +
47 files changed, 1913 insertions(+), 139 deletions(-)
create mode 100644 test/test/test_external_mem.c
--
2.17.1
* [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps
` (3 preceding siblings ...)
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
@ 2018-09-26 11:22 9% ` Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating " Anatoly Burakov
5 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-26 11:22 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko
We will need to refer to external heaps in some way. While we use
heap ID's internally, for external API use it has to be something
more user-friendly. So, we will be using a string to uniquely
identify a heap.
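
As a rough illustration of the naming scheme this patch applies to the
default heaps ("socket_<id>"), the sketch below formats such a name into
a bounded buffer and reports truncation. The helper
toy_default_heap_name() is hypothetical and is not part of the patch.
It passes the full buffer size to snprintf(), which already reserves
room for the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_HEAP_NAME_MAX_LEN 32 /* same limit as RTE_HEAP_NAME_MAX_LEN */

/* Write "socket_<id>" into out; returns 0 on success, -1 if truncated. */
int
toy_default_heap_name(char *out, size_t out_len, int socket_id)
{
	int n;

	assert(out != NULL && out_len > 0);
	n = snprintf(out, out_len, "socket_%i", socket_id);
	/* snprintf reports the length it wanted to write, excluding NUL */
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```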
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 1 +
lib/librte_eal/common/include/rte_malloc_heap.h | 2 ++
lib/librte_eal/common/malloc_heap.c | 17 ++++++++++++++++-
lib/librte_eal/common/rte_malloc.c | 1 +
4 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 6ee236302..5a80e1122 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -124,6 +124,7 @@ ABI Changes
* eal: EAL library ABI version was changed due to previously announced work on
supporting external memory in DPDK. Structure ``rte_memseg_list`` now has
a new flag indicating whether the memseg list refers to external memory.
+ Structure ``rte_malloc_heap`` now has a ``heap_name`` string member.
Removed Items
-------------
diff --git a/lib/librte_eal/common/include/rte_malloc_heap.h b/lib/librte_eal/common/include/rte_malloc_heap.h
index e7ac32d42..1c08ef3e0 100644
--- a/lib/librte_eal/common/include/rte_malloc_heap.h
+++ b/lib/librte_eal/common/include/rte_malloc_heap.h
@@ -12,6 +12,7 @@
/* Number of free lists per heap, grouped by size. */
#define RTE_HEAP_NUM_FREELISTS 13
+#define RTE_HEAP_NAME_MAX_LEN 32
/* dummy definition, for pointers */
struct malloc_elem;
@@ -28,6 +29,7 @@ struct malloc_heap {
unsigned alloc_count;
size_t total_size;
unsigned int socket_id;
+ char name[RTE_HEAP_NAME_MAX_LEN];
} __rte_cache_aligned;
#endif /* _RTE_MALLOC_HEAP_H_ */
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 73e478076..ac89d15a4 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -127,7 +127,6 @@ malloc_add_seg(const struct rte_memseg_list *msl,
malloc_heap_add_memory(heap, found_msl, ms->addr, len);
heap->total_size += len;
- heap->socket_id = msl->socket_id;
RTE_LOG(DEBUG, EAL, "Added %zuM to heap on socket %i\n", len >> 20,
msl->socket_id);
@@ -1020,6 +1019,22 @@ int
rte_eal_malloc_heap_init(void)
{
struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ unsigned int i;
+
+ if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+ /* assign names to default DPDK heaps */
+ for (i = 0; i < rte_socket_count(); i++) {
+ struct malloc_heap *heap = &mcfg->malloc_heaps[i];
+ char heap_name[RTE_HEAP_NAME_MAX_LEN];
+ int socket_id = rte_socket_id_by_idx(i);
+
+ snprintf(heap_name, sizeof(heap_name) - 1,
+ "socket_%i", socket_id);
+ strlcpy(heap->name, heap_name, RTE_HEAP_NAME_MAX_LEN);
+ heap->socket_id = socket_id;
+ }
+ }
+
if (register_mp_requests()) {
RTE_LOG(ERR, EAL, "Couldn't register malloc multiprocess actions\n");
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 9ba1472c3..72632da56 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -202,6 +202,7 @@ rte_malloc_dump_stats(FILE *f, __rte_unused const char *type)
malloc_heap_get_stats(heap, &sock_stats);
fprintf(f, "Heap id:%u\n", heap_id);
+ fprintf(f, "\tHeap name:%s\n", heap->name);
fprintf(f, "\tHeap_size:%zu,\n", sock_stats.heap_totalsz_bytes);
fprintf(f, "\tFree_size:%zu,\n", sock_stats.heap_freesz_bytes);
fprintf(f, "\tAlloc_size:%zu,\n", sock_stats.heap_allocsz_bytes);
--
2.17.1
* [dpdk-dev] [PATCH v1] doc: remove unused release note file
@ 2018-09-25 15:25 3% John McNamara
0 siblings, 0 replies; 200+ results
From: John McNamara @ 2018-09-25 15:25 UTC (permalink / raw)
To: dev; +Cc: John McNamara
Remove unused file from the release notes docs. This file was
used to display a hierarchy in older releases, circa 2015, but
doesn't seem useful in the current structure.
Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
doc/guides/rel_notes/index.rst | 1 -
doc/guides/rel_notes/rel_description.rst | 12 ------------
2 files changed, 13 deletions(-)
delete mode 100644 doc/guides/rel_notes/rel_description.rst
diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 89fdb4b..1243e98 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -8,7 +8,6 @@ Release Notes
:maxdepth: 1
:numbered:
- rel_description
release_18_11
release_18_08
release_18_05
diff --git a/doc/guides/rel_notes/rel_description.rst b/doc/guides/rel_notes/rel_description.rst
deleted file mode 100644
index 8f28556..0000000
--- a/doc/guides/rel_notes/rel_description.rst
+++ /dev/null
@@ -1,12 +0,0 @@
-.. SPDX-License-Identifier: BSD-3-Clause
- Copyright(c) 2010-2015 Intel Corporation.
-
-Description of Release
-======================
-
-This document contains the release notes for Data Plane Development Kit (DPDK)
-release version |release| and previous releases.
-
-It lists new features, fixed bugs, API and ABI changes and known issues.
-
-For instructions on compiling and running the release, see the :ref:`DPDK Getting Started Guide <linux_gsg>`.
--
2.7.5
* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
2018-09-25 12:22 3% ` Luca Boccassi
2018-09-25 12:57 3% ` Thomas Monjalon
@ 2018-09-25 14:34 0% ` Ananyev, Konstantin
2018-10-03 16:18 0% ` Luca Boccassi
1 sibling, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2018-09-25 14:34 UTC (permalink / raw)
To: Luca Boccassi, Thomas Monjalon; +Cc: dev
Hi Luca,
>
> On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> > 24/08/2018 18:47, Konstantin Ananyev:
> > > If user specifies priority=0 for some of ACL rules
> > > that can cause rte_acl_classify to return wrong results.
> > > The reason is that priority zero is used internally for no-match
> > > nodes.
> > > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > > The simplest way to overcome the issue is just not allow zero
> > > to be a valid priority for the rule.
> > >
> > > Fixes: dc276b5780c2 ("acl: new library")
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > Cc: stable@dpdk.org
> >
> > Applied with below title, thanks
> > acl: forbid rule with priority zero
>
> Hi,
>
> This patch is marked for stable, but it changes an enum in a public header
Yes it does.
> so it looks like an ABI breakage? Have I got it wrong?
Strictly speaking - yes, but priority=0 is invalid value with current implementation.
I don't think someone uses it - as in that case acl library simply wouldn't work
correctly.
Konstantin
* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
2018-09-25 12:22 3% ` Luca Boccassi
@ 2018-09-25 12:57 3% ` Thomas Monjalon
2018-09-25 14:34 0% ` Ananyev, Konstantin
1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-09-25 12:57 UTC (permalink / raw)
To: Luca Boccassi, Konstantin Ananyev; +Cc: dev
25/09/2018 14:22, Luca Boccassi:
> On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> > 24/08/2018 18:47, Konstantin Ananyev:
> > > If user specifies priority=0 for some of ACL rules
> > > that can cause rte_acl_classify to return wrong results.
> > > The reason is that priority zero is used internally for no-match
> > > nodes.
> > > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > > The simplest way to overcome the issue is just not allow zero
> > > to be a valid priority for the rule.
> > >
> > > Fixes: dc276b5780c2 ("acl: new library")
> > >
> > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > Cc: stable@dpdk.org
> >
> > Applied with below title, thanks
> > acl: forbid rule with priority zero
>
> Hi,
>
> This patch is marked for stable, but it changes an enum in a public
> header so it looks like an ABI breakage? Have I got it wrong?
- RTE_ACL_MIN_PRIORITY = 0,
+ RTE_ACL_MIN_PRIORITY = 1,
In my understanding, the change is not breaking the ABI because
the old minimal value (0) can still be used, with the same side effect.
The new value is just removing a side effect for newly compiled apps.
Konstantin, am I right?
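
For readers following the enum discussion, here is a minimal sketch of
how a minimum-priority constant might be used in a range check. The
names toy_acl_check_priority(), TOY_ACL_MIN_PRIORITY and
TOY_ACL_MAX_PRIORITY, as well as the upper bound, are assumptions for
illustration. Whether the real library rejects priority 0 at rule-add
time is not stated in this thread. Note that enum constants are
compiled into whichever binary uses them, so an already-built
application keeps the value it was compiled with.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum {
	TOY_ACL_MIN_PRIORITY = 1,             /* 0 before the change above */
	TOY_ACL_MAX_PRIORITY = (1 << 29) - 1, /* assumption: illustrative bound */
};

/* Priority 0 is described in the thread as used for no-match nodes. */
static_assert(TOY_ACL_MIN_PRIORITY > 0, "priority 0 must stay reserved");

/* Returns 0 if the priority is usable for a rule, -EINVAL otherwise. */
int
toy_acl_check_priority(int32_t priority)
{
	if (priority < TOY_ACL_MIN_PRIORITY || priority > TOY_ACL_MAX_PRIORITY)
		return -EINVAL;
	return 0;
}
```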
* Re: [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority
@ 2018-09-25 12:22 3% ` Luca Boccassi
2018-09-25 12:57 3% ` Thomas Monjalon
2018-09-25 14:34 0% ` Ananyev, Konstantin
0 siblings, 2 replies; 200+ results
From: Luca Boccassi @ 2018-09-25 12:22 UTC (permalink / raw)
To: Thomas Monjalon, Konstantin Ananyev; +Cc: dev
On Sun, 2018-09-16 at 11:56 +0200, Thomas Monjalon wrote:
> 24/08/2018 18:47, Konstantin Ananyev:
> > If user specifies priority=0 for some of ACL rules
> > that can cause rte_acl_classify to return wrong results.
> > The reason is that priority zero is used internally for no-match
> > nodes.
> > See more details at: https://bugs.dpdk.org/show_bug.cgi?id=79.
> > The simplest way to overcome the issue is just not allow zero
> > to be a valid priority for the rule.
> >
> > Fixes: dc276b5780c2 ("acl: new library")
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>
> Cc: stable@dpdk.org
>
> Applied with below title, thanks
> acl: forbid rule with priority zero
Hi,
This patch is marked for stable, but it changes an enum in a public
header so it looks like an ABI breakage? Have I got it wrong?
--
Kind regards,
Luca Boccassi
* Re: [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
2018-09-25 9:50 0% ` Thomas Monjalon
@ 2018-09-25 9:56 0% ` Jerin Jacob
0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2018-09-25 9:56 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: Nikhil Rao, dev, stable
-----Original Message-----
> Date: Tue, 25 Sep 2018 11:50:06 +0200
> From: Thomas Monjalon <thomas@monjalon.net>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: Nikhil Rao <nikhil.rao@intel.com>, dev@dpdk.org, stable@dpdk.org
> Subject: Re: [PATCH v2] eventdev: fix port id argument in Rx adapter caps
> API
>
>
> 25/09/2018 11:15, Jerin Jacob:
> > -----Original Message-----
> > > Date: Tue, 25 Sep 2018 14:19:12 +0530
> > > From: Nikhil Rao <nikhil.rao@intel.com>
> > > To: jerin.jacob@caviumnetworks.com
> > > CC: dev@dpdk.org, Nikhil Rao <nikhil.rao@intel.com>, stable@dpdk.org
> > > Subject: [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
> > > X-Mailer: git-send-email 1.8.3.1
> > >
> > >
> > > Make the ethernet port id passed into
> > > rte_event_eth_rx_adapter_caps_get() 16 bit.
> > >
> > > Also, update the event rx adapter test to use 16 bit
> > > ethernet port ids.
> > >
> > > Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > ---
> > >
> > > v2:
> > > * squash changes to autotest and library into a single patch (Jerin Jacob)
> > > * add update to release notes (Jerin Jacob)
> > >
> > > lib/librte_eventdev/rte_eventdev.h | 2 +-
> > > lib/librte_eventdev/rte_eventdev.c | 2 +-
> > > test/test/test_event_eth_rx_adapter.c | 6 +++---
> > > doc/guides/rel_notes/release_18_11.rst | 4 +++-
> > > lib/librte_eventdev/Makefile | 2 +-
> >
> > Missing version update in lib/librte_eventdev/meson.build. See version=
> >
> > > 5 files changed, 9 insertions(+), 7 deletions(-)
> > >
> > > ABI Changes
> > > -----------
> > > @@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
> > > librte_distributor.so.1
> > > librte_eal.so.8
> > > librte_ethdev.so.10
> > > - librte_eventdev.so.4
> > > + + librte_eventdev.so.6
> >
> > Can you send a separate standalone patch to fixup doc/guides/rel_notes/release_18_08.rst
> > release notes. The version(change to librte_eventdev.so.5) should have been
> > updated in change set in 3810ae4357.
> >
> > +Thomas,
> > In case if he has difference in opinion on updating released release note file.
>
> I prefer such changes being atomic.
Me too. But the offending change set(3810ae4357) is old
➜ [master][dpdk.org] $ git describe 3810ae4357
v18.05-389-g3810ae435
Do you prefer to have patch to update the release_18_08.rst file or ignore it?
>
>
* Re: [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
2018-09-25 9:15 0% ` Jerin Jacob
@ 2018-09-25 9:50 0% ` Thomas Monjalon
2018-09-25 9:56 0% ` Jerin Jacob
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-09-25 9:50 UTC (permalink / raw)
To: Jerin Jacob; +Cc: Nikhil Rao, dev, stable
25/09/2018 11:15, Jerin Jacob:
> -----Original Message-----
> > Date: Tue, 25 Sep 2018 14:19:12 +0530
> > From: Nikhil Rao <nikhil.rao@intel.com>
> > To: jerin.jacob@caviumnetworks.com
> > CC: dev@dpdk.org, Nikhil Rao <nikhil.rao@intel.com>, stable@dpdk.org
> > Subject: [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
> > X-Mailer: git-send-email 1.8.3.1
> >
> >
> > Make the ethernet port id passed into
> > rte_event_eth_rx_adapter_caps_get() 16 bit.
> >
> > Also, update the event rx adapter test to use 16 bit
> > ethernet port ids.
> >
> > Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > ---
> >
> > v2:
> > * squash changes to autotest and library into a single patch (Jerin Jacob)
> > * add update to release notes (Jerin Jacob)
> >
> > lib/librte_eventdev/rte_eventdev.h | 2 +-
> > lib/librte_eventdev/rte_eventdev.c | 2 +-
> > test/test/test_event_eth_rx_adapter.c | 6 +++---
> > doc/guides/rel_notes/release_18_11.rst | 4 +++-
> > lib/librte_eventdev/Makefile | 2 +-
>
> Missing version update in lib/librte_eventdev/meson.build. See version=
>
> > 5 files changed, 9 insertions(+), 7 deletions(-)
> >
> > ABI Changes
> > -----------
> > @@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
> > librte_distributor.so.1
> > librte_eal.so.8
> > librte_ethdev.so.10
> > - librte_eventdev.so.4
> > + + librte_eventdev.so.6
>
> Can you send a separate standalone patch to fixup doc/guides/rel_notes/release_18_08.rst
> release notes. The version(change to librte_eventdev.so.5) should have been
> updated in change set in 3810ae4357.
>
> +Thomas,
> In case if he has difference in opinion on updating released release note file.
I prefer such changes being atomic.
* [dpdk-dev] [PATCH v3] eventdev: fix port id argument in Rx adapter caps API
2018-09-25 8:49 4% ` [dpdk-dev] [PATCH v2] " Nikhil Rao
@ 2018-09-25 9:49 4% ` Nikhil Rao
1 sibling, 0 replies; 200+ results
From: Nikhil Rao @ 2018-09-25 9:49 UTC (permalink / raw)
To: jerin.jacob; +Cc: dev, Nikhil Rao, stable
Make the ethernet port id passed into
rte_event_eth_rx_adapter_caps_get() 16 bit.
Also, update the event rx adapter test to use 16 bit
ethernet port ids.
Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v2:
* squash changes to autotest and library into a single patch (Jerin Jacob)
* add update to release notes (Jerin Jacob)
v3:
* update meson.build (Jerin Jacob)
lib/librte_eventdev/rte_eventdev.h | 2 +-
lib/librte_eventdev/rte_eventdev.c | 2 +-
test/test/test_event_eth_rx_adapter.c | 6 +++---
doc/guides/rel_notes/release_18_11.rst | 4 +++-
lib/librte_eventdev/Makefile | 2 +-
lib/librte_eventdev/meson.build | 2 +-
6 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index a24213e..8541109 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -1112,7 +1112,7 @@ struct rte_event {
*
*/
int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
uint32_t *caps);
#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 0a8572b..b1914dc 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -109,7 +109,7 @@
}
int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
uint32_t *caps)
{
struct rte_eventdev *dev;
diff --git a/test/test/test_event_eth_rx_adapter.c b/test/test/test_event_eth_rx_adapter.c
index 4641640..592bcaa 100644
--- a/test/test/test_event_eth_rx_adapter.c
+++ b/test/test/test_event_eth_rx_adapter.c
@@ -32,7 +32,7 @@ struct event_eth_rx_adapter_test_params {
static struct event_eth_rx_adapter_test_params default_params;
static inline int
-port_init_common(uint8_t port, const struct rte_eth_conf *port_conf,
+port_init_common(uint16_t port, const struct rte_eth_conf *port_conf,
struct rte_mempool *mp)
{
const uint16_t rx_ring_size = 512, tx_ring_size = 512;
@@ -94,7 +94,7 @@ struct event_eth_rx_adapter_test_params {
}
static inline int
-port_init_rx_intr(uint8_t port, struct rte_mempool *mp)
+port_init_rx_intr(uint16_t port, struct rte_mempool *mp)
{
static const struct rte_eth_conf port_conf_default = {
.rxmode = {
@@ -110,7 +110,7 @@ struct event_eth_rx_adapter_test_params {
}
static inline int
-port_init(uint8_t port, struct rte_mempool *mp)
+port_init(uint16_t port, struct rte_mempool *mp)
{
static const struct rte_eth_conf port_conf_default = {
.rxmode = {
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 97daad1..842b46b 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -99,6 +99,8 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
+ has been changed from uint8_t to uint16_t.
ABI Changes
-----------
@@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_distributor.so.1
librte_eal.so.8
librte_ethdev.so.10
- librte_eventdev.so.4
+ + librte_eventdev.so.6
librte_flow_classify.so.1
librte_gro.so.1
librte_gso.so.1
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 47f599a..ce800ea 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -8,7 +8,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
LIB = librte_eventdev.a
# library version
-LIBABIVER := 5
+LIBABIVER := 6
# build flags
CFLAGS += -DALLOW_EXPERIMENTAL_API
diff --git a/lib/librte_eventdev/meson.build b/lib/librte_eventdev/meson.build
index 3cbaf29..3c4e510 100644
--- a/lib/librte_eventdev/meson.build
+++ b/lib/librte_eventdev/meson.build
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright(c) 2017 Intel Corporation
-version = 5
+version = 6
allow_experimental_apis = true
if host_machine.system() == 'linux'
--
1.8.3.1
* Re: [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
2018-09-25 8:49 4% ` [dpdk-dev] [PATCH v2] " Nikhil Rao
@ 2018-09-25 9:15 0% ` Jerin Jacob
2018-09-25 9:50 0% ` Thomas Monjalon
0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-09-25 9:15 UTC (permalink / raw)
To: Nikhil Rao; +Cc: dev, stable, thomas
-----Original Message-----
> Date: Tue, 25 Sep 2018 14:19:12 +0530
> From: Nikhil Rao <nikhil.rao@intel.com>
> To: jerin.jacob@caviumnetworks.com
> CC: dev@dpdk.org, Nikhil Rao <nikhil.rao@intel.com>, stable@dpdk.org
> Subject: [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
> X-Mailer: git-send-email 1.8.3.1
>
>
> Make the ethernet port id passed into
> rte_event_eth_rx_adapter_caps_get() 16 bit.
>
> Also, update the event rx adapter test to use 16 bit
> ethernet port ids.
>
> Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
> Cc: stable@dpdk.org
>
> Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>
> v2:
> * squash changes to autotest and library into a single patch (Jerin Jacob)
> * add update to release notes (Jerin Jacob)
>
> lib/librte_eventdev/rte_eventdev.h | 2 +-
> lib/librte_eventdev/rte_eventdev.c | 2 +-
> test/test/test_event_eth_rx_adapter.c | 6 +++---
> doc/guides/rel_notes/release_18_11.rst | 4 +++-
> lib/librte_eventdev/Makefile | 2 +-
Missing version update in lib/librte_eventdev/meson.build. See version=
> 5 files changed, 9 insertions(+), 7 deletions(-)
>
> ABI Changes
> -----------
> @@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
> librte_distributor.so.1
> librte_eal.so.8
> librte_ethdev.so.10
> - librte_eventdev.so.4
> + + librte_eventdev.so.6
Can you send a separate standalone patch to fixup doc/guides/rel_notes/release_18_08.rst
release notes. The version(change to librte_eventdev.so.5) should have been
updated in change set in 3810ae4357.
+Thomas,
In case if he has difference in opinion on updating released release note file.
* [dpdk-dev] [PATCH v2] eventdev: fix port id argument in Rx adapter caps API
@ 2018-09-25 8:49 4% ` Nikhil Rao
2018-09-25 9:15 0% ` Jerin Jacob
2018-09-25 9:49 4% ` [dpdk-dev] [PATCH v3] " Nikhil Rao
1 sibling, 1 reply; 200+ results
From: Nikhil Rao @ 2018-09-25 8:49 UTC (permalink / raw)
To: jerin.jacob; +Cc: dev, Nikhil Rao, stable
Make the ethernet port id passed into
rte_event_eth_rx_adapter_caps_get() 16 bit.
Also, update the event rx adapter test to use 16 bit
ethernet port ids.
Fixes: c2189c907dd1 ("eventdev: make ethdev port identifiers 16-bit")
Cc: stable@dpdk.org
Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
v2:
* squash changes to autotest and library into a single patch (Jerin Jacob)
* add update to release notes (Jerin Jacob)
lib/librte_eventdev/rte_eventdev.h | 2 +-
lib/librte_eventdev/rte_eventdev.c | 2 +-
test/test/test_event_eth_rx_adapter.c | 6 +++---
doc/guides/rel_notes/release_18_11.rst | 4 +++-
lib/librte_eventdev/Makefile | 2 +-
5 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index a24213e..8541109 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -1112,7 +1112,7 @@ struct rte_event {
*
*/
int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
uint32_t *caps);
#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 0a8572b..b1914dc 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -109,7 +109,7 @@
}
int
-rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
+rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint16_t eth_port_id,
uint32_t *caps)
{
struct rte_eventdev *dev;
diff --git a/test/test/test_event_eth_rx_adapter.c b/test/test/test_event_eth_rx_adapter.c
index 4641640..592bcaa 100644
--- a/test/test/test_event_eth_rx_adapter.c
+++ b/test/test/test_event_eth_rx_adapter.c
@@ -32,7 +32,7 @@ struct event_eth_rx_adapter_test_params {
static struct event_eth_rx_adapter_test_params default_params;
static inline int
-port_init_common(uint8_t port, const struct rte_eth_conf *port_conf,
+port_init_common(uint16_t port, const struct rte_eth_conf *port_conf,
struct rte_mempool *mp)
{
const uint16_t rx_ring_size = 512, tx_ring_size = 512;
@@ -94,7 +94,7 @@ struct event_eth_rx_adapter_test_params {
}
static inline int
-port_init_rx_intr(uint8_t port, struct rte_mempool *mp)
+port_init_rx_intr(uint16_t port, struct rte_mempool *mp)
{
static const struct rte_eth_conf port_conf_default = {
.rxmode = {
@@ -110,7 +110,7 @@ struct event_eth_rx_adapter_test_params {
}
static inline int
-port_init(uint8_t port, struct rte_mempool *mp)
+port_init(uint16_t port, struct rte_mempool *mp)
{
static const struct rte_eth_conf port_conf_default = {
.rxmode = {
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 97daad1..842b46b 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -99,6 +99,8 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eventdev: Type of 2nd parameter to ``rte_event_eth_rx_adapter_caps_get()``
+ has been changed from uint8_t to uint16_t.
ABI Changes
-----------
@@ -162,7 +164,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_distributor.so.1
librte_eal.so.8
librte_ethdev.so.10
- librte_eventdev.so.4
+ + librte_eventdev.so.6
librte_flow_classify.so.1
librte_gro.so.1
librte_gso.so.1
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 47f599a..ce800ea 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -8,7 +8,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
LIB = librte_eventdev.a
# library version
-LIBABIVER := 5
+LIBABIVER := 6
# build flags
CFLAGS += -DALLOW_EXPERIMENTAL_API
--
1.8.3.1
* Re: [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes
2018-09-24 17:31 4% ` [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes Ferruh Yigit
@ 2018-09-24 17:12 0% ` David Marchand
0 siblings, 0 replies; 200+ results
From: David Marchand @ 2018-09-24 17:12 UTC (permalink / raw)
To: Ferruh Yigit; +Cc: John McNamara, Marko Kovacevic, dev, Thomas Monjalon
On Mon, Sep 24, 2018 at 7:31 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> Document changes done in
> commit 323e7b667f18 ("ethdev: make default behavior CRC strip on Rx")
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> doc/guides/rel_notes/release_18_11.rst | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index 2f53564a9..41b9cd8d5 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -112,6 +112,12 @@ API Changes
> flag the MAC can be properly configured in any case. This is particularly
> important for bonding.
>
> +* The default behaviour of CRC strip offload changed. Without any specific Rx
> + offload flag, default behavior by PMD is now to strip CRC.
> + DEV_RX_OFFLOAD_CRC_STRIP offload flag has been removed.
> + To request keeping CRC, application should set ``DEV_RX_OFFLOAD_KEEP_CRC`` Rx
> + offload.
> +
>
> ABI Changes
> -----------
Reviewed-by: David Marchand <david.marchand@6wind.com>
--
David Marchand
* [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes
@ 2018-09-24 17:31 4% ` Ferruh Yigit
2018-09-24 17:12 0% ` David Marchand
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-09-24 17:31 UTC (permalink / raw)
To: John McNamara, Marko Kovacevic
Cc: dev, Ferruh Yigit, Thomas Monjalon, david.marchand
Document changes done in
commit 323e7b667f18 ("ethdev: make default behavior CRC strip on Rx")
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 2f53564a9..41b9cd8d5 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -112,6 +112,12 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* The default behaviour of CRC strip offload changed. Without any specific Rx
+ offload flag, default behavior by PMD is now to strip CRC.
+ DEV_RX_OFFLOAD_CRC_STRIP offload flag has been removed.
+ To request keeping CRC, application should set ``DEV_RX_OFFLOAD_KEEP_CRC`` Rx
+ offload.
+
ABI Changes
-----------
--
2.17.1
* [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external
@ 2018-09-21 16:13 16% ` Anatoly Burakov
2018-09-21 16:13 4% ` [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, thomas
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes. This also breaks a few
internal assumptions about memory contiguousness, so adjust
malloc code in a few places.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
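
As an illustration of the pattern applied to those callbacks, here is a
minimal standalone sketch. It does not use the real EAL headers: the
struct, the walker and every name ending in _sketch are stand-ins made
up for this example, and only the "return 0 early if msl->external"
check mirrors what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_memseg_list: only the fields used below. */
struct msl_sketch {
	uint64_t page_sz;      /* page size for all segments in this list */
	int socket_id;         /* socket ID for all segments in this list */
	unsigned int external; /* 1 if this list points to external memory */
	unsigned int count;    /* number of segments currently in use */
};

typedef int (*msl_walk_fn_sketch)(const struct msl_sketch *msl, void *arg);

/* Hypothetical walker: calls fn on every list, stops on non-zero return. */
static int
msl_walk_sketch(const struct msl_sketch *lists, size_t n,
		msl_walk_fn_sketch fn, void *arg)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = fn(&lists[i], arg);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Callback that only accounts for internal (non-external) memory. */
static int
internal_size_cb_sketch(const struct msl_sketch *msl, void *arg)
{
	uint64_t *total_len = arg;

	/* same early-out the patch adds to existing callbacks */
	if (msl->external)
		return 0;

	*total_len += (uint64_t)msl->count * msl->page_sz;
	return 0;
}

static uint64_t
internal_size_sketch(const struct msl_sketch *lists, size_t n)
{
	uint64_t total_len = 0;

	msl_walk_sketch(lists, n, internal_size_cb_sketch, &total_len);
	return total_len;
}
```

Returning 0 from the callback lets the walk continue, so external
lists are silently skipped rather than aborting the iteration.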
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
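
The matching rule can be sketched outside of DPDK as follows. This is
only an approximation of the logic under stated assumptions: the struct
is a stand-in for rte_memseg_list, SKETCH_SOCKET_ID_ANY stands in for
SOCKET_ID_ANY, and the fallback argument plays the role of the system
page size used when nothing matches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* stand-in for SOCKET_ID_ANY */

/* Stand-in for struct rte_memseg_list: only the fields used below. */
struct msl_sketch {
	uint64_t page_sz;      /* page size for all segments in this list */
	int socket_id;         /* socket ID for all segments in this list */
	unsigned int external; /* 1 if this list points to external memory */
};

/*
 * Smallest page size usable for the given socket ID. An exact socket
 * match counts for both native and external memory; the "any socket"
 * value only considers native memory.
 */
static size_t
min_page_size_sketch(const struct msl_sketch *lists, size_t n,
		int socket_id, size_t fallback)
{
	size_t min = SIZE_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct msl_sketch *msl = &lists[i];
		int valid;

		valid = msl->socket_id == socket_id;
		valid |= socket_id == SKETCH_SOCKET_ID_ANY && !msl->external;

		if (valid && msl->page_sz < min)
			min = (size_t)msl->page_sz;
	}
	/* nothing matched: fall back to the caller-supplied page size */
	return min == SIZE_MAX ? fallback : min;
}
```

With this rule, a small page size belonging to an external heap no
longer affects mempools that were not created on that heap's socket ID.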
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 12 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 7 ++--
drivers/net/mlx4/mlx4_mr.c | 3 ++
drivers/net/mlx5/mlx5.c | 5 ++-
drivers/net/mlx5/mlx5_mr.c | 3 ++
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_elem.c | 10 ++++--
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/common/rte_malloc.c | 2 +-
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
24 files changed, 133 insertions(+), 44 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..e96ec9b43 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -107,6 +114,9 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK.
+
Removed Items
-------------
@@ -152,7 +162,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
}
static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *n_segs = arg;
int ret;
+ if (msl->external)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
+
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_elem.c b/lib/librte_eal/common/malloc_elem.c
index e0a8ed15b..1a74660de 100644
--- a/lib/librte_eal/common/malloc_elem.c
+++ b/lib/librte_eal/common/malloc_elem.c
@@ -39,10 +39,14 @@ malloc_elem_find_max_iova_contig(struct malloc_elem *elem, size_t align)
contig_seg_start = RTE_PTR_ALIGN_CEIL(data_start, align);
/* if we're in IOVA as VA mode, or if we're in legacy mode with
- * hugepages, all elements are IOVA-contiguous.
+ * hugepages, all elements are IOVA-contiguous. however, we can only
+ * make these assumptions about internal memory - externally allocated
+ * segments have to be checked.
*/
- if (rte_eal_iova_mode() == RTE_IOVA_VA ||
- (internal_config.legacy_mem && rte_eal_has_hugepages()))
+ if (!elem->msl->external &&
+ (rte_eal_iova_mode() == RTE_IOVA_VA ||
+ (internal_config.legacy_mem &&
+ rte_eal_has_hugepages())))
return RTE_PTR_DIFF(data_end, contig_seg_start);
cur_page = RTE_PTR_ALIGN_FLOOR(contig_seg_start, page_sz);
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b51a6d111..47ca5a742 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -223,7 +223,7 @@ rte_malloc_virt2iova(const void *addr)
if (elem == NULL)
return RTE_BAD_IOVA;
- if (rte_eal_iova_mode() == RTE_IOVA_VA)
+ if (!elem->msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
return (uintptr_t) addr;
ms = rte_mem_virt2memseg(addr, elem->msl);
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
* [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID
2018-09-21 16:13 16% ` [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-21 16:13 4% ` Anatoly Burakov
2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-21 16:13 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko
We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
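
A rough standalone sketch of the relaxed check, for illustration only.
The function name and the SKETCH_* constants are made up here (in DPDK
the limit is the build-time RTE_MAX_NUMA_NODES, and 8 is just a
placeholder value); the two conditions follow the memzone and malloc
heap changes in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_SOCKET_ID_ANY (-1) /* stand-in for SOCKET_ID_ANY */
#define SKETCH_MAX_NUMA_NODES 8   /* placeholder for RTE_MAX_NUMA_NODES */

/*
 * Decide which socket ID is handed to the allocator.
 * Returns -1 if the request is rejected outright, 0 otherwise.
 */
static int
resolve_socket_sketch(int socket_id, bool has_hugepages, int *out)
{
	/* only negative values other than "any" are rejected up front */
	if (socket_id != SKETCH_SOCKET_ID_ANY && socket_id < 0)
		return -1;

	/*
	 * without hugepages, NUMA-range socket IDs collapse to "any";
	 * IDs at or above the NUMA limit may name an external heap and
	 * are passed through for the memory subsystem to validate
	 */
	if (!has_hugepages && socket_id < SKETCH_MAX_NUMA_NODES)
		socket_id = SKETCH_SOCKET_ID_ANY;

	*out = socket_id;
	return 0;
}
```

The upper-bound check is deliberately absent: whether a large socket ID
is valid is left to the heap lookup, not to the caller-facing API.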
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e96ec9b43..63bbb1b51 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index 73d6df31d..9ba1472c3 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
2018-09-12 20:37 2% ` Honnappa Nagarahalli
@ 2018-09-20 19:50 0% ` Michel Machado
0 siblings, 0 replies; 200+ results
From: Michel Machado @ 2018-09-20 19:50 UTC (permalink / raw)
To: Honnappa Nagarahalli, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
nd, yipeng1.wang
On 09/12/2018 04:37 PM, Honnappa Nagarahalli wrote:
>>> +int32_t
>>> +rte_hash_iterator_init(const struct rte_hash *h,
>>> + struct rte_hash_iterator_state *state) {
>>> + struct rte_hash_iterator_istate *__state;
>>> '__state' can be replaced by 's'.
>>>
>>> +
>>> + RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>>> +
>>> + __state = (struct rte_hash_iterator_istate *)state;
>>> + __state->h = h;
>>> + __state->next = 0;
>>> + __state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>>> +
>>> + return 0;
>>> +}
>>> IMO, creating this API can be avoided if the initialization is handled in the 'rte_hash_iterate' function. The cost of doing this is trivial (one extra 'if' statement in 'rte_hash_iterate'). It will help keep the number of APIs to a minimum.
>>
>> Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initialize the fields of a state only once?
>>
>> My concern is about creating another API for every iterator API. You have a valid point on saving cycles, as this API applies to the data plane. Have you done any performance benchmarking with and without this API? Maybe we can guide our decision based on that.
>
> It's not just about creating one init function for each iterator because an iterator may have a couple of init functions. For example, someone may eventually find it useful to add another init function for the conflicting-entry iterator that we are advocating in this patch. A possibility would be for this new init function to use the key of the new entry instead of its signature to initialize the state, similar to what is already done in the rte_hash_lookup*() functions. In spite of possibly having multiple init functions, there will be a single iterator function.
>
> About the performance benchmarking, the current API only requires applications to initialize a single 32-bit integer. But with the adoption of a struct for the state, the initialization will grow to 64 bytes.
>
> As my tests showed, I do not see any impact of this.
Ok, we are going to eliminate the init functions in v4.
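To make the "one extra 'if' statement" idea above concrete, here is a minimal sketch of an iterator that initializes its own state on first use. It is not the rte_hash implementation: the toy_* names, the fixed 8-slot table and the convention that the caller zero-fills the state before the first call are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_SLOTS 8 /* hypothetical table size */

/* Hypothetical stand-in for a hash table: a flat array of slots. */
struct toy_table {
	int used[TOY_SLOTS];
	uint32_t value[TOY_SLOTS];
};

/* Opaque state handed to the caller; must be zero-filled before first use. */
struct toy_iter_state {
	uint8_t space[64];
};

/* Private layout stored inside the opaque buffer (assumed, not DPDK's). */
struct toy_iter_istate {
	uint32_t next;
	uint32_t total_entries;
};

_Static_assert(sizeof(struct toy_iter_istate) <= sizeof(struct toy_iter_state),
	"internal state must fit in the opaque buffer");

/* Returns the slot index of the next used entry, or -ENOENT at the end. */
static int
toy_iterate(const struct toy_table *t, struct toy_iter_state *state,
	uint32_t *value)
{
	struct toy_iter_istate s;
	int idx = -ENOENT;

	if (t == NULL || state == NULL || value == NULL)
		return -EINVAL;

	/* Copy in and out with memcpy to avoid alignment/aliasing problems. */
	memcpy(&s, state->space, sizeof(s));

	/* The one extra 'if': a zeroed state means "not initialized yet". */
	if (s.total_entries == 0) {
		s.next = 0;
		s.total_entries = TOY_SLOTS;
	}
	assert(s.next <= s.total_entries);

	/* Skip empty slots. */
	while (s.next < s.total_entries && !t->used[s.next])
		s.next++;

	if (s.next < s.total_entries) {
		*value = t->value[s.next];
		idx = (int)s.next;
		s.next++;
	}

	memcpy(state->space, &s, sizeof(s));
	return idx;
}
```

A caller would memset() the state to zero once and then call toy_iterate() in a loop until it returns -ENOENT. The trade-off discussed in the thread still applies: the check runs on every call, whereas a separate init function pays the cost only once.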
>>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>>> index 9e7d9315f..fdb01023e 100644
>>> --- a/lib/librte_hash/rte_hash.h
>>> +++ b/lib/librte_hash/rte_hash.h
>>> @@ -14,6 +14,8 @@
>>> #include <stdint.h>
>>> #include <stddef.h>
>>>
>>> +#include <rte_compat.h>
>>> +
>>> #ifdef __cplusplus
>>> extern "C" {
>>> #endif
>>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>>> /** @internal A hash table structure. */
>>> struct rte_hash;
>>>
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * @internal A hash table iterator state structure.
>>> + */
>>> +struct rte_hash_iterator_state {
>>> + uint8_t space[64];
>>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
>>
>> Okay.
>
> I think we should not replace 64 with RTE_CACHE_LINE_SIZE because the ABI would change based on the architecture for which it's compiled.
>
> Ok. Maybe have a #define for 64?
Ok.
[ ]'s
Michel Machado
^ permalink raw reply [relevance 0%]
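The point made above about 64 versus RTE_CACHE_LINE_SIZE can be sketched as follows: a named constant keeps the opaque state the same size on every architecture, and compile-time checks catch a private layout that outgrows it. The TOY_* and toy_* names and the private layout are hypothetical; only the 64-byte size comes from the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a plain constant, so the ABI does not vary by target. */
#define TOY_ITER_STATE_SIZE 64

/* Public, opaque state: the application only reserves the bytes. */
struct toy_iterator_state {
	uint8_t space[TOY_ITER_STATE_SIZE];
};

/* Private layout kept inside the library (assumed for illustration). */
struct toy_iterator_istate {
	const void *h;
	uint32_t next;
	uint32_t total_entries;
};

/* Fail the build if the public size drifts or the private state outgrows it. */
static_assert(sizeof(struct toy_iterator_state) == TOY_ITER_STATE_SIZE,
	"public iterator state size is part of the ABI");
static_assert(sizeof(struct toy_iterator_istate) <= TOY_ITER_STATE_SIZE,
	"private iterator state must fit in the public buffer");

/* Size that applications see, identical on every architecture. */
static size_t
toy_iterator_state_size(void)
{
	return sizeof(struct toy_iterator_state);
}
```

Had the array been sized with a per-architecture cache line constant, the same application binary could see different structure sizes depending on how the library was built, which is the ABI concern raised in the reply.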
* [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config
2018-09-19 8:56 3% ` Thomas Monjalon
@ 2018-09-20 15:41 17% ` Anatoly Burakov
2018-10-03 22:05 0% ` Thomas Monjalon
1 sibling, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-09-20 15:41 UTC (permalink / raw)
To: dev; +Cc: John McNamara, Marko Kovacevic, thomas
Currently, command-line switches for legacy mem mode or single-file
segments mode are only stored in internal config. This leads to a
situation where these flags have to always match between primary
and secondary, which is bad for usability.
Fix this by storing these flags in the shared config as well, so
that secondary process can know if the primary was launched in
single-file segments or legacy mem mode.
This bumps the EAL ABI, however there's an EAL deprecation notice
already in place[1] for a different feature, so that's OK.
[1] http://patches.dpdk.org/patch/43502/
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
v2:
- Added documentation on ABI break
doc/guides/rel_notes/rel_description.rst | 5 +++++
doc/guides/rel_notes/release_18_11.rst | 6 +++++-
.../common/include/rte_eal_memconfig.h | 4 ++++
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 20 +++++++++++++++++++
lib/librte_eal/meson.build | 2 +-
6 files changed, 36 insertions(+), 3 deletions(-)
diff --git a/doc/guides/rel_notes/rel_description.rst b/doc/guides/rel_notes/rel_description.rst
index 8f285566f..3fd289939 100644
--- a/doc/guides/rel_notes/rel_description.rst
+++ b/doc/guides/rel_notes/rel_description.rst
@@ -10,3 +10,8 @@ release version |release| and previous releases.
It lists new features, fixed bugs, API and ABI changes and known issues.
For instructions on compiling and running the release, see the :ref:`DPDK Getting Started Guide <linux_gsg>`.
+
+* eal: new ABI version for EAL library due to adding ``legacy_mem`` and
+ ``single_file_segments`` values to ``rte_config`` structure on account of
+ improving DPDK usability when using either ``--legacy-mem`` or
+ ``--single-file-segments`` flags.
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..34acf01d9 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -83,6 +83,10 @@ ABI Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* eal: added ``legacy_mem`` and ``single_file_segments`` values to
+ ``rte_config`` structure on account of improving DPDK usability when
+ using either ``--legacy-mem`` or ``--single-file-segments`` flags.
+
Removed Items
-------------
@@ -129,7 +133,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index aff0688dd..62a21c2dc 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -77,6 +77,10 @@ struct rte_mem_config {
* exact same address the primary process maps it.
*/
uint64_t mem_cfg_addr;
+
+ /* legacy mem and single file segments options are shared */
+ uint32_t legacy_mem;
+ uint32_t single_file_segments;
} __attribute__((__packed__));
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..4a55d3b69 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -352,6 +352,24 @@ eal_proc_type_detect(void)
return ptype;
}
+/* copies data from internal config to shared config */
+static void
+eal_update_mem_config(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ mcfg->legacy_mem = internal_config.legacy_mem;
+ mcfg->single_file_segments = internal_config.single_file_segments;
+}
+
+/* copies data from shared config to internal config */
+static void
+eal_update_internal_config(void)
+{
+ struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+ internal_config.legacy_mem = mcfg->legacy_mem;
+ internal_config.single_file_segments = mcfg->single_file_segments;
+}
+
/* Sets up rte_config structure with the pointer to shared memory config.*/
static void
rte_config_init(void)
@@ -361,11 +379,13 @@ rte_config_init(void)
switch (rte_config.process_type){
case RTE_PROC_PRIMARY:
rte_eal_config_create();
+ eal_update_mem_config();
break;
case RTE_PROC_SECONDARY:
rte_eal_config_attach();
rte_eal_mcfg_wait_complete(rte_config.mem_config);
rte_eal_config_reattach();
+ eal_update_internal_config();
break;
case RTE_PROC_AUTO:
case RTE_PROC_INVALID:
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
--
2.17.1
^ permalink raw reply [relevance 17%]
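The flow in the patch above (primary publishes its memory mode flags, secondary adopts them) can be reduced to a small standalone sketch. The toy_* types and function below are hypothetical stand-ins for the EAL structures, not the EAL code itself, and the shared structure is an ordinary variable here rather than a mapped file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_proc_type {
	TOY_PROC_PRIMARY,
	TOY_PROC_SECONDARY
};

/* Stand-in for the shared memory config visible to all processes. */
struct toy_mem_config {
	uint32_t legacy_mem;
	uint32_t single_file_segments;
};

/* Stand-in for the per-process internal config filled from the command line. */
struct toy_internal_config {
	uint32_t legacy_mem;
	uint32_t single_file_segments;
};

/*
 * Primary: copy command-line flags into the shared config.
 * Secondary: overwrite local flags with whatever the primary published.
 */
static void
toy_config_init(enum toy_proc_type type, struct toy_internal_config *internal,
	struct toy_mem_config *shared)
{
	assert(internal != NULL && shared != NULL);

	switch (type) {
	case TOY_PROC_PRIMARY:
		shared->legacy_mem = internal->legacy_mem;
		shared->single_file_segments = internal->single_file_segments;
		break;
	case TOY_PROC_SECONDARY:
		internal->legacy_mem = shared->legacy_mem;
		internal->single_file_segments = shared->single_file_segments;
		break;
	}
}
```

With this arrangement a secondary started without any memory mode switch still ends up with the primary's settings, which is the usability improvement the commit message describes.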
* [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID
2018-09-20 11:36 16% ` [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-20 11:36 4% ` Anatoly Burakov
2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs, arybchenko
We will be assigning "invalid" socket ID's to external heap, and
malloc will now be able to verify if a supplied socket ID is in
fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e96ec9b43..63bbb1b51 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -98,6 +98,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 1d1e35708..73e478076 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -647,7 +647,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index dfcdf380a..458c44ba6 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
^ permalink raw reply [relevance 4%]
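As a rough illustration of the relaxed checks in the patch above, the sketch below separates the two decisions: which socket arguments are rejected up front, and when a request is collapsed to "any socket". The toy_* helpers and TOY_* constants are hypothetical; the value 8 for the NUMA node limit is only an example, and the real range of external heap IDs is decided by the memory subsystem.

```c
#include <assert.h>
#include <errno.h>

#define TOY_SOCKET_ID_ANY (-1)
#define TOY_MAX_NUMA_NODES 8 /* illustrative value, assumed for this sketch */

/*
 * Up-front parameter check: only negative IDs other than "any" are rejected.
 * IDs at or above the NUMA node limit are no longer refused here, because
 * they may name an external heap; the allocator decides later.
 */
static int
toy_check_socket_arg(int socket_id)
{
	if (socket_id != TOY_SOCKET_ID_ANY && socket_id < 0)
		return -EINVAL;
	return 0;
}

/*
 * Without hugepages, requests for a real NUMA socket fall back to "any",
 * but requests that name an external heap keep their ID.
 */
static int
toy_effective_socket(int socket_id, int has_hugepages)
{
	if (!has_hugepages && socket_id < TOY_MAX_NUMA_NODES)
		return TOY_SOCKET_ID_ANY;
	return socket_id;
}
```

For example, a made-up external heap ID such as 300 passes toy_check_socket_arg() and is preserved by toy_effective_socket() even when hugepages are unavailable, while socket 3 is turned into TOY_SOCKET_ID_ANY in that case.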
* [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external
@ 2018-09-20 11:36 16% ` Anatoly Burakov
2018-09-20 11:36 4% ` [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-20 11:36 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, thomas
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
Notes:
v3:
- Add comment to explain the process of picking up minimum
page sizes for mempool
v2:
- Add documentation changes and ABI break
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 --------
doc/guides/rel_notes/release_18_11.rst | 12 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 7 ++--
drivers/net/mlx4/mlx4_mr.c | 3 ++
drivers/net/mlx5/mlx5.c | 5 ++-
drivers/net/mlx5/mlx5_mr.c | 3 ++
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 ++--
lib/librte_eal/common/eal_common_memory.c | 3 ++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 +++++
lib/librte_eal/common/malloc_heap.c | 9 +++--
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 +++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 ++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 35 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
22 files changed, 125 insertions(+), 40 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 138335dfb..d2aec64d1 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index bc9b74ec4..e96ec9b43 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -91,6 +91,13 @@ API Changes
flag the MAC can be properly configured in any case. This is particularly
important for bonding.
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -107,6 +114,9 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK.
+
Removed Items
-------------
@@ -152,7 +162,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
}
static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *n_segs = arg;
int ret;
+ if (msl->external)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 30d4e70a7..c90e1d8ce 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
+
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index d1be82162..91cd545b2 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 30d018209..a2461ed79 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index 14bd277a4..ffdd56bfb 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index ac7bbb3ba..3c8e2063b 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -754,8 +757,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index 71a6e0fd9..f6a0098af 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1408,6 +1408,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1456,6 +1459,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1509,6 +1515,9 @@ fd_list_create_walk(const struct rte_memseg_list *msl,
unsigned int len;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..2ed539f01 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,44 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ /*
+ * we need to only look at page sizes available for a particular socket
+ * ID. so, we either need an exact match on socket ID (can match both
+ * native and external memory), or, if SOCKET_ID_ANY was specified as a
+ * socket ID argument, we must only look at native memory and ignore any
+ * page sizes associated with external memory.
+ */
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (valid && msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +489,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
^ permalink raw reply [relevance 16%]
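The mempool page size selection in the patch above can be tried out in isolation with the sketch below. It mirrors the matching rule from find_min_pagesz() but walks a plain array instead of calling rte_memseg_list_walk(); the toy_* types, the array walk and the explicit fallback argument are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SOCKET_ID_ANY (-1)

/* Reduced stand-in for a memseg list: only the fields the walk needs. */
struct toy_memseg_list {
	int socket_id;
	uint64_t page_sz;
	unsigned int external; /* 1 if this list points to external memory */
};

struct toy_pagesz_walk_arg {
	int socket_id;
	size_t min;
};

/* Same rule as the patch: exact socket match, or ANY restricted to internal. */
static int
toy_find_min_pagesz(const struct toy_memseg_list *msl, void *arg)
{
	struct toy_pagesz_walk_arg *wa = arg;
	bool valid;

	valid = msl->socket_id == wa->socket_id;
	valid |= wa->socket_id == TOY_SOCKET_ID_ANY && msl->external == 0;

	if (valid && msl->page_sz < wa->min)
		wa->min = (size_t)msl->page_sz;

	return 0;
}

/* Walks an array of lists; 'fallback' plays the role of getpagesize(). */
static size_t
toy_get_min_page_size(const struct toy_memseg_list *lists, size_t n,
	int socket_id, size_t fallback)
{
	struct toy_pagesz_walk_arg wa = { .socket_id = socket_id, .min = SIZE_MAX };
	size_t i;

	for (i = 0; i < n; i++)
		toy_find_min_pagesz(&lists[i], &wa);

	return wa.min == SIZE_MAX ? fallback : wa.min;
}
```

Note that an exact socket ID match accepts external lists too, so a mempool created on an external heap sees that heap's page size, while a request for any socket only considers internal memory.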
* Re: [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
2018-09-20 9:30 0% ` Andrew Rybchenko
@ 2018-09-20 9:54 0% ` Burakov, Anatoly
0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-09-20 9:54 UTC (permalink / raw)
To: Andrew Rybchenko, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
thomas
On 20-Sep-18 10:30 AM, Andrew Rybchenko wrote:
> On 9/19/18 4:56 PM, Anatoly Burakov wrote:
>> When we allocate and use DPDK memory, we need to be able to
>> differentiate between DPDK hugepage segments and segments that
>> were made part of DPDK but are externally allocated. Add such
>> a property to memseg lists.
>>
>> This breaks the ABI, so bump the EAL library ABI version and
>> document the change in release notes.
>>
>> All current calls for memseg walk functions were adjusted to
>> ignore external segments where it made sense.
>>
>> Mempools is a special case, because we may be asked to allocate
>> a mempool on a specific socket, and we need to ignore all page
>> sizes on other heaps or other sockets. Previously, this
>> assumption of knowing all page sizes was not a problem, but it
>> will be now, so we have to match socket ID with page size when
>> calculating minimum page size for a mempool.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>
> A couple of minor questions/suggestions below, but it is OK to
> go as is even if rejected.
>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>
> <...>
>
>> diff --git a/lib/librte_mempool/rte_mempool.c
>> b/lib/librte_mempool/rte_mempool.c
>> index 03e6b5f73..d61c77da3 100644
>> --- a/lib/librte_mempool/rte_mempool.c
>> +++ b/lib/librte_mempool/rte_mempool.c
>> @@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned
>> obj_size)
>> return new_obj_size * RTE_MEMPOOL_ALIGN;
>> }
>> +struct pagesz_walk_arg {
>> + int socket_id;
>> + size_t min;
>> +};
>> +
>> static int
>> find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
>> {
>> - size_t *min = arg;
>> + struct pagesz_walk_arg *wa = arg;
>> + bool valid;
>> - if (msl->page_sz < *min)
>> - *min = msl->page_sz;
>> + valid = msl->socket_id == wa->socket_id;
>
> Is it intended that we accept an externally allocated segment
> if it is on the requested socket? If so, it would be good to add
> a comment to explain why.
Accepting externally allocated segments is precisely the point here - we
want to find the page size of the underlying memory, regardless of whether
it's internal or external. We use the socket ID to identify valid page sizes
for a particular heap (since the socket ID is technically a heap identifier,
as far as external code is concerned), but within that heap there can be
multiple segment lists corresponding to that socket ID, each with its
own page size.
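
A minimal sketch of the selection rule described above, assuming a reduced
stand-in for the memseg list structure. Every name prefixed with sketch_ or
SKETCH_ is hypothetical and not part of DPDK; the loop in
sketch_min_page_size() only stands in for rte_memseg_list_walk(), and the
fallback argument stands in for getpagesize(). The callback uses the
single-return form suggested in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DPDK's SOCKET_ID_ANY; the value is an assumption of this sketch. */
#define SKETCH_SOCKET_ID_ANY (-1)

/* Reduced, hypothetical model of struct rte_memseg_list. */
struct sketch_msl {
	int socket_id;         /* heap identifier as seen by callers */
	uint64_t page_sz;      /* page size of all segments in this list */
	unsigned int external; /* 1 if the list points to external memory */
};

struct sketch_walk_arg {
	int socket_id; /* heap we are sizing pages for */
	size_t min;    /* smallest page size seen so far */
};

/* Walk callback: track the smallest page size among lists valid for the heap. */
int
sketch_find_min_pagesz(const struct sketch_msl *msl, void *arg)
{
	struct sketch_walk_arg *wa = arg;
	int valid;

	assert(msl != NULL && wa != NULL);

	/* a list on the requested socket counts, external or not */
	valid = msl->socket_id == wa->socket_id;
	/* with "any socket", only internal lists count */
	valid |= wa->socket_id == SKETCH_SOCKET_ID_ANY && msl->external == 0;

	if (valid && msl->page_sz < wa->min)
		wa->min = (size_t)msl->page_sz;

	return 0;
}

/* Hypothetical driver standing in for rte_memseg_list_walk(). */
size_t
sketch_min_page_size(const struct sketch_msl *lists, size_t n, int socket_id,
		size_t fallback)
{
	struct sketch_walk_arg wa = { .socket_id = socket_id, .min = SIZE_MAX };
	size_t i;

	for (i = 0; i < n; i++)
		sketch_find_min_pagesz(&lists[i], &wa);

	/* no matching list: fall back to the caller-supplied system page size */
	return wa.min == SIZE_MAX ? fallback : wa.min;
}
```

With two internal lists on socket 0 (2M and 1G pages) and one external list
on a made-up heap ID, a request for socket 0 yields 2M, a request for the
external heap ID yields that heap's own page size, and SOCKET_ID_ANY ignores
the external list.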
>
>> + valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
>> +
>> + if (!valid)
>> + return 0;
>> +
>> + if (msl->page_sz < wa->min)
>> + wa->min = msl->page_sz;
>
> I'd suggest keeping a single return (it is just a bit shorter):
>     if (valid && msl->page_sz < wa->min)
>         wa->min = msl->page_sz;
Sure. If there are other comments that warrant a v3 respin, I'll
incorporate this feedback :)
Thanks for the review!
>
> <...>
>
--
Thanks,
Anatoly
* Re: [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
2018-09-19 13:56 16% ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-20 9:30 0% ` Andrew Rybchenko
2018-09-20 9:54 0% ` Burakov, Anatoly
0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2018-09-20 9:30 UTC (permalink / raw)
To: Anatoly Burakov, dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, laszlo.madarassy, laszlo.vadkerti, andras.kovacs,
winnie.tian, daniel.andrasi, janos.kobor, geza.koblo,
srinath.mannam, scott.branden, ajit.khaparde, keith.wiles,
thomas
On 9/19/18 4:56 PM, Anatoly Burakov wrote:
> When we allocate and use DPDK memory, we need to be able to
> differentiate between DPDK hugepage segments and segments that
> were made part of DPDK but are externally allocated. Add such
> a property to memseg lists.
>
> This breaks the ABI, so bump the EAL library ABI version and
> document the change in release notes.
>
> All current calls for memseg walk functions were adjusted to
> ignore external segments where it made sense.
>
> Mempools is a special case, because we may be asked to allocate
> a mempool on a specific socket, and we need to ignore all page
> sizes on other heaps or other sockets. Previously, this
> assumption of knowing all page sizes was not a problem, but it
> will be now, so we have to match socket ID with page size when
> calculating minimum page size for a mempool.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
A couple of minor questions/suggestions below, but it is OK to
go as is even if rejected.
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
<...>
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 03e6b5f73..d61c77da3 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
> return new_obj_size * RTE_MEMPOOL_ALIGN;
> }
>
> +struct pagesz_walk_arg {
> + int socket_id;
> + size_t min;
> +};
> +
> static int
> find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
> {
> - size_t *min = arg;
> + struct pagesz_walk_arg *wa = arg;
> + bool valid;
>
> - if (msl->page_sz < *min)
> - *min = msl->page_sz;
> + valid = msl->socket_id == wa->socket_id;
Is it intended that we accept an externally allocated segment
if it is on the requested socket? If so, it would be good to add
a comment to explain why.
> + valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
> +
> + if (!valid)
> + return 0;
> +
> + if (msl->page_sz < wa->min)
> + wa->min = msl->page_sz;
I'd suggest keeping a single return (it is just a bit shorter):
    if (valid && msl->page_sz < wa->min)
        wa->min = msl->page_sz;
<...>
* [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external
@ 2018-09-19 13:56 16% ` Anatoly Burakov
2018-09-20 9:30 0% ` Andrew Rybchenko
2018-09-19 13:56 4% ` [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID Anatoly Burakov
1 sibling, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
To: dev
Cc: Neil Horman, John McNamara, Marko Kovacevic, Hemant Agrawal,
Shreyansh Jain, Matan Azrad, Shahaf Shuler, Yongseok Koh,
Maxime Coquelin, Tiwei Bie, Zhihong Wang, Bruce Richardson,
Olivier Matz, Andrew Rybchenko, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, thomas
When we allocate and use DPDK memory, we need to be able to
differentiate between DPDK hugepage segments and segments that
were made part of DPDK but are externally allocated. Add such
a property to memseg lists.
This breaks the ABI, so bump the EAL library ABI version and
document the change in release notes.
All current calls for memseg walk functions were adjusted to
ignore external segments where it made sense.
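
The adjustment applied to most walk callbacks can be sketched as below,
modelled on the physmem_size() change in this patch. This is only an
illustration under stated assumptions: struct sketch_seg_list is a reduced,
hypothetical stand-in for struct rte_memseg_list, and the loop in
sketch_internal_mem_size() stands in for rte_memseg_list_walk().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical model of struct rte_memseg_list. */
struct sketch_seg_list {
	uint64_t page_sz;      /* page size of all segments in this list */
	size_t count;          /* number of segments currently in use */
	unsigned int external; /* 1 if the list points to external memory */
};

/* Walk callback: accumulate the size of internal DPDK memory only. */
int
sketch_physmem_size(const struct sketch_seg_list *msl, void *arg)
{
	uint64_t *total_len = arg;

	assert(msl != NULL && total_len != NULL);

	/* skip externally allocated lists; returning 0 continues the walk */
	if (msl->external)
		return 0;

	*total_len += (uint64_t)msl->count * msl->page_sz;
	return 0;
}

/* Hypothetical driver standing in for rte_memseg_list_walk(). */
uint64_t
sketch_internal_mem_size(const struct sketch_seg_list *lists, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sketch_physmem_size(&lists[i], &total);

	return total;
}
```

Returning 0 from the callback, rather than a nonzero value, matters here:
it skips the external list without terminating the walk.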
Mempools are a special case, because we may be asked to allocate
a mempool on a specific socket, and we need to ignore all page
sizes on other heaps or other sockets. Previously, this
assumption of knowing all page sizes was not a problem, but it
will be now, so we have to match socket ID with page size when
calculating minimum page size for a mempool.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
Notes:
v1:
- Adjust all calls to memseg walk functions to ignore external
segments where it made sense to do so
doc/guides/rel_notes/deprecation.rst | 15 ---------
doc/guides/rel_notes/release_18_11.rst | 12 ++++++-
drivers/bus/fslmc/fslmc_vfio.c | 7 +++--
drivers/net/mlx4/mlx4_mr.c | 3 ++
drivers/net/mlx5/mlx5.c | 5 ++-
drivers/net/mlx5/mlx5_mr.c | 3 ++
drivers/net/virtio/virtio_user/vhost_kernel.c | 5 ++-
lib/librte_eal/bsdapp/eal/Makefile | 2 +-
lib/librte_eal/bsdapp/eal/eal.c | 3 ++
lib/librte_eal/bsdapp/eal/eal_memory.c | 7 +++--
lib/librte_eal/common/eal_common_memory.c | 4 +++
.../common/include/rte_eal_memconfig.h | 1 +
lib/librte_eal/common/include/rte_memory.h | 9 ++++++
lib/librte_eal/common/malloc_heap.c | 9 ++++--
lib/librte_eal/linuxapp/eal/Makefile | 2 +-
lib/librte_eal/linuxapp/eal/eal.c | 10 +++++-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 9 ++++++
lib/librte_eal/linuxapp/eal/eal_vfio.c | 17 +++++++---
lib/librte_eal/meson.build | 2 +-
lib/librte_mempool/rte_mempool.c | 31 ++++++++++++++-----
test/test/test_malloc.c | 3 ++
test/test/test_memzone.c | 3 ++
22 files changed, 122 insertions(+), 40 deletions(-)
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e2dbee317..12122cb55 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -11,21 +11,6 @@ API and ABI deprecation notices are to be posted here.
Deprecation Notices
-------------------
-* eal: certain structures will change in EAL on account of upcoming external
- memory support. Aside from internal changes leading to an ABI break, the
- following externally visible changes will also be implemented:
-
- - ``rte_memseg_list`` will change to include a boolean flag indicating
- whether a particular memseg list is externally allocated. This will have
- implications for any users of memseg-walk-related functions, as they will
- now have to skip externally allocated segments in most cases if the intent
- is to only iterate over internal DPDK memory.
- - ``socket_id`` parameter across the entire DPDK will gain additional meaning,
- as some socket ID's will now be representing externally allocated memory. No
- changes will be required for existing code as backwards compatibility will
- be kept, and those who do not use this feature will not see these extra
- socket ID's.
-
* eal: both declaring and identifying devices will be streamlined in v18.11.
New functions will appear to query a specific port from buses, classes of
device and device drivers. Device declaration will be made coherent with the
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..e2cbc82da 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -68,6 +68,13 @@ API Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* eal: The following API changes were made in 18.11:
+
+ - ``rte_memseg_list`` structure now has an additional flag indicating whether
+ the memseg list is externally allocated. This will have implications for any
+ users of memseg-walk-related functions, as they will now have to skip
+ externally allocated segments in most cases if the intent is to only iterate
+ over internal DPDK memory.
ABI Changes
-----------
@@ -84,6 +91,9 @@ ABI Changes
=========================================================
+* eal: EAL library ABI version was changed due to previously announced work on
+ supporting external memory in DPDK.
+
Removed Items
-------------
@@ -129,7 +139,7 @@ The libraries prepended with a plus sign were incremented in this version.
librte_compressdev.so.1
librte_cryptodev.so.5
librte_distributor.so.1
- librte_eal.so.8
+ + librte_eal.so.9
librte_ethdev.so.10
librte_eventdev.so.4
librte_flow_classify.so.1
diff --git a/drivers/bus/fslmc/fslmc_vfio.c b/drivers/bus/fslmc/fslmc_vfio.c
index 4c2cd2a87..2e9244fb7 100644
--- a/drivers/bus/fslmc/fslmc_vfio.c
+++ b/drivers/bus/fslmc/fslmc_vfio.c
@@ -317,12 +317,15 @@ fslmc_unmap_dma(uint64_t vaddr, uint64_t iovaddr __rte_unused, size_t len)
}
static int
-fslmc_dmamap_seg(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+fslmc_dmamap_seg(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *n_segs = arg;
int ret;
+ if (msl->external)
+ return 0;
+
ret = fslmc_map_dma(ms->addr_64, ms->iova, ms->len);
if (ret)
DPAA2_BUS_ERR("Unable to VFIO map (addr=%p, len=%zu)",
diff --git a/drivers/net/mlx4/mlx4_mr.c b/drivers/net/mlx4/mlx4_mr.c
index d23d3c613..9f5d790b6 100644
--- a/drivers/net/mlx4/mlx4_mr.c
+++ b/drivers/net/mlx4/mlx4_mr.c
@@ -496,6 +496,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index ec63bc6e2..d9ed15880 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -568,11 +568,14 @@ static struct rte_pci_driver mlx5_driver;
static void *uar_base;
static int
-find_lower_va_bound(const struct rte_memseg_list *msl __rte_unused,
+find_lower_va_bound(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
void **addr = arg;
+ if (msl->external)
+ return 0;
+
if (*addr == NULL)
*addr = ms->addr;
else
diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c
index 1d1bcb5fe..fd4345f9c 100644
--- a/drivers/net/mlx5/mlx5_mr.c
+++ b/drivers/net/mlx5/mlx5_mr.c
@@ -486,6 +486,9 @@ mr_find_contig_memsegs_cb(const struct rte_memseg_list *msl,
{
struct mr_find_contig_memsegs_data *data = arg;
+ if (msl->external)
+ return 0;
+
if (data->addr < ms->addr_64 || data->addr >= ms->addr_64 + len)
return 0;
/* Found, save it and stop walking. */
diff --git a/drivers/net/virtio/virtio_user/vhost_kernel.c b/drivers/net/virtio/virtio_user/vhost_kernel.c
index b2444096c..885c59c8a 100644
--- a/drivers/net/virtio/virtio_user/vhost_kernel.c
+++ b/drivers/net/virtio/virtio_user/vhost_kernel.c
@@ -75,13 +75,16 @@ struct walk_arg {
uint32_t region_nr;
};
static int
-add_memory_region(const struct rte_memseg_list *msl __rte_unused,
+add_memory_region(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, size_t len, void *arg)
{
struct walk_arg *wa = arg;
struct vhost_memory_region *mr;
void *start_addr;
+ if (msl->external)
+ return 0;
+
if (wa->region_nr >= max_regions)
return -1;
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index d27da3d15..97bff4852 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -22,7 +22,7 @@ LDLIBS += -lrte_kvargs
EXPORT_MAP := ../../rte_eal_version.map
-LIBABIVER := 8
+LIBABIVER := 9
# specific to bsdapp exec-env
SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index d7ae9d686..7735194a3 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -502,6 +502,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
if (msl->socket_id == *socket_id && msl->memseg_arr.count != 0)
return 1;
diff --git a/lib/librte_eal/bsdapp/eal/eal_memory.c b/lib/librte_eal/bsdapp/eal/eal_memory.c
index 65ea670f9..4b092e1f2 100644
--- a/lib/librte_eal/bsdapp/eal/eal_memory.c
+++ b/lib/librte_eal/bsdapp/eal/eal_memory.c
@@ -236,12 +236,15 @@ struct attach_walk_args {
int seg_idx;
};
static int
-attach_segment(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+attach_segment(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
struct attach_walk_args *wa = arg;
void *addr;
+ if (msl->external)
+ return 0;
+
addr = mmap(ms->addr, ms->len, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_FIXED, wa->fd_hugepage,
wa->seg_idx * EAL_PAGE_SIZE);
diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 0868bf681..55a11bf4d 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -272,6 +272,9 @@ physmem_size(const struct rte_memseg_list *msl, void *arg)
{
uint64_t *total_len = arg;
+ if (msl->external)
+ return 0;
+
*total_len += msl->memseg_arr.count * msl->page_sz;
return 0;
@@ -547,6 +550,7 @@ rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
return ret;
}
+
/* init memory subsystem */
int
rte_eal_memory_init(void)
diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 1d8b0a6fe..6baa6854f 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -33,6 +33,7 @@ struct rte_memseg_list {
size_t len; /**< Length of memory area covered by this memseg list. */
int socket_id; /**< Socket ID for all memsegs in this list. */
uint64_t page_sz; /**< Page size for all memsegs in this list. */
+ unsigned int external; /**< 1 if this list points to external memory */
volatile uint32_t version; /**< version number for multiprocess sync. */
struct rte_fbarray memseg_arr;
};
diff --git a/lib/librte_eal/common/include/rte_memory.h b/lib/librte_eal/common/include/rte_memory.h
index c4b7f4cff..b381d1cb6 100644
--- a/lib/librte_eal/common/include/rte_memory.h
+++ b/lib/librte_eal/common/include/rte_memory.h
@@ -215,6 +215,9 @@ typedef int (*rte_memseg_list_walk_t)(const struct rte_memseg_list *msl,
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -233,6 +236,9 @@ rte_memseg_walk(rte_memseg_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
@@ -251,6 +257,9 @@ rte_memseg_contig_walk(rte_memseg_contig_walk_t func, void *arg);
* @note This function read-locks the memory hotplug subsystem, and thus cannot
* be used within memory-related callback functions.
*
+ * @note This function will also walk through externally allocated segments. It
+ * is up to the user to decide whether to skip through these segments.
+ *
* @param func
* Iterator function
* @param arg
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index 12aaf2d72..8c37b9d7c 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -95,6 +95,9 @@ malloc_add_seg(const struct rte_memseg_list *msl,
struct malloc_heap *heap;
int msl_idx;
+ if (msl->external)
+ return 0;
+
heap = &mcfg->malloc_heaps[msl->socket_id];
/* msl is const, so find it */
@@ -756,8 +759,10 @@ malloc_heap_free(struct malloc_elem *elem)
/* anything after this is a bonus */
ret = 0;
- /* ...of which we can't avail if we are in legacy mode */
- if (internal_config.legacy_mem)
+ /* ...of which we can't avail if we are in legacy mode, or if this is an
+ * externally allocated segment.
+ */
+ if (internal_config.legacy_mem || msl->external)
goto free_unlock;
/* check if we can free any memory back to the system */
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index fd92c75c2..5c16bc40f 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
EXPORT_MAP := ../../rte_eal_version.map
VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
-LIBABIVER := 8
+LIBABIVER := 9
VPATH += $(RTE_SDK)/lib/librte_eal/common
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index e59ac6577..253a6aece 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -725,6 +725,9 @@ check_socket(const struct rte_memseg_list *msl, void *arg)
{
int *socket_id = arg;
+ if (msl->external)
+ return 0;
+
return *socket_id == msl->socket_id;
}
@@ -1059,7 +1062,12 @@ mark_freeable(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
void *arg __rte_unused)
{
/* ms is const, so find this memseg */
- struct rte_memseg *found = rte_mem_virt2memseg(ms->addr, msl);
+ struct rte_memseg *found;
+
+ if (msl->external)
+ return 0;
+
+ found = rte_mem_virt2memseg(ms->addr, msl);
found->flags &= ~RTE_MEMSEG_FLAG_DO_NOT_FREE;
diff --git a/lib/librte_eal/linuxapp/eal/eal_memalloc.c b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
index d040a2f71..8b0bbe43f 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memalloc.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memalloc.c
@@ -1250,6 +1250,9 @@ sync_walk(const struct rte_memseg_list *msl, void *arg __rte_unused)
unsigned int i;
int msl_idx;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1298,6 +1301,9 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
char name[PATH_MAX];
int msl_idx, ret;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
primary_msl = &mcfg->memsegs[msl_idx];
local_msl = &local_memsegs[msl_idx];
@@ -1328,6 +1334,9 @@ secondary_lock_list_create_walk(const struct rte_memseg_list *msl,
int msl_idx;
int *data;
+ if (msl->external)
+ return 0;
+
msl_idx = msl - mcfg->memsegs;
len = msl->memseg_arr.len;
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index c68dc38e0..fddbc3b54 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1082,11 +1082,14 @@ rte_vfio_get_group_num(const char *sysfs_base,
}
static int
-type1_map(const struct rte_memseg_list *msl __rte_unused,
- const struct rte_memseg *ms, void *arg)
+type1_map(const struct rte_memseg_list *msl, const struct rte_memseg *ms,
+ void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1196,11 +1199,14 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
}
static int
-vfio_spapr_map_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_map_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
int *vfio_container_fd = arg;
+ if (msl->external)
+ return 0;
+
return vfio_spapr_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
ms->len, 1);
}
@@ -1210,12 +1216,15 @@ struct spapr_walk_param {
uint64_t hugepage_sz;
};
static int
-vfio_spapr_window_size_walk(const struct rte_memseg_list *msl __rte_unused,
+vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
const struct rte_memseg *ms, void *arg)
{
struct spapr_walk_param *param = arg;
uint64_t max = ms->iova + ms->len;
+ if (msl->external)
+ return 0;
+
if (max > param->window_size) {
param->hugepage_sz = ms->hugepage_sz;
param->window_size = max;
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index e1fde15d1..62ef985b9 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
error('unsupported system type "@0@"'.format(host_machine.system()))
endif
-version = 8 # the version of the EAL API
+version = 9 # the version of the EAL API
allow_experimental_apis = true
deps += 'compat'
deps += 'kvargs'
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 03e6b5f73..d61c77da3 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,25 +99,40 @@ static unsigned optimize_object_size(unsigned obj_size)
return new_obj_size * RTE_MEMPOOL_ALIGN;
}
+struct pagesz_walk_arg {
+ int socket_id;
+ size_t min;
+};
+
static int
find_min_pagesz(const struct rte_memseg_list *msl, void *arg)
{
- size_t *min = arg;
+ struct pagesz_walk_arg *wa = arg;
+ bool valid;
- if (msl->page_sz < *min)
- *min = msl->page_sz;
+ valid = msl->socket_id == wa->socket_id;
+ valid |= wa->socket_id == SOCKET_ID_ANY && msl->external == 0;
+
+ if (!valid)
+ return 0;
+
+ if (msl->page_sz < wa->min)
+ wa->min = msl->page_sz;
return 0;
}
static size_t
-get_min_page_size(void)
+get_min_page_size(int socket_id)
{
- size_t min_pagesz = SIZE_MAX;
+ struct pagesz_walk_arg wa;
- rte_memseg_list_walk(find_min_pagesz, &min_pagesz);
+ wa.min = SIZE_MAX;
+ wa.socket_id = socket_id;
- return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+ rte_memseg_list_walk(find_min_pagesz, &wa);
+
+ return wa.min == SIZE_MAX ? (size_t) getpagesize() : wa.min;
}
@@ -470,7 +485,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
pg_sz = 0;
pg_shift = 0;
} else if (try_contig) {
- pg_sz = get_min_page_size();
+ pg_sz = get_min_page_size(mp->socket_id);
pg_shift = rte_bsf32(pg_sz);
} else {
pg_sz = getpagesize();
diff --git a/test/test/test_malloc.c b/test/test/test_malloc.c
index 4b5abb4e0..5e5272419 100644
--- a/test/test/test_malloc.c
+++ b/test/test/test_malloc.c
@@ -711,6 +711,9 @@ check_socket_mem(const struct rte_memseg_list *msl, void *arg)
{
int32_t *socket = arg;
+ if (msl->external)
+ return 0;
+
return *socket == msl->socket_id;
}
diff --git a/test/test/test_memzone.c b/test/test/test_memzone.c
index 452d7cc5e..9fe465e62 100644
--- a/test/test/test_memzone.c
+++ b/test/test/test_memzone.c
@@ -115,6 +115,9 @@ find_available_pagesz(const struct rte_memseg_list *msl, void *arg)
{
struct walk_arg *wa = arg;
+ if (msl->external)
+ return 0;
+
if (msl->page_sz == RTE_PGSIZE_2M)
wa->hugepage_2MB_avail = 1;
if (msl->page_sz == RTE_PGSIZE_1G)
--
2.17.1
* [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID
2018-09-19 13:56 16% ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
@ 2018-09-19 13:56 4% ` Anatoly Burakov
1 sibling, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-09-19 13:56 UTC (permalink / raw)
To: dev
Cc: John McNamara, Marko Kovacevic, laszlo.madarassy,
laszlo.vadkerti, andras.kovacs, winnie.tian, daniel.andrasi,
janos.kobor, geza.koblo, srinath.mannam, scott.branden,
ajit.khaparde, keith.wiles, bruce.richardson, thomas,
shreyansh.jain, shahafs
We will be assigning "invalid" socket IDs to external heaps, and
malloc will now be able to verify whether a supplied socket ID is
in fact a valid one, rendering parameter checks for sockets
obsolete.
This changes the semantics of what we understand by "socket ID",
so document the change in the release notes.
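
A rough sketch of the relaxed check, mirroring the memzone change in this
patch. The names and the SKETCH_MAX_NUMA_NODES value are hypothetical
stand-ins for SOCKET_ID_ANY and RTE_MAX_NUMA_NODES; has_hugepages stands in
for rte_eal_has_hugepages(). The final decision on whether an ID names a
real heap is left to the memory subsystem and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for SOCKET_ID_ANY and RTE_MAX_NUMA_NODES; values are assumptions. */
#define SKETCH_SOCKET_ID_ANY (-1)
#define SKETCH_MAX_NUMA_NODES 8

/*
 * Resolve the socket ID an allocation should use.
 * Returns 0 and stores the result in *resolved, or -1 for an invalid ID.
 */
int
sketch_resolve_socket(int socket_id, int has_hugepages, int *resolved)
{
	assert(resolved != NULL);

	/* negative IDs other than "any" are still rejected outright */
	if (socket_id != SKETCH_SOCKET_ID_ANY && socket_id < 0)
		return -1;

	/*
	 * Without hugepages, NUMA sockets collapse to "any". IDs at or above
	 * the NUMA node limit may name an external heap, so they are kept.
	 */
	if (!has_hugepages && socket_id < SKETCH_MAX_NUMA_NODES)
		socket_id = SKETCH_SOCKET_ID_ANY;

	*resolved = socket_id;
	return 0;
}
```

The upper-bound test is no longer a rejection criterion; it only decides
whether the no-hugepages fallback applies.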
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
doc/guides/rel_notes/release_18_11.rst | 7 +++++++
lib/librte_eal/common/eal_common_memzone.c | 8 +++++---
lib/librte_eal/common/malloc_heap.c | 2 +-
lib/librte_eal/common/rte_malloc.c | 4 ----
4 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index e2cbc82da..c04685d17 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -75,6 +75,13 @@ API Changes
users of memseg-walk-related functions, as they will now have to skip
externally allocated segments in most cases if the intent is to only iterate
over internal DPDK memory.
+ - ``socket_id`` parameter across the entire DPDK has gained additional
+ meaning, as some socket ID's will now be representing externally allocated
+ memory. No changes will be required for existing code as backwards
+ compatibility will be kept, and those who do not use this feature will not
+ see these extra socket ID's. Any new API's must not check socket ID
+ parameters themselves, and must instead leave it to the memory subsystem to
+ decide whether socket ID is a valid one.
ABI Changes
-----------
diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c
index 7300fe05d..b7081afbf 100644
--- a/lib/librte_eal/common/eal_common_memzone.c
+++ b/lib/librte_eal/common/eal_common_memzone.c
@@ -120,13 +120,15 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
return NULL;
}
- if ((socket_id != SOCKET_ID_ANY) &&
- (socket_id >= RTE_MAX_NUMA_NODES || socket_id < 0)) {
+ if ((socket_id != SOCKET_ID_ANY) && socket_id < 0) {
rte_errno = EINVAL;
return NULL;
}
- if (!rte_eal_has_hugepages())
+ /* only set socket to SOCKET_ID_ANY if we aren't allocating for an
+ * external heap.
+ */
+ if (!rte_eal_has_hugepages() && socket_id < RTE_MAX_NUMA_NODES)
socket_id = SOCKET_ID_ANY;
contig = (flags & RTE_MEMZONE_IOVA_CONTIG) != 0;
diff --git a/lib/librte_eal/common/malloc_heap.c b/lib/librte_eal/common/malloc_heap.c
index c4d303533..1dcb1de8f 100644
--- a/lib/librte_eal/common/malloc_heap.c
+++ b/lib/librte_eal/common/malloc_heap.c
@@ -649,7 +649,7 @@ malloc_heap_alloc(const char *type, size_t size, int socket_arg,
if (size == 0 || (align && !rte_is_power_of_2(align)))
return NULL;
- if (!rte_eal_has_hugepages())
+ if (!rte_eal_has_hugepages() && socket_arg < RTE_MAX_NUMA_NODES)
socket_arg = SOCKET_ID_ANY;
if (socket_arg == SOCKET_ID_ANY)
diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index dfcdf380a..458c44ba6 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -47,10 +47,6 @@ rte_malloc_socket(const char *type, size_t size, unsigned int align,
if (!rte_eal_has_hugepages())
socket_arg = SOCKET_ID_ANY;
- /* Check socket parameter */
- if (socket_arg >= RTE_MAX_NUMA_NODES)
- return NULL;
-
return malloc_heap_alloc(type, size, socket_arg, 0,
align == 0 ? 1 : align, 0, false);
}
--
2.17.1
* Re: [dpdk-dev] [PATCH] mem: share legacy and single file segments mode with secondaries
@ 2018-09-19 8:56 3% ` Thomas Monjalon
2018-09-20 15:41 17% ` [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config Anatoly Burakov
1 sibling, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-09-19 8:56 UTC (permalink / raw)
To: Anatoly Burakov; +Cc: dev
27/08/2018 14:24, Anatoly Burakov:
> Currently, command-line switches for legacy mem mode or single-file
> segments mode are only stored in internal config. This leads to a
> situation where these flags have to always match between primary
> and secondary, which is bad for usability.
>
> Fix this by storing these flags in the shared config as well, so
> that secondary process can know if the primary was launched in
> single-file segments or legacy mem mode.
>
> This bumps the EAL ABI, however there's an EAL deprecation notice
> already in place[1] for a different feature, so that's OK.
You need to update the release notes:
- ABI change section
- library version section
Thanks
* Re: [dpdk-dev] [PATCH v2] mbuf: remove deprecated segment free functions
2018-09-17 12:45 8% ` [dpdk-dev] [PATCH v2] " David Marchand
@ 2018-09-19 8:34 0% ` Thomas Monjalon
0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-09-19 8:34 UTC (permalink / raw)
To: David Marchand; +Cc: dev, olivier.matz, arybchenko
17/09/2018 14:45, David Marchand:
> __rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
> a long time now (early 17.05), are not part of the abi and are easily
> replaced with existing api.
>
> Signed-off-by: David Marchand <david.marchand@6wind.com>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Applied, thanks
* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
2018-09-18 15:53 0% ` Ferruh Yigit
@ 2018-09-19 5:37 0% ` Honnappa Nagarahalli
0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2018-09-19 5:37 UTC (permalink / raw)
To: Ferruh Yigit, Jerin Jacob, Kokkilagadda, Kiran
Cc: Ola Liljedahl, Gavin Hu (Arm Technology China),
Jacob, Jerin, dev, nd, Steve Capper,
Phil Yang (Arm Technology China),
Bruce Richardson, Konstantin Ananyev
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Tuesday, September 18, 2018 10:54 AM
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Kokkilagadda, Kiran
> <Kiran.Kokkilagadda@cavium.com>
> Cc: Ola Liljedahl <Ola.Liljedahl@arm.com>; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>;
> dev@dpdk.org; nd <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>;
> Phil Yang (Arm Technology China) <Phil.Yang@arm.com>; Bruce Richardson
> <bruce.richardson@intel.com>; Konstantin Ananyev
> <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
>
> On 9/14/2018 3:45 AM, Jerin Jacob wrote:
> > -----Original Message-----
> >> Date: Thu, 13 Sep 2018 23:45:31 +0000
> >> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >> CC: Ola Liljedahl <Ola.Liljedahl@arm.com>, "Kokkilagadda, Kiran"
> >> <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu (Arm Technology China)"
> >> <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> >> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
> >> <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> >> <Steve.Capper@arm.com>, "Phil Yang (Arm Technology China)"
> >> <Phil.Yang@arm.com>
> >> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >> synchronization
> >>
> >> -----Original Message-----
> >>> Date: Thu, 13 Sep 2018 17:40:53 +0000
> >>> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl
> >>> <Ola.Liljedahl@arm.com>
> >>> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu
> >>> (Arm Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit
> >>> <ferruh.yigit@intel.com>, "Jacob, Jerin"
> >>> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
> >>> <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> >>> <Steve.Capper@arm.com>, "Phil Yang (Arm Technology China)"
> >>> <Phil.Yang@arm.com>
> >>> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>> synchronization
> >>>
> >>>
> >>> Hi Jerin,
> >>> Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.
> >>
> >>
> >> Yes. Makes sense to me to keep only single config option.
> >>
> >> rte_ring has 2 sets of algorithms for Arm architecture, one with C11 memory model and the other with barriers. Going forward (for ex: for KNI), I think we should support C11 memory model only and skip the barriers.
> >
> > IMO, Both should be supported and set N as in the config/common_base.
> > Based on architecture or micro architecture the performance can vary.
> > So keeping both options and allowing to override to arch/micro arch
> > specific config file makes sense to me.(like existing model, as smp_*
> > ops are compiler NOP for x86)
>
> Hi Jerin, Honnappa, Kiran,
>
> Will there be a new version for this release?
>
> I can see two options:
> 1- Add read/write barriers for both library and kernel parts.
> 2- Use c11 atomics
> 2a- change existing RTE_RING_USE_C11_MEM_MODEL to
> RTE_USE_C11_MEM_MODEL
> 2b- Use RTE_USE_C11_MEM_MODEL to implement c11 atomic for arm and
> ppc
>
> 2) seems agreed on, but is it clear who will work on it?
Sorry for the late reply. We have implemented 2); it is currently undergoing internal review. We will get this out today and will work through the community reviews quickly after that.
>
> And 1) looks easier to implement; if 2) won't make it in time for the release, can we
> fall back to this one?
>
> Thanks,
> ferruh
>
> >
> >> Also, do you see any issues in making C11 memory model default for Arm architecture?
> >
> > It is already set default Y to arm64. see config/common_armv8a_linuxapp.
> >
> > And it is possible for micro architecture to override, see
> > config/defconfig_arm64-thunderx-linuxapp-gcc
> >
> >
> >>
> >>>
> >>> Thank you,
> >>> Honnappa
> >>>
> >>> -----Original Message-----
> >>> From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >>> Sent: Wednesday, August 29, 2018 3:58 AM
> >>> To: Ola Liljedahl <Ola.Liljedahl@arm.com>
> >>> Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa
> >>> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu
> >>> <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob,
> >>> Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd
> >>> <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
> >>> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>> synchronization
> >>>
> >>> -----Original Message-----
> >>>> Date: Wed, 29 Aug 2018 08:47:56 +0000
> >>>> From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> >>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >>>> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> >>>> Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
> >>>> <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> >>>> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
> >>>> <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> >>>> <Steve.Capper@arm.com>
> >>>> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>>> synchronization
> >>>> user-agent: Microsoft-MacOutlook/10.10.0.180812
> >>>>
> >>>>
> >>>> There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
> >>>>
> >>>> But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
> >>>
> >>> # One reason was that the __atomic builtins were implemented in gcc 4.7, and x86 would like to support gcc < 4.7 and the ICC compiler.
> >>> # The theme was no change in the existing code for x86. I am not sure about the code generation for x86 with __atomic builtins; I will let the x86 maintainers comment on this.
> >>>
> >>>
> >>>>
> >>>> -- Ola
> >>>>
> >>>> On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> >>>>
> >>>> -----Original Message-----
> >>>> > Date: Wed, 29 Aug 2018 07:34:34 +0000
> >>>> > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> >>>> > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> >>>> > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
> >>>> > Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> >>>> > <Jerin.JacobKollanukkaran@cavium.com>
> >>>> > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> >>>> > <Steve.Capper@arm.com>
> >>>> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>>> > synchronization
> >>>> > user-agent: Microsoft-MacOutlook/10.10.0.180812
> >>>> >
> >>>> > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
> >>>>
> >>>> What would be the change in the interface? If it is removing the volatile for
> >>>> the C11 case, then you can use an anonymous union or a #define to keep the
> >>>> size and offset of the elements intact.
> >>>>
> >>>> struct rte_kni_fifo {
> >>>> #ifndef RTE_C11...
> >>>> volatile unsigned write; /**< Next position to be written*/
> >>>> volatile unsigned read; /**< Next position to be read */
> >>>> #else
> >>>> unsigned write; /**< Next position to be written*/
> >>>> unsigned read; /**< Next position to be read */
> >>>> #endif
> >>>> unsigned len; /**< Circular buffer length */
> >>>> unsigned elem_size; /**< Pointer size - for 32/64 bitOS */
> >>>> void *volatile buffer[]; /**< The buffer contains mbuf
> >>>> pointers */
> >>>> };
> >>>>
> >>>> Anonymous union example:
> >>>> https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> >>>>
> >>>> You can check the ABI breakage by devtools/validate-abi.sh
> >>>>
> >>>> >
> >>>> > -- Ola
> >>>> >
> >>>> > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> >>>> > Date: Wednesday, 29 August 2018 at 07:50
> >>>> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> >>>> > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> >>>> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >>>> >
> >>>> >
> >>>> > Agreed. Please go ahead and make the changes. You need to make the same change on the kernel side as well. And please use the c11 ring mechanism (see rte_ring) so that it won't impact other platforms such as Intel. We need this change only for Arm and PPC.
> >>>> >
> >>>> > ________________________________
> >>>> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>>> > Sent: Wednesday, August 29, 2018 10:29 AM
> >>>> > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> >>>> > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> >>>> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >>>> >
> >>>> >
> >>>> > I agree with Gavin here. The stores to fifo->write and fifo->read can get hoisted, resulting in accesses to invalid buffer array entries or overwriting of buffer array entries.
> >>>> >
> >>>> > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from the ‘rte_kni_fifo’ structure.
> >>>> >
> >>>> >
> >>>> >
> >>>> > If you want us to put together a patch with this idea, please let us know.
> >>>> >
> >>>> >
> >>>> >
> >>>> > Thank you,
> >>>> >
> >>>> > Honnappa
> >>>> >
> >>>> >
> >>>> >
> >>>> > From: Gavin Hu
> >>>> > Sent: Tuesday, August 28, 2018 2:31 PM
> >>>> > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> >>>> > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> >>>> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >>>> >
> >>>> >
> >>>> >
> >>>> > Assuming the reader and writer may execute on different CPUs, this becomes standard multithreaded programming.
> >>>> >
> >>>> > Our concern is that the read pointer may be updated too early (weak ordering may reorder the update before the reads from the slots). The slots are then released and may be immediately overwritten by the writer, so you get “too new” data and lose the old data.
> >>>> >
> >>>> >
> >>>> >
> >>>> > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>
> >>>> > Sent: Tuesday, August 28, 2018 6:44 PM
> >>>> > To: Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> >>>> > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> >>>> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >>>> >
> >>>> >
> >>>> >
> >>>> > In this instance there won't be any problem, because this loop won't execute until the value of fifo->write changes. So far we haven't seen any issue with it and, for performance reasons, we don't want to add a read barrier.
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> > ________________________________
> >>>> >
> >>>> > From: Gavin Hu <Gavin.Hu@arm.com>
> >>>> > Sent: Monday, August 27, 2018 9:10 PM
> >>>> > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> >>>> > Cc: dev@dpdk.org; Honnappa Nagarahalli
> >>>> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >>>> >
> >>>> >
> >>>> >
> >>>> > This fix is not complete; kni_fifo_get also requires a read fence, otherwise it may read stale data on a weakly ordered platform.
> >>>> >
> >>>> > > -----Original Message-----
> >>>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ferruh Yigit
> >>>> > > Sent: Monday, August 27, 2018 10:08 PM
> >>>> > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com>;
> >>>> > > jerin.jacob@caviumnetworks.com
> >>>> > > Cc: dev@dpdk.org
> >>>> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> >>>> > > synchronization
> >>>> > >
> >>>> > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> >>>> > > > With existing code in kni_fifo_put, rx_q values are not being updated
> >>>> > > > before updating fifo_write. While reading rx_q in kni_net_rx_normal,
> >>>> > > > This is causing the sync issue on other core. So adding a write
> >>>> > > > barrier to make sure the values being synced before updating fifo_write.
> >>>> > > >
> >>>> > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> >>>> > > >
> >>>> > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com>
> >>>> > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >>>> > >
> >>>> > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >>>>
> >>>>
^ permalink raw reply [relevance 0%]
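
A minimal sketch of option 2) from the message above, assuming a single producer, a single consumer, and the GCC/Clang __atomic builtins. The demo_fifo type and the demo_fifo_* helpers are hypothetical stand-ins; this is not the rte_kni_fifo code and not the patch that was under review. It only illustrates the pairing the thread asks for: the producer publishes the write index with a release store after filling the slots, and the consumer reads it with an acquire load before touching the slots (and the reverse for the read index).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a KNI-style FIFO; not the DPDK definition. */
#define DEMO_FIFO_LEN 8u /* must be a power of two */

struct demo_fifo {
	unsigned write;              /* next position to be written */
	unsigned read;               /* next position to be read */
	unsigned len;                /* circular buffer length */
	void *buffer[DEMO_FIFO_LEN]; /* slots holding object pointers */
};

static void demo_fifo_init(struct demo_fifo *f)
{
	/* The index masking below relies on a power-of-two length. */
	assert((DEMO_FIFO_LEN & (DEMO_FIFO_LEN - 1)) == 0);
	f->write = 0;
	f->read = 0;
	f->len = DEMO_FIFO_LEN;
}

/* Producer side: stores up to num pointers, returns how many were stored. */
static unsigned demo_fifo_put(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_RELAXED);
	/* Acquire pairs with the consumer's release store of 'read':
	 * a slot is reused only after the consumer has finished reading it. */
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		unsigned next = (w + 1) & (f->len - 1);

		if (next == r)
			break; /* FIFO is full */
		f->buffer[w] = data[i];
		w = next;
	}
	/* Release: the slot stores above become visible before the new index. */
	__atomic_store_n(&f->write, w, __ATOMIC_RELEASE);
	return i;
}

/* Consumer side: loads up to num pointers, returns how many were loaded. */
static unsigned demo_fifo_get(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_RELAXED);
	/* Acquire pairs with the producer's release store of 'write':
	 * slot contents are read only after the index that published them. */
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		if (r == w)
			break; /* FIFO is empty */
		data[i] = f->buffer[r];
		r = (r + 1) & (f->len - 1);
	}
	/* Release: the slot loads above complete before the slots are freed. */
	__atomic_store_n(&f->read, r, __ATOMIC_RELEASE);
	return i;
}
```

With one slot kept empty to tell "full" from "empty", a FIFO of length 8 holds at most 7 entries. On x86 the acquire loads and release stores are expected to compile to plain loads and stores, which is the point raised earlier in the thread.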
* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
2018-09-14 2:45 0% ` Jerin Jacob
@ 2018-09-18 15:53 0% ` Ferruh Yigit
2018-09-19 5:37 0% ` Honnappa Nagarahalli
0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-09-18 15:53 UTC (permalink / raw)
To: Jerin Jacob, Honnappa Nagarahalli, Kokkilagadda, Kiran
Cc: Ola Liljedahl, Gavin Hu (Arm Technology China),
Jacob, Jerin, dev, nd, Steve Capper,
Phil Yang (Arm Technology China),
Bruce Richardson, Konstantin Ananyev
On 9/14/2018 3:45 AM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Thu, 13 Sep 2018 23:45:31 +0000
>> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> CC: Ola Liljedahl <Ola.Liljedahl@arm.com>, "Kokkilagadda, Kiran"
>> <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu (Arm Technology China)"
>> <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
>> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>, nd
>> <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
>> Technology China)" <Phil.Yang@arm.com>
>> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>> synchronization
>>
>> -----Original Message-----
>>> Date: Thu, 13 Sep 2018 17:40:53 +0000
>>> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl
>>> <Ola.Liljedahl@arm.com>
>>> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu
>>> (Arm Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit
>>> <ferruh.yigit@intel.com>, "Jacob, Jerin"
>>> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>,
>>> nd <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
>>> Technology China)" <Phil.Yang@arm.com>
>>> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>> synchronization
>>>
>>>
>>> Hi Jerin,
>>> Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.
>>
>>
>> Yes. Makes sense to me to keep only single config option.
>>
>> rte_ring has 2 sets of algorithms for Arm architecture, one with C11 memory model and the other with barriers. Going forward (for ex: for KNI), I think we should support C11 memory model only and skip the barriers.
>
> IMO, Both should be supported and set N as in the config/common_base.
> Based on architecture or micro architecture the performance can vary.
> So keeping both options and allowing to override to arch/micro arch
> specific config file makes sense to me.(like existing model, as smp_*
> ops are compiler NOP for x86)
Hi Jerin, Honnappa, Kiran,
Will there be a new version for this release?
I can see two options:
1- Add read/write barriers for both library and kernel parts.
2- Use c11 atomics
2a- change existing RTE_RING_USE_C11_MEM_MODEL to RTE_USE_C11_MEM_MODEL
2b- Use RTE_USE_C11_MEM_MODEL to implement c11 atomic for arm and ppc
2) seems agreed on, but is it clear who will work on it?
And 1) looks easier to implement; if 2) won't make it in time for the release, can we
fall back to this one?
Thanks,
ferruh
>
>> Also, do you see any issues in making C11 memory model default for Arm architecture?
>
> It is already set default Y to arm64. see config/common_armv8a_linuxapp.
>
> And it is possible for micro architecture to override, see
> config/defconfig_arm64-thunderx-linuxapp-gcc
>
>
>>
>>>
>>> Thank you,
>>> Honnappa
>>>
>>> -----Original Message-----
>>> From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> Sent: Wednesday, August 29, 2018 3:58 AM
>>> To: Ola Liljedahl <Ola.Liljedahl@arm.com>
>>> Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa
>>> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu
>>> <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob,
>>> Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd
>>> <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
>>> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>> synchronization
>>>
>>> -----Original Message-----
>>>> Date: Wed, 29 Aug 2018 08:47:56 +0000
>>>> From: Ola Liljedahl <Ola.Liljedahl@arm.com>
>>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
>>>> Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
>>>> <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
>>>> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
>>>> <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
>>>> <Steve.Capper@arm.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>>> synchronization
>>>> user-agent: Microsoft-MacOutlook/10.10.0.180812
>>>>
>>>>
>>>> There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
>>>>
>>>> But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
>>>
>>> # One reason was that the __atomic builtins were implemented in gcc 4.7, and x86 would like to support gcc < 4.7 and the ICC compiler.
>>> # The theme was no change in the existing code for x86. I am not sure about the code generation for x86 with __atomic builtins; I will let the x86 maintainers comment on this.
>>>
>>>
>>>>
>>>> -- Ola
>>>>
>>>> On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
>>>>
>>>> -----Original Message-----
>>>> > Date: Wed, 29 Aug 2018 07:34:34 +0000
>>>> > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
>>>> > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
>>>> > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
>>>> > Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
>>>> > <Jerin.JacobKollanukkaran@cavium.com>
>>>> > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
>>>> > <Steve.Capper@arm.com>
>>>> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>>> > synchronization
>>>> > user-agent: Microsoft-MacOutlook/10.10.0.180812
>>>> >
>>>> > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
>>>>
>>>> What would be the change in the interface? If it is removing the volatile for
>>>> the C11 case, then you can use an anonymous union or a #define to keep the size
>>>> and offset of the elements intact.
>>>>
>>>> struct rte_kni_fifo {
>>>> #ifndef RTE_C11...
>>>> volatile unsigned write; /**< Next position to be written*/
>>>> volatile unsigned read; /**< Next position to be read */
>>>> #else
>>>> unsigned write; /**< Next position to be written*/
>>>> unsigned read; /**< Next position to be read */
>>>> #endif
>>>> unsigned len; /**< Circular buffer length */
>>>> unsigned elem_size; /**< Pointer size - for 32/64 bitOS */
>>>> void *volatile buffer[]; /**< The buffer contains mbuf
>>>> pointers */
>>>> };
>>>>
>>>> Anonymous union example:
>>>> https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
>>>>
>>>> You can check the ABI breakage by devtools/validate-abi.sh
>>>>
>>>> >
>>>> > -- Ola
>>>> >
>>>> > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
>>>> > Date: Wednesday, 29 August 2018 at 07:50
>>>> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
>>>> > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
>>>> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>> >
>>>> >
>>>> > Agreed. Please go ahead and make the changes. You need to make the same change on the kernel side as well. And please use the c11 ring mechanism (see rte_ring) so that it won't impact other platforms such as Intel. We need this change only for Arm and PPC.
>>>> >
>>>> > ________________________________
>>>> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>>> > Sent: Wednesday, August 29, 2018 10:29 AM
>>>> > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
>>>> > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
>>>> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>> >
>>>> >
>>>> > I agree with Gavin here. The stores to fifo->write and fifo->read can get hoisted, resulting in accesses to invalid buffer array entries or overwriting of buffer array entries.
>>>> >
>>>> > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
>>>> >
>>>> >
>>>> >
>>>> > If you want us to put together a patch with this idea, please let us know.
>>>> >
>>>> >
>>>> >
>>>> > Thank you,
>>>> >
>>>> > Honnappa
>>>> >
>>>> >
>>>> >
>>>> > From: Gavin Hu
>>>> > Sent: Tuesday, August 28, 2018 2:31 PM
>>>> > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
>>>> > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
>>>> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>> >
>>>> >
>>>> >
>>>> > Assuming the reader and writer may execute on different CPUs, this becomes standard multithreaded programming.
>>>> >
>>>> > Our concern is that the read pointer may be updated too early (weak ordering may reorder the update before the reads from the slots). The slots are then released and may be immediately overwritten by the writer, so you get “too new” data and lose the old data.
>>>> >
>>>> >
>>>> >
>>>> > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>
>>>> > Sent: Tuesday, August 28, 2018 6:44 PM
>>>> > To: Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
>>>> > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
>>>> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>> >
>>>> >
>>>> >
>>>> > In this instance there won't be any problem, because this loop won't execute until the value of fifo->write changes. So far we haven't seen any issue with it and, for performance reasons, we don't want to add a read barrier.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > ________________________________
>>>> >
>>>> > From: Gavin Hu <Gavin.Hu@arm.com>
>>>> > Sent: Monday, August 27, 2018 9:10 PM
>>>> > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
>>>> > Cc: dev@dpdk.org; Honnappa Nagarahalli
>>>> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>>>> >
>>>> >
>>>> >
>>>> > This fix is not complete; kni_fifo_get also requires a read fence, otherwise it may read stale data on a weakly ordered platform.
>>>> >
>>>> > > -----Original Message-----
>>>> > > From: dev <dev-bounces@dpdk.org> On Behalf Of Ferruh Yigit
>>>> > > Sent: Monday, August 27, 2018 10:08 PM
>>>> > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com>;
>>>> > > jerin.jacob@caviumnetworks.com
>>>> > > Cc: dev@dpdk.org
>>>> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
>>>> > > synchronization
>>>> > >
>>>> > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
>>>> > > > With existing code in kni_fifo_put, rx_q values are not being updated
>>>> > > > before updating fifo_write. While reading rx_q in kni_net_rx_normal,
>>>> > > > This is causing the sync issue on other core. So adding a write
>>>> > > > barrier to make sure the values being synced before updating fifo_write.
>>>> > > >
>>>> > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
>>>> > > >
>>>> > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com>
>>>> > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> > >
>>>> > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>
>>>>
^ permalink raw reply [relevance 0%]
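
The suggestion quoted above, dropping volatile from the index fields while keeping the size and offset of every element intact, can be checked at compile time. The sketch below is an assumption-laden illustration: fifo_volatile and fifo_plain are hypothetical copies of the two variants from the quoted struct, not DPDK types, and the check only covers the compiler and target it is built with. It does not replace running devtools/validate-abi.sh.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical copy of the variant with volatile index fields. */
struct fifo_volatile {
	volatile unsigned write; /* next position to be written */
	volatile unsigned read;  /* next position to be read */
	unsigned len;            /* circular buffer length */
	unsigned elem_size;      /* pointer size, for 32/64-bit OS */
	void *volatile buffer[]; /* mbuf pointers */
};

/* Hypothetical copy of the variant intended for C11-style atomics. */
struct fifo_plain {
	unsigned write;
	unsigned read;
	unsigned len;
	unsigned elem_size;
	void *volatile buffer[];
};

/* True when a field sits at the same offset in both variants. */
#define SAME_OFFSET(field) \
	(offsetof(struct fifo_volatile, field) == \
	 offsetof(struct fifo_plain, field))

/* Compile-time checks: the build fails if the layouts diverge. */
static_assert(sizeof(struct fifo_volatile) == sizeof(struct fifo_plain),
	      "struct size differs between variants");
static_assert(SAME_OFFSET(write), "offset of write differs");
static_assert(SAME_OFFSET(read), "offset of read differs");
static_assert(SAME_OFFSET(len), "offset of len differs");
static_assert(SAME_OFFSET(elem_size), "offset of elem_size differs");
static_assert(SAME_OFFSET(buffer), "offset of buffer differs");

/* Runtime mirror of the checks above, handy in a unit test. */
static int fifo_layouts_match(void)
{
	return sizeof(struct fifo_volatile) == sizeof(struct fifo_plain) &&
	       SAME_OFFSET(write) && SAME_OFFSET(read) &&
	       SAME_OFFSET(len) && SAME_OFFSET(elem_size) &&
	       SAME_OFFSET(buffer);
}
```

The volatile qualifier changes how accesses are compiled, not how the fields are laid out, so both variants are expected to match; the assertions make that expectation explicit for the kernel/user shared structure.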
* [dpdk-dev] [PATCH v2] mbuf: remove deprecated segment free functions
2018-09-10 5:18 3% [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions David Marchand
2018-09-10 8:06 0% ` Andrew Rybchenko
@ 2018-09-17 12:45 8% ` David Marchand
2018-09-19 8:34 0% ` Thomas Monjalon
1 sibling, 1 reply; 200+ results
From: David Marchand @ 2018-09-17 12:45 UTC (permalink / raw)
To: dev; +Cc: olivier.matz, arybchenko, thomas
__rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
a long time now (early 17.05), are not part of the abi and are easily
replaced with existing api.
Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
doc/guides/rel_notes/release_18_11.rst | 5 +++++
lib/librte_mbuf/rte_mbuf.h | 16 ----------------
2 files changed, 5 insertions(+), 16 deletions(-)
diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
index 3ae6b3f58..d98573072 100644
--- a/doc/guides/rel_notes/release_18_11.rst
+++ b/doc/guides/rel_notes/release_18_11.rst
@@ -68,6 +68,11 @@ API Changes
Also, make sure to start the actual text at the margin.
=========================================================
+* mbuf: The ``__rte_mbuf_raw_free()`` and ``__rte_pktmbuf_prefree_seg()``
+ functions were deprecated since 17.05 and are removed:
+
+ Those functions were kept for compatibility and are replaced by
+ ``rte_mbuf_raw_free()`` and ``rte_pktmbuf_prefree_seg()``.
ABI Changes
-----------
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 9ce5d76d7..a50b05c64 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1038,14 +1038,6 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
rte_mempool_put(m->pool, m);
}
-/* compat with older versions */
-__rte_deprecated
-static inline void
-__rte_mbuf_raw_free(struct rte_mbuf *m)
-{
- rte_mbuf_raw_free(m);
-}
-
/**
* The packet mbuf constructor.
*
@@ -1658,14 +1650,6 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
return NULL;
}
-/* deprecated, replaced by rte_pktmbuf_prefree_seg() */
-__rte_deprecated
-static inline struct rte_mbuf *
-__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
-{
- return rte_pktmbuf_prefree_seg(m);
-}
-
/**
* Free a segment of a packet mbuf into its original mempool.
*
--
2.17.1
^ permalink raw reply [relevance 8%]
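
For readers updating callers, the pattern removed by the patch above can be sketched in isolation. Everything named demo_* or old_demo_* below is hypothetical and exists only for illustration; the real functions are the ones named in the patch. The sketch assumes GCC or Clang for the deprecated attribute. Because the old name is a static inline wrapper defined in a header, no exported symbol disappears when it is deleted, and callers only need to switch to the new name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature buffer; not struct rte_mbuf. */
struct demo_buf {
	int in_pool; /* 1 once the buffer has been returned to its pool */
};

/* Current API: returns the buffer to its (imaginary) pool. */
static inline void demo_raw_free(struct demo_buf *b)
{
	assert(b != NULL);
	b->in_pool = 1;
}

/* Old name kept only for compatibility. A wrapper like this is what
 * the patch deletes; using it triggers a deprecation warning. */
__attribute__((deprecated))
static inline void old_demo_raw_free(struct demo_buf *b)
{
	demo_raw_free(b);
}

/* Migrated caller: same behaviour, new name, no warning. */
static void release_buf(struct demo_buf *b)
{
	demo_raw_free(b); /* was: old_demo_raw_free(b); */
}
```

Building existing code with deprecation warnings enabled is one way to locate remaining uses of the old names before upgrading.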
* Re: [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions
2018-09-16 9:39 0% ` Thomas Monjalon
@ 2018-09-17 7:07 0% ` Olivier Matz
0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-09-17 7:07 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: David Marchand, dev, Andrew Rybchenko
Hi Thomas,
On Sun, Sep 16, 2018 at 11:39:29AM +0200, Thomas Monjalon wrote:
> 10/09/2018 10:06, Andrew Rybchenko:
> > On 09/10/2018 08:18 AM, David Marchand wrote:
> > > __rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
> > > a long time now (early 17.05), are not part of the abi and are easily
> > > replaced with existing api.
> > >
> > > Signed-off-by: David Marchand <david.marchand@6wind.com>
> >
> > Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
>
> I think we need to bump the library version and update the API section
> in the release notes.
I don't think bumping the lib version is required here, the patch removes
two functions that are static inline.
But updating the API section would be nice, yes.
Thanks,
Olivier
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions
2018-09-10 8:06 0% ` Andrew Rybchenko
@ 2018-09-16 9:39 0% ` Thomas Monjalon
2018-09-17 7:07 0% ` Olivier Matz
0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-09-16 9:39 UTC (permalink / raw)
To: David Marchand; +Cc: dev, Andrew Rybchenko, olivier.matz
10/09/2018 10:06, Andrew Rybchenko:
> On 09/10/2018 08:18 AM, David Marchand wrote:
> > __rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
> > a long time now (early 17.05), are not part of the abi and are easily
> > replaced with existing api.
> >
> > Signed-off-by: David Marchand <david.marchand@6wind.com>
>
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
I think we need to bump the library version and update the API section
in the release notes.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
2018-09-13 23:45 0% ` Honnappa Nagarahalli
@ 2018-09-14 2:45 0% ` Jerin Jacob
2018-09-18 15:53 0% ` Ferruh Yigit
0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-09-14 2:45 UTC (permalink / raw)
To: Honnappa Nagarahalli
Cc: Ola Liljedahl, Kokkilagadda, Kiran,
Gavin Hu (Arm Technology China),
Ferruh Yigit, Jacob, Jerin, dev, nd, Steve Capper,
Phil Yang (Arm Technology China)
-----Original Message-----
> Date: Thu, 13 Sep 2018 23:45:31 +0000
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: Ola Liljedahl <Ola.Liljedahl@arm.com>, "Kokkilagadda, Kiran"
> <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu (Arm Technology China)"
> <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>, nd
> <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
> Technology China)" <Phil.Yang@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
>
>
> -----Original Message-----
> > Date: Thu, 13 Sep 2018 17:40:53 +0000
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl
> > <Ola.Liljedahl@arm.com>
> > CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu
> > (Arm Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit
> > <ferruh.yigit@intel.com>, "Jacob, Jerin"
> > <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>,
> > nd <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
> > Technology China)" <Phil.Yang@arm.com>
> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > synchronization
> >
> >
> > Hi Jerin,
> > Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.
>
>
> Yes. Makes sense to me to keep only single config option.
>
> rte_ring has 2 sets of algorithms for Arm architecture, one with C11 memory model and the other with barriers. Going forward (for ex: for KNI), I think we should support C11 memory model only and skip the barriers.
IMO, both should be supported, with the option set to N in config/common_base.
Performance can vary with the architecture or micro-architecture, so keeping
both options and allowing an arch- or micro-arch-specific config file to
override the default makes sense to me (like the existing model, as smp_*
ops are compiler NOP for x86).
> Also, do you see any issues in making C11 memory model default for Arm architecture?
It already defaults to Y for arm64; see config/common_armv8a_linuxapp.
A micro-architecture can override it; see
config/defconfig_arm64-thunderx-linuxapp-gcc
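A minimal sketch of what keeping both variants behind a single build-time
switch could look like. DEMO_USE_C11_MEM_MODEL is a hypothetical stand-in for
the proposed RTE_USE_C11_MEM_MODEL, demo_publish()/demo_observe() are
illustrative names, and __sync_synchronize() stands in for the rte_smp_*mb()
barriers so that the snippet builds without DPDK headers:

```c
/* Hypothetical switch; 0 mirrors the "N" default discussed above. */
#ifndef DEMO_USE_C11_MEM_MODEL
#define DEMO_USE_C11_MEM_MODEL 0
#endif

#if DEMO_USE_C11_MEM_MODEL

/* C11-style variant: ordering is attached to the access itself. */
static inline void demo_publish(unsigned *idx, unsigned val)
{
	__atomic_store_n(idx, val, __ATOMIC_RELEASE);
}

static inline unsigned demo_observe(const unsigned *idx)
{
	return __atomic_load_n(idx, __ATOMIC_ACQUIRE);
}

#else

/* Barrier variant: explicit fence around a volatile access.
 * A full barrier is used here only as a portable stand-in. */
static inline void demo_publish(unsigned *idx, unsigned val)
{
	__sync_synchronize();              /* earlier stores complete first */
	*(volatile unsigned *)idx = val;
}

static inline unsigned demo_observe(const unsigned *idx)
{
	unsigned v = *(const volatile unsigned *)idx;

	__sync_synchronize();              /* later loads stay after this */
	return v;
}

#endif
```

Building with -DDEMO_USE_C11_MEM_MODEL=1 selects the acquire/release variant,
which is roughly how an arch or micro-arch config could override the default.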
>
> >
> > Thank you,
> > Honnappa
> >
> > -----Original Message-----
> > From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > Sent: Wednesday, August 29, 2018 3:58 AM
> > To: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu
> > <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob,
> > Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd
> > <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > synchronization
> >
> > -----Original Message-----
> > > Date: Wed, 29 Aug 2018 08:47:56 +0000
> > > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
> > > <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> > > <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
> > > <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> > > <Steve.Capper@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > > synchronization
> > > user-agent: Microsoft-MacOutlook/10.10.0.180812
> > >
> > >
> > > There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
> > >
> > > But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
> >
> > # One reason was that the __atomic builtins were first implemented in gcc 4.7, and x86 would like to keep supporting gcc < 4.7 and the ICC compiler.
> > # The theme was no change to the existing code for x86. I am not sure about the code generation for x86 with __atomic builtins; I will let the x86 maintainers comment on this.
> >
> >
> > >
> > > -- Ola
> > >
> > > On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> > >
> > > -----Original Message-----
> > > > Date: Wed, 29 Aug 2018 07:34:34 +0000
> > > > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > > > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > > > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
> > > > Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> > > > <Jerin.JacobKollanukkaran@cavium.com>
> > > > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> > > > <Steve.Capper@arm.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > > > synchronization
> > > > user-agent: Microsoft-MacOutlook/10.10.0.180812
> > > >
> > > > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
> > >
> > > What would be the change in the interface? If it is removing the
> > > volatile for the C11 case, then you can use an anonymous union or a
> > > #define to keep the size and offset of the elements intact.
> > >
> > >     struct rte_kni_fifo {
> > >     #ifndef RTE_C11...
> > >             volatile unsigned write;  /**< Next position to be written */
> > >             volatile unsigned read;   /**< Next position to be read */
> > >     #else
> > >             unsigned write;           /**< Next position to be written */
> > >             unsigned read;            /**< Next position to be read */
> > >     #endif
> > >             unsigned len;             /**< Circular buffer length */
> > >             unsigned elem_size;       /**< Pointer size - for 32/64 bit OS */
> > >             void *volatile buffer[];  /**< The buffer contains mbuf pointers */
> > >     };
> > >
> > > Anonymous union example:
> > > https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> > >
> > > You can check the ABI breakage by devtools/validate-abi.sh
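As a rough illustration of the size/offset point, the sketch below compares
two hypothetical copies of the struct quoted above, one with and one without
volatile on the indices. The struct and function names are made up for the
example; it only checks layout as seen by one compiler and does not replace
the ABI script mentioned above:

```c
#include <stddef.h>

/* Illustrative copy with volatile indices. */
struct fifo_volatile {
	volatile unsigned write;
	volatile unsigned read;
	unsigned len;
	unsigned elem_size;
	void *volatile buffer[];
};

/* Illustrative copy with plain indices, as in the C11 case. */
struct fifo_plain {
	unsigned write;
	unsigned read;
	unsigned len;
	unsigned elem_size;
	void *volatile buffer[];
};

#define SAME_OFFSET(field) \
	(offsetof(struct fifo_volatile, field) == \
	 offsetof(struct fifo_plain, field))

/* Compile-time checks: the build fails if a qualifier changed the layout. */
_Static_assert(sizeof(struct fifo_volatile) == sizeof(struct fifo_plain),
	       "fixed header size changed");
_Static_assert(SAME_OFFSET(write), "write moved");
_Static_assert(SAME_OFFSET(read), "read moved");
_Static_assert(SAME_OFFSET(len), "len moved");
_Static_assert(SAME_OFFSET(elem_size), "elem_size moved");

/* Runtime form of the same comparison; returns 1 when the layouts match. */
static int fifo_layout_matches(void)
{
	return sizeof(struct fifo_volatile) == sizeof(struct fifo_plain) &&
	       SAME_OFFSET(write) && SAME_OFFSET(read) &&
	       SAME_OFFSET(len) && SAME_OFFSET(elem_size);
}
```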
> > >
> > > >
> > > > -- Ola
> > > >
> > > > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> > > > Date: Wednesday, 29 August 2018 at 07:50
> > > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> > > > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > > >
> > > >
> > > > Agreed. Please go ahead and make the changes. You need to make the same change on the kernel side as well. Please use the C11 ring mechanism (see rte_ring) so that it won't impact other platforms like Intel. We need this change just for Arm and PPC.
> > > >
> > > > ________________________________
> > > > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > > Sent: Wednesday, August 29, 2018 10:29 AM
> > > > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> > > > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> > > > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > > >
> > > >
> > > >
> > > > I agree with Gavin here. Stores to fifo->write and fifo->read can get hoisted, resulting in access to invalid buffer array entries or overwriting of buffer array entries.
> > > >
> > > > IMO, we should solve this using C11 atomics. This will also help remove the use of ‘volatile’ from the ‘rte_kni_fifo’ structure.
> > > >
> > > >
> > > >
> > > > If you want us to put together a patch with this idea, please let us know.
> > > >
> > > >
> > > >
> > > > Thank you,
> > > >
> > > > Honnappa
> > > >
> > > >
> > > >
> > > > From: Gavin Hu
> > > > Sent: Tuesday, August 28, 2018 2:31 PM
> > > > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> > > > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > > > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > > >
> > > >
> > > >
> > > > Assuming the reader and writer may execute on different CPUs, this becomes standard multithreaded programming.
> > > >
> > > > Our concern is that the read pointer may be updated too early (weak ordering may reorder the update before the reads from the slots). The slots are then released and may be immediately overwritten by the writer, so you get “too new” data and lose the old data.
> > > >
> > > >
> > > >
> > > > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com<mailto:Kiran.Kokkilagadda@cavium.com>>
> > > > Sent: Tuesday, August 28, 2018 6:44 PM
> > > > To: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>; Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com<mailto:Jerin.JacobKollanukkaran@cavium.com>>
> > > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> > > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > > >
> > > >
> > > >
> > > > In this instance there won't be any problem, as this loop won't be executed until the value of fifo->write changes. So far we haven't seen any issue with it and, for performance reasons, we don't want to add a read barrier.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ________________________________
> > > >
> > > > From: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>
> > > > Sent: Monday, August 27, 2018 9:10 PM
> > > > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> > > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli
> > > > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > > >
> > > >
> > > >
> > > >
> > > > This fix is not complete: kni_fifo_get also requires a read fence; otherwise it will probably get stale data on a weakly ordered platform.
> > > >
> > > > > -----Original Message-----
> > > > > From: dev <dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>> On Behalf Of Ferruh Yigit
> > > > > Sent: Monday, August 27, 2018 10:08 PM
> > > > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>;
> > > > > jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>
> > > > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>
> > > > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > > > > synchronization
> > > > >
> > > > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> > > > > > > With the existing code in kni_fifo_put, rx_q values are not guaranteed
> > > > > > > to be updated before fifo_write is updated. This causes a sync issue on
> > > > > > > the other core when it reads rx_q in kni_net_rx_normal. So add a write
> > > > > > > barrier to make sure the values are synced before updating fifo_write.
> > > > > >
> > > > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> > > > > >
> > > > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>
> > > > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>>
> > > > >
> > > > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>
> > >
> > >
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
2018-09-13 17:51 0% ` Jerin Jacob
@ 2018-09-13 23:45 0% ` Honnappa Nagarahalli
2018-09-14 2:45 0% ` Jerin Jacob
0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-13 23:45 UTC (permalink / raw)
To: Jerin Jacob
Cc: Ola Liljedahl, Kokkilagadda, Kiran,
Gavin Hu (Arm Technology China),
Ferruh Yigit, Jacob, Jerin, dev, nd, Steve Capper,
Phil Yang (Arm Technology China)
-----Original Message-----
> Date: Thu, 13 Sep 2018 17:40:53 +0000
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl
> <Ola.Liljedahl@arm.com>
> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu
> (Arm Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit
> <ferruh.yigit@intel.com>, "Jacob, Jerin"
> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>,
> nd <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
> Technology China)" <Phil.Yang@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
>
>
> Hi Jerin,
> Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.
Yes. Makes sense to me to keep only single config option.
rte_ring has 2 sets of algorithms for Arm architecture, one with C11 memory model and the other with barriers. Going forward (for ex: for KNI), I think we should support C11 memory model only and skip the barriers.
Also, do you see any issues in making C11 memory model default for Arm architecture?
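A minimal sketch of a C11-only (no explicit barriers) single-producer,
single-consumer FIFO of the kind discussed here, using the GCC __atomic
builtins. struct demo_fifo and the demo_fifo_*() helpers are hypothetical
names for illustration and are not the KNI implementation:

```c
#include <assert.h>

#define DEMO_FIFO_LEN 8u	/* ring size, must be a power of two */

/* Illustrative stand-in for the FIFO; note there is no volatile. */
struct demo_fifo {
	unsigned write;		/* next position to be written */
	unsigned read;		/* next position to be read */
	unsigned len;		/* circular buffer length */
	void *buffer[DEMO_FIFO_LEN];
};

static void demo_fifo_init(struct demo_fifo *f)
{
	assert((DEMO_FIFO_LEN & (DEMO_FIFO_LEN - 1)) == 0);
	f->write = 0;
	f->read = 0;
	f->len = DEMO_FIFO_LEN;
}

/* Producer side: fill the slots, then publish them with a store-release. */
static unsigned demo_fifo_put(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_RELAXED);
	/* Acquire pairs with the consumer's release of 'read', so slots
	 * are reused only after the consumer has finished reading them. */
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		unsigned next = (w + 1) & (f->len - 1);

		if (next == r)
			break;	/* full: one slot is always left empty */
		f->buffer[w] = data[i];
		w = next;
	}
	/* Release: slot contents become visible before the new index. */
	__atomic_store_n(&f->write, w, __ATOMIC_RELEASE);
	return i;
}

/* Consumer side: read the slots, then hand them back with a store-release. */
static unsigned demo_fifo_get(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_RELAXED);
	/* Acquire pairs with the producer's release of 'write'. */
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_ACQUIRE);

	for (i = 0; i < num; i++) {
		if (r == w)
			break;	/* empty */
		data[i] = f->buffer[r];
		r = (r + 1) & (f->len - 1);
	}
	/* Release: the slot reads complete before the slots are freed. */
	__atomic_store_n(&f->read, r, __ATOMIC_RELEASE);
	return i;
}
```

The release store on read is what addresses the "reader pointer updated too
early" concern quoted below: the slot loads cannot be reordered after it.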
>
> Thank you,
> Honnappa
>
> -----Original Message-----
> From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Sent: Wednesday, August 29, 2018 3:58 AM
> To: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu
> <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob,
> Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd
> <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
>
> -----Original Message-----
> > Date: Wed, 29 Aug 2018 08:47:56 +0000
> > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
> > <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> > <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org"
> > <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> > <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > synchronization
> > user-agent: Microsoft-MacOutlook/10.10.0.180812
> >
> >
> > There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
> >
> > But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
>
> # One reason was that the __atomic builtins were first implemented in gcc 4.7, and x86 would like to keep supporting gcc < 4.7 and the ICC compiler.
> # The theme was no change to the existing code for x86. I am not sure about the code generation for x86 with __atomic builtins; I will let the x86 maintainers comment on this.
>
>
> >
> > -- Ola
> >
> > On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> >
> > -----Original Message-----
> > > Date: Wed, 29 Aug 2018 07:34:34 +0000
> > > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
> > > Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> > > <Jerin.JacobKollanukkaran@cavium.com>
> > > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> > > <Steve.Capper@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > > synchronization
> > > user-agent: Microsoft-MacOutlook/10.10.0.180812
> > >
> > > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
> >
> > What would be the change in the interface? If it is removing the
> > volatile for the C11 case, then you can use an anonymous union or a
> > #define to keep the size and offset of the elements intact.
> >
> >     struct rte_kni_fifo {
> >     #ifndef RTE_C11...
> >             volatile unsigned write;  /**< Next position to be written */
> >             volatile unsigned read;   /**< Next position to be read */
> >     #else
> >             unsigned write;           /**< Next position to be written */
> >             unsigned read;            /**< Next position to be read */
> >     #endif
> >             unsigned len;             /**< Circular buffer length */
> >             unsigned elem_size;       /**< Pointer size - for 32/64 bit OS */
> >             void *volatile buffer[];  /**< The buffer contains mbuf pointers */
> >     };
> >
> > Anonymous union example:
> > https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> >
> > You can check the ABI breakage by devtools/validate-abi.sh
> >
> > >
> > > -- Ola
> > >
> > > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> > > Date: Wednesday, 29 August 2018 at 07:50
> > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> > > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > > Agreed. Please go ahead and make the changes. You need to make the same change on the kernel side as well. Please use the C11 ring mechanism (see rte_ring) so that it won't impact other platforms like Intel. We need this change just for Arm and PPC.
> > >
> > > ________________________________
> > > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > Sent: Wednesday, August 29, 2018 10:29 AM
> > > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> > > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> > > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > >
> > > I agree with Gavin here. Stores to fifo->write and fifo->read can get hoisted, resulting in access to invalid buffer array entries or overwriting of buffer array entries.
> > >
> > > IMO, we should solve this using C11 atomics. This will also help remove the use of ‘volatile’ from the ‘rte_kni_fifo’ structure.
> > >
> > >
> > >
> > > If you want us to put together a patch with this idea, please let us know.
> > >
> > >
> > >
> > > Thank you,
> > >
> > > Honnappa
> > >
> > >
> > >
> > > From: Gavin Hu
> > > Sent: Tuesday, August 28, 2018 2:31 PM
> > > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> > > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > >
> > > Assuming the reader and writer may execute on different CPUs, this becomes standard multithreaded programming.
> > >
> > > Our concern is that the read pointer may be updated too early (weak ordering may reorder the update before the reads from the slots). The slots are then released and may be immediately overwritten by the writer, so you get “too new” data and lose the old data.
> > >
> > >
> > >
> > > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com<mailto:Kiran.Kokkilagadda@cavium.com>>
> > > Sent: Tuesday, August 28, 2018 6:44 PM
> > > To: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>; Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com<mailto:Jerin.JacobKollanukkaran@cavium.com>>
> > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > >
> > > In this instance there won't be any problem, as this loop won't be executed until the value of fifo->write changes. So far we haven't seen any issue with it and, for performance reasons, we don't want to add a read barrier.
> > >
> > >
> > >
> > >
> > >
> > > ________________________________
> > >
> > > From: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>
> > > Sent: Monday, August 27, 2018 9:10 PM
> > > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli
> > > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > >
> > >
> > > This fix is not complete: kni_fifo_get also requires a read fence; otherwise it will probably get stale data on a weakly ordered platform.
> > >
> > > > -----Original Message-----
> > > > From: dev <dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>> On Behalf Of Ferruh Yigit
> > > > Sent: Monday, August 27, 2018 10:08 PM
> > > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>;
> > > > jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>
> > > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>
> > > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > > > synchronization
> > > >
> > > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> > > > > > With the existing code in kni_fifo_put, rx_q values are not guaranteed
> > > > > > to be updated before fifo_write is updated. This causes a sync issue on
> > > > > > the other core when it reads rx_q in kni_net_rx_normal. So add a write
> > > > > > barrier to make sure the values are synced before updating fifo_write.
> > > > >
> > > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> > > > >
> > > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>
> > > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>>
> > > >
> > > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>
> >
> >
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
2018-09-13 17:40 0% ` Honnappa Nagarahalli
@ 2018-09-13 17:51 0% ` Jerin Jacob
2018-09-13 23:45 0% ` Honnappa Nagarahalli
0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-09-13 17:51 UTC (permalink / raw)
To: Honnappa Nagarahalli
Cc: Ola Liljedahl, Kokkilagadda, Kiran,
Gavin Hu (Arm Technology China),
Ferruh Yigit, Jacob, Jerin, dev, nd, Steve Capper,
Phil Yang (Arm Technology China)
-----Original Message-----
> Date: Thu, 13 Sep 2018 17:40:53 +0000
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>, Ola Liljedahl
> <Ola.Liljedahl@arm.com>
> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, "Gavin Hu (Arm
> Technology China)" <Gavin.Hu@arm.com>, Ferruh Yigit
> <ferruh.yigit@intel.com>, "Jacob, Jerin"
> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>, nd
> <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>, "Phil Yang (Arm
> Technology China)" <Phil.Yang@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
>
>
> Hi Jerin,
> Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.
Yes. Makes sense to me to keep only single config option.
>
> Thank you,
> Honnappa
>
> -----Original Message-----
> From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Sent: Wednesday, August 29, 2018 3:58 AM
> To: Ola Liljedahl <Ola.Liljedahl@arm.com>
> Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
>
> -----Original Message-----
> > Date: Wed, 29 Aug 2018 08:47:56 +0000
> > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
> > <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> > <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>,
> > nd <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > synchronization
> > user-agent: Microsoft-MacOutlook/10.10.0.180812
> >
> >
> > There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
> >
> > But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
>
> # One reason was that the __atomic builtins were first implemented in gcc 4.7, and x86 would like to keep supporting gcc < 4.7 and the ICC compiler.
> # The theme was no change to the existing code for x86. I am not sure about the code generation for x86 with __atomic builtins; I will let the x86 maintainers comment on this.
>
>
> >
> > -- Ola
> >
> > On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
> >
> > -----Original Message-----
> > > Date: Wed, 29 Aug 2018 07:34:34 +0000
> > > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
> > > Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> > > <Jerin.JacobKollanukkaran@cavium.com>
> > > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> > > <Steve.Capper@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > > synchronization
> > > user-agent: Microsoft-MacOutlook/10.10.0.180812
> > >
> > > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
> >
> > What would be the change in the interface? If it is removing the
> > volatile for the C11 case, then you can use an anonymous union or a
> > #define to keep the size and offset of the elements intact.
> >
> >     struct rte_kni_fifo {
> >     #ifndef RTE_C11...
> >             volatile unsigned write;  /**< Next position to be written */
> >             volatile unsigned read;   /**< Next position to be read */
> >     #else
> >             unsigned write;           /**< Next position to be written */
> >             unsigned read;            /**< Next position to be read */
> >     #endif
> >             unsigned len;             /**< Circular buffer length */
> >             unsigned elem_size;       /**< Pointer size - for 32/64 bit OS */
> >             void *volatile buffer[];  /**< The buffer contains mbuf pointers */
> >     };
> >
> > Anonymous union example:
> > https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
> >
> > You can check the ABI breakage by devtools/validate-abi.sh
> >
> > >
> > > -- Ola
> > >
> > > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> > > Date: Wednesday, 29 August 2018 at 07:50
> > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> > > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > > Agreed. Please go ahead and make the changes. You need to make the same change on the kernel side as well. Please use the C11 ring mechanism (see rte_ring) so that it won't impact other platforms like Intel. We need this change just for Arm and PPC.
> > >
> > > ________________________________
> > > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > > Sent: Wednesday, August 29, 2018 10:29 AM
> > > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> > > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> > > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > >
> > > I agree with Gavin here. Stores to fifo->write and fifo->read can get hoisted, resulting in access to invalid buffer array entries or overwriting of buffer array entries.
> > >
> > > IMO, we should solve this using C11 atomics. This will also help remove the use of ‘volatile’ from the ‘rte_kni_fifo’ structure.
> > >
> > >
> > >
> > > If you want us to put together a patch with this idea, please let us know.
> > >
> > >
> > >
> > > Thank you,
> > >
> > > Honnappa
> > >
> > >
> > >
> > > From: Gavin Hu
> > > Sent: Tuesday, August 28, 2018 2:31 PM
> > > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> > > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > >
> > > Assuming the reader and writer may execute on different CPUs, this becomes standard multithreaded programming.
> > >
> > > Our concern is that the read pointer may be updated too early (weak ordering may reorder the update before the reads from the slots). The slots are then released and may be immediately overwritten by the writer, so you get “too new” data and lose the old data.
> > >
> > >
> > >
> > > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com<mailto:Kiran.Kokkilagadda@cavium.com>>
> > > Sent: Tuesday, August 28, 2018 6:44 PM
> > > To: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>; Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com<mailto:Jerin.JacobKollanukkaran@cavium.com>>
> > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > >
> > > In this instance there won't be any problem, because this loop won't get executed until the value of fifo->write changes. So far we have not seen any issue with it and, for performance reasons, we don't want to keep a read barrier.
> > >
> > >
> > >
> > >
> > >
> > > ________________________________
> > >
> > > From: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>
> > > Sent: Monday, August 27, 2018 9:10 PM
> > > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli
> > > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> > >
> > >
> > >
> > >
> > > This fix is not complete; kni_fifo_get also requires a read fence, otherwise it may get stale data on a weakly ordered platform.
> > >
> > > > -----Original Message-----
> > > > From: dev <dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>> On Behalf Of Ferruh Yigit
> > > > Sent: Monday, August 27, 2018 10:08 PM
> > > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>;
> > > > jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>
> > > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>
> > > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > > > synchronization
> > > >
> > > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> > > > > > With the existing code in kni_fifo_put, the rx_q values are not guaranteed
> > > > > > to be written before fifo_write is updated. This causes a sync issue
> > > > > > on the other core when it reads rx_q in kni_net_rx_normal. So add a write
> > > > > > barrier to make sure the values are synced before updating fifo_write.
> > > > >
> > > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> > > > >
> > > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>
> > > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>>
> > > >
> > > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>
> >
> >
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
@ 2018-09-13 17:40 0% ` Honnappa Nagarahalli
2018-09-13 17:51 0% ` Jerin Jacob
0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-13 17:40 UTC (permalink / raw)
To: Jerin Jacob, Ola Liljedahl
Cc: Kokkilagadda, Kiran, Gavin Hu (Arm Technology China),
Ferruh Yigit, Jacob, Jerin, dev, nd, Steve Capper,
Phil Yang (Arm Technology China)
Hi Jerin,
Is there any reason for having 'RTE_RING_USE_C11_MEM_MODEL', which is specific to rte_ring? I do not see a need for choosing only some algorithms to work with C11 model. I suggest that we change this to 'RTE_USE_C11_MEM_MODEL' so that it can apply to all libraries/algorithms.
Thank you,
Honnappa
-----Original Message-----
From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Sent: Wednesday, August 29, 2018 3:58 AM
To: Ola Liljedahl <Ola.Liljedahl@arm.com>
Cc: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>; dev@dpdk.org; nd <nd@arm.com>; Steve Capper <Steve.Capper@arm.com>
Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
-----Original Message-----
> Date: Wed, 29 Aug 2018 08:47:56 +0000
> From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu
> <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> <Jerin.JacobKollanukkaran@cavium.com>, "dev@dpdk.org" <dev@dpdk.org>,
> nd <nd@arm.com>, Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> synchronization
> user-agent: Microsoft-MacOutlook/10.10.0.180812
>
>
> There was a mention of rte_ring which is a different data structure. But perhaps I misunderstood why this was mentioned and the idea was only to use the C11 memory model as is also used in rte_ring nowadays.
>
> But why would we have different code for x86 and for other architectures (ARM, Power)? If we use the C11 memory model (and e.g. GCC __atomic builtins), the code generated for x86 will be the same. __atomic_load(__ATOMIC_ACQUIRE) and __atomic_store(__ATOMIC_RELEASE) should translate to plain loads and stores on x86?
# One reason was that the __atomic builtins were introduced in GCC 4.7, and x86 wanted to keep supporting GCC older than 4.7 as well as the ICC compiler.
# The theme was no change to the existing code for x86. I am not sure about the code generation for x86 with the __atomic builtins; I will let the x86 maintainers comment on this.
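For reference, a minimal sketch of the acquire/release pairing discussed above, assuming a single producer and a single consumer. The names demo_fifo, demo_fifo_put and demo_fifo_get are hypothetical and are not the rte_kni_fifo API; only the GCC __atomic builtins are real.

```c
#include <stddef.h>

#define DEMO_FIFO_LEN 8 /* hypothetical size, must be a power of two */

/* Hypothetical single-producer/single-consumer FIFO, not rte_kni_fifo. */
struct demo_fifo {
	unsigned write;              /* next position to be written */
	unsigned read;               /* next position to be read */
	void *buffer[DEMO_FIFO_LEN];
};

/* Producer: fill the slot first, then publish it with a store-release. */
static int
demo_fifo_put(struct demo_fifo *f, void *obj)
{
	/* Only the producer modifies 'write', so a relaxed load is enough. */
	unsigned w = __atomic_load_n(&f->write, __ATOMIC_RELAXED);
	unsigned next = (w + 1) & (DEMO_FIFO_LEN - 1);

	/* Acquire pairs with the consumer's release store to 'read'. */
	if (next == __atomic_load_n(&f->read, __ATOMIC_ACQUIRE))
		return -1; /* full */

	f->buffer[w] = obj;
	/* Release: the slot write above cannot be reordered after this. */
	__atomic_store_n(&f->write, next, __ATOMIC_RELEASE);
	return 0;
}

/* Consumer: read the slot first, then hand it back with a store-release. */
static int
demo_fifo_get(struct demo_fifo *f, void **obj)
{
	/* Only the consumer modifies 'read', so a relaxed load is enough. */
	unsigned r = __atomic_load_n(&f->read, __ATOMIC_RELAXED);

	/* Acquire pairs with the producer's release store to 'write'. */
	if (r == __atomic_load_n(&f->write, __ATOMIC_ACQUIRE))
		return -1; /* empty */

	*obj = f->buffer[r];
	/* Release: the slot read above cannot be reordered after this. */
	__atomic_store_n(&f->read, (r + 1) & (DEMO_FIFO_LEN - 1),
			 __ATOMIC_RELEASE);
	return 0;
}
```

The release store in the consumer is what addresses the "read pointer updated too early" concern raised earlier in the thread: the slot is only handed back to the producer after its contents have been read.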
>
> -- Ola
>
> On 29/08/2018, 10:28, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
>
> -----Original Message-----
> > Date: Wed, 29 Aug 2018 07:34:34 +0000
> > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > To: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>, Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>,
> > Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin"
> > <Jerin.JacobKollanukkaran@cavium.com>
> > CC: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Steve Capper
> > <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > synchronization
> > user-agent: Microsoft-MacOutlook/10.10.0.180812
> >
> > Is the rte_kni kernel/user binary interface subject to backwards compatibility requirements? Or can we change it for a new DPDK release?
>
> What would be the change in the interface? If it is removing the volatile for
> the C11 case, then you can use an anonymous union OR a #define to keep the size
> and offset of the elements intact.
>
> struct rte_kni_fifo {
> #ifndef RTE_C11...
>     volatile unsigned write;  /**< Next position to be written */
>     volatile unsigned read;   /**< Next position to be read */
> #else
>     unsigned write;           /**< Next position to be written */
>     unsigned read;            /**< Next position to be read */
> #endif
>     unsigned len;             /**< Circular buffer length */
>     unsigned elem_size;       /**< Pointer size - for 32/64 bit OS */
>     void *volatile buffer[];  /**< The buffer contains mbuf pointers */
> };
>
> Anonymous union example:
> https://git.dpdk.org/dpdk/tree/lib/librte_mbuf/rte_mbuf.h#n461
>
> You can check the ABI breakage by devtools/validate-abi.sh
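A minimal sketch of how the size-and-offset claim above could also be checked at compile time, assuming a C11 compiler with _Static_assert. The two struct names are hypothetical stand-ins for the volatile and non-volatile variants; they are not DPDK definitions, and this does not replace an ABI check with the script mentioned above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the current layout with volatile indexes. */
struct demo_fifo_volatile {
	volatile unsigned write;
	volatile unsigned read;
	unsigned len;
	unsigned elem_size;
	void *volatile buffer[];
};

/* Hypothetical stand-in for the C11 layout without volatile indexes. */
struct demo_fifo_c11 {
	unsigned write;
	unsigned read;
	unsigned len;
	unsigned elem_size;
	void *volatile buffer[];
};

/* Dropping the qualifier must not move or resize any member. */
_Static_assert(sizeof(struct demo_fifo_volatile) ==
	       sizeof(struct demo_fifo_c11), "struct size changed");
_Static_assert(offsetof(struct demo_fifo_volatile, write) ==
	       offsetof(struct demo_fifo_c11, write), "write moved");
_Static_assert(offsetof(struct demo_fifo_volatile, read) ==
	       offsetof(struct demo_fifo_c11, read), "read moved");
_Static_assert(offsetof(struct demo_fifo_volatile, len) ==
	       offsetof(struct demo_fifo_c11, len), "len moved");
_Static_assert(offsetof(struct demo_fifo_volatile, buffer) ==
	       offsetof(struct demo_fifo_c11, buffer), "buffer moved");

/* Runtime view of the same comparison: returns 1 when layouts match. */
static int
demo_layouts_match(void)
{
	return sizeof(struct demo_fifo_volatile) ==
		       sizeof(struct demo_fifo_c11) &&
	       offsetof(struct demo_fifo_volatile, buffer) ==
		       offsetof(struct demo_fifo_c11, buffer);
}
```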
>
> >
> > -- Ola
> >
> > From: "Kokkilagadda, Kiran" <Kiran.Kokkilagadda@cavium.com>
> > Date: Wednesday, 29 August 2018 at 07:50
> > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Gavin Hu <Gavin.Hu@arm.com>, Ferruh Yigit <ferruh.yigit@intel.com>, "Jacob, Jerin" <Jerin.JacobKollanukkaran@cavium.com>
> > Cc: "dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>, Ola Liljedahl <Ola.Liljedahl@arm.com>, Steve Capper <Steve.Capper@arm.com>
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >
> >
> > Agreed. Please go ahead and make the changes. You need to make the same change on the kernel side as well. And please use the C11 ring mechanism (see rte_ring) so that it won't impact other platforms like Intel. We need this change just for Arm and PPC.
> >
> > ________________________________
> > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Sent: Wednesday, August 29, 2018 10:29 AM
> > To: Gavin Hu; Kokkilagadda, Kiran; Ferruh Yigit; Jacob, Jerin
> > Cc: dev@dpdk.org; nd; Ola Liljedahl; Steve Capper
> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >
> >
> >
> > I agree with Gavin here. Stores to fifo->write and fifo->read can get hoisted, resulting in accesses to invalid buffer array entries or overwriting of buffer array entries.
> >
> > IMO, we should solve this using c11 atomics. This will also help remove the use of ‘volatile’ from ‘rte_kni_fifo’ structure.
> >
> >
> >
> > If you want us to put together a patch with this idea, please let us know.
> >
> >
> >
> > Thank you,
> >
> > Honnappa
> >
> >
> >
> > From: Gavin Hu
> > Sent: Tuesday, August 28, 2018 2:31 PM
> > To: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com>; Ferruh Yigit <ferruh.yigit@intel.com>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com>
> > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; Steve Capper <Steve.Capper@arm.com>
> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >
> >
> >
> > Assuming the reader and writer may execute on different CPUs, this becomes a standard multithreaded programming problem.
> >
> > We are concerned that the read pointer may be updated too early (weak ordering may reorder the update before the reads from the slots). The slots are then released and may immediately be overwritten by the writer, so the reader gets “too new” data and loses the old data.
> >
> >
> >
> > From: Kokkilagadda, Kiran <Kiran.Kokkilagadda@cavium.com<mailto:Kiran.Kokkilagadda@cavium.com>>
> > Sent: Tuesday, August 28, 2018 6:44 PM
> > To: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>; Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>; Jacob, Jerin <Jerin.JacobKollanukkaran@cavium.com<mailto:Jerin.JacobKollanukkaran@cavium.com>>
> > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>
> > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >
> >
> >
> > In this instance there won't be any problem, because this loop won't get executed until the value of fifo->write changes. So far we have not seen any issue with it and, for performance reasons, we don't want to keep a read barrier.
> >
> >
> >
> >
> >
> > ________________________________
> >
> > From: Gavin Hu <Gavin.Hu@arm.com<mailto:Gavin.Hu@arm.com>>
> > Sent: Monday, August 27, 2018 9:10 PM
> > To: Ferruh Yigit; Kokkilagadda, Kiran; Jacob, Jerin
> > Cc: dev@dpdk.org<mailto:dev@dpdk.org>; Honnappa Nagarahalli
> > Subject: RE: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization
> >
> >
> >
> >
> > This fix is not complete; kni_fifo_get also requires a read fence, otherwise it may get stale data on a weakly ordered platform.
> >
> > > -----Original Message-----
> > > From: dev <dev-bounces@dpdk.org<mailto:dev-bounces@dpdk.org>> On Behalf Of Ferruh Yigit
> > > Sent: Monday, August 27, 2018 10:08 PM
> > > To: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>;
> > > jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>
> > > Cc: dev@dpdk.org<mailto:dev@dpdk.org>
> > > Subject: Re: [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer
> > > synchronization
> > >
> > > On 8/16/2018 10:55 AM, Kiran Kumar wrote:
> > > > > With the existing code in kni_fifo_put, the rx_q values are not guaranteed
> > > > > to be written before fifo_write is updated. This causes a sync issue
> > > > > on the other core when it reads rx_q in kni_net_rx_normal. So add a write
> > > > > barrier to make sure the values are synced before updating fifo_write.
> > > >
> > > > Fixes: 3fc5ca2f6352 ("kni: initial import")
> > > >
> > > > Signed-off-by: Kiran Kumar <kkokkilagadda@caviumnetworks.com<mailto:kkokkilagadda@caviumnetworks.com>>
> > > > Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com<mailto:jerin.jacob@caviumnetworks.com>>
> > >
> > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com<mailto:ferruh.yigit@intel.com>>
>
>
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
2018-09-06 14:28 3% ` Michel Machado
@ 2018-09-12 20:37 2% ` Honnappa Nagarahalli
2018-09-20 19:50 0% ` Michel Machado
0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-12 20:37 UTC (permalink / raw)
To: Michel Machado, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
nd, yipeng1.wang
Hi Michel,
I applied your patch and tweaked the code to run a few performance tests on Arm (Cortex-A72, 1.3GHz) and x86 (Intel Xeon CPU E5-2660 v4 @ 2.00GHz). The perf code looks as follows:
    count_b = rte_rdtsc_precise();
    int k = 0;
    rte_hash_iterator_init(tbl_rw_test_param.h, &state);
    while (rte_hash_iterate(&state, &next_key, &next_data) >= 0) {
        /* Search for the key in the list of keys added. */
        i = *(const uint32_t *)next_key;
        tbl_rw_test_param.found[i]++;
        k++;
    }
    count_a = rte_rdtsc_precise() - count_b;
    printf("*****Cycles2 per iterate call: %lu\n", count_a/k);
Further, I changed the rte_hash_iterate as follows and ran the same test.
int32_t rte_hash_iterate(const struct rte_hash *h, struct rte_hash_iterator_state *state, const void **key, void **data)
Finally, I used memset in the place of rte_hash_iterator_init with the required changes to rte_hash_iterate.
All these tests show little variation in 'cycles per iterate call' for both architectures.
-----Original Message-----
From: Michel Machado <michel@digirati.com.br>
Sent: Thursday, September 6, 2018 9:29 AM
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Qiaobin Fu <qiaobinf@bu.edu>; bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org; doucette@bu.edu; keith.wiles@intel.com; sameh.gobriel@intel.com; charlie.tai@intel.com; stephen@networkplumber.org; nd <nd@arm.com>; yipeng1.wang@intel.com
Subject: Re: [PATCH v3] hash table: add an iterator over conflicting entries
On 09/05/2018 06:13 PM, Honnappa Nagarahalli wrote:
>> + uint32_t next;
>> + uint32_t total_entries;
>> +};
>> This structure can be moved to rte_cuckoo_hash.h file.
>
> What's the purpose of moving this struct to a header file since it's only used in the C file rte_cuckoo_hash.c?
>
> This is to maintain consistency. For ex: 'struct queue_node', which is
> an internal structure, is kept in rte_cuckoo_hash.h
Okay. We'll move it there.
>> +int32_t
>> +rte_hash_iterator_init(const struct rte_hash *h,
>> + struct rte_hash_iterator_state *state) {
>> + struct rte_hash_iterator_istate *__state;
>> '__state' can be replaced by 's'.
>>
>> +
>> + RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>> +
>> + __state = (struct rte_hash_iterator_istate *)state;
>> + __state->h = h;
>> + __state->next = 0;
>> + __state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>> +
>> + return 0;
>> +}
>> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.
>
> Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initializing the fields of a state only once?
>
> My concern is about creating another API for every iterator API. You have a valid point on saving cycles as this API applies for data plane. Have you done any performance benchmarking with and without this API? May be we can guide our decision based on that.
It's not just about creating one init function for each iterator, because an iterator may have a couple of init functions. For example, someone may eventually find it useful to add another init function for the conflicting-entry iterator that we are advocating in this patch. One possibility would be for this new init function to use the key of the new entry instead of its signature to initialize the state, similar to what is already done in the rte_hash_lookup*() functions. Despite possibly having multiple init functions, there will be a single iterator function.
About the performance benchmarking, the current API only requires applications to initialize a single 32-bit integer. But with the adoption of a struct for the state, the initialization will grow to 64 bytes.
As my tests showed, I do not see any impact of this.
>> int32_t
>> -rte_hash_iterate(const struct rte_hash *h, const void **key, void
>> **data, uint32_t *next)
>> +rte_hash_iterate(
>> + struct rte_hash_iterator_state *state, const void **key, void
>> +**data)
>>
>> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
>> rte_hash_iterate(const struct rte_hash *h, const void **key, void
>> **data, struct rte_hash_iterator_state *state)
>>
>> This will help keep the API signature consistent with existing APIs.
>>
>> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.
>
> The ABI will change in a way or another, so why not going for a single state instead of requiring parameters that are already needed for the initialization of the state?
>
> Are there any cost savings we can achieve by keeping the 'h' in the iterator state?
There's a tiny cost saving: the parameter need not be pushed onto the execution stack every time the iterator fetches another entry. However, the reason I find more important is that it makes it impossible to introduce a certain bug in the code. Consider a function that is dealing with two hash tables and two iterators. If the iterator does not ask for the hash table in order to make progress, it's impossible to mix up hash tables and iterator states.
IMO, similar arguments can be applied for other APIs too. It is more difficult to use the APIs if they are not consistent. I also do not see the benefit of the savings in my tests.
There's even the possibility that an iterator doesn't need the hash table after its initialization. This would be an *unlikely* case, but consider an iterator that only returns a couple of entries. It could cache those entries during initialization.
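To illustrate the design point being debated, here is a minimal sketch of an iterator whose state is bound to its table once at init time, assuming a toy fixed-size table. All names (demo_table, demo_iter, demo_iter_init, demo_iterate) are hypothetical and are not the rte_hash API.

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

#define DEMO_SLOTS 4
#define DEMO_EMPTY 0 /* hypothetical marker for an unused slot */

/* Toy table: a flat array of keys, where 0 means "empty slot". */
struct demo_table {
	uint32_t keys[DEMO_SLOTS];
};

/* Iterator state that remembers which table it walks. */
struct demo_iter {
	const struct demo_table *t; /* bound once at init */
	uint32_t next;              /* next slot to examine */
};

static int
demo_iter_init(const struct demo_table *t, struct demo_iter *it)
{
	if (t == NULL || it == NULL)
		return -EINVAL;
	it->t = t;
	it->next = 0;
	return 0;
}

/* The caller passes only the state, so it cannot pair it with the
 * wrong table when several tables are iterated at once. */
static int
demo_iterate(struct demo_iter *it, uint32_t *key)
{
	while (it->next < DEMO_SLOTS) {
		uint32_t k = it->t->keys[it->next++];

		if (k != DEMO_EMPTY) {
			*key = k;
			return 0;
		}
	}
	return -ENOENT; /* end of table */
}
```

With two tables and two iterators interleaved, each call to demo_iterate() needs only its own state. The trade-off raised in the reply still applies: this signature differs from the other lookup-style calls that take the table as their first argument.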
>> /* Calculate bucket and index of current iterator */
>> - bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> - idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> + bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> + idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>
>> /* If current position is empty, go to the next one */
>> - while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> - (*next)++;
>> + while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> + __state->next++;
>> /* End of table */
>> - if (*next == total_entries)
>> + if (__state->next == __state->total_entries)
>> return -ENOENT;
>> - bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> - idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> + bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> + idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>> }
>> - __hash_rw_reader_lock(h);
>> + __hash_rw_reader_lock(__state->h);
>> /* Get position of entry in key table */
>> - position = h->buckets[bucket_idx].key_idx[idx];
>> - next_key = (struct rte_hash_key *) ((char *)h->key_store +
>> - position * h->key_entry_size);
>> + position = __state->h->buckets[bucket_idx].key_idx[idx];
>> + next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
>> + position * __state->h->key_entry_size);
>> /* Return key and data */
>> *key = next_key->key;
>> *data = next_key->pdata;
>>
>> - __hash_rw_reader_unlock(h);
>> + __hash_rw_reader_unlock(__state->h);
>>
>> /* Increment iterator */
>> - (*next)++;
>> + __state->next++;
>> This comment is not related to this change, it is better to place this inside the lock.
>
> Even though __state->next does not depend on the lock?
>
> It depends on if this API needs to be multi-thread safe. Interestingly, the documentation does not say it is multi-thread safe. If it has to be multi-thread safe, then the state also needs to be protected. For ex: what happens if the user uses a global variable for the state?
If an application needs to share an iterator state between threads, it has to have a synchronization mechanism for that as it would for any other shared variable. The lock above is allowing applications to share a hash table between threads, it has no semantic over anything else.
Agree, the lock is for protecting the hash table, not the iterator state.
>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>> index 9e7d9315f..fdb01023e 100644
>> --- a/lib/librte_hash/rte_hash.h
>> +++ b/lib/librte_hash/rte_hash.h
>> @@ -14,6 +14,8 @@
>> #include <stdint.h>
>> #include <stddef.h>
>>
>> +#include <rte_compat.h>
>> +
>> #ifdef __cplusplus
>> extern "C" {
>> #endif
>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>> /** @internal A hash table structure. */ struct rte_hash;
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * @internal A hash table iterator state structure.
>> + */
>> +struct rte_hash_iterator_state {
>> + uint8_t space[64];
>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
>
> Okay.
I think we should not replace 64 with RTE_CACHE_LINE_SIZE because the ABI would change based on the architecture for which it's compiled.
Ok. Maybe have a #define for 64?
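A minimal sketch of what such a #define could look like, assuming a C11 compiler. The macro and struct names are hypothetical; the internal state simply mirrors the three fields shown in the patch hunk above, with a void pointer standing in for the table pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the fixed, architecture-independent size. */
#define DEMO_ITER_STATE_SIZE 64

/* Public, opaque state: same size on every architecture. */
struct demo_iterator_state {
	uint8_t space[DEMO_ITER_STATE_SIZE];
};

/* Private view of the state, modelled on the fields in the patch. */
struct demo_iterator_istate {
	const void *h;          /* stand-in for the hash table pointer */
	uint32_t next;
	uint32_t total_entries;
};

/* Fail the build if the private state outgrows the public buffer. */
_Static_assert(sizeof(struct demo_iterator_istate) <=
	       sizeof(struct demo_iterator_state),
	       "internal iterator state does not fit in public state");

/* Size that applications would see, independent of cache line size. */
static size_t
demo_state_size(void)
{
	return sizeof(struct demo_iterator_state);
}
```

Keeping the size behind a named constant rather than RTE_CACHE_LINE_SIZE avoids the per-architecture ABI difference mentioned above, while the static assertion catches any later growth of the private struct.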
[ ]'s
Michel Machado
^ permalink raw reply [relevance 2%]
* Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
2018-09-11 23:14 3% ` Stephen Hemminger
@ 2018-09-12 4:02 0% ` Jason Wang
0 siblings, 0 replies; 200+ results
From: Jason Wang @ 2018-09-12 4:02 UTC (permalink / raw)
To: Stephen Hemminger, Dan Gora; +Cc: Igor Ryzhov, Ferruh Yigit, dev
On 2018年09月12日 07:14, Stephen Hemminger wrote:
> On Tue, 11 Sep 2018 19:07:47 -0300
> Dan Gora <dg@adax.com> wrote:
>
>> On Tue, Sep 11, 2018 at 6:52 PM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> The carrier state has no meaning when device is down, at least for physical
>>> devices. Because often the PHY is powered off when the device is marked down.
>> The thing that caught my attention is that when you mark a kernel
>> ethernet device 'down', you get a message that the link is down in the
>> syslog.
>>
>> snappy:root:bash 2645 => ip link set down dev eth0
>> Sep 11 18:32:48 snappy kernel: e1000e: eth0 NIC Link is Down
>>
>> With this method, that's not possible because you cannot change the
>> link state from the callback from kni_net_release.
>>
>> The carrier state doesn't have any meaning from a data transfer point
>> of view, but it's often useful for being able to diagnose connectivity
>> issues (is my cable plugged in or not).
>>
>> I'm still not really clear what the objection really is to the ioctl
>> method. Is it just the number of changes? That the kernel driver has
>> to change as well? Just that there is another way to do it?
>>
>> thanks
>> dan
> I want to see KNI as part of the standard Linux kernel at some future date.
> Having KNI as an out of tree driver means it is doomed to chasing tail lights
> for the Linux kernel ABI instability and also problems with Linux distributions.
Why not use vhost_net instead? KNI duplicates its function.
Thanks
>
> One of the barriers to entry for Linux drivers is introducing new ioctl's.
> Ioctl's have issues with being device specific and also 32/64 compatibility.
> If KNI has ioctl's it makes it harder to get merged some day.
>
> I freely admit that this is forcing KNI to respond to something that is not
> there yet, so if it is too hard, then doing it with ioctl is going to be
> necessary.
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface
@ 2018-09-11 23:14 3% ` Stephen Hemminger
2018-09-12 4:02 0% ` Jason Wang
0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-09-11 23:14 UTC (permalink / raw)
To: Dan Gora; +Cc: Igor Ryzhov, Ferruh Yigit, dev
On Tue, 11 Sep 2018 19:07:47 -0300
Dan Gora <dg@adax.com> wrote:
> On Tue, Sep 11, 2018 at 6:52 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > The carrier state has no meaning when device is down, at least for physical
> > devices. Because often the PHY is powered off when the device is marked down.
>
> The thing that caught my attention is that when you mark a kernel
> ethernet device 'down', you get a message that the link is down in the
> syslog.
>
> snappy:root:bash 2645 => ip link set down dev eth0
> Sep 11 18:32:48 snappy kernel: e1000e: eth0 NIC Link is Down
>
> With this method, that's not possible because you cannot change the
> link state from the callback from kni_net_release.
>
> The carrier state doesn't have any meaning from a data transfer point
> of view, but it's often useful for being able to diagnose connectivity
> issues (is my cable plugged in or not).
>
> I'm still not really clear what the objection really is to the ioctl
> method. Is it just the number of changes? That the kernel driver has
> to change as well? Just that there is another way to do it?
>
> thanks
> dan
I want to see KNI as part of the standard Linux kernel at some future date.
Having KNI as an out of tree driver means it is doomed to chasing tail lights
for the Linux kernel ABI instability and also problems with Linux distributions.
One of the barriers to entry for Linux drivers is introducing new ioctl's.
Ioctl's have issues with being device specific and also 32/64 compatibility.
If KNI has ioctl's it makes it harder to get merged some day.
I freely admit that this is forcing KNI to respond to something that is not
there yet, so if it is too hard, then doing it with ioctl is going to be
necessary.
^ permalink raw reply [relevance 3%]
* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
2018-09-11 14:06 0% ` Bruce Richardson
@ 2018-09-11 16:05 0% ` Luca Boccassi
0 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-09-11 16:05 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev
On Tue, 2018-09-11 at 15:06 +0100, Bruce Richardson wrote:
> On Tue, Sep 11, 2018 at 02:41:36PM +0100, Luca Boccassi wrote:
> > On Tue, 2018-09-11 at 14:32 +0100, Bruce Richardson wrote:
> > > On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> > > > On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > > > > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi
> > > > > wrote:
> > > > > > The library is called librte_pmd_lio, so rename the map
> > > > > > file
> > > > > > and
> > > > > > set
> > > > > > the name in the meson file so that the built library names
> > > > > > with
> > > > > > meson
> > > > > > and legacy makefiles are the same
> > > > > >
> > > > > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > > > > Cc: stable@dpdk.org
> > > > > >
> > > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > > >
> > > > > Rather than doing this renaming, can we instead add a symlink
> > > > > in
> > > > > the
> > > > > install phase to map the old name to the new one? I'd like to
> > > > > see
> > > > > the
> > > > > consistency of directory name, map filename and driver name
> > > > > enforced
> > > > > strictly in the build system. Having exceptions is a pain.
> > > > >
> > > > > /Bruce
> > > >
> > > > We could, but the pain gets shifted on packagers then - what
> > > > about
> > > > renaming the directory entirely to net/lio?
> > >
> > > For packagers, what sort of ABI compatibility guarantees do you
> > > try
> > > and
> > > keep between releases. Is this something that just needs a one-
> > > release ABI
> > > announcement, as with other ABI changes?
> > >
> > > /Bruce
> >
> > Currently in Debian/Ubuntu we are using the ABI override (because
> > of
> > the sticky ABI breakage issue) so the filenames and package names
> > are
> > different on every release anyway.
> >
> > So in theory we could change the name of the libs and packages, but
> > what I'm mostly worried about is keeping consistency and some level
> > of
> > compatibility between old and new build systems, isn't that an
> > issue?
> >
>
> It's a good question, and I suspect everyone will have their own
> opinion.
>
> Personally, I take the view that moving build system involves quite a
> number of changes anyway, so we should take the opportunity to clean
> up a
> few other things at the same time. This is why I'm so keen on trying
> to
> keep everything consistent as far as possible throughout the system
> and not
> put in special cases. For many of these a) if we put in lots of name
> overrides now we'll probably never get rid of them, and b) it's more
> likely
> that future drivers will adopt the same technique to have different
> naming
> of drivers and directories.
>
> However, if keeping sonames consistent is a major concern, then
> perhaps we
> should look to rename some directories, like you suggested before.
>
> /Bruce
Actually I tend to agree, it would be better to make the libraries
consistent, so I'm fine with having to deal with it once in packaging.
I'll send a v2 without most of the renames.
--
Kind regards,
Luca Boccassi
^ permalink raw reply [relevance 0%]
* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
2018-09-11 13:41 4% ` Luca Boccassi
@ 2018-09-11 14:06 0% ` Bruce Richardson
2018-09-11 16:05 0% ` Luca Boccassi
0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-09-11 14:06 UTC (permalink / raw)
To: Luca Boccassi; +Cc: dev
On Tue, Sep 11, 2018 at 02:41:36PM +0100, Luca Boccassi wrote:
> On Tue, 2018-09-11 at 14:32 +0100, Bruce Richardson wrote:
> > On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> > > On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > > > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi wrote:
> > > > > The library is called librte_pmd_lio, so rename the map file
> > > > > and
> > > > > set
> > > > > the name in the meson file so that the built library names with
> > > > > meson
> > > > > and legacy makefiles are the same
> > > > >
> > > > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > > >
> > > > Rather than doing this renaming, can we instead add a symlink in
> > > > the
> > > > install phase to map the old name to the new one? I'd like to see
> > > > the
> > > > consistency of directory name, map filename and driver name
> > > > enforced
> > > > strictly in the build system. Having exceptions is a pain.
> > > >
> > > > /Bruce
> > >
> > > We could, but the pain gets shifted on packagers then - what about
> > > renaming the directory entirely to net/lio?
> >
> > For packagers, what sort of ABI compatibility guarantees do you try
> > and
> > keep between releases. Is this something that just needs a one-
> > release ABI
> > announcement, as with other ABI changes?
> >
> > /Bruce
>
> Currently in Debian/Ubuntu we are using the ABI override (because of
> the sticky ABI breakage issue) so the filenames and package names are
> different on every release anyway.
>
> So in theory we could change the name of the libs and packages, but
> what I'm mostly worried about is keeping consistency and some level of
> compatibility between old and new build systems, isn't that an issue?
>
It's a good question, and I suspect everyone will have their own opinion.
Personally, I take the view that moving build system involves quite a
number of changes anyway, so we should take the opportunity to clean up a
few other things at the same time. This is why I'm so keen on trying to
keep everything consistent as far as possible throughout the system and not
put in special cases. For many of these a) if we put in lots of name
overrides now we'll probably never get rid of them, and b) it's more likely
that future drivers will adopt the same technique to have different naming
of drivers and directories.
However, if keeping sonames consistent is a major concern, then perhaps we
should look to rename some directories, like you suggested before.
/Bruce
* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
2018-09-11 13:32 4% ` Bruce Richardson
@ 2018-09-11 13:41 4% ` Luca Boccassi
2018-09-11 14:06 0% ` Bruce Richardson
0 siblings, 1 reply; 200+ results
From: Luca Boccassi @ 2018-09-11 13:41 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev
On Tue, 2018-09-11 at 14:32 +0100, Bruce Richardson wrote:
> On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> > On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi wrote:
> > > > The library is called librte_pmd_lio, so rename the map file
> > > > and
> > > > set
> > > > the name in the meson file so that the built library names with
> > > > meson
> > > > and legacy makefiles are the same
> > > >
> > > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > >
> > > Rather than doing this renaming, can we instead add a symlink in
> > > the
> > > install phase to map the old name to the new one? I'd like to see
> > > the
> > > consistency of directory name, map filename and driver name
> > > enforced
> > > strictly in the build system. Having exceptions is a pain.
> > >
> > > /Bruce
> >
> > We could, but the pain gets shifted on packagers then - what about
> > renaming the directory entirely to net/lio?
>
> For packagers, what sort of ABI compatibility guarantees do you try
> and
> keep between releases. Is this something that just needs a one-
> release ABI
> announcement, as with other ABI changes?
>
> /Bruce
Currently in Debian/Ubuntu we are using the ABI override (because of
the sticky ABI breakage issue) so the filenames and package names are
different on every release anyway.
So in theory we could change the name of the libs and packages, but
what I'm mostly worried about is keeping consistency and some level of
compatibility between old and new build systems, isn't that an issue?
--
Kind regards,
Luca Boccassi
* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
@ 2018-09-11 13:38 3% ` Luca Boccassi
0 siblings, 0 replies; 200+ results
From: Luca Boccassi @ 2018-09-11 13:38 UTC (permalink / raw)
To: Bruce Richardson; +Cc: dev
On Tue, 2018-09-11 at 14:30 +0100, Bruce Richardson wrote:
> On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> > On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi wrote:
> > > > The library is called librte_pmd_lio, so rename the map file
> > > > and
> > > > set
> > > > the name in the meson file so that the built library names with
> > > > meson
> > > > and legacy makefiles are the same
> > > >
> > > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> > >
> > > Rather than doing this renaming, can we instead add a symlink in
> > > the
> > > install phase to map the old name to the new one? I'd like to see
> > > the
> > > consistency of directory name, map filename and driver name
> > > enforced
> > > strictly in the build system. Having exceptions is a pain.
> > >
> > > /Bruce
> >
> > We could, but the pain gets shifted on packagers then - what about
> > renaming the directory entirely to net/lio?
> >
>
> Is it still an issue with packagers if the symlinks are created as
> part of
> the install step of DPDK itself (which is what I was intending)? I
> was
> thinking of adding a new post-install script for the backward
> compatible
> renames.
At least for Debian/Ubuntu, if I tell the tools that package libfoo1
needs to have libfoo.so.1.2.3, that's what it will do, without
following symlinks. So a broken link will be installed in the system,
unless I start tracking what symlinks are there and adding them
manually to the package they belong to.
There's also the fact that by policy the library package names should
match the file name of the library and its ABI revision, so
libfoo.so.1.2.3 should be in libfoo1 pkg by policy - if they mismatch,
some linter tools are going to yell at me at the very least.
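To illustrate the naming rule: a minimal sketch, assuming the convention
from the shared-library section of Debian Policy (SONAME libfoo.so.1 maps
to package libfoo1, with a hyphen inserted when the library name itself
ends in a digit). soname_to_pkg() is a hypothetical helper written for
this mail, not something the packaging tools provide.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: derive the package name expected for a SONAME,
 * e.g. "libfoo.so.1" -> "libfoo1", "libfoo2.so.1" -> "libfoo2-1".
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int
soname_to_pkg(const char *soname, char *out, size_t outlen)
{
	const char *so, *ver = "", *sep = "";
	size_t len, base_len, i;
	int n;

	assert(soname != NULL && out != NULL);

	len = strlen(soname);
	base_len = len;
	so = strstr(soname, ".so.");
	if (so != NULL) {
		/* Versioned SONAME: split into base name and version. */
		base_len = (size_t)(so - soname);
		ver = so + 4;
	} else if (len >= 3 && strcmp(soname + len - 3, ".so") == 0) {
		/* Unversioned SONAME: just drop the ".so" suffix. */
		base_len = len - 3;
	}

	/* A base name ending in a digit is separated from the version. */
	if (*ver != '\0' && base_len > 0 &&
	    isdigit((unsigned char)soname[base_len - 1]))
		sep = "-";

	n = snprintf(out, outlen, "%.*s%s%s", (int)base_len, soname, sep, ver);
	if (n < 0 || (size_t)n >= outlen)
		return -1;

	/* Package names are lower case and use '-' instead of '_'. */
	for (i = 0; out[i] != '\0'; i++)
		out[i] = (out[i] == '_') ? '-' :
			(char)tolower((unsigned char)out[i]);
	return 0;
}
```

Under these assumptions, a library whose SONAME is librte_pmd_lio.so.1
would be expected in a package called librte-pmd-lio1, so a file shipped
under a different name would no longer match its package.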
> As for renaming the directory, I don't mind, but I'll let the driver
> maintainers comment on their thoughts on it.
>
> /Bruce
--
Kind regards,
Luca Boccassi
* Re: [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name
@ 2018-09-11 13:32 4% ` Bruce Richardson
2018-09-11 13:41 4% ` Luca Boccassi
1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2018-09-11 13:32 UTC (permalink / raw)
To: Luca Boccassi
Cc: t, dev, keith.wiles, roy.fan.zhang, jingjing.wu, wenzhuo.lu,
rasesh.mody, harish.patil, shahed.shaikh, amr.mokhtar,
shijith.thotton, ssrinivasan, liang.j.ma, peter.mccarthy,
jerin.jacob, maciej.czekaj, arybchenko, ashish.gupta, yongwang,
thomas
On Tue, Sep 11, 2018 at 02:09:30PM +0100, Luca Boccassi wrote:
> On Tue, 2018-09-11 at 14:06 +0100, Bruce Richardson wrote:
> > On Mon, Sep 10, 2018 at 09:04:07PM +0100, Luca Boccassi wrote:
> > > The library is called librte_pmd_lio, so rename the map file and
> > > set
> > > the name in the meson file so that the built library names with
> > > meson
> > > and legacy makefiles are the same
> > >
> > > Fixes: bad475c03fee ("net/liquidio: add to meson build")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Luca Boccassi <bluca@debian.org>
> >
> > Rather than doing this renaming, can we instead add a symlink in the
> > install phase to map the old name to the new one? I'd like to see the
> > consistency of directory name, map filename and driver name enforced
> > strictly in the build system. Having exceptions is a pain.
> >
> > /Bruce
>
> We could, but the pain gets shifted on packagers then - what about
> renaming the directory entirely to net/lio?
For packagers, what sort of ABI compatibility guarantees do you try and
keep between releases? Is this something that just needs a one-release ABI
announcement, as with other ABI changes?
/Bruce
* Re: [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions
2018-09-10 5:18 3% [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions David Marchand
@ 2018-09-10 8:06 0% ` Andrew Rybchenko
2018-09-16 9:39 0% ` Thomas Monjalon
2018-09-17 12:45 8% ` [dpdk-dev] [PATCH v2] " David Marchand
1 sibling, 1 reply; 200+ results
From: Andrew Rybchenko @ 2018-09-10 8:06 UTC (permalink / raw)
To: David Marchand, dev; +Cc: olivier.matz
On 09/10/2018 08:18 AM, David Marchand wrote:
> __rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
> a long time now (early 17.05), are not part of the abi and are easily
> replaced with existing api.
>
> Signed-off-by: David Marchand <david.marchand@6wind.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
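
For anyone still calling the removed wrappers, the migration should be a
plain rename, since each wrapper only forwarded to the non-underscore
function. A minimal sketch of the pattern follows. The toy_* types and
functions are stand-ins invented for this example, not the real
rte_mbuf.h definitions, and the attribute is spelled in gcc/clang syntax
in place of the __rte_deprecated macro.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types: NOT the real DPDK structures, just enough to compile. */
struct toy_pool {
	unsigned int nb_free;
};

struct toy_mbuf {
	struct toy_pool *pool;
};

/* Plays the role of rte_mbuf_raw_free(): return the buffer to its pool. */
static inline void
toy_mbuf_raw_free(struct toy_mbuf *m)
{
	assert(m != NULL && m->pool != NULL);
	m->pool->nb_free++;
}

/*
 * Plays the role of the removed wrapper: it only forwards the call, so
 * callers lose nothing by switching to toy_mbuf_raw_free() directly.
 */
__attribute__((deprecated))
static inline void
toy_mbuf_raw_free_old(struct toy_mbuf *m)
{
	toy_mbuf_raw_free(m);
}
```

A caller of toy_mbuf_raw_free_old() gets a compiler warning at each call
site, which is the usual way to find the places that need the rename.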
* [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions
@ 2018-09-10 5:18 3% David Marchand
2018-09-10 8:06 0% ` Andrew Rybchenko
2018-09-17 12:45 8% ` [dpdk-dev] [PATCH v2] " David Marchand
0 siblings, 2 replies; 200+ results
From: David Marchand @ 2018-09-10 5:18 UTC (permalink / raw)
To: dev; +Cc: olivier.matz
__rte_mbuf_raw_free and __rte_pktmbuf_prefree_seg have been deprecated for
a long time now (early 17.05), are not part of the abi and are easily
replaced with existing api.
Signed-off-by: David Marchand <david.marchand@6wind.com>
---
lib/librte_mbuf/rte_mbuf.h | 16 ----------------
1 file changed, 16 deletions(-)
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 9ce5d76d7..a50b05c64 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1038,14 +1038,6 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
rte_mempool_put(m->pool, m);
}
-/* compat with older versions */
-__rte_deprecated
-static inline void
-__rte_mbuf_raw_free(struct rte_mbuf *m)
-{
- rte_mbuf_raw_free(m);
-}
-
/**
* The packet mbuf constructor.
*
@@ -1658,14 +1650,6 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
return NULL;
}
-/* deprecated, replaced by rte_pktmbuf_prefree_seg() */
-__rte_deprecated
-static inline struct rte_mbuf *
-__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
-{
- return rte_pktmbuf_prefree_seg(m);
-}
-
/**
* Free a segment of a packet mbuf into its original mempool.
*
--
2.17.1
* [dpdk-dev] [dpdk-announce] DPDK 18.05.1 released
@ 2018-09-05 14:44 1% Christian Ehrhardt
0 siblings, 0 replies; 200+ results
From: Christian Ehrhardt @ 2018-09-05 14:44 UTC (permalink / raw)
To: announce
Hi all,
Here is a new stable release:
https://fast.dpdk.org/rel/dpdk-18.05.1.tar.xz
The git tree is at:
https://dpdk.org/browse/dpdk-stable/?h=18.05
Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
MAINTAINERS | 12 +-
app/test-crypto-perf/cperf_ops.c | 3 +
app/test-eventdev/test_order_atq.c | 12 +-
app/test-eventdev/test_order_queue.c | 12 +-
app/test-pmd/cmdline.c | 8 +-
app/test-pmd/cmdline_flow.c | 29 ++-
app/test-pmd/cmdline_tm.c | 37 ++-
app/test-pmd/testpmd.c | 46 +++-
buildtools/pmdinfogen/Makefile | 2 +-
config/meson.build | 3 +-
devtools/test-build.sh | 1 -
devtools/test-meson-builds.sh | 13 +-
doc/guides/cryptodevs/dpaa2_sec.rst | 1 -
doc/guides/cryptodevs/dpaa_sec.rst | 1 -
doc/guides/eventdevs/octeontx.rst | 2 +-
doc/guides/nics/qede.rst | 13 +-
doc/guides/nics/vdev_netvsc.rst | 2 +-
doc/guides/rel_notes/release_18_05.rst | 234 +++++++++++++++++++
doc/guides/testpmd_app_ug/testpmd_funcs.rst | 6 +-
drivers/bus/dpaa/base/fman/fman_hw.c | 20 +-
drivers/bus/dpaa/base/fman/of.c | 5 +
drivers/bus/dpaa/dpaa_bus.c | 14 +-
drivers/bus/dpaa/include/compat.h | 6 +
drivers/bus/pci/linux/pci_vfio.c | 2 +-
drivers/compress/isal/isal_compress_pmd.c | 68 ++++--
drivers/compress/isal/isal_compress_pmd_ops.c | 7 +-
drivers/crypto/virtio/virtio_cryptodev.c | 6 +
drivers/crypto/virtio/virtio_cryptodev.h | 3 +
drivers/crypto/virtio/virtio_rxtx.c | 14 +-
drivers/event/octeontx/ssovf_evdev.c | 14 +-
drivers/event/octeontx/ssovf_worker.c | 17 +-
drivers/event/octeontx/timvf_evdev.c | 2 +-
drivers/mempool/octeontx/octeontx_fpavf.c | 45 ++--
drivers/mempool/octeontx/octeontx_fpavf.h | 9 +
drivers/meson.build | 3 +
drivers/net/af_packet/rte_eth_af_packet.c | 1 +
drivers/net/avf/avf_ethdev.c | 17 +-
drivers/net/bnx2x/bnx2x.c | 22 +-
drivers/net/bnx2x/bnx2x.h | 1 +
drivers/net/bnx2x/bnx2x_ethdev.c | 105 ++++++---
drivers/net/bnx2x/bnx2x_ethdev.h | 3 +-
drivers/net/bnxt/bnxt.h | 4 +
drivers/net/bnxt/bnxt_ethdev.c | 56 +++--
drivers/net/bnxt/bnxt_filter.c | 27 ++-
drivers/net/bnxt/bnxt_hwrm.c | 57 +++--
drivers/net/bnxt/bnxt_stats.c | 3 +
drivers/net/bnxt/bnxt_txr.c | 59 ++++-
drivers/net/bnxt/bnxt_txr.h | 10 +
drivers/net/bnxt/bnxt_vnic.c | 5 +-
drivers/net/bnxt/bnxt_vnic.h | 6 +-
drivers/net/bonding/rte_eth_bond_api.c | 14 +-
drivers/net/bonding/rte_eth_bond_pmd.c | 27 +--
drivers/net/cxgbe/base/t4_hw.c | 97 ++++++--
drivers/net/cxgbe/base/t4_regs.h | 3 +
drivers/net/cxgbe/base/t4fw_interface.h | 8 +
drivers/net/cxgbe/base/t4vf_hw.c | 6 +
drivers/net/cxgbe/cxgbe_compat.h | 9 -
drivers/net/cxgbe/cxgbe_ethdev.c | 3 +-
drivers/net/cxgbe/cxgbevf_ethdev.c | 1 +
drivers/net/cxgbe/sge.c | 10 +-
drivers/net/dpaa/dpaa_ethdev.c | 36 ++-
drivers/net/dpaa2/dpaa2_rxtx.c | 16 +-
drivers/net/dpaa2/mc/dpni.c | 2 +-
drivers/net/ena/base/ena_plat_dpdk.h | 35 +--
drivers/net/ena/ena_ethdev.c | 4 +-
drivers/net/enic/base/vnic_dev.c | 16 ++
drivers/net/enic/base/vnic_dev.h | 4 +
drivers/net/enic/base/vnic_devcmd.h | 23 +-
drivers/net/enic/base/vnic_enet.h | 5 +-
drivers/net/enic/base/vnic_nic.h | 4 +-
drivers/net/enic/enic.h | 2 +
drivers/net/enic/enic_ethdev.c | 5 +-
drivers/net/enic/enic_main.c | 42 ++--
drivers/net/enic/enic_res.c | 11 +-
drivers/net/enic/enic_rxtx.c | 42 +++-
drivers/net/failsafe/failsafe.c | 1 +
drivers/net/fm10k/fm10k.h | 3 -
drivers/net/i40e/i40e_ethdev.c | 197 ++++++++++++----
drivers/net/i40e/i40e_ethdev_vf.c | 1 -
drivers/net/i40e/i40e_rxtx.c | 2 +-
drivers/net/i40e/i40e_rxtx_vec_avx2.c | 2 +-
drivers/net/i40e/rte_pmd_i40e.c | 1 +
drivers/net/ixgbe/ixgbe_ethdev.h | 5 +
drivers/net/ixgbe/ixgbe_fdir.c | 30 ++-
drivers/net/ixgbe/ixgbe_flow.c | 12 +-
drivers/net/ixgbe/ixgbe_pf.c | 14 +-
drivers/net/kni/rte_eth_kni.c | 1 +
drivers/net/mlx4/Makefile | 2 +-
drivers/net/mlx4/mlx4.c | 40 ++--
drivers/net/mlx4/mlx4.h | 1 +
drivers/net/mlx4/mlx4_rxq.c | 9 +-
drivers/net/mlx5/Makefile | 4 +-
drivers/net/mlx5/mlx5.c | 26 ++-
drivers/net/mlx5/mlx5.h | 2 +-
drivers/net/mlx5/mlx5_defs.h | 5 +-
drivers/net/mlx5/mlx5_ethdev.c | 20 +-
drivers/net/mlx5/mlx5_flow.c | 6 +-
drivers/net/mlx5/mlx5_glue.c | 4 +
drivers/net/mlx5/mlx5_mr.c | 119 +++++-----
drivers/net/mlx5/mlx5_mr.h | 5 +-
drivers/net/mlx5/mlx5_nl.c | 6 +-
drivers/net/mlx5/mlx5_rxmode.c | 26 ++-
drivers/net/mlx5/mlx5_rxq.c | 56 +----
drivers/net/mlx5/mlx5_rxtx.c | 28 +--
drivers/net/mlx5/mlx5_rxtx_vec.h | 4 +-
drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 16 +-
drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 16 +-
drivers/net/mlx5/mlx5_socket.c | 6 +
drivers/net/mlx5/mlx5_trigger.c | 45 ++--
drivers/net/mlx5/mlx5_txq.c | 31 +--
drivers/net/mvpp2/mrvl_ethdev.c | 5 +-
drivers/net/nfp/nfp_net.c | 12 +-
drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h | 1 -
drivers/net/null/rte_eth_null.c | 1 +
drivers/net/octeontx/octeontx_ethdev.c | 15 +-
drivers/net/octeontx/octeontx_rxtx.c | 2 +-
drivers/net/pcap/rte_eth_pcap.c | 87 +++-----
drivers/net/qede/base/bcm_osal.c | 5 +
drivers/net/qede/base/ecore_dev.c | 10 +-
drivers/net/qede/base/ecore_int.c | 14 +-
drivers/net/qede/base/ecore_sriov.c | 44 ++++
drivers/net/qede/base/ecore_vf.c | 33 +++
drivers/net/qede/base/ecore_vf.h | 9 +
drivers/net/qede/base/ecore_vfpf_if.h | 16 ++
drivers/net/qede/qede_ethdev.c | 261 ++++++++++++++--------
drivers/net/qede/qede_ethdev.h | 1 +
drivers/net/qede/qede_fdir.c | 3 +
drivers/net/qede/qede_main.c | 7 +-
drivers/net/qede/qede_rxtx.c | 23 +-
drivers/net/qede/qede_rxtx.h | 1 -
drivers/net/sfc/sfc_ef10_essb_rx.c | 28 ++-
drivers/net/sfc/sfc_ef10_rx_ev.h | 8 +-
drivers/net/sfc/sfc_ethdev.c | 6 +-
drivers/net/sfc/sfc_filter.c | 14 ++
drivers/net/sfc/sfc_filter.h | 10 +
drivers/net/sfc/sfc_flow.c | 20 +-
drivers/net/sfc/sfc_rx.c | 26 ++-
drivers/net/tap/rte_eth_tap.c | 2 +
drivers/net/tap/tap_flow.c | 18 +-
drivers/net/thunderx/nicvf_ethdev.c | 5 +-
drivers/net/thunderx/nicvf_rxtx.c | 24 +-
drivers/net/vhost/rte_eth_vhost.c | 1 +
drivers/raw/dpaa2_qdma/dpaa2_qdma.c | 1 +
examples/exception_path/main.c | 3 +
examples/flow_filtering/main.c | 16 ++
examples/ipsec-secgw/ipsec-secgw.c | 7 +-
examples/l2fwd-crypto/main.c | 37 +--
examples/l3fwd/l3fwd_em.c | 1 -
examples/l3fwd/l3fwd_lpm.c | 1 -
examples/meson.build | 4 +
kernel/linux/kni/ethtool/igb/igb_ethtool.c | 7 +-
kernel/linux/kni/ethtool/igb/kcompat.h | 5 +
lib/librte_bitratestats/rte_bitrate.c | 6 +
lib/librte_cryptodev/rte_crypto.h | 51 +++--
lib/librte_eal/bsdapp/eal/eal_memory.c | 2 +-
lib/librte_eal/common/eal_common_dev.c | 26 +--
lib/librte_eal/common/eal_common_memory.c | 33 ++-
lib/librte_eal/common/eal_common_proc.c | 6 +-
lib/librte_eal/common/eal_common_thread.c | 6 +-
lib/librte_eal/common/include/rte_bitmap.h | 8 +-
lib/librte_eal/common/include/rte_version.h | 2 +-
lib/librte_eal/common/malloc_elem.c | 14 +-
lib/librte_eal/linuxapp/eal/eal_interrupts.c | 2 +-
lib/librte_eal/linuxapp/eal/eal_memalloc.c | 23 +-
lib/librte_eal/linuxapp/eal/eal_memory.c | 2 +-
lib/librte_eal/linuxapp/eal/eal_thread.c | 4 +-
lib/librte_eal/linuxapp/eal/eal_vfio.c | 49 +---
lib/librte_eal/linuxapp/eal/eal_vfio.h | 1 -
lib/librte_eal/linuxapp/eal/eal_vfio_mp_sync.c | 8 -
lib/librte_eal/meson.build | 2 +-
lib/librte_ethdev/rte_ethdev.c | 10 +
lib/librte_ethdev/rte_ethdev.h | 4 +-
lib/librte_ethdev/rte_ethdev_driver.h | 1 -
lib/librte_ethdev/rte_flow.c | 2 +-
lib/librte_eventdev/rte_event_eth_rx_adapter.c | 38 +++-
lib/librte_eventdev/rte_event_ring.c | 15 +-
lib/librte_hash/rte_cuckoo_hash.c | 21 +-
lib/librte_hash/rte_cuckoo_hash_x86.h | 3 +
lib/librte_hash/rte_hash.h | 20 +-
lib/librte_kni/rte_kni.c | 3 +
lib/librte_latencystats/rte_latencystats.c | 8 +-
lib/librte_metrics/rte_metrics.c | 15 +-
lib/librte_net/rte_ip.h | 28 +--
lib/librte_ring/rte_ring.h | 2 +-
lib/librte_ring/rte_ring_c11_mem.h | 8 +-
lib/librte_security/rte_security.c | 3 +-
lib/librte_vhost/iotlb.c | 10 +-
lib/librte_vhost/iotlb.h | 2 +-
lib/librte_vhost/vhost.h | 1 +
lib/librte_vhost/vhost_user.c | 5 +
lib/librte_vhost/virtio_net.c | 3 +-
lib/meson.build | 4 +
mk/rte.sdkinstall.mk | 36 +--
mk/rte.sdkroot.mk | 4 +-
mk/rte.sdktest.mk | 32 ++-
mk/toolchain/gcc/rte.toolchain-compat.mk | 5 +
mk/toolchain/gcc/rte.vars.mk | 9 +
pkg/dpdk.spec | 2 +-
test/test/autotest_runner.py | 145 ++++++------
test/test/meson.build | 7 +-
test/test/test_cryptodev.c | 2 +-
test/test/test_eal_flags.c | 33 +--
test/test/test_flow_classify.c | 20 +-
test/test/test_hash_multiwriter.c | 50 ++++-
test/test/test_pmd_ring.c | 2 +
205 files changed, 2565 insertions(+), 1246 deletions(-)
Adrien Mazarguil (8):
app/testpmd: fix crash when attaching a device
net/mlx4: fix minor resource leak during init
net/mlx5: fix errno object in probe function
net/mlx5: fix missing errno in probe function
net/mlx5: fix error message in probe function
net/mlx5: fix invalid error check
maintainers: update for Mellanox PMDs
net/mlx5: fix invalid network interface index
Ajit Khaparde (11):
net/bnxt: fix clear port stats
net/bnxt: fix close operation
net/bnxt: fix HW Tx checksum offload check
net/bnxt: check filter type before clearing it
net/bnxt: fix set MTU
net/bnxt: fix incorrect IO address handling in Tx
net/bnxt: fix Rx ring count limitation
net/bnxt: fix memory leaks in NVM commands
net/bnxt: fix lock release on NVM write failure
net/bnxt: check access denied for HWRM commands
net/bnxt: fix RETA size
Alejandro Lucero (2):
net/nfp: fix unused header reference
net/nfp: fix field initialization in Tx descriptor
Alok Makhariya (1):
bus/dpaa: fix phandle support for Linux 4.16
Anatoly Burakov (14):
ipc: fix locking while sending messages
mem: fix alignment of requested virtual areas
eal/bsd: fix memory segment index display
malloc: fix pad erasing
eal/linux: fix invalid syntax in interrupts
eal/linux: fix uninitialized value
vfio: fix uninitialized variable
malloc: do not skip pad on free
test: fix EAL flags autotest on FreeBSD
test: fix result printing
test: fix code on report
test: make autotest runner python 2/3 compliant
test: print autotest categories
test: improve filtering
Andrew Rybchenko (7):
net/sfc: cut non VLAN ID bits from TCI
net/sfc: discard packets with bad CRC on EF10 ESSB Rx
net/sfc: fix double-free in EF10 ESSB Rx queue purge
net/sfc: move Rx checksum offload check to device level
net/sfc: fix Rx queue offloads reporting in queue info
net/sfc: fix assert in set multicast address list
net/sfc: handle unknown L3 packet class in EF10 event parser
Andy Green (2):
ring: fix declaration after statement
ring: fix sign conversion warning
Beilei Xing (5):
net/i40e: fix shifts of 32-bit value
net/i40e: fix PPPoL2TP packet type parsing
net/i40e: fix packet type parsing with DDP
net/i40e: fix setting TPID with AQ command
net/i40e: fix device parameter parsing
Bruce Richardson (3):
eal: fix error message for unsupported platforms
examples/exception_path: fix out-of-bounds read
mk: fix permissions when using make install
Chas Williams (2):
net/bonding: always update bonding link status
net/bonding: do not clear active slave count
Christian Ehrhardt (3):
FIXUP: net/mlx5: fix invalid network interface index
version: 18.05.1-rc1
version: 18.05.1
Damjan Marion (1):
net/i40e: do not reset device info data
Dan Gora (1):
kni: fix crash with null name
Daria Kolistratova (1):
net/ena: fix SIGFPE with 0 Rx queue
Dariusz Stojaczyk (7):
mem: do not leave unmapped holes in EAL memory area
mem: do not unmap overlapping region on mmap failure
mem: avoid crash on memseg query with invalid address
mem: fix alignment requested with --base-virtaddr
mem: do not use --base-virtaddr in secondary processes
eal: fix return codes on thread naming failure
eal: fix return codes on control thread failure
David Marchand (1):
net/bnxt: add missing ids in xstats
Drocula Lambda (1):
kni: fix build on RHEL 7.5
Fan Zhang (1):
crypto/virtio: fix IV physical address
Ferruh Yigit (4):
kni: fix build with gcc 8.1
net/thunderx: fix build with gcc optimization on
app/testpmd: fix typo in setting Tx offload command
drivers/net: fix crash in secondary process
Gage Eads (1):
net: rename u16 to fix shadowed declaration
Gavin Hu (5):
mk: fix cross build
devtools: fix ninja command in build test
build: fix for host clang and cross gcc
net/dpaa2: remove loop for unused pool entries
maintainers: claim maintainership for ARM v7 and v8
Haiyue Wang (1):
net/i40e: workaround performance degradation
Harry van Haaren (2):
net/i40e: fix rearm check in AVX2 Rx
event: fix ring init failure handling
Hemant Agrawal (8):
doc: fix limitations for dpaa crypto
doc: fix limitations for dpaa2 crypto
test/crypto: fix device id when stopping port
bus/dpaa: fix SVR id fetch location
bus/dpaa: fix buffer offset setting in FMAN
net/dpaa: fix queue error handling and logs
net/dpaa2: fix prefetch Rx to honor number of packets
raw/dpaa2_qdma: fix IOVA as VA flag
Hyong Youb Kim (4):
net/enic: fix receive packet types
net/enic: update the UDP RSS detection mechanism
net/enic: do not overwrite admin Tx queue limit
net/enic: initialize RQ fetch index before enabling RQ
Ido Goshen (1):
net/pcap: fix multiple queues
Igor Romanov (1):
net/sfc: fix filter exceptions logic
Jananee Parthasarathy (1):
mk: update targets for classified tests
Jay Ding (1):
net/bnxt: check for invalid vNIC id
Jerin Jacob (3):
doc: fix octeontx eventdev selftest argument
ethdev: fix queue statistics mapping documentation
eal: fix bitmap documentation
Kiran Kumar (3):
net/bonding: fix MAC address reset
ethdev: check queue stats mapping input arguments
net/thunderx: avoid sq door bell write on zero packet
Konstantin Ananyev (3):
examples/ipsec-secgw: fix IPv4 checksum at Tx
examples/ipsec-secgw: fix bypass rule processing
app/testpmd: fix DCB config
Krzysztof Kanas (2):
app/testpmd: fix crash on TM command error
app/testpmd: fix help for TM commit command
Lee Daly (1):
compress/isal: fix offset usage
Matan Azrad (1):
net/tap: fix zeroed flow mask configurations
Maxime Coquelin (2):
vhost: fix missing increment of log cache count
vhost: flush IOTLB cache on new mem table handling
Moti Haimovsky (2):
net/mlx4: check RSS queues number limitation
net/mlx4: advertise Rx jumbo frame support
Nelio Laranjeiro (3):
net/mlx5: clean-up developer logs
app/testpmd: fix missing count action fields
net/mlx5: fix TCI mask filter
Nikhil Rao (5):
eventdev: fix port in Rx adapter internal function
eventdev: fix missing update to Rx adaper WRR position
eventdev: add event buffer flush in Rx adapter
eventdev: fix internal port logic in Rx adapter
eventdev: fix Rx SW adapter stop
Nithin Dabilpuram (1):
app/testpmd: fix buffer leak in TM command
Ophir Munk (1):
net/mlx5: fix secondary process resource leakage
Pablo de Lara (13):
cryptodev: fix ABI breakage
net/ixgbe: fix crash on detach
compress/isal: fix log type name
compress/isal: set null pointer after freeing
compress/isal: fix memory leak
examples/l2fwd-crypto: fix digest with AEAD algo
examples/l2fwd-crypto: check return value on IV size check
examples/l2fwd-crypto: skip device not supporting operation
devtools: remove already enabled nfp from build test
test/hash: fix multiwriter with non consecutive cores
test/hash: fix potential memory leak
app/crypto-perf: fix auth IV offset
hash: fix doxygen of return values
Pavan Nikhilesh (5):
event/octeontx: fix flush callback
mempool/octeontx: fix pool to aura mapping
app/eventdev: fix order test service init
event/octeontx: remove unnecessary port start and stop
net/octeontx: fix stop clearing Rx/Tx functions
Qi Zhang (4):
eal: fix hotplug add and remove
vfio: fix PCI address comparison
vfio: remove uneccessary IPC for group fd clear
net/ixgbe: fix missing null check on detach
Radu Nicolau (4):
security: fix crash on destroy null session
net/bonding: fix invalid port id
test: fix uninitialized port configuration
net/bonding: fix race condition
Rafal Kozik (4):
net/ena: check pointer before memset
net/ena: change memory type
net/ena: fix GENMASK_ULL macro
net/ena: set link speed as none
Rahul Lakkireddy (4):
net/cxgbe: report configured link auto-negotiation
net/cxgbe: fix Rx channel map and queue type
net/cxgbevf: add missing Tx byte counters
net/cxgbe: fix init failure due to new flash parts
Rami Rosen (2):
examples/l3fwd: remove useless include
ethdev: fix a doxygen comment for port allocation
Rasesh Mody (11):
net/qede: fix VF MTU update
net/qede: fix for devargs
net/qede: fix L2-handles used for RSS hash update
net/qede: fix memory alloc for multiple port reconfig
net/qede: remove primary MAC removal
doc: update qede management firmware guide
net/qede: fix default extended VLAN offload config
net/qede/base: fix to clear HW indication
net/qede/base: fix GRC attention callback
net/bnx2x: fix FW command timeout during stop
net/bnx2x: fix poll link status
Remy Horton (4):
bitrate: add sanity check on parameters
metrics: add check for invalid key
metrics: do not fail silently when uninitialised
metrics: disallow null as metric name
Reshma Pattan (3):
test/flow_classify: fix return types
mk: remove unnecessary test rules
latency: free up the memzone
Rosen Xu (1):
examples/flow_filtering: add flow director config for i40e
Shahaf Shuler (2):
net/mlx5: separate generic tunnel TSO from the standard one
net/mlx5: fix build with rdma-core v19
Shahed Shaikh (8):
net/qede: fix incorrect link status update
net/qede: fix link change event notification
net/qede: fix unicast MAC address handling in VF
net/qede: fix legacy interrupt mode
net/qede: fix Rx/Tx offload flags
net/qede: fix interrupt handler unregister
net/qede: fix MAC address removal failure message
net/qede: fix ntuple filter configuration
Shaopeng He (1):
net/i40e: fix Tx queue setup after stop
Shreyansh Jain (1):
doc: fix bonding command in testpmd
Somnath Kotur (4):
net/bnxt: revert reset of L2 filter id
net/bnxt: fix to move a flow to a different queue
net/bnxt: use correct flags during VLAN configuration
net/bnxt: fix filter freeing
Stephen Hemminger (2):
net/mlx5: fix log initialization
doc: fix typo in vdev_netvsc guide
Thomas Monjalon (2):
bus/dpaa: fix build
net/fm10k: remove unused constant
Timothy Redaelli (2):
net/mlx4: avoid stripping the glue library
net/mlx5: avoid stripping the glue library
Tiwei Bie (1):
vhost: release locks on RARP packet failure
Tomasz Duszynski (1):
net/mvpp2: check pointer before using it
Wei Zhao (7):
net/ixgbe: add support for VLAN in IP mode FDIR
net/ixgbe: fix tunnel id format error for FDIR
net/ixgbe: fix tunnel type set error for FDIR
net/ixgbe: fix mask bits register set error for FDIR
app/testpmd: fix VLAN TCI mask set error for FDIR
net/i40e: fix check of flow director programming status
net/i40e: revert fix of flow director check
Xiaoxin Peng (1):
net/bnxt: fix Tx with multiple mbuf
Xiaoyun Li (3):
net/i40e: fix link speed
app/testpmd: fix little performance drop
net/avf: fix offload capabilities
Xueming Li (1):
net/mlx5: fix crash in device probe
Yaroslav Brustinov (1):
net/mlx5: fix linkage of glue lib with gcc 4.7.2
Yipeng Wang (3):
hash: fix multiwriter lock memory allocation
hash: fix a multi-writer race condition
hash: fix key slot size accuracy
Yongseok Koh (6):
net/mlx5: fix error number handling
net/mlx5: fix Rx buffer replenishment threshold
net/mlx5: fix assert for Tx completion queue count
net/mlx5: fix queue rollback when starting device
net/mlx5: preserve promiscuous flag for flow isolation mode
net/mlx5: preserve allmulticast flag for flow isolation mode
* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
2018-09-05 22:13 4% ` Honnappa Nagarahalli
@ 2018-09-06 14:28 3% ` Michel Machado
2018-09-12 20:37 2% ` Honnappa Nagarahalli
0 siblings, 1 reply; 200+ results
From: Michel Machado @ 2018-09-06 14:28 UTC (permalink / raw)
To: Honnappa Nagarahalli, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
nd, yipeng1.wang
On 09/05/2018 06:13 PM, Honnappa Nagarahalli wrote:
>> + uint32_t next;
>> + uint32_t total_entries;
>> +};
>> This structure can be moved to rte_cuckoo_hash.h file.
>
> What's the purpose of moving this struct to a header file since it's only used in the C file rte_cuckoo_hash.c?
>
> This is to maintain consistency. For ex: 'struct queue_node', which is an internal structure, is kept in rte_cuckoo_hash.h
Okay. We'll move it there.
>> +int32_t
>> +rte_hash_iterator_init(const struct rte_hash *h,
>> + struct rte_hash_iterator_state *state) {
>> + struct rte_hash_iterator_istate *__state;
>> '__state' can be replaced by 's'.
>>
>> +
>> + RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
>> +
>> + __state = (struct rte_hash_iterator_istate *)state;
>> + __state->h = h;
>> + __state->next = 0;
>> + __state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
>> +
>> + return 0;
>> +}
>> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.
>
> Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initializing the fields of a state only once?
>
> My concern is about creating another API for every iterator API. You have a valid point on saving cycles as this API applies for data plane. Have you done any performance benchmarking with and without this API? May be we can guide our decision based on that.
It's not just about creating one init function for each iterator
because an iterator may have a couple of init functions. For example,
someone may eventually find it useful to add another init function for the
conflicting-entry iterator that we are advocating in this patch. A
possibility would be for this new init function to use the key of the
new entry instead of its signature to initialize the state, similar to
what is already done in rte_hash_lookup*() functions. In spite of
possibly having multiple init functions, there will be a single iterator
function.
About the performance benchmarking, the current API only requires
applications to initialize a single 32-bit integer. But with the
adoption of a struct for the state, the initialization will grow to 64
bytes.
>> int32_t
>> -rte_hash_iterate(const struct rte_hash *h, const void **key, void
>> **data, uint32_t *next)
>> +rte_hash_iterate(
>> + struct rte_hash_iterator_state *state, const void **key, void
>> +**data)
>>
>> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
>> rte_hash_iterate(const struct rte_hash *h, const void **key, void
>> **data, struct rte_hash_iterator_state *state)
>>
>> This will help keep the API signature consistent with existing APIs.
>>
>> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.
>
> The ABI will change one way or another, so why not go for a single state instead of requiring parameters that are already needed for the initialization of the state?
>
> Are there any cost savings we can achieve by keeping the 'h' in the iterator state?
There's a tiny cost saving: avoiding pushing that parameter onto the
execution stack every time the iterator gets another entry. However,
the reason I find more important is to make it impossible to introduce a
bug in the code. Consider a function that is dealing with two hash
tables and two iterators. Without asking for the hash table to make
progress in an iterator, it's impossible to mix up hash tables and
iterator states.
There's even the possibility that an iterator doesn't need the hash
table after its initialization. This would be an *unlikely* case, but
consider an iterator that only returns a couple of entries. It could
cache those entries during initialization.
>> /* Calculate bucket and index of current iterator */
>> - bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> - idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> + bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> + idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>>
>> /* If current position is empty, go to the next one */
>> - while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> - (*next)++;
>> + while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
>> + __state->next++;
>> /* End of table */
>> - if (*next == total_entries)
>> + if (__state->next == __state->total_entries)
>> return -ENOENT;
>> - bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
>> - idx = *next % RTE_HASH_BUCKET_ENTRIES;
>> + bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
>> + idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>> }
>> - __hash_rw_reader_lock(h);
>> + __hash_rw_reader_lock(__state->h);
>> /* Get position of entry in key table */
>> - position = h->buckets[bucket_idx].key_idx[idx];
>> - next_key = (struct rte_hash_key *) ((char *)h->key_store +
>> - position * h->key_entry_size);
>> + position = __state->h->buckets[bucket_idx].key_idx[idx];
>> + next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
>> + position * __state->h->key_entry_size);
>> /* Return key and data */
>> *key = next_key->key;
>> *data = next_key->pdata;
>>
>> - __hash_rw_reader_unlock(h);
>> + __hash_rw_reader_unlock(__state->h);
>>
>> /* Increment iterator */
>> - (*next)++;
>> + __state->next++;
>> This comment is not related to this change, it is better to place this inside the lock.
>
> Even though __state->next does not depend on the lock?
>
> It depends on if this API needs to be multi-thread safe. Interestingly, the documentation does not say it is multi-thread safe. If it has to be multi-thread safe, then the state also needs to be protected. For ex: what happens if the user uses a global variable for the state?
If an application needs to share an iterator state between threads,
it has to have a synchronization mechanism for that as it would for any
other shared variable. The lock above allows applications to share
a hash table between threads; it protects nothing else.
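A minimal sketch of what "a synchronization mechanism" could look like on the application side, assuming POSIX mutexes are acceptable for the use case. struct shared_cursor and shared_cursor_claim() are hypothetical names and stand in for an iterator state shared between threads; nothing here is part of librte_hash.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared iteration cursor, owned and protected by the app. */
struct shared_cursor {
	pthread_mutex_t lock;
	uint32_t next;   /* next index to hand out */
	uint32_t total;  /* one past the last valid index */
};

#define SHARED_CURSOR_INIT(n) { PTHREAD_MUTEX_INITIALIZER, 0, (n) }

/*
 * Claim the next index for the calling thread.
 * Returns 0 and stores the index in *idx, or -1 once the range is used up.
 * The read-modify-write of c->next happens entirely under the lock.
 */
int shared_cursor_claim(struct shared_cursor *c, uint32_t *idx)
{
	int ret = -1;

	assert(c != NULL && idx != NULL);
	pthread_mutex_lock(&c->lock);
	if (c->next < c->total) {
		*idx = c->next++;
		ret = 0;
	}
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

The table's own reader lock is orthogonal to this: it guards the table contents, while the mutex above guards only the cursor.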
>> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
>> index 9e7d9315f..fdb01023e 100644
>> --- a/lib/librte_hash/rte_hash.h
>> +++ b/lib/librte_hash/rte_hash.h
>> @@ -14,6 +14,8 @@
>> #include <stdint.h>
>> #include <stddef.h>
>>
>> +#include <rte_compat.h>
>> +
>> #ifdef __cplusplus
>> extern "C" {
>> #endif
>> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
>> /** @internal A hash table structure. */ struct rte_hash;
>>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * @internal A hash table iterator state structure.
>> + */
>> +struct rte_hash_iterator_state {
>> + uint8_t space[64];
>> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
>
> Okay.
I think we should not replace 64 with RTE_CACHE_LINE_SIZE because
the ABI would change based on the architecture for which it's compiled.
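The size dependence can be sketched with two stand-in constants. This assumes one build whose cache-line constant is 64 and another where it is 128; DEMO_CACHE_LINE_SMALL and DEMO_CACHE_LINE_LARGE are hypothetical macros used only for illustration and are not RTE_CACHE_LINE_SIZE itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a per-architecture cache-line constant. */
#define DEMO_CACHE_LINE_SMALL 64
#define DEMO_CACHE_LINE_LARGE 128

/* Literal size: the same layout on every target. */
struct demo_state_fixed {
	uint8_t space[64];
};

/* Size tied to the cache line: the layout follows the build target. */
struct demo_state_small_line {
	uint8_t space[DEMO_CACHE_LINE_SMALL];
};

struct demo_state_large_line {
	uint8_t space[DEMO_CACHE_LINE_LARGE];
};

static_assert(sizeof(struct demo_state_fixed) ==
	sizeof(struct demo_state_small_line),
	"same size when the cache line happens to be 64 bytes");
static_assert(sizeof(struct demo_state_fixed) !=
	sizeof(struct demo_state_large_line),
	"different size when the cache line is 128 bytes");

/* Extra bytes a caller would have to reserve on the larger target. */
size_t demo_state_size_delta(void)
{
	return sizeof(struct demo_state_large_line) -
		sizeof(struct demo_state_fixed);
}
```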
[ ]'s
Michel Machado
* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
@ 2018-09-06 13:34 4% ` Michel Machado
0 siblings, 0 replies; 200+ results
From: Michel Machado @ 2018-09-06 13:34 UTC (permalink / raw)
To: Wang, Yipeng1, Qiaobin Fu, Richardson, Bruce, De Lara Guarch, Pablo
Cc: dev, doucette, Wiles, Keith, Gobriel, Sameh, Tai, Charlie,
stephen, nd, honnappa.nagarahalli
On 09/05/2018 04:27 PM, Wang, Yipeng1 wrote:
> Hmm I see, it falls back to my original thought to have malloc inside the init function..
> Thanks for the explanation. :)
>
> So I guess with your implementation, in future if we change the internal state to be larger,
> the ABI will be broken.
If that happens, yes, the ABI would need to change again. But this
concern is overblown for two reasons. First, this event is unlikely to
happen because struct rte_hash_iterator_state is already allocating 64
bytes while struct rte_hash_iterator_istate and struct
rte_hash_iterator_conflict_entries_istate consume 16 and 20 bytes,
respectively. Thus, the complexity of the underlying hash algorithm
would need to grow substantially to force the necessary state of these
iterators to grow more than 4x and 3x, respectively. This is unlikely to
happen, and, if it does, it would likely break the ABI somewhere else
and have a high impact on applications anyway.
Second, even if the unlikely event happens, all one would need to do
is to increase the size of struct rte_hash_iterator_state, mark the new
API as a new version, and applications would be ready for the new ABI
just recompiling.
> BTW, this patch set also changes API so proper notice is needed.
> People more familiar with API/ABI change policies may be able to help here.
We'd be happy to get feedback on this aspect.
> Just to confirm, is there any way, as I said, for your application to have some long-lived states
> and reuse them throughout the application so that you don’t have to have short-lived ones on the stack?
Two things would need to happen for this to be possible. The init
functions would need to accept previously allocated iterator states,
that is, the init function would act as a reset of the state when acting
on a previously allocated state. And applications would now need to carry
these pre-allocated states to avoid a malloc. In other words, we'll
increase the complexity of the API.
To emphasize that the cost of a malloc is not negligible,
rte_malloc() needs to get a spinlock (see heap_alloc_on_socket()), do
its thing to allocate memory, and, if the first attempt fails, try to
allocate the memory on other sockets (see end of malloc_heap_alloc()).
For an iterator that goes through the whole hash table, this cost may be
okay, but for an iterator that goes through a couple entries, this cost
is a lot to add.
This memory allocation concern is not new. Function
rte_pktmbuf_read(), for example, lets applications pass buffers, which
are often allocated in the execution stack, to avoid the malloc cost.
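The caller-provided-buffer pattern can be sketched on a toy segment chain. This is an illustration of the idea only, under the assumption that data may be split across segments: struct toy_seg and toy_read() are hypothetical and do not reproduce the mbuf structures or the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical chain of data segments (not struct rte_mbuf). */
struct toy_seg {
	const uint8_t *data;
	uint32_t len;
	const struct toy_seg *next;
};

/*
 * Read len bytes starting at offset off.
 * If the range is contiguous in one segment, return a pointer into it and
 * leave buf untouched. Otherwise copy the bytes into the caller's buf
 * (typically on the caller's stack) and return buf.
 * Returns NULL if the chain is too short. No memory is allocated.
 */
const void *toy_read(const struct toy_seg *seg, uint32_t off, uint32_t len,
	void *buf)
{
	uint8_t *out = buf;
	uint32_t copied = 0;

	/* Skip whole segments that end before the requested offset. */
	while (seg != NULL && off >= seg->len) {
		off -= seg->len;
		seg = seg->next;
	}
	if (seg == NULL)
		return NULL;

	/* Fast path: the whole range sits inside this segment. */
	if (len <= seg->len - off)
		return seg->data + off;

	/* Slow path: gather the range from several segments. */
	if (buf == NULL)
		return NULL;
	while (seg != NULL && copied < len) {
		uint32_t chunk = seg->len - off;

		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(out + copied, seg->data + off, chunk);
		copied += chunk;
		off = 0;
		seg = seg->next;
	}
	return copied == len ? buf : NULL;
}
```

The caller decides where the scratch buffer lives, so a short read costs at most one memcpy() and never a heap allocation.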
[ ]'s
Michel Machado
* Re: [dpdk-dev] [RFC] ethdev: add min/max MTU to device info
2018-09-06 6:29 3% ` Andrew Rybchenko
@ 2018-09-06 10:52 3% ` Stephen Hemminger
0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2018-09-06 10:52 UTC (permalink / raw)
To: Andrew Rybchenko; +Cc: dev
On Thu, 6 Sep 2018 09:29:32 +0300
Andrew Rybchenko <arybchenko@solarflare.com> wrote:
> On 09/05/2018 07:41 PM, Stephen Hemminger wrote:
> > This addresses the usability issue raised by OVS at DPDK Userspace
> > summit. It adds general min/max mtu into device info. For compatibility,
> > and to save space, it fits in a hole in existing structure.
>
> It is true for amd64, but it looks like it is false on 32-bit. So, ABI
> breakage.
Yes, it is an ABI change on 32-bit, but 18.11 is a major release where
this is allowed/expected.
* Re: [dpdk-dev] [RFC] ethdev: add min/max MTU to device info
@ 2018-09-06 6:29 3% ` Andrew Rybchenko
2018-09-06 10:52 3% ` Stephen Hemminger
0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2018-09-06 6:29 UTC (permalink / raw)
To: Stephen Hemminger, dev
On 09/05/2018 07:41 PM, Stephen Hemminger wrote:
> This addresses the usability issue raised by OVS at DPDK Userspace
> summit. It adds general min/max mtu into device info. For compatibility,
> and to save space, it fits in a hole in existing structure.
It is true for amd64, but it looks like it is false on 32-bit. So, ABI
breakage.
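The difference between the two targets can be checked with offsetof(). The sketch below uses a truncated stand-in for the structure: only the three neighbouring fields from the quoted hunk are kept, and struct demo_info_before, struct demo_info_after and demo_layout_is_compatible() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Truncated stand-in for the layout before the patch. */
struct demo_info_before {
	const char *driver_name;
	unsigned int if_index;
	const uint32_t *dev_flags;
};

/* Same fields with the two 16-bit MTU members added after if_index. */
struct demo_info_after {
	const char *driver_name;
	unsigned int if_index;
	uint16_t min_mtu;
	uint16_t max_mtu;
	const uint32_t *dev_flags;
};

/*
 * Returns 1 when the new members land in existing padding, i.e. the
 * following field keeps its offset and the structure keeps its size.
 * With 8-byte pointers and a 4-byte unsigned int there are 4 bytes of
 * padding before dev_flags; with 4-byte pointers there is none, so
 * dev_flags moves by 4 bytes.
 */
int demo_layout_is_compatible(void)
{
	return offsetof(struct demo_info_before, dev_flags) ==
			offsetof(struct demo_info_after, dev_flags) &&
		sizeof(struct demo_info_before) ==
			sizeof(struct demo_info_after);
}
```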
> The initial version sets max mtu to normal Ethernet, it is up to
> PMD to set larger value if it supports Jumbo frames.
>
> Fixing the drivers to use this is trivial and can be done by 18.11.
> Already have some of the patches done.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> lib/librte_ethdev/rte_ethdev.c | 7 +++++++
> lib/librte_ethdev/rte_ethdev.h | 2 ++
> 2 files changed, 9 insertions(+)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 4c320250589a..df0c7536a7c4 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -2408,6 +2408,8 @@ rte_eth_dev_info_get(uint16_t port_id, struct rte_eth_dev_info *dev_info)
> dev_info->rx_desc_lim = lim;
> dev_info->tx_desc_lim = lim;
> dev_info->device = dev->device;
> + dev_info->min_mtu = ETHER_MIN_MTU;
> + dev_info->max_mtu = ETHER_MTU;
>
> RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
> (*dev->dev_ops->dev_infos_get)(dev, dev_info);
> @@ -2471,12 +2473,17 @@ int
> rte_eth_dev_set_mtu(uint16_t port_id, uint16_t mtu)
> {
> int ret;
> + struct rte_eth_dev_info dev_info;
> struct rte_eth_dev *dev;
>
> RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> dev = &rte_eth_devices[port_id];
> RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->mtu_set, -ENOTSUP);
>
> + rte_eth_dev_info_get(port_id, &dev_info);
> + if (mtu < dev_info.min_mtu || mtu > dev_info.max_mtu)
> + return -EINVAL;
> +
The check breaks setting the MTU to a value larger than ETHER_MTU for
drivers that have not been updated. So, IMHO, it should be pushed only
together with the appropriate updates in all drivers that support a bigger MTU.
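A minimal sketch of the regression, assuming a generic layer that fills in defaults before asking the driver. All names and values are hypothetical stand-ins (DEMO_MIN_MTU, DEMO_DEFAULT_MAX_MTU, the two driver callbacks), and the real dev_ops plumbing is reduced to a single function pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical defaults set by the generic layer. */
#define DEMO_MIN_MTU 68
#define DEMO_DEFAULT_MAX_MTU 1500

struct demo_dev_info {
	uint16_t min_mtu;
	uint16_t max_mtu;
};

typedef void (*demo_infos_get_t)(struct demo_dev_info *info);

/* A driver that predates the new fields: it leaves the defaults alone. */
void demo_legacy_infos_get(struct demo_dev_info *info)
{
	(void)info;
}

/* A driver updated to advertise jumbo frame support. */
void demo_updated_infos_get(struct demo_dev_info *info)
{
	info->max_mtu = 9000;
}

/* Range check placed in front of the driver's own MTU handler. */
int demo_set_mtu(demo_infos_get_t infos_get, uint16_t mtu)
{
	struct demo_dev_info info = { DEMO_MIN_MTU, DEMO_DEFAULT_MAX_MTU };

	assert(infos_get != NULL);
	infos_get(&info);
	if (mtu < info.min_mtu || mtu > info.max_mtu)
		return -EINVAL;
	/* The driver's mtu_set callback would run here. */
	return 0;
}
```

With the legacy callback, a request for 9000 is rejected by the generic check even if the hardware could handle it, which is the behaviour change described above.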
> ret = (*dev->dev_ops->mtu_set)(dev, mtu);
> if (!ret)
> dev->data->mtu = mtu;
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index 7070e9ab408f..5171a9083288 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -1015,6 +1015,8 @@ struct rte_eth_dev_info {
> const char *driver_name; /**< Device Driver name. */
> unsigned int if_index; /**< Index to bound host interface, or 0 if none.
> Use if_indextoname() to translate into an interface name. */
> + uint16_t min_mtu; /**< Minimum MTU allowed */
> + uint16_t max_mtu; /**< Maximum MTU allowed */
> const uint32_t *dev_flags; /**< Device flags */
> uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
> uint32_t max_rx_pktlen; /**< Maximum configurable length of RX pkt. */
* Re: [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries
@ 2018-09-05 22:13 4% ` Honnappa Nagarahalli
2018-09-06 14:28 3% ` Michel Machado
0 siblings, 1 reply; 200+ results
From: Honnappa Nagarahalli @ 2018-09-05 22:13 UTC (permalink / raw)
To: Michel Machado, Qiaobin Fu, bruce.richardson, pablo.de.lara.guarch
Cc: dev, doucette, keith.wiles, sameh.gobriel, charlie.tai, stephen,
nd, yipeng1.wang
-----Original Message-----
From: Michel Machado <michel@digirati.com.br>
Sent: Tuesday, September 4, 2018 2:37 PM
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Qiaobin Fu <qiaobinf@bu.edu>; bruce.richardson@intel.com; pablo.de.lara.guarch@intel.com
Cc: dev@dpdk.org; doucette@bu.edu; keith.wiles@intel.com; sameh.gobriel@intel.com; charlie.tai@intel.com; stephen@networkplumber.org; nd <nd@arm.com>; yipeng1.wang@intel.com
Subject: Re: [PATCH v3] hash table: add an iterator over conflicting entries
Hi Honnappa,
On 09/02/2018 06:05 PM, Honnappa Nagarahalli wrote:
> +/* istate stands for internal state. */ struct
> +rte_hash_iterator_istate {
> + const struct rte_hash *h;
> This can be outside of this structure. This will help keep the API definitions consistent with existing APIs. Please see further comments below.
Discussed later.
> + uint32_t next;
> + uint32_t total_entries;
> +};
> This structure can be moved to rte_cuckoo_hash.h file.
What's the purpose of moving this struct to a header file since it's only used in the C file rte_cuckoo_hash.c?
This is to maintain consistency. For ex: 'struct queue_node', which is an internal structure, is kept in rte_cuckoo_hash.h
> +int32_t
> +rte_hash_iterator_init(const struct rte_hash *h,
> + struct rte_hash_iterator_state *state) {
> + struct rte_hash_iterator_istate *__state;
> '__state' can be replaced by 's'.
>
> +
> + RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> + __state = (struct rte_hash_iterator_istate *)state;
> + __state->h = h;
> + __state->next = 0;
> + __state->total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> +
> + return 0;
> +}
> IMO, creating this API can be avoided if the initialization is handled in 'rte_hash_iterate' function. The cost of doing this is very trivial (one extra 'if' statement) in 'rte_hash_iterate' function. It will help keep the number of APIs to minimal.
Applications would have to initialize struct rte_hash_iterator_state *state before calling rte_hash_iterate() anyway. Why not initialize the fields of a state only once?
My concern is about creating another API for every iterator API. You have a valid point on saving cycles as this API applies to the data plane. Have you done any performance benchmarking with and without this API? Maybe we can guide our decision based on that.
> int32_t
> -rte_hash_iterate(const struct rte_hash *h, const void **key, void
> **data, uint32_t *next)
> +rte_hash_iterate(
> + struct rte_hash_iterator_state *state, const void **key, void
> +**data)
>
> IMO, as suggested above, do not store 'struct rte_hash *h' in 'struct rte_hash_iterator_state'. Instead, change the API definition as follows:
> rte_hash_iterate(const struct rte_hash *h, const void **key, void
> **data, struct rte_hash_iterator_state *state)
>
> This will help keep the API signature consistent with existing APIs.
>
> This is an ABI change. Please take a look at https://doc.dpdk.org/guides/contributing/versioning.html.
The ABI will change one way or another, so why not go for a single state instead of requiring parameters that are already needed for the initialization of the state?
Are there any cost savings we can achieve by keeping the 'h' in the iterator state?
Thank you for the link. We'll check how to proceed with the ABI change.
> {
> + struct rte_hash_iterator_istate *__state;
> '__state' can be replaced with 's'.
Gaëtan Rivet has already pointed this out in his review of this version of our patch.
> uint32_t bucket_idx, idx, position;
> struct rte_hash_key *next_key;
>
> - RETURN_IF_TRUE(((h == NULL) || (next == NULL)), -EINVAL);
> + RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> + (data == NULL)), -EINVAL);
> +
> + __state = (struct rte_hash_iterator_istate *)state;
>
> - const uint32_t total_entries = h->num_buckets * RTE_HASH_BUCKET_ENTRIES;
> /* Out of bounds */
> - if (*next >= total_entries)
> + if (__state->next >= __state->total_entries)
> return -ENOENT;
>
> 'if (__state->next == 0)' is required to avoid creating 'rte_hash_iterator_init' API.
The argument to keep _init() is presented above in this email.
> /* Calculate bucket and index of current iterator */
> - bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> - idx = *next % RTE_HASH_BUCKET_ENTRIES;
> + bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> + idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
>
> /* If current position is empty, go to the next one */
> - while (h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> - (*next)++;
> + while (__state->h->buckets[bucket_idx].key_idx[idx] == EMPTY_SLOT) {
> + __state->next++;
> /* End of table */
> - if (*next == total_entries)
> + if (__state->next == __state->total_entries)
> return -ENOENT;
> - bucket_idx = *next / RTE_HASH_BUCKET_ENTRIES;
> - idx = *next % RTE_HASH_BUCKET_ENTRIES;
> + bucket_idx = __state->next / RTE_HASH_BUCKET_ENTRIES;
> + idx = __state->next % RTE_HASH_BUCKET_ENTRIES;
> }
> - __hash_rw_reader_lock(h);
> + __hash_rw_reader_lock(__state->h);
> /* Get position of entry in key table */
> - position = h->buckets[bucket_idx].key_idx[idx];
> - next_key = (struct rte_hash_key *) ((char *)h->key_store +
> - position * h->key_entry_size);
> + position = __state->h->buckets[bucket_idx].key_idx[idx];
> + next_key = (struct rte_hash_key *) ((char *)__state->h->key_store +
> + position * __state->h->key_entry_size);
> /* Return key and data */
> *key = next_key->key;
> *data = next_key->pdata;
>
> - __hash_rw_reader_unlock(h);
> + __hash_rw_reader_unlock(__state->h);
>
> /* Increment iterator */
> - (*next)++;
> + __state->next++;
> This comment is not related to this change, it is better to place this inside the lock.
Even though __state->next does not depend on the lock?
It depends on if this API needs to be multi-thread safe. Interestingly, the documentation does not say it is multi-thread safe. If it has to be multi-thread safe, then the state also needs to be protected. For ex: what happens if the user uses a global variable for the state?
> return position - 1;
> }
> +
> +/* istate stands for internal state. */ struct
> +rte_hash_iterator_conflict_entries_istate {
> + const struct rte_hash *h;
> This can be moved outside of this structure.
Discussed earlier.
> + uint32_t vnext;
> + uint32_t primary_bidx;
> + uint32_t secondary_bidx;
> +};
> +
> +int32_t __rte_experimental
> +rte_hash_iterator_conflict_entries_init_with_hash(const struct rte_hash *h,
> + hash_sig_t sig, struct rte_hash_iterator_state *state) {
> + struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> + RETURN_IF_TRUE(((h == NULL) || (state == NULL)), -EINVAL);
> +
> + __state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> + __state->h = h;
> + __state->vnext = 0;
> +
> + /* Get the primary bucket index given the precomputed hash value. */
> + __state->primary_bidx = sig & h->bucket_bitmask;
> + /* Get the secondary bucket index given the precomputed hash value. */
> + __state->secondary_bidx =
> + rte_hash_secondary_hash(sig) & h->bucket_bitmask;
> +
> + return 0;
> +}
> IMO, as mentioned above, it is possible to avoid creating this API.
Discussed earlier.
> +
> +int32_t __rte_experimental
> +rte_hash_iterate_conflict_entries(
> + struct rte_hash_iterator_state *state, const void **key, void
> +**data)
> Signature of this API can be changed as follows:
> rte_hash_iterate_conflict_entries(
> struct rte_hash *h, const void **key, void **data, struct
> rte_hash_iterator_state *state)
Discussed earlier.
> +{
> + struct rte_hash_iterator_conflict_entries_istate *__state;
> +
> + RETURN_IF_TRUE(((state == NULL) || (key == NULL) ||
> + (data == NULL)), -EINVAL);
> +
> + __state = (struct rte_hash_iterator_conflict_entries_istate *)state;
> +
> + while (__state->vnext < RTE_HASH_BUCKET_ENTRIES * 2) {
> + uint32_t bidx = __state->vnext < RTE_HASH_BUCKET_ENTRIES ?
> + __state->primary_bidx : __state->secondary_bidx;
> + uint32_t next = __state->vnext & (RTE_HASH_BUCKET_ENTRIES - 1);
>
> take the reader lock before reading bucket entry
Thanks for pointing this out. We are going to do so. The lock came in as we go through the versions of this patch.
> + uint32_t position = __state->h->buckets[bidx].key_idx[next];
> + struct rte_hash_key *next_key;
> +
> + /* Increment iterator. */
> + __state->vnext++;
> +
> + /*
> + * The test below is unlikely because this iterator is meant
> + * to be used after a failed insert.
> + */
> + if (unlikely(position == EMPTY_SLOT))
> + continue;
> +
> + /* Get the entry in key table. */
> + next_key = (struct rte_hash_key *) (
> + (char *)__state->h->key_store +
> + position * __state->h->key_entry_size);
> + /* Return key and data. */
> + *key = next_key->key;
> + *data = next_key->pdata;
> release the reader lock
We'll do so.
> +
> + return position - 1;
> + }
> +
> + return -ENOENT;
> +}
> diff --git a/lib/librte_hash/rte_hash.h b/lib/librte_hash/rte_hash.h
> index 9e7d9315f..fdb01023e 100644
> --- a/lib/librte_hash/rte_hash.h
> +++ b/lib/librte_hash/rte_hash.h
> @@ -14,6 +14,8 @@
> #include <stdint.h>
> #include <stddef.h>
>
> +#include <rte_compat.h>
> +
> #ifdef __cplusplus
> extern "C" {
> #endif
> @@ -64,6 +66,16 @@ struct rte_hash_parameters {
> /** @internal A hash table structure. */ struct rte_hash;
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @internal A hash table iterator state structure.
> + */
> +struct rte_hash_iterator_state {
> + uint8_t space[64];
> I would call this 'state'. 64 can be replaced by 'RTE_CACHE_LINE_SIZE'.
Okay.
[ ]'s
Michel Machado
2018-03-07 17:44 [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI Ferruh Yigit
2018-10-04 15:43 9% ` [dpdk-dev] [PATCH] config: disable RTE_NEXT_ABI by default Ferruh Yigit
2018-10-04 14:49 0% ` Luca Boccassi
2018-10-04 15:48 9% ` [dpdk-dev] [PATCH v2] " Ferruh Yigit
2018-10-04 15:10 0% ` Thomas Monjalon
2018-10-04 15:28 0% ` Ferruh Yigit
2018-10-04 15:55 0% ` Thomas Monjalon
2018-10-05 9:13 0% ` Bruce Richardson
2018-10-05 10:17 0% ` Ferruh Yigit
2018-10-05 11:30 3% ` Neil Horman
2018-10-05 12:35 0% ` Ferruh Yigit
2018-06-07 12:38 [dpdk-dev] [PATCH 00/22] enable hotplug on multi-process Qi Zhang
2018-09-28 4:23 1% ` [dpdk-dev] [PATCH v16 0/6] " Qi Zhang
2018-09-28 4:23 2% ` [dpdk-dev] [PATCH v16 2/6] eal: " Qi Zhang
2018-10-16 0:16 1% ` [dpdk-dev] [PATCH v17 0/6] " Qi Zhang
2018-10-16 0:16 2% ` [dpdk-dev] [PATCH v17 2/6] eal: " Qi Zhang
2018-06-28 22:45 [dpdk-dev] [PATCH 00/10] kni: Interface detach and link status fixes Dan Gora
2018-06-29 1:54 ` Dan Gora
2018-06-29 1:55 ` [dpdk-dev] [PATCH v2 10/10] kni: add API to set link status on kernel interface Dan Gora
2018-08-29 15:54 ` Stephen Hemminger
2018-08-29 21:02 ` Dan Gora
2018-08-29 22:00 ` Stephen Hemminger
2018-08-29 22:12 ` Dan Gora
2018-08-29 22:41 ` Dan Gora
2018-08-29 23:10 ` Stephen Hemminger
2018-08-30 9:49 ` Igor Ryzhov
2018-08-30 21:41 ` Dan Gora
2018-08-30 22:09 ` Stephen Hemminger
2018-08-30 22:11 ` Dan Gora
2018-09-04 0:47 ` Dan Gora
2018-09-05 12:57 ` Stephen Hemminger
2018-09-11 21:45 ` Dan Gora
2018-09-11 21:52 ` Stephen Hemminger
2018-09-11 22:07 ` Dan Gora
2018-09-11 23:14 3% ` Stephen Hemminger
2018-09-12 4:02 0% ` Jason Wang
2018-08-16 9:55 [dpdk-dev] [PATCH v2] kni: fix kni Rx fifo producer synchronization Kiran Kumar
2018-08-27 14:07 ` Ferruh Yigit
2018-08-27 15:40 ` Gavin Hu
2018-08-28 10:43 ` Kokkilagadda, Kiran
2018-08-28 19:30 ` Gavin Hu
2018-08-29 4:59 ` Honnappa Nagarahalli
2018-08-29 5:49 ` Kokkilagadda, Kiran
2018-08-29 7:34 ` Ola Liljedahl
2018-08-29 8:28 ` Jerin Jacob
2018-08-29 8:47 ` Ola Liljedahl
2018-08-29 8:57 ` Jerin Jacob
2018-09-13 17:40 0% ` Honnappa Nagarahalli
2018-09-13 17:51 0% ` Jerin Jacob
2018-09-13 23:45 0% ` Honnappa Nagarahalli
2018-09-14 2:45 0% ` Jerin Jacob
2018-09-18 15:53 0% ` Ferruh Yigit
2018-09-19 5:37 0% ` Honnappa Nagarahalli
2018-08-17 10:51 [dpdk-dev] [PATCH v1 0/5] Enable hotplug in vfio Jeff Guo
2018-08-17 10:51 ` [dpdk-dev] [PATCH v1 4/5] pci: add req handler field to generic pci device Jeff Guo
2018-09-26 12:22 3% ` Burakov, Anatoly
2018-09-29 6:15 3% ` Jeff Guo
2018-10-01 7:51 3% ` Burakov, Anatoly
2018-08-23 12:08 [dpdk-dev] [PATCH 00/11] introduce telemetry library Ciara Power
2018-10-03 17:36 ` [dpdk-dev] [PATCH v2 00/10] " Kevin Laatz
2018-10-04 13:25 ` Van Haaren, Harry
2018-10-04 15:53 ` Thomas Monjalon
2018-10-09 10:33 3% ` Van Haaren, Harry
2018-10-09 11:41 0% ` Thomas Monjalon
2018-08-24 16:47 [dpdk-dev] [PATCH] acl: fix invalid results for rule with zero priority Konstantin Ananyev
2018-09-16 9:56 ` Thomas Monjalon
2018-09-25 12:22 3% ` Luca Boccassi
2018-09-25 12:57 3% ` Thomas Monjalon
2018-09-25 14:34 0% ` Ananyev, Konstantin
2018-10-03 16:18 0% ` Luca Boccassi
2018-08-24 16:53 [dpdk-dev] [RFC] ipsec: new library for IPsec data-path processing Konstantin Ananyev
2018-10-09 18:23 2% ` [dpdk-dev] [RFC v2 0/9] " Konstantin Ananyev
2018-08-24 17:48 [dpdk-dev] [RFC] cryptodev: proposed changes in rte_cryptodev_sym_session Konstantin Ananyev
2018-10-05 11:05 0% ` Ananyev, Konstantin
2018-08-27 12:24 [dpdk-dev] [PATCH] mem: share legacy and single file segments mode with secondaries Anatoly Burakov
2018-09-19 8:56 3% ` Thomas Monjalon
2018-09-20 15:41 17% ` [dpdk-dev] [PATCH v2] mem: store memory mode flags in shared config Anatoly Burakov
2018-10-03 22:05 0% ` Thomas Monjalon
2018-10-04 9:17 0% ` Burakov, Anatoly
2018-10-04 9:18 0% ` Thomas Monjalon
2018-10-04 10:46 0% ` Ferruh Yigit
2018-10-05 9:04 0% ` Burakov, Anatoly
2018-08-31 12:50 [dpdk-dev] [PATCH v2 0/5] use IOVAs check based on DMA mask Alejandro Lucero
2018-08-31 12:50 ` [dpdk-dev] [PATCH v2 1/5] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
2018-10-03 12:43 3% ` Burakov, Anatoly
[not found] ` <CAD+H991m6qauwX+P=muKe6bAjNLUrcBaGbxFXkMV60OVNvRgPg@mail.gmail.com>
2018-10-04 12:59 0% ` [dpdk-dev] Fwd: " Alejandro Lucero
2018-08-31 16:51 [dpdk-dev] [PATCH v3] hash table: add an iterator over conflicting entries Qiaobin Fu
2018-09-02 22:05 ` Honnappa Nagarahalli
2018-09-04 19:36 ` Michel Machado
2018-09-05 22:13 4% ` Honnappa Nagarahalli
2018-09-06 14:28 3% ` Michel Machado
2018-09-12 20:37 2% ` Honnappa Nagarahalli
2018-09-20 19:50 0% ` Michel Machado
2018-09-04 18:55 ` Wang, Yipeng1
2018-09-04 19:07 ` Michel Machado
2018-09-04 19:51 ` Wang, Yipeng1
2018-09-04 20:26 ` Michel Machado
2018-09-04 20:57 ` Wang, Yipeng1
2018-09-05 17:52 ` Michel Machado
2018-09-05 20:27 ` Wang, Yipeng1
2018-09-06 13:34 4% ` Michel Machado
2018-10-09 19:29 ` [dpdk-dev] [PATCH v4 1/2] hash table: fix a bug in rte_hash_iterate() Qiaobin Fu
2018-10-09 19:29 2% ` [dpdk-dev] [PATCH v4 2/2] hash table: add an iterator over conflicting entries Qiaobin Fu
2018-09-04 10:12 [dpdk-dev] [PATCH v2] ethdev: make default behavior CRC strip on Rx Ferruh Yigit
2018-09-24 17:31 4% ` [dpdk-dev] [PATCH] doc: announce CRC strip changes in release notes Ferruh Yigit
2018-09-24 17:12 0% ` David Marchand
2018-09-05 12:21 [dpdk-dev] [PATCH 1/2] eventdev: fix port id argument in Rx adapter caps API Nikhil Rao
2018-09-25 8:49 4% ` [dpdk-dev] [PATCH v2] " Nikhil Rao
2018-09-25 9:15 0% ` Jerin Jacob
2018-09-25 9:50 0% ` Thomas Monjalon
2018-09-25 9:56 0% ` Jerin Jacob
2018-09-25 9:49 4% ` [dpdk-dev] [PATCH v3] " Nikhil Rao
2018-09-05 14:44 1% [dpdk-dev] [dpdk-announce] DPDK 18.05.1 released Christian Ehrhardt
2018-09-05 16:41 [dpdk-dev] [RFC] ethdev: add min/max MTU to device info Stephen Hemminger
2018-09-06 6:29 3% ` Andrew Rybchenko
2018-09-06 10:52 3% ` Stephen Hemminger
2018-09-06 17:12 [dpdk-dev] [PATCH 0/4] Address reader-writer concurrency in rte_hash Honnappa Nagarahalli
2018-09-06 17:12 ` [dpdk-dev] [PATCH 3/4] hash: fix rw concurrency while moving keys Honnappa Nagarahalli
2018-09-28 1:00 3% ` Wang, Yipeng1
2018-09-28 8:26 4% ` Bruce Richardson
2018-09-28 8:55 4% ` Van Haaren, Harry
2018-09-30 22:33 0% ` Honnappa Nagarahalli
2018-10-02 13:17 3% ` Van Haaren, Harry
2018-10-02 23:58 0% ` Wang, Yipeng1
2018-10-03 17:32 0% ` Honnappa Nagarahalli
2018-10-03 17:56 3% ` Wang, Yipeng1
2018-10-03 23:05 5% ` Ola Liljedahl
2018-10-04 3:32 0% ` Honnappa Nagarahalli
2018-10-04 3:54 0% ` Honnappa Nagarahalli
2018-10-04 19:16 0% ` Wang, Yipeng1
2018-09-30 23:05 0% ` Honnappa Nagarahalli
2018-09-07 22:27 [dpdk-dev] [RFC] eal: simplify parameters of hotplug functions Thomas Monjalon
2018-10-03 23:10 ` [dpdk-dev] [PATCH v5 0/5] eal: simplify devargs and " Thomas Monjalon
2018-10-03 23:10 4% ` [dpdk-dev] [PATCH v5 3/5] eal: add bus pointer in device structure Thomas Monjalon
2018-10-04 9:31 4% ` Gaëtan Rivet
2018-10-04 9:48 3% ` Thomas Monjalon
2018-10-07 9:32 3% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Thomas Monjalon
2018-10-07 9:32 4% ` [dpdk-dev] [PATCH v6 3/5] eal: add bus pointer in device structure Thomas Monjalon
2018-10-08 21:45 0% ` [dpdk-dev] [PATCH v6 0/5] eal: simplify devargs and hotplug functions Stephen Hemminger
2018-10-11 12:10 0% ` Thomas Monjalon
2018-09-10 5:18 3% [dpdk-dev] [PATCH] mbuf: remove deprecated segment free functions David Marchand
2018-09-10 8:06 0% ` Andrew Rybchenko
2018-09-16 9:39 0% ` Thomas Monjalon
2018-09-17 7:07 0% ` Olivier Matz
2018-09-17 12:45 8% ` [dpdk-dev] [PATCH v2] " David Marchand
2018-09-19 8:34 0% ` Thomas Monjalon
2018-09-10 20:04 [dpdk-dev] [PATCH 00/15] rename PMDs map files to match library name and add Meson files Luca Boccassi
2018-09-10 20:04 ` [dpdk-dev] [PATCH 07/15] net/liquidio: rename version map after library file name Luca Boccassi
2018-09-11 13:06 ` Bruce Richardson
2018-09-11 13:09 ` Luca Boccassi
2018-09-11 13:30 ` Bruce Richardson
2018-09-11 13:38 3% ` Luca Boccassi
2018-09-11 13:32 4% ` Bruce Richardson
2018-09-11 13:41 4% ` Luca Boccassi
2018-09-11 14:06 0% ` Bruce Richardson
2018-09-11 16:05 0% ` Luca Boccassi
2018-09-17 10:36 [dpdk-dev] [PATCH 00/11] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 00/15] " Shreyansh Jain
2018-09-26 18:04 2% ` [dpdk-dev] [PATCH v2 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
2018-10-12 9:32 0% ` [dpdk-dev] [PATCH v2 00/15] Upgrade DPAA2 FW and other feature/bug fixes Shreyansh Jain
2018-10-12 9:42 0% ` Shreyansh Jain
2018-10-12 10:16 0% ` Thomas Monjalon
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 " Shreyansh Jain
2018-10-12 10:04 2% ` [dpdk-dev] [PATCH v3 03/15] bus/fslmc: upgrade mc FW APIs to 10.10.0 Shreyansh Jain
2018-09-21 16:13 [dpdk-dev] [PATCH v4 00/20] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-20 11:36 ` [dpdk-dev] [PATCH v3 " Anatoly Burakov
2018-09-19 13:56 ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
2018-09-04 13:11 ` [dpdk-dev] [PATCH 00/16] " Anatoly Burakov
2018-09-19 13:56 16% ` [dpdk-dev] [PATCH v2 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-20 9:30 0% ` Andrew Rybchenko
2018-09-20 9:54 0% ` Burakov, Anatoly
2018-09-19 13:56 4% ` [dpdk-dev] [PATCH v2 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-20 11:36 16% ` [dpdk-dev] [PATCH v3 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-20 11:36 4% ` [dpdk-dev] [PATCH v3 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-21 16:13 16% ` [dpdk-dev] [PATCH v4 02/20] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-21 16:13 4% ` [dpdk-dev] [PATCH v4 04/20] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-26 11:21 2% ` [dpdk-dev] [PATCH v5 00/21] Support externally allocated memory in DPDK Anatoly Burakov
2018-09-27 10:40 2% ` [dpdk-dev] [PATCH v6 " Anatoly Burakov
2018-10-01 11:04 2% ` [dpdk-dev] [PATCH v7 " Anatoly Burakov
2018-10-01 12:56 3% ` [dpdk-dev] [PATCH v8 " Anatoly Burakov
2018-10-02 13:34 3% ` [dpdk-dev] [PATCH v9 " Anatoly Burakov
2018-10-02 13:34 13% ` [dpdk-dev] [PATCH v9 01/21] mem: add length to memseg list Anatoly Burakov
2018-10-02 13:34 10% ` [dpdk-dev] [PATCH v9 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-10-02 13:34 5% ` [dpdk-dev] [PATCH v9 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-10-02 13:34 8% ` [dpdk-dev] [PATCH v9 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-02 13:34 7% ` [dpdk-dev] [PATCH v9 11/21] malloc: allow creating " Anatoly Burakov
2018-10-01 12:56 12% ` [dpdk-dev] [PATCH v8 01/21] mem: add length to memseg list Anatoly Burakov
2018-10-01 17:01 3% ` Stephen Hemminger
2018-10-02 9:03 0% ` Burakov, Anatoly
2018-10-01 12:56 13% ` [dpdk-dev] [PATCH v8 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-10-01 12:56 5% ` [dpdk-dev] [PATCH v8 03/21] malloc: index heaps using heap ID rather than NUMA node Anatoly Burakov
2018-10-01 12:56 8% ` [dpdk-dev] [PATCH v8 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 12:56 7% ` [dpdk-dev] [PATCH v8 11/21] malloc: allow creating " Anatoly Burakov
2018-10-01 11:04 16% ` [dpdk-dev] [PATCH v7 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-10-01 11:04 4% ` [dpdk-dev] [PATCH v7 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-10-01 11:04 9% ` [dpdk-dev] [PATCH v7 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-10-01 11:05 4% ` [dpdk-dev] [PATCH v7 11/21] malloc: allow creating " Anatoly Burakov
2018-09-27 10:40 16% ` [dpdk-dev] [PATCH v6 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-27 11:03 0% ` Shreyansh Jain
2018-09-27 11:08 0% ` Burakov, Anatoly
2018-09-27 11:12 0% ` Shreyansh Jain
2018-09-27 11:29 0% ` Burakov, Anatoly
2018-09-29 0:09 0% ` Yongseok Koh
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-27 13:14 0% ` Alejandro Lucero
2018-09-27 13:21 0% ` Burakov, Anatoly
2018-09-27 13:42 0% ` Alejandro Lucero
2018-09-27 14:04 0% ` Burakov, Anatoly
2018-09-27 10:41 9% ` [dpdk-dev] [PATCH v6 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-27 10:41 4% ` [dpdk-dev] [PATCH v6 11/21] malloc: allow creating " Anatoly Burakov
2018-09-26 11:22 16% ` [dpdk-dev] [PATCH v5 02/21] mem: allow memseg lists to be marked as external Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 04/21] mem: do not check for invalid socket ID Anatoly Burakov
2018-09-26 11:22 9% ` [dpdk-dev] [PATCH v5 08/21] malloc: add name to malloc heaps Anatoly Burakov
2018-09-26 11:22 4% ` [dpdk-dev] [PATCH v5 11/21] malloc: allow creating " Anatoly Burakov
2018-09-25 15:25 3% [dpdk-dev] [PATCH v1] doc: remove unused release note file John McNamara
2018-09-26 21:00 [dpdk-dev] [PATCH v2 0/3] ethdev: add generic L2/L3 tunnel encapsulation actions Ori Kam
2018-10-07 12:57 ` [dpdk-dev] [PATCH v3 " Ori Kam
2018-10-09 16:48 ` Ferruh Yigit
2018-10-10 6:45 ` Andrew Rybchenko
2018-10-10 9:00 ` Ori Kam
2018-10-10 12:02 2% ` Adrien Mazarguil
2018-10-10 13:17 0% ` Ori Kam
2018-10-10 16:10 0% ` Adrien Mazarguil
2018-10-11 8:48 0% ` Ori Kam
2018-09-27 11:26 [dpdk-dev] [PATCH] ethdev: add field for device data per process Alejandro Lucero
2018-10-03 20:44 ` Thomas Monjalon
2018-10-05 13:01 ` Ferruh Yigit
2018-10-05 13:17 ` Alejandro Lucero
2018-10-05 13:26 4% ` Ferruh Yigit
2018-10-05 14:47 0% ` Thomas Monjalon
2018-09-28 17:58 [dpdk-dev] [PATCH] build: add drivers_install_subdir meson option Luca Boccassi
2018-10-01 9:17 ` Bruce Richardson
2018-10-01 9:25 ` Bruce Richardson
2018-10-01 9:46 4% ` Luca Boccassi
2018-10-01 10:01 0% ` Bruce Richardson
2018-10-01 10:42 0% ` Timothy Redaelli
2018-10-01 11:06 0% ` Bruce Richardson
2018-10-01 11:24 0% ` Luca Boccassi
2018-10-02 11:02 0% ` Marco Varlese
2018-10-02 12:23 0% ` Bruce Richardson
2018-10-02 13:07 0% ` Luca Boccassi
2018-10-02 13:06 3% ` [dpdk-dev] [PATCH v2 1/2] build: change default PMD installation subdir to dpdk/pmds-XX.YY Luca Boccassi
2018-10-02 15:25 3% ` [dpdk-dev] [PATCH v3 " Luca Boccassi
2018-10-02 16:20 3% ` [dpdk-dev] [PATCH v4 " Luca Boccassi
2018-10-02 16:28 0% ` Bruce Richardson
2018-10-05 16:00 0% ` Timothy Redaelli
2018-10-27 21:19 0% ` Thomas Monjalon
2018-10-05 12:06 3% [dpdk-dev] [PATCH v2 0/6] use IOVAs check based on DMA mask Alejandro Lucero
2018-10-05 12:06 4% ` [dpdk-dev] [PATCH v2 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
2018-10-05 12:45 3% [dpdk-dev] [PATCH v3 0/6] use IOVAs check based on DMA mask Alejandro Lucero
2018-10-05 12:45 4% ` [dpdk-dev] [PATCH v3 1/6] mem: add function for checking memsegs IOVAs addresses Alejandro Lucero
2018-10-10 8:56 0% ` Tu, Lijuan
2018-10-11 9:26 0% ` Alejandro Lucero
2018-10-10 21:48 [dpdk-dev] [PATCH v1 0/3] Improvements over rte hash and tests Yipeng Wang
2018-10-10 21:48 ` [dpdk-dev] [PATCH v1 3/3] test/hash: add readwrite test for ext table Yipeng Wang
2018-10-24 20:36 ` Bruce Richardson
2018-10-26 0:23 ` Honnappa Nagarahalli
2018-10-26 10:12 3% ` Bruce Richardson
2018-10-29 5:54 0% ` Honnappa Nagarahalli
2018-10-31 4:21 3% ` Honnappa Nagarahalli
2018-10-11 14:20 4% [dpdk-dev] [PATCH] doc: cryptodev deprecation notice for sym session changes Konstantin Ananyev
2018-10-11 19:57 [dpdk-dev] [PATCH 1/2] eal: add API that sleeps while waiting for threads Ferruh Yigit
2018-10-15 22:21 ` [dpdk-dev] [PATCH v2 " Ferruh Yigit
2018-10-16 8:42 3% ` Ananyev, Konstantin
2018-10-15 14:50 [dpdk-dev] [PATCH 1/6] doc: add missing shared library versions to release notes Ferruh Yigit
2018-10-15 14:50 ` [dpdk-dev] [PATCH 6/6] doc: remove internal libs from " Ferruh Yigit
2018-10-16 11:52 ` Shreyansh Jain
2018-10-25 0:07 4% ` Thomas Monjalon
2018-10-18 16:08 [dpdk-dev] [PATCH] doc: show internal functions in doxygen Thomas Monjalon
2018-10-18 16:22 ` Ferruh Yigit
2018-10-18 17:04 ` Thomas Monjalon
2018-10-19 7:39 3% ` Ferruh Yigit
2018-10-22 6:15 0% ` Shreyansh Jain
2018-10-24 8:18 1% [dpdk-dev] [RFC 00/14] prefix network structures Olivier Matz
2018-10-24 14:56 0% ` Wiles, Keith
2018-10-26 7:22 0% ` Olivier Matz
2018-10-24 16:09 0% ` Stephen Hemminger
2018-10-24 16:39 0% ` Bruce Richardson
2018-10-26 7:20 0% ` Olivier Matz
2018-10-26 10:15 0% ` Bruce Richardson
2018-10-26 11:28 0% ` Olivier Matz
2018-10-24 18:38 0% ` Stephen Hemminger
2018-10-26 7:56 0% ` Olivier Matz
2018-10-31 2:17 [dpdk-dev] [PATCH v5 0/3] Extend rte_ipv6_frag_get_ipv6_fragment_header() Cody Doucette
2018-10-31 2:17 ` [dpdk-dev] [PATCH v5 3/3] ip_frag: extend IPv6 fragment header retrieval Cody Doucette
2018-10-31 19:56 3% ` Ananyev, Konstantin
2018-11-01 4:54 [dpdk-dev] [PATCH 1/3] hash: deprecate lock ellision and read/write concurreny flags Honnappa Nagarahalli
2018-11-01 23:25 ` [dpdk-dev] [PATCH v2 0/4] " Honnappa Nagarahalli
2018-11-02 11:25 ` Bruce Richardson
2018-11-02 17:38 3% ` Honnappa Nagarahalli
2018-11-01 9:53 [dpdk-dev] [PATCH v4 0/2] ring library with c11 memory model bug fix and optimization Gavin Hu
2018-11-01 9:53 ` [dpdk-dev] [PATCH v4 2/2] ring: move the atomic load of head above the loop Gavin Hu
2018-11-01 17:26 ` Stephen Hemminger
2018-11-02 0:53 ` Gavin Hu (Arm Technology China)
2018-11-02 4:30 ` Honnappa Nagarahalli
2018-11-02 7:15 3% ` Gavin Hu (Arm Technology China)
2018-11-02 9:36 0% ` Thomas Monjalon
2018-11-02 11:23 0% ` Gavin Hu (Arm Technology China)
2018-11-01 13:54 [dpdk-dev] [PATCH] check-symbol-change: fix regex to match on end of map file Neil Horman
2018-11-01 22:53 3% ` Thomas Monjalon
2018-11-02 11:50 0% ` Neil Horman
2018-11-05 10:26 [dpdk-dev] [PATCH 0/2] ip_frag: two fixes in reassembly code Konstantin Ananyev
2018-11-05 12:18 ` [dpdk-dev] [PATCH v2 2/2] ip_frag: use key length for key comparision Konstantin Ananyev
2018-11-06 10:53 3% ` Burakov, Anatoly
2018-11-06 11:41 0% ` Ananyev, Konstantin
2018-11-05 17:09 4% [dpdk-dev] [PATCH] doc: document KNI limitation in release notes Ferruh Yigit
2018-11-05 17:28 4% [dpdk-dev] [PATCH] doc: update release notes for default KNI carries status Ferruh Yigit
2018-11-06 6:15 4% [dpdk-dev] [PATCH] doc/linux_gsg: fix numa lib name error Yong Wang
2018-11-07 2:40 4% [dpdk-dev] [PATCH v2] " Yong Wang