* Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable
2018-07-05 14:13 [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable Marvin Liu
@ 2018-07-05 9:25 ` Thomas Monjalon
2018-07-05 14:46 ` Sachin Saxena
1 sibling, 0 replies; 8+ messages in thread
From: Thomas Monjalon @ 2018-07-05 9:25 UTC (permalink / raw)
To: Marvin Liu; +Cc: zhiyong.yang, dev, techboard
05/07/2018 16:13, Marvin Liu:
> When building share library, thread-local storage model will be changed
> to global-dynamic. It will add additional cost for reading thread local
> variable. On the other hand, dynamically load share library with static
> TLS will request additional DTV slot which is limited by loader. By now
> only librte_pmd_eal.so contain thread local variable. So that can make
> TLS model back to initial-exec like static library for better
> performance.
>
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
>
> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index 7e4531bab..19d5e11ef 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk
Is it only for GCC? Not clang?
> +# Initial execution TLS model has better performane compared to dynamic
> +# global. But this model require for addtional slot on DTV when dlopen
> +# object with thread local variable.
There are a few typos in this comment.
> +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> +TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
> +endif
We really need more testing or review of this patch.
Cc techboard: do we take the risk of getting it into RC1
without review? It has been waiting for a long time.
* [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable
@ 2018-07-05 14:13 Marvin Liu
2018-07-05 9:25 ` Thomas Monjalon
2018-07-05 14:46 ` Sachin Saxena
0 siblings, 2 replies; 8+ messages in thread
From: Marvin Liu @ 2018-07-05 14:13 UTC (permalink / raw)
To: zhiyong.yang, thomas, dev; +Cc: Marvin Liu
When building a shared library, the thread-local storage model changes
to global-dynamic, which adds cost to every read of a thread-local
variable. On the other hand, dynamically loading a shared library with
static TLS requests an additional DTV slot, which is limited by the
loader. For now, only librte_pmd_eal.so contains thread-local variables,
so the TLS model can be set back to initial-exec, as in the static
library, for better performance.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
index 7e4531bab..19d5e11ef 100644
--- a/mk/toolchain/gcc/rte.vars.mk
+++ b/mk/toolchain/gcc/rte.vars.mk
@@ -43,6 +43,13 @@ ifeq (,$(findstring -O0,$(EXTRA_CFLAGS)))
endif
endif
+# Initial execution TLS model has better performane compared to dynamic
+# global. But this model require for addtional slot on DTV when dlopen
+# object with thread local variable.
+ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
+TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
+endif
+
WERROR_FLAGS := -W -Wall -Wstrict-prototypes -Wmissing-prototypes
WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
--
2.17.0
* Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable
2018-07-05 14:13 [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable Marvin Liu
2018-07-05 9:25 ` Thomas Monjalon
@ 2018-07-05 14:46 ` Sachin Saxena
2018-07-06 2:22 ` Liu, Yong
1 sibling, 1 reply; 8+ messages in thread
From: Sachin Saxena @ 2018-07-05 14:46 UTC (permalink / raw)
To: Marvin Liu, zhiyong.yang, thomas, dev
>
> When building share library, thread-local storage model will be changed to
> global-dynamic. It will add additional cost for reading thread local variable.
> On the other hand, dynamically load share library with static TLS will request
> additional DTV slot which is limited by loader. By now only librte_pmd_eal.so
> contain thread local variable. So that can make TLS model back to initial-exec
> like static library for better performance.
>
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
>
> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index 7e4531bab..19d5e11ef 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk
> @@ -43,6 +43,13 @@ ifeq (,$(findstring -O0,$(EXTRA_CFLAGS)))
> endif
> endif
>
> +# Initial execution TLS model has better performane compared to dynamic
> +# global. But this model require for addtional slot on DTV when dlopen
> +# object with thread local variable.
> +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> +TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
> +endif
> +
[Sachin Saxena] Using the initial-exec model for a shared object is not recommended. If you link a shared object containing IE-model accesses, the object will have the DF_STATIC_TLS flag set. By the spec, this means that dlopen() might refuse to load it if its TLS usage is greater than the available static TLS space.
This is what happened when I tried to validate this change on an ARM64-based NXP platform with the VPP-dpdk solution. VPP initialization fails with the following error:
"load_one_plugin:145: /usr/lib/vpp_plugins/dpdk_plugin.so: cannot allocate memory in static TLS block"
Note that the dpdk dpaa2 driver and VPP both use TLS variables quite heavily. When the dpdk shared object is forced to the initial-exec model, the VPP static TLS space is exhausted and dlopen() returns an error while trying to load the DPDK object.
For the same reason, the default TLS model changes from "initial-exec" to "global-dynamic" when we use "-fPIC".
In my opinion, this change should not be merged, as it breaks basic functionality.
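As an illustration of the trade-off described above, here is a minimal sketch of selecting the TLS model per variable with the GCC/clang tls_model attribute, instead of for the whole build with -ftls-model. It is not part of the patch under discussion. The names demo_lcore_id, demo_call_count, DEMO_MAX_LCORE and the demo_* functions are hypothetical and exist only for this example.

```c
#include <assert.h>

#define DEMO_MAX_LCORE 128u /* hypothetical bound, used only in this sketch */

/*
 * Hypothetical stand-in for a hot per-thread variable such as the lcore id.
 * The attribute requests the initial-exec model for this variable only.
 */
static __thread unsigned int demo_lcore_id
	__attribute__((tls_model("initial-exec")));

/* No attribute: this one uses whatever model the compiler flags select. */
static __thread unsigned long demo_call_count;

/* Store the calling thread's id and count the access. */
void demo_set_lcore_id(unsigned int id)
{
	assert(id < DEMO_MAX_LCORE);
	demo_lcore_id = id;
	demo_call_count++;
}

/* Read back the calling thread's id and count the access. */
unsigned int demo_get_lcore_id(void)
{
	demo_call_count++;
	return demo_lcore_id;
}

/* Number of set/get calls made so far by the calling thread. */
unsigned long demo_get_call_count(void)
{
	return demo_call_count;
}
```

Note that this only limits which accesses use the shorter initial-exec sequence. A shared object containing any initial-exec access still carries DF_STATIC_TLS, so the dlopen() concern raised above would still apply to it.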
> WERROR_FLAGS := -W -Wall -Wstrict-prototypes -Wmissing-prototypes
> WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
> WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
> --
> 2.17.0
* Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable
2018-07-05 14:46 ` Sachin Saxena
@ 2018-07-06 2:22 ` Liu, Yong
2018-07-06 10:02 ` Bruce Richardson
0 siblings, 1 reply; 8+ messages in thread
From: Liu, Yong @ 2018-07-06 2:22 UTC (permalink / raw)
To: Sachin Saxena, Yang, Zhiyong, thomas, dev
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Sachin Saxena
> Sent: Thursday, July 05, 2018 10:46 PM
> To: Liu, Yong <yong.liu@intel.com>; Yang, Zhiyong <zhiyong.yang@intel.com>;
> thomas@monjalon.net; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread
> local variable
>
>
>
> >
> > When building share library, thread-local storage model will be changed
> to
> > global-dynamic. It will add additional cost for reading thread local
> variable.
> > On the other hand, dynamically load share library with static TLS will
> request
> > additional DTV slot which is limited by loader. By now only
> librte_pmd_eal.so
> > contain thread local variable. So that can make TLS model back to
> initial-exec
> > like static library for better performance.
> >
> > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> >
> > diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> > index 7e4531bab..19d5e11ef 100644
> > --- a/mk/toolchain/gcc/rte.vars.mk
> > +++ b/mk/toolchain/gcc/rte.vars.mk
> > @@ -43,6 +43,13 @@ ifeq (,$(findstring -O0,$(EXTRA_CFLAGS))) endif
> endif
> >
> > +# Initial execution TLS model has better performane compared to dynamic
> > +# global. But this model require for addtional slot on DTV when dlopen
> > +# object with thread local variable.
> > +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> > +TOOLCHAIN_CFLAGS += -ftls-model=initial-exec endif
> > +
>
> [Sachin Saxena] Using initial-exec model for shared object is not
> recommended. If you link a shared object containing IE-model, the object
> will have the DF_STATIC_TLS flag set. By the spec, this means that dlopen()
> might refuse to load it if TLS usage is greater than static TLS space.
> This is what happening, when I tried to validate this change on ARM64
> based NXP platform with VPP-dpdk solution. VPP initialization fails with
> following error:
> "load_one_plugin:145: /usr/lib/vpp_plugins/dpdk_plugin.so: cannot
> allocate memory in static TLS block"
>
> Note that dpdk dpaa2 driver and VPP both uses TLS variables quite
> significantly. When forced to Initial-exec model in dpdk shared object,
> VPP static TLS space is getting exhausted and dlopen() returns error while
> trying to load the DPDK object.
> For same reason, when we use "-fPIC" the default TLS model changed to
> "global-dynamics" from "Initial-exec".
>
> In my opinion, this change should not be merged as it is breaking basic
> functionality.
Thanks for your opinion, Sachin.
The IE model may cause problems when a shared object is opened with dlopen. On the other hand, it can benefit performance.
It would be better to keep the current working setting; users may change it themselves.
Regards,
Marvin
>
> > WERROR_FLAGS := -W -Wall -Wstrict-prototypes -Wmissing-prototypes
> > WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-
> > arith WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
> > --
> > 2.17.0
* Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable
2018-07-06 2:22 ` Liu, Yong
@ 2018-07-06 10:02 ` Bruce Richardson
0 siblings, 0 replies; 8+ messages in thread
From: Bruce Richardson @ 2018-07-06 10:02 UTC (permalink / raw)
To: Liu, Yong; +Cc: Sachin Saxena, Yang, Zhiyong, thomas, dev
On Fri, Jul 06, 2018 at 02:22:14AM +0000, Liu, Yong wrote:
>
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Sachin Saxena
> > Sent: Thursday, July 05, 2018 10:46 PM
> > To: Liu, Yong <yong.liu@intel.com>; Yang, Zhiyong <zhiyong.yang@intel.com>;
> > thomas@monjalon.net; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread
> > local variable
> >
> >
> >
> > >
> > > When building share library, thread-local storage model will be changed
> > to
> > > global-dynamic. It will add additional cost for reading thread local
> > variable.
> > > On the other hand, dynamically load share library with static TLS will
> > request
> > > additional DTV slot which is limited by loader. By now only
> > librte_pmd_eal.so
> > > contain thread local variable. So that can make TLS model back to
> > initial-exec
> > > like static library for better performance.
> > >
> > > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> > >
> > > diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> > > index 7e4531bab..19d5e11ef 100644
> > > --- a/mk/toolchain/gcc/rte.vars.mk
> > > +++ b/mk/toolchain/gcc/rte.vars.mk
> > > @@ -43,6 +43,13 @@ ifeq (,$(findstring -O0,$(EXTRA_CFLAGS))) endif
> > endif
> > >
> > > +# Initial execution TLS model has better performane compared to dynamic
> > > +# global. But this model require for addtional slot on DTV when dlopen
> > > +# object with thread local variable.
> > > +ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
> > > +TOOLCHAIN_CFLAGS += -ftls-model=initial-exec endif
> > > +
> >
> > [Sachin Saxena] Using initial-exec model for shared object is not
> > recommended. If you link a shared object containing IE-model, the object
> > will have the DF_STATIC_TLS flag set. By the spec, this means that dlopen()
> > might refuse to load it if TLS usage is greater than static TLS space.
> > This is what happening, when I tried to validate this change on ARM64
> > based NXP platform with VPP-dpdk solution. VPP initialization fails with
> > following error:
> > "load_one_plugin:145: /usr/lib/vpp_plugins/dpdk_plugin.so: cannot
> > allocate memory in static TLS block"
> >
> > Note that dpdk dpaa2 driver and VPP both uses TLS variables quite
> > significantly. When forced to Initial-exec model in dpdk shared object,
> > VPP static TLS space is getting exhausted and dlopen() returns error while
> > trying to load the DPDK object.
> > For same reason, when we use "-fPIC" the default TLS model changed to
> > "global-dynamics" from "Initial-exec".
> >
> > In my opinion, this change should not be merged as it is breaking basic
> > functionality.
>
> Thanks for your opinion, Sachin.
> IE model may cause problem when using dlopen open share object. On the other hand, it can benefit performance.
> It will be better to keep current workable setting and users may change it by themselves.
>
What is the performance delta, and where is it most seen? I suggest for
future patches like this, that the commit message itself should give a
rough/approx indication of the perf impacts.
/Bruce
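One rough way to look for such a delta is a micro-benchmark that times repeated accesses to a thread-local variable. The sketch below is an assumption about how this could be measured, not the method used for the numbers quoted in this thread. The names demo_tls_counter, demo_tls_touch, demo_time_tls_accesses and demo_tls_counter_value are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical thread-local variable whose access cost is being timed. */
static __thread uint64_t demo_tls_counter;

/* Kept out of line so that every iteration performs a real TLS access. */
__attribute__((noinline)) static void demo_tls_touch(void)
{
	demo_tls_counter++;
}

/* Return the elapsed time in nanoseconds for 'iters' TLS increments. */
uint64_t demo_time_tls_accesses(uint64_t iters)
{
	struct timespec start, end;
	int64_t ns;

	assert(iters > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (uint64_t i = 0; i < iters; i++)
		demo_tls_touch();
	clock_gettime(CLOCK_MONOTONIC, &end);

	ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
		+ (int64_t)(end.tv_nsec - start.tv_nsec);
	return (uint64_t)ns;
}

/* Current value of the calling thread's counter. */
uint64_t demo_tls_counter_value(void)
{
	return demo_tls_counter;
}
```

A difference between models would only be expected when this code is built with -fPIC into a shared object, once with and once without -ftls-model=initial-exec. In a normally linked executable the toolchain already uses one of the exec models, so both builds should behave alike.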
* Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable
2018-04-28 4:39 ` Yang, Zhiyong
@ 2018-05-18 9:46 ` Thomas Monjalon
0 siblings, 0 replies; 8+ messages in thread
From: Thomas Monjalon @ 2018-05-18 9:46 UTC (permalink / raw)
To: Yang, Zhiyong, Liu, Yong; +Cc: dev, Wang, Zhihong
28/04/2018 06:39, Yang, Zhiyong:
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marvin Liu
> Sent: Saturday, April 28, 2018 5:54 PM
> >
> > When building share library, thread-local storage model will be changed to
> > global-dynamic. It will cost additional protect for read thread local variable. By
> > now only lcore id is this kind of varaible and not need to dynamic share with
> > other threads. So make TLS model back to initial-exec like static library for
> > better performance.
> >
> > Signed-off-by: Marvin Liu <yong.liu@intel.com>
> >
>
> For vhost-user,
> If no this pach, vhost user in shared lib perf drops 14.3% than working in static.
> after applying the patch , vhost-user in shared lib can achieve the similar perf as in static lib.
>
> Tested-by: Zhiyong Yang <zhiyong.yang@intel.com>
For the record, I have decided to not try this optimization
in the last weeks of the 18.05 release.
However we could try it in 18.08 by applying the patch early in the cycle.
Is there anyone against this patch?
* [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable
@ 2018-04-28 9:54 Marvin Liu
2018-04-28 4:39 ` Yang, Zhiyong
0 siblings, 1 reply; 8+ messages in thread
From: Marvin Liu @ 2018-04-28 9:54 UTC (permalink / raw)
To: dev; +Cc: Marvin Liu
When building a shared library, the thread-local storage model changes
to global-dynamic, which adds overhead to every read of a thread-local
variable. For now, only the lcore id is a variable of this kind, and it
does not need to be dynamically shared with other threads. So set the
TLS model back to initial-exec, as in the static library, for better
performance.
Signed-off-by: Marvin Liu <yong.liu@intel.com>
diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
index 7e4531b..7b5e71c 100644
--- a/mk/toolchain/gcc/rte.vars.mk
+++ b/mk/toolchain/gcc/rte.vars.mk
@@ -43,6 +43,10 @@ ifeq (,$(findstring -O0,$(EXTRA_CFLAGS)))
endif
endif
+ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
+TOOLCHAIN_CFLAGS += -ftls-model=initial-exec
+endif
+
WERROR_FLAGS := -W -Wall -Wstrict-prototypes -Wmissing-prototypes
WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
--
1.9.3
* Re: [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable
2018-04-28 9:54 Marvin Liu
@ 2018-04-28 4:39 ` Yang, Zhiyong
2018-05-18 9:46 ` Thomas Monjalon
0 siblings, 1 reply; 8+ messages in thread
From: Yang, Zhiyong @ 2018-04-28 4:39 UTC (permalink / raw)
To: Liu, Yong, dev; +Cc: Liu, Yong, Wang, Zhihong
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Marvin Liu
> Sent: Saturday, April 28, 2018 5:54 PM
> To: dev@dpdk.org
> Cc: Liu, Yong <yong.liu@intel.com>
> Subject: [dpdk-dev] [PATCH] mk: using initial-exec model for thread local
> variable
>
> When building share library, thread-local storage model will be changed to
> global-dynamic. It will cost additional protect for read thread local variable. By
> now only lcore id is this kind of varaible and not need to dynamic share with
> other threads. So make TLS model back to initial-exec like static library for
> better performance.
>
> Signed-off-by: Marvin Liu <yong.liu@intel.com>
>
For vhost-user,
without this patch, vhost-user performance in the shared library drops 14.3% compared with the static build.
After applying the patch, vhost-user in the shared library achieves performance similar to the static library.
Tested-by: Zhiyong Yang <zhiyong.yang@intel.com>
end of thread, other threads:[~2018-07-06 10:02 UTC | newest]
Thread overview: 8+ messages
2018-07-05 14:13 [dpdk-dev] [PATCH] mk: using initial-exec model for thread local variable Marvin Liu
2018-07-05 9:25 ` Thomas Monjalon
2018-07-05 14:46 ` Sachin Saxena
2018-07-06 2:22 ` Liu, Yong
2018-07-06 10:02 ` Bruce Richardson
-- strict thread matches above, loose matches on Subject: below --
2018-04-28 9:54 Marvin Liu
2018-04-28 4:39 ` Yang, Zhiyong
2018-05-18 9:46 ` Thomas Monjalon