DPDK patches and discussions
* [dpdk-dev] [PATCH] eal: fix up bad asm in rte_cpu_get_features
@ 2014-03-18 20:43 Neil Horman
  2014-03-19 14:48 ` [dpdk-dev] [PATCH v2] " Neil Horman
  2014-03-20 11:42 ` [dpdk-dev] [PATCH v3] eal: Fix up assembly for x86_64 " Neil Horman
  0 siblings, 2 replies; 9+ messages in thread
From: Neil Horman @ 2014-03-18 20:43 UTC (permalink / raw)
  To: dev

The recent conversion to build dpdk as a DSO has an error in
rte_cpu_get_features.  When built with -fpie, %ebx, which is also the PIC
register, gets clobbered by the cpuid instruction.  Therefore the inline asm
tries to save off %ebx, but does so incorrectly.  It starts by loading
params.ebx to "D" which is %edi, but then the first instruction moves %ebx to
%edi, clobbering the input value.  Then after the operation is complete, "D"
(%edi) is stored to the local ebx variable, but only after the xchgl instruction
has happened, which means ebx holds only the PIC pointer.  This behavior was
causing strange segfaults for me when running the cpuid instruction.

The fix is pretty easy: split the asm into two separate directives, the first
saving ebx and using it to grab the appropriate cpuid info (and correctly
listing %edi as a clobbered register in the process), and then a subsequent asm
directive performing the reverse exchange (again, listing %edi as being
clobbered).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
---
 lib/librte_eal/common/eal_common_cpuflags.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..2072a0c 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -208,16 +208,19 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
 	asm volatile ( 
             "mov %%ebx, %%edi\n"
             "cpuid\n"
-            "xchgl %%ebx, %%edi;\n"
             : "=a" (eax),
-              "=D" (ebx),
+              "=b" (ebx),
               "=c" (ecx),
               "=d" (edx)
             /* input */
             : "a" (params.eax),
-              "D" (params.ebx),
+              "b" (params.ebx),
               "c" (params.ecx),
-              "d" (params.edx));
+              "d" (params.edx)
+	    : "%edi");
+
+	asm volatile ("xchgl %%ebx, %%edi;\n"
+		      : :);
 #endif
 
 	switch (params.return_register) {
-- 
1.8.3.1


* [dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features
  2014-03-18 20:43 [dpdk-dev] [PATCH] eal: fix up bad asm in rte_cpu_get_features Neil Horman
@ 2014-03-19 14:48 ` Neil Horman
  2014-03-19 15:44   ` H. Peter Anvin
  2014-03-20 11:42 ` [dpdk-dev] [PATCH v3] eal: Fix up assembly for x86_64 " Neil Horman
  1 sibling, 1 reply; 9+ messages in thread
From: Neil Horman @ 2014-03-19 14:48 UTC (permalink / raw)
  To: dev

The recent conversion to build dpdk as a DSO has an error in
rte_cpu_get_features.  When built with -fpie, %ebx, which is also the PIC
register, gets clobbered by the cpuid instruction.  Therefore the inline asm
tries to save off %ebx, but does so incorrectly.  It starts by loading
params.ebx to "D" which is %edi, but then the first instruction moves %ebx to
%edi, clobbering the input value.  Then after the operation is complete, "D"
(%edi) is stored to the local ebx variable, but only after the xchgl instruction
has happened, which means ebx holds only the PIC pointer.  This behavior was
causing strange segfaults for me when running the cpuid instruction.

The fix is pretty easy: split the asm into two separate directives, the first
saving ebx and using it to grab the appropriate cpuid info (and correctly
listing %edi as a clobbered register in the process), and then a subsequent asm
directive performing the reverse exchange (again, listing %edi as being
clobbered).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

---
Change notes

v2) Fix constraints to ensure that ebx isn't overwritten before asm starts
---
 lib/librte_eal/common/eal_common_cpuflags.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..75b505f 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -190,7 +190,7 @@ static const struct feature_entry cpu_feature_table[] = {
 static inline int
 rte_cpu_get_features(struct cpuid_parameters_t params)
 {
-	int eax, ebx, ecx, edx;            /* registers */
+	int eax, ebx, ecx, edx, oldebx;            /* registers */
 
 #ifndef __PIC__
    asm volatile ("cpuid"
@@ -206,18 +206,21 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
                    "d" (params.edx));
 #else
 	asm volatile ( 
-            "mov %%ebx, %%edi\n"
+            "xchgl %%ebx, %%edi\n"
             "cpuid\n"
-            "xchgl %%ebx, %%edi;\n"
             : "=a" (eax),
-              "=D" (ebx),
+              "=b" (ebx),
               "=c" (ecx),
-              "=d" (edx)
+              "=d" (edx),
+	      "=D" (oldebx)
             /* input */
             : "a" (params.eax),
               "D" (params.ebx),
               "c" (params.ecx),
               "d" (params.edx));
+
+	asm volatile ("xchgl %%ebx, %%edi;\n"
+		      : : "D" (oldebx));
 #endif
 
 	switch (params.return_register) {
-- 
1.8.3.1


* Re: [dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features
  2014-03-19 14:48 ` [dpdk-dev] [PATCH v2] " Neil Horman
@ 2014-03-19 15:44   ` H. Peter Anvin
  2014-03-20  0:40     ` Neil Horman
  0 siblings, 1 reply; 9+ messages in thread
From: H. Peter Anvin @ 2014-03-19 15:44 UTC (permalink / raw)
  To: Neil Horman, dev

On 03/19/2014 07:48 AM, Neil Horman wrote:
> The recent conversion to build dpdk as a DSO has an error in
> rte_cpu_get_features.  When built with -fpie, %ebx, which is also the PIC
> register, gets clobbered by the cpuid instruction.  Therefore the inline asm
> tries to save off %ebx, but does so incorrectly.  It starts by loading
> params.ebx to "D" which is %edi, but then the first instruction moves %ebx to
> %edi, clobbering the input value.  Then after the operation is complete, "D"
> (%edi) is stored to the local ebx variable, but only after the xchgl instruction
> has happened, which means ebx holds only the PIC pointer.  This behavior was
> causing strange segfaults for me when running the cpuid instruction.
> 
> The fix is pretty easy: split the asm into two separate directives, the first
> saving ebx and using it to grab the appropriate cpuid info (and correctly
> listing %edi as a clobbered register in the process), and then a subsequent asm
> directive performing the reverse exchange (again, listing %edi as being
> clobbered).
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 

Hi Neil :)

If I'm reading this correctly, this is at the very best extremely
dangerous (I'm confused why it would compile at all with PIC enabled),
since it leaves the CPU state in an unexpected way between two asm
statements, where the compiler is perfectly allowed to put code.

Instead, I would do simple xchg/xchg, which is an idiom I have used for
this particular purpose in a lot of code.  The minimal patch is simply
to change "mov" to "xchg" inside the asm statement.

There is no fundamental reason to nail down the register to %edi,
though; thus I would suggest instead:

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c
b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78cc2a48..6b75992fec1a 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -206,16 +206,16 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
                    "d" (params.edx));
 #else
        asm volatile (
-            "mov %%ebx, %%edi\n"
+            "xchgl %%ebx, %1\n"
             "cpuid\n"
-            "xchgl %%ebx, %%edi;\n"
+            "xchgl %%ebx, %1;\n"
             : "=a" (eax),
-              "=D" (ebx),
+              "=r" (ebx),
               "=c" (ecx),
               "=d" (edx)
             /* input */
             : "a" (params.eax),
-              "D" (params.ebx),
+              "1" (params.ebx),
               "c" (params.ecx),
               "d" (params.edx));
 #endif



* Re: [dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features
  2014-03-19 15:44   ` H. Peter Anvin
@ 2014-03-20  0:40     ` Neil Horman
  2014-03-20  4:22       ` H. Peter Anvin
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Horman @ 2014-03-20  0:40 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: dev

On Wed, Mar 19, 2014 at 08:44:46AM -0700, H. Peter Anvin wrote:
> On 03/19/2014 07:48 AM, Neil Horman wrote:
> > The recent conversion to build dpdk as a DSO has an error in
> > rte_cpu_get_features.  When built with -fpie, %ebx, which is also the PIC
> > register, gets clobbered by the cpuid instruction.  Therefore the inline asm
> > tries to save off %ebx, but does so incorrectly.  It starts by loading
> > params.ebx to "D" which is %edi, but then the first instruction moves %ebx to
> > %edi, clobbering the input value.  Then after the operation is complete, "D"
> > (%edi) is stored to the local ebx variable, but only after the xchgl instruction
> > has happened, which means ebx holds only the PIC pointer.  This behavior was
> > causing strange segfaults for me when running the cpuid instruction.
> > 
> > The fix is pretty easy: split the asm into two separate directives, the first
> > saving ebx and using it to grab the appropriate cpuid info (and correctly
> > listing %edi as a clobbered register in the process), and then a subsequent asm
> > directive performing the reverse exchange (again, listing %edi as being
> > clobbered).
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> 
So after some discussion with hpa, I need to self-NAK this again; apologies for
the noise.  There's some cleanup to be done in this area, and I'm still getting
a segfault that is in some way related to this code, but I need to dig deeper to
understand it.

Neil


* Re: [dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features
  2014-03-20  0:40     ` Neil Horman
@ 2014-03-20  4:22       ` H. Peter Anvin
  2014-03-20 11:03         ` Neil Horman
  0 siblings, 1 reply; 9+ messages in thread
From: H. Peter Anvin @ 2014-03-20  4:22 UTC (permalink / raw)
  To: Neil Horman, dev

On 03/19/2014 05:40 PM, Neil Horman wrote:
> So after some discussion with hpa, I need to self-NAK this again; apologies for
> the noise.  There's some cleanup to be done in this area, and I'm still getting
> a segfault that is in some way related to this code, but I need to dig deeper to
> understand it.
> 
> Neil

I still believe we should add the patch I posted in the previous email;
I should clean it up and put a proper header on it.

That is, if there is actually a need to feed %ebx and %edx into CPUID
(the native instruction is sensitive to %eax and %ecx, but not %ebx or
%edx.)

For reference, this is a version of CPUID I personally often use:

struct cpuid {
	unsigned int eax, ecx, edx, ebx;
};

static inline void cpuid(unsigned int leaf, unsigned int subleaf,
			 struct cpuid *out)
{
#if defined(__i386__) && defined(__PIC__)
	/* %ebx is a forbidden register */
	asm volatile("movl %%ebx,%0 ; cpuid ; xchgl %%ebx,%0"
		: "=r" (out->ebx),
		  "=a" (out->eax),
		  "=c" (out->ecx),
		  "=d" (out->edx)
		: "a" (leaf), "c" (subleaf));
#else
	asm volatile("cpuid"
		: "=b" (out->ebx),
		  "=a" (out->eax),
		  "=c" (out->ecx),
		  "=d" (out->edx)
		: "a" (leaf), "c" (subleaf));
#endif
}

... but that is a pretty significant API change.

Making it an inline lets gcc elide the entire memory structure, so that
is definitely useful.

>>
>> diff --git a/lib/librte_eal/common/eal_common_cpuflags.c
>> b/lib/librte_eal/common/eal_common_cpuflags.c
>> index 1ebf78cc2a48..6b75992fec1a 100644
>> --- a/lib/librte_eal/common/eal_common_cpuflags.c
>> +++ b/lib/librte_eal/common/eal_common_cpuflags.c
>> @@ -206,16 +206,16 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
>>                     "d" (params.edx));
>>  #else
>>         asm volatile (
>> -            "mov %%ebx, %%edi\n"
>> +            "xchgl %%ebx, %1\n"
>>              "cpuid\n"
>> -            "xchgl %%ebx, %%edi;\n"
>> +            "xchgl %%ebx, %1;\n"
>>              : "=a" (eax),
>> -              "=D" (ebx),
>> +              "=r" (ebx),
>>                "=c" (ecx),
>>                "=d" (edx)
>>              /* input */
>>              : "a" (params.eax),
>> -              "D" (params.ebx),
>> +              "1" (params.ebx),
>>                "c" (params.ecx),
>>                "d" (params.edx));
>>  #endif
>>

	-hpa


* Re: [dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features
  2014-03-20  4:22       ` H. Peter Anvin
@ 2014-03-20 11:03         ` Neil Horman
  2014-03-20 11:27           ` Neil Horman
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Horman @ 2014-03-20 11:03 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: dev

On Wed, Mar 19, 2014 at 09:22:03PM -0700, H. Peter Anvin wrote:
> On 03/19/2014 05:40 PM, Neil Horman wrote:
> > So after some discussion with hpa, I need to self-NAK this again; apologies for
> > the noise.  There's some cleanup to be done in this area, and I'm still getting
> > a segfault that is in some way related to this code, but I need to dig deeper to
> > understand it.
> > 
> > Neil
> 
> I still believe we should add the patch I posted in the previous email;
> I should clean it up and put a proper header on it.
> 
I agree, but the fact of the matter is that I'm still getting a segfault very
close to these instructions and I don't understand why yet.  I'd hate to just
make the problem go away without understanding the reason why.  The patch you
propose doesn't fix it (yet moving the xchgl to its own asm statement does).

> That is, if there is actually a need to feed %ebx and %edx into CPUID
> (the native instruction is sensitive to %eax and %ecx, but not %ebx or
> %edx.)
> 
> For reference, this is a version of CPUID I personally often use:
> 
> struct cpuid {
> 	unsigned int eax, ecx, edx, ebx;
> };
> 
> static inline void cpuid(unsigned int leaf, unsigned int subleaf,
> 			 struct cpuid *out)
> {
> #if defined(__i386__) && defined(__PIC__)
So, this is an additional difference and this in fact does make the problem
clear up.  By applying only this patch:

@@ -192,7 +192,7 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
 {
        int eax, ebx, ecx, edx;            /* registers */
 
-#ifndef __PIC__
+#if !defined(__PIC__) || !defined(__i386__)
    asm volatile ("cpuid"
                  /* output */
                  : "=a" (eax),

my build compiles the cpuid instruction branch, not the mov; cpuid; xchgl branch
(it's an x86_64 build).  Is there any reason that x86_64 doesn't need to save the
ebx register when running cpuid while building PIE code?

Neil


* Re: [dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features
  2014-03-20 11:03         ` Neil Horman
@ 2014-03-20 11:27           ` Neil Horman
  2014-03-20 15:20             ` H. Peter Anvin
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Horman @ 2014-03-20 11:27 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: dev

On Thu, Mar 20, 2014 at 07:03:23AM -0400, Neil Horman wrote:
> On Wed, Mar 19, 2014 at 09:22:03PM -0700, H. Peter Anvin wrote:
> > On 03/19/2014 05:40 PM, Neil Horman wrote:
> > > So after some discussion with hpa, I need to self-NAK this again; apologies for
> > > the noise.  There's some cleanup to be done in this area, and I'm still getting
> > > a segfault that is in some way related to this code, but I need to dig deeper to
> > > understand it.
> > > 
> > > Neil
> > 
> > I still believe we should add the patch I posted in the previous email;
> > I should clean it up and put a proper header on it.
> > 
> I agree, but the fact of the matter is that I'm still getting a segfault very
> close to these instructions and I don't understand why yet.  I'd hate to just
> make the problem go away without understanding the reason why.  The patch you
> propose doesn't fix it (yet moving the xchgl to its own asm statement does).
> 
> > That is, if there is actually a need to feed %ebx and %edx into CPUID
> > (the native instruction is sensitive to %eax and %ecx, but not %ebx or
> > %edx.)
> > 
> > For reference, this is a version of CPUID I personally often use:
> > 
> > struct cpuid {
> > 	unsigned int eax, ecx, edx, ebx;
> > };
> > 
> > static inline void cpuid(unsigned int leaf, unsigned int subleaf,
> > 			 struct cpuid *out)
> > {
> > #if defined(__i386__) && defined(__PIC__)
> So, this is an additional difference and this in fact does make the problem
> clear up.  By applying only this patch:
> 
> @@ -192,7 +192,7 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
>  {
>         int eax, ebx, ecx, edx;            /* registers */
>  
> -#ifndef __PIC__
> +#if !defined(__PIC__) || !defined(__i386__)
>     asm volatile ("cpuid"
>                   /* output */
>                   : "=a" (eax),
> 
> my build compiles the cpuid instruction branch, not the mov; cpuid; xchgl branch
> (it's an x86_64 build).  Is there any reason that x86_64 doesn't need to save the
> ebx register when running cpuid while building PIE code?
> 
> Neil
> 
> 
So, I answered my own question, sort of.  The __i386__ check is clear: x86_64
uses RIP-relative addressing, making the saving of ebx not needed - that's
perfectly clear.
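
A minimal sketch of that guard, for illustration only.  The macro and function
names are hypothetical; the condition itself is the one from the patch
fragment above, with the sense inverted.

```c
/* Hypothetical macro name; the condition mirrors the guard in the thread. */
#if defined(__i386__) && defined(__PIC__)
#define CPUID_MUST_PRESERVE_EBX 1	/* 32-bit PIC: %ebx holds the GOT pointer */
#else
#define CPUID_MUST_PRESERVE_EBX 0	/* non-PIC, or x86_64 (RIP-relative) */
#endif

/* Hypothetical helper so the decision can be inspected at run time. */
static inline int cpuid_must_preserve_ebx(void)
{
	return CPUID_MUST_PRESERVE_EBX;
}
```

On an x86_64 build this evaluates to 0 whether or not -fpie is in effect,
which is why such a build takes the plain cpuid branch.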

What's a bit less clear to me is why it matters.  Ideally moving ebx and
restoring it with an xchg shouldn't change the register state at all.  It would
clobber the lower part of rbx I think, but looking at the disassembly that
shouldn't be used, so as long as the calling function saves its value of rbx, it
should be ok.  The odd part is, if I look at the disassembly of
rte_cpu_get_flag_enabled compiled with and without the mov and xchgl operations,
I see that without those additional instructions the compiler adds a push rbx
and pop rbx instruction at the start and end of the assembly, but not when the
mov ebx, %0 and xchgl %ebx, %0 instructions are added.  I'm not sure what the
compiler is sensitive to when adding those instructions, but it seems like it
should be sensitive to the cpuid instruction, and should be adding it to both.

I'd like your thoughts on that, Peter, but either way it seems clear that the
mov/xchgl aren't needed for x86_64 code, so I'll clean that up and post a new
patch shortly.

Neil


* [dpdk-dev] [PATCH v3] eal: Fix up assembly for x86_64 in rte_cpu_get_features
  2014-03-18 20:43 [dpdk-dev] [PATCH] eal: fix up bad asm in rte_cpu_get_features Neil Horman
  2014-03-19 14:48 ` [dpdk-dev] [PATCH v2] " Neil Horman
@ 2014-03-20 11:42 ` Neil Horman
  1 sibling, 0 replies; 9+ messages in thread
From: Neil Horman @ 2014-03-20 11:42 UTC (permalink / raw)
  To: dev; +Cc: H. Peter Anvin

x86_64 doesn't need to save off and restore ebx when issuing cpuid, since x86_64
uses RIP relative addressing.  Doing the save actually clobbers the lower half
of rbx, which could be used and not saved off independently, leading to
undefined behavior.  Fix up the defines so that for x86_64 we just issue the
cpuid instruction, which is safe.  Also, while we're at it, let's clean up the
input and output constraints on the inline asm, so that we don't load registers
that the cpuid instruction isn't sensitive to.

Note that this patch does alter the API, in that values specified for ebx and edx
are ignored.  I chose to go ahead and do that because there is only a single
caller of this function and neither register is ever written currently.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
---
 lib/librte_eal/common/eal_common_cpuflags.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_cpuflags.c b/lib/librte_eal/common/eal_common_cpuflags.c
index 1ebf78c..0a18d53 100644
--- a/lib/librte_eal/common/eal_common_cpuflags.c
+++ b/lib/librte_eal/common/eal_common_cpuflags.c
@@ -192,7 +192,7 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
 {
 	int eax, ebx, ecx, edx;            /* registers */
 
-#ifndef __PIC__
+#if !defined(__PIC__) || !defined(__i386__)
    asm volatile ("cpuid"
                  /* output */
                  : "=a" (eax),
@@ -201,23 +201,19 @@ rte_cpu_get_features(struct cpuid_parameters_t params)
                    "=d" (edx)
                  /* input */
                  : "a" (params.eax),
-                   "b" (params.ebx),
-                   "c" (params.ecx),
-                   "d" (params.edx));
+                   "c" (params.ecx));
 #else
 	asm volatile ( 
-            "mov %%ebx, %%edi\n"
+            "mov %%ebx, %0\n"
             "cpuid\n"
-            "xchgl %%ebx, %%edi;\n"
-            : "=a" (eax),
-              "=D" (ebx),
+            "xchgl %%ebx, %0\n"
+            : "=r" (ebx),
+	      "=a" (eax),
               "=c" (ecx),
               "=d" (edx)
             /* input */
             : "a" (params.eax),
-              "D" (params.ebx),
-              "c" (params.ecx),
-              "d" (params.edx));
+              "c" (params.ecx));
 #endif
 
 	switch (params.return_register) {
-- 
1.8.3.1


* Re: [dpdk-dev] [PATCH v2] eal: fix up bad asm in rte_cpu_get_features
  2014-03-20 11:27           ` Neil Horman
@ 2014-03-20 15:20             ` H. Peter Anvin
  0 siblings, 0 replies; 9+ messages in thread
From: H. Peter Anvin @ 2014-03-20 15:20 UTC (permalink / raw)
  To: Neil Horman; +Cc: dev

On 03/20/2014 04:27 AM, Neil Horman wrote:
>>
> So, I answered my own question, sort of.  The __i386__ check is clear: x86_64
> uses RIP-relative addressing, making the saving of ebx not needed - that's
> perfectly clear.
> 
> What's a bit less clear to me is why it matters.  Ideally moving ebx and
> restoring it with an xchg shouldn't change the register state at all.  It would
> clobber the lower part of rbx I think, but looking at the disassembly that
> shouldn't be used, so as long as the calling function saves its value of rbx, it
> should be ok.

I think you just hit on the real bug.

If this code were compiled on 64 bits, it would clobber the *upper* half
of %rbx, because a 32-bit operation in 64-bit mode zeroes the upper half
of the destination register.  Since the compiler isn't being told that
%rbx is being modified, it expects %rbx to be unmodified and disaster ensues.

It just clicked for me, though, that this function is actually a static
function in a .c file, meaning it is not an API at all.  This code can
be simplified dramatically as a result.

Let me see if I can hack up something quickly.

> The odd part is, if I look at the disassembly of
> rte_cpu_get_flag_enabled compiled with and without the mov and xchgl operations,
> I see that without those additional instructions the compiler adds a push rbx
> and pop rbx instruction at the start and end of the assembly, but not when the
> mov ebx, %0 and xchgl %ebx, %0 instructions are added.  I'm not sure what the
> compiler is sensitive to when adding those instructions, but it seems like it
> should be sensitive to the cpuid instruction, and should be adding it to both.

It's not the instruction, it is the fact that the constraints include a
"=b".

This explains why your little hack happens to work... I was wondering
how it compiled at all.  The answer, of course, is that it is on x86-64
where the hack is neither necessary nor correct.

	-hpa


Thread overview: 9+ messages
2014-03-18 20:43 [dpdk-dev] [PATCH] eal: fix up bad asm in rte_cpu_get_features Neil Horman
2014-03-19 14:48 ` [dpdk-dev] [PATCH v2] " Neil Horman
2014-03-19 15:44   ` H. Peter Anvin
2014-03-20  0:40     ` Neil Horman
2014-03-20  4:22       ` H. Peter Anvin
2014-03-20 11:03         ` Neil Horman
2014-03-20 11:27           ` Neil Horman
2014-03-20 15:20             ` H. Peter Anvin
2014-03-20 11:42 ` [dpdk-dev] [PATCH v3] eal: Fix up assembly for x86_64 " Neil Horman
