DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] ixgbe: convert sse intrinsics to use __builtin	variants
Date: Sat, 26 Jul 2014 11:15:01 +0000	[thread overview]
Message-ID: <2601191342CEEE43887BDE71AB97725821343E33@IRSMSX105.ger.corp.intel.com> (raw)
In-Reply-To: <1406226211-1364-1-git-send-email-nhorman@tuxdriver.com>


Hi Neil,
 
> The ixgbe pmd currently can't be built without enabling sse instructions at
> compile time.

Actually it can, all you have to do is set RTE_IXGBE_INC_VECTOR=n in your config.

>  While sse extensions provide better performance, theres no reason
> that we can't still create builds to run on systems that don't support sse.  If
> we modify the ixgbe code to use the __builtin_shuffle and __builtin_popcountll
> functions, I've confirmed that the gcc compiler emits the appropriate sse
> instructions when the provided -march parameter indicates a machine that
> includes sse support, and emits generic code when see isn't available.

I don't think it is ok to blindly replace _mm_shuffle_epi8 with __builtin_shuffle.
They are not identical.
I tried your patch on IVB box (gcc 4.8.3, CONFIG_RTE_MACHINE="native").
The result is - ixgbe_recv_pkts_vec() functionality is broken.
See below for more details.
So my vote is NACK.
Konstantin 
 
1. Code changes:
uint16_t
ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
                uint16_t nb_pkts)
{
        ...
        __m128i shuf_msk;
        ...
        /* mask to shuffle from desc. to mbuf */
        shuf_msk = _mm_set_epi8(
                7, 6, 5, 4,  /* octet 4~7, 32bits rss */
                0xFF, 0xFF,  /* skip high 16 bits vlan_macip, zero out */
                15, 14,      /* octet 14~15, low 16 bits vlan_macip */
                0xFF, 0xFF,  /* skip high 16 bits pkt_len, zero out */
                13, 12,      /* octet 12~13, low 16 bits pkt_len */
                0xFF, 0xFF,  /* skip nb_segs and in_port, zero out */
                13, 12       /* octet 12~13, 16 bits data_len */
                );
         ...
         for (...) {
                __m128i descs[RTE_IXGBE_DESCS_PER_LOOP];
                __m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
                ...
                descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); 
                ...
-               pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk);
+              pkt_mb4 = __builtin_shuffle(descs[3], shuf_msk);
...

2. Code generated before the patch (valid one):
...
vmovdqa 0x4978d(%rip),%xmm0           /* load shuf_msk */
...
vmovdqu 0x30(%rdx),%xmm4                /* load desc[3] */
....
vpshufb %xmm0,%xmm4,%xmm8
....

3. Code generated after the patch applied (broken one):
...
vmovdqu 0x30(%rdx),%xmm3
...
vpunpcklqdq %xmm3,%xmm3,%xmm3       /* !!! ERROR - should be vpshufb !!!! */

4. What happens here?
My understanding:

GCC treats __m128i as vector of two 64bit integers:
/lib/gcc/x86_64-redhat-linux/4.8.3/include/emmintrin.h:typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));

From https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html:
"... Vector shuffling is available using functions __builtin_shuffle (vec, mask) and __builtin_shuffle (vec0, vec1, mask). Both functions construct a permutation of elements from one or two vectors and return a vector of the same type as the input vector(s). The mask is an integral vector with the same width (W) and element count (N) as the output vector.

The elements of the input vectors are numbered in memory ordering of vec0 beginning at 0 and vec1 beginning at N. The elements of mask are considered modulo N in the single-operand case and modulo 2*N in the two-operand case."

For m128i N = 2, so:

__m128i x, __m128i msk;
x = __builtin_shuffle(x, msk);

means:

index0 = msk[0..63] % 2;
index1 = msk[64..127] % 2;
x[0..63] =  x[index0 * 64..index0*64+63];
x[64..127] =  x[index1 * 64..index1*64+63];

In ixgbe_recv_pkts_vec() shuf_msk[0..63] % 2 == 0 and  shuf_msk[64..127] % 2.
So compiler makes optimisation:
pkt_mb4[0..63]  = descs[3] [0..63];
pkt_mb4[64..127]  = descs[3] [0..63];
i.e:
vpunpcklqdq %xmm3,%xmm3,%xmm3

BTW, changing to
__builtin_shuffle((__v16qi) descs[3], (__v16qi)shuf_msk);
wouldn't help either.
 In that case __builtin_shuffle will consider elements of mas modulo 16.
While _mm_shuffle_epi8 (PSHUFB) is expected to zero destination byte if upper bit in the corresponding mask byte is 1.

 


  parent reply	other threads:[~2014-07-26 11:13 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-24 18:23 Neil Horman
2014-07-24 20:41 ` Venkatesan, Venky
2014-07-25  2:20   ` Neil Horman
2014-07-26 11:15 ` Ananyev, Konstantin [this message]
2014-07-28 13:32   ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2601191342CEEE43887BDE71AB97725821343E33@IRSMSX105.ger.corp.intel.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    --cc=nhorman@tuxdriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).