From: Neil Horman <nhorman@tuxdriver.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] ixgbe: convert sse intrinsics to use __builtin variants
Date: Mon, 28 Jul 2014 09:32:23 -0400 [thread overview]
Message-ID: <20140728133223.GA25278@hmsreliant.think-freely.org> (raw)
In-Reply-To: <2601191342CEEE43887BDE71AB97725821343E33@IRSMSX105.ger.corp.intel.com>
On Sat, Jul 26, 2014 at 11:15:01AM +0000, Ananyev, Konstantin wrote:
>
> Hi Neil,
>
> > The ixgbe pmd currently can't be built without enabling sse instructions at
> > compile time.
>
> Actually it can, all you have to do is set RTE_IXGBE_INC_VECTOR=n in your config.
>
> > While sse extensions provide better performance, theres no reason
> > that we can't still create builds to run on systems that don't support sse. If
> > we modify the ixgbe code to use the __builtin_shuffle and __builtin_popcountll
> > functions, I've confirmed that the gcc compiler emits the appropriate sse
> > instructions when the provided -march parameter indicates a machine that
> > includes sse support, and emits generic code when see isn't available.
>
> I don't think it is ok to blindly replace _mm_shuffle_epi8 with __builtin_shuffle.
> They are not identical.
> I tried your patch on IVB box (gcc 4.8.3, CONFIG_RTE_MACHINE="native").
> The result is - ixgbe_recv_pkts_vec() functionality is broken.
> See below for more details.
> So my vote is NACK.
> Konstantin
>
> 1. Code changes:
> uint16_t
> ixgbe_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
> uint16_t nb_pkts)
> {
> ...
> __m128i shuf_msk;
> ...
> /* mask to shuffle from desc. to mbuf */
> shuf_msk = _mm_set_epi8(
> 7, 6, 5, 4, /* octet 4~7, 32bits rss */
> 0xFF, 0xFF, /* skip high 16 bits vlan_macip, zero out */
> 15, 14, /* octet 14~15, low 16 bits vlan_macip */
> 0xFF, 0xFF, /* skip high 16 bits pkt_len, zero out */
> 13, 12, /* octet 12~13, low 16 bits pkt_len */
> 0xFF, 0xFF, /* skip nb_segs and in_port, zero out */
> 13, 12 /* octet 12~13, 16 bits data_len */
> );
> ...
> for (...) {
> __m128i descs[RTE_IXGBE_DESCS_PER_LOOP];
> __m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
> ...
> descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
> ...
> - pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk);
> + pkt_mb4 = __builtin_shuffle(descs[3], shuf_msk);
> ...
>
> 2. Code generated before the patch (valid one):
> ...
> vmovdqa 0x4978d(%rip),%xmm0 /* load shuf_msk */
> ...
> vmovdqu 0x30(%rdx),%xmm4 /* load desc[3] */
> ....
> vpshufb %xmm0,%xmm4,%xmm8
> ....
>
> 3. Code generated after the patch applied (broken one):
> ...
> vmovdqu 0x30(%rdx),%xmm3
> ...
> vpunpcklqdq %xmm3,%xmm3,%xmm3 /* !!! ERROR - should be vpshufb !!!! */
>
> 4. What happens here?
> My understanding:
>
> GCC treats __m128i as vector of two 64bit integers:
> /lib/gcc/x86_64-redhat-linux/4.8.3/include/emmintrin.h:typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__));
>
> From https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html:
> "... Vector shuffling is available using functions __builtin_shuffle (vec, mask) and __builtin_shuffle (vec0, vec1, mask). Both functions construct a permutation of elements from one or two vectors and return a vector of the same type as the input vector(s). The mask is an integral vector with the same width (W) and element count (N) as the output vector.
>
> The elements of the input vectors are numbered in memory ordering of vec0 beginning at 0 and vec1 beginning at N. The elements of mask are considered modulo N in the single-operand case and modulo 2*N in the two-operand case."
>
> For m128i N = 2, so:
>
> __m128i x, __m128i msk;
> x = __builtin_shuffle(x, msk);
>
> means:
>
> index0 = msk[0..63] % 2;
> index1 = msk[64..127] % 2;
> x[0..63] = x[index0 * 64..index0*64+63];
> x[64..127] = x[index1 * 64..index1*64+63];
>
> In ixgbe_recv_pkts_vec() shuf_msk[0..63] % 2 == 0 and shuf_msk[64..127] % 2.
> So compiler makes optimisation:
> pkt_mb4[0..63] = descs[3] [0..63];
> pkt_mb4[64..127] = descs[3] [0..63];
> i.e:
> vpunpcklqdq %xmm3,%xmm3,%xmm3
>
> BTW, changing to
> __builtin_shuffle((__v16qi) descs[3], (__v16qi)shuf_msk);
> wouldn't help either.
> In that case __builtin_shuffle will consider elements of mas modulo 16.
> While _mm_shuffle_epi8 (PSHUFB) is expected to zero destination byte if upper bit in the corresponding mask byte is 1.
>
Ok, Sorry for the delayed response, I spend the weekend reading up on the
differences between these instructions, and you're right, I thought those
operations were equivalent, but the zeroing operation differentiates them.
Sorry about that.
That said, it would be really helpful for distribution packaging to be able to
enable vectorized reception at run time. A compile time build variable really
just isn't very helpful. I'll see if I can rework the patch to allow optional
vectorized patch reception at run time.
Neil
>
>
prev parent reply other threads:[~2014-07-28 13:30 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-24 18:23 Neil Horman
2014-07-24 20:41 ` Venkatesan, Venky
2014-07-25 2:20 ` Neil Horman
2014-07-26 11:15 ` Ananyev, Konstantin
2014-07-28 13:32 ` Neil Horman [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140728133223.GA25278@hmsreliant.think-freely.org \
--to=nhorman@tuxdriver.com \
--cc=dev@dpdk.org \
--cc=konstantin.ananyev@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).