From: "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>
To: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>,
Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
Bruce Richardson <bruce.richardson@intel.com>,
Gavin Hu <gavin.hu@arm.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] lmp: add lookup x4 with x4 default values
Date: Mon, 13 Jan 2020 17:48:11 +0000 [thread overview]
Message-ID: <57c112b9-cacc-b355-f69c-6b60986c5d4d@intel.com> (raw)
In-Reply-To: <CY4PR1801MB1863BEB4D262A77F0E8FF19FDE350@CY4PR1801MB1863.namprd18.prod.outlook.com>
Hi,
On 13/01/2020 12:34, Pavan Nikhilesh Bhagavatula wrote:
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Medvedkin,
>> Vladimir
>> Sent: Monday, January 13, 2020 4:37 PM
>> To: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>; Jerin
>> Jacob Kollanukkaran <jerinj@marvell.com>; Bruce Richardson
>> <bruce.richardson@intel.com>; Gavin Hu <gavin.hu@arm.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] lmp: add lookup x4 with x4 default
>> values
>>
>> Hi Pavan,
>>
> Hi Medvedkin,
>
>> I don't think it is a good idea to add extra function because:
>>
>> 1) it is just a copy of an existing rte_lpm_lookupx4() except the last 4
>> ternary ops
> Yes, but I had no other option as modifying the current function will break ABI ☹.
>
>> 2) What is a real world use case for that? Usually returned value is
>> used as an index in an array of next_hop structs.
> If we take l3fwd as an example the next hop holds fwd port_id whereas the default value
> Passed holds mbuf->port. This allows Tx without having a branch.
>
> Event devices can aggregate packets from multiple ethernet ports and schedule them on
> a core. The current API requires us to pass a BAD_PORT and compare the result for every
> packet but if we are allowed to pass 4 different default values we could seamlessly send
> them for Tx.
>
>> 3) You can have the same result by using special unused defv and
>> pcmpeqd/vpblendd on a hop[4] after lookup
> Yes, but sadly that would be architecture depended.
But rte_lpm_lookupx4() itself is architecture depended. My suggestion
here would be - implement rte_lpm_lookupx4_defx4() in arch specific .c
files as a wraper around rte_lpm_lookupx4() and do pcmpeqd/vpblendd
stuff after. In this case you won't need to copy all of this implemented
code.
>
>> On 11/01/2020 16:08, pbhagavatula@marvell.com wrote:
>>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>>
>>> Add lookup x4 with x4 default values.
>>> This can be used in usecases where we have to process burst of
>> packets
>>> from different ports.
>>>
>>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>> ---
>>> app/test/test_lpm_perf.c | 31 +++++++++
>>> lib/librte_lpm/rte_lpm.h | 23 +++++++
>>> lib/librte_lpm/rte_lpm_altivec.h | 109
>> +++++++++++++++++++++++++++++++
>>> lib/librte_lpm/rte_lpm_neon.h | 102
>> +++++++++++++++++++++++++++++
>>> lib/librte_lpm/rte_lpm_sse.h | 104
>> +++++++++++++++++++++++++++++
>>> 5 files changed, 369 insertions(+)
>>>
>>> diff --git a/app/test/test_lpm_perf.c b/app/test/test_lpm_perf.c
>>> index a2578fe90..8e9d4c7eb 100644
>>> --- a/app/test/test_lpm_perf.c
>>> +++ b/app/test/test_lpm_perf.c
>>> @@ -460,6 +460,37 @@ test_lpm_perf(void)
>>> (double)total_time / ((double)ITERATIONS *
>> BATCH_SIZE),
>>> (count * 100.0) / (double)(ITERATIONS *
>> BATCH_SIZE));
>>> + /* Measure LookupX4 DefaultX4 */
>>> + total_time = 0;
>>> + count = 0;
>>> + uint32_t def[4] = {UINT32_MAX, UINT32_MAX, UINT32_MAX,
>> UINT32_MAX};
>>> + for (i = 0; i < ITERATIONS; i++) {
>>> + static uint32_t ip_batch[BATCH_SIZE];
>>> + uint32_t next_hops[4];
>>> +
>>> + /* Create array of random IP addresses */
>>> + for (j = 0; j < BATCH_SIZE; j++)
>>> + ip_batch[j] = rte_rand();
>>> +
>>> + /* Lookup per batch */
>>> + begin = rte_rdtsc();
>>> + for (j = 0; j < BATCH_SIZE; j += RTE_DIM(next_hops)) {
>>> + unsigned int k;
>>> + xmm_t ipx4;
>>> +
>>> + ipx4 = vect_loadu_sil128((xmm_t *)(ip_batch +
>> j));
>>> + ipx4 = *(xmm_t *)(ip_batch + j);
>>> + rte_lpm_lookupx4_defx4(lpm, ipx4, next_hops,
>> def);
>>> + for (k = 0; k < RTE_DIM(next_hops); k++)
>>> + if (unlikely(next_hops[k] ==
>> UINT32_MAX))
>>> + count++;
>>> + }
>>> +
>>> + total_time += rte_rdtsc() - begin;
>>> + }
>>> + printf("LPM LookupX4 Defx4: %.1f cycles (fails = %.1f%%)\n",
>>> + (double)total_time / ((double)ITERATIONS *
>> BATCH_SIZE),
>>> + (count * 100.0) / (double)(ITERATIONS *
>> BATCH_SIZE));
>>> /* Measure Delete */
>>> status = 0;
>>> begin = rte_rdtsc();
>>> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
>>> index b9d49ac87..e66b43e06 100644
>>> --- a/lib/librte_lpm/rte_lpm.h
>>> +++ b/lib/librte_lpm/rte_lpm.h
>>> @@ -370,6 +370,29 @@ static inline void
>>> rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t
>> hop[4],
>>> uint32_t defv);
>>>
>>> +/**
>>> + * Lookup four IP addresses in an LPM table.
>>> + *
>>> + * @param lpm
>>> + * LPM object handle
>>> + * @param ip
>>> + * Four IPs to be looked up in the LPM table
>>> + * @param hop
>>> + * Next hop of the most specific rule found for IP (valid on lookup
>> hit only).
>>> + * This is an 4 elements array of two byte values.
>>> + * If the lookup was successful for the given IP, then least significant
>> byte
>>> + * of the corresponding element is the actual next hop and the
>> most
>>> + * significant byte is zero.
>>> + * If the lookup for the given IP failed, then corresponding element
>> would
>>> + * contain default value, see description of then next parameter.
>>> + * @param defv
>>> + * Default value[] to populate into corresponding element of hop[]
>> array,
>>> + * if lookup would fail.
>>> + */
>>> +static inline void
>>> +rte_lpm_lookupx4_defx4(const struct rte_lpm *lpm, xmm_t ip,
>> uint32_t hop[4],
>>> + uint32_t defv[4]);
>>> +
>>> #if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>>> #include "rte_lpm_neon.h"
>>> #elif defined(RTE_ARCH_PPC_64)
>>> diff --git a/lib/librte_lpm/rte_lpm_altivec.h
>> b/lib/librte_lpm/rte_lpm_altivec.h
>>> index 228c41b38..1afc7bd74 100644
>>> --- a/lib/librte_lpm/rte_lpm_altivec.h
>>> +++ b/lib/librte_lpm/rte_lpm_altivec.h
>>> @@ -120,6 +120,115 @@ rte_lpm_lookupx4(const struct rte_lpm
>> *lpm, xmm_t ip, uint32_t hop[4],
>>> hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] &
>> 0x00FFFFFF : defv;
>>> }
>>>
>>> +static inline void
>>> +rte_lpm_lookupx4_defx4(const struct rte_lpm *lpm, xmm_t ip,
>> uint32_t hop[4],
>>> + uint32_t defv[4])
>>> +{
>>> + vector signed int i24;
>>> + rte_xmm_t i8;
>>> + uint32_t tbl[4];
>>> + uint64_t idx, pt, pt2;
>>> + const uint32_t *ptbl;
>>> +
>>> + const uint32_t mask = UINT8_MAX;
>>> + const vector signed int mask8 = (xmm_t){mask, mask, mask,
>> mask};
>>> +
>>> + /*
>>> + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 2 LPM entries
>>> + * as one 64-bit value (0x0300000003000000).
>>> + */
>>> + const uint64_t mask_xv =
>>> + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK |
>>> + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK <<
>> 32);
>>> +
>>> + /*
>>> + * RTE_LPM_LOOKUP_SUCCESS for 2 LPM entries
>>> + * as one 64-bit value (0x0100000001000000).
>>> + */
>>> + const uint64_t mask_v =
>>> + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS |
>>> + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32);
>>> +
>>> + /* get 4 indexes for tbl24[]. */
>>> + i24 = vec_sr((xmm_t) ip,
>>> + (vector unsigned int){CHAR_BIT, CHAR_BIT, CHAR_BIT,
>> CHAR_BIT});
>>> +
>>> + /* extract values from tbl24[] */
>>> + idx = (uint32_t)i24[0];
>>> + idx = idx < (1<<24) ? idx : (1<<24)-1;
>>> + ptbl = (const uint32_t *)&lpm->tbl24[idx];
>>> + tbl[0] = *ptbl;
>>> +
>>> + idx = (uint32_t) i24[1];
>>> + idx = idx < (1<<24) ? idx : (1<<24)-1;
>>> + ptbl = (const uint32_t *)&lpm->tbl24[idx];
>>> + tbl[1] = *ptbl;
>>> +
>>> + idx = (uint32_t) i24[2];
>>> + idx = idx < (1<<24) ? idx : (1<<24)-1;
>>> + ptbl = (const uint32_t *)&lpm->tbl24[idx];
>>> + tbl[2] = *ptbl;
>>> +
>>> + idx = (uint32_t) i24[3];
>>> + idx = idx < (1<<24) ? idx : (1<<24)-1;
>>> + ptbl = (const uint32_t *)&lpm->tbl24[idx];
>>> + tbl[3] = *ptbl;
>>> +
>>> + /* get 4 indexes for tbl8[]. */
>>> + i8.x = vec_and(ip, mask8);
>>> +
>>> + pt = (uint64_t)tbl[0] |
>>> + (uint64_t)tbl[1] << 32;
>>> + pt2 = (uint64_t)tbl[2] |
>>> + (uint64_t)tbl[3] << 32;
>>> +
>>> + /* search successfully finished for all 4 IP addresses. */
>>> + if (likely((pt & mask_xv) == mask_v) &&
>>> + likely((pt2 & mask_xv) == mask_v)) {
>>> + *(uint64_t *)hop = pt & RTE_LPM_MASKX4_RES;
>>> + *(uint64_t *)(hop + 2) = pt2 & RTE_LPM_MASKX4_RES;
>>> + return;
>>> + }
>>> +
>>> + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[0] = i8.u32[0] +
>>> + (uint8_t)tbl[0] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[0]];
>>> + tbl[0] = *ptbl;
>>> + }
>>> + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK)
>> ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[1] = i8.u32[1] +
>>> + (uint8_t)tbl[1] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[1]];
>>> + tbl[1] = *ptbl;
>>> + }
>>> + if (unlikely((pt2 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[2] = i8.u32[2] +
>>> + (uint8_t)tbl[2] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[2]];
>>> + tbl[2] = *ptbl;
>>> + }
>>> + if (unlikely((pt2 >> 32 &
>> RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[3] = i8.u32[3] +
>>> + (uint8_t)tbl[3] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[3]];
>>> + tbl[3] = *ptbl;
>>> + }
>>> +
>>> + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[0] &
>> 0x00FFFFFF :
>>> +
>> defv[0];
>>> + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[1] &
>> 0x00FFFFFF :
>>> +
>> defv[1];
>>> + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[2] &
>> 0x00FFFFFF :
>>> +
>> defv[2];
>>> + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] &
>> 0x00FFFFFF :
>>> +
>> defv[3];
>>> +}
>>> +
>>> #ifdef __cplusplus
>>> }
>>> #endif
>>> diff --git a/lib/librte_lpm/rte_lpm_neon.h
>> b/lib/librte_lpm/rte_lpm_neon.h
>>> index 6c131d312..6ef635b18 100644
>>> --- a/lib/librte_lpm/rte_lpm_neon.h
>>> +++ b/lib/librte_lpm/rte_lpm_neon.h
>>> @@ -113,6 +113,108 @@ rte_lpm_lookupx4(const struct rte_lpm
>> *lpm, xmm_t ip, uint32_t hop[4],
>>> hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] &
>> 0x00FFFFFF : defv;
>>> }
>>>
>>> +static inline void
>>> +rte_lpm_lookupx4_defx4(const struct rte_lpm *lpm, xmm_t ip,
>> uint32_t hop[4],
>>> + uint32_t defv[4])
>>> +{
>>> + uint32x4_t i24;
>>> + rte_xmm_t i8;
>>> + uint32_t tbl[4];
>>> + uint64_t idx, pt, pt2;
>>> + const uint32_t *ptbl;
>>> +
>>> + const uint32_t mask = UINT8_MAX;
>>> + const int32x4_t mask8 = vdupq_n_s32(mask);
>>> +
>>> + /*
>>> + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 2 LPM entries
>>> + * as one 64-bit value (0x0300000003000000).
>>> + */
>>> + const uint64_t mask_xv =
>>> + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK |
>>> + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK <<
>> 32);
>>> +
>>> + /*
>>> + * RTE_LPM_LOOKUP_SUCCESS for 2 LPM entries
>>> + * as one 64-bit value (0x0100000001000000).
>>> + */
>>> + const uint64_t mask_v =
>>> + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS |
>>> + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32);
>>> +
>>> + /* get 4 indexes for tbl24[]. */
>>> + i24 = vshrq_n_u32((uint32x4_t)ip, CHAR_BIT);
>>> +
>>> + /* extract values from tbl24[] */
>>> + idx = vgetq_lane_u64((uint64x2_t)i24, 0);
>>> +
>>> + ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx];
>>> + tbl[0] = *ptbl;
>>> + ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32];
>>> + tbl[1] = *ptbl;
>>> +
>>> + idx = vgetq_lane_u64((uint64x2_t)i24, 1);
>>> +
>>> + ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx];
>>> + tbl[2] = *ptbl;
>>> + ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32];
>>> + tbl[3] = *ptbl;
>>> +
>>> + /* get 4 indexes for tbl8[]. */
>>> + i8.x = vandq_s32(ip, mask8);
>>> +
>>> + pt = (uint64_t)tbl[0] |
>>> + (uint64_t)tbl[1] << 32;
>>> + pt2 = (uint64_t)tbl[2] |
>>> + (uint64_t)tbl[3] << 32;
>>> +
>>> + /* search successfully finished for all 4 IP addresses. */
>>> + if (likely((pt & mask_xv) == mask_v) &&
>>> + likely((pt2 & mask_xv) == mask_v)) {
>>> + *(uint64_t *)hop = pt & RTE_LPM_MASKX4_RES;
>>> + *(uint64_t *)(hop + 2) = pt2 & RTE_LPM_MASKX4_RES;
>>> + return;
>>> + }
>>> +
>>> + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[0] = i8.u32[0] +
>>> + (uint8_t)tbl[0] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[0]];
>>> + tbl[0] = *ptbl;
>>> + }
>>> + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK)
>> ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[1] = i8.u32[1] +
>>> + (uint8_t)tbl[1] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[1]];
>>> + tbl[1] = *ptbl;
>>> + }
>>> + if (unlikely((pt2 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[2] = i8.u32[2] +
>>> + (uint8_t)tbl[2] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[2]];
>>> + tbl[2] = *ptbl;
>>> + }
>>> + if (unlikely((pt2 >> 32 &
>> RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[3] = i8.u32[3] +
>>> + (uint8_t)tbl[3] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[3]];
>>> + tbl[3] = *ptbl;
>>> + }
>>> +
>>> + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[0] &
>> 0x00FFFFFF :
>>> +
>> defv[0];
>>> + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[1] &
>> 0x00FFFFFF :
>>> +
>> defv[1];
>>> + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[2] &
>> 0x00FFFFFF :
>>> +
>> defv[2];
>>> + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] &
>> 0x00FFFFFF :
>>> +
>> defv[3];
>>> +}
>>> +
>>> #ifdef __cplusplus
>>> }
>>> #endif
>>> diff --git a/lib/librte_lpm/rte_lpm_sse.h
>> b/lib/librte_lpm/rte_lpm_sse.h
>>> index 44770b6ff..6ef15816c 100644
>>> --- a/lib/librte_lpm/rte_lpm_sse.h
>>> +++ b/lib/librte_lpm/rte_lpm_sse.h
>>> @@ -114,6 +114,110 @@ rte_lpm_lookupx4(const struct rte_lpm
>> *lpm, xmm_t ip, uint32_t hop[4],
>>> hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] &
>> 0x00FFFFFF : defv;
>>> }
>>>
>>> +static inline void
>>> +rte_lpm_lookupx4_defx4(const struct rte_lpm *lpm, xmm_t ip,
>> uint32_t hop[4],
>>> + uint32_t defv[4])
>>> +{
>>> + __m128i i24;
>>> + rte_xmm_t i8;
>>> + uint32_t tbl[4];
>>> + uint64_t idx, pt, pt2;
>>> + const uint32_t *ptbl;
>>> +
>>> + const __m128i mask8 =
>>> + _mm_set_epi32(UINT8_MAX, UINT8_MAX,
>> UINT8_MAX, UINT8_MAX);
>>> +
>>> + /*
>>> + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 2 LPM entries
>>> + * as one 64-bit value (0x0300000003000000).
>>> + */
>>> + const uint64_t mask_xv =
>>> + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK |
>>> + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK <<
>> 32);
>>> +
>>> + /*
>>> + * RTE_LPM_LOOKUP_SUCCESS for 2 LPM entries
>>> + * as one 64-bit value (0x0100000001000000).
>>> + */
>>> + const uint64_t mask_v =
>>> + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS |
>>> + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32);
>>> +
>>> + /* get 4 indexes for tbl24[]. */
>>> + i24 = _mm_srli_epi32(ip, CHAR_BIT);
>>> +
>>> + /* extract values from tbl24[] */
>>> + idx = _mm_cvtsi128_si64(i24);
>>> + /* With -O0 option, gcc 4.8 - 5.4 fails to fold sizeof() into a
>> constant */
>>> + i24 = _mm_srli_si128(i24, /* sizeof(uint64_t) */ 8);
>>> +
>>> + ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx];
>>> + tbl[0] = *ptbl;
>>> + ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32];
>>> + tbl[1] = *ptbl;
>>> +
>>> + idx = _mm_cvtsi128_si64(i24);
>>> +
>>> + ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx];
>>> + tbl[2] = *ptbl;
>>> + ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32];
>>> + tbl[3] = *ptbl;
>>> +
>>> + /* get 4 indexes for tbl8[]. */
>>> + i8.x = _mm_and_si128(ip, mask8);
>>> +
>>> + pt = (uint64_t)tbl[0] |
>>> + (uint64_t)tbl[1] << 32;
>>> + pt2 = (uint64_t)tbl[2] |
>>> + (uint64_t)tbl[3] << 32;
>>> +
>>> + /* search successfully finished for all 4 IP addresses. */
>>> + if (likely((pt & mask_xv) == mask_v) &&
>>> + likely((pt2 & mask_xv) == mask_v)) {
>>> + *(uint64_t *)hop = pt & RTE_LPM_MASKX4_RES;
>>> + *(uint64_t *)(hop + 2) = pt2 & RTE_LPM_MASKX4_RES;
>>> + return;
>>> + }
>>> +
>>> + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[0] = i8.u32[0] +
>>> + (uint8_t)tbl[0] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[0]];
>>> + tbl[0] = *ptbl;
>>> + }
>>> + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK)
>> ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[1] = i8.u32[1] +
>>> + (uint8_t)tbl[1] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[1]];
>>> + tbl[1] = *ptbl;
>>> + }
>>> + if (unlikely((pt2 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[2] = i8.u32[2] +
>>> + (uint8_t)tbl[2] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[2]];
>>> + tbl[2] = *ptbl;
>>> + }
>>> + if (unlikely((pt2 >> 32 &
>> RTE_LPM_VALID_EXT_ENTRY_BITMASK) ==
>>> + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) {
>>> + i8.u32[3] = i8.u32[3] +
>>> + (uint8_t)tbl[3] *
>> RTE_LPM_TBL8_GROUP_NUM_ENTRIES;
>>> + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[3]];
>>> + tbl[3] = *ptbl;
>>> + }
>>> +
>>> + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[0] &
>> 0x00FFFFFF :
>>> +
>> defv[0];
>>> + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[1] &
>> 0x00FFFFFF :
>>> +
>> defv[1];
>>> + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[2] &
>> 0x00FFFFFF :
>>> +
>> defv[2];
>>> + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] &
>> 0x00FFFFFF :
>>> +
>> defv[3];
>>> +}
>>> +
>>> #ifdef __cplusplus
>>> }
>>> #endif
>> --
>> Regards,
>> Vladimir
--
Regards,
Vladimir
prev parent reply other threads:[~2020-01-13 17:48 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-11 16:08 pbhagavatula
2020-01-13 11:06 ` Medvedkin, Vladimir
2020-01-13 12:34 ` Pavan Nikhilesh Bhagavatula
2020-01-13 17:48 ` Medvedkin, Vladimir [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57c112b9-cacc-b355-f69c-6b60986c5d4d@intel.com \
--to=vladimir.medvedkin@intel.com \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=gavin.hu@arm.com \
--cc=jerinj@marvell.com \
--cc=pbhagavatula@marvell.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).