From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shahaf Shuler
To: Mordechay Haimovsky
CC: Yongseok Koh, "dev@dpdk.org", "ferruh.yigit@intel.com"
Date: Fri, 13 Jul 2018 06:16:33 +0000
Subject: Re: [dpdk-dev] [PATCH v3] net/mlx5: add support for 32bit systems
References: <1530529900-27859-1-git-send-email-motih@mellanox.com> <1531396891-23874-1-git-send-email-motih@mellanox.com>
In-Reply-To: <1531396891-23874-1-git-send-email-motih@mellanox.com>
List-Id: DPDK patches and discussions
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0

Thursday, July 12, 2018 3:02 PM, Mordechay Haimovsky:
> Subject: [PATCH v3] net/mlx5: add support for 32bit systems
>
> This patch adds support for building and running mlx5 PMD on 32bit systems
> such as i686.
>
> The main issue to tackle was handling the 32bit access to the UAR as quoted
> from the mlx5 PRM:
> QP and CQ DoorBells require 64-bit writes. For best performance, it is
> recommended to execute the QP/CQ DoorBell as a single 64-bit write
> operation.
> For platforms that do not support 64 bit writes, it is possible to
> issue the 64 bits DoorBells through two consecutive writes, each write 32
> bits, as described below:
> * The order of writing each of the Dwords is from lower to upper
>   addresses.
> * No other DoorBell can be rung (or even start ringing) in the midst of an
>   on-going write of a DoorBell over a given UAR page.
> The last rule implies that in a multi-threaded environment, the access to a
> UAR page (which can be accessible by all threads in the process) must be
> synchronized (for example, using a semaphore) unless an atomic write of 64
> bits in a single bus operation is guaranteed. Such a synchronization is not
> required for when ringing DoorBells on different UAR pages.
>
> Signed-off-by: Moti Haimovsky

Applied to next-net-mlx (again), thanks.

Guidelines for 32b compilation and testing:
1. Fetch the latest rdma-core from GitHub. Make sure you have commit
   "708c8242 mlx5: Fix compilation on 32 bit systems when sse3 is on".
2. Compile rdma-core for 32b:
     mkdir build32
     cd build32
     CFLAGS="-Werror -m32" cmake -GNinja .. -DENABLE_RESOLVE_NEIGH=0 -DIOCTL_MODE=both
     ninja (or ninja-build)
   (approach taken from the rdma-core travis build:
   https://github.com/linux-rdma/rdma-core/blob/master/buildlib/travis-build#L20)
3. Compile and run DPDK against the build32 directory.

> ---
> v3:
> * Rebased upon latest changes in mlx5 PMD and rdma-core.
>
> v2:
> * Fixed coding style issues.
> * Modified documentation according to review inputs.
> * Fixed merge conflicts.
> ---
>  doc/guides/nics/features/mlx5.ini |  1 +
>  doc/guides/nics/mlx5.rst          |  6 +++-
>  drivers/net/mlx5/mlx5.c           |  8 ++++-
>  drivers/net/mlx5/mlx5.h           |  5 +++
>  drivers/net/mlx5/mlx5_defs.h      | 18 ++++++++--
>  drivers/net/mlx5/mlx5_rxq.c       |  6 +++-
>  drivers/net/mlx5/mlx5_rxtx.c      | 22 +++++++------
>  drivers/net/mlx5/mlx5_rxtx.h      | 69 ++++++++++++++++++++++++++++++++++++++-
>  drivers/net/mlx5/mlx5_txq.c       | 13 +++++++-
>  9 files changed, 131 insertions(+), 17 deletions(-)
>
> diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
> index e75b14b..b28b43e 100644
> --- a/doc/guides/nics/features/mlx5.ini
> +++ b/doc/guides/nics/features/mlx5.ini
> @@ -43,5 +43,6 @@ Multiprocess aware = Y
>  Other kdrv = Y
>  ARMv8 = Y
>  Power8 = Y
> +x86-32 = Y
>  x86-64 = Y
>  Usage doc = Y
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
> index 0d0d217..ebf2336 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -49,7 +49,7 @@ libibverbs.
>  Features
>  --------
>
> -- Multi arch support: x86_64, POWER8, ARMv8.
> +- Multi arch support: x86_64, POWER8, ARMv8, i686.
>  - Multiple TX and RX queues.
>  - Support for scattered TX and RX frames.
>  - IPv4, IPv6, TCPv4, TCPv6, UDPv4 and UDPv6 RSS on any number of queues.
> @@ -489,6 +489,10 @@ RMDA Core with Linux Kernel
>  - Minimal kernel version : v4.14 or the most recent 4.14-rc (see `Linux installation documentation`_)
>  - Minimal rdma-core version: v15+ commit 0c5f5765213a ("Merge pull request #227 from yishaih/tm")
>    (see `RDMA Core installation documentation`_)
> +- When building for i686 use:
> +
> +  - rdma-core version 18.0 or above built with 32bit support.
> +  - Kernel version 4.14.41 or above.
>
>  .. _`Linux installation documentation`: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/plain/Documentation/admin-guide/README.rst
>  .. _`RDMA Core installation documentation`: https://raw.githubusercontent.com/linux-rdma/rdma-core/master/README.md
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
> index dda50b8..15f1a17 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -598,7 +598,7 @@
>  	rte_memseg_walk(find_lower_va_bound, &addr);
>
>  	/* keep distance to hugepages to minimize potential conflicts. */
> -	addr = RTE_PTR_SUB(addr, MLX5_UAR_OFFSET + MLX5_UAR_SIZE);
> +	addr = RTE_PTR_SUB(addr, (uintptr_t)(MLX5_UAR_OFFSET + MLX5_UAR_SIZE));
>  	/* anonymous mmap, no real memory consumption. */
>  	addr = mmap(addr, MLX5_UAR_SIZE,
>  		    PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> @@ -939,6 +939,12 @@
>  	priv->device_attr = attr;
>  	priv->pd = pd;
>  	priv->mtu = ETHER_MTU;
> +#ifndef RTE_ARCH_64
> +	/* Initialize UAR access locks for 32bit implementations. */
> +	rte_spinlock_init(&priv->uar_lock_cq);
> +	for (i = 0; i < MLX5_UAR_PAGE_NUM_MAX; i++)
> +		rte_spinlock_init(&priv->uar_lock[i]);
> +#endif
>  	/* Some internal functions rely on Netlink sockets, open them now. */
>  	priv->nl_socket_rdma = mlx5_nl_init(0, NETLINK_RDMA);
>  	priv->nl_socket_route = mlx5_nl_init(RTMGRP_LINK, NETLINK_ROUTE);
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
> index 131be33..896158a 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -215,6 +215,11 @@ struct priv {
>  	int nl_socket_rdma; /* Netlink socket (NETLINK_RDMA). */
>  	int nl_socket_route; /* Netlink socket (NETLINK_ROUTE). */
>  	uint32_t nl_sn; /* Netlink message sequence number. */
> +#ifndef RTE_ARCH_64
> +	rte_spinlock_t uar_lock_cq; /* CQs share a common distinct UAR */
> +	rte_spinlock_t uar_lock[MLX5_UAR_PAGE_NUM_MAX];
> +	/* UAR same-page access control required in 32bit implementations. */
> +#endif
>  };
>
>  #define PORT_ID(priv) ((priv)->dev_data->port_id)
> diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
> index 5bbbec2..f6ec415 100644
> --- a/drivers/net/mlx5/mlx5_defs.h
> +++ b/drivers/net/mlx5/mlx5_defs.h
> @@ -87,14 +87,28 @@
>  #define MLX5_LINK_STATUS_TIMEOUT 10
>
>  /* Reserved address space for UAR mapping. */
> -#define MLX5_UAR_SIZE (1ULL << 32)
> +#define MLX5_UAR_SIZE (1ULL << (sizeof(uintptr_t) * 4))
>
>  /* Offset of reserved UAR address space to hugepage memory. Offset is used here
>   * to minimize possibility of address next to hugepage being used by other code
>   * in either primary or secondary process, failing to map TX UAR would make TX
>   * packets invisible to HW.
>   */
> -#define MLX5_UAR_OFFSET (1ULL << 32)
> +#define MLX5_UAR_OFFSET (1ULL << (sizeof(uintptr_t) * 4))
> +
> +/* Maximum number of UAR pages used by a port,
> + * These are the size and mask for an array of mutexes used to synchronize
> + * the access to port's UARs on platforms that do not support 64 bit writes.
> + * In such systems it is possible to issue the 64 bits DoorBells through two
> + * consecutive writes, each write 32 bits. The access to a UAR page (which can
> + * be accessible by all threads in the process) must be synchronized
> + * (for example, using a semaphore). Such a synchronization is not required
> + * when ringing DoorBells on different UAR pages.
> + * A port with 512 Tx queues uses 8, 4kBytes, UAR pages which are shared
> + * among the ports.
> + */
> +#define MLX5_UAR_PAGE_NUM_MAX 64
> +#define MLX5_UAR_PAGE_NUM_MASK ((MLX5_UAR_PAGE_NUM_MAX) - 1)
>
>  /* Log 2 of the default number of strides per WQE for Multi-Packet RQ. */
>  #define MLX5_MPRQ_STRIDE_NUM_N 6U
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 071740b..16e1641 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -647,7 +647,8 @@
>  	doorbell = (uint64_t)doorbell_hi << 32;
>  	doorbell |= rxq->cqn;
>  	rxq->cq_db[MLX5_CQ_ARM_DB] = rte_cpu_to_be_32(doorbell_hi);
> -	rte_write64(rte_cpu_to_be_64(doorbell), cq_db_reg);
> +	mlx5_uar_write64(rte_cpu_to_be_64(doorbell),
> +			 cq_db_reg, rxq->uar_lock_cq);
>  }
>
>  /**
> @@ -1449,6 +1450,9 @@ struct mlx5_rxq_ctrl *
>  	tmpl->rxq.elts_n = log2above(desc);
>  	tmpl->rxq.elts =
>  		(struct rte_mbuf *(*)[1 << tmpl->rxq.elts_n])(tmpl + 1);
> +#ifndef RTE_ARCH_64
> +	tmpl->rxq.uar_lock_cq = &priv->uar_lock_cq;
> +#endif
>  	tmpl->idx = idx;
>  	rte_atomic32_inc(&tmpl->refcnt);
>  	LIST_INSERT_HEAD(&priv->rxqsctrl, tmpl, next);
> diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> index a7ed8d8..52a1074 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.c
> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> @@ -495,6 +495,7 @@
>  	volatile struct mlx5_wqe_ctrl *last_wqe = NULL;
>  	unsigned int segs_n = 0;
>  	const unsigned int max_inline = txq->max_inline;
> +	uint64_t addr_64;
>
>  	if (unlikely(!pkts_n))
>  		return 0;
> @@ -711,12 +712,12 @@
>  		ds = 3;
>  use_dseg:
>  		/* Add the remaining packet as a simple ds. */
> -		addr = rte_cpu_to_be_64(addr);
> +		addr_64 = rte_cpu_to_be_64(addr);
>  		*dseg = (rte_v128u32_t){
>  			rte_cpu_to_be_32(length),
>  			mlx5_tx_mb2mr(txq, buf),
> -			addr,
> -			addr >> 32,
> +			addr_64,
> +			addr_64 >> 32,
>  		};
>  		++ds;
>  		if (!segs_n)
> @@ -750,12 +751,12 @@
>  		total_length += length;
>  #endif
>  		/* Store segment information. */
> -		addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
> +		addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf, uintptr_t));
>  		*dseg = (rte_v128u32_t){
>  			rte_cpu_to_be_32(length),
>  			mlx5_tx_mb2mr(txq, buf),
> -			addr,
> -			addr >> 32,
> +			addr_64,
> +			addr_64 >> 32,
>  		};
>  		(*txq->elts)[++elts_head & elts_m] = buf;
>  		if (--segs_n)
> @@ -1450,6 +1451,7 @@
>  	unsigned int mpw_room = 0;
>  	unsigned int inl_pad = 0;
>  	uint32_t inl_hdr;
> +	uint64_t addr_64;
>  	struct mlx5_mpw mpw = {
>  		.state = MLX5_MPW_STATE_CLOSED,
>  	};
> @@ -1586,13 +1588,13 @@
>  					((uintptr_t)mpw.data.raw +
>  					 inl_pad);
>  			(*txq->elts)[elts_head++ & elts_m] = buf;
> -			addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> -								  uintptr_t));
> +			addr_64 = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
> +								     uintptr_t));
>  			*dseg = (rte_v128u32_t) {
>  				rte_cpu_to_be_32(length),
>  				mlx5_tx_mb2mr(txq, buf),
> -				addr,
> -				addr >> 32,
> +				addr_64,
> +				addr_64 >> 32,
>  			};
>  			mpw.data.raw = (volatile void *)(dseg + 1);
>  			mpw.total_len += (inl_pad + sizeof(*dseg));
> diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
> index a04a84f..992e977 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.h
> +++ b/drivers/net/mlx5/mlx5_rxtx.h
> @@ -26,6 +26,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  #include "mlx5_utils.h"
>  #include "mlx5.h"
> @@ -118,6 +120,10 @@ struct mlx5_rxq_data {
>  	void *cq_uar; /* CQ user access region. */
>  	uint32_t cqn; /* CQ number. */
>  	uint8_t cq_arm_sn; /* CQ arm seq number. */
> +#ifndef RTE_ARCH_64
> +	rte_spinlock_t *uar_lock_cq;
> +	/* CQ (UAR) access lock required for 32bit implementations */
> +#endif
>  	uint32_t tunnel; /* Tunnel information. */
>  } __rte_cache_aligned;
>
> @@ -198,6 +204,10 @@ struct mlx5_txq_data {
>  	volatile void *bf_reg; /* Blueflame register remapped. */
>  	struct rte_mbuf *(*elts)[]; /* TX elements. */
>  	struct mlx5_txq_stats stats; /* TX queue counters. */
> +#ifndef RTE_ARCH_64
> +	rte_spinlock_t *uar_lock;
> +	/* UAR access lock required for 32bit implementations */
> +#endif
>  } __rte_cache_aligned;
>
>  /* Verbs Rx queue elements. */
> @@ -353,6 +363,63 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
>  uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr);
>  uint32_t mlx5_tx_addr2mr_bh(struct mlx5_txq_data *txq, uintptr_t addr);
>
> +/**
> + * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> + * 64bit architectures.
> + *
> + * @param val
> + *   value to write in CPU endian format.
> + * @param addr
> + *   Address to write to.
> + * @param lock
> + *   Address of the lock to use for that UAR access.
> + */
> +static __rte_always_inline void
> +__mlx5_uar_write64_relaxed(uint64_t val, volatile void *addr,
> +			   rte_spinlock_t *lock __rte_unused)
> +{
> +#ifdef RTE_ARCH_64
> +	rte_write64_relaxed(val, addr);
> +#else /* !RTE_ARCH_64 */
> +	rte_spinlock_lock(lock);
> +	rte_write32_relaxed(val, addr);
> +	rte_io_wmb();
> +	rte_write32_relaxed(val >> 32,
> +			    (volatile void *)((volatile char *)addr + 4));
> +	rte_spinlock_unlock(lock);
> +#endif
> +}
> +
> +/**
> + * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and
> + * 64bit architectures while guaranteeing the order of execution with the
> + * code being executed.
> + *
> + * @param val
> + *   value to write in CPU endian format.
> + * @param addr
> + *   Address to write to.
> + * @param lock
> + *   Address of the lock to use for that UAR access.
> + */
> +static __rte_always_inline void
> +__mlx5_uar_write64(uint64_t val, volatile void *addr, rte_spinlock_t *lock)
> +{
> +	rte_io_wmb();
> +	__mlx5_uar_write64_relaxed(val, addr, lock);
> +}
> +
> +/* Assist macros, used instead of directly calling the functions they wrap. */
> +#ifdef RTE_ARCH_64
> +#define mlx5_uar_write64_relaxed(val, dst, lock) \
> +		__mlx5_uar_write64_relaxed(val, dst, NULL)
> +#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, NULL)
> +#else
> +#define mlx5_uar_write64_relaxed(val, dst, lock) \
> +		__mlx5_uar_write64_relaxed(val, dst, lock)
> +#define mlx5_uar_write64(val, dst, lock) __mlx5_uar_write64(val, dst, lock)
> +#endif
> +
>  #ifndef NDEBUG
>  /**
>   * Verify or set magic value in CQE.
> @@ -619,7 +686,7 @@ uint16_t mlx5_rx_burst_vec(void *dpdk_txq, struct rte_mbuf **pkts,
>  	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
>  	/* Ensure ordering between DB record and BF copy. */
>  	rte_wmb();
> -	*dst = *src;
> +	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
>  	if (cond)
>  		rte_wmb();
>  }
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
> index 5057561..f9bc473 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -255,6 +255,9 @@
>  	struct mlx5_txq_ctrl *txq_ctrl;
>  	int already_mapped;
>  	size_t page_size = sysconf(_SC_PAGESIZE);
> +#ifndef RTE_ARCH_64
> +	unsigned int lock_idx;
> +#endif
>
>  	memset(pages, 0, priv->txqs_n * sizeof(uintptr_t));
>  	/*
> @@ -281,7 +284,7 @@
>  		}
>  		/* new address in reserved UAR address space. */
>  		addr = RTE_PTR_ADD(priv->uar_base,
> -				   uar_va & (MLX5_UAR_SIZE - 1));
> +				   uar_va & (uintptr_t)(MLX5_UAR_SIZE - 1));
>  		if (!already_mapped) {
>  			pages[pages_n++] = uar_va;
>  			/* fixed mmap to specified address in reserved
> @@ -305,6 +308,12 @@
>  		else
>  			assert(txq_ctrl->txq.bf_reg ==
>  			       RTE_PTR_ADD((void *)addr, off));
> +#ifndef RTE_ARCH_64
> +		/* Assign a UAR lock according to UAR page number */
> +		lock_idx = (txq_ctrl->uar_mmap_offset / page_size) &
> +			   MLX5_UAR_PAGE_NUM_MASK;
> +		txq->uar_lock = &priv->uar_lock[lock_idx];
> +#endif
>  	}
>  	return 0;
>  }
> @@ -511,6 +520,8 @@ struct mlx5_txq_ibv *
>  	rte_atomic32_inc(&txq_ibv->refcnt);
>  	if (qp.comp_mask & MLX5DV_QP_MASK_UAR_MMAP_OFFSET) {
>  		txq_ctrl->uar_mmap_offset = qp.uar_mmap_offset;
> +		DRV_LOG(DEBUG, "port %u: uar_mmap_offset 0x%lx",
> +			dev->data->port_id, txq_ctrl->uar_mmap_offset);
>  	} else {
>  		DRV_LOG(ERR,
>  			"port %u failed to retrieve UAR info, invalid"
> --
> 1.8.3.1
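
As a footnote on the arithmetic in the patch, here is a small stand-alone
sketch of how a Tx queue's lock slot and the reserved UAR span work out. The
helper names uar_lock_index() and uar_reserved_size() are hypothetical and do
not exist in the PMD. The constants mirror MLX5_UAR_PAGE_NUM_MAX and
MLX5_UAR_PAGE_NUM_MASK from the diff, and uint64_t is assumed for the mmap
offset purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror MLX5_UAR_PAGE_NUM_MAX / MLX5_UAR_PAGE_NUM_MASK from the patch. */
#define UAR_PAGE_NUM_MAX 64u
#define UAR_PAGE_NUM_MASK (UAR_PAGE_NUM_MAX - 1u)

/*
 * Hypothetical helper: map a UAR mmap offset to a lock slot.
 * Queues on the same UAR page get the same slot; pages that are
 * UAR_PAGE_NUM_MAX apart alias to one slot, which only adds contention.
 */
static inline unsigned int
uar_lock_index(uint64_t uar_mmap_offset, size_t page_size)
{
	assert(page_size != 0);
	return (unsigned int)((uar_mmap_offset / page_size) &
			      UAR_PAGE_NUM_MASK);
}

/*
 * Hypothetical helper: the value of 1ULL << (sizeof(uintptr_t) * 4)
 * for a given pointer width in bytes.
 */
static inline uint64_t
uar_reserved_size(size_t pointer_bytes)
{
	assert(pointer_bytes < 16);
	return 1ULL << (pointer_bytes * 4);
}
```

With 8-byte pointers the reserved span stays at 4 GiB as before, and with
4-byte pointers it shrinks to 64 KiB, which fits in a 32-bit address space.
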