From: Rasesh Mody <rmody@marvell.com>
To: Shahed Shaikh <shshaikh@marvell.com>, "dev@dpdk.org" <dev@dpdk.org>
CC: "ferruh.yigit@intel.com" <ferruh.yigit@intel.com>, "stable@dpdk.org"
 <stable@dpdk.org>, "thomas@monjalon.net" <thomas@monjalon.net>
Date: Fri, 18 Jan 2019 16:57:41 +0000
Message-ID: <BYAPR18MB28388FB930DFE1C9FF5F4A0FB59C0@BYAPR18MB2838.namprd18.prod.outlook.com>
References: <20190118102930.27487-1-shshaikh@marvell.com>
In-Reply-To: <20190118102930.27487-1-shshaikh@marvell.com>
Subject: Re: [dpdk-dev] [PATCH 1/2] net/qede: fix performance bottleneck in Rx path

>From: dev <dev-bounces@dpdk.org> On Behalf Of Shahed Shaikh
>Sent: Friday, January 18, 2019 2:29 AM
>
>Allocating replacement buffer per received packet is expensive.
>Instead, process received packets first and allocate replacement buffers
>in bulk later.
>
>This improves performance by ~25% in terms of PPS on AMD platforms.
>
>Fixes: 2ea6f76aff40 ("qede: add core driver")
>Cc: stable@dpdk.org
>
>Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
>---

Acked-by: Rasesh Mody <rmody@marvell.com>

> drivers/net/qede/qede_rxtx.c | 97 +++++++++++++++++++++++++++++++++-----------
> drivers/net/qede/qede_rxtx.h |  2 +
> 2 files changed, 75 insertions(+), 24 deletions(-)
>
>diff --git a/drivers/net/qede/qede_rxtx.c b/drivers/net/qede/qede_rxtx.c
>index 0e33be1..684c4ae 100644
>--- a/drivers/net/qede/qede_rxtx.c
>+++ b/drivers/net/qede/qede_rxtx.c
>@@ -35,6 +35,52 @@ static inline int qede_alloc_rx_buffer(struct qede_rx_queue *rxq)
>        return 0;
> }
>
>+#define QEDE_MAX_BULK_ALLOC_COUNT 512
>+
>+static inline int qede_alloc_rx_bulk_mbufs(struct qede_rx_queue *rxq, int count)
>+{
>+       void *obj_p[QEDE_MAX_BULK_ALLOC_COUNT] __rte_cache_aligned;
>+       struct rte_mbuf *mbuf = NULL;
>+       struct eth_rx_bd *rx_bd;
>+       dma_addr_t mapping;
>+       int i, ret = 0;
>+       uint16_t idx;
>+
>+       idx = rxq->sw_rx_prod & NUM_RX_BDS(rxq);
>+
>+       if (count > QEDE_MAX_BULK_ALLOC_COUNT)
>+               count = QEDE_MAX_BULK_ALLOC_COUNT;
>+
>+       ret = rte_mempool_get_bulk(rxq->mb_pool, obj_p, count);
>+       if (unlikely(ret)) {
>+               PMD_RX_LOG(ERR, rxq,
>+                          "Failed to allocate %d rx buffers "
>+                           "sw_rx_prod %u sw_rx_cons %u mp entries %u free %u",
>+                           count, idx, rxq->sw_rx_cons & NUM_RX_BDS(rxq),
>+                           rte_mempool_avail_count(rxq->mb_pool),
>+                           rte_mempool_in_use_count(rxq->mb_pool));
>+               return -ENOMEM;
>+       }
>+
>+       for (i = 0; i < count; i++) {
>+               mbuf = obj_p[i];
>+               if (likely(i < count - 1))
>+                       rte_prefetch0(obj_p[i + 1]);
>+
>+               idx = rxq->sw_rx_prod & NUM_RX_BDS(rxq);
>+               rxq->sw_rx_ring[idx].mbuf = mbuf;
>+               rxq->sw_rx_ring[idx].page_offset = 0;
>+               mapping = rte_mbuf_data_iova_default(mbuf);
>+               rx_bd = (struct eth_rx_bd *)
>+                       ecore_chain_produce(&rxq->rx_bd_ring);
>+               rx_bd->addr.hi = rte_cpu_to_le_32(U64_HI(mapping));
>+               rx_bd->addr.lo = rte_cpu_to_le_32(U64_LO(mapping));
>+               rxq->sw_rx_prod++;
>+       }
>+
>+       return 0;
>+}
>+
> /* Criterias for calculating Rx buffer size -
>  * 1) rx_buf_size should not exceed the size of mbuf
>  * 2) In scattered_rx mode - minimum rx_buf_size should be
>@@ -1131,7 +1177,7 @@ qede_reuse_page(__rte_unused struct qede_dev *qdev,
>                struct qede_rx_queue *rxq, struct qede_rx_entry *curr_cons)
> {
>        struct eth_rx_bd *rx_bd_prod = ecore_chain_produce(&rxq->rx_bd_ring);
>-       uint16_t idx = rxq->sw_rx_cons & NUM_RX_BDS(rxq);
>+       uint16_t idx = rxq->sw_rx_prod & NUM_RX_BDS(rxq);
>        struct qede_rx_entry *curr_prod;
>        dma_addr_t new_mapping;
>
>@@ -1364,7 +1410,6 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>        uint8_t bitfield_val;
> #endif
>        uint8_t tunn_parse_flag;
>-       uint8_t j;
>        struct eth_fast_path_rx_tpa_start_cqe *cqe_start_tpa;
>        uint64_t ol_flags;
>        uint32_t packet_type;
>@@ -1373,6 +1418,7 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>        uint8_t offset, tpa_agg_idx, flags;
>        struct qede_agg_info *tpa_info = NULL;
>        uint32_t rss_hash;
>+       int rx_alloc_count = 0;
>
>        hw_comp_cons = rte_le_to_cpu_16(*rxq->hw_cons_ptr);
>        sw_comp_cons = ecore_chain_get_cons_idx(&rxq->rx_comp_ring);
>@@ -1382,6 +1428,25 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>        if (hw_comp_cons == sw_comp_cons)
>                return 0;
>
>+       /* Allocate buffers that we used in previous loop */
>+       if (rxq->rx_alloc_count) {
>+               if (unlikely(qede_alloc_rx_bulk_mbufs(rxq,
>+                            rxq->rx_alloc_count))) {
>+                       struct rte_eth_dev *dev;
>+
>+                       PMD_RX_LOG(ERR, rxq,
>+                                  "New buffer allocation failed,"
>+                                  "dropping incoming packet\n");
>+                       dev = &rte_eth_devices[rxq->port_id];
>+                       dev->data->rx_mbuf_alloc_failed +=
>+                                                       rxq->rx_alloc_count;
>+                       rxq->rx_alloc_errors += rxq->rx_alloc_count;
>+                       return 0;
>+               }
>+               qede_update_rx_prod(qdev, rxq);
>+               rxq->rx_alloc_count = 0;
>+       }
>+
>        while (sw_comp_cons != hw_comp_cons) {
>                ol_flags = 0;
>                packet_type = RTE_PTYPE_UNKNOWN;
>@@ -1553,16 +1618,7 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>                        rx_mb->hash.rss = rss_hash;
>                }
>
>-               if (unlikely(qede_alloc_rx_buffer(rxq) != 0)) {
>-                       PMD_RX_LOG(ERR, rxq,
>-                                  "New buffer allocation failed,"
>-                                  "dropping incoming packet\n");
>-                       qede_recycle_rx_bd_ring(rxq, qdev, fp_cqe->bd_num);
>-                       rte_eth_devices[rxq->port_id].
>-                           data->rx_mbuf_alloc_failed++;
>-                       rxq->rx_alloc_errors++;
>-                       break;
>-               }
>+               rx_alloc_count++;
>                qede_rx_bd_ring_consume(rxq);
>
>                if (!tpa_start_flg && fp_cqe->bd_num > 1) {
>@@ -1574,17 +1630,9 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>                        if (qede_process_sg_pkts(p_rxq, seg1, num_segs,
>                                                 pkt_len - len))
>                                goto next_cqe;
>-                       for (j = 0; j < num_segs; j++) {
>-                               if (qede_alloc_rx_buffer(rxq)) {
>-                                       PMD_RX_LOG(ERR, rxq,
>-                                               "Buffer allocation failed");
>-                                       rte_eth_devices[rxq->port_id].
>-                                               data->rx_mbuf_alloc_failed++;
>-                                       rxq->rx_alloc_errors++;
>-                                       break;
>-                               }
>-                               rxq->rx_segs++;
>-                       }
>+
>+                       rx_alloc_count += num_segs;
>+                       rxq->rx_segs += num_segs;
>                }
>                rxq->rx_segs++; /* for the first segment */
>
>@@ -1626,7 +1674,8 @@ qede_recv_pkts(void *p_rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>                }
>        }
>
>-       qede_update_rx_prod(qdev, rxq);
>+       /* Request number of bufferes to be allocated in next loop */
>+       rxq->rx_alloc_count = rx_alloc_count;
>
>        rxq->rcv_pkts += rx_pkt;
>
>diff --git a/drivers/net/qede/qede_rxtx.h b/drivers/net/qede/qede_rxtx.h
>index 454daa0..5b249cb 100644
>--- a/drivers/net/qede/qede_rxtx.h
>+++ b/drivers/net/qede/qede_rxtx.h
>@@ -192,6 +192,8 @@ struct qede_rx_queue {
>        uint16_t queue_id;
>        uint16_t port_id;
>        uint16_t rx_buf_size;
>+       uint16_t rx_alloc_count;
>+       uint16_t unused;
>        uint64_t rcv_pkts;
>        uint64_t rx_segs;
>        uint64_t rx_hw_errors;
>--
>2.7.4
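
For readers skimming the thread: the net effect of the patch is that the per-packet
qede_alloc_rx_buffer() call disappears from the receive loop; the loop only counts
consumed buffers (rx_alloc_count) and the ring is refilled with a single
rte_mempool_get_bulk() at the start of the next poll. Below is a minimal,
self-contained C sketch of that deferred bulk-replenish pattern. It is not qede
code; all names (pool_get_bulk, rx_poll, struct rxq, ...) are made up for
illustration, and the only assumptions carried over are a power-of-two ring and an
all-or-nothing bulk allocator, which is how rte_mempool_get_bulk() behaves.

#include <stdio.h>
#include <stdlib.h>

#define POOL_SIZE 64
#define RING_SIZE 16

struct pool {
        void *objs[POOL_SIZE];
        int   top;              /* number of free objects on the stack */
};

/* Bulk get: hand out 'n' objects or fail without taking any (all-or-nothing). */
static int pool_get_bulk(struct pool *p, void **out, int n)
{
        if (p->top < n)
                return -1;
        for (int i = 0; i < n; i++)
                out[i] = p->objs[--p->top];
        return 0;
}

struct rxq {
        void *ring[RING_SIZE];  /* buffers currently posted to "hardware" */
        int   prod;             /* next ring slot to refill */
        int   alloc_count;      /* buffers consumed by the previous poll */
        struct pool *pool;
};

/* One poll: refill what the previous poll consumed, then "receive". */
static int rx_poll(struct rxq *q, int nb_ready)
{
        if (nb_ready > RING_SIZE)
                nb_ready = RING_SIZE;   /* a real ring cannot hold more */

        /* Deferred refill: one bulk allocation per poll, not per packet. */
        if (q->alloc_count) {
                void *bufs[RING_SIZE];

                if (pool_get_bulk(q->pool, bufs, q->alloc_count) != 0)
                        return 0;       /* drop this poll, retry refill next time */
                for (int i = 0; i < q->alloc_count; i++)
                        q->ring[q->prod++ & (RING_SIZE - 1)] = bufs[i];
                q->alloc_count = 0;
        }

        /* Hot loop: hand packets up and only count the consumed buffers. */
        int rx = 0;
        for (; rx < nb_ready; rx++)
                q->alloc_count++;       /* a real driver also delivers the mbuf */

        return rx;
}

int main(void)
{
        static struct pool p;
        for (p.top = 0; p.top < POOL_SIZE; p.top++)
                p.objs[p.top] = malloc(64);

        struct rxq q = { .pool = &p };
        printf("poll 1: %d pkts\n", rx_poll(&q, 8));    /* consumes 8 buffers */
        printf("poll 2: %d pkts\n", rx_poll(&q, 4));    /* refills 8, consumes 4 */
        return 0;
}

As in the patch, a failed refill simply returns 0 packets and leaves the pending
count untouched, so the refill is retried on the next poll instead of being
handled per packet inside the hot loop.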