From mboxrd@z Thu Jan 1 00:00:00 1970
From:
To:
CC: , Vignesh PS
Subject: [PATCH] net/af_packet: add explicit flush for Tx
Date: Thu, 15 Aug 2024 13:56:53 +0200
Message-ID: <20240815115653.603552-1-vignesh.purushotham.srinivas@ericsson.com>
X-Mailer: git-send-email 2.40.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: DPDK patches and discussions

From: Vignesh PS

The af_packet PMD uses system calls to transmit packets. Separate the
transmit function into two different calls so that it is possible to
avoid syscalls during transmit.

Signed-off-by: Vignesh PS
---
 .mailmap                                  |  1 +
 doc/guides/nics/af_packet.rst             | 26 ++++++-
 drivers/net/af_packet/rte_eth_af_packet.c | 90 ++++++++++++++++++++++-
 3 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4a508bafad..5e9462b7cd 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1548,6 +1548,7 @@ Viacheslav Ovsiienko
 Victor Kaplansky
 Victor Raj
 Vidya Sagar Velumuri
+Vignesh PS
 Vignesh Sridhar
 Vijayakumar Muthuvel Manickam
 Vijaya Mohan Guvva
diff --git a/doc/guides/nics/af_packet.rst b/doc/guides/nics/af_packet.rst
index 66b977e1a2..fe92ef231f 100644
--- a/doc/guides/nics/af_packet.rst
+++ b/doc/guides/nics/af_packet.rst
@@ -29,6 +29,7 @@ Some of these, in turn, will be used to configure the PACKET_MMAP settings.
 * ``framesz`` - PACKET_MMAP frame size (optional, default 2048B; Note: multiple
   of 16B);
 * ``framecnt`` - PACKET_MMAP frame count (optional, default 512).
+* ``explicit_flush`` - enable two-stage packet transmit (optional, default 0).
 
 Because this implementation is based on PACKET_MMAP, and PACKET_MMAP has its
 own pre-requisites, it should be noted that the inner workings of PACKET_MMAP
@@ -39,6 +40,9 @@ As an example, if one changes ``framesz`` to be 1024B, it is expected that
 ``blocksz`` is set to at least 1024B as well (although 2048B in this case would
 allow two "frames" per "block").
 
+When ``explicit_flush`` is enabled, the PMD temporarily buffers mbufs in an
+internal ring until ``rte_eth_tx_done_cleanup()`` is called on the Tx queue.
+
 This restriction happens because PACKET_MMAP expects each single "frame" to fit
 inside of a "block". And although multiple "frames" can fit inside of a single
 "block", a "frame" may not span across two "blocks".
@@ -64,11 +68,25 @@ framecnt=512):
 
 .. code-block:: console
 
-    --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0
+    --vdev=eth_af_packet0,iface=tap0,blocksz=4096,framesz=2048,framecnt=512,qpairs=1,qdisc_bypass=0,explicit_flush=1
 
 Features and Limitations
 ------------------------
 
-The PMD will re-insert the VLAN tag transparently to the packet if the kernel
-strips it, as long as the ``RTE_ETH_RX_OFFLOAD_VLAN_STRIP`` is not enabled by the
-application.
+* The PMD will re-insert the VLAN tag transparently to the packet if the kernel
+  strips it, as long as the ``RTE_ETH_RX_OFFLOAD_VLAN_STRIP`` is not enabled by the
+  application.
+* The PMD relies on the ``sendto()`` system call to transmit packets from the PACKET_MMAP socket.
+  This system call can cause head-of-line blocking. Hence, it is advantageous to buffer the
+  packets in the driver instead of immediately triggering a transmit on every call to
+  ``rte_eth_tx_burst()``. Therefore, the PMD can split the functionality of ``rte_eth_tx_burst()``
+  into two stages: ``rte_eth_tx_burst()`` only buffers the packets
+  in the driver, and a subsequent call to ``rte_eth_tx_done_cleanup()`` triggers the actual
+  packet transmission. With this split design, it is possible to call
+  ``rte_eth_tx_burst()`` on worker threads and trigger transmits (by calling
+  ``rte_eth_tx_done_cleanup()``) from a control plane thread, eliminating
+  head-of-line blocking on the workers.
+* To enable the two-stage packet transmit, start the PMD with ``explicit_flush=1``
+  (default ``explicit_flush=0``).
+* When calling ``rte_eth_tx_done_cleanup()``, the ``free_cnt`` parameter has no effect on how
+  many packets are flushed; the PMD attempts to flush all packets present in the buffer.
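The two-stage transmit described in the documentation hunk above can be illustrated with a small, self-contained C model. This is a hypothetical sketch, not DPDK code: `struct model_txq`, `model_tx_burst()` and `model_tx_flush()` are illustrative stand-ins for the PMD's Tx queue, its `rte_eth_tx_burst()` path and its `rte_eth_tx_done_cleanup()` path. The model is single-threaded and omits the memory ordering that `rte_ring` provides when the two stages run on different threads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not DPDK code: all names below are illustrative. */
#define MODEL_RING_SIZE 8u

/* Masked indexing needs a power-of-two size, like an rte_ring created
 * without RING_F_EXACT_SZ. */
static_assert((MODEL_RING_SIZE & (MODEL_RING_SIZE - 1)) == 0,
	      "MODEL_RING_SIZE must be a power of two");

struct model_txq {
	void *slots[MODEL_RING_SIZE];
	uint32_t head; /* next slot to flush (consumer side) */
	uint32_t tail; /* next free slot (producer side) */
	uint32_t sent; /* packets handed to the modelled kernel ring */
};

/* Stage 1 (worker): only buffer packets, no system call.
 * Returns how many packets were accepted; the caller keeps the rest. */
static uint16_t
model_tx_burst(struct model_txq *q, void **pkts, uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n; i++) {
		if (q->tail - q->head == MODEL_RING_SIZE)
			break; /* buffer full */
		q->slots[q->tail & (MODEL_RING_SIZE - 1)] = pkts[i];
		q->tail++;
	}
	return i;
}

/* Stage 2 (flush): drain everything that is buffered. free_cnt is
 * ignored, matching the behaviour documented for explicit_flush. */
static int
model_tx_flush(struct model_txq *q, uint32_t free_cnt)
{
	int flushed = 0;

	(void)free_cnt;
	while (q->head != q->tail) {
		/* A real PMD would copy the packet into the PACKET_MMAP
		 * ring here and kick the kernel with sendto() per burst. */
		q->slots[q->head & (MODEL_RING_SIZE - 1)] = NULL;
		q->head++;
		q->sent++;
		flushed++;
	}
	return flushed;
}
```

As with `rte_eth_tx_burst()`, a worker calling `model_tx_burst()` still owns any packets beyond the returned count; nothing reaches the modelled kernel ring until `model_tx_flush()` runs.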
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 6b7b16f348..cdbe43313a 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -36,9 +36,11 @@
 #define ETH_AF_PACKET_FRAMESIZE_ARG		"framesz"
 #define ETH_AF_PACKET_FRAMECOUNT_ARG		"framecnt"
 #define ETH_AF_PACKET_QDISC_BYPASS_ARG		"qdisc_bypass"
+#define ETH_AF_PACKET_EXPLICIT_FLUSH_ARG	"explicit_flush"
 
 #define DFLT_FRAME_SIZE		(1 << 11)
 #define DFLT_FRAME_COUNT	(1 << 9)
+#define DFLT_FRAME_BURST	(32)
 
 struct __rte_cache_aligned pkt_rx_queue {
 	int sockfd;
@@ -62,8 +64,10 @@ struct __rte_cache_aligned pkt_tx_queue {
 
 	struct iovec *rd;
 	uint8_t *map;
+	struct rte_ring *buf;
 	unsigned int framecount;
 	unsigned int framenum;
+	unsigned int explicit_flush;
 
 	volatile unsigned long tx_pkts;
 	volatile unsigned long err_pkts;
@@ -91,6 +95,7 @@ static const char *valid_arguments[] = {
 	ETH_AF_PACKET_FRAMESIZE_ARG,
 	ETH_AF_PACKET_FRAMECOUNT_ARG,
 	ETH_AF_PACKET_QDISC_BYPASS_ARG,
+	ETH_AF_PACKET_EXPLICIT_FLUSH_ARG,
 	NULL
 };
 
@@ -198,7 +203,7 @@ tx_ring_status_available(uint32_t tp_status)
  * Callback to handle sending packets through a real NIC.
  */
 static uint16_t
-eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+eth_af_packet_tx_internal(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 {
 	struct tpacket2_hdr *ppd;
 	struct rte_mbuf *mbuf;
@@ -311,6 +316,59 @@ eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	return i;
 }
 
+/*
+ * Callback to handle sending packets (buffered if explicit_flush is set).
+ */
+static uint16_t
+eth_af_packet_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+	struct pkt_tx_queue *pkt_q = queue;
+
+	if (unlikely(nb_pkts == 0))
+		return 0;
+
+	if (pkt_q->explicit_flush)
+		return rte_ring_enqueue_burst(pkt_q->buf,
+				(void **)bufs, nb_pkts, NULL);
+
+	return eth_af_packet_tx_internal(queue, bufs, nb_pkts);
+}
+
+/*
+ * Callback to flush previously buffered Tx packets.
+ */
+static int
+eth_af_packet_tx_flush(void *queue, uint32_t free_cnt __rte_unused)
+{
+	uint16_t sent, nb_pkts;
+	int num_flushed = 0;
+
+	struct pkt_tx_queue *pkt_q = queue;
+
+	while (true) {
+		/* flush up to DFLT_FRAME_BURST buffered pkts per iteration */
+		struct rte_mbuf *bufs[DFLT_FRAME_BURST];
+		nb_pkts = rte_ring_dequeue_burst_start(pkt_q->buf,
+				(void **)bufs, DFLT_FRAME_BURST, NULL);
+
+		if (unlikely(nb_pkts == 0))
+			break;
+
+		/* Packets dropped inside the call below are accounted in
+		 * err_pkts, so they are not reported separately here.
+		 */
+		sent = eth_af_packet_tx_internal(queue, bufs, nb_pkts);
+		num_flushed += sent;
+
+		/* commit only the packets that were consumed */
+		rte_ring_dequeue_finish(pkt_q->buf, sent);
+		if (unlikely(sent < nb_pkts))
+			break; /* partial send: keep the rest for the next flush */
+	}
+
+	return num_flushed;
+}
+
 static int
 eth_dev_start(struct rte_eth_dev *dev)
 {
@@ -637,6 +695,7 @@ static const struct eth_dev_ops ops = {
 	.link_update = eth_link_update,
 	.stats_get = eth_stats_get,
 	.stats_reset = eth_stats_reset,
+	.tx_done_cleanup = eth_af_packet_tx_flush,
 };
 
 /*
@@ -668,6 +727,7 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
 			unsigned int framesize,
 			unsigned int framecnt,
 			unsigned int qdisc_bypass,
+			unsigned int explicit_flush,
 			struct pmd_internals **internals,
 			struct rte_eth_dev **eth_dev,
 			struct rte_kvargs *kvlist)
@@ -885,6 +945,18 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
 			goto error;
 		}
 
+		char buf_name[RTE_RING_NAMESIZE];
+		snprintf(buf_name, RTE_RING_NAMESIZE, "%s:txq%u", name, q);
+		tx_queue->buf = rte_ring_create(buf_name, tx_queue->framecount,
+				numa_node, RING_F_SP_ENQ | RING_F_SC_DEQ | RING_F_EXACT_SZ);
+		if (tx_queue->buf == NULL) {
+			PMD_LOG(ERR,
+				"%s: could not create ring buffer. err=%s",
+				buf_name, rte_strerror(rte_errno));
+			goto error;
+		}
+		tx_queue->explicit_flush = explicit_flush;
+
 #if defined(PACKET_FANOUT)
 		rc = setsockopt(qsockfd, SOL_PACKET, PACKET_FANOUT,
 				&fanout_arg, sizeof(fanout_arg));
@@ -962,6 +1034,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
 	unsigned int framecount = DFLT_FRAME_COUNT;
 	unsigned int qpairs = 1;
 	unsigned int qdisc_bypass = 1;
+	unsigned int explicit_flush = 0;
 
 	/* do some parameter checking */
 	if (*sockfd < 0)
@@ -1024,6 +1097,16 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
 			}
 			continue;
 		}
+		if (strstr(pair->key, ETH_AF_PACKET_EXPLICIT_FLUSH_ARG) != NULL) {
+			explicit_flush = atoi(pair->value);
+			if (explicit_flush > 1) {
+				PMD_LOG(ERR,
+					"%s: invalid explicit_flush value",
+					name);
+				return -1;
+			}
+			continue;
+		}
 	}
 
 	if (framesize > blocksize) {
@@ -1049,7 +1132,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
 	if (rte_pmd_init_internals(dev, *sockfd, qpairs,
 				   blocksize, blockcount,
 				   framesize, framecount,
-				   qdisc_bypass,
+				   qdisc_bypass, explicit_flush,
 				   &internals, &eth_dev,
 				   kvlist) < 0)
 		return -1;
@@ -1146,4 +1229,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_packet,
 	"blocksz= "
 	"framesz= "
 	"framecnt= "
-	"qdisc_bypass=<0|1>");
+	"qdisc_bypass=<0|1> "
+	"explicit_flush=<0|1>");
-- 
2.34.1
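The flush callback in this patch follows a peek-then-commit pattern: `rte_ring_dequeue_burst_start()` exposes a burst without removing it, `eth_af_packet_tx_internal()` consumes some or all of it, and `rte_ring_dequeue_finish()` removes only that many entries. The self-contained model below is a hypothetical sketch of that pattern; every `model_*` name is illustrative and none of it is DPDK API. It also shows why such a loop needs an exit when the backend consumes fewer packets than were peeked: the unsent packets stay at the head of the buffer, so without that check the next iteration peeks the same packets again and, if the backend stays stalled, never terminates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not DPDK code: all names below are illustrative. */
#define MODEL_CAP   64u /* power of two, so the modulo wraps cleanly */
#define MODEL_BURST 4u

struct model_ring {
	int items[MODEL_CAP];
	uint32_t head; /* consumer index */
	uint32_t tail; /* producer index */
};

static void
model_push(struct model_ring *r, int v)
{
	assert(r->tail - r->head < MODEL_CAP); /* no overflow in this model */
	r->items[r->tail++ % MODEL_CAP] = v;
}

/* Peek up to n items without consuming them (the "start" half). */
static uint32_t
model_peek(const struct model_ring *r, int *out, uint32_t n)
{
	uint32_t avail = r->tail - r->head;
	uint32_t i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		out[i] = r->items[(r->head + i) % MODEL_CAP];
	return n;
}

/* Consume the first n peeked items (the "finish" half). */
static void
model_commit(struct model_ring *r, uint32_t n)
{
	r->head += n;
}

/* Backend that accepts at most *budget more packets, standing in for a
 * kernel ring that fills up or a poll() that fails. */
static uint32_t
model_send(uint32_t *budget, const int *pkts, uint32_t n)
{
	uint32_t k = n < *budget ? n : *budget;

	(void)pkts;
	*budget -= k;
	return k;
}

/* Flush loop: peek a burst, send it, commit only what was sent, and stop
 * as soon as the backend takes fewer packets than were peeked. */
static uint32_t
model_flush(struct model_ring *r, uint32_t *budget)
{
	int burst[MODEL_BURST];
	uint32_t total = 0;

	for (;;) {
		uint32_t n = model_peek(r, burst, MODEL_BURST);
		uint32_t sent;

		if (n == 0)
			break; /* buffer drained */
		sent = model_send(budget, burst, n);
		model_commit(r, sent); /* unsent items stay buffered */
		total += sent;
		if (sent < n)
			break; /* without this, a stalled backend spins forever */
	}
	return total;
}
```

Because unsent packets are never committed, they remain at the head of the buffer and are retried first by the next flush call, which keeps the transmit order intact.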