From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 69395A0C43;
	Mon, 11 Oct 2021 03:18:15 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id E274740E01;
	Mon, 11 Oct 2021 03:18:14 +0200 (CEST)
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187])
 by mails.dpdk.org (Postfix) with ESMTP id E371E40142
 for <dev@dpdk.org>; Mon, 11 Oct 2021 03:18:12 +0200 (CEST)
Received: from dggemv703-chm.china.huawei.com (unknown [172.30.72.56])
 by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4HSLVC543KzW8hy;
 Mon, 11 Oct 2021 09:16:35 +0800 (CST)
Received: from dggpeml500024.china.huawei.com (7.185.36.10) by
 dggemv703-chm.china.huawei.com (10.3.19.46) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
 15.1.2308.8; Mon, 11 Oct 2021 09:18:10 +0800
Received: from [127.0.0.1] (10.67.100.224) by dggpeml500024.china.huawei.com
 (7.185.36.10) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.8; Mon, 11 Oct
 2021 09:18:09 +0800
From: fengchengwen <fengchengwen@huawei.com>
To: Konstantin Ananyev <konstantin.ananyev@intel.com>, <dev@dpdk.org>
CC: <xiaoyun.li@intel.com>, <anoobj@marvell.com>, <jerinj@marvell.com>,
 <ndabilpuram@marvell.com>, <adwivedi@marvell.com>,
 <shepard.siegel@atomicrules.com>, <ed.czeck@atomicrules.com>,
 <john.miller@atomicrules.com>, <irusskikh@marvell.com>,
 <ajit.khaparde@broadcom.com>, <somnath.kotur@broadcom.com>,
 <rahul.lakkireddy@chelsio.com>, <hemant.agrawal@nxp.com>,
 <sachin.saxena@oss.nxp.com>, <haiyue.wang@intel.com>, <johndale@cisco.com>,
 <hyonkim@cisco.com>, <qi.z.zhang@intel.com>, <xiao.w.wang@intel.com>,
 <humin29@huawei.com>, <yisen.zhuang@huawei.com>, <oulijun@huawei.com>,
 <beilei.xing@intel.com>, <jingjing.wu@intel.com>, <qiming.yang@intel.com>,
 <matan@nvidia.com>, <viacheslavo@nvidia.com>, <sthemmin@microsoft.com>,
 <longli@microsoft.com>, <heinrich.kuhn@corigine.com>,
 <kirankumark@marvell.com>, <andrew.rybchenko@oktetlabs.ru>,
 <mczekaj@marvell.com>, <jiawenwu@trustnetic.com>, <jianwang@trustnetic.com>,
 <maxime.coquelin@redhat.com>, <chenbo.xia@intel.com>, <thomas@monjalon.net>,
 <ferruh.yigit@intel.com>, <mdr@ashroe.eu>, <jay.jayatheerthan@intel.com>
References: <20211004135603.20593-1-konstantin.ananyev@intel.com>
 <20211007112750.25526-1-konstantin.ananyev@intel.com>
 <20211007112750.25526-5-konstantin.ananyev@intel.com>
 <4c57bb9d-21d6-0722-92b8-987283bb8fe6@huawei.com>
Message-ID: <2f8c4eba-879f-7ee8-28c3-f8c23f1f885d@huawei.com>
Date: Mon, 11 Oct 2021 09:18:09 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
 Thunderbird/68.11.0
MIME-Version: 1.0
In-Reply-To: <4c57bb9d-21d6-0722-92b8-987283bb8fe6@huawei.com>
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.67.100.224]
X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To
 dggpeml500024.china.huawei.com (7.185.36.10)
X-CFilter-Loop: Reflected
Subject: Re: [dpdk-dev] [PATCH v5 4/7] ethdev: copy fast-path API into
 separate structure
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Sorry for the self-reply.

I think it would be better for 'struct rte_eth_dev' to hold a pointer to its
'struct rte_eth_fp_ops' entry, e.g.

	struct rte_eth_dev {
		struct rte_eth_fp_ops *fp_ops;
		...  /* other fields */
	};

The ethdev framework would set the pointer in rte_eth_dev_pci_allocate(), and the
driver would fill in the corresponding callbacks:
	dev->fp_ops->rx_pkt_burst = xxx_recv_pkts;
	dev->fp_ops->tx_pkt_burst = xxx_xmit_pkts;
	...

In this way, the behavior of the primary and secondary processes can be unified, and
the driver-side flow stays basically the same as the original one.
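
To make the idea concrete, here is a minimal standalone sketch. It does not use
the real DPDK headers: 'struct fp_ops', 'struct eth_dev', dev_allocate(),
xxx_dev_init() and MAX_PORTS are reduced, hypothetical stand-ins for
rte_eth_fp_ops, rte_eth_dev, the framework allocation helper, a driver init
routine and RTE_MAX_ETHPORTS. It only illustrates the binding, it is not a
proposed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 4 /* stand-in for RTE_MAX_ETHPORTS */

struct mbuf; /* opaque stand-in for struct rte_mbuf */

typedef uint16_t (*burst_fn)(void *q, struct mbuf **pkts, uint16_t n);

/* Reduced stand-in for struct rte_eth_fp_ops. */
struct fp_ops {
	burst_fn rx_pkt_burst;
	burst_fn tx_pkt_burst;
};

/* Reduced stand-in for struct rte_eth_dev, with the proposed pointer. */
struct eth_dev {
	struct fp_ops *fp_ops;
	uint16_t port_id;
};

static struct fp_ops fp_ops_tbl[MAX_PORTS]; /* flat fast-path array */
static struct eth_dev devices[MAX_PORTS];

/* Framework side: bind the device to its fast-path slot at allocation. */
static struct eth_dev *
dev_allocate(uint16_t port_id)
{
	assert(port_id < MAX_PORTS);
	devices[port_id].port_id = port_id;
	devices[port_id].fp_ops = &fp_ops_tbl[port_id];
	return &devices[port_id];
}

/* Hypothetical driver burst functions; they just echo the request size. */
static uint16_t
xxx_recv_pkts(void *q, struct mbuf **pkts, uint16_t n)
{
	(void)q; (void)pkts;
	return n;
}

static uint16_t
xxx_xmit_pkts(void *q, struct mbuf **pkts, uint16_t n)
{
	(void)q; (void)pkts;
	return n;
}

/* Driver side: the same code can run in primary and secondary processes. */
static void
xxx_dev_init(struct eth_dev *dev)
{
	dev->fp_ops->rx_pkt_burst = xxx_recv_pkts;
	dev->fp_ops->tx_pkt_burst = xxx_xmit_pkts;
}

/* Fast path: indexes the flat array only and never reads struct eth_dev. */
static inline uint16_t
rx_burst(uint16_t port_id, void *q, struct mbuf **pkts, uint16_t n)
{
	return fp_ops_tbl[port_id].rx_pkt_burst(q, pkts, n);
}
```

With this shape the driver writes straight into the slot that the inline
fast-path functions read, so no separate copy step is needed after the
callbacks are set.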


On 2021/10/9 20:05, fengchengwen wrote:
> On 2021/10/7 19:27, Konstantin Ananyev wrote:
>> Copy public function pointers (rx_pkt_burst(), etc.) and related
>> pointers to internal data from rte_eth_dev structure into a
>> separate flat array. That array will remain in a public header.
>> The intention here is to make rte_eth_dev and related structures internal.
>> That should allow future possible changes to core eth_dev structures
>> to be transparent to the user and help to avoid ABI/API breakages.
>> The plan is to keep minimal part of data from rte_eth_dev public,
>> so we still can use inline functions for fast-path calls
>> (like rte_eth_rx_burst(), etc.) to avoid/minimize slowdown.
>> The whole idea beyond this new schema:
>> 1. PMDs keep to setup fast-path function pointers and related data
>>    inside rte_eth_dev struct in the same way they did it before.
>> 2. Inside rte_eth_dev_start() and inside rte_eth_dev_probing_finish()
>>    (for secondary process) we call eth_dev_fp_ops_setup, which
>>    copies these function and data pointers into rte_eth_fp_ops[port_id].
>> 3. Inside rte_eth_dev_stop() and inside rte_eth_dev_release_port()
>>    we call eth_dev_fp_ops_reset(), which resets rte_eth_fp_ops[port_id]
>>    into some dummy values.
>> 4. fast-path ethdev API (rte_eth_rx_burst(), etc.) will use that new
>>    flat array to call PMD specific functions.
>> That approach should allow us to make rte_eth_devices[] private
>> without introducing regression and help to avoid changes in drivers code.
>>
>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>> ---
>>  lib/ethdev/ethdev_private.c  | 52 ++++++++++++++++++++++++++++++++++
>>  lib/ethdev/ethdev_private.h  |  7 +++++
>>  lib/ethdev/rte_ethdev.c      | 27 ++++++++++++++++++
>>  lib/ethdev/rte_ethdev_core.h | 55 ++++++++++++++++++++++++++++++++++++
>>  4 files changed, 141 insertions(+)
>>
>> diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
>> index 012cf73ca2..3eeda6e9f9 100644
>> --- a/lib/ethdev/ethdev_private.c
>> +++ b/lib/ethdev/ethdev_private.c
>> @@ -174,3 +174,55 @@ rte_eth_devargs_parse_representor_ports(char *str, void *data)
>>  		RTE_LOG(ERR, EAL, "wrong representor format: %s\n", str);
>>  	return str == NULL ? -1 : 0;
>>  }
>> +
>> +static uint16_t
>> +dummy_eth_rx_burst(__rte_unused void *rxq,
>> +		__rte_unused struct rte_mbuf **rx_pkts,
>> +		__rte_unused uint16_t nb_pkts)
>> +{
>> +	RTE_ETHDEV_LOG(ERR, "rx_pkt_burst for unconfigured port\n");
>> +	rte_errno = ENOTSUP;
>> +	return 0;
>> +}
>> +
>> +static uint16_t
>> +dummy_eth_tx_burst(__rte_unused void *txq,
>> +		__rte_unused struct rte_mbuf **tx_pkts,
>> +		__rte_unused uint16_t nb_pkts)
>> +{
>> +	RTE_ETHDEV_LOG(ERR, "tx_pkt_burst for unconfigured port\n");
>> +	rte_errno = ENOTSUP;
>> +	return 0;
>> +}
>> +
>> +void
>> +eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
> 
> A port_id parameter would be preferable here; it hides rte_eth_fp_ops as much as possible.
> 
>> +{
>> +	static void *dummy_data[RTE_MAX_QUEUES_PER_PORT];
>> +	static const struct rte_eth_fp_ops dummy_ops = {
>> +		.rx_pkt_burst = dummy_eth_rx_burst,
>> +		.tx_pkt_burst = dummy_eth_tx_burst,
>> +		.rxq = {.data = dummy_data, .clbk = dummy_data,},
>> +		.txq = {.data = dummy_data, .clbk = dummy_data,},
>> +	};
>> +
>> +	*fpo = dummy_ops;
>> +}
>> +
>> +void
>> +eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
>> +		const struct rte_eth_dev *dev)
> 
> Because fp_ops and eth_dev are in one-to-one correspondence, it would be better to
> take only a port_id parameter.
> 
>> +{
>> +	fpo->rx_pkt_burst = dev->rx_pkt_burst;
>> +	fpo->tx_pkt_burst = dev->tx_pkt_burst;
>> +	fpo->tx_pkt_prepare = dev->tx_pkt_prepare;
>> +	fpo->rx_queue_count = dev->rx_queue_count;
>> +	fpo->rx_descriptor_status = dev->rx_descriptor_status;
>> +	fpo->tx_descriptor_status = dev->tx_descriptor_status;
>> +
>> +	fpo->rxq.data = dev->data->rx_queues;
>> +	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
>> +
>> +	fpo->txq.data = dev->data->tx_queues;
>> +	fpo->txq.clbk = (void **)(uintptr_t)dev->pre_tx_burst_cbs;
>> +}
>> diff --git a/lib/ethdev/ethdev_private.h b/lib/ethdev/ethdev_private.h
>> index 3724429577..5721be7bdc 100644
>> --- a/lib/ethdev/ethdev_private.h
>> +++ b/lib/ethdev/ethdev_private.h
>> @@ -26,4 +26,11 @@ eth_find_device(const struct rte_eth_dev *_start, rte_eth_cmp_t cmp,
>>  /* Parse devargs value for representor parameter. */
>>  int rte_eth_devargs_parse_representor_ports(char *str, void *data);
>>  
>> +/* reset eth fast-path API to dummy values */
>> +void eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo);
>> +
>> +/* setup eth fast-path API to ethdev values */
>> +void eth_dev_fp_ops_setup(struct rte_eth_fp_ops *fpo,
>> +		const struct rte_eth_dev *dev);
> 
> Some drivers switch the transmit/receive functions during operation. E.g.
> in the hns3 driver, when a reset is detected, the primary process sets the rx/tx burst
> functions to dummies; after the reset has been handled, it restores the correct rx/tx
> burst functions. During this time the send and receive threads keep running, but the
> burst functions they call change. So:
> 1. it is recommended that the trace (the log call) be removed from the dummy functions.
> 2. make the eth_dev_fp_ops_reset/setup interface public for driver usage.
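
A rough standalone sketch of the switching pattern described above, under
stated assumptions: 'struct fp_ops', xxx_reset_begin() and xxx_reset_done()
are hypothetical names, and the C11 atomic function pointer is a choice made
for this sketch only, not something the patch or the hns3 driver is known to
use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct mbuf; /* opaque stand-in for struct rte_mbuf */

typedef uint16_t (*burst_fn)(void *q, struct mbuf **pkts, uint16_t n);

/* Reduced stand-in for one fast-path entry (atomic type is an assumption). */
struct fp_ops {
	_Atomic(burst_fn) rx_pkt_burst;
};

static struct fp_ops port_ops;

/* Silent dummy: no log call, since it may run on every poll during a reset. */
static uint16_t
dummy_rx_burst(void *q, struct mbuf **pkts, uint16_t n)
{
	(void)q; (void)pkts; (void)n;
	return 0;
}

/* Hypothetical real receive function; echoes the request size. */
static uint16_t
xxx_recv_pkts(void *q, struct mbuf **pkts, uint16_t n)
{
	(void)q; (void)pkts;
	return n;
}

/* Driver: reset detected, park the datapath on the dummy. */
static void
xxx_reset_begin(struct fp_ops *ops)
{
	atomic_store_explicit(&ops->rx_pkt_burst, dummy_rx_burst,
			memory_order_release);
}

/* Driver: reset handled, restore the real receive function. */
static void
xxx_reset_done(struct fp_ops *ops)
{
	atomic_store_explicit(&ops->rx_pkt_burst, xxx_recv_pkts,
			memory_order_release);
}

/* Datapath thread: loads whichever function is currently installed. */
static uint16_t
rx_burst(struct fp_ops *ops, void *q, struct mbuf **pkts, uint16_t n)
{
	burst_fn fn = atomic_load_explicit(&ops->rx_pkt_burst,
			memory_order_acquire);
	return fn(q, pkts, n);
}
```

Note that swapping the pointer does not wait for a burst call that is already
in flight, so a real driver still needs its own way to let running calls
drain before it touches the hardware.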
> 
>> +
>>  #endif /* _ETH_PRIVATE_H_ */
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>> index c8abda6dd7..9f7a0cbb8c 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -44,6 +44,9 @@
>>  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
>>  struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
>>  
>> +/* public fast-path API */
>> +struct rte_eth_fp_ops rte_eth_fp_ops[RTE_MAX_ETHPORTS];
>> +
>>  /* spinlock for eth device callbacks */
>>  static rte_spinlock_t eth_dev_cb_lock = RTE_SPINLOCK_INITIALIZER;
>>  
>> @@ -578,6 +581,8 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)
>>  		rte_eth_dev_callback_process(eth_dev,
>>  				RTE_ETH_EVENT_DESTROY, NULL);
>>  
>> +	eth_dev_fp_ops_reset(rte_eth_fp_ops + eth_dev->data->port_id);
>> +
>>  	rte_spinlock_lock(&eth_dev_shared_data->ownership_lock);
>>  
>>  	eth_dev->state = RTE_ETH_DEV_UNUSED;
>> @@ -1787,6 +1792,9 @@ rte_eth_dev_start(uint16_t port_id)
>>  		(*dev->dev_ops->link_update)(dev, 0);
>>  	}
>>  
>> +	/* expose selection of PMD fast-path functions */
>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + port_id, dev);
>> +
>>  	rte_ethdev_trace_start(port_id);
>>  	return 0;
>>  }
>> @@ -1809,6 +1817,9 @@ rte_eth_dev_stop(uint16_t port_id)
>>  		return 0;
>>  	}
>>  
>> +	/* point fast-path functions to dummy ones */
>> +	eth_dev_fp_ops_reset(rte_eth_fp_ops + port_id);
>> +
>>  	dev->data->dev_started = 0;
>>  	ret = (*dev->dev_ops->dev_stop)(dev);
>>  	rte_ethdev_trace_stop(port_id, ret);
>> @@ -4567,6 +4578,14 @@ rte_eth_mirror_rule_reset(uint16_t port_id, uint8_t rule_id)
>>  	return eth_err(port_id, (*dev->dev_ops->mirror_rule_reset)(dev, rule_id));
>>  }
>>  
>> +RTE_INIT(eth_dev_init_fp_ops)
>> +{
>> +	uint32_t i;
>> +
>> +	for (i = 0; i != RTE_DIM(rte_eth_fp_ops); i++)
>> +		eth_dev_fp_ops_reset(rte_eth_fp_ops + i);
>> +}
>> +
>>  RTE_INIT(eth_dev_init_cb_lists)
>>  {
>>  	uint16_t i;
>> @@ -4735,6 +4754,14 @@ rte_eth_dev_probing_finish(struct rte_eth_dev *dev)
>>  	if (dev == NULL)
>>  		return;
>>  
>> +	/*
>> +	 * for secondary process, at that point we expect device
>> +	 * to be already 'usable', so shared data and all function pointers
>> +	 * for fast-path devops have to be setup properly inside rte_eth_dev.
>> +	 */
>> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
>> +		eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>> +
>>  	rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_NEW, NULL);
>>  
>>  	dev->state = RTE_ETH_DEV_ATTACHED;
>> diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
>> index 51cd68de94..d5853dff86 100644
>> --- a/lib/ethdev/rte_ethdev_core.h
>> +++ b/lib/ethdev/rte_ethdev_core.h
>> @@ -50,6 +50,61 @@ typedef int (*eth_rx_descriptor_status_t)(void *rxq, uint16_t offset);
>>  typedef int (*eth_tx_descriptor_status_t)(void *txq, uint16_t offset);
>>  /**< @internal Check the status of a Tx descriptor */
>>  
>> +/**
>> + * @internal
>> + * Structure used to hold opaque pointers to internal ethdev Rx/Tx
>> + * queues data.
>> + * The main purpose to expose these pointers at all - allow compiler
>> + * to fetch this data for fast-path ethdev inline functions in advance.
>> + */
>> +struct rte_ethdev_qdata {
>> +	void **data;
>> +	/**< points to array of internal queue data pointers */
>> +	void **clbk;
>> +	/**< points to array of queue callback data pointers */
>> +};
>> +
>> +/**
>> + * @internal
>> + * fast-path ethdev functions and related data are hold in a flat array.
>> + * One entry per ethdev.
>> + * On 64-bit systems contents of this structure occupy exactly two 64B lines.
>> + * On 32-bit systems contents of this structure fits into one 64B line.
>> + */
>> +struct rte_eth_fp_ops {
>> +
>> +	/**
>> +	 * Rx fast-path functions and related data.
>> +	 * 64-bit systems: occupies first 64B line
>> +	 */
>> +	eth_rx_burst_t rx_pkt_burst;
>> +	/**< PMD receive function. */
>> +	eth_rx_queue_count_t rx_queue_count;
>> +	/**< Get the number of used RX descriptors. */
>> +	eth_rx_descriptor_status_t rx_descriptor_status;
>> +	/**< Check the status of a Rx descriptor. */
>> +	struct rte_ethdev_qdata rxq;
>> +	/**< Rx queues data. */
>> +	uintptr_t reserved1[3];
>> +
>> +	/**
>> +	 * Tx fast-path functions and related data.
>> +	 * 64-bit systems: occupies second 64B line
>> +	 */
>> +	eth_tx_burst_t tx_pkt_burst;
> 
> Why not place rx_pkt_burst/tx_pkt_burst/rxq/txq in the first cache line?
> The other functions, e.g. rx_queue_count/descriptor_status, are called infrequently.
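
One possible layout for the question above, as a standalone sketch. The type
names (burst_fn, count_fn, status_fn, 'struct qdata', 'struct
fp_ops_hot_first') are stand-ins, the 64-byte line size is assumed, and the
split between hot and cold fields follows the comment above rather than any
measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size */

struct mbuf; /* opaque stand-in for struct rte_mbuf */

typedef uint16_t (*burst_fn)(void *q, struct mbuf **pkts, uint16_t n);
typedef uint32_t (*count_fn)(void *q);
typedef int (*status_fn)(void *q, uint16_t offset);

/* Stand-in for struct rte_ethdev_qdata. */
struct qdata {
	void **data;
	void **clbk;
};

struct fp_ops_hot_first {
	/* First line: everything touched on each Rx/Tx burst. */
	_Alignas(CACHE_LINE) burst_fn rx_pkt_burst;
	burst_fn tx_pkt_burst;
	struct qdata rxq;
	struct qdata txq;

	/* Second line: functions assumed to be called less often. */
	_Alignas(CACHE_LINE) burst_fn tx_pkt_prepare;
	count_fn rx_queue_count;
	status_fn rx_descriptor_status;
	status_fn tx_descriptor_status;
};

/* The hot fields must end within the first line. */
static_assert(offsetof(struct fp_ops_hot_first, txq) + sizeof(struct qdata)
		<= CACHE_LINE, "hot fields spill out of the first line");
/* The cold fields must start exactly at the second line. */
static_assert(offsetof(struct fp_ops_hot_first, tx_pkt_prepare) == CACHE_LINE,
		"cold fields do not start on the second line");
/* The whole entry stays at two lines, as in the original layout. */
static_assert(sizeof(struct fp_ops_hot_first) == 2 * CACHE_LINE,
		"entry is not exactly two cache lines");
```

On a 64-bit build the hot part takes 48 bytes, so it fits in one line with
room to spare. The trade-off is that Rx-only and Tx-only threads now share
that line, which is harmless as long as the entry is read-mostly.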
> 
>> +	/**< PMD transmit function. */
>> +	eth_tx_prep_t tx_pkt_prepare;
>> +	/**< PMD transmit prepare function. */
>> +	eth_tx_descriptor_status_t tx_descriptor_status;
>> +	/**< Check the status of a Tx descriptor. */
>> +	struct rte_ethdev_qdata txq;
>> +	/**< Tx queues data. */
>> +	uintptr_t reserved2[3];
>> +
>> +} __rte_cache_aligned;
>> +
>> +extern struct rte_eth_fp_ops rte_eth_fp_ops[RTE_MAX_ETHPORTS];
>> +
>>  
>>  /**
>>   * @internal
>>