From: Jerin Jacob <jerinjacobk@gmail.com>
To: Abdullah Sevincer <abdullah.sevincer@intel.com>
Cc: dev@dpdk.org, jerinj@marvell.com
Subject: Re: [PATCH v12 1/3] event/dlb2: add producer port probing optimization
Date: Fri, 30 Sep 2022 13:58:04 +0530 [thread overview]
Message-ID: <CALBAE1ODWtfB30QDU8_W5tVf3siKCfDKv0mVddj1pyZn7UjQ8A@mail.gmail.com> (raw)
In-Reply-To: <20220929235900.1761461-1-abdullah.sevincer@intel.com>
On Fri, Sep 30, 2022 at 5:29 AM Abdullah Sevincer
<abdullah.sevincer@intel.com> wrote:
>
> For best performance, applications running on certain cores should use
> the DLB device locally available on the same tile along with other
> resources. To allocate optimal resources, probing is done for each
> producer port (PP) for a given CPU, and the best-performing ports are
> allocated to producers. The CPU used for probing is either the first
> core of the producer coremask (if present) or the second core of the
> EAL coremask. This will be extended later to probe for all CPUs in the
> producer coremask or EAL coremask.
>
> Producer coremask can be passed along with the BDF of the DLB devices.
> "-a xx:y.z,producer_coremask=<core_mask>"
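For illustration, the "first core of the producer coremask" selection above can
be sketched as below. This is my simplified reading, not the driver's code: the
actual implementation goes through rte_eal_parse_coremask(), which also handles
core ordering and RTE_MAX_LCORE bounds.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Return the lowest core bit set in a hex coremask string, or -1 if none.
 * A mask like "0x5" selects cores 0 and 2; the first of these (core 0)
 * would be the probing CPU in this scheme.
 */
static int first_core_in_mask(const char *mask)
{
	unsigned long long m = strtoull(mask, NULL, 16);
	int i;

	for (i = 0; i < 64; i++)
		if (m & (1ULL << i))
			return i;
	return -1;
}
```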
>
> Applications also need to pass RTE_EVENT_PORT_CFG_HINT_PRODUCER during
> rte_event_port_setup() for producer ports for optimal port allocation.
>
> For optimal load balancing, ports that map to one or more QIDs in
> common should not be in numerical sequence. The port->QID mapping is
> application dependent, but the driver interleaves port IDs as much as
> possible to reduce the likelihood of sequential ports mapping to the
> same QID(s).
>
> Hence, DLB uses an initial allocation of port IDs that maximizes the
> average distance between an ID and its immediate neighbors. This
> initial port allocation can be selected by passing the devarg
> "default_port_allocation=y" (or "Y").
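The initial allocation table in the patch (0, 7, 14, 5, 12, ...) looks like a
fixed stride-7 walk within each group of 16 ports, which spreads neighboring
slots far apart because 7 is coprime with 16. A sketch that reproduces the
first entries (this is my reconstruction of the pattern, not how the driver
derives the table):

```c
#include <assert.h>

/*
 * Hypothetical reconstruction of the interleaving: within each block of
 * 16 port IDs, step by 7 modulo 16, so consecutive allocation slots land
 * on IDs that are far apart numerically.
 */
static int interleaved_port_id(int i)
{
	int block = i / 16;         /* which group of 16 ports */
	int slot = (i * 7) % 16;    /* stride-7 walk inside the group */

	return block * 16 + slot;
}
```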
>
> When events are dropped by workers or consumers that use LDB ports,
> completions are sent, which are just enqueues (ENQs) and may impact
> latency. To address this, probing is done for LDB ports as well,
> per class of service ('cos'). When the default cos is used, ports are
> allocated from the best ports of the best-performing 'cos'; otherwise
> from the best ports of the requested cos.
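The probe-and-rank step described above boils down to: measure an enqueue cost
per port, then qsort ascending by cost so the cheapest port is picked first.
A toy sketch with fabricated cycle counts (the struct and comparator mirror
dlb2_pp_thread_data/dlb2_pp_cycle_comp in the patch; the names here are mine):

```c
#include <assert.h>
#include <stdlib.h>

struct pp_sample {
	int pp;      /* producer port ID */
	int cycles;  /* measured cost of probing enqueues on this port */
};

/* Ascending comparator: cheapest (fastest) port sorts first. */
static int cmp_cycles(const void *a, const void *b)
{
	const struct pp_sample *x = a;
	const struct pp_sample *y = b;

	return x->cycles - y->cycles;
}

/* Rank ports by measured cost; the best port ends up at s[0]. */
static void rank_ports(struct pp_sample *s, int n)
{
	qsort(s, n, sizeof(*s), cmp_cycles);
}
```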
>
> Signed-off-by: Abdullah Sevincer <abdullah.sevincer@intel.com>
Changed the subject to "event/dlb2: optimize producer port probing".
Series applied to dpdk-next-net-eventdev/for-main. Thanks
> ---
> doc/guides/eventdevs/dlb2.rst | 36 +++
> drivers/event/dlb2/dlb2.c | 72 +++++-
> drivers/event/dlb2/dlb2_priv.h | 7 +
> drivers/event/dlb2/dlb2_user.h | 1 +
> drivers/event/dlb2/pf/base/dlb2_hw_types.h | 5 +
> drivers/event/dlb2/pf/base/dlb2_resource.c | 250 ++++++++++++++++++++-
> drivers/event/dlb2/pf/base/dlb2_resource.h | 15 +-
> drivers/event/dlb2/pf/dlb2_main.c | 9 +-
> drivers/event/dlb2/pf/dlb2_main.h | 23 +-
> drivers/event/dlb2/pf/dlb2_pf.c | 23 +-
> 10 files changed, 413 insertions(+), 28 deletions(-)
>
> diff --git a/doc/guides/eventdevs/dlb2.rst b/doc/guides/eventdevs/dlb2.rst
> index 5b21f13b68..f5bf5757c6 100644
> --- a/doc/guides/eventdevs/dlb2.rst
> +++ b/doc/guides/eventdevs/dlb2.rst
> @@ -414,3 +414,39 @@ Note that the weight may not exceed the maximum CQ depth.
> --allow ea:00.0,cq_weight=all:<weight>
> --allow ea:00.0,cq_weight=qidA-qidB:<weight>
> --allow ea:00.0,cq_weight=qid:<weight>
> +
> +Producer Coremask
> +~~~~~~~~~~~~~~~~~
> +
> +For best performance, applications running on certain cores should use
> +the DLB device locally available on the same tile along with other
> +resources. To allocate optimal resources, probing is done for each
> +producer port (PP) for a given CPU, and the best-performing ports are
> +allocated to producers. The CPU used for probing is either the first
> +core of the producer coremask (if present) or the second core of the
> +EAL coremask. This will be extended later to probe for all CPUs in the
> +producer coremask or EAL coremask. Producer coremask can be passed
> +along with the BDF of the DLB devices.
> +
> + .. code-block:: console
> +
> + -a xx:y.z,producer_coremask=<core_mask>
> +
> +Default LDB Port Allocation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For optimal load balancing, ports that map to one or more QIDs in common
> +should not be in numerical sequence. The port->QID mapping is application
> +dependent, but the driver interleaves port IDs as much as possible to
> +reduce the likelihood of sequential ports mapping to the same QID(s).
> +
> +Hence, DLB uses an initial allocation of port IDs that maximizes the
> +average distance between an ID and its immediate neighbors (i.e. the
> +distance from 1 to 0 and to 2, the distance from 2 to 1 and to 3, etc.).
> +The initial port allocation option is passed through a devarg: if it is
> +y (or Y), the initial port allocation is used; otherwise it is not
> +used.
> +
> + .. code-block:: console
> +
> + --allow ea:00.0,default_port_allocation=<y/Y>
> diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> index 759578378f..6a9db4b642 100644
> --- a/drivers/event/dlb2/dlb2.c
> +++ b/drivers/event/dlb2/dlb2.c
> @@ -293,6 +293,23 @@ dlb2_string_to_int(int *result, const char *str)
> return 0;
> }
>
> +static int
> +set_producer_coremask(const char *key __rte_unused,
> + const char *value,
> + void *opaque)
> +{
> + const char **mask_str = opaque;
> +
> + if (value == NULL || opaque == NULL) {
> + DLB2_LOG_ERR("NULL pointer\n");
> + return -EINVAL;
> + }
> +
> + *mask_str = value;
> +
> + return 0;
> +}
> +
> static int
> set_numa_node(const char *key __rte_unused, const char *value, void *opaque)
> {
> @@ -617,6 +634,26 @@ set_vector_opts_enab(const char *key __rte_unused,
> return 0;
> }
>
> +static int
> +set_default_ldb_port_allocation(const char *key __rte_unused,
> + const char *value,
> + void *opaque)
> +{
> + bool *default_ldb_port_allocation = opaque;
> +
> + if (value == NULL || opaque == NULL) {
> + DLB2_LOG_ERR("NULL pointer\n");
> + return -EINVAL;
> + }
> +
> + if ((*value == 'y') || (*value == 'Y'))
> + *default_ldb_port_allocation = true;
> + else
> + *default_ldb_port_allocation = false;
> +
> + return 0;
> +}
> +
> static int
> set_qid_depth_thresh(const char *key __rte_unused,
> const char *value,
> @@ -1785,6 +1822,9 @@ dlb2_hw_create_dir_port(struct dlb2_eventdev *dlb2,
> } else
> credit_high_watermark = enqueue_depth;
>
> + if (ev_port->conf.event_port_cfg & RTE_EVENT_PORT_CFG_HINT_PRODUCER)
> + cfg.is_producer = 1;
> +
> /* Per QM values */
>
> ret = dlb2_iface_dir_port_create(handle, &cfg, dlb2->poll_mode);
> @@ -1979,6 +2019,10 @@ dlb2_eventdev_port_setup(struct rte_eventdev *dev,
> }
> ev_port->enq_retries = port_conf->enqueue_depth / sw_credit_quanta;
>
> + /* Save off port config for reconfig */
> + ev_port->conf = *port_conf;
> +
> +
> /*
> * Create port
> */
> @@ -2005,9 +2049,6 @@ dlb2_eventdev_port_setup(struct rte_eventdev *dev,
> }
> }
>
> - /* Save off port config for reconfig */
> - ev_port->conf = *port_conf;
> -
> ev_port->id = ev_port_id;
> ev_port->enq_configured = true;
> ev_port->setup_done = true;
> @@ -4700,6 +4741,8 @@ dlb2_parse_params(const char *params,
> DLB2_CQ_WEIGHT,
> DLB2_PORT_COS,
> DLB2_COS_BW,
> + DLB2_PRODUCER_COREMASK,
> + DLB2_DEFAULT_LDB_PORT_ALLOCATION_ARG,
> NULL };
>
> if (params != NULL && params[0] != '\0') {
> @@ -4881,6 +4924,29 @@ dlb2_parse_params(const char *params,
> }
>
>
> + ret = rte_kvargs_process(kvlist,
> + DLB2_PRODUCER_COREMASK,
> + set_producer_coremask,
> + &dlb2_args->producer_coremask);
> + if (ret != 0) {
> + DLB2_LOG_ERR(
> + "%s: Error parsing producer coremask",
> + name);
> + rte_kvargs_free(kvlist);
> + return ret;
> + }
> +
> + ret = rte_kvargs_process(kvlist,
> + DLB2_DEFAULT_LDB_PORT_ALLOCATION_ARG,
> + set_default_ldb_port_allocation,
> + &dlb2_args->default_ldb_port_allocation);
> + if (ret != 0) {
> + DLB2_LOG_ERR("%s: Error parsing ldb default port allocation arg",
> + name);
> + rte_kvargs_free(kvlist);
> + return ret;
> + }
> +
> rte_kvargs_free(kvlist);
> }
> }
> diff --git a/drivers/event/dlb2/dlb2_priv.h b/drivers/event/dlb2/dlb2_priv.h
> index db431f7d8b..9ef5bcb901 100644
> --- a/drivers/event/dlb2/dlb2_priv.h
> +++ b/drivers/event/dlb2/dlb2_priv.h
> @@ -51,6 +51,8 @@
> #define DLB2_CQ_WEIGHT "cq_weight"
> #define DLB2_PORT_COS "port_cos"
> #define DLB2_COS_BW "cos_bw"
> +#define DLB2_PRODUCER_COREMASK "producer_coremask"
> +#define DLB2_DEFAULT_LDB_PORT_ALLOCATION_ARG "default_port_allocation"
>
> /* Begin HW related defines and structs */
>
> @@ -386,6 +388,7 @@ struct dlb2_port {
> uint16_t hw_credit_quanta;
> bool use_avx512;
> uint32_t cq_weight;
> + bool is_producer; /* True if port is of type producer */
> };
>
> /* Per-process per-port mmio and memory pointers */
> @@ -669,6 +672,8 @@ struct dlb2_devargs {
> struct dlb2_cq_weight cq_weight;
> struct dlb2_port_cos port_cos;
> struct dlb2_cos_bw cos_bw;
> + const char *producer_coremask;
> + bool default_ldb_port_allocation;
> };
>
> /* End Eventdev related defines and structs */
> @@ -722,6 +727,8 @@ void dlb2_event_build_hcws(struct dlb2_port *qm_port,
> uint8_t *sched_type,
> uint8_t *queue_id);
>
> +/* Extern functions */
> +extern int rte_eal_parse_coremask(const char *coremask, int *cores);
>
> /* Extern globals */
> extern struct process_local_port_data dlb2_port[][DLB2_NUM_PORT_TYPES];
> diff --git a/drivers/event/dlb2/dlb2_user.h b/drivers/event/dlb2/dlb2_user.h
> index 901e2e0c66..28c6aaaf43 100644
> --- a/drivers/event/dlb2/dlb2_user.h
> +++ b/drivers/event/dlb2/dlb2_user.h
> @@ -498,6 +498,7 @@ struct dlb2_create_dir_port_args {
> __u16 cq_depth;
> __u16 cq_depth_threshold;
> __s32 queue_id;
> + __u8 is_producer;
> };
>
> /*
> diff --git a/drivers/event/dlb2/pf/base/dlb2_hw_types.h b/drivers/event/dlb2/pf/base/dlb2_hw_types.h
> index 9511521e67..87996ef621 100644
> --- a/drivers/event/dlb2/pf/base/dlb2_hw_types.h
> +++ b/drivers/event/dlb2/pf/base/dlb2_hw_types.h
> @@ -249,6 +249,7 @@ struct dlb2_hw_domain {
> struct dlb2_list_head avail_ldb_queues;
> struct dlb2_list_head avail_ldb_ports[DLB2_NUM_COS_DOMAINS];
> struct dlb2_list_head avail_dir_pq_pairs;
> + struct dlb2_list_head rsvd_dir_pq_pairs;
> u32 total_hist_list_entries;
> u32 avail_hist_list_entries;
> u32 hist_list_entry_base;
> @@ -347,6 +348,10 @@ struct dlb2_hw {
> struct dlb2_function_resources vdev[DLB2_MAX_NUM_VDEVS];
> struct dlb2_hw_domain domains[DLB2_MAX_NUM_DOMAINS];
> u8 cos_reservation[DLB2_NUM_COS_DOMAINS];
> + int prod_core_list[RTE_MAX_LCORE];
> + u8 num_prod_cores;
> + int dir_pp_allocations[DLB2_MAX_NUM_DIR_PORTS_V2_5];
> + int ldb_pp_allocations[DLB2_MAX_NUM_LDB_PORTS];
>
> /* Virtualization */
> int virt_mode;
> diff --git a/drivers/event/dlb2/pf/base/dlb2_resource.c b/drivers/event/dlb2/pf/base/dlb2_resource.c
> index 0731416a43..280a8e51b1 100644
> --- a/drivers/event/dlb2/pf/base/dlb2_resource.c
> +++ b/drivers/event/dlb2/pf/base/dlb2_resource.c
> @@ -51,6 +51,7 @@ static void dlb2_init_domain_rsrc_lists(struct dlb2_hw_domain *domain)
> dlb2_list_init_head(&domain->used_dir_pq_pairs);
> dlb2_list_init_head(&domain->avail_ldb_queues);
> dlb2_list_init_head(&domain->avail_dir_pq_pairs);
> + dlb2_list_init_head(&domain->rsvd_dir_pq_pairs);
>
> for (i = 0; i < DLB2_NUM_COS_DOMAINS; i++)
> dlb2_list_init_head(&domain->used_ldb_ports[i]);
> @@ -106,8 +107,10 @@ void dlb2_resource_free(struct dlb2_hw *hw)
> * Return:
> * Returns 0 upon success, <0 otherwise.
> */
> -int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver)
> +int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver, const void *probe_args)
> {
> + const struct dlb2_devargs *args = (const struct dlb2_devargs *)probe_args;
> + bool ldb_port_default = args ? args->default_ldb_port_allocation : false;
> struct dlb2_list_entry *list;
> unsigned int i;
> int ret;
> @@ -122,6 +125,7 @@ int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver)
> * the distance from 1 to 0 and to 2, the distance from 2 to 1 and to
> * 3, etc.).
> */
> +
> const u8 init_ldb_port_allocation[DLB2_MAX_NUM_LDB_PORTS] = {
> 0, 7, 14, 5, 12, 3, 10, 1, 8, 15, 6, 13, 4, 11, 2, 9,
> 16, 23, 30, 21, 28, 19, 26, 17, 24, 31, 22, 29, 20, 27, 18, 25,
> @@ -164,7 +168,10 @@ int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver)
> int cos_id = i >> DLB2_NUM_COS_DOMAINS;
> struct dlb2_ldb_port *port;
>
> - port = &hw->rsrcs.ldb_ports[init_ldb_port_allocation[i]];
> + if (ldb_port_default == true)
> + port = &hw->rsrcs.ldb_ports[init_ldb_port_allocation[i]];
> + else
> + port = &hw->rsrcs.ldb_ports[hw->ldb_pp_allocations[i]];
>
> dlb2_list_add(&hw->pf.avail_ldb_ports[cos_id],
> &port->func_list);
> @@ -172,7 +179,8 @@ int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver)
>
> hw->pf.num_avail_dir_pq_pairs = DLB2_MAX_NUM_DIR_PORTS(hw->ver);
> for (i = 0; i < hw->pf.num_avail_dir_pq_pairs; i++) {
> - list = &hw->rsrcs.dir_pq_pairs[i].func_list;
> + int index = hw->dir_pp_allocations[i];
> + list = &hw->rsrcs.dir_pq_pairs[index].func_list;
>
> dlb2_list_add(&hw->pf.avail_dir_pq_pairs, list);
> }
> @@ -592,6 +600,7 @@ static int dlb2_attach_dir_ports(struct dlb2_hw *hw,
> u32 num_ports,
> struct dlb2_cmd_response *resp)
> {
> + int num_res = hw->num_prod_cores;
> unsigned int i;
>
> if (rsrcs->num_avail_dir_pq_pairs < num_ports) {
> @@ -611,12 +620,19 @@ static int dlb2_attach_dir_ports(struct dlb2_hw *hw,
> return -EFAULT;
> }
>
> + if (num_res) {
> + dlb2_list_add(&domain->rsvd_dir_pq_pairs,
> + &port->domain_list);
> + num_res--;
> + } else {
> + dlb2_list_add(&domain->avail_dir_pq_pairs,
> + &port->domain_list);
> + }
> +
> dlb2_list_del(&rsrcs->avail_dir_pq_pairs, &port->func_list);
>
> port->domain_id = domain->id;
> port->owned = true;
> -
> - dlb2_list_add(&domain->avail_dir_pq_pairs, &port->domain_list);
> }
>
> rsrcs->num_avail_dir_pq_pairs -= num_ports;
> @@ -739,6 +755,199 @@ static int dlb2_attach_ldb_queues(struct dlb2_hw *hw,
> return 0;
> }
>
> +static int
> +dlb2_pp_profile(struct dlb2_hw *hw, int port, int cpu, bool is_ldb)
> +{
> + u64 cycle_start = 0ULL, cycle_end = 0ULL;
> + struct dlb2_hcw hcw_mem[DLB2_HCW_MEM_SIZE], *hcw;
> + void __iomem *pp_addr;
> + cpu_set_t cpuset;
> + int i;
> +
> + CPU_ZERO(&cpuset);
> + CPU_SET(cpu, &cpuset);
> + sched_setaffinity(0, sizeof(cpuset), &cpuset);
> +
> + pp_addr = os_map_producer_port(hw, port, is_ldb);
> +
> + /* Point hcw to a 64B-aligned location */
> + hcw = (struct dlb2_hcw *)((uintptr_t)&hcw_mem[DLB2_HCW_64B_OFF] &
> + ~DLB2_HCW_ALIGN_MASK);
> +
> + /*
> + * Program the first HCW for a completion and token return and
> + * the other HCWs as NOOPS
> + */
> +
> + memset(hcw, 0, (DLB2_HCW_MEM_SIZE - DLB2_HCW_64B_OFF) * sizeof(*hcw));
> + hcw->qe_comp = 1;
> + hcw->cq_token = 1;
> + hcw->lock_id = 1;
> +
> + cycle_start = rte_get_tsc_cycles();
> + for (i = 0; i < DLB2_NUM_PROBE_ENQS; i++)
> + dlb2_movdir64b(pp_addr, hcw);
> +
> + cycle_end = rte_get_tsc_cycles();
> +
> + os_unmap_producer_port(hw, pp_addr);
> + return (int)(cycle_end - cycle_start);
> +}
> +
> +static void *
> +dlb2_pp_profile_func(void *data)
> +{
> + struct dlb2_pp_thread_data *thread_data = data;
> + int cycles;
> +
> + cycles = dlb2_pp_profile(thread_data->hw, thread_data->pp,
> + thread_data->cpu, thread_data->is_ldb);
> +
> + thread_data->cycles = cycles;
> +
> + return NULL;
> +}
> +
> +static int dlb2_pp_cycle_comp(const void *a, const void *b)
> +{
> + const struct dlb2_pp_thread_data *x = a;
> + const struct dlb2_pp_thread_data *y = b;
> +
> + return x->cycles - y->cycles;
> +}
> +
> +
> +/* Probe producer ports from different CPU cores */
> +static void
> +dlb2_get_pp_allocation(struct dlb2_hw *hw, int cpu, int port_type, int cos_id)
> +{
> + struct dlb2_dev *dlb2_dev = container_of(hw, struct dlb2_dev, hw);
> + int i, err, ver = DLB2_HW_DEVICE_FROM_PCI_ID(dlb2_dev->pdev);
> + bool is_ldb = (port_type == DLB2_LDB_PORT);
> + int num_ports = is_ldb ? DLB2_MAX_NUM_LDB_PORTS :
> + DLB2_MAX_NUM_DIR_PORTS(ver);
> + struct dlb2_pp_thread_data dlb2_thread_data[num_ports];
> + int *port_allocations = is_ldb ? hw->ldb_pp_allocations :
> + hw->dir_pp_allocations;
> + int num_sort = is_ldb ? DLB2_NUM_COS_DOMAINS : 1;
> + struct dlb2_pp_thread_data cos_cycles[num_sort];
> + int num_ports_per_sort = num_ports / num_sort;
> + pthread_t pthread;
> +
> + dlb2_dev->enqueue_four = dlb2_movdir64b;
> +
> + DLB2_LOG_INFO(" for %s: cpu core used in pp profiling: %d\n",
> + is_ldb ? "LDB" : "DIR", cpu);
> +
> + memset(cos_cycles, 0, num_sort * sizeof(struct dlb2_pp_thread_data));
> + for (i = 0; i < num_ports; i++) {
> + int cos = is_ldb ? (i >> DLB2_NUM_COS_DOMAINS) : 0;
> +
> + dlb2_thread_data[i].is_ldb = is_ldb;
> + dlb2_thread_data[i].pp = i;
> + dlb2_thread_data[i].cycles = 0;
> + dlb2_thread_data[i].hw = hw;
> + dlb2_thread_data[i].cpu = cpu;
> +
> + err = pthread_create(&pthread, NULL, &dlb2_pp_profile_func,
> + &dlb2_thread_data[i]);
> + if (err) {
> + DLB2_LOG_ERR(": thread creation failed! err=%d", err);
> + return;
> + }
> +
> + err = pthread_join(pthread, NULL);
> + if (err) {
> + DLB2_LOG_ERR(": thread join failed! err=%d", err);
> + return;
> + }
> + cos_cycles[cos].cycles += dlb2_thread_data[i].cycles;
> +
> + if ((i + 1) % num_ports_per_sort == 0) {
> + int index = cos * num_ports_per_sort;
> +
> + cos_cycles[cos].pp = index;
> + /*
> + * For LDB ports first sort with in a cos. Later sort
> + * the best cos based on total cycles for the cos.
> + * For DIR ports, there is a single sort across all
> + * ports.
> + */
> + qsort(&dlb2_thread_data[index], num_ports_per_sort,
> + sizeof(struct dlb2_pp_thread_data),
> + dlb2_pp_cycle_comp);
> + }
> + }
> +
> + /*
> + * Re-arrange best ports by cos if default cos is used.
> + */
> + if (is_ldb && cos_id == DLB2_COS_DEFAULT)
> + qsort(cos_cycles, num_sort,
> + sizeof(struct dlb2_pp_thread_data),
> + dlb2_pp_cycle_comp);
> +
> + for (i = 0; i < num_ports; i++) {
> + int start = is_ldb ? cos_cycles[i / num_ports_per_sort].pp : 0;
> + int index = i % num_ports_per_sort;
> +
> + port_allocations[i] = dlb2_thread_data[start + index].pp;
> + DLB2_LOG_INFO(": pp %d cycles %d", port_allocations[i],
> + dlb2_thread_data[start + index].cycles);
> + }
> +}
> +
> +int
> +dlb2_resource_probe(struct dlb2_hw *hw, const void *probe_args)
> +{
> + const struct dlb2_devargs *args = (const struct dlb2_devargs *)probe_args;
> + const char *mask = NULL;
> + int cpu = 0, cnt = 0, cores[RTE_MAX_LCORE];
> + int i, cos_id = DLB2_COS_DEFAULT;
> +
> + if (args) {
> + mask = (const char *)args->producer_coremask;
> + cos_id = args->cos_id;
> + }
> +
> + if (mask && rte_eal_parse_coremask(mask, cores)) {
> + DLB2_LOG_ERR(": Invalid producer coremask=%s", mask);
> + return -1;
> + }
> +
> + hw->num_prod_cores = 0;
> + for (i = 0; i < RTE_MAX_LCORE; i++) {
> + if (rte_lcore_is_enabled(i)) {
> + if (mask) {
> + /*
> + * Populate the producer cores from parsed
> + * coremask
> + */
> + if (cores[i] != -1) {
> + hw->prod_core_list[cores[i]] = i;
> + hw->num_prod_cores++;
> + }
> + } else if ((++cnt == DLB2_EAL_PROBE_CORE ||
> + rte_lcore_count() < DLB2_EAL_PROBE_CORE)) {
> + /*
> + * If no producer coremask is provided, use the
> + * second EAL core to probe
> + */
> + cpu = i;
> + break;
> + }
> + }
> + }
> + /* Use the first core in producer coremask to probe */
> + if (hw->num_prod_cores)
> + cpu = hw->prod_core_list[0];
> +
> + dlb2_get_pp_allocation(hw, cpu, DLB2_LDB_PORT, cos_id);
> + dlb2_get_pp_allocation(hw, cpu, DLB2_DIR_PORT, DLB2_COS_DEFAULT);
> +
> + return 0;
> +}
> +
> static int
> dlb2_domain_attach_resources(struct dlb2_hw *hw,
> struct dlb2_function_resources *rsrcs,
> @@ -4359,6 +4568,8 @@ dlb2_verify_create_ldb_port_args(struct dlb2_hw *hw,
> return -EINVAL;
> }
>
> + DLB2_LOG_INFO(": LDB: cos=%d port:%d\n", id, port->id.phys_id);
> +
> /* Check cache-line alignment */
> if ((cq_dma_base & 0x3F) != 0) {
> resp->status = DLB2_ST_INVALID_CQ_VIRT_ADDR;
> @@ -4568,13 +4779,25 @@ dlb2_verify_create_dir_port_args(struct dlb2_hw *hw,
> /*
> * If the port's queue is not configured, validate that a free
> * port-queue pair is available.
> + * First try the 'res' list if the port is producer OR if
> + * 'avail' list is empty else fall back to 'avail' list
> */
> - pq = DLB2_DOM_LIST_HEAD(domain->avail_dir_pq_pairs,
> - typeof(*pq));
> + if (!dlb2_list_empty(&domain->rsvd_dir_pq_pairs) &&
> + (args->is_producer ||
> + dlb2_list_empty(&domain->avail_dir_pq_pairs)))
> + pq = DLB2_DOM_LIST_HEAD(domain->rsvd_dir_pq_pairs,
> + typeof(*pq));
> + else
> + pq = DLB2_DOM_LIST_HEAD(domain->avail_dir_pq_pairs,
> + typeof(*pq));
> +
> if (!pq) {
> resp->status = DLB2_ST_DIR_PORTS_UNAVAILABLE;
> return -EINVAL;
> }
> + DLB2_LOG_INFO(": DIR: port:%d is_producer=%d\n",
> + pq->id.phys_id, args->is_producer);
> +
> }
>
> /* Check cache-line alignment */
> @@ -4875,11 +5098,18 @@ int dlb2_hw_create_dir_port(struct dlb2_hw *hw,
> return ret;
>
> /*
> - * Configuration succeeded, so move the resource from the 'avail' to
> - * the 'used' list (if it's not already there).
> + * Configuration succeeded, so move the resource from the 'avail' or
> + * 'res' to the 'used' list (if it's not already there).
> */
> if (args->queue_id == -1) {
> - dlb2_list_del(&domain->avail_dir_pq_pairs, &port->domain_list);
> + struct dlb2_list_head *res = &domain->rsvd_dir_pq_pairs;
> + struct dlb2_list_head *avail = &domain->avail_dir_pq_pairs;
> +
> + if ((args->is_producer && !dlb2_list_empty(res)) ||
> + dlb2_list_empty(avail))
> + dlb2_list_del(res, &port->domain_list);
> + else
> + dlb2_list_del(avail, &port->domain_list);
>
> dlb2_list_add(&domain->used_dir_pq_pairs, &port->domain_list);
> }
> diff --git a/drivers/event/dlb2/pf/base/dlb2_resource.h b/drivers/event/dlb2/pf/base/dlb2_resource.h
> index a7e6c90888..71bd6148f1 100644
> --- a/drivers/event/dlb2/pf/base/dlb2_resource.h
> +++ b/drivers/event/dlb2/pf/base/dlb2_resource.h
> @@ -23,7 +23,20 @@
> * Return:
> * Returns 0 upon success, <0 otherwise.
> */
> -int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver);
> +int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver, const void *probe_args);
> +
> +/**
> + * dlb2_resource_probe() - probe hw resources
> + * @hw: pointer to struct dlb2_hw.
> + *
> + * This function probes hw resources for best port allocation to producer
> + * cores.
> + *
> + * Return:
> + * Returns 0 upon success, <0 otherwise.
> + */
> +int dlb2_resource_probe(struct dlb2_hw *hw, const void *probe_args);
> +
>
> /**
> * dlb2_clr_pmcsr_disable() - power on bulk of DLB 2.0 logic
> diff --git a/drivers/event/dlb2/pf/dlb2_main.c b/drivers/event/dlb2/pf/dlb2_main.c
> index b6ec85b479..717aa4fc08 100644
> --- a/drivers/event/dlb2/pf/dlb2_main.c
> +++ b/drivers/event/dlb2/pf/dlb2_main.c
> @@ -147,7 +147,7 @@ static int dlb2_pf_wait_for_device_ready(struct dlb2_dev *dlb2_dev,
> }
>
> struct dlb2_dev *
> -dlb2_probe(struct rte_pci_device *pdev)
> +dlb2_probe(struct rte_pci_device *pdev, const void *probe_args)
> {
> struct dlb2_dev *dlb2_dev;
> int ret = 0;
> @@ -208,6 +208,10 @@ dlb2_probe(struct rte_pci_device *pdev)
> if (ret)
> goto wait_for_device_ready_fail;
>
> + ret = dlb2_resource_probe(&dlb2_dev->hw, probe_args);
> + if (ret)
> + goto resource_probe_fail;
> +
> ret = dlb2_pf_reset(dlb2_dev);
> if (ret)
> goto dlb2_reset_fail;
> @@ -216,7 +220,7 @@ dlb2_probe(struct rte_pci_device *pdev)
> if (ret)
> goto init_driver_state_fail;
>
> - ret = dlb2_resource_init(&dlb2_dev->hw, dlb_version);
> + ret = dlb2_resource_init(&dlb2_dev->hw, dlb_version, probe_args);
> if (ret)
> goto resource_init_fail;
>
> @@ -227,6 +231,7 @@ dlb2_probe(struct rte_pci_device *pdev)
> init_driver_state_fail:
> dlb2_reset_fail:
> pci_mmap_bad_addr:
> +resource_probe_fail:
> wait_for_device_ready_fail:
> rte_free(dlb2_dev);
> dlb2_dev_malloc_fail:
> diff --git a/drivers/event/dlb2/pf/dlb2_main.h b/drivers/event/dlb2/pf/dlb2_main.h
> index 5aa51b1616..4c64d72e9c 100644
> --- a/drivers/event/dlb2/pf/dlb2_main.h
> +++ b/drivers/event/dlb2/pf/dlb2_main.h
> @@ -15,7 +15,11 @@
> #include "base/dlb2_hw_types.h"
> #include "../dlb2_user.h"
>
> -#define DLB2_DEFAULT_UNREGISTER_TIMEOUT_S 5
> +#define DLB2_EAL_PROBE_CORE 2
> +#define DLB2_NUM_PROBE_ENQS 1000
> +#define DLB2_HCW_MEM_SIZE 8
> +#define DLB2_HCW_64B_OFF 4
> +#define DLB2_HCW_ALIGN_MASK 0x3F
>
> struct dlb2_dev;
>
> @@ -31,15 +35,30 @@ struct dlb2_dev {
> /* struct list_head list; */
> struct device *dlb2_device;
> bool domain_reset_failed;
> + /* The enqueue_four function enqueues four HCWs (one cache-line worth)
> + * to the HQM, using whichever mechanism is supported by the platform
> + * on which this driver is running.
> + */
> + void (*enqueue_four)(void *qe4, void *pp_addr);
> /* The resource mutex serializes access to driver data structures and
> * hardware registers.
> */
> rte_spinlock_t resource_mutex;
> bool worker_launched;
> u8 revision;
> + u8 version;
> +};
> +
> +struct dlb2_pp_thread_data {
> + struct dlb2_hw *hw;
> + int pp;
> + int cpu;
> + bool is_ldb;
> + int cycles;
> };
>
> -struct dlb2_dev *dlb2_probe(struct rte_pci_device *pdev);
> +struct dlb2_dev *dlb2_probe(struct rte_pci_device *pdev, const void *probe_args);
> +
>
> int dlb2_pf_reset(struct dlb2_dev *dlb2_dev);
> int dlb2_pf_create_sched_domain(struct dlb2_hw *hw,
> diff --git a/drivers/event/dlb2/pf/dlb2_pf.c b/drivers/event/dlb2/pf/dlb2_pf.c
> index 71ac141b66..3d15250e11 100644
> --- a/drivers/event/dlb2/pf/dlb2_pf.c
> +++ b/drivers/event/dlb2/pf/dlb2_pf.c
> @@ -702,6 +702,7 @@ dlb2_eventdev_pci_init(struct rte_eventdev *eventdev)
> struct dlb2_devargs dlb2_args = {
> .socket_id = rte_socket_id(),
> .max_num_events = DLB2_MAX_NUM_LDB_CREDITS,
> + .producer_coremask = NULL,
> .num_dir_credits_override = -1,
> .qid_depth_thresholds = { {0} },
> .poll_interval = DLB2_POLL_INTERVAL_DEFAULT,
> @@ -713,6 +714,7 @@ dlb2_eventdev_pci_init(struct rte_eventdev *eventdev)
> };
> struct dlb2_eventdev *dlb2;
> int q;
> + const void *probe_args = NULL;
>
> DLB2_LOG_DBG("Enter with dev_id=%d socket_id=%d",
> eventdev->data->dev_id, eventdev->data->socket_id);
> @@ -728,16 +730,6 @@ dlb2_eventdev_pci_init(struct rte_eventdev *eventdev)
> dlb2 = dlb2_pmd_priv(eventdev); /* rte_zmalloc_socket mem */
> dlb2->version = DLB2_HW_DEVICE_FROM_PCI_ID(pci_dev);
>
> - /* Probe the DLB2 PF layer */
> - dlb2->qm_instance.pf_dev = dlb2_probe(pci_dev);
> -
> - if (dlb2->qm_instance.pf_dev == NULL) {
> - DLB2_LOG_ERR("DLB2 PF Probe failed with error %d\n",
> - rte_errno);
> - ret = -rte_errno;
> - goto dlb2_probe_failed;
> - }
> -
> /* Were we invoked with runtime parameters? */
> if (pci_dev->device.devargs) {
> ret = dlb2_parse_params(pci_dev->device.devargs->args,
> @@ -749,6 +741,17 @@ dlb2_eventdev_pci_init(struct rte_eventdev *eventdev)
> ret, rte_errno);
> goto dlb2_probe_failed;
> }
> + probe_args = &dlb2_args;
> + }
> +
> + /* Probe the DLB2 PF layer */
> + dlb2->qm_instance.pf_dev = dlb2_probe(pci_dev, probe_args);
> +
> + if (dlb2->qm_instance.pf_dev == NULL) {
> + DLB2_LOG_ERR("DLB2 PF Probe failed with error %d\n",
> + rte_errno);
> + ret = -rte_errno;
> + goto dlb2_probe_failed;
> }
>
> ret = dlb2_primary_eventdev_probe(eventdev,
> --
> 2.25.1
>