DPDK patches and discussions
 help / color / mirror / Atom feed
From: Jerin Jacob <jerinjacobk@gmail.com>
To: Abdullah Sevincer <abdullah.sevincer@intel.com>
Cc: dev@dpdk.org, jerinj@marvell.com
Subject: Re: [PATCH v12 1/3] event/dlb2: add producer port probing optimization
Date: Fri, 30 Sep 2022 13:58:04 +0530	[thread overview]
Message-ID: <CALBAE1ODWtfB30QDU8_W5tVf3siKCfDKv0mVddj1pyZn7UjQ8A@mail.gmail.com> (raw)
In-Reply-To: <20220929235900.1761461-1-abdullah.sevincer@intel.com>

On Fri, Sep 30, 2022 at 5:29 AM Abdullah Sevincer
<abdullah.sevincer@intel.com> wrote:
>
> For best performance, applications running on certain cores should use
> the DLB device locally available on the same tile along with other
> resources. To allocate optimal resources, probing is done for each
> producer port (PP) for a given CPU and the best performing ports are
> allocated to producers. The cpu used for probing is either the first
> core of producer coremask (if present) or the second core of EAL
> coremask. This will be extended later to probe for all CPUs in the
> producer coremask or EAL coremask.
>
> Producer coremask can be passed along with the BDF of the DLB devices.
> "-a xx:y.z,producer_coremask=<core_mask>"
>
> Applications also need to pass RTE_EVENT_PORT_CFG_HINT_PRODUCER during
> rte_event_port_setup() for producer ports for optimal port allocation.
>
> For optimal load balancing ports that map to one or more QIDs in common
> should not be in numerical sequence. The port->QID mapping is application
> dependent, but the driver interleaves port IDs as much as possible to
> reduce the likelihood of sequential ports mapping to the same QID(s).
>
> Hence, DLB uses an initial allocation of Port IDs to maximize the
> average distance between an ID and its immediate neighbors. Using
> the initialport allocation option can be passed through devarg
> "default_port_allocation=y(or Y)".
>
> When events are dropped by workers or consumers that use LDB ports,
> completions are sent which are just ENQs and may impact the latency.
> To address this,  probing is done for LDB ports as well. Probing is
> done on ports per 'cos'. When default cos is used, ports will be
> allocated from best ports from the best 'cos', else from best ports of
> the specific cos.
>
> Signed-off-by: Abdullah Sevincer <abdullah.sevincer@intel.com>

Changed subject as " event/dlb2: optimize producer port probing"

Series applied to dpdk-next-net-eventdev/for-main. Thanks

> ---
>  doc/guides/eventdevs/dlb2.rst              |  36 +++
>  drivers/event/dlb2/dlb2.c                  |  72 +++++-
>  drivers/event/dlb2/dlb2_priv.h             |   7 +
>  drivers/event/dlb2/dlb2_user.h             |   1 +
>  drivers/event/dlb2/pf/base/dlb2_hw_types.h |   5 +
>  drivers/event/dlb2/pf/base/dlb2_resource.c | 250 ++++++++++++++++++++-
>  drivers/event/dlb2/pf/base/dlb2_resource.h |  15 +-
>  drivers/event/dlb2/pf/dlb2_main.c          |   9 +-
>  drivers/event/dlb2/pf/dlb2_main.h          |  23 +-
>  drivers/event/dlb2/pf/dlb2_pf.c            |  23 +-
>  10 files changed, 413 insertions(+), 28 deletions(-)
>
> diff --git a/doc/guides/eventdevs/dlb2.rst b/doc/guides/eventdevs/dlb2.rst
> index 5b21f13b68..f5bf5757c6 100644
> --- a/doc/guides/eventdevs/dlb2.rst
> +++ b/doc/guides/eventdevs/dlb2.rst
> @@ -414,3 +414,39 @@ Note that the weight may not exceed the maximum CQ depth.
>         --allow ea:00.0,cq_weight=all:<weight>
>         --allow ea:00.0,cq_weight=qidA-qidB:<weight>
>         --allow ea:00.0,cq_weight=qid:<weight>
> +
> +Producer Coremask
> +~~~~~~~~~~~~~~~~~
> +
> +For best performance, applications running on certain cores should use
> +the DLB device locally available on the same tile along with other
> +resources. To allocate optimal resources, probing is done for each
> +producer port (PP) for a given CPU and the best performing ports are
> +allocated to producers. The cpu used for probing is either the first
> +core of producer coremask (if present) or the second core of EAL
> +coremask. This will be extended later to probe for all CPUs in the
> +producer coremask or EAL coremask. Producer coremask can be passed
> +along with the BDF of the DLB devices.
> +
> +    .. code-block:: console
> +
> +       -a xx:y.z,producer_coremask=<core_mask>
> +
> +Default LDB Port Allocation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For optimal load balancing ports that map to one or more QIDs in common
> +should not be in numerical sequence. The port->QID mapping is application
> +dependent, but the driver interleaves port IDs as much as possible to
> +reduce the likelihood of sequential ports mapping to the same QID(s).
> +
> +Hence, DLB uses an initial allocation of Port IDs to maximize the
> +average distance between an ID and its immediate neighbors. (i.e.the
> +distance from 1 to 0 and to 2, the distance from 2 to 1 and to 3, etc.).
> +Initial port allocation option can be passed through devarg. If y (or Y)
> +inial port allocation will be used, otherwise initial port allocation
> +won't be used.
> +
> +    .. code-block:: console
> +
> +       --allow ea:00.0,default_port_allocation=<y/Y>
> diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> index 759578378f..6a9db4b642 100644
> --- a/drivers/event/dlb2/dlb2.c
> +++ b/drivers/event/dlb2/dlb2.c
> @@ -293,6 +293,23 @@ dlb2_string_to_int(int *result, const char *str)
>         return 0;
>  }
>
> +static int
> +set_producer_coremask(const char *key __rte_unused,
> +                     const char *value,
> +                     void *opaque)
> +{
> +       const char **mask_str = opaque;
> +
> +       if (value == NULL || opaque == NULL) {
> +               DLB2_LOG_ERR("NULL pointer\n");
> +               return -EINVAL;
> +       }
> +
> +       *mask_str = value;
> +
> +       return 0;
> +}
> +
>  static int
>  set_numa_node(const char *key __rte_unused, const char *value, void *opaque)
>  {
> @@ -617,6 +634,26 @@ set_vector_opts_enab(const char *key __rte_unused,
>         return 0;
>  }
>
> +static int
> +set_default_ldb_port_allocation(const char *key __rte_unused,
> +                     const char *value,
> +                     void *opaque)
> +{
> +       bool *default_ldb_port_allocation = opaque;
> +
> +       if (value == NULL || opaque == NULL) {
> +               DLB2_LOG_ERR("NULL pointer\n");
> +               return -EINVAL;
> +       }
> +
> +       if ((*value == 'y') || (*value == 'Y'))
> +               *default_ldb_port_allocation = true;
> +       else
> +               *default_ldb_port_allocation = false;
> +
> +       return 0;
> +}
> +
>  static int
>  set_qid_depth_thresh(const char *key __rte_unused,
>                      const char *value,
> @@ -1785,6 +1822,9 @@ dlb2_hw_create_dir_port(struct dlb2_eventdev *dlb2,
>         } else
>                 credit_high_watermark = enqueue_depth;
>
> +       if (ev_port->conf.event_port_cfg & RTE_EVENT_PORT_CFG_HINT_PRODUCER)
> +               cfg.is_producer = 1;
> +
>         /* Per QM values */
>
>         ret = dlb2_iface_dir_port_create(handle, &cfg,  dlb2->poll_mode);
> @@ -1979,6 +2019,10 @@ dlb2_eventdev_port_setup(struct rte_eventdev *dev,
>         }
>         ev_port->enq_retries = port_conf->enqueue_depth / sw_credit_quanta;
>
> +       /* Save off port config for reconfig */
> +       ev_port->conf = *port_conf;
> +
> +
>         /*
>          * Create port
>          */
> @@ -2005,9 +2049,6 @@ dlb2_eventdev_port_setup(struct rte_eventdev *dev,
>                 }
>         }
>
> -       /* Save off port config for reconfig */
> -       ev_port->conf = *port_conf;
> -
>         ev_port->id = ev_port_id;
>         ev_port->enq_configured = true;
>         ev_port->setup_done = true;
> @@ -4700,6 +4741,8 @@ dlb2_parse_params(const char *params,
>                                              DLB2_CQ_WEIGHT,
>                                              DLB2_PORT_COS,
>                                              DLB2_COS_BW,
> +                                            DLB2_PRODUCER_COREMASK,
> +                                            DLB2_DEFAULT_LDB_PORT_ALLOCATION_ARG,
>                                              NULL };
>
>         if (params != NULL && params[0] != '\0') {
> @@ -4881,6 +4924,29 @@ dlb2_parse_params(const char *params,
>                         }
>
>
> +                       ret = rte_kvargs_process(kvlist,
> +                                                DLB2_PRODUCER_COREMASK,
> +                                                set_producer_coremask,
> +                                                &dlb2_args->producer_coremask);
> +                       if (ret != 0) {
> +                               DLB2_LOG_ERR(
> +                                       "%s: Error parsing producer coremask",
> +                                       name);
> +                               rte_kvargs_free(kvlist);
> +                               return ret;
> +                       }
> +
> +                       ret = rte_kvargs_process(kvlist,
> +                                                DLB2_DEFAULT_LDB_PORT_ALLOCATION_ARG,
> +                                                set_default_ldb_port_allocation,
> +                                                &dlb2_args->default_ldb_port_allocation);
> +                       if (ret != 0) {
> +                               DLB2_LOG_ERR("%s: Error parsing ldb default port allocation arg",
> +                                            name);
> +                               rte_kvargs_free(kvlist);
> +                               return ret;
> +                       }
> +
>                         rte_kvargs_free(kvlist);
>                 }
>         }
> diff --git a/drivers/event/dlb2/dlb2_priv.h b/drivers/event/dlb2/dlb2_priv.h
> index db431f7d8b..9ef5bcb901 100644
> --- a/drivers/event/dlb2/dlb2_priv.h
> +++ b/drivers/event/dlb2/dlb2_priv.h
> @@ -51,6 +51,8 @@
>  #define DLB2_CQ_WEIGHT "cq_weight"
>  #define DLB2_PORT_COS "port_cos"
>  #define DLB2_COS_BW "cos_bw"
> +#define DLB2_PRODUCER_COREMASK "producer_coremask"
> +#define DLB2_DEFAULT_LDB_PORT_ALLOCATION_ARG "default_port_allocation"
>
>  /* Begin HW related defines and structs */
>
> @@ -386,6 +388,7 @@ struct dlb2_port {
>         uint16_t hw_credit_quanta;
>         bool use_avx512;
>         uint32_t cq_weight;
> +       bool is_producer; /* True if port is of type producer */
>  };
>
>  /* Per-process per-port mmio and memory pointers */
> @@ -669,6 +672,8 @@ struct dlb2_devargs {
>         struct dlb2_cq_weight cq_weight;
>         struct dlb2_port_cos port_cos;
>         struct dlb2_cos_bw cos_bw;
> +       const char *producer_coremask;
> +       bool default_ldb_port_allocation;
>  };
>
>  /* End Eventdev related defines and structs */
> @@ -722,6 +727,8 @@ void dlb2_event_build_hcws(struct dlb2_port *qm_port,
>                            uint8_t *sched_type,
>                            uint8_t *queue_id);
>
> +/* Extern functions */
> +extern int rte_eal_parse_coremask(const char *coremask, int *cores);
>
>  /* Extern globals */
>  extern struct process_local_port_data dlb2_port[][DLB2_NUM_PORT_TYPES];
> diff --git a/drivers/event/dlb2/dlb2_user.h b/drivers/event/dlb2/dlb2_user.h
> index 901e2e0c66..28c6aaaf43 100644
> --- a/drivers/event/dlb2/dlb2_user.h
> +++ b/drivers/event/dlb2/dlb2_user.h
> @@ -498,6 +498,7 @@ struct dlb2_create_dir_port_args {
>         __u16 cq_depth;
>         __u16 cq_depth_threshold;
>         __s32 queue_id;
> +       __u8 is_producer;
>  };
>
>  /*
> diff --git a/drivers/event/dlb2/pf/base/dlb2_hw_types.h b/drivers/event/dlb2/pf/base/dlb2_hw_types.h
> index 9511521e67..87996ef621 100644
> --- a/drivers/event/dlb2/pf/base/dlb2_hw_types.h
> +++ b/drivers/event/dlb2/pf/base/dlb2_hw_types.h
> @@ -249,6 +249,7 @@ struct dlb2_hw_domain {
>         struct dlb2_list_head avail_ldb_queues;
>         struct dlb2_list_head avail_ldb_ports[DLB2_NUM_COS_DOMAINS];
>         struct dlb2_list_head avail_dir_pq_pairs;
> +       struct dlb2_list_head rsvd_dir_pq_pairs;
>         u32 total_hist_list_entries;
>         u32 avail_hist_list_entries;
>         u32 hist_list_entry_base;
> @@ -347,6 +348,10 @@ struct dlb2_hw {
>         struct dlb2_function_resources vdev[DLB2_MAX_NUM_VDEVS];
>         struct dlb2_hw_domain domains[DLB2_MAX_NUM_DOMAINS];
>         u8 cos_reservation[DLB2_NUM_COS_DOMAINS];
> +       int prod_core_list[RTE_MAX_LCORE];
> +       u8 num_prod_cores;
> +       int dir_pp_allocations[DLB2_MAX_NUM_DIR_PORTS_V2_5];
> +       int ldb_pp_allocations[DLB2_MAX_NUM_LDB_PORTS];
>
>         /* Virtualization */
>         int virt_mode;
> diff --git a/drivers/event/dlb2/pf/base/dlb2_resource.c b/drivers/event/dlb2/pf/base/dlb2_resource.c
> index 0731416a43..280a8e51b1 100644
> --- a/drivers/event/dlb2/pf/base/dlb2_resource.c
> +++ b/drivers/event/dlb2/pf/base/dlb2_resource.c
> @@ -51,6 +51,7 @@ static void dlb2_init_domain_rsrc_lists(struct dlb2_hw_domain *domain)
>         dlb2_list_init_head(&domain->used_dir_pq_pairs);
>         dlb2_list_init_head(&domain->avail_ldb_queues);
>         dlb2_list_init_head(&domain->avail_dir_pq_pairs);
> +       dlb2_list_init_head(&domain->rsvd_dir_pq_pairs);
>
>         for (i = 0; i < DLB2_NUM_COS_DOMAINS; i++)
>                 dlb2_list_init_head(&domain->used_ldb_ports[i]);
> @@ -106,8 +107,10 @@ void dlb2_resource_free(struct dlb2_hw *hw)
>   * Return:
>   * Returns 0 upon success, <0 otherwise.
>   */
> -int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver)
> +int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver, const void *probe_args)
>  {
> +       const struct dlb2_devargs *args = (const struct dlb2_devargs *)probe_args;
> +       bool ldb_port_default = args ? args->default_ldb_port_allocation : false;
>         struct dlb2_list_entry *list;
>         unsigned int i;
>         int ret;
> @@ -122,6 +125,7 @@ int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver)
>          * the distance from 1 to 0 and to 2, the distance from 2 to 1 and to
>          * 3, etc.).
>          */
> +
>         const u8 init_ldb_port_allocation[DLB2_MAX_NUM_LDB_PORTS] = {
>                 0,  7,  14,  5, 12,  3, 10,  1,  8, 15,  6, 13,  4, 11,  2,  9,
>                 16, 23, 30, 21, 28, 19, 26, 17, 24, 31, 22, 29, 20, 27, 18, 25,
> @@ -164,7 +168,10 @@ int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver)
>                 int cos_id = i >> DLB2_NUM_COS_DOMAINS;
>                 struct dlb2_ldb_port *port;
>
> -               port = &hw->rsrcs.ldb_ports[init_ldb_port_allocation[i]];
> +               if (ldb_port_default == true)
> +                       port = &hw->rsrcs.ldb_ports[init_ldb_port_allocation[i]];
> +               else
> +                       port = &hw->rsrcs.ldb_ports[hw->ldb_pp_allocations[i]];
>
>                 dlb2_list_add(&hw->pf.avail_ldb_ports[cos_id],
>                               &port->func_list);
> @@ -172,7 +179,8 @@ int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver)
>
>         hw->pf.num_avail_dir_pq_pairs = DLB2_MAX_NUM_DIR_PORTS(hw->ver);
>         for (i = 0; i < hw->pf.num_avail_dir_pq_pairs; i++) {
> -               list = &hw->rsrcs.dir_pq_pairs[i].func_list;
> +               int index = hw->dir_pp_allocations[i];
> +               list = &hw->rsrcs.dir_pq_pairs[index].func_list;
>
>                 dlb2_list_add(&hw->pf.avail_dir_pq_pairs, list);
>         }
> @@ -592,6 +600,7 @@ static int dlb2_attach_dir_ports(struct dlb2_hw *hw,
>                                  u32 num_ports,
>                                  struct dlb2_cmd_response *resp)
>  {
> +       int num_res = hw->num_prod_cores;
>         unsigned int i;
>
>         if (rsrcs->num_avail_dir_pq_pairs < num_ports) {
> @@ -611,12 +620,19 @@ static int dlb2_attach_dir_ports(struct dlb2_hw *hw,
>                         return -EFAULT;
>                 }
>
> +               if (num_res) {
> +                       dlb2_list_add(&domain->rsvd_dir_pq_pairs,
> +                                     &port->domain_list);
> +                       num_res--;
> +               } else {
> +                       dlb2_list_add(&domain->avail_dir_pq_pairs,
> +                       &port->domain_list);
> +               }
> +
>                 dlb2_list_del(&rsrcs->avail_dir_pq_pairs, &port->func_list);
>
>                 port->domain_id = domain->id;
>                 port->owned = true;
> -
> -               dlb2_list_add(&domain->avail_dir_pq_pairs, &port->domain_list);
>         }
>
>         rsrcs->num_avail_dir_pq_pairs -= num_ports;
> @@ -739,6 +755,199 @@ static int dlb2_attach_ldb_queues(struct dlb2_hw *hw,
>         return 0;
>  }
>
> +static int
> +dlb2_pp_profile(struct dlb2_hw *hw, int port, int cpu, bool is_ldb)
> +{
> +       u64 cycle_start = 0ULL, cycle_end = 0ULL;
> +       struct dlb2_hcw hcw_mem[DLB2_HCW_MEM_SIZE], *hcw;
> +       void __iomem *pp_addr;
> +       cpu_set_t cpuset;
> +       int i;
> +
> +       CPU_ZERO(&cpuset);
> +       CPU_SET(cpu, &cpuset);
> +       sched_setaffinity(0, sizeof(cpuset), &cpuset);
> +
> +       pp_addr = os_map_producer_port(hw, port, is_ldb);
> +
> +       /* Point hcw to a 64B-aligned location */
> +       hcw = (struct dlb2_hcw *)((uintptr_t)&hcw_mem[DLB2_HCW_64B_OFF] &
> +             ~DLB2_HCW_ALIGN_MASK);
> +
> +       /*
> +        * Program the first HCW for a completion and token return and
> +        * the other HCWs as NOOPS
> +        */
> +
> +       memset(hcw, 0, (DLB2_HCW_MEM_SIZE - DLB2_HCW_64B_OFF) * sizeof(*hcw));
> +       hcw->qe_comp = 1;
> +       hcw->cq_token = 1;
> +       hcw->lock_id = 1;
> +
> +       cycle_start = rte_get_tsc_cycles();
> +       for (i = 0; i < DLB2_NUM_PROBE_ENQS; i++)
> +               dlb2_movdir64b(pp_addr, hcw);
> +
> +       cycle_end = rte_get_tsc_cycles();
> +
> +       os_unmap_producer_port(hw, pp_addr);
> +       return (int)(cycle_end - cycle_start);
> +}
> +
> +static void *
> +dlb2_pp_profile_func(void *data)
> +{
> +       struct dlb2_pp_thread_data *thread_data = data;
> +       int cycles;
> +
> +       cycles = dlb2_pp_profile(thread_data->hw, thread_data->pp,
> +       thread_data->cpu, thread_data->is_ldb);
> +
> +       thread_data->cycles = cycles;
> +
> +       return NULL;
> +}
> +
> +static int dlb2_pp_cycle_comp(const void *a, const void *b)
> +{
> +       const struct dlb2_pp_thread_data *x = a;
> +       const struct dlb2_pp_thread_data *y = b;
> +
> +       return x->cycles - y->cycles;
> +}
> +
> +
> +/* Probe producer ports from different CPU cores */
> +static void
> +dlb2_get_pp_allocation(struct dlb2_hw *hw, int cpu, int port_type, int cos_id)
> +{
> +       struct dlb2_dev *dlb2_dev = container_of(hw, struct dlb2_dev, hw);
> +       int i, err, ver = DLB2_HW_DEVICE_FROM_PCI_ID(dlb2_dev->pdev);
> +       bool is_ldb = (port_type == DLB2_LDB_PORT);
> +       int num_ports = is_ldb ? DLB2_MAX_NUM_LDB_PORTS :
> +       DLB2_MAX_NUM_DIR_PORTS(ver);
> +       struct dlb2_pp_thread_data dlb2_thread_data[num_ports];
> +       int *port_allocations = is_ldb ? hw->ldb_pp_allocations :
> +                                        hw->dir_pp_allocations;
> +       int num_sort = is_ldb ? DLB2_NUM_COS_DOMAINS : 1;
> +       struct dlb2_pp_thread_data cos_cycles[num_sort];
> +       int num_ports_per_sort = num_ports / num_sort;
> +       pthread_t pthread;
> +
> +       dlb2_dev->enqueue_four = dlb2_movdir64b;
> +
> +       DLB2_LOG_INFO(" for %s: cpu core used in pp profiling: %d\n",
> +                     is_ldb ? "LDB" : "DIR", cpu);
> +
> +       memset(cos_cycles, 0, num_sort * sizeof(struct dlb2_pp_thread_data));
> +       for (i = 0; i < num_ports; i++) {
> +               int cos = is_ldb ? (i >> DLB2_NUM_COS_DOMAINS) : 0;
> +
> +               dlb2_thread_data[i].is_ldb = is_ldb;
> +               dlb2_thread_data[i].pp = i;
> +               dlb2_thread_data[i].cycles = 0;
> +               dlb2_thread_data[i].hw = hw;
> +               dlb2_thread_data[i].cpu = cpu;
> +
> +               err = pthread_create(&pthread, NULL, &dlb2_pp_profile_func,
> +                                    &dlb2_thread_data[i]);
> +               if (err) {
> +                       DLB2_LOG_ERR(": thread creation failed! err=%d", err);
> +                       return;
> +               }
> +
> +               err = pthread_join(pthread, NULL);
> +               if (err) {
> +                       DLB2_LOG_ERR(": thread join failed! err=%d", err);
> +                       return;
> +               }
> +               cos_cycles[cos].cycles += dlb2_thread_data[i].cycles;
> +
> +               if ((i + 1) % num_ports_per_sort == 0) {
> +                       int index = cos * num_ports_per_sort;
> +
> +                       cos_cycles[cos].pp = index;
> +                       /*
> +                        * For LDB ports first sort with in a cos. Later sort
> +                        * the best cos based on total cycles for the cos.
> +                        * For DIR ports, there is a single sort across all
> +                        * ports.
> +                        */
> +                       qsort(&dlb2_thread_data[index], num_ports_per_sort,
> +                             sizeof(struct dlb2_pp_thread_data),
> +                             dlb2_pp_cycle_comp);
> +               }
> +       }
> +
> +       /*
> +        * Re-arrange best ports by cos if default cos is used.
> +        */
> +       if (is_ldb && cos_id == DLB2_COS_DEFAULT)
> +               qsort(cos_cycles, num_sort,
> +                     sizeof(struct dlb2_pp_thread_data),
> +                     dlb2_pp_cycle_comp);
> +
> +       for (i = 0; i < num_ports; i++) {
> +               int start = is_ldb ? cos_cycles[i / num_ports_per_sort].pp : 0;
> +               int index = i % num_ports_per_sort;
> +
> +               port_allocations[i] = dlb2_thread_data[start + index].pp;
> +               DLB2_LOG_INFO(": pp %d cycles %d", port_allocations[i],
> +                            dlb2_thread_data[start + index].cycles);
> +       }
> +}
> +
> +int
> +dlb2_resource_probe(struct dlb2_hw *hw, const void *probe_args)
> +{
> +       const struct dlb2_devargs *args = (const struct dlb2_devargs *)probe_args;
> +       const char *mask = NULL;
> +       int cpu = 0, cnt = 0, cores[RTE_MAX_LCORE];
> +       int i, cos_id = DLB2_COS_DEFAULT;
> +
> +       if (args) {
> +               mask = (const char *)args->producer_coremask;
> +               cos_id = args->cos_id;
> +       }
> +
> +       if (mask && rte_eal_parse_coremask(mask, cores)) {
> +               DLB2_LOG_ERR(": Invalid producer coremask=%s", mask);
> +               return -1;
> +       }
> +
> +       hw->num_prod_cores = 0;
> +       for (i = 0; i < RTE_MAX_LCORE; i++) {
> +               if (rte_lcore_is_enabled(i)) {
> +                       if (mask) {
> +                               /*
> +                                * Populate the producer cores from parsed
> +                                * coremask
> +                                */
> +                               if (cores[i] != -1) {
> +                                       hw->prod_core_list[cores[i]] = i;
> +                                       hw->num_prod_cores++;
> +                               }
> +                       } else if ((++cnt == DLB2_EAL_PROBE_CORE ||
> +                          rte_lcore_count() < DLB2_EAL_PROBE_CORE)) {
> +                               /*
> +                                * If no producer coremask is provided, use the
> +                                * second EAL core to probe
> +                                */
> +                               cpu = i;
> +                               break;
> +                       }
> +               }
> +       }
> +       /* Use the first core in producer coremask to probe */
> +       if (hw->num_prod_cores)
> +               cpu = hw->prod_core_list[0];
> +
> +       dlb2_get_pp_allocation(hw, cpu, DLB2_LDB_PORT, cos_id);
> +       dlb2_get_pp_allocation(hw, cpu, DLB2_DIR_PORT, DLB2_COS_DEFAULT);
> +
> +       return 0;
> +}
> +
>  static int
>  dlb2_domain_attach_resources(struct dlb2_hw *hw,
>                              struct dlb2_function_resources *rsrcs,
> @@ -4359,6 +4568,8 @@ dlb2_verify_create_ldb_port_args(struct dlb2_hw *hw,
>                 return -EINVAL;
>         }
>
> +       DLB2_LOG_INFO(": LDB: cos=%d port:%d\n", id, port->id.phys_id);
> +
>         /* Check cache-line alignment */
>         if ((cq_dma_base & 0x3F) != 0) {
>                 resp->status = DLB2_ST_INVALID_CQ_VIRT_ADDR;
> @@ -4568,13 +4779,25 @@ dlb2_verify_create_dir_port_args(struct dlb2_hw *hw,
>                 /*
>                  * If the port's queue is not configured, validate that a free
>                  * port-queue pair is available.
> +                * First try the 'res' list if the port is producer OR if
> +                * 'avail' list is empty else fall back to 'avail' list
>                  */
> -               pq = DLB2_DOM_LIST_HEAD(domain->avail_dir_pq_pairs,
> -                                       typeof(*pq));
> +               if (!dlb2_list_empty(&domain->rsvd_dir_pq_pairs) &&
> +                   (args->is_producer ||
> +                    dlb2_list_empty(&domain->avail_dir_pq_pairs)))
> +                       pq = DLB2_DOM_LIST_HEAD(domain->rsvd_dir_pq_pairs,
> +                                               typeof(*pq));
> +               else
> +                       pq = DLB2_DOM_LIST_HEAD(domain->avail_dir_pq_pairs,
> +                                               typeof(*pq));
> +
>                 if (!pq) {
>                         resp->status = DLB2_ST_DIR_PORTS_UNAVAILABLE;
>                         return -EINVAL;
>                 }
> +               DLB2_LOG_INFO(": DIR: port:%d is_producer=%d\n",
> +                             pq->id.phys_id, args->is_producer);
> +
>         }
>
>         /* Check cache-line alignment */
> @@ -4875,11 +5098,18 @@ int dlb2_hw_create_dir_port(struct dlb2_hw *hw,
>                 return ret;
>
>         /*
> -        * Configuration succeeded, so move the resource from the 'avail' to
> -        * the 'used' list (if it's not already there).
> +        * Configuration succeeded, so move the resource from the 'avail' or
> +        * 'res' to the 'used' list (if it's not already there).
>          */
>         if (args->queue_id == -1) {
> -               dlb2_list_del(&domain->avail_dir_pq_pairs, &port->domain_list);
> +               struct dlb2_list_head *res = &domain->rsvd_dir_pq_pairs;
> +               struct dlb2_list_head *avail = &domain->avail_dir_pq_pairs;
> +
> +               if ((args->is_producer && !dlb2_list_empty(res)) ||
> +                    dlb2_list_empty(avail))
> +                       dlb2_list_del(res, &port->domain_list);
> +               else
> +                       dlb2_list_del(avail, &port->domain_list);
>
>                 dlb2_list_add(&domain->used_dir_pq_pairs, &port->domain_list);
>         }
> diff --git a/drivers/event/dlb2/pf/base/dlb2_resource.h b/drivers/event/dlb2/pf/base/dlb2_resource.h
> index a7e6c90888..71bd6148f1 100644
> --- a/drivers/event/dlb2/pf/base/dlb2_resource.h
> +++ b/drivers/event/dlb2/pf/base/dlb2_resource.h
> @@ -23,7 +23,20 @@
>   * Return:
>   * Returns 0 upon success, <0 otherwise.
>   */
> -int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver);
> +int dlb2_resource_init(struct dlb2_hw *hw, enum dlb2_hw_ver ver, const void *probe_args);
> +
> +/**
> + * dlb2_resource_probe() - probe hw resources
> + * @hw: pointer to struct dlb2_hw.
> + *
> + * This function probes hw resources for best port allocation to producer
> + * cores.
> + *
> + * Return:
> + * Returns 0 upon success, <0 otherwise.
> + */
> +int dlb2_resource_probe(struct dlb2_hw *hw, const void *probe_args);
> +
>
>  /**
>   * dlb2_clr_pmcsr_disable() - power on bulk of DLB 2.0 logic
> diff --git a/drivers/event/dlb2/pf/dlb2_main.c b/drivers/event/dlb2/pf/dlb2_main.c
> index b6ec85b479..717aa4fc08 100644
> --- a/drivers/event/dlb2/pf/dlb2_main.c
> +++ b/drivers/event/dlb2/pf/dlb2_main.c
> @@ -147,7 +147,7 @@ static int dlb2_pf_wait_for_device_ready(struct dlb2_dev *dlb2_dev,
>  }
>
>  struct dlb2_dev *
> -dlb2_probe(struct rte_pci_device *pdev)
> +dlb2_probe(struct rte_pci_device *pdev, const void *probe_args)
>  {
>         struct dlb2_dev *dlb2_dev;
>         int ret = 0;
> @@ -208,6 +208,10 @@ dlb2_probe(struct rte_pci_device *pdev)
>         if (ret)
>                 goto wait_for_device_ready_fail;
>
> +       ret = dlb2_resource_probe(&dlb2_dev->hw, probe_args);
> +       if (ret)
> +               goto resource_probe_fail;
> +
>         ret = dlb2_pf_reset(dlb2_dev);
>         if (ret)
>                 goto dlb2_reset_fail;
> @@ -216,7 +220,7 @@ dlb2_probe(struct rte_pci_device *pdev)
>         if (ret)
>                 goto init_driver_state_fail;
>
> -       ret = dlb2_resource_init(&dlb2_dev->hw, dlb_version);
> +       ret = dlb2_resource_init(&dlb2_dev->hw, dlb_version, probe_args);
>         if (ret)
>                 goto resource_init_fail;
>
> @@ -227,6 +231,7 @@ dlb2_probe(struct rte_pci_device *pdev)
>  init_driver_state_fail:
>  dlb2_reset_fail:
>  pci_mmap_bad_addr:
> +resource_probe_fail:
>  wait_for_device_ready_fail:
>         rte_free(dlb2_dev);
>  dlb2_dev_malloc_fail:
> diff --git a/drivers/event/dlb2/pf/dlb2_main.h b/drivers/event/dlb2/pf/dlb2_main.h
> index 5aa51b1616..4c64d72e9c 100644
> --- a/drivers/event/dlb2/pf/dlb2_main.h
> +++ b/drivers/event/dlb2/pf/dlb2_main.h
> @@ -15,7 +15,11 @@
>  #include "base/dlb2_hw_types.h"
>  #include "../dlb2_user.h"
>
> -#define DLB2_DEFAULT_UNREGISTER_TIMEOUT_S 5
> +#define DLB2_EAL_PROBE_CORE 2
> +#define DLB2_NUM_PROBE_ENQS 1000
> +#define DLB2_HCW_MEM_SIZE 8
> +#define DLB2_HCW_64B_OFF 4
> +#define DLB2_HCW_ALIGN_MASK 0x3F
>
>  struct dlb2_dev;
>
> @@ -31,15 +35,30 @@ struct dlb2_dev {
>         /* struct list_head list; */
>         struct device *dlb2_device;
>         bool domain_reset_failed;
> +       /* The enqueue_four function enqueues four HCWs (one cache-line worth)
> +        * to the HQM, using whichever mechanism is supported by the platform
> +        * on which this driver is running.
> +        */
> +       void (*enqueue_four)(void *qe4, void *pp_addr);
>         /* The resource mutex serializes access to driver data structures and
>          * hardware registers.
>          */
>         rte_spinlock_t resource_mutex;
>         bool worker_launched;
>         u8 revision;
> +       u8 version;
> +};
> +
> +struct dlb2_pp_thread_data {
> +       struct dlb2_hw *hw;
> +       int pp;
> +       int cpu;
> +       bool is_ldb;
> +       int cycles;
>  };
>
> -struct dlb2_dev *dlb2_probe(struct rte_pci_device *pdev);
> +struct dlb2_dev *dlb2_probe(struct rte_pci_device *pdev, const void *probe_args);
> +
>
>  int dlb2_pf_reset(struct dlb2_dev *dlb2_dev);
>  int dlb2_pf_create_sched_domain(struct dlb2_hw *hw,
> diff --git a/drivers/event/dlb2/pf/dlb2_pf.c b/drivers/event/dlb2/pf/dlb2_pf.c
> index 71ac141b66..3d15250e11 100644
> --- a/drivers/event/dlb2/pf/dlb2_pf.c
> +++ b/drivers/event/dlb2/pf/dlb2_pf.c
> @@ -702,6 +702,7 @@ dlb2_eventdev_pci_init(struct rte_eventdev *eventdev)
>         struct dlb2_devargs dlb2_args = {
>                 .socket_id = rte_socket_id(),
>                 .max_num_events = DLB2_MAX_NUM_LDB_CREDITS,
> +               .producer_coremask = NULL,
>                 .num_dir_credits_override = -1,
>                 .qid_depth_thresholds = { {0} },
>                 .poll_interval = DLB2_POLL_INTERVAL_DEFAULT,
> @@ -713,6 +714,7 @@ dlb2_eventdev_pci_init(struct rte_eventdev *eventdev)
>         };
>         struct dlb2_eventdev *dlb2;
>         int q;
> +       const void *probe_args = NULL;
>
>         DLB2_LOG_DBG("Enter with dev_id=%d socket_id=%d",
>                      eventdev->data->dev_id, eventdev->data->socket_id);
> @@ -728,16 +730,6 @@ dlb2_eventdev_pci_init(struct rte_eventdev *eventdev)
>                 dlb2 = dlb2_pmd_priv(eventdev); /* rte_zmalloc_socket mem */
>                 dlb2->version = DLB2_HW_DEVICE_FROM_PCI_ID(pci_dev);
>
> -               /* Probe the DLB2 PF layer */
> -               dlb2->qm_instance.pf_dev = dlb2_probe(pci_dev);
> -
> -               if (dlb2->qm_instance.pf_dev == NULL) {
> -                       DLB2_LOG_ERR("DLB2 PF Probe failed with error %d\n",
> -                                    rte_errno);
> -                       ret = -rte_errno;
> -                       goto dlb2_probe_failed;
> -               }
> -
>                 /* Were we invoked with runtime parameters? */
>                 if (pci_dev->device.devargs) {
>                         ret = dlb2_parse_params(pci_dev->device.devargs->args,
> @@ -749,6 +741,17 @@ dlb2_eventdev_pci_init(struct rte_eventdev *eventdev)
>                                              ret, rte_errno);
>                                 goto dlb2_probe_failed;
>                         }
> +                       probe_args = &dlb2_args;
> +               }
> +
> +               /* Probe the DLB2 PF layer */
> +               dlb2->qm_instance.pf_dev = dlb2_probe(pci_dev, probe_args);
> +
> +               if (dlb2->qm_instance.pf_dev == NULL) {
> +                       DLB2_LOG_ERR("DLB2 PF Probe failed with error %d\n",
> +                                    rte_errno);
> +                       ret = -rte_errno;
> +                       goto dlb2_probe_failed;
>                 }
>
>                 ret = dlb2_primary_eventdev_probe(eventdev,
> --
> 2.25.1
>

  parent reply	other threads:[~2022-09-30  8:28 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-20  0:59 [PATCH 0/3] DLB2 Performance Optimizations Timothy McDaniel
2022-08-20  0:59 ` [PATCH 1/3] event/dlb2: add producer port probing optimization Timothy McDaniel
2022-09-03 13:16   ` Jerin Jacob
2022-09-26 22:55   ` [PATCH v3 " Abdullah Sevincer
2022-09-26 22:55     ` [PATCH v3 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-26 22:55     ` [PATCH v3 3/3] event/dlb2: optimize credit allocations Abdullah Sevincer
2022-09-27  1:42   ` [PATCH v4 1/3] event/dlb2: add producer port probing optimization Abdullah Sevincer
2022-09-27  1:42     ` [PATCH v4 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-27  1:42     ` [PATCH v4 3/3] event/dlb2: optimize credit allocations Abdullah Sevincer
2022-09-28 14:45     ` [PATCH v4 1/3] event/dlb2: add producer port probing optimization Jerin Jacob
2022-09-28 19:11     ` [PATCH v5 " Abdullah Sevincer
2022-09-28 19:11     ` [PATCH v5 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-28 19:19     ` [PATCH v6 1/3] event/dlb2: add producer port probing optimization Abdullah Sevincer
2022-09-28 19:19       ` [PATCH v6 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-28 19:19       ` [PATCH v6 3/3] event/dlb2: optimize credit allocations Abdullah Sevincer
2022-09-28 20:28     ` [PATCH v7 1/3] event/dlb2: add producer port probing optimization Abdullah Sevincer
2022-09-28 20:28       ` [PATCH v7 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-28 20:28       ` [PATCH v7 3/3] event/dlb2: optimize credit allocations Abdullah Sevincer
2022-09-29  1:32     ` [PATCH v8 1/3] event/dlb2: add producer port probing optimization Abdullah Sevincer
2022-09-29  1:32       ` [PATCH v8 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-29  1:32       ` [PATCH v8 3/3] event/dlb2: optimize credit allocations Abdullah Sevincer
2022-09-29  2:48       ` [PATCH v8 1/3] event/dlb2: add producer port probing optimization Sevincer, Abdullah
2022-09-29  3:46     ` [PATCH v9 " Abdullah Sevincer
2022-09-29  3:46       ` [PATCH v9 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-29  3:46       ` [PATCH v9 3/3] event/dlb2: optimize credit allocations Abdullah Sevincer
2022-09-29  5:03   ` [PATCH v10 1/3] event/dlb2: add producer port probing optimization Abdullah Sevincer
2022-09-29  5:03     ` [PATCH v10 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-29  5:03     ` [PATCH v10 3/3] event/dlb2: optimize credit allocations Abdullah Sevincer
2022-09-29 15:26   ` [PATCH v11 1/3] event/dlb2: add producer port probing optimization Abdullah Sevincer
2022-09-29 15:26     ` [PATCH v11 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-29 15:26     ` [PATCH v11 3/3] event/dlb2: optimize credit allocations Abdullah Sevincer
2022-09-29 23:58   ` [PATCH v12 1/3] event/dlb2: add producer port probing optimization Abdullah Sevincer
2022-09-29 23:58     ` [PATCH v12 2/3] event/dlb2: add fence bypass option for producer ports Abdullah Sevincer
2022-09-29 23:59     ` [PATCH v12 3/3] event/dlb2: optimize credit allocations Abdullah Sevincer
2022-09-30  8:28     ` Jerin Jacob [this message]
2022-08-20  0:59 ` [PATCH 2/3] event/dlb2: add fence bypass option for producer ports Timothy McDaniel
2022-08-20  0:59 ` [PATCH 3/3] event/dlb2: optimize credit allocations Timothy McDaniel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALBAE1ODWtfB30QDU8_W5tVf3siKCfDKv0mVddj1pyZn7UjQ8A@mail.gmail.com \
    --to=jerinjacobk@gmail.com \
    --cc=abdullah.sevincer@intel.com \
    --cc=dev@dpdk.org \
    --cc=jerinj@marvell.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).