From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id D7807A04DF;
	Tue, 11 Aug 2020 20:26:20 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id B5E221C026;
	Tue, 11 Aug 2020 20:26:20 +0200 (CEST)
Received: from mail-il1-f196.google.com (mail-il1-f196.google.com
 [209.85.166.196]) by dpdk.org (Postfix) with ESMTP id 213C94C99
 for <dev@dpdk.org>; Tue, 11 Aug 2020 20:26:19 +0200 (CEST)
Received: by mail-il1-f196.google.com with SMTP id c16so11616185ils.8
 for <dev@dpdk.org>; Tue, 11 Aug 2020 11:26:19 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc:content-transfer-encoding;
 bh=6ov7ez1whFcwHHntT88NWd9056YkBNF0ZFXxBYWPznA=;
 b=MgJr2YyMV+k83mjBpikrPtyzBckjeUebe+fJMa4R7eMyNwPPRlQcbpG7V+k6nQixTu
 XIbxHUyTZRy7t58tB75uW1fXs+ZkJicuXz0U35+9FgISIG942Jj2FPVn8tUTwoc2DHqY
 KSHWZX9jRakOtWLly7GGSKHu5EblRUlkVYVldN9qZ3Y77W/oChlM4UpCWsthpGzeYy51
 dDPpqvLWrl+zguD9TO4ML8TFoPuW3CPHA6fliOkt7IthL2gFVaur8sgzkSmdpNcKBA1m
 +874lIpE39/hLKNM2Ha1lN9ZTXOw+Ba2nw35GhZSKzb1U5qU7XpNmWpvuHL3FUmE4eOB
 S3hg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=6ov7ez1whFcwHHntT88NWd9056YkBNF0ZFXxBYWPznA=;
 b=DLZ/eOmhVLBh58TmvbmnydWaDoShv+HWV3uG4jHtn9WXug890H3Hla39XQJP/3Azvt
 voMBpDnjkys95xax5Alic6mdD/Bmq48ksAIF8MDrHzMXMtxy5LraynLwIYP0ywNny7Aq
 S36Sc7AC9Y7JeQAGO6WSz96wnX0T0O1BeaSn+cpWEMgQGIopYaNFdKVYSOabAUNpTPCJ
 72YAayphe6+mPTib5ixrwBXUv3fe/RuquXxS33/w3UMIlbLXLBUUIZlCXU8Yvpjx69D2
 7orrQ9WExOIACyHrkUZ/qIvTwLVnQIiAtDCGQp9AFywwpeAtVNsxcC+6+ba0olO5jBZ9
 6NGw==
X-Gm-Message-State: AOAM530m8YVWjIkCj6Ew0m29WNaVnqDRarCEaY4PLeC4wbEvcThreWj/
 Knk70lTLDs7TBjG4Uj67W5XAwvFMpX6+Yy6LpNE=
X-Google-Smtp-Source: ABdhPJwB9bPmOnorkMEFKtwroh1M4o3SjrJsJOM7O0m2fVYKCqZQ7gF9o710V0yyFw4GSUld05f/ptLaVMP1NNUcVmQ=
X-Received: by 2002:a92:cd0d:: with SMTP id z13mr1007410iln.294.1597170378194; 
 Tue, 11 Aug 2020 11:26:18 -0700 (PDT)
MIME-Version: 1.0
References: <1593232671-5690-0-git-send-email-timothy.mcdaniel@intel.com>
 <1596138614-17409-1-git-send-email-timothy.mcdaniel@intel.com>
 <1596138614-17409-6-git-send-email-timothy.mcdaniel@intel.com>
In-Reply-To: <1596138614-17409-6-git-send-email-timothy.mcdaniel@intel.com>
From: Jerin Jacob <jerinjacobk@gmail.com>
Date: Tue, 11 Aug 2020 23:56:01 +0530
Message-ID: <CALBAE1O7FTFH=m=7fNFdoRizGbtSv5K=+jhMyjLddosAkzmuDg@mail.gmail.com>
To: "McDaniel, Timothy" <timothy.mcdaniel@intel.com>
Cc: Jerin Jacob <jerinj@marvell.com>,
 =?UTF-8?Q?Mattias_R=C3=B6nnblom?= <mattias.ronnblom@ericsson.com>, 
 dpdk-dev <dev@dpdk.org>, Gage Eads <gage.eads@intel.com>, 
 "Van Haaren, Harry" <harry.van.haaren@intel.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [dpdk-dev] [PATCH 05/27] event/dlb: add DLB documentation
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On Fri, Jul 31, 2020 at 1:24 AM McDaniel, Timothy
<timothy.mcdaniel@intel.com> wrote:
>
> From: "McDaniel, Timothy" <timothy.mcdaniel@intel.com>
>
> Signed-off-by: McDaniel, Timothy <timothy.mcdaniel@intel.com>
> ---
>  doc/guides/eventdevs/dlb.rst |  343 ++++++++++++++++++++++++++++++++++++=
++++++
>  1 file changed, 343 insertions(+)
>  create mode 100644 doc/guides/eventdevs/dlb.rst
>
> diff --git a/doc/guides/eventdevs/dlb.rst b/doc/guides/eventdevs/dlb.rst

Please don't have a separate patch for documentation. Have
documentation base in the first patch
and update the doc as and when code is added.


> new file mode 100644
> index 0000000..5c42969
> --- /dev/null
> +++ b/doc/guides/eventdevs/dlb.rst
> @@ -0,0 +1,343 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2020 Intel Corporation.
> +
> +Driver for the Intel=C2=AE Dynamic Load Balancer (DLB)
> +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D
> +
> +The DPDK dlb poll mode driver supports the Intel=C2=AE Dynamic Load Bala=
ncer.
> +
> +Prerequisites
> +-------------
> +
> +- Follow the DPDK :ref:`Getting Started Guide for Linux <linux_gsg>` to =
setup
> +  the basic DPDK environment.
> +
> +Configuration
> +-------------
> +
> +* The DLB PF PMD is a user-space PMD that uses VFIO to gain direct
> +  device access. To use this operation mode, the PCIe PF device must be =
bound
> +  to a DPDK-compatible VFIO driver, such as vfio-pci. The PF PMD does no=
t work
> +  with PCIe VFs, but is portable to all environments (Linux, FreeBSD, et=
c.)
> +  that DPDK supports. (Note: PF PMD testing has been limited to Linux at=
 this
> +  time.)
> +
> +Eventdev API Notes
> +------------------
> +
> +The DLB provides the functions of a DPDK event device; specifically, it
> +supports atomic, ordered, and parallel scheduling events from queues to =
ports.
> +However, the DLB hardware is not a perfect match to the eventdev API. So=
me DLB
> +features are abstracted by the PMD (e.g. directed ports), some are only
> +accessible as vdev command-line parameters, and certain eventdev feature=
s are
> +not supported (e.g. the event flow ID is not maintained during schedulin=
g).
> +
> +In general the dlb PMD is designed for ease-of-use and doesn't require a
> +detailed understanding of the hardware, but these details are important =
when
> +writing high-performance code. This section describes the places where t=
he
> +eventdev API and DLB misalign.
> +
> +VAS Configuration
> +~~~~~~~~~~~~~~~~~
> +
> +A VAS is a scheduling domain, of which there are 32 in the DLB.
> +When a VAS is configured, it allocates load-balanced and
> +directed queues, ports, credits, and other hardware resources. Some VAS
> +resource allocations are user-controlled -- the number of queues, for ex=
ample
> +-- and others, like credit pools (one directed and one load-balanced poo=
l per
> +VAS), are not.
> +
> +The DLB is a closed system eventdev, and as such the ``nb_events_limit``=
 device
> +setup argument and the per-port ``new_event_threshold`` argument apply a=
s
> +defined in the eventdev header file. The limit is applied to all enqueue=
s,
> +regardless of whether it will consume a directed or load-balanced credit=
.
> +
> +Load-balanced and Directed Ports
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +DLB ports come in two flavors: load-balanced and directed. The eventdev =
API
> +does not have the same concept, but it has a similar one: ports and queu=
es that
> +are singly-linked (i.e. linked to a single queue or port, respectively).
> +
> +The ``rte_event_dev_info_get()`` function reports the number of availabl=
e
> +event ports and queues (among other things). For the DLB PMD, max_event_=
ports
> +and max_event_queues report the number of available load-balanced ports =
and
> +queues, and max_single_link_event_port_queue_pairs reports the number of
> +available directed ports and queues.
> +
> +When a VAS is created in ``rte_event_dev_configure()``, the user specifi=
es
> +``nb_event_ports`` and ``nb_single_link_event_port_queues``, which contr=
ol the
> +total number of ports (load-balanced and directed) and the number of dir=
ected
> +ports. Hence, the number of requested load-balanced ports is ``nb_event_=
ports
> +- nb_single_link_event_ports``. The ``nb_event_queues`` field specifies =
the
> +total number of queues (load-balanced and directed). The number of direc=
ted
> +queues comes from ``nb_single_link_event_port_queues``, since directed p=
orts
> +and queues come in pairs.
> +
> +When a port is setup, the ``RTE_EVENT_PORT_CFG_SINGLE_LINK`` flag determ=
ines
> +whether it should be configured as a directed (the flag is set) or a
> +load-balanced (the flag is unset) port. Similarly, the
> +``RTE_EVENT_QUEUE_CFG_SINGLE_LINK`` queue configuration flag controls
> +whether it is a directed or load-balanced queue.
> +
> +Load-balanced ports can only be linked to load-balanced queues, and dire=
cted
> +ports can only be linked to directed queues. Furthermore, directed ports=
 can
> +only be linked to a single directed queue (and vice versa), and that lin=
k
> +cannot change after the eventdev is started.
> +
> +The eventdev API doesn't have a directed scheduling type. To support dir=
ected
> +traffic, the dlb PMD detects when an event is being sent to a directed q=
ueue
> +and overrides its scheduling type. Note that the originally selected sch=
eduling
> +type (atomic, ordered, or parallel) is not preserved, and an event's sch=
ed_type
> +will be set to ``RTE_SCHED_TYPE_ATOMIC`` when it is dequeued from a dire=
cted
> +port.
> +
> +Flow ID
> +~~~~~~~
> +
> +The flow ID field is not preserved in the event when it is scheduled in =
the
> +DLB, because the DLB hardware control word format does not have sufficie=
nt
> +space to preserve every event field. As a result, the flow ID specified =
with
> +the enqueued event will not be in the dequeued event. If this field is
> +required, the application should pass it through an out-of-band path (fo=
r
> +example in the mbuf's udata64 field, if the event points to an mbuf) or
> +reconstruct the flow ID after receiving the event.
> +
> +Also, the DLB hardware control word supports a 16-bit flow ID. Since str=
uct
> +rte_event's flow_id field is 20 bits, the DLB PMD drops the most signifi=
cant
> +four bits from the event's flow ID.
> +
> +Hardware Credits
> +~~~~~~~~~~~~~~~~
> +
> +DLB uses a hardware credit scheme to prevent software from overflowing h=
ardware
> +event storage, with each unit of storage represented by a credit. A port=
 spends
> +a credit to enqueue an event, and hardware refills the ports with credit=
s as the
> +events are scheduled to ports. Refills come from credit pools, and each =
port is
> +a member of a load-balanced credit pool and a directed credit pool. The
> +load-balanced credits are used to enqueue to load-balanced queues, and d=
irected
> +credits are used for directed queues.
> +
> +A DLB eventdev contains one load-balanced and one directed credit pool. =
These
> +pools' sizes are controlled by the nb_events_limit field in struct
> +rte_event_dev_config. The load-balanced pool is sized to contain
> +nb_events_limit credits, and the directed pool is sized to contain
> +nb_events_limit/4 credits. The directed pool size can be overridden with=
 the
> +num_dir_credits vdev argument, like so:
> +
> +    .. code-block:: console
> +
> +       --vdev=3Ddlb1_event,num_dir_credits=3D<value>
> +
> +This can be used if the default allocation is too low or too high for th=
e
> +specific application needs. The PMD also supports a vdev arg that limits=
 the
> +max_num_events reported by rte_event_dev_info_get():
> +
> +    .. code-block:: console
> +
> +       --vdev=3Ddlb1_event,max_num_events=3D<value>
> +
> +By default, max_num_events is reported as the total available load-balan=
ced
> +credits. If multiple DLB-based applications are being used, it may be de=
sirable
> +to control how many load-balanced credits each application uses, particu=
larly
> +when application(s) are written to configure nb_events_limit equal to th=
e
> +reported max_num_events.
> +
> +Each port is a member of both credit pools. A port's credit allocation i=
s
> +defined by its low watermark, high watermark, and refill quanta. These t=
hree
> +parameters are calculated by the dlb PMD like so:
> +
> +- The load-balanced high watermark is set to the port's enqueue_depth.
> +  The directed high watermark is set to the minimum of the enqueue_depth=
 and
> +  the directed pool size divided by the total number of ports.
> +- The refill quanta is set to half the high watermark.
> +- The low watermark is set to the minimum of 16 and the refill quanta.
> +
> +When the eventdev is started, each port is pre-allocated a high watermar=
k's
> +worth of credits. For example, if an eventdev contains four ports with e=
nqueue
> +depths of 32 and a load-balanced credit pool size of 4096, each port wil=
l start
> +with 32 load-balanced credits, and there will be 3968 credits available =
to
> +replenish the ports. Thus, a single port is not capable of enqueueing up=
 to the
> +nb_events_limit (without any events being dequeued), since the other por=
ts are
> +retaining their initial credit allocation; in short, all ports must enqu=
eue in
> +order to reach the limit.
> +
> +If a port attempts to enqueue and has no credits available, the enqueue
> +operation will fail and the application must retry the enqueue. Credits =
are
> +replenished asynchronously by the DLB hardware.
> +
> +Software Credits
> +~~~~~~~~~~~~~~~~
> +
> +The DLB is a "closed system" event dev, and the DLB PMD layers a softwar=
e
> +credit scheme on top of the hardware credit scheme in order to comply wi=
th
> +the per-port backpressure described in the eventdev API.
> +
> +The DLB's hardware scheme is local to a queue/pipeline stage: a port spe=
nds a
> +credit when it enqueues to a queue, and credits are later replenished af=
ter the
> +events are dequeued and released.
> +
> +In the software credit scheme, a credit is consumed when a new (.op =3D
> +RTE_EVENT_OP_NEW) event is injected into the system, and the credit is
> +replenished when the event is released from the system (either explicitl=
y with
> +RTE_EVENT_OP_RELEASE or implicitly in dequeue_burst()).
> +
> +In this model, an event is "in the system" from its first enqueue into e=
ventdev
> +until it is last dequeued. If the event goes through multiple event queu=
es, it
> +is still considered "in the system" while a worker thread is processing =
it.
> +
> +A port will fail to enqueue if the number of events in the system exceed=
s its
> +``new_event_threshold`` (specified at port setup time). A port will also=
 fail
> +to enqueue if it lacks enough hardware credits to enqueue; load-balanced
> +credits are used to enqueue to a load-balanced queue, and directed credi=
ts are
> +used to enqueue to a directed queue.
> +
> +The out-of-credit situations are typically transient, and an eventdev
> +application using the DLB ought to retry its enqueues if they fail.
> +If enqueue fails, DLB PMD sets rte_errno as follows:
> +
> +- -ENOSPC: Credit exhaustion (either hardware or software)
> +- -EINVAL: Invalid argument, such as port ID, queue ID, or sched_type.
> +
> +Depending on the pipeline the application has constructed, it's possible=
 to
> +enter a credit deadlock scenario wherein the worker thread lacks the cre=
dit
> +to enqueue an event, and it must dequeue an event before it can recover =
the
> +credit. If the worker thread retries its enqueue indefinitely, it will n=
ot
> +make forward progress. Such deadlock is possible if the application has =
event
> +"loops", in which an event in dequeued from queue A and later enqueued b=
ack to
> +queue A.
> +
> +Due to this, workers should stop retrying after a time, release the even=
ts it
> +is attempting to enqueue, and dequeue more events. It is important that =
the
> +worker release the events and don't simply set them aside to retry the e=
nqueue
> +again later, because the port has limited history list size (by default,=
 twice
> +the port's dequeue_depth).
> +
> +Priority
> +~~~~~~~~
> +
> +The DLB supports event priority and per-port queue service priority, as
> +described in the eventdev header file. The DLB does not support 'global'=
 event
> +queue priority established at queue creation time.
> +
> +DLB supports 8 event and queue service priority levels. For both priorit=
y
> +types, the PMD uses the upper three bits of the priority field to determ=
ine the
> +DLB priority, discarding the 5 least significant bits. The 5 least signi=
ficant
> +event priority bits are not preserved when an event is enqueued.
> +
> +Load-Balanced Queues
> +~~~~~~~~~~~~~~~~~~~~
> +
> +A load-balanced queue can support atomic and ordered scheduling, or atom=
ic and
> +unordered scheduling, but not atomic and unordered and ordered schedulin=
g. A
> +queue's scheduling types are controlled by the event queue configuration=
.
> +
> +If the user sets the ``RTE_EVENT_QUEUE_CFG_ALL_TYPES`` flag, the
> +``nb_atomic_order_sequences`` determines the supported scheduling types.
> +With non-zero ``nb_atomic_order_sequences``, the queue is configured for=
 atomic
> +and ordered scheduling. In this case, ``RTE_SCHED_TYPE_PARALLEL`` schedu=
ling is
> +supported by scheduling those events as ordered events.  Note that when =
the
> +event is dequeued, its sched_type will be ``RTE_SCHED_TYPE_ORDERED``. El=
se if
> +``nb_atomic_order_sequences`` is zero, the queue is configured for atomi=
c and
> +unordered scheduling. In this case, ``RTE_SCHED_TYPE_ORDERED`` is unsupp=
orted.
> +
> +If the ``RTE_EVENT_QUEUE_CFG_ALL_TYPES`` flag is not set, schedule_type
> +dictates the queue's scheduling type.
> +
> +The ``nb_atomic_order_sequences`` queue configuration field sets the ord=
ered
> +queue's reorder buffer size.  DLB has 4 groups of ordered queues, where =
each
> +group is configured to contain either 1 queue with 1024 reorder entries,=
 2
> +queues with 512 reorder entries, and so on down to 32 queues with 32 ent=
ries.
> +
> +When a load-balanced queue is created, the PMD will configure a new sequ=
ence
> +number group on-demand if num_sequence_numbers does not match a pre-exis=
ting
> +group with available reorder buffer entries. If all sequence number grou=
ps are
> +in use, no new group will be created and queue configuration will fail. =
(Note
> +that when the PMD is used with a virtual DLB device, it cannot change th=
e
> +sequence number configuration.)
> +
> +The queue's ``nb_atomic_flows`` parameter is ignored by the DLB PMD, bec=
ause
> +the DLB doesn't limit the number of flows a queue can track. In the DLB,=
 all
> +load-balanced queues can use the full 16-bit flow ID range.
> +
> +Reconfiguration
> +~~~~~~~~~~~~~~~
> +
> +The Eventdev API allows one to reconfigure a device, its ports, and its =
queues
> +by first stopping the device, calling the configuration function(s), the=
n
> +restarting the device. The DLB doesn't support configuring an individual=
 queue
> +or port without first reconfiguring the entire device, however, so there=
 are
> +certain reconfiguration sequences that are valid in the eventdev API but=
 not
> +supported by the PMD.
> +
> +Specifically, the PMD supports the following configuration sequence:
> +1. Configure and start the device
> +2. Stop the device
> +3. (Optional) Reconfigure the device
> +4. (Optional) If step 3 is run:
> +
> +   a. Setup queue(s). The reconfigured queue(s) lose their previous port=
 links.
> +   b. The reconfigured port(s) lose their previous queue links.
> +
> +5. (Optional, only if steps 4a and 4b are run) Link port(s) to queue(s)
> +6. Restart the device. If the device is reconfigured in step 3 but one o=
r more
> +   of its ports or queues are not, the PMD will apply their previous
> +   configuration (including port->queue links) at this time.
> +
> +The PMD does not support the following configuration sequences:
> +1. Configure and start the device
> +2. Stop the device
> +3. Setup queue or setup port
> +4. Start the device
> +
> +This sequence is not supported because the event device must be reconfig=
ured
> +before its ports or queues can be.
> +
> +Deferred Scheduling
> +~~~~~~~~~~~~~~~~~~~
> +
> +The DLB PMD's default behavior for managing a CQ is to "pop" the CQ once=
 per
> +dequeued event before returning from rte_event_dequeue_burst(). This fre=
es the
> +corresponding entries in the CQ, which enables the DLB to schedule more =
events
> +to it.
> +
> +To support applications seeking finer-grained scheduling control -- for =
example
> +deferring scheduling to get the best possible priority scheduling and
> +load-balancing -- the PMD supports a deferred scheduling mode. In this m=
ode,
> +the CQ entry is not popped until the *subsequent* rte_event_dequeue_burs=
t()
> +call. This mode only applies to load-balanced event ports with dequeue d=
epth of
> +1.
> +
> +To enable deferred scheduling, use the defer_sched vdev argument like so=
:
> +
> +    .. code-block:: console
> +
> +       --vdev=3Ddlb1_event,defer_sched=3Don
> +
> +Atomic Inflights Allocation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +In the last stage prior to scheduling an atomic event to a CQ, DLB holds=
 the
> +inflight event in a temporary buffer that is divided among load-balanced
> +queues. If a queue's atomic buffer storage fills up, this can result in
> +head-of-line-blocking. For example:
> +- An LDB queue allocated N atomic buffer entries
> +- All N entries are filled with events from flow X, which is pinned to C=
Q 0.
> +
> +Until CQ 0 releases 1+ events, no other atomic flows for that LDB queue =
can be
> +scheduled. The likelihood of this case depends on the eventdev configura=
tion,
> +traffic behavior, event processing latency, potential for a worker to be
> +interrupted or otherwise delayed, etc.
> +
> +By default, the PMD allocates 16 buffer entries for each load-balanced q=
ueue,
> +which provides an even division across all 128 queues but potentially wa=
stes
> +buffer space (e.g. if not all queues are used, or aren't used for atomic
> +scheduling).
> +
> +The PMD provides a dev arg to override the default per-queue allocation.=
 To
> +increase a vdev's per-queue atomic-inflight allocation to (for example) =
64:
> +
> +    .. code-block:: console
> +
> +       --vdev=3Ddlb1_event,atm_inflights=3D64
> +
> --
> 1.7.10
>