DPDK patches and discussions
* [dpdk-dev] [PATCH 0/7] Introduce event vectorization
From: pbhagavatula @ 2021-02-20 22:09 UTC
  To: jerinj, jay.jayatheerthan, erik.g.carrillo, abhinandan.gujjar,
	timothy.mcdaniel, hemant.agrawal, harry.van.haaren,
	mattias.ronnblom, liang.j.ma
  Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

In the traditional event programming model, events are identified by a
flow-id and a uintptr_t. The flow-id uniquely identifies a given event
and determines the order of scheduling based on the schedule type; the
uintptr_t holds a single object.

Event devices also support burst mode with a configurable dequeue depth,
i.e. each dequeue call can return multiple events, each of which might be
at a different stage of the pipeline.
A dequeue burst containing events from different stages is not only
difficult to vectorize but also increases the scheduler overhead and the
application's overhead of pipelining events further.
Using event vectors we see a throughput gain of roughly 7x (4.728 mpps
vs. 34.383 mpps), as shown in [1].

With event vectorization, each event can hold multiple uintptr_t values
belonging to the same flow, allowing applications to vectorize their
pipeline and reducing the complexity of pipelining events across multiple
stages. It also simplifies enqueue and dequeue handling on an event device.
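
As a rough illustration of the idea (not the actual rte_eventdev layout),
the sketch below models an event that carries either a single object
pointer or a pointer to a vector of objects from one flow. The names
demo_event, demo_event_vector, demo_vector_alloc and demo_vector_push are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical vector: a small header followed by a flexible array of
 * object pointers that all belong to the same flow. */
struct demo_event_vector {
	uint16_t nb_elem;   /* number of valid entries in ptrs[] */
	uint16_t max_elem;  /* capacity fixed at allocation time */
	void *ptrs[];       /* objects of one flow */
};

/* Hypothetical event: one flow-id, payload is one object or a vector. */
struct demo_event {
	uint32_t flow_id;
	uint8_t is_vector;
	union {
		void *obj;                      /* traditional: one object */
		struct demo_event_vector *vec;  /* vectorized: many objects */
	};
};

/* Allocate an empty vector able to hold max_elem pointers. */
static struct demo_event_vector *demo_vector_alloc(uint16_t max_elem)
{
	struct demo_event_vector *v =
		malloc(sizeof(*v) + (size_t)max_elem * sizeof(void *));

	if (v != NULL) {
		v->nb_elem = 0;
		v->max_elem = max_elem;
	}
	return v;
}

/* Append an object; returns 0 on success, -1 if the vector is full. */
static int demo_vector_push(struct demo_event_vector *v, void *obj)
{
	assert(v != NULL);
	if (v->nb_elem == v->max_elem)
		return -1;
	v->ptrs[v->nb_elem++] = obj;
	return 0;
}
```

One scheduling decision (one flow-id, one event) now covers every object
in ptrs[], which is where the per-event overhead savings come from.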

Since event devices are transparent to the events they schedule, event
producers such as eth_rx_adapter, crypto_adapter, etc. are responsible for
aggregating buffers of the same flow into a single event.
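
A minimal sketch of the aggregation a producer has to perform, assuming a
small fixed set of flow IDs and a caller-supplied nanosecond timestamp:
objects are staged per flow and emitted as one vector event either when
the vector is full or when the oldest staged object exceeds a timeout,
mirroring the vector size and timeout parameters passed to the Rx adapter
in the configuration example. All demo_* names and the DEMO_VEC_SZ and
DEMO_MAX_FLOWS limits are hypothetical; this is not the eth_rx_adapter
implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_FLOWS 4   /* hypothetical number of tracked flows */
#define DEMO_VEC_SZ    4   /* hypothetical vector size */

/* One emitted vector event: all objects share flow_id. */
struct demo_vec_event {
	uint32_t flow_id;
	uint16_t nb_elem;
	void *objs[DEMO_VEC_SZ];
};

/* Per-flow staging buffer kept by the producer. */
struct demo_flow_buf {
	void *objs[DEMO_VEC_SZ];
	uint16_t count;
	uint64_t first_ns;  /* arrival time of the oldest staged object */
};

struct demo_aggregator {
	struct demo_flow_buf flows[DEMO_MAX_FLOWS];
	uint64_t timeout_ns;
};

/* Move a flow's staged objects into *out; returns 1 if an event was
 * produced, 0 if the flow had nothing staged. */
static int demo_flush(struct demo_aggregator *ag, uint32_t flow_id,
		      struct demo_vec_event *out)
{
	struct demo_flow_buf *fb = &ag->flows[flow_id];

	if (fb->count == 0)
		return 0;
	out->flow_id = flow_id;
	out->nb_elem = fb->count;
	memcpy(out->objs, fb->objs, fb->count * sizeof(void *));
	fb->count = 0;
	return 1;
}

/* Stage obj under flow_id; returns 1 (and fills *out) once the flow's
 * vector is full, otherwise 0. */
static int demo_enqueue(struct demo_aggregator *ag, uint32_t flow_id,
			void *obj, uint64_t now_ns, struct demo_vec_event *out)
{
	struct demo_flow_buf *fb;

	assert(flow_id < DEMO_MAX_FLOWS);
	fb = &ag->flows[flow_id];
	if (fb->count == 0)
		fb->first_ns = now_ns;
	fb->objs[fb->count++] = obj;
	if (fb->count < DEMO_VEC_SZ)
		return 0;
	return demo_flush(ag, flow_id, out);
}

/* Emit partially filled vectors whose oldest object has waited at least
 * timeout_ns; returns the number of events written to out[]. */
static int demo_poll_timeouts(struct demo_aggregator *ag, uint64_t now_ns,
			      struct demo_vec_event *out, int max_out)
{
	int n = 0;

	for (uint32_t f = 0; f < DEMO_MAX_FLOWS && n < max_out; f++) {
		struct demo_flow_buf *fb = &ag->flows[f];

		if (fb->count > 0 && now_ns - fb->first_ns >= ag->timeout_ns)
			n += demo_flush(ag, f, &out[n]);
	}
	return n;
}
```

The timeout bounds the extra latency seen by low-rate flows, at the cost
of sometimes emitting partially filled vectors.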

The series also breaks the ABI in patch [2/7]; patch [7/7] fixes the
breakage. Patch [7/7] can be revisited in the next major release, i.e. v21.11.

The dpdk-test-eventdev application has been updated with options to test
multiple vector sizes and timeouts.

[1]
Performance was measured on an ARM Cortex-A72 equivalent processor using
the software event device (--vdev=event_sw0), a single worker core, a single
stage, and one service core each for the Rx adapter, Tx adapter and scheduling.

Without event vectorization:
    ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
         --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
         --stlist=a --wlcores=20
    Port[0] using Rx adapter[0] configured
    Port[0] using Tx adapter[0] Configured
    4.728 mpps avg 4.728 mpps

With event vectorization:
    ./build/app/dpdk-test-eventdev -l 7-23 -s 0x700 --vdev="event_sw0" --
        --prod_type_ethdev --nb_pkts=0 --verbose 2 --test=pipeline_queue
        --stlist=a --wlcores=20 --enable_vector --nb_eth_queues 1
        --vector_size 256
    Port[0] using Rx adapter[0] configured
    Port[0] using Tx adapter[0] Configured
    34.383 mpps avg 34.383 mpps

Dedicating a service core to each Rx queue and tuning the vector size and
dequeue burst size would improve performance further.

API usage is shown below:

Configuration:

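	struct rte_event_eth_rx_adapter_event_vector_config vec_conf;
	struct rte_mempool *vector_pool;
	uint32_t cap = 0;

	/* Pool of nb_elem event vectors, each holding up to vector_size pointers. */
	vector_pool = rte_event_vector_pool_create("vector_pool",
			nb_elem, 0, vector_size, socket_id);
	if (vector_pool == NULL)
		rte_exit(EXIT_FAILURE, "Failed to create vector pool\n");

	/* Query what this event device / ethdev pair supports. */
	rte_event_eth_rx_adapter_caps_get(event_id, eth_id, &cap);
	rte_event_eth_rx_adapter_create(id, event_id, &adptr_conf);
	rte_event_eth_rx_adapter_queue_add(id, eth_id, -1, &queue_conf);
	/* Enable vectorization only when supported; rx_queue_id -1 selects
	 * all Rx queues of eth_id. */
	if (cap & RTE_EVENT_ETH_RX_ADAPTER_CAP_EVENT_VECTOR) {
		vec_conf.vector_sz = vector_size;
		vec_conf.vector_timeout_ns = vector_tmo_nsec;
		vec_conf.vector_mp = vector_pool;
		rte_event_eth_rx_adapter_queue_event_vector_config(id,
				eth_id, -1, &vec_conf);
	}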

Fastpath:

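	/* force_quit: hypothetical application stop flag. */
	while (!force_quit) {
		num = rte_event_dequeue_burst(event_id, port_id, &ev, 1, 0);
		if (!num)
			continue;

		if (ev.event_type & RTE_EVENT_TYPE_VECTOR) {
			switch (ev.event_type) {
			case RTE_EVENT_TYPE_ETHDEV_VECTOR:
			case RTE_EVENT_TYPE_ETH_RX_ADAPTER_VECTOR: {
				/* Braces scope the declaration: a case label
				 * cannot be directly followed by a declaration. */
				struct rte_mbuf **mbufs = ev.vector_ev->mbufs;

				for (i = 0; i < ev.vector_ev->nb_elem; i++)
					process_mbuf(mbufs[i]); /* hypothetical app handler */
				break;
			}
			/* case ...: other vector event types */
			}
		}
		/* ... remaining pipeline processing ... */
	}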

Pavan Nikhilesh (7):
  eventdev: introduce event vector capability
  eventdev: introduce event vector Rx capability
  eventdev: introduce event vector Tx capability
  eventdev: add Rx adapter event vector support
  eventdev: add Tx adapter event vector support
  app/eventdev: add event vector mode in pipeline test
  eventdev: fix ABI breakage due to event vector

 app/test-eventdev/evt_common.h                |   4 +
 app/test-eventdev/evt_options.c               |  52 +++
 app/test-eventdev/evt_options.h               |   4 +
 app/test-eventdev/test_pipeline_atq.c         | 310 +++++++++++++--
 app/test-eventdev/test_pipeline_common.c      |  77 +++-
 app/test-eventdev/test_pipeline_common.h      |  18 +
 app/test-eventdev/test_pipeline_queue.c       | 320 +++++++++++++--
 .../prog_guide/event_ethernet_rx_adapter.rst  |  38 ++
 .../prog_guide/event_ethernet_tx_adapter.rst  |  12 +
 doc/guides/prog_guide/eventdev.rst            |  36 +-
 doc/guides/tools/testeventdev.rst             |  28 ++
 lib/librte_eventdev/eventdev_pmd.h            |  60 ++-
 .../rte_event_eth_rx_adapter.c                | 367 +++++++++++++++++-
 .../rte_event_eth_rx_adapter.h                |  93 +++++
 .../rte_event_eth_tx_adapter.c                |  66 +++-
 lib/librte_eventdev/rte_eventdev.c            |  11 +-
 lib/librte_eventdev/rte_eventdev.h            | 145 ++++++-
 lib/librte_eventdev/version.map               |   5 +
 18 files changed, 1560 insertions(+), 86 deletions(-)

--
2.17.1

