DPDK patches and discussions
* [dpdk-dev] [RFC PATCH v2] raw/ptdma: introduce ptdma driver
@ 2021-09-06 16:55 Selwin Sebastian
  2021-09-06 17:17 ` David Marchand
  0 siblings, 1 reply; 9+ messages in thread
From: Selwin Sebastian @ 2021-09-06 16:55 UTC
  To: dev; +Cc: Selwin Sebastian

Add a rawdev driver for the AMD PTDMA device.

Signed-off-by: Selwin Sebastian <selwin.sebastian@amd.com>
---
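Note: the sizing constants used by the driver can be sanity-checked in
isolation. The following is a minimal standalone sketch, not part of the
patch. The helper names (is_power_of_2, ring_size_ok, queue_size_val,
queue_bytes) are hypothetical, and the 32-byte descriptor size is an
assumption taken from the "8 32-bit words" descriptor comment in
ptdma_rawdev_spec.h. The range check mirrors the one in
ptdma_dev_configure().

```c
#include <stdbool.h>
#include <stdint.h>
#include <strings.h>	/* ffs() */

/* Values copied from ptdma_rawdev_spec.h in this patch. */
#define COMMANDS_PER_QUEUE	8192
#define CMD_Q_SIZE_MASK		0x1f	/* GENMASK(4, 0) */
/* Assumption: one descriptor is 8 x 32-bit words. */
#define DESC_BYTES		32

/* Hypothetical helper: true when n is a nonzero power of two. */
static bool is_power_of_2(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper mirroring the check in ptdma_dev_configure(). */
static bool ring_size_ok(uint32_t ring_size)
{
	return ring_size >= 64 && ring_size <= 8192 &&
		is_power_of_2(ring_size);
}

/* Encoded queue size, computed the same way as QUEUE_SIZE_VAL. */
static uint32_t queue_size_val(void)
{
	return (uint32_t)(ffs(COMMANDS_PER_QUEUE) - 2) & CMD_Q_SIZE_MASK;
}

/* Descriptor memory reserved per queue, as in Q_SIZE(Q_DESC_SIZE). */
static uint32_t queue_bytes(void)
{
	return COMMANDS_PER_QUEUE * DESC_BYTES;
}
```

Under these assumptions, a ring size of 512 passes the check while 32,
1000 and 16384 are rejected, and each hardware queue reserves 256 KiB of
IOVA-contiguous descriptor memory.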
 MAINTAINERS                              |   5 +
 doc/guides/rawdevs/ptdma.rst             | 220 ++++++++++++++
 drivers/raw/meson.build                  |   1 +
 drivers/raw/ptdma/meson.build            |  16 +
 drivers/raw/ptdma/ptdma_dev.c            | 135 +++++++++
 drivers/raw/ptdma/ptdma_pmd_private.h    |  41 +++
 drivers/raw/ptdma/ptdma_rawdev.c         | 266 +++++++++++++++++
 drivers/raw/ptdma/ptdma_rawdev_spec.h    | 362 +++++++++++++++++++++++
 drivers/raw/ptdma/ptdma_rawdev_test.c    | 272 +++++++++++++++++
 drivers/raw/ptdma/rte_ptdma_rawdev.h     | 124 ++++++++
 drivers/raw/ptdma/rte_ptdma_rawdev_fns.h | 298 +++++++++++++++++++
 drivers/raw/ptdma/version.map            |   5 +
 usertools/dpdk-devbind.py                |   4 +-
 13 files changed, 1748 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/rawdevs/ptdma.rst
 create mode 100644 drivers/raw/ptdma/meson.build
 create mode 100644 drivers/raw/ptdma/ptdma_dev.c
 create mode 100644 drivers/raw/ptdma/ptdma_pmd_private.h
 create mode 100644 drivers/raw/ptdma/ptdma_rawdev.c
 create mode 100644 drivers/raw/ptdma/ptdma_rawdev_spec.h
 create mode 100644 drivers/raw/ptdma/ptdma_rawdev_test.c
 create mode 100644 drivers/raw/ptdma/rte_ptdma_rawdev.h
 create mode 100644 drivers/raw/ptdma/rte_ptdma_rawdev_fns.h
 create mode 100644 drivers/raw/ptdma/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1da..f4afd1a072 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1305,6 +1305,11 @@ F: doc/guides/rawdevs/ioat.rst
 F: examples/ioat/
 F: doc/guides/sample_app_ug/ioat.rst
 
+PTDMA Rawdev
+M: Selwin Sebastian <selwin.sebastian@amd.com>
+F: drivers/raw/ptdma/
+F: doc/guides/rawdevs/ptdma.rst
+
 NXP DPAA2 QDMA
 M: Nipun Gupta <nipun.gupta@nxp.com>
 F: drivers/raw/dpaa2_qdma/
diff --git a/doc/guides/rawdevs/ptdma.rst b/doc/guides/rawdevs/ptdma.rst
new file mode 100644
index 0000000000..50772f9f3b
--- /dev/null
+++ b/doc/guides/rawdevs/ptdma.rst
@@ -0,0 +1,220 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+
+PTDMA Rawdev Driver
+===================
+
+The ``ptdma`` rawdev driver provides a poll-mode driver (PMD) for the AMD PTDMA device.
+
+Hardware Requirements
+----------------------
+
+The ``dpdk-devbind.py`` script, included with DPDK,
+can be used to show the presence of supported hardware.
+Running ``dpdk-devbind.py --status-dev misc`` will show all the miscellaneous,
+or rawdev-based, devices on the system.
+
+Sample output from a system with PTDMA is shown below::
+
+    Misc (rawdev) devices using DPDK-compatible driver
+    ==================================================
+    0000:01:00.2 'Starship/Matisse PTDMA 1498' drv=igb_uio unused=vfio-pci
+    0000:02:00.2 'Starship/Matisse PTDMA 1498' drv=igb_uio unused=vfio-pci
+
+Devices using UIO drivers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The hardware devices must be bound to a user-space IO driver before they can be used.
+The ``dpdk-devbind.py`` script can be used to view the state of the PTDMA devices
+and to bind them to a suitable DPDK-supported driver, such as ``igb_uio``.
+For example::
+
+        $ sudo ./usertools/dpdk-devbind.py  --force --bind=igb_uio 0000:01:00.2 0000:02:00.2
+
+Compilation
+------------
+
+For builds using ``meson`` and ``ninja``, the driver will be built when the target platform is x86-based.
+No additional compilation steps are necessary.
+
+
+Using PTDMA Rawdev Devices
+--------------------------
+
+To use the devices from an application, the rawdev API can be used, along
+with definitions taken from the device-specific header file
+``rte_ptdma_rawdev.h``. This header is needed to get the definition of
+structure parameters used by some of the rawdev APIs for PTDMA rawdev
+devices, as well as providing key functions for using the device for memory
+copies.
+
+Getting Device Information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Basic information about each rawdev device can be queried using the
+``rte_rawdev_info_get()`` API. For most applications, this API will be
+needed to verify that the rawdev in question is of the expected type. For
+example, the following code snippet can be used to identify a PTDMA
+rawdev device for use by an application:
+
+.. code-block:: C
+
+        for (i = 0; i < count && !found; i++) {
+                struct rte_rawdev_info info = { .dev_private = NULL };
+                found = (rte_rawdev_info_get(i, &info, 0) == 0 &&
+                                strcmp(info.driver_name,
+                                                PTDMA_PMD_RAWDEV_NAME) == 0);
+        }
+
+When calling the ``rte_rawdev_info_get()`` API for a PTDMA rawdev device,
+the ``dev_private`` field in the ``rte_rawdev_info`` struct should either
+be NULL, or else be set to point to a structure of type
+``rte_ptdma_rawdev_config``, in which case the size of the configured device
+input ring will be returned in that structure.
+
+Device Configuration
+~~~~~~~~~~~~~~~~~~~~~
+
+Configuring a PTDMA rawdev device is done using the
+``rte_rawdev_configure()`` API, which takes the same structure parameters
+as the previously referenced ``rte_rawdev_info_get()`` API. The main
+difference is that, because the parameter is used as input rather than
+output, the ``dev_private`` structure element cannot be NULL, and must
+point to a valid ``rte_ptdma_rawdev_config`` structure, containing the ring
+size to be used by the device. The ring size must be a power of two,
+between 64 and 8192.
+If tracking of user-provided completion handles by the driver is not
+needed, it may be disabled by setting the ``hdls_disable`` flag in
+the configuration structure.
+
+The following code shows how the device is configured in
+``ptdma_rawdev_test.c``:
+
+.. code-block:: C
+
+   #define PTDMA_TEST_RINGSIZE 512
+        struct rte_ptdma_rawdev_config p = { .ring_size = -1 };
+        struct rte_rawdev_info info = { .dev_private = &p };
+
+        /* ... */
+
+        p.ring_size = PTDMA_TEST_RINGSIZE;
+        if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
+                printf("Error with rte_rawdev_configure()\n");
+                return -1;
+        }
+
+Once configured, the device can then be made ready for use by calling the
+``rte_rawdev_start()`` API.
+
+Performing Data Copies
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To perform data copies using PTDMA rawdev devices, the functions
+``rte_ptdma_enqueue_copy()`` and ``rte_ptdma_perform_ops()`` should be used.
+Once copies have been completed, the completion will be reported back when
+the application calls ``rte_ptdma_completed_ops()``.
+
+The ``rte_ptdma_enqueue_copy()`` function enqueues a single copy to the
+device ring for copying at a later point. The parameters to that function
+include the IOVA addresses of both the source and destination buffers,
+as well as two "handles" to be returned to the user when the copy is
+completed. These handles can be arbitrary values, but two are provided so
+that the library can track handles for both source and destination on
+behalf of the user, e.g. virtual addresses for the buffers, or mbuf
+pointers if packet data is being copied.
+
+While the ``rte_ptdma_enqueue_copy()`` function enqueues a copy operation on
+the device ring, the copy will not actually be performed until after the
+application calls the ``rte_ptdma_perform_ops()`` function. This function
+informs the device hardware of the elements enqueued on the ring, and the
+device will begin to process them. It is expected that, for efficiency
+reasons, a burst of operations will be enqueued to the device via multiple
+enqueue calls between calls to the ``rte_ptdma_perform_ops()`` function.
+
+The following code from ``ptdma_rawdev_test.c`` demonstrates how to enqueue
+a burst of copies to the device and start the hardware processing of them:
+
+.. code-block:: C
+
+        struct rte_mbuf *srcs[32], *dsts[32];
+        unsigned int i, j;
+
+        for (i = 0; i < RTE_DIM(srcs); i++) {
+                char *src_data;
+
+                srcs[i] = rte_pktmbuf_alloc(pool);
+                dsts[i] = rte_pktmbuf_alloc(pool);
+                srcs[i]->data_len = srcs[i]->pkt_len = length;
+                dsts[i]->data_len = dsts[i]->pkt_len = length;
+                src_data = rte_pktmbuf_mtod(srcs[i], char *);
+
+                for (j = 0; j < length; j++)
+                        src_data[j] = rand() & 0xFF;
+
+                if (rte_ptdma_enqueue_copy(dev_id,
+                                srcs[i]->buf_iova + srcs[i]->data_off,
+                                dsts[i]->buf_iova + dsts[i]->data_off,
+                                length,
+                                (uintptr_t)srcs[i],
+                                (uintptr_t)dsts[i]) != 1) {
+                        printf("Error with rte_ptdma_enqueue_copy for buffer %u\n",
+                                        i);
+                        return -1;
+                }
+        }
+        rte_ptdma_perform_ops(dev_id);
+
+To retrieve information about completed copies, the API
+``rte_ptdma_completed_ops()`` should be used. This API will return to the
+application a set of completion handles passed in when the relevant copies
+were enqueued.
+
+The following code from ``ptdma_rawdev_test.c`` shows the test code
+retrieving information about the completed copies and validating the data
+is correct before freeing the data buffers using the returned handles:
+
+.. code-block:: C
+
+        if (rte_ptdma_completed_ops(dev_id, 64, (void *)completed_src,
+                        (void *)completed_dst) != RTE_DIM(srcs)) {
+                printf("Error with rte_ptdma_completed_ops\n");
+                return -1;
+        }
+        for (i = 0; i < RTE_DIM(srcs); i++) {
+                char *src_data, *dst_data;
+
+                if (completed_src[i] != srcs[i]) {
+                        printf("Error with source pointer %u\n", i);
+                        return -1;
+                }
+                if (completed_dst[i] != dsts[i]) {
+                        printf("Error with dest pointer %u\n", i);
+                        return -1;
+                }
+
+                src_data = rte_pktmbuf_mtod(srcs[i], char *);
+                dst_data = rte_pktmbuf_mtod(dsts[i], char *);
+                for (j = 0; j < length; j++)
+                        if (src_data[j] != dst_data[j]) {
+                                printf("Error with copy of packet %u, byte %u\n",
+                                                i, j);
+                                return -1;
+                        }
+                rte_pktmbuf_free(srcs[i]);
+                rte_pktmbuf_free(dsts[i]);
+        }
+
+Querying Device Statistics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The statistics from the PTDMA rawdev device can be retrieved via the xstats
+functions in the ``rte_rawdev`` library, i.e.
+``rte_rawdev_xstats_names_get()``, ``rte_rawdev_xstats_get()`` and
+``rte_rawdev_xstats_by_name_get()``. The statistics returned for each device
+instance are:
+
+* ``failed_enqueues``
+* ``successful_enqueues``
+* ``copies_started``
+* ``copies_completed``
diff --git a/drivers/raw/meson.build b/drivers/raw/meson.build
index b51536f8a7..e896745d9c 100644
--- a/drivers/raw/meson.build
+++ b/drivers/raw/meson.build
@@ -14,6 +14,7 @@ drivers = [
         'ntb',
         'octeontx2_dma',
         'octeontx2_ep',
+        'ptdma',
         'skeleton',
 ]
 std_deps = ['rawdev']
diff --git a/drivers/raw/ptdma/meson.build b/drivers/raw/ptdma/meson.build
new file mode 100644
index 0000000000..a3eab8dbfd
--- /dev/null
+++ b/drivers/raw/ptdma/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2021 Advanced Micro Devices, Inc. All rights reserved.
+
+build = dpdk_conf.has('RTE_ARCH_X86')
+reason = 'only supported on x86'
+sources = files(
+	'ptdma_rawdev.c',
+	'ptdma_dev.c',
+	'ptdma_rawdev_test.c')
+deps += ['bus_pci',
+	'bus_vdev',
+	'mbuf',
+	'rawdev']
+
+headers = files('rte_ptdma_rawdev.h',
+		'rte_ptdma_rawdev_fns.h')
diff --git a/drivers/raw/ptdma/ptdma_dev.c b/drivers/raw/ptdma/ptdma_dev.c
new file mode 100644
index 0000000000..1d0207a9af
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_dev.c
@@ -0,0 +1,135 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <dirent.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/file.h>
+#include <unistd.h>
+
+#include <rte_hexdump.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_spinlock.h>
+#include <rte_string_fns.h>
+
+#include "ptdma_rawdev_spec.h"
+#include "ptdma_pmd_private.h"
+#include "rte_ptdma_rawdev_fns.h"
+
+static int ptdma_dev_id;
+
+static const struct rte_memzone *
+ptdma_queue_dma_zone_reserve(const char *queue_name,
+			   uint32_t queue_size,
+			   int socket_id)
+{
+	const struct rte_memzone *mz;
+
+	mz = rte_memzone_lookup(queue_name);
+	if (mz != NULL) {
+		if (((size_t)queue_size <= mz->len) &&
+		    ((socket_id == SOCKET_ID_ANY) ||
+		     (socket_id == mz->socket_id))) {
+			PTDMA_PMD_INFO("re-use memzone already "
+				     "allocated for %s", queue_name);
+			return mz;
+		}
+		PTDMA_PMD_ERR("Incompatible memzone already "
+			    "allocated %s, size %u, socket %d. "
+			    "Requested size %u, socket %d",
+			    queue_name, (uint32_t)mz->len,
+			    mz->socket_id, queue_size, socket_id);
+		return NULL;
+	}
+
+	PTDMA_PMD_INFO("Allocate memzone for %s, size %u on socket %d",
+		     queue_name, queue_size, socket_id);
+
+	return rte_memzone_reserve_aligned(queue_name, queue_size,
+			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
+}
+
+int
+ptdma_add_queue(struct rte_ptdma_rawdev *dev)
+{
+	int i;
+	uint32_t dma_addr_lo, dma_addr_hi;
+	uint32_t ptdma_version = 0;
+	struct ptdma_cmd_queue *cmd_q;
+	const struct rte_memzone *q_mz;
+	void *vaddr;
+
+	if (dev == NULL)
+		return -1;
+
+	dev->id = ptdma_dev_id++;
+	dev->qidx = 0;
+	vaddr = (void *)(dev->pci.mem_resource[2].addr);
+
+	PTDMA_WRITE_REG(vaddr, CMD_REQID_CONFIG_OFFSET, 0x0);
+	ptdma_version = PTDMA_READ_REG(vaddr, CMD_PTDMA_VERSION);
+	PTDMA_PMD_INFO("PTDMA VERSION  = 0x%x", ptdma_version);
+
+	dev->cmd_q_count = 0;
+	/* Find available queues */
+	for (i = 0; i < MAX_HW_QUEUES; i++) {
+		cmd_q = &dev->cmd_q[dev->cmd_q_count++];
+		cmd_q->dev = dev;
+		cmd_q->id = i;
+		cmd_q->qidx = 0;
+		cmd_q->qsize = Q_SIZE(Q_DESC_SIZE);
+
+		cmd_q->reg_base = (uint8_t *)vaddr +
+			CMD_Q_STATUS_INCR * (i + 1);
+
+		/* PTDMA queue memory */
+		snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
+			 "%s_%d_%s_%d_%s",
+			 "ptdma_dev",
+			 (int)dev->id, "queue",
+			 (int)cmd_q->id, "mem");
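+		q_mz = ptdma_queue_dma_zone_reserve(cmd_q->memz_name,
+				cmd_q->qsize, rte_socket_id());
+		/* Reservation can fail; do not dereference a NULL memzone */
+		if (q_mz == NULL) {
+			PTDMA_PMD_ERR("Cannot reserve memzone %s",
+					cmd_q->memz_name);
+			return -1;
+		}
+		cmd_q->qbase_addr = (void *)q_mz->addr;
+		cmd_q->qbase_desc = (void *)q_mz->addr;
+		cmd_q->qbase_phys_addr = q_mz->iova;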
+
+		cmd_q->qcontrol = 0;
+		/* init control reg to zero */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+			      cmd_q->qcontrol);
+
+		/* Disable the interrupts */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_INT_ENABLE_BASE, 0x00);
+		PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_INT_STATUS_BASE);
+		PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_STATUS_BASE);
+
+		/* Clear the interrupts */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_INTERRUPT_STATUS_BASE,
+			      ALL_INTERRUPTS);
+
+		/* Configure size of each virtual queue accessible to host */
+		cmd_q->qcontrol &= ~(CMD_Q_SIZE << CMD_Q_SHIFT);
+		cmd_q->qcontrol |= QUEUE_SIZE_VAL << CMD_Q_SHIFT;
+
+		dma_addr_lo = low32_value(cmd_q->qbase_phys_addr);
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_TAIL_LO_BASE,
+			      (uint32_t)dma_addr_lo);
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_HEAD_LO_BASE,
+			      (uint32_t)dma_addr_lo);
+
+		dma_addr_hi = high32_value(cmd_q->qbase_phys_addr);
+		cmd_q->qcontrol |= (dma_addr_hi << 16);
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+			      cmd_q->qcontrol);
+
+	}
+	return 0;
+}
diff --git a/drivers/raw/ptdma/ptdma_pmd_private.h b/drivers/raw/ptdma/ptdma_pmd_private.h
new file mode 100644
index 0000000000..0c25e737f5
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_pmd_private.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _PTDMA_PMD_PRIVATE_H_
+#define _PTDMA_PMD_PRIVATE_H_
+
+#include <rte_rawdev.h>
+#include "ptdma_rawdev_spec.h"
+
+extern int ptdma_pmd_logtype;
+
+#define PTDMA_PMD_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, ptdma_pmd_logtype, "%s(): " fmt "\n", \
+			__func__, ##args)
+
+#define PTDMA_PMD_FUNC_TRACE() PTDMA_PMD_LOG(DEBUG, ">>")
+
+#define PTDMA_PMD_ERR(fmt, args...) \
+	PTDMA_PMD_LOG(ERR, fmt, ## args)
+#define PTDMA_PMD_WARN(fmt, args...) \
+	PTDMA_PMD_LOG(WARNING, fmt, ## args)
+#define PTDMA_PMD_DEBUG(fmt, args...) \
+	PTDMA_PMD_LOG(DEBUG, fmt, ## args)
+#define PTDMA_PMD_INFO(fmt, args...) \
+	PTDMA_PMD_LOG(INFO, fmt, ## args)
+
+int ptdma_xstats_get(const struct rte_rawdev *dev, const unsigned int ids[],
+		uint64_t values[], unsigned int n);
+int ptdma_xstats_get_names(const struct rte_rawdev *dev,
+		struct rte_rawdev_xstats_name *names,
+		unsigned int size);
+int ptdma_xstats_reset(struct rte_rawdev *dev, const uint32_t *ids,
+		uint32_t nb_ids);
+int ptdma_add_queue(struct rte_ptdma_rawdev *dev);
+
+extern int ptdma_rawdev_test(uint16_t dev_id);
+
+#endif /* _PTDMA_PMD_PRIVATE_H_ */
+
+
diff --git a/drivers/raw/ptdma/ptdma_rawdev.c b/drivers/raw/ptdma/ptdma_rawdev.c
new file mode 100644
index 0000000000..cfa57d81ed
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_rawdev.c
@@ -0,0 +1,266 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <rte_bus_pci.h>
+#include <rte_rawdev_pmd.h>
+#include <rte_memzone.h>
+#include <rte_string_fns.h>
+#include <rte_dev.h>
+
+#include "rte_ptdma_rawdev.h"
+#include "ptdma_rawdev_spec.h"
+#include "ptdma_pmd_private.h"
+
+RTE_LOG_REGISTER(ptdma_pmd_logtype, rawdev.ptdma, INFO);
+
+uint8_t ptdma_rawdev_driver_id;
+static struct rte_pci_driver ptdma_pmd_drv;
+
+#define AMD_VENDOR_ID		0x1022
+#define PTDMA_DEVICE_ID		0x1498
+#define COMPLETION_SZ sizeof(__m128i)
+
+static const struct rte_pci_id pci_id_ptdma_map[] = {
+	{ RTE_PCI_DEVICE(AMD_VENDOR_ID, PTDMA_DEVICE_ID) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
+
+static const char * const xstat_names[] = {
+	"failed_enqueues", "successful_enqueues",
+	"copies_started", "copies_completed"
+};
+
+static int
+ptdma_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
+{
+	struct rte_ptdma_rawdev_config *params = config;
+	struct rte_ptdma_rawdev *ptdma_priv = dev->dev_private;
+
+	if (dev->started)
+		return -EBUSY;
+	if (params == NULL || config_size != sizeof(*params))
+		return -EINVAL;
+	if (params->ring_size > 8192 || params->ring_size < 64 ||
+			!rte_is_power_of_2(params->ring_size))
+		return -EINVAL;
+	ptdma_priv->ring_size = params->ring_size;
+	ptdma_priv->hdls_disable = params->hdls_disable;
+	ptdma_priv->hdls = rte_zmalloc_socket("ptdma_hdls",
+			ptdma_priv->ring_size * sizeof(*ptdma_priv->hdls),
+			RTE_CACHE_LINE_SIZE, rte_socket_id());
+	if (ptdma_priv->hdls == NULL)
+		return -ENOMEM;
+	return 0;
+}
+
+static int
+ptdma_rawdev_remove(struct rte_pci_device *dev);
+
+int
+ptdma_xstats_get(const struct rte_rawdev *dev, const unsigned int ids[],
+		uint64_t values[], unsigned int n)
+{
+	const struct rte_ptdma_rawdev *ptdma = dev->dev_private;
+	const uint64_t *stats = (const void *)&ptdma->xstats;
+	unsigned int i;
+
+	for (i = 0; i < n; i++) {
+		if (ids[i] >= sizeof(ptdma->xstats)/sizeof(*stats))
+			values[i] = 0;
+		else
+			values[i] = stats[ids[i]];
+	}
+	return n;
+}
+
+int
+ptdma_xstats_get_names(const struct rte_rawdev *dev,
+		struct rte_rawdev_xstats_name *names,
+		unsigned int size)
+{
+	unsigned int i;
+
+	RTE_SET_USED(dev);
+	if (size < RTE_DIM(xstat_names))
+		return RTE_DIM(xstat_names);
+	for (i = 0; i < RTE_DIM(xstat_names); i++)
+		strlcpy(names[i].name, xstat_names[i], sizeof(names[i]));
+	return RTE_DIM(xstat_names);
+}
+
+int
+ptdma_xstats_reset(struct rte_rawdev *dev, const uint32_t *ids,
+		uint32_t nb_ids)
+{
+	struct rte_ptdma_rawdev *ptdma = dev->dev_private;
+	uint64_t *stats = (void *)&ptdma->xstats;
+	unsigned int i;
+
+	if (!ids) {
+		memset(&ptdma->xstats, 0, sizeof(ptdma->xstats));
+		return 0;
+	}
+	for (i = 0; i < nb_ids; i++)
+		if (ids[i] < sizeof(ptdma->xstats)/sizeof(*stats))
+			stats[ids[i]] = 0;
+	return 0;
+}
+
+static int
+ptdma_dev_start(struct rte_rawdev *dev)
+{
+	RTE_SET_USED(dev);
+	return 0;
+}
+
+static void
+ptdma_dev_stop(struct rte_rawdev *dev)
+{
+	RTE_SET_USED(dev);
+}
+
+static int
+ptdma_dev_close(struct rte_rawdev *dev __rte_unused)
+{
+	return 0;
+}
+
+static int
+ptdma_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
+		size_t dev_info_size)
+{
+	struct rte_ptdma_rawdev_config *cfg = dev_info;
+	struct rte_ptdma_rawdev *ptdma = dev->dev_private;
+
+	if (dev_info == NULL || dev_info_size != sizeof(*cfg))
+		return -EINVAL;
+	cfg->ring_size = ptdma->ring_size;
+	cfg->hdls_disable = ptdma->hdls_disable;
+	return 0;
+}
+
+static int
+ptdma_rawdev_create(const char *name, struct rte_pci_device *dev)
+{
+	static const struct rte_rawdev_ops ptdma_rawdev_ops = {
+			.dev_configure = ptdma_dev_configure,
+			.dev_start = ptdma_dev_start,
+			.dev_stop = ptdma_dev_stop,
+			.dev_close = ptdma_dev_close,
+			.dev_info_get = ptdma_dev_info_get,
+			.xstats_get = ptdma_xstats_get,
+			.xstats_get_names = ptdma_xstats_get_names,
+			.xstats_reset = ptdma_xstats_reset,
+			.dev_selftest = ptdma_rawdev_test,
+	};
+	struct rte_rawdev *rawdev = NULL;
+	struct rte_ptdma_rawdev *ptdma_priv = NULL;
+	int ret = 0;
+	if (!name) {
+		PTDMA_PMD_ERR("Invalid name of the device!");
+		ret = -EINVAL;
+		goto cleanup;
+	}
+	/* Allocate device structure */
+	rawdev = rte_rawdev_pmd_allocate(name, sizeof(struct rte_rawdev),
+						rte_socket_id());
+	if (rawdev == NULL) {
+		PTDMA_PMD_ERR("Unable to allocate raw device");
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	rawdev->dev_id = ptdma_rawdev_driver_id++;
+	PTDMA_PMD_INFO("dev_id = %d", rawdev->dev_id);
+	PTDMA_PMD_INFO("driver_name = %s", dev->device.driver->name);
+
+	rawdev->dev_ops = &ptdma_rawdev_ops;
+	rawdev->device = &dev->device;
+	rawdev->driver_name = dev->device.driver->name;
+
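+	ptdma_priv = rte_zmalloc_socket("ptdma_priv", sizeof(*ptdma_priv),
+				RTE_CACHE_LINE_SIZE, rte_socket_id());
+	if (ptdma_priv == NULL) {
+		PTDMA_PMD_ERR("Unable to allocate private data");
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+	rawdev->dev_private = ptdma_priv;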
+	ptdma_priv->rawdev = rawdev;
+	ptdma_priv->ring_size = 0;
+	ptdma_priv->pci = *dev;
+
+	/* device is valid, add queue details */
+	if (ptdma_add_queue(ptdma_priv))
+		goto init_error;
+
+	return 0;
+
+cleanup:
+	if (rawdev)
+		rte_rawdev_pmd_release(rawdev);
+	return ret;
+init_error:
+	PTDMA_PMD_ERR("driver %s(): failed", __func__);
+	ptdma_rawdev_remove(dev);
+	return -EFAULT;
+}
+
+static int
+ptdma_rawdev_destroy(const char *name)
+{
+	int ret;
+	struct rte_rawdev *rdev;
+	if (!name) {
+		PTDMA_PMD_ERR("Invalid device name");
+		return -EINVAL;
+	}
+	rdev = rte_rawdev_pmd_get_named_dev(name);
+	if (!rdev) {
+		PTDMA_PMD_ERR("Invalid device name (%s)", name);
+		return -EINVAL;
+	}
+
+	if (rdev->dev_private != NULL)
+		rte_free(rdev->dev_private);
+
+	/* rte_rawdev_close is called by pmd_release */
+	ret = rte_rawdev_pmd_release(rdev);
+
+	if (ret)
+		PTDMA_PMD_DEBUG("Device cleanup failed");
+	return 0;
+}
+
+static int
+ptdma_rawdev_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
+{
+	char name[32];
+	int ret = 0;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+	PTDMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
+
+	dev->device.driver = &drv->driver;
+	ret = ptdma_rawdev_create(name, dev);
+	return ret;
+}
+
+static int
+ptdma_rawdev_remove(struct rte_pci_device *dev)
+{
+	char name[32];
+	int ret;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+	PTDMA_PMD_INFO("Closing %s on NUMA node %d",
+			name, dev->device.numa_node);
+	ret = ptdma_rawdev_destroy(name);
+	return ret;
+}
+
+static struct rte_pci_driver ptdma_pmd_drv = {
+	.id_table = pci_id_ptdma_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.probe = ptdma_rawdev_probe,
+	.remove = ptdma_rawdev_remove,
+};
+
+RTE_PMD_REGISTER_PCI(PTDMA_PMD_RAWDEV_NAME, ptdma_pmd_drv);
+RTE_PMD_REGISTER_PCI_TABLE(PTDMA_PMD_RAWDEV_NAME, pci_id_ptdma_map);
+RTE_PMD_REGISTER_KMOD_DEP(PTDMA_PMD_RAWDEV_NAME, "* igb_uio | uio_pci_generic");
+
diff --git a/drivers/raw/ptdma/ptdma_rawdev_spec.h b/drivers/raw/ptdma/ptdma_rawdev_spec.h
new file mode 100644
index 0000000000..73511bec95
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_rawdev_spec.h
@@ -0,0 +1,362 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef __PT_DEV_H__
+#define __PT_DEV_H__
+
+#include <limits.h>
+
+#include <rte_bus_pci.h>
+#include <rte_byteorder.h>
+#include <rte_io.h>
+#include <rte_pci.h>
+#include <rte_spinlock.h>
+#include <rte_rawdev.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define BIT(nr)				(1 << (nr))
+
+#define BITS_PER_LONG   (__SIZEOF_LONG__ * 8)
+#define GENMASK(h, l)   (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+
+#define MAX_HW_QUEUES			1
+
+/* Register Mappings */
+
+#define CMD_QUEUE_PRIO_OFFSET		0x00
+#define CMD_REQID_CONFIG_OFFSET		0x04
+#define CMD_TIMEOUT_OFFSET		0x08
+#define CMD_TIMEOUT_GRANULARITY		0x0C
+#define CMD_PTDMA_VERSION		0x10
+
+#define CMD_Q_CONTROL_BASE		0x0000
+#define CMD_Q_TAIL_LO_BASE		0x0004
+#define CMD_Q_HEAD_LO_BASE		0x0008
+#define CMD_Q_INT_ENABLE_BASE		0x000C
+#define CMD_Q_INTERRUPT_STATUS_BASE	0x0010
+
+#define CMD_Q_STATUS_BASE		0x0100
+#define CMD_Q_INT_STATUS_BASE		0x0104
+#define CMD_Q_DMA_STATUS_BASE		0x0108
+#define CMD_Q_DMA_READ_STATUS_BASE	0x010C
+#define CMD_Q_DMA_WRITE_STATUS_BASE	0x0110
+#define CMD_Q_ABORT_BASE		0x0114
+#define CMD_Q_AX_CACHE_BASE		0x0118
+
+#define CMD_CONFIG_OFFSET		0x1120
+#define CMD_CLK_GATE_CTL_OFFSET		0x6004
+
+#define CMD_DESC_DW0_VAL		0x500012
+
+/* Address offset for virtual queue registers */
+#define CMD_Q_STATUS_INCR		0x1000
+
+/* Bit masks */
+#define CMD_CONFIG_REQID		0
+#define CMD_TIMEOUT_DISABLE		0
+#define CMD_CLK_DYN_GATING_DIS		0
+#define CMD_CLK_SW_GATE_MODE		0
+#define CMD_CLK_GATE_CTL		0
+#define CMD_QUEUE_PRIO			GENMASK(2, 1)
+#define CMD_CONFIG_VHB_EN		BIT(0)
+#define CMD_CLK_DYN_GATING_EN		BIT(0)
+#define CMD_CLK_HW_GATE_MODE		BIT(0)
+#define CMD_CLK_GATE_ON_DELAY		BIT(12)
+#define CMD_CLK_GATE_OFF_DELAY		BIT(12)
+
+#define CMD_CLK_GATE_CONFIG		(CMD_CLK_GATE_CTL | \
+					CMD_CLK_HW_GATE_MODE | \
+					CMD_CLK_GATE_ON_DELAY | \
+					CMD_CLK_DYN_GATING_EN | \
+					CMD_CLK_GATE_OFF_DELAY)
+
+#define CMD_Q_LEN			32
+#define CMD_Q_RUN			BIT(0)
+#define CMD_Q_HALT			BIT(1)
+#define CMD_Q_MEM_LOCATION		BIT(2)
+#define CMD_Q_SIZE			GENMASK(4, 0)
+#define CMD_Q_SHIFT			GENMASK(1, 0)
+#define COMMANDS_PER_QUEUE		8192
+
+
+#define QUEUE_SIZE_VAL			((ffs(COMMANDS_PER_QUEUE) - 2) & \
+						CMD_Q_SIZE)
+#define Q_PTR_MASK			((2 << (QUEUE_SIZE_VAL + 5)) - 1)
+#define Q_DESC_SIZE			sizeof(struct ptdma_desc)
+#define Q_SIZE(n)			(COMMANDS_PER_QUEUE * (n))
+
+#define INT_COMPLETION			BIT(0)
+#define INT_ERROR			BIT(1)
+#define INT_QUEUE_STOPPED		BIT(2)
+#define INT_EMPTY_QUEUE			BIT(3)
+#define SUPPORTED_INTERRUPTS		(INT_COMPLETION | INT_ERROR)
+#define ALL_INTERRUPTS			(INT_COMPLETION | INT_ERROR | \
+					INT_QUEUE_STOPPED)
+
+/****** Local Storage Block ******/
+#define LSB_START			0
+#define LSB_END				127
+#define LSB_COUNT			(LSB_END - LSB_START + 1)
+
+#define LSB_REGION_WIDTH		5
+#define MAX_LSB_CNT			8
+
+#define LSB_SIZE			16
+#define LSB_ITEM_SIZE			128
+#define SLSB_MAP_SIZE			(MAX_LSB_CNT * LSB_SIZE)
+#define LSB_ENTRY_NUMBER(LSB_ADDR)	(LSB_ADDR / LSB_ITEM_SIZE)
+
+
+#define PT_DMAPOOL_MAX_SIZE		64
+#define PT_DMAPOOL_ALIGN		BIT(5)
+
+#define PT_PASSTHRU_BLOCKSIZE		512
+
+/* General PTDMA Defines */
+
+#define PTDMA_SB_BYTES			32
+#define	PTDMA_ENGINE_PASSTHRU		0x5
+
+/* Word 0 */
+#define PTDMA_CMD_DW0(p)		((p)->dw0)
+#define PTDMA_CMD_SOC(p)		(PTDMA_CMD_DW0(p).soc)
+#define PTDMA_CMD_IOC(p)		(PTDMA_CMD_DW0(p).ioc)
+#define PTDMA_CMD_INIT(p)		(PTDMA_CMD_DW0(p).init)
+#define PTDMA_CMD_EOM(p)		(PTDMA_CMD_DW0(p).eom)
+#define PTDMA_CMD_FUNCTION(p)		(PTDMA_CMD_DW0(p).function)
+#define PTDMA_CMD_ENGINE(p)		(PTDMA_CMD_DW0(p).engine)
+#define PTDMA_CMD_PROT(p)		(PTDMA_CMD_DW0(p).prot)
+
+/* Word 1 */
+#define PTDMA_CMD_DW1(p)		((p)->length)
+#define PTDMA_CMD_LEN(p)		(PTDMA_CMD_DW1(p))
+
+/* Word 2 */
+#define PTDMA_CMD_DW2(p)		((p)->src_lo)
+#define PTDMA_CMD_SRC_LO(p)		(PTDMA_CMD_DW2(p))
+
+/* Word 3 */
+#define PTDMA_CMD_DW3(p)		((p)->dw3)
+#define PTDMA_CMD_SRC_MEM(p)		((p)->dw3.src_mem)
+#define PTDMA_CMD_SRC_HI(p)		((p)->dw3.src_hi)
+#define PTDMA_CMD_LSB_ID(p)		((p)->dw3.lsb_cxt_id)
+#define PTDMA_CMD_FIX_SRC(p)		((p)->dw3.fixed)
+
+/* Words 4/5 */
+#define PTDMA_CMD_DST_LO(p)		((p)->dst_lo)
+#define PTDMA_CMD_DW5(p)		((p)->dw5.dst_hi)
+#define PTDMA_CMD_DST_HI(p)		(PTDMA_CMD_DW5(p))
+#define PTDMA_CMD_DST_MEM(p)		((p)->dw5.dst_mem)
+#define PTDMA_CMD_FIX_DST(p)		((p)->dw5.fixed)
+
+/* bitmap */
+enum {
+	BITS_PER_WORD = sizeof(unsigned long) * CHAR_BIT
+};
+
+#define WORD_OFFSET(b) ((b) / BITS_PER_WORD)
+#define BIT_OFFSET(b)  ((b) % BITS_PER_WORD)
+
+#define PTDMA_DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
+#define PTDMA_BITMAP_SIZE(nr) \
+	PTDMA_DIV_ROUND_UP(nr, CHAR_BIT * sizeof(unsigned long))
+
+#define PTDMA_BITMAP_FIRST_WORD_MASK(start) \
+	(~0UL << ((start) & (BITS_PER_WORD - 1)))
+#define PTDMA_BITMAP_LAST_WORD_MASK(nbits) \
+	(~0UL >> (-(nbits) & (BITS_PER_WORD - 1)))
+
+#define __ptdma_round_mask(x, y) ((typeof(x))((y)-1))
+#define ptdma_round_down(x, y) ((x) & ~__ptdma_round_mask(x, y))
+
+/** PTDMA registers Write/Read */
+static inline void ptdma_pci_reg_write(void *base, int offset,
+					uint32_t value)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+	rte_write32((rte_cpu_to_le_32(value)), reg_addr);
+}
+
+static inline uint32_t ptdma_pci_reg_read(void *base, int offset)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+	return rte_le_to_cpu_32(rte_read32(reg_addr));
+}
+
+#define PTDMA_READ_REG(hw_addr, reg_offset) \
+	ptdma_pci_reg_read(hw_addr, reg_offset)
+
+#define PTDMA_WRITE_REG(hw_addr, reg_offset, value) \
+	ptdma_pci_reg_write(hw_addr, reg_offset, value)
+
+/**
+ * A structure describing a PTDMA command queue.
+ */
+struct ptdma_cmd_queue {
+	struct rte_ptdma_rawdev *dev;
+	char memz_name[RTE_MEMZONE_NAMESIZE];
+
+	/* Queue identifier */
+	uint64_t id;	/**< queue id */
+	uint64_t qidx;	/**< queue index */
+	uint64_t qsize;	/**< queue size */
+
+	/* Queue address */
+	struct ptdma_desc *qbase_desc;
+	void *qbase_addr;
+	phys_addr_t qbase_phys_addr;
+	/**< queue-page registers addr */
+	void *reg_base;
+	uint32_t qcontrol;
+	/**< queue ctrl reg */
+	uint32_t head_offset;
+	uint32_t tail_offset;
+
+	int lsb;
+	/**< lsb region assigned to queue */
+	unsigned long lsbmask;
+	/**< lsb regions queue can access */
+	unsigned long lsbmap[PTDMA_BITMAP_SIZE(LSB_COUNT)];
+	/**< all lsb resources which queue is using */
+	uint32_t sb_key;
+	/**< lsb assigned for queue */
+} __rte_cache_aligned;
+
+/* Passthru engine */
+
+#define PTDMA_PT_BYTESWAP(p)      ((p)->pt.byteswap)
+#define PTDMA_PT_BITWISE(p)       ((p)->pt.bitwise)
+
+/**
+ * ptdma_passthru_bitwise - type of bitwise passthru operation
+ *
+ * @PTDMA_PASSTHRU_BITWISE_NOOP: no bitwise operation performed
+ * @PTDMA_PASSTHRU_BITWISE_AND: perform bitwise AND of src with mask
+ * @PTDMA_PASSTHRU_BITWISE_OR: perform bitwise OR of src with mask
+ * @PTDMA_PASSTHRU_BITWISE_XOR: perform bitwise XOR of src with mask
+ * @PTDMA_PASSTHRU_BITWISE_MASK: overwrite with mask
+ */
+enum ptdma_passthru_bitwise {
+	PTDMA_PASSTHRU_BITWISE_NOOP = 0,
+	PTDMA_PASSTHRU_BITWISE_AND,
+	PTDMA_PASSTHRU_BITWISE_OR,
+	PTDMA_PASSTHRU_BITWISE_XOR,
+	PTDMA_PASSTHRU_BITWISE_MASK,
+	PTDMA_PASSTHRU_BITWISE__LAST,
+};
+
+/**
+ * ptdma_passthru_byteswap - type of byteswap passthru operation
+ *
+ * @PTDMA_PASSTHRU_BYTESWAP_NOOP: no byte swapping performed
+ * @PTDMA_PASSTHRU_BYTESWAP_32BIT: swap bytes within 32-bit words
+ * @PTDMA_PASSTHRU_BYTESWAP_256BIT: swap bytes within 256-bit words
+ */
+enum ptdma_passthru_byteswap {
+	PTDMA_PASSTHRU_BYTESWAP_NOOP = 0,
+	PTDMA_PASSTHRU_BYTESWAP_32BIT,
+	PTDMA_PASSTHRU_BYTESWAP_256BIT,
+	PTDMA_PASSTHRU_BYTESWAP__LAST,
+};
+
+/**
+ * PTDMA passthru
+ */
+struct ptdma_passthru {
+	phys_addr_t src_addr;
+	phys_addr_t dest_addr;
+	enum ptdma_passthru_bitwise bit_mod;
+	enum ptdma_passthru_byteswap byte_swap;
+	int len;
+};
+
+union ptdma_function {
+	struct {
+		uint16_t byteswap:2;
+		uint16_t bitwise:3;
+		uint16_t reflect:2;
+		uint16_t rsvd:8;
+	} pt;
+	uint16_t raw;
+};
+
+/**
+ * ptdma memory type
+ */
+enum ptdma_memtype {
+	PTDMA_MEMTYPE_SYSTEM = 0,
+	PTDMA_MEMTYPE_SB,
+	PTDMA_MEMTYPE_LOCAL,
+	PTDMA_MEMTYPE_LAST,
+};
+
+/*
+ * descriptor for PTDMA commands
+ * 8 32-bit words:
+ * word 0: function; engine; control bits
+ * word 1: length of source data
+ * word 2: low 32 bits of source pointer
+ * word 3: upper 16 bits of source pointer; source memory type
+ * word 4: low 32 bits of destination pointer
+ * word 5: upper 16 bits of destination pointer; destination memory type
+ * word 6: reserved 32 bits
+ * word 7: reserved 32 bits
+ */
+
+union dword0 {
+	struct {
+		uint32_t soc:1;
+		uint32_t ioc:1;
+		uint32_t rsvd1:1;
+		uint32_t init:1;
+		uint32_t eom:1;
+		uint32_t function:15;
+		uint32_t engine:4;
+		uint32_t prot:1;
+		uint32_t rsvd2:7;
+	};
+	uint32_t val;
+};
+
+struct dword3 {
+	uint32_t  src_hi:16;
+	uint32_t  src_mem:2;
+	uint32_t  lsb_cxt_id:8;
+	uint32_t  rsvd1:5;
+	uint32_t  fixed:1;
+};
+
+struct dword5 {
+	uint32_t  dst_hi:16;
+	uint32_t  dst_mem:2;
+	uint32_t  rsvd1:13;
+	uint32_t  fixed:1;
+};
+
+struct ptdma_desc {
+	union dword0 dw0;
+	uint32_t length;
+	uint32_t src_lo;
+	struct dword3 dw3;
+	uint32_t dst_lo;
+	struct dword5 dw5;
+	uint32_t rsvd1;
+	uint32_t rsvd2;
+};
+
+
+static inline uint32_t
+low32_value(unsigned long addr)
+{
+	return ((uint64_t)addr) & 0x0ffffffff;
+}
+
+static inline uint32_t
+high32_value(unsigned long addr)
+{
+	return ((uint64_t)addr >> 32) & 0x00000ffff;
+}
+
+#endif
diff --git a/drivers/raw/ptdma/ptdma_rawdev_test.c b/drivers/raw/ptdma/ptdma_rawdev_test.c
new file mode 100644
index 0000000000..fbbcd66c8d
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_rawdev_test.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ **/
+
+#include <unistd.h>
+#include <inttypes.h>
+#include <rte_mbuf.h>
+#include "rte_rawdev.h"
+#include "rte_ptdma_rawdev.h"
+#include "ptdma_pmd_private.h"
+
+#define MAX_SUPPORTED_RAWDEVS 16
+#define TEST_SKIPPED 77
+
+
+static struct rte_mempool *pool;
+static unsigned short expected_ring_size[MAX_SUPPORTED_RAWDEVS];
+
+#define PRINT_ERR(...) print_err(__func__, __LINE__, __VA_ARGS__)
+
+static inline int
+__rte_format_printf(3, 4)
+print_err(const char *func, int lineno, const char *format, ...)
+{
+	va_list ap;
+	int ret;
+
+	ret = fprintf(stderr, "In %s:%d - ", func, lineno);
+	va_start(ap, format);
+	ret += vfprintf(stderr, format, ap);
+	va_end(ap);
+
+	return ret;
+}
+
+static int
+test_enqueue_copies(int dev_id)
+{
+	const unsigned int length = 1024;
+	unsigned int i = 0;
+	do {
+		struct rte_mbuf *src, *dst;
+		char *src_data, *dst_data;
+		struct rte_mbuf *completed[2] = {0};
+
+		/* test doing a single copy */
+		src = rte_pktmbuf_alloc(pool);
+		dst = rte_pktmbuf_alloc(pool);
+		src->data_len = src->pkt_len = length;
+		dst->data_len = dst->pkt_len = length;
+		src_data = rte_pktmbuf_mtod(src, char *);
+		dst_data = rte_pktmbuf_mtod(dst, char *);
+
+		for (i = 0; i < length; i++)
+			src_data[i] = rand() & 0xFF;
+
+		if (rte_ptdma_enqueue_copy(dev_id,
+				src->buf_iova + src->data_off,
+				dst->buf_iova + dst->data_off,
+				length,
+				(uintptr_t)src,
+				(uintptr_t)dst) != 1) {
+			PRINT_ERR("Error with rte_ptdma_enqueue_copy - 1\n");
+			return -1;
+		}
+		rte_ptdma_perform_ops(dev_id);
+		usleep(10);
+
+		if (rte_ptdma_completed_ops(dev_id, 1, (void *)&completed[0],
+				(void *)&completed[1]) != 1) {
+			PRINT_ERR("Error with rte_ptdma_completed_ops - 1\n");
+			return -1;
+		}
+		if (completed[0] != src || completed[1] != dst) {
+			PRINT_ERR("Error with completions: got (%p, %p), not (%p,%p)\n",
+					completed[0], completed[1], src, dst);
+			return -1;
+		}
+
+		for (i = 0; i < length; i++)
+			if (dst_data[i] != src_data[i]) {
+				PRINT_ERR("Data mismatch at char %u - 1\n", i);
+				return -1;
+			}
+		rte_pktmbuf_free(src);
+		rte_pktmbuf_free(dst);
+
+
+	} while (0);
+
+	/* test doing multiple copies */
+	do {
+		struct rte_mbuf *srcs[32], *dsts[32];
+		struct rte_mbuf *completed_src[64];
+		struct rte_mbuf *completed_dst[64];
+		unsigned int j;
+
+		for (i = 0; i < RTE_DIM(srcs) ; i++) {
+			char *src_data;
+
+			srcs[i] = rte_pktmbuf_alloc(pool);
+			dsts[i] = rte_pktmbuf_alloc(pool);
+			srcs[i]->data_len = srcs[i]->pkt_len = length;
+			dsts[i]->data_len = dsts[i]->pkt_len = length;
+			src_data = rte_pktmbuf_mtod(srcs[i], char *);
+
+			for (j = 0; j < length; j++)
+				src_data[j] = rand() & 0xFF;
+
+			if (rte_ptdma_enqueue_copy(dev_id,
+					srcs[i]->buf_iova + srcs[i]->data_off,
+					dsts[i]->buf_iova + dsts[i]->data_off,
+					length,
+					(uintptr_t)srcs[i],
+					(uintptr_t)dsts[i]) != 1) {
+				PRINT_ERR("Error with rte_ptdma_enqueue_copy for buffer %u\n",
+						i);
+				return -1;
+			}
+		}
+		rte_ptdma_perform_ops(dev_id);
+		usleep(100);
+
+		if (rte_ptdma_completed_ops(dev_id, 64, (void *)completed_src,
+				(void *)completed_dst) != RTE_DIM(srcs)) {
+			PRINT_ERR("Error with rte_ptdma_completed_ops\n");
+			return -1;
+		}
+
+		for (i = 0; i < RTE_DIM(srcs) ; i++) {
+			char *src_data, *dst_data;
+			if (completed_src[i] != srcs[i]) {
+				PRINT_ERR("Error with source pointer %u\n", i);
+				return -1;
+			}
+			if (completed_dst[i] != dsts[i]) {
+				PRINT_ERR("Error with dest pointer %u\n", i);
+				return -1;
+			}
+
+			src_data = rte_pktmbuf_mtod(srcs[i], char *);
+			dst_data = rte_pktmbuf_mtod(dsts[i], char *);
+			for (j = 0; j < length; j++)
+				if (src_data[j] != dst_data[j]) {
+					PRINT_ERR("Error with copy of packet %u, byte %u\n",
+							i, j);
+					return -1;
+				}
+
+			rte_pktmbuf_free(srcs[i]);
+			rte_pktmbuf_free(dsts[i]);
+		}
+
+	} while (0);
+
+	return 0;
+}
+
+int
+ptdma_rawdev_test(uint16_t dev_id)
+{
+#define PTDMA_TEST_RINGSIZE 512
+	struct rte_ptdma_rawdev_config p = { .ring_size = -1 };
+	struct rte_rawdev_info info = { .dev_private = &p };
+	struct rte_rawdev_xstats_name *snames = NULL;
+	uint64_t *stats = NULL;
+	unsigned int *ids = NULL;
+	unsigned int nb_xstats;
+	unsigned int i;
+
+	if (dev_id >= MAX_SUPPORTED_RAWDEVS) {
+		printf("Skipping test. Cannot test rawdevs with IDs of %d or greater\n",
+				MAX_SUPPORTED_RAWDEVS);
+		return TEST_SKIPPED;
+	}
+
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
+	if (p.ring_size != expected_ring_size[dev_id]) {
+		PRINT_ERR("Error, initial ring size is not as expected (Actual: %d, Expected: %d)\n",
+				(int)p.ring_size, expected_ring_size[dev_id]);
+		return -1;
+	}
+
+	p.ring_size = PTDMA_TEST_RINGSIZE;
+	if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
+		PRINT_ERR("Error with rte_rawdev_configure()\n");
+		return -1;
+	}
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
+	if (p.ring_size != PTDMA_TEST_RINGSIZE) {
+		PRINT_ERR("Error, ring size is not %d (%d)\n",
+				PTDMA_TEST_RINGSIZE, (int)p.ring_size);
+		return -1;
+	}
+	expected_ring_size[dev_id] = p.ring_size;
+
+	if (rte_rawdev_start(dev_id) != 0) {
+		PRINT_ERR("Error with rte_rawdev_start()\n");
+		return -1;
+	}
+
+	pool = rte_pktmbuf_pool_create("TEST_PTDMA_POOL",
+			256, /* n == num elements */
+			32,  /* cache size */
+			0,   /* priv size */
+			2048, /* data room size */
+			info.socket_id);
+	if (pool == NULL) {
+		PRINT_ERR("Error with mempool creation\n");
+		return -1;
+	}
+
+	/* allocate memory for xstats names and values */
+	nb_xstats = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+
+	snames = malloc(sizeof(*snames) * nb_xstats);
+	if (snames == NULL) {
+		PRINT_ERR("Error allocating xstat names memory\n");
+		goto err;
+	}
+	rte_rawdev_xstats_names_get(dev_id, snames, nb_xstats);
+
+	ids = malloc(sizeof(*ids) * nb_xstats);
+	if (ids == NULL) {
+		PRINT_ERR("Error allocating xstat ids memory\n");
+		goto err;
+	}
+	for (i = 0; i < nb_xstats; i++)
+		ids[i] = i;
+
+	stats = malloc(sizeof(*stats) * nb_xstats);
+	if (stats == NULL) {
+		PRINT_ERR("Error allocating xstat memory\n");
+		goto err;
+	}
+
+	/* run the test cases */
+	printf("Running Copy Tests\n");
+	for (i = 0; i < 100; i++) {
+		unsigned int j;
+
+		if (test_enqueue_copies(dev_id) != 0)
+			goto err;
+
+		rte_rawdev_xstats_get(dev_id, ids, stats, nb_xstats);
+		for (j = 0; j < nb_xstats; j++)
+			printf("%s: %"PRIu64"   ", snames[j].name, stats[j]);
+		printf("\r");
+	}
+	printf("\n");
+
+	rte_rawdev_stop(dev_id);
+	if (rte_rawdev_xstats_reset(dev_id, NULL, 0) != 0) {
+		PRINT_ERR("Error resetting xstat values\n");
+		goto err;
+	}
+
+	rte_mempool_free(pool);
+	free(snames);
+	free(stats);
+	free(ids);
+	return 0;
+
+err:
+	rte_rawdev_stop(dev_id);
+	rte_rawdev_xstats_reset(dev_id, NULL, 0);
+	rte_mempool_free(pool);
+	free(snames);
+	free(stats);
+	free(ids);
+	return -1;
+}
diff --git a/drivers/raw/ptdma/rte_ptdma_rawdev.h b/drivers/raw/ptdma/rte_ptdma_rawdev.h
new file mode 100644
index 0000000000..84eccbc4e8
--- /dev/null
+++ b/drivers/raw/ptdma/rte_ptdma_rawdev.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_PTDMA_RAWDEV_H_
+#define _RTE_PTDMA_RAWDEV_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_ptdma_rawdev.h
+ *
+ * Definitions for using the ptdma rawdev device driver
+ *
+ * @warning
+ * @b EXPERIMENTAL: these structures and APIs may change without prior notice
+ */
+
+#include <rte_common.h>
+
+/** Name of the device driver */
+#define PTDMA_PMD_RAWDEV_NAME rawdev_ptdma
+/** String reported as the device driver name by rte_rawdev_info_get() */
+#define PTDMA_PMD_RAWDEV_NAME_STR "rawdev_ptdma"
+
+/**
+ * Configuration structure for a ptdma rawdev instance
+ *
+ * This structure is to be passed as the ".dev_private" parameter when
+ * calling the rte_rawdev_info_get() and rte_rawdev_configure() APIs on
+ * a ptdma rawdev instance.
+ */
+struct rte_ptdma_rawdev_config {
+	unsigned short ring_size; /**< size of job submission descriptor ring */
+	bool hdls_disable;    /**< if set, ignore user-supplied handle params */
+};
+
+/**
+ * Enqueue a copy operation onto the ptdma device
+ *
+ * This queues up a copy operation to be performed by hardware, but does not
+ * trigger hardware to begin that operation.
+ *
+ * @param dev_id
+ *   The rawdev device id of the ptdma instance
+ * @param src
+ *   The physical address of the source buffer
+ * @param dst
+ *   The physical address of the destination buffer
+ * @param length
+ *   The length of the data to be copied
+ * @param src_hdl
+ *   An opaque handle for the source data, to be returned when this operation
+ *   has been completed and the user polls for the completion details.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @param dst_hdl
+ *   An opaque handle for the destination data, to be returned when this
+ *   operation has been completed and the user polls for the completion details.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @return
+ *   Number of operations enqueued, either 0 or 1
+ */
+static inline int
+__rte_experimental
+rte_ptdma_enqueue_copy(int dev_id, phys_addr_t src, phys_addr_t dst,
+		unsigned int length, uintptr_t src_hdl, uintptr_t dst_hdl);
+
+
+/**
+ * Trigger hardware to begin performing enqueued operations
+ *
+ * This API is used to write to the hardware to trigger it
+ * to begin the operations previously enqueued by rte_ptdma_enqueue_copy()
+ *
+ * @param dev_id
+ *   The rawdev device id of the ptdma instance
+ */
+static inline void
+__rte_experimental
+rte_ptdma_perform_ops(int dev_id);
+
+/**
+ * Returns details of operations that have been completed
+ *
+ * This function returns number of newly-completed operations.
+ *
+ * @param dev_id
+ *   The rawdev device id of the ptdma instance
+ * @param max_copies
+ *   The number of entries which can fit in the src_hdls and dst_hdls
+ *   arrays, i.e. max number of completed operations to report.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @param src_hdls
+ *   Array to hold the source handle parameters of the completed ops.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @param dst_hdls
+ *   Array to hold the destination handle parameters of the completed ops.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @return
+ *   -1 on error, with rte_errno set appropriately.
+ *   Otherwise number of completed operations i.e. number of entries written
+ *   to the src_hdls and dst_hdls array parameters.
+ */
+static inline int
+__rte_experimental
+rte_ptdma_completed_ops(int dev_id, uint8_t max_copies,
+		uintptr_t *src_hdls, uintptr_t *dst_hdls);
+
+
+/* include the implementation details from a separate file */
+#include "rte_ptdma_rawdev_fns.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PTDMA_RAWDEV_H_ */
diff --git a/drivers/raw/ptdma/rte_ptdma_rawdev_fns.h b/drivers/raw/ptdma/rte_ptdma_rawdev_fns.h
new file mode 100644
index 0000000000..f4dced3bef
--- /dev/null
+++ b/drivers/raw/ptdma/rte_ptdma_rawdev_fns.h
@@ -0,0 +1,298 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+#ifndef _RTE_PTDMA_RAWDEV_FNS_H_
+#define _RTE_PTDMA_RAWDEV_FNS_H_
+
+#include <x86intrin.h>
+#include <rte_rawdev.h>
+#include <rte_memzone.h>
+#include <rte_prefetch.h>
+#include "ptdma_rawdev_spec.h"
+#include "ptdma_pmd_private.h"
+
+/**
+ * @internal
+ * some statistics for tracking, if added/changed update xstats fns
+ */
+struct rte_ptdma_xstats {
+	uint64_t enqueue_failed;
+	uint64_t enqueued;
+	uint64_t started;
+	uint64_t completed;
+};
+
+/**
+ * @internal
+ * Structure representing a PTDMA device instance
+ */
+struct rte_ptdma_rawdev {
+	struct rte_rawdev *rawdev;
+	struct rte_ptdma_xstats xstats;
+	unsigned short ring_size;
+
+	bool hdls_disable;
+	__m128i *hdls; /* completion handles for returning to user */
+	unsigned short next_read;
+	unsigned short next_write;
+
+	int id; /**< ptdma dev id on platform */
+	struct ptdma_cmd_queue cmd_q[MAX_HW_QUEUES]; /**< ptdma queue */
+	int cmd_q_count; /**< no. of ptdma Queues */
+	struct rte_pci_device pci; /**< ptdma pci identifier */
+	int qidx;
+
+};
+
+static __rte_always_inline void
+ptdma_dump_registers(int dev_id)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	uint32_t cur_head_offset;
+	uint32_t cur_tail_offset;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+
+	PTDMA_PMD_DEBUG("cmd_q->head_offset	= %d\n", cmd_q->head_offset);
+	PTDMA_PMD_DEBUG("cmd_q->tail_offset	= %d\n", cmd_q->tail_offset);
+	PTDMA_PMD_DEBUG("cmd_q->id		= %" PRIx64 "\n", cmd_q->id);
+	PTDMA_PMD_DEBUG("cmd_q->qidx		= %" PRIx64 "\n", cmd_q->qidx);
+	PTDMA_PMD_DEBUG("cmd_q->qsize		= %" PRIx64 "\n", cmd_q->qsize);
+
+	cur_head_offset = PTDMA_READ_REG(cmd_q->reg_base,
+			CMD_Q_HEAD_LO_BASE);
+	cur_tail_offset = PTDMA_READ_REG(cmd_q->reg_base,
+			CMD_Q_TAIL_LO_BASE);
+
+	PTDMA_PMD_DEBUG("cur_head_offset	= %d\n", cur_head_offset);
+	PTDMA_PMD_DEBUG("cur_tail_offset	= %d\n", cur_tail_offset);
+	PTDMA_PMD_DEBUG("Q_CONTROL_BASE		= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_CONTROL_BASE));
+	PTDMA_PMD_DEBUG("Q_STATUS_BASE		= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_INT_STATUS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_INT_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_DMA_STATUS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_DMA_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_DMA_RD_STS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_DMA_READ_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_DMA_WRT_STS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_DMA_WRITE_STATUS_BASE));
+}
+
+static __rte_always_inline void
+ptdma_perform_passthru(struct ptdma_passthru *pst,
+		struct ptdma_cmd_queue *cmd_q)
+{
+	struct ptdma_desc *desc;
+	union ptdma_function function;
+
+	desc = &cmd_q->qbase_desc[cmd_q->qidx];
+
+	PTDMA_CMD_ENGINE(desc) = PTDMA_ENGINE_PASSTHRU;
+
+	PTDMA_CMD_SOC(desc) = 0;
+	PTDMA_CMD_IOC(desc) = 0;
+	PTDMA_CMD_INIT(desc) = 0;
+	PTDMA_CMD_EOM(desc) = 0;
+	PTDMA_CMD_PROT(desc) = 0;
+
+	function.raw = 0;
+	PTDMA_PT_BYTESWAP(&function) = pst->byte_swap;
+	PTDMA_PT_BITWISE(&function) = pst->bit_mod;
+	PTDMA_CMD_FUNCTION(desc) = function.raw;
+	PTDMA_CMD_LEN(desc) = pst->len;
+
+	PTDMA_CMD_SRC_LO(desc) = (uint32_t)(pst->src_addr);
+	PTDMA_CMD_SRC_HI(desc) = high32_value(pst->src_addr);
+	PTDMA_CMD_SRC_MEM(desc) = PTDMA_MEMTYPE_SYSTEM;
+
+	PTDMA_CMD_DST_LO(desc) = (uint32_t)(pst->dest_addr);
+	PTDMA_CMD_DST_HI(desc) = high32_value(pst->dest_addr);
+	PTDMA_CMD_DST_MEM(desc) = PTDMA_MEMTYPE_SYSTEM;
+
+	cmd_q->qidx = (cmd_q->qidx + 1) % COMMANDS_PER_QUEUE;
+
+}
+
+
+static __rte_always_inline int
+ptdma_ops_to_enqueue(int dev_id, uint32_t op, uint64_t src, phys_addr_t dst,
+		unsigned int length, uintptr_t src_hdl, uintptr_t dst_hdl)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	struct ptdma_passthru pst;
+	uint32_t cmd_q_ctrl;
+	unsigned short write	= ptdma_priv->next_write;
+	unsigned short read	= ptdma_priv->next_read;
+	unsigned short mask	= ptdma_priv->ring_size - 1;
+	unsigned short space	= mask + read - write;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+	cmd_q_ctrl = PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE);
+
+	if (cmd_q_ctrl & CMD_Q_RUN) {
+		/* Turn the queue off using control register */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+				cmd_q_ctrl & ~CMD_Q_RUN);
+		do {
+			cmd_q_ctrl = PTDMA_READ_REG(cmd_q->reg_base,
+					CMD_Q_CONTROL_BASE);
+		} while (!(cmd_q_ctrl & CMD_Q_HALT));
+	}
+
+	if (space == 0) {
+		ptdma_priv->xstats.enqueue_failed++;
+		return 0;
+	}
+
+	ptdma_priv->next_write = write + 1;
+	write &= mask;
+
+	if (!op)
+		pst.src_addr	= src;
+	else
+		PTDMA_PMD_DEBUG("Operation not supported by PTDMA\n");
+
+	pst.dest_addr	= dst;
+	pst.len		= length;
+	pst.bit_mod	= PTDMA_PASSTHRU_BITWISE_NOOP;
+	pst.byte_swap	= PTDMA_PASSTHRU_BYTESWAP_NOOP;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+
+	cmd_q->head_offset = (uint32_t)(PTDMA_READ_REG(cmd_q->reg_base,
+				CMD_Q_HEAD_LO_BASE));
+
+	ptdma_perform_passthru(&pst, cmd_q);
+
+	cmd_q->tail_offset = (uint32_t)(cmd_q->qbase_phys_addr + cmd_q->qidx *
+				Q_DESC_SIZE);
+	rte_wmb();
+
+	/* Write the new tail address back to the queue register */
+	PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_TAIL_LO_BASE,
+			cmd_q->tail_offset);
+
+	if (!ptdma_priv->hdls_disable)
+		ptdma_priv->hdls[write] =
+					_mm_set_epi64x((int64_t)dst_hdl,
+							(int64_t)src_hdl);
+	ptdma_priv->xstats.enqueued++;
+
+	return 1;
+}
+
+static __rte_always_inline int
+ptdma_ops_to_dequeue(int dev_id, int max_copies, uintptr_t *src_hdls,
+						uintptr_t *dst_hdls)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	uint32_t cur_head_offset;
+	short end_read;
+	unsigned short count;
+	unsigned short read	= ptdma_priv->next_read;
+	unsigned short write	= ptdma_priv->next_write;
+	unsigned short mask	= ptdma_priv->ring_size - 1;
+	int i = 0;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+
+	cur_head_offset = PTDMA_READ_REG(cmd_q->reg_base,
+			CMD_Q_HEAD_LO_BASE);
+
+	end_read = cur_head_offset - cmd_q->head_offset;
+
+	if (end_read < 0)
+		end_read = COMMANDS_PER_QUEUE - cmd_q->head_offset
+				+ cur_head_offset;
+	if (end_read < max_copies)
+		return 0;
+
+	if (end_read != 0)
+		count = (write - (read & mask)) & mask;
+	else
+		return 0;
+
+	if (ptdma_priv->hdls_disable) {
+		read += count;
+		goto end;
+	}
+
+	if (count > max_copies)
+		count = max_copies;
+
+	for (; i < count - 1; i += 2, read += 2) {
+		__m128i hdls0 =
+			_mm_load_si128(&ptdma_priv->hdls[read & mask]);
+		__m128i hdls1 =
+			_mm_load_si128(&ptdma_priv->hdls[(read + 1) & mask]);
+		_mm_storeu_si128((__m128i *)&src_hdls[i],
+				_mm_unpacklo_epi64(hdls0, hdls1));
+		_mm_storeu_si128((__m128i *)&dst_hdls[i],
+				_mm_unpackhi_epi64(hdls0, hdls1));
+	}
+
+	for (; i < count; i++, read++) {
+		uintptr_t *hdls =
+			(uintptr_t *)&ptdma_priv->hdls[read & mask];
+		src_hdls[i] = hdls[0];
+		dst_hdls[i] = hdls[1];
+	}
+end:
+	ptdma_priv->next_read = read;
+	ptdma_priv->xstats.completed += count;
+
+	return count;
+}
+
+static inline int
+rte_ptdma_enqueue_copy(int dev_id, phys_addr_t src, phys_addr_t dst,
+		unsigned int length, uintptr_t src_hdl, uintptr_t dst_hdl)
+{
+	return ptdma_ops_to_enqueue(dev_id, 0, src, dst, length,
+					src_hdl, dst_hdl);
+}
+
+static inline void
+rte_ptdma_perform_ops(int dev_id)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	uint32_t cmd_q_ctrl;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+	cmd_q_ctrl = PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE);
+
+	/* Turn the queue on using control register */
+	PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+			cmd_q_ctrl | CMD_Q_RUN);
+
+	ptdma_priv->xstats.started = ptdma_priv->xstats.enqueued;
+}
+
+static inline int
+rte_ptdma_completed_ops(int dev_id, uint8_t max_copies,
+		uintptr_t *src_hdls, uintptr_t *dst_hdls)
+{
+	int ret = 0;
+
+	ret = ptdma_ops_to_dequeue(dev_id, max_copies, src_hdls, dst_hdls);
+
+	return ret;
+}
+
+#endif
diff --git a/drivers/raw/ptdma/version.map b/drivers/raw/ptdma/version.map
new file mode 100644
index 0000000000..45917242ca
--- /dev/null
+++ b/drivers/raw/ptdma/version.map
@@ -0,0 +1,5 @@
+DPDK_21 {
+
+       local: *;
+};
+
diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 74d16e4c4b..30c11e92ba 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -65,6 +65,8 @@
                  'SVendor': None, 'SDevice': None}
 intel_ntb_icx = {'Class': '06', 'Vendor': '8086', 'Device': '347e',
                  'SVendor': None, 'SDevice': None}
+amd_ptdma   = {'Class': '10', 'Vendor': '1022', 'Device': '1498',
+                 'SVendor': None, 'SDevice': None}
 
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 baseband_devices = [acceleration_class]
@@ -74,7 +76,7 @@
 compress_devices = [cavium_zip]
 regex_devices = [octeontx2_ree]
 misc_devices = [cnxk_bphy, cnxk_bphy_cgx, intel_ioat_bdw, intel_ioat_skx, intel_ioat_icx, intel_idxd_spr,
-                intel_ntb_skx, intel_ntb_icx,
+                intel_ntb_skx, intel_ntb_icx, amd_ptdma,
                 octeontx2_dma]
 
 # global dict ethernet devices present. Dictionary indexed by PCI address.
-- 
2.25.1
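
For readers following the handle ring in rte_ptdma_rawdev_fns.h and the
descriptor layout in ptdma_rawdev_spec.h, here is a minimal standalone
sketch of the two pieces of arithmetic involved. It is not code from the
patch: the names ring_state, ring_space, ring_claim, addr_lo32 and
addr_hi16 are hypothetical, and it assumes only what the patch shows,
namely free-running unsigned short indices, a power-of-two ring size, and
a 48-bit address split into a low 32-bit word and an upper 16-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the index fields of struct rte_ptdma_rawdev. */
struct ring_state {
	unsigned short next_read;   /* free-running consumer index */
	unsigned short next_write;  /* free-running producer index */
	unsigned short ring_size;   /* assumed to be a power of two */
};

/*
 * Free slots, following the "mask + read - write" expression used in
 * ptdma_ops_to_enqueue(). The cast back to unsigned short keeps the
 * result correct after either index has wrapped past 65535. One slot is
 * always left unused, so an empty ring reports ring_size - 1.
 */
static inline unsigned short
ring_space(const struct ring_state *r)
{
	unsigned short mask = r->ring_size - 1;

	assert(r->ring_size != 0 && (r->ring_size & mask) == 0);
	return (unsigned short)(mask + r->next_read - r->next_write);
}

/* Claim one slot: returns the slot index, or -1 when the ring is full. */
static inline int
ring_claim(struct ring_state *r)
{
	unsigned short mask = r->ring_size - 1;

	if (ring_space(r) == 0)
		return -1;
	return r->next_write++ & mask;
}

/* Low 32 bits of an address, as stored in descriptor words 2 and 4. */
static inline uint32_t
addr_lo32(uint64_t addr)
{
	return (uint32_t)(addr & 0xffffffffu);
}

/* Bits 47:32 of an address, as stored in the 16-bit src_hi/dst_hi fields. */
static inline uint16_t
addr_hi16(uint64_t addr)
{
	return (uint16_t)((addr >> 32) & 0xffffu);
}
```

Because the indices are never masked when stored, only when used to index
the array, the space calculation needs no special case for wrap-around.
The cost is that a ring of N entries holds at most N - 1 outstanding
operations.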


* [dpdk-dev] [RFC PATCH v2] raw/ptdma: introduce ptdma driver
@ 2021-09-06 15:59 Selwin Sebastian
  0 siblings, 0 replies; 9+ messages in thread
From: Selwin Sebastian @ 2021-09-06 15:59 UTC (permalink / raw)
  To: dev; +Cc: Selwin Sebastian

From: Selwin Sebastian <selwin.sebastia@amd.com>

Add support for PTDMA driver

Signed-off-by: Selwin Sebastian <selwin.sebastia@amd.com>
---
 MAINTAINERS                              |   5 +
 doc/guides/rawdevs/ptdma.rst             | 220 ++++++++++++++
 drivers/raw/meson.build                  |   1 +
 drivers/raw/ptdma/meson.build            |  16 +
 drivers/raw/ptdma/ptdma_dev.c            | 135 +++++++++
 drivers/raw/ptdma/ptdma_pmd_private.h    |  41 +++
 drivers/raw/ptdma/ptdma_rawdev.c         | 266 +++++++++++++++++
 drivers/raw/ptdma/ptdma_rawdev_spec.h    | 362 +++++++++++++++++++++++
 drivers/raw/ptdma/ptdma_rawdev_test.c    | 272 +++++++++++++++++
 drivers/raw/ptdma/rte_ptdma_rawdev.h     | 124 ++++++++
 drivers/raw/ptdma/rte_ptdma_rawdev_fns.h | 298 +++++++++++++++++++
 drivers/raw/ptdma/version.map            |   5 +
 usertools/dpdk-devbind.py                |   4 +-
 13 files changed, 1748 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/rawdevs/ptdma.rst
 create mode 100644 drivers/raw/ptdma/meson.build
 create mode 100644 drivers/raw/ptdma/ptdma_dev.c
 create mode 100644 drivers/raw/ptdma/ptdma_pmd_private.h
 create mode 100644 drivers/raw/ptdma/ptdma_rawdev.c
 create mode 100644 drivers/raw/ptdma/ptdma_rawdev_spec.h
 create mode 100644 drivers/raw/ptdma/ptdma_rawdev_test.c
 create mode 100644 drivers/raw/ptdma/rte_ptdma_rawdev.h
 create mode 100644 drivers/raw/ptdma/rte_ptdma_rawdev_fns.h
 create mode 100644 drivers/raw/ptdma/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1da..f4afd1a072 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1305,6 +1305,11 @@ F: doc/guides/rawdevs/ioat.rst
 F: examples/ioat/
 F: doc/guides/sample_app_ug/ioat.rst
 
+PTDMA Rawdev
+M: Selwin Sebastian <selwin.sebastian@amd.com>
+F: drivers/raw/ptdma/
+F: doc/guides/rawdevs/ptdma.rst
+
 NXP DPAA2 QDMA
 M: Nipun Gupta <nipun.gupta@nxp.com>
 F: drivers/raw/dpaa2_qdma/
diff --git a/doc/guides/rawdevs/ptdma.rst b/doc/guides/rawdevs/ptdma.rst
new file mode 100644
index 0000000000..50772f9f3b
--- /dev/null
+++ b/doc/guides/rawdevs/ptdma.rst
@@ -0,0 +1,220 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+
+PTDMA Rawdev Driver
+===================
+
+The ``ptdma`` rawdev driver provides a poll-mode driver (PMD) for the AMD PTDMA device.
+
+Hardware Requirements
+----------------------
+
+The ``dpdk-devbind.py`` script, included with DPDK,
+can be used to show the presence of supported hardware.
+Running ``dpdk-devbind.py --status-dev misc`` will show all the miscellaneous,
+or rawdev-based devices on the system.
+
+Sample output from a system with PTDMA is shown below::
+
+        Misc (rawdev) devices using DPDK-compatible driver
+        ==================================================
+        0000:01:00.2 'Starship/Matisse PTDMA 1498' drv=igb_uio unused=vfio-pci
+        0000:02:00.2 'Starship/Matisse PTDMA 1498' drv=igb_uio unused=vfio-pci
+
+Devices using UIO drivers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The HW devices to be used will need to be bound to a user-space IO driver for use.
+The ``dpdk-devbind.py`` script can be used to view the state of the PTDMA devices
+and to bind them to a suitable DPDK-supported driver, such as ``igb_uio``.
+For example::
+
+        $ sudo ./usertools/dpdk-devbind.py  --force --bind=igb_uio 0000:01:00.2 0000:02:00.2
+
+Compilation
+------------
+
+For builds using ``meson`` and ``ninja``, the driver will be built when the target platform is x86-based.
+No additional compilation steps are necessary.
+
+
+Using PTDMA Rawdev Devices
+--------------------------
+
+To use the devices from an application, the rawdev API can be used, along
+with definitions taken from the device-specific header file
+``rte_ptdma_rawdev.h``. This header is needed to get the definition of
+structure parameters used by some of the rawdev APIs for PTDMA rawdev
+devices, as well as providing key functions for using the device for memory
+copies.
+
+Getting Device Information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Basic information about each rawdev device can be queried using the
+``rte_rawdev_info_get()`` API. For most applications, this API will be
+needed to verify that the rawdev in question is of the expected type. For
+example, the following code snippet can be used to identify a PTDMA
+rawdev device for use by an application:
+
+.. code-block:: C
+
+        for (i = 0; i < count && !found; i++) {
+                struct rte_rawdev_info info = { .dev_private = NULL };
+                found = (rte_rawdev_info_get(i, &info, 0) == 0 &&
+                                strcmp(info.driver_name,
+                                                PTDMA_PMD_RAWDEV_NAME_STR) == 0);
+        }
+
+When calling the ``rte_rawdev_info_get()`` API for a PTDMA rawdev device,
+the ``dev_private`` field in the ``rte_rawdev_info`` struct should either
+be NULL, or else be set to point to a structure of type
+``rte_ptdma_rawdev_config``, in which case the size of the configured device
+input ring will be returned in that structure.
+
+Device Configuration
+~~~~~~~~~~~~~~~~~~~~~
+
+Configuring a PTDMA rawdev device is done using the
+``rte_rawdev_configure()`` API, which takes the same structure parameters
+as the previously referenced ``rte_rawdev_info_get()`` API. The main
+difference is that, because the parameter is used as input rather than
+output, the ``dev_private`` structure element cannot be NULL, and must
+point to a valid ``rte_ptdma_rawdev_config`` structure, containing the ring
+size to be used by the device. The ring size must be a power of two,
+between 64 and 4096.
+If the driver's tracking of user-provided completion handles is not
+needed, it may be disabled by also setting the ``hdls_disable`` flag in
+the configuration structure.
+
+The following code shows how the device is configured in
+``ptdma_rawdev_test.c``:
+
+.. code-block:: C
+
+   #define PTDMA_TEST_RINGSIZE 512
+        struct rte_ptdma_rawdev_config p = { .ring_size = -1 };
+        struct rte_rawdev_info info = { .dev_private = &p };
+
+        /* ... */
+
+        p.ring_size = PTDMA_TEST_RINGSIZE;
+        if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
+                printf("Error with rte_rawdev_configure()\n");
+                return -1;
+        }
+
+Once configured, the device can then be made ready for use by calling the
+``rte_rawdev_start()`` API.
+
+Performing Data Copies
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To perform data copies using PTDMA rawdev devices, the functions
+``rte_ptdma_enqueue_copy()`` and ``rte_ptdma_perform_ops()`` should be used.
+Once copies have been completed, the completion will be reported back when
+the application calls ``rte_ptdma_completed_ops()``.
+
+The ``rte_ptdma_enqueue_copy()`` function enqueues a single copy to the
+device ring for copying at a later point. The parameters to that function
+include the IOVA addresses of both the source and destination buffers,
+as well as two "handles" to be returned to the user when the copy is
+completed. These handles can be arbitrary values, but two are provided so
+that the library can track handles for both source and destination on
+behalf of the user, e.g. virtual addresses for the buffers, or mbuf
+pointers if packet data is being copied.
+
+While the ``rte_ptdma_enqueue_copy()`` function enqueues a copy operation on
+the device ring, the copy will not actually be performed until after the
+application calls the ``rte_ptdma_perform_ops()`` function. This function
+informs the device hardware of the elements enqueued on the ring, and the
+device will begin to process them. It is expected that, for efficiency
+reasons, a burst of operations will be enqueued to the device via multiple
+enqueue calls between calls to the ``rte_ptdma_perform_ops()`` function.
+
+The following code from ``ptdma_rawdev_test.c`` demonstrates how to enqueue
+a burst of copies to the device and start the hardware processing of them:
+
+.. code-block:: C
+
+        struct rte_mbuf *srcs[32], *dsts[32];
+        unsigned int i, j;
+
+        for (i = 0; i < RTE_DIM(srcs); i++) {
+                char *src_data;
+
+                srcs[i] = rte_pktmbuf_alloc(pool);
+                dsts[i] = rte_pktmbuf_alloc(pool);
+                srcs[i]->data_len = srcs[i]->pkt_len = length;
+                dsts[i]->data_len = dsts[i]->pkt_len = length;
+                src_data = rte_pktmbuf_mtod(srcs[i], char *);
+
+                for (j = 0; j < length; j++)
+                        src_data[j] = rand() & 0xFF;
+
+                if (rte_ptdma_enqueue_copy(dev_id,
+                                srcs[i]->buf_iova + srcs[i]->data_off,
+                                dsts[i]->buf_iova + dsts[i]->data_off,
+                                length,
+                                (uintptr_t)srcs[i],
+                                (uintptr_t)dsts[i]) != 1) {
+                        printf("Error with rte_ptdma_enqueue_copy for buffer %u\n",
+                                        i);
+                        return -1;
+                }
+        }
+        rte_ptdma_perform_ops(dev_id);
+
+To retrieve information about completed copies, the API
+``rte_ptdma_completed_ops()`` should be used. This API will return to the
+application a set of completion handles passed in when the relevant copies
+were enqueued.
+
+The following code from ``ptdma_rawdev_test.c`` shows the test code
+retrieving information about the completed copies and validating the data
+is correct before freeing the data buffers using the returned handles:
+
+.. code-block:: C
+
+        if (rte_ptdma_completed_ops(dev_id, 64, (void *)completed_src,
+                        (void *)completed_dst) != RTE_DIM(srcs)) {
+                printf("Error with rte_ptdma_completed_ops\n");
+                return -1;
+        }
+        for (i = 0; i < RTE_DIM(srcs); i++) {
+                char *src_data, *dst_data;
+
+                if (completed_src[i] != srcs[i]) {
+                        printf("Error with source pointer %u\n", i);
+                        return -1;
+                }
+                if (completed_dst[i] != dsts[i]) {
+                        printf("Error with dest pointer %u\n", i);
+                        return -1;
+                }
+
+                src_data = rte_pktmbuf_mtod(srcs[i], char *);
+                dst_data = rte_pktmbuf_mtod(dsts[i], char *);
+                for (j = 0; j < length; j++)
+                        if (src_data[j] != dst_data[j]) {
+                                printf("Error with copy of packet %u, byte %u\n",
+                                                i, j);
+                                return -1;
+                        }
+                rte_pktmbuf_free(srcs[i]);
+                rte_pktmbuf_free(dsts[i]);
+        }
+
+Querying Device Statistics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The statistics from the PTDMA rawdev device can be retrieved via the xstats
+functions in the ``rte_rawdev`` library, i.e.
+``rte_rawdev_xstats_names_get()``, ``rte_rawdev_xstats_get()`` and
+``rte_rawdev_xstats_by_name_get()``. The statistics returned for each device
+instance are:
+
+* ``failed_enqueues``
+* ``successful_enqueues``
+* ``copies_started``
+* ``copies_completed``
diff --git a/drivers/raw/meson.build b/drivers/raw/meson.build
index b51536f8a7..e896745d9c 100644
--- a/drivers/raw/meson.build
+++ b/drivers/raw/meson.build
@@ -14,6 +14,7 @@ drivers = [
         'ntb',
         'octeontx2_dma',
         'octeontx2_ep',
+        'ptdma',
         'skeleton',
 ]
 std_deps = ['rawdev']
diff --git a/drivers/raw/ptdma/meson.build b/drivers/raw/ptdma/meson.build
new file mode 100644
index 0000000000..a3eab8dbfd
--- /dev/null
+++ b/drivers/raw/ptdma/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2021 Advanced Micro Devices, Inc. All rights reserved.
+
+build = dpdk_conf.has('RTE_ARCH_X86')
+reason = 'only supported on x86'
+sources = files(
+	'ptdma_rawdev.c',
+	'ptdma_dev.c',
+	'ptdma_rawdev_test.c')
+deps += ['bus_pci',
+	'bus_vdev',
+	'mbuf',
+	'rawdev']
+
+headers = files('rte_ptdma_rawdev.h',
+		'rte_ptdma_rawdev_fns.h')
diff --git a/drivers/raw/ptdma/ptdma_dev.c b/drivers/raw/ptdma/ptdma_dev.c
new file mode 100644
index 0000000000..1d0207a9af
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_dev.c
@@ -0,0 +1,135 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <dirent.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/file.h>
+#include <unistd.h>
+
+#include <rte_hexdump.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_spinlock.h>
+#include <rte_string_fns.h>
+
+#include "ptdma_rawdev_spec.h"
+#include "ptdma_pmd_private.h"
+#include "rte_ptdma_rawdev_fns.h"
+
+static int ptdma_dev_id;
+
+static const struct rte_memzone *
+ptdma_queue_dma_zone_reserve(const char *queue_name,
+			   uint32_t queue_size,
+			   int socket_id)
+{
+	const struct rte_memzone *mz;
+
+	mz = rte_memzone_lookup(queue_name);
+	if (mz != NULL) {
+		if (((size_t)queue_size <= mz->len) &&
+		    ((socket_id == SOCKET_ID_ANY) ||
+		     (socket_id == mz->socket_id))) {
+			PTDMA_PMD_INFO("re-use memzone already "
+				     "allocated for %s", queue_name);
+			return mz;
+		}
+		PTDMA_PMD_ERR("Incompatible memzone already "
+			    "allocated %s, size %u, socket %d. "
+			    "Requested size %u, socket %d",
+			    queue_name, (uint32_t)mz->len,
+			    mz->socket_id, queue_size, socket_id);
+		return NULL;
+	}
+
+	PTDMA_PMD_INFO("Allocate memzone for %s, size %u on socket %d",
+		     queue_name, queue_size, socket_id);
+
+	return rte_memzone_reserve_aligned(queue_name, queue_size,
+			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
+}
+
+int
+ptdma_add_queue(struct rte_ptdma_rawdev *dev)
+{
+	int i;
+	uint32_t dma_addr_lo, dma_addr_hi;
+	uint32_t ptdma_version = 0;
+	struct ptdma_cmd_queue *cmd_q;
+	const struct rte_memzone *q_mz;
+	void *vaddr;
+
+	if (dev == NULL)
+		return -1;
+
+	dev->id = ptdma_dev_id++;
+	dev->qidx = 0;
+	vaddr = (void *)(dev->pci.mem_resource[2].addr);
+
+	PTDMA_WRITE_REG(vaddr, CMD_REQID_CONFIG_OFFSET, 0x0);
+	ptdma_version = PTDMA_READ_REG(vaddr, CMD_PTDMA_VERSION);
+	PTDMA_PMD_INFO("PTDMA VERSION  = 0x%x", ptdma_version);
+
+	dev->cmd_q_count = 0;
+	/* Find available queues */
+	for (i = 0; i < MAX_HW_QUEUES; i++) {
+		cmd_q = &dev->cmd_q[dev->cmd_q_count++];
+		cmd_q->dev = dev;
+		cmd_q->id = i;
+		cmd_q->qidx = 0;
+		cmd_q->qsize = Q_SIZE(Q_DESC_SIZE);
+
+		cmd_q->reg_base = (uint8_t *)vaddr +
+			CMD_Q_STATUS_INCR * (i + 1);
+
+		/* PTDMA queue memory */
+		snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
+			 "ptdma_dev_%d_queue_%d_mem",
+			 (int)dev->id, (int)cmd_q->id);
+		q_mz = ptdma_queue_dma_zone_reserve(cmd_q->memz_name,
+				cmd_q->qsize, rte_socket_id());
+		if (q_mz == NULL)
+			return -1;
+		cmd_q->qbase_addr = (void *)q_mz->addr;
+		cmd_q->qbase_desc = (void *)q_mz->addr;
+		cmd_q->qbase_phys_addr =  q_mz->iova;
+
+		cmd_q->qcontrol = 0;
+		/* init control reg to zero */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+			      cmd_q->qcontrol);
+
+		/* Disable the interrupts */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_INT_ENABLE_BASE, 0x00);
+		PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_INT_STATUS_BASE);
+		PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_STATUS_BASE);
+
+		/* Clear the interrupts */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_INTERRUPT_STATUS_BASE,
+			      ALL_INTERRUPTS);
+
+		/* Configure size of each virtual queue accessible to host */
+		cmd_q->qcontrol &= ~(CMD_Q_SIZE << CMD_Q_SHIFT);
+		cmd_q->qcontrol |= QUEUE_SIZE_VAL << CMD_Q_SHIFT;
+
+		dma_addr_lo = low32_value(cmd_q->qbase_phys_addr);
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_TAIL_LO_BASE,
+			      (uint32_t)dma_addr_lo);
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_HEAD_LO_BASE,
+			      (uint32_t)dma_addr_lo);
+
+		dma_addr_hi = high32_value(cmd_q->qbase_phys_addr);
+		cmd_q->qcontrol |= (dma_addr_hi << 16);
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+			      cmd_q->qcontrol);
+
+	}
+	return 0;
+}
diff --git a/drivers/raw/ptdma/ptdma_pmd_private.h b/drivers/raw/ptdma/ptdma_pmd_private.h
new file mode 100644
index 0000000000..0c25e737f5
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_pmd_private.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _PTDMA_PMD_PRIVATE_H_
+#define _PTDMA_PMD_PRIVATE_H_
+
+#include <rte_rawdev.h>
+#include "ptdma_rawdev_spec.h"
+
+extern int ptdma_pmd_logtype;
+
+#define PTDMA_PMD_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, ptdma_pmd_logtype, "%s(): " fmt "\n", \
+			__func__, ##args)
+
+#define PTDMA_PMD_FUNC_TRACE() PTDMA_PMD_LOG(DEBUG, ">>")
+
+#define PTDMA_PMD_ERR(fmt, args...) \
+	PTDMA_PMD_LOG(ERR, fmt, ## args)
+#define PTDMA_PMD_WARN(fmt, args...) \
+	PTDMA_PMD_LOG(WARNING, fmt, ## args)
+#define PTDMA_PMD_DEBUG(fmt, args...) \
+	PTDMA_PMD_LOG(DEBUG, fmt, ## args)
+#define PTDMA_PMD_INFO(fmt, args...) \
+	PTDMA_PMD_LOG(INFO, fmt, ## args)
+
+int ptdma_xstats_get(const struct rte_rawdev *dev, const unsigned int ids[],
+		uint64_t values[], unsigned int n);
+int ptdma_xstats_get_names(const struct rte_rawdev *dev,
+		struct rte_rawdev_xstats_name *names,
+		unsigned int size);
+int ptdma_xstats_reset(struct rte_rawdev *dev, const uint32_t *ids,
+		uint32_t nb_ids);
+int ptdma_add_queue(struct rte_ptdma_rawdev *dev);
+
+extern int ptdma_rawdev_test(uint16_t dev_id);
+
+#endif /* _PTDMA_PMD_PRIVATE_H_ */
+
+
diff --git a/drivers/raw/ptdma/ptdma_rawdev.c b/drivers/raw/ptdma/ptdma_rawdev.c
new file mode 100644
index 0000000000..cfa57d81ed
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_rawdev.c
@@ -0,0 +1,266 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <rte_bus_pci.h>
+#include <rte_rawdev_pmd.h>
+#include <rte_memzone.h>
+#include <rte_string_fns.h>
+#include <rte_dev.h>
+
+#include "rte_ptdma_rawdev.h"
+#include "ptdma_rawdev_spec.h"
+#include "ptdma_pmd_private.h"
+
+RTE_LOG_REGISTER(ptdma_pmd_logtype, rawdev.ptdma, INFO);
+
+uint8_t ptdma_rawdev_driver_id;
+static struct rte_pci_driver ptdma_pmd_drv;
+
+#define AMD_VENDOR_ID		0x1022
+#define PTDMA_DEVICE_ID		0x1498
+#define COMPLETION_SZ sizeof(__m128i)
+
+static const struct rte_pci_id pci_id_ptdma_map[] = {
+	{ RTE_PCI_DEVICE(AMD_VENDOR_ID, PTDMA_DEVICE_ID) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
+
+static const char * const xstat_names[] = {
+	"failed_enqueues", "successful_enqueues",
+	"copies_started", "copies_completed"
+};
+
+static int
+ptdma_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
+{
+	struct rte_ptdma_rawdev_config *params = config;
+	struct rte_ptdma_rawdev *ptdma_priv = dev->dev_private;
+
+	if (dev->started)
+		return -EBUSY;
+	if (params == NULL || config_size != sizeof(*params))
+		return -EINVAL;
+	if (params->ring_size > 8192 || params->ring_size < 64 ||
+			!rte_is_power_of_2(params->ring_size))
+		return -EINVAL;
+	ptdma_priv->ring_size = params->ring_size;
+	ptdma_priv->hdls_disable = params->hdls_disable;
+	ptdma_priv->hdls = rte_zmalloc_socket("ptdma_hdls",
+			ptdma_priv->ring_size * sizeof(*ptdma_priv->hdls),
+			RTE_CACHE_LINE_SIZE, rte_socket_id());
+	return ptdma_priv->hdls == NULL ? -ENOMEM : 0;
+}
+
+static int
+ptdma_rawdev_remove(struct rte_pci_device *dev);
+
+int
+ptdma_xstats_get(const struct rte_rawdev *dev, const unsigned int ids[],
+		uint64_t values[], unsigned int n)
+{
+	const struct rte_ptdma_rawdev *ptdma = dev->dev_private;
+	const uint64_t *stats = (const void *)&ptdma->xstats;
+	unsigned int i;
+
+	for (i = 0; i < n; i++) {
+		if (ids[i] >= sizeof(ptdma->xstats)/sizeof(*stats))
+			values[i] = 0;
+		else
+			values[i] = stats[ids[i]];
+	}
+	return n;
+}
+
+int
+ptdma_xstats_get_names(const struct rte_rawdev *dev,
+		struct rte_rawdev_xstats_name *names,
+		unsigned int size)
+{
+	unsigned int i;
+
+	RTE_SET_USED(dev);
+	if (size < RTE_DIM(xstat_names))
+		return RTE_DIM(xstat_names);
+	for (i = 0; i < RTE_DIM(xstat_names); i++)
+		strlcpy(names[i].name, xstat_names[i], sizeof(names[i].name));
+	return RTE_DIM(xstat_names);
+}
+
+int
+ptdma_xstats_reset(struct rte_rawdev *dev, const uint32_t *ids,
+		uint32_t nb_ids)
+{
+	struct rte_ptdma_rawdev *ptdma = dev->dev_private;
+	uint64_t *stats = (void *)&ptdma->xstats;
+	unsigned int i;
+
+	if (!ids) {
+		memset(&ptdma->xstats, 0, sizeof(ptdma->xstats));
+		return 0;
+	}
+	for (i = 0; i < nb_ids; i++)
+		if (ids[i] < sizeof(ptdma->xstats)/sizeof(*stats))
+			stats[ids[i]] = 0;
+	return 0;
+}
+
+static int
+ptdma_dev_start(struct rte_rawdev *dev)
+{
+	RTE_SET_USED(dev);
+	return 0;
+}
+
+static void
+ptdma_dev_stop(struct rte_rawdev *dev)
+{
+	RTE_SET_USED(dev);
+}
+
+static int
+ptdma_dev_close(struct rte_rawdev *dev __rte_unused)
+{
+	return 0;
+}
+
+static int
+ptdma_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
+		size_t dev_info_size)
+{
+	struct rte_ptdma_rawdev_config *cfg = dev_info;
+	struct rte_ptdma_rawdev *ptdma = dev->dev_private;
+
+	if (dev_info == NULL || dev_info_size != sizeof(*cfg))
+		return -EINVAL;
+	cfg->ring_size = ptdma->ring_size;
+	cfg->hdls_disable = ptdma->hdls_disable;
+	return 0;
+}
+
+static int
+ptdma_rawdev_create(const char *name, struct rte_pci_device *dev)
+{
+	static const struct rte_rawdev_ops ptdma_rawdev_ops = {
+			.dev_configure = ptdma_dev_configure,
+			.dev_start = ptdma_dev_start,
+			.dev_stop = ptdma_dev_stop,
+			.dev_close = ptdma_dev_close,
+			.dev_info_get = ptdma_dev_info_get,
+			.xstats_get = ptdma_xstats_get,
+			.xstats_get_names = ptdma_xstats_get_names,
+			.xstats_reset = ptdma_xstats_reset,
+			.dev_selftest = ptdma_rawdev_test,
+	};
+	struct rte_rawdev *rawdev = NULL;
+	struct rte_ptdma_rawdev *ptdma_priv = NULL;
+	int ret = 0;
+	if (!name) {
+		PTDMA_PMD_ERR("Invalid name of the device!");
+		ret = -EINVAL;
+		goto cleanup;
+	}
+	/* Allocate device structure */
+	rawdev = rte_rawdev_pmd_allocate(name, sizeof(struct rte_rawdev),
+						rte_socket_id());
+	if (rawdev == NULL) {
+		PTDMA_PMD_ERR("Unable to allocate raw device");
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	rawdev->dev_id = ptdma_rawdev_driver_id++;
+	PTDMA_PMD_INFO("dev_id = %d", rawdev->dev_id);
+	PTDMA_PMD_INFO("driver_name = %s", dev->device.driver->name);
+
+	rawdev->dev_ops = &ptdma_rawdev_ops;
+	rawdev->device = &dev->device;
+	rawdev->driver_name = dev->device.driver->name;
+
+	ptdma_priv = rte_zmalloc_socket("ptdma_priv", sizeof(*ptdma_priv),
+				RTE_CACHE_LINE_SIZE, rte_socket_id());
+	rawdev->dev_private = ptdma_priv;
+	ptdma_priv->rawdev = rawdev;
+	ptdma_priv->ring_size = 0;
+	ptdma_priv->pci = *dev;
+
+	/* device is valid, add queue details */
+	if (ptdma_add_queue(ptdma_priv))
+		goto init_error;
+
+	return 0;
+
+cleanup:
+	if (rawdev)
+		rte_rawdev_pmd_release(rawdev);
+	return ret;
+init_error:
+	PTDMA_PMD_ERR("driver %s(): failed", __func__);
+	ptdma_rawdev_remove(dev);
+	return -EFAULT;
+}
+
+static int
+ptdma_rawdev_destroy(const char *name)
+{
+	int ret;
+	struct rte_rawdev *rdev;
+	if (!name) {
+		PTDMA_PMD_ERR("Invalid device name");
+		return -EINVAL;
+	}
+	rdev = rte_rawdev_pmd_get_named_dev(name);
+	if (!rdev) {
+		PTDMA_PMD_ERR("Invalid device name (%s)", name);
+		return -EINVAL;
+	}
+
+	if (rdev->dev_private != NULL)
+		rte_free(rdev->dev_private);
+
+	/* rte_rawdev_close is called by pmd_release */
+	ret = rte_rawdev_pmd_release(rdev);
+
+	if (ret)
+		PTDMA_PMD_DEBUG("Device cleanup failed");
+	return 0;
+}
+static int
+ptdma_rawdev_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
+{
+	char name[32];
+	int ret = 0;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+	PTDMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
+
+	dev->device.driver = &drv->driver;
+	ret = ptdma_rawdev_create(name, dev);
+	return ret;
+}
+
+static int
+ptdma_rawdev_remove(struct rte_pci_device *dev)
+{
+	char name[32];
+	int ret;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+	PTDMA_PMD_INFO("Closing %s on NUMA node %d",
+			name, dev->device.numa_node);
+	ret = ptdma_rawdev_destroy(name);
+	return ret;
+}
+
+static struct rte_pci_driver ptdma_pmd_drv = {
+	.id_table = pci_id_ptdma_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.probe = ptdma_rawdev_probe,
+	.remove = ptdma_rawdev_remove,
+};
+
+RTE_PMD_REGISTER_PCI(PTDMA_PMD_RAWDEV_NAME, ptdma_pmd_drv);
+RTE_PMD_REGISTER_PCI_TABLE(PTDMA_PMD_RAWDEV_NAME, pci_id_ptdma_map);
+RTE_PMD_REGISTER_KMOD_DEP(PTDMA_PMD_RAWDEV_NAME, "* igb_uio | uio_pci_generic");
+
diff --git a/drivers/raw/ptdma/ptdma_rawdev_spec.h b/drivers/raw/ptdma/ptdma_rawdev_spec.h
new file mode 100644
index 0000000000..73511bec95
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_rawdev_spec.h
@@ -0,0 +1,362 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef __PT_DEV_H__
+#define __PT_DEV_H__
+
+#include <rte_bus_pci.h>
+#include <rte_byteorder.h>
+#include <rte_io.h>
+#include <rte_pci.h>
+#include <rte_spinlock.h>
+#include <rte_rawdev.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define BIT(nr)				(1 << (nr))
+
+#define BITS_PER_LONG   (__SIZEOF_LONG__ * 8)
+#define GENMASK(h, l)   (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+
+#define MAX_HW_QUEUES			1
+
+/* Register Mappings */
+
+#define CMD_QUEUE_PRIO_OFFSET		0x00
+#define CMD_REQID_CONFIG_OFFSET		0x04
+#define CMD_TIMEOUT_OFFSET		0x08
+#define CMD_TIMEOUT_GRANULARITY		0x0C
+#define CMD_PTDMA_VERSION		0x10
+
+#define CMD_Q_CONTROL_BASE		0x0000
+#define CMD_Q_TAIL_LO_BASE		0x0004
+#define CMD_Q_HEAD_LO_BASE		0x0008
+#define CMD_Q_INT_ENABLE_BASE		0x000C
+#define CMD_Q_INTERRUPT_STATUS_BASE	0x0010
+
+#define CMD_Q_STATUS_BASE		0x0100
+#define CMD_Q_INT_STATUS_BASE		0x0104
+#define CMD_Q_DMA_STATUS_BASE		0x0108
+#define CMD_Q_DMA_READ_STATUS_BASE	0x010C
+#define CMD_Q_DMA_WRITE_STATUS_BASE	0x0110
+#define CMD_Q_ABORT_BASE		0x0114
+#define CMD_Q_AX_CACHE_BASE		0x0118
+
+#define CMD_CONFIG_OFFSET		0x1120
+#define CMD_CLK_GATE_CTL_OFFSET		0x6004
+
+#define CMD_DESC_DW0_VAL		0x500012
+
+/* Address offset for virtual queue registers */
+#define CMD_Q_STATUS_INCR		0x1000
+
+/* Bit masks */
+#define CMD_CONFIG_REQID		0
+#define CMD_TIMEOUT_DISABLE		0
+#define CMD_CLK_DYN_GATING_DIS		0
+#define CMD_CLK_SW_GATE_MODE		0
+#define CMD_CLK_GATE_CTL		0
+#define CMD_QUEUE_PRIO			GENMASK(2, 1)
+#define CMD_CONFIG_VHB_EN		BIT(0)
+#define CMD_CLK_DYN_GATING_EN		BIT(0)
+#define CMD_CLK_HW_GATE_MODE		BIT(0)
+#define CMD_CLK_GATE_ON_DELAY		BIT(12)
+#define CMD_CLK_GATE_OFF_DELAY		BIT(12)
+
+#define CMD_CLK_GATE_CONFIG		(CMD_CLK_GATE_CTL | \
+					CMD_CLK_HW_GATE_MODE | \
+					CMD_CLK_GATE_ON_DELAY | \
+					CMD_CLK_DYN_GATING_EN | \
+					CMD_CLK_GATE_OFF_DELAY)
+
+#define CMD_Q_LEN			32
+#define CMD_Q_RUN			BIT(0)
+#define CMD_Q_HALT			BIT(1)
+#define CMD_Q_MEM_LOCATION		BIT(2)
+#define CMD_Q_SIZE			GENMASK(4, 0)
+#define CMD_Q_SHIFT			GENMASK(1, 0)
+#define COMMANDS_PER_QUEUE		8192
+
+
+#define QUEUE_SIZE_VAL			((ffs(COMMANDS_PER_QUEUE) - 2) & \
+						CMD_Q_SIZE)
+#define Q_PTR_MASK			((2 << (QUEUE_SIZE_VAL + 5)) - 1)
+#define Q_DESC_SIZE			sizeof(struct ptdma_desc)
+#define Q_SIZE(n)			(COMMANDS_PER_QUEUE * (n))
+
+#define INT_COMPLETION			BIT(0)
+#define INT_ERROR			BIT(1)
+#define INT_QUEUE_STOPPED		BIT(2)
+#define INT_EMPTY_QUEUE			BIT(3)
+#define SUPPORTED_INTERRUPTS		(INT_COMPLETION | INT_ERROR)
+#define ALL_INTERRUPTS			(INT_COMPLETION | INT_ERROR | \
+					INT_QUEUE_STOPPED)
+
+/****** Local Storage Block ******/
+#define LSB_START			0
+#define LSB_END				127
+#define LSB_COUNT			(LSB_END - LSB_START + 1)
+
+#define LSB_REGION_WIDTH		5
+#define MAX_LSB_CNT			8
+
+#define LSB_SIZE			16
+#define LSB_ITEM_SIZE			128
+#define SLSB_MAP_SIZE			(MAX_LSB_CNT * LSB_SIZE)
+#define LSB_ENTRY_NUMBER(LSB_ADDR)	(LSB_ADDR / LSB_ITEM_SIZE)
+
+
+#define PT_DMAPOOL_MAX_SIZE		64
+#define PT_DMAPOOL_ALIGN		BIT(5)
+
+#define PT_PASSTHRU_BLOCKSIZE		512
+
+/* General PTDMA Defines */
+
+#define PTDMA_SB_BYTES			32
+#define	PTDMA_ENGINE_PASSTHRU		0x5
+
+/* Word 0 */
+#define PTDMA_CMD_DW0(p)		((p)->dw0)
+#define PTDMA_CMD_SOC(p)		(PTDMA_CMD_DW0(p).soc)
+#define PTDMA_CMD_IOC(p)		(PTDMA_CMD_DW0(p).ioc)
+#define PTDMA_CMD_INIT(p)		(PTDMA_CMD_DW0(p).init)
+#define PTDMA_CMD_EOM(p)		(PTDMA_CMD_DW0(p).eom)
+#define PTDMA_CMD_FUNCTION(p)		(PTDMA_CMD_DW0(p).function)
+#define PTDMA_CMD_ENGINE(p)		(PTDMA_CMD_DW0(p).engine)
+#define PTDMA_CMD_PROT(p)		(PTDMA_CMD_DW0(p).prot)
+
+/* Word 1 */
+#define PTDMA_CMD_DW1(p)		((p)->length)
+#define PTDMA_CMD_LEN(p)		(PTDMA_CMD_DW1(p))
+
+/* Word 2 */
+#define PTDMA_CMD_DW2(p)		((p)->src_lo)
+#define PTDMA_CMD_SRC_LO(p)		(PTDMA_CMD_DW2(p))
+
+/* Word 3 */
+#define PTDMA_CMD_DW3(p)		((p)->dw3)
+#define PTDMA_CMD_SRC_MEM(p)		((p)->dw3.src_mem)
+#define PTDMA_CMD_SRC_HI(p)		((p)->dw3.src_hi)
+#define PTDMA_CMD_LSB_ID(p)		((p)->dw3.lsb_cxt_id)
+#define PTDMA_CMD_FIX_SRC(p)		((p)->dw3.fixed)
+
+/* Words 4/5 */
+#define PTDMA_CMD_DST_LO(p)		((p)->dst_lo)
+#define PTDMA_CMD_DW5(p)		((p)->dw5.dst_hi)
+#define PTDMA_CMD_DST_HI(p)		(PTDMA_CMD_DW5(p))
+#define PTDMA_CMD_DST_MEM(p)		((p)->dw5.dst_mem)
+#define PTDMA_CMD_FIX_DST(p)		((p)->dw5.fixed)
+
+/* bitmap */
+enum {
+	BITS_PER_WORD = sizeof(unsigned long) * CHAR_BIT
+};
+
+#define WORD_OFFSET(b) ((b) / BITS_PER_WORD)
+#define BIT_OFFSET(b)  ((b) % BITS_PER_WORD)
+
+#define PTDMA_DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
+#define PTDMA_BITMAP_SIZE(nr) \
+	PTDMA_DIV_ROUND_UP(nr, CHAR_BIT * sizeof(unsigned long))
+
+#define PTDMA_BITMAP_FIRST_WORD_MASK(start) \
+	(~0UL << ((start) & (BITS_PER_WORD - 1)))
+#define PTDMA_BITMAP_LAST_WORD_MASK(nbits) \
+	(~0UL >> (-(nbits) & (BITS_PER_WORD - 1)))
+
+#define __ptdma_round_mask(x, y) ((typeof(x))((y)-1))
+#define ptdma_round_down(x, y) ((x) & ~__ptdma_round_mask(x, y))
+
+/** PTDMA registers Write/Read */
+static inline void ptdma_pci_reg_write(void *base, int offset,
+					uint32_t value)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+	rte_write32((rte_cpu_to_le_32(value)), reg_addr);
+}
+
+static inline uint32_t ptdma_pci_reg_read(void *base, int offset)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+	return rte_le_to_cpu_32(rte_read32(reg_addr));
+}
+
+#define PTDMA_READ_REG(hw_addr, reg_offset) \
+	ptdma_pci_reg_read(hw_addr, reg_offset)
+
+#define PTDMA_WRITE_REG(hw_addr, reg_offset, value) \
+	ptdma_pci_reg_write(hw_addr, reg_offset, value)
+
+/**
+ * A structure describing a PTDMA command queue.
+ */
+struct ptdma_cmd_queue {
+	struct rte_ptdma_rawdev *dev;
+	char memz_name[RTE_MEMZONE_NAMESIZE];
+
+	/* Queue identifier */
+	uint64_t id;	/**< queue id */
+	uint64_t qidx;	/**< queue index */
+	uint64_t qsize;	/**< queue size */
+
+	/* Queue address */
+	struct ptdma_desc *qbase_desc;
+	void *qbase_addr;
+	phys_addr_t qbase_phys_addr;
+	/**< queue-page registers addr */
+	void *reg_base;
+	uint32_t qcontrol;
+	/**< queue ctrl reg */
+	uint32_t head_offset;
+	uint32_t tail_offset;
+
+	int lsb;
+	/**< lsb region assigned to queue */
+	unsigned long lsbmask;
+	/**< lsb regions queue can access */
+	unsigned long lsbmap[PTDMA_BITMAP_SIZE(LSB_COUNT)];
+	/**< all lsb resources which queue is using */
+	uint32_t sb_key;
+	/**< lsb assigned for queue */
+} __rte_cache_aligned;
+
+/* Passthru engine */
+
+#define PTDMA_PT_BYTESWAP(p)      ((p)->pt.byteswap)
+#define PTDMA_PT_BITWISE(p)       ((p)->pt.bitwise)
+
+/**
+ * passthru_bitwise - type of bitwise passthru operation
+ *
+ * @PTDMA_PASSTHRU_BITWISE_NOOP: no bitwise operation performed
+ * @PTDMA_PASSTHRU_BITWISE_AND: perform bitwise AND of src with mask
+ * @PTDMA_PASSTHRU_BITWISE_OR: perform bitwise OR of src with mask
+ * @PTDMA_PASSTHRU_BITWISE_XOR: perform bitwise XOR of src with mask
+ * @PTDMA_PASSTHRU_BITWISE_MASK: overwrite with mask
+ */
+enum ptdma_passthru_bitwise {
+	PTDMA_PASSTHRU_BITWISE_NOOP = 0,
+	PTDMA_PASSTHRU_BITWISE_AND,
+	PTDMA_PASSTHRU_BITWISE_OR,
+	PTDMA_PASSTHRU_BITWISE_XOR,
+	PTDMA_PASSTHRU_BITWISE_MASK,
+	PTDMA_PASSTHRU_BITWISE__LAST,
+};
+
+/**
+ * ptdma_passthru_byteswap - type of byteswap passthru operation
+ *
+ * @PTDMA_PASSTHRU_BYTESWAP_NOOP: no byte swapping performed
+ * @PTDMA_PASSTHRU_BYTESWAP_32BIT: swap bytes within 32-bit words
+ * @PTDMA_PASSTHRU_BYTESWAP_256BIT: swap bytes within 256-bit words
+ */
+enum ptdma_passthru_byteswap {
+	PTDMA_PASSTHRU_BYTESWAP_NOOP = 0,
+	PTDMA_PASSTHRU_BYTESWAP_32BIT,
+	PTDMA_PASSTHRU_BYTESWAP_256BIT,
+	PTDMA_PASSTHRU_BYTESWAP__LAST,
+};
+
+/**
+ * PTDMA passthru
+ */
+struct ptdma_passthru {
+	phys_addr_t src_addr;
+	phys_addr_t dest_addr;
+	enum ptdma_passthru_bitwise bit_mod;
+	enum ptdma_passthru_byteswap byte_swap;
+	int len;
+};
+
+union ptdma_function {
+	struct {
+		uint16_t byteswap:2;
+		uint16_t bitwise:3;
+		uint16_t reflect:2;
+		uint16_t rsvd:8;
+	} pt;
+	uint16_t raw;
+};
+
+/**
+ * ptdma memory type
+ */
+enum ptdma_memtype {
+	PTDMA_MEMTYPE_SYSTEM = 0,
+	PTDMA_MEMTYPE_SB,
+	PTDMA_MEMTYPE_LOCAL,
+	PTDMA_MEMTYPE_LAST,
+};
+
+/*
+ * descriptor for PTDMA commands
+ * 8 32-bit words:
+ * word 0: function; engine; control bits
+ * word 1: length of source data
+ * word 2: low 32 bits of source pointer
+ * word 3: upper 16 bits of source pointer; source memory type
+ * word 4: low 32 bits of destination pointer
+ * word 5: upper 16 bits of destination pointer; destination memory type
+ * word 6: reserved 32 bits
+ * word 7: reserved 32 bits
+ */
+
+union dword0 {
+	struct {
+		uint32_t soc:1;
+		uint32_t ioc:1;
+		uint32_t rsvd1:1;
+		uint32_t init:1;
+		uint32_t eom:1;
+		uint32_t function:15;
+		uint32_t engine:4;
+		uint32_t prot:1;
+		uint32_t rsvd2:7;
+	};
+	uint32_t val;
+};
+
+struct dword3 {
+	uint32_t  src_hi:16;
+	uint32_t  src_mem:2;
+	uint32_t  lsb_cxt_id:8;
+	uint32_t  rsvd1:5;
+	uint32_t  fixed:1;
+};
+
+struct dword5 {
+	uint32_t  dst_hi:16;
+	uint32_t  dst_mem:2;
+	uint32_t  rsvd1:13;
+	uint32_t  fixed:1;
+};
+
+struct ptdma_desc {
+	union dword0 dw0;
+	uint32_t length;
+	uint32_t src_lo;
+	struct dword3 dw3;
+	uint32_t dst_lo;
+	struct dword5 dw5;
+	uint32_t rsvd1;
+	uint32_t rsvd2;
+};
+
+static inline uint32_t low32_value(unsigned long addr)
+{
+	return ((uint64_t)addr) & 0x0ffffffff;
+}
+
+static inline uint32_t high32_value(unsigned long addr)
+{
+	return ((uint64_t)addr >> 32) & 0x00000ffff;
+}
+
+#ifdef __cplusplus
+}
+#endif
+#endif
diff --git a/drivers/raw/ptdma/ptdma_rawdev_test.c b/drivers/raw/ptdma/ptdma_rawdev_test.c
new file mode 100644
index 0000000000..fbbcd66c8d
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_rawdev_test.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <unistd.h>
+#include <inttypes.h>
+#include <rte_mbuf.h>
+#include "rte_rawdev.h"
+#include "rte_ptdma_rawdev.h"
+#include "ptdma_pmd_private.h"
+
+#define MAX_SUPPORTED_RAWDEVS 16
+#define TEST_SKIPPED 77
+
+
+static struct rte_mempool *pool;
+static unsigned short expected_ring_size[MAX_SUPPORTED_RAWDEVS];
+
+#define PRINT_ERR(...) print_err(__func__, __LINE__, __VA_ARGS__)
+
+static inline int
+__rte_format_printf(3, 4)
+print_err(const char *func, int lineno, const char *format, ...)
+{
+	va_list ap;
+	int ret;
+
+	ret = fprintf(stderr, "In %s:%d - ", func, lineno);
+	va_start(ap, format);
+	ret += vfprintf(stderr, format, ap);
+	va_end(ap);
+
+	return ret;
+}
+
+static int
+test_enqueue_copies(int dev_id)
+{
+	const unsigned int length = 1024;
+	unsigned int i = 0;
+	do {
+		struct rte_mbuf *src, *dst;
+		char *src_data, *dst_data;
+		struct rte_mbuf *completed[2] = {0};
+
+		/* test doing a single copy */
+		src = rte_pktmbuf_alloc(pool);
+		dst = rte_pktmbuf_alloc(pool);
+		src->data_len = src->pkt_len = length;
+		dst->data_len = dst->pkt_len = length;
+		src_data = rte_pktmbuf_mtod(src, char *);
+		dst_data = rte_pktmbuf_mtod(dst, char *);
+
+		for (i = 0; i < length; i++)
+			src_data[i] = rand() & 0xFF;
+
+		if (rte_ptdma_enqueue_copy(dev_id,
+				src->buf_iova + src->data_off,
+				dst->buf_iova + dst->data_off,
+				length,
+				(uintptr_t)src,
+				(uintptr_t)dst) != 1) {
+			PRINT_ERR("Error with rte_ptdma_enqueue_copy - 1\n");
+			return -1;
+		}
+		rte_ptdma_perform_ops(dev_id);
+		usleep(10);
+
+		if (rte_ptdma_completed_ops(dev_id, 1, (void *)&completed[0],
+				(void *)&completed[1]) != 1) {
+			PRINT_ERR("Error with rte_ptdma_completed_ops - 1\n");
+			return -1;
+		}
+		if (completed[0] != src || completed[1] != dst) {
+			PRINT_ERR("Error with completions: got (%p, %p), not (%p,%p)\n",
+					completed[0], completed[1], src, dst);
+			return -1;
+		}
+
+		for (i = 0; i < length; i++)
+			if (dst_data[i] != src_data[i]) {
+				PRINT_ERR("Data mismatch at char %u - 1\n", i);
+				return -1;
+			}
+		rte_pktmbuf_free(src);
+		rte_pktmbuf_free(dst);
+
+
+	} while (0);
+
+	/* test doing multiple copies */
+	do {
+		struct rte_mbuf *srcs[32], *dsts[32];
+		struct rte_mbuf *completed_src[64];
+		struct rte_mbuf *completed_dst[64];
+		unsigned int j;
+
+		for (i = 0; i < RTE_DIM(srcs) ; i++) {
+			char *src_data;
+
+			srcs[i] = rte_pktmbuf_alloc(pool);
+			dsts[i] = rte_pktmbuf_alloc(pool);
+			srcs[i]->data_len = srcs[i]->pkt_len = length;
+			dsts[i]->data_len = dsts[i]->pkt_len = length;
+			src_data = rte_pktmbuf_mtod(srcs[i], char *);
+
+			for (j = 0; j < length; j++)
+				src_data[j] = rand() & 0xFF;
+
+			if (rte_ptdma_enqueue_copy(dev_id,
+					srcs[i]->buf_iova + srcs[i]->data_off,
+					dsts[i]->buf_iova + dsts[i]->data_off,
+					length,
+					(uintptr_t)srcs[i],
+					(uintptr_t)dsts[i]) != 1) {
+				PRINT_ERR("Error with rte_ptdma_enqueue_copy for buffer %u\n",
+						i);
+				return -1;
+			}
+		}
+		rte_ptdma_perform_ops(dev_id);
+		usleep(100);
+
+		if (rte_ptdma_completed_ops(dev_id, 64, (void *)completed_src,
+				(void *)completed_dst) != RTE_DIM(srcs)) {
+			PRINT_ERR("Error with rte_ptdma_completed_ops\n");
+			return -1;
+		}
+
+		for (i = 0; i < RTE_DIM(srcs) ; i++) {
+			char *src_data, *dst_data;
+			if (completed_src[i] != srcs[i]) {
+				PRINT_ERR("Error with source pointer %u\n", i);
+				return -1;
+			}
+			if (completed_dst[i] != dsts[i]) {
+				PRINT_ERR("Error with dest pointer %u\n", i);
+				return -1;
+			}
+
+			src_data = rte_pktmbuf_mtod(srcs[i], char *);
+			dst_data = rte_pktmbuf_mtod(dsts[i], char *);
+			for (j = 0; j < length; j++)
+				if (src_data[j] != dst_data[j]) {
+					PRINT_ERR("Error with copy of packet %u, byte %u\n",
+							i, j);
+					return -1;
+				}
+
+			rte_pktmbuf_free(srcs[i]);
+			rte_pktmbuf_free(dsts[i]);
+		}
+
+	} while (0);
+
+	return 0;
+}
+
+int
+ptdma_rawdev_test(uint16_t dev_id)
+{
+#define PTDMA_TEST_RINGSIZE 512
+	struct rte_ptdma_rawdev_config p = { .ring_size = -1 };
+	struct rte_rawdev_info info = { .dev_private = &p };
+	struct rte_rawdev_xstats_name *snames = NULL;
+	uint64_t *stats = NULL;
+	unsigned int *ids = NULL;
+	unsigned int nb_xstats;
+	unsigned int i;
+
+	if (dev_id >= MAX_SUPPORTED_RAWDEVS) {
+		printf("Skipping test. Cannot test rawdevs with ids of %d or above\n",
+				MAX_SUPPORTED_RAWDEVS);
+		return TEST_SKIPPED;
+	}
+
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
+	if (p.ring_size != expected_ring_size[dev_id]) {
+		PRINT_ERR("Error, initial ring size is not as expected (Actual: %d, Expected: %d)\n",
+				(int)p.ring_size, expected_ring_size[dev_id]);
+		return -1;
+	}
+
+	p.ring_size = PTDMA_TEST_RINGSIZE;
+	if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
+		PRINT_ERR("Error with rte_rawdev_configure()\n");
+		return -1;
+	}
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
+	if (p.ring_size != PTDMA_TEST_RINGSIZE) {
+		PRINT_ERR("Error, ring size is not %d (%d)\n",
+				PTDMA_TEST_RINGSIZE, (int)p.ring_size);
+		return -1;
+	}
+	expected_ring_size[dev_id] = p.ring_size;
+
+	if (rte_rawdev_start(dev_id) != 0) {
+		PRINT_ERR("Error with rte_rawdev_start()\n");
+		return -1;
+	}
+
+	pool = rte_pktmbuf_pool_create("TEST_PTDMA_POOL",
+			256, /* n == num elements */
+			32,  /* cache size */
+			0,   /* priv size */
+			2048, /* data room size */
+			info.socket_id);
+	if (pool == NULL) {
+		PRINT_ERR("Error with mempool creation\n");
+		return -1;
+	}
+
+	/* allocate memory for xstats names and values */
+	nb_xstats = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+
+	snames = malloc(sizeof(*snames) * nb_xstats);
+	if (snames == NULL) {
+		PRINT_ERR("Error allocating xstat names memory\n");
+		goto err;
+	}
+	rte_rawdev_xstats_names_get(dev_id, snames, nb_xstats);
+
+	ids = malloc(sizeof(*ids) * nb_xstats);
+	if (ids == NULL) {
+		PRINT_ERR("Error allocating xstat ids memory\n");
+		goto err;
+	}
+	for (i = 0; i < nb_xstats; i++)
+		ids[i] = i;
+
+	stats = malloc(sizeof(*stats) * nb_xstats);
+	if (stats == NULL) {
+		PRINT_ERR("Error allocating xstat memory\n");
+		goto err;
+	}
+
+	/* run the test cases */
+	printf("Running Copy Tests\n");
+	for (i = 0; i < 100; i++) {
+		unsigned int j;
+
+		if (test_enqueue_copies(dev_id) != 0)
+			goto err;
+
+		rte_rawdev_xstats_get(dev_id, ids, stats, nb_xstats);
+		for (j = 0; j < nb_xstats; j++)
+			printf("%s: %"PRIu64"   ", snames[j].name, stats[j]);
+		printf("\r");
+	}
+	printf("\n");
+
+	rte_rawdev_stop(dev_id);
+	if (rte_rawdev_xstats_reset(dev_id, NULL, 0) != 0) {
+		PRINT_ERR("Error resetting xstat values\n");
+		goto err;
+	}
+
+	rte_mempool_free(pool);
+	free(snames);
+	free(stats);
+	free(ids);
+	return 0;
+
+err:
+	rte_rawdev_stop(dev_id);
+	rte_rawdev_xstats_reset(dev_id, NULL, 0);
+	rte_mempool_free(pool);
+	free(snames);
+	free(stats);
+	free(ids);
+	return -1;
+}
diff --git a/drivers/raw/ptdma/rte_ptdma_rawdev.h b/drivers/raw/ptdma/rte_ptdma_rawdev.h
new file mode 100644
index 0000000000..84eccbc4e8
--- /dev/null
+++ b/drivers/raw/ptdma/rte_ptdma_rawdev.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_PTDMA_RAWDEV_H_
+#define _RTE_PTDMA_RAWDEV_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_ptdma_rawdev.h
+ *
+ * Definitions for using the ptdma rawdev device driver
+ *
+ * @warning
+ * @b EXPERIMENTAL: these structures and APIs may change without prior notice
+ */
+
+#include <rte_common.h>
+
+/** Name of the device driver */
+#define PTDMA_PMD_RAWDEV_NAME rawdev_ptdma
+/** String reported as the device driver name by rte_rawdev_info_get() */
+#define PTDMA_PMD_RAWDEV_NAME_STR "rawdev_ptdma"
+
+/**
+ * Configuration structure for a ptdma rawdev instance
+ *
+ * This structure is to be passed as the ".dev_private" parameter when
+ * calling the rte_rawdev_info_get() and rte_rawdev_configure() APIs on
+ * a ptdma rawdev instance.
+ */
+struct rte_ptdma_rawdev_config {
+	unsigned short ring_size; /**< size of job submission descriptor ring */
+	bool hdls_disable;    /**< if set, ignore user-supplied handle params */
+};
+
+/**
+ * Enqueue a copy operation onto the ptdma device
+ *
+ * This queues up a copy operation to be performed by hardware, but does not
+ * trigger hardware to begin that operation.
+ *
+ * @param dev_id
+ *   The rawdev device id of the ptdma instance
+ * @param src
+ *   The physical address of the source buffer
+ * @param dst
+ *   The physical address of the destination buffer
+ * @param length
+ *   The length of the data to be copied
+ * @param src_hdl
+ *   An opaque handle for the source data, to be returned when this operation
+ *   has been completed and the user polls for the completion details.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @param dst_hdl
+ *   An opaque handle for the destination data, to be returned when this
+ *   operation has been completed and the user polls for the completion details.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @return
+ *   Number of operations enqueued, either 0 or 1
+ */
+static inline int
+__rte_experimental
+rte_ptdma_enqueue_copy(int dev_id, phys_addr_t src, phys_addr_t dst,
+		unsigned int length, uintptr_t src_hdl, uintptr_t dst_hdl);
+
+
+/**
+ * Trigger hardware to begin performing enqueued operations
+ *
+ * This API is used to write to the hardware to trigger it
+ * to begin the operations previously enqueued by rte_ptdma_enqueue_copy()
+ *
+ * @param dev_id
+ *   The rawdev device id of the ptdma instance
+ */
+static inline void
+__rte_experimental
+rte_ptdma_perform_ops(int dev_id);
+
+/**
+ * Returns details of operations that have been completed
+ *
+ * This function returns the number of newly completed operations.
+ *
+ * @param dev_id
+ *   The rawdev device id of the ptdma instance
+ * @param max_copies
+ *   The number of entries which can fit in the src_hdls and dst_hdls
+ *   arrays, i.e. max number of completed operations to report.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @param src_hdls
+ *   Array to hold the source handle parameters of the completed ops.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @param dst_hdls
+ *   Array to hold the destination handle parameters of the completed ops.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @return
+ *   -1 on error, with rte_errno set appropriately.
+ *   Otherwise number of completed operations i.e. number of entries written
+ *   to the src_hdls and dst_hdls array parameters.
+ */
+static inline int
+__rte_experimental
+rte_ptdma_completed_ops(int dev_id, uint8_t max_copies,
+		uintptr_t *src_hdls, uintptr_t *dst_hdls);
+
+
+/* include the implementation details from a separate file */
+#include "rte_ptdma_rawdev_fns.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PTDMA_RAWDEV_H_ */
diff --git a/drivers/raw/ptdma/rte_ptdma_rawdev_fns.h b/drivers/raw/ptdma/rte_ptdma_rawdev_fns.h
new file mode 100644
index 0000000000..f4dced3bef
--- /dev/null
+++ b/drivers/raw/ptdma/rte_ptdma_rawdev_fns.h
@@ -0,0 +1,298 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+#ifndef _RTE_PTDMA_RAWDEV_FNS_H_
+#define _RTE_PTDMA_RAWDEV_FNS_H_
+
+#include <x86intrin.h>
+#include <rte_rawdev.h>
+#include <rte_memzone.h>
+#include <rte_prefetch.h>
+#include "ptdma_rawdev_spec.h"
+#include "ptdma_pmd_private.h"
+
+/**
+ * @internal
+ * Statistics for tracking. If fields are added or changed, update the xstats functions.
+ */
+struct rte_ptdma_xstats {
+	uint64_t enqueue_failed;
+	uint64_t enqueued;
+	uint64_t started;
+	uint64_t completed;
+};
+
+/**
+ * @internal
+ * Structure representing a PTDMA device instance
+ */
+struct rte_ptdma_rawdev {
+	struct rte_rawdev *rawdev;
+	struct rte_ptdma_xstats xstats;
+	unsigned short ring_size;
+
+	bool hdls_disable;
+	__m128i *hdls; /* completion handles for returning to user */
+	unsigned short next_read;
+	unsigned short next_write;
+
+	int id; /**< ptdma dev id on platform */
+	struct ptdma_cmd_queue cmd_q[MAX_HW_QUEUES]; /**< ptdma queue */
+	int cmd_q_count; /**< no. of ptdma Queues */
+	struct rte_pci_device pci; /**< ptdma pci identifier */
+	int qidx;
+
+};
+
+static __rte_always_inline void
+ptdma_dump_registers(int dev_id)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	uint32_t cur_head_offset;
+	uint32_t cur_tail_offset;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+
+	PTDMA_PMD_DEBUG("cmd_q->head_offset	= %d\n", cmd_q->head_offset);
+	PTDMA_PMD_DEBUG("cmd_q->tail_offset	= %d\n", cmd_q->tail_offset);
+	PTDMA_PMD_DEBUG("cmd_q->id		= %" PRIx64 "\n", cmd_q->id);
+	PTDMA_PMD_DEBUG("cmd_q->qidx		= %" PRIx64 "\n", cmd_q->qidx);
+	PTDMA_PMD_DEBUG("cmd_q->qsize		= %" PRIx64 "\n", cmd_q->qsize);
+
+	cur_head_offset = PTDMA_READ_REG(cmd_q->reg_base,
+			CMD_Q_HEAD_LO_BASE);
+	cur_tail_offset = PTDMA_READ_REG(cmd_q->reg_base,
+			CMD_Q_TAIL_LO_BASE);
+
+	PTDMA_PMD_DEBUG("cur_head_offset	= %d\n", cur_head_offset);
+	PTDMA_PMD_DEBUG("cur_tail_offset	= %d\n", cur_tail_offset);
+	PTDMA_PMD_DEBUG("Q_CONTROL_BASE		= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_CONTROL_BASE));
+	PTDMA_PMD_DEBUG("Q_STATUS_BASE		= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_INT_STATUS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_INT_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_DMA_STATUS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_DMA_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_DMA_RD_STS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_DMA_READ_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_DMA_WRT_STS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_DMA_WRITE_STATUS_BASE));
+}
+
+static __rte_always_inline void
+ptdma_perform_passthru(struct ptdma_passthru *pst,
+		struct ptdma_cmd_queue *cmd_q)
+{
+	struct ptdma_desc *desc;
+	union ptdma_function function;
+
+	desc = &cmd_q->qbase_desc[cmd_q->qidx];
+
+	PTDMA_CMD_ENGINE(desc) = PTDMA_ENGINE_PASSTHRU;
+
+	PTDMA_CMD_SOC(desc) = 0;
+	PTDMA_CMD_IOC(desc) = 0;
+	PTDMA_CMD_INIT(desc) = 0;
+	PTDMA_CMD_EOM(desc) = 0;
+	PTDMA_CMD_PROT(desc) = 0;
+
+	function.raw = 0;
+	PTDMA_PT_BYTESWAP(&function) = pst->byte_swap;
+	PTDMA_PT_BITWISE(&function) = pst->bit_mod;
+	PTDMA_CMD_FUNCTION(desc) = function.raw;
+	PTDMA_CMD_LEN(desc) = pst->len;
+
+	PTDMA_CMD_SRC_LO(desc) = (uint32_t)(pst->src_addr);
+	PTDMA_CMD_SRC_HI(desc) = high32_value(pst->src_addr);
+	PTDMA_CMD_SRC_MEM(desc) = PTDMA_MEMTYPE_SYSTEM;
+
+	PTDMA_CMD_DST_LO(desc) = (uint32_t)(pst->dest_addr);
+	PTDMA_CMD_DST_HI(desc) = high32_value(pst->dest_addr);
+	PTDMA_CMD_DST_MEM(desc) = PTDMA_MEMTYPE_SYSTEM;
+
+	cmd_q->qidx = (cmd_q->qidx + 1) % COMMANDS_PER_QUEUE;
+
+}
+
+
+static __rte_always_inline int
+ptdma_ops_to_enqueue(int dev_id, uint32_t op, uint64_t src, phys_addr_t dst,
+		unsigned int length, uintptr_t src_hdl, uintptr_t dst_hdl)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	struct ptdma_passthru pst;
+	uint32_t cmd_q_ctrl;
+	unsigned short write	= ptdma_priv->next_write;
+	unsigned short read	= ptdma_priv->next_read;
+	unsigned short mask	= ptdma_priv->ring_size - 1;
+	unsigned short space	= mask + read - write;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+	cmd_q_ctrl = PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE);
+
+	if (cmd_q_ctrl & CMD_Q_RUN) {
+		/* Turn the queue off using control register */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+				cmd_q_ctrl & ~CMD_Q_RUN);
+		do {
+			cmd_q_ctrl = PTDMA_READ_REG(cmd_q->reg_base,
+					CMD_Q_CONTROL_BASE);
+		} while (!(cmd_q_ctrl & CMD_Q_HALT));
+	}
+
+	if (space == 0) {
+		ptdma_priv->xstats.enqueue_failed++;
+		return 0;
+	}
+
+	ptdma_priv->next_write = write + 1;
+	write &= mask;
+
+	if (!op)
+		pst.src_addr	= src;
+	else
+		PTDMA_PMD_DEBUG("Operation not supported by PTDMA\n");
+
+	pst.dest_addr	= dst;
+	pst.len		= length;
+	pst.bit_mod	= PTDMA_PASSTHRU_BITWISE_NOOP;
+	pst.byte_swap	= PTDMA_PASSTHRU_BYTESWAP_NOOP;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+
+	cmd_q->head_offset = (uint32_t)(PTDMA_READ_REG(cmd_q->reg_base,
+				CMD_Q_HEAD_LO_BASE));
+
+	ptdma_perform_passthru(&pst, cmd_q);
+
+	cmd_q->tail_offset = (uint32_t)(cmd_q->qbase_phys_addr + cmd_q->qidx *
+				Q_DESC_SIZE);
+	rte_wmb();
+
+	/* Write the new tail address back to the queue register */
+	PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_TAIL_LO_BASE,
+			cmd_q->tail_offset);
+
+	if (!ptdma_priv->hdls_disable)
+		ptdma_priv->hdls[write] =
+					_mm_set_epi64x((int64_t)dst_hdl,
+							(int64_t)src_hdl);
+	ptdma_priv->xstats.enqueued++;
+
+	return 1;
+}
+
+static __rte_always_inline int
+ptdma_ops_to_dequeue(int dev_id, int max_copies, uintptr_t *src_hdls,
+						uintptr_t *dst_hdls)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	uint32_t cur_head_offset;
+	short end_read;
+	unsigned short count;
+	unsigned short read	= ptdma_priv->next_read;
+	unsigned short write	= ptdma_priv->next_write;
+	unsigned short mask	= ptdma_priv->ring_size - 1;
+	int i = 0;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+
+	cur_head_offset = PTDMA_READ_REG(cmd_q->reg_base,
+			CMD_Q_HEAD_LO_BASE);
+
+	end_read = cur_head_offset - cmd_q->head_offset;
+
+	if (end_read < 0)
+		end_read = COMMANDS_PER_QUEUE - cmd_q->head_offset
+				+ cur_head_offset;
+	if (end_read < max_copies)
+		return 0;
+
+	if (end_read != 0)
+		count = (write - (read & mask)) & mask;
+	else
+		return 0;
+
+	if (ptdma_priv->hdls_disable) {
+		read += count;
+		goto end;
+	}
+
+	if (count > max_copies)
+		count = max_copies;
+
+	for (; i < count - 1; i += 2, read += 2) {
+		__m128i hdls0 =
+			_mm_load_si128(&ptdma_priv->hdls[read & mask]);
+		__m128i hdls1 =
+			_mm_load_si128(&ptdma_priv->hdls[(read + 1) & mask]);
+		_mm_storeu_si128((__m128i *)&src_hdls[i],
+				_mm_unpacklo_epi64(hdls0, hdls1));
+		_mm_storeu_si128((__m128i *)&dst_hdls[i],
+				_mm_unpackhi_epi64(hdls0, hdls1));
+	}
+
+	for (; i < count; i++, read++) {
+		uintptr_t *hdls =
+			(uintptr_t *)&ptdma_priv->hdls[read & mask];
+		src_hdls[i] = hdls[0];
+		dst_hdls[i] = hdls[1];
+	}
+end:
+	ptdma_priv->next_read = read;
+	ptdma_priv->xstats.completed += count;
+
+	return count;
+}
+
+static inline int
+rte_ptdma_enqueue_copy(int dev_id, phys_addr_t src, phys_addr_t dst,
+		unsigned int length, uintptr_t src_hdl, uintptr_t dst_hdl)
+{
+	return ptdma_ops_to_enqueue(dev_id, 0, src, dst, length,
+					src_hdl, dst_hdl);
+}
+
+static inline void
+rte_ptdma_perform_ops(int dev_id)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	uint32_t cmd_q_ctrl;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+	cmd_q_ctrl = PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE);
+
+	/* Turn the queue on using control register */
+	PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+			cmd_q_ctrl | CMD_Q_RUN);
+
+	ptdma_priv->xstats.started = ptdma_priv->xstats.enqueued;
+}
+
+static inline int
+rte_ptdma_completed_ops(int dev_id, uint8_t max_copies,
+		uintptr_t *src_hdls, uintptr_t *dst_hdls)
+{
+	int ret = 0;
+
+	ret = ptdma_ops_to_dequeue(dev_id, max_copies, src_hdls, dst_hdls);
+
+	return ret;
+}
+
+#endif
diff --git a/drivers/raw/ptdma/version.map b/drivers/raw/ptdma/version.map
new file mode 100644
index 0000000000..45917242ca
--- /dev/null
+++ b/drivers/raw/ptdma/version.map
@@ -0,0 +1,5 @@
+DPDK_21 {
+
+       local: *;
+};
+
diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 74d16e4c4b..30c11e92ba 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -65,6 +65,8 @@
                  'SVendor': None, 'SDevice': None}
 intel_ntb_icx = {'Class': '06', 'Vendor': '8086', 'Device': '347e',
                  'SVendor': None, 'SDevice': None}
+amd_ptdma = {'Class': '10', 'Vendor': '1022', 'Device': '1498',
+             'SVendor': None, 'SDevice': None}
 
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 baseband_devices = [acceleration_class]
@@ -74,7 +76,7 @@
 compress_devices = [cavium_zip]
 regex_devices = [octeontx2_ree]
 misc_devices = [cnxk_bphy, cnxk_bphy_cgx, intel_ioat_bdw, intel_ioat_skx, intel_ioat_icx, intel_idxd_spr,
-                intel_ntb_skx, intel_ntb_icx,
+                intel_ntb_skx, intel_ntb_icx, amd_ptdma,
                 octeontx2_dma]
 
 # global dict ethernet devices present. Dictionary indexed by PCI address.
-- 
2.25.1

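Note on the ring accounting in ptdma_ops_to_enqueue() above (illustrative,
not part of the patch): next_read and next_write are free-running unsigned
short counters, and only their masked values index the hdls array. The sketch
below isolates the free-space computation in plain C so that the wraparound
behaviour can be checked on its own. The function name ring_space() is
hypothetical; the sketch assumes ring_size is a power of two and that
unsigned short is 16 bits wide, as on the x86 targets this driver builds for.

```c
/* Hypothetical stand-alone copy of the free-space arithmetic. */
static unsigned short
ring_space(unsigned short read, unsigned short write,
		unsigned short ring_size)
{
	unsigned short mask = ring_size - 1;

	/*
	 * The operands are promoted to int and the result is truncated
	 * back to unsigned short, so the subtraction is effectively
	 * modulo 65536 and survives counter wraparound.
	 */
	return (unsigned short)(mask + read - write);
}
```

With a 512-entry ring at most 511 operations can be outstanding:
ring_space(0, 511, 512) is 0, which corresponds to the case where the driver
increments enqueue_failed and returns 0.
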

* [dpdk-dev] [RFC PATCH v2] raw/ptdma: introduce ptdma driver
@ 2021-09-06 14:34 Selwin Sebastian
  0 siblings, 0 replies; 9+ messages in thread
From: Selwin Sebastian @ 2021-09-06 14:34 UTC (permalink / raw)
  To: dev; +Cc: Selwin Sebastian

Add support for PTDMA driver

Signed-off-by: Selwin Sebastian <selwin.sebastia@amd.com>
---
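Note (illustrative, not part of the patch): the guide added below in
doc/guides/rawdevs/ptdma.rst states that the ring size passed to
rte_rawdev_configure() must be a power of two between 64 and 4096. A minimal
sketch of that check in plain C follows. The helper name ptdma_ring_size_ok()
is hypothetical and does not exist in the driver; the bounds are taken from
the guide text, not from the driver sources.

```c
#include <stdbool.h>

/* Hypothetical helper: mirrors the documented constraint only. */
static bool
ptdma_ring_size_ok(unsigned int ring_size)
{
	/* Bounds as stated in ptdma.rst: 64 <= ring_size <= 4096. */
	if (ring_size < 64 || ring_size > 4096)
		return false;
	/* A power of two has exactly one bit set. */
	return (ring_size & (ring_size - 1)) == 0;
}
```

An application could run such a check before filling in .ring_size; for
example, 512 passes while 500 and 8192 do not.
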
 MAINTAINERS                              |   5 +
 doc/guides/rawdevs/ptdma.rst             | 220 ++++++++++++++
 drivers/raw/meson.build                  |   1 +
 drivers/raw/ptdma/meson.build            |  16 +
 drivers/raw/ptdma/ptdma_dev.c            | 135 +++++++++
 drivers/raw/ptdma/ptdma_pmd_private.h    |  41 +++
 drivers/raw/ptdma/ptdma_rawdev.c         | 266 +++++++++++++++++
 drivers/raw/ptdma/ptdma_rawdev_spec.h    | 362 +++++++++++++++++++++++
 drivers/raw/ptdma/ptdma_rawdev_test.c    | 272 +++++++++++++++++
 drivers/raw/ptdma/rte_ptdma_rawdev.h     | 124 ++++++++
 drivers/raw/ptdma/rte_ptdma_rawdev_fns.h | 298 +++++++++++++++++++
 drivers/raw/ptdma/version.map            |   5 +
 usertools/dpdk-devbind.py                |   4 +-
 13 files changed, 1748 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/rawdevs/ptdma.rst
 create mode 100644 drivers/raw/ptdma/meson.build
 create mode 100644 drivers/raw/ptdma/ptdma_dev.c
 create mode 100644 drivers/raw/ptdma/ptdma_pmd_private.h
 create mode 100644 drivers/raw/ptdma/ptdma_rawdev.c
 create mode 100644 drivers/raw/ptdma/ptdma_rawdev_spec.h
 create mode 100644 drivers/raw/ptdma/ptdma_rawdev_test.c
 create mode 100644 drivers/raw/ptdma/rte_ptdma_rawdev.h
 create mode 100644 drivers/raw/ptdma/rte_ptdma_rawdev_fns.h
 create mode 100644 drivers/raw/ptdma/version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 266f5ac1da..f4afd1a072 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1305,6 +1305,11 @@ F: doc/guides/rawdevs/ioat.rst
 F: examples/ioat/
 F: doc/guides/sample_app_ug/ioat.rst
 
+PTDMA Rawdev
+M: Selwin Sebastian <selwin.sebastian@amd.com>
+F: drivers/raw/ptdma/
+F: doc/guides/rawdevs/ptdma.rst
+
 NXP DPAA2 QDMA
 M: Nipun Gupta <nipun.gupta@nxp.com>
 F: drivers/raw/dpaa2_qdma/
diff --git a/doc/guides/rawdevs/ptdma.rst b/doc/guides/rawdevs/ptdma.rst
new file mode 100644
index 0000000000..50772f9f3b
--- /dev/null
+++ b/doc/guides/rawdevs/ptdma.rst
@@ -0,0 +1,220 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+
+PTDMA Rawdev Driver
+===================
+
+The ``ptdma`` rawdev driver provides a poll-mode driver (PMD) for the AMD PTDMA device.
+
+Hardware Requirements
+----------------------
+
+The ``dpdk-devbind.py`` script, included with DPDK,
+can be used to show the presence of supported hardware.
+Running ``dpdk-devbind.py --status-dev misc`` will show all the miscellaneous,
+or rawdev-based devices on the system.
+
+Sample output from a system with PTDMA is shown below::
+
+        Misc (rawdev) devices using DPDK-compatible driver
+        ==================================================
+        0000:01:00.2 'Starship/Matisse PTDMA 1498' drv=igb_uio unused=vfio-pci
+        0000:02:00.2 'Starship/Matisse PTDMA 1498' drv=igb_uio unused=vfio-pci
+
+Devices using UIO drivers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The hardware devices to be used must be bound to a user-space IO driver.
+The ``dpdk-devbind.py`` script can be used to view the state of the PTDMA devices
+and to bind them to a suitable DPDK-supported driver, such as ``igb_uio``.
+For example::
+
+        $ sudo ./usertools/dpdk-devbind.py --force --bind=igb_uio 0000:01:00.2 0000:02:00.2
+
+Compilation
+------------
+
+For builds using ``meson`` and ``ninja``, the driver will be built when the target platform is x86-based.
+No additional compilation steps are necessary.
+
+
+Using PTDMA Rawdev Devices
+--------------------------
+
+To use the devices from an application, the rawdev API can be used, along
+with definitions taken from the device-specific header file
+``rte_ptdma_rawdev.h``. This header is needed to get the definition of
+structure parameters used by some of the rawdev APIs for PTDMA rawdev
+devices, as well as providing key functions for using the device for memory
+copies.
+
+Getting Device Information
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Basic information about each rawdev device can be queried using the
+``rte_rawdev_info_get()`` API. For most applications, this API will be
+needed to verify that the rawdev in question is of the expected type. For
+example, the following code snippet can be used to identify a PTDMA
+rawdev device for use by an application:
+
+.. code-block:: C
+
+        for (i = 0; i < count && !found; i++) {
+                struct rte_rawdev_info info = { .dev_private = NULL };
+                found = (rte_rawdev_info_get(i, &info, 0) == 0 &&
+                                strcmp(info.driver_name,
+                                                PTDMA_PMD_RAWDEV_NAME_STR) == 0);
+        }
+
+When calling the ``rte_rawdev_info_get()`` API for a PTDMA rawdev device,
+the ``dev_private`` field in the ``rte_rawdev_info`` struct should either
+be NULL, or else be set to point to a structure of type
+``rte_ptdma_rawdev_config``, in which case the size of the configured device
+input ring will be returned in that structure.
+
+Device Configuration
+~~~~~~~~~~~~~~~~~~~~~
+
+Configuring a PTDMA rawdev device is done using the
+``rte_rawdev_configure()`` API, which takes the same structure parameters
+as the previously referenced ``rte_rawdev_info_get()`` API. The main
+difference is that, because the parameter is used as input rather than
+output, the ``dev_private`` structure element cannot be NULL, and must
+point to a valid ``rte_ptdma_rawdev_config`` structure, containing the ring
+size to be used by the device. The ring size must be a power of two,
+between 64 and 4096.
+If the application does not need the driver to track user-provided
+completion handles, that tracking can be disabled by setting the
+``hdls_disable`` flag in the configuration structure.
+
+The following code shows how the device is configured in
+``ptdma_rawdev_test.c``:
+
+.. code-block:: C
+
+   #define PTDMA_TEST_RINGSIZE 512
+        struct rte_ptdma_rawdev_config p = { .ring_size = -1 };
+        struct rte_rawdev_info info = { .dev_private = &p };
+
+        /* ... */
+
+        p.ring_size = PTDMA_TEST_RINGSIZE;
+        if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
+                printf("Error with rte_rawdev_configure()\n");
+                return -1;
+        }
+
+Once configured, the device can then be made ready for use by calling the
+``rte_rawdev_start()`` API.
+
+Performing Data Copies
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To perform data copies using PTDMA rawdev devices, the functions
+``rte_ptdma_enqueue_copy()`` and ``rte_ptdma_perform_ops()`` should be used.
+Once copies have been completed, the completion will be reported back when
+the application calls ``rte_ptdma_completed_ops()``.
+
+The ``rte_ptdma_enqueue_copy()`` function enqueues a single copy to the
+device ring for copying at a later point. The parameters to that function
+include the IOVA addresses of both the source and destination buffers,
+as well as two "handles" to be returned to the user when the copy is
+completed. These handles can be arbitrary values, but two are provided so
+that the library can track handles for both source and destination on
+behalf of the user, e.g. virtual addresses for the buffers, or mbuf
+pointers if packet data is being copied.
+
+While the ``rte_ptdma_enqueue_copy()`` function enqueues a copy operation on
+the device ring, the copy will not actually be performed until after the
+application calls the ``rte_ptdma_perform_ops()`` function. This function
+informs the device hardware of the elements enqueued on the ring, and the
+device will begin to process them. It is expected that, for efficiency
+reasons, a burst of operations will be enqueued to the device via multiple
+enqueue calls between calls to the ``rte_ptdma_perform_ops()`` function.
+
+The following code from ``ptdma_rawdev_test.c`` demonstrates how to enqueue
+a burst of copies to the device and start the hardware processing of them:
+
+.. code-block:: C
+
+        struct rte_mbuf *srcs[32], *dsts[32];
+        unsigned int j;
+
+        for (i = 0; i < RTE_DIM(srcs); i++) {
+                char *src_data;
+
+                srcs[i] = rte_pktmbuf_alloc(pool);
+                dsts[i] = rte_pktmbuf_alloc(pool);
+                srcs[i]->data_len = srcs[i]->pkt_len = length;
+                dsts[i]->data_len = dsts[i]->pkt_len = length;
+                src_data = rte_pktmbuf_mtod(srcs[i], char *);
+
+                for (j = 0; j < length; j++)
+                        src_data[j] = rand() & 0xFF;
+
+                if (rte_ptdma_enqueue_copy(dev_id,
+                                srcs[i]->buf_iova + srcs[i]->data_off,
+                                dsts[i]->buf_iova + dsts[i]->data_off,
+                                length,
+                                (uintptr_t)srcs[i],
+                                (uintptr_t)dsts[i]) != 1) {
+                        printf("Error with rte_ptdma_enqueue_copy for buffer %u\n",
+                                        i);
+                        return -1;
+                }
+        }
+        rte_ptdma_perform_ops(dev_id);
+
+To retrieve information about completed copies, the API
+``rte_ptdma_completed_ops()`` should be used. This API will return to the
+application a set of completion handles passed in when the relevant copies
+were enqueued.
+
+The following code from ``ptdma_rawdev_test.c`` shows the test code
+retrieving information about the completed copies and validating the data
+is correct before freeing the data buffers using the returned handles:
+
+.. code-block:: C
+
+        if (rte_ptdma_completed_ops(dev_id, 64, (void *)completed_src,
+                        (void *)completed_dst) != RTE_DIM(srcs)) {
+                printf("Error with rte_ptdma_completed_ops\n");
+                return -1;
+        }
+        for (i = 0; i < RTE_DIM(srcs); i++) {
+                char *src_data, *dst_data;
+
+                if (completed_src[i] != srcs[i]) {
+                        printf("Error with source pointer %u\n", i);
+                        return -1;
+                }
+                if (completed_dst[i] != dsts[i]) {
+                        printf("Error with dest pointer %u\n", i);
+                        return -1;
+                }
+
+                src_data = rte_pktmbuf_mtod(srcs[i], char *);
+                dst_data = rte_pktmbuf_mtod(dsts[i], char *);
+                for (j = 0; j < length; j++)
+                        if (src_data[j] != dst_data[j]) {
+                                printf("Error with copy of packet %u, byte %u\n",
+                                                i, j);
+                                return -1;
+                        }
+                rte_pktmbuf_free(srcs[i]);
+                rte_pktmbuf_free(dsts[i]);
+        }
+
+Querying Device Statistics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The statistics from the PTDMA rawdev device can be retrieved via the xstats
+functions in the ``rte_rawdev`` library, i.e.
+``rte_rawdev_xstats_names_get()``, ``rte_rawdev_xstats_get()`` and
+``rte_rawdev_xstats_by_name_get()``. The statistics returned for each device
+instance are:
+
+* ``failed_enqueues``
+* ``successful_enqueues``
+* ``copies_started``
+* ``copies_completed``
diff --git a/drivers/raw/meson.build b/drivers/raw/meson.build
index b51536f8a7..e896745d9c 100644
--- a/drivers/raw/meson.build
+++ b/drivers/raw/meson.build
@@ -14,6 +14,7 @@ drivers = [
         'ntb',
         'octeontx2_dma',
         'octeontx2_ep',
+        'ptdma',
         'skeleton',
 ]
 std_deps = ['rawdev']
diff --git a/drivers/raw/ptdma/meson.build b/drivers/raw/ptdma/meson.build
new file mode 100644
index 0000000000..a3eab8dbfd
--- /dev/null
+++ b/drivers/raw/ptdma/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2021 Advanced Micro Devices, Inc. All rights reserved.
+
+build = dpdk_conf.has('RTE_ARCH_X86')
+reason = 'only supported on x86'
+sources = files(
+	'ptdma_rawdev.c',
+	'ptdma_dev.c',
+	'ptdma_rawdev_test.c')
+deps += ['bus_pci',
+	'bus_vdev',
+	'mbuf',
+	'rawdev']
+
+headers = files('rte_ptdma_rawdev.h',
+		'rte_ptdma_rawdev_fns.h')
diff --git a/drivers/raw/ptdma/ptdma_dev.c b/drivers/raw/ptdma/ptdma_dev.c
new file mode 100644
index 0000000000..1d0207a9af
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_dev.c
@@ -0,0 +1,135 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <dirent.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/queue.h>
+#include <sys/types.h>
+#include <sys/file.h>
+#include <unistd.h>
+
+#include <rte_hexdump.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_memory.h>
+#include <rte_spinlock.h>
+#include <rte_string_fns.h>
+
+#include "ptdma_rawdev_spec.h"
+#include "ptdma_pmd_private.h"
+#include "rte_ptdma_rawdev_fns.h"
+
+static int ptdma_dev_id;
+
+static const struct rte_memzone *
+ptdma_queue_dma_zone_reserve(const char *queue_name,
+			   uint32_t queue_size,
+			   int socket_id)
+{
+	const struct rte_memzone *mz;
+
+	mz = rte_memzone_lookup(queue_name);
+	if (mz != NULL) {
+		if (((size_t)queue_size <= mz->len) &&
+		    ((socket_id == SOCKET_ID_ANY) ||
+		     (socket_id == mz->socket_id))) {
+			PTDMA_PMD_INFO("re-use memzone already "
+				     "allocated for %s", queue_name);
+			return mz;
+		}
+		PTDMA_PMD_ERR("Incompatible memzone already "
+			    "allocated %s, size %u, socket %d. "
+			    "Requested size %u, socket %u",
+			    queue_name, (uint32_t)mz->len,
+			    mz->socket_id, queue_size, socket_id);
+		return NULL;
+	}
+
+	PTDMA_PMD_INFO("Allocate memzone for %s, size %u on socket %u",
+		     queue_name, queue_size, socket_id);
+
+	return rte_memzone_reserve_aligned(queue_name, queue_size,
+			socket_id, RTE_MEMZONE_IOVA_CONTIG, queue_size);
+}
+
+int
+ptdma_add_queue(struct rte_ptdma_rawdev *dev)
+{
+	int i;
+	uint32_t dma_addr_lo, dma_addr_hi;
+	uint32_t ptdma_version = 0;
+	struct ptdma_cmd_queue *cmd_q;
+	const struct rte_memzone *q_mz;
+	void *vaddr;
+
+	if (dev == NULL)
+		return -1;
+
+	dev->id = ptdma_dev_id++;
+	dev->qidx = 0;
+	vaddr = (void *)(dev->pci.mem_resource[2].addr);
+
+	PTDMA_WRITE_REG(vaddr, CMD_REQID_CONFIG_OFFSET, 0x0);
+	ptdma_version = PTDMA_READ_REG(vaddr, CMD_PTDMA_VERSION);
+	PTDMA_PMD_INFO("PTDMA VERSION  = 0x%x", ptdma_version);
+
+	dev->cmd_q_count = 0;
+	/* Find available queues */
+	for (i = 0; i < MAX_HW_QUEUES; i++) {
+		cmd_q = &dev->cmd_q[dev->cmd_q_count++];
+		cmd_q->dev = dev;
+		cmd_q->id = i;
+		cmd_q->qidx = 0;
+		cmd_q->qsize = Q_SIZE(Q_DESC_SIZE);
+
+		cmd_q->reg_base = (uint8_t *)vaddr +
+			CMD_Q_STATUS_INCR * (i + 1);
+
+		/* PTDMA queue memory */
+		snprintf(cmd_q->memz_name, sizeof(cmd_q->memz_name),
+			 "%s_%d_%s_%d_%s",
+			 "ptdma_dev",
+			 (int)dev->id, "queue",
+			 (int)cmd_q->id, "mem");
+		q_mz = ptdma_queue_dma_zone_reserve(cmd_q->memz_name,
+				cmd_q->qsize, rte_socket_id());
+		cmd_q->qbase_addr = (void *)q_mz->addr;
+		cmd_q->qbase_desc = (void *)q_mz->addr;
+		cmd_q->qbase_phys_addr =  q_mz->iova;
+
+		cmd_q->qcontrol = 0;
+		/* init control reg to zero */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+			      cmd_q->qcontrol);
+
+		/* Disable the interrupts */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_INT_ENABLE_BASE, 0x00);
+		PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_INT_STATUS_BASE);
+		PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_STATUS_BASE);
+
+		/* Clear the interrupts */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_INTERRUPT_STATUS_BASE,
+			      ALL_INTERRUPTS);
+
+		/* Configure size of each virtual queue accessible to host */
+		cmd_q->qcontrol &= ~(CMD_Q_SIZE << CMD_Q_SHIFT);
+		cmd_q->qcontrol |= QUEUE_SIZE_VAL << CMD_Q_SHIFT;
+
+		dma_addr_lo = low32_value(cmd_q->qbase_phys_addr);
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_TAIL_LO_BASE,
+			      (uint32_t)dma_addr_lo);
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_HEAD_LO_BASE,
+			      (uint32_t)dma_addr_lo);
+
+		dma_addr_hi = high32_value(cmd_q->qbase_phys_addr);
+		cmd_q->qcontrol |= (dma_addr_hi << 16);
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+			      cmd_q->qcontrol);
+
+	}
+	return 0;
+}
diff --git a/drivers/raw/ptdma/ptdma_pmd_private.h b/drivers/raw/ptdma/ptdma_pmd_private.h
new file mode 100644
index 0000000000..0c25e737f5
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_pmd_private.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _PTDMA_PMD_PRIVATE_H_
+#define _PTDMA_PMD_PRIVATE_H_
+
+#include <rte_rawdev.h>
+#include "ptdma_rawdev_spec.h"
+
+extern int ptdma_pmd_logtype;
+
+#define PTDMA_PMD_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, ptdma_pmd_logtype, "%s(): " fmt "\n", \
+			__func__, ##args)
+
+#define PTDMA_PMD_FUNC_TRACE() PTDMA_PMD_LOG(DEBUG, ">>")
+
+#define PTDMA_PMD_ERR(fmt, args...) \
+	PTDMA_PMD_LOG(ERR, fmt, ## args)
+#define PTDMA_PMD_WARN(fmt, args...) \
+	PTDMA_PMD_LOG(WARNING, fmt, ## args)
+#define PTDMA_PMD_DEBUG(fmt, args...) \
+	PTDMA_PMD_LOG(DEBUG, fmt, ## args)
+#define PTDMA_PMD_INFO(fmt, args...) \
+	PTDMA_PMD_LOG(INFO, fmt, ## args)
+
+int ptdma_xstats_get(const struct rte_rawdev *dev, const unsigned int ids[],
+		uint64_t values[], unsigned int n);
+int ptdma_xstats_get_names(const struct rte_rawdev *dev,
+		struct rte_rawdev_xstats_name *names,
+		unsigned int size);
+int ptdma_xstats_reset(struct rte_rawdev *dev, const uint32_t *ids,
+		uint32_t nb_ids);
+int ptdma_add_queue(struct rte_ptdma_rawdev *dev);
+
+extern int ptdma_rawdev_test(uint16_t dev_id);
+
+#endif /* _PTDMA_PMD_PRIVATE_H_ */
+
+
diff --git a/drivers/raw/ptdma/ptdma_rawdev.c b/drivers/raw/ptdma/ptdma_rawdev.c
new file mode 100644
index 0000000000..cfa57d81ed
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_rawdev.c
@@ -0,0 +1,266 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <rte_bus_pci.h>
+#include <rte_rawdev_pmd.h>
+#include <rte_memzone.h>
+#include <rte_string_fns.h>
+#include <rte_dev.h>
+
+#include "rte_ptdma_rawdev.h"
+#include "ptdma_rawdev_spec.h"
+#include "ptdma_pmd_private.h"
+
+RTE_LOG_REGISTER(ptdma_pmd_logtype, rawdev.ptdma, INFO);
+
+uint8_t ptdma_rawdev_driver_id;
+static struct rte_pci_driver ptdma_pmd_drv;
+
+#define AMD_VENDOR_ID		0x1022
+#define PTDMA_DEVICE_ID		0x1498
+#define COMPLETION_SZ sizeof(__m128i)
+
+static const struct rte_pci_id pci_id_ptdma_map[] = {
+	{ RTE_PCI_DEVICE(AMD_VENDOR_ID, PTDMA_DEVICE_ID) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
+
+static const char * const xstat_names[] = {
+	"failed_enqueues", "successful_enqueues",
+	"copies_started", "copies_completed"
+};
+
+static int
+ptdma_dev_configure(const struct rte_rawdev *dev, rte_rawdev_obj_t config,
+		size_t config_size)
+{
+	struct rte_ptdma_rawdev_config *params = config;
+	struct rte_ptdma_rawdev *ptdma_priv = dev->dev_private;
+
+	if (dev->started)
+		return -EBUSY;
+	if (params == NULL || config_size != sizeof(*params))
+		return -EINVAL;
+	if (params->ring_size > 8192 || params->ring_size < 64 ||
+			!rte_is_power_of_2(params->ring_size))
+		return -EINVAL;
+	ptdma_priv->ring_size = params->ring_size;
+	ptdma_priv->hdls_disable = params->hdls_disable;
+	ptdma_priv->hdls = rte_zmalloc_socket("ptdma_hdls",
+			ptdma_priv->ring_size * sizeof(*ptdma_priv->hdls),
+			RTE_CACHE_LINE_SIZE, rte_socket_id());
+	return 0;
+}
+
+static int
+ptdma_rawdev_remove(struct rte_pci_device *dev);
+
+int
+ptdma_xstats_get(const struct rte_rawdev *dev, const unsigned int ids[],
+		uint64_t values[], unsigned int n)
+{
+	const struct rte_ptdma_rawdev *ptdma = dev->dev_private;
+	const uint64_t *stats = (const void *)&ptdma->xstats;
+	unsigned int i;
+
+	for (i = 0; i < n; i++) {
+		if (ids[i] >= sizeof(ptdma->xstats)/sizeof(*stats))
+			values[i] = 0;
+		else
+			values[i] = stats[ids[i]];
+	}
+	return n;
+}
+
+int
+ptdma_xstats_get_names(const struct rte_rawdev *dev,
+		struct rte_rawdev_xstats_name *names,
+		unsigned int size)
+{
+	unsigned int i;
+
+	RTE_SET_USED(dev);
+	if (size < RTE_DIM(xstat_names))
+		return RTE_DIM(xstat_names);
+	for (i = 0; i < RTE_DIM(xstat_names); i++)
+		strlcpy(names[i].name, xstat_names[i], sizeof(names[i].name));
+	return RTE_DIM(xstat_names);
+}
+
+int
+ptdma_xstats_reset(struct rte_rawdev *dev, const uint32_t *ids,
+		uint32_t nb_ids)
+{
+	struct rte_ptdma_rawdev *ptdma = dev->dev_private;
+	uint64_t *stats = (void *)&ptdma->xstats;
+	unsigned int i;
+
+	if (!ids) {
+		memset(&ptdma->xstats, 0, sizeof(ptdma->xstats));
+		return 0;
+	}
+	for (i = 0; i < nb_ids; i++)
+		if (ids[i] < sizeof(ptdma->xstats)/sizeof(*stats))
+			stats[ids[i]] = 0;
+	return 0;
+}
+
+static int
+ptdma_dev_start(struct rte_rawdev *dev)
+{
+	RTE_SET_USED(dev);
+	return 0;
+}
+
+static void
+ptdma_dev_stop(struct rte_rawdev *dev)
+{
+	RTE_SET_USED(dev);
+}
+
+static int
+ptdma_dev_close(struct rte_rawdev *dev __rte_unused)
+{
+	return 0;
+}
+
+static int
+ptdma_dev_info_get(struct rte_rawdev *dev, rte_rawdev_obj_t dev_info,
+		size_t dev_info_size)
+{
+	struct rte_ptdma_rawdev_config *cfg = dev_info;
+	struct rte_ptdma_rawdev *ptdma = dev->dev_private;
+
+	if (dev_info == NULL || dev_info_size != sizeof(*cfg))
+		return -EINVAL;
+	cfg->ring_size = ptdma->ring_size;
+	cfg->hdls_disable = ptdma->hdls_disable;
+	return 0;
+}
+
+static int
+ptdma_rawdev_create(const char *name, struct rte_pci_device *dev)
+{
+	static const struct rte_rawdev_ops ptdma_rawdev_ops = {
+			.dev_configure = ptdma_dev_configure,
+			.dev_start = ptdma_dev_start,
+			.dev_stop = ptdma_dev_stop,
+			.dev_close = ptdma_dev_close,
+			.dev_info_get = ptdma_dev_info_get,
+			.xstats_get = ptdma_xstats_get,
+			.xstats_get_names = ptdma_xstats_get_names,
+			.xstats_reset = ptdma_xstats_reset,
+			.dev_selftest = ptdma_rawdev_test,
+	};
+	struct rte_rawdev *rawdev = NULL;
+	struct rte_ptdma_rawdev *ptdma_priv = NULL;
+	int ret = 0;
+	if (!name) {
+		PTDMA_PMD_ERR("Invalid name of the device!");
+		ret = -EINVAL;
+		goto cleanup;
+	}
+	/* Allocate device structure */
+	rawdev = rte_rawdev_pmd_allocate(name, sizeof(struct rte_rawdev),
+						rte_socket_id());
+	if (rawdev == NULL) {
+		PTDMA_PMD_ERR("Unable to allocate raw device");
+		ret = -ENOMEM;
+		goto cleanup;
+	}
+
+	rawdev->dev_id = ptdma_rawdev_driver_id++;
+	PTDMA_PMD_INFO("dev_id = %d", rawdev->dev_id);
+	PTDMA_PMD_INFO("driver_name = %s", dev->device.driver->name);
+
+	rawdev->dev_ops = &ptdma_rawdev_ops;
+	rawdev->device = &dev->device;
+	rawdev->driver_name = dev->device.driver->name;
+
+	ptdma_priv = rte_zmalloc_socket("ptdma_priv", sizeof(*ptdma_priv),
+				RTE_CACHE_LINE_SIZE, rte_socket_id());
+	rawdev->dev_private = ptdma_priv;
+	ptdma_priv->rawdev = rawdev;
+	ptdma_priv->ring_size = 0;
+	ptdma_priv->pci = *dev;
+
+	/* device is valid, add queue details */
+	if (ptdma_add_queue(ptdma_priv))
+		goto init_error;
+
+	return 0;
+
+cleanup:
+	if (rawdev)
+		rte_rawdev_pmd_release(rawdev);
+	return ret;
+init_error:
+	PTDMA_PMD_ERR("driver %s(): failed", __func__);
+	ptdma_rawdev_remove(dev);
+	return -EFAULT;
+}
+
+static int
+ptdma_rawdev_destroy(const char *name)
+{
+	int ret;
+	struct rte_rawdev *rdev;
+	if (!name) {
+		PTDMA_PMD_ERR("Invalid device name");
+		return -EINVAL;
+	}
+	rdev = rte_rawdev_pmd_get_named_dev(name);
+	if (!rdev) {
+		PTDMA_PMD_ERR("Invalid device name (%s)", name);
+		return -EINVAL;
+	}
+
+	if (rdev->dev_private != NULL)
+		rte_free(rdev->dev_private);
+
+	/* rte_rawdev_close is called by pmd_release */
+	ret = rte_rawdev_pmd_release(rdev);
+
+	if (ret)
+		PTDMA_PMD_DEBUG("Device cleanup failed");
+	return 0;
+}
+static int
+ptdma_rawdev_probe(struct rte_pci_driver *drv, struct rte_pci_device *dev)
+{
+	char name[32];
+	int ret = 0;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+	PTDMA_PMD_INFO("Init %s on NUMA node %d", name, dev->device.numa_node);
+
+	dev->device.driver = &drv->driver;
+	ret = ptdma_rawdev_create(name, dev);
+	return ret;
+}
+
+static int
+ptdma_rawdev_remove(struct rte_pci_device *dev)
+{
+	char name[32];
+	int ret;
+
+	rte_pci_device_name(&dev->addr, name, sizeof(name));
+	PTDMA_PMD_INFO("Closing %s on NUMA node %d",
+			name, dev->device.numa_node);
+	ret = ptdma_rawdev_destroy(name);
+	return ret;
+}
+
+static struct rte_pci_driver ptdma_pmd_drv = {
+	.id_table = pci_id_ptdma_map,
+	.drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC,
+	.probe = ptdma_rawdev_probe,
+	.remove = ptdma_rawdev_remove,
+};
+
+RTE_PMD_REGISTER_PCI(PTDMA_PMD_RAWDEV_NAME, ptdma_pmd_drv);
+RTE_PMD_REGISTER_PCI_TABLE(PTDMA_PMD_RAWDEV_NAME, pci_id_ptdma_map);
+RTE_PMD_REGISTER_KMOD_DEP(PTDMA_PMD_RAWDEV_NAME, "* igb_uio | uio_pci_generic");
+
diff --git a/drivers/raw/ptdma/ptdma_rawdev_spec.h b/drivers/raw/ptdma/ptdma_rawdev_spec.h
new file mode 100644
index 0000000000..73511bec95
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_rawdev_spec.h
@@ -0,0 +1,362 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef __PT_DEV_H__
+#define __PT_DEV_H__
+
+#include <rte_bus_pci.h>
+#include <rte_byteorder.h>
+#include <rte_io.h>
+#include <rte_pci.h>
+#include <rte_spinlock.h>
+#include <rte_rawdev.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define BIT(nr)				(1 << (nr))
+
+#define BITS_PER_LONG   (__SIZEOF_LONG__ * 8)
+#define GENMASK(h, l)   (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
+
+#define MAX_HW_QUEUES			1
+
+/* Register Mappings */
+
+#define CMD_QUEUE_PRIO_OFFSET		0x00
+#define CMD_REQID_CONFIG_OFFSET		0x04
+#define CMD_TIMEOUT_OFFSET		0x08
+#define CMD_TIMEOUT_GRANULARITY		0x0C
+#define CMD_PTDMA_VERSION		0x10
+
+#define CMD_Q_CONTROL_BASE		0x0000
+#define CMD_Q_TAIL_LO_BASE		0x0004
+#define CMD_Q_HEAD_LO_BASE		0x0008
+#define CMD_Q_INT_ENABLE_BASE		0x000C
+#define CMD_Q_INTERRUPT_STATUS_BASE	0x0010
+
+#define CMD_Q_STATUS_BASE		0x0100
+#define CMD_Q_INT_STATUS_BASE		0x0104
+#define CMD_Q_DMA_STATUS_BASE		0x0108
+#define CMD_Q_DMA_READ_STATUS_BASE	0x010C
+#define CMD_Q_DMA_WRITE_STATUS_BASE	0x0110
+#define CMD_Q_ABORT_BASE		0x0114
+#define CMD_Q_AX_CACHE_BASE		0x0118
+
+#define CMD_CONFIG_OFFSET		0x1120
+#define CMD_CLK_GATE_CTL_OFFSET		0x6004
+
+#define CMD_DESC_DW0_VAL		0x500012
+
+/* Address offset for virtual queue registers */
+#define CMD_Q_STATUS_INCR		0x1000
+
+/* Bit masks */
+#define CMD_CONFIG_REQID		0
+#define CMD_TIMEOUT_DISABLE		0
+#define CMD_CLK_DYN_GATING_DIS		0
+#define CMD_CLK_SW_GATE_MODE		0
+#define CMD_CLK_GATE_CTL		0
+#define CMD_QUEUE_PRIO			GENMASK(2, 1)
+#define CMD_CONFIG_VHB_EN		BIT(0)
+#define CMD_CLK_DYN_GATING_EN		BIT(0)
+#define CMD_CLK_HW_GATE_MODE		BIT(0)
+#define CMD_CLK_GATE_ON_DELAY		BIT(12)
+#define CMD_CLK_GATE_OFF_DELAY		BIT(12)
+
+#define CMD_CLK_GATE_CONFIG		(CMD_CLK_GATE_CTL | \
+					CMD_CLK_HW_GATE_MODE | \
+					CMD_CLK_GATE_ON_DELAY | \
+					CMD_CLK_DYN_GATING_EN | \
+					CMD_CLK_GATE_OFF_DELAY)
+
+#define CMD_Q_LEN			32
+#define CMD_Q_RUN			BIT(0)
+#define CMD_Q_HALT			BIT(1)
+#define CMD_Q_MEM_LOCATION		BIT(2)
+#define CMD_Q_SIZE			GENMASK(4, 0)
+#define CMD_Q_SHIFT			GENMASK(1, 0)
+#define COMMANDS_PER_QUEUE		8192
+
+
+#define QUEUE_SIZE_VAL			((ffs(COMMANDS_PER_QUEUE) - 2) & \
+						CMD_Q_SIZE)
+#define Q_PTR_MASK			((2 << (QUEUE_SIZE_VAL + 5)) - 1)
+#define Q_DESC_SIZE			sizeof(struct ptdma_desc)
+#define Q_SIZE(n)			(COMMANDS_PER_QUEUE * (n))
+
+#define INT_COMPLETION			BIT(0)
+#define INT_ERROR			BIT(1)
+#define INT_QUEUE_STOPPED		BIT(2)
+#define INT_EMPTY_QUEUE			BIT(3)
+#define SUPPORTED_INTERRUPTS		(INT_COMPLETION | INT_ERROR)
+#define ALL_INTERRUPTS			(INT_COMPLETION | INT_ERROR | \
+					INT_QUEUE_STOPPED)
+
+/****** Local Storage Block ******/
+#define LSB_START			0
+#define LSB_END				127
+#define LSB_COUNT			(LSB_END - LSB_START + 1)
+
+#define LSB_REGION_WIDTH		5
+#define MAX_LSB_CNT			8
+
+#define LSB_SIZE			16
+#define LSB_ITEM_SIZE			128
+#define SLSB_MAP_SIZE			(MAX_LSB_CNT * LSB_SIZE)
+#define LSB_ENTRY_NUMBER(LSB_ADDR)	(LSB_ADDR / LSB_ITEM_SIZE)
+
+
+#define PT_DMAPOOL_MAX_SIZE		64
+#define PT_DMAPOOL_ALIGN		BIT(5)
+
+#define PT_PASSTHRU_BLOCKSIZE		512
+
+/* General PTDMA Defines */
+
+#define PTDMA_SB_BYTES			32
+#define	PTDMA_ENGINE_PASSTHRU		0x5
+
+/* Word 0 */
+#define PTDMA_CMD_DW0(p)		((p)->dw0)
+#define PTDMA_CMD_SOC(p)		(PTDMA_CMD_DW0(p).soc)
+#define PTDMA_CMD_IOC(p)		(PTDMA_CMD_DW0(p).ioc)
+#define PTDMA_CMD_INIT(p)		(PTDMA_CMD_DW0(p).init)
+#define PTDMA_CMD_EOM(p)		(PTDMA_CMD_DW0(p).eom)
+#define PTDMA_CMD_FUNCTION(p)		(PTDMA_CMD_DW0(p).function)
+#define PTDMA_CMD_ENGINE(p)		(PTDMA_CMD_DW0(p).engine)
+#define PTDMA_CMD_PROT(p)		(PTDMA_CMD_DW0(p).prot)
+
+/* Word 1 */
+#define PTDMA_CMD_DW1(p)		((p)->length)
+#define PTDMA_CMD_LEN(p)		(PTDMA_CMD_DW1(p))
+
+/* Word 2 */
+#define PTDMA_CMD_DW2(p)		((p)->src_lo)
+#define PTDMA_CMD_SRC_LO(p)		(PTDMA_CMD_DW2(p))
+
+/* Word 3 */
+#define PTDMA_CMD_DW3(p)		((p)->dw3)
+#define PTDMA_CMD_SRC_MEM(p)		((p)->dw3.src_mem)
+#define PTDMA_CMD_SRC_HI(p)		((p)->dw3.src_hi)
+#define PTDMA_CMD_LSB_ID(p)		((p)->dw3.lsb_cxt_id)
+#define PTDMA_CMD_FIX_SRC(p)		((p)->dw3.fixed)
+
+/* Words 4/5 */
+#define PTDMA_CMD_DST_LO(p)		((p)->dst_lo)
+#define PTDMA_CMD_DW5(p)		((p)->dw5.dst_hi)
+#define PTDMA_CMD_DST_HI(p)		(PTDMA_CMD_DW5(p))
+#define PTDMA_CMD_DST_MEM(p)		((p)->dw5.dst_mem)
+#define PTDMA_CMD_FIX_DST(p)		((p)->dw5.fixed)
+
+/* bitmap */
+enum {
+	BITS_PER_WORD = sizeof(unsigned long) * CHAR_BIT
+};
+
+#define WORD_OFFSET(b) ((b) / BITS_PER_WORD)
+#define BIT_OFFSET(b)  ((b) % BITS_PER_WORD)
+
+#define PTDMA_DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
+#define PTDMA_BITMAP_SIZE(nr) \
+	PTDMA_DIV_ROUND_UP(nr, CHAR_BIT * sizeof(unsigned long))
+
+#define PTDMA_BITMAP_FIRST_WORD_MASK(start) \
+	(~0UL << ((start) & (BITS_PER_WORD - 1)))
+#define PTDMA_BITMAP_LAST_WORD_MASK(nbits) \
+	(~0UL >> (-(nbits) & (BITS_PER_WORD - 1)))
+
+#define __ptdma_round_mask(x, y) ((typeof(x))((y)-1))
+#define ptdma_round_down(x, y) ((x) & ~__ptdma_round_mask(x, y))
+
+/** PTDMA registers Write/Read */
+static inline void ptdma_pci_reg_write(void *base, int offset,
+					uint32_t value)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+	rte_write32((rte_cpu_to_le_32(value)), reg_addr);
+}
+
+static inline uint32_t ptdma_pci_reg_read(void *base, int offset)
+{
+	volatile void *reg_addr = ((uint8_t *)base + offset);
+	return rte_le_to_cpu_32(rte_read32(reg_addr));
+}
+
+#define PTDMA_READ_REG(hw_addr, reg_offset) \
+	ptdma_pci_reg_read(hw_addr, reg_offset)
+
+#define PTDMA_WRITE_REG(hw_addr, reg_offset, value) \
+	ptdma_pci_reg_write(hw_addr, reg_offset, value)
+
+/**
+ * A structure describing a PTDMA command queue.
+ */
+struct ptdma_cmd_queue {
+	struct rte_ptdma_rawdev *dev;
+	char memz_name[RTE_MEMZONE_NAMESIZE];
+
+	/* Queue identifier */
+	uint64_t id;	/**< queue id */
+	uint64_t qidx;	/**< queue index */
+	uint64_t qsize;	/**< queue size */
+
+	/* Queue address */
+	struct ptdma_desc *qbase_desc;
+	void *qbase_addr;
+	phys_addr_t qbase_phys_addr;
+	/**< queue-page registers addr */
+	void *reg_base;
+	uint32_t qcontrol;
+	/**< queue ctrl reg */
+	uint32_t head_offset;
+	uint32_t tail_offset;
+
+	int lsb;
+	/**< lsb region assigned to queue */
+	unsigned long lsbmask;
+	/**< lsb regions queue can access */
+	unsigned long lsbmap[PTDMA_BITMAP_SIZE(LSB_COUNT)];
+	/**< all lsb resources which queue is using */
+	uint32_t sb_key;
+	/**< lsb assigned for queue */
+} __rte_cache_aligned;
+
+/* Passthru engine */
+
+#define PTDMA_PT_BYTESWAP(p)      ((p)->pt.byteswap)
+#define PTDMA_PT_BITWISE(p)       ((p)->pt.bitwise)
+
+/**
+ * passthru_bitwise - type of bitwise passthru operation
+ *
+ * @PTDMA_PASSTHRU_BITWISE_NOOP: no bitwise operation performed
+ * @PTDMA_PASSTHRU_BITWISE_AND: perform bitwise AND of src with mask
+ * @PTDMA_PASSTHRU_BITWISE_OR: perform bitwise OR of src with mask
+ * @PTDMA_PASSTHRU_BITWISE_XOR: perform bitwise XOR of src with mask
+ * @PTDMA_PASSTHRU_BITWISE_MASK: overwrite with mask
+ */
+enum ptdma_passthru_bitwise {
+	PTDMA_PASSTHRU_BITWISE_NOOP = 0,
+	PTDMA_PASSTHRU_BITWISE_AND,
+	PTDMA_PASSTHRU_BITWISE_OR,
+	PTDMA_PASSTHRU_BITWISE_XOR,
+	PTDMA_PASSTHRU_BITWISE_MASK,
+	PTDMA_PASSTHRU_BITWISE__LAST,
+};
+
+/**
+ * ptdma_passthru_byteswap - type of byteswap passthru operation
+ *
+ * @PTDMA_PASSTHRU_BYTESWAP_NOOP: no byte swapping performed
+ * @PTDMA_PASSTHRU_BYTESWAP_32BIT: swap bytes within 32-bit words
+ * @PTDMA_PASSTHRU_BYTESWAP_256BIT: swap bytes within 256-bit words
+ */
+enum ptdma_passthru_byteswap {
+	PTDMA_PASSTHRU_BYTESWAP_NOOP = 0,
+	PTDMA_PASSTHRU_BYTESWAP_32BIT,
+	PTDMA_PASSTHRU_BYTESWAP_256BIT,
+	PTDMA_PASSTHRU_BYTESWAP__LAST,
+};
+
+/**
+ * PTDMA passthru
+ */
+struct ptdma_passthru {
+	phys_addr_t src_addr;
+	phys_addr_t dest_addr;
+	enum ptdma_passthru_bitwise bit_mod;
+	enum ptdma_passthru_byteswap byte_swap;
+	int len;
+};
+
+union ptdma_function {
+	struct {
+		uint16_t byteswap:2;
+		uint16_t bitwise:3;
+		uint16_t reflect:2;
+		uint16_t rsvd:8;
+	} pt;
+	uint16_t raw;
+};
+
+/**
+ * ptdma memory type
+ */
+enum ptdma_memtype {
+	PTDMA_MEMTYPE_SYSTEM = 0,
+	PTDMA_MEMTYPE_SB,
+	PTDMA_MEMTYPE_LOCAL,
+	PTDMA_MEMTYPE_LAST,
+};
+
+/*
+ * descriptor for PTDMA commands
+ * 8 32-bit words:
+ * word 0: function; engine; control bits
+ * word 1: length of source data
+ * word 2: low 32 bits of source pointer
+ * word 3: upper 16 bits of source pointer; source memory type
+ * word 4: low 32 bits of destination pointer
+ * word 5: upper 16 bits of destination pointer; destination memory type
+ * word 6: reserved 32 bits
+ * word 7: reserved 32 bits
+ */
+
+union dword0 {
+	struct {
+		uint32_t soc:1;
+		uint32_t ioc:1;
+		uint32_t rsvd1:1;
+		uint32_t init:1;
+		uint32_t eom:1;
+		uint32_t function:15;
+		uint32_t engine:4;
+		uint32_t prot:1;
+		uint32_t rsvd2:7;
+	};
+	uint32_t val;
+};
+
+struct dword3 {
+	uint32_t  src_hi:16;
+	uint32_t  src_mem:2;
+	uint32_t  lsb_cxt_id:8;
+	uint32_t  rsvd1:5;
+	uint32_t  fixed:1;
+};
+
+struct dword5 {
+	uint32_t  dst_hi:16;
+	uint32_t  dst_mem:2;
+	uint32_t  rsvd1:13;
+	uint32_t  fixed:1;
+};
+
+struct ptdma_desc {
+	union dword0 dw0;
+	uint32_t length;
+	uint32_t src_lo;
+	struct dword3 dw3;
+	uint32_t dst_lo;
+	struct dword5 dw5;
+	uint32_t rsvd1;
+	uint32_t rsvd2;
+};
+
+
+static inline uint32_t
+low32_value(unsigned long addr)
+{
+	return ((uint64_t)addr) & 0x0ffffffff;
+}
+
+static inline uint32_t
+high32_value(unsigned long addr)
+{
+	return ((uint64_t)addr >> 32) & 0x00000ffff;
+}
+
+#endif
diff --git a/drivers/raw/ptdma/ptdma_rawdev_test.c b/drivers/raw/ptdma/ptdma_rawdev_test.c
new file mode 100644
index 0000000000..fbbcd66c8d
--- /dev/null
+++ b/drivers/raw/ptdma/ptdma_rawdev_test.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#include <unistd.h>
+#include <inttypes.h>
+#include <rte_mbuf.h>
+#include "rte_rawdev.h"
+#include "rte_ptdma_rawdev.h"
+#include "ptdma_pmd_private.h"
+
+#define MAX_SUPPORTED_RAWDEVS 16
+#define TEST_SKIPPED 77
+
+
+static struct rte_mempool *pool;
+static unsigned short expected_ring_size[MAX_SUPPORTED_RAWDEVS];
+
+#define PRINT_ERR(...) print_err(__func__, __LINE__, __VA_ARGS__)
+
+static inline int
+__rte_format_printf(3, 4)
+print_err(const char *func, int lineno, const char *format, ...)
+{
+	va_list ap;
+	int ret;
+
+	ret = fprintf(stderr, "In %s:%d - ", func, lineno);
+	va_start(ap, format);
+	ret += vfprintf(stderr, format, ap);
+	va_end(ap);
+
+	return ret;
+}
+
+static int
+test_enqueue_copies(int dev_id)
+{
+	const unsigned int length = 1024;
+	unsigned int i = 0;
+	do {
+		struct rte_mbuf *src, *dst;
+		char *src_data, *dst_data;
+		struct rte_mbuf *completed[2] = {0};
+
+		/* test doing a single copy */
+		src = rte_pktmbuf_alloc(pool);
+		dst = rte_pktmbuf_alloc(pool);
+		src->data_len = src->pkt_len = length;
+		dst->data_len = dst->pkt_len = length;
+		src_data = rte_pktmbuf_mtod(src, char *);
+		dst_data = rte_pktmbuf_mtod(dst, char *);
+
+		for (i = 0; i < length; i++)
+			src_data[i] = rand() & 0xFF;
+
+		if (rte_ptdma_enqueue_copy(dev_id,
+				src->buf_iova + src->data_off,
+				dst->buf_iova + dst->data_off,
+				length,
+				(uintptr_t)src,
+				(uintptr_t)dst) != 1) {
+			PRINT_ERR("Error with rte_ptdma_enqueue_copy - 1\n");
+			return -1;
+		}
+		rte_ptdma_perform_ops(dev_id);
+		usleep(10);
+
+		if (rte_ptdma_completed_ops(dev_id, 1, (void *)&completed[0],
+				(void *)&completed[1]) != 1) {
+			PRINT_ERR("Error with rte_ptdma_completed_ops - 1\n");
+			return -1;
+		}
+		if (completed[0] != src || completed[1] != dst) {
+			PRINT_ERR("Error with completions: got (%p, %p), not (%p, %p)\n",
+					completed[0], completed[1], src, dst);
+			return -1;
+		}
+
+		for (i = 0; i < length; i++)
+			if (dst_data[i] != src_data[i]) {
+				PRINT_ERR("Data mismatch at char %u - 1\n", i);
+				return -1;
+			}
+		rte_pktmbuf_free(src);
+		rte_pktmbuf_free(dst);
+
+
+	} while (0);
+
+	/* test doing multiple copies */
+	do {
+		struct rte_mbuf *srcs[32], *dsts[32];
+		struct rte_mbuf *completed_src[64];
+		struct rte_mbuf *completed_dst[64];
+		unsigned int j;
+
+		for (i = 0; i < RTE_DIM(srcs) ; i++) {
+			char *src_data;
+
+			srcs[i] = rte_pktmbuf_alloc(pool);
+			dsts[i] = rte_pktmbuf_alloc(pool);
+			srcs[i]->data_len = srcs[i]->pkt_len = length;
+			dsts[i]->data_len = dsts[i]->pkt_len = length;
+			src_data = rte_pktmbuf_mtod(srcs[i], char *);
+
+			for (j = 0; j < length; j++)
+				src_data[j] = rand() & 0xFF;
+
+			if (rte_ptdma_enqueue_copy(dev_id,
+					srcs[i]->buf_iova + srcs[i]->data_off,
+					dsts[i]->buf_iova + dsts[i]->data_off,
+					length,
+					(uintptr_t)srcs[i],
+					(uintptr_t)dsts[i]) != 1) {
+				PRINT_ERR("Error with rte_ptdma_enqueue_copy for buffer %u\n",
+						i);
+				return -1;
+			}
+		}
+		rte_ptdma_perform_ops(dev_id);
+		usleep(100);
+
+		if (rte_ptdma_completed_ops(dev_id, 64, (void *)completed_src,
+				(void *)completed_dst) != RTE_DIM(srcs)) {
+			PRINT_ERR("Error with rte_ptdma_completed_ops\n");
+			return -1;
+		}
+
+		for (i = 0; i < RTE_DIM(srcs) ; i++) {
+			char *src_data, *dst_data;
+			if (completed_src[i] != srcs[i]) {
+				PRINT_ERR("Error with source pointer %u\n", i);
+				return -1;
+			}
+			if (completed_dst[i] != dsts[i]) {
+				PRINT_ERR("Error with dest pointer %u\n", i);
+				return -1;
+			}
+
+			src_data = rte_pktmbuf_mtod(srcs[i], char *);
+			dst_data = rte_pktmbuf_mtod(dsts[i], char *);
+			for (j = 0; j < length; j++)
+				if (src_data[j] != dst_data[j]) {
+					PRINT_ERR("Error with copy of packet %u, byte %u\n",
+							i, j);
+					return -1;
+				}
+
+			rte_pktmbuf_free(srcs[i]);
+			rte_pktmbuf_free(dsts[i]);
+		}
+
+	} while (0);
+
+	return 0;
+}
+
+int
+ptdma_rawdev_test(uint16_t dev_id)
+{
+#define PTDMA_TEST_RINGSIZE 512
+	struct rte_ptdma_rawdev_config p = { .ring_size = -1 };
+	struct rte_rawdev_info info = { .dev_private = &p };
+	struct rte_rawdev_xstats_name *snames = NULL;
+	uint64_t *stats = NULL;
+	unsigned int *ids = NULL;
+	unsigned int nb_xstats;
+	unsigned int i;
+
+	if (dev_id >= MAX_SUPPORTED_RAWDEVS) {
+		printf("Skipping test. Cannot test rawdevs with ids of %d or greater\n",
+				MAX_SUPPORTED_RAWDEVS);
+		return TEST_SKIPPED;
+	}
+
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
+	if (p.ring_size != expected_ring_size[dev_id]) {
+		PRINT_ERR("Error, initial ring size is not as expected (Actual: %d, Expected: %d)\n",
+				(int)p.ring_size, expected_ring_size[dev_id]);
+		return -1;
+	}
+
+	p.ring_size = PTDMA_TEST_RINGSIZE;
+	if (rte_rawdev_configure(dev_id, &info, sizeof(p)) != 0) {
+		PRINT_ERR("Error with rte_rawdev_configure()\n");
+		return -1;
+	}
+	rte_rawdev_info_get(dev_id, &info, sizeof(p));
+	if (p.ring_size != PTDMA_TEST_RINGSIZE) {
+		PRINT_ERR("Error, ring size is not %d (%d)\n",
+				PTDMA_TEST_RINGSIZE, (int)p.ring_size);
+		return -1;
+	}
+	expected_ring_size[dev_id] = p.ring_size;
+
+	if (rte_rawdev_start(dev_id) != 0) {
+		PRINT_ERR("Error with rte_rawdev_start()\n");
+		return -1;
+	}
+
+	pool = rte_pktmbuf_pool_create("TEST_PTDMA_POOL",
+			256, /* n == num elements */
+			32,  /* cache size */
+			0,   /* priv size */
+			2048, /* data room size */
+			info.socket_id);
+	if (pool == NULL) {
+		PRINT_ERR("Error with mempool creation\n");
+		return -1;
+	}
+
+	/* allocate memory for xstats names and values */
+	nb_xstats = rte_rawdev_xstats_names_get(dev_id, NULL, 0);
+
+	snames = malloc(sizeof(*snames) * nb_xstats);
+	if (snames == NULL) {
+		PRINT_ERR("Error allocating xstat names memory\n");
+		goto err;
+	}
+	rte_rawdev_xstats_names_get(dev_id, snames, nb_xstats);
+
+	ids = malloc(sizeof(*ids) * nb_xstats);
+	if (ids == NULL) {
+		PRINT_ERR("Error allocating xstat ids memory\n");
+		goto err;
+	}
+	for (i = 0; i < nb_xstats; i++)
+		ids[i] = i;
+
+	stats = malloc(sizeof(*stats) * nb_xstats);
+	if (stats == NULL) {
+		PRINT_ERR("Error allocating xstat memory\n");
+		goto err;
+	}
+
+	/* run the test cases */
+	printf("Running Copy Tests\n");
+	for (i = 0; i < 100; i++) {
+		unsigned int j;
+
+		if (test_enqueue_copies(dev_id) != 0)
+			goto err;
+
+		rte_rawdev_xstats_get(dev_id, ids, stats, nb_xstats);
+		for (j = 0; j < nb_xstats; j++)
+			printf("%s: %"PRIu64"   ", snames[j].name, stats[j]);
+		printf("\r");
+	}
+	printf("\n");
+
+	rte_rawdev_stop(dev_id);
+	if (rte_rawdev_xstats_reset(dev_id, NULL, 0) != 0) {
+		PRINT_ERR("Error resetting xstat values\n");
+		goto err;
+	}
+
+	rte_mempool_free(pool);
+	free(snames);
+	free(stats);
+	free(ids);
+	return 0;
+
+err:
+	rte_rawdev_stop(dev_id);
+	rte_rawdev_xstats_reset(dev_id, NULL, 0);
+	rte_mempool_free(pool);
+	free(snames);
+	free(stats);
+	free(ids);
+	return -1;
+}
diff --git a/drivers/raw/ptdma/rte_ptdma_rawdev.h b/drivers/raw/ptdma/rte_ptdma_rawdev.h
new file mode 100644
index 0000000000..84eccbc4e8
--- /dev/null
+++ b/drivers/raw/ptdma/rte_ptdma_rawdev.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_PTDMA_RAWDEV_H_
+#define _RTE_PTDMA_RAWDEV_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_ptdma_rawdev.h
+ *
+ * Definitions for using the ptdma rawdev device driver
+ *
+ * @warning
+ * @b EXPERIMENTAL: these structures and APIs may change without prior notice
+ */
+
+#include <rte_common.h>
+
+/** Name of the device driver */
+#define PTDMA_PMD_RAWDEV_NAME rawdev_ptdma
+/** String reported as the device driver name by rte_rawdev_info_get() */
+#define PTDMA_PMD_RAWDEV_NAME_STR "rawdev_ptdma"
+
+/**
+ * Configuration structure for a ptdma rawdev instance
+ *
+ * This structure is to be passed as the ".dev_private" parameter when
+ * calling the rte_rawdev_info_get() and rte_rawdev_configure() APIs on
+ * a ptdma rawdev instance.
+ */
+struct rte_ptdma_rawdev_config {
+	unsigned short ring_size; /**< size of job submission descriptor ring */
+	bool hdls_disable;    /**< if set, ignore user-supplied handle params */
+};
+
+/**
+ * Enqueue a copy operation onto the ptdma device
+ *
+ * This queues up a copy operation to be performed by hardware, but does not
+ * trigger hardware to begin that operation.
+ *
+ * @param dev_id
+ *   The rawdev device id of the ptdma instance
+ * @param src
+ *   The physical address of the source buffer
+ * @param dst
+ *   The physical address of the destination buffer
+ * @param length
+ *   The length of the data to be copied
+ * @param src_hdl
+ *   An opaque handle for the source data, to be returned when this operation
+ *   has been completed and the user polls for the completion details.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @param dst_hdl
+ *   An opaque handle for the destination data, to be returned when this
+ *   operation has been completed and the user polls for the completion details.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @return
+ *   Number of operations enqueued, either 0 or 1
+ */
+static inline int
+__rte_experimental
+rte_ptdma_enqueue_copy(int dev_id, phys_addr_t src, phys_addr_t dst,
+		unsigned int length, uintptr_t src_hdl, uintptr_t dst_hdl);
+
+
+/**
+ * Trigger hardware to begin performing enqueued operations
+ *
+ * This API is used to write to the hardware to trigger it
+ * to begin the operations previously enqueued by rte_ptdma_enqueue_copy()
+ *
+ * @param dev_id
+ *   The rawdev device id of the ptdma instance
+ */
+static inline void
+__rte_experimental
+rte_ptdma_perform_ops(int dev_id);
+
+/**
+ * Returns details of operations that have been completed
+ *
+ * This function returns the number of newly completed operations.
+ *
+ * @param dev_id
+ *   The rawdev device id of the ptdma instance
+ * @param max_copies
+ *   The number of entries which can fit in the src_hdls and dst_hdls
+ *   arrays, i.e. max number of completed operations to report.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @param src_hdls
+ *   Array to hold the source handle parameters of the completed ops.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @param dst_hdls
+ *   Array to hold the destination handle parameters of the completed ops.
+ *   NOTE: If hdls_disable configuration option for the device is set, this
+ *   parameter is ignored.
+ * @return
+ *   -1 on error, with rte_errno set appropriately.
+ *   Otherwise number of completed operations i.e. number of entries written
+ *   to the src_hdls and dst_hdls array parameters.
+ */
+static inline int
+__rte_experimental
+rte_ptdma_completed_ops(int dev_id, uint8_t max_copies,
+		uintptr_t *src_hdls, uintptr_t *dst_hdls);
+
+
+/* include the implementation details from a separate file */
+#include "rte_ptdma_rawdev_fns.h"
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_PTDMA_RAWDEV_H_ */
diff --git a/drivers/raw/ptdma/rte_ptdma_rawdev_fns.h b/drivers/raw/ptdma/rte_ptdma_rawdev_fns.h
new file mode 100644
index 0000000000..f4dced3bef
--- /dev/null
+++ b/drivers/raw/ptdma/rte_ptdma_rawdev_fns.h
@@ -0,0 +1,298 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Advanced Micro Devices, Inc. All rights reserved.
+ */
+#ifndef _RTE_PTDMA_RAWDEV_FNS_H_
+#define _RTE_PTDMA_RAWDEV_FNS_H_
+
+#include <x86intrin.h>
+#include <rte_rawdev.h>
+#include <rte_memzone.h>
+#include <rte_prefetch.h>
+#include "ptdma_rawdev_spec.h"
+#include "ptdma_pmd_private.h"
+
+/**
+ * @internal
+ * some statistics for tracking, if added/changed update xstats fns
+ */
+struct rte_ptdma_xstats {
+	uint64_t enqueue_failed;
+	uint64_t enqueued;
+	uint64_t started;
+	uint64_t completed;
+};
+
+/**
+ * @internal
+ * Structure representing a PTDMA device instance
+ */
+struct rte_ptdma_rawdev {
+	struct rte_rawdev *rawdev;
+	struct rte_ptdma_xstats xstats;
+	unsigned short ring_size;
+
+	bool hdls_disable;
+	__m128i *hdls; /* completion handles for returning to user */
+	unsigned short next_read;
+	unsigned short next_write;
+
+	int id; /**< ptdma dev id on platform */
+	struct ptdma_cmd_queue cmd_q[MAX_HW_QUEUES]; /**< ptdma queue */
+	int cmd_q_count; /**< number of ptdma queues */
+	struct rte_pci_device pci; /**< ptdma pci identifier */
+	int qidx;
+
+};
+
+static __rte_always_inline void
+ptdma_dump_registers(int dev_id)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	uint32_t cur_head_offset;
+	uint32_t cur_tail_offset;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+
+	PTDMA_PMD_DEBUG("cmd_q->head_offset	= %d\n", cmd_q->head_offset);
+	PTDMA_PMD_DEBUG("cmd_q->tail_offset	= %d\n", cmd_q->tail_offset);
+	PTDMA_PMD_DEBUG("cmd_q->id		= %" PRIx64 "\n", cmd_q->id);
+	PTDMA_PMD_DEBUG("cmd_q->qidx		= %" PRIx64 "\n", cmd_q->qidx);
+	PTDMA_PMD_DEBUG("cmd_q->qsize		= %" PRIx64 "\n", cmd_q->qsize);
+
+	cur_head_offset = PTDMA_READ_REG(cmd_q->reg_base,
+			CMD_Q_HEAD_LO_BASE);
+	cur_tail_offset = PTDMA_READ_REG(cmd_q->reg_base,
+			CMD_Q_TAIL_LO_BASE);
+
+	PTDMA_PMD_DEBUG("cur_head_offset	= %d\n", cur_head_offset);
+	PTDMA_PMD_DEBUG("cur_tail_offset	= %d\n", cur_tail_offset);
+	PTDMA_PMD_DEBUG("Q_CONTROL_BASE		= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_CONTROL_BASE));
+	PTDMA_PMD_DEBUG("Q_STATUS_BASE		= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_INT_STATUS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_INT_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_DMA_STATUS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_DMA_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_DMA_RD_STS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_DMA_READ_STATUS_BASE));
+	PTDMA_PMD_DEBUG("Q_DMA_WRT_STS_BASE	= 0x%x\n",
+						PTDMA_READ_REG(cmd_q->reg_base,
+						CMD_Q_DMA_WRITE_STATUS_BASE));
+}
+
+static __rte_always_inline void
+ptdma_perform_passthru(struct ptdma_passthru *pst,
+		struct ptdma_cmd_queue *cmd_q)
+{
+	struct ptdma_desc *desc;
+	union ptdma_function function;
+
+	desc = &cmd_q->qbase_desc[cmd_q->qidx];
+
+	PTDMA_CMD_ENGINE(desc) = PTDMA_ENGINE_PASSTHRU;
+
+	PTDMA_CMD_SOC(desc) = 0;
+	PTDMA_CMD_IOC(desc) = 0;
+	PTDMA_CMD_INIT(desc) = 0;
+	PTDMA_CMD_EOM(desc) = 0;
+	PTDMA_CMD_PROT(desc) = 0;
+
+	function.raw = 0;
+	PTDMA_PT_BYTESWAP(&function) = pst->byte_swap;
+	PTDMA_PT_BITWISE(&function) = pst->bit_mod;
+	PTDMA_CMD_FUNCTION(desc) = function.raw;
+	PTDMA_CMD_LEN(desc) = pst->len;
+
+	PTDMA_CMD_SRC_LO(desc) = (uint32_t)(pst->src_addr);
+	PTDMA_CMD_SRC_HI(desc) = high32_value(pst->src_addr);
+	PTDMA_CMD_SRC_MEM(desc) = PTDMA_MEMTYPE_SYSTEM;
+
+	PTDMA_CMD_DST_LO(desc) = (uint32_t)(pst->dest_addr);
+	PTDMA_CMD_DST_HI(desc) = high32_value(pst->dest_addr);
+	PTDMA_CMD_DST_MEM(desc) = PTDMA_MEMTYPE_SYSTEM;
+
+	cmd_q->qidx = (cmd_q->qidx + 1) % COMMANDS_PER_QUEUE;
+
+}
+
+
+static __rte_always_inline int
+ptdma_ops_to_enqueue(int dev_id, uint32_t op, uint64_t src, phys_addr_t dst,
+		unsigned int length, uintptr_t src_hdl, uintptr_t dst_hdl)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	struct ptdma_passthru pst;
+	uint32_t cmd_q_ctrl;
+	unsigned short write	= ptdma_priv->next_write;
+	unsigned short read	= ptdma_priv->next_read;
+	unsigned short mask	= ptdma_priv->ring_size - 1;
+	unsigned short space	= mask + read - write;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+	cmd_q_ctrl = PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE);
+
+	if (cmd_q_ctrl & CMD_Q_RUN) {
+		/* Turn the queue off using control register */
+		PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+				cmd_q_ctrl & ~CMD_Q_RUN);
+		do {
+			cmd_q_ctrl = PTDMA_READ_REG(cmd_q->reg_base,
+					CMD_Q_CONTROL_BASE);
+		} while (!(cmd_q_ctrl & CMD_Q_HALT));
+	}
+
+	if (space == 0 || op != 0) {
+		/* Ring is full, or op is not a plain copy (op 0) */
+		if (op != 0)
+			PTDMA_PMD_DEBUG("Operation not supported by PTDMA\n");
+		ptdma_priv->xstats.enqueue_failed++;
+		return 0;
+	}
+
+	ptdma_priv->next_write = write + 1;
+	write &= mask;
+
+	pst.src_addr	= src;
+
+	pst.dest_addr	= dst;
+	pst.len		= length;
+	pst.bit_mod	= PTDMA_PASSTHRU_BITWISE_NOOP;
+	pst.byte_swap	= PTDMA_PASSTHRU_BYTESWAP_NOOP;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+
+	cmd_q->head_offset = (uint32_t)(PTDMA_READ_REG(cmd_q->reg_base,
+				CMD_Q_HEAD_LO_BASE));
+
+	ptdma_perform_passthru(&pst, cmd_q);
+
+	cmd_q->tail_offset = (uint32_t)(cmd_q->qbase_phys_addr + cmd_q->qidx *
+				Q_DESC_SIZE);
+	rte_wmb();
+
+	/* Write the new tail address back to the queue register */
+	PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_TAIL_LO_BASE,
+			cmd_q->tail_offset);
+
+	if (!ptdma_priv->hdls_disable)
+		ptdma_priv->hdls[write] =
+					_mm_set_epi64x((int64_t)dst_hdl,
+							(int64_t)src_hdl);
+	ptdma_priv->xstats.enqueued++;
+
+	return 1;
+}
+
+static __rte_always_inline int
+ptdma_ops_to_dequeue(int dev_id, int max_copies, uintptr_t *src_hdls,
+						uintptr_t *dst_hdls)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	uint32_t cur_head_offset;
+	short end_read;
+	unsigned short count;
+	unsigned short read	= ptdma_priv->next_read;
+	unsigned short write	= ptdma_priv->next_write;
+	unsigned short mask	= ptdma_priv->ring_size - 1;
+	int i = 0;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+
+	cur_head_offset = PTDMA_READ_REG(cmd_q->reg_base,
+			CMD_Q_HEAD_LO_BASE);
+
+	end_read = cur_head_offset - cmd_q->head_offset;
+
+	if (end_read < 0)
+		end_read = COMMANDS_PER_QUEUE - cmd_q->head_offset
+				+ cur_head_offset;
+	if (end_read < max_copies)
+		return 0;
+
+	if (end_read != 0)
+		count = (write - (read & mask)) & mask;
+	else
+		return 0;
+
+	if (ptdma_priv->hdls_disable) {
+		read += count;
+		goto end;
+	}
+
+	if (count > max_copies)
+		count = max_copies;
+
+	for (; i < count - 1; i += 2, read += 2) {
+		__m128i hdls0 =
+			_mm_load_si128(&ptdma_priv->hdls[read & mask]);
+		__m128i hdls1 =
+			_mm_load_si128(&ptdma_priv->hdls[(read + 1) & mask]);
+		_mm_storeu_si128((__m128i *)&src_hdls[i],
+				_mm_unpacklo_epi64(hdls0, hdls1));
+		_mm_storeu_si128((__m128i *)&dst_hdls[i],
+				_mm_unpackhi_epi64(hdls0, hdls1));
+	}
+
+	for (; i < count; i++, read++) {
+		uintptr_t *hdls =
+			(uintptr_t *)&ptdma_priv->hdls[read & mask];
+		src_hdls[i] = hdls[0];
+		dst_hdls[i] = hdls[1];
+	}
+end:
+	ptdma_priv->next_read = read;
+	ptdma_priv->xstats.completed += count;
+
+	return count;
+}
+
+static inline int
+rte_ptdma_enqueue_copy(int dev_id, phys_addr_t src, phys_addr_t dst,
+		unsigned int length, uintptr_t src_hdl, uintptr_t dst_hdl)
+{
+	return ptdma_ops_to_enqueue(dev_id, 0, src, dst, length,
+					src_hdl, dst_hdl);
+}
+
+static inline void
+rte_ptdma_perform_ops(int dev_id)
+{
+	struct rte_ptdma_rawdev *ptdma_priv =
+		(struct rte_ptdma_rawdev *)rte_rawdevs[dev_id].dev_private;
+	struct ptdma_cmd_queue *cmd_q;
+	uint32_t cmd_q_ctrl;
+
+	cmd_q = &ptdma_priv->cmd_q[0];
+	cmd_q_ctrl = PTDMA_READ_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE);
+
+	/* Turn the queue on using control register */
+	PTDMA_WRITE_REG(cmd_q->reg_base, CMD_Q_CONTROL_BASE,
+			cmd_q_ctrl | CMD_Q_RUN);
+
+	ptdma_priv->xstats.started = ptdma_priv->xstats.enqueued;
+}
+
+static inline int
+rte_ptdma_completed_ops(int dev_id, uint8_t max_copies,
+		uintptr_t *src_hdls, uintptr_t *dst_hdls)
+{
+	int ret = 0;
+
+	ret = ptdma_ops_to_dequeue(dev_id, max_copies, src_hdls, dst_hdls);
+
+	return ret;
+}
+
+#endif /* _RTE_PTDMA_RAWDEV_FNS_H_ */
diff --git a/drivers/raw/ptdma/version.map b/drivers/raw/ptdma/version.map
new file mode 100644
index 0000000000..45917242ca
--- /dev/null
+++ b/drivers/raw/ptdma/version.map
@@ -0,0 +1,5 @@
+DPDK_21 {
+
+       local: *;
+};
+
diff --git a/usertools/dpdk-devbind.py b/usertools/dpdk-devbind.py
index 74d16e4c4b..30c11e92ba 100755
--- a/usertools/dpdk-devbind.py
+++ b/usertools/dpdk-devbind.py
@@ -65,6 +65,8 @@
                  'SVendor': None, 'SDevice': None}
 intel_ntb_icx = {'Class': '06', 'Vendor': '8086', 'Device': '347e',
                  'SVendor': None, 'SDevice': None}
+amd_ptdma = {'Class': '10', 'Vendor': '1022', 'Device': '1498',
+             'SVendor': None, 'SDevice': None}
 
 network_devices = [network_class, cavium_pkx, avp_vnic, ifpga_class]
 baseband_devices = [acceleration_class]
@@ -74,7 +76,7 @@
 compress_devices = [cavium_zip]
 regex_devices = [octeontx2_ree]
 misc_devices = [cnxk_bphy, cnxk_bphy_cgx, intel_ioat_bdw, intel_ioat_skx, intel_ioat_icx, intel_idxd_spr,
-                intel_ntb_skx, intel_ntb_icx,
+                intel_ntb_skx, intel_ntb_icx, amd_ptdma,
                 octeontx2_dma]
 
 # global dict ethernet devices present. Dictionary indexed by PCI address.
-- 
2.25.1
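
The enqueue and dequeue paths in rte_ptdma_rawdev_fns.h track the handle
ring with free-running 16-bit next_read/next_write counters and a mask of
ring_size - 1. The following is a minimal standalone sketch of that index
arithmetic, not code from the patch: the helper names ring_space() and
ring_pending() are hypothetical, and the sketch assumes ring_size is a power
of two, which the mask computation implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: free slots, mirroring "space = mask + read - write". */
static inline uint16_t
ring_space(uint16_t ring_size, uint16_t next_read, uint16_t next_write)
{
	uint16_t mask = ring_size - 1;

	/* The masking approach only works for power-of-two ring sizes. */
	assert(ring_size != 0 && (ring_size & mask) == 0);
	/* Truncation to 16 bits handles counter wrap-around. */
	return (uint16_t)(mask + next_read - next_write);
}

/* Hypothetical helper: outstanding entries, mirroring the dequeue count. */
static inline uint16_t
ring_pending(uint16_t ring_size, uint16_t next_read, uint16_t next_write)
{
	uint16_t mask = ring_size - 1;

	return (uint16_t)((next_write - (next_read & mask)) & mask);
}
```

With this scheme one slot always stays unused so that a full ring can be
told apart from an empty one: a ring of 8 entries accepts at most 7
outstanding operations before the enqueue path reports no space.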



end of thread, other threads:[~2023-06-28  9:08 UTC | newest]

Thread overview: 9+ messages
2021-09-06 16:55 [dpdk-dev] [RFC PATCH v2] raw/ptdma: introduce ptdma driver Selwin Sebastian
2021-09-06 17:17 ` David Marchand
2021-10-27 14:59   ` Thomas Monjalon
2021-10-28 14:54     ` Sebastian, Selwin
2022-06-15 14:35       ` Thomas Monjalon
2022-06-16 14:27         ` Sebastian, Selwin
2023-06-28  9:08           ` Ferruh Yigit
  -- strict thread matches above, loose matches on Subject: below --
2021-09-06 15:59 Selwin Sebastian
2021-09-06 14:34 Selwin Sebastian
