ABI - search results

DPDK patches and discussions

* [dpdk-dev] [PATCH] eal/service: remove experimental tags
@ 2018-04-05 13:15  9% Harry van Haaren
  0 siblings, 0 replies; 200+ results
From: Harry van Haaren @ 2018-04-05 13:15 UTC (permalink / raw)
  To: dev; +Cc: Harry van Haaren

This commit removes the experimental tags from the
service cores functions; they now become part of the
main DPDK API/ABI.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>

---
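
For readers unfamiliar with the tagging mechanism, the following is a
minimal sketch of how an "experimental" marker of this kind can work,
assuming a GCC/Clang-style deprecated attribute. DEMO_EXPERIMENTAL,
DEMO_ALLOW_EXPERIMENTAL_API and the demo_* functions are hypothetical
names made up for illustration; the real __rte_experimental definition
lives in the DPDK headers and may differ.

```c
#include <stdint.h>

/*
 * Hypothetical opt-in switch, playing the role that
 * -DALLOW_EXPERIMENTAL_API plays in the Makefile hunks of this patch.
 * Remove this define to get a compiler diagnostic at each call site
 * of a tagged function.
 */
#define DEMO_ALLOW_EXPERIMENTAL_API

#ifdef DEMO_ALLOW_EXPERIMENTAL_API
/* The application opted in: the tag expands to nothing. */
#define DEMO_EXPERIMENTAL
#else
/* No opt-in: calling a tagged function triggers a deprecation warning. */
#define DEMO_EXPERIMENTAL \
	__attribute__((deprecated("symbol is not yet part of the stable ABI")))
#endif

/* Tagged declaration: the shape of a prototype before a patch like this. */
DEMO_EXPERIMENTAL uint32_t demo_get_count_experimental(void);

/* Untagged declaration: the shape of a prototype after the tag is removed. */
uint32_t demo_get_count(void);

/* Both definitions behave identically; only the declaration tag differs. */
uint32_t demo_get_count_experimental(void)
{
	return 3;
}

uint32_t demo_get_count(void)
{
	return 3;
}
```

Under this model, dropping the tag changes nothing at run time. It only
removes the need for callers to opt in at build time, which is why the
example application no longer needs the extra CFLAGS entry.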

 MAINTAINERS                                        |   2 +-
 doc/guides/rel_notes/release_18_05.rst             |   7 ++
 examples/service_cores/Makefile                    |   3 -
 examples/service_cores/meson.build                 |   1 -
 lib/librte_eal/common/include/rte_service.h        | 117 ++++-----------------
 .../common/include/rte_service_component.h         |  38 ++-----
 lib/librte_eal/common/rte_service.c                |  55 +++++-----
 lib/librte_eal/rte_eal_version.map                 |  38 ++++---
 8 files changed, 87 insertions(+), 174 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index ed3251d..d10c27d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -156,7 +156,7 @@ F: test/test/test_mp_secondary.c
 F: examples/multi_process/
 F: doc/guides/sample_app_ug/multi_process.rst
 
-Service Cores - EXPERIMENTAL
+Service Cores
 M: Harry van Haaren <harry.van.haaren@intel.com>
 F: lib/librte_eal/common/include/rte_service.h
 F: lib/librte_eal/common/include/rte_service_component.h
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index e5fac1c..940a308 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -72,6 +72,13 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Service Cores is no longer marked as experimental.**
+
+  The service cores functions are no longer marked as experimental, and have
+  become part of the normal DPDK API and ABI. Any future ABI changes will be
+  announced at least one release before the ABI change is made. There are no
+  ABI breaking changes planned.
+
 
 ABI Changes
 -----------
diff --git a/examples/service_cores/Makefile b/examples/service_cores/Makefile
index 3156e35..a4d6b7b 100644
--- a/examples/service_cores/Makefile
+++ b/examples/service_cores/Makefile
@@ -23,8 +23,6 @@ CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
 LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
 LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
 
-CFLAGS += -DALLOW_EXPERIMENTAL_API
-
 build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
 	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
 
@@ -50,7 +48,6 @@ RTE_TARGET ?= x86_64-native-linuxapp-gcc
 
 include $(RTE_SDK)/mk/rte.vars.mk
 
-CFLAGS += -DALLOW_EXPERIMENTAL_API
 CFLAGS += $(WERROR_FLAGS)
 
 # workaround for a gcc bug with noreturn attribute
diff --git a/examples/service_cores/meson.build b/examples/service_cores/meson.build
index 2b0a250..c34e11e 100644
--- a/examples/service_cores/meson.build
+++ b/examples/service_cores/meson.build
@@ -6,7 +6,6 @@
 # To build this example as a standalone application with an already-installed
 # DPDK instance, use 'make'
 
-allow_experimental_apis = true
 sources = files(
 	'main.c'
 )
diff --git a/lib/librte_eal/common/include/rte_service.h b/lib/librte_eal/common/include/rte_service.h
index 211eb37..aea4d91 100644
--- a/lib/librte_eal/common/include/rte_service.h
+++ b/lib/librte_eal/common/include/rte_service.h
@@ -47,9 +47,6 @@ extern "C" {
 #define RTE_SERVICE_CAP_MT_SAFE (1 << 0)
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  *  Return the number of services registered.
  *
  * The number of services registered can be passed to *rte_service_get_by_id*,
@@ -57,12 +54,9 @@ extern "C" {
  *
  * @return The number of services registered.
  */
-uint32_t __rte_experimental rte_service_get_count(void);
+uint32_t rte_service_get_count(void);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Return the id of a service by name.
  *
  * This function provides the id of the service using the service name as
@@ -84,24 +78,17 @@ uint32_t __rte_experimental rte_service_get_count(void);
  * @retval -EINVAL Null *service_id* pointer provided
  * @retval -ENODEV No such service registered
  */
-int32_t __rte_experimental rte_service_get_by_name(const char *name,
-					       uint32_t *service_id);
+int32_t rte_service_get_by_name(const char *name, uint32_t *service_id);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Return the name of the service.
  *
  * @return A pointer to the name of the service. The returned pointer remains
  *         in ownership of the service, and the application must not free it.
  */
-const char __rte_experimental *rte_service_get_name(uint32_t id);
+const char *rte_service_get_name(uint32_t id);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Check if a service has a specific capability.
  *
  * This function returns if *service* has implements *capability*.
@@ -109,13 +96,9 @@ const char __rte_experimental *rte_service_get_name(uint32_t id);
  * @retval 1 Capability supported by this service instance
  * @retval 0 Capability not supported by this service instance
  */
-int32_t __rte_experimental rte_service_probe_capability(uint32_t id,
-						    uint32_t capability);
+int32_t rte_service_probe_capability(uint32_t id, uint32_t capability);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Map or unmap a lcore to a service.
  *
  * Each core can be added or removed from running a specific service. This
@@ -134,13 +117,10 @@ int32_t __rte_experimental rte_service_probe_capability(uint32_t id,
  * @retval 0 lcore map updated successfully
  * @retval -EINVAL An invalid service or lcore was provided.
  */
-int32_t __rte_experimental rte_service_map_lcore_set(uint32_t service_id,
-				  uint32_t lcore, uint32_t enable);
+int32_t rte_service_map_lcore_set(uint32_t service_id, uint32_t lcore,
+		uint32_t enable);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Retrieve the mapping of an lcore to a service.
  *
  * @param service_id the service to apply the lcore to
@@ -150,13 +130,9 @@ int32_t __rte_experimental rte_service_map_lcore_set(uint32_t service_id,
  * @retval 0 lcore is not mapped to service
  * @retval -EINVAL An invalid service or lcore was provided.
  */
-int32_t __rte_experimental rte_service_map_lcore_get(uint32_t service_id,
-						 uint32_t lcore);
+int32_t rte_service_map_lcore_get(uint32_t service_id, uint32_t lcore);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Set the runstate of the service.
  *
  * Each service is either running or stopped. Setting a non-zero runstate
@@ -168,12 +144,9 @@ int32_t __rte_experimental rte_service_map_lcore_get(uint32_t service_id,
  * @retval 0 The service was successfully started
  * @retval -EINVAL Invalid service id
  */
-int32_t __rte_experimental rte_service_runstate_set(uint32_t id, uint32_t runstate);
+int32_t rte_service_runstate_set(uint32_t id, uint32_t runstate);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Get the runstate for the service with *id*. See *rte_service_runstate_set*
  * for details of runstates. A service can call this function to ensure that
  * the application has indicated that it will receive CPU cycles. Either a
@@ -186,12 +159,9 @@ int32_t __rte_experimental rte_service_runstate_set(uint32_t id, uint32_t runsta
  * @retval 0 Service is stopped
  * @retval -EINVAL Invalid service id
  */
-int32_t __rte_experimental rte_service_runstate_get(uint32_t id);
+int32_t rte_service_runstate_get(uint32_t id);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Enable or disable the check for a service-core being mapped to the service.
  * An application can disable the check when takes the responsibility to run a
  * service itself using *rte_service_run_iter_on_app_lcore*.
@@ -202,13 +172,9 @@ int32_t __rte_experimental rte_service_runstate_get(uint32_t id);
  * @retval 0 Success
  * @retval -EINVAL Invalid service ID
  */
-int32_t __rte_experimental rte_service_set_runstate_mapped_check(uint32_t id,
-							     int32_t enable);
+int32_t rte_service_set_runstate_mapped_check(uint32_t id, int32_t enable);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * This function runs a service callback from a non-service lcore.
  *
  * This function is designed to enable gradual porting to service cores, and
@@ -241,13 +207,10 @@ int32_t __rte_experimental rte_service_set_runstate_mapped_check(uint32_t id,
  * @retval -ENOEXEC Service is not in a run-able state
  * @retval -EINVAL Invalid service id
  */
-int32_t __rte_experimental rte_service_run_iter_on_app_lcore(uint32_t id,
+int32_t rte_service_run_iter_on_app_lcore(uint32_t id,
 		uint32_t serialize_multithread_unsafe);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Start a service core.
  *
  * Starting a core makes the core begin polling. Any services assigned to it
@@ -259,12 +222,9 @@ int32_t __rte_experimental rte_service_run_iter_on_app_lcore(uint32_t id,
  * @retval -EINVAL Failed to start core. The *lcore_id* passed in is not
  *          currently assigned to be a service core.
  */
-int32_t __rte_experimental rte_service_lcore_start(uint32_t lcore_id);
+int32_t rte_service_lcore_start(uint32_t lcore_id);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Stop a service core.
  *
  * Stopping a core makes the core become idle, but remains  assigned as a
@@ -278,12 +238,9 @@ int32_t __rte_experimental rte_service_lcore_start(uint32_t lcore_id);
  *          The application must stop the service first, and then stop the
  *          lcore.
  */
-int32_t __rte_experimental rte_service_lcore_stop(uint32_t lcore_id);
+int32_t rte_service_lcore_stop(uint32_t lcore_id);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Adds lcore to the list of service cores.
  *
  * This functions can be used at runtime in order to modify the service core
@@ -294,12 +251,9 @@ int32_t __rte_experimental rte_service_lcore_stop(uint32_t lcore_id);
  * @retval -EALREADY lcore is already added to the service core list
  * @retval -EINVAL Invalid lcore provided
  */
-int32_t __rte_experimental rte_service_lcore_add(uint32_t lcore);
+int32_t rte_service_lcore_add(uint32_t lcore);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Removes lcore from the list of service cores.
  *
  * This can fail if the core is not stopped, see *rte_service_core_stop*.
@@ -308,12 +262,9 @@ int32_t __rte_experimental rte_service_lcore_add(uint32_t lcore);
  * @retval -EBUSY Lcore is not stopped, stop service core before removing.
  * @retval -EINVAL failed to add lcore to service core mask.
  */
-int32_t __rte_experimental rte_service_lcore_del(uint32_t lcore);
+int32_t rte_service_lcore_del(uint32_t lcore);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Retrieve the number of service cores currently available.
  *
  * This function returns the integer count of service cores available. The
@@ -325,24 +276,18 @@ int32_t __rte_experimental rte_service_lcore_del(uint32_t lcore);
  *
  * @return The number of service cores currently configured.
  */
-int32_t __rte_experimental rte_service_lcore_count(void);
+int32_t rte_service_lcore_count(void);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Resets all service core mappings. This does not remove the service cores
  * from duty, just unmaps all services / cores, and stops() the service cores.
  * The runstate of services is not modified.
  *
  * @retval 0 Success
  */
-int32_t __rte_experimental rte_service_lcore_reset_all(void);
+int32_t rte_service_lcore_reset_all(void);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Enable or disable statistics collection for *service*.
  *
  * This function enables per core, per-service cycle count collection.
@@ -351,13 +296,9 @@ int32_t __rte_experimental rte_service_lcore_reset_all(void);
  * @retval 0 Success
  * @retval -EINVAL Invalid service pointer passed
  */
-int32_t __rte_experimental rte_service_set_stats_enable(uint32_t id,
-						    int32_t enable);
+int32_t rte_service_set_stats_enable(uint32_t id, int32_t enable);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Retrieve the list of currently enabled service cores.
  *
  * This function fills in an application supplied array, with each element
@@ -373,12 +314,9 @@ int32_t __rte_experimental rte_service_set_stats_enable(uint32_t id,
  *          service core list. No items have been populated, call this function
  *          with a size of at least *rte_service_core_count* items.
  */
-int32_t __rte_experimental rte_service_lcore_list(uint32_t array[], uint32_t n);
+int32_t rte_service_lcore_list(uint32_t array[], uint32_t n);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Get the numer of services running on the supplied lcore.
  *
  * @param lcore Id of the service core.
@@ -386,19 +324,16 @@ int32_t __rte_experimental rte_service_lcore_list(uint32_t array[], uint32_t n);
  * @retval -EINVAL Invalid lcore provided
  * @retval -ENOTSUP The provided lcore is not a service core.
  */
-int32_t __rte_experimental rte_service_lcore_count_services(uint32_t lcore);
+int32_t rte_service_lcore_count_services(uint32_t lcore);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Dumps any information available about the service. When id is UINT32_MAX,
  * this function dumps info for all services.
  *
  * @retval 0 Statistics have been successfully dumped
  * @retval -EINVAL Invalid service id provided
  */
-int32_t __rte_experimental rte_service_dump(FILE *f, uint32_t id);
+int32_t rte_service_dump(FILE *f, uint32_t id);
 
 /**
  * Returns the number of cycles that this service has consumed
@@ -411,28 +346,22 @@ int32_t __rte_experimental rte_service_dump(FILE *f, uint32_t id);
 #define RTE_SERVICE_ATTR_CALL_COUNT 1
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Get an attribute from a service.
  *
  * @retval 0 Success, the attribute value has been written to *attr_value*.
  *         -EINVAL Invalid id, attr_id or attr_value was NULL.
  */
-int32_t __rte_experimental rte_service_attr_get(uint32_t id, uint32_t attr_id,
+int32_t rte_service_attr_get(uint32_t id, uint32_t attr_id,
 		uint32_t *attr_value);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Reset all attribute values of a service.
  *
  * @param id The service to reset all statistics of
  * @retval 0 Successfully reset attributes
  *         -EINVAL Invalid service id provided
  */
-int32_t __rte_experimental rte_service_attr_reset_all(uint32_t id);
+int32_t rte_service_attr_reset_all(uint32_t id);
 
 #ifdef __cplusplus
 }
diff --git a/lib/librte_eal/common/include/rte_service_component.h b/lib/librte_eal/common/include/rte_service_component.h
index 9ba4aa2..c12adbc 100644
--- a/lib/librte_eal/common/include/rte_service_component.h
+++ b/lib/librte_eal/common/include/rte_service_component.h
@@ -13,17 +13,11 @@
 #include <rte_service.h>
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Signature of callback function to run a service.
  */
 typedef int32_t (*rte_service_func)(void *args);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * The specification of a service.
  *
  * This struct contains metadata about the service itself, the callback
@@ -47,9 +41,6 @@ struct rte_service_spec {
 };
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Register a new service.
  *
  * A service represents a component that the requires CPU time periodically to
@@ -73,14 +64,10 @@ struct rte_service_spec {
  *         -EINVAL Attempted to register an invalid service (eg, no callback
  *         set)
  */
-int32_t __rte_experimental
-rte_service_component_register(const struct rte_service_spec *spec,
-			       uint32_t *service_id);
+int32_t rte_service_component_register(const struct rte_service_spec *spec,
+		uint32_t *service_id);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Unregister a service component.
  *
  * The service being removed must be stopped before calling this function.
@@ -89,12 +76,9 @@ rte_service_component_register(const struct rte_service_spec *spec,
  * @retval -EBUSY The service is currently running, stop the service before
  *          calling unregister. No action has been taken.
  */
-int32_t __rte_experimental rte_service_component_unregister(uint32_t id);
+int32_t rte_service_component_unregister(uint32_t id);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Private function to allow EAL to initialized default mappings.
  *
  * This function iterates all the services, and maps then to the available
@@ -107,12 +91,9 @@ int32_t __rte_experimental rte_service_component_unregister(uint32_t id);
  * @retval -ENODEV Error in enabling service lcore on a service
  * @retval -ENOEXEC Error when starting services
  */
-int32_t __rte_experimental rte_service_start_with_defaults(void);
+int32_t rte_service_start_with_defaults(void);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Set the backend runstate of a component.
  *
  * This function allows services to be registered at startup, but not yet
@@ -124,13 +105,9 @@ int32_t __rte_experimental rte_service_start_with_defaults(void);
  *
  * @retval 0 Success
  */
-int32_t __rte_experimental rte_service_component_runstate_set(uint32_t id,
-							  uint32_t runstate);
+int32_t rte_service_component_runstate_set(uint32_t id, uint32_t runstate);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * Initialize the service library.
  *
  * In order to use the service library, it must be initialized. EAL initializes
@@ -142,14 +119,11 @@ int32_t __rte_experimental rte_service_component_runstate_set(uint32_t id,
 int32_t rte_service_init(void);
 
 /**
- * @warning
- * @b EXPERIMENTAL: this API may change without prior notice
- *
  * @internal Free up the memory that has been initialized.
  * This routine is to be invoked prior to process termination.
  *
  * @retval None
  */
-void __rte_experimental rte_service_finalize(void);
+void rte_service_finalize(void);
 
 #endif /* _RTE_SERVICE_PRIVATE_H_ */
diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index be9b5e6..73507aa 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -115,7 +115,7 @@ int32_t rte_service_init(void)
 	return -ENOMEM;
 }
 
-void __rte_experimental
+void
 rte_service_finalize(void)
 {
 	if (!rte_service_library_initialized)
@@ -161,7 +161,7 @@ service_mt_safe(struct rte_service_spec_impl *s)
 	return !!(s->spec.capabilities & RTE_SERVICE_CAP_MT_SAFE);
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_set_stats_enable(uint32_t id, int32_t enabled)
 {
 	struct rte_service_spec_impl *s;
@@ -175,7 +175,7 @@ rte_service_set_stats_enable(uint32_t id, int32_t enabled)
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_set_runstate_mapped_check(uint32_t id, int32_t enabled)
 {
 	struct rte_service_spec_impl *s;
@@ -189,13 +189,13 @@ rte_service_set_runstate_mapped_check(uint32_t id, int32_t enabled)
 	return 0;
 }
 
-uint32_t __rte_experimental
+uint32_t
 rte_service_get_count(void)
 {
 	return rte_service_count;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_get_by_name(const char *name, uint32_t *service_id)
 {
 	if (!service_id)
@@ -213,7 +213,7 @@ rte_service_get_by_name(const char *name, uint32_t *service_id)
 	return -ENODEV;
 }
 
-const char * __rte_experimental
+const char *
 rte_service_get_name(uint32_t id)
 {
 	struct rte_service_spec_impl *s;
@@ -221,7 +221,7 @@ rte_service_get_name(uint32_t id)
 	return s->spec.name;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_probe_capability(uint32_t id, uint32_t capability)
 {
 	struct rte_service_spec_impl *s;
@@ -229,7 +229,7 @@ rte_service_probe_capability(uint32_t id, uint32_t capability)
 	return !!(s->spec.capabilities & capability);
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_component_register(const struct rte_service_spec *spec,
 			       uint32_t *id_ptr)
 {
@@ -262,7 +262,7 @@ rte_service_component_register(const struct rte_service_spec *spec,
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_component_unregister(uint32_t id)
 {
 	uint32_t i;
@@ -283,7 +283,7 @@ rte_service_component_unregister(uint32_t id)
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_component_runstate_set(uint32_t id, uint32_t runstate)
 {
 	struct rte_service_spec_impl *s;
@@ -298,7 +298,7 @@ rte_service_component_runstate_set(uint32_t id, uint32_t runstate)
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_runstate_set(uint32_t id, uint32_t runstate)
 {
 	struct rte_service_spec_impl *s;
@@ -313,7 +313,7 @@ rte_service_runstate_set(uint32_t id, uint32_t runstate)
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_runstate_get(uint32_t id)
 {
 	struct rte_service_spec_impl *s;
@@ -374,7 +374,7 @@ service_run(uint32_t i, struct core_state *cs, uint64_t service_mask)
 	return 0;
 }
 
-int32_t __rte_experimental rte_service_run_iter_on_app_lcore(uint32_t id,
+int32_t rte_service_run_iter_on_app_lcore(uint32_t id,
 		uint32_t serialize_mt_unsafe)
 {
 	/* run service on calling core, using all-ones as the service mask */
@@ -430,7 +430,7 @@ rte_service_runner_func(void *arg)
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_lcore_count(void)
 {
 	int32_t count = 0;
@@ -440,7 +440,7 @@ rte_service_lcore_count(void)
 	return count;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_lcore_list(uint32_t array[], uint32_t n)
 {
 	uint32_t count = rte_service_lcore_count();
@@ -463,7 +463,7 @@ rte_service_lcore_list(uint32_t array[], uint32_t n)
 	return count;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_lcore_count_services(uint32_t lcore)
 {
 	if (lcore >= RTE_MAX_LCORE)
@@ -476,7 +476,7 @@ rte_service_lcore_count_services(uint32_t lcore)
 	return __builtin_popcountll(cs->service_mask);
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_start_with_defaults(void)
 {
 	/* create a default mapping from cores to services, then start the
@@ -562,7 +562,7 @@ service_update(struct rte_service_spec *service, uint32_t lcore,
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_map_lcore_set(uint32_t id, uint32_t lcore, uint32_t enabled)
 {
 	struct rte_service_spec_impl *s;
@@ -571,7 +571,7 @@ rte_service_map_lcore_set(uint32_t id, uint32_t lcore, uint32_t enabled)
 	return service_update(&s->spec, lcore, &on, 0);
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_map_lcore_get(uint32_t id, uint32_t lcore)
 {
 	struct rte_service_spec_impl *s;
@@ -597,7 +597,7 @@ set_lcore_state(uint32_t lcore, int32_t state)
 	lcore_states[lcore].is_service_core = (state == ROLE_SERVICE);
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_lcore_reset_all(void)
 {
 	/* loop over cores, reset all to mask 0 */
@@ -617,7 +617,7 @@ rte_service_lcore_reset_all(void)
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_lcore_add(uint32_t lcore)
 {
 	if (lcore >= RTE_MAX_LCORE)
@@ -636,7 +636,7 @@ rte_service_lcore_add(uint32_t lcore)
 	return rte_eal_wait_lcore(lcore);
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_lcore_del(uint32_t lcore)
 {
 	if (lcore >= RTE_MAX_LCORE)
@@ -655,7 +655,7 @@ rte_service_lcore_del(uint32_t lcore)
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_lcore_start(uint32_t lcore)
 {
 	if (lcore >= RTE_MAX_LCORE)
@@ -678,7 +678,7 @@ rte_service_lcore_start(uint32_t lcore)
 	return ret;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_lcore_stop(uint32_t lcore)
 {
 	if (lcore >= RTE_MAX_LCORE)
@@ -708,7 +708,7 @@ rte_service_lcore_stop(uint32_t lcore)
 	return 0;
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_attr_get(uint32_t id, uint32_t attr_id, uint32_t *attr_value)
 {
 	struct rte_service_spec_impl *s;
@@ -753,7 +753,7 @@ rte_service_dump_one(FILE *f, struct rte_service_spec_impl *s,
 			s->cycles_spent, s->cycles_spent / calls);
 }
 
-int32_t __rte_experimental
+int32_t
 rte_service_attr_reset_all(uint32_t id)
 {
 	struct rte_service_spec_impl *s;
@@ -781,7 +781,8 @@ service_dump_calls_per_lcore(FILE *f, uint32_t lcore, uint32_t reset)
 	fprintf(f, "\n");
 }
 
-int32_t __rte_experimental rte_service_dump(FILE *f, uint32_t id)
+int32_t
+rte_service_dump(FILE *f, uint32_t id)
 {
 	uint32_t i;
 	int print_one = (id != UINT32_MAX);
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index dd38783..5fdbb56 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -211,28 +211,14 @@ DPDK_18.02 {
 
 }  DPDK_17.11;
 
-EXPERIMENTAL {
+DPDK_18.05 {
 	global:
 
-	rte_eal_cleanup;
-	rte_eal_devargs_insert;
-	rte_eal_devargs_parse;
-	rte_eal_devargs_remove;
-	rte_eal_hotplug_add;
-	rte_eal_hotplug_remove;
-	rte_eal_mbuf_user_pool_ops;
-	rte_log_register_type_and_pick_level;
-	rte_mp_action_register;
-	rte_mp_action_unregister;
-	rte_mp_reply;
-	rte_mp_request_sync;
-	rte_mp_request_async;
-	rte_mp_sendmsg;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;
-	rte_service_component_unregister;
 	rte_service_component_runstate_set;
+	rte_service_component_unregister;
 	rte_service_dump;
 	rte_service_finalize;
 	rte_service_get_by_id;
@@ -256,6 +242,26 @@ EXPERIMENTAL {
 	rte_service_set_runstate_mapped_check;
 	rte_service_set_stats_enable;
 	rte_service_start_with_defaults;
+
+}  DPDK_18.02;
+
+EXPERIMENTAL {
+	global:
+
+	rte_eal_cleanup;
+	rte_eal_devargs_insert;
+	rte_eal_devargs_parse;
+	rte_eal_devargs_remove;
+	rte_eal_hotplug_add;
+	rte_eal_hotplug_remove;
+	rte_eal_mbuf_user_pool_ops;
+	rte_log_register_type_and_pick_level;
+	rte_mp_action_register;
+	rte_mp_action_unregister;
+	rte_mp_reply;
+	rte_mp_request_sync;
+	rte_mp_request_async;
+	rte_mp_sendmsg;
 	rte_socket_count;
 	rte_socket_id_by_idx;
 
-- 
2.7.4
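
The context lines above show rte_service_lcore_count_services() returning
__builtin_popcountll(cs->service_mask), which suggests a per-lcore bitmask
of mapped services. The sketch below models that idea in isolation. It is
an illustration under stated assumptions, not the rte_service.c
implementation: every demo_* name, DEMO_MAX_LCORE and DEMO_MAX_SERVICES
are hypothetical, and the error codes simply follow the @retval lines
documented in rte_service.h.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_LCORE 8      /* hypothetical stand-in for RTE_MAX_LCORE */
#define DEMO_MAX_SERVICES 64  /* one bit per service in a 64-bit mask */

struct demo_core_state {
	uint64_t service_mask; /* bit i set: service i is mapped to this lcore */
	int is_service_core;   /* non-zero once the lcore has been added */
};

static struct demo_core_state demo_lcore_states[DEMO_MAX_LCORE];

/* Add an lcore to the set of service cores, starting with no mappings. */
int32_t demo_lcore_add(uint32_t lcore)
{
	if (lcore >= DEMO_MAX_LCORE)
		return -EINVAL;
	if (demo_lcore_states[lcore].is_service_core)
		return -EALREADY;
	demo_lcore_states[lcore].is_service_core = 1;
	demo_lcore_states[lcore].service_mask = 0;
	return 0;
}

/* Map (enable != 0) or unmap (enable == 0) a service on a service core. */
int32_t demo_map_lcore_set(uint32_t service_id, uint32_t lcore,
		uint32_t enable)
{
	if (service_id >= DEMO_MAX_SERVICES || lcore >= DEMO_MAX_LCORE ||
			!demo_lcore_states[lcore].is_service_core)
		return -EINVAL;

	uint64_t bit = UINT64_C(1) << service_id;

	if (enable)
		demo_lcore_states[lcore].service_mask |= bit;
	else
		demo_lcore_states[lcore].service_mask &= ~bit;
	return 0;
}

/* Return 1 if the service is mapped to the lcore, 0 if it is not. */
int32_t demo_map_lcore_get(uint32_t service_id, uint32_t lcore)
{
	if (service_id >= DEMO_MAX_SERVICES || lcore >= DEMO_MAX_LCORE ||
			!demo_lcore_states[lcore].is_service_core)
		return -EINVAL;
	return (int32_t)((demo_lcore_states[lcore].service_mask >>
			service_id) & 1);
}

/* Count mapped services: the population count of the lcore's mask. */
int32_t demo_lcore_count_services(uint32_t lcore)
{
	if (lcore >= DEMO_MAX_LCORE)
		return -EINVAL;
	if (!demo_lcore_states[lcore].is_service_core)
		return -ENOTSUP;
	/* GCC/Clang builtin, as used in the context line above. */
	return __builtin_popcountll(demo_lcore_states[lcore].service_mask);
}
```

In this model, adding lcore 1 and mapping services 0 and 5 to it makes
demo_lcore_count_services(1) return 2. Treating a non-service lcore as
-EINVAL in the map functions is a modelling choice here.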

* Re: [dpdk-dev] [PATCH v1 01/16] ethdev: update ABI for flow API functions
  2018-04-05 10:06  4%   ` Thomas Monjalon
@ 2018-04-05 12:44  9%     ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-04-05 12:44 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Ferruh Yigit

On Thu, Apr 05, 2018 at 12:06:10PM +0200, Thomas Monjalon wrote:
> 04/04/2018 17:56, Adrien Mazarguil:
> > Subsequent patches will modify existing types and slightly alter the
> > behavior of the flow API. This warrants a major ABI breakage.
> > 
> > While it is already taken care of for 18.05 (LIBABIVER was updated to
> > version 9 by a prior commit), this patch explicitly adds the affected flow
> > API functions as a safety measure.
> 
> I don't understand this patch.
> 
> If the API is broken, you must move the function from old block to
> the new one.

Missed that part, I'll update it.

> And it must be done in the patch modifying the function.

About that, almost each patch in this series breaks the ABI in its own
way. This left me with two options: either updating these functions once and
for all and explaining why in a dedicated patch, or updating them in the
first patch with an ABI impact, with subsequent patches piggybacking on that
change.

Unless there's a way to update the map file for each patch that breaks ABI,
I think the former is more consistent, but I don't mind if you prefer the
latter. What do you suggest?

-- 
Adrien Mazarguil
6WIND

* Re: [dpdk-dev] [PATCH] doc: add meter API change to release notes
  2018-04-05 11:49  4% [dpdk-dev] [PATCH] doc: add meter API change to release notes Jasvinder Singh
@ 2018-04-05 12:03  0% ` Dumitrescu, Cristian
  0 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2018-04-05 12:03 UTC (permalink / raw)
  To: Singh, Jasvinder, dev



> -----Original Message-----
> From: Singh, Jasvinder
> Sent: Thursday, April 5, 2018 12:50 PM
> To: dev@dpdk.org
> Cc: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> Subject: [PATCH] doc: add meter API change to release notes
> 
> Update the release notes with meter api change to support configuration
> profiles.
> 
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> ---
>  doc/guides/rel_notes/release_18_05.rst | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_18_05.rst
> b/doc/guides/rel_notes/release_18_05.rst
> index e5fac1c..34222cd 100644
> --- a/doc/guides/rel_notes/release_18_05.rst
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -72,6 +72,16 @@ API Changes
>     Also, make sure to start the actual text at the margin.
>     =========================================================
> 
> +* **Meter API updated to accommodate configuration profiles.**
> +
> +  The meter API is changed to support meter configuration profiles. The
> +  configuration profile represents the set of configuration parameters
> +  for a given meter object, such as the rates and sizes for the token
> +  buckets. These configuration parameters were previously part of the
> meter
> +  object's internal data structure. The separation of the configuration
> +  parameters from meter object data structure results in reducing its
> +  memory footprint which helps in better cache utilization when large
> number
> +  of meter objects are used.
> 
>  ABI Changes
>  -----------
> --
> 2.9.3

Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>

* [dpdk-dev] [PATCH] doc: add meter API change to release notes
@ 2018-04-05 11:49  4% Jasvinder Singh
  2018-04-05 12:03  0% ` Dumitrescu, Cristian
  0 siblings, 1 reply; 200+ results
From: Jasvinder Singh @ 2018-04-05 11:49 UTC (permalink / raw)
  To: dev; +Cc: cristian.dumitrescu

Update the release notes with meter api change to support configuration
profiles.

Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
---
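
As a rough illustration of the memory argument made in the release note
text of this patch, the sketch below compares a meter object that embeds
its configuration with one that keeps only run-time state and relies on
a shared profile. The struct layouts, field names and demo_* helpers are
hypothetical and chosen only for this example; they are not the
librte_meter definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: every meter object carries its own copy of the configuration. */
struct demo_meter_inline {
	uint64_t rate;          /* configuration: token bucket rate */
	uint64_t bucket_size_c; /* configuration: first bucket size */
	uint64_t bucket_size_e; /* configuration: second bucket size */
	uint64_t time;          /* run-time state: last update time */
	uint64_t tokens_c;      /* run-time state: tokens in first bucket */
	uint64_t tokens_e;      /* run-time state: tokens in second bucket */
};

/* After: the configuration lives in a profile shared by many meters. */
struct demo_meter_profile {
	uint64_t rate;
	uint64_t bucket_size_c;
	uint64_t bucket_size_e;
};

/* After: the per-meter object holds run-time state only. */
struct demo_meter {
	uint64_t time;
	uint64_t tokens_c;
	uint64_t tokens_e;
};

/* Bytes needed for n meters when each one embeds its configuration. */
static inline size_t demo_bytes_inline(size_t n_meters)
{
	return n_meters * sizeof(struct demo_meter_inline);
}

/* Bytes needed for n meters that share a small set of profiles. */
static inline size_t demo_bytes_profiled(size_t n_meters, size_t n_profiles)
{
	return n_meters * sizeof(struct demo_meter) +
		n_profiles * sizeof(struct demo_meter_profile);
}
```

With many meters and only a few distinct profiles, demo_bytes_profiled()
comes out well below demo_bytes_inline(), so more per-meter state fits in
each cache line. That is the effect the release note describes.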
 doc/guides/rel_notes/release_18_05.rst | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index e5fac1c..34222cd 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -72,6 +72,16 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Meter API updated to accommodate configuration profiles.**
+
+  The meter API is changed to support meter configuration profiles. The
+  configuration profile represents the set of configuration parameters
+  for a given meter object, such as the rates and sizes for the token
+  buckets. These configuration parameters were previously part of the meter
+  object's internal data structure. The separation of the configuration
+  parameters from meter object data structure results in reducing its
+  memory footprint which helps in better cache utilization when large number
+  of meter objects are used.
 
 ABI Changes
 -----------
-- 
2.9.3

^ permalink raw reply	[relevance 4%]
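
The release note above describes moving shared configuration out of each
meter object. A minimal sketch of that idea in C follows, assuming a
simplified single-rate meter with two token buckets. All demo_* names are
hypothetical and are not the librte_meter API; the point is only that the
per-meter state shrinks to two counters while one profile serves many meters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the librte_meter API. */

enum demo_color { DEMO_GREEN, DEMO_YELLOW, DEMO_RED };

/* Configuration shared by every meter that uses the same rates and sizes. */
struct demo_meter_profile {
	uint64_t cir; /* committed information rate, bytes per second */
	uint64_t cbs; /* committed burst size, bytes */
	uint64_t ebs; /* excess burst size, bytes */
};

/* Per-meter run-time state only: two token counters. */
struct demo_meter {
	uint64_t tc; /* tokens currently in the committed bucket */
	uint64_t te; /* tokens currently in the excess bucket */
};

/* Start with both buckets full, as sized by the profile. */
static void
demo_meter_init(struct demo_meter *m, const struct demo_meter_profile *p)
{
	assert(m != NULL && p != NULL);
	m->tc = p->cbs;
	m->te = p->ebs;
}

/* Add tokens for elapsed_sec seconds; overflow from C spills into E. */
static void
demo_meter_refill(struct demo_meter *m, const struct demo_meter_profile *p,
		  uint64_t elapsed_sec)
{
	uint64_t add = p->cir * elapsed_sec;
	uint64_t room_c = p->cbs - m->tc;
	uint64_t room_e = p->ebs - m->te;

	if (add <= room_c) {
		m->tc += add;
		return;
	}
	m->tc = p->cbs;
	add -= room_c;
	m->te += add < room_e ? add : room_e;
}

/* Colour a packet of pkt_len bytes and consume tokens accordingly. */
static enum demo_color
demo_meter_check(struct demo_meter *m, uint32_t pkt_len)
{
	if (m->tc >= pkt_len) {
		m->tc -= pkt_len;
		return DEMO_GREEN;
	}
	if (m->te >= pkt_len) {
		m->te -= pkt_len;
		return DEMO_YELLOW;
	}
	return DEMO_RED;
}
```

With this split, an array of struct demo_meter packs more meters per cache
line than a layout that embeds the rates and burst sizes in every object,
which is the cache benefit the release note refers to.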

* Re: [dpdk-dev] [PATCH v4] lib/librte_meter: add meter configuration profile
  2018-04-05 10:12  0%     ` Thomas Monjalon
@ 2018-04-05 11:00  0%       ` Dumitrescu, Cristian
  0 siblings, 0 replies; 200+ results
From: Dumitrescu, Cristian @ 2018-04-05 11:00 UTC (permalink / raw)
  To: Thomas Monjalon, Singh, Jasvinder; +Cc: dev



> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, April 5, 2018 11:12 AM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Dumitrescu, Cristian
> <cristian.dumitrescu@intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4] lib/librte_meter: add meter
> configuration profile
> 
> 19/02/2018 22:12, Thomas Monjalon:
> > 08/01/2018 16:43, Jasvinder Singh:
> > > Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> > > Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> >
> > Applied for 18.05 (was postponed to preserve 18.02 ABI), thanks.
> 
> We forgot to update the release notes about the API change.
> Please, could you send a patch to add it in the appropriate section?
> Thanks
> 

Will send a quick patch later today, thanks!

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 1/4] ethdev: add support for PMD-tuned Tx/Rx parameters
  2018-04-04 18:56  3%       ` De Lara Guarch, Pablo
@ 2018-04-05 10:16  0%         ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-04-05 10:16 UTC (permalink / raw)
  To: Horton, Remy
  Cc: dev, De Lara Guarch, Pablo, Mcnamara, John, Lu, Wenzhuo, Wu,
	Jingjing, Zhang, Qi Z, Xing, Beilei, Shreyansh Jain

04/04/2018 20:56, De Lara Guarch, Pablo:
> 
> API and ABI changes should be documented in release notes.

When sending a v4 for the API change, you can add my ack:

Acked-by: Thomas Monjalon <thomas@monjalon.net>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4] lib/librte_meter: add meter configuration profile
  2018-02-19 21:12  3%   ` Thomas Monjalon
@ 2018-04-05 10:12  0%     ` Thomas Monjalon
  2018-04-05 11:00  0%       ` Dumitrescu, Cristian
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-04-05 10:12 UTC (permalink / raw)
  To: Jasvinder Singh, cristian.dumitrescu; +Cc: dev

19/02/2018 22:12, Thomas Monjalon:
> 08/01/2018 16:43, Jasvinder Singh:
> > Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> > Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>
> 
> Applied for 18.05 (was postponed to preserve 18.02 ABI), thanks.

We forgot to update the release notes about the API change.
Please, could you send a patch to add it in the appropriate section?
Thanks

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 01/16] ethdev: update ABI for flow API functions
  2018-04-04 15:56  7% ` [dpdk-dev] [PATCH v1 01/16] ethdev: update ABI for flow API functions Adrien Mazarguil
@ 2018-04-05 10:06  4%   ` Thomas Monjalon
  2018-04-05 12:44  9%     ` Adrien Mazarguil
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-04-05 10:06 UTC (permalink / raw)
  To: Adrien Mazarguil; +Cc: dev, Ferruh Yigit

04/04/2018 17:56, Adrien Mazarguil:
> Subsequent patches will modify existing types and slightly alter the
> behavior of the flow API. This warrants a major ABI breakage.
> 
> While it is already taken care of for 18.05 (LIBABIVER was updated to
> version 9 by a prior commit), this patch explicitly adds the affected flow
> API functions as a safety measure.

I don't understand this patch.

If the API is broken, you must move the function from old block to
the new one. And it must be done in the patch modifying the function.


> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> +DPDK_18.05 {
> +	global:
> +
> +	rte_flow_validate;
> +	rte_flow_create;
> +	rte_flow_query;
> +	rte_flow_copy;
> +
> +} DPDK_18.02;

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4] ethdev: replace bus specific struct with generic dev
  2018-04-04 17:57  3%           ` De Lara Guarch, Pablo
@ 2018-04-05  9:19  0%             ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-04-05  9:19 UTC (permalink / raw)
  To: De Lara Guarch, Pablo, David Marchand, santosh
  Cc: dev, Shreyansh Jain, Legacy, Allain (Wind River),
	Tomasz Duszynski, Thomas Monjalon

On 4/4/2018 6:57 PM, De Lara Guarch, Pablo wrote:
> 
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
>> Sent: Tuesday, April 3, 2018 10:50 AM
>> To: David Marchand <david.marchand@6wind.com>; santosh
>> <santosh.shukla@caviumnetworks.com>
>> Cc: dev@dpdk.org; Shreyansh Jain <shreyansh.jain@nxp.com>; Legacy, Allain
>> (Wind River) <allain.legacy@windriver.com>; Tomasz Duszynski
>> <tdu@semihalf.com>; Thomas Monjalon <thomas@monjalon.net>
>> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: replace bus specific struct with
>> generic dev
>>
>> On 4/3/2018 10:06 AM, David Marchand wrote:
>>> On Mon, Apr 2, 2018 at 6:13 PM, santosh
>>> <santosh.shukla@caviumnetworks.com> wrote:
>>>> On Friday 30 March 2018 08:59 PM, David Marchand wrote:
>>>>> I can see we enforce the driver name by putting it after the call to
>>>>> .dev_infos_get.
>>>>> http://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.c#n2399
>>>>>
>>>>> octeontx pmd seems to try to do something about it:
>>>>> http://dpdk.org/browse/dpdk/tree/drivers/net/octeontx/octeontx_ethde
>>>>> v.c#n622
>>>>>
>>>>> Not sure it does something, might be a thing to cleanup.
>>>>>
>>>>>
>>>> If you are referring to the driver_name update, then indeed it's a
>>>> cleanup [1].
>>>>
>>>> Otherwise, I don't see any issue with v4, or maybe I misunderstood
>>>> your comment.
>>>
>>> I agree there is no fundamental issue.
>>>
>>>     dev_info->device = dev->device;
>>>
>>>     RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
>>>     (*dev->dev_ops->dev_infos_get)(dev, dev_info);
>>>     dev_info->driver_name = dev->device->driver->name;
>>>
>>> If somebody (I mean some pmd out there) has a usecase with
>>> dev_info->device != dev->device, why not.
>>
>> Intentionally let drivers update this variable, although I also don't see any
>> use case for it.
>>
>> This variable was set by PMDs before this patch, so I don't see any reason to be
>> so strict here.
>>
>> If the driver does nothing, ethdev will set dev_info->device for it; if it wants
>> to overwrite it, for any reason, it will have the capability.
> 
> Looks good to me. Will do the same for cryptodev and bbdev.
> The only thing that I am missing here is an update in documentation,
> adding the ABI Change in release notes.

Right, I forgot about it, will send a new version.

Thanks,
ferruh

> 
> Apart from it:
> 
> Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> 
>>
>>>
>>> Thomas ?
>>>
>>>
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring structure
  2018-04-03 16:42  3%           ` Jerin Jacob
@ 2018-04-04 23:38  0%             ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-04-04 23:38 UTC (permalink / raw)
  To: Jerin Jacob, Olivier Matz; +Cc: dev, Richardson, Bruce

Hi lads,

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, April 3, 2018 5:43 PM
> To: Olivier Matz <olivier.matz@6wind.com>
> Cc: dev@dpdk.org; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring structure
> 
> -----Original Message-----
> > Date: Tue, 3 Apr 2018 17:56:01 +0200
> > From: Olivier Matz <olivier.matz@6wind.com>
> > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > CC: dev@dpdk.org, konstantin.ananyev@intel.com, bruce.richardson@intel.com
> > Subject: Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
> >  structure
> > User-Agent: NeoMutt/20170113 (1.7.2)
> >
> > On Tue, Apr 03, 2018 at 09:07:04PM +0530, Jerin Jacob wrote:
> > > -----Original Message-----
> > > > Date: Tue, 3 Apr 2018 17:25:17 +0200
> > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > > CC: dev@dpdk.org, konstantin.ananyev@intel.com, bruce.richardson@intel.com
> > > > Subject: Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
> > > >  structure
> > > > User-Agent: NeoMutt/20170113 (1.7.2)
> > > >
> > > > On Tue, Apr 03, 2018 at 08:37:23PM +0530, Jerin Jacob wrote:
> > > > > -----Original Message-----
> > > > > > Date: Tue, 3 Apr 2018 15:26:44 +0200
> > > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > > To: dev@dpdk.org
> > > > > > Subject: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
> > > > > >  structure
> > > > > > X-Mailer: git-send-email 2.11.0
> > > > > >
> > > > > > The initial objective of
> > > > > > commit d9f0d3a1ffd4 ("ring: remove split cacheline build setting")
> > > > > > was to add an empty cache line between the producer and consumer
> > > > > > data (on platforms with cache line size = 64B), preventing them
> > > > > > from being on adjacent cache lines.
> > > > > >
> > > > > > Following discussion on the mailing list, it appears that this
> > > > > > also imposes an alignment constraint that is not required.
> > > > > >
> > > > > > This patch removes the extra alignment constraint and adds the
> > > > > > empty cache lines using padding fields in the structure. The
> > > > > > size of rte_ring structure and the offset of the fields remain
> > > > > > the same on platforms with cache line size = 64B:
> > > > > >
> > > > > >   rte_ring = 384
> > > > > >   rte_ring.name = 0
> > > > > >   rte_ring.flags = 32
> > > > > >   rte_ring.memzone = 40
> > > > > >   rte_ring.size = 48
> > > > > >   rte_ring.mask = 52
> > > > > >   rte_ring.prod = 128
> > > > > >   rte_ring.cons = 256
> > > > > >
> > > > > > But it has an impact on platform where cache line size is 128B:
> > > > > >
> > > > > >   rte_ring = 384        -> 768
> > > > > >   rte_ring.name = 0
> > > > > >   rte_ring.flags = 32
> > > > > >   rte_ring.memzone = 40
> > > > > >   rte_ring.size = 48
> > > > > >   rte_ring.mask = 52
> > > > > >   rte_ring.prod = 128   -> 256
> > > > > >   rte_ring.cons = 256   -> 512
> > > > >
> > > > > Are we leaving TWO cachelines to make sure the HW prefetcher doesn't load
> > > > > the adjacent cacheline (consumer)?
> > > > >
> > > > > If so, it will have an impact on those machines where the cache line is 128B
> > > > > and the HW prefetcher is not loading the next cache line explicitly. Right?
> > > >
> > > > The impact on machines that have a 128B cache line is that an unused
> > > > cache line will be added between the producer and consumer data. I
> > > > expect that the impact is positive in case there is a hw prefetcher, and
> > > > null in case there is no such prefetcher.
> > >
> > > It is not NULL, right? You are losing 256B for each ring.
> >
> > Is it really that important?
> 
> In pipeline or eventdev SW cases there could be more rings in the system.
> I don't see any downside to having a config option which is enabled by
> default.
> 
> In my view, such config options are good, as in embedded use cases customers
> can really fine-tune the target for their needs. In server use cases, let the
> option be enabled by default, no harm.

But that would mean we have to maintain two layouts for the rte_ring structure.
I agree with Olivier here; saving 256B per ring might not be worth such a hassle.
Konstantin

> 
> >
> >
> > > > On machines with 64B cache line, this was already the case. It just
> > > > reduces the alignment constraint.
> > >
> > > Not all the 64B CL machines will have HW prefetch.
> > >
> > > I would recommend to add conditional compilation flags to express HW
> > > prefetch enabled or not? based on that we can decide to reserve
> > > the additional space. By default, in common config, HW prefetch can
> > > be enabled so that it works for almost all cases.
> >
> > The hw prefetcher can be enabled at runtime, so a compilation flag
> > does not seem to be a good idea. Moreover, changing this compilation
> 
> On hardware where the HW prefetcher can be disabled at runtime, the default
> config is fine. I was talking about some low-end ARM hardware where a
> HW prefetcher is not present at all.
> 
> > flag would change the ABI.
> 
> ABI is broken anyway, Right? due to size of the structure change.
> 

^ permalink raw reply	[relevance 0%]
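
The offsets quoted in this thread can be reproduced with a small C11 sketch.
It assumes the padding is expressed as one cache-aligned dummy byte before
each of prod and cons and after cons, which is one way to get an empty cache
line without a 2x alignment constraint. The demo_* names and the header
fields are stand-ins, not the rte_ring definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the producer/consumer head and tail indexes. */
struct demo_headtail {
	volatile uint32_t head;
	volatile uint32_t tail;
	uint32_t single;
};

/*
 * Generate a ring-like struct for a given cache line size `cl`.
 * Each pad member is aligned to a cache line, so it occupies a line of
 * its own and keeps prod and cons off adjacent lines.
 */
#define DEMO_RING_DEF(tag, cl)                          \
	struct tag {                                    \
		char name[32];                          \
		int flags;                              \
		const void *memzone;                    \
		uint32_t size;                          \
		uint32_t mask;                          \
		_Alignas(cl) char pad0;                 \
		_Alignas(cl) struct demo_headtail prod; \
		_Alignas(cl) char pad1;                 \
		_Alignas(cl) struct demo_headtail cons; \
		_Alignas(cl) char pad2;                 \
	}

DEMO_RING_DEF(demo_ring_cl64, 64);
DEMO_RING_DEF(demo_ring_cl128, 128);

/* 64B cache line: same figures as the first table in the thread. */
_Static_assert(offsetof(struct demo_ring_cl64, prod) == 128, "prod @128");
_Static_assert(offsetof(struct demo_ring_cl64, cons) == 256, "cons @256");
_Static_assert(sizeof(struct demo_ring_cl64) == 384, "size 384");

/* 128B cache line: the figures after the arrows in the second table. */
_Static_assert(offsetof(struct demo_ring_cl128, prod) == 256, "prod @256");
_Static_assert(offsetof(struct demo_ring_cl128, cons) == 512, "cons @512");
_Static_assert(sizeof(struct demo_ring_cl128) == 768, "size 768");

/* Extra bytes per ring when moving from the 64B to the 128B layout. */
static size_t
demo_ring_growth(void)
{
	return sizeof(struct demo_ring_cl128) - sizeof(struct demo_ring_cl64);
}
```

Because the layout is fixed by a compile-time constant, a build option that
toggles the padding would yield two different structure layouts, which is
the maintenance and ABI concern raised in the replies.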

* Re: [dpdk-dev] [PATCH v7] eal: provide API for querying valid socket id's
  2018-03-31 17:08  5%     ` [dpdk-dev] [PATCH v7] " Anatoly Burakov
@ 2018-04-04 22:31  3%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-04-04 22:31 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Neil Horman, John McNamara, Marko Kovacevic,
	Bruce Richardson, chaozhu, gowrishankar.m

31/03/2018 19:08, Anatoly Burakov:
> During lcore scan, find all socket ID's and store them, and
> provide public API to query valid socket id's. This will break
> the ABI, so bump ABI version.
> 
> Also, remove deprecation notice corresponding to this change.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> ---
> 
> Notes:
>     v7:
>     - Renamed rte_num_socket_ids() to rte_socket_count()
>     - Removed deprecation notice associated with this change
>     - Addressed review comments

You forgot the release notes for the ABI version (from my previous review).

Applied and fixed.

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 00/13] eal: replace calls to rte_panic and refrain from new instances
@ 2018-04-04 22:01  3% Arnon Warshavsky
  0 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-04-04 22:01 UTC (permalink / raw)
  To: thomas, anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit
  Cc: dev, arnon


The purpose of this patch series is to clean up the library code
from paths that end up aborting the process,
and move to checking error values, in order to allow the running process
to perform an orderly teardown or other mitigation of the event.

This patch modifies the majority of rte_panic calls
under lib and drivers, and replaces them with a log message
and an error return code according to context,
that can be propagated up the call stack.

- Focus was given to the DPDK initialization path
- Some of the panic calls within drivers were left in place where
  the call is from within an interrupt or calls that are
  on the data path, where there is no simple application-level
  route to propagate the error to termination.
  These should be handled by the driver maintainers.
- In order to avoid breaking ABI where panic was called from public
  void functions, a panic state variable was introduced so that
  it can be queried after calling these void functions.
  This took place for a single function call.
- Local void functions with no API were changed to return a value
  where needed
- No change took place in example and test files
- No change took place for debug assertions calling panic
- A new function was added to devtools/checkpatches.sh
  in order to prevent new additions of calls to rte_panic
  under lib and drivers.

Keep calm and don't panic
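
The two conversion patterns listed above can be sketched roughly as follows.
This is an illustration under assumptions, not code from the series: every
demo_* name is hypothetical, and fprintf stands in for the real logging call.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical names; these are not the functions added by the series. */

/* Set when a public void function hits a condition that used to panic. */
static int demo_panic_state;

/*
 * Pattern 1: a function that can return a value.
 * Previously a failure here would have aborted the process; now it logs
 * and returns a negative errno that the caller can propagate.
 */
static int
demo_setup(int nb_pages)
{
	if (nb_pages <= 0) {
		fprintf(stderr, "demo_setup: no pages available\n");
		return -ENOMEM;
	}
	return 0;
}

/*
 * Pattern 2: a public void function whose signature must stay as it is
 * to preserve the ABI. It records the failure in a state variable.
 */
static void
demo_notify(int fd)
{
	if (fd < 0) {
		fprintf(stderr, "demo_notify: invalid descriptor\n");
		demo_panic_state = 1;
		return;
	}
	/* normal work would happen here */
}

/* Query the state after calling a void function. */
static int
demo_panic_state_get(void)
{
	return demo_panic_state;
}

/* Caller side: propagate errors up instead of aborting. */
static int
demo_init(int nb_pages, int fd)
{
	int ret = demo_setup(nb_pages);

	if (ret < 0)
		return ret;

	demo_notify(fd);
	if (demo_panic_state_get())
		return -EIO;

	return 0;
}
```

The caller decides how to tear down or mitigate; the library only reports.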

---

v2:
- reformat error messages so that literal strings are on the same line
- fix typo in commit message
- add new return code to doxygen of rte_memzone_free()

Arnon Warshavsky (13):
  crypto: replace rte_panic instances in crypto driver
  bond: replace rte_panic instances in bonding driver
  e1000: replace rte_panic instances in e1000 driver
  ixgbe: replace rte_panic instances in ixgbe driver
  eal: replace rte_panic instances in eventdev
  kni: replace rte_panic instances in kni
  e1000: replace rte_panic instances in e1000 driver
  eal: replace rte_panic instances in hugepage_info
  eal: replace rte_panic instances in common_memzone
  eal: replace rte_panic instances in interrupts thread
  eal: replace rte_panic instances in ethdev
  eal: replace rte_panic instances in init sequence
  devtools: prevent new instances of rte_panic and rte_exit

 devtools/checkpatches.sh                          |  94 ++++++++++++++++-
 drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c       |   8 +-
 drivers/crypto/dpaa_sec/dpaa_sec.c                |   8 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c         |  30 ++++--
 drivers/net/bonding/rte_eth_bond_8023ad_private.h |   2 +-
 drivers/net/bonding/rte_eth_bond_api.c            |  20 ++--
 drivers/net/bonding/rte_eth_bond_pmd.c            |  10 +-
 drivers/net/bonding/rte_eth_bond_private.h        |   2 +-
 drivers/net/e1000/e1000_ethdev.h                  |   2 +-
 drivers/net/e1000/igb_ethdev.c                    |   3 +-
 drivers/net/e1000/igb_pf.c                        |  15 +--
 drivers/net/ixgbe/ixgbe_ethdev.c                  |   3 +-
 drivers/net/ixgbe/ixgbe_ethdev.h                  |   2 +-
 drivers/net/ixgbe/ixgbe_pf.c                      |  13 ++-
 lib/librte_eal/bsdapp/eal/eal.c                   |  87 +++++++++++-----
 lib/librte_eal/bsdapp/eal/eal_thread.c            |  65 +++++++++---
 lib/librte_eal/common/eal_common_launch.c         |  21 ++++
 lib/librte_eal/common/eal_common_memzone.c        |   3 +-
 lib/librte_eal/common/include/rte_debug.h         |  12 +++
 lib/librte_eal/common/include/rte_memzone.h       |   1 +
 lib/librte_eal/common/rte_malloc.c                |   7 +-
 lib/librte_eal/linuxapp/eal/eal.c                 | 121 +++++++++++++++-------
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c   |  21 ++--
 lib/librte_eal/linuxapp/eal/eal_interrupts.c      |  27 +++--
 lib/librte_eal/linuxapp/eal/eal_thread.c          |  65 +++++++++---
 lib/librte_ether/rte_ethdev.c                     |  36 +++++--
 lib/librte_eventdev/rte_eventdev_pmd_pci.h        |   8 +-
 lib/librte_eventdev/rte_eventdev_pmd_vdev.h       |   8 +-
 lib/librte_kni/rte_kni.c                          |  18 ++--
 lib/librte_kni/rte_kni_fifo.h                     |  11 +-
 30 files changed, 540 insertions(+), 183 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v11 3/9] eventtimer: add common code
  @ 2018-04-04 21:51  3%           ` Erik Gabriel Carrillo
  0 siblings, 0 replies; 200+ results
From: Erik Gabriel Carrillo @ 2018-04-04 21:51 UTC (permalink / raw)
  To: pbhagavatula, jerin.jacob; +Cc: dev

This commit adds the logic that is shared by all event timer adapter
drivers; the common code handles instance allocation and some
initialization.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
---
 config/common_base                                |   1 +
 drivers/event/sw/sw_evdev.c                       |  18 +
 lib/librte_eventdev/Makefile                      |   2 +
 lib/librte_eventdev/rte_event_timer_adapter.c     | 387 ++++++++++++++++++++++
 lib/librte_eventdev/rte_event_timer_adapter_pmd.h | 114 +++++++
 lib/librte_eventdev/rte_eventdev.c                |  22 ++
 lib/librte_eventdev/rte_eventdev.h                |  20 ++
 lib/librte_eventdev/rte_eventdev_pmd.h            |  35 ++
 lib/librte_eventdev/rte_eventdev_version.map      |  20 +-
 9 files changed, 618 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter.c
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter_pmd.h

diff --git a/config/common_base b/config/common_base
index 7abf7c6..9354c66 100644
--- a/config/common_base
+++ b/config/common_base
@@ -550,6 +550,7 @@ CONFIG_RTE_LIBRTE_EVENTDEV=y
 CONFIG_RTE_LIBRTE_EVENTDEV_DEBUG=n
 CONFIG_RTE_EVENT_MAX_DEVS=16
 CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
+CONFIG_RTE_EVENT_TIMER_ADAPTER_NUM_MAX=32
 
 #
 # Compile PMD for skeleton event device
diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 0e89f11..dcb6551 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -464,6 +464,22 @@ sw_eth_rx_adapter_caps_get(const struct rte_eventdev *dev,
 	return 0;
 }
 
+static int
+sw_timer_adapter_caps_get(const struct rte_eventdev *dev,
+			  uint64_t flags,
+			  uint32_t *caps,
+			  const struct rte_event_timer_adapter_ops **ops)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(flags);
+	*caps = 0;
+
+	/* Use default SW ops */
+	*ops = NULL;
+
+	return 0;
+}
+
 static void
 sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info)
 {
@@ -791,6 +807,8 @@ sw_probe(struct rte_vdev_device *vdev)
 
 			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
 
+			.timer_adapter_caps_get = sw_timer_adapter_caps_get,
+
 			.xstats_get = sw_xstats_get,
 			.xstats_get_names = sw_xstats_get_names,
 			.xstats_get_by_name = sw_xstats_get_by_name,
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 549b182..8b16e3f 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -20,6 +20,7 @@ LDLIBS += -lrte_eal -lrte_ring -lrte_ethdev -lrte_hash
 SRCS-y += rte_eventdev.c
 SRCS-y += rte_event_ring.c
 SRCS-y += rte_event_eth_rx_adapter.c
+SRCS-y += rte_event_timer_adapter.c
 
 # export include files
 SYMLINK-y-include += rte_eventdev.h
@@ -29,6 +30,7 @@ SYMLINK-y-include += rte_eventdev_pmd_vdev.h
 SYMLINK-y-include += rte_event_ring.h
 SYMLINK-y-include += rte_event_eth_rx_adapter.h
 SYMLINK-y-include += rte_event_timer_adapter.h
+SYMLINK-y-include += rte_event_timer_adapter_pmd.h
 
 # versioning export map
 EXPORT_MAP := rte_eventdev_version.map
diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
new file mode 100644
index 0000000..75a14ac
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -0,0 +1,387 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#include <string.h>
+#include <inttypes.h>
+
+#include <rte_memzone.h>
+#include <rte_memory.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+
+#include "rte_eventdev.h"
+#include "rte_eventdev_pmd.h"
+#include "rte_event_timer_adapter.h"
+#include "rte_event_timer_adapter_pmd.h"
+
+#define DATA_MZ_NAME_MAX_LEN 64
+#define DATA_MZ_NAME_FORMAT "rte_event_timer_adapter_data_%d"
+
+static int evtim_logtype;
+
+static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
+
+#define EVTIM_LOG(level, logtype, ...) \
+	rte_log(RTE_LOG_ ## level, logtype, \
+		RTE_FMT("EVTIMER: %s() line %u: " RTE_FMT_HEAD(__VA_ARGS__,) \
+			"\n", __func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__,)))
+
+#define EVTIM_LOG_ERR(...) EVTIM_LOG(ERR, evtim_logtype, __VA_ARGS__)
+
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+#define EVTIM_LOG_DBG(...) \
+	EVTIM_LOG(DEBUG, evtim_logtype, __VA_ARGS__)
+#else
+#define EVTIM_LOG_DBG(...) (void)0
+#endif
+
+static int
+default_port_conf_cb(uint16_t id, uint8_t event_dev_id, uint8_t *event_port_id,
+		     void *conf_arg)
+{
+	struct rte_event_timer_adapter *adapter;
+	struct rte_eventdev *dev;
+	struct rte_event_dev_config dev_conf;
+	struct rte_event_port_conf *port_conf, def_port_conf = {0};
+	int started;
+	uint8_t port_id;
+	uint8_t dev_id;
+	int ret;
+
+	RTE_SET_USED(event_dev_id);
+
+	adapter = &adapters[id];
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+	dev_id = dev->data->dev_id;
+	dev_conf = dev->data->dev_conf;
+
+	started = dev->data->dev_started;
+	if (started)
+		rte_event_dev_stop(dev_id);
+
+	port_id = dev_conf.nb_event_ports;
+	dev_conf.nb_event_ports += 1;
+	ret = rte_event_dev_configure(dev_id, &dev_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to configure event dev %u\n", dev_id);
+		if (started)
+			if (rte_event_dev_start(dev_id))
+				return -EIO;
+
+		return ret;
+	}
+
+	if (conf_arg != NULL)
+		port_conf = conf_arg;
+	else {
+		port_conf = &def_port_conf;
+		ret = rte_event_port_default_conf_get(dev_id, port_id,
+						      port_conf);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = rte_event_port_setup(dev_id, port_id, port_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to setup event port %u on event dev %u\n",
+			      port_id, dev_id);
+		return ret;
+	}
+
+	*event_port_id = port_id;
+
+	if (started)
+		ret = rte_event_dev_start(dev_id);
+
+	return ret;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create(const struct rte_event_timer_adapter_conf *conf)
+{
+	return rte_event_timer_adapter_create_ext(conf, default_port_conf_cb,
+						  NULL);
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create_ext(
+		const struct rte_event_timer_adapter_conf *conf,
+		rte_event_timer_adapter_port_conf_cb_t conf_cb,
+		void *conf_arg)
+{
+	uint16_t adapter_id;
+	struct rte_event_timer_adapter *adapter;
+	const struct rte_memzone *mz;
+	char mz_name[DATA_MZ_NAME_MAX_LEN];
+	int n, ret;
+	struct rte_eventdev *dev;
+
+	if (conf == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check eventdev ID */
+	if (!rte_event_pmd_is_valid_dev(conf->event_dev_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	dev = &rte_eventdevs[conf->event_dev_id];
+
+	adapter_id = conf->timer_adapter_id;
+
+	/* Check that adapter_id is in range */
+	if (adapter_id >= RTE_EVENT_TIMER_ADAPTER_NUM_MAX) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check adapter ID not already allocated */
+	adapter = &adapters[adapter_id];
+	if (adapter->allocated) {
+		rte_errno = EEXIST;
+		return NULL;
+	}
+
+	/* Create shared data area. */
+	n = snprintf(mz_name, sizeof(mz_name), DATA_MZ_NAME_FORMAT, adapter_id);
+	if (n >= (int)sizeof(mz_name)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	mz = rte_memzone_reserve(mz_name,
+				 sizeof(struct rte_event_timer_adapter_data),
+				 conf->socket_id, 0);
+	if (mz == NULL)
+		/* rte_errno set by rte_memzone_reserve */
+		return NULL;
+
+	adapter->data = mz->addr;
+	memset(adapter->data, 0, sizeof(struct rte_event_timer_adapter_data));
+
+	adapter->data->mz = mz;
+	adapter->data->event_dev_id = conf->event_dev_id;
+	adapter->data->id = adapter_id;
+	adapter->data->socket_id = conf->socket_id;
+	adapter->data->conf = *conf;  /* copy conf structure */
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = -ret;
+		goto free_memzone;
+	}
+
+	if (!(adapter->data->caps &
+	      RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT)) {
+		FUNC_PTR_OR_NULL_RET_WITH_ERRNO(conf_cb, -EINVAL);
+		ret = conf_cb(adapter->data->id, adapter->data->event_dev_id,
+			      &adapter->data->event_port_id, conf_arg);
+		if (ret < 0) {
+			rte_errno = -ret;
+			goto free_memzone;
+		}
+	}
+
+	/* Allow driver to do some setup */
+	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
+	ret = adapter->ops->init(adapter);
+	if (ret < 0) {
+		rte_errno = -ret;
+		goto free_memzone;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+
+free_memzone:
+	rte_memzone_free(adapter->data->mz);
+	return NULL;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->ops->get_info)
+		/* let driver set values it knows */
+		adapter->ops->get_info(adapter, adapter_info);
+
+	/* Set common values */
+	adapter_info->conf = adapter->data->conf;
+	adapter_info->event_dev_port_id = adapter->data->event_port_id;
+	adapter_info->caps = adapter->data->caps;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->start, -EINVAL);
+
+	ret = adapter->ops->start(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 1;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stop, -EINVAL);
+
+	if (adapter->data->started == 0) {
+		EVTIM_LOG_ERR("event timer adapter %"PRIu8" already stopped",
+			      adapter->data->id);
+		return 0;
+	}
+
+	ret = adapter->ops->stop(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 0;
+
+	return 0;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_lookup(uint16_t adapter_id)
+{
+	char name[DATA_MZ_NAME_MAX_LEN];
+	const struct rte_memzone *mz;
+	struct rte_event_timer_adapter_data *data;
+	struct rte_event_timer_adapter *adapter;
+	int ret;
+	struct rte_eventdev *dev;
+
+	if (adapters[adapter_id].allocated)
+		return &adapters[adapter_id]; /* Adapter is already loaded */
+
+	snprintf(name, DATA_MZ_NAME_MAX_LEN, DATA_MZ_NAME_FORMAT, adapter_id);
+	mz = rte_memzone_lookup(name);
+	if (mz == NULL) {
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	data = mz->addr;
+
+	adapter = &adapters[data->id];
+	adapter->data = data;
+
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_free(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->uninit, -EINVAL);
+
+	if (adapter->data->started == 1) {
+		EVTIM_LOG_ERR("event timer adapter %"PRIu8" must be stopped "
+			      "before freeing", adapter->data->id);
+		return -EBUSY;
+	}
+
+	/* free impl priv data */
+	ret = adapter->ops->uninit(adapter);
+	if (ret < 0)
+		return ret;
+
+	/* free shared data area */
+	ret = rte_memzone_free(adapter->data->mz);
+	if (ret < 0)
+		return ret;
+
+	adapter->data = NULL;
+	adapter->allocated = 0;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_service_id_get(struct rte_event_timer_adapter *adapter,
+				       uint32_t *service_id)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->data->service_inited && service_id != NULL)
+		*service_id = adapter->data->service_id;
+
+	return adapter->data->service_inited ? 0 : -ESRCH;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_get(struct rte_event_timer_adapter *adapter,
+				  struct rte_event_timer_adapter_stats *stats)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_get, -EINVAL);
+	if (stats == NULL)
+		return -EINVAL;
+
+	return adapter->ops->stats_get(adapter, stats);
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_reset, -EINVAL);
+	return adapter->ops->stats_reset(adapter);
+}
+
+RTE_INIT(event_timer_adapter_init_log);
+static void
+event_timer_adapter_init_log(void)
+{
+	evtim_logtype = rte_log_register("lib.eventdev.adapter.timer");
+	if (evtim_logtype >= 0)
+		rte_log_set_level(evtim_logtype, RTE_LOG_NOTICE);
+}
diff --git a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
new file mode 100644
index 0000000..cf3509d
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#ifndef __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+#define __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+
+/**
+ * @file
+ * RTE Event Timer Adapter API (PMD Side)
+ *
+ * @note
+ * This file provides implementation helpers for internal use by PMDs.  They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_event_timer_adapter.h"
+
+/*
+ * Definitions of functions exported by an event timer adapter implementation
+ * through *rte_event_timer_adapter_ops* structure supplied in the
+ * *rte_event_timer_adapter* structure associated with an event timer adapter.
+ */
+
+typedef int (*rte_event_timer_adapter_init_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation setup */
+typedef int (*rte_event_timer_adapter_uninit_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation teardown */
+typedef int (*rte_event_timer_adapter_start_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Start running event timer adapter */
+typedef int (*rte_event_timer_adapter_stop_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Stop running event timer adapter */
+typedef void (*rte_event_timer_adapter_get_info_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info);
+/**< @internal Get contextual information for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_get_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats);
+/**< @internal Get statistics for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_reset_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Reset statistics for event timer adapter */
+
+/**
+ * @internal Structure containing the functions exported by an event timer
+ * adapter implementation.
+ */
+struct rte_event_timer_adapter_ops {
+	rte_event_timer_adapter_init_t		init;  /**< Set up adapter */
+	rte_event_timer_adapter_uninit_t	uninit;/**< Tear down adapter */
+	rte_event_timer_adapter_start_t		start; /**< Start adapter */
+	rte_event_timer_adapter_stop_t		stop;  /**< Stop adapter */
+	rte_event_timer_adapter_get_info_t	get_info;
+	/**< Get info from driver */
+	rte_event_timer_adapter_stats_get_t	stats_get;
+	/**< Get adapter statistics */
+	rte_event_timer_adapter_stats_reset_t	stats_reset;
+	/**< Reset adapter statistics */
+	rte_event_timer_arm_burst_t		arm_burst;
+	/**< Arm one or more event timers */
+	rte_event_timer_arm_tmo_tick_burst_t	arm_tmo_tick_burst;
+	/**< Arm event timers with same expiration time */
+	rte_event_timer_cancel_burst_t		cancel_burst;
+	/**< Cancel one or more event timers */
+};
+
+/**
+ * @internal Adapter data; structure to be placed in shared memory to be
+ * accessible by various processes in a multi-process configuration.
+ */
+struct rte_event_timer_adapter_data {
+	uint8_t id;
+	/**< Event timer adapter ID */
+	uint8_t event_dev_id;
+	/**< Event device ID */
+	uint32_t socket_id;
+	/**< Socket ID where memory is allocated */
+	uint8_t event_port_id;
+	/**< Optional: event port ID used when the inbuilt port is absent */
+	const struct rte_memzone *mz;
+	/**< Event timer adapter memzone pointer */
+	struct rte_event_timer_adapter_conf conf;
+	/**< Configuration used to configure the adapter. */
+	uint32_t caps;
+	/**< Adapter capabilities */
+	void *adapter_priv;
+	/**< Timer adapter private data*/
+	uint8_t service_inited;
+	/**< Service initialization state */
+	uint32_t service_id;
+	/**< Service ID*/
+
+	RTE_STD_C11
+	uint8_t started : 1;
+	/**< Flag to indicate adapter started. */
+} __rte_cache_aligned;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __RTE_EVENT_TIMER_ADAPTER_PMD_H__ */
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 2de8d9a..3f016f4 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -123,6 +123,28 @@ rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
 				: 0;
 }
 
+int __rte_experimental
+rte_event_timer_adapter_caps_get(uint8_t dev_id, uint32_t *caps)
+{
+	struct rte_eventdev *dev;
+	const struct rte_event_timer_adapter_ops *ops;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	dev = &rte_eventdevs[dev_id];
+
+	if (caps == NULL)
+		return -EINVAL;
+	*caps = 0;
+
+	return dev->dev_ops->timer_adapter_caps_get ?
+				(*dev->dev_ops->timer_adapter_caps_get)(dev,
+									0,
+									caps,
+									&ops)
+				: 0;
+}
+
 static inline int
 rte_event_dev_queue_config(struct rte_eventdev *dev, uint8_t nb_queues)
 {
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index 86df4be..6fcbe94 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -215,6 +215,7 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_memory.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 
 struct rte_mbuf; /* we just use mbuf pointers; no need to include rte_mbuf.h */
 struct rte_event;
@@ -1115,6 +1116,25 @@ int
 rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
 				uint32_t *caps);
 
+#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
+/**< This flag is set when the timer mechanism is in HW. */
+
+/**
+ * Retrieve the event device's timer adapter capabilities.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param[out] caps
+ *   A pointer to memory to be filled with event timer adapter capabilities.
+ *
+ * @return
+ *   - 0: Success, driver provided event timer adapter capabilities.
+ *   - <0: Error code returned by the driver function.
+ */
+int __rte_experimental
+rte_event_timer_adapter_caps_get(uint8_t dev_id, uint32_t *caps);
+
 struct rte_eventdev_driver;
 struct rte_eventdev_ops;
 struct rte_eventdev;
diff --git a/lib/librte_eventdev/rte_eventdev_pmd.h b/lib/librte_eventdev/rte_eventdev_pmd.h
index 3a8ddd7..2dcb528 100644
--- a/lib/librte_eventdev/rte_eventdev_pmd.h
+++ b/lib/librte_eventdev/rte_eventdev_pmd.h
@@ -26,6 +26,7 @@ extern "C" {
 #include <rte_malloc.h>
 
 #include "rte_eventdev.h"
+#include "rte_event_timer_adapter_pmd.h"
 
 /* Logging Macros */
 #define RTE_EDEV_LOG_ERR(...) \
@@ -449,6 +450,37 @@ typedef int (*eventdev_eth_rx_adapter_caps_get_t)
 struct rte_event_eth_rx_adapter_queue_conf *queue_conf;
 
 /**
+ * Retrieve the event device's timer adapter capabilities, as well as the ops
+ * structure that an event timer adapter should call through to enter the
+ * driver
+ *
+ * @param dev
+ *   Event device pointer
+ *
+ * @param flags
+ *   Flags that can be used to determine how to select an event timer
+ *   adapter ops structure
+ *
+ * @param[out] caps
+ *   A pointer to memory to be filled with event timer adapter capabilities.
+ *
+ * @param[out] ops
+ *   A pointer to the ops pointer to set with the address of the desired ops
+ *   structure
+ *
+ * @return
+ *   - 0: Success, driver provides event timer adapter capabilities for the
+ *	event device.
+ *   - <0: Error code returned by the driver function.
+ *
+ */
+typedef int (*eventdev_timer_adapter_caps_get_t)(
+				const struct rte_eventdev *dev,
+				uint64_t flags,
+				uint32_t *caps,
+				const struct rte_event_timer_adapter_ops **ops);
+
+/**
  * Add ethernet Rx queues to event device. This callback is invoked if
  * the caps returned from rte_eventdev_eth_rx_adapter_caps_get(, eth_port_id)
  * has RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT set.
@@ -640,6 +672,9 @@ struct rte_eventdev_ops {
 	eventdev_eth_rx_adapter_stats_reset eth_rx_adapter_stats_reset;
 	/**< Reset ethernet Rx stats */
 
+	eventdev_timer_adapter_caps_get_t timer_adapter_caps_get;
+	/**< Get timer adapter capabilities */
+
 	eventdev_selftest dev_selftest;
 	/**< Start eventdev Selftest */
 
diff --git a/lib/librte_eventdev/rte_eventdev_version.map b/lib/librte_eventdev/rte_eventdev_version.map
index 4396536..3ee28f7 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventdev/rte_eventdev_version.map
@@ -66,7 +66,6 @@ DPDK_17.11 {
 	rte_event_eth_rx_adapter_stats_get;
 	rte_event_eth_rx_adapter_stats_reset;
 	rte_event_eth_rx_adapter_stop;
-
 } DPDK_17.08;
 
 DPDK_18.02 {
@@ -80,3 +79,22 @@ DPDK_18.05 {
 
 	rte_event_dev_stop_flush_callback_register;
 } DPDK_18.02;
+
+EXPERIMENTAL {
+	global:
+
+	rte_event_timer_adapter_caps_get;
+	rte_event_timer_adapter_create;
+	rte_event_timer_adapter_create_ext;
+	rte_event_timer_adapter_free;
+	rte_event_timer_adapter_get_info;
+	rte_event_timer_adapter_lookup;
+	rte_event_timer_adapter_service_id_get;
+	rte_event_timer_adapter_start;
+	rte_event_timer_adapter_stats_get;
+	rte_event_timer_adapter_stats_reset;
+	rte_event_timer_adapter_stop;
+	rte_event_timer_arm_burst;
+	rte_event_timer_arm_tmo_tick_burst;
+	rte_event_timer_cancel_burst;
+} DPDK_18.05;
-- 
2.6.4
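
For illustration, the dispatch pattern used by rte_event_timer_adapter_caps_get() in the patch above: the driver callback is optional, so the wrapper zeroes *caps first and returns 0 when no callback is registered. The following is a minimal standalone sketch of that pattern only. The names struct dev_sketch, caps_get_fn, CAP_INTERNAL_PORT_SKETCH, timer_caps_get() and hw_timer_caps() are hypothetical stand-ins, not DPDK symbols, and the flags and ops arguments of the real callback are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the capability bit defined in the patch. */
#define CAP_INTERNAL_PORT_SKETCH (1u << 0)

struct dev_sketch;

/* Hypothetical driver callback type; the real one also takes flags and ops. */
typedef int (*caps_get_fn)(const struct dev_sketch *dev, uint32_t *caps);

struct dev_sketch {
	caps_get_fn timer_adapter_caps_get; /* NULL if the driver has none */
};

/* Validate arguments, default caps to 0, then call the driver if present. */
static inline int
timer_caps_get(const struct dev_sketch *dev, uint32_t *caps)
{
	if (dev == NULL || caps == NULL)
		return -EINVAL;
	*caps = 0;

	return dev->timer_adapter_caps_get ?
			dev->timer_adapter_caps_get(dev, caps) : 0;
}

/* Example callback for a device whose timer mechanism is in hardware. */
static inline int
hw_timer_caps(const struct dev_sketch *dev, uint32_t *caps)
{
	(void)dev;
	*caps = CAP_INTERNAL_PORT_SKETCH;
	return 0;
}
```

With this shape, a caller cannot tell "no callback" apart from "callback reported no capabilities"; both give a return value of 0 and caps equal to 0, which appears to be the intent of the wrapper in the patch.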

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 1/2] eal: add API to align integer to previous power of 2
  2018-04-04 18:36  3%                 ` Pavan Nikhilesh
@ 2018-04-04 19:41  3%                   ` Matan Azrad
  0 siblings, 0 replies; 200+ results
From: Matan Azrad @ 2018-04-04 19:41 UTC (permalink / raw)
  To: Pavan Nikhilesh, jerin.jacob, keith.wiles, Thomas Monjalon; +Cc: dev

Hi Pavan

From: Pavan Nikhilesh, Wednesday, April 4, 2018 9:36 PM
> On Wed, Apr 04, 2018 at 06:23:19PM +0000, Matan Azrad wrote:
> > Hi Pavan
> >
> > From: Pavan Nikhilesh, Wednesday, April 4, 2018 9:16 PM
> > > Hi Matan,
> > >
> > > >
> > > > Got you.
> > > > Looks like you found issue here...
> > > > The experimental tag probably should be in a root .h file.
> > > > Probably, need a fix patch to move it for a different\new .h file.
> > > >
> > > > What do you think?
> > > >
> > >
> > > Actually that's just the start of the rabbit hole: if we succeed in
> > > tagging an inline function in rte_common.h as experimental, every
> > > lib/driver that uses rte_common.h (almost everything) needs to have
> > > CFLAGS set to -DALLOW_EXPERIMENTAL_API.
> > >
> >
> > Isn't it relevant only for the libs which are using the new tagged APIs?
> 
> Static inline functions in .h files are copied into every .c file that
> includes them. For example, the preprocessor output for rte_pci.c, which
> includes rte_common.h:
> 
> # 231 "/home/pavan/Work/clean/dpdk/build/include/rte_common.h"
> extern int RTE_BUILD_BUG_ON_detected_error;
> # 249 "/home/pavan/Work/clean/dpdk/build/include/rte_common.h"
> static inline uint32_t __attribute__((deprecated("Symbol is not yet part of stable ABI"), section(".text.experimental")))
> rte_combine32ms1b(register uint32_t x)
> {
>  x |= x >> 1;
>  x |= x >> 2;
>  x |= x >> 4;
>  x |= x >> 8;
>  x |= x >> 16;
> 
>  return x;
> }
> # 271 "/home/pavan/Work/clean/dpdk/build/include/rte_common.h"
> static inline uint64_t
> rte_combine64ms1b(register uint64_t v)
> {
>  v |= v >> 1;
>  v |= v >> 2;
>  v |= v >> 4;
>  v |= v >> 8;
>  v |= v >> 16;
>  v |= v >> 32;
> 
>  return v;
> }
> 
> This causes the compiler to throw an error, because -DALLOW_EXPERIMENTAL_API
> is not added to the cflags.
> 

Are you sure?

I added the following code and the compilation passed:
static inline uint32_t
__attribute__((deprecated("Symbol is not yet part of stable ABI"), \
section(".text.experimental")))
rte_combine32ms1b(register uint32_t x)
{
	x |= x >> 1;
	x |= x >> 2;
	x |= x >> 4;
	x |= x >> 8;
	x |= x >> 16;

	return x;
}

Actually, the combine functions should not be experimental (they are already used in the existing code).
That also saves us from adding the cflag to every lib which uses the old align functions.
Only the new align functions should be tagged.
Then you need to add the cflag only in the places which use these functions.

Am I missing something?

> >
> > > Regards,
> > > Pavan.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 1/4] ethdev: add support for PMD-tuned Tx/Rx parameters
  @ 2018-04-04 18:56  3%       ` De Lara Guarch, Pablo
  2018-04-05 10:16  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: De Lara Guarch, Pablo @ 2018-04-04 18:56 UTC (permalink / raw)
  To: Horton, Remy, dev
  Cc: Mcnamara, John, Lu, Wenzhuo, Wu, Jingjing, Zhang, Qi Z, Xing,
	Beilei, Shreyansh Jain, Thomas Monjalon

Hi Remy,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Remy Horton
> Sent: Wednesday, April 4, 2018 6:18 PM
> To: dev@dpdk.org
> Cc: Mcnamara, John <john.mcnamara@intel.com>; Lu, Wenzhuo
> <wenzhuo.lu@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; Xing, Beilei <beilei.xing@intel.com>; Shreyansh Jain
> <shreyansh.jain@nxp.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: [dpdk-dev] [PATCH v3 1/4] ethdev: add support for PMD-tuned Tx/Rx
> parameters
> 
> The optimal values of several transmission & reception related parameters, such
> as burst sizes, descriptor ring sizes, and number of queues, vary between
> different network interface devices. This patch allows individual PMDs to specify
> preferred parameter values.
> 
> Signed-off-by: Remy Horton <remy.horton@intel.com>
> ---
>  doc/guides/rel_notes/deprecation.rst   | 13 ----------
>  doc/guides/rel_notes/release_18_05.rst |  5 ++++
>  lib/librte_ether/rte_ethdev.c          | 44 +++++++++++++++++++++++++++-------
>  lib/librte_ether/rte_ethdev.h          | 25 +++++++++++++++++++
>  4 files changed, 65 insertions(+), 22 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> index 0c696f7..920df6b 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -115,19 +115,6 @@ Deprecation Notices
>    The new API add rss_level field to ``rte_eth_rss_conf`` to enable a choice
>    of RSS hash calculation on outer or inner header of tunneled packet.
> 
> -* ethdev:  Currently, if the  rte_eth_rx_burst() function returns a value less
> -  than *nb_pkts*, the application will assume that no more packets are present.
> -  Some of the hw queue based hardware can only support smaller burst for RX
> -  and TX and thus break the expectation of the rx_burst API. Similar is the
> -  case for TX burst as well as ring sizes. ``rte_eth_dev_info`` will be added
> -  with following new parameters so as to support semantics for drivers to
> -  define a preferred size for Rx/Tx burst and rings.
> -
> -  - Member ``struct preferred_size`` would be added to enclose all preferred
> -    size to be fetched from driver/implementation.
> -  - Members ``uint16_t rx_burst``,  ``uint16_t tx_burst``, ``uint16_t rx_ring``,
> -    and ``uint16_t tx_ring`` would be added to ``struct preferred_size``.
> -
>  * ethdev: A work is being planned for 18.05 to expose VF port representors
>    as a mean to perform control and data path operation on the different VFs.
>    As VF representor is an ethdev port, new fields are needed in order to map diff

API and ABI changes should be documented in release notes.

Thanks,
Pablo

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 1/2] eal: add API to align integer to previous power of 2
  @ 2018-04-04 18:36  3%                 ` Pavan Nikhilesh
  2018-04-04 19:41  3%                   ` Matan Azrad
  0 siblings, 1 reply; 200+ results
From: Pavan Nikhilesh @ 2018-04-04 18:36 UTC (permalink / raw)
  To: Matan Azrad, jerin.jacob, keith.wiles, Thomas Monjalon; +Cc: dev

On Wed, Apr 04, 2018 at 06:23:19PM +0000, Matan Azrad wrote:
> Hi Pavan
>
> From: Pavan Nikhilesh, Wednesday, April 4, 2018 9:16 PM
> > Hi Matan,
> >
> > >
> > > Got you.
> > > Looks like you found issue here...
> > > The experimental tag probably should be in a root .h file.
> > > Probably, need a fix patch to move it for a different\new .h file.
> > >
> > > What do you think?
> > >
> >
> > Actually that's just the start of the rabbit hole: if we succeed in tagging
> > an inline function in rte_common.h as experimental, every lib/driver that
> > uses rte_common.h (almost everything) needs to have CFLAGS set to
> > -DALLOW_EXPERIMENTAL_API.
> >
>
> Isn't it relevant only for the libs which are using the new tagged APIs?

Static inline functions in .h files are copied into every .c file that
includes them. For example, the preprocessor output for rte_pci.c, which
includes rte_common.h:

# 231 "/home/pavan/Work/clean/dpdk/build/include/rte_common.h"
extern int RTE_BUILD_BUG_ON_detected_error;
# 249 "/home/pavan/Work/clean/dpdk/build/include/rte_common.h"
static inline uint32_t __attribute__((deprecated("Symbol is not yet part of stable ABI"), section(".text.experimental")))
rte_combine32ms1b(register uint32_t x)
{
 x |= x >> 1;
 x |= x >> 2;
 x |= x >> 4;
 x |= x >> 8;
 x |= x >> 16;

 return x;
}
# 271 "/home/pavan/Work/clean/dpdk/build/include/rte_common.h"
static inline uint64_t
rte_combine64ms1b(register uint64_t v)
{
 v |= v >> 1;
 v |= v >> 2;
 v |= v >> 4;
 v |= v >> 8;
 v |= v >> 16;
 v |= v >> 32;

 return v;
}

This causes the compiler to throw an error, because -DALLOW_EXPERIMENTAL_API
is not added to the cflags.

>
> > Regards,
> > Pavan.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4] ethdev: replace bus specific struct with generic dev
  @ 2018-04-04 17:57  3%           ` De Lara Guarch, Pablo
  2018-04-05  9:19  0%             ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: De Lara Guarch, Pablo @ 2018-04-04 17:57 UTC (permalink / raw)
  To: Yigit, Ferruh, David Marchand, santosh
  Cc: dev, Shreyansh Jain, Legacy, Allain (Wind River),
	Tomasz Duszynski, Thomas Monjalon



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Tuesday, April 3, 2018 10:50 AM
> To: David Marchand <david.marchand@6wind.com>; santosh
> <santosh.shukla@caviumnetworks.com>
> Cc: dev@dpdk.org; Shreyansh Jain <shreyansh.jain@nxp.com>; Legacy, Allain
> (Wind River) <allain.legacy@windriver.com>; Tomasz Duszynski
> <tdu@semihalf.com>; Thomas Monjalon <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [PATCH v4] ethdev: replace bus specific struct with
> generic dev
> 
> On 4/3/2018 10:06 AM, David Marchand wrote:
> > On Mon, Apr 2, 2018 at 6:13 PM, santosh
> > <santosh.shukla@caviumnetworks.com> wrote:
> >> On Friday 30 March 2018 08:59 PM, David Marchand wrote:
> >>> I can see we enforce the driver name by putting it after the call to
> >>> .dev_infos_get.
> >>> http://dpdk.org/browse/dpdk/tree/lib/librte_ether/rte_ethdev.c#n2399
> >>>
> >>> octeontx pmd seems to try to do something about it:
> >>> http://dpdk.org/browse/dpdk/tree/drivers/net/octeontx/octeontx_ethde
> >>> v.c#n622
> >>>
> >>> Not sure it does something, might be a thing to cleanup.
> >>>
> >>>
> >> If you're referring to the driver_name update, then indeed it's a
> >> cleanup [1].
> >>
> >> Otherwise, I don't see any issue with v4, or maybe I misunderstood
> >> your comment.
> >
> > I agree there is no fundamental issue.
> >
> >     dev_info->device = dev->device;
> >
> >     RTE_FUNC_PTR_OR_RET(*dev->dev_ops->dev_infos_get);
> >     (*dev->dev_ops->dev_infos_get)(dev, dev_info);
> >     dev_info->driver_name = dev->device->driver->name;
> >
> > If somebody (I mean some pmd out there) has a usecase with
> > dev_info->device != dev->device, why not.
> 
> This intentionally lets drivers update this variable, although I also don't
> see any use case for it.
> 
> This variable was set by PMDs before this patch, so I don't see any reason to be
> so strict here.
> 
> If the driver does not do anything, ethdev will set dev_info->device for it;
> if it wants to overwrite it, for any reason, it has the capability.

Looks good to me. Will do the same for cryptodev and bbdev.
The only thing I find missing here is a documentation update
adding the ABI change to the release notes.

Apart from it:

Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

> 
> >
> > Thomas ?
> >
> >


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 0/4] ethdev: add per-PMD tuning of RxTx parmeters
  2018-03-21 14:27  3% ` [dpdk-dev] [PATCH v2 " Remy Horton
  2018-03-27 18:43  0%   ` Ferruh Yigit
@ 2018-04-04 17:17  3%   ` Remy Horton
    1 sibling, 1 reply; 200+ results
From: Remy Horton @ 2018-04-04 17:17 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Wenzhuo Lu, Jingjing Wu, Qi Zhang, Beilei Xing,
	Shreyansh Jain, Thomas Monjalon

The optimal values of several transmission & reception related parameters,
such as burst sizes, descriptor ring sizes, and number of queues, vary
between different network interface devices. This patchset allows individual
PMDs to specify their preferred parameter values, and if so indicated by an
application, for them to be used automatically by the ethdev layer.

rte_eth_dev_configure() has been changed so that specifying zero for both
nb_rx_q AND nb_tx_q causes it to use driver preferred values, and if these
are not available, falls back to EAL defaults. Setting one (but not both)
to zero does not cause the use of defaults, as having one of them zeroed is
a valid setup.

This patchset includes per-PMD values for e1000 and i40e but it is expected
that subsequent patchsets will cover other PMDs. A deprecation notice
covering the API/ABI change is in place.

Changes in v3:
* Changed formatting around new rte_eth_dev_info fields
* Added Doxygen documentation to struct rte_eth_dev_portconf
* Testpmd "port config all burst 0" and --burst=0 uses PMD 
  Rx burst recommendations.
* Added to release notes
* Rebased to 8ea081f38161

Changes in v2:
* Rebased to master
* Removed fallback values from rte_eth_dev_info_get()
* Added fallback values to rte_eth_[rt]x_queue_setup()
* Added fallback values to rte_eth_dev_configure()
* Corrected comment
* Removed deprecation notice
* Split Rx and Tx into separate structures
* Changed parameter names

Remy Horton (4):
  ethdev: add support for PMD-tuned Tx/Rx parameters
  net/e1000: add TxRx tuning parameters
  net/i40e: add TxRx tuning parameters
  testpmd: make use of per-PMD TxRx parameters

 app/test-pmd/cmdline.c                 | 31 +++++++++++++++++++++---
 app/test-pmd/parameters.c              | 38 +++++++++++++++++++++++++----
 app/test-pmd/testpmd.c                 |  5 ++--
 doc/guides/rel_notes/deprecation.rst   | 13 ----------
 doc/guides/rel_notes/release_18_05.rst |  5 ++++
 drivers/net/e1000/em_ethdev.c          |  6 +++++
 drivers/net/i40e/i40e_ethdev.c         | 33 ++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.c          | 44 +++++++++++++++++++++++++++-------
 lib/librte_ether/rte_ethdev.h          | 25 +++++++++++++++++++
 9 files changed, 165 insertions(+), 35 deletions(-)

-- 
2.9.5
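
The queue-count rule described in the cover letter above can be sketched as follows: only when both requested counts are zero are the driver-preferred values consulted, and a preferred value of zero falls back to a library default. This is a standalone illustration, not the ethdev code; struct port_pref_sketch, the FALLBACK_*_SKETCH constants and resolve_queue_counts() are hypothetical names, and the fallback value of 1 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed library defaults used when the driver states no preference. */
#define FALLBACK_RX_NBQUEUES_SKETCH 1
#define FALLBACK_TX_NBQUEUES_SKETCH 1

/* Hypothetical per-driver preferences; 0 means "no preference". */
struct port_pref_sketch {
	uint16_t nb_rx_q;
	uint16_t nb_tx_q;
};

/*
 * Resolve the queue counts requested by the application.
 * If either count is non-zero the request is taken as-is, because having
 * exactly one of them zeroed is a valid setup.
 */
static inline void
resolve_queue_counts(const struct port_pref_sketch *pref,
		     uint16_t *nb_rx_q, uint16_t *nb_tx_q)
{
	if (*nb_rx_q != 0 || *nb_tx_q != 0)
		return;

	*nb_rx_q = pref->nb_rx_q != 0 ?
			pref->nb_rx_q : FALLBACK_RX_NBQUEUES_SKETCH;
	*nb_tx_q = pref->nb_tx_q != 0 ?
			pref->nb_tx_q : FALLBACK_TX_NBQUEUES_SKETCH;
}
```

For instance, a request of (0, 0) against preferences (4, 0) resolves to (4, 1), while a request of (0, 2) is left untouched.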

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 02/10] crypto/virtio: support virtio device init
  @ 2018-04-04 17:03  1%   ` Jay Zhou
  0 siblings, 0 replies; 200+ results
From: Jay Zhou @ 2018-04-04 17:03 UTC (permalink / raw)
  To: dev
  Cc: pablo.de.lara.guarch, roy.fan.zhang, thomas, arei.gonglei,
	xin.zeng, weidong.huang, wangxinxin.wang, longpeng2,
	jianjay.zhou

This patch implements the initialization of the virtio crypto device.
The virtio crypto device conforms to virtio-1.0, so this patch only
supports modern mode operation.
The cryptodev is created at the virtio crypto pci device probing stage.
virtio_crypto_pkt_tx_burst() transmits packets in bursts and
virtio_crypto_pkt_rx_burst() receives packets in bursts.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 drivers/crypto/virtio/Makefile           |   3 +
 drivers/crypto/virtio/virtio_cryptodev.c | 247 ++++++++++++++++-
 drivers/crypto/virtio/virtio_cryptodev.h |  13 +
 drivers/crypto/virtio/virtio_logs.h      |  91 ++++++
 drivers/crypto/virtio/virtio_pci.c       | 460 +++++++++++++++++++++++++++++++
 drivers/crypto/virtio/virtio_pci.h       | 253 +++++++++++++++++
 drivers/crypto/virtio/virtio_ring.h      | 137 +++++++++
 drivers/crypto/virtio/virtio_rxtx.c      |  26 ++
 drivers/crypto/virtio/virtqueue.c        |  43 +++
 drivers/crypto/virtio/virtqueue.h        | 172 ++++++++++++
 10 files changed, 1442 insertions(+), 3 deletions(-)
 create mode 100644 drivers/crypto/virtio/virtio_logs.h
 create mode 100644 drivers/crypto/virtio/virtio_pci.c
 create mode 100644 drivers/crypto/virtio/virtio_pci.h
 create mode 100644 drivers/crypto/virtio/virtio_ring.h
 create mode 100644 drivers/crypto/virtio/virtio_rxtx.c
 create mode 100644 drivers/crypto/virtio/virtqueue.c
 create mode 100644 drivers/crypto/virtio/virtqueue.h

diff --git a/drivers/crypto/virtio/Makefile b/drivers/crypto/virtio/Makefile
index a3b44e9..c4727ea 100644
--- a/drivers/crypto/virtio/Makefile
+++ b/drivers/crypto/virtio/Makefile
@@ -18,6 +18,9 @@ LIBABIVER := 1
 #
 # all source are stored in SRCS-y
 #
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO) += virtqueue.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO) += virtio_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO) += virtio_cryptodev.c
 
 # this lib depends upon:
diff --git a/drivers/crypto/virtio/virtio_cryptodev.c b/drivers/crypto/virtio/virtio_cryptodev.c
index 84aff58..4550834 100644
--- a/drivers/crypto/virtio/virtio_cryptodev.c
+++ b/drivers/crypto/virtio/virtio_cryptodev.c
@@ -3,25 +3,238 @@
  */
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_eal.h>
 #include "virtio_cryptodev.h"
+#include "virtqueue.h"
+
+int virtio_crypto_logtype_init;
+int virtio_crypto_logtype_session;
+int virtio_crypto_logtype_rx;
+int virtio_crypto_logtype_tx;
+int virtio_crypto_logtype_driver;
+
+/*
+ * The set of PCI devices this driver supports
+ */
+static const struct rte_pci_id pci_id_virtio_crypto_map[] = {
+	{ RTE_PCI_DEVICE(VIRTIO_CRYPTO_PCI_VENDORID,
+				VIRTIO_CRYPTO_PCI_DEVICEID) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
 
 uint8_t cryptodev_virtio_driver_id;
 
+/*
+ * dev_ops for virtio, bare necessities for basic operation
+ */
+static struct rte_cryptodev_ops virtio_crypto_dev_ops = {
+	/* Device related operations */
+	.dev_configure			 = NULL,
+	.dev_start			 = NULL,
+	.dev_stop			 = NULL,
+	.dev_close			 = NULL,
+	.dev_infos_get			 = NULL,
+
+	.stats_get			 = NULL,
+	.stats_reset			 = NULL,
+
+	.queue_pair_setup                = NULL,
+	.queue_pair_release              = NULL,
+	.queue_pair_start                = NULL,
+	.queue_pair_stop                 = NULL,
+	.queue_pair_count                = NULL,
+
+	/* Crypto related operations */
+	.session_get_size	= NULL,
+	.session_configure	= NULL,
+	.session_clear		= NULL,
+	.qp_attach_session = NULL,
+	.qp_detach_session = NULL
+};
+
+static int
+virtio_negotiate_features(struct virtio_crypto_hw *hw, uint64_t req_features)
+{
+	uint64_t host_features;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/* Prepare guest_features: feature that driver wants to support */
+	VIRTIO_CRYPTO_INIT_LOG_DBG("guest_features before negotiate = %" PRIx64,
+		req_features);
+
+	/* Read device(host) feature bits */
+	host_features = VTPCI_OPS(hw)->get_features(hw);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("host_features before negotiate = %" PRIx64,
+		host_features);
+
+	/*
+	 * Negotiate features: the subset of device feature bits that the driver
+	 * also requested is written back as the guest feature bits.
+	 */
+	hw->guest_features = req_features;
+	hw->guest_features = vtpci_cryptodev_negotiate_features(hw,
+							host_features);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("features after negotiate = %" PRIx64,
+		hw->guest_features);
+
+	if (hw->modern) {
+		if (!vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) {
+			VIRTIO_CRYPTO_INIT_LOG_ERR(
+				"VIRTIO_F_VERSION_1 features is not enabled.");
+			return -1;
+		}
+		vtpci_cryptodev_set_status(hw,
+			VIRTIO_CONFIG_STATUS_FEATURES_OK);
+		if (!(vtpci_cryptodev_get_status(hw) &
+			VIRTIO_CONFIG_STATUS_FEATURES_OK)) {
+			VIRTIO_CRYPTO_INIT_LOG_ERR("failed to set FEATURES_OK "
+						"status!");
+			return -1;
+		}
+	}
+
+	hw->req_guest_features = req_features;
+
+	return 0;
+}
+
+/* reset device and renegotiate features if needed */
+static int
+virtio_crypto_init_device(struct rte_cryptodev *cryptodev,
+	uint64_t req_features)
+{
+	struct virtio_crypto_hw *hw = cryptodev->data->dev_private;
+	struct virtio_crypto_config local_config;
+	struct virtio_crypto_config *config = &local_config;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/* Reset the device although not necessary at startup */
+	vtpci_cryptodev_reset(hw);
+
+	/* Tell the host we've noticed this device. */
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+
+	/* Tell the host we know how to drive the device. */
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
+	if (virtio_negotiate_features(hw, req_features) < 0)
+		return -1;
+
+	/* Get status of the device */
+	vtpci_read_cryptodev_config(hw,
+		offsetof(struct virtio_crypto_config, status),
+		&config->status, sizeof(config->status));
+	if (config->status != VIRTIO_CRYPTO_S_HW_READY) {
+		VIRTIO_CRYPTO_DRV_LOG_ERR("accelerator hardware is "
+				"not ready");
+		return -1;
+	}
+
+	/* Get number of data queues */
+	vtpci_read_cryptodev_config(hw,
+		offsetof(struct virtio_crypto_config, max_dataqueues),
+		&config->max_dataqueues,
+		sizeof(config->max_dataqueues));
+	hw->max_dataqueues = config->max_dataqueues;
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("hw->max_dataqueues=%d",
+		hw->max_dataqueues);
+
+	return 0;
+}
+
+/*
+ * This function is based on probe() function
+ * It returns 0 on success.
+ */
+static int
+crypto_virtio_create(const char *name, struct rte_pci_device *pci_dev,
+		struct rte_cryptodev_pmd_init_params *init_params)
+{
+	struct rte_cryptodev *cryptodev;
+	struct virtio_crypto_hw *hw;
+
+	PMD_INIT_FUNC_TRACE();
+
+	cryptodev = rte_cryptodev_pmd_create(name, &pci_dev->device,
+					init_params);
+	if (cryptodev == NULL)
+		return -ENODEV;
+
+	cryptodev->driver_id = cryptodev_virtio_driver_id;
+	cryptodev->dev_ops = &virtio_crypto_dev_ops;
+
+	cryptodev->enqueue_burst = virtio_crypto_pkt_tx_burst;
+	cryptodev->dequeue_burst = virtio_crypto_pkt_rx_burst;
+
+	cryptodev->feature_flags = RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO |
+		RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING;
+
+	hw = cryptodev->data->dev_private;
+	hw->dev_id = cryptodev->data->dev_id;
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("dev %d vendorID=0x%x deviceID=0x%x",
+		cryptodev->data->dev_id, pci_dev->id.vendor_id,
+		pci_dev->id.device_id);
+
+	/* pci device init */
+	if (vtpci_cryptodev_init(pci_dev, hw))
+		return -1;
+
+	if (virtio_crypto_init_device(cryptodev,
+			VIRTIO_CRYPTO_PMD_GUEST_FEATURES) < 0)
+		return -1;
+
+	return 0;
+}
+
 static int crypto_virtio_pci_probe(
 	struct rte_pci_driver *pci_drv __rte_unused,
-	struct rte_pci_device *pci_dev __rte_unused)
+	struct rte_pci_device *pci_dev)
 {
-	return 0;
+	struct rte_cryptodev_pmd_init_params init_params = {
+		.name = "",
+		.socket_id = rte_socket_id(),
+		.private_data_size = sizeof(struct virtio_crypto_hw),
+		.max_nb_sessions = RTE_VIRTIO_CRYPTO_PMD_MAX_NB_SESSIONS
+	};
+	char name[RTE_CRYPTODEV_NAME_MAX_LEN];
+
+	VIRTIO_CRYPTO_DRV_LOG_DBG("Found Crypto device at %02x:%02x.%x",
+			pci_dev->addr.bus,
+			pci_dev->addr.devid,
+			pci_dev->addr.function);
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+
+	return crypto_virtio_create(name, pci_dev, &init_params);
 }
 
 static int crypto_virtio_pci_remove(
-	struct rte_pci_device *pci_dev __rte_unused)
+	struct rte_pci_device *pci_dev)
 {
+	struct rte_cryptodev *cryptodev;
+	char cryptodev_name[RTE_CRYPTODEV_NAME_MAX_LEN];
+
+	if (pci_dev == NULL)
+		return -EINVAL;
+
+	rte_pci_device_name(&pci_dev->addr, cryptodev_name,
+			sizeof(cryptodev_name));
+
+	cryptodev = rte_cryptodev_pmd_get_named_dev(cryptodev_name);
+	if (cryptodev == NULL)
+		return -ENODEV;
+
 	return 0;
 }
 
 static struct rte_pci_driver rte_virtio_crypto_driver = {
+	.id_table = pci_id_virtio_crypto_map,
+	.drv_flags = 0,
 	.probe = crypto_virtio_pci_probe,
 	.remove = crypto_virtio_pci_remove
 };
@@ -32,3 +245,31 @@ static int crypto_virtio_pci_remove(
 RTE_PMD_REGISTER_CRYPTO_DRIVER(virtio_crypto_drv,
 	rte_virtio_crypto_driver.driver,
 	cryptodev_virtio_driver_id);
+
+RTE_INIT(virtio_crypto_init_log);
+static void
+virtio_crypto_init_log(void)
+{
+	virtio_crypto_logtype_init = rte_log_register("pmd.crypto.virtio.init");
+	if (virtio_crypto_logtype_init >= 0)
+		rte_log_set_level(virtio_crypto_logtype_init, RTE_LOG_NOTICE);
+
+	virtio_crypto_logtype_session =
+		rte_log_register("pmd.crypto.virtio.session");
+	if (virtio_crypto_logtype_session >= 0)
+		rte_log_set_level(virtio_crypto_logtype_session,
+				RTE_LOG_NOTICE);
+
+	virtio_crypto_logtype_rx = rte_log_register("pmd.crypto.virtio.rx");
+	if (virtio_crypto_logtype_rx >= 0)
+		rte_log_set_level(virtio_crypto_logtype_rx, RTE_LOG_NOTICE);
+
+	virtio_crypto_logtype_tx = rte_log_register("pmd.crypto.virtio.tx");
+	if (virtio_crypto_logtype_tx >= 0)
+		rte_log_set_level(virtio_crypto_logtype_tx, RTE_LOG_NOTICE);
+
+	virtio_crypto_logtype_driver =
+		rte_log_register("pmd.crypto.virtio.driver");
+	if (virtio_crypto_logtype_driver >= 0)
+		rte_log_set_level(virtio_crypto_logtype_driver, RTE_LOG_NOTICE);
+}
diff --git a/drivers/crypto/virtio/virtio_cryptodev.h b/drivers/crypto/virtio/virtio_cryptodev.h
index 44517b8..392db4a 100644
--- a/drivers/crypto/virtio/virtio_cryptodev.h
+++ b/drivers/crypto/virtio/virtio_cryptodev.h
@@ -5,6 +5,19 @@
 #ifndef _VIRTIO_CRYPTODEV_H_
 #define _VIRTIO_CRYPTODEV_H_
 
+#include <rte_cryptodev.h>
+
+/* Features desired/implemented by this driver. */
+#define VIRTIO_CRYPTO_PMD_GUEST_FEATURES (1ULL << VIRTIO_F_VERSION_1)
+
 #define CRYPTODEV_NAME_VIRTIO_PMD crypto_virtio
 
+uint16_t virtio_crypto_pkt_tx_burst(void *tx_queue,
+		struct rte_crypto_op **tx_pkts,
+		uint16_t nb_pkts);
+
+uint16_t virtio_crypto_pkt_rx_burst(void *rx_queue,
+		struct rte_crypto_op **rx_pkts,
+		uint16_t nb_pkts);
+
 #endif /* _VIRTIO_CRYPTODEV_H_ */
diff --git a/drivers/crypto/virtio/virtio_logs.h b/drivers/crypto/virtio/virtio_logs.h
new file mode 100644
index 0000000..26a286c
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_logs.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#define PMD_INIT_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, RTE_LOGTYPE_PMD, \
+		"PMD: %s(): " fmt "\n", __func__, ##args)
+
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+
+extern int virtio_crypto_logtype_init;
+
+#define VIRTIO_CRYPTO_INIT_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_init, \
+		"INIT: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_session;
+
+#define VIRTIO_CRYPTO_SESSION_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_session, \
+		"SESSION: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_rx;
+
+#define VIRTIO_CRYPTO_RX_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_rx, \
+		"RX: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_RX_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_RX_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_RX_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_tx;
+
+#define VIRTIO_CRYPTO_TX_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_tx, \
+		"TX: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_TX_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_TX_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_TX_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_driver;
+
+#define VIRTIO_CRYPTO_DRV_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_driver, \
+		"DRIVER: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(ERR, fmt, ## args)
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
new file mode 100644
index 0000000..43ec1a4
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -0,0 +1,460 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * The following macros are derived from linux/pci_regs.h. However,
+ * we can't simply include that header here, as there is no such
+ * file on non-Linux platforms.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* Virtio PCI device VIRTIO_PCI_QUEUE_PFN register is 32bit,
+	 * and only accepts 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("vring address shouldn't be above 16TB!");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_crypto_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_crypto_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
+		uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("queue %u addresses:", vq->vq_queue_index);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t desc_addr: %" PRIx64, desc_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t avail_addr: %" PRIx64, avail_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t used_addr: %" PRIx64, used_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t notify addr: %p (notify offset: %u)",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
+		struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_crypto_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+void
+vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+		uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+vtpci_cryptodev_reset(struct virtio_crypto_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw)
+{
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+vtpci_cryptodev_isr(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("invalid bar: %u", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("offset(%u) + length(%u) overflows",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR(
+			"invalid cap: overflows bar space: %u > %" PRIu64,
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("bar %u base addr is NULL", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		VIRTIO_CRYPTO_INIT_LOG_DBG("failed to map pci device!");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		VIRTIO_CRYPTO_INIT_LOG_DBG("failed to read pci capability list");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			VIRTIO_CRYPTO_INIT_LOG_ERR(
+				"failed to read pci cap at pos: %x", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			VIRTIO_CRYPTO_INIT_LOG_DBG(
+				"[%2x] skipping non VNDR cap id: %02x",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		VIRTIO_CRYPTO_INIT_LOG_DBG(
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		VIRTIO_CRYPTO_INIT_LOG_INFO("no modern virtio pci device found.");
+		return -1;
+	}
+
+	VIRTIO_CRYPTO_INIT_LOG_INFO("found modern virtio pci device.");
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("common cfg mapped at: %p", hw->common_cfg);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("device cfg mapped at: %p", hw->dev_cfg);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("isr cfg mapped at: %p", hw->isr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("notify base: %p, notify off multiplier: %u",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+/*
+ * Return -1:
+ *   if mapping the PCI device fails.
+ *   if the PCI capability list cannot be read.
+ *   if no modern (virtio 1.0) PCI capabilities are found.
+ * Legacy devices are not supported.
+ * Return 0 on success.
+ */
+int
+vtpci_cryptodev_init(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	/*
+	 * Try to read the virtio PCI capabilities, which exist only
+	 * on modern PCI devices. There is no legacy fallback here:
+	 * if this fails, an error is returned below.
+	 */
+	if (virtio_read_caps(dev, hw) == 0) {
+		VIRTIO_CRYPTO_INIT_LOG_INFO("modern virtio pci detected.");
+		virtio_hw_internal[hw->dev_id].vtpci_ops =
+					&virtio_crypto_modern_ops;
+		hw->modern = 1;
+		return 0;
+	}
+
+	/*
+	 * virtio crypto conforms to virtio 1.0 and doesn't support
+	 * legacy mode
+	 */
+	return -1;
+}
diff --git a/drivers/crypto/virtio/virtio_pci.h b/drivers/crypto/virtio/virtio_pci.h
new file mode 100644
index 0000000..cd316a6
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_CRYPTO_PCI_VENDORID 0x1AF4
+#define VIRTIO_CRYPTO_PCI_DEVICEID 0x1054
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR            19 /* interrupt status register, reading
+				      * also clears the register (8, RO)
+				      */
+/* Only if MSIX is enabled: */
+
+/* configuration change vector (16, RW) */
+#define VIRTIO_MSI_CONFIG_VECTOR  20
+/* vector for selected VQ notifications */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them?
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;	/* Generic PCI field: next ptr. */
+	uint8_t cap_len;	/* Generic PCI field: capability length */
+	uint8_t cfg_type;	/* Identifies the structure. */
+	uint8_t bar;		/* Where to find it. */
+	uint8_t padding[3];	/* Pad to full dword. */
+	uint32_t offset;	/* Offset within bar. */
+	uint32_t length;	/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_crypto_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_crypto_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_crypto_hw *hw);
+	void (*set_status)(struct virtio_crypto_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_crypto_hw *hw);
+	void (*set_features)(struct virtio_crypto_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_crypto_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_crypto_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_crypto_hw *hw,
+			struct virtqueue *vq, uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_crypto_hw *hw,
+			uint16_t queue_id);
+	int (*setup_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_crypto_hw {
+	/* control queue */
+	struct virtqueue *cvq;
+	uint16_t    dev_id;
+	uint16_t    max_dataqueues;
+	uint64_t    req_guest_features;
+	uint64_t    guest_features;
+	uint8_t	    use_msix;
+	uint8_t     modern;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	struct virtio_crypto_config *dev_cfg;
+	const struct rte_cryptodev_capabilities *virtio_dev_capabilities;
+};
+
+/*
+ * While virtio_crypto_hw is stored in shared memory, this structure holds
+ * per-process information that may differ in the multi-process model.
+ * For example, the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+	struct rte_pci_ioport io;
+};
+
+#define VTPCI_OPS(hw)	(virtio_hw_internal[(hw)->dev_id].vtpci_ops)
+#define VTPCI_IO(hw)	(&virtio_hw_internal[(hw)->dev_id].io)
+
+extern struct virtio_hw_internal virtio_hw_internal[RTE_MAX_VIRTIO_CRYPTO];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+vtpci_with_feature(struct virtio_crypto_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Function declaration from virtio_pci.c
+ */
+int vtpci_cryptodev_init(struct rte_pci_device *dev,
+	struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_reset(struct virtio_crypto_hw *hw);
+
+void vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw);
+
+uint8_t vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status);
+
+uint64_t vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+	uint64_t host_features);
+
+void vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	const void *src, int length);
+
+void vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	void *dst, int length);
+
+uint8_t vtpci_cryptodev_isr(struct virtio_crypto_hw *hw);
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/crypto/virtio/virtio_ring.h b/drivers/crypto/virtio/virtio_ring.h
new file mode 100644
index 0000000..ee30674
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_ring.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers.
+ */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.
+ */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next".
+ */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	volatile uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline size_t
+vring_size(unsigned int num, unsigned long align)
+{
+	size_t size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other side, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
diff --git a/drivers/crypto/virtio/virtio_rxtx.c b/drivers/crypto/virtio/virtio_rxtx.c
new file mode 100644
index 0000000..51f6e09
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_rxtx.c
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+#include "virtio_cryptodev.h"
+
+uint16_t
+virtio_crypto_pkt_rx_burst(
+	void *rx_queue __rte_unused,
+	struct rte_crypto_op **rx_pkts __rte_unused,
+	uint16_t nb_pkts __rte_unused)
+{
+	uint16_t nb_rx = 0;
+
+	return nb_rx;
+}
+
+uint16_t
+virtio_crypto_pkt_tx_burst(
+	void *tx_queue __rte_unused,
+	struct rte_crypto_op **tx_pkts __rte_unused,
+	uint16_t nb_pkts __rte_unused)
+{
+	uint16_t nb_tx = 0;
+
+	return nb_tx;
+}
diff --git a/drivers/crypto/virtio/virtqueue.c b/drivers/crypto/virtio/virtqueue.c
new file mode 100644
index 0000000..fd8be58
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+#include <rte_malloc.h>
+
+#include "virtqueue.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint the host
+	 * not to interrupt when it consumes packets.
+	 * Note: this is only considered a hint to the host.
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+void
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_crypto_op *cop = NULL;
+
+	int idx;
+
+	if (vq != NULL)
+		for (idx = 0; idx < vq->vq_nentries; idx++) {
+			cop = vq->vq_descx[idx].crypto_op;
+			if (cop) {
+				if (cop->sym->m_src)
+					rte_pktmbuf_free(cop->sym->m_src);
+				if (cop->sym->m_dst)
+					rte_pktmbuf_free(cop->sym->m_dst);
+				rte_crypto_op_free(cop);
+				vq->vq_descx[idx].crypto_op = NULL;
+			}
+		}
+}
diff --git a/drivers/crypto/virtio/virtqueue.h b/drivers/crypto/virtio/virtqueue.h
new file mode 100644
index 0000000..0a9bddb
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.h
@@ -0,0 +1,172 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+enum { VTCRYPTO_DATAQ = 0, VTCRYPTO_CTRLQ = 1 };
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void     *crypto_op;
+	void     *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	/**< virtio_crypto_hw structure pointer. */
+	struct virtio_crypto_hw *hw;
+	/**< mem zone to populate RX ring. */
+	const struct rte_memzone *mz;
+	/**< mempool to populate hdr and request. */
+	struct rte_mempool *mpool;
+	uint8_t     dev_id;              /**< Device identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring */
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+
+	/* Statistics */
+	uint64_t	packets_sent_total;
+	uint64_t	packets_sent_failed;
+	uint64_t	packets_received_total;
+	uint64_t	packets_received_failed;
+
+	uint16_t  *notify_addr;
+
+	struct vq_desc_extra vq_descx[0];
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+
+/**
+ *  Free the crypto ops and mbufs still attached to unused descriptors.
+ */
+void virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) \
+	((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notification is through an I/O port operation
+	 * which is itself a serializing instruction.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+/**
+ * Dump virtqueue internal structures, for debugging purposes only.
+ */
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	VIRTIO_CRYPTO_INIT_LOG_DBG(\
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+
+#endif /* _VIRTQUEUE_H_ */
-- 
1.8.3.1
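
The index handling in the patch above (vring_need_event(), VIRTQUEUE_NUSED()
and the slot masking in vq_update_avail_ring()) relies on unsigned 16-bit
wraparound. The following is a minimal standalone sketch of that arithmetic,
not part of the patch; the demo_* names are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same expression as vring_need_event() in the patch: nonzero when
 * event_idx lies in the window [old, new_idx), taken modulo 2^16.
 */
static inline int
demo_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/*
 * Mirrors VIRTQUEUE_NUSED(): entries published in the used ring but not
 * yet consumed. The cast keeps the result correct when used_idx has
 * wrapped past 65535 and used_cons_idx has not.
 */
static inline uint16_t
demo_nused(uint16_t used_idx, uint16_t used_cons_idx)
{
	return (uint16_t)(used_idx - used_cons_idx);
}

/*
 * Mirrors the masking in vq_update_avail_ring(): map a free-running
 * index to a ring slot. Only valid when nentries is a power of two.
 */
static inline uint16_t
demo_ring_slot(uint16_t avail_idx, uint16_t nentries)
{
	assert(nentries != 0 && (nentries & (nentries - 1)) == 0);
	return (uint16_t)(avail_idx & (nentries - 1));
}
```

Because every subtraction is truncated to 16 bits, the indices can run freely
and wrap without any special casing; the mask is the only place where the
ring size matters.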


* [dpdk-dev] [PATCH v1 10/16] ethdev: add encap level to RSS flow API action
  2018-04-04 15:56  4% [dpdk-dev] [PATCH v1 00/16] Flow API overhaul for switch offloads Adrien Mazarguil
  2018-04-04 15:56  7% ` [dpdk-dev] [PATCH v1 01/16] ethdev: update ABI for flow API functions Adrien Mazarguil
  2018-04-04 15:56  3% ` [dpdk-dev] [PATCH v1 05/16] ethdev: remove DUP action from flow API Adrien Mazarguil
@ 2018-04-04 15:56  2% ` Adrien Mazarguil
  2 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-04-04 15:56 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, dev
  Cc: Xueming Li, Wenzhuo Lu, Jingjing Wu, Beilei Xing, Qi Zhang,
	Konstantin Ananyev, Nelio Laranjeiro, Yongseok Koh,
	Andrew Rybchenko, Pascal Mazon

RSS hash types (ETH_RSS_* macros defined in rte_ethdev.h) describe the
protocol header fields of a packet that must be taken into account while
computing RSS.

When facing encapsulated (e.g. tunneled) packets, there is an ambiguity as
to whether these should apply to inner or outer packets. Applications need
the ability to tell exactly "where" RSS must be performed.

This is addressed by adding encapsulation level information to the RSS flow
action. Its default value is 0 and stands for the usual unspecified
behavior. Other values provide a specific encapsulation level.

Contrary to the change announced by commit 676b605182a5 ("doc: announce
ethdev API change for RSS configuration"), this patch does not affect
struct rte_eth_rss_conf but struct rte_flow_action_rss as the former is not
used anymore by the RSS flow action. ABI impact is therefore limited to
rte_flow.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Cc: Xueming Li <xuemingl@mellanox.com>
Cc: Ferruh Yigit <ferruh.yigit@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Wenzhuo Lu <wenzhuo.lu@intel.com>
Cc: Jingjing Wu <jingjing.wu@intel.com>
Cc: Beilei Xing <beilei.xing@intel.com>
Cc: Qi Zhang <qi.z.zhang@intel.com>
Cc: Konstantin Ananyev <konstantin.ananyev@intel.com>
Cc: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Cc: Yongseok Koh <yskoh@mellanox.com>
Cc: Andrew Rybchenko <arybchenko@solarflare.com>
Cc: Pascal Mazon <pascal.mazon@6wind.com>
---
 app/test-pmd/cmdline_flow.c                 | 13 ++++++++++++
 app/test-pmd/config.c                       |  1 +
 doc/guides/prog_guide/rte_flow.rst          | 24 ++++++++++++++++++++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  2 ++
 drivers/net/e1000/igb_flow.c                |  4 ++++
 drivers/net/e1000/igb_rxtx.c                |  2 ++
 drivers/net/i40e/i40e_ethdev.c              |  2 ++
 drivers/net/i40e/i40e_flow.c                |  4 ++++
 drivers/net/ixgbe/ixgbe_flow.c              |  4 ++++
 drivers/net/ixgbe/ixgbe_rxtx.c              |  2 ++
 drivers/net/mlx4/mlx4_flow.c                |  6 ++++++
 drivers/net/mlx5/mlx5_flow.c                | 11 ++++++++++
 drivers/net/sfc/sfc_flow.c                  |  3 +++
 drivers/net/tap/tap_flow.c                  |  6 +++++-
 lib/librte_ether/rte_flow.c                 |  1 +
 lib/librte_ether/rte_flow.h                 | 26 ++++++++++++++++++++++++
 16 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 23e10d623..2fbd3d8ef 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -167,6 +167,7 @@ enum index {
 	ACTION_COUNT,
 	ACTION_RSS,
 	ACTION_RSS_FUNC,
+	ACTION_RSS_LEVEL,
 	ACTION_RSS_FUNC_DEFAULT,
 	ACTION_RSS_FUNC_TOEPLITZ,
 	ACTION_RSS_FUNC_SIMPLE_XOR,
@@ -638,6 +639,7 @@ static const enum index action_queue[] = {
 
 static const enum index action_rss[] = {
 	ACTION_RSS_FUNC,
+	ACTION_RSS_LEVEL,
 	ACTION_RSS_TYPES,
 	ACTION_RSS_KEY,
 	ACTION_RSS_KEY_LEN,
@@ -1616,6 +1618,16 @@ static const struct token token_list[] = {
 		.help = "simple XOR hash function",
 		.call = parse_vc_action_rss_func,
 	},
+	[ACTION_RSS_LEVEL] = {
+		.name = "level",
+		.help = "encapsulation level for \"types\"",
+		.next = NEXT(action_rss, NEXT_ENTRY(UNSIGNED)),
+		.args = ARGS(ARGS_ENTRY_ARB
+			     (offsetof(struct action_rss_data, conf) +
+			      offsetof(struct rte_flow_action_rss, level),
+			      sizeof(((struct rte_flow_action_rss *)0)->
+				     level))),
+	},
 	[ACTION_RSS_TYPES] = {
 		.name = "types",
 		.help = "RSS hash types",
@@ -2107,6 +2119,7 @@ parse_vc_action_rss(struct context *ctx, const struct token *token,
 	*action_rss_data = (struct action_rss_data){
 		.conf = (struct rte_flow_action_rss){
 			.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+			.level = 0,
 			.types = rss_hf,
 			.key_len = sizeof(action_rss_data->key),
 			.queue_num = RTE_MIN(nb_rxq, ACTION_RSS_QUEUE_NUM),
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index b258c93e8..c0fefe475 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1085,6 +1085,7 @@ flow_action_conf_copy(void *buf, const struct rte_flow_action *action)
 		if (dst.rss)
 			*dst.rss = (struct rte_flow_action_rss){
 				.func = src.rss->func,
+				.level = src.rss->level,
 				.types = src.rss->types,
 				.key_len = src.rss->key_len,
 				.queue_num = src.rss->queue_num,
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 6261233bc..c893d737a 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1304,6 +1304,28 @@ Note: RSS hash result is stored in the ``hash.rss`` mbuf field which
 overlaps ``hash.fdir.lo``. Since `Action: MARK`_ sets the ``hash.fdir.hi``
 field only, both can be requested simultaneously.
 
+Also, regarding packet encapsulation ``level``:
+
+- ``0`` requests the default behavior. Depending on the packet type, it can
+  mean outermost, innermost, anything in between or even no RSS.
+
+  It basically stands for the innermost encapsulation level RSS can be
+  performed on according to PMD and device capabilities.
+
+- ``1`` requests RSS to be performed on the outermost packet encapsulation
+  level.
+
+- ``2`` and subsequent values request RSS to be performed on the specified
+  inner packet encapsulation level, from outermost to innermost (lower to
+  higher values).
+
+Values other than ``0`` are not necessarily supported.
+
+Requesting a specific RSS level on unrecognized traffic results in undefined
+behavior. For predictable results, it is recommended to make the flow rule
+pattern match packet headers up to the requested encapsulation level so that
+only matching traffic goes through.
+
 .. _table_rte_flow_action_rss:
 
 .. table:: RSS
@@ -1313,6 +1335,8 @@ field only, both can be requested simultaneously.
    +===============+====================================+
    | ``func``      | RSS hash function to apply         |
    +---------------+------------------------------------+
+   | ``level``     | encapsulation level for ``types``  |
+   +---------------+------------------------------------+
    | ``types``     | RSS hash types (see ``ETH_RSS_*``) |
    +---------------+------------------------------------+
    | ``key_len``   | hash key length in bytes           |
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index d9d68ad9b..738461f44 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3401,6 +3401,8 @@ This section lists supported actions and their attributes, if any.
   - ``func {hash function}``: RSS hash function to apply, allowed tokens are
     the same as `set_hash_global_config`_.
 
+  - ``level {unsigned}``: encapsulation level for ``types``.
+
   - ``types [{RSS hash type} [...]] end``: RSS hash types, allowed tokens
     are the same as `set_hash_input_set`_, an empty list means none (0).
 
diff --git a/drivers/net/e1000/igb_flow.c b/drivers/net/e1000/igb_flow.c
index 747c524f5..13f6f2a28 100644
--- a/drivers/net/e1000/igb_flow.c
+++ b/drivers/net/e1000/igb_flow.c
@@ -1314,6 +1314,10 @@ igb_parse_rss_filter(struct rte_eth_dev *dev,
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, act,
 			 "non-default RSS hash functions are not supported");
+	if (rss->level)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, act,
+			 "a nonzero RSS encapsulation level is not supported");
 	if (rss->key_len && rss->key_len != RTE_DIM(rss_conf->key))
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, act,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 18367f443..80407e6bb 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -2767,6 +2767,7 @@ igb_rss_conf_init(struct igb_rte_flow_rss_conf *out,
 		return -EINVAL;
 	out->conf = (struct rte_flow_action_rss){
 		.func = in->func,
+		.level = in->level,
 		.types = in->types,
 		.key_len = in->key_len,
 		.queue_num = in->queue_num,
@@ -2782,6 +2783,7 @@ igb_action_rss_same(const struct rte_flow_action_rss *comp,
 		    const struct rte_flow_action_rss *with)
 {
 	return (comp->func == with->func &&
+		comp->level == with->level &&
 		comp->types == with->types &&
 		comp->key_len == with->key_len &&
 		comp->queue_num == with->queue_num &&
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index c503d7de2..8f47039a8 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -11963,6 +11963,7 @@ i40e_rss_conf_init(struct i40e_rte_flow_rss_conf *out,
 		return -EINVAL;
 	out->conf = (struct rte_flow_action_rss){
 		.func = in->func,
+		.level = in->level,
 		.types = in->types,
 		.key_len = in->key_len,
 		.queue_num = in->queue_num,
@@ -11978,6 +11979,7 @@ i40e_action_rss_same(const struct rte_flow_action_rss *comp,
 		     const struct rte_flow_action_rss *with)
 {
 	return (comp->func == with->func &&
+		comp->level == with->level &&
 		comp->types == with->types &&
 		comp->key_len == with->key_len &&
 		comp->queue_num == with->queue_num &&
diff --git a/drivers/net/i40e/i40e_flow.c b/drivers/net/i40e/i40e_flow.c
index 65ee27917..1b336df74 100644
--- a/drivers/net/i40e/i40e_flow.c
+++ b/drivers/net/i40e/i40e_flow.c
@@ -4328,6 +4328,10 @@ i40e_flow_parse_rss_action(struct rte_eth_dev *dev,
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, act,
 			 "non-default RSS hash functions are not supported");
+	if (rss->level)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, act,
+			 "a nonzero RSS encapsulation level is not supported");
 	if (rss->key_len && rss->key_len > RTE_DIM(rss_config->key))
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, act,
diff --git a/drivers/net/ixgbe/ixgbe_flow.c b/drivers/net/ixgbe/ixgbe_flow.c
index 10056a0f7..67d22b382 100644
--- a/drivers/net/ixgbe/ixgbe_flow.c
+++ b/drivers/net/ixgbe/ixgbe_flow.c
@@ -2783,6 +2783,10 @@ ixgbe_parse_rss_filter(struct rte_eth_dev *dev,
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, act,
 			 "non-default RSS hash functions are not supported");
+	if (rss->level)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, act,
+			 "a nonzero RSS encapsulation level is not supported");
 	if (rss->key_len && rss->key_len != RTE_DIM(rss_conf->key))
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION, act,
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 4f46eeb2b..4697ff0c0 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -5530,6 +5530,7 @@ ixgbe_rss_conf_init(struct ixgbe_rte_flow_rss_conf *out,
 		return -EINVAL;
 	out->conf = (struct rte_flow_action_rss){
 		.func = in->func,
+		.level = in->level,
 		.types = in->types,
 		.key_len = in->key_len,
 		.queue_num = in->queue_num,
@@ -5545,6 +5546,7 @@ ixgbe_action_rss_same(const struct rte_flow_action_rss *comp,
 		      const struct rte_flow_action_rss *with)
 {
 	return (comp->func == with->func &&
+		comp->level == with->level &&
 		comp->types == with->types &&
 		comp->key_len == with->key_len &&
 		comp->queue_num == with->queue_num &&
diff --git a/drivers/net/mlx4/mlx4_flow.c b/drivers/net/mlx4/mlx4_flow.c
index dcaf8df44..779641e11 100644
--- a/drivers/net/mlx4/mlx4_flow.c
+++ b/drivers/net/mlx4/mlx4_flow.c
@@ -796,6 +796,11 @@ mlx4_flow_prepare(struct priv *priv,
 					" is Toeplitz";
 				goto exit_action_not_supported;
 			}
+			if (rss->level) {
+				msg = "a nonzero RSS encapsulation level is"
+					" not supported";
+				goto exit_action_not_supported;
+			}
 			rte_errno = 0;
 			fields = mlx4_conv_rss_types(priv, rss->types);
 			if (fields == (uint64_t)-1 && rte_errno) {
@@ -1290,6 +1295,7 @@ mlx4_flow_internal(struct priv *priv, struct rte_flow_error *error)
 	uint16_t queue[queues];
 	struct rte_flow_action_rss action_rss = {
 		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		.level = 0,
 		.types = -1,
 		.key_len = MLX4_RSS_HASH_KEY_SIZE,
 		.queue_num = queues,
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 0771ad339..bc1176819 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -644,6 +644,14 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
 						   " function is Toeplitz");
 				return -rte_errno;
 			}
+			if (rss->level) {
+				rte_flow_error_set(error, EINVAL,
+						   RTE_FLOW_ERROR_TYPE_ACTION,
+						   actions,
+						   "a nonzero RSS encapsulation"
+						   " level is not supported");
+				return -rte_errno;
+			}
 			if (rss->types & MLX5_RSS_HF_MASK) {
 				rte_flow_error_set(error, EINVAL,
 						   RTE_FLOW_ERROR_TYPE_ACTION,
@@ -694,6 +702,7 @@ mlx5_flow_convert_actions(struct rte_eth_dev *dev,
 			}
 			parser->rss_conf = (struct rte_flow_action_rss){
 				.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+				.level = 0,
 				.types = rss->types,
 				.key_len = rss_key_len,
 				.queue_num = rss->queue_num,
@@ -1927,6 +1936,7 @@ mlx5_flow_list_create(struct rte_eth_dev *dev,
 	flow->queues = (uint16_t (*)[])(flow + 1);
 	flow->rss_conf = (struct rte_flow_action_rss){
 		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		.level = 0,
 		.types = parser.rss_conf.types,
 		.key_len = parser.rss_conf.key_len,
 		.queue_num = parser.rss_conf.queue_num,
@@ -2442,6 +2452,7 @@ mlx5_ctrl_flow_vlan(struct rte_eth_dev *dev,
 	uint16_t queue[priv->reta_idx_n];
 	struct rte_flow_action_rss action_rss = {
 		.func = RTE_ETH_HASH_FUNCTION_DEFAULT,
+		.level = 0,
 		.types = priv->rss_conf.rss_hf,
 		.key_len = priv->rss_conf.rss_key_len,
 		.queue_num = priv->reta_idx_n,
diff --git a/drivers/net/sfc/sfc_flow.c b/drivers/net/sfc/sfc_flow.c
index d08ba326c..bf9609735 100644
--- a/drivers/net/sfc/sfc_flow.c
+++ b/drivers/net/sfc/sfc_flow.c
@@ -1265,6 +1265,9 @@ sfc_flow_parse_rss(struct sfc_adapter *sa,
 	if (rss->func)
 		return -EINVAL;
 
+	if (rss->level)
+		return -EINVAL;
+
 	if ((rss->types & ~SFC_RSS_OFFLOADS) != 0)
 		return -EINVAL;
 
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index 3d91da216..e5eb50fc5 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -2055,11 +2055,15 @@ static int rss_add_actions(struct rte_flow *flow, struct pmd_internals *pmd,
 	struct rss_key rss_entry = { .hash_fields = 0,
 				     .key_size = 0 };
 
-	/* Check supported hash functions */
+	/* Check supported RSS features */
 	if (rss->func)
 		return rte_flow_error_set
 			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
 			 "non-default RSS hash functions are not supported");
+	if (rss->level)
+		return rte_flow_error_set
+			(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			 "a nonzero RSS encapsulation level is not supported");
 
 	/* Get a new map key for a new RSS rule */
 	err = bpf_rss_key(KEY_CMD_GET, &flow->key_idx);
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
index 0a2c0ac00..1f247d656 100644
--- a/lib/librte_ether/rte_flow.c
+++ b/lib/librte_ether/rte_flow.c
@@ -331,6 +331,7 @@ flow_action_conf_copy(void *buf, const struct rte_flow_action *action)
 		if (dst.rss)
 			*dst.rss = (struct rte_flow_action_rss){
 				.func = src.rss->func,
+				.level = src.rss->level,
 				.types = src.rss->types,
 				.key_len = src.rss->key_len,
 				.queue_num = src.rss->queue_num,
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index 1fc1df9c3..1b222ba60 100644
--- a/lib/librte_ether/rte_flow.h
+++ b/lib/librte_ether/rte_flow.h
@@ -1039,6 +1039,32 @@ struct rte_flow_query_count {
  */
 struct rte_flow_action_rss {
 	enum rte_eth_hash_function func; /**< RSS hash function to apply. */
+	/**
+	 * Packet encapsulation level RSS hash @p types apply to.
+	 *
+	 * - @p 0 requests the default behavior. Depending on the packet
+	 *   type, it can mean outermost, innermost, anything in between or
+	 *   even no RSS.
+	 *
+	 *   It basically stands for the innermost encapsulation level RSS
+	 *   can be performed on according to PMD and device capabilities.
+	 *
+	 * - @p 1 requests RSS to be performed on the outermost packet
+	 *   encapsulation level.
+	 *
+	 * - @p 2 and subsequent values request RSS to be performed on the
+	 *   specified inner packet encapsulation level, from outermost to
+	 *   innermost (lower to higher values).
+	 *
+	 * Values other than @p 0 are not necessarily supported.
+	 *
+	 * Requesting a specific RSS level on unrecognized traffic results
+	 * in undefined behavior. For predictable results, it is recommended
+	 * to make the flow rule pattern match packet headers up to the
+	 * requested encapsulation level so that only matching traffic goes
+	 * through.
+	 */
+	uint32_t level;
 	uint64_t types; /**< RSS hash types (see ETH_RSS_*). */
 	uint32_t key_len; /**< Hash key length in bytes. */
 	uint32_t queue_num; /**< Number of entries in @p queue. */
-- 
2.11.0
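
As a rough illustration of the level semantics described in this patch, the
sketch below classifies a level value and applies the kind of check the
updated PMDs perform. It is not DPDK code: struct demo_rss_conf is a
hypothetical stand-in for struct rte_flow_action_rss reduced to its level
field, and max_level is an assumed per-driver capability that the patch
itself does not define.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_flow_action_rss (level only). */
struct demo_rss_conf {
	uint32_t level;
};

enum demo_rss_scope {
	DEMO_RSS_SCOPE_DEFAULT,   /* level 0: PMD/device default behavior */
	DEMO_RSS_SCOPE_OUTERMOST, /* level 1: outermost encapsulation level */
	DEMO_RSS_SCOPE_INNER,     /* level 2 and up: a given inner level */
};

/* Classify a level value according to the documented meaning. */
static enum demo_rss_scope
demo_rss_scope(uint32_t level)
{
	if (level == 0)
		return DEMO_RSS_SCOPE_DEFAULT;
	if (level == 1)
		return DEMO_RSS_SCOPE_OUTERMOST;
	return DEMO_RSS_SCOPE_INNER;
}

/*
 * Reject levels beyond what the (assumed) driver supports. With
 * max_level == 0 this behaves like the "if (rss->level)" checks that
 * the patch adds to several PMDs.
 */
static int
demo_check_rss_level(const struct demo_rss_conf *rss, uint32_t max_level)
{
	assert(rss != NULL);
	if (rss->level > max_level)
		return -ENOTSUP;
	return 0;
}
```

Passing max_level as 0 models a driver that only accepts the default
behavior, which is what the PMD changes in this patch implement.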


* [dpdk-dev] [PATCH v1 05/16] ethdev: remove DUP action from flow API
  2018-04-04 15:56  4% [dpdk-dev] [PATCH v1 00/16] Flow API overhaul for switch offloads Adrien Mazarguil
  2018-04-04 15:56  7% ` [dpdk-dev] [PATCH v1 01/16] ethdev: update ABI for flow API functions Adrien Mazarguil
@ 2018-04-04 15:56  3% ` Adrien Mazarguil
  2018-04-04 15:56  2% ` [dpdk-dev] [PATCH v1 10/16] ethdev: add encap level to RSS flow API action Adrien Mazarguil
  2 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-04-04 15:56 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, dev

Upcoming changes to the handling of action lists will make the DUP action
redundant, as specifying several QUEUE actions will achieve the same
behavior. Besides, no PMD implements this action.

By removing an entry from enum rte_flow_action_type, this patch triggers a
major ABI breakage.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 app/test-pmd/cmdline_flow.c                 | 23 -----------------------
 app/test-pmd/config.c                       |  1 -
 doc/guides/prog_guide/rte_flow.rst          | 23 -----------------------
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  8 --------
 lib/librte_ether/rte_flow.c                 |  1 -
 lib/librte_ether/rte_flow.h                 | 24 ------------------------
 6 files changed, 80 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 30450f1a4..9702b3ef3 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -164,8 +164,6 @@ enum index {
 	ACTION_QUEUE_INDEX,
 	ACTION_DROP,
 	ACTION_COUNT,
-	ACTION_DUP,
-	ACTION_DUP_INDEX,
 	ACTION_RSS,
 	ACTION_RSS_TYPES,
 	ACTION_RSS_TYPE,
@@ -625,7 +623,6 @@ static const enum index next_action[] = {
 	ACTION_QUEUE,
 	ACTION_DROP,
 	ACTION_COUNT,
-	ACTION_DUP,
 	ACTION_RSS,
 	ACTION_PF,
 	ACTION_VF,
@@ -645,12 +642,6 @@ static const enum index action_queue[] = {
 	ZERO,
 };
 
-static const enum index action_dup[] = {
-	ACTION_DUP_INDEX,
-	ACTION_NEXT,
-	ZERO,
-};
-
 static const enum index action_rss[] = {
 	ACTION_RSS_TYPES,
 	ACTION_RSS_KEY,
@@ -1597,20 +1588,6 @@ static const struct token token_list[] = {
 		.next = NEXT(NEXT_ENTRY(ACTION_NEXT)),
 		.call = parse_vc,
 	},
-	[ACTION_DUP] = {
-		.name = "dup",
-		.help = "duplicate packets to a given queue index",
-		.priv = PRIV_ACTION(DUP, sizeof(struct rte_flow_action_dup)),
-		.next = NEXT(action_dup),
-		.call = parse_vc,
-	},
-	[ACTION_DUP_INDEX] = {
-		.name = "index",
-		.help = "queue index to duplicate packets to",
-		.next = NEXT(action_dup, NEXT_ENTRY(UNSIGNED)),
-		.args = ARGS(ARGS_ENTRY(struct rte_flow_action_dup, index)),
-		.call = parse_vc_conf,
-	},
 	[ACTION_RSS] = {
 		.name = "rss",
 		.help = "spread packets among several queues",
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 7ae0295f6..8d42ea9a9 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1049,7 +1049,6 @@ static const struct {
 	MK_FLOW_ACTION(QUEUE, sizeof(struct rte_flow_action_queue)),
 	MK_FLOW_ACTION(DROP, 0),
 	MK_FLOW_ACTION(COUNT, 0),
-	MK_FLOW_ACTION(DUP, sizeof(struct rte_flow_action_dup)),
 	MK_FLOW_ACTION(RSS, sizeof(struct rte_flow_action_rss)), /* +queue[] */
 	MK_FLOW_ACTION(PF, 0),
 	MK_FLOW_ACTION(VF, sizeof(struct rte_flow_action_vf)),
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 51826d04c..a237e4fd2 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -1299,26 +1299,6 @@ Query structure to retrieve and reset flow rule counters:
    | ``bytes``     | out | number of bytes through this rule |
    +---------------+-----+-----------------------------------+
 
-Action: ``DUP``
-^^^^^^^^^^^^^^^
-
-Duplicates packets to a given queue index.
-
-This is normally combined with QUEUE, however when used alone, it is
-actually similar to QUEUE + PASSTHRU.
-
-- Non-terminating by default.
-
-.. _table_rte_flow_action_dup:
-
-.. table:: DUP
-
-   +-----------+------------------------------------+
-   | Field     | Value                              |
-   +===========+====================================+
-   | ``index`` | queue index to duplicate packet to |
-   +-----------+------------------------------------+
-
 Action: ``RSS``
 ^^^^^^^^^^^^^^^
 
@@ -2010,9 +1990,6 @@ Unsupported actions
   and tagging (`Action: MARK`_ or `Action: FLAG`_) may be implemented in
   software as long as the target queue is used by a single rule.
 
-- A rule specifying both `Action: DUP`_ + `Action: QUEUE`_ may be translated
-  to two hidden rules combining `Action: QUEUE`_ and `Action: PASSTHRU`_.
-
 - When a single target queue is provided, `Action: RSS`_ can also be
   implemented through `Action: QUEUE`_.
 
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index cb6f201e1..a015d02a4 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3363,10 +3363,6 @@ actions can sometimes be combined when the end result is unambiguous::
 
 ::
 
-   drop / dup index 6 / end # same as above
-
-::
-
    queue index 6 / rss queues 6 7 8 / end # queue has no effect
 
 ::
@@ -3400,10 +3396,6 @@ This section lists supported actions and their attributes, if any.
 
 - ``count``: enable counters for this rule.
 
-- ``dup``: duplicate packets to a given queue index.
-
-  - ``index {unsigned}``: queue index to duplicate packets to.
-
 - ``rss``: spread packets among several queues.
 
   - ``types [{RSS hash type} [...]] end``: RSS hash types, allowed tokens
diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c
index ba6feddee..db04c4f94 100644
--- a/lib/librte_ether/rte_flow.c
+++ b/lib/librte_ether/rte_flow.c
@@ -73,7 +73,6 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = {
 	MK_FLOW_ACTION(QUEUE, sizeof(struct rte_flow_action_queue)),
 	MK_FLOW_ACTION(DROP, 0),
 	MK_FLOW_ACTION(COUNT, 0),
-	MK_FLOW_ACTION(DUP, sizeof(struct rte_flow_action_dup)),
 	MK_FLOW_ACTION(RSS, sizeof(struct rte_flow_action_rss)), /* +queue[] */
 	MK_FLOW_ACTION(PF, 0),
 	MK_FLOW_ACTION(VF, sizeof(struct rte_flow_action_vf)),
diff --git a/lib/librte_ether/rte_flow.h b/lib/librte_ether/rte_flow.h
index 36fd38ffa..aab637a2c 100644
--- a/lib/librte_ether/rte_flow.h
+++ b/lib/librte_ether/rte_flow.h
@@ -961,16 +961,6 @@ enum rte_flow_action_type {
 	RTE_FLOW_ACTION_TYPE_COUNT,
 
 	/**
-	 * Duplicates packets to a given queue index.
-	 *
-	 * This is normally combined with QUEUE, however when used alone, it
-	 * is actually similar to QUEUE + PASSTHRU.
-	 *
-	 * See struct rte_flow_action_dup.
-	 */
-	RTE_FLOW_ACTION_TYPE_DUP,
-
-	/**
 	 * Similar to QUEUE, except RSS is additionally performed on packets
 	 * to spread them among several queues according to the provided
 	 * parameters.
@@ -1052,20 +1042,6 @@ struct rte_flow_query_count {
 };
 
 /**
- * RTE_FLOW_ACTION_TYPE_DUP
- *
- * Duplicates packets to a given queue index.
- *
- * This is normally combined with QUEUE, however when used alone, it is
- * actually similar to QUEUE + PASSTHRU.
- *
- * Non-terminating by default.
- */
-struct rte_flow_action_dup {
-	uint16_t index; /**< Queue index to duplicate packets to. */
-};
-
-/**
  * RTE_FLOW_ACTION_TYPE_RSS
  *
  * Similar to QUEUE, except RSS is additionally performed on packets to
-- 
2.11.0
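
To make the ABI impact mentioned in the commit message concrete, here is a
small sketch of why removing an enum entry is a breaking change. The two
enums are hypothetical, trimmed copies of enum rte_flow_action_type before
and after this patch (the real enum has more entries); only the relative
order of QUEUE, DROP, COUNT, DUP, RSS and PF is taken from the diff.

```c
#include <assert.h>

/* Hypothetical trimmed enum as seen by an application built before the patch. */
enum demo_action_old {
	DEMO_OLD_QUEUE,
	DEMO_OLD_DROP,
	DEMO_OLD_COUNT,
	DEMO_OLD_DUP,
	DEMO_OLD_RSS,
	DEMO_OLD_PF,
};

/* Hypothetical trimmed enum as seen by a library built after the patch. */
enum demo_action_new {
	DEMO_NEW_QUEUE,
	DEMO_NEW_DROP,
	DEMO_NEW_COUNT,
	DEMO_NEW_RSS,
	DEMO_NEW_PF,
};

/*
 * An old binary passes the raw integer; the new library reads that same
 * integer with its own enum, so no translation takes place.
 */
static enum demo_action_new
demo_seen_by_new_library(enum demo_action_old sent)
{
	return (enum demo_action_new)(int)sent;
}

/* Entries before DUP keep their value; entries after it shift down by one. */
static int
demo_value_is_stable(enum demo_action_old sent)
{
	assert(sent <= DEMO_OLD_PF);
	return sent < DEMO_OLD_DUP;
}
```

In this sketch an old application requesting RSS would be understood as PF by
the new library, which is why such a change can only happen together with an
ABI version bump.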


* [dpdk-dev] [PATCH v1 01/16] ethdev: update ABI for flow API functions
  2018-04-04 15:56  4% [dpdk-dev] [PATCH v1 00/16] Flow API overhaul for switch offloads Adrien Mazarguil
@ 2018-04-04 15:56  7% ` Adrien Mazarguil
  2018-04-05 10:06  4%   ` Thomas Monjalon
  2018-04-04 15:56  3% ` [dpdk-dev] [PATCH v1 05/16] ethdev: remove DUP action from flow API Adrien Mazarguil
  2018-04-04 15:56  2% ` [dpdk-dev] [PATCH v1 10/16] ethdev: add encap level to RSS flow API action Adrien Mazarguil
  2 siblings, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-04-04 15:56 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, dev

Subsequent patches will modify existing types and slightly alter the
behavior of the flow API. This warrants a major ABI breakage.

While it is already taken care of for 18.05 (LIBABIVER was updated to
version 9 by a prior commit), this patch explicitly lists the affected flow
API functions under a new DPDK_18.05 node in the version map as a safety
measure.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 lib/librte_ether/rte_ethdev_version.map | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 34df6c8b5..78a6f5afb 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -203,6 +203,16 @@ DPDK_18.02 {
 
 } DPDK_17.11;
 
+DPDK_18.05 {
+	global:
+
+	rte_flow_validate;
+	rte_flow_create;
+	rte_flow_query;
+	rte_flow_copy;
+
+} DPDK_18.02;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.11.0

^ permalink raw reply	[relevance 7%]

* [dpdk-dev] [PATCH v1 00/16] Flow API overhaul for switch offloads
@ 2018-04-04 15:56  4% Adrien Mazarguil
  2018-04-04 15:56  7% ` [dpdk-dev] [PATCH v1 01/16] ethdev: update ABI for flow API functions Adrien Mazarguil
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Adrien Mazarguil @ 2018-04-04 15:56 UTC (permalink / raw)
  To: Thomas Monjalon, Ferruh Yigit, dev

As summarized in a prior RFC [1], the flow API (rte_flow) was chosen as a
means to manage switch offloads supported by many devices (usually going by
names such as E-Switch or vSwitch) through user-specified flow rules.

Combined with the need to support encap/decap actions, this requires a
change in the way flow actions are processed (in order and possibly
repeated) which modifies the behavior of some of the existing actions, thus
warranting a major ABI breakage.
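
As a rough illustration of "in order and possibly repeated", the sketch
below walks an END-terminated action list front to back and honors every
entry, including duplicates. The types and names (action_type, action,
result, apply_actions) are hypothetical and are not the rte_flow
definitions. The rule that a later QUEUE overrides an earlier one is an
assumption of this toy model only, not a statement about rte_flow
semantics.

```c
/* Hypothetical action kinds for illustration; not the rte_flow enum. */
enum action_type { ACTION_END, ACTION_COUNT, ACTION_QUEUE };

struct action {
	enum action_type type;
	unsigned int arg; /* queue index for ACTION_QUEUE, unused otherwise */
};

struct result {
	unsigned int counted;    /* number of COUNT actions applied */
	unsigned int last_queue; /* index set by the most recent QUEUE action */
};

/* Apply actions strictly in list order until ACTION_END is reached. */
static struct result
apply_actions(const struct action *list)
{
	struct result r = { 0, 0 };

	for (; list->type != ACTION_END; ++list) {
		switch (list->type) {
		case ACTION_COUNT:
			r.counted++; /* a repeated COUNT is applied again */
			break;
		case ACTION_QUEUE:
			r.last_queue = list->arg; /* toy rule: last one wins */
			break;
		default:
			break;
		}
	}
	return r;
}
```

With a list such as COUNT, QUEUE 3, COUNT, QUEUE 5, END, this model
applies COUNT twice and ends with queue 5 selected.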

Given this ABI breakage is also required by other work submitted for the
current release [2][3], this series addresses various longstanding issues
with the flow API and makes minor improvements in preparation for upcoming
features.

Changes summary:

- Additional error types.
- Clearer documentation.
- Improved C++ compatibility.
- Exhaustive RSS action.
- Consistent behavior of VLAN pattern item.
- New "transfer" attribute bringing consistency to VF/PF pattern items.
- Confusing "PORT" pattern item renamed "PHY_PORT", with new action
  counterpart.
- New "PORT_ID" pattern item and action to be used with port representors.

This series piggybacks on the major ABI update introduced by a prior
commit [4] for DPDK 18.05 and depends on several fixes [5] which must be
applied first.

[1] "[RFC] Switch device offload with DPDK"
    http://dpdk.org/ml/archives/dev/2018-March/092513.html

[2] commit 676b605182a5 ("doc: announce ethdev API change for RSS
    configuration")

[3] "[PATCH v1 00/21] MLX5 tunnel Rx offloading"
    http://dpdk.org/ml/archives/dev/2018-March/092264.html

[4] commit 653e038efc9b ("ethdev: remove versioning of filter control
    function")

[5] "[PATCH v2 00/13] Bunch of flow API-related fixes"
    http://dpdk.org/ml/archives/dev/2018-April/095273.html

Adrien Mazarguil (16):
  ethdev: update ABI for flow API functions
  ethdev: add error types to flow API
  ethdev: clarify flow API pattern items and actions
  doc: remove flow API migration section
  ethdev: remove DUP action from flow API
  ethdev: alter behavior of flow API actions
  ethdev: remove C99 flexible arrays from flow API
  ethdev: flatten RSS configuration in flow API
  ethdev: add hash function to RSS flow API action
  ethdev: add encap level to RSS flow API action
  ethdev: refine TPID handling in flow API
  ethdev: add transfer attribute to flow API
  ethdev: update behavior of VF/PF in flow API
  ethdev: rename physical port item in flow API
  ethdev: add physical port action to flow API
  ethdev: add port ID item and action to flow API

 app/test-pmd/cmdline_flow.c                 | 405 ++++++++++-----
 app/test-pmd/config.c                       |  78 +--
 doc/guides/nics/tap.rst                     |   2 +-
 doc/guides/prog_guide/rte_flow.rst          | 601 ++++++++---------------
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  54 +-
 drivers/net/bnxt/bnxt_filter.c              |  52 +-
 drivers/net/e1000/e1000_ethdev.h            |  13 +-
 drivers/net/e1000/igb_ethdev.c              |   4 +-
 drivers/net/e1000/igb_flow.c                |  83 +++-
 drivers/net/e1000/igb_rxtx.c                |  55 ++-
 drivers/net/enic/enic_flow.c                |  52 +-
 drivers/net/i40e/i40e_ethdev.c              |  57 ++-
 drivers/net/i40e/i40e_ethdev.h              |  15 +-
 drivers/net/i40e/i40e_flow.c                | 144 ++++--
 drivers/net/ixgbe/ixgbe_ethdev.c            |   4 +-
 drivers/net/ixgbe/ixgbe_ethdev.h            |  13 +-
 drivers/net/ixgbe/ixgbe_flow.c              |  91 +++-
 drivers/net/ixgbe/ixgbe_rxtx.c              |  55 ++-
 drivers/net/mlx4/mlx4.c                     |   2 +-
 drivers/net/mlx4/mlx4_flow.c                | 117 +++--
 drivers/net/mlx4/mlx4_flow.h                |   2 +-
 drivers/net/mlx4/mlx4_rxq.c                 |   2 +-
 drivers/net/mlx4/mlx4_rxtx.h                |   2 +-
 drivers/net/mlx5/mlx5_flow.c                | 317 ++++++------
 drivers/net/mlx5/mlx5_rxq.c                 |  22 +-
 drivers/net/mlx5/mlx5_rxtx.h                |  26 +-
 drivers/net/mvpp2/mrvl_flow.c               |  33 +-
 drivers/net/sfc/sfc_flow.c                  |  82 +++-
 drivers/net/tap/tap_flow.c                  |  51 +-
 examples/ipsec-secgw/ipsec.c                |  21 +-
 lib/librte_ether/rte_ethdev_version.map     |  10 +
 lib/librte_ether/rte_flow.c                 |  68 +--
 lib/librte_ether/rte_flow.h                 | 328 ++++++++-----
 33 files changed, 1747 insertions(+), 1114 deletions(-)

-- 
2.11.0

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated
  2018-03-26 16:09  7%   ` [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
@ 2018-04-04 15:08  0%     ` santosh
  0 siblings, 0 replies; 200+ results
From: santosh @ 2018-04-04 15:08 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Olivier MATZ


On Monday 26 March 2018 09:39 PM, Andrew Rybchenko wrote:
> Size of memory chunk required to populate mempool objects depends
> on how objects are stored in the memory. Different mempool drivers
> may have different requirements and a new operation allows to
> calculate memory size in accordance with driver requirements and
> advertise requirements on minimum memory chunk size and alignment
> in a generic way.
>
> Bump ABI version since the patch breaks it.
>
> Suggested-by: Olivier Matz <olivier.matz@6wind.com>
> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---

Acked-by: Santosh Shukla <Santosh.Shukla@caviumnetworks.com>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 12/13] ethdev: fix ABI version in meson build
  2018-04-04 14:57  3% ` [dpdk-dev] [PATCH v2 00/13] Bunch of flow API-related fixes Adrien Mazarguil
@ 2018-04-04 14:58  4%   ` Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-04-04 14:58 UTC (permalink / raw)
  To: dev; +Cc: Kirill Rybalchenko

The ABI version in meson.build must remain synchronized with its Makefile
counterpart.

Fixes: 653e038efc9b ("ethdev: remove versioning of filter control function")
Cc: Kirill Rybalchenko <kirill.rybalchenko@intel.com>

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 lib/librte_ether/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_ether/meson.build b/lib/librte_ether/meson.build
index 7fed86056..12bdb6b61 100644
--- a/lib/librte_ether/meson.build
+++ b/lib/librte_ether/meson.build
@@ -2,7 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 name = 'ethdev'
-version = 8
+version = 9
 allow_experimental_apis = true
 sources = files('ethdev_profile.c',
 	'rte_ethdev.c',
-- 
2.11.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 00/13] Bunch of flow API-related fixes
  2018-03-23 12:58  3% [dpdk-dev] [PATCH v1 0/9] Bunch of flow API-related fixes Adrien Mazarguil
  2018-03-23 12:58  4% ` [dpdk-dev] [PATCH v1 9/9] ethdev: fix ABI version in meson build Adrien Mazarguil
@ 2018-04-04 14:57  3% ` Adrien Mazarguil
  2018-04-04 14:58  4%   ` [dpdk-dev] [PATCH v2 12/13] ethdev: fix ABI version in meson build Adrien Mazarguil
  1 sibling, 1 reply; 200+ results
From: Adrien Mazarguil @ 2018-04-04 14:57 UTC (permalink / raw)
  To: dev

This series contains several fixes for rte_flow and its implementation in
PMDs and testpmd. Upcoming work on the flow API depends on it.

v2 changes:

- mlx5 fix (patch #3).
- bnxt fix (patch #4).
- sfc fix (patch #6).
- Missing include (patch #13).

Adrien Mazarguil (13):
  net/mlx4: fix RSS resource leak in case of error
  net/mlx4: fix ignored RSS hash types
  net/mlx5: fix RSS flow action bounds check
  net/bnxt: fix matching of flow API item masks
  net/sfc: fix endian conversions in flow API
  app/testpmd: fix flow completion for RSS queues
  app/testpmd: fix lack of flow action configuration
  app/testpmd: fix RSS flow action configuration
  app/testpmd: fix missing RSS fields in flow action
  ethdev: fix shallow copy of flow API RSS action
  ethdev: fix missing boolean values in flow command
  ethdev: fix ABI version in meson build
  ethdev: fix missing include in flow API

 app/test-pmd/cmdline_flow.c                 | 255 ++++++++++++++++++++---
 app/test-pmd/config.c                       | 160 +++++++++-----
 app/test-pmd/testpmd.h                      |  13 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   8 +
 drivers/net/bnxt/bnxt_filter.c              |  14 +-
 drivers/net/mlx4/mlx4_flow.c                |  17 +-
 drivers/net/mlx5/mlx5_flow.c                |   9 +
 drivers/net/sfc/sfc_flow.c                  |  13 +-
 lib/librte_ether/meson.build                |   2 +-
 lib/librte_ether/rte_flow.c                 | 145 +++++++++----
 lib/librte_ether/rte_flow.h                 |   2 +
 11 files changed, 503 insertions(+), 135 deletions(-)

-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 12/13] eal: replace rte_panic instances in init sequence
  2018-04-04 11:27  3% [dpdk-dev] eal: replace calls to rte_panic and refrain from new instances Arnon Warshavsky
                   ` (4 preceding siblings ...)
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 11/13] eal: replace rte_panic instances in ethdev Arnon Warshavsky
@ 2018-04-04 11:27  2% ` Arnon Warshavsky
  5 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-04-04 11:27 UTC (permalink / raw)
  To: thomas, anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit
  Cc: dev, arnon

Functions local to this file can change their return type from void
to int without breaking the ABI.
One function cannot change from void to int because of the ABI.
Since it is called from only one place, a state variable was added
and is checked right after that call.
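
A minimal sketch of the state-variable approach, assuming simplified
stand-ins for the helpers: move_to_panic_state() and get_panic_state()
mirror the roles of rte_move_to_panic_state() and rte_get_panic_state()
in this patch, while init_master() and init_sequence() are hypothetical
names used only for illustration.

```c
#include <stdio.h>

/* Process-wide flag; set once on a fatal condition and never cleared. */
static int panic_state;

static void move_to_panic_state(void) { panic_state = 1; }
static int get_panic_state(void) { return panic_state; }

/*
 * The prototype must stay void, so a failure cannot be returned.
 * It is logged and recorded in the flag instead.
 */
void
init_master(int affinity_ok)
{
	if (!affinity_ok) {
		fprintf(stderr, "%s(): Cannot set affinity\n", __func__);
		move_to_panic_state();
	}
}

/* The single caller checks the flag right after the call. */
int
init_sequence(int affinity_ok)
{
	init_master(affinity_ok);
	if (get_panic_state())
		return -1;
	return 0;
}
```

Note that the flag is sticky in this sketch: once set, every later check
reports failure, which fits a one-shot init sequence.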

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
---
 lib/librte_eal/bsdapp/eal/eal.c           |  87 ++++++++++++++-------
 lib/librte_eal/bsdapp/eal/eal_thread.c    |  65 +++++++++++-----
 lib/librte_eal/common/eal_common_launch.c |  21 ++++++
 lib/librte_eal/common/include/rte_debug.h |  12 +++
 lib/librte_eal/linuxapp/eal/eal.c         | 121 ++++++++++++++++++++----------
 lib/librte_eal/linuxapp/eal/eal_thread.c  |  65 +++++++++++-----
 6 files changed, 272 insertions(+), 99 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 4eafcb5..f6aa3b2 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -150,7 +150,7 @@ enum rte_iova_mode
  * We also don't lock the whole file, so that in future we can use read-locks
  * on other parts, e.g. memzones, to detect if there are running secondary
  * processes. */
-static void
+static int
 rte_eal_config_create(void)
 {
 	void *rte_mem_cfg_addr;
@@ -159,60 +159,79 @@ enum rte_iova_mode
 	const char *pathname = eal_runtime_config_path();
 
 	if (internal_config.no_shconf)
-		return;
+		return 0;
 
 	if (mem_cfg_fd < 0){
 		mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0660);
-		if (mem_cfg_fd < 0)
-			rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
+		if (mem_cfg_fd < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot open '%s' for rte_mem_config\n",
+					__func__, pathname);
+			return -1;
+		}
 	}
 
 	retval = ftruncate(mem_cfg_fd, sizeof(*rte_config.mem_config));
 	if (retval < 0){
 		close(mem_cfg_fd);
-		rte_panic("Cannot resize '%s' for rte_mem_config\n", pathname);
+		RTE_LOG(CRIT, EAL, "%s(): Cannot resize '%s' for rte_mem_config\n",
+				__func__, pathname);
+		return -1;
 	}
 
 	retval = fcntl(mem_cfg_fd, F_SETLK, &wr_lock);
 	if (retval < 0){
 		close(mem_cfg_fd);
-		rte_exit(EXIT_FAILURE, "Cannot create lock on '%s'. Is another primary "
-				"process running?\n", pathname);
+		RTE_LOG(CRIT, EAL, "%s(): Cannot create lock on '%s'."
+				" Is another primary process running?\n",
+				__func__, pathname);
+		return -1;
 	}
 
 	rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config),
 				PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);
 
 	if (rte_mem_cfg_addr == MAP_FAILED){
-		rte_panic("Cannot mmap memory for rte_config\n");
+		RTE_LOG(CRIT, EAL, "%s(): Cannot mmap memory for rte_config\n",
+				__func__);
+		return -1;
 	}
 	memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config));
 	rte_config.mem_config = rte_mem_cfg_addr;
+
+	return 0;
 }
 
 /* attach to an existing shared memory config */
-static void
+static int
 rte_eal_config_attach(void)
 {
 	void *rte_mem_cfg_addr;
 	const char *pathname = eal_runtime_config_path();
 
 	if (internal_config.no_shconf)
-		return;
+		return 0;
 
 	if (mem_cfg_fd < 0){
 		mem_cfg_fd = open(pathname, O_RDWR);
-		if (mem_cfg_fd < 0)
-			rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
+		if (mem_cfg_fd < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot open '%s' for rte_mem_config\n",
+					__func__, pathname);
+			return -1;
+		}
 	}
 
 	rte_mem_cfg_addr = mmap(NULL, sizeof(*rte_config.mem_config),
 				PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);
 	close(mem_cfg_fd);
-	if (rte_mem_cfg_addr == MAP_FAILED)
-		rte_panic("Cannot mmap memory for rte_config\n");
+	if (rte_mem_cfg_addr == MAP_FAILED) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot mmap memory for rte_config\n",
+				__func__);
+		return -1;
+	}
 
 	rte_config.mem_config = rte_mem_cfg_addr;
+
+	return 0;
 }
 
 /* Detect if we are a primary or a secondary process */
@@ -236,23 +255,28 @@ enum rte_proc_type_t
 }
 
 /* Sets up rte_config structure with the pointer to shared memory config.*/
-static void
+static int
 rte_config_init(void)
 {
 	rte_config.process_type = internal_config.process_type;
 
 	switch (rte_config.process_type){
 	case RTE_PROC_PRIMARY:
-		rte_eal_config_create();
+		if (rte_eal_config_create())
+			return -1;
 		break;
 	case RTE_PROC_SECONDARY:
-		rte_eal_config_attach();
+		if (rte_eal_config_attach())
+			return -1;
 		rte_eal_mcfg_wait_complete(rte_config.mem_config);
 		break;
 	case RTE_PROC_AUTO:
 	case RTE_PROC_INVALID:
-		rte_panic("Invalid process type\n");
+		RTE_LOG(CRIT, EAL, "%s(): Invalid process type %d\n",
+				__func__, rte_config.process_type);
+		return -1;
 	}
+	return 0;
 }
 
 /* display usage */
@@ -583,7 +607,8 @@ static void rte_eal_init_alert(const char *msg)
 
 	rte_srand(rte_rdtsc());
 
-	rte_config_init();
+	if (rte_config_init() != 0)
+		return -1;
 
 	if (rte_mp_channel_init() < 0) {
 		rte_eal_init_alert("failed to init mp channel\n");
@@ -630,7 +655,8 @@ static void rte_eal_init_alert(const char *msg)
 
 	eal_check_mem_on_local_socket();
 
-	eal_thread_init_master(rte_config.master_lcore);
+	if (eal_thread_init_master(rte_config.master_lcore) != 0)
+		return -1;
 
 	ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
 
@@ -644,18 +670,27 @@ static void rte_eal_init_alert(const char *msg)
 		 * create communication pipes between master thread
 		 * and children
 		 */
-		if (pipe(lcore_config[i].pipe_master2slave) < 0)
-			rte_panic("Cannot create pipe\n");
-		if (pipe(lcore_config[i].pipe_slave2master) < 0)
-			rte_panic("Cannot create pipe\n");
+		if (pipe(lcore_config[i].pipe_master2slave) < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot create pipe\n",
+					__func__);
+			return -1;
+		}
+		if (pipe(lcore_config[i].pipe_slave2master) < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot create pipe\n",
+					__func__);
+			return -1;
+		}
 
 		lcore_config[i].state = WAIT;
 
 		/* create a thread for each lcore */
 		ret = pthread_create(&lcore_config[i].thread_id, NULL,
 				     eal_thread_loop, NULL);
-		if (ret != 0)
-			rte_panic("Cannot create thread\n");
+		if (ret != 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot create thread\n",
+					__func__);
+			return -1;
+		}
 
 		/* Set thread_name for aid in debugging. */
 		snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
diff --git a/lib/librte_eal/bsdapp/eal/eal_thread.c b/lib/librte_eal/bsdapp/eal/eal_thread.c
index d602daf..5c3947c 100644
--- a/lib/librte_eal/bsdapp/eal/eal_thread.c
+++ b/lib/librte_eal/bsdapp/eal/eal_thread.c
@@ -51,16 +51,22 @@
 	n = 0;
 	while (n == 0 || (n < 0 && errno == EINTR))
 		n = write(m2s, &c, 1);
-	if (n < 0)
-		rte_panic("cannot write on configuration pipe\n");
+	if (n < 0) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot write on configuration pipe\n",
+				__func__);
+		return -1;
+	}
 
 	/* wait ack */
 	do {
 		n = read(s2m, &c, 1);
 	} while (n < 0 && errno == EINTR);
 
-	if (n <= 0)
-		rte_panic("cannot read on configuration pipe\n");
+	if (n <= 0) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot read on configuration pipe\n",
+				__func__);
+		return -1;
+	}
 
 	return 0;
 }
@@ -84,8 +90,19 @@ void eal_thread_init_master(unsigned lcore_id)
 	RTE_PER_LCORE(_lcore_id) = lcore_id;
 
 	/* set CPU affinity */
-	if (eal_thread_set_affinity() < 0)
-		rte_panic("cannot set affinity\n");
+	if (eal_thread_set_affinity() < 0) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot set affinity\n", __func__);
+		rte_move_to_panic_state();
+	}
+}
+
+/* move to panic state and do not return */
+static __attribute__((noreturn)) void
+defunct_and_remain_in_endless_loop(void)
+{
+	rte_move_to_panic_state();
+	while (1)
+		sleep(1);
 }
 
 /* main loop of threads */
@@ -106,8 +123,11 @@ void eal_thread_init_master(unsigned lcore_id)
 		if (thread_id == lcore_config[lcore_id].thread_id)
 			break;
 	}
-	if (lcore_id == RTE_MAX_LCORE)
-		rte_panic("cannot retrieve lcore id\n");
+	if (lcore_id == RTE_MAX_LCORE) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot retrieve lcore id\n",
+				__func__);
+		defunct_and_remain_in_endless_loop();
+	}
 
 	m2s = lcore_config[lcore_id].pipe_master2slave[0];
 	s2m = lcore_config[lcore_id].pipe_slave2master[1];
@@ -116,8 +136,10 @@ void eal_thread_init_master(unsigned lcore_id)
 	RTE_PER_LCORE(_lcore_id) = lcore_id;
 
 	/* set CPU affinity */
-	if (eal_thread_set_affinity() < 0)
-		rte_panic("cannot set affinity\n");
+	if (eal_thread_set_affinity() < 0) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot set affinity\n", __func__);
+		defunct_and_remain_in_endless_loop();
+	}
 
 	ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
 
@@ -133,8 +155,11 @@ void eal_thread_init_master(unsigned lcore_id)
 			n = read(m2s, &c, 1);
 		} while (n < 0 && errno == EINTR);
 
-		if (n <= 0)
-			rte_panic("cannot read on configuration pipe\n");
+		if (n <= 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot read on configuration pipe\n",
+					__func__);
+			defunct_and_remain_in_endless_loop();
+		}
 
 		lcore_config[lcore_id].state = RUNNING;
 
@@ -142,11 +167,17 @@ void eal_thread_init_master(unsigned lcore_id)
 		n = 0;
 		while (n == 0 || (n < 0 && errno == EINTR))
 			n = write(s2m, &c, 1);
-		if (n < 0)
-			rte_panic("cannot write on configuration pipe\n");
-
-		if (lcore_config[lcore_id].f == NULL)
-			rte_panic("NULL function pointer\n");
+		if (n < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot write on configuration pipe\n",
+					__func__);
+			defunct_and_remain_in_endless_loop();
+		}
+
+		if (lcore_config[lcore_id].f == NULL) {
+			RTE_LOG(CRIT, EAL, "%s(): NULL function pointer\n",
+					__func__);
+			defunct_and_remain_in_endless_loop();
+		}
 
 		/* call the function and store the return value */
 		fct_arg = lcore_config[lcore_id].arg;
diff --git a/lib/librte_eal/common/eal_common_launch.c b/lib/librte_eal/common/eal_common_launch.c
index fe0ba3f..6f8bd46 100644
--- a/lib/librte_eal/common/eal_common_launch.c
+++ b/lib/librte_eal/common/eal_common_launch.c
@@ -14,6 +14,7 @@
 #include <rte_pause.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
+#include <rte_debug.h>
 
 /*
  * Wait until a lcore finished its job.
@@ -88,3 +89,23 @@ enum rte_lcore_state_t
 		rte_eal_wait_lcore(lcore_id);
 	}
 }
+
+/* panic state */
+static int _panic_state;
+
+/**
+ * Check if the system is in panic state
+ * @return int
+ */
+int rte_get_panic_state(void)
+{
+	return _panic_state;
+}
+
+/**
+ * Move the system to be in panic state
+ */
+void rte_move_to_panic_state(void)
+{
+	_panic_state = 1;
+}
diff --git a/lib/librte_eal/common/include/rte_debug.h b/lib/librte_eal/common/include/rte_debug.h
index 272df49..b421d33 100644
--- a/lib/librte_eal/common/include/rte_debug.h
+++ b/lib/librte_eal/common/include/rte_debug.h
@@ -79,4 +79,16 @@ void __rte_panic(const char *funcname , const char *format, ...)
 }
 #endif
 
+/**
+ * Check if the system is in panic state
+ * @return int
+ */
+int rte_get_panic_state(void);
+
+/**
+ * Move the system to be in panic state
+ */
+void rte_move_to_panic_state(void);
+
+
 #endif /* _RTE_DEBUG_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 2ecd07b..b7b950a 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -160,7 +160,7 @@ enum rte_iova_mode
  * We also don't lock the whole file, so that in future we can use read-locks
  * on other parts, e.g. memzones, to detect if there are running secondary
  * processes. */
-static void
+static int
 rte_eal_config_create(void)
 {
 	void *rte_mem_cfg_addr;
@@ -169,7 +169,7 @@ enum rte_iova_mode
 	const char *pathname = eal_runtime_config_path();
 
 	if (internal_config.no_shconf)
-		return;
+		return 0;
 
 	/* map the config before hugepage address so that we don't waste a page */
 	if (internal_config.base_virtaddr != 0)
@@ -179,30 +179,39 @@ enum rte_iova_mode
 	else
 		rte_mem_cfg_addr = NULL;
 
-	if (mem_cfg_fd < 0){
+	if (mem_cfg_fd < 0) {
 		mem_cfg_fd = open(pathname, O_RDWR | O_CREAT, 0660);
-		if (mem_cfg_fd < 0)
-			rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
+		if (mem_cfg_fd < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot open '%s' for "
+					"rte_mem_config\n", __func__, pathname);
+			return -1;
+		}
 	}
 
 	retval = ftruncate(mem_cfg_fd, sizeof(*rte_config.mem_config));
-	if (retval < 0){
+	if (retval < 0) {
 		close(mem_cfg_fd);
-		rte_panic("Cannot resize '%s' for rte_mem_config\n", pathname);
+		RTE_LOG(CRIT, EAL, "%s(): Cannot resize '%s' for rte_mem_config\n",
+				__func__, pathname);
+		return -1;
 	}
 
 	retval = fcntl(mem_cfg_fd, F_SETLK, &wr_lock);
-	if (retval < 0){
+	if (retval < 0) {
 		close(mem_cfg_fd);
-		rte_exit(EXIT_FAILURE, "Cannot create lock on '%s'. Is another primary "
-				"process running?\n", pathname);
+		RTE_LOG(CRIT, EAL, "%s(): Cannot create lock on '%s'."
+				" Is another primary process running?\n",
+				__func__, pathname);
+		return -1;
 	}
 
 	rte_mem_cfg_addr = mmap(rte_mem_cfg_addr, sizeof(*rte_config.mem_config),
 				PROT_READ | PROT_WRITE, MAP_SHARED, mem_cfg_fd, 0);
 
-	if (rte_mem_cfg_addr == MAP_FAILED){
-		rte_panic("Cannot mmap memory for rte_config\n");
+	if (rte_mem_cfg_addr == MAP_FAILED) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot mmap memory for "
+				"rte_config\n", __func__);
+		return -1;
 	}
 	memcpy(rte_mem_cfg_addr, &early_mem_config, sizeof(early_mem_config));
 	rte_config.mem_config = rte_mem_cfg_addr;
@@ -211,10 +220,11 @@ enum rte_iova_mode
 	 * processes could later map the config into this exact location */
 	rte_config.mem_config->mem_cfg_addr = (uintptr_t) rte_mem_cfg_addr;
 
+	return 0;
 }
 
 /* attach to an existing shared memory config */
-static void
+static int
 rte_eal_config_attach(void)
 {
 	struct rte_mem_config *mem_config;
@@ -222,33 +232,41 @@ enum rte_iova_mode
 	const char *pathname = eal_runtime_config_path();
 
 	if (internal_config.no_shconf)
-		return;
+		return 0;
 
-	if (mem_cfg_fd < 0){
+	if (mem_cfg_fd < 0) {
 		mem_cfg_fd = open(pathname, O_RDWR);
-		if (mem_cfg_fd < 0)
-			rte_panic("Cannot open '%s' for rte_mem_config\n", pathname);
+		if (mem_cfg_fd < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot open '%s' for rte_mem_config\n",
+						__func__, pathname);
+			return -1;
+		}
 	}
 
 	/* map it as read-only first */
 	mem_config = (struct rte_mem_config *) mmap(NULL, sizeof(*mem_config),
 			PROT_READ, MAP_SHARED, mem_cfg_fd, 0);
-	if (mem_config == MAP_FAILED)
-		rte_panic("Cannot mmap memory for rte_config! error %i (%s)\n",
-			  errno, strerror(errno));
+	if (mem_config == MAP_FAILED) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot mmap memory for "
+				"rte_config! error %i (%s)\n",
+				__func__, errno, strerror(errno));
+		return -1;
+	}
 
 	rte_config.mem_config = mem_config;
+
+	return 0;
 }
 
 /* reattach the shared config at exact memory location primary process has it */
-static void
+static int
 rte_eal_config_reattach(void)
 {
 	struct rte_mem_config *mem_config;
 	void *rte_mem_cfg_addr;
 
 	if (internal_config.no_shconf)
-		return;
+		return 0;
 
 	/* save the address primary process has mapped shared config to */
 	rte_mem_cfg_addr = (void *) (uintptr_t) rte_config.mem_config->mem_cfg_addr;
@@ -263,16 +281,21 @@ enum rte_iova_mode
 	if (mem_config == MAP_FAILED || mem_config != rte_mem_cfg_addr) {
 		if (mem_config != MAP_FAILED)
 			/* errno is stale, don't use */
-			rte_panic("Cannot mmap memory for rte_config at [%p], got [%p]"
-				  " - please use '--base-virtaddr' option\n",
-				  rte_mem_cfg_addr, mem_config);
+			RTE_LOG(CRIT, EAL, "%s(): Cannot mmap memory for "
+					"rte_config at [%p], got [%p] - please use "
+					"'--base-virtaddr' option\n",
+					__func__, rte_mem_cfg_addr, mem_config);
 		else
-			rte_panic("Cannot mmap memory for rte_config! error %i (%s)\n",
-				  errno, strerror(errno));
+			RTE_LOG(CRIT, EAL, "%s(): Cannot mmap memory for "
+					"rte_config! error %i (%s)\n",
+					__func__, errno, strerror(errno));
+		return -1;
 	}
 	close(mem_cfg_fd);
 
 	rte_config.mem_config = mem_config;
+
+	return 0;
 }
 
 /* Detect if we are a primary or a secondary process */
@@ -296,24 +319,31 @@ enum rte_proc_type_t
 }
 
 /* Sets up rte_config structure with the pointer to shared memory config.*/
-static void
+static int
 rte_config_init(void)
 {
 	rte_config.process_type = internal_config.process_type;
 
 	switch (rte_config.process_type){
 	case RTE_PROC_PRIMARY:
-		rte_eal_config_create();
+		if (rte_eal_config_create() != 0)
+			return -1;
 		break;
 	case RTE_PROC_SECONDARY:
-		rte_eal_config_attach();
+		if (rte_eal_config_attach() != 0)
+			return -1;
 		rte_eal_mcfg_wait_complete(rte_config.mem_config);
-		rte_eal_config_reattach();
+		if (rte_eal_config_reattach() != 0)
+			return -1;
 		break;
 	case RTE_PROC_AUTO:
 	case RTE_PROC_INVALID:
-		rte_panic("Invalid process type\n");
+		RTE_LOG(CRIT, EAL, "%s(): Invalid process type %d\n",
+				__func__, rte_config.process_type);
+		return -1;
 	}
+
+	return 0;
 }
 
 /* Unlocks hugepage directories that were locked by eal_hugepage_info_init */
@@ -827,7 +857,8 @@ static void rte_eal_init_alert(const char *msg)
 
 	rte_srand(rte_rdtsc());
 
-	rte_config_init();
+	if (rte_config_init() != 0)
+		return -1;
 
 	if (rte_eal_log_init(logid, internal_config.syslog_facility) < 0) {
 		rte_eal_init_alert("Cannot init logging.");
@@ -890,6 +921,9 @@ static void rte_eal_init_alert(const char *msg)
 
 	eal_thread_init_master(rte_config.master_lcore);
 
+	if (rte_get_panic_state())
+		return -1;
+
 	ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
 
 	RTE_LOG(DEBUG, EAL, "Master lcore %u is ready (tid=%x;cpuset=[%s%s])\n",
@@ -907,18 +941,27 @@ static void rte_eal_init_alert(const char *msg)
 		 * create communication pipes between master thread
 		 * and children
 		 */
-		if (pipe(lcore_config[i].pipe_master2slave) < 0)
-			rte_panic("Cannot create pipe\n");
-		if (pipe(lcore_config[i].pipe_slave2master) < 0)
-			rte_panic("Cannot create pipe\n");
+		if (pipe(lcore_config[i].pipe_master2slave) < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot create pipe\n",
+					__func__);
+			return -1;
+		}
+		if (pipe(lcore_config[i].pipe_slave2master) < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot create pipe\n",
+					__func__);
+			return -1;
+		}
 
 		lcore_config[i].state = WAIT;
 
 		/* create a thread for each lcore */
 		ret = pthread_create(&lcore_config[i].thread_id, NULL,
 				     eal_thread_loop, NULL);
-		if (ret != 0)
-			rte_panic("Cannot create thread\n");
+		if (ret != 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot create thread\n",
+					__func__);
+			return -1;
+		}
 
 		/* Set thread_name for aid in debugging. */
 		snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN,
diff --git a/lib/librte_eal/linuxapp/eal/eal_thread.c b/lib/librte_eal/linuxapp/eal/eal_thread.c
index 08e150b..3afcee5 100644
--- a/lib/librte_eal/linuxapp/eal/eal_thread.c
+++ b/lib/librte_eal/linuxapp/eal/eal_thread.c
@@ -51,16 +51,22 @@
 	n = 0;
 	while (n == 0 || (n < 0 && errno == EINTR))
 		n = write(m2s, &c, 1);
-	if (n < 0)
-		rte_panic("cannot write on configuration pipe\n");
+	if (n < 0) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot write on configuration pipe\n",
+				__func__);
+		return -1;
+	}
 
 	/* wait ack */
 	do {
 		n = read(s2m, &c, 1);
 	} while (n < 0 && errno == EINTR);
 
-	if (n <= 0)
-		rte_panic("cannot read on configuration pipe\n");
+	if (n <= 0) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot read on configuration pipe\n",
+				__func__);
+		return -1;
+	}
 
 	return 0;
 }
@@ -84,8 +90,19 @@ void eal_thread_init_master(unsigned lcore_id)
 	RTE_PER_LCORE(_lcore_id) = lcore_id;
 
 	/* set CPU affinity */
-	if (eal_thread_set_affinity() < 0)
-		rte_panic("cannot set affinity\n");
+	if (eal_thread_set_affinity() < 0) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot set affinity\n", __func__);
+		rte_move_to_panic_state();
+	}
+}
+
+/* move to panic state and do not return */
+static __attribute__((noreturn)) void
+defunct_and_remain_in_endless_loop(void)
+{
+	rte_move_to_panic_state();
+	while (1)
+		sleep(1);
 }
 
 /* main loop of threads */
@@ -106,8 +123,11 @@ void eal_thread_init_master(unsigned lcore_id)
 		if (thread_id == lcore_config[lcore_id].thread_id)
 			break;
 	}
-	if (lcore_id == RTE_MAX_LCORE)
-		rte_panic("cannot retrieve lcore id\n");
+	if (lcore_id == RTE_MAX_LCORE) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot retrieve lcore id\n",
+				__func__);
+		defunct_and_remain_in_endless_loop();
+	}
 
 	m2s = lcore_config[lcore_id].pipe_master2slave[0];
 	s2m = lcore_config[lcore_id].pipe_slave2master[1];
@@ -116,8 +136,10 @@ void eal_thread_init_master(unsigned lcore_id)
 	RTE_PER_LCORE(_lcore_id) = lcore_id;
 
 	/* set CPU affinity */
-	if (eal_thread_set_affinity() < 0)
-		rte_panic("cannot set affinity\n");
+	if (eal_thread_set_affinity() < 0) {
+		RTE_LOG(CRIT, EAL, "%s(): Cannot set affinity\n", __func__);
+		defunct_and_remain_in_endless_loop();
+	}
 
 	ret = eal_thread_dump_affinity(cpuset, RTE_CPU_AFFINITY_STR_LEN);
 
@@ -133,8 +155,11 @@ void eal_thread_init_master(unsigned lcore_id)
 			n = read(m2s, &c, 1);
 		} while (n < 0 && errno == EINTR);
 
-		if (n <= 0)
-			rte_panic("cannot read on configuration pipe\n");
+		if (n <= 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot read on configuration pipe\n",
+					__func__);
+			defunct_and_remain_in_endless_loop();
+		}
 
 		lcore_config[lcore_id].state = RUNNING;
 
@@ -142,11 +167,17 @@ void eal_thread_init_master(unsigned lcore_id)
 		n = 0;
 		while (n == 0 || (n < 0 && errno == EINTR))
 			n = write(s2m, &c, 1);
-		if (n < 0)
-			rte_panic("cannot write on configuration pipe\n");
-
-		if (lcore_config[lcore_id].f == NULL)
-			rte_panic("NULL function pointer\n");
+		if (n < 0) {
+			RTE_LOG(CRIT, EAL, "%s(): Cannot write on configuration pipe\n",
+					__func__);
+			defunct_and_remain_in_endless_loop();
+		}
+
+		if (lcore_config[lcore_id].f == NULL) {
+			RTE_LOG(CRIT, EAL, "%s(): NULL function pointer\n",
+					__func__);
+			defunct_and_remain_in_endless_loop();
+		}
 
 		/* call the function and store the return value */
 		fct_arg = lcore_config[lcore_id].arg;
-- 
1.8.3.1

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH 11/13] eal: replace rte_panic instances in ethdev
  2018-04-04 11:27  3% [dpdk-dev] eal: replace calls to rte_panic and refrain from new instances Arnon Warshavsky
                   ` (3 preceding siblings ...)
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 06/13] kni: replace rte_panic instances in kni Arnon Warshavsky
@ 2018-04-04 11:27  3% ` Arnon Warshavsky
  2018-04-04 11:27  2% ` [dpdk-dev] [PATCH 12/13] eal: replace rte_panic instances in init sequence Arnon Warshavsky
  5 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-04-04 11:27 UTC (permalink / raw)
  To: thomas, anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit
  Cc: dev, arnon

The function is local to this file;
changing it from void to int does not break the ABI.

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
---
 lib/librte_ether/rte_ethdev.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 2c74f7e..57e1e6b 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -194,7 +194,7 @@ enum {
 	return port_id;
 }
 
-static void
+static int
 rte_eth_dev_shared_data_prepare(void)
 {
 	const unsigned flags = 0;
@@ -210,8 +210,12 @@ enum {
 					rte_socket_id(), flags);
 		} else
 			mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
-		if (mz == NULL)
-			rte_panic("Cannot allocate ethdev shared data\n");
+		if (mz == NULL) {
+			rte_spinlock_unlock(&rte_eth_shared_data_lock);
+			RTE_LOG(CRIT, EAL, "%s(): Cannot allocate ethdev shared data\n",
+					__func__);
+			return -1;
+		}
 
 		rte_eth_dev_shared_data = mz->addr;
 		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
@@ -224,6 +228,8 @@ enum {
 	}
 
 	rte_spinlock_unlock(&rte_eth_shared_data_lock);
+
+	return 0;
 }
 
 struct rte_eth_dev *
@@ -274,7 +280,8 @@ struct rte_eth_dev *
 	uint16_t port_id;
 	struct rte_eth_dev *eth_dev = NULL;
 
-	rte_eth_dev_shared_data_prepare();
+	if (rte_eth_dev_shared_data_prepare() != 0)
+		return NULL;
 
 	/* Synchronize port creation between primary and secondary threads. */
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
@@ -317,7 +324,8 @@ struct rte_eth_dev *
 	uint16_t i;
 	struct rte_eth_dev *eth_dev = NULL;
 
-	rte_eth_dev_shared_data_prepare();
+	if (rte_eth_dev_shared_data_prepare() != 0)
+		return NULL;
 
 	/* Synchronize port attachment to primary port creation and release. */
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
@@ -345,7 +353,8 @@ struct rte_eth_dev *
 	if (eth_dev == NULL)
 		return -EINVAL;
 
-	rte_eth_dev_shared_data_prepare();
+	if (rte_eth_dev_shared_data_prepare() != 0)
+		return -1;
 
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
 
@@ -399,7 +408,8 @@ struct rte_eth_dev *
 int __rte_experimental
 rte_eth_dev_owner_new(uint64_t *owner_id)
 {
-	rte_eth_dev_shared_data_prepare();
+	if (rte_eth_dev_shared_data_prepare() != 0)
+		return -1;
 
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
 
@@ -450,7 +460,8 @@ struct rte_eth_dev *
 {
 	int ret;
 
-	rte_eth_dev_shared_data_prepare();
+	if (rte_eth_dev_shared_data_prepare() != 0)
+		return -1;
 
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
 
@@ -467,7 +478,8 @@ struct rte_eth_dev *
 			{.id = RTE_ETH_DEV_NO_OWNER, .name = ""};
 	int ret;
 
-	rte_eth_dev_shared_data_prepare();
+	if (rte_eth_dev_shared_data_prepare() != 0)
+		return -1;
 
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
 
@@ -482,7 +494,8 @@ struct rte_eth_dev *
 {
 	uint16_t port_id;
 
-	rte_eth_dev_shared_data_prepare();
+	if (rte_eth_dev_shared_data_prepare() != 0)
+		return;
 
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
 
@@ -502,7 +515,8 @@ struct rte_eth_dev *
 {
 	int ret = 0;
 
-	rte_eth_dev_shared_data_prepare();
+	if (rte_eth_dev_shared_data_prepare() != 0)
+		return -1;
 
 	rte_spinlock_lock(&rte_eth_dev_shared_data->ownership_lock);
 
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 06/13] kni: replace rte_panic instances in kni
  2018-04-04 11:27  3% [dpdk-dev] eal: replace calls to rte_panic and refrain from new instances Arnon Warshavsky
                   ` (2 preceding siblings ...)
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 04/13] ixgbe: replace rte_panic instances in ixgbe driver Arnon Warshavsky
@ 2018-04-04 11:27  3% ` Arnon Warshavsky
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 11/13] eal: replace rte_panic instances in ethdev Arnon Warshavsky
  2018-04-04 11:27  2% ` [dpdk-dev] [PATCH 12/13] eal: replace rte_panic instances in init sequence Arnon Warshavsky
  5 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-04-04 11:27 UTC (permalink / raw)
  To: thomas, anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit
  Cc: dev, arnon

Replace panic calls with a log message and a return value.
The function is local to this file;
changing it from void to int does not break the ABI.

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
---
 lib/librte_kni/rte_kni.c      | 18 ++++++++++++------
 lib/librte_kni/rte_kni_fifo.h | 11 ++++++++---
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 2867411..54050c8 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -353,37 +353,43 @@ struct rte_kni *
 	/* TX RING */
 	mz = slot->m_tx_q;
 	ctx->tx_q = mz->addr;
-	kni_fifo_init(ctx->tx_q, KNI_FIFO_COUNT_MAX);
+	if (kni_fifo_init(ctx->tx_q, KNI_FIFO_COUNT_MAX))
+		return NULL;
 	dev_info.tx_phys = mz->phys_addr;
 
 	/* RX RING */
 	mz = slot->m_rx_q;
 	ctx->rx_q = mz->addr;
-	kni_fifo_init(ctx->rx_q, KNI_FIFO_COUNT_MAX);
+	if (kni_fifo_init(ctx->rx_q, KNI_FIFO_COUNT_MAX))
+		return NULL;
 	dev_info.rx_phys = mz->phys_addr;
 
 	/* ALLOC RING */
 	mz = slot->m_alloc_q;
 	ctx->alloc_q = mz->addr;
-	kni_fifo_init(ctx->alloc_q, KNI_FIFO_COUNT_MAX);
+	if (kni_fifo_init(ctx->alloc_q, KNI_FIFO_COUNT_MAX))
+		return NULL;
 	dev_info.alloc_phys = mz->phys_addr;
 
 	/* FREE RING */
 	mz = slot->m_free_q;
 	ctx->free_q = mz->addr;
-	kni_fifo_init(ctx->free_q, KNI_FIFO_COUNT_MAX);
+	if (kni_fifo_init(ctx->free_q, KNI_FIFO_COUNT_MAX))
+		return NULL;
 	dev_info.free_phys = mz->phys_addr;
 
 	/* Request RING */
 	mz = slot->m_req_q;
 	ctx->req_q = mz->addr;
-	kni_fifo_init(ctx->req_q, KNI_FIFO_COUNT_MAX);
+	if (kni_fifo_init(ctx->req_q, KNI_FIFO_COUNT_MAX))
+		return NULL;
 	dev_info.req_phys = mz->phys_addr;
 
 	/* Response RING */
 	mz = slot->m_resp_q;
 	ctx->resp_q = mz->addr;
-	kni_fifo_init(ctx->resp_q, KNI_FIFO_COUNT_MAX);
+	if (kni_fifo_init(ctx->resp_q, KNI_FIFO_COUNT_MAX))
+		return NULL;
 	dev_info.resp_phys = mz->phys_addr;
 
 	/* Req/Resp sync mem area */
diff --git a/lib/librte_kni/rte_kni_fifo.h b/lib/librte_kni/rte_kni_fifo.h
index ac26a8c..5052015 100644
--- a/lib/librte_kni/rte_kni_fifo.h
+++ b/lib/librte_kni/rte_kni_fifo.h
@@ -7,17 +7,22 @@
 /**
  * Initializes the kni fifo structure
  */
-static void
+static int
 kni_fifo_init(struct rte_kni_fifo *fifo, unsigned size)
 {
 	/* Ensure size is power of 2 */
-	if (size & (size - 1))
-		rte_panic("KNI fifo size must be power of 2\n");
+	if (size & (size - 1)) {
+		RTE_LOG(CRIT, EAL, "%s(): KNI fifo size must be power of 2\n",
+				__func__);
+		return -1;
+	}
 
 	fifo->write = 0;
 	fifo->read = 0;
 	fifo->len = size;
 	fifo->elem_size = sizeof(void *);
+
+	return 0;
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 04/13] ixgbe: replace rte_panic instances in ixgbe driver
  2018-04-04 11:27  3% [dpdk-dev] eal: replace calls to rte_panic and refrain from new instances Arnon Warshavsky
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 02/13] bond: replace rte_panic instances in bonding driver Arnon Warshavsky
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 03/13] e1000: replace rte_panic instances in e1000 driver Arnon Warshavsky
@ 2018-04-04 11:27  3% ` Arnon Warshavsky
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 06/13] kni: replace rte_panic instances in kni Arnon Warshavsky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-04-04 11:27 UTC (permalink / raw)
  To: thomas, anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit
  Cc: dev, arnon

Replace panic calls with a log message and a return value.
The function is local to this file;
changing it from void to int does not break the ABI.

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c |  3 ++-
 drivers/net/ixgbe/ixgbe_ethdev.h |  2 +-
 drivers/net/ixgbe/ixgbe_pf.c     | 13 +++++++++----
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4df5c75..96188dc 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1224,7 +1224,8 @@ struct rte_ixgbe_xstats_name_off {
 	memset(hwstrip, 0, sizeof(*hwstrip));
 
 	/* initialize PF if max_vfs not zero */
-	ixgbe_pf_host_init(eth_dev);
+	if (ixgbe_pf_host_init(eth_dev) != 0)
+		return -1;
 
 	ctrl_ext = IXGBE_READ_REG(hw, IXGBE_CTRL_EXT);
 	/* let hardware know driver is loaded */
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.h b/drivers/net/ixgbe/ixgbe_ethdev.h
index c56d652..82d7fd2 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.h
+++ b/drivers/net/ixgbe/ixgbe_ethdev.h
@@ -663,7 +663,7 @@ int ixgbe_fdir_filter_program(struct rte_eth_dev *dev,
 
 void ixgbe_vlan_hw_strip_disable_all(struct rte_eth_dev *dev);
 
-void ixgbe_pf_host_init(struct rte_eth_dev *eth_dev);
+int ixgbe_pf_host_init(struct rte_eth_dev *eth_dev);
 
 void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev);
 
diff --git a/drivers/net/ixgbe/ixgbe_pf.c b/drivers/net/ixgbe/ixgbe_pf.c
index ea99737..5c25de0 100644
--- a/drivers/net/ixgbe/ixgbe_pf.c
+++ b/drivers/net/ixgbe/ixgbe_pf.c
@@ -66,7 +66,7 @@ int ixgbe_vf_perm_addr_gen(struct rte_eth_dev *dev, uint16_t vf_num)
 	return 0;
 }
 
-void ixgbe_pf_host_init(struct rte_eth_dev *eth_dev)
+int ixgbe_pf_host_init(struct rte_eth_dev *eth_dev)
 {
 	struct ixgbe_vf_info **vfinfo =
 		IXGBE_DEV_PRIVATE_TO_P_VFDATA(eth_dev->data->dev_private);
@@ -84,11 +84,14 @@ void ixgbe_pf_host_init(struct rte_eth_dev *eth_dev)
 	RTE_ETH_DEV_SRIOV(eth_dev).active = 0;
 	vf_num = dev_num_vf(eth_dev);
 	if (vf_num == 0)
-		return;
+		return 0;
 
 	*vfinfo = rte_zmalloc("vf_info", sizeof(struct ixgbe_vf_info) * vf_num, 0);
-	if (*vfinfo == NULL)
-		rte_panic("Cannot allocate memory for private VF data\n");
+	if (*vfinfo == NULL) {
+		RTE_LOG(ERR, PMD, "%s() Cannot allocate memory for private VF data\n",
+				__func__);
+		return -1;
+	}
 
 	memset(mirror_info, 0, sizeof(struct ixgbe_mirror_info));
 	memset(uta_info, 0, sizeof(struct ixgbe_uta_info));
@@ -116,6 +119,8 @@ void ixgbe_pf_host_init(struct rte_eth_dev *eth_dev)
 
 	/* set mb interrupt mask */
 	ixgbe_mb_intr_setup(eth_dev);
+
+	return 0;
 }
 
 void ixgbe_pf_host_uninit(struct rte_eth_dev *eth_dev)
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 03/13] e1000: replace rte_panic instances in e1000 driver
  2018-04-04 11:27  3% [dpdk-dev] eal: replace calls to rte_panic and refrain from new instances Arnon Warshavsky
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 02/13] bond: replace rte_panic instances in bonding driver Arnon Warshavsky
@ 2018-04-04 11:27  3% ` Arnon Warshavsky
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 04/13] ixgbe: replace rte_panic instances in ixgbe driver Arnon Warshavsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-04-04 11:27 UTC (permalink / raw)
  To: thomas, anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit
  Cc: dev, arnon

Replace panic calls with a log message and a return value.
The function is local to this file;
changing it from void to int does not break the ABI.

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
---
 drivers/net/e1000/e1000_ethdev.h |  2 +-
 drivers/net/e1000/igb_ethdev.c   |  3 ++-
 drivers/net/e1000/igb_pf.c       | 15 +++++++++------
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/net/e1000/e1000_ethdev.h b/drivers/net/e1000/e1000_ethdev.h
index 23b089c..a66ff42 100644
--- a/drivers/net/e1000/e1000_ethdev.h
+++ b/drivers/net/e1000/e1000_ethdev.h
@@ -405,7 +405,7 @@ int eth_igb_rss_hash_conf_get(struct rte_eth_dev *dev,
 /*
  * misc function prototypes
  */
-void igb_pf_host_init(struct rte_eth_dev *eth_dev);
+int igb_pf_host_init(struct rte_eth_dev *eth_dev);
 
 void igb_pf_mbx_process(struct rte_eth_dev *eth_dev);
 
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index d7eef9a..994bb5a 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -833,7 +833,8 @@ static int igb_flex_filter_uninit(struct rte_eth_dev *eth_dev)
 	}
 
 	/* initialize PF if max_vfs not zero */
-	igb_pf_host_init(eth_dev);
+	if (igb_pf_host_init(eth_dev) != 0)
+		goto err_late;
 
 	ctrl_ext = E1000_READ_REG(hw, E1000_CTRL_EXT);
 	/* Set PF Reset Done bit so PF/VF Mail Ops can work */
diff --git a/drivers/net/e1000/igb_pf.c b/drivers/net/e1000/igb_pf.c
index b9f2e53..dfa63c9 100644
--- a/drivers/net/e1000/igb_pf.c
+++ b/drivers/net/e1000/igb_pf.c
@@ -63,7 +63,7 @@ int igb_vf_perm_addr_gen(struct rte_eth_dev *dev, uint16_t vf_num)
 	return 0;
 }
 
-void igb_pf_host_init(struct rte_eth_dev *eth_dev)
+int igb_pf_host_init(struct rte_eth_dev *eth_dev)
 {
 	struct e1000_vf_info **vfinfo =
 		E1000_DEV_PRIVATE_TO_P_VFDATA(eth_dev->data->dev_private);
@@ -74,7 +74,7 @@ void igb_pf_host_init(struct rte_eth_dev *eth_dev)
 
 	RTE_ETH_DEV_SRIOV(eth_dev).active = 0;
 	if (0 == (vf_num = dev_num_vf(eth_dev)))
-		return;
+		return 0;
 
 	if (hw->mac.type == e1000_i350)
 		nb_queue = 1;
@@ -82,11 +82,14 @@ void igb_pf_host_init(struct rte_eth_dev *eth_dev)
 		/* per datasheet, it should be 2, but 1 seems correct */
 		nb_queue = 1;
 	else
-		return;
+		return 0;
 
 	*vfinfo = rte_zmalloc("vf_info", sizeof(struct e1000_vf_info) * vf_num, 0);
-	if (*vfinfo == NULL)
-		rte_panic("Cannot allocate memory for private VF data\n");
+	if (*vfinfo == NULL) {
+		RTE_LOG(CRIT, PMD, "%s(): Cannot allocate memory for private "
+				"VF data\n", __func__);
+		return -1;
+	}
 
 	RTE_ETH_DEV_SRIOV(eth_dev).active = ETH_8_POOLS;
 	RTE_ETH_DEV_SRIOV(eth_dev).nb_q_per_pool = nb_queue;
@@ -98,7 +101,7 @@ void igb_pf_host_init(struct rte_eth_dev *eth_dev)
 	/* set mb interrupt mask */
 	igb_mb_intr_setup(eth_dev);
 
-	return;
+	return 0;
 }
 
 void igb_pf_host_uninit(struct rte_eth_dev *dev)
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH 02/13] bond: replace rte_panic instances in bonding driver
  2018-04-04 11:27  3% [dpdk-dev] eal: replace calls to rte_panic and refrain from new instances Arnon Warshavsky
@ 2018-04-04 11:27  3% ` Arnon Warshavsky
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 03/13] e1000: replace rte_panic instances in e1000 driver Arnon Warshavsky
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-04-04 11:27 UTC (permalink / raw)
  To: thomas, anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit
  Cc: dev, arnon

Replace panic calls with a log message and a return value.
The functions are local to this file;
changing them from void to int does not break the ABI.

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
---
 drivers/net/bonding/rte_eth_bond_8023ad.c         | 30 +++++++++++++++--------
 drivers/net/bonding/rte_eth_bond_8023ad_private.h |  2 +-
 drivers/net/bonding/rte_eth_bond_api.c            | 20 ++++++++++-----
 drivers/net/bonding/rte_eth_bond_pmd.c            | 10 +++++---
 drivers/net/bonding/rte_eth_bond_private.h        |  2 +-
 5 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_8023ad.c b/drivers/net/bonding/rte_eth_bond_8023ad.c
index c452318..310118c 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad.c
+++ b/drivers/net/bonding/rte_eth_bond_8023ad.c
@@ -893,7 +893,7 @@
 			bond_mode_8023ad_periodic_cb, arg);
 }
 
-void
+int
 bond_mode_8023ad_activate_slave(struct rte_eth_dev *bond_dev,
 				uint16_t slave_id)
 {
@@ -939,7 +939,7 @@
 	timer_cancel(&port->warning_timer);
 
 	if (port->mbuf_pool != NULL)
-		return;
+		return 0;
 
 	RTE_ASSERT(port->rx_ring == NULL);
 	RTE_ASSERT(port->tx_ring == NULL);
@@ -968,8 +968,10 @@
 	/* Any memory allocation failure in initialization is critical because
 	 * resources can't be free, so reinitialization is impossible. */
 	if (port->mbuf_pool == NULL) {
-		rte_panic("Slave %u: Failed to create memory pool '%s': %s\n",
-			slave_id, mem_name, rte_strerror(rte_errno));
+		RTE_LOG(ERR, PMD, "%s() Slave %u: Failed to create memory"
+				" pool '%s': %s\n", __func__,
+				slave_id, mem_name, rte_strerror(rte_errno));
+		return -1;
 	}
 
 	snprintf(mem_name, RTE_DIM(mem_name), "slave_%u_rx", slave_id);
@@ -977,8 +979,9 @@
 			rte_align32pow2(BOND_MODE_8023AX_SLAVE_RX_PKTS), socket_id, 0);
 
 	if (port->rx_ring == NULL) {
-		rte_panic("Slave %u: Failed to create rx ring '%s': %s\n", slave_id,
-			mem_name, rte_strerror(rte_errno));
+		RTE_LOG(ERR, PMD, "%s() Slave %u: Failed to create rx ring '%s': %s\n",
+			__func__, slave_id, mem_name, rte_strerror(rte_errno));
+		return -1;
 	}
 
 	/* TX ring is at least one pkt longer to make room for marker packet. */
@@ -987,9 +990,13 @@
 			rte_align32pow2(BOND_MODE_8023AX_SLAVE_TX_PKTS + 1), socket_id, 0);
 
 	if (port->tx_ring == NULL) {
-		rte_panic("Slave %u: Failed to create tx ring '%s': %s\n", slave_id,
-			mem_name, rte_strerror(rte_errno));
+		RTE_LOG(ERR, PMD, "%s() Slave %u: Fail to create tx ring "
+				"'%s': %s\n", __func__,
+				slave_id, mem_name, rte_strerror(rte_errno));
+		return -1;
 	}
+
+	return 0;
 }
 
 int
@@ -1143,9 +1150,12 @@
 	struct bond_dev_private *internals = bond_dev->data->dev_private;
 	uint8_t i;
 
-	for (i = 0; i < internals->active_slave_count; i++)
-		bond_mode_8023ad_activate_slave(bond_dev,
+	for (i = 0; i < internals->active_slave_count; i++) {
+		int rc = bond_mode_8023ad_activate_slave(bond_dev,
 				internals->active_slaves[i]);
+		if (rc != 0)
+			return rc;
+	}
 
 	return 0;
 }
diff --git a/drivers/net/bonding/rte_eth_bond_8023ad_private.h b/drivers/net/bonding/rte_eth_bond_8023ad_private.h
index 0f490a5..96a42f2 100644
--- a/drivers/net/bonding/rte_eth_bond_8023ad_private.h
+++ b/drivers/net/bonding/rte_eth_bond_8023ad_private.h
@@ -263,7 +263,7 @@ struct mode8023ad_private {
  * @return
  *  0 on success, negative value otherwise.
  */
-void
+int
 bond_mode_8023ad_activate_slave(struct rte_eth_dev *dev, uint16_t port_id);
 
 /**
diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index f854b73..6bc5887 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -69,14 +69,15 @@
 	return 0;
 }
 
-void
+int
 activate_slave(struct rte_eth_dev *eth_dev, uint16_t port_id)
 {
 	struct bond_dev_private *internals = eth_dev->data->dev_private;
 	uint8_t active_count = internals->active_slave_count;
 
 	if (internals->mode == BONDING_MODE_8023AD)
-		bond_mode_8023ad_activate_slave(eth_dev, port_id);
+		if (bond_mode_8023ad_activate_slave(eth_dev, port_id) != 0)
+			return -1;
 
 	if (internals->mode == BONDING_MODE_TLB
 			|| internals->mode == BONDING_MODE_ALB) {
@@ -349,10 +350,17 @@
 				bond_ethdev_primary_set(internals,
 							slave_port_id);
 
-			if (find_slave_by_id(internals->active_slaves,
-					     internals->active_slave_count,
-					     slave_port_id) == internals->active_slave_count)
-				activate_slave(bonded_eth_dev, slave_port_id);
+			int rc =
+				find_slave_by_id(internals->active_slaves,
+					internals->active_slave_count,
+					slave_port_id);
+
+			if (rc == internals->active_slave_count) {
+				int rc = activate_slave(bonded_eth_dev,
+							slave_port_id);
+				if (rc != 0)
+					return -1;
+			}
 		}
 	}
 
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index b59ba9f..96f8b1a 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1740,8 +1740,11 @@ struct bwg_slave {
 		/* Any memory allocation failure in initialization is critical because
 		 * resources can't be free, so reinitialization is impossible. */
 		if (port->slow_pool == NULL) {
-			rte_panic("Slave %u: Failed to create memory pool '%s': %s\n",
-				slave_id, mem_name, rte_strerror(rte_errno));
+			RTE_LOG(ERR, PMD, "%s() Slave %u: Failed to create"
+					" memory pool '%s': %s\n",
+					__func__, slave_id, mem_name,
+					rte_strerror(rte_errno));
+			return -1;
 		}
 	}
 
@@ -2660,7 +2663,8 @@ struct bwg_slave {
 			mac_address_slaves_update(bonded_eth_dev);
 		}
 
-		activate_slave(bonded_eth_dev, port_id);
+		if (activate_slave(bonded_eth_dev, port_id) != 0)
+			return -1;
 
 		/* If user has defined the primary port then default to using it */
 		if (internals->user_defined_primary_port &&
diff --git a/drivers/net/bonding/rte_eth_bond_private.h b/drivers/net/bonding/rte_eth_bond_private.h
index 92e15f8..65453aa 100644
--- a/drivers/net/bonding/rte_eth_bond_private.h
+++ b/drivers/net/bonding/rte_eth_bond_private.h
@@ -185,7 +185,7 @@ struct bond_dev_private {
 void
 deactivate_slave(struct rte_eth_dev *eth_dev, uint16_t port_id);
 
-void
+int
 activate_slave(struct rte_eth_dev *eth_dev, uint16_t port_id);
 
 void
-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] eal: replace calls to rte_panic and refrain from new instances
@ 2018-04-04 11:27  3% Arnon Warshavsky
  2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 02/13] bond: replace rte_panic instances in bonding driver Arnon Warshavsky
                   ` (5 more replies)
  0 siblings, 6 replies; 200+ results
From: Arnon Warshavsky @ 2018-04-04 11:27 UTC (permalink / raw)
  To: thomas, anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit
  Cc: dev, arnon



The purpose of this patch series is to clean up the library code
by removing paths that end up aborting the process,
and to move to checking error values, in order to allow the running process
to perform an orderly teardown or other mitigation of the event.

This patch modifies the majority of rte_panic calls
under lib and drivers, and replaces them with a log message
and an error return code according to context,
that can be propagated up the call stack.
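
As a rough illustration of the conversion described above, the following
standalone C sketch shows a helper that logs and returns an error instead of
aborting, and a caller that passes the failure up. The names struct fifo,
fifo_init_checked and rings_setup are hypothetical, fprintf stands in for the
RTE_LOG macro used in the patches, and the explicit zero-size check is an
assumption of this sketch. It is not code from the series.

```c
#include <stdio.h>

/* Hypothetical stand-in for a FIFO descriptor; not a DPDK structure. */
struct fifo {
	unsigned int len;
};

/*
 * A helper of this kind used to return void and abort on bad input.
 * Here it logs the problem and reports it to the caller instead.
 * The size == 0 check is an assumption added for this sketch.
 */
int
fifo_init_checked(struct fifo *fifo, unsigned int size)
{
	if (size == 0 || (size & (size - 1))) {
		fprintf(stderr, "%s(): fifo size must be a power of 2\n",
				__func__);
		return -1;
	}

	fifo->len = size;
	return 0;
}

/* The caller checks every result and propagates the first failure. */
int
rings_setup(struct fifo *tx, struct fifo *rx, unsigned int size)
{
	if (fifo_init_checked(tx, size) != 0)
		return -1;
	if (fifo_init_checked(rx, size) != 0)
		return -1;

	return 0;
}
```

With this shape, the application that called the top-level function decides
whether to tear down, retry or exit; the library itself never aborts.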

- Focus was given to the DPDK initialization path
- Some of the panic calls within drivers were left in place where
  the call is made from within an interrupt or on the data path,
  where there is no simple application-level
  route to propagate the error to termination.
  These should be handled by the driver maintainers.
- In order to avoid breaking ABI where panic was called from public
  void functions, a panic state variable was introduced so that
  it can be queried after calling these void functions.
  This took place for a single function call.
- Local void functions with no API were changed to return a value
  where needed
- No change took place in example and test files
- No change took place for debug assertions calling panic
- A new function was added to devtools/checkpatches.sh
  in order to prevent new additions of calls to rte_panic
  under lib and drivers.
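
The panic state approach mentioned in the list above can be sketched roughly
as follows. All names here (panic_state, move_to_panic_state, in_panic_state,
thread_init, init_sequence) are hypothetical and chosen for illustration only;
the series keeps its own state inside EAL, and a real implementation would
need to consider concurrent access to the flag.

```c
#include <stdbool.h>

/* Hypothetical process-wide flag; not the variable used by the series. */
static bool panic_state;

void
move_to_panic_state(void)
{
	panic_state = true;
}

bool
in_panic_state(void)
{
	return panic_state;
}

/*
 * Public void function: its signature cannot change without breaking
 * the ABI, so the failure is recorded in the flag instead of returned.
 */
void
thread_init(int affinity_ok)
{
	if (!affinity_ok) {
		move_to_panic_state();
		return;
	}
	/* normal initialization would continue here */
}

/* The caller queries the flag after the void call and returns an error. */
int
init_sequence(int affinity_ok)
{
	thread_init(affinity_ok);
	if (in_panic_state())
		return -1;

	return 0;
}
```

The void signature stays untouched for existing binaries, while new callers
can still detect the failure.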

Keep calm and don't panic.


Arnon Warshavsky (13):
  crypto: replace rte_panic instances in crypto driver
  bond: replace rte_panic instances in bonding driver
  e1000: replace rte_panic instances in e1000 driver
  ixgbe: replace rte_panic instances in ixgbe driver
  eal: replace rte_panic instances in eventdev
  kni: replace rte_panic instances in kni
  e1000: replace rte_panic instances in e1000 driver
  eal: replace rte_panic instances in hugepage_info
  eal: replace rte_panic instances in common_memzone
  eal: replace rte_panic instances in interrupts thread
  eal: replace rte_panic instances in ethdev
  eal: replace rte_panic instances in init sequence
  devtools: prevent new instances of rte_panic and rte_exit

 devtools/checkpatches.sh                          |  94 ++++++++++++++++-
 drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c       |   8 +-
 drivers/crypto/dpaa_sec/dpaa_sec.c                |   8 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c         |  30 ++++--
 drivers/net/bonding/rte_eth_bond_8023ad_private.h |   2 +-
 drivers/net/bonding/rte_eth_bond_api.c            |  20 ++--
 drivers/net/bonding/rte_eth_bond_pmd.c            |  10 +-
 drivers/net/bonding/rte_eth_bond_private.h        |   2 +-
 drivers/net/e1000/e1000_ethdev.h                  |   2 +-
 drivers/net/e1000/igb_ethdev.c                    |   3 +-
 drivers/net/e1000/igb_pf.c                        |  15 +--
 drivers/net/ixgbe/ixgbe_ethdev.c                  |   3 +-
 drivers/net/ixgbe/ixgbe_ethdev.h                  |   2 +-
 drivers/net/ixgbe/ixgbe_pf.c                      |  13 ++-
 lib/librte_eal/bsdapp/eal/eal.c                   |  87 +++++++++++-----
 lib/librte_eal/bsdapp/eal/eal_thread.c            |  65 +++++++++---
 lib/librte_eal/common/eal_common_launch.c         |  21 ++++
 lib/librte_eal/common/eal_common_memzone.c        |   5 +-
 lib/librte_eal/common/include/rte_debug.h         |  12 +++
 lib/librte_eal/common/rte_malloc.c                |   7 +-
 lib/librte_eal/linuxapp/eal/eal.c                 | 121 +++++++++++++++-------
 lib/librte_eal/linuxapp/eal/eal_hugepage_info.c   |  21 ++--
 lib/librte_eal/linuxapp/eal/eal_interrupts.c      |  27 +++--
 lib/librte_eal/linuxapp/eal/eal_thread.c          |  65 +++++++++---
 lib/librte_ether/rte_ethdev.c                     |  36 +++++--
 lib/librte_eventdev/rte_eventdev_pmd_pci.h        |   8 +-
 lib/librte_eventdev/rte_eventdev_pmd_vdev.h       |   8 +-
 lib/librte_kni/rte_kni.c                          |  18 ++--
 lib/librte_kni/rte_kni_fifo.h                     |  11 +-
 29 files changed, 540 insertions(+), 184 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 24/68] mempool: add support for the new allocation methods
  @ 2018-04-03 23:21  3%   ` Anatoly Burakov
  0 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2018-04-03 23:21 UTC (permalink / raw)
  To: dev
  Cc: Olivier Matz, keith.wiles, jianfeng.tan, andras.kovacs,
	laszlo.vadkeri, benjamin.walker, bruce.richardson, thomas,
	konstantin.ananyev, kuralamudhan.ramakrishnan, louise.m.daly,
	nelio.laranjeiro, yskoh, pepperjo, jerin.jacob, hemant.agrawal,
	shreyansh.jain, gowrishankar.m

If a user has specified that the zone should have contiguous memory,
use the new _contig allocation APIs instead of the normal ones.
Otherwise, account for the fact that unless we're in IOVA_AS_VA
mode, we cannot guarantee that the pages would be physically
contiguous, so we calculate the memzone size and alignments as if
we were getting the smallest page size available.

The existing mempool size calculation function also doesn't give us
the expected results, because it returns memzone sizes aligned to
page size (e.g. a 1MB mempool will reserve an entire 1GB page if
all the user has is 1GB pages), so add a new one that gives
results more in line with what we would expect.
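
To make the arithmetic concrete, here is a small standalone C sketch of the
two ideas above: rounding a size up to a page boundary, and picking the
smallest page size in use. The helpers align_up and min_page_size and their
parameters are hypothetical and written only for illustration; they are not
the functions added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to a multiple of pg_sz; pg_sz must be a power of 2. */
size_t
align_up(size_t len, size_t pg_sz)
{
	assert(pg_sz != 0 && (pg_sz & (pg_sz - 1)) == 0);
	return (len + pg_sz - 1) & ~(pg_sz - 1);
}

/*
 * Return the smallest non-zero entry of pg_sizes, or fallback if there
 * is none. A zero entry stands for an unused slot in this sketch.
 */
size_t
min_page_size(const size_t *pg_sizes, size_t n, size_t fallback)
{
	size_t min_pagesz = SIZE_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		if (pg_sizes[i] == 0)
			continue;
		if (pg_sizes[i] < min_pagesz)
			min_pagesz = pg_sizes[i];
	}

	return min_pagesz == SIZE_MAX ? fallback : min_pagesz;
}
```

Under these assumptions, align_up of 1MB to a 1GB page yields a full 1GB,
while aligning to a 2MB page yields 2MB, which is why sizing against the
smallest available page gives a less wasteful estimate.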

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v3:
    - Fixed mempool size calculation
    - Fixed handling of contiguous memzones
    - Moved earlier in the patchset

 lib/librte_mempool/Makefile      |   3 +
 lib/librte_mempool/meson.build   |   3 +
 lib/librte_mempool/rte_mempool.c | 137 ++++++++++++++++++++++++++++++++-------
 3 files changed, 121 insertions(+), 22 deletions(-)

diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 24e735a..cfc69b4 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -13,6 +13,9 @@ EXPORT_MAP := rte_mempool_version.map
 
 LIBABIVER := 3
 
+# uses new contiguous memzone allocation that isn't yet in stable ABI
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 712720f..5916a0f 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -5,3 +5,6 @@ version = 3
 sources = files('rte_mempool.c', 'rte_mempool_ops.c')
 headers = files('rte_mempool.h')
 deps += ['ring']
+
+# contig memzone allocation is not yet part of stable API
+allow_experimental_apis = true
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 54f7f4b..e147180 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -3,6 +3,7 @@
  * Copyright(c) 2016 6WIND S.A.
  */
 
+#include <stdbool.h>
 #include <stdio.h>
 #include <string.h>
 #include <stdint.h>
@@ -98,6 +99,27 @@ static unsigned optimize_object_size(unsigned obj_size)
 	return new_obj_size * RTE_MEMPOOL_ALIGN;
 }
 
+static size_t
+get_min_page_size(void)
+{
+	const struct rte_mem_config *mcfg =
+			rte_eal_get_configuration()->mem_config;
+	int i;
+	size_t min_pagesz = SIZE_MAX;
+
+	for (i = 0; i < RTE_MAX_MEMSEG; i++) {
+		const struct rte_memseg *ms = &mcfg->memseg[i];
+
+		if (ms->addr == NULL)
+			continue;
+
+		if (ms->hugepage_sz < min_pagesz)
+			min_pagesz = ms->hugepage_sz;
+	}
+
+	return min_pagesz == SIZE_MAX ? (size_t) getpagesize() : min_pagesz;
+}
+
 static void
 mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
 {
@@ -204,7 +226,6 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 	return sz->total_size;
 }
 
-
 /*
  * Calculate maximum amount of memory required to store given number of objects.
  */
@@ -367,16 +388,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	/* update mempool capabilities */
 	mp->flags |= mp_capa_flags;
 
-	/* Detect pool area has sufficient space for elements */
-	if (mp_capa_flags & MEMPOOL_F_CAPA_PHYS_CONTIG) {
-		if (len < total_elt_sz * mp->size) {
-			RTE_LOG(ERR, MEMPOOL,
-				"pool area %" PRIx64 " not enough\n",
-				(uint64_t)len);
-			return -ENOSPC;
-		}
-	}
-
 	memhdr = rte_zmalloc("MEMPOOL_MEMHDR", sizeof(*memhdr), 0);
 	if (memhdr == NULL)
 		return -ENOMEM;
@@ -549,6 +560,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	unsigned mz_id, n;
 	unsigned int mp_flags;
 	int ret;
+	bool force_contig, no_contig, try_contig, no_pageshift;
 
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
@@ -563,9 +575,62 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	/* update mempool capabilities */
 	mp->flags |= mp_flags;
 
-	if (rte_eal_has_hugepages()) {
-		pg_shift = 0; /* not needed, zone is physically contiguous */
+	no_contig = mp->flags & MEMPOOL_F_NO_PHYS_CONTIG;
+	force_contig = mp->flags & MEMPOOL_F_CAPA_PHYS_CONTIG;
+
+	/*
+	 * the following section calculates page shift and page size values.
+	 *
+	 * these values impact the result of rte_mempool_xmem_size(), which
+	 * returns the amount of memory that should be allocated to store the
+	 * desired number of objects. when not zero, it allocates more memory
+	 * for the padding between objects, to ensure that an object does not
+	 * cross a page boundary. in other words, page size/shift are to be set
+	 * to zero if mempool elements won't care about page boundaries.
+	 * there are several considerations for page size and page shift here.
+	 *
+	 * if we don't need our mempools to have physically contiguous objects,
+	 * then just set page shift and page size to 0, because the user has
+	 * indicated that there's no need to care about anything.
+	 *
+	 * if we do need contiguous objects, there is also an option to reserve
+	 * the entire mempool memory as one contiguous block of memory, in
+	 * which case the page shift and alignment wouldn't matter either.
+	 *
+	 * if we require contiguous objects, but not necessarily the entire
+	 * mempool reserved space to be contiguous, then there are two options.
+	 *
+	 * if our IO addresses are virtual, not actual physical (IOVA as VA
+	 * case), then no page shift needed - our memory allocation will give us
+	 * contiguous physical memory as far as the hardware is concerned, so
+	 * act as if we're getting contiguous memory.
+	 *
+	 * if our IO addresses are physical, we may get memory from bigger
+	 * pages, or we might get memory from smaller pages, and how much of it
+	 * we require depends on whether we want bigger or smaller pages.
+	 * However, requesting each and every memory size is too much work, so
+	 * what we'll do instead is walk through the page sizes available, pick
+	 * the smallest one and set up page shift to match that one. We will be
+	 * wasting some space this way, but it's much nicer than looping around
+	 * trying to reserve each and every page size.
+	 *
+	 * However, since size calculation will produce page-aligned sizes, it
+	 * makes sense to first try and see if we can reserve the entire memzone
+	 * in one contiguous chunk as well (otherwise we might end up wasting a
+	 * 1G page on a 10MB memzone). If we fail to get enough contiguous
+	 * memory, then we'll go and reserve space page-by-page.
+	 */
+	no_pageshift = no_contig || force_contig ||
+			rte_eal_iova_mode() == RTE_IOVA_VA;
+	try_contig = !no_contig && !no_pageshift && rte_eal_has_hugepages();
+
+	if (no_pageshift) {
 		pg_sz = 0;
+		pg_shift = 0;
+		align = RTE_CACHE_LINE_SIZE;
+	} else if (try_contig) {
+		pg_sz = get_min_page_size();
+		pg_shift = rte_bsf32(pg_sz);
 		align = RTE_CACHE_LINE_SIZE;
 	} else {
 		pg_sz = getpagesize();
@@ -575,8 +640,12 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
-		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
-						mp->flags);
+		if (try_contig || no_pageshift)
+			size = rte_mempool_xmem_size(n, total_elt_sz, 0,
+				mp->flags);
+		else
+			size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
+				mp->flags);
 
 		ret = snprintf(mz_name, sizeof(mz_name),
 			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
@@ -585,23 +654,47 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
-		mz = rte_memzone_reserve_aligned(mz_name, size,
-			mp->socket_id, mz_flags, align);
-		/* not enough memory, retry with the biggest zone we have */
-		if (mz == NULL)
-			mz = rte_memzone_reserve_aligned(mz_name, 0,
+		mz = NULL;
+		if (force_contig || try_contig) {
+			/* if contiguous memory for entire mempool memory was
+			 * requested, don't try reserving again if we fail...
+			 */
+			mz = rte_memzone_reserve_aligned_contig(mz_name, size,
+				mp->socket_id, mz_flags, align);
+
+			/* ...unless we are doing best effort allocation, in
+			 * which case recalculate size and try again */
+			if (try_contig && mz == NULL) {
+				try_contig = false;
+				align = pg_sz;
+				size = rte_mempool_xmem_size(n, total_elt_sz,
+					pg_shift, mp->flags);
+			}
+		}
+		/* only try this if we're not trying to reserve contiguous
+		 * memory.
+		 */
+		if (!force_contig && mz == NULL) {
+			mz = rte_memzone_reserve_aligned(mz_name, size,
 				mp->socket_id, mz_flags, align);
+			/* not enough memory, retry with the biggest zone we
+			 * have
+			 */
+			if (mz == NULL)
+				mz = rte_memzone_reserve_aligned(mz_name, 0,
+					mp->socket_id, mz_flags, align);
+		}
 		if (mz == NULL) {
 			ret = -rte_errno;
 			goto fail;
 		}
 
-		if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
+		if (no_contig)
 			iova = RTE_BAD_IOVA;
 		else
 			iova = mz->iova;
 
-		if (rte_eal_has_hugepages())
+		if (no_pageshift || try_contig)
 			ret = rte_mempool_populate_iova(mp, mz->addr,
 				iova, mz->len,
 				rte_mempool_memchunk_mz_free,
-- 
2.7.4

^ permalink raw reply	[relevance 3%]
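
The page-shift selection described in the long comment of
rte_mempool_populate_default() in the patch above can be sketched as a small
standalone decision function. This is only an illustration under stated
assumptions: the names choose_mode(), min_page_size() and page_shift() and the
enum are hypothetical and are not DPDK APIs; the booleans stand in for the
mempool flags, rte_eal_iova_mode() == RTE_IOVA_VA and rte_eal_has_hugepages().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three branches taken by the patch. */
enum pgshift_mode {
	PGSHIFT_NONE,        /* pg_sz = 0, pg_shift = 0 */
	PGSHIFT_TRY_CONTIG,  /* smallest hugepage size, try one contiguous zone */
	PGSHIFT_PAGE_BY_PAGE /* system page size, reserve page by page */
};

/* Mirrors the no_pageshift / try_contig computation from the patch. */
static enum pgshift_mode
choose_mode(bool no_contig, bool force_contig, bool iova_as_va,
	    bool has_hugepages)
{
	bool no_pageshift = no_contig || force_contig || iova_as_va;

	if (no_pageshift)
		return PGSHIFT_NONE;
	if (has_hugepages)
		return PGSHIFT_TRY_CONTIG;
	return PGSHIFT_PAGE_BY_PAGE;
}

/*
 * Smallest non-zero entry of sizes[], or fallback if there is none.
 * A zero entry stands in for an unused memseg (ms->addr == NULL).
 */
static size_t
min_page_size(const size_t *sizes, size_t n, size_t fallback)
{
	size_t min = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (sizes[i] == 0)
			continue;
		if (min == 0 || sizes[i] < min)
			min = sizes[i];
	}
	return min == 0 ? fallback : min;
}

/* Index of the lowest set bit, i.e. log2 of a power-of-two page size. */
static unsigned int
page_shift(size_t pg_sz)
{
	unsigned int shift = 0;

	if (pg_sz == 0)
		return 0;
	while ((pg_sz & 1) == 0) {
		pg_sz >>= 1;
		shift++;
	}
	return shift;
}
```

With 1G and 2M pages both present, this sketch picks the 2M size and a shift
of 21, which matches the "pick the smallest one" rule in the comment.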

* [dpdk-dev] [PATCH v10 3/9] eventtimer: add common code
  @ 2018-04-03 21:44  3%         ` Erik Gabriel Carrillo
    1 sibling, 0 replies; 200+ results
From: Erik Gabriel Carrillo @ 2018-04-03 21:44 UTC (permalink / raw)
  To: pbhagavatula, jerin.jacob; +Cc: dev, hemant.agrawal

This commit adds the logic that is shared by all event timer adapter
drivers; the common code handles instance allocation and some
initialization.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 config/common_base                                |   1 +
 drivers/event/sw/sw_evdev.c                       |  18 +
 lib/librte_eventdev/Makefile                      |   2 +
 lib/librte_eventdev/rte_event_timer_adapter.c     | 387 ++++++++++++++++++++++
 lib/librte_eventdev/rte_event_timer_adapter_pmd.h | 114 +++++++
 lib/librte_eventdev/rte_eventdev.c                |  22 ++
 lib/librte_eventdev/rte_eventdev.h                |  20 ++
 lib/librte_eventdev/rte_eventdev_pmd.h            |  35 ++
 lib/librte_eventdev/rte_eventdev_version.map      |  20 +-
 9 files changed, 618 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter.c
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter_pmd.h

diff --git a/config/common_base b/config/common_base
index 7abf7c6..9354c66 100644
--- a/config/common_base
+++ b/config/common_base
@@ -550,6 +550,7 @@ CONFIG_RTE_LIBRTE_EVENTDEV=y
 CONFIG_RTE_LIBRTE_EVENTDEV_DEBUG=n
 CONFIG_RTE_EVENT_MAX_DEVS=16
 CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
+CONFIG_RTE_EVENT_TIMER_ADAPTER_NUM_MAX=32
 
 #
 # Compile PMD for skeleton event device
diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 0e89f11..dcb6551 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -464,6 +464,22 @@ sw_eth_rx_adapter_caps_get(const struct rte_eventdev *dev,
 	return 0;
 }
 
+static int
+sw_timer_adapter_caps_get(const struct rte_eventdev *dev,
+			  uint64_t flags,
+			  uint32_t *caps,
+			  const struct rte_event_timer_adapter_ops **ops)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(flags);
+	*caps = 0;
+
+	/* Use default SW ops */
+	*ops = NULL;
+
+	return 0;
+}
+
 static void
 sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info)
 {
@@ -791,6 +807,8 @@ sw_probe(struct rte_vdev_device *vdev)
 
 			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
 
+			.timer_adapter_caps_get = sw_timer_adapter_caps_get,
+
 			.xstats_get = sw_xstats_get,
 			.xstats_get_names = sw_xstats_get_names,
 			.xstats_get_by_name = sw_xstats_get_by_name,
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 549b182..8b16e3f 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -20,6 +20,7 @@ LDLIBS += -lrte_eal -lrte_ring -lrte_ethdev -lrte_hash
 SRCS-y += rte_eventdev.c
 SRCS-y += rte_event_ring.c
 SRCS-y += rte_event_eth_rx_adapter.c
+SRCS-y += rte_event_timer_adapter.c
 
 # export include files
 SYMLINK-y-include += rte_eventdev.h
@@ -29,6 +30,7 @@ SYMLINK-y-include += rte_eventdev_pmd_vdev.h
 SYMLINK-y-include += rte_event_ring.h
 SYMLINK-y-include += rte_event_eth_rx_adapter.h
 SYMLINK-y-include += rte_event_timer_adapter.h
+SYMLINK-y-include += rte_event_timer_adapter_pmd.h
 
 # versioning export map
 EXPORT_MAP := rte_eventdev_version.map
diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
new file mode 100644
index 0000000..75a14ac
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -0,0 +1,387 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#include <string.h>
+#include <inttypes.h>
+
+#include <rte_memzone.h>
+#include <rte_memory.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+
+#include "rte_eventdev.h"
+#include "rte_eventdev_pmd.h"
+#include "rte_event_timer_adapter.h"
+#include "rte_event_timer_adapter_pmd.h"
+
+#define DATA_MZ_NAME_MAX_LEN 64
+#define DATA_MZ_NAME_FORMAT "rte_event_timer_adapter_data_%d"
+
+static int evtim_logtype;
+
+static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
+
+#define EVTIM_LOG(level, logtype, ...) \
+	rte_log(RTE_LOG_ ## level, logtype, \
+		RTE_FMT("EVTIMER: %s() line %u: " RTE_FMT_HEAD(__VA_ARGS__,) \
+			"\n", __func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__,)))
+
+#define EVTIM_LOG_ERR(...) EVTIM_LOG(ERR, evtim_logtype, __VA_ARGS__)
+
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+#define EVTIM_LOG_DBG(...) \
+	EVTIM_LOG(DEBUG, evtim_logtype, __VA_ARGS__)
+#else
+#define EVTIM_LOG_DBG(...) (void)0
+#endif
+
+static int
+default_port_conf_cb(uint16_t id, uint8_t event_dev_id, uint8_t *event_port_id,
+		     void *conf_arg)
+{
+	struct rte_event_timer_adapter *adapter;
+	struct rte_eventdev *dev;
+	struct rte_event_dev_config dev_conf;
+	struct rte_event_port_conf *port_conf, def_port_conf = {0};
+	int started;
+	uint8_t port_id;
+	uint8_t dev_id;
+	int ret;
+
+	RTE_SET_USED(event_dev_id);
+
+	adapter = &adapters[id];
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+	dev_id = dev->data->dev_id;
+	dev_conf = dev->data->dev_conf;
+
+	started = dev->data->dev_started;
+	if (started)
+		rte_event_dev_stop(dev_id);
+
+	port_id = dev_conf.nb_event_ports;
+	dev_conf.nb_event_ports += 1;
+	ret = rte_event_dev_configure(dev_id, &dev_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to configure event dev %u\n", dev_id);
+		if (started)
+			if (rte_event_dev_start(dev_id))
+				return -EIO;
+
+		return ret;
+	}
+
+	if (conf_arg != NULL)
+		port_conf = conf_arg;
+	else {
+		port_conf = &def_port_conf;
+		ret = rte_event_port_default_conf_get(dev_id, port_id,
+						      port_conf);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = rte_event_port_setup(dev_id, port_id, port_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to setup event port %u on event dev %u\n",
+			      port_id, dev_id);
+		return ret;
+	}
+
+	*event_port_id = port_id;
+
+	if (started)
+		ret = rte_event_dev_start(dev_id);
+
+	return ret;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create(const struct rte_event_timer_adapter_conf *conf)
+{
+	return rte_event_timer_adapter_create_ext(conf, default_port_conf_cb,
+						  NULL);
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create_ext(
+		const struct rte_event_timer_adapter_conf *conf,
+		rte_event_timer_adapter_port_conf_cb_t conf_cb,
+		void *conf_arg)
+{
+	uint16_t adapter_id;
+	struct rte_event_timer_adapter *adapter;
+	const struct rte_memzone *mz;
+	char mz_name[DATA_MZ_NAME_MAX_LEN];
+	int n, ret;
+	struct rte_eventdev *dev;
+
+	if (conf == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check eventdev ID */
+	if (!rte_event_pmd_is_valid_dev(conf->event_dev_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	dev = &rte_eventdevs[conf->event_dev_id];
+
+	adapter_id = conf->timer_adapter_id;
+
+	/* Check that adapter_id is in range */
+	if (adapter_id >= RTE_EVENT_TIMER_ADAPTER_NUM_MAX) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check adapter ID not already allocated */
+	adapter = &adapters[adapter_id];
+	if (adapter->allocated) {
+		rte_errno = EEXIST;
+		return NULL;
+	}
+
+	/* Create shared data area. */
+	n = snprintf(mz_name, sizeof(mz_name), DATA_MZ_NAME_FORMAT, adapter_id);
+	if (n >= (int)sizeof(mz_name)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	mz = rte_memzone_reserve(mz_name,
+				 sizeof(struct rte_event_timer_adapter_data),
+				 conf->socket_id, 0);
+	if (mz == NULL)
+		/* rte_errno set by rte_memzone_reserve */
+		return NULL;
+
+	adapter->data = mz->addr;
+	memset(adapter->data, 0, sizeof(struct rte_event_timer_adapter_data));
+
+	adapter->data->mz = mz;
+	adapter->data->event_dev_id = conf->event_dev_id;
+	adapter->data->id = adapter_id;
+	adapter->data->socket_id = conf->socket_id;
+	adapter->data->conf = *conf;  /* copy conf structure */
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = ret;
+		goto free_memzone;
+	}
+
+	if (!(adapter->data->caps &
+	      RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT)) {
+		FUNC_PTR_OR_NULL_RET_WITH_ERRNO(conf_cb, -EINVAL);
+		ret = conf_cb(adapter->data->id, adapter->data->event_dev_id,
+			      &adapter->data->event_port_id, conf_arg);
+		if (ret < 0) {
+			rte_errno = ret;
+			goto free_memzone;
+		}
+	}
+
+	/* Allow driver to do some setup */
+	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
+	ret = adapter->ops->init(adapter);
+	if (ret < 0) {
+		rte_errno = ret;
+		goto free_memzone;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+
+free_memzone:
+	rte_memzone_free(adapter->data->mz);
+	return NULL;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->ops->get_info)
+		/* let driver set values it knows */
+		adapter->ops->get_info(adapter, adapter_info);
+
+	/* Set common values */
+	adapter_info->conf = adapter->data->conf;
+	adapter_info->event_dev_port_id = adapter->data->event_port_id;
+	adapter_info->caps = adapter->data->caps;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->start, -EINVAL);
+
+	ret = adapter->ops->start(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 1;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stop, -EINVAL);
+
+	if (adapter->data->started == 0) {
+		EVTIM_LOG_ERR("event timer adapter %"PRIu8" already stopped",
+			      adapter->data->id);
+		return 0;
+	}
+
+	ret = adapter->ops->stop(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 0;
+
+	return 0;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_lookup(uint16_t adapter_id)
+{
+	char name[DATA_MZ_NAME_MAX_LEN];
+	const struct rte_memzone *mz;
+	struct rte_event_timer_adapter_data *data;
+	struct rte_event_timer_adapter *adapter;
+	int ret;
+	struct rte_eventdev *dev;
+
+	if (adapters[adapter_id].allocated)
+		return &adapters[adapter_id]; /* Adapter is already loaded */
+
+	snprintf(name, DATA_MZ_NAME_MAX_LEN, DATA_MZ_NAME_FORMAT, adapter_id);
+	mz = rte_memzone_lookup(name);
+	if (mz == NULL) {
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	data = mz->addr;
+
+	adapter = &adapters[data->id];
+	adapter->data = data;
+
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_free(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->uninit, -EINVAL);
+
+	if (adapter->data->started == 1) {
+		EVTIM_LOG_ERR("event timer adapter %"PRIu8" must be stopped "
+			      "before freeing", adapter->data->id);
+		return -EBUSY;
+	}
+
+	/* free impl priv data */
+	ret = adapter->ops->uninit(adapter);
+	if (ret < 0)
+		return ret;
+
+	/* free shared data area */
+	ret = rte_memzone_free(adapter->data->mz);
+	if (ret < 0)
+		return ret;
+
+	adapter->data = NULL;
+	adapter->allocated = 0;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_service_id_get(struct rte_event_timer_adapter *adapter,
+				       uint32_t *service_id)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->data->service_inited && service_id != NULL)
+		*service_id = adapter->data->service_id;
+
+	return adapter->data->service_inited ? 0 : -ESRCH;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_get(struct rte_event_timer_adapter *adapter,
+				  struct rte_event_timer_adapter_stats *stats)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_get, -EINVAL);
+	if (stats == NULL)
+		return -EINVAL;
+
+	return adapter->ops->stats_get(adapter, stats);
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_reset, -EINVAL);
+	return adapter->ops->stats_reset(adapter);
+}
+
+RTE_INIT(event_timer_adapter_init_log);
+static void
+event_timer_adapter_init_log(void)
+{
+	evtim_logtype = rte_log_register("lib.eventdev.adapter.timer");
+	if (evtim_logtype >= 0)
+		rte_log_set_level(evtim_logtype, RTE_LOG_NOTICE);
+}
diff --git a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
new file mode 100644
index 0000000..cf3509d
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#ifndef __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+#define __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+
+/**
+ * @file
+ * RTE Event Timer Adapter API (PMD Side)
+ *
+ * @note
+ * This file provides implementation helpers for internal use by PMDs.  They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_event_timer_adapter.h"
+
+/*
+ * Definitions of functions exported by an event timer adapter implementation
+ * through *rte_event_timer_adapter_ops* structure supplied in the
+ * *rte_event_timer_adapter* structure associated with an event timer adapter.
+ */
+
+typedef int (*rte_event_timer_adapter_init_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation setup */
+typedef int (*rte_event_timer_adapter_uninit_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation teardown */
+typedef int (*rte_event_timer_adapter_start_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Start running event timer adapter */
+typedef int (*rte_event_timer_adapter_stop_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Stop running event timer adapter */
+typedef void (*rte_event_timer_adapter_get_info_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info);
+/**< @internal Get contextual information for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_get_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats);
+/**< @internal Get statistics for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_reset_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Reset statistics for event timer adapter */
+
+/**
+ * @internal Structure containing the functions exported by an event timer
+ * adapter implementation.
+ */
+struct rte_event_timer_adapter_ops {
+	rte_event_timer_adapter_init_t		init;  /**< Set up adapter */
+	rte_event_timer_adapter_uninit_t	uninit;/**< Tear down adapter */
+	rte_event_timer_adapter_start_t		start; /**< Start adapter */
+	rte_event_timer_adapter_stop_t		stop;  /**< Stop adapter */
+	rte_event_timer_adapter_get_info_t	get_info;
+	/**< Get info from driver */
+	rte_event_timer_adapter_stats_get_t	stats_get;
+	/**< Get adapter statistics */
+	rte_event_timer_adapter_stats_reset_t	stats_reset;
+	/**< Reset adapter statistics */
+	rte_event_timer_arm_burst_t		arm_burst;
+	/**< Arm one or more event timers */
+	rte_event_timer_arm_tmo_tick_burst_t	arm_tmo_tick_burst;
+	/**< Arm event timers with same expiration time */
+	rte_event_timer_cancel_burst_t		cancel_burst;
+	/**< Cancel one or more event timers */
+};
+
+/**
+ * @internal Adapter data; structure to be placed in shared memory to be
+ * accessible by various processes in a multi-process configuration.
+ */
+struct rte_event_timer_adapter_data {
+	uint8_t id;
+	/**< Event timer adapter ID */
+	uint8_t event_dev_id;
+	/**< Event device ID */
+	uint32_t socket_id;
+	/**< Socket ID where memory is allocated */
+	uint8_t event_port_id;
+	/**< Optional: event port ID used when the inbuilt port is absent */
+	const struct rte_memzone *mz;
+	/**< Event timer adapter memzone pointer */
+	struct rte_event_timer_adapter_conf conf;
+	/**< Configuration used to configure the adapter. */
+	uint32_t caps;
+	/**< Adapter capabilities */
+	void *adapter_priv;
+	/**< Timer adapter private data*/
+	uint8_t service_inited;
+	/**< Service initialization state */
+	uint32_t service_id;
+	/**< Service ID*/
+
+	RTE_STD_C11
+	uint8_t started : 1;
+	/**< Flag to indicate adapter started. */
+} __rte_cache_aligned;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __RTE_EVENT_TIMER_ADAPTER_PMD_H__ */
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 2de8d9a..3f016f4 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -123,6 +123,28 @@ rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
 				: 0;
 }
 
+int __rte_experimental
+rte_event_timer_adapter_caps_get(uint8_t dev_id, uint32_t *caps)
+{
+	struct rte_eventdev *dev;
+	const struct rte_event_timer_adapter_ops *ops;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	dev = &rte_eventdevs[dev_id];
+
+	if (caps == NULL)
+		return -EINVAL;
+	*caps = 0;
+
+	return dev->dev_ops->timer_adapter_caps_get ?
+				(*dev->dev_ops->timer_adapter_caps_get)(dev,
+									0,
+									caps,
+									&ops)
+				: 0;
+}
+
 static inline int
 rte_event_dev_queue_config(struct rte_eventdev *dev, uint8_t nb_queues)
 {
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index 86df4be..6fcbe94 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -215,6 +215,7 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_memory.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 
 struct rte_mbuf; /* we just use mbuf pointers; no need to include rte_mbuf.h */
 struct rte_event;
@@ -1115,6 +1116,25 @@ int
 rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
 				uint32_t *caps);
 
+#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
+/**< This flag is set when the timer mechanism is in HW. */
+
+/**
+ * Retrieve the event device's timer adapter capabilities.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param[out] caps
+ *   A pointer to memory to be filled with event timer adapter capabilities.
+ *
+ * @return
+ *   - 0: Success, driver provided event timer adapter capabilities.
+ *   - <0: Error code returned by the driver function.
+ */
+int __rte_experimental
+rte_event_timer_adapter_caps_get(uint8_t dev_id, uint32_t *caps);
+
 struct rte_eventdev_driver;
 struct rte_eventdev_ops;
 struct rte_eventdev;
diff --git a/lib/librte_eventdev/rte_eventdev_pmd.h b/lib/librte_eventdev/rte_eventdev_pmd.h
index 3a8ddd7..2dcb528 100644
--- a/lib/librte_eventdev/rte_eventdev_pmd.h
+++ b/lib/librte_eventdev/rte_eventdev_pmd.h
@@ -26,6 +26,7 @@ extern "C" {
 #include <rte_malloc.h>
 
 #include "rte_eventdev.h"
+#include "rte_event_timer_adapter_pmd.h"
 
 /* Logging Macros */
 #define RTE_EDEV_LOG_ERR(...) \
@@ -449,6 +450,37 @@ typedef int (*eventdev_eth_rx_adapter_caps_get_t)
 struct rte_event_eth_rx_adapter_queue_conf *queue_conf;
 
 /**
+ * Retrieve the event device's timer adapter capabilities, as well as the ops
+ * structure that an event timer adapter should call through to enter the
+ * driver
+ *
+ * @param dev
+ *   Event device pointer
+ *
+ * @param flags
+ *   Flags that can be used to determine how to select an event timer
+ *   adapter ops structure
+ *
+ * @param[out] caps
+ *   A pointer to memory filled with event timer adapter capabilities.
+ *
+ * @param[out] ops
+ *   A pointer to the ops pointer to set with the address of the desired ops
+ *   structure
+ *
+ * @return
+ *   - 0: Success, driver provides event timer adapter capabilities for the
+ *	event device.
+ *   - <0: Error code returned by the driver function.
+ *
+ */
+typedef int (*eventdev_timer_adapter_caps_get_t)(
+				const struct rte_eventdev *dev,
+				uint64_t flags,
+				uint32_t *caps,
+				const struct rte_event_timer_adapter_ops **ops);
+
+/**
  * Add ethernet Rx queues to event device. This callback is invoked if
  * the caps returned from rte_eventdev_eth_rx_adapter_caps_get(, eth_port_id)
  * has RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT set.
@@ -640,6 +672,9 @@ struct rte_eventdev_ops {
 	eventdev_eth_rx_adapter_stats_reset eth_rx_adapter_stats_reset;
 	/**< Reset ethernet Rx stats */
 
+	eventdev_timer_adapter_caps_get_t timer_adapter_caps_get;
+	/**< Get timer adapter capabilities */
+
 	eventdev_selftest dev_selftest;
 	/**< Start eventdev Selftest */
 
diff --git a/lib/librte_eventdev/rte_eventdev_version.map b/lib/librte_eventdev/rte_eventdev_version.map
index 4396536..6979577 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventdev/rte_eventdev_version.map
@@ -66,7 +66,6 @@ DPDK_17.11 {
 	rte_event_eth_rx_adapter_stats_get;
 	rte_event_eth_rx_adapter_stats_reset;
 	rte_event_eth_rx_adapter_stop;
-
 } DPDK_17.08;
 
 DPDK_18.02 {
@@ -80,3 +79,22 @@ DPDK_18.05 {
 
 	rte_event_dev_stop_flush_callback_register;
 } DPDK_18.02;
+
+EXPERIMENTAL {
+	global:
+
+	rte_event_timer_adapter_caps_get;
+	rte_event_timer_adapter_create;
+	rte_event_timer_adapter_create_ext;
+	rte_event_timer_adapter_free;
+	rte_event_timer_adapter_get_info;
+	rte_event_timer_adapter_lookup;
+	rte_event_timer_adapter_service_id_get;
+	rte_event_timer_adapter_start;
+	rte_event_timer_adapter_stats_get;
+	rte_event_timer_adapter_stats_reset;
+	rte_event_timer_adapter_stop;
+	rte_event_timer_arm_burst;
+	rte_event_timer_arm_tmo_tick_burst;
+	rte_event_timer_cancel_burst;
+} DPDK_18.05;
-- 
2.6.4

^ permalink raw reply	[relevance 3%]
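
The common code in the patch above dispatches through a driver-supplied ops
table and tracks a "started" flag: start/stop call into the driver, stop is a
no-op when already stopped, and free is refused with -EBUSY while the adapter
is running. A minimal standalone sketch of that pattern follows. All toy_*
names are hypothetical and are not part of DPDK; the sketch also checks the
ops pointer itself for NULL, which is an assumption added here rather than
something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_adapter;

/* Hypothetical driver ops table; callbacks may be left NULL. */
struct toy_ops {
	int (*start)(struct toy_adapter *a);
	int (*stop)(struct toy_adapter *a);
};

struct toy_adapter {
	const struct toy_ops *ops;
	int started;
};

static int
toy_start(struct toy_adapter *a)
{
	int ret;

	if (a == NULL || a->ops == NULL || a->ops->start == NULL)
		return -EINVAL;

	ret = a->ops->start(a);
	if (ret < 0)
		return ret;

	a->started = 1;
	return 0;
}

static int
toy_stop(struct toy_adapter *a)
{
	int ret;

	if (a == NULL || a->ops == NULL || a->ops->stop == NULL)
		return -EINVAL;

	if (!a->started)
		return 0; /* already stopped: nothing to do */

	ret = a->ops->stop(a);
	if (ret < 0)
		return ret;

	a->started = 0;
	return 0;
}

static int
toy_free(struct toy_adapter *a)
{
	if (a == NULL)
		return -EINVAL;
	if (a->started)
		return -EBUSY; /* must be stopped before freeing */

	a->ops = NULL;
	return 0;
}

/* Trivial driver used only to exercise the wrappers. */
static int
toy_noop(struct toy_adapter *a)
{
	(void)a;
	return 0;
}

static const struct toy_ops toy_noop_ops = {
	.start = toy_noop,
	.stop = toy_noop,
};

/* Demo instance bound to the no-op driver. */
static struct toy_adapter toy_demo = { &toy_noop_ops, 0 };
```

Calling toy_start(&toy_demo) followed by toy_free(&toy_demo) returns -EBUSY
until toy_stop(&toy_demo) has been called.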

* Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring structure
  2018-04-03 15:56  3%         ` Olivier Matz
@ 2018-04-03 16:42  3%           ` Jerin Jacob
  2018-04-04 23:38  0%             ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-04-03 16:42 UTC (permalink / raw)
  To: Olivier Matz; +Cc: dev, konstantin.ananyev, bruce.richardson

-----Original Message-----
> Date: Tue, 3 Apr 2018 17:56:01 +0200
> From: Olivier Matz <olivier.matz@6wind.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: dev@dpdk.org, konstantin.ananyev@intel.com, bruce.richardson@intel.com
> Subject: Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
>  structure
> User-Agent: NeoMutt/20170113 (1.7.2)
> 
> On Tue, Apr 03, 2018 at 09:07:04PM +0530, Jerin Jacob wrote:
> > -----Original Message-----
> > > Date: Tue, 3 Apr 2018 17:25:17 +0200
> > > From: Olivier Matz <olivier.matz@6wind.com>
> > > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > CC: dev@dpdk.org, konstantin.ananyev@intel.com, bruce.richardson@intel.com
> > > Subject: Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
> > >  structure
> > > User-Agent: NeoMutt/20170113 (1.7.2)
> > > 
> > > On Tue, Apr 03, 2018 at 08:37:23PM +0530, Jerin Jacob wrote:
> > > > -----Original Message-----
> > > > > Date: Tue, 3 Apr 2018 15:26:44 +0200
> > > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > > To: dev@dpdk.org
> > > > > Subject: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
> > > > >  structure
> > > > > X-Mailer: git-send-email 2.11.0
> > > > > 
> > > > > The initial objective of
> > > > > commit d9f0d3a1ffd4 ("ring: remove split cacheline build setting")
> > > > > was to add an empty cache line between the producer and consumer
> > > > > data (on platforms with cache line size = 64B), preventing them
> > > > > from being on adjacent cache lines.
> > > > > 
> > > > > Following discussion on the mailing list, it appears that this
> > > > > also imposes an alignment constraint that is not required.
> > > > > 
> > > > > This patch removes the extra alignment constraint and adds the
> > > > > empty cache lines using padding fields in the structure. The
> > > > > size of rte_ring structure and the offset of the fields remain
> > > > > the same on platforms with cache line size = 64B:
> > > > > 
> > > > >   rte_ring = 384
> > > > >   rte_ring.name = 0
> > > > >   rte_ring.flags = 32
> > > > >   rte_ring.memzone = 40
> > > > >   rte_ring.size = 48
> > > > >   rte_ring.mask = 52
> > > > >   rte_ring.prod = 128
> > > > >   rte_ring.cons = 256
> > > > > 
> > > > > But it has an impact on platform where cache line size is 128B:
> > > > > 
> > > > >   rte_ring = 384        -> 768
> > > > >   rte_ring.name = 0
> > > > >   rte_ring.flags = 32
> > > > >   rte_ring.memzone = 40
> > > > >   rte_ring.size = 48
> > > > >   rte_ring.mask = 52
> > > > >   rte_ring.prod = 128   -> 256
> > > > >   rte_ring.cons = 256   -> 512
> > > > 
> > > > Are we leaving TWO cache lines to make sure the HW prefetcher doesn't load
> > > > the adjacent cache line (consumer)?
> > > > 
> > > > If so, it will have an impact on those machines where the cache line is 128B
> > > > and the HW prefetcher does not load the next cache line. Right?
> > > 
> > > The impact on machines that have a 128B cache line is that an unused
> > > cache line will be added between the producer and consumer data. I
> > > expect that the impact is positive in case there is a hw prefetcher, and
> > > null in case there is no such prefetcher.
> > 
> > It is not NULL, Right? You are losing 256B for each ring.
> 
> Is it really that important?

In pipeline or eventdev SW cases there could be more rings in the system.
I don't see any downside to having a config option which is enabled by
default.

In my view, such config options are good: in embedded use cases, customers
can really fine-tune the target to their needs. In server use cases, let the
option be enabled by default; no harm.

> 
> 
> > > On machines with 64B cache line, this was already the case. It just
> > > reduces the alignment constraint.
> > 
> > Not all the 64B CL machines will have HW prefetch.
> > 
> > I would recommend to add conditional compilation flags to express HW
> > prefetch enabled or not? based on that we can decide to reserve
> > the additional space. By default, in common config, HW prefetch can
> > be enabled so that it works for almost all cases.
> 
> The hw prefetcher can be enabled at runtime, so a compilation flag
> does not seem to be a good idea. Moreover, changing this compilation

On hardware where HW prefetch can be disabled at runtime, the default
config is fine. I was talking about some low-end ARM hardware where
HW prefetch is not present at all.

> flag would change the ABI.

The ABI is broken anyway, right? The size of the structure changes.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring structure
  @ 2018-04-03 15:56  3%         ` Olivier Matz
  2018-04-03 16:42  3%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2018-04-03 15:56 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, konstantin.ananyev, bruce.richardson

On Tue, Apr 03, 2018 at 09:07:04PM +0530, Jerin Jacob wrote:
> -----Original Message-----
> > Date: Tue, 3 Apr 2018 17:25:17 +0200
> > From: Olivier Matz <olivier.matz@6wind.com>
> > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > CC: dev@dpdk.org, konstantin.ananyev@intel.com, bruce.richardson@intel.com
> > Subject: Re: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
> >  structure
> > User-Agent: NeoMutt/20170113 (1.7.2)
> > 
> > On Tue, Apr 03, 2018 at 08:37:23PM +0530, Jerin Jacob wrote:
> > > -----Original Message-----
> > > > Date: Tue, 3 Apr 2018 15:26:44 +0200
> > > > From: Olivier Matz <olivier.matz@6wind.com>
> > > > To: dev@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH] ring: relax alignment constraint on ring
> > > >  structure
> > > > X-Mailer: git-send-email 2.11.0
> > > > 
> > > > The initial objective of
> > > > commit d9f0d3a1ffd4 ("ring: remove split cacheline build setting")
> > > > was to add an empty cache line between the producer and consumer
> > > > data (on platform with cache line size = 64B), preventing from
> > > > having them on adjacent cache lines.
> > > > 
> > > > Following discussion on the mailing list, it appears that this
> > > > also imposes an alignment constraint that is not required.
> > > > 
> > > > This patch removes the extra alignment constraint and adds the
> > > > empty cache lines using padding fields in the structure. The
> > > > size of rte_ring structure and the offset of the fields remain
> > > > the same on platforms with cache line size = 64B:
> > > > 
> > > >   rte_ring = 384
> > > >   rte_ring.name = 0
> > > >   rte_ring.flags = 32
> > > >   rte_ring.memzone = 40
> > > >   rte_ring.size = 48
> > > >   rte_ring.mask = 52
> > > >   rte_ring.prod = 128
> > > >   rte_ring.cons = 256
> > > > 
> > > > But it has an impact on platform where cache line size is 128B:
> > > > 
> > > >   rte_ring = 384        -> 768
> > > >   rte_ring.name = 0
> > > >   rte_ring.flags = 32
> > > >   rte_ring.memzone = 40
> > > >   rte_ring.size = 48
> > > >   rte_ring.mask = 52
> > > >   rte_ring.prod = 128   -> 256
> > > >   rte_ring.cons = 256   -> 512
> > > 
> > > Are we leaving TWO cache lines to make sure the HW prefetcher doesn't
> > > load the adjacent cache line (consumer)?
> > > 
> > > If so, will it have an impact on those machines where the cache line is
> > > 128B and the HW prefetcher is not loading the next cache line? Right?
> > 
> > The impact on machines that have a 128B cache line is that an unused
> > cache line will be added between the producer and consumer data. I
> > expect that the impact is positive in case there is a hw prefetcher, and
> > null in case there is no such prefetcher.
> 
> It is not NULL, Right? You are losing 256B for each ring.

Is it really that important?


> > On machines with 64B cache line, this was already the case. It just
> > reduces the alignment constraint.
> 
> Not all the 64B CL machines will have HW prefetch.
> 
> I would recommend to add conditional compilation flags to express HW
> prefetch enabled or not? based on that we can decide to reserve
> the additional space. By default, in common config, HW prefetch can
> be enabled so that it works for almost all cases.

The hw prefetcher can be enabled at runtime, so a compilation flag
does not seem to be a good idea. Moreover, changing this compilation
flag would change the ABI.
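
For reference, the layout numbers quoted above can be reproduced with a
small standalone sketch. This is not the real rte_ring definition: the
struct and field names below (ring_sketch, headtail_sketch, pad0..pad2,
CL_ALIGNED) are illustrative assumptions, it relies on the GCC/Clang
aligned attribute, and the offsets assume an LP64 target. Building with
-DCACHE_LINE=128 should show the 128B case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifndef CACHE_LINE
#define CACHE_LINE 64	/* override with -DCACHE_LINE=128 */
#endif

#define CL_ALIGNED __attribute__((aligned(CACHE_LINE)))

/* Simplified stand-in for the producer/consumer head-tail pair. */
struct headtail_sketch {
	volatile unsigned int head;
	volatile unsigned int tail;
	unsigned int single;
};

/* Hypothetical ring header: padding fields provide the empty cache lines. */
struct ring_sketch {
	char name[32] CL_ALIGNED;
	int flags;
	const void *memzone;
	unsigned int size;
	unsigned int mask;
	unsigned int capacity;

	char pad0 CL_ALIGNED;			/* empty cache line */
	struct headtail_sketch prod CL_ALIGNED;	/* producer state */
	char pad1 CL_ALIGNED;			/* empty cache line */
	struct headtail_sketch cons CL_ALIGNED;	/* consumer state */
	char pad2 CL_ALIGNED;			/* empty cache line */
};

void print_ring_layout(void)
{
	/* prod and cons must never sit on adjacent cache lines */
	assert(offsetof(struct ring_sketch, cons) -
	       offsetof(struct ring_sketch, prod) >= 2 * CACHE_LINE);

	printf("ring = %zu\n", sizeof(struct ring_sketch));
	printf("ring.prod = %zu\n", offsetof(struct ring_sketch, prod));
	printf("ring.cons = %zu\n", offsetof(struct ring_sketch, cons));
}
```

Calling print_ring_layout() from a main() prints the total size and the
prod/cons offsets. Since the padding depends only on the compile-time
cache line size, a build-time "HW prefetch" switch that removed the pad
fields would change these offsets, which is the ABI concern raised above.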

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 0/2] gcc-8 build fixes
  @ 2018-04-03 15:10  3%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2018-04-03 15:10 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Tue, 3 Apr 2018 10:23:43 +0100
Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 3/29/2018 6:05 PM, Stephen Hemminger wrote:
> > This fixes some of the obvious warnings found building DPDK
> > with gcc-8. There still are some deeper issues in the rte_hash_table
> > code; leave the fix for that up to the maintainer.
> > 
> > Stephen Hemminger (2):
> >   rte_mbuf: fix strncpy warnings
> >   rte_metrics: fix strncpy truncation warning
> > 
> > v3
> >   missing SOB on 1st patch
> > 
> > v2
> >   fix issues with wrong length in mbuf pool_ops
> >   don't need memset in metrics names
> > 
> > Stephen Hemminger (2):
> >   rte_mbuf: fix strncpy warnings
> >   rte_metrics: fix strncpy truncation warning  
> 
> I tried with gcc-8 [1] and getting a few more build errors similar to these
> ones. Are these two files only build error you get?
> 
> 
> [1]
> gcc (GCC) 8.0.1 20180401 (experimental)
> 

This fixes the easy ones. The harder one is in cuckoo hash.

  CC rte_table_hash_cuckoo.o
lib/librte_table/rte_table_hash_cuckoo.c: In function ‘rte_table_hash_cuckoo_create’:
lib/librte_table/rte_table_hash_cuckoo.c:110:16: error: cast between incompatible function types from ‘rte_table_hash_op_hash’ {aka ‘long unsigned int (*)(void *, void *, unsigned int,  long unsigned int)’} to ‘uint32_t (*)(const void *, uint32_t,  uint32_t)’ {aka ‘unsigned int (*)(const void *, unsigned int,  unsigned int)’} [-Werror=cast-function-type]
   .hash_func = (rte_hash_function)(p->f_hash),
                ^
cc1: all warnings being treated as errors

Not sure what the right way to fix this one is. Hash table should not be defining
its own special hash function prototype. Changing to a common definition is
non-trivial and breaks ABI.  Casting seems wrong, error prone,
and a bad precedent in this case.
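
To make the mismatch concrete, here is a minimal standalone sketch. The
two typedefs mirror the signatures printed in the gcc-8 error above;
everything else (table_hash_fn, hash_fn, user_hash, user_key_mask,
hash_adapter, toy_hash) is hypothetical and is not DPDK code. Calling a
function through an incompatible function pointer type is undefined
behaviour in C, so the sketch goes through an adapter with the expected
prototype instead of a cast. Note the adapter needs somewhere to keep the
user callback and key mask; file-scope variables are used here only to
keep the example short, and that missing context argument is a large part
of why a real fix is non-trivial.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature used by the table library, as shown in the error message. */
typedef uint64_t (*table_hash_fn)(void *key, void *key_mask,
				  uint32_t key_size, uint64_t seed);

/* Signature expected by the hash library, as shown in the error message. */
typedef uint32_t (*hash_fn)(const void *key, uint32_t key_len,
			    uint32_t init_val);

/* Hypothetical context: one callback and mask for the whole process. */
table_hash_fn user_hash;
void *user_key_mask;

/* Adapter with the exact prototype the hash library expects. */
uint32_t hash_adapter(const void *key, uint32_t key_len, uint32_t init_val)
{
	/* drop const explicitly; the user callback takes a non-const key */
	void *k = (void *)(uintptr_t)key;

	/* the 64-bit result is truncated to 32 bits */
	return (uint32_t)user_hash(k, user_key_mask, key_len, init_val);
}

/* Toy user hash, only here to exercise the adapter. */
uint64_t toy_hash(void *key, void *key_mask, uint32_t key_size, uint64_t seed)
{
	const uint8_t *k = key;
	const uint8_t *m = key_mask;
	uint64_t h = seed;
	uint32_t i;

	for (i = 0; i < key_size; i++)
		h = h * 31 + (uint64_t)(k[i] & m[i]);
	return h;
}
```

With user_hash = toy_hash and user_key_mask set, hash_adapter can be
assigned to a hash_fn pointer without any cast, so -Wcast-function-type
has nothing to complain about.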

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external backend support
  2018-04-03 14:42  0%           ` Tan, Jianfeng
@ 2018-04-03 14:48  0%             ` Wodkowski, PawelX
  0 siblings, 0 replies; 200+ results
From: Wodkowski, PawelX @ 2018-04-03 14:48 UTC (permalink / raw)
  To: Tan, Jianfeng, Maxime Coquelin, Zhang, Roy Fan, dev; +Cc: jianjay.zhou

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Tuesday, April 3, 2018 4:43 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Zhang, Roy Fan
> <roy.fan.zhang@intel.com>; Wodkowski, PawelX
> <pawelx.wodkowski@intel.com>; dev@dpdk.org
> Cc: jianjay.zhou@huawei.com
> Subject: Re: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
> backend support
> 
> 
> 
> On 4/3/2018 9:44 PM, Maxime Coquelin wrote:
> > Hi Pawel, Fan,
> >
> > On 04/01/2018 09:53 PM, Zhang, Roy Fan wrote:
> >> Hi Pawel,
> >>
> >>> -----Original Message-----
> >>> From: Wodkowski, PawelX
> >>> Sent: Thursday, March 29, 2018 2:48 PM
> >>> To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> >>> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan,
> Jianfeng
> >>> <jianfeng.tan@intel.com>
> >>> Subject: RE: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
> >>> backend support
> >>>
> >>>> -----Original Message-----
> >>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> >>>> Sent: Thursday, March 29, 2018 2:53 PM
> >>>> To: dev@dpdk.org
> >>>> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan,
> >>> Jianfeng
> >>>> <jianfeng.tan@intel.com>
> >>>> Subject: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
> >>>> backend support
> >>>>
> >>>> This patch adds external backend support to vhost library. The patch
> >>>> provides new APIs for the external backend to register pre and post
> >>>> vhost-user message handlers.
> >>>>
> >>>> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> >>>> ---
> >>>>   lib/librte_vhost/rte_vhost.h           | 64
> >>>> +++++++++++++++++++++++++++++++++-
> >>>>   lib/librte_vhost/rte_vhost_version.map |  6 ++++
> >>>>   lib/librte_vhost/vhost.c               | 17 ++++++++-
> >>>>   lib/librte_vhost/vhost.h               |  8 +++--
> >>>>   lib/librte_vhost/vhost_user.c          | 33 +++++++++++++++++-
> >>>>   5 files changed, 123 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/lib/librte_vhost/rte_vhost.h
> >>>> b/lib/librte_vhost/rte_vhost.h index d332069..b902c44 100644
> >>>> --- a/lib/librte_vhost/rte_vhost.h
> >>>> +++ b/lib/librte_vhost/rte_vhost.h
> >>>> @@ -1,5 +1,5 @@
> >
> > <snip/>
> >
> >>
> >>>> + * @param require_reply
> >>>> + *  If the handler requires sending a reply, this varaible shall be
> >>>> +written 1,
> >>>> + *  otherwise 0.
> >>>> + * @return
> >>>> + *  0 on success, -1 on failure
> >>>> + */
> >>>> +typedef int (*rte_vhost_msg_post_handle)(int vid, void *msg,
> >>>> +        uint32_t *require_reply);
> >>>> +
> >>>
> >>> What mean 'Message pointer' Is this const for us? Is this payload?
> >>> Making
> >>> msg 'void *' is not a way to go here. Those pre and post handlers
> >>> need to see
> >>> exactly the same structures like vhost_user.c file. Otherwise we can
> >>> get into
> >>> troubles when ABI changes.
> >>
> >> It is the pointer to the vhost_user message. It cannot be const as
> >> the backend
> >> may change the payload.
> >>
> >>>
> >>> Also you can easily merge pre and post handlers into one handler
> >>> with one
> >>> Parameter describing what phase of message processing we are now.
> >>>
> >>
> >> No I don't think so. To do so it will be quite unclear in the future
> >> as we are
> >> using one function to do two totally different things.
> >
> > Time is running out for v18.05 integration deadline (April 6th), and
> > we haven't reached a consensus.
> >
> > Except this API point, I think vhost-crypto is at the right level.
> > Since vhost-crypto lives in librte_vhost, I propose Fan cooks an
> > intermediate solution that does not need API change.
> >
> > Doing this, we postpone the API change to v18.08, so we have time to
> > discuss what the right API should be. Once agreed, vhost-crypto moves to
> > the new API.
> >
> > Pawel, Jianfeng, Fan, is it fine for you?
> 
> +1. This can avoid blocking this patch set, and give more time for
> discussing new APIs and external structs.
> 
> Thanks,
> Jianfeng
> 
> >
> > Thanks,
> > Maxime

Fine for me too.
Pawel

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external backend support
  2018-04-03 13:44  0%         ` Maxime Coquelin
  2018-04-03 13:55  0%           ` Zhang, Roy Fan
@ 2018-04-03 14:42  0%           ` Tan, Jianfeng
  2018-04-03 14:48  0%             ` Wodkowski, PawelX
  1 sibling, 1 reply; 200+ results
From: Tan, Jianfeng @ 2018-04-03 14:42 UTC (permalink / raw)
  To: Maxime Coquelin, Zhang, Roy Fan, Wodkowski, PawelX, dev; +Cc: jianjay.zhou



On 4/3/2018 9:44 PM, Maxime Coquelin wrote:
> Hi Pawel, Fan,
>
> On 04/01/2018 09:53 PM, Zhang, Roy Fan wrote:
>> Hi Pawel,
>>
>>> -----Original Message-----
>>> From: Wodkowski, PawelX
>>> Sent: Thursday, March 29, 2018 2:48 PM
>>> To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
>>> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan, Jianfeng
>>> <jianfeng.tan@intel.com>
>>> Subject: RE: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
>>> backend support
>>>
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
>>>> Sent: Thursday, March 29, 2018 2:53 PM
>>>> To: dev@dpdk.org
>>>> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan,
>>> Jianfeng
>>>> <jianfeng.tan@intel.com>
>>>> Subject: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
>>>> backend support
>>>>
>>>> This patch adds external backend support to vhost library. The patch
>>>> provides new APIs for the external backend to register pre and post
>>>> vhost-user message handlers.
>>>>
>>>> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
>>>> ---
>>>>   lib/librte_vhost/rte_vhost.h           | 64
>>>> +++++++++++++++++++++++++++++++++-
>>>>   lib/librte_vhost/rte_vhost_version.map |  6 ++++
>>>>   lib/librte_vhost/vhost.c               | 17 ++++++++-
>>>>   lib/librte_vhost/vhost.h               |  8 +++--
>>>>   lib/librte_vhost/vhost_user.c          | 33 +++++++++++++++++-
>>>>   5 files changed, 123 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/lib/librte_vhost/rte_vhost.h
>>>> b/lib/librte_vhost/rte_vhost.h index d332069..b902c44 100644
>>>> --- a/lib/librte_vhost/rte_vhost.h
>>>> +++ b/lib/librte_vhost/rte_vhost.h
>>>> @@ -1,5 +1,5 @@
>
> <snip/>
>
>>
>>>> + * @param require_reply
>>>> + *  If the handler requires sending a reply, this varaible shall be
>>>> +written 1,
>>>> + *  otherwise 0.
>>>> + * @return
>>>> + *  0 on success, -1 on failure
>>>> + */
>>>> +typedef int (*rte_vhost_msg_post_handle)(int vid, void *msg,
>>>> +        uint32_t *require_reply);
>>>> +
>>>
>>> What mean 'Message pointer' Is this const for us? Is this payload? 
>>> Making
>>> msg 'void *' is not a way to go here. Those pre and post handlers 
>>> need to see
>>> exactly the same structures like vhost_user.c file. Otherwise we can 
>>> get into
>>> troubles when ABI changes.
>>
>> It is the pointer to the vhost_user message. It cannot be const as 
>> the backend
>> may change the payload.
>>
>>>
>>> Also you can easily merge pre and post handlers into one handler 
>>> with one
>>> Parameter describing what phase of message processing we are now.
>>>
>>
>> No I don't think so. To do so it will be quite unclear in the future 
>> as we are
>> using one function to do two totally different things.
>
> Time is running out for v18.05 integration deadline (April 6th), and 
> we haven't reached a consensus.
>
> Except this API point, I think vhost-crypto is at the right level.
> Since vhost-crypto lives in librte_vhost, I propose Fan cooks an
> intermediate solution that does not need API change.
>
> Doing this, we postpone the API change to v18.08, so we have time to
> discuss what the right API should be. Once agreed, vhost-crypto moves to
> the new API.
>
> Pawel, Jianfeng, Fan, is it fine for you?

+1. This can avoid blocking this patch set, and give more time for 
discussing new APIs and external structs.

Thanks,
Jianfeng

>
> Thanks,
> Maxime

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external backend support
  2018-04-03 13:44  0%         ` Maxime Coquelin
@ 2018-04-03 13:55  0%           ` Zhang, Roy Fan
  2018-04-03 14:42  0%           ` Tan, Jianfeng
  1 sibling, 0 replies; 200+ results
From: Zhang, Roy Fan @ 2018-04-03 13:55 UTC (permalink / raw)
  To: Maxime Coquelin, Wodkowski, PawelX, dev; +Cc: jianjay.zhou, Tan, Jianfeng

Hi Maxime,

No problem. I will work on that.
Pawel, Jianfeng, if you guys have other concerns or suggestions, please give me a shout.

Thanks a lot guys, for the review and help!

Regards,
Fan

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Tuesday, April 3, 2018 2:45 PM
> To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; Wodkowski, PawelX
> <pawelx.wodkowski@intel.com>; dev@dpdk.org
> Cc: jianjay.zhou@huawei.com; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
> backend support
> 
> Hi Pawel, Fan,
> 
> On 04/01/2018 09:53 PM, Zhang, Roy Fan wrote:
> > Hi Pawel,
> >
> >> -----Original Message-----
> >> From: Wodkowski, PawelX
> >> Sent: Thursday, March 29, 2018 2:48 PM
> >> To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> >> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan,
> >> Jianfeng <jianfeng.tan@intel.com>
> >> Subject: RE: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
> >> backend support
> >>
> >>> -----Original Message-----
> >>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> >>> Sent: Thursday, March 29, 2018 2:53 PM
> >>> To: dev@dpdk.org
> >>> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan,
> >> Jianfeng
> >>> <jianfeng.tan@intel.com>
> >>> Subject: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
> >>> backend support
> >>>
> >>> This patch adds external backend support to vhost library. The patch
> >>> provides new APIs for the external backend to register pre and post
> >>> vhost-user message handlers.
> >>>
> >>> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> >>> ---
> >>>   lib/librte_vhost/rte_vhost.h           | 64
> >>> +++++++++++++++++++++++++++++++++-
> >>>   lib/librte_vhost/rte_vhost_version.map |  6 ++++
> >>>   lib/librte_vhost/vhost.c               | 17 ++++++++-
> >>>   lib/librte_vhost/vhost.h               |  8 +++--
> >>>   lib/librte_vhost/vhost_user.c          | 33 +++++++++++++++++-
> >>>   5 files changed, 123 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/lib/librte_vhost/rte_vhost.h
> >>> b/lib/librte_vhost/rte_vhost.h index d332069..b902c44 100644
> >>> --- a/lib/librte_vhost/rte_vhost.h
> >>> +++ b/lib/librte_vhost/rte_vhost.h
> >>> @@ -1,5 +1,5 @@
> 
> <snip/>
> 
> >
> >>> + * @param require_reply
> >>> + *  If the handler requires sending a reply, this varaible shall be
> >>> +written 1,
> >>> + *  otherwise 0.
> >>> + * @return
> >>> + *  0 on success, -1 on failure
> >>> + */
> >>> +typedef int (*rte_vhost_msg_post_handle)(int vid, void *msg,
> >>> +		uint32_t *require_reply);
> >>> +
> >>
> >> What mean 'Message pointer' Is this const for us? Is this payload?
> >> Making msg 'void *' is not a way to go here. Those pre and post
> >> handlers need to see exactly the same structures like vhost_user.c
> >> file. Otherwise we can get into troubles when ABI changes.
> >
> > It is the pointer to the vhost_user message. It cannot be const as the
> > backend may change the payload.
> >
> >>
> >> Also you can easily merge pre and post handlers into one handler with
> >> one Parameter describing what phase of message processing we are now.
> >>
> >
> > No I don't think so. To do so it will be quite unclear in the future
> > as we are using one function to do two totally different things.
> 
> Time is running out for v18.05 integration deadline (April 6th), and we haven't
> reached a consensus.
> 
> Except this API point, I think vhost-crypto is at the right level.
> Since vhost-crypto lives in librte_vhost, I propose Fan cooks an intermediate
> solution that does not need API change.
> 
> Doing this, we postpone the API change to v18.08, so we have time to discuss
> what the right API should be. Once agreed, vhost-crypto moves to the new
> API.
> 
> Pawel, Jianfeng, Fan, is it fine for you?
> 
> Thanks,
> Maxime

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external backend support
  2018-04-01 19:53  0%       ` Zhang, Roy Fan
@ 2018-04-03 13:44  0%         ` Maxime Coquelin
  2018-04-03 13:55  0%           ` Zhang, Roy Fan
  2018-04-03 14:42  0%           ` Tan, Jianfeng
  0 siblings, 2 replies; 200+ results
From: Maxime Coquelin @ 2018-04-03 13:44 UTC (permalink / raw)
  To: Zhang, Roy Fan, Wodkowski, PawelX, dev; +Cc: jianjay.zhou, Tan, Jianfeng

Hi Pawel, Fan,

On 04/01/2018 09:53 PM, Zhang, Roy Fan wrote:
> Hi Pawel,
> 
>> -----Original Message-----
>> From: Wodkowski, PawelX
>> Sent: Thursday, March 29, 2018 2:48 PM
>> To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
>> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan, Jianfeng
>> <jianfeng.tan@intel.com>
>> Subject: RE: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
>> backend support
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
>>> Sent: Thursday, March 29, 2018 2:53 PM
>>> To: dev@dpdk.org
>>> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan,
>> Jianfeng
>>> <jianfeng.tan@intel.com>
>>> Subject: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
>>> backend support
>>>
>>> This patch adds external backend support to vhost library. The patch
>>> provides new APIs for the external backend to register pre and post
>>> vhost-user message handlers.
>>>
>>> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
>>> ---
>>>   lib/librte_vhost/rte_vhost.h           | 64
>>> +++++++++++++++++++++++++++++++++-
>>>   lib/librte_vhost/rte_vhost_version.map |  6 ++++
>>>   lib/librte_vhost/vhost.c               | 17 ++++++++-
>>>   lib/librte_vhost/vhost.h               |  8 +++--
>>>   lib/librte_vhost/vhost_user.c          | 33 +++++++++++++++++-
>>>   5 files changed, 123 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/rte_vhost.h
>>> b/lib/librte_vhost/rte_vhost.h index d332069..b902c44 100644
>>> --- a/lib/librte_vhost/rte_vhost.h
>>> +++ b/lib/librte_vhost/rte_vhost.h
>>> @@ -1,5 +1,5 @@

<snip/>

> 
>>> + * @param require_reply
>>> + *  If the handler requires sending a reply, this varaible shall be
>>> +written 1,
>>> + *  otherwise 0.
>>> + * @return
>>> + *  0 on success, -1 on failure
>>> + */
>>> +typedef int (*rte_vhost_msg_post_handle)(int vid, void *msg,
>>> +		uint32_t *require_reply);
>>> +
>>
>> What mean 'Message pointer' Is this const for us? Is this payload? Making
>> msg 'void *' is not a way to go here. Those pre and post handlers need to see
>> exactly the same structures like vhost_user.c file. Otherwise we can get into
>> troubles when ABI changes.
> 
> It is the pointer to the vhost_user message. It cannot be const as the backend
> may change the payload.
> 
>>
>> Also you can easily merge pre and post handlers into one handler with one
>> Parameter describing what phase of message processing we are now.
>>
> 
> No I don't think so. To do so it will be quite unclear in the future as we are
> using one function to do two totally different things.

Time is running out for v18.05 integration deadline (April 6th), and we 
haven't reached a consensus.

Except this API point, I think vhost-crypto is at the right level.
Since vhost-crypto lives in librte_vhost, I propose Fan cooks an
intermediate solution that does not need API change.

Doing this, we postpone the API change to v18.08, so we have time to
discuss what the right API should be. Once agreed, vhost-crypto moves to
the new API.

Pawel, Jianfeng, Fan, is it fine for you?

Thanks,
Maxime
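
For readers following the 'void *msg' discussion, a minimal sketch of a
handler matching the rte_vhost_msg_post_handle typedef quoted above. Only
the typedef comes from the patch; struct sketch_msg, its fields, the
request id 1 and sketch_post_handler are hypothetical. It illustrates the
concern raised: the handler has to cast msg to its own idea of the message
layout, which must stay in sync with what vhost_user.c actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Typedef as proposed in the patch. */
typedef int (*rte_vhost_msg_post_handle)(int vid, void *msg,
		uint32_t *require_reply);

/* Hypothetical message header; the real layout lives in vhost_user.c. */
struct sketch_msg {
	uint32_t request;
	uint32_t flags;
	uint32_t size;
};

/* Example backend handler: asks for a reply only for request id 1. */
int sketch_post_handler(int vid, void *msg, uint32_t *require_reply)
{
	struct sketch_msg *m = msg;	/* unchecked cast from void * */

	(void)vid;
	if (m == NULL || require_reply == NULL)
		return -1;

	*require_reply = (m->request == 1) ? 1 : 0;
	return 0;
}
```

Nothing in the prototype lets the compiler verify that struct sketch_msg
matches the library's internal structure, so a layout change on either
side would go unnoticed at build time.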

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] mbuf: remove control mbuf
  @ 2018-04-03 13:39  3% ` Olivier Matz
  0 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-04-03 13:39 UTC (permalink / raw)
  To: dev

The rte_ctrlmbuf structure is not used by any example application
in dpdk. Remove it, as announced on the mailing list.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/prog_guide/glossary.rst     |  3 --
 doc/guides/prog_guide/mbuf_lib.rst     | 11 ++---
 doc/guides/prog_guide/overview.rst     |  4 +-
 doc/guides/prog_guide/source_org.rst   |  2 +-
 doc/guides/rel_notes/deprecation.rst   | 13 -----
 doc/guides/rel_notes/release_18_05.rst | 15 +++++-
 lib/librte_mbuf/Makefile               |  2 +-
 lib/librte_mbuf/rte_mbuf.c             | 15 ------
 lib/librte_mbuf/rte_mbuf.h             | 86 ----------------------------------
 lib/librte_mbuf/rte_mbuf_version.map   |  1 -
 10 files changed, 23 insertions(+), 129 deletions(-)

diff --git a/doc/guides/prog_guide/glossary.rst b/doc/guides/prog_guide/glossary.rst
index e101bc022..dda45bd18 100644
--- a/doc/guides/prog_guide/glossary.rst
+++ b/doc/guides/prog_guide/glossary.rst
@@ -41,9 +41,6 @@ CPU
 CRC
    Cyclic Redundancy Check
 
-ctrlmbuf
-   An *mbuf* carrying control data.
-
 Data Plane
    In contrast to the control plane, the data plane in a network architecture
    are the layers involved when forwarding packets.  These layers must be
diff --git a/doc/guides/prog_guide/mbuf_lib.rst b/doc/guides/prog_guide/mbuf_lib.rst
index 210a9af9f..0d3223b08 100644
--- a/doc/guides/prog_guide/mbuf_lib.rst
+++ b/doc/guides/prog_guide/mbuf_lib.rst
@@ -10,9 +10,8 @@ The mbuf library provides the ability to allocate and free buffers (mbufs)
 that may be used by the DPDK application to store message buffers.
 The message buffers are stored in a mempool, using the :ref:`Mempool Library <Mempool_Library>`.
 
-A rte_mbuf struct can carry network packet buffers
-or generic control buffers (indicated by the CTRL_MBUF_FLAG).
-This can be extended to other types.
+A rte_mbuf struct generally carries network packet buffers, but it can actually
+be any data (control data, events, ...).
 The rte_mbuf header structure is kept as small as possible and currently uses
 just two cache lines, with the most frequently used fields being on the first
 of the two cache lines.
@@ -68,13 +67,13 @@ Buffers Stored in Memory Pools
 The Buffer Manager uses the :ref:`Mempool Library <Mempool_Library>` to allocate buffers.
 Therefore, it ensures that the packet header is interleaved optimally across the channels and ranks for L3 processing.
 An mbuf contains a field indicating the pool that it originated from.
-When calling rte_ctrlmbuf_free(m) or rte_pktmbuf_free(m), the mbuf returns to its original pool.
+When calling rte_pktmbuf_free(m), the mbuf returns to its original pool.
 
 Constructors
 ------------
 
-Packet and control mbuf constructors are provided by the API.
-The rte_pktmbuf_init() and rte_ctrlmbuf_init() functions initialize some fields in the mbuf structure that
+Packet mbuf constructors are provided by the API.
+The rte_pktmbuf_init() function initializes some fields in the mbuf structure that
 are not modified by the user once created (mbuf type, origin pool, buffer start address, and so on).
 This function is given as a callback function to the rte_mempool_create() function at pool creation time.
 
diff --git a/doc/guides/prog_guide/overview.rst b/doc/guides/prog_guide/overview.rst
index 2663fe0e8..c01f37e3c 100644
--- a/doc/guides/prog_guide/overview.rst
+++ b/doc/guides/prog_guide/overview.rst
@@ -130,8 +130,8 @@ The mbuf library provides the facility to create and destroy buffers
 that may be used by the DPDK application to store message buffers.
 The message buffers are created at startup time and stored in a mempool, using the DPDK mempool library.
 
-This library provides an API to allocate/free mbufs, manipulate control message buffers (ctrlmbuf) which are generic message buffers,
-and packet buffers (pktmbuf) which are used to carry network packets.
+This library provides an API to allocate/free mbufs, manipulate
+packet buffers which are used to carry network packets.
 
 Network Packet Buffer Management is described in :ref:`Mbuf Library <Mbuf_Library>`.
 
diff --git a/doc/guides/prog_guide/source_org.rst b/doc/guides/prog_guide/source_org.rst
index a8f5832bc..b640b0111 100644
--- a/doc/guides/prog_guide/source_org.rst
+++ b/doc/guides/prog_guide/source_org.rst
@@ -46,7 +46,7 @@ The lib directory contains::
     +-- librte_kni          # Kernel NIC interface
     +-- librte_kvargs       # Argument parsing library
     +-- librte_lpm          # Longest prefix match library
-    +-- librte_mbuf         # Packet and control mbuf manipulation
+    +-- librte_mbuf         # Packet buffer manipulation
     +-- librte_mempool      # Memory pool manager (fixed sized objects)
     +-- librte_meter        # QoS metering library
     +-- librte_net          # Various IP-related headers
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 84e153461..61b8ac705 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -76,19 +76,6 @@ Deprecation Notices
     customize objects population and allocate contiguous
     block of objects if underlying driver supports it.
 
-* mbuf: The control mbuf API will be removed in v18.05. The impacted
-  functions and macros are:
-
-  - ``rte_ctrlmbuf_init()``
-  - ``rte_ctrlmbuf_alloc()``
-  - ``rte_ctrlmbuf_free()``
-  - ``rte_ctrlmbuf_data()``
-  - ``rte_ctrlmbuf_len()``
-  - ``rte_is_ctrlmbuf()``
-  - ``CTRL_MBUF_FLAG``
-
-  The packet mbuf API should be used as a replacement.
-
 * mbuf: The opaque ``mbuf->hash.sched`` field will be updated to support generic
   definition in line with the ethdev TM and MTR APIs. Currently, this field
   is defined in librte_sched in a non-generic way. The new generic format
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 4d0276f1d..9b9a74885 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -72,6 +72,19 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* mbuf: The control mbuf API has been removed in v18.05. The impacted
+  functions and macros are:
+
+  - ``rte_ctrlmbuf_init()``
+  - ``rte_ctrlmbuf_alloc()``
+  - ``rte_ctrlmbuf_free()``
+  - ``rte_ctrlmbuf_data()``
+  - ``rte_ctrlmbuf_len()``
+  - ``rte_is_ctrlmbuf()``
+  - ``CTRL_MBUF_FLAG``
+
+  The packet mbuf API should be used as a replacement.
+
 
 ABI Changes
 -----------
@@ -163,7 +176,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_kvargs.so.1
      librte_latencystats.so.1
      librte_lpm.so.2
-     librte_mbuf.so.3
+   + librte_mbuf.so.4
      librte_mempool.so.3
    + librte_meter.so.2
      librte_metrics.so.1
diff --git a/lib/librte_mbuf/Makefile b/lib/librte_mbuf/Makefile
index 367568ae3..8749a00fe 100644
--- a/lib/librte_mbuf/Makefile
+++ b/lib/librte_mbuf/Makefile
@@ -12,7 +12,7 @@ LDLIBS += -lrte_eal -lrte_mempool
 
 EXPORT_MAP := rte_mbuf_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MBUF) := rte_mbuf.c rte_mbuf_ptype.c rte_mbuf_pool_ops.c
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 091d388d3..3f4c83305 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -33,21 +33,6 @@
 #include <rte_memcpy.h>
 
 /*
- * ctrlmbuf constructor, given as a callback function to
- * rte_mempool_obj_iter() or rte_mempool_create()
- */
-void
-rte_ctrlmbuf_init(struct rte_mempool *mp,
-		__attribute__((unused)) void *opaque_arg,
-		void *_m,
-		__attribute__((unused)) unsigned i)
-{
-	struct rte_mbuf *m = _m;
-	rte_pktmbuf_init(mp, opaque_arg, _m, i);
-	m->ol_flags |= CTRL_MBUF_FLAG;
-}
-
-/*
  * pktmbuf pool constructor, given as a callback function to
  * rte_mempool_create(), or called directly if using
  * rte_mempool_create_empty()/rte_mempool_populate()
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 62740254d..06eceba37 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -330,9 +330,6 @@ extern "C" {
 
 #define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
 
-/* Use final bit of flags to indicate a control mbuf */
-#define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
-
 /** Alignment constraint of mbuf private area. */
 #define RTE_MBUF_PRIV_ALIGN 8
 
@@ -915,89 +912,6 @@ __rte_mbuf_raw_free(struct rte_mbuf *m)
 	rte_mbuf_raw_free(m);
 }
 
-/* Operations on ctrl mbuf */
-
-/**
- * The control mbuf constructor.
- *
- * This function initializes some fields in an mbuf structure that are
- * not modified by the user once created (mbuf type, origin pool, buffer
- * start address, and so on). This function is given as a callback function
- * to rte_mempool_obj_iter() or rte_mempool_create() at pool creation time.
- *
- * @param mp
- *   The mempool from which the mbuf is allocated.
- * @param opaque_arg
- *   A pointer that can be used by the user to retrieve useful information
- *   for mbuf initialization. This pointer is the opaque argument passed to
- *   rte_mempool_obj_iter() or rte_mempool_create().
- * @param m
- *   The mbuf to initialize.
- * @param i
- *   The index of the mbuf in the pool table.
- */
-void rte_ctrlmbuf_init(struct rte_mempool *mp, void *opaque_arg,
-		void *m, unsigned i);
-
-/**
- * Allocate a new mbuf (type is ctrl) from mempool *mp*.
- *
- * This new mbuf is initialized with data pointing to the beginning of
- * buffer, and with a length of zero.
- *
- * @param mp
- *   The mempool from which the mbuf is allocated.
- * @return
- *   - The pointer to the new mbuf on success.
- *   - NULL if allocation failed.
- */
-#define rte_ctrlmbuf_alloc(mp) rte_pktmbuf_alloc(mp)
-
-/**
- * Free a control mbuf back into its original mempool.
- *
- * @param m
- *   The control mbuf to be freed.
- */
-#define rte_ctrlmbuf_free(m) rte_pktmbuf_free(m)
-
-/**
- * A macro that returns the pointer to the carried data.
- *
- * The value that can be read or assigned.
- *
- * @param m
- *   The control mbuf.
- */
-#define rte_ctrlmbuf_data(m) ((char *)((m)->buf_addr) + (m)->data_off)
-
-/**
- * A macro that returns the length of the carried data.
- *
- * The value that can be read or assigned.
- *
- * @param m
- *   The control mbuf.
- */
-#define rte_ctrlmbuf_len(m) rte_pktmbuf_data_len(m)
-
-/**
- * Tests if an mbuf is a control mbuf
- *
- * @param m
- *   The mbuf to be tested
- * @return
- *   - True (1) if the mbuf is a control mbuf
- *   - False(0) otherwise
- */
-static inline int
-rte_is_ctrlmbuf(struct rte_mbuf *m)
-{
-	return !!(m->ol_flags & CTRL_MBUF_FLAG);
-}
-
-/* Operations on pkt mbuf */
-
 /**
  * The packet mbuf constructor.
  *
diff --git a/lib/librte_mbuf/rte_mbuf_version.map b/lib/librte_mbuf/rte_mbuf_version.map
index d418dcb82..2e056d994 100644
--- a/lib/librte_mbuf/rte_mbuf_version.map
+++ b/lib/librte_mbuf/rte_mbuf_version.map
@@ -1,7 +1,6 @@
 DPDK_2.0 {
 	global:
 
-	rte_ctrlmbuf_init;
 	rte_get_rx_ol_flag_name;
 	rte_get_tx_ol_flag_name;
 	rte_mbuf_sanity_check;
-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH] ring: relax alignment constraint on ring structure
  @ 2018-04-03 13:26  9% ` Olivier Matz
    0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2018-04-03 13:26 UTC (permalink / raw)
  To: dev

The initial objective of
commit d9f0d3a1ffd4 ("ring: remove split cacheline build setting")
was to add an empty cache line between the producer and consumer
data (on platforms with cache line size = 64B), preventing them
from being placed on adjacent cache lines.

Following discussion on the mailing list, it appears that this
also imposes an alignment constraint that is not required.

This patch removes the extra alignment constraint and adds the
empty cache lines using padding fields in the structure. The
size of rte_ring structure and the offset of the fields remain
the same on platforms with cache line size = 64B:

  rte_ring = 384
  rte_ring.name = 0
  rte_ring.flags = 32
  rte_ring.memzone = 40
  rte_ring.size = 48
  rte_ring.mask = 52
  rte_ring.prod = 128
  rte_ring.cons = 256

But it has an impact on platforms where the cache line size is 128B:

  rte_ring = 384        -> 768
  rte_ring.name = 0
  rte_ring.flags = 32
  rte_ring.memzone = 40
  rte_ring.size = 48
  rte_ring.mask = 52
  rte_ring.prod = 128   -> 256
  rte_ring.cons = 256   -> 512

Link: http://dpdk.org/dev/patchwork/patch/25039/
Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 doc/guides/rel_notes/deprecation.rst   |  6 ------
 doc/guides/rel_notes/release_18_05.rst |  8 +++++++-
 lib/librte_ring/Makefile               |  2 +-
 lib/librte_ring/rte_ring.h             | 16 ++++++----------
 4 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 40448961a..84e153461 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -139,9 +139,3 @@ Deprecation Notices
   required the previous behavior can be configured using existing flow
   director APIs. There is no ABI/API break. This change will just remove a
   global configuration setting and require explicit configuration.
-
-* ring: The alignment constraints on the ring structure will be relaxed
-  to one cache line instead of two, and an empty cache line padding will
-  be added between the producer and consumer structures. The size of the
-  structure and the offset of the fields will remain the same on
-  platforms with 64B cache line, but will change on other platforms.
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 9cc77f893..4d0276f1d 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -86,6 +86,12 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* ring: the alignment constraints on the ring structure has been relaxed
+  to one cache line instead of two, and an empty cache line padding is
+  added between the producer and consumer structures. The size of the
+  structure and the offset of the fields remains the same on platforms
+  with 64B cache line, but changes on other platforms.
+
 
 Removed Items
 -------------
@@ -176,7 +182,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_power.so.1
      librte_rawdev.so.1
      librte_reorder.so.1
-     librte_ring.so.1
+   + librte_ring.so.2
      librte_sched.so.1
      librte_security.so.1
      librte_table.so.3
diff --git a/lib/librte_ring/Makefile b/lib/librte_ring/Makefile
index bde8907d6..21a36770d 100644
--- a/lib/librte_ring/Makefile
+++ b/lib/librte_ring/Makefile
@@ -11,7 +11,7 @@ LDLIBS += -lrte_eal
 
 EXPORT_MAP := rte_ring_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_RING) := rte_ring.c
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 253cdc96a..d3d3f7f97 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -62,14 +62,6 @@ enum rte_ring_queue_behavior {
 
 struct rte_memzone; /* forward declaration, so as not to require memzone.h */
 
-#if RTE_CACHE_LINE_SIZE < 128
-#define PROD_ALIGN (RTE_CACHE_LINE_SIZE * 2)
-#define CONS_ALIGN (RTE_CACHE_LINE_SIZE * 2)
-#else
-#define PROD_ALIGN RTE_CACHE_LINE_SIZE
-#define CONS_ALIGN RTE_CACHE_LINE_SIZE
-#endif
-
 /* structure to hold a pair of head/tail values and other metadata */
 struct rte_ring_headtail {
 	volatile uint32_t head;  /**< Prod/consumer head. */
@@ -101,11 +93,15 @@ struct rte_ring {
 	uint32_t mask;           /**< Mask (size-1) of ring. */
 	uint32_t capacity;       /**< Usable size of ring */
 
+	char pad0 __rte_cache_aligned; /**< empty cache line */
+
 	/** Ring producer status. */
-	struct rte_ring_headtail prod __rte_aligned(PROD_ALIGN);
+	struct rte_ring_headtail prod __rte_cache_aligned;
+	char pad1 __rte_cache_aligned; /**< empty cache line */
 
 	/** Ring consumer status. */
-	struct rte_ring_headtail cons __rte_aligned(CONS_ALIGN);
+	struct rte_ring_headtail cons __rte_cache_aligned;
+	char pad2 __rte_cache_aligned; /**< empty cache line */
 };
 
 #define RING_F_SP_ENQ 0x0001 /**< The default enqueue is "single-producer". */
-- 
2.11.0

^ permalink raw reply	[relevance 9%]

* Re: [dpdk-dev] [PATCH v3 1/2] doc: add vfio api support
  2018-04-03  8:28  4% ` [dpdk-dev] [PATCH v3 1/2] doc: add vfio api support Hemant Agrawal
@ 2018-04-03 10:16  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-04-03 10:16 UTC (permalink / raw)
  To: Hemant Agrawal; +Cc: dev, anatoly.burakov

03/04/2018 10:28, Hemant Agrawal:
> --- a/doc/api/doxy-api-index.md
> +++ b/doc/api/doxy-api-index.md
> @@ -179,4 +179,5 @@ The public API headers are grouped by topics:
>    [EAL config]         (@ref rte_eal.h),
>    [common]             (@ref rte_common.h),
>    [ABI compat]         (@ref rte_compat.h),
> -  [version]            (@ref rte_version.h)
> +  [version]            (@ref rte_version.h),
> +  [vfio]               (@ref rte_vfio.h)

It would be more appropriate after rte_pci.h in "device" section.

> diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
> index cda52fd..166612f 100644
> --- a/doc/api/doxy-api.conf
> +++ b/doc/api/doxy-api.conf
> @@ -82,6 +82,7 @@ INPUT                   = doc/api/doxy-api-index.md \
>  FILE_PATTERNS           = rte_*.h \
>                            cmdline.h
>  PREDEFINED              = __DOXYGEN__ \
> +			  VFIO_PRESENT \
>                            __attribute__(x)=

The indent is not the same as other lines.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v6 02/10] crypto/virtio: support virtio device init
  @ 2018-04-03  9:43  1% ` Jay Zhou
    1 sibling, 0 replies; 200+ results
From: Jay Zhou @ 2018-04-03  9:43 UTC (permalink / raw)
  To: dev
  Cc: pablo.de.lara.guarch, roy.fan.zhang, thomas, arei.gonglei,
	xin.zeng, weidong.huang, wangxinxin.wang, longpeng2,
	jianjay.zhou

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
---
 drivers/crypto/virtio/Makefile           |   3 +
 drivers/crypto/virtio/virtio_cryptodev.c | 247 ++++++++++++++++-
 drivers/crypto/virtio/virtio_cryptodev.h |  13 +
 drivers/crypto/virtio/virtio_logs.h      |  91 ++++++
 drivers/crypto/virtio/virtio_pci.c       | 460 +++++++++++++++++++++++++++++++
 drivers/crypto/virtio/virtio_pci.h       | 253 +++++++++++++++++
 drivers/crypto/virtio/virtio_ring.h      | 137 +++++++++
 drivers/crypto/virtio/virtio_rxtx.c      |  26 ++
 drivers/crypto/virtio/virtqueue.c        |  43 +++
 drivers/crypto/virtio/virtqueue.h        | 172 ++++++++++++
 10 files changed, 1442 insertions(+), 3 deletions(-)
 create mode 100644 drivers/crypto/virtio/virtio_logs.h
 create mode 100644 drivers/crypto/virtio/virtio_pci.c
 create mode 100644 drivers/crypto/virtio/virtio_pci.h
 create mode 100644 drivers/crypto/virtio/virtio_ring.h
 create mode 100644 drivers/crypto/virtio/virtio_rxtx.c
 create mode 100644 drivers/crypto/virtio/virtqueue.c
 create mode 100644 drivers/crypto/virtio/virtqueue.h

diff --git a/drivers/crypto/virtio/Makefile b/drivers/crypto/virtio/Makefile
index a3b44e9..c4727ea 100644
--- a/drivers/crypto/virtio/Makefile
+++ b/drivers/crypto/virtio/Makefile
@@ -18,6 +18,9 @@ LIBABIVER := 1
 #
 # all source are stored in SRCS-y
 #
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO) += virtqueue.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO) += virtio_pci.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO) += virtio_cryptodev.c
 
 # this lib depends upon:
diff --git a/drivers/crypto/virtio/virtio_cryptodev.c b/drivers/crypto/virtio/virtio_cryptodev.c
index 84aff58..4550834 100644
--- a/drivers/crypto/virtio/virtio_cryptodev.c
+++ b/drivers/crypto/virtio/virtio_cryptodev.c
@@ -3,25 +3,238 @@
  */
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
 #include <rte_cryptodev_pmd.h>
+#include <rte_eal.h>
 #include "virtio_cryptodev.h"
+#include "virtqueue.h"
+
+int virtio_crypto_logtype_init;
+int virtio_crypto_logtype_session;
+int virtio_crypto_logtype_rx;
+int virtio_crypto_logtype_tx;
+int virtio_crypto_logtype_driver;
+
+/*
+ * The set of PCI devices this driver supports
+ */
+static const struct rte_pci_id pci_id_virtio_crypto_map[] = {
+	{ RTE_PCI_DEVICE(VIRTIO_CRYPTO_PCI_VENDORID,
+				VIRTIO_CRYPTO_PCI_DEVICEID) },
+	{ .vendor_id = 0, /* sentinel */ },
+};
 
 uint8_t cryptodev_virtio_driver_id;
 
+/*
+ * dev_ops for virtio, bare necessities for basic operation
+ */
+static struct rte_cryptodev_ops virtio_crypto_dev_ops = {
+	/* Device related operations */
+	.dev_configure			 = NULL,
+	.dev_start			 = NULL,
+	.dev_stop			 = NULL,
+	.dev_close			 = NULL,
+	.dev_infos_get			 = NULL,
+
+	.stats_get			 = NULL,
+	.stats_reset			 = NULL,
+
+	.queue_pair_setup                = NULL,
+	.queue_pair_release              = NULL,
+	.queue_pair_start                = NULL,
+	.queue_pair_stop                 = NULL,
+	.queue_pair_count                = NULL,
+
+	/* Crypto related operations */
+	.session_get_size	= NULL,
+	.session_configure	= NULL,
+	.session_clear		= NULL,
+	.qp_attach_session = NULL,
+	.qp_detach_session = NULL
+};
+
+static int
+virtio_negotiate_features(struct virtio_crypto_hw *hw, uint64_t req_features)
+{
+	uint64_t host_features;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/* Prepare guest_features: feature that driver wants to support */
+	VIRTIO_CRYPTO_INIT_LOG_DBG("guest_features before negotiate = %" PRIx64,
+		req_features);
+
+	/* Read device(host) feature bits */
+	host_features = VTPCI_OPS(hw)->get_features(hw);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("host_features before negotiate = %" PRIx64,
+		host_features);
+
+	/*
+	 * Negotiate features: Subset of device feature bits are written back
+	 * guest feature bits.
+	 */
+	hw->guest_features = req_features;
+	hw->guest_features = vtpci_cryptodev_negotiate_features(hw,
+							host_features);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("features after negotiate = %" PRIx64,
+		hw->guest_features);
+
+	if (hw->modern) {
+		if (!vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) {
+			VIRTIO_CRYPTO_INIT_LOG_ERR(
+				"VIRTIO_F_VERSION_1 features is not enabled.");
+			return -1;
+		}
+		vtpci_cryptodev_set_status(hw,
+			VIRTIO_CONFIG_STATUS_FEATURES_OK);
+		if (!(vtpci_cryptodev_get_status(hw) &
+			VIRTIO_CONFIG_STATUS_FEATURES_OK)) {
+			VIRTIO_CRYPTO_INIT_LOG_ERR("failed to set FEATURES_OK "
+						"status!");
+			return -1;
+		}
+	}
+
+	hw->req_guest_features = req_features;
+
+	return 0;
+}
+
+/* reset device and renegotiate features if needed */
+static int
+virtio_crypto_init_device(struct rte_cryptodev *cryptodev,
+	uint64_t req_features)
+{
+	struct virtio_crypto_hw *hw = cryptodev->data->dev_private;
+	struct virtio_crypto_config local_config;
+	struct virtio_crypto_config *config = &local_config;
+
+	PMD_INIT_FUNC_TRACE();
+
+	/* Reset the device although not necessary at startup */
+	vtpci_cryptodev_reset(hw);
+
+	/* Tell the host we've noticed this device. */
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_ACK);
+
+	/* Tell the host we've known how to drive the device. */
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER);
+	if (virtio_negotiate_features(hw, req_features) < 0)
+		return -1;
+
+	/* Get status of the device */
+	vtpci_read_cryptodev_config(hw,
+		offsetof(struct virtio_crypto_config, status),
+		&config->status, sizeof(config->status));
+	if (config->status != VIRTIO_CRYPTO_S_HW_READY) {
+		VIRTIO_CRYPTO_DRV_LOG_ERR("accelerator hardware is "
+				"not ready");
+		return -1;
+	}
+
+	/* Get number of data queues */
+	vtpci_read_cryptodev_config(hw,
+		offsetof(struct virtio_crypto_config, max_dataqueues),
+		&config->max_dataqueues,
+		sizeof(config->max_dataqueues));
+	hw->max_dataqueues = config->max_dataqueues;
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("hw->max_dataqueues=%d",
+		hw->max_dataqueues);
+
+	return 0;
+}
+
+/*
+ * This function is based on probe() function
+ * It returns 0 on success.
+ */
+static int
+crypto_virtio_create(const char *name, struct rte_pci_device *pci_dev,
+		struct rte_cryptodev_pmd_init_params *init_params)
+{
+	struct rte_cryptodev *cryptodev;
+	struct virtio_crypto_hw *hw;
+
+	PMD_INIT_FUNC_TRACE();
+
+	cryptodev = rte_cryptodev_pmd_create(name, &pci_dev->device,
+					init_params);
+	if (cryptodev == NULL)
+		return -ENODEV;
+
+	cryptodev->driver_id = cryptodev_virtio_driver_id;
+	cryptodev->dev_ops = &virtio_crypto_dev_ops;
+
+	cryptodev->enqueue_burst = virtio_crypto_pkt_tx_burst;
+	cryptodev->dequeue_burst = virtio_crypto_pkt_rx_burst;
+
+	cryptodev->feature_flags = RTE_CRYPTODEV_FF_SYMMETRIC_CRYPTO |
+		RTE_CRYPTODEV_FF_SYM_OPERATION_CHAINING;
+
+	hw = cryptodev->data->dev_private;
+	hw->dev_id = cryptodev->data->dev_id;
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("dev %d vendorID=0x%x deviceID=0x%x",
+		cryptodev->data->dev_id, pci_dev->id.vendor_id,
+		pci_dev->id.device_id);
+
+	/* pci device init */
+	if (vtpci_cryptodev_init(pci_dev, hw))
+		return -1;
+
+	if (virtio_crypto_init_device(cryptodev,
+			VIRTIO_CRYPTO_PMD_GUEST_FEATURES) < 0)
+		return -1;
+
+	return 0;
+}
+
 static int crypto_virtio_pci_probe(
 	struct rte_pci_driver *pci_drv __rte_unused,
-	struct rte_pci_device *pci_dev __rte_unused)
+	struct rte_pci_device *pci_dev)
 {
-	return 0;
+	struct rte_cryptodev_pmd_init_params init_params = {
+		.name = "",
+		.socket_id = rte_socket_id(),
+		.private_data_size = sizeof(struct virtio_crypto_hw),
+		.max_nb_sessions = RTE_VIRTIO_CRYPTO_PMD_MAX_NB_SESSIONS
+	};
+	char name[RTE_CRYPTODEV_NAME_MAX_LEN];
+
+	VIRTIO_CRYPTO_DRV_LOG_DBG("Found Crypto device at %02x:%02x.%x",
+			pci_dev->addr.bus,
+			pci_dev->addr.devid,
+			pci_dev->addr.function);
+
+	rte_pci_device_name(&pci_dev->addr, name, sizeof(name));
+
+	return crypto_virtio_create(name, pci_dev, &init_params);
 }
 
 static int crypto_virtio_pci_remove(
-	struct rte_pci_device *pci_dev __rte_unused)
+	struct rte_pci_device *pci_dev)
 {
+	struct rte_cryptodev *cryptodev;
+	char cryptodev_name[RTE_CRYPTODEV_NAME_MAX_LEN];
+
+	if (pci_dev == NULL)
+		return -EINVAL;
+
+	rte_pci_device_name(&pci_dev->addr, cryptodev_name,
+			sizeof(cryptodev_name));
+
+	cryptodev = rte_cryptodev_pmd_get_named_dev(cryptodev_name);
+	if (cryptodev == NULL)
+		return -ENODEV;
+
 	return 0;
 }
 
 static struct rte_pci_driver rte_virtio_crypto_driver = {
+	.id_table = pci_id_virtio_crypto_map,
+	.drv_flags = 0,
 	.probe = crypto_virtio_pci_probe,
 	.remove = crypto_virtio_pci_remove
 };
@@ -32,3 +245,31 @@ static int crypto_virtio_pci_remove(
 RTE_PMD_REGISTER_CRYPTO_DRIVER(virtio_crypto_drv,
 	rte_virtio_crypto_driver.driver,
 	cryptodev_virtio_driver_id);
+
+RTE_INIT(virtio_crypto_init_log);
+static void
+virtio_crypto_init_log(void)
+{
+	virtio_crypto_logtype_init = rte_log_register("pmd.crypto.virtio.init");
+	if (virtio_crypto_logtype_init >= 0)
+		rte_log_set_level(virtio_crypto_logtype_init, RTE_LOG_NOTICE);
+
+	virtio_crypto_logtype_session =
+		rte_log_register("pmd.crypto.virtio.session");
+	if (virtio_crypto_logtype_session >= 0)
+		rte_log_set_level(virtio_crypto_logtype_session,
+				RTE_LOG_NOTICE);
+
+	virtio_crypto_logtype_rx = rte_log_register("pmd.crypto.virtio.rx");
+	if (virtio_crypto_logtype_rx >= 0)
+		rte_log_set_level(virtio_crypto_logtype_rx, RTE_LOG_NOTICE);
+
+	virtio_crypto_logtype_tx = rte_log_register("pmd.crypto.virtio.tx");
+	if (virtio_crypto_logtype_tx >= 0)
+		rte_log_set_level(virtio_crypto_logtype_tx, RTE_LOG_NOTICE);
+
+	virtio_crypto_logtype_driver =
+		rte_log_register("pmd.crypto.virtio.driver");
+	if (virtio_crypto_logtype_driver >= 0)
+		rte_log_set_level(virtio_crypto_logtype_driver, RTE_LOG_NOTICE);
+}
diff --git a/drivers/crypto/virtio/virtio_cryptodev.h b/drivers/crypto/virtio/virtio_cryptodev.h
index 44517b8..392db4a 100644
--- a/drivers/crypto/virtio/virtio_cryptodev.h
+++ b/drivers/crypto/virtio/virtio_cryptodev.h
@@ -5,6 +5,19 @@
 #ifndef _VIRTIO_CRYPTODEV_H_
 #define _VIRTIO_CRYPTODEV_H_
 
+#include <rte_cryptodev.h>
+
+/* Features desired/implemented by this driver. */
+#define VIRTIO_CRYPTO_PMD_GUEST_FEATURES (1ULL << VIRTIO_F_VERSION_1)
+
 #define CRYPTODEV_NAME_VIRTIO_PMD crypto_virtio
 
+uint16_t virtio_crypto_pkt_tx_burst(void *tx_queue,
+		struct rte_crypto_op **tx_pkts,
+		uint16_t nb_pkts);
+
+uint16_t virtio_crypto_pkt_rx_burst(void *tx_queue,
+		struct rte_crypto_op **tx_pkts,
+		uint16_t nb_pkts);
+
 #endif /* _VIRTIO_CRYPTODEV_H_ */
diff --git a/drivers/crypto/virtio/virtio_logs.h b/drivers/crypto/virtio/virtio_logs.h
new file mode 100644
index 0000000..26a286c
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_logs.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#define PMD_INIT_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, RTE_LOGTYPE_PMD, \
+		"PMD: %s(): " fmt "\n", __func__, ##args)
+
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+
+extern int virtio_crypto_logtype_init;
+
+#define VIRTIO_CRYPTO_INIT_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_init, \
+		"INIT: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_session;
+
+#define VIRTIO_CRYPTO_SESSION_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_session, \
+		"SESSION: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_rx;
+
+#define VIRTIO_CRYPTO_RX_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_rx, \
+		"RX: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_RX_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_RX_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_RX_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_tx;
+
+#define VIRTIO_CRYPTO_TX_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_tx, \
+		"TX: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_TX_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_TX_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_TX_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_driver;
+
+#define VIRTIO_CRYPTO_DRV_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_driver, \
+		"DRIVER: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(ERR, fmt, ## args)
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
new file mode 100644
index 0000000..43ec1a4
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -0,0 +1,460 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+	 * and only accepts 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("vring address shouldn't be above 16TB!");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_crypto_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_crypto_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
+		uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("queue %u addresses:", vq->vq_queue_index);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t desc_addr: %" PRIx64, desc_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t aval_addr: %" PRIx64, avail_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t used_addr: %" PRIx64, used_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t notify addr: %p (notify offset: %u)",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
+		struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_crypto_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+void
+vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+		uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+vtpci_cryptodev_reset(struct virtio_crypto_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw)
+{
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+vtpci_cryptodev_isr(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("invalid bar: %u", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("offset(%u) + length(%u) overflows",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR(
+			"invalid cap: overflows bar space: %u > %" PRIu64,
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("bar %u base addr is NULL", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		VIRTIO_CRYPTO_INIT_LOG_DBG("failed to map pci device!");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		VIRTIO_CRYPTO_INIT_LOG_DBG("failed to read pci capability list");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			VIRTIO_CRYPTO_INIT_LOG_ERR(
+				"failed to read pci cap at pos: %x", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			VIRTIO_CRYPTO_INIT_LOG_DBG(
+				"[%2x] skipping non VNDR cap id: %02x",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		VIRTIO_CRYPTO_INIT_LOG_DBG(
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		VIRTIO_CRYPTO_INIT_LOG_INFO("no modern virtio pci device found.");
+		return -1;
+	}
+
+	VIRTIO_CRYPTO_INIT_LOG_INFO("found modern virtio pci device.");
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("common cfg mapped at: %p", hw->common_cfg);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("device cfg mapped at: %p", hw->dev_cfg);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("isr cfg mapped at: %p", hw->isr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("notify base: %p, notify off multiplier: %u",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+/*
+ * Return 0 on success, i.e. when the virtio pci capabilities were
+ * read and a modern virtio pci device was detected.
+ * Return -1 otherwise:
+ *   if the pci device cannot be mapped.
+ *   if the virtio pci capabilities cannot be read or are incomplete.
+ * Legacy mode is not supported, so there is no fallback.
+ */
+int
+vtpci_cryptodev_init(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	/*
+	 * Try to read the virtio pci caps, which exist only on a
+	 * modern pci device. If that fails, report an error:
+	 * there is no fallback to legacy virtio handling.
+	 */
+	if (virtio_read_caps(dev, hw) == 0) {
+		VIRTIO_CRYPTO_INIT_LOG_INFO("modern virtio pci detected.");
+		virtio_hw_internal[hw->dev_id].vtpci_ops =
+					&virtio_crypto_modern_ops;
+		hw->modern = 1;
+		return 0;
+	}
+
+	/*
+	 * virtio crypto conforms to virtio 1.0 and doesn't support
+	 * legacy mode
+	 */
+	return -1;
+}
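
Note on the capability bounds checks in get_cfg_addr() above: the function
rejects a window whose 32-bit offset + length wraps around, then rejects a
window that ends beyond the BAR. A minimal standalone sketch of that
reasoning follows. The names bar_region and cfg_window are hypothetical
stand-ins for illustration only and are not part of the patch or of DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped BAR; not a DPDK type. */
struct bar_region {
	uint8_t *addr;   /* virtual address of the mapping, may be NULL */
	uint64_t len;    /* size of the mapping in bytes */
};

/*
 * Return a pointer to the window [offset, offset + length) inside the
 * BAR, or NULL if the window is not fully contained in it.
 */
static void *
cfg_window(const struct bar_region *bar, uint32_t offset, uint32_t length)
{
	/* A wrapped 32-bit sum is smaller than its first operand. */
	if ((uint32_t)(offset + length) < offset)
		return NULL;

	/* The window must end at or before the end of the BAR. */
	if ((uint64_t)offset + length > bar->len)
		return NULL;

	/* An unmapped BAR has no usable base address. */
	if (bar->addr == NULL)
		return NULL;

	return bar->addr + offset;
}
```

With a 64-byte region, cfg_window(&bar, 16, 32) yields base + 16, while
(48, 32) and (0xFFFFFFF0, 0x20) both yield NULL: the first overruns the
region, the second wraps.
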
diff --git a/drivers/crypto/virtio/virtio_pci.h b/drivers/crypto/virtio/virtio_pci.h
new file mode 100644
index 0000000..cd316a6
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_CRYPTO_PCI_VENDORID 0x1AF4
+#define VIRTIO_CRYPTO_PCI_DEVICEID 0x1054
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR            19 /* interrupt status register, reading
+				      * also clears the register (8, RO)
+				      */
+/* Only if MSIX is enabled: */
+
+/* configuration change vector (16, RW) */
+#define VIRTIO_MSI_CONFIG_VECTOR  20
+/* vector for selected VQ notifications */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them?
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;	/* Generic PCI field: next ptr. */
+	uint8_t cap_len;	/* Generic PCI field: capability length */
+	uint8_t cfg_type;	/* Identifies the structure. */
+	uint8_t bar;		/* Where to find it. */
+	uint8_t padding[3];	/* Pad to full dword. */
+	uint32_t offset;	/* Offset within bar. */
+	uint32_t length;	/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_crypto_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_crypto_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_crypto_hw *hw);
+	void (*set_status)(struct virtio_crypto_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_crypto_hw *hw);
+	void (*set_features)(struct virtio_crypto_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_crypto_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_crypto_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_crypto_hw *hw,
+			struct virtqueue *vq, uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_crypto_hw *hw,
+			uint16_t queue_id);
+	int (*setup_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_crypto_hw {
+	/* control queue */
+	struct virtqueue *cvq;
+	uint16_t    dev_id;
+	uint16_t    max_dataqueues;
+	uint64_t    req_guest_features;
+	uint64_t    guest_features;
+	uint8_t	    use_msix;
+	uint8_t     modern;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	struct virtio_crypto_config *dev_cfg;
+	const struct rte_cryptodev_capabilities *virtio_dev_capabilities;
+};
+
+/*
+ * While virtio_crypto_hw is stored in shared memory, this structure stores
+ * process-local information that may vary in the multi-process model.
+ * For example, the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+	struct rte_pci_ioport io;
+};
+
+#define VTPCI_OPS(hw)	(virtio_hw_internal[(hw)->dev_id].vtpci_ops)
+#define VTPCI_IO(hw)	(&virtio_hw_internal[(hw)->dev_id].io)
+
+extern struct virtio_hw_internal virtio_hw_internal[RTE_MAX_VIRTIO_CRYPTO];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+vtpci_with_feature(struct virtio_crypto_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Function declaration from virtio_pci.c
+ */
+int vtpci_cryptodev_init(struct rte_pci_device *dev,
+	struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_reset(struct virtio_crypto_hw *hw);
+
+void vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw);
+
+uint8_t vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status);
+
+uint64_t vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+	uint64_t host_features);
+
+void vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	const void *src, int length);
+
+void vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	void *dst, int length);
+
+uint8_t vtpci_cryptodev_isr(struct virtio_crypto_hw *hw);
+
+#endif /* _VIRTIO_PCI_H_ */
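
Note on feature bits in virtio_pci.h above: negotiation keeps only the bits
offered by both sides, and vtpci_with_feature() tests one bit with a 64-bit
shift because bits such as VIRTIO_F_VERSION_1 (32) do not fit in a 32-bit
int. A minimal standalone sketch, assuming nothing beyond the two bit
numbers quoted from the header; negotiate() and with_feature() are
hypothetical helpers written for this illustration, not functions from the
patch.

```c
#include <stdint.h>

/* Bit numbers as defined in the header above. */
#define VIRTIO_RING_F_EVENT_IDX 29
#define VIRTIO_F_VERSION_1      32

/* Keep only the features that both host and guest support. */
static uint64_t
negotiate(uint64_t host_features, uint64_t guest_features)
{
	return host_features & guest_features;
}

/* Test one feature bit; 1ULL keeps the shift in 64-bit arithmetic. */
static int
with_feature(uint64_t features, uint64_t bit)
{
	return (features & (1ULL << bit)) != 0;
}
```

If the host offers VERSION_1 and EVENT_IDX but the guest only requests
VERSION_1, the negotiated set contains VERSION_1 alone.
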
diff --git a/drivers/crypto/virtio/virtio_ring.h b/drivers/crypto/virtio/virtio_ring.h
new file mode 100644
index 0000000..ee30674
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_ring.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers.
+ */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.
+ */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next".
+ */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	volatile uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline size_t
+vring_size(unsigned int num, unsigned long align)
+{
+	size_t size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other side, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
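
Note on virtio_ring.h above: two pieces of arithmetic are easy to misread,
the ring size computation and the wraparound-safe event check. The sketch
below restates both with plain constants so it can be compiled on its own.
ring_bytes() and need_event() are hypothetical names for this illustration;
the sizes 16, 2 and 8 are the descriptor, avail entry and used element
sizes stated in the header, and align is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a ring of num entries, mirroring vring_size(). */
static size_t
ring_bytes(unsigned int num, size_t align)
{
	size_t size;

	size = (size_t)num * 16;              /* descriptor table */
	size += 4 + (size_t)num * 2;          /* avail: flags, idx, ring[num] */
	size = (size + align - 1) & ~(align - 1); /* pad to align boundary */
	size += 4 + (size_t)num * 8;          /* used: flags, idx, ring[num] */
	return size;
}

/*
 * Same expression as vring_need_event(): true when event_idx lies in
 * the half-open window [old, new_idx) of free-running 16-bit indexes.
 */
static int
need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}
```

For num = 256 and align = 4096 the descriptor table and avail ring take
4612 bytes, padded to 8192, and the used ring adds 2052, for 10244 bytes in
total. The event check keeps working across the 65535 to 0 wrap because
both sides are reduced modulo 2^16 before the comparison.
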
diff --git a/drivers/crypto/virtio/virtio_rxtx.c b/drivers/crypto/virtio/virtio_rxtx.c
new file mode 100644
index 0000000..51f6e09
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_rxtx.c
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+#include "virtio_cryptodev.h"
+
+uint16_t
+virtio_crypto_pkt_rx_burst(
+	void *tx_queue __rte_unused,
+	struct rte_crypto_op **rx_pkts __rte_unused,
+	uint16_t nb_pkts __rte_unused)
+{
+	uint16_t nb_rx = 0;
+
+	return nb_rx;
+}
+
+uint16_t
+virtio_crypto_pkt_tx_burst(
+	void *tx_queue __rte_unused,
+	struct rte_crypto_op **tx_pkts __rte_unused,
+	uint16_t nb_pkts __rte_unused)
+{
+	uint16_t nb_tx = 0;
+
+	return nb_tx;
+}
diff --git a/drivers/crypto/virtio/virtqueue.c b/drivers/crypto/virtio/virtqueue.c
new file mode 100644
index 0000000..fd8be58
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+#include <rte_malloc.h>
+
+#include "virtqueue.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
+	 * not to interrupt when it consumes packets
+	 * Note: this is only considered a hint to the host
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+void
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_crypto_op *cop = NULL;
+
+	int idx;
+
+	if (vq != NULL)
+		for (idx = 0; idx < vq->vq_nentries; idx++) {
+			cop = vq->vq_descx[idx].crypto_op;
+			if (cop) {
+				if (cop->sym->m_src)
+					rte_pktmbuf_free(cop->sym->m_src);
+				if (cop->sym->m_dst)
+					rte_pktmbuf_free(cop->sym->m_dst);
+				rte_crypto_op_free(cop);
+				vq->vq_descx[idx].crypto_op = NULL;
+			}
+		}
+}
diff --git a/drivers/crypto/virtio/virtqueue.h b/drivers/crypto/virtio/virtqueue.h
new file mode 100644
index 0000000..0a9bddb
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.h
@@ -0,0 +1,172 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+enum { VTCRYPTO_DATAQ = 0, VTCRYPTO_CTRLQ = 1 };
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void     *crypto_op;
+	void     *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	/**< virtio_crypto_hw structure pointer. */
+	struct virtio_crypto_hw *hw;
+	/**< mem zone to populate RX ring. */
+	const struct rte_memzone *mz;
+	/**< memzone to populate hdr and request. */
+	struct rte_mempool *mpool;
+	uint8_t     dev_id;              /**< Device identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring*/
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+
+	/* Statistics */
+	uint64_t	packets_sent_total;
+	uint64_t	packets_sent_failed;
+	uint64_t	packets_received_total;
+	uint64_t	packets_received_failed;
+
+	uint16_t  *notify_addr;
+
+	struct vq_desc_extra vq_descx[0];
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+
+/**
+ *  Get all mbufs to be freed.
+ */
+void virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) \
+	((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notification is through io port operation
+	 * which is a serialization instruction itself.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+/**
+ * Dump virtqueue internal structures, for debug purpose only.
+ */
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	VIRTIO_CRYPTO_INIT_LOG_DBG(\
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+
+#endif /* _VIRTQUEUE_H_ */
-- 
1.8.3.1
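
Note on the index handling in virtqueue.h above: used->idx and vq_avail_idx
are free-running 16-bit counters, so VIRTQUEUE_NUSED relies on unsigned
subtraction and vq_update_avail_ring() relies on masking with a
power-of-two ring size. A minimal standalone sketch of both follows; the
helper names nused() and avail_slot() are hypothetical and exist only for
this illustration.

```c
#include <stdint.h>

/* Entries the host has used that the driver has not consumed yet. */
static uint16_t
nused(uint16_t used_idx, uint16_t used_cons_idx)
{
	/* The cast reduces the difference modulo 2^16, so wrap is harmless. */
	return (uint16_t)(used_idx - used_cons_idx);
}

/* Ring slot for a free-running index; nentries must be a power of two. */
static uint16_t
avail_slot(uint16_t avail_idx, uint16_t nentries)
{
	return (uint16_t)(avail_idx & (nentries - 1));
}
```

For example, with used->idx at 3 after a wrap and the consumer index at
65534, five entries are pending; with 256 entries, index 256 maps back to
slot 0.
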

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] [PATCH v3 1/2] doc: add vfio api support
  @ 2018-04-03  8:28  4% ` Hemant Agrawal
  2018-04-03 10:16  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Hemant Agrawal @ 2018-04-03  8:28 UTC (permalink / raw)
  To: dev; +Cc: anatoly.burakov, thomas

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
 doc/api/doxy-api-index.md                | 3 ++-
 doc/api/doxy-api.conf                    | 1 +
 lib/librte_eal/common/include/rte_vfio.h | 5 +++++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index d77f205..12c1ebe 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -179,4 +179,5 @@ The public API headers are grouped by topics:
   [EAL config]         (@ref rte_eal.h),
   [common]             (@ref rte_common.h),
   [ABI compat]         (@ref rte_compat.h),
-  [version]            (@ref rte_version.h)
+  [version]            (@ref rte_version.h),
+  [vfio]               (@ref rte_vfio.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index cda52fd..166612f 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -82,6 +82,7 @@ INPUT                   = doc/api/doxy-api-index.md \
 FILE_PATTERNS           = rte_*.h \
                           cmdline.h
 PREDEFINED              = __DOXYGEN__ \
+			  VFIO_PRESENT \
                           __attribute__(x)=
 
 OPTIMIZE_OUTPUT_FOR_C   = YES
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index 249095e..9b7b983 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -5,6 +5,11 @@
 #ifndef _RTE_VFIO_H_
 #define _RTE_VFIO_H_
 
+/**
+ * @file
+ * RTE VFIO. This library provides various VFIO related utility functions.
+ */
+
 /*
  * determine if VFIO is present on the system
  */
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC PATCH 5/5] test: add few eBPF samples
  @ 2018-04-02 22:26  3%         ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2018-04-02 22:26 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: 'dev@dpdk.org'

-----Original Message-----
> Date: Fri, 30 Mar 2018 17:42:22 +0000
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> To: 'Jerin Jacob' <jerin.jacob@caviumnetworks.com>
> CC: "'dev@dpdk.org'" <dev@dpdk.org>
> Subject: RE: [dpdk-dev] [RFC PATCH 5/5] test: add few eBPF samples
> 
> Hi Jerin,
> > > > Add few simple eBPF programs as an example.
> > > >
> > > > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > diff --git a/test/bpf/mbuf.h b/test/bpf/mbuf.h
> > > > new file mode 100644
> > > > index 000000000..aeef6339d
> > > > --- /dev/null
> > > > +++ b/test/bpf/mbuf.h
> > > > @@ -0,0 +1,556 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > + * Copyright(c) 2010-2014 Intel Corporation.
> > > > + * Copyright 2014 6WIND S.A.
> > > > + */
> > > > +
> > > > +/*
> > > > + * Snippet from dpdk.org rte_mbuf.h.
> > > > + * used to provide BPF programs information about rte_mbuf layout.
> > > > + */
> > > > +
> > > > +#ifndef _MBUF_H_
> > > > +#define _MBUF_H_
> > > > +
> > > > +#include <stdint.h>
> > > > +#include <rte_common.h>
> > > > +#include <rte_memory.h>
> > >
> > > Is it worth keeping a copy of mbuf for standalone purposes?
> > > Since clang is already supported, I think that anyone who needs mbuf
> > > can include the DPDK headers. Just thinking from a maintainability
> > > perspective.
> > 
> > That would be ideal.
> > I made a snippet just to avoid compiler errors for the bpf target.
> > Will try to address it in the next version.
> > 
> 
> I looked at it a bit more and it seems that it wouldn't be as straightforward as I thought.
> There are things not supported by the bpf target (thread-local storage and SIMD-related definitions)
> inside the include chain.
> So to fix it, some changes in our core include files might be needed.
> The simplest way would probably be to move struct rte_mbuf and the related macro definitions into a separate
> file (rte_mbuf_common.h or so).

I think rte_mbuf_common.h should be the way to go. IMO, KNI would also benefit from that.

I guess there is no ABI change if we move the generic stuff to rte_mbuf_common.h.
But if you think it is quite a controversial change, then we could
postpone it to the next release. (My only worry is that, once it is postponed, it
may not happen.) I am fine either way.

> Though it is quite a controversial change, and I think it is better to postpone it to a separate patch and
> probably the next release.
> So for now I left the snippet test/bpf/mbuf.h in place.
> Konstantin

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v9 3/9] eventtimer: add common code
  @ 2018-04-02 19:39  3%       ` Erik Gabriel Carrillo
    1 sibling, 0 replies; 200+ results
From: Erik Gabriel Carrillo @ 2018-04-02 19:39 UTC (permalink / raw)
  To: pbhagavatula; +Cc: dev, jerin.jacob, hemant.agrawal

This commit adds the logic that is shared by all event timer adapter
drivers; the common code handles instance allocation and some
initialization.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 config/common_base                                |   1 +
 drivers/event/sw/sw_evdev.c                       |  18 +
 lib/librte_eventdev/Makefile                      |   2 +
 lib/librte_eventdev/rte_event_timer_adapter.c     | 387 ++++++++++++++++++++++
 lib/librte_eventdev/rte_event_timer_adapter_pmd.h | 114 +++++++
 lib/librte_eventdev/rte_eventdev.c                |  22 ++
 lib/librte_eventdev/rte_eventdev.h                |  20 ++
 lib/librte_eventdev/rte_eventdev_pmd.h            |  35 ++
 lib/librte_eventdev/rte_eventdev_version.map      |  21 +-
 9 files changed, 619 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter.c
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter_pmd.h

diff --git a/config/common_base b/config/common_base
index ee10b44..accc6f5 100644
--- a/config/common_base
+++ b/config/common_base
@@ -550,6 +550,7 @@ CONFIG_RTE_LIBRTE_EVENTDEV=y
 CONFIG_RTE_LIBRTE_EVENTDEV_DEBUG=n
 CONFIG_RTE_EVENT_MAX_DEVS=16
 CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
+CONFIG_RTE_EVENT_TIMER_ADAPTER_NUM_MAX=32
 
 #
 # Compile PMD for skeleton event device
diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 6672fd8..0847547 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -464,6 +464,22 @@ sw_eth_rx_adapter_caps_get(const struct rte_eventdev *dev,
 	return 0;
 }
 
+static int
+sw_timer_adapter_caps_get(const struct rte_eventdev *dev,
+			  uint64_t flags,
+			  uint32_t *caps,
+			  const struct rte_event_timer_adapter_ops **ops)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(flags);
+	*caps = 0;
+
+	/* Use default SW ops */
+	*ops = NULL;
+
+	return 0;
+}
+
 static void
 sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info)
 {
@@ -791,6 +807,8 @@ sw_probe(struct rte_vdev_device *vdev)
 
 			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
 
+			.timer_adapter_caps_get = sw_timer_adapter_caps_get,
+
 			.xstats_get = sw_xstats_get,
 			.xstats_get_names = sw_xstats_get_names,
 			.xstats_get_by_name = sw_xstats_get_by_name,
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 549b182..8b16e3f 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -20,6 +20,7 @@ LDLIBS += -lrte_eal -lrte_ring -lrte_ethdev -lrte_hash
 SRCS-y += rte_eventdev.c
 SRCS-y += rte_event_ring.c
 SRCS-y += rte_event_eth_rx_adapter.c
+SRCS-y += rte_event_timer_adapter.c
 
 # export include files
 SYMLINK-y-include += rte_eventdev.h
@@ -29,6 +30,7 @@ SYMLINK-y-include += rte_eventdev_pmd_vdev.h
 SYMLINK-y-include += rte_event_ring.h
 SYMLINK-y-include += rte_event_eth_rx_adapter.h
 SYMLINK-y-include += rte_event_timer_adapter.h
+SYMLINK-y-include += rte_event_timer_adapter_pmd.h
 
 # versioning export map
 EXPORT_MAP := rte_eventdev_version.map
diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
new file mode 100644
index 0000000..75a14ac
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -0,0 +1,387 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#include <string.h>
+#include <inttypes.h>
+
+#include <rte_memzone.h>
+#include <rte_memory.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+
+#include "rte_eventdev.h"
+#include "rte_eventdev_pmd.h"
+#include "rte_event_timer_adapter.h"
+#include "rte_event_timer_adapter_pmd.h"
+
+#define DATA_MZ_NAME_MAX_LEN 64
+#define DATA_MZ_NAME_FORMAT "rte_event_timer_adapter_data_%d"
+
+static int evtim_logtype;
+
+static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
+
+#define EVTIM_LOG(level, logtype, ...) \
+	rte_log(RTE_LOG_ ## level, logtype, \
+		RTE_FMT("EVTIMER: %s() line %u: " RTE_FMT_HEAD(__VA_ARGS__,) \
+			"\n", __func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__,)))
+
+#define EVTIM_LOG_ERR(...) EVTIM_LOG(ERR, evtim_logtype, __VA_ARGS__)
+
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+#define EVTIM_LOG_DBG(...) \
+	EVTIM_LOG(DEBUG, evtim_logtype, __VA_ARGS__)
+#else
+#define EVTIM_LOG_DBG(...) (void)0
+#endif
+
+static int
+default_port_conf_cb(uint16_t id, uint8_t event_dev_id, uint8_t *event_port_id,
+		     void *conf_arg)
+{
+	struct rte_event_timer_adapter *adapter;
+	struct rte_eventdev *dev;
+	struct rte_event_dev_config dev_conf;
+	struct rte_event_port_conf *port_conf, def_port_conf = {0};
+	int started;
+	uint8_t port_id;
+	uint8_t dev_id;
+	int ret;
+
+	RTE_SET_USED(event_dev_id);
+
+	adapter = &adapters[id];
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+	dev_id = dev->data->dev_id;
+	dev_conf = dev->data->dev_conf;
+
+	started = dev->data->dev_started;
+	if (started)
+		rte_event_dev_stop(dev_id);
+
+	port_id = dev_conf.nb_event_ports;
+	dev_conf.nb_event_ports += 1;
+	ret = rte_event_dev_configure(dev_id, &dev_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to configure event dev %u", dev_id);
+		if (started)
+			if (rte_event_dev_start(dev_id))
+				return -EIO;
+
+		return ret;
+	}
+
+	if (conf_arg != NULL)
+		port_conf = conf_arg;
+	else {
+		port_conf = &def_port_conf;
+		ret = rte_event_port_default_conf_get(dev_id, port_id,
+						      port_conf);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = rte_event_port_setup(dev_id, port_id, port_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to setup event port %u on event dev %u",
+			      port_id, dev_id);
+		return ret;
+	}
+
+	*event_port_id = port_id;
+
+	if (started)
+		ret = rte_event_dev_start(dev_id);
+
+	return ret;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create(const struct rte_event_timer_adapter_conf *conf)
+{
+	return rte_event_timer_adapter_create_ext(conf, default_port_conf_cb,
+						  NULL);
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create_ext(
+		const struct rte_event_timer_adapter_conf *conf,
+		rte_event_timer_adapter_port_conf_cb_t conf_cb,
+		void *conf_arg)
+{
+	uint16_t adapter_id;
+	struct rte_event_timer_adapter *adapter;
+	const struct rte_memzone *mz;
+	char mz_name[DATA_MZ_NAME_MAX_LEN];
+	int n, ret;
+	struct rte_eventdev *dev;
+
+	if (conf == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check eventdev ID */
+	if (!rte_event_pmd_is_valid_dev(conf->event_dev_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	dev = &rte_eventdevs[conf->event_dev_id];
+
+	adapter_id = conf->timer_adapter_id;
+
+	/* Check that adapter_id is in range */
+	if (adapter_id >= RTE_EVENT_TIMER_ADAPTER_NUM_MAX) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check adapter ID not already allocated */
+	adapter = &adapters[adapter_id];
+	if (adapter->allocated) {
+		rte_errno = EEXIST;
+		return NULL;
+	}
+
+	/* Create shared data area. */
+	n = snprintf(mz_name, sizeof(mz_name), DATA_MZ_NAME_FORMAT, adapter_id);
+	if (n >= (int)sizeof(mz_name)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	mz = rte_memzone_reserve(mz_name,
+				 sizeof(struct rte_event_timer_adapter_data),
+				 conf->socket_id, 0);
+	if (mz == NULL)
+		/* rte_errno set by rte_memzone_reserve */
+		return NULL;
+
+	adapter->data = mz->addr;
+	memset(adapter->data, 0, sizeof(struct rte_event_timer_adapter_data));
+
+	adapter->data->mz = mz;
+	adapter->data->event_dev_id = conf->event_dev_id;
+	adapter->data->id = adapter_id;
+	adapter->data->socket_id = conf->socket_id;
+	adapter->data->conf = *conf;  /* copy conf structure */
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = -ret;
+		goto free_memzone;
+	}
+
+	if (!(adapter->data->caps &
+	      RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT)) {
+		FUNC_PTR_OR_NULL_RET_WITH_ERRNO(conf_cb, -EINVAL);
+		ret = conf_cb(adapter->data->id, adapter->data->event_dev_id,
+			      &adapter->data->event_port_id, conf_arg);
+		if (ret < 0) {
+			rte_errno = -ret;
+			goto free_memzone;
+		}
+	}
+
+	/* Allow driver to do some setup */
+	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
+	ret = adapter->ops->init(adapter);
+	if (ret < 0) {
+		rte_errno = -ret;
+		goto free_memzone;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+
+free_memzone:
+	rte_memzone_free(adapter->data->mz);
+	return NULL;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->ops->get_info)
+		/* let driver set values it knows */
+		adapter->ops->get_info(adapter, adapter_info);
+
+	/* Set common values */
+	adapter_info->conf = adapter->data->conf;
+	adapter_info->event_dev_port_id = adapter->data->event_port_id;
+	adapter_info->caps = adapter->data->caps;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->start, -EINVAL);
+
+	ret = adapter->ops->start(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 1;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stop, -EINVAL);
+
+	if (adapter->data->started == 0) {
+		EVTIM_LOG_ERR("event timer adapter %"PRIu8" already stopped",
+			      adapter->data->id);
+		return 0;
+	}
+
+	ret = adapter->ops->stop(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 0;
+
+	return 0;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_lookup(uint16_t adapter_id)
+{
+	char name[DATA_MZ_NAME_MAX_LEN];
+	const struct rte_memzone *mz;
+	struct rte_event_timer_adapter_data *data;
+	struct rte_event_timer_adapter *adapter;
+	int ret;
+	struct rte_eventdev *dev;
+
+	if (adapters[adapter_id].allocated)
+		return &adapters[adapter_id]; /* Adapter is already loaded */
+
+	snprintf(name, DATA_MZ_NAME_MAX_LEN, DATA_MZ_NAME_FORMAT, adapter_id);
+	mz = rte_memzone_lookup(name);
+	if (mz == NULL) {
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	data = mz->addr;
+
+	adapter = &adapters[data->id];
+	adapter->data = data;
+
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_free(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->uninit, -EINVAL);
+
+	if (adapter->data->started == 1) {
+		EVTIM_LOG_ERR("event timer adapter %"PRIu8" must be stopped "
+			      "before freeing", adapter->data->id);
+		return -EBUSY;
+	}
+
+	/* free impl priv data */
+	ret = adapter->ops->uninit(adapter);
+	if (ret < 0)
+		return ret;
+
+	/* free shared data area */
+	ret = rte_memzone_free(adapter->data->mz);
+	if (ret < 0)
+		return ret;
+
+	adapter->data = NULL;
+	adapter->allocated = 0;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_service_id_get(struct rte_event_timer_adapter *adapter,
+				       uint32_t *service_id)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->data->service_inited && service_id != NULL)
+		*service_id = adapter->data->service_id;
+
+	return adapter->data->service_inited ? 0 : -ESRCH;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_get(struct rte_event_timer_adapter *adapter,
+				  struct rte_event_timer_adapter_stats *stats)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_get, -EINVAL);
+	if (stats == NULL)
+		return -EINVAL;
+
+	return adapter->ops->stats_get(adapter, stats);
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_reset, -EINVAL);
+	return adapter->ops->stats_reset(adapter);
+}
+
+RTE_INIT(event_timer_adapter_init_log);
+static void
+event_timer_adapter_init_log(void)
+{
+	evtim_logtype = rte_log_register("lib.eventdev.adapter.timer");
+	if (evtim_logtype >= 0)
+		rte_log_set_level(evtim_logtype, RTE_LOG_NOTICE);
+}
diff --git a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
new file mode 100644
index 0000000..cf3509d
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#ifndef __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+#define __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+
+/**
+ * @file
+ * RTE Event Timer Adapter API (PMD Side)
+ *
+ * @note
+ * This file provides implementation helpers for internal use by PMDs.  They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_event_timer_adapter.h"
+
+/*
+ * Definitions of functions exported by an event timer adapter implementation
+ * through *rte_event_timer_adapter_ops* structure supplied in the
+ * *rte_event_timer_adapter* structure associated with an event timer adapter.
+ */
+
+typedef int (*rte_event_timer_adapter_init_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation setup */
+typedef int (*rte_event_timer_adapter_uninit_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation teardown */
+typedef int (*rte_event_timer_adapter_start_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Start running event timer adapter */
+typedef int (*rte_event_timer_adapter_stop_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Stop running event timer adapter */
+typedef void (*rte_event_timer_adapter_get_info_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info);
+/**< @internal Get contextual information for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_get_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats);
+/**< @internal Get statistics for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_reset_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Reset statistics for event timer adapter */
+
+/**
+ * @internal Structure containing the functions exported by an event timer
+ * adapter implementation.
+ */
+struct rte_event_timer_adapter_ops {
+	rte_event_timer_adapter_init_t		init;  /**< Set up adapter */
+	rte_event_timer_adapter_uninit_t	uninit;/**< Tear down adapter */
+	rte_event_timer_adapter_start_t		start; /**< Start adapter */
+	rte_event_timer_adapter_stop_t		stop;  /**< Stop adapter */
+	rte_event_timer_adapter_get_info_t	get_info;
+	/**< Get info from driver */
+	rte_event_timer_adapter_stats_get_t	stats_get;
+	/**< Get adapter statistics */
+	rte_event_timer_adapter_stats_reset_t	stats_reset;
+	/**< Reset adapter statistics */
+	rte_event_timer_arm_burst_t		arm_burst;
+	/**< Arm one or more event timers */
+	rte_event_timer_arm_tmo_tick_burst_t	arm_tmo_tick_burst;
+	/**< Arm event timers with same expiration time */
+	rte_event_timer_cancel_burst_t		cancel_burst;
+	/**< Cancel one or more event timers */
+};
+
+/**
+ * @internal Adapter data; structure to be placed in shared memory to be
+ * accessible by various processes in a multi-process configuration.
+ */
+struct rte_event_timer_adapter_data {
+	uint8_t id;
+	/**< Event timer adapter ID */
+	uint8_t event_dev_id;
+	/**< Event device ID */
+	uint32_t socket_id;
+	/**< Socket ID where memory is allocated */
+	uint8_t event_port_id;
+	/**< Optional: event port ID used when the inbuilt port is absent */
+	const struct rte_memzone *mz;
+	/**< Event timer adapter memzone pointer */
+	struct rte_event_timer_adapter_conf conf;
+	/**< Configuration used to configure the adapter. */
+	uint32_t caps;
+	/**< Adapter capabilities */
+	void *adapter_priv;
+	/**< Timer adapter private data */
+	uint8_t service_inited;
+	/**< Service initialization state */
+	uint32_t service_id;
+	/**< Service ID */
+
+	RTE_STD_C11
+	uint8_t started : 1;
+	/**< Flag to indicate adapter started. */
+} __rte_cache_aligned;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __RTE_EVENT_TIMER_ADAPTER_PMD_H__ */
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 851a119..eb3c601 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -123,6 +123,28 @@ rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
 				: 0;
 }
 
+int __rte_experimental
+rte_event_timer_adapter_caps_get(uint8_t dev_id, uint32_t *caps)
+{
+	struct rte_eventdev *dev;
+	const struct rte_event_timer_adapter_ops *ops;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	dev = &rte_eventdevs[dev_id];
+
+	if (caps == NULL)
+		return -EINVAL;
+	*caps = 0;
+
+	return dev->dev_ops->timer_adapter_caps_get ?
+				(*dev->dev_ops->timer_adapter_caps_get)(dev,
+									0,
+									caps,
+									&ops)
+				: 0;
+}
+
 static inline int
 rte_event_dev_queue_config(struct rte_eventdev *dev, uint8_t nb_queues)
 {
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index 297a93d..5c4032c 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -215,6 +215,7 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_memory.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 
 struct rte_mbuf; /* we just use mbuf pointers; no need to include rte_mbuf.h */
 
@@ -1069,6 +1070,25 @@ int
 rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
 				uint32_t *caps);
 
+#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
+/**< This flag is set when the timer mechanism is in HW. */
+
+/**
+ * Retrieve the event device's timer adapter capabilities.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param[out] caps
+ *   A pointer to memory to be filled with event timer adapter capabilities.
+ *
+ * @return
+ *   - 0: Success, driver provided event timer adapter capabilities.
+ *   - <0: Error code returned by the driver function.
+ */
+int __rte_experimental
+rte_event_timer_adapter_caps_get(uint8_t dev_id, uint32_t *caps);
+
 struct rte_eventdev_driver;
 struct rte_eventdev_ops;
 struct rte_eventdev;
diff --git a/lib/librte_eventdev/rte_eventdev_pmd.h b/lib/librte_eventdev/rte_eventdev_pmd.h
index 31343b5..0e37f1c 100644
--- a/lib/librte_eventdev/rte_eventdev_pmd.h
+++ b/lib/librte_eventdev/rte_eventdev_pmd.h
@@ -26,6 +26,7 @@ extern "C" {
 #include <rte_malloc.h>
 
 #include "rte_eventdev.h"
+#include "rte_event_timer_adapter_pmd.h"
 
 /* Logging Macros */
 #define RTE_EDEV_LOG_ERR(...) \
@@ -449,6 +450,37 @@ typedef int (*eventdev_eth_rx_adapter_caps_get_t)
 struct rte_event_eth_rx_adapter_queue_conf *queue_conf;
 
 /**
+ * Retrieve the event device's timer adapter capabilities, as well as the ops
+ * structure that an event timer adapter should call through to enter the
+ * driver
+ *
+ * @param dev
+ *   Event device pointer
+ *
+ * @param flags
+ *   Flags that can be used to determine how to select an event timer
+ *   adapter ops structure
+ *
+ * @param[out] caps
+ *   A pointer to memory filled with event timer adapter capabilities.
+ *
+ * @param[out] ops
+ *   A pointer to the ops pointer to set with the address of the desired ops
+ *   structure
+ *
+ * @return
+ *   - 0: Success, driver provides event timer adapter capabilities for the
+ *	event device.
+ *   - <0: Error code returned by the driver function.
+ *
+ */
+typedef int (*eventdev_timer_adapter_caps_get_t)(
+				const struct rte_eventdev *dev,
+				uint64_t flags,
+				uint32_t *caps,
+				const struct rte_event_timer_adapter_ops **ops);
+
+/**
  * Add ethernet Rx queues to event device. This callback is invoked if
  * the caps returned from rte_eventdev_eth_rx_adapter_caps_get(, eth_port_id)
  * has RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT set.
@@ -640,6 +672,9 @@ struct rte_eventdev_ops {
 	eventdev_eth_rx_adapter_stats_reset eth_rx_adapter_stats_reset;
 	/**< Reset ethernet Rx stats */
 
+	eventdev_timer_adapter_caps_get_t timer_adapter_caps_get;
+	/**< Get timer adapter capabilities */
+
 	eventdev_selftest dev_selftest;
 	/**< Start eventdev Selftest */
 };
diff --git a/lib/librte_eventdev/rte_eventdev_version.map b/lib/librte_eventdev/rte_eventdev_version.map
index 2aef470..537afb8 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventdev/rte_eventdev_version.map
@@ -66,7 +66,6 @@ DPDK_17.11 {
 	rte_event_eth_rx_adapter_stats_get;
 	rte_event_eth_rx_adapter_stats_reset;
 	rte_event_eth_rx_adapter_stop;
-
 } DPDK_17.08;
 
 DPDK_18.02 {
@@ -74,3 +73,23 @@ DPDK_18.02 {
 
 	rte_event_dev_selftest;
 } DPDK_17.11;
+
+EXPERIMENTAL {
+	global:
+
+	rte_event_timer_adapter_caps_get;
+	rte_event_timer_adapter_create;
+	rte_event_timer_adapter_create_ext;
+	rte_event_timer_adapter_free;
+	rte_event_timer_adapter_get_info;
+	rte_event_timer_adapter_lookup;
+	rte_event_timer_adapter_service_id_get;
+	rte_event_timer_adapter_start;
+	rte_event_timer_adapter_stats_get;
+	rte_event_timer_adapter_stats_reset;
+	rte_event_timer_adapter_stop;
+	rte_event_timer_init;
+	rte_event_timer_arm_burst;
+	rte_event_timer_arm_tmo_tick_burst;
+	rte_event_timer_cancel_burst;
+} DPDK_18.02;
-- 
2.6.4
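
The create and lookup paths in the patch above both derive the shared memzone name from the adapter ID, so the two must format it identically. A minimal standalone sketch of that bounded formatting follows. Only the format string is taken from the patch; `format_mz_name` is a hypothetical helper name used here for illustration, not something the patch defines.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format string as it appears in the patch. */
#define DATA_MZ_NAME_FORMAT "rte_event_timer_adapter_data_%d"

/*
 * Hypothetical helper: write the per-adapter memzone name into buf.
 * Returns 0 on success, -1 if formatting failed or the name would not
 * fit in len bytes (including the terminating NUL).
 */
static int
format_mz_name(char *buf, size_t len, uint16_t adapter_id)
{
	/* adapter_id is promoted to int, which matches %d. */
	int n = snprintf(buf, len, DATA_MZ_NAME_FORMAT, adapter_id);

	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

Routing both paths through one helper like this would keep the name and the truncation check in a single place; whether that is worth it for two call sites is a judgment call.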


* Re: [dpdk-dev] [PATCH v4 2/5] vhost: support selective datapath
  2018-03-31  6:10  3%     ` Maxime Coquelin
@ 2018-04-02  1:58  0%       ` Wang, Zhihong
  0 siblings, 0 replies; 200+ results
From: Wang, Zhihong @ 2018-04-02  1:58 UTC (permalink / raw)
  To: Maxime Coquelin, dev
  Cc: Tan, Jianfeng, Bie, Tiwei, yliu, Liang, Cunming, Wang, Xiao W, Daly, Dan



> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Saturday, March 31, 2018 2:10 PM
> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>; yliu@fridaylinux.org; Liang, Cunming
> <cunming.liang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>; Daly,
> Dan <dan.daly@intel.com>
> Subject: Re: [PATCH v4 2/5] vhost: support selective datapath
> 
> 
> 
> On 03/10/2018 11:01 AM, Zhihong Wang wrote:
> > This patch set introduces support for selective datapath in DPDK vhost-user
> > lib. vDPA stands for vhost Data Path Acceleration. The idea is to support
> > virtio ring compatible devices to serve virtio driver directly to enable
> > datapath acceleration.
> >
> > A set of device ops is defined for device specific operations:
> >
> >       a. queue_num_get: Called to get supported queue number of the device.
> >
> >       b. feature_get: Called to get supported features of the device.
> >
> >       c. protocol_feature_get: Called to get supported protocol features of
> >          the device.
> >
> >       d. dev_conf: Called to configure the actual device when the virtio
> >          device becomes ready.
> >
> >       e. dev_close: Called to close the actual device when the virtio device
> >          is stopped.
> >
> >       f. vring_state_set: Called to change the state of the vring in the
> >          actual device when vring state changes.
> >
> >       g. feature_set: Called to set the negotiated features to device.
> >
> >       h. migration_done: Called to allow the device to respond to RARP
> >          sending.
> >
> >       i. get_vfio_group_fd: Called to get the VFIO group fd of the device.
> >
> >       j. get_vfio_device_fd: Called to get the VFIO device fd of the device.
> >
> >       k. get_notify_area: Called to get the notify area info of the queue.
> >
> > Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> > ---
> > Changes in v4:
> >
> >   1. Remove the "engine" concept in the lib.
> >
> > ---
> > Changes in v2:
> >
> >   1. Add VFIO related vDPA device ops.
> >
> >   lib/librte_vhost/Makefile              |  4 +-
> >   lib/librte_vhost/rte_vdpa.h            | 94 +++++++++++++++++++++++++++++++++
> >   lib/librte_vhost/rte_vhost_version.map |  6 +++
> >   lib/librte_vhost/vdpa.c                | 96 ++++++++++++++++++++++++++++++++++
> >   4 files changed, 198 insertions(+), 2 deletions(-)
> >   create mode 100644 lib/librte_vhost/rte_vdpa.h
> >   create mode 100644 lib/librte_vhost/vdpa.c
> >
> > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > index 5d6c6abae..37044ac03 100644
> > --- a/lib/librte_vhost/Makefile
> > +++ b/lib/librte_vhost/Makefile
> > @@ -22,9 +22,9 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net
> >
> >   # all source are stored in SRCS-y
> >   SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
> > -					vhost_user.c virtio_net.c
> > +					vhost_user.c virtio_net.c vdpa.c
> >
> >   # install includes
> > -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
> >
> >   include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
> > new file mode 100644
> > index 000000000..a4bbbd93d
> > --- /dev/null
> > +++ b/lib/librte_vhost/rte_vdpa.h
> > @@ -0,0 +1,94 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Intel Corporation
> > + */
> > +
> > +#ifndef _RTE_VDPA_H_
> > +#define _RTE_VDPA_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * Device specific vhost lib
> > + */
> > +
> > +#include <rte_pci.h>
> > +#include "rte_vhost.h"
> > +
> > +#define MAX_VDPA_NAME_LEN 128
> > +
> > +enum vdpa_addr_type {
> > +	PCI_ADDR,
> > +	VDPA_ADDR_MAX
> > +};
> > +
> > +struct rte_vdpa_dev_addr {
> > +	enum vdpa_addr_type type;
> > +	union {
> > +		uint8_t __dummy[64];
> > +		struct rte_pci_addr pci_addr;
> > +	};
> > +};
> > +
> > +/* Get capabilities of this device */
> > +typedef int (*vdpa_dev_queue_num_get_t)(int did, uint32_t *queue_num);
> > +typedef int (*vdpa_dev_feature_get_t)(int did, uint64_t *features);
> > +
> > +/* Driver configure/close the device */
> > +typedef int (*vdpa_dev_conf_t)(int vid);
> > +typedef int (*vdpa_dev_close_t)(int vid);
> > +
> > +/* Enable/disable this vring */
> > +typedef int (*vdpa_vring_state_set_t)(int vid, int vring, int state);
> > +
> > +/* Set features when changed */
> > +typedef int (*vdpa_feature_set_t)(int vid);
> > +
> > +/* Destination operations when migration done */
> > +typedef int (*vdpa_migration_done_t)(int vid);
> > +
> > +/* Get the vfio group fd */
> > +typedef int (*vdpa_get_vfio_group_fd_t)(int vid);
> > +
> > +/* Get the vfio device fd */
> > +typedef int (*vdpa_get_vfio_device_fd_t)(int vid);
> > +
> > +/* Get the notify area info of the queue */
> > +typedef int (*vdpa_get_notify_area_t)(int vid, int qid, uint64_t *offset,
> > +		uint64_t *size);
> > +/* Device ops */
> > +struct rte_vdpa_dev_ops {
> > +	vdpa_dev_queue_num_get_t  queue_num_get;
> > +	vdpa_dev_feature_get_t    feature_get;
> > +	vdpa_dev_feature_get_t    protocol_feature_get;
> > +	vdpa_dev_conf_t           dev_conf;
> > +	vdpa_dev_close_t          dev_close;
> > +	vdpa_vring_state_set_t    vring_state_set;
> > +	vdpa_feature_set_t        feature_set;
> > +	vdpa_migration_done_t     migration_done;
> > +	vdpa_get_vfio_group_fd_t  get_vfio_group_fd;
> > +	vdpa_get_vfio_device_fd_t get_vfio_device_fd;
> > +	vdpa_get_notify_area_t    get_notify_area;
> 
> Maybe you could reserve some room here to avoid breaking the ABI in the
> future if we need to add some optional ops.

Good suggestion.
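
For illustration only, one way to reserve room could look like the sketch below. It is not taken from the patch: every `demo_` name is hypothetical, and the number of reserved slots is an arbitrary assumption.

```c
#include <stddef.h>

/* Hypothetical op signature standing in for the vdpa_* typedefs. */
typedef int (*demo_op_t)(int id);

#define DEMO_OPS_RESERVED 4 /* arbitrary; an assumption for this sketch */

struct demo_dev_ops {
	demo_op_t dev_conf;
	demo_op_t dev_close;
	/*
	 * Unused slots, left NULL today. A future optional op can take
	 * over one of them without growing the structure.
	 */
	void (*reserved[DEMO_OPS_RESERVED])(void);
};

/* Example driver callback used to populate the table. */
static int
demo_conf(int id)
{
	return id + 100;
}

/* Call an optional op only if the driver filled it in. */
static int
demo_call_optional(demo_op_t op, int id)
{
	return op != NULL ? op(id) : 0;
}
```

Drivers that use designated initializers leave the reserved slots zeroed, so the library can treat a NULL slot as "op not provided".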

> 
> > +};
> > +
> > +struct rte_vdpa_device {
> > +	struct rte_vdpa_dev_addr addr;
> > +	struct rte_vdpa_dev_ops *ops;
> > +} __rte_cache_aligned;
> > +
> > +extern struct rte_vdpa_device *vdpa_devices[];
> > +extern uint32_t vdpa_device_num;
> > +
> > +/* Register a vdpa device, return did if successful, -1 on failure */
> > +int __rte_experimental
> > +rte_vdpa_register_device(struct rte_vdpa_dev_addr *addr,
> > +		struct rte_vdpa_dev_ops *ops);
> > +
> > +/* Unregister a vdpa device, return -1 on failure */
> > +int __rte_experimental
> > +rte_vdpa_unregister_device(int did);
> > +
> > +/* Find did of a vdpa device, return -1 on failure */
> > +int __rte_experimental
> > +rte_vdpa_find_device_id(struct rte_vdpa_dev_addr *addr);
> > +
> > +#endif /* _RTE_VDPA_H_ */
> > diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
> > index df0103129..7bcffb490 100644
> > --- a/lib/librte_vhost/rte_vhost_version.map
> > +++ b/lib/librte_vhost/rte_vhost_version.map
> > @@ -59,3 +59,9 @@ DPDK_18.02 {
> >   	rte_vhost_vring_call;
> >
> >   } DPDK_17.08;
> > +
> > +EXPERIMENTAL {
> > +	rte_vdpa_register_device;
> > +	rte_vdpa_unregister_device;
> > +	rte_vdpa_find_device_id;
> 
> I think you need also to declare the new structs here,
> not only the new functions.

Ok.

> 
> > +} DPDK_18.02;
> > diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
> > new file mode 100644
> > index 000000000..0c950d45f
> > --- /dev/null
> > +++ b/lib/librte_vhost/vdpa.c
> > @@ -0,0 +1,96 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Intel Corporation
> > + */
> > +
> > +/**
> > + * @file
> > + *
> > + * Device specific vhost lib
> > + */
> > +
> > +#include <stdbool.h>
> > +
> > +#include <rte_malloc.h>
> > +#include "rte_vdpa.h"
> > +#include "vhost.h"
> > +
> > +struct rte_vdpa_device *vdpa_devices[MAX_VHOST_DEVICE];
> > +uint32_t vdpa_device_num;
> > +
> > +static int is_same_vdpa_dev_addr(struct rte_vdpa_dev_addr *a,
> > +		struct rte_vdpa_dev_addr *b)
> > +{
> 
> Given the boolean nature of the function name, I would return 1 if same
> device, 0 if different.

Ok, will use bool.
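
A minimal sketch of what the bool-returning comparison might look like, assuming stand-in types: the `demo_` structs below only mirror the fields the quoted patch compares and are not the real `rte_pci_addr` or `rte_vdpa_dev_addr` definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct rte_pci_addr; field names follow the quoted patch. */
struct demo_pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

enum demo_addr_type { DEMO_PCI_ADDR, DEMO_ADDR_MAX };

struct demo_dev_addr {
	enum demo_addr_type type;
	struct demo_pci_addr pci_addr;
};

/* Return true if both addresses name the same device, false otherwise. */
static bool
demo_is_same_dev_addr(const struct demo_dev_addr *a,
		      const struct demo_dev_addr *b)
{
	if (a->type != b->type)
		return false;

	switch (a->type) {
	case DEMO_PCI_ADDR:
		return a->pci_addr.domain == b->pci_addr.domain &&
		       a->pci_addr.bus == b->pci_addr.bus &&
		       a->pci_addr.devid == b->pci_addr.devid &&
		       a->pci_addr.function == b->pci_addr.function;
	default:
		/* Design choice in this sketch: unknown types never match. */
		return false;
	}
}
```

Note that the quoted version treats an unknown address type as a match (it returns 0); the sketch deliberately takes the opposite default. With this shape, a caller such as the find function would test the result directly instead of comparing against 0.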

> 
> > +	int ret = 0;
> > +
> > +	if (a->type != b->type)
> > +		return -1;
> > +
> > +	switch (a->type) {
> > +	case PCI_ADDR:
> > +		if (a->pci_addr.domain != b->pci_addr.domain ||
> > +				a->pci_addr.bus != b->pci_addr.bus ||
> > +				a->pci_addr.devid != b->pci_addr.devid ||
> > +				a->pci_addr.function != b->pci_addr.function)
> > +			ret = -1;
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +int rte_vdpa_register_device(struct rte_vdpa_dev_addr *addr,
> > +		struct rte_vdpa_dev_ops *ops)
> > +{
> > +	struct rte_vdpa_device *dev;
> > +	char device_name[MAX_VDPA_NAME_LEN];
> > +	int i;
> > +
> > +	if (vdpa_device_num >= MAX_VHOST_DEVICE)
> > +		return -1;
> > +
> > +	for (i = 0; i < MAX_VHOST_DEVICE; i++) {
> > +		if (vdpa_devices[i] == NULL)
> > +			break;
> You might want to check that the same device isn't being registered a second
> time, and return an error in that case.

Will do.

Thanks
-Zhihong
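
For illustration only, a duplicate-registration check could be folded into the scan for a free slot, roughly as sketched below. The table, the integer address, and all `demo_` names are hypothetical simplifications; the real code compares `rte_vdpa_dev_addr` values and allocates with `rte_zmalloc`.

```c
#include <stddef.h>

#define DEMO_MAX_DEVICES 4 /* stand-in for MAX_VHOST_DEVICE */

struct demo_device {
	int addr; /* simplified address; the patch uses a struct */
};

static struct demo_device *demo_devices[DEMO_MAX_DEVICES];
/* Static backing storage keeps the sketch free of allocation. */
static struct demo_device demo_storage[DEMO_MAX_DEVICES];

/*
 * Register addr. Returns the slot index on success, or -1 if the table
 * is full or the same address is already registered.
 */
static int
demo_register(int addr)
{
	int free_slot = -1;
	int i;

	for (i = 0; i < DEMO_MAX_DEVICES; i++) {
		if (demo_devices[i] == NULL) {
			/* Remember the first free slot, keep scanning. */
			if (free_slot < 0)
				free_slot = i;
		} else if (demo_devices[i]->addr == addr) {
			return -1; /* duplicate registration */
		}
	}

	if (free_slot < 0)
		return -1; /* table full */

	demo_storage[free_slot].addr = addr;
	demo_devices[free_slot] = &demo_storage[free_slot];

	return free_slot;
}
```

The scan has to cover the whole table rather than stop at the first free slot, since an earlier unregister can leave a hole in front of an existing entry.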

> 
> This is not a blocker though, and can be done in a dedicated patch.
> 
> > +	}
> > +
> > +	sprintf(device_name, "vdpa-dev-%d", i);
> > +	dev = rte_zmalloc(device_name, sizeof(struct rte_vdpa_device),
> > +			RTE_CACHE_LINE_SIZE);
> > +	if (!dev)
> > +		return -1;
> > +
> > +	memcpy(&dev->addr, addr, sizeof(struct rte_vdpa_dev_addr));
> > +	dev->ops = ops;
> > +	vdpa_devices[i] = dev;
> > +	vdpa_device_num++;
> > +
> > +	return i;
> > +}
> > +
> > +int rte_vdpa_unregister_device(int did)
> > +{
> > +	if (did < 0 || did >= MAX_VHOST_DEVICE || vdpa_devices[did] == NULL)
> > +		return -1;
> > +
> > +	rte_free(vdpa_devices[did]);
> > +	vdpa_devices[did] = NULL;
> > +	vdpa_device_num--;
> > +
> > +	return did;
> > +}
> > +
> > +int rte_vdpa_find_device_id(struct rte_vdpa_dev_addr *addr)
> > +{
> > +	struct rte_vdpa_device *dev;
> > +	int i;
> > +
> > +	for (i = 0; i < MAX_VHOST_DEVICE; ++i) {
> > +		dev = vdpa_devices[i];
> > +		if (dev && is_same_vdpa_dev_addr(&dev->addr, addr) == 0)
> > +			return i;
> > +	}
> > +
> > +	return -1;
> > +}
> >


* Re: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external backend support
  2018-03-29 13:47  3%     ` Wodkowski, PawelX
@ 2018-04-01 19:53  0%       ` Zhang, Roy Fan
  2018-04-03 13:44  0%         ` Maxime Coquelin
  0 siblings, 1 reply; 200+ results
From: Zhang, Roy Fan @ 2018-04-01 19:53 UTC (permalink / raw)
  To: Wodkowski, PawelX, dev; +Cc: maxime.coquelin, jianjay.zhou, Tan, Jianfeng

Hi Pawel,

> -----Original Message-----
> From: Wodkowski, PawelX
> Sent: Thursday, March 29, 2018 2:48 PM
> To: Zhang, Roy Fan <roy.fan.zhang@intel.com>; dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
> backend support
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> > Sent: Thursday, March 29, 2018 2:53 PM
> > To: dev@dpdk.org
> > Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan, Jianfeng
> > <jianfeng.tan@intel.com>
> > Subject: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external
> > backend support
> >
> > This patch adds external backend support to vhost library. The patch
> > provides new APIs for the external backend to register pre and post
> > vhost-user message handlers.
> >
> > Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> > ---
> >  lib/librte_vhost/rte_vhost.h           | 64 +++++++++++++++++++++++++++++++++-
> >  lib/librte_vhost/rte_vhost_version.map |  6 ++++
> >  lib/librte_vhost/vhost.c               | 17 ++++++++-
> >  lib/librte_vhost/vhost.h               |  8 +++--
> >  lib/librte_vhost/vhost_user.c          | 33 +++++++++++++++++-
> >  5 files changed, 123 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> > index d332069..b902c44 100644
> > --- a/lib/librte_vhost/rte_vhost.h
> > +++ b/lib/librte_vhost/rte_vhost.h
> > @@ -1,5 +1,5 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright(c) 2010-2017 Intel Corporation
> > + * Copyright(c) 2010-2018 Intel Corporation
> >   */
> >
> >  #ifndef _RTE_VHOST_H_
> > @@ -88,6 +88,55 @@ struct vhost_device_ops {  };
> >
> >  /**
> > + * function prototype for the vhost backend to handler specific vhost
> > + user
> > + * messages prior to the master message handling
> > + *
> > + * @param vid
> > + *  vhost device id
> > + * @param msg
> > + *  Message pointer.
> > + * @param payload
> > + *  Message payload.
> 
> No payload parameter.
Sorry about that. I will fix the comment.

> 
> > + * @param require_reply
> > + *  If the handler requires sending a reply, this varaible shall be
> > + written 1,
> > + *  otherwise 0.
> > + * @param skip_master
> > + *  If the handler requires skipping the master message handling,
> > + this
> > variable
> > + *  shall be written 1, otherwise 0.
> > + * @return
> > + *  0 on success, -1 on failure
> > + */
> > +typedef int (*rte_vhost_msg_pre_handle)(int vid, void *msg,
> > +		uint32_t *require_reply, uint32_t *skip_master);
> > +
> > +/**
> > + * function prototype for the vhost backend to handler specific vhost
> > +user
> > + * messages after the master message handling is done
> > + *
> > + * @param vid
> > + *  vhost device id
> > + * @param msg
> > + *  Message pointer.
> > + * @param payload
> > + *  Message payload.
> 
> No payload parameter :)
> 

Same here

> > + * @param require_reply
> > + *  If the handler requires sending a reply, this varaible shall be
> > +written 1,
> > + *  otherwise 0.
> > + * @return
> > + *  0 on success, -1 on failure
> > + */
> > +typedef int (*rte_vhost_msg_post_handle)(int vid, void *msg,
> > +		uint32_t *require_reply);
> > +
> 
> What mean 'Message pointer' Is this const for us? Is this payload? Making
> msg 'void *' is not a way to go here. Those pre and post handlers need to see
> exactly the same structures like vhost_user.c file. Otherwise we can get into
> troubles when ABI changes.

It is the pointer to the vhost_user message. It cannot be const as the backend
may change the payload. 

> 
> Also you can easily merge pre and post handlers into one handler with one
> Parameter describing what phase of message processing we are now.
> 

No, I don't think so. Merging them would make the code unclear in the future,
as one function would be doing two totally different things.

> > +/**
> > + * pre and post vhost user message handlers  */ struct
> > +vhost_user_extern_ops {
> > +	rte_vhost_msg_pre_handle pre_msg_handle;
> > +	rte_vhost_msg_post_handle post_msg_handle; };
> > +
> > +/**
> >   * Convert guest physical address to host virtual address
> >   *
> >   * @param mem
> > @@ -434,6 +483,19 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx);
> >   */
> >  uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
> >
> > +/**
> > + * register external vhost backend
> > + *
> > + * @param vid
> > + *  vhost device ID
> > + * @param ops
> > + *  ops that process external vhost user messages
> > + * @return
> > + *  0 on success, -1 on failure
> > + */
> > +int
> > +rte_vhost_user_register_extern_ops(int vid, struct
> > vhost_user_extern_ops *ops);
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/librte_vhost/rte_vhost_version.map
> > b/lib/librte_vhost/rte_vhost_version.map
> > index df01031..91bf9f0 100644
> > --- a/lib/librte_vhost/rte_vhost_version.map
> > +++ b/lib/librte_vhost/rte_vhost_version.map
> > @@ -59,3 +59,9 @@ DPDK_18.02 {
> >  	rte_vhost_vring_call;
> >
> >  } DPDK_17.08;
> > +
> > +DPDK_18.05 {
> > +	global:
> > +
> > +	rte_vhost_user_register_extern_ops;
> > +} DPDK_18.02;
> > diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index
> > a407067..80af341 100644
> > --- a/lib/librte_vhost/vhost.c
> > +++ b/lib/librte_vhost/vhost.c
> > @@ -1,5 +1,5 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright(c) 2010-2016 Intel Corporation
> > + * Copyright(c) 2010-2018 Intel Corporation
> >   */
> >
> >  #include <linux/vhost.h>
> > @@ -627,3 +627,18 @@ rte_vhost_rx_queue_count(int vid, uint16_t qid)
> >
> >  	return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
> > }
> > +
> > +int
> > +rte_vhost_user_register_extern_ops(int vid, struct
> > vhost_user_extern_ops *ops)
> > +{
> > +	struct virtio_net *dev;
> > +
> > +	dev = get_device(vid);
> > +	if (dev == NULL)
> > +		return -1;
> > +
> > +	if (ops)
> > +		rte_memcpy(&dev->extern_ops, ops, sizeof(*ops));
> > +
> > +	return 0;
> > +}
> 
> Why we need this new "register" API? Why can't you use one of the (struct
> vhost_device_ops).reserved[0] field to put this callback there?
> I think this is right time to utilize this field.
> 

The patch here is a more generic and intuitive way for an external backend to
register handlers for the vhost-user messages that only it recognizes.
Please read Maxime's comments on v2 of this patch:
http://dpdk.org/ml/archives/dev/2018-March/093408.html
As we discussed, an external vhost-user device needs two different handlers
to process device-specific vhost-user messages, so a public API is needed.
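To make the proposal concrete, here is a minimal, self-contained sketch of
the registration step. It is not the library code: the two typedefs and the
ops structure are copied from the patch, while demo_dev, demo_devs,
demo_register_extern_ops, demo_pre and demo_post are hypothetical stand-ins
for struct virtio_net, the device table and a backend's handlers.

```c
#include <stdint.h>
#include <string.h>

/* Handler signatures copied from the patch under review. */
typedef int (*rte_vhost_msg_pre_handle)(int vid, void *msg,
		uint32_t *require_reply, uint32_t *skip_master);
typedef int (*rte_vhost_msg_post_handle)(int vid, void *msg,
		uint32_t *require_reply);

struct vhost_user_extern_ops {
	rte_vhost_msg_pre_handle pre_msg_handle;
	rte_vhost_msg_post_handle post_msg_handle;
};

/* Hypothetical stand-in for struct virtio_net (assumption, not DPDK). */
struct demo_dev {
	int in_use;
	struct vhost_user_extern_ops extern_ops;
};

#define DEMO_MAX_DEVS 4
static struct demo_dev demo_devs[DEMO_MAX_DEVS];

/* Same shape as rte_vhost_user_register_extern_ops() in the patch:
 * fail on an unknown device, copy the ops by value when non-NULL. */
static int
demo_register_extern_ops(int vid, const struct vhost_user_extern_ops *ops)
{
	if (vid < 0 || vid >= DEMO_MAX_DEVS || !demo_devs[vid].in_use)
		return -1;
	if (ops)
		memcpy(&demo_devs[vid].extern_ops, ops, sizeof(*ops));
	return 0;
}

/* Example backend handlers (hypothetical). */
static int
demo_pre(int vid, void *msg, uint32_t *require_reply, uint32_t *skip_master)
{
	(void)vid;
	(void)msg;
	*require_reply = 0;
	*skip_master = 1;	/* backend handled it, skip the master path */
	return 0;
}

static int
demo_post(int vid, void *msg, uint32_t *require_reply)
{
	(void)vid;
	(void)msg;
	*require_reply = 1;
	return 0;
}
```

Because the ops are copied by value, the caller's structure does not need to
outlive the call, and passing NULL leaves any earlier registration in place.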

> Can you do something similar to
> http://dpdk.org/ml/archives/dev/2018-March/094213.html ?

The patch here causes the least disruption to the existing library. Also, the
patch you mentioned won't solve the pre and post handlers problem, or it
would have to consume both of the remaining reserved fields in the
vhost_device_ops structure, one for each handler.

> 
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index
> > d947bc9..2072b88 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -1,5 +1,5 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2010-2018 Intel Corporation
> >   */
> >
> >  #ifndef _VHOST_NET_CDEV_H_
> > @@ -241,8 +241,12 @@ struct virtio_net {
> >  	struct guest_page       *guest_pages;
> >
> >  	int			slave_req_fd;
> > -} __rte_cache_aligned;
> >
> > +	/* private data for external virtio device */
> > +	void			*extern_data;
> > +	/* pre and post vhost user message handlers for externel backend */
> > +	struct vhost_user_extern_ops extern_ops; } __rte_cache_aligned;
> >
> >  #define VHOST_LOG_PAGE	4096
> >
> > diff --git a/lib/librte_vhost/vhost_user.c
> > b/lib/librte_vhost/vhost_user.c index 90ed211..ede8a5e 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -1,5 +1,5 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright(c) 2010-2016 Intel Corporation
> > + * Copyright(c) 2010-2018 Intel Corporation
> >   */
> >
> >  #include <stdint.h>
> > @@ -50,6 +50,8 @@ static const char
> > *vhost_message_str[VHOST_USER_MAX] = {
> >  	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
> >  	[VHOST_USER_SET_SLAVE_REQ_FD]  =
> > "VHOST_USER_SET_SLAVE_REQ_FD",
> >  	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> > +	[VHOST_USER_CRYPTO_CREATE_SESS] =
> > "VHOST_USER_CRYPTO_CREATE_SESS",
> > +	[VHOST_USER_CRYPTO_CLOSE_SESS] =
> > "VHOST_USER_CRYPTO_CLOSE_SESS",
> >  };
> >
> >  static uint64_t
> > @@ -1302,6 +1304,7 @@ vhost_user_msg_handler(int vid, int fd)
> >  	struct VhostUserMsg msg;
> >  	int ret;
> >  	int unlock_required = 0;
> > +	uint32_t skip_master = 0;
> >
> >  	dev = get_device(vid);
> >  	if (dev == NULL)
> > @@ -1379,6 +1382,21 @@ vhost_user_msg_handler(int vid, int fd)
> >
> >  	}
> >
> > +	if (dev->extern_ops.pre_msg_handle) {
> > +		uint32_t need_reply;
> > +
> > +		ret = (*dev->extern_ops.pre_msg_handle)(dev->vid,
> > +				(void *)&msg, &need_reply, &skip_master);
> > +		if (ret < 0)
> > +			goto skip_to_reply;
> > +
> > +		if (need_reply)
> > +			send_vhost_reply(fd, &msg);
> > +	}
> > +
> > +	if (skip_master)
> > +		goto skip_to_post_handle;
> 
> This can be moved inside above  if () { }

Yes, you are right.
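A minimal sketch of the handler flow with the skip check folded into the
pre-handler branch, as agreed above. This is an illustration under
assumptions, not the vhost_user.c code: demo_trace, demo_handle_msg and the
demo_* handlers are hypothetical, and the master message switch is reduced
to a counter. The sketch also initialises need_reply to 0 before each call,
so a handler that never writes it cannot trigger a spurious reply.

```c
#include <stddef.h>
#include <stdint.h>

/* Handler signatures copied from the patch under review. */
typedef int (*rte_vhost_msg_pre_handle)(int vid, void *msg,
		uint32_t *require_reply, uint32_t *skip_master);
typedef int (*rte_vhost_msg_post_handle)(int vid, void *msg,
		uint32_t *require_reply);

struct vhost_user_extern_ops {
	rte_vhost_msg_pre_handle pre_msg_handle;
	rte_vhost_msg_post_handle post_msg_handle;
};

/* Hypothetical counters that record which stages ran. */
struct demo_trace {
	int pre_calls;
	int master_calls;
	int post_calls;
	int replies;
};

static int
demo_handle_msg(const struct vhost_user_extern_ops *ops, int vid, void *msg,
		struct demo_trace *t)
{
	int ret = 0;

	if (ops->pre_msg_handle) {
		uint32_t need_reply = 0;
		uint32_t skip_master = 0;

		ret = ops->pre_msg_handle(vid, msg, &need_reply, &skip_master);
		t->pre_calls++;
		if (ret < 0)
			return ret;
		if (need_reply)
			t->replies++;	/* stand-in for send_vhost_reply() */
		if (skip_master)	/* check lives inside the branch */
			goto post_handle;
	}

	t->master_calls++;	/* stand-in for the master message switch */

post_handle:
	if (ops->post_msg_handle) {
		uint32_t need_reply = 0;

		ret = ops->post_msg_handle(vid, msg, &need_reply);
		t->post_calls++;
		if (ret < 0)
			return ret;
		if (need_reply)
			t->replies++;
	}
	return ret;
}

/* Hypothetical handlers used to exercise the flow. */
static int
demo_pre_skip(int vid, void *msg, uint32_t *reply, uint32_t *skip)
{
	(void)vid; (void)msg; (void)reply;
	*skip = 1;
	return 0;
}

static int
demo_pre_fail(int vid, void *msg, uint32_t *reply, uint32_t *skip)
{
	(void)vid; (void)msg; (void)reply; (void)skip;
	return -1;
}

static int
demo_post_reply(int vid, void *msg, uint32_t *reply)
{
	(void)vid; (void)msg;
	*reply = 1;
	return 0;
}
```

With no pre handler registered the master path always runs, which matches
the behaviour of the library before the patch.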

> 
> > +
> >  	switch (msg.request.master) {
> >  	case VHOST_USER_GET_FEATURES:
> >  		msg.payload.u64 = vhost_user_get_features(dev); @@ -
> 1479,9 +1497,22
> > @@ vhost_user_msg_handler(int vid, int fd)
> >  	default:
> >  		ret = -1;
> >  		break;
> > +	}
> > +
> > +skip_to_post_handle:
> > +	if (dev->extern_ops.post_msg_handle) {
> > +		uint32_t need_reply;
> > +
> > +		ret = (*dev->extern_ops.post_msg_handle)(
> > +				dev->vid, (void *)&msg, &need_reply);
> > +		if (ret < 0)
> > +			goto skip_to_reply;
> >
> > +		if (need_reply)
> > +			send_vhost_reply(fd, &msg);
> >  	}
> >
> > +skip_to_reply:
> >  	if (unlock_required)
> >  		vhost_user_unlock_all_queue_pairs(dev);
> >
> > --
> > 2.7.4
> 
> Overall, I think, this direction where we need to go.
> 
> Pawel

Regards,
Fan


* Re: [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag
  2018-03-29 14:53  3%     ` Doherty, Declan
@ 2018-04-01  6:14  0%       ` Shahaf Shuler
  0 siblings, 0 replies; 200+ results
From: Shahaf Shuler @ 2018-04-01  6:14 UTC (permalink / raw)
  To: Doherty, Declan, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

Thursday, March 29, 2018 5:53 PM, Doherty, Declan:
> On 29/03/2018 7:13 AM, Shahaf Shuler wrote:
> > Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
> >> Subject: [dpdk-dev][PATCH v6 4/8] ethdev: Add port representor device
> >> flag
> >>
> >> Add new device flag to specify that ethdev port is a port representor.
> >> Extend rte_eth_dev_info structure to expose device flags to user
> >> which enable applications to discover if a port is a representor port.
> >>
> >> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
> >> ---
> >>   lib/librte_ether/rte_ethdev.c             | 1 +
> >>   lib/librte_ether/rte_ethdev.h             | 9 ++++++---
> >>   lib/librte_ether/rte_ethdev_representor.h | 3 +++
> >>   3 files changed, 10 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/lib/librte_ether/rte_ethdev.c
> >> b/lib/librte_ether/rte_ethdev.c index c719f84a3..163246433 100644
> >> --- a/lib/librte_ether/rte_ethdev.c
> >> +++ b/lib/librte_ether/rte_ethdev.c
> >> @@ -2399,6 +2399,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct
> >> rte_eth_dev_info *dev_info)
> >>   	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
> >>   	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
> >>   	dev_info->switch_id = dev->data->switch_id;
> >> +	dev_info->dev_flags = dev->data->dev_flags;
> >>   }
> >>
> >>   int
> >> diff --git a/lib/librte_ether/rte_ethdev.h
> >> b/lib/librte_ether/rte_ethdev.h index dced4fc41..226acc8b1 100644
> >> --- a/lib/librte_ether/rte_ethdev.h
> >> +++ b/lib/librte_ether/rte_ethdev.h
> >> @@ -996,6 +996,7 @@ struct rte_eth_dev_info {
> >>   	const char *driver_name; /**< Device Driver name. */
> >>   	unsigned int if_index; /**< Index to bound host interface, or 0 if
> >> none.
> >>   		Use if_indextoname() to translate into an interface name. */
> >> +	uint32_t dev_flags; /**< Device flags */
> >>   	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
> >>   	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX
> >> pkt. */
> >>   	uint16_t max_rx_queues; /**< Maximum number of RX queues. */
> @@
> >> -1229,11 +1230,13 @@ struct rte_eth_dev_owner {  };
> >>
> >>   /** Device supports link state interrupt */
> >> -#define RTE_ETH_DEV_INTR_LSC     0x0002
> >> +#define RTE_ETH_DEV_INTR_LSC		0x0002
> >>   /** Device is a bonded slave */
> >> -#define RTE_ETH_DEV_BONDED_SLAVE 0x0004
> >> +#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
> >>   /** Device supports device removal interrupt */
> >> -#define RTE_ETH_DEV_INTR_RMV     0x0008
> >> +#define RTE_ETH_DEV_INTR_RMV		0x0008
> >> +/** Device is port representor */
> >> +#define RTE_ETH_DEV_REPRESENTOR		0x0010
> >
> > Maybe it is a good time to make some order here.
> > I understand the decision to use flags instead of bit-field. It is better.
> >
> > However there is a mix here of device capabilities like :
> RTE_ETH_DEV_INTR_LSC   and RTE_ETH_DEV_INTR_RMV
> > And device attributes like : RTE_ETH_DEV_BONDED_SLAVE and
> RTE_ETH_DEV_REPRESENTOR.
> > I don't think they belong together under the genetic name of dev_flags.
> >
> > Moreover, I am not sure the fact device is bonded slave should be exposed
> to the application. It should be internal to ethdev and its port iterators.
> 
> That's a good point on the bonded slave flag, I'll look at fixing that for the
> next release. I don't think changing it should effect ABI but I'll need to have a
> closer look.
> 
> Do you think that we should have a separate device attributes field, which
> the representor flag is contained in.
> 
> >
> > Finally I think representor port may need more info now (and in the
> future), for example the associated vf id.
> > For that, I think it is better it to be exposed as a dedicated struct on device
> info.
> 
> I think a switch port id should suffice for that, for SR-IOV devices it would
> map to the vf_id.

I think we need both switch_domain and vf_id,
because for representors the application should know which VFs can be reached from this representor and which VF it represents.
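To illustrate the two points in this thread (capabilities versus attributes,
and a representor carrying both a switch domain and a VF id), here is a
small sketch. The flag values are copied from the quoted patch; the
demo_port_info structure, its switch_domain and vf_id fields, the two masks
and the helper functions are hypothetical and do not exist in ethdev.

```c
#include <stdint.h>

/* Flag values copied from the quoted patch. */
#define RTE_ETH_DEV_INTR_LSC		0x0002
#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
#define RTE_ETH_DEV_INTR_RMV		0x0008
#define RTE_ETH_DEV_REPRESENTOR		0x0010

/* Hypothetical grouping: what the device can do vs. what it is. */
#define DEMO_CAPABILITY_MASK \
	(RTE_ETH_DEV_INTR_LSC | RTE_ETH_DEV_INTR_RMV)
#define DEMO_ATTRIBUTE_MASK \
	(RTE_ETH_DEV_BONDED_SLAVE | RTE_ETH_DEV_REPRESENTOR)

/* Hypothetical per-port info; not the real rte_eth_dev_info layout. */
struct demo_port_info {
	uint32_t dev_flags;
	uint16_t switch_domain;
	uint16_t vf_id;
};

static int
demo_is_representor(const struct demo_port_info *info)
{
	return (info->dev_flags & RTE_ETH_DEV_REPRESENTOR) != 0;
}

static uint32_t
demo_capabilities(const struct demo_port_info *info)
{
	return info->dev_flags & DEMO_CAPABILITY_MASK;
}

static uint32_t
demo_attributes(const struct demo_port_info *info)
{
	return info->dev_flags & DEMO_ATTRIBUTE_MASK;
}

/* True only for a representor of the given VF in the given domain. */
static int
demo_represents_vf(const struct demo_port_info *info,
		uint16_t switch_domain, uint16_t vf_id)
{
	return demo_is_representor(info) &&
		info->switch_domain == switch_domain &&
		info->vf_id == vf_id;
}
```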

> 
> >
> >>
> >>   /**
> >>    * @warning
> >> diff --git a/lib/librte_ether/rte_ethdev_representor.h
> >> b/lib/librte_ether/rte_ethdev_representor.h
> >> index cbc1f2855..f3726d0ba 100644
> >> --- a/lib/librte_ether/rte_ethdev_representor.h
> >> +++ b/lib/librte_ether/rte_ethdev_representor.h
> >> @@ -22,6 +22,9 @@ eth_dev_representor_port_init(struct rte_eth_dev
> >> *ethdev, void *init_params)
> >>   	/** representor inherits the switch id of it's base device */
> >>   	ethdev->data->switch_id = base_ethdev->data->switch_id;
> >>
> >> +	/** Set device flags to specify that device is a representor port */
> >> +	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
> >
> > Should be set in the PMD, not in ethdev layer
> 
> As in the previous patch this is just a generic port bus init function which
> meets the simplest use case of representor port with a single switch domain,
> a PMD doesn't need to use it but having it here saves duplicating the same
> code across multiple PMD which are only supporting the basic mode.
> 
> >
> >> +
> >>   	return 0;
> >>   }
> >>
> >> --
> >> 2.14.3
> >



* [dpdk-dev] [PATCH v7] eal: provide API for querying valid socket id's
  2018-03-22 12:36  5%   ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
  2018-03-22 17:07  0%     ` gowrishankar muthukrishnan
  2018-03-27 16:24  3%     ` Thomas Monjalon
@ 2018-03-31 17:08  5%     ` Anatoly Burakov
  2018-04-04 22:31  3%       ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2018-03-31 17:08 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, John McNamara, Marko Kovacevic, Bruce Richardson,
	thomas, chaozhu, gowrishankar.m

During lcore scan, find all socket ID's and store them, and
provide public API to query valid socket id's. This will break
the ABI, so bump ABI version.

Also, remove deprecation notice corresponding to this change.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
---

Notes:
    v7:
    - Renamed rte_num_socket_ids() to rte_socket_count()
    - Removed deprecation notice associated with this change
    - Addressed review comments
    
    v6:
    - Fixed meson ABI version header
    
    v5:
    - Move API to experimental
    - Store list of valid socket id's instead of simply
      recording the biggest one
    
    v4:
    - Remove backwards ABI compatibility, bump ABI instead
    
    v3:
    - Added ABI compatibility
    
    v2:
    - checkpatch changes
    - check socket before deciding if the core is not to be used

 doc/guides/rel_notes/deprecation.rst      |  3 --
 lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
 lib/librte_eal/common/eal_common_lcore.c  | 75 ++++++++++++++++++++++++++-----
 lib/librte_eal/common/include/rte_eal.h   |  2 +
 lib/librte_eal/common/include/rte_lcore.h | 30 +++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
 lib/librte_eal/meson.build                |  2 +-
 lib/librte_eal/rte_eal_version.map        |  2 +
 8 files changed, 100 insertions(+), 18 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 74c18ed..80472f5 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -38,9 +38,6 @@ Deprecation Notices
   success and failure, respectively.  This will change to 1 and 0 for true and
   false, respectively, to make use of the function more intuitive.
 
-* eal: new ``numa_node_count`` member will be added to ``rte_config`` structure
-  in v18.05.
-
 * eal: due to internal data layout reorganization, there will be changes to
   several structures and functions as a result of coming changes to support
   memory hotplug in v18.05.
diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..ed1d17b 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 7724fa4..3167e9d 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -7,6 +7,7 @@
 #include <string.h>
 #include <dirent.h>
 
+#include <rte_errno.h>
 #include <rte_log.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
@@ -16,6 +17,19 @@
 #include "eal_private.h"
 #include "eal_thread.h"
 
+static int
+socket_id_cmp(const void *a, const void *b)
+{
+	const int *lcore_id_a = a;
+	const int *lcore_id_b = b;
+
+	if (*lcore_id_a < *lcore_id_b)
+		return -1;
+	if (*lcore_id_a > *lcore_id_b)
+		return 1;
+	return 0;
+}
+
 /*
  * Parse /sys/devices/system/cpu to get the number of physical and logical
  * processors on the machine. The function will fill the cpu_info
@@ -28,6 +42,8 @@ rte_eal_cpu_init(void)
 	struct rte_config *config = rte_eal_get_configuration();
 	unsigned lcore_id;
 	unsigned count = 0;
+	unsigned int socket_id, prev_socket_id;
+	int lcore_to_socket_id[RTE_MAX_LCORE];
 
 	/*
 	 * Parse the maximum set of logical cores, detect the subset of running
@@ -39,6 +55,19 @@ rte_eal_cpu_init(void)
 		/* init cpuset for per lcore config */
 		CPU_ZERO(&lcore_config[lcore_id].cpuset);
 
+		/* find socket first */
+		socket_id = eal_cpu_socket_id(lcore_id);
+		if (socket_id >= RTE_MAX_NUMA_NODES) {
+#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
+			socket_id = 0;
+#else
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
+					socket_id, RTE_MAX_NUMA_NODES);
+			return -1;
+#endif
+		}
+		lcore_to_socket_id[lcore_id] = socket_id;
+
 		/* in 1:1 mapping, record related cpu detected state */
 		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
 		if (lcore_config[lcore_id].detected == 0) {
@@ -54,18 +83,7 @@ rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_role = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
-#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
-			lcore_config[lcore_id].socket_id = 0;
-#else
-			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
-				"RTE_MAX_NUMA_NODES (%d)\n",
-				lcore_config[lcore_id].socket_id,
-				RTE_MAX_NUMA_NODES);
-			return -1;
-#endif
-		}
+		lcore_config[lcore_id].socket_id = socket_id;
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
@@ -79,5 +97,38 @@ rte_eal_cpu_init(void)
 		RTE_MAX_LCORE);
 	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+	/* sort all socket id's in ascending order */
+	qsort(lcore_to_socket_id, RTE_DIM(lcore_to_socket_id),
+			sizeof(lcore_to_socket_id[0]), socket_id_cmp);
+
+	prev_socket_id = -1;
+	config->numa_node_count = 0;
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		socket_id = lcore_to_socket_id[lcore_id];
+		if (socket_id != prev_socket_id)
+			config->numa_nodes[config->numa_node_count++] =
+					socket_id;
+		prev_socket_id = socket_id;
+	}
+	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
 	return 0;
 }
+
+unsigned int __rte_experimental
+rte_socket_count(void)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	return config->numa_node_count;
+}
+
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	if (idx >= config->numa_node_count) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	return config->numa_nodes[idx];
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 93ca4cc..991cbe0 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -57,6 +57,8 @@ enum rte_proc_type_t {
 struct rte_config {
 	uint32_t master_lcore;       /**< Id of the master lcore */
 	uint32_t lcore_count;        /**< Number of available logical cores. */
+	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
+	uint32_t numa_nodes[RTE_MAX_NUMA_NODES]; /**< List of detected numa nodes. */
 	uint32_t service_lcore_count;/**< Number of available service cores. */
 	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index 0472220..7312975 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -132,6 +132,36 @@ rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets detected on the system.
+ *
+ * Note that number of nodes may not be correspondent to their physical id's:
+ * for example, a system may report two socket id's, but the actual socket id's
+ * may be 0 and 8.
+ *
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ */
+unsigned int __rte_experimental
+rte_socket_count(void);
+
+/**
+ * Return socket id with a particular index.
+ *
+ * This will return socket id at a particular position in list of all detected
+ * physical socket id's. For example, on a machine with sockets [0, 8], passing
+ * 1 as a parameter will return 8.
+ *
+ * @param idx
+ *   index of physical socket id to return
+ *
+ * @return
+ *   - physical socket id as recognized by EAL
+ *   - -1 on error, with errno set to EINVAL
+ */
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..b9c7727 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index 15d1c6a..4aa63e3 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -21,7 +21,7 @@ else
 	error('unsupported system type @0@'.format(hostmachine.system()))
 endif
 
-version = 6  # the version of the EAL API
+version = 7  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 cflags += '-D_GNU_SOURCE'
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 1d88437..30ec1fc 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -257,5 +257,7 @@ EXPERIMENTAL {
 	rte_service_set_runstate_mapped_check;
 	rte_service_set_stats_enable;
 	rte_service_start_with_defaults;
+	rte_socket_count;
+	rte_socket_id_by_idx;
 
 } DPDK_18.02;
-- 
2.7.4


* [dpdk-dev] [PATCH v5 1/7] crypto/virtio: add virtio related fundamental functions
  @ 2018-03-31  9:18  2% ` Jay Zhou
  0 siblings, 0 replies; 200+ results
From: Jay Zhou @ 2018-03-31  9:18 UTC (permalink / raw)
  To: dev
  Cc: pablo.de.lara.guarch, roy.fan.zhang, thomas, arei.gonglei,
	xin.zeng, weidong.huang, wangxinxin.wang, longpeng2,
	jianjay.zhou

Since there is no common virtio library, we have to put these files
here. They are basically the same as the virtio net related files,
with some minor changes.

Also add the virtio crypto PMD release note for 18.05.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 config/common_base                     |  14 +
 doc/guides/rel_notes/release_18_05.rst |   6 +
 drivers/crypto/virtio/virtio_logs.h    |  91 +++++++
 drivers/crypto/virtio/virtio_pci.c     | 460 +++++++++++++++++++++++++++++++++
 drivers/crypto/virtio/virtio_pci.h     | 253 ++++++++++++++++++
 drivers/crypto/virtio/virtio_ring.h    | 137 ++++++++++
 drivers/crypto/virtio/virtqueue.c      |  43 +++
 drivers/crypto/virtio/virtqueue.h      | 172 ++++++++++++
 8 files changed, 1176 insertions(+)
 create mode 100644 drivers/crypto/virtio/virtio_logs.h
 create mode 100644 drivers/crypto/virtio/virtio_pci.c
 create mode 100644 drivers/crypto/virtio/virtio_pci.h
 create mode 100644 drivers/crypto/virtio/virtio_ring.h
 create mode 100644 drivers/crypto/virtio/virtqueue.c
 create mode 100644 drivers/crypto/virtio/virtqueue.h

diff --git a/config/common_base b/config/common_base
index a842478..bf6bbc7 100644
--- a/config/common_base
+++ b/config/common_base
@@ -486,6 +486,20 @@ CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_DRIVER=n
 CONFIG_RTE_QAT_PMD_MAX_NB_SESSIONS=2048
 
 #
+# Compile PMD for virtio crypto devices
+#
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO=n
+#
+# Number of maximum virtio crypto devices
+#
+CONFIG_RTE_MAX_VIRTIO_CRYPTO=32
+#
+# Number of sessions to create in the session memory pool
+# on a single virtio crypto device.
+#
+CONFIG_RTE_VIRTIO_CRYPTO_PMD_MAX_NB_SESSIONS=1024
+
+#
 # Compile PMD for AESNI backed device
 #
 CONFIG_RTE_LIBRTE_PMD_AESNI_MB=n
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 0eeabf5..a90c25e 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -53,6 +53,12 @@ New Features
   :doc:`../cryptodevs/ccp` crypto driver guide for more details on
   this new driver.
 
+* **Added the virtio crypto PMD.**
+
+  Added a new virtio crypto PMD, which provides AES-CBC ciphering and
+  AES-CBC with HMAC-SHA1 algorithm-chaining. See the
+  :doc:`../cryptodevs/virtio` crypto driver guide for more details on
+  this new driver.
 
 API Changes
 -----------
diff --git a/drivers/crypto/virtio/virtio_logs.h b/drivers/crypto/virtio/virtio_logs.h
new file mode 100644
index 0000000..26a286c
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_logs.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#define PMD_INIT_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, RTE_LOGTYPE_PMD, \
+		"PMD: %s(): " fmt "\n", __func__, ##args)
+
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+
+extern int virtio_crypto_logtype_init;
+
+#define VIRTIO_CRYPTO_INIT_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_init, \
+		"INIT: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_session;
+
+#define VIRTIO_CRYPTO_SESSION_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_session, \
+		"SESSION: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_rx;
+
+#define VIRTIO_CRYPTO_RX_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_rx, \
+		"RX: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_RX_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_RX_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_RX_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_tx;
+
+#define VIRTIO_CRYPTO_TX_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_tx, \
+		"TX: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_TX_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_TX_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_TX_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_driver;
+
+#define VIRTIO_CRYPTO_DRV_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_driver, \
+		"DRIVER: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(ERR, fmt, ## args)
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
new file mode 100644
index 0000000..43ec1a4
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -0,0 +1,460 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+	 * and only accepts 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("vring address shouldn't be above 16TB!");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_crypto_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_crypto_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
+		uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("queue %u addresses:", vq->vq_queue_index);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t desc_addr: %" PRIx64, desc_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t avail_addr: %" PRIx64, avail_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t used_addr: %" PRIx64, used_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t notify addr: %p (notify offset: %u)",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
+		struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_crypto_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+void
+vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+		uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+vtpci_cryptodev_reset(struct virtio_crypto_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw)
+{
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+vtpci_cryptodev_isr(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("invalid bar: %u", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("offset(%u) + length(%u) overflows",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR(
+			"invalid cap: overflows bar space: %u > %" PRIu64,
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("bar %u base addr is NULL", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		VIRTIO_CRYPTO_INIT_LOG_DBG("failed to map pci device!");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		VIRTIO_CRYPTO_INIT_LOG_DBG("failed to read pci capability list");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			VIRTIO_CRYPTO_INIT_LOG_ERR(
+				"failed to read pci cap at pos: %x", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			VIRTIO_CRYPTO_INIT_LOG_DBG(
+				"[%2x] skipping non VNDR cap id: %02x",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		VIRTIO_CRYPTO_INIT_LOG_DBG(
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		VIRTIO_CRYPTO_INIT_LOG_INFO("no modern virtio pci device found.");
+		return -1;
+	}
+
+	VIRTIO_CRYPTO_INIT_LOG_INFO("found modern virtio pci device.");
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("common cfg mapped at: %p", hw->common_cfg);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("device cfg mapped at: %p", hw->dev_cfg);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("isr cfg mapped at: %p", hw->isr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("notify base: %p, notify off multiplier: %u",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+/*
+ * Return -1 on failure, that is, if the modern virtio PCI
+ * capabilities cannot be read:
+ *   if the PCI device cannot be mapped,
+ *   if the PCI capability list cannot be read, or
+ *   if any of the required virtio capabilities is missing.
+ * Return 0 on success.
+ */
+int
+vtpci_cryptodev_init(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	/*
+	 * Try to read the virtio PCI capabilities, which exist only
+	 * on modern PCI devices. There is no legacy fallback here:
+	 * if reading them fails, initialization fails.
+	 */
+	if (virtio_read_caps(dev, hw) == 0) {
+		VIRTIO_CRYPTO_INIT_LOG_INFO("modern virtio pci detected.");
+		virtio_hw_internal[hw->dev_id].vtpci_ops =
+					&virtio_crypto_modern_ops;
+		hw->modern = 1;
+		return 0;
+	}
+
+	/*
+	 * virtio crypto conforms to virtio 1.0 and doesn't support
+	 * legacy mode
+	 */
+	return -1;
+}
diff --git a/drivers/crypto/virtio/virtio_pci.h b/drivers/crypto/virtio/virtio_pci.h
new file mode 100644
index 0000000..cd316a6
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_CRYPTO_PCI_VENDORID 0x1AF4
+#define VIRTIO_CRYPTO_PCI_DEVICEID 0x1054
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR            19 /* interrupt status register, reading
+				      * also clears the register (8, RO)
+				      */
+/* Only if MSIX is enabled: */
+
+/* configuration change vector (16, RW) */
+#define VIRTIO_MSI_CONFIG_VECTOR  20
+/* vector for selected VQ notifications */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them?
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;	/* Generic PCI field: next ptr. */
+	uint8_t cap_len;	/* Generic PCI field: capability length */
+	uint8_t cfg_type;	/* Identifies the structure. */
+	uint8_t bar;		/* Where to find it. */
+	uint8_t padding[3];	/* Pad to full dword. */
+	uint32_t offset;	/* Offset within bar. */
+	uint32_t length;	/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_crypto_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_crypto_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_crypto_hw *hw);
+	void (*set_status)(struct virtio_crypto_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_crypto_hw *hw);
+	void (*set_features)(struct virtio_crypto_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_crypto_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_crypto_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_crypto_hw *hw,
+			struct virtqueue *vq, uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_crypto_hw *hw,
+			uint16_t queue_id);
+	int (*setup_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_crypto_hw {
+	/* control queue */
+	struct virtqueue *cvq;
+	uint16_t    dev_id;
+	uint16_t    max_dataqueues;
+	uint64_t    req_guest_features;
+	uint64_t    guest_features;
+	uint8_t	    use_msix;
+	uint8_t     modern;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	struct virtio_crypto_config *dev_cfg;
+	const struct rte_cryptodev_capabilities *virtio_dev_capabilities;
+};
+
+/*
+ * While virtio_crypto_hw is stored in shared memory, this structure stores
+ * some infos that may vary in the multiple process model locally.
+ * For example, the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+	struct rte_pci_ioport io;
+};
+
+#define VTPCI_OPS(hw)	(virtio_hw_internal[(hw)->dev_id].vtpci_ops)
+#define VTPCI_IO(hw)	(&virtio_hw_internal[(hw)->dev_id].io)
+
+extern struct virtio_hw_internal virtio_hw_internal[RTE_MAX_VIRTIO_CRYPTO];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+vtpci_with_feature(struct virtio_crypto_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Function declarations from virtio_pci.c
+ */
+int vtpci_cryptodev_init(struct rte_pci_device *dev,
+	struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_reset(struct virtio_crypto_hw *hw);
+
+void vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw);
+
+uint8_t vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status);
+
+uint64_t vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+	uint64_t host_features);
+
+void vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	const void *src, int length);
+
+void vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	void *dst, int length);
+
+uint8_t vtpci_cryptodev_isr(struct virtio_crypto_hw *hw);
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/crypto/virtio/virtio_ring.h b/drivers/crypto/virtio/virtio_ring.h
new file mode 100644
index 0000000..ee30674
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_ring.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers.
+ */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.
+ */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next".
+ */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	volatile uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline size_t
+vring_size(unsigned int num, unsigned long align)
+{
+	size_t size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other side, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
diff --git a/drivers/crypto/virtio/virtqueue.c b/drivers/crypto/virtio/virtqueue.c
new file mode 100644
index 0000000..fd8be58
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+#include <rte_malloc.h>
+
+#include "virtqueue.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint the host
+	 * not to interrupt when it consumes packets.
+	 * Note: this is only considered a hint to the host.
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+void
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_crypto_op *cop = NULL;
+
+	int idx;
+
+	if (vq != NULL)
+		for (idx = 0; idx < vq->vq_nentries; idx++) {
+			cop = vq->vq_descx[idx].crypto_op;
+			if (cop) {
+				if (cop->sym->m_src)
+					rte_pktmbuf_free(cop->sym->m_src);
+				if (cop->sym->m_dst)
+					rte_pktmbuf_free(cop->sym->m_dst);
+				rte_crypto_op_free(cop);
+				vq->vq_descx[idx].crypto_op = NULL;
+			}
+		}
+}
diff --git a/drivers/crypto/virtio/virtqueue.h b/drivers/crypto/virtio/virtqueue.h
new file mode 100644
index 0000000..0a9bddb
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.h
@@ -0,0 +1,172 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+enum { VTCRYPTO_DATAQ = 0, VTCRYPTO_CTRLQ = 1 };
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void     *crypto_op;
+	void     *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	/**< virtio_crypto_hw structure pointer. */
+	struct virtio_crypto_hw *hw;
+	/**< mem zone to populate RX ring. */
+	const struct rte_memzone *mz;
+	/**< mempool to populate hdr and request. */
+	struct rte_mempool *mpool;
+	uint8_t     dev_id;              /**< Device identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring*/
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+
+	/* Statistics */
+	uint64_t	packets_sent_total;
+	uint64_t	packets_sent_failed;
+	uint64_t	packets_received_total;
+	uint64_t	packets_received_failed;
+
+	uint16_t  *notify_addr;
+
+	struct vq_desc_extra vq_descx[0];
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+
+/**
+ *  Get all mbufs to be freed.
+ */
+void virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) \
+	((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notification is through io port operation
+	 * which is a serialization instruction itself.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+/**
+ * Dump virtqueue internal structures, for debug purpose only.
+ */
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	VIRTIO_CRYPTO_INIT_LOG_DBG(\
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+
+#endif /* _VIRTQUEUE_H_ */
-- 
1.8.3.1
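
Note on vring_need_event() in virtio_ring.h: the unsigned 16-bit
subtraction is what makes the check safe across index wraparound. The
result is true exactly when event_idx lies in the window [old, new_idx),
counted modulo 2^16. A minimal standalone sketch follows; it assumes
nothing beyond <stdint.h>, and the name need_event is a local stand-in
used for illustration, not something defined by the patch.

```c
#include <stdint.h>

/*
 * Same arithmetic as vring_need_event() in the patch, restated so that
 * it compiles on its own. need_event is a hypothetical name.
 *
 * (new_idx - old) is how many entries were just published;
 * (new_idx - event_idx - 1) is how far event_idx sits behind the newest
 * published entry. Both are reduced modulo 2^16 by the uint16_t casts,
 * so the comparison still holds when the free-running index wraps.
 */
static inline int
need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}
```

With old = 4 and new_idx = 6, an event_idx of 5 yields 1 (the index just
moved past it), while an event_idx of 10 yields 0. The wrapped case
old = 65534, new_idx = 2, event_idx = 0 also yields 1.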

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v4 1/7] crypto/virtio: add virtio related fundamental functions
  @ 2018-03-31  7:49  2% ` Jay Zhou
  0 siblings, 0 replies; 200+ results
From: Jay Zhou @ 2018-03-31  7:49 UTC (permalink / raw)
  To: dev
  Cc: pablo.de.lara.guarch, roy.fan.zhang, thomas, arei.gonglei,
	xin.zeng, weidong.huang, wangxinxin.wang, longpeng2,
	jianjay.zhou

Since there is no common virtio library, we have to put these files
here. They are basically the same as the virtio net related files,
with some minor changes.

Also add the virtio crypto PMD related release note for 18.05.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Fan Zhang <roy.fan.zhang@intel.com>
Acked-by: Fan Zhang <roy.fan.zhang@intel.com>
---
 config/common_base                     |  14 +
 doc/guides/rel_notes/release_18_05.rst |   4 +
 drivers/crypto/virtio/virtio_logs.h    |  91 +++++++
 drivers/crypto/virtio/virtio_pci.c     | 460 +++++++++++++++++++++++++++++++++
 drivers/crypto/virtio/virtio_pci.h     | 253 ++++++++++++++++++
 drivers/crypto/virtio/virtio_ring.h    | 137 ++++++++++
 drivers/crypto/virtio/virtqueue.c      |  43 +++
 drivers/crypto/virtio/virtqueue.h      | 172 ++++++++++++
 8 files changed, 1174 insertions(+)
 create mode 100644 drivers/crypto/virtio/virtio_logs.h
 create mode 100644 drivers/crypto/virtio/virtio_pci.c
 create mode 100644 drivers/crypto/virtio/virtio_pci.h
 create mode 100644 drivers/crypto/virtio/virtio_ring.h
 create mode 100644 drivers/crypto/virtio/virtqueue.c
 create mode 100644 drivers/crypto/virtio/virtqueue.h
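
Editor's note on vring_need_event(): the unsigned 16-bit subtraction in that
helper is easy to misread, so here is a standalone sketch of the same
expression. It returns nonzero when event_idx lies in the half-open window
[old, new_idx), evaluated modulo 2^16, which is what makes it safe across
index wraparound. The sketch_* names are hypothetical and not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same expression as vring_need_event() in virtio_ring.h: true when the
 * index requested by the other side (event_idx) was passed while moving
 * from old to new_idx. All arithmetic wraps modulo 2^16.
 */
static int
sketch_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/* A few hand-checked cases, including one that crosses the wrap point. */
static void
sketch_need_event_selfcheck(void)
{
	/* Index moved from 10 to 12: entries 10 and 11 were added. */
	assert(sketch_need_event(10, 12, 10));
	assert(sketch_need_event(11, 12, 10));
	assert(!sketch_need_event(12, 12, 10));
	assert(!sketch_need_event(9, 12, 10));

	/* Index moved from 65534 to 1: entries 65534, 65535 and 0. */
	assert(sketch_need_event(65535, 1, 65534));
	assert(sketch_need_event(0, 1, 65534));
	assert(!sketch_need_event(1, 1, 65534));
}
```

A plain comparison such as old <= event_idx && event_idx < new_idx would
give the wrong answer in the second group of cases, which is why the helper
compares distances instead of absolute indices.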

diff --git a/config/common_base b/config/common_base
index ee10b44..91d3102 100644
--- a/config/common_base
+++ b/config/common_base
@@ -486,6 +486,20 @@ CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_DRIVER=n
 CONFIG_RTE_QAT_PMD_MAX_NB_SESSIONS=2048
 
 #
+# Compile PMD for virtio crypto devices
+#
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO=n
+#
+# Number of maximum virtio crypto devices
+#
+CONFIG_RTE_MAX_VIRTIO_CRYPTO=32
+#
+# Number of sessions to create in the session memory pool
+# on a single virtio crypto device.
+#
+CONFIG_RTE_VIRTIO_CRYPTO_PMD_MAX_NB_SESSIONS=1024
+
+#
 # Compile PMD for AESNI backed device
 #
 CONFIG_RTE_LIBRTE_PMD_AESNI_MB=n
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 3923dc2..32c39d5 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -41,6 +41,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Added Virtio Crypto PMD.**
+
+  Added new Virtio Crypto PMD, which provides AES-CBC ciphering and AES-CBC
+  with HMAC-SHA1 algorithm-chaining.
 
 API Changes
 -----------
diff --git a/drivers/crypto/virtio/virtio_logs.h b/drivers/crypto/virtio/virtio_logs.h
new file mode 100644
index 0000000..26a286c
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_logs.h
@@ -0,0 +1,91 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#define PMD_INIT_LOG(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, RTE_LOGTYPE_PMD, \
+		"PMD: %s(): " fmt "\n", __func__, ##args)
+
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+
+extern int virtio_crypto_logtype_init;
+
+#define VIRTIO_CRYPTO_INIT_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_init, \
+		"INIT: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_INIT_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_INIT_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_session;
+
+#define VIRTIO_CRYPTO_SESSION_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_session, \
+		"SESSION: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_SESSION_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_SESSION_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_rx;
+
+#define VIRTIO_CRYPTO_RX_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_rx, \
+		"RX: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_RX_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_RX_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_RX_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_RX_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_tx;
+
+#define VIRTIO_CRYPTO_TX_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_tx, \
+		"TX: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_TX_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_TX_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_TX_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_TX_LOG_IMPL(ERR, fmt, ## args)
+
+extern int virtio_crypto_logtype_driver;
+
+#define VIRTIO_CRYPTO_DRV_LOG_IMPL(level, fmt, args...) \
+	rte_log(RTE_LOG_ ## level, virtio_crypto_logtype_driver, \
+		"DRIVER: %s(): " fmt "\n", __func__, ##args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_INFO(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(INFO, fmt, ## args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_DBG(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(DEBUG, fmt, ## args)
+
+#define VIRTIO_CRYPTO_DRV_LOG_ERR(fmt, args...) \
+	VIRTIO_CRYPTO_DRV_LOG_IMPL(ERR, fmt, ## args)
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
new file mode 100644
index 0000000..43ec1a4
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -0,0 +1,460 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* The virtio PCI device VIRTIO_PCI_QUEUE_PFN register is 32 bits
+	 * wide and only accepts a 32-bit page frame number.
+	 * Check whether the allocated physical memory exceeds 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("vring address shouldn't be above 16TB!");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_crypto_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_crypto_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
+		uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("queue %u addresses:", vq->vq_queue_index);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t desc_addr: %" PRIx64, desc_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t avail_addr: %" PRIx64, avail_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t used_addr: %" PRIx64, used_addr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("\t notify addr: %p (notify offset: %u)",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
+		struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_crypto_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+void
+vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+		uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+vtpci_cryptodev_reset(struct virtio_crypto_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw)
+{
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+vtpci_cryptodev_isr(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("invalid bar: %u", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("offset(%u) + length(%u) overflows",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR(
+			"invalid cap: overflows bar space: %u > %" PRIu64,
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		VIRTIO_CRYPTO_INIT_LOG_ERR("bar %u base addr is NULL", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		VIRTIO_CRYPTO_INIT_LOG_DBG("failed to map pci device!");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		VIRTIO_CRYPTO_INIT_LOG_DBG("failed to read pci capability list");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			VIRTIO_CRYPTO_INIT_LOG_ERR(
+				"failed to read pci cap at pos: %x", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			VIRTIO_CRYPTO_INIT_LOG_DBG(
+				"[%2x] skipping non VNDR cap id: %02x",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		VIRTIO_CRYPTO_INIT_LOG_DBG(
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		VIRTIO_CRYPTO_INIT_LOG_INFO("no modern virtio pci device found.");
+		return -1;
+	}
+
+	VIRTIO_CRYPTO_INIT_LOG_INFO("found modern virtio pci device.");
+
+	VIRTIO_CRYPTO_INIT_LOG_DBG("common cfg mapped at: %p", hw->common_cfg);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("device cfg mapped at: %p", hw->dev_cfg);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("isr cfg mapped at: %p", hw->isr);
+	VIRTIO_CRYPTO_INIT_LOG_DBG("notify base: %p, notify off multiplier: %u",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+/*
+ * Return 0 on success, i.e. when the capabilities of a modern
+ * (virtio 1.0) PCI device were read and mapped.
+ * Return -1 otherwise:
+ *   if mapping the device or reading its PCI capabilities failed.
+ *   if a required virtio capability is missing; legacy devices are
+ *   not supported.
+ */
+int
+vtpci_cryptodev_init(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	/*
+	 * Try to read the virtio pci caps, which exist only on modern
+	 * pci devices. If that fails, the device is rejected below;
+	 * there is no fallback to legacy virtio handling.
+	 */
+	if (virtio_read_caps(dev, hw) == 0) {
+		VIRTIO_CRYPTO_INIT_LOG_INFO("modern virtio pci detected.");
+		virtio_hw_internal[hw->dev_id].vtpci_ops =
+					&virtio_crypto_modern_ops;
+		hw->modern = 1;
+		return 0;
+	}
+
+	/*
+	 * virtio crypto conforms to virtio 1.0 and doesn't support
+	 * legacy mode
+	 */
+	return -1;
+}
diff --git a/drivers/crypto/virtio/virtio_pci.h b/drivers/crypto/virtio/virtio_pci.h
new file mode 100644
index 0000000..cd316a6
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_CRYPTO_PCI_VENDORID 0x1AF4
+#define VIRTIO_CRYPTO_PCI_DEVICEID 0x1054
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR            19 /* interrupt status register, reading
+				      * also clears the register (8, RO)
+				      */
+/* Only if MSIX is enabled: */
+
+/* configuration change vector (16, RW) */
+#define VIRTIO_MSI_CONFIG_VECTOR  20
+/* vector for selected VQ notifications */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them?
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;	/* Generic PCI field: next ptr. */
+	uint8_t cap_len;	/* Generic PCI field: capability length */
+	uint8_t cfg_type;	/* Identifies the structure. */
+	uint8_t bar;		/* Where to find it. */
+	uint8_t padding[3];	/* Pad to full dword. */
+	uint32_t offset;	/* Offset within bar. */
+	uint32_t length;	/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_crypto_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_crypto_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_crypto_hw *hw);
+	void (*set_status)(struct virtio_crypto_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_crypto_hw *hw);
+	void (*set_features)(struct virtio_crypto_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_crypto_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_crypto_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_crypto_hw *hw,
+			struct virtqueue *vq, uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_crypto_hw *hw,
+			uint16_t queue_id);
+	int (*setup_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_crypto_hw {
+	/* control queue */
+	struct virtqueue *cvq;
+	uint16_t    dev_id;
+	uint16_t    max_dataqueues;
+	uint64_t    req_guest_features;
+	uint64_t    guest_features;
+	uint8_t	    use_msix;
+	uint8_t     modern;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	struct virtio_crypto_config *dev_cfg;
+	const struct rte_cryptodev_capabilities *virtio_dev_capabilities;
+};
+
+/*
+ * While virtio_crypto_hw is stored in shared memory, this structure holds
+ * per-process data that may differ between processes in the multi-process
+ * model, for example the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+	struct rte_pci_ioport io;
+};
+
+#define VTPCI_OPS(hw)	(virtio_hw_internal[(hw)->dev_id].vtpci_ops)
+#define VTPCI_IO(hw)	(&virtio_hw_internal[(hw)->dev_id].io)
+
+extern struct virtio_hw_internal virtio_hw_internal[RTE_MAX_VIRTIO_CRYPTO];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+vtpci_with_feature(struct virtio_crypto_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Declarations of the functions defined in virtio_pci.c
+ */
+int vtpci_cryptodev_init(struct rte_pci_device *dev,
+	struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_reset(struct virtio_crypto_hw *hw);
+
+void vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw);
+
+uint8_t vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status);
+
+uint64_t vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+	uint64_t host_features);
+
+void vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	const void *src, int length);
+
+void vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	void *dst, int length);
+
+uint8_t vtpci_cryptodev_isr(struct virtio_crypto_hw *hw);
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/crypto/virtio/virtio_ring.h b/drivers/crypto/virtio/virtio_ring.h
new file mode 100644
index 0000000..ee30674
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_ring.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers.
+ */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.
+ */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next".
+ */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	volatile uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline size_t
+vring_size(unsigned int num, unsigned long align)
+{
+	size_t size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other side, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
diff --git a/drivers/crypto/virtio/virtqueue.c b/drivers/crypto/virtio/virtqueue.c
new file mode 100644
index 0000000..fd8be58
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+#include <rte_malloc.h>
+
+#include "virtqueue.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
+	 * not to interrupt when it consumes packets
+	 * Note: this is only considered a hint to the host
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+void
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_crypto_op *cop = NULL;
+
+	int idx;
+
+	if (vq != NULL)
+		for (idx = 0; idx < vq->vq_nentries; idx++) {
+			cop = vq->vq_descx[idx].crypto_op;
+			if (cop) {
+				if (cop->sym->m_src)
+					rte_pktmbuf_free(cop->sym->m_src);
+				if (cop->sym->m_dst)
+					rte_pktmbuf_free(cop->sym->m_dst);
+				rte_crypto_op_free(cop);
+				vq->vq_descx[idx].crypto_op = NULL;
+			}
+		}
+}
diff --git a/drivers/crypto/virtio/virtqueue.h b/drivers/crypto/virtio/virtqueue.h
new file mode 100644
index 0000000..0a9bddb
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.h
@@ -0,0 +1,172 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+enum { VTCRYPTO_DATAQ = 0, VTCRYPTO_CTRLQ = 1 };
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void     *crypto_op;
+	void     *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	/**< virtio_crypto_hw structure pointer. */
+	struct virtio_crypto_hw *hw;
+	/**< mem zone to populate RX ring. */
+	const struct rte_memzone *mz;
+	/**< memzone to populate hdr and request. */
+	struct rte_mempool *mpool;
+	uint8_t     dev_id;              /**< Device identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring*/
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+
+	/* Statistics */
+	uint64_t	packets_sent_total;
+	uint64_t	packets_sent_failed;
+	uint64_t	packets_received_total;
+	uint64_t	packets_received_failed;
+
+	uint16_t  *notify_addr;
+
+	struct vq_desc_extra vq_descx[0];
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+
+/**
+ *  Get all mbufs to be freed.
+ */
+void virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) \
+	((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notification is through io port operation
+	 * which is a serialization instruction itself.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+/**
+ * Dump virtqueue internal structures, for debug purpose only.
+ */
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	VIRTIO_CRYPTO_INIT_LOG_DBG(\
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+
+#endif /* _VIRTQUEUE_H_ */
-- 
1.8.3.1

* Re: [dpdk-dev] [PATCH v4 2/5] vhost: support selective datapath
  @ 2018-03-31  6:10  3%     ` Maxime Coquelin
  2018-04-02  1:58  0%       ` Wang, Zhihong
  0 siblings, 1 reply; 200+ results
From: Maxime Coquelin @ 2018-03-31  6:10 UTC (permalink / raw)
  To: Zhihong Wang, dev
  Cc: jianfeng.tan, tiwei.bie, yliu, cunming.liang, xiao.w.wang, dan.daly



On 03/10/2018 11:01 AM, Zhihong Wang wrote:
> This patch set introduces support for selective datapath in DPDK vhost-user
> lib. vDPA stands for vhost Data Path Acceleration. The idea is to support
> virtio ring compatible devices to serve virtio driver directly to enable
> datapath acceleration.
> 
> A set of device ops is defined for device specific operations:
> 
>       a. queue_num_get: Called to get supported queue number of the device.
> 
>       b. feature_get: Called to get supported features of the device.
> 
>       c. protocol_feature_get: Called to get supported protocol features of
>          the device.
> 
>       d. dev_conf: Called to configure the actual device when the virtio
>          device becomes ready.
> 
>       e. dev_close: Called to close the actual device when the virtio device
>          is stopped.
> 
>       f. vring_state_set: Called to change the state of the vring in the
>          actual device when vring state changes.
> 
>       g. feature_set: Called to set the negotiated features to device.
> 
>       h. migration_done: Called to allow the device to respond to RARP
>          sending.
> 
>       i. get_vfio_group_fd: Called to get the VFIO group fd of the device.
> 
>       j. get_vfio_device_fd: Called to get the VFIO device fd of the device.
> 
>       k. get_notify_area: Called to get the notify area info of the queue.
> 
> Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> ---
> Changes in v4:
> 
>   1. Remove the "engine" concept in the lib.
> 
> ---
> Changes in v2:
> 
>   1. Add VFIO related vDPA device ops.
> 
>   lib/librte_vhost/Makefile              |  4 +-
>   lib/librte_vhost/rte_vdpa.h            | 94 +++++++++++++++++++++++++++++++++
>   lib/librte_vhost/rte_vhost_version.map |  6 +++
>   lib/librte_vhost/vdpa.c                | 96 ++++++++++++++++++++++++++++++++++
>   4 files changed, 198 insertions(+), 2 deletions(-)
>   create mode 100644 lib/librte_vhost/rte_vdpa.h
>   create mode 100644 lib/librte_vhost/vdpa.c
> 
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index 5d6c6abae..37044ac03 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -22,9 +22,9 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -lrte_ethdev -lrte_net
>   
>   # all source are stored in SRCS-y
>   SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
> -					vhost_user.c virtio_net.c
> +					vhost_user.c virtio_net.c vdpa.c
>   
>   # install includes
> -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vdpa.h
>   
>   include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
> new file mode 100644
> index 000000000..a4bbbd93d
> --- /dev/null
> +++ b/lib/librte_vhost/rte_vdpa.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#ifndef _RTE_VDPA_H_
> +#define _RTE_VDPA_H_
> +
> +/**
> + * @file
> + *
> + * Device specific vhost lib
> + */
> +
> +#include <rte_pci.h>
> +#include "rte_vhost.h"
> +
> +#define MAX_VDPA_NAME_LEN 128
> +
> +enum vdpa_addr_type {
> +	PCI_ADDR,
> +	VDPA_ADDR_MAX
> +};
> +
> +struct rte_vdpa_dev_addr {
> +	enum vdpa_addr_type type;
> +	union {
> +		uint8_t __dummy[64];
> +		struct rte_pci_addr pci_addr;
> +	};
> +};
> +
> +/* Get capabilities of this device */
> +typedef int (*vdpa_dev_queue_num_get_t)(int did, uint32_t *queue_num);
> +typedef int (*vdpa_dev_feature_get_t)(int did, uint64_t *features);
> +
> +/* Driver configure/close the device */
> +typedef int (*vdpa_dev_conf_t)(int vid);
> +typedef int (*vdpa_dev_close_t)(int vid);
> +
> +/* Enable/disable this vring */
> +typedef int (*vdpa_vring_state_set_t)(int vid, int vring, int state);
> +
> +/* Set features when changed */
> +typedef int (*vdpa_feature_set_t)(int vid);
> +
> +/* Destination operations when migration done */
> +typedef int (*vdpa_migration_done_t)(int vid);
> +
> +/* Get the vfio group fd */
> +typedef int (*vdpa_get_vfio_group_fd_t)(int vid);
> +
> +/* Get the vfio device fd */
> +typedef int (*vdpa_get_vfio_device_fd_t)(int vid);
> +
> +/* Get the notify area info of the queue */
> +typedef int (*vdpa_get_notify_area_t)(int vid, int qid, uint64_t *offset,
> +		uint64_t *size);
> +/* Device ops */
> +struct rte_vdpa_dev_ops {
> +	vdpa_dev_queue_num_get_t  queue_num_get;
> +	vdpa_dev_feature_get_t    feature_get;
> +	vdpa_dev_feature_get_t    protocol_feature_get;
> +	vdpa_dev_conf_t           dev_conf;
> +	vdpa_dev_close_t          dev_close;
> +	vdpa_vring_state_set_t    vring_state_set;
> +	vdpa_feature_set_t        feature_set;
> +	vdpa_migration_done_t     migration_done;
> +	vdpa_get_vfio_group_fd_t  get_vfio_group_fd;
> +	vdpa_get_vfio_device_fd_t get_vfio_device_fd;
> +	vdpa_get_notify_area_t    get_notify_area;

Maybe you could reserve some room here to avoid breaking the ABI in the
future if we need to add some optional ops.
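
One way this could look, as a minimal sketch only: the struct below is a
simplified stand-in for rte_vdpa_dev_ops, and the `reserved` array, its
slot count and every `example_` name are assumptions made for
illustration, not part of the patch. Spare function-pointer slots keep
the struct size and member offsets fixed, so a later release could turn
one slot into a real optional op without changing the layout that
already-built drivers rely on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot count; the right number is a judgment call. */
#define EXAMPLE_VDPA_RESERVED_OPS 5

typedef int (*example_queue_num_get_t)(int did, uint32_t *queue_num);
typedef int (*example_dev_conf_t)(int vid);

/* Simplified stand-in for struct rte_vdpa_dev_ops. */
struct example_vdpa_dev_ops {
	example_queue_num_get_t queue_num_get;
	example_dev_conf_t      dev_conf;
	/*
	 * Drivers leave these NULL. A future optional op can take over
	 * one slot without changing sizeof() or any existing offset.
	 */
	void (*reserved[EXAMPLE_VDPA_RESERVED_OPS])(void);
};

/* The library side treats a NULL slot as "op not provided". */
static int
example_op_is_set(const struct example_vdpa_dev_ops *ops, size_t slot)
{
	return slot < EXAMPLE_VDPA_RESERVED_OPS &&
		ops->reserved[slot] != NULL;
}
```

With this layout the library would have to check each optional op for
NULL before calling it, which is the usual cost of this approach.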

> +};
> +
> +struct rte_vdpa_device {
> +	struct rte_vdpa_dev_addr addr;
> +	struct rte_vdpa_dev_ops *ops;
> +} __rte_cache_aligned;
> +
> +extern struct rte_vdpa_device *vdpa_devices[];
> +extern uint32_t vdpa_device_num;
> +
> +/* Register a vdpa device, return did if successful, -1 on failure */
> +int __rte_experimental
> +rte_vdpa_register_device(struct rte_vdpa_dev_addr *addr,
> +		struct rte_vdpa_dev_ops *ops);
> +
> +/* Unregister a vdpa device, return -1 on failure */
> +int __rte_experimental
> +rte_vdpa_unregister_device(int did);
> +
> +/* Find did of a vdpa device, return -1 on failure */
> +int __rte_experimental
> +rte_vdpa_find_device_id(struct rte_vdpa_dev_addr *addr);
> +
> +#endif /* _RTE_VDPA_H_ */
> diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map
> index df0103129..7bcffb490 100644
> --- a/lib/librte_vhost/rte_vhost_version.map
> +++ b/lib/librte_vhost/rte_vhost_version.map
> @@ -59,3 +59,9 @@ DPDK_18.02 {
>   	rte_vhost_vring_call;
>   
>   } DPDK_17.08;
> +
> +EXPERIMENTAL {
> +	rte_vdpa_register_device;
> +	rte_vdpa_unregister_device;
> +	rte_vdpa_find_device_id;

I think you also need to declare the new structs here,
not only the new functions.

> +} DPDK_18.02;
> diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
> new file mode 100644
> index 000000000..0c950d45f
> --- /dev/null
> +++ b/lib/librte_vhost/vdpa.c
> @@ -0,0 +1,96 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +/**
> + * @file
> + *
> + * Device specific vhost lib
> + */
> +
> +#include <stdbool.h>
> +
> +#include <rte_malloc.h>
> +#include "rte_vdpa.h"
> +#include "vhost.h"
> +
> +struct rte_vdpa_device *vdpa_devices[MAX_VHOST_DEVICE];
> +uint32_t vdpa_device_num;
> +
> +static int is_same_vdpa_dev_addr(struct rte_vdpa_dev_addr *a,
> +		struct rte_vdpa_dev_addr *b)
> +{

Given the boolean nature of the function name, I would return 1 if same
device, 0 if different.
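
A rough sketch of the boolean-style variant, under stated assumptions:
the `example_` types are stand-ins for rte_pci_addr and
rte_vdpa_dev_addr so that the snippet compiles without DPDK headers, and
returning "not the same" for unknown address types is a choice made
here, whereas the patch as posted returns 0 in that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct rte_pci_addr; field names mirror the patch. */
struct example_pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

enum example_addr_type { EXAMPLE_PCI_ADDR, EXAMPLE_ADDR_MAX };

/* Stand-in for struct rte_vdpa_dev_addr (union omitted for brevity). */
struct example_dev_addr {
	enum example_addr_type type;
	struct example_pci_addr pci_addr;
};

/* Returns true when both addresses name the same device. */
static bool
example_is_same_dev_addr(const struct example_dev_addr *a,
		const struct example_dev_addr *b)
{
	if (a->type != b->type)
		return false;

	switch (a->type) {
	case EXAMPLE_PCI_ADDR:
		return a->pci_addr.domain == b->pci_addr.domain &&
			a->pci_addr.bus == b->pci_addr.bus &&
			a->pci_addr.devid == b->pci_addr.devid &&
			a->pci_addr.function == b->pci_addr.function;
	default:
		/* Unknown address types never match (assumption). */
		return false;
	}
}
```

A caller such as rte_vdpa_find_device_id() would then test the result
directly instead of comparing it against 0.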

> +	int ret = 0;
> +
> +	if (a->type != b->type)
> +		return -1;
> +
> +	switch (a->type) {
> +	case PCI_ADDR:
> +		if (a->pci_addr.domain != b->pci_addr.domain ||
> +				a->pci_addr.bus != b->pci_addr.bus ||
> +				a->pci_addr.devid != b->pci_addr.devid ||
> +				a->pci_addr.function != b->pci_addr.function)
> +			ret = -1;
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	return ret;
> +}
> +
> +int rte_vdpa_register_device(struct rte_vdpa_dev_addr *addr,
> +		struct rte_vdpa_dev_ops *ops)
> +{
> +	struct rte_vdpa_device *dev;
> +	char device_name[MAX_VDPA_NAME_LEN];
> +	int i;
> +
> +	if (vdpa_device_num >= MAX_VHOST_DEVICE)
> +		return -1;
> +
> +	for (i = 0; i < MAX_VHOST_DEVICE; i++) {
> +		if (vdpa_devices[i] == NULL)
> +			break;

You might want to check that the same device isn't being registered a
second time, and return an error in that case.

This is not a blocker though, and can be done in a dedicated patch.
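
For illustration, the duplicate check could be folded into the same loop
that looks for a free slot. This is a self-contained sketch, not the
library code: the table, the packed `bdf` key and all `example_` names
are hypothetical stand-ins for vdpa_devices[] and rte_vdpa_dev_addr.

```c
#include <stdint.h>

#define EXAMPLE_MAX_DEVICES 4

/* Simplified device record; bdf stands in for the PCI address. */
struct example_device {
	int in_use;
	uint32_t bdf;
};

static struct example_device example_devices[EXAMPLE_MAX_DEVICES];

/*
 * Returns the slot index on success, or -1 if the table is full or
 * the same address has already been registered.
 */
static int
example_register_device(uint32_t bdf)
{
	int i, free_slot = -1;

	for (i = 0; i < EXAMPLE_MAX_DEVICES; i++) {
		if (!example_devices[i].in_use) {
			/* Remember the first free slot, keep scanning. */
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (example_devices[i].bdf == bdf)
			return -1; /* duplicate registration */
	}

	if (free_slot < 0)
		return -1; /* table full */

	example_devices[free_slot].bdf = bdf;
	example_devices[free_slot].in_use = 1;
	return free_slot;
}
```

The loop has to scan the whole table rather than stop at the first free
slot, because an earlier unregister can leave a hole in front of a
matching entry.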

> +	}
> +
> +	sprintf(device_name, "vdpa-dev-%d", i);
> +	dev = rte_zmalloc(device_name, sizeof(struct rte_vdpa_device),
> +			RTE_CACHE_LINE_SIZE);
> +	if (!dev)
> +		return -1;
> +
> +	memcpy(&dev->addr, addr, sizeof(struct rte_vdpa_dev_addr));
> +	dev->ops = ops;
> +	vdpa_devices[i] = dev;
> +	vdpa_device_num++;
> +
> +	return i;
> +}
> +
> +int rte_vdpa_unregister_device(int did)
> +{
> +	if (did < 0 || did >= MAX_VHOST_DEVICE || vdpa_devices[did] == NULL)
> +		return -1;
> +
> +	rte_free(vdpa_devices[did]);
> +	vdpa_devices[did] = NULL;
> +	vdpa_device_num--;
> +
> +	return did;
> +}
> +
> +int rte_vdpa_find_device_id(struct rte_vdpa_dev_addr *addr)
> +{
> +	struct rte_vdpa_device *dev;
> +	int i;
> +
> +	for (i = 0; i < MAX_VHOST_DEVICE; ++i) {
> +		dev = vdpa_devices[i];
> +		if (dev && is_same_vdpa_dev_addr(&dev->addr, addr) == 0)
> +			return i;
> +	}
> +
> +	return -1;
> +}
> 

* Re: [dpdk-dev] [PATCH v2 0/4] ethdev: add per-PMD tuning of RxTx parmeters
  2018-03-30 10:34  0%     ` Ferruh Yigit
@ 2018-03-31  0:05  0%       ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-03-31  0:05 UTC (permalink / raw)
  To: Ferruh Yigit, Remy Horton
  Cc: dev, John McNamara, Wenzhuo Lu, Jingjing Wu, Qi Zhang,
	Beilei Xing, Shreyansh Jain

30/03/2018 12:34, Ferruh Yigit:
> On 3/27/2018 7:43 PM, Ferruh Yigit wrote:
> > On 3/21/2018 2:27 PM, Remy Horton wrote:
> >> The optimal values of several transmission & reception related parameters,
> >> such as burst sizes, descriptor ring sizes, and number of queues, vary
> >> between different network interface devices. This patchset allows individual
> >> PMDs to specify their preferred parameter values, and if so indicated by an
> >> application, for them to be used automatically by the ethdev layer.
> >>
> >> rte_eth_dev_configure() has been changed so that specifying zero for both
> >> nb_rx_q AND nb_tx_q causes it to use driver preferred values, and if these
> >> are not available, falls back to EAL defaults. Setting one (but not both)
> >> to zero does not cause the use of defaults, as having one of them zeroed is
> >> a valid setup.
> >>
> >> This RFC/V1 includes per-PMD values for e1000 and i40e but it is expected
> >> that subsequent patchsets will cover other PMDs. A deprecation notice
> >> covering the API/ABI change is in place.
> >>
> >>
> >> Changes in v2:
> >> * Rebased to 
> >> * Removed fallback values from rte_eth_dev_info_get()
> >> * Added fallback values to rte_eth_[rt]x_queue_setup()
> >> * Added fallback values to rte_eth_dev_configure()
> >> * Corrected comment
> >> * Removed deprecation notice
> >> * Split Rx and Tx into separate structures
> >> * Changed parameter names
> >>
> >>
> >> Remy Horton (4):
> >>   ethdev: add support for PMD-tuned Tx/Rx parameters
> >>   net/e1000: add TxRx tuning parameters
> >>   net/i40e: add TxRx tuning parameters
> >>   testpmd: make use of per-PMD TxRx parameters
> > 
> > Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> Series applied to dpdk-next-net/master, thanks.

I would prefer not to pull this series into master, and to give a chance
for a more complete v3 covering testpmd and examples.

* [dpdk-dev] [PATCH v2 2/7] bpf: add BPF loading and execution framework
    2018-03-09 16:42  2% ` [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
@ 2018-03-30 17:32  2% ` Konstantin Ananyev
  1 sibling, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-03-30 17:32 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space DPDK-based applications.
It supports a basic set of features from the eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).

Features not currently supported:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb

It also adds a dependency on libelf.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  59 +++++
 lib/librte_bpf/bpf_exec.c          | 452 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  41 ++++
 lib/librte_bpf/bpf_load.c          | 385 +++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/meson.build         |  18 ++
 lib/librte_bpf/rte_bpf.h           | 160 +++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 lib/meson.build                    |   2 +-
 mk/rte.app.mk                      |   2 +
 13 files changed, 1222 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index ee10b449b..97b60f9ff 100644
--- a/config/common_base
+++ b/config/common_base
@@ -827,3 +827,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..e0f434e77
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..d7f68c017
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+int rte_bpf_logtype;
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+	if (rc != 0)
+		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
+
+RTE_INIT(rte_bpf_init_log);
+
+static void
+rte_bpf_init_log(void)
+{
+	rte_bpf_logtype = rte_log_register("lib.bpf");
+	if (rte_bpf_logtype >= 0)
+		rte_log_set_level(rte_bpf_logtype, RTE_LOG_INFO);
+}
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..0382ade98
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,452 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define BPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define BPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_BPF_LOG(ERR, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG])
+{
+	const struct bpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3],
+				reg[BPF_REG_4], reg[BPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			return reg[BPF_REG_0];
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[MAX_BPF_REG];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[BPF_REG_1] = (uintptr_t)ctx[i];
+		reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..5d7e65c31
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+extern int rte_bpf_logtype;
+
+#define	RTE_BPF_LOG(lvl, fmt, args...) \
+	rte_log(RTE_LOG_## lvl, rte_bpf_logtype, fmt, ##args)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..e1ff5714a
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,385 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+/* Older libelf headers may not define EM_BPF */
+#ifndef EM_BPF
+#define	EM_BPF	247
+#endif
+
+static uint32_t
+bpf_find_xsym(const char *sn, enum rte_bpf_xtype type,
+	const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == type && strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+/*
+ * update BPF code at offset *ofs* with a proper address(index) for external
+ * symbol *sn*
+ */
+static int
+resolve_xsym(const char *sn, size_t ofs, struct bpf_insn *ins, size_t ins_sz,
+	const struct rte_bpf_prm *prm)
+{
+	uint32_t idx, fidx;
+	enum rte_bpf_xtype type;
+
+	if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+		return -EINVAL;
+
+	idx = ofs / sizeof(ins[0]);
+	if (ins[idx].code == (BPF_JMP | BPF_CALL))
+		type = RTE_BPF_XTYPE_FUNC;
+	else if (ins[idx].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			ofs < ins_sz - sizeof(ins[idx]))
+		type = RTE_BPF_XTYPE_VAR;
+	else
+		return -EINVAL;
+
+	fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym);
+	if (fidx == UINT32_MAX)
+		return -ENOENT;
+
+	/* for function we just need an index in our xsym table */
+	if (type == RTE_BPF_XTYPE_FUNC)
+		ins[idx].imm = fidx;
+	/* for variable we need to store its absolute address */
+	else {
+		ins[idx].imm = (uintptr_t)prm->xsym[fidx].var;
+		ins[idx + 1].imm = (uintptr_t)prm->xsym[fidx].var >> 32;
+	}
+
+	return 0;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr * eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_BPF_LOG(ERR, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct bpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	int32_t rc;
+	uint32_t i, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		rc = resolve_xsym(sn, ofs, ins, ins_sz, prm);
+		if (rc != 0) {
+			RTE_BPF_LOG(ERR,
+				"resolve_xsym(%s, %zu) error code: %d\n",
+				sn, ofs, rc);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct bpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_BPF_LOG(ERR, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_BPF_LOG(INFO, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p(jit={.func=%p,.sz=%zu});\n",
+		__func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..1911e1381
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * Placeholder for now: only checks the stack size used, needs more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct bpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == BPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
new file mode 100644
index 000000000..05c48c7ff
--- /dev/null
+++ b/lib/librte_bpf/meson.build
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+allow_experimental_apis = true
+sources = files('bpf.c',
+		'bpf_exec.c',
+		'bpf_load.c',
+		'bpf_validate.c')
+
+install_headers = files('rte_bpf.h')
+
+deps += ['mbuf', 'net']
+
+dep = dependency('libelf', required: false)
+if dep.found() == false
+	build = false
+endif
+ext_deps += dep
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..4d4b93599
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,160 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <rte_bpf_def.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_VAR, /**< variable */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	union {
+		uint64_t (*func)(uint64_t, uint64_t, uint64_t,
+				uint64_t, uint64_t);
+		void *var;
+	}; /**< value */
+};
+
+/**
+ * Possible BPF program types.
+ * Use negative values for DPDK specific prog-types, to make sure they will
+ * not interfere with Linux related ones.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF = INT32_MIN,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about eBPF code compiled into the native ISA.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *);
+	size_t sz;
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *  Pathname for an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compiled code for given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index ef6159170..7ff7aaaa5 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -23,7 +23,7 @@ libraries = [ 'compat', # just a header, used for versioning
 	# add pkt framework libs which use other libs from above
 	'port', 'table', 'pipeline',
 	# flow_classify lib depends on pkt framework table lib
-	'flow_classify']
+	'flow_classify', 'bpf']
 
 foreach l:libraries
 	build = true
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 94525dc80..07a9bcfe2 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6
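
A note on the variable relocation in resolve_xsym() above: a 64-bit address
does not fit into a single 32-bit imm field, so the patch writes the low half
into ins[idx].imm and the high half into ins[idx + 1].imm. The block below is
only a minimal standalone sketch of that split, assuming a simplified
instruction layout; struct sketch_insn and both sketch_* helpers are
hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bpf_insn; only 'imm' matters here. */
struct sketch_insn {
	uint8_t code;
	int32_t imm;
};

/* Store a 64-bit value across two consecutive instruction slots. */
static void
sketch_store_imm64(struct sketch_insn ins[2], uint64_t val)
{
	assert(ins != NULL);
	ins[0].imm = (int32_t)(uint32_t)val;         /* low 32 bits */
	ins[1].imm = (int32_t)(uint32_t)(val >> 32); /* high 32 bits */
}

/* Recombine the two halves; mask each to 32 bits to undo sign extension. */
static uint64_t
sketch_load_imm64(const struct sketch_insn ins[2])
{
	assert(ins != NULL);
	return (uint64_t)(uint32_t)ins[0].imm |
		((uint64_t)(uint32_t)ins[1].imm << 32);
}
```

Storing 0x1122334455667788 this way leaves 0x55667788 in the first slot and
0x11223344 in the second. The load side casts each half back to uint32_t
before recombining, so a negative low half cannot leak into the upper bits.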

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v2 0/4] ethdev: add per-PMD tuning of RxTx parameters
  2018-03-27 18:43  0%   ` Ferruh Yigit
@ 2018-03-30 10:34  0%     ` Ferruh Yigit
  2018-03-31  0:05  0%       ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-30 10:34 UTC (permalink / raw)
  To: Remy Horton, dev
  Cc: John McNamara, Wenzhuo Lu, Jingjing Wu, Qi Zhang, Beilei Xing,
	Shreyansh Jain, Thomas Monjalon

On 3/27/2018 7:43 PM, Ferruh Yigit wrote:
> On 3/21/2018 2:27 PM, Remy Horton wrote:
>> The optimal values of several transmission & reception related parameters,
>> such as burst sizes, descriptor ring sizes, and number of queues, vary
>> between different network interface devices. This patchset allows individual
>> PMDs to specify their preferred parameter values, and if so indicated by an
>> application, for them to be used automatically by the ethdev layer.
>>
>> rte_eth_dev_configure() has been changed so that specifying zero for both
>> nb_rx_q AND nb_tx_q causes it to use driver preferred values, and if these
>> are not available, falls back to EAL defaults. Setting one (but not both)
>> to zero does not cause the use of defaults, as having one of them zeroed is
>> a valid setup.
>>
>> This RFC/V1 includes per-PMD values for e1000 and i40e but it is expected
>> that subsequent patchsets will cover other PMDs. A deprecation notice
>> covering the API/ABI change is in place.
>>
>>
>> Changes in v2:
>> * Rebased to 
>> * Removed fallback values from rte_eth_dev_info_get()
>> * Added fallback values to rte_eth_[rt]x_queue_setup()
>> * Added fallback values to rte_eth_dev_configure()
>> * Corrected comment
>> * Removed deprecation notice
>> * Split Rx and Tx into separate structures
>> * Changed parameter names
>>
>>
>> Remy Horton (4):
>>   ethdev: add support for PMD-tuned Tx/Rx parameters
>>   net/e1000: add TxRx tuning parameters
>>   net/i40e: add TxRx tuning parameters
>>   testpmd: make use of per-PMD TxRx parameters
> 
> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Series applied to dpdk-next-net/master, thanks.
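
The fallback rule described in the quoted cover letter can be sketched as
follows. This is an illustration only, not the ethdev code:
sketch_pick_nb_queues(), SKETCH_FALLBACK_NB_QUEUES and the convention that a
preferred value of 0 means "the driver has no preference" are all assumptions
made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library-wide default used when the driver has no preference. */
#define SKETCH_FALLBACK_NB_QUEUES 1

/*
 * Apply defaults only when BOTH requested counts are zero.
 * A single zero is treated as a valid setup and left untouched.
 */
static void
sketch_pick_nb_queues(uint16_t *nb_rx_q, uint16_t *nb_tx_q,
	uint16_t pref_rx, uint16_t pref_tx)
{
	assert(nb_rx_q != NULL && nb_tx_q != NULL);

	if (*nb_rx_q != 0 || *nb_tx_q != 0)
		return;

	/* Driver-preferred value first, fixed fallback otherwise. */
	*nb_rx_q = (pref_rx != 0) ? pref_rx : SKETCH_FALLBACK_NB_QUEUES;
	*nb_tx_q = (pref_tx != 0) ? pref_tx : SKETCH_FALLBACK_NB_QUEUES;
}
```

With both counts zero and driver preferences of 4 and 2, the sketch picks 4 Rx
and 2 Tx queues; with no driver preference it picks the fallback for each; a
request of 0 Rx and 3 Tx queues passes through unchanged.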

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v8 3/9] eventtimer: add common code
  @ 2018-03-29 21:27  3%     ` Erik Gabriel Carrillo
    1 sibling, 0 replies; 200+ results
From: Erik Gabriel Carrillo @ 2018-03-29 21:27 UTC (permalink / raw)
  To: pbhagavatula; +Cc: dev, jerin.jacob, nipun.gupta, hemant.agrawal

This commit adds the logic that is shared by all event timer adapter
drivers; the common code handles instance allocation and some
initialization.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 config/common_base                                |   1 +
 drivers/event/sw/sw_evdev.c                       |  18 +
 lib/librte_eventdev/Makefile                      |   2 +
 lib/librte_eventdev/rte_event_timer_adapter.c     | 387 ++++++++++++++++++++++
 lib/librte_eventdev/rte_event_timer_adapter_pmd.h | 114 +++++++
 lib/librte_eventdev/rte_eventdev.c                |  22 ++
 lib/librte_eventdev/rte_eventdev.h                |  20 ++
 lib/librte_eventdev/rte_eventdev_pmd.h            |  35 ++
 lib/librte_eventdev/rte_eventdev_version.map      |  21 +-
 9 files changed, 619 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter.c
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter_pmd.h

diff --git a/config/common_base b/config/common_base
index ee10b44..accc6f5 100644
--- a/config/common_base
+++ b/config/common_base
@@ -550,6 +550,7 @@ CONFIG_RTE_LIBRTE_EVENTDEV=y
 CONFIG_RTE_LIBRTE_EVENTDEV_DEBUG=n
 CONFIG_RTE_EVENT_MAX_DEVS=16
 CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
+CONFIG_RTE_EVENT_TIMER_ADAPTER_NUM_MAX=32
 
 #
 # Compile PMD for skeleton event device
diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 6672fd8..0847547 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -464,6 +464,22 @@ sw_eth_rx_adapter_caps_get(const struct rte_eventdev *dev,
 	return 0;
 }
 
+static int
+sw_timer_adapter_caps_get(const struct rte_eventdev *dev,
+			  uint64_t flags,
+			  uint32_t *caps,
+			  const struct rte_event_timer_adapter_ops **ops)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(flags);
+	*caps = 0;
+
+	/* Use default SW ops */
+	*ops = NULL;
+
+	return 0;
+}
+
 static void
 sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info)
 {
@@ -791,6 +807,8 @@ sw_probe(struct rte_vdev_device *vdev)
 
 			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
 
+			.timer_adapter_caps_get = sw_timer_adapter_caps_get,
+
 			.xstats_get = sw_xstats_get,
 			.xstats_get_names = sw_xstats_get_names,
 			.xstats_get_by_name = sw_xstats_get_by_name,
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 549b182..8b16e3f 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -20,6 +20,7 @@ LDLIBS += -lrte_eal -lrte_ring -lrte_ethdev -lrte_hash
 SRCS-y += rte_eventdev.c
 SRCS-y += rte_event_ring.c
 SRCS-y += rte_event_eth_rx_adapter.c
+SRCS-y += rte_event_timer_adapter.c
 
 # export include files
 SYMLINK-y-include += rte_eventdev.h
@@ -29,6 +30,7 @@ SYMLINK-y-include += rte_eventdev_pmd_vdev.h
 SYMLINK-y-include += rte_event_ring.h
 SYMLINK-y-include += rte_event_eth_rx_adapter.h
 SYMLINK-y-include += rte_event_timer_adapter.h
+SYMLINK-y-include += rte_event_timer_adapter_pmd.h
 
 # versioning export map
 EXPORT_MAP := rte_eventdev_version.map
diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
new file mode 100644
index 0000000..75a14ac
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -0,0 +1,387 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#include <string.h>
+#include <inttypes.h>
+
+#include <rte_memzone.h>
+#include <rte_memory.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+
+#include "rte_eventdev.h"
+#include "rte_eventdev_pmd.h"
+#include "rte_event_timer_adapter.h"
+#include "rte_event_timer_adapter_pmd.h"
+
+#define DATA_MZ_NAME_MAX_LEN 64
+#define DATA_MZ_NAME_FORMAT "rte_event_timer_adapter_data_%d"
+
+static int evtim_logtype;
+
+static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
+
+#define EVTIM_LOG(level, logtype, ...) \
+	rte_log(RTE_LOG_ ## level, logtype, \
+		RTE_FMT("EVTIMER: %s() line %u: " RTE_FMT_HEAD(__VA_ARGS__,) \
+			"\n", __func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__,)))
+
+#define EVTIM_LOG_ERR(...) EVTIM_LOG(ERR, evtim_logtype, __VA_ARGS__)
+
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+#define EVTIM_LOG_DBG(...) \
+	EVTIM_LOG(DEBUG, evtim_logtype, __VA_ARGS__)
+#else
+#define EVTIM_LOG_DBG(...) (void)0
+#endif
+
+static int
+default_port_conf_cb(uint16_t id, uint8_t event_dev_id, uint8_t *event_port_id,
+		     void *conf_arg)
+{
+	struct rte_event_timer_adapter *adapter;
+	struct rte_eventdev *dev;
+	struct rte_event_dev_config dev_conf;
+	struct rte_event_port_conf *port_conf, def_port_conf = {0};
+	int started;
+	uint8_t port_id;
+	uint8_t dev_id;
+	int ret;
+
+	RTE_SET_USED(event_dev_id);
+
+	adapter = &adapters[id];
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+	dev_id = dev->data->dev_id;
+	dev_conf = dev->data->dev_conf;
+
+	started = dev->data->dev_started;
+	if (started)
+		rte_event_dev_stop(dev_id);
+
+	port_id = dev_conf.nb_event_ports;
+	dev_conf.nb_event_ports += 1;
+	ret = rte_event_dev_configure(dev_id, &dev_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to configure event dev %u\n", dev_id);
+		if (started)
+			if (rte_event_dev_start(dev_id))
+				return -EIO;
+
+		return ret;
+	}
+
+	if (conf_arg != NULL)
+		port_conf = conf_arg;
+	else {
+		port_conf = &def_port_conf;
+		ret = rte_event_port_default_conf_get(dev_id, port_id,
+						      port_conf);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = rte_event_port_setup(dev_id, port_id, port_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to setup event port %u on event dev %u\n",
+			      port_id, dev_id);
+		return ret;
+	}
+
+	*event_port_id = port_id;
+
+	if (started)
+		ret = rte_event_dev_start(dev_id);
+
+	return ret;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create(const struct rte_event_timer_adapter_conf *conf)
+{
+	return rte_event_timer_adapter_create_ext(conf, default_port_conf_cb,
+						  NULL);
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create_ext(
+		const struct rte_event_timer_adapter_conf *conf,
+		rte_event_timer_adapter_port_conf_cb_t conf_cb,
+		void *conf_arg)
+{
+	uint16_t adapter_id;
+	struct rte_event_timer_adapter *adapter;
+	const struct rte_memzone *mz;
+	char mz_name[DATA_MZ_NAME_MAX_LEN];
+	int n, ret;
+	struct rte_eventdev *dev;
+
+	if (conf == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check eventdev ID */
+	if (!rte_event_pmd_is_valid_dev(conf->event_dev_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	dev = &rte_eventdevs[conf->event_dev_id];
+
+	adapter_id = conf->timer_adapter_id;
+
+	/* Check that adapter_id is in range */
+	if (adapter_id >= RTE_EVENT_TIMER_ADAPTER_NUM_MAX) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check adapter ID not already allocated */
+	adapter = &adapters[adapter_id];
+	if (adapter->allocated) {
+		rte_errno = EEXIST;
+		return NULL;
+	}
+
+	/* Create shared data area. */
+	n = snprintf(mz_name, sizeof(mz_name), DATA_MZ_NAME_FORMAT, adapter_id);
+	if (n >= (int)sizeof(mz_name)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	mz = rte_memzone_reserve(mz_name,
+				 sizeof(struct rte_event_timer_adapter_data),
+				 conf->socket_id, 0);
+	if (mz == NULL)
+		/* rte_errno set by rte_memzone_reserve */
+		return NULL;
+
+	adapter->data = mz->addr;
+	memset(adapter->data, 0, sizeof(struct rte_event_timer_adapter_data));
+
+	adapter->data->mz = mz;
+	adapter->data->event_dev_id = conf->event_dev_id;
+	adapter->data->id = adapter_id;
+	adapter->data->socket_id = conf->socket_id;
+	adapter->data->conf = *conf;  /* copy conf structure */
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = -ret;
+		goto free_memzone;
+	}
+
+	if (!(adapter->data->caps &
+	      RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT)) {
+		FUNC_PTR_OR_NULL_RET_WITH_ERRNO(conf_cb, -EINVAL);
+		ret = conf_cb(adapter->data->id, adapter->data->event_dev_id,
+			      &adapter->data->event_port_id, conf_arg);
+		if (ret < 0) {
+			rte_errno = -ret;
+			goto free_memzone;
+		}
+	}
+
+	/* Allow driver to do some setup */
+	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, -ENOTSUP);
+	ret = adapter->ops->init(adapter);
+	if (ret < 0) {
+		rte_errno = -ret;
+		goto free_memzone;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+
+free_memzone:
+	rte_memzone_free(adapter->data->mz);
+	return NULL;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->ops->get_info)
+		/* let driver set values it knows */
+		adapter->ops->get_info(adapter, adapter_info);
+
+	/* Set common values */
+	adapter_info->conf = adapter->data->conf;
+	adapter_info->event_dev_port_id = adapter->data->event_port_id;
+	adapter_info->caps = adapter->data->caps;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->start, -EINVAL);
+
+	ret = adapter->ops->start(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 1;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stop, -EINVAL);
+
+	if (adapter->data->started == 0) {
+		EVTIM_LOG_ERR("event timer adapter %"PRIu8" already stopped",
+			      adapter->data->id);
+		return 0;
+	}
+
+	ret = adapter->ops->stop(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 0;
+
+	return 0;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_lookup(uint16_t adapter_id)
+{
+	char name[DATA_MZ_NAME_MAX_LEN];
+	const struct rte_memzone *mz;
+	struct rte_event_timer_adapter_data *data;
+	struct rte_event_timer_adapter *adapter;
+	int ret;
+	struct rte_eventdev *dev;
+
+	if (adapters[adapter_id].allocated)
+		return &adapters[adapter_id]; /* Adapter is already loaded */
+
+	snprintf(name, DATA_MZ_NAME_MAX_LEN, DATA_MZ_NAME_FORMAT, adapter_id);
+	mz = rte_memzone_lookup(name);
+	if (mz == NULL) {
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	data = mz->addr;
+
+	adapter = &adapters[data->id];
+	adapter->data = data;
+
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_free(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->uninit, -EINVAL);
+
+	if (adapter->data->started == 1) {
+		EVTIM_LOG_ERR("event timer adapter %"PRIu8" must be stopped "
+			      "before freeing", adapter->data->id);
+		return -EBUSY;
+	}
+
+	/* free impl priv data */
+	ret = adapter->ops->uninit(adapter);
+	if (ret < 0)
+		return ret;
+
+	/* free shared data area */
+	ret = rte_memzone_free(adapter->data->mz);
+	if (ret < 0)
+		return ret;
+
+	adapter->data = NULL;
+	adapter->allocated = 0;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_service_id_get(struct rte_event_timer_adapter *adapter,
+				       uint32_t *service_id)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->data->service_inited && service_id != NULL)
+		*service_id = adapter->data->service_id;
+
+	return adapter->data->service_inited ? 0 : -ESRCH;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_get(struct rte_event_timer_adapter *adapter,
+				  struct rte_event_timer_adapter_stats *stats)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_get, -EINVAL);
+	if (stats == NULL)
+		return -EINVAL;
+
+	return adapter->ops->stats_get(adapter, stats);
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_reset, -EINVAL);
+	return adapter->ops->stats_reset(adapter);
+}
+
+RTE_INIT(event_timer_adapter_init_log);
+static void
+event_timer_adapter_init_log(void)
+{
+	evtim_logtype = rte_log_register("lib.eventdev.adapter.timer");
+	if (evtim_logtype >= 0)
+		rte_log_set_level(evtim_logtype, RTE_LOG_NOTICE);
+}
diff --git a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
new file mode 100644
index 0000000..cf3509d
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#ifndef __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+#define __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+
+/**
+ * @file
+ * RTE Event Timer Adapter API (PMD Side)
+ *
+ * @note
+ * This file provides implementation helpers for internal use by PMDs.  They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_event_timer_adapter.h"
+
+/*
+ * Definitions of functions exported by an event timer adapter implementation
+ * through *rte_event_timer_adapter_ops* structure supplied in the
+ * *rte_event_timer_adapter* structure associated with an event timer adapter.
+ */
+
+typedef int (*rte_event_timer_adapter_init_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation setup */
+typedef int (*rte_event_timer_adapter_uninit_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation teardown */
+typedef int (*rte_event_timer_adapter_start_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Start running event timer adapter */
+typedef int (*rte_event_timer_adapter_stop_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Stop running event timer adapter */
+typedef void (*rte_event_timer_adapter_get_info_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info);
+/**< @internal Get contextual information for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_get_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats);
+/**< @internal Get statistics for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_reset_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Reset statistics for event timer adapter */
+
+/**
+ * @internal Structure containing the functions exported by an event timer
+ * adapter implementation.
+ */
+struct rte_event_timer_adapter_ops {
+	rte_event_timer_adapter_init_t		init;  /**< Set up adapter */
+	rte_event_timer_adapter_uninit_t	uninit;/**< Tear down adapter */
+	rte_event_timer_adapter_start_t		start; /**< Start adapter */
+	rte_event_timer_adapter_stop_t		stop;  /**< Stop adapter */
+	rte_event_timer_adapter_get_info_t	get_info;
+	/**< Get info from driver */
+	rte_event_timer_adapter_stats_get_t	stats_get;
+	/**< Get adapter statistics */
+	rte_event_timer_adapter_stats_reset_t	stats_reset;
+	/**< Reset adapter statistics */
+	rte_event_timer_arm_burst_t		arm_burst;
+	/**< Arm one or more event timers */
+	rte_event_timer_arm_tmo_tick_burst_t	arm_tmo_tick_burst;
+	/**< Arm event timers with same expiration time */
+	rte_event_timer_cancel_burst_t		cancel_burst;
+	/**< Cancel one or more event timers */
+};
+
+/**
+ * @internal Adapter data; structure to be placed in shared memory to be
+ * accessible by various processes in a multi-process configuration.
+ */
+struct rte_event_timer_adapter_data {
+	uint8_t id;
+	/**< Event timer adapter ID */
+	uint8_t event_dev_id;
+	/**< Event device ID */
+	uint32_t socket_id;
+	/**< Socket ID where memory is allocated */
+	uint8_t event_port_id;
+	/**< Optional: event port ID used when the inbuilt port is absent */
+	const struct rte_memzone *mz;
+	/**< Event timer adapter memzone pointer */
+	struct rte_event_timer_adapter_conf conf;
+	/**< Configuration used to configure the adapter. */
+	uint32_t caps;
+	/**< Adapter capabilities */
+	void *adapter_priv;
+	/**< Timer adapter private data*/
+	uint8_t service_inited;
+	/**< Service initialization state */
+	uint32_t service_id;
+	/**< Service ID*/
+
+	RTE_STD_C11
+	uint8_t started : 1;
+	/**< Flag to indicate adapter started. */
+} __rte_cache_aligned;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __RTE_EVENT_TIMER_ADAPTER_PMD_H__ */
diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
index 851a119..eb3c601 100644
--- a/lib/librte_eventdev/rte_eventdev.c
+++ b/lib/librte_eventdev/rte_eventdev.c
@@ -123,6 +123,28 @@ rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
 				: 0;
 }
 
+int __rte_experimental
+rte_event_timer_adapter_caps_get(uint8_t dev_id, uint32_t *caps)
+{
+	struct rte_eventdev *dev;
+	const struct rte_event_timer_adapter_ops *ops;
+
+	RTE_EVENTDEV_VALID_DEVID_OR_ERR_RET(dev_id, -EINVAL);
+
+	dev = &rte_eventdevs[dev_id];
+
+	if (caps == NULL)
+		return -EINVAL;
+	*caps = 0;
+
+	return dev->dev_ops->timer_adapter_caps_get ?
+				(*dev->dev_ops->timer_adapter_caps_get)(dev,
+									0,
+									caps,
+									&ops)
+				: 0;
+}
+
 static inline int
 rte_event_dev_queue_config(struct rte_eventdev *dev, uint8_t nb_queues)
 {
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index f9ad71e..77fb693 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -215,6 +215,7 @@ extern "C" {
 #include <rte_config.h>
 #include <rte_memory.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 
 struct rte_mbuf; /* we just use mbuf pointers; no need to include rte_mbuf.h */
 
@@ -1069,6 +1070,25 @@ int
 rte_event_eth_rx_adapter_caps_get(uint8_t dev_id, uint8_t eth_port_id,
 				uint32_t *caps);
 
+#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 0)
+/**< This flag is set when the timer mechanism is in HW. */
+
+/**
+ * Retrieve the event device's timer adapter capabilities.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @param[out] caps
+ *   A pointer to memory to be filled with event timer adapter capabilities.
+ *
+ * @return
+ *   - 0: Success, driver provided event timer adapter capabilities.
+ *   - <0: Error code returned by the driver function.
+ */
+int __rte_experimental
+rte_event_timer_adapter_caps_get(uint8_t dev_id, uint32_t *caps);
+
 struct rte_eventdev_driver;
 struct rte_eventdev_ops;
 struct rte_eventdev;
diff --git a/lib/librte_eventdev/rte_eventdev_pmd.h b/lib/librte_eventdev/rte_eventdev_pmd.h
index 31343b5..0e37f1c 100644
--- a/lib/librte_eventdev/rte_eventdev_pmd.h
+++ b/lib/librte_eventdev/rte_eventdev_pmd.h
@@ -26,6 +26,7 @@ extern "C" {
 #include <rte_malloc.h>
 
 #include "rte_eventdev.h"
+#include "rte_event_timer_adapter_pmd.h"
 
 /* Logging Macros */
 #define RTE_EDEV_LOG_ERR(...) \
@@ -449,6 +450,37 @@ typedef int (*eventdev_eth_rx_adapter_caps_get_t)
 struct rte_event_eth_rx_adapter_queue_conf *queue_conf;
 
 /**
+ * Retrieve the event device's timer adapter capabilities, as well as the ops
+ * structure that an event timer adapter should call through to enter the
+ * driver
+ *
+ * @param dev
+ *   Event device pointer
+ *
+ * @param flags
+ *   Flags that can be used to determine how to select an event timer
+ *   adapter ops structure
+ *
+ * @param[out] caps
+ *   A pointer to memory filled with event timer adapter capabilities.
+ *
+ * @param[out] ops
+ *   A pointer to the ops pointer to set with the address of the desired ops
+ *   structure
+ *
+ * @return
+ *   - 0: Success, driver provides event timer adapter capabilities for the
+ *	event device.
+ *   - <0: Error code returned by the driver function.
+ *
+ */
+typedef int (*eventdev_timer_adapter_caps_get_t)(
+				const struct rte_eventdev *dev,
+				uint64_t flags,
+				uint32_t *caps,
+				const struct rte_event_timer_adapter_ops **ops);
+
+/**
  * Add ethernet Rx queues to event device. This callback is invoked if
  * the caps returned from rte_eventdev_eth_rx_adapter_caps_get(, eth_port_id)
  * has RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT set.
@@ -640,6 +672,9 @@ struct rte_eventdev_ops {
 	eventdev_eth_rx_adapter_stats_reset eth_rx_adapter_stats_reset;
 	/**< Reset ethernet Rx stats */
 
+	eventdev_timer_adapter_caps_get_t timer_adapter_caps_get;
+	/**< Get timer adapter capabilities */
+
 	eventdev_selftest dev_selftest;
 	/**< Start eventdev Selftest */
 };
diff --git a/lib/librte_eventdev/rte_eventdev_version.map b/lib/librte_eventdev/rte_eventdev_version.map
index 2aef470..537afb8 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventdev/rte_eventdev_version.map
@@ -66,7 +66,6 @@ DPDK_17.11 {
 	rte_event_eth_rx_adapter_stats_get;
 	rte_event_eth_rx_adapter_stats_reset;
 	rte_event_eth_rx_adapter_stop;
-
 } DPDK_17.08;
 
 DPDK_18.02 {
@@ -74,3 +73,23 @@ DPDK_18.02 {
 
 	rte_event_dev_selftest;
 } DPDK_17.11;
+
+EXPERIMENTAL {
+	global:
+
+	rte_event_timer_adapter_caps_get;
+	rte_event_timer_adapter_create;
+	rte_event_timer_adapter_create_ext;
+	rte_event_timer_adapter_free;
+	rte_event_timer_adapter_get_info;
+	rte_event_timer_adapter_lookup;
+	rte_event_timer_adapter_service_id_get;
+	rte_event_timer_adapter_start;
+	rte_event_timer_adapter_stats_get;
+	rte_event_timer_adapter_stats_reset;
+	rte_event_timer_adapter_stop;
+	rte_event_timer_init;
+	rte_event_timer_arm_burst;
+	rte_event_timer_arm_tmo_tick_burst;
+	rte_event_timer_cancel_burst;
+} DPDK_18.02;
-- 
2.6.4


* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
  2018-03-29 20:11  0%     ` Vladimir Medvedkin
@ 2018-03-29 20:41  3%       ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-03-29 20:41 UTC (permalink / raw)
  To: Vladimir Medvedkin; +Cc: dev

On Thu, Mar 29, 2018 at 11:11:20PM +0300, Vladimir Medvedkin wrote:
> 2018-03-29 13:27 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:
> 
> > On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> > > RIB is an alternative to current LPM library.
<snip>
> > > +#define LOOKUP_FUNC(suffix, type, bulk_prefetch)                     \
> > > +int rte_dir24_8_lookup_bulk_##suffix(void *fib_p, const uint32_t
> > *ips,       \
> > > +     uint64_t *next_hops, const unsigned n)                          \
> > > +{                                                                    \
> > > +     struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;  \
> > > +     uint64_t tmp;                                                   \
> > > +     uint32_t i;                                                     \
> > > +     uint32_t prefetch_offset = RTE_MIN((unsigned)bulk_prefetch, n); \
> > > +                                                                     \
> > > +     RTE_RIB_RETURN_IF_TRUE(((fib == NULL) || (ips == NULL) ||       \
> > > +             (next_hops == NULL)), -EINVAL);                         \
> > > +                                                                     \
> > > +     for (i = 0; i < prefetch_offset; i++)                           \
> > > +             rte_prefetch0(get_tbl24_p(fib, ips[i]));                \
> > > +     for (i = 0; i < (n - prefetch_offset); i++) {                   \
> > > +             rte_prefetch0(get_tbl24_p(fib, ips[i + prefetch_offset]));
> > \
> > > +             tmp = ((type *)fib->tbl24)[ips[i] >> 8];                \
> > > +             if (unlikely((tmp & RTE_DIR24_8_VALID_EXT_ENT) ==       \
> > > +                     RTE_DIR24_8_VALID_EXT_ENT)) {                   \
> > > +                     tmp = ((type *)fib->tbl8)[(uint8_t)ips[i] +     \
> > > +                             ((tmp >> 1) *
> > RTE_DIR24_8_TBL8_GRP_NUM_ENT)]; \
> > > +             }                                                       \
> > > +             next_hops[i] = tmp >> 1;                                \
> > > +     }                                                               \
> > > +     for (; i < n; i++) {                                            \
> > > +             tmp = ((type *)fib->tbl24)[ips[i] >> 8];                \
> > > +             if (unlikely((tmp & RTE_DIR24_8_VALID_EXT_ENT) ==       \
> > > +                     RTE_DIR24_8_VALID_EXT_ENT)) {                   \
> > > +                     tmp = ((type *)fib->tbl8)[(uint8_t)ips[i] +     \
> > > +                             ((tmp >> 1) *
> > RTE_DIR24_8_TBL8_GRP_NUM_ENT)]; \
> > > +             }                                                       \
> > > +             next_hops[i] = tmp >> 1;                                \
> > > +     }                                                               \
> > > +     return 0;                                                       \
> > > +}                                                                    \
> >
> > What is the advantage of doing this as a macro? Unless I'm missing
> > something "suffix" is never actually used in the function at all, and you
> > reference the size of the data from fib->nh_sz. Therefore there can be no
> > performance benefit from having such a lookup function, that I can see.
> >
> suffix is used to create 4 different function names. At the end of
> rte_dir24_8.c there are:
> LOOKUP_FUNC(1b, uint8_t, 5)
> LOOKUP_FUNC(2b, uint16_t, 6)
> LOOKUP_FUNC(4b, uint32_t, 15)
> LOOKUP_FUNC(8b, uint64_t, 12)
> 
> Now I have made a single lookup function that takes the size of the data from
> fib->nh_sz instead of statically casting to the type passed to the macro.
> Before:
> BULK RIB Lookup: 24.2 cycles (fails = 0.0%)
> After:
> BULK RIB Lookup: 26.1 cycles (fails = 0.0%)
> Measured on an E3-1230v1 @ 3.6GHz.
> 
> 
So you are saying that it turned out to be faster to do a lookup of the
size rather than hardcoding it. Seems strange, but ok.
I'm still confused about why there are four functions with four different
names. What is different between the four implementations? Just the amount of
prefetching done? It could still be done by a single call with a
compile-time constant parameter. That approach is used a lot in the ring
library and works well there.
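
A rough sketch of the compile-time-constant approach, for illustration only.
None of this is code from the patch: struct demo_tbl, demo_read_entry() and
the demo_lookup_bulk_*() names are hypothetical, the table is assumed to be a
flat array indexed by ip >> 8, and the tbl8 extension and prefetching are left
out to keep it short.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified table: one flat array of next-hop entries. */
struct demo_tbl {
	const void *entries;	/* array of 1-, 2-, 4- or 8-byte entries */
};

/*
 * Read entry idx. Every caller passes entry_size as a literal, so once this
 * is inlined the compiler can fold the switch down to a single load.
 */
static inline uint64_t
demo_read_entry(const void *entries, uint32_t idx, unsigned int entry_size)
{
	switch (entry_size) {
	case 1:
		return ((const uint8_t *)entries)[idx];
	case 2:
		return ((const uint16_t *)entries)[idx];
	case 4:
		return ((const uint32_t *)entries)[idx];
	default:
		return ((const uint64_t *)entries)[idx];
	}
}

/* Single generic implementation; entry_size is the final parameter. */
static inline int
demo_lookup_bulk(const struct demo_tbl *tbl, const uint32_t *ips,
		 uint64_t *next_hops, unsigned int n, unsigned int entry_size)
{
	unsigned int i;

	if (tbl == NULL || ips == NULL || next_hops == NULL)
		return -EINVAL;

	for (i = 0; i < n; i++)
		next_hops[i] = demo_read_entry(tbl->entries, ips[i] >> 8,
					       entry_size) >> 1;
	return 0;
}

/* Thin public wrappers: each one fixes entry_size at compile time. */
int
demo_lookup_bulk_1b(const struct demo_tbl *tbl, const uint32_t *ips,
		    uint64_t *next_hops, unsigned int n)
{
	return demo_lookup_bulk(tbl, ips, next_hops, n, 1);
}

int
demo_lookup_bulk_2b(const struct demo_tbl *tbl, const uint32_t *ips,
		    uint64_t *next_hops, unsigned int n)
{
	return demo_lookup_bulk(tbl, ips, next_hops, n, 2);
}

int
demo_lookup_bulk_4b(const struct demo_tbl *tbl, const uint32_t *ips,
		    uint64_t *next_hops, unsigned int n)
{
	return demo_lookup_bulk(tbl, ips, next_hops, n, 4);
}

int
demo_lookup_bulk_8b(const struct demo_tbl *tbl, const uint32_t *ips,
		    uint64_t *next_hops, unsigned int n)
{
	return demo_lookup_bulk(tbl, ips, next_hops, n, 8);
}
```

Whether the per-size wrappers actually beat a single run-time-sized function
would still need to be measured, as with the numbers quoted above.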

> >
> > Therefore, if performance is ok, I suggest just making a single lookup_bulk
> > function that works with all sizes - as the inlined lookup function does in
> > the header.
> >
> > Alternatively, if you do want specific functions for each
> > entry size, you still don't need macros. Write a single function that takes
> > as a final parameter the entry-size and use that in calculations rather
> > than nh_sz.  Then wrap that function in the set of public ones, passing in
> > the final size parameter explicitly as "1", "2", "4" or "8". The compiler
> > will then know that as a compile-time constant and generate the correct
> > code for each size. However, for this path I suggest you check for any
> > resulting performance improvement, e.g. with l3fwd, as I think it's not
> > likely to be significant.
> >
> > > +
<snip> 
> >
> > > +{
> > > +     uint64_t res;
> > > +     struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
> > > +
> > > +     RTE_RIB_RETURN_IF_TRUE(((fib == NULL) || (ip == NULL) ||
> > > +             (next_hop == NULL)), -EINVAL);
> > > +
> > > +     res = RTE_DIR24_8_GET_TBL24(fib, ip);
> > > +     if (unlikely((res & RTE_DIR24_8_VALID_EXT_ENT) ==
> > > +             RTE_DIR24_8_VALID_EXT_ENT)) {
> > > +             res = RTE_DIR24_8_GET_TBL8(fib, res, ip);
> > > +     }
> > > +     *next_hop = res >> 1;
> > > +     return 0;
> > > +}
> >
> > Do we need this static inline function? Can the bulk functions suffice on
> > their own? If we can remove this, we can move most of the header file
> > contents, especially the structures, out of the public header. That would
> > greatly improve the ease with which ABI can be maintained.
> >
> It was done in the same manner as in LPM, which has separate single lookup and
> bulk versions.
> Of course it is possible to remove this function altogether and use the bulk
> version to look up a single packet. But I thought maybe somebody could use it.
> 

Yes, I understand that. However, if it's unlikely to be used, I would
prioritize having ABI-friendliness over having it.
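
To illustrate what ABI-friendliness buys here, a minimal sketch of the opaque
handle pattern. Everything in it is hypothetical (struct demo_fib and the
demo_fib_*() functions are not from the patch), and the lookup is a
placeholder that returns a default next hop for every address.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- what the public header would contain (hypothetical names) ---- */

struct demo_fib;	/* opaque: the layout is not part of the ABI */

struct demo_fib *demo_fib_create(uint64_t def_nh);
void demo_fib_free(struct demo_fib *fib);
int demo_fib_lookup_bulk(const struct demo_fib *fib, const uint32_t *ips,
			 uint64_t *next_hops, unsigned int n);

/* ---- what would stay private in the .c file ---- */

struct demo_fib {
	uint64_t def_nh;	/* fields can change without breaking users */
};

struct demo_fib *
demo_fib_create(uint64_t def_nh)
{
	struct demo_fib *fib = malloc(sizeof(*fib));

	if (fib != NULL)
		fib->def_nh = def_nh;
	return fib;
}

void
demo_fib_free(struct demo_fib *fib)
{
	free(fib);
}

/* Placeholder lookup: no routes installed, so every IP gets def_nh. */
int
demo_fib_lookup_bulk(const struct demo_fib *fib, const uint32_t *ips,
		     uint64_t *next_hops, unsigned int n)
{
	unsigned int i;

	if (fib == NULL || ips == NULL || next_hops == NULL)
		return -EINVAL;

	for (i = 0; i < n; i++)
		next_hops[i] = fib->def_nh;
	return 0;
}
```

With no inline accessor in the header, a single lookup is just the bulk call
with n = 1, and the structure can be reworked later without an ABI break.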

/Bruce


* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
  2018-03-29 10:27  3%   ` Bruce Richardson
@ 2018-03-29 20:11  0%     ` Vladimir Medvedkin
  2018-03-29 20:41  3%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Vladimir Medvedkin @ 2018-03-29 20:11 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

2018-03-29 13:27 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:

> On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> > RIB is an alternative to current LPM library.
> > It solves the following problems
> >  - Increases the speed of control plane operations against lpm such as
> >    adding/deleting routes
> >  - Adds abstraction from dataplane algorithms, so it is possible to add
> >    different ip route lookup algorithms such as DXR/poptrie/lpc-trie/etc
> >    in addition to current dir24_8
> >  - It is possible to keep user defined application specific additional
> >    information in struct rte_rib_node which represents route entry.
> >    It can be next hop/set of next hops (i.e. active and feasible),
> >    pointers to link rte_rib_node based on some criteria (i.e. next_hop),
> >    plenty of additional control plane information.
> >  - For dir24_8 implementation it is possible to remove
> rte_lpm_tbl_entry.depth
> >    field that helps to save 6 bits.
> >  - Also new dir24_8 implementation supports different next_hop sizes
> >    (1/2/4/8 bytes per next hop)
> >  - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eliminate ternary
> operator.
> >    Instead it returns special default value if there is no route.
> >
> > Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> > ---
>
> Hi again,
>
> some initial comments on the dir24_8 files below.
>
> /Bruce
>
> >  config/common_base                 |   6 +
> >  doc/api/doxy-api.conf              |   1 +
> >  lib/Makefile                       |   2 +
> >  lib/librte_rib/Makefile            |  22 ++
> >  lib/librte_rib/rte_dir24_8.c       | 482 ++++++++++++++++++++++++++++++
> +++
> >  lib/librte_rib/rte_dir24_8.h       | 116 ++++++++
> >  lib/librte_rib/rte_rib.c           | 526 ++++++++++++++++++++++++++++++
> +++++++
> >  lib/librte_rib/rte_rib.h           | 322 +++++++++++++++++++++++
> >  lib/librte_rib/rte_rib_version.map |  18 ++
> >  mk/rte.app.mk                      |   1 +
> >  10 files changed, 1496 insertions(+)
> >  create mode 100644 lib/librte_rib/Makefile
> >  create mode 100644 lib/librte_rib/rte_dir24_8.c
> >  create mode 100644 lib/librte_rib/rte_dir24_8.h
> >  create mode 100644 lib/librte_rib/rte_rib.c
> >  create mode 100644 lib/librte_rib/rte_rib.h
> >  create mode 100644 lib/librte_rib/rte_rib_version.map
> >
>
> <snip>
>
> > diff --git a/lib/librte_rib/rte_dir24_8.c b/lib/librte_rib/rte_dir24_8.c
> > new file mode 100644
> > index 0000000..a12f882
> > --- /dev/null
> > +++ b/lib/librte_rib/rte_dir24_8.c
> > @@ -0,0 +1,482 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> > + */
> > +
> > +#include <stdint.h>
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +#include <string.h>
> > +#include <rte_debug.h>
> > +#include <rte_malloc.h>
> > +#include <rte_prefetch.h>
> > +#include <rte_errno.h>
> > +
> > +#include <inttypes.h>
> > +
> > +#include <rte_memory.h>
> > +#include <rte_branch_prediction.h>
> > +
> > +#include <rte_rib.h>
> > +#include <rte_dir24_8.h>
> > +
> > +#define BITMAP_SLAB_BIT_SIZE_LOG2    6
> > +#define BITMAP_SLAB_BIT_SIZE         (1 << BITMAP_SLAB_BIT_SIZE_LOG2)
> > +#define BITMAP_SLAB_BITMASK          (BITMAP_SLAB_BIT_SIZE - 1)
> > +
> > +#define ROUNDUP(x, y)         RTE_ALIGN_CEIL(x, (1 << (32 - y)))
> > +
> > +static __rte_always_inline __attribute__((pure)) void *
> > +get_tbl24_p(struct rte_dir24_8_tbl *fib, uint32_t ip)
> > +{
> > +     return (void *)&((uint8_t *)fib->tbl24)[(ip &
> > +             RTE_DIR24_8_TBL24_MASK) >> (8 - fib->nh_sz)];
> > +}
> > +
> > +#define LOOKUP_FUNC(suffix, type, bulk_prefetch)                     \
> > +int rte_dir24_8_lookup_bulk_##suffix(void *fib_p, const uint32_t
> *ips,       \
> > +     uint64_t *next_hops, const unsigned n)                          \
> > +{                                                                    \
> > +     struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;  \
> > +     uint64_t tmp;                                                   \
> > +     uint32_t i;                                                     \
> > +     uint32_t prefetch_offset = RTE_MIN((unsigned)bulk_prefetch, n); \
> > +                                                                     \
> > +     RTE_RIB_RETURN_IF_TRUE(((fib == NULL) || (ips == NULL) ||       \
> > +             (next_hops == NULL)), -EINVAL);                         \
> > +                                                                     \
> > +     for (i = 0; i < prefetch_offset; i++)                           \
> > +             rte_prefetch0(get_tbl24_p(fib, ips[i]));                \
> > +     for (i = 0; i < (n - prefetch_offset); i++) {                   \
> > +             rte_prefetch0(get_tbl24_p(fib, ips[i + prefetch_offset]));
> \
> > +             tmp = ((type *)fib->tbl24)[ips[i] >> 8];                \
> > +             if (unlikely((tmp & RTE_DIR24_8_VALID_EXT_ENT) ==       \
> > +                     RTE_DIR24_8_VALID_EXT_ENT)) {                   \
> > +                     tmp = ((type *)fib->tbl8)[(uint8_t)ips[i] +     \
> > +                             ((tmp >> 1) *
> RTE_DIR24_8_TBL8_GRP_NUM_ENT)]; \
> > +             }                                                       \
> > +             next_hops[i] = tmp >> 1;                                \
> > +     }                                                               \
> > +     for (; i < n; i++) {                                            \
> > +             tmp = ((type *)fib->tbl24)[ips[i] >> 8];                \
> > +             if (unlikely((tmp & RTE_DIR24_8_VALID_EXT_ENT) ==       \
> > +                     RTE_DIR24_8_VALID_EXT_ENT)) {                   \
> > +                     tmp = ((type *)fib->tbl8)[(uint8_t)ips[i] +     \
> > +                             ((tmp >> 1) *
> RTE_DIR24_8_TBL8_GRP_NUM_ENT)]; \
> > +             }                                                       \
> > +             next_hops[i] = tmp >> 1;                                \
> > +     }                                                               \
> > +     return 0;                                                       \
> > +}                                                                    \
>
> What is the advantage of doing this as a macro? Unless I'm missing
> something "suffix" is never actually used in the function at all, and you
> reference the size of the data from fib->nh_sz. Therefore there can be no
> performance benefit from having such a lookup function, that I can see.
>
suffix is used to create 4 different function names. At the end of
rte_dir24_8.c there are:
LOOKUP_FUNC(1b, uint8_t, 5)
LOOKUP_FUNC(2b, uint16_t, 6)
LOOKUP_FUNC(4b, uint32_t, 15)
LOOKUP_FUNC(8b, uint64_t, 12)

Now I have made a single lookup function that takes the size of the data from
fib->nh_sz instead of statically casting to the type passed to the macro.
Before:
BULK RIB Lookup: 24.2 cycles (fails = 0.0%)
After:
BULK RIB Lookup: 26.1 cycles (fails = 0.0%)
Measured on an E3-1230v1 @ 3.6GHz.


>
> Therefore, if performance is ok, I suggest just making a single lookup_bulk
> function that works with all sizes - as the inlined lookup function does in
> the header.
>
> Alternatively, if you do want specific functions for each
> entry size, you still don't need macros. Write a single function that takes
> as a final parameter the entry-size and use that in calculations rather
> than nh_sz.  Then wrap that function in the set of public ones, passing in
> the final size parameter explicitly as "1", "2", "4" or "8". The compiler
> will then know that as a compile-time constant and generate the correct
> code for each size. However, for this path I suggest you check for any
> resulting performance improvement, e.g. with l3fwd, as I think it's not
> likely to be significant.
>
> > +
> > +static void
> > +write_to_fib(void *ptr, uint64_t val, enum rte_dir24_8_nh_sz size, int
> n)
> > +{
> > +     int i;
> > +     uint8_t *ptr8 = (uint8_t *)ptr;
> > +     uint16_t *ptr16 = (uint16_t *)ptr;
> > +     uint32_t *ptr32 = (uint32_t *)ptr;
> > +     uint64_t *ptr64 = (uint64_t *)ptr;
> > +
> > +     switch (size) {
> > +     case RTE_DIR24_8_1B:
> > +             for (i = 0; i < n; i++)
> > +                     ptr8[i] = (uint8_t)val;
> > +             break;
> > +     case RTE_DIR24_8_2B:
> > +             for (i = 0; i < n; i++)
> > +                     ptr16[i] = (uint16_t)val;
> > +             break;
> > +     case RTE_DIR24_8_4B:
> > +             for (i = 0; i < n; i++)
> > +                     ptr32[i] = (uint32_t)val;
> > +             break;
> > +     case RTE_DIR24_8_8B:
> > +             for (i = 0; i < n; i++)
> > +                     ptr64[i] = (uint64_t)val;
> > +             break;
> > +     }
> > +}
> > +
> > +static int
> > +tbl8_get_idx(struct rte_dir24_8_tbl *fib)
> > +{
> > +     uint32_t i;
> > +     int bit_idx;
> > +
> > +     for (i = 0; (i < (fib->number_tbl8s >> BITMAP_SLAB_BIT_SIZE_LOG2))
> &&
> > +             (fib->tbl8_idxes[i] == UINT64_MAX); i++)
> > +             ;
> > +     if (i <= (fib->number_tbl8s >> BITMAP_SLAB_BIT_SIZE_LOG2)) {
> > +             bit_idx = __builtin_ctzll(~fib->tbl8_idxes[i]);
> > +             fib->tbl8_idxes[i] |= (1ULL << bit_idx);
> > +             return (i << BITMAP_SLAB_BIT_SIZE_LOG2) + bit_idx;
> > +     }
> > +     return -ENOSPC;
> > +}
> > +
> > +static inline void
> > +tbl8_free_idx(struct rte_dir24_8_tbl *fib, int idx)
> > +{
> > +     fib->tbl8_idxes[idx >> BITMAP_SLAB_BIT_SIZE_LOG2] &=
> > +             ~(1ULL << (idx & BITMAP_SLAB_BITMASK));
> > +}
> > +
> > +static int
> > +tbl8_alloc(struct rte_dir24_8_tbl *fib, uint64_t nh)
> > +{
> > +     int     tbl8_idx;
> > +     uint8_t *tbl8_ptr;
> > +
> > +     tbl8_idx = tbl8_get_idx(fib);
> > +     if (tbl8_idx < 0)
> > +             return tbl8_idx;
> > +     tbl8_ptr = (uint8_t *)fib->tbl8 +
> > +             ((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) <<
> > +             fib->nh_sz);
> > +     /*Init tbl8 entries with nexthop from tbl24*/
> > +     write_to_fib((void *)tbl8_ptr, nh|
> > +             RTE_DIR24_8_VALID_EXT_ENT, fib->nh_sz,
> > +             RTE_DIR24_8_TBL8_GRP_NUM_ENT);
> > +     return tbl8_idx;
> > +}
> > +
> > +static void
> > +tbl8_recycle(struct rte_dir24_8_tbl *fib, uint32_t ip, uint64_t
> tbl8_idx)
> > +{
> > +     int i;
> > +     uint64_t nh;
> > +     uint8_t *ptr8;
> > +     uint16_t *ptr16;
> > +     uint32_t *ptr32;
> > +     uint64_t *ptr64;
> > +
> > +     switch (fib->nh_sz) {
> > +     case RTE_DIR24_8_1B:
> > +             ptr8 = &((uint8_t *)fib->tbl8)[tbl8_idx *
> > +                             RTE_DIR24_8_TBL8_GRP_NUM_ENT];
> > +             nh = *ptr8;
> > +             for (i = 1; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++) {
> > +                     if (nh != ptr8[i])
> > +                             return;
> > +             }
> > +             ((uint8_t *)fib->tbl24)[ip >> 8] =
> > +                     nh & ~RTE_DIR24_8_VALID_EXT_ENT;
> > +             for (i = 0; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++)
> > +                     ptr8[i] = 0;
> > +             break;
> > +     case RTE_DIR24_8_2B:
> > +             ptr16 = &((uint16_t *)fib->tbl8)[tbl8_idx *
> > +                             RTE_DIR24_8_TBL8_GRP_NUM_ENT];
> > +             nh = *ptr16;
> > +             for (i = 1; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++) {
> > +                     if (nh != ptr16[i])
> > +                             return;
> > +             }
> > +             ((uint16_t *)fib->tbl24)[ip >> 8] =
> > +                     nh & ~RTE_DIR24_8_VALID_EXT_ENT;
> > +             for (i = 0; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++)
> > +                     ptr16[i] = 0;
> > +             break;
> > +     case RTE_DIR24_8_4B:
> > +             ptr32 = &((uint32_t *)fib->tbl8)[tbl8_idx *
> > +                             RTE_DIR24_8_TBL8_GRP_NUM_ENT];
> > +             nh = *ptr32;
> > +             for (i = 1; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++) {
> > +                     if (nh != ptr32[i])
> > +                             return;
> > +             }
> > +             ((uint32_t *)fib->tbl24)[ip >> 8] =
> > +                     nh & ~RTE_DIR24_8_VALID_EXT_ENT;
> > +             for (i = 0; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++)
> > +                     ptr32[i] = 0;
> > +             break;
> > +     case RTE_DIR24_8_8B:
> > +             ptr64 = &((uint64_t *)fib->tbl8)[tbl8_idx *
> > +                             RTE_DIR24_8_TBL8_GRP_NUM_ENT];
> > +             nh = *ptr64;
> > +             for (i = 1; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++) {
> > +                     if (nh != ptr64[i])
> > +                             return;
> > +             }
> > +             ((uint64_t *)fib->tbl24)[ip >> 8] =
> > +                     nh & ~RTE_DIR24_8_VALID_EXT_ENT;
> > +             for (i = 0; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++)
> > +                     ptr64[i] = 0;
> > +             break;
> > +     }
> > +     tbl8_free_idx(fib, tbl8_idx);
> > +}
> > +
> > +static int
> > +install_to_fib(struct rte_dir24_8_tbl *fib, uint32_t ledge, uint32_t
> redge,
> > +     uint64_t next_hop)
> > +{
> > +     uint64_t        tbl24_tmp;
> > +     int     tbl8_idx;
> > +     int tmp_tbl8_idx;
> > +     uint8_t *tbl8_ptr;
> > +
> > +     /*case for 0.0.0.0/0*/
> > +     if (unlikely((ledge == 0) && (redge == 0))) {
> > +             write_to_fib(fib->tbl24, next_hop << 1, fib->nh_sz, 1 <<
> 24);
> > +             return 0;
> > +     }
> > +     if (ROUNDUP(ledge, 24) <= redge) {
> > +             if (ledge < ROUNDUP(ledge, 24)) {
> > +                     tbl24_tmp = RTE_DIR24_8_GET_TBL24(fib, ledge);
> > +                     if ((tbl24_tmp & RTE_DIR24_8_VALID_EXT_ENT) !=
> > +                             RTE_DIR24_8_VALID_EXT_ENT) {
> > +                             tbl8_idx = tbl8_alloc(fib, tbl24_tmp);
> > +                             tmp_tbl8_idx = tbl8_get_idx(fib);
> > +                             if ((tbl8_idx < 0) || (tmp_tbl8_idx < 0))
> > +                                     return -ENOSPC;
> > +                             tbl8_free_idx(fib, tmp_tbl8_idx);
> > +                             /*update dir24 entry with tbl8 index*/
> > +                             write_to_fib(get_tbl24_p(fib, ledge),
> > +                                     (tbl8_idx << 1)|
> > +                                     RTE_DIR24_8_VALID_EXT_ENT,
> > +                                     fib->nh_sz, 1);
> > +                     } else
> > +                             tbl8_idx = tbl24_tmp >> 1;
> > +                     tbl8_ptr = (uint8_t *)fib->tbl8 +
> > +                             (((tbl8_idx *
> RTE_DIR24_8_TBL8_GRP_NUM_ENT) +
> > +                             (ledge & ~RTE_DIR24_8_TBL24_MASK)) <<
> > +                             fib->nh_sz);
> > +                     /*update tbl8 with new next hop*/
> > +                     write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
> > +                             RTE_DIR24_8_VALID_EXT_ENT,
> > +                             fib->nh_sz, ROUNDUP(ledge, 24) - ledge);
> > +                     tbl8_recycle(fib, ledge, tbl8_idx);
> > +             }
> > +             if (ROUNDUP(ledge, 24) < (redge & RTE_DIR24_8_TBL24_MASK))
> {
> > +                     write_to_fib(get_tbl24_p(fib, ROUNDUP(ledge, 24)),
> > +                             next_hop << 1, fib->nh_sz,
> > +                             ((redge & RTE_DIR24_8_TBL24_MASK) -
> > +                             ROUNDUP(ledge, 24)) >> 8);
> > +             }
> > +             if (redge & ~RTE_DIR24_8_TBL24_MASK) {
> > +                     tbl24_tmp = RTE_DIR24_8_GET_TBL24(fib, redge);
> > +                     if ((tbl24_tmp & RTE_DIR24_8_VALID_EXT_ENT) !=
> > +                                     RTE_DIR24_8_VALID_EXT_ENT) {
> > +                             tbl8_idx = tbl8_alloc(fib, tbl24_tmp);
> > +                             if (tbl8_idx < 0)
> > +                                     return -ENOSPC;
> > +                             /*update dir24 entry with tbl8 index*/
> > +                             write_to_fib(get_tbl24_p(fib, redge),
> > +                                     (tbl8_idx << 1)|
> > +                                     RTE_DIR24_8_VALID_EXT_ENT,
> > +                                     fib->nh_sz, 1);
> > +                     } else
> > +                             tbl8_idx = tbl24_tmp >> 1;
> > +                     tbl8_ptr = (uint8_t *)fib->tbl8 +
> > +                             ((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT)
> <<
> > +                             fib->nh_sz);
> > +                     /*update tbl8 with new next hop*/
> > +                     write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
> > +                             RTE_DIR24_8_VALID_EXT_ENT,
> > +                             fib->nh_sz, redge &
> ~RTE_DIR24_8_TBL24_MASK);
> > +                     tbl8_recycle(fib, redge, tbl8_idx);
> > +             }
> > +     } else {
> > +             tbl24_tmp = RTE_DIR24_8_GET_TBL24(fib, ledge);
> > +             if ((tbl24_tmp & RTE_DIR24_8_VALID_EXT_ENT) !=
> > +                     RTE_DIR24_8_VALID_EXT_ENT) {
> > +                     tbl8_idx = tbl8_alloc(fib, tbl24_tmp);
> > +                     if (tbl8_idx < 0)
> > +                             return -ENOSPC;
> > +                     /*update dir24 entry with tbl8 index*/
> > +                     write_to_fib(get_tbl24_p(fib, ledge),
> > +                             (tbl8_idx << 1)|
> > +                             RTE_DIR24_8_VALID_EXT_ENT,
> > +                             fib->nh_sz, 1);
> > +             } else
> > +                     tbl8_idx = tbl24_tmp >> 1;
> > +             tbl8_ptr = (uint8_t *)fib->tbl8 +
> > +                     (((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) +
> > +                     (ledge & ~RTE_DIR24_8_TBL24_MASK)) <<
> > +                     fib->nh_sz);
> > +             /*update tbl8 with new next hop*/
> > +             write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
> > +                     RTE_DIR24_8_VALID_EXT_ENT,
> > +                     fib->nh_sz, redge - ledge);
> > +             tbl8_recycle(fib, ledge, tbl8_idx);
> > +     }
> > +     return 0;
> > +}
> > +
> > +static int
> > +modify_fib(struct rte_rib *rib, uint32_t ip, uint8_t depth,
> > +     uint64_t next_hop)
> > +{
> > +     struct rte_rib_node *tmp = NULL;
> > +     struct rte_dir24_8_tbl *fib;
> > +     uint32_t ledge, redge;
> > +     int ret;
> > +
> > +     fib = rib->fib;
> > +
> > +     if (next_hop > DIR24_8_MAX_NH(fib))
> > +             return -EINVAL;
> > +
> > +     ledge = ip;
> > +     do {
> > +             tmp = rte_rib_tree_get_nxt(rib, ip, depth, tmp,
> > +                     RTE_RIB_GET_NXT_COVER);
> > +             if (tmp != NULL) {
> > +                     if (tmp->depth == depth)
> > +                             continue;
> > +                     redge = tmp->key;
> > +                     if (ledge == redge) {
> > +                             ledge = redge +
> > +                                     (uint32_t)(1ULL << (32 -
> tmp->depth));
> > +                             continue;
> > +                     }
> > +                     ret = install_to_fib(fib, ledge, redge,
> > +                             next_hop);
> > +                     if (ret != 0)
> > +                             return ret;
> > +                     ledge = redge +
> > +                             (uint32_t)(1ULL << (32 - tmp->depth));
> > +             } else {
> > +                     redge = ip + (uint32_t)(1ULL << (32 - depth));
> > +                     ret = install_to_fib(fib, ledge, redge,
> > +                             next_hop);
> > +                     if (ret != 0)
> > +                             return ret;
> > +             }
> > +     } while (tmp);
> > +
> > +     return 0;
> > +}
> > +
> > +int
> > +rte_dir24_8_modify(struct rte_rib *rib, uint32_t ip, uint8_t depth,
> > +     uint64_t next_hop, enum rte_rib_op op)
> > +{
> > +     struct rte_dir24_8_tbl *fib;
> > +     struct rte_rib_node *tmp = NULL;
> > +     struct rte_rib_node *node;
> > +     struct rte_rib_node *parent;
> > +     int ret = 0;
> > +
> > +     if ((rib == NULL) || (depth > RTE_RIB_MAXDEPTH))
> > +             return -EINVAL;
> > +
> > +     fib = rib->fib;
> > +     RTE_ASSERT(fib);
> > +
> > +     ip &= (uint32_t)(UINT64_MAX << (32 - depth));
> > +
> > +     node = rte_rib_tree_lookup_exact(rib, ip, depth);
> > +     switch (op) {
> > +     case RTE_RIB_ADD:
> > +             if (node != NULL) {
> > +                     if (node->nh == next_hop)
> > +                             return 0;
> > +                     ret = modify_fib(rib, ip, depth, next_hop);
> > +                     if (ret == 0)
> > +                             node->nh = next_hop;
> > +                     return 0;
> > +             }
> > +             if (depth > 24) {
> > +                     tmp = rte_rib_tree_get_nxt(rib, ip, 24, NULL,
> > +                             RTE_RIB_GET_NXT_COVER);
> > +                     if ((tmp == NULL) &&
> > +                             (fib->cur_tbl8s >= fib->number_tbl8s))
> > +                             return -ENOSPC;
> > +
> > +             }
> > +             node = rte_rib_tree_insert(rib, ip, depth);
> > +             if (node == NULL)
> > +                     return -rte_errno;
> > +             node->nh = next_hop;
> > +             parent = rte_rib_tree_lookup_parent(node);
> > +             if ((parent != NULL) && (parent->nh == next_hop))
> > +                     return 0;
> > +             ret = modify_fib(rib, ip, depth, next_hop);
> > +             if (ret) {
> > +                     rte_rib_tree_remove(rib, ip, depth);
> > +                     return ret;
> > +             }
> > +             if ((depth > 24) && (tmp == NULL))
> > +                     fib->cur_tbl8s++;
> > +             return 0;
> > +     case RTE_RIB_DEL:
> > +             if (node == NULL)
> > +                     return -ENOENT;
> > +
> > +             parent = rte_rib_tree_lookup_parent(node);
> > +             if (parent != NULL) {
> > +                     if (parent->nh != node->nh)
> > +                             ret = modify_fib(rib, ip, depth,
> parent->nh);
> > +             } else
> > +                     ret = modify_fib(rib, ip, depth, fib->def_nh);
> > +             if (ret == 0) {
> > +                     rte_rib_tree_remove(rib, ip, depth);
> > +                     if (depth > 24) {
> > +                             tmp = rte_rib_tree_get_nxt(rib, ip, 24,
> NULL,
> > +                                     RTE_RIB_GET_NXT_COVER);
> > +                             if (tmp == NULL)
> > +                                     fib->cur_tbl8s--;
> > +                     }
> > +             }
> > +             return ret;
> > +     default:
> > +             break;
> > +     }
> > +     return -EINVAL;
> > +}
> > +
> > +struct rte_dir24_8_tbl *rte_dir24_8_create(const char *name, int
> socket_id,
> > +     enum rte_dir24_8_nh_sz nh_sz, uint64_t def_nh)
> > +{
> > +     char mem_name[RTE_RIB_NAMESIZE];
> > +     struct rte_dir24_8_tbl *fib;
> > +
> > +     snprintf(mem_name, sizeof(mem_name), "FIB_%s", name);
> > +     fib = rte_zmalloc_socket(name, sizeof(struct rte_dir24_8_tbl) +
> > +             RTE_DIR24_8_TBL24_NUM_ENT * (1 << nh_sz),
> RTE_CACHE_LINE_SIZE,
> > +             socket_id);
> > +     if (fib == NULL)
> > +             return fib;
> > +
> > +     snprintf(mem_name, sizeof(mem_name), "TBL8_%s", name);
> > +     fib->tbl8 = rte_zmalloc_socket(mem_name,
> RTE_DIR24_8_TBL8_GRP_NUM_ENT *
> > +                     (1 << nh_sz) * RTE_DIR24_8_TBL8_NUM_GROUPS,
> > +                     RTE_CACHE_LINE_SIZE, socket_id);
> > +     if (fib->tbl8 == NULL) {
> > +             rte_free(fib);
> > +             return NULL;
> > +     }
> > +     fib->def_nh = def_nh;
> > +     fib->nh_sz = nh_sz;
> > +     fib->number_tbl8s = RTE_MIN((uint32_t)RTE_DIR24_8_TBL8_NUM_GROUPS,
> > +                             DIR24_8_MAX_NH(fib));
> > +
> > +     snprintf(mem_name, sizeof(mem_name), "TBL8_idxes_%s", name);
> > +     fib->tbl8_idxes = rte_zmalloc_socket(mem_name,
> > +                     RTE_ALIGN_CEIL(fib->number_tbl8s, 64) >> 3,
> > +                     RTE_CACHE_LINE_SIZE, socket_id);
> > +     if (fib->tbl8_idxes == NULL) {
> > +             rte_free(fib->tbl8);
> > +             rte_free(fib);
> > +             return NULL;
> > +     }
> > +
> > +     return fib;
> > +}
> > +
> > +void
> > +rte_dir24_8_free(void *fib_p)
> > +{
> > +     struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
> > +
> > +     rte_free(fib->tbl8_idxes);
> > +     rte_free(fib->tbl8);
> > +     rte_free(fib);
> > +}
> > +
> > +LOOKUP_FUNC(1b, uint8_t, 5)
> > +LOOKUP_FUNC(2b, uint16_t, 6)
> > +LOOKUP_FUNC(4b, uint32_t, 15)
> > +LOOKUP_FUNC(8b, uint64_t, 12)
> > diff --git a/lib/librte_rib/rte_dir24_8.h b/lib/librte_rib/rte_dir24_8.h
> > new file mode 100644
> > index 0000000..f779409
> > --- /dev/null
> > +++ b/lib/librte_rib/rte_dir24_8.h
> > @@ -0,0 +1,116 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> > + */
> > +
> > +#ifndef _RTE_DIR24_8_H_
> > +#define _RTE_DIR24_8_H_
> > +
> > +/**
> > + * @file
> > + * RTE Longest Prefix Match (LPM)
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/** @internal Total number of tbl24 entries. */
> > +#define RTE_DIR24_8_TBL24_NUM_ENT    (1 << 24)
> > +
> > +/** Maximum depth value possible for IPv4 LPM. */
> > +#define RTE_DIR24_8_MAX_DEPTH                32
> > +
> > +/** @internal Number of entries in a tbl8 group. */
> > +#define RTE_DIR24_8_TBL8_GRP_NUM_ENT 256
> > +
> > +/** @internal Total number of tbl8 groups in the tbl8. */
> > +#define RTE_DIR24_8_TBL8_NUM_GROUPS  65536
> > +
> > +/** @internal bitmask with valid and valid_group fields set */
> > +#define RTE_DIR24_8_VALID_EXT_ENT    0x01
> > +
> > +#define RTE_DIR24_8_TBL24_MASK               0xffffff00
> > +
> > +/** Size of nexthop (1 << nh_sz) bits */
> > +enum rte_dir24_8_nh_sz {
> > +     RTE_DIR24_8_1B,
> > +     RTE_DIR24_8_2B,
> > +     RTE_DIR24_8_4B,
> > +     RTE_DIR24_8_8B
> > +};
> > +
> > +
> > +#define DIR24_8_BITS_IN_NH(fib)              (8 * (1 << fib->nh_sz))
> > +#define DIR24_8_MAX_NH(fib)  ((1ULL << (DIR24_8_BITS_IN_NH(fib) - 1)) -
> 1)
> > +
> > +#define DIR24_8_TBL_IDX(a, fib)              ((a) >> (3 - fib->nh_sz))
> > +#define DIR24_8_PSD_IDX(a, fib)              ((a) & ((1 << (3 -
> fib->nh_sz)) - 1))
> > +
> > +#define DIR24_8_TBL24_VAL(ip)        (ip >> 8)
> > +#define DIR24_8_TBL8_VAL(res, ip)                                    \
> > +     ((res >> 1) * RTE_DIR24_8_TBL8_GRP_NUM_ENT + (uint8_t)ip)       \
> > +
> > +#define DIR24_8_LOOKUP_MSK                                           \
> > +     (((1ULL << ((1 << (fib->nh_sz + 3)) - 1)) << 1) - 1)            \
> > +
> > +#define RTE_DIR24_8_GET_TBL24(fib, ip)
>      \
> > +     ((fib->tbl24[DIR24_8_TBL_IDX(DIR24_8_TBL24_VAL(ip), fib)] >>    \
> > +     (DIR24_8_PSD_IDX(DIR24_8_TBL24_VAL(ip), fib) *                  \
> > +     DIR24_8_BITS_IN_NH(fib))) & DIR24_8_LOOKUP_MSK)                 \
> > +
> > +#define RTE_DIR24_8_GET_TBL8(fib, res, ip)                           \
> > +     ((fib->tbl8[DIR24_8_TBL_IDX(DIR24_8_TBL8_VAL(res, ip), fib)] >> \
> > +     (DIR24_8_PSD_IDX(DIR24_8_TBL8_VAL(res, ip), fib) *              \
> > +     DIR24_8_BITS_IN_NH(fib))) & DIR24_8_LOOKUP_MSK)                 \
> >
> I would strongly suggest making each of the above macros into inline
> functions instead. It would allow easier readability since you have
> parameter types and can split things across lines easier.
> Also, some comments might be good too.
>
> +
> > +
> > +struct rte_dir24_8_tbl {
> > +     uint32_t        number_tbl8s;   /**< Total number of tbl8s. */
> > +     uint32_t        cur_tbl8s;      /**< Current cumber of tbl8s. */
> > +     uint64_t        def_nh;
> > +     enum rte_dir24_8_nh_sz  nh_sz;  /**< Size of nexthop entry */
> > +     uint64_t        *tbl8;          /**< LPM tbl8 table. */
> > +     uint64_t        *tbl8_idxes;
> > +     uint64_t        tbl24[0] __rte_cache_aligned; /**< LPM tbl24
> table. */
> > +};
> > +
> > +struct rte_dir24_8_tbl *rte_dir24_8_create(const char *name, int
> socket_id,
> > +     enum rte_dir24_8_nh_sz nh_sz, uint64_t def_nh);
> > +void rte_dir24_8_free(void *fib_p);
> > +int rte_dir24_8_modify(struct rte_rib *rib, uint32_t key,
> > +     uint8_t depth, uint64_t next_hop, enum rte_rib_op op);
> > +int rte_dir24_8_lookup_bulk_1b(void *fib_p, const uint32_t *ips,
> > +     uint64_t *next_hops, const unsigned n);
> > +int rte_dir24_8_lookup_bulk_2b(void *fib_p, const uint32_t *ips,
> > +     uint64_t *next_hops, const unsigned n);
> > +int rte_dir24_8_lookup_bulk_4b(void *fib_p, const uint32_t *ips,
> > +     uint64_t *next_hops, const unsigned n);
> > +int rte_dir24_8_lookup_bulk_8b(void *fib_p, const uint32_t *ips,
> > +     uint64_t *next_hops, const unsigned n);
> > +
> > +
> > +static inline int
> > +rte_dir24_8_lookup(void *fib_p, uint32_t ip, uint64_t *next_hop)
>
> Why use void * as parameter, since the proper type is defined just above?

Agreed, I will change it to struct rte_dir24_8_tbl *.


>
> > +{
> > +     uint64_t res;
> > +     struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
> > +
> > +     RTE_RIB_RETURN_IF_TRUE(((fib == NULL) || (ip == NULL) ||
> > +             (next_hop == NULL)), -EINVAL);
> > +
> > +     res = RTE_DIR24_8_GET_TBL24(fib, ip);
> > +     if (unlikely((res & RTE_DIR24_8_VALID_EXT_ENT) ==
> > +             RTE_DIR24_8_VALID_EXT_ENT)) {
> > +             res = RTE_DIR24_8_GET_TBL8(fib, res, ip);
> > +     }
> > +     *next_hop = res >> 1;
> > +     return 0;
> > +}
>
> Do we need this static inline function? Can the bulk functions do on their
> own? If we can remove this, we can move the most of the header file
> contents, especially the structures, out of the public header. That would
> greatly improve the ease with which ABI can be maintained.
>
This mirrors what was done in LPM, which has separate single-lookup and
bulk versions.
Of course it is possible to remove this function entirely and use the bulk
version to look up a single packet, but I thought somebody might find it
useful.
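
For illustration, a minimal sketch of a single lookup expressed through a
bulk call with n = 1. Everything here (struct toy_fib, toy_lookup_bulk,
toy_lookup) is a hypothetical toy, not the dir24_8 code from the patch; the
table is indexed by the top octet only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy table: one next hop per top-level octet of the address. */
struct toy_fib {
	uint64_t nh[256];
};

/* Bulk lookup: resolve n addresses in one call. */
static int
toy_lookup_bulk(const struct toy_fib *fib, const uint32_t *ips,
	uint64_t *next_hops, unsigned int n)
{
	unsigned int i;

	assert(fib != NULL && ips != NULL && next_hops != NULL);
	for (i = 0; i < n; i++)
		next_hops[i] = fib->nh[ips[i] >> 24];
	return 0;
}

/* Single lookup expressed as a bulk lookup of exactly one address. */
static inline int
toy_lookup(const struct toy_fib *fib, uint32_t ip, uint64_t *next_hop)
{
	return toy_lookup_bulk(fib, &ip, next_hop, 1);
}
```

With a wrapper like this the caller only needs the bulk prototype, so the
table structure could probably stay out of the public header.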


>
>
> +
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_DIR24_8_H_ */
> > +
>



-- 
Regards,
Vladimir

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
  2018-03-26  9:50  0%       ` Bruce Richardson
@ 2018-03-29 19:59  0%         ` Vladimir Medvedkin
  0 siblings, 0 replies; 200+ results
From: Vladimir Medvedkin @ 2018-03-29 19:59 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

2018-03-26 12:50 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:

> On Sun, Mar 25, 2018 at 09:17:20PM +0300, Vladimir Medvedkin wrote:
> > Hi,
> >
> > 2018-03-14 14:09 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com
> >:
> >
> > > On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> > > > RIB is an alternative to current LPM library.
> > > > It solves the following problems
> > > >  - Increases the speed of control plane operations against lpm such
> as
> > > >    adding/deleting routes
> > > >  - Adds abstraction from dataplane algorithms, so it is possible to
> add
> > > >    different ip route lookup algorythms such as
> DXR/poptrie/lpc-trie/etc
> > > >    in addition to current dir24_8
> > > >  - It is possible to keep user defined application specific
> additional
> > > >    information in struct rte_rib_node which represents route entry.
> > > >    It can be next hop/set of next hops (i.e. active and feasible),
> > > >    pointers to link rte_rib_node based on some criteria (i.e.
> next_hop),
> > > >    plenty of additional control plane information.
> > > >  - For dir24_8 implementation it is possible to remove
> > > rte_lpm_tbl_entry.depth
> > > >    field that helps to save 6 bits.
> > > >  - Also new dir24_8 implementation supports different next_hop sizes
> > > >    (1/2/4/8 bytes per next hop)
> > > >  - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate
> ternary
> > > operator.
> > > >    Instead it returns special default value if there is no route.
> > > >
> > > > Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> > > > ---
> > > >  config/common_base                 |   6 +
> > > >  doc/api/doxy-api.conf              |   1 +
> > > >  lib/Makefile                       |   2 +
> > > >  lib/librte_rib/Makefile            |  22 ++
> > > >  lib/librte_rib/rte_dir24_8.c       | 482
> ++++++++++++++++++++++++++++++
> > > +++
> > > >  lib/librte_rib/rte_dir24_8.h       | 116 ++++++++
> > > >  lib/librte_rib/rte_rib.c           | 526
> ++++++++++++++++++++++++++++++
> > > +++++++
> > > >  lib/librte_rib/rte_rib.h           | 322 +++++++++++++++++++++++
> > > >  lib/librte_rib/rte_rib_version.map |  18 ++
> > > >  mk/rte.app.mk                      |   1 +
> > > >  10 files changed, 1496 insertions(+)
> > > >  create mode 100644 lib/librte_rib/Makefile
> > > >  create mode 100644 lib/librte_rib/rte_dir24_8.c
> > > >  create mode 100644 lib/librte_rib/rte_dir24_8.h
> > > >  create mode 100644 lib/librte_rib/rte_rib.c
> > > >  create mode 100644 lib/librte_rib/rte_rib.h
> > > >  create mode 100644 lib/librte_rib/rte_rib_version.map
> > > >
> > >
> > > First pass review comments. For now just reviewed the main public
> header
> > > file rte_rib.h. Later reviews will cover the other files as best I can.
> > >
> > > /Bruce
> > >
> > > <snip>
> > > > diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
> > > > new file mode 100644
> > > > index 0000000..6eac8fb
> > > > --- /dev/null
> > > > +++ b/lib/librte_rib/rte_rib.h
> > > > @@ -0,0 +1,322 @@
> > > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > > + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> > > > + */
> > > > +
> > > > +#ifndef _RTE_RIB_H_
> > > > +#define _RTE_RIB_H_
> > > > +
> > > > +/**
> > > > + * @file
> > > > + * Compressed trie implementation for Longest Prefix Match
> > > > + */
> > > > +
> > > > +/** @internal Macro to enable/disable run-time checks. */
> > > > +#if defined(RTE_LIBRTE_RIB_DEBUG)
> > > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do {    \
> > > > +     if (cond)                                       \
> > > > +             return retval;                          \
> > > > +} while (0)
> > > > +#else
> > > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval)
> > > > +#endif
> > >
> > > use RTE_ASSERT?
> > >
> > it was done just like it was done in the LPM lib. But if you think it
> > should be RTE_ASSERT so be it.
> >
> >
> > >
> > > > +
> > > > +#define RTE_RIB_VALID_NODE   1
> > >
> > > should there be an INVALID_NODE macro?
> > >
> > No
> >
> >
> > >
> > > > +#define RTE_RIB_GET_NXT_ALL  0
> > > > +#define RTE_RIB_GET_NXT_COVER        1
> > > > +
> > > > +#define RTE_RIB_INVALID_ROUTE        0
> > > > +#define RTE_RIB_VALID_ROUTE  1
> > > > +
> > > > +/** Max number of characters in RIB name. */
> > > > +#define RTE_RIB_NAMESIZE     64
> > > > +
> > > > +/** Maximum depth value possible for IPv4 RIB. */
> > > > +#define RTE_RIB_MAXDEPTH     32
> > >
> > > I think we should have IPv4 in the name here. Will it not be extended
> to
> > > support IPv6 in future?
> > >
> > I think there should be a separate implementation of the library for ipv6
> >
> I can understand the need for a separate LPM implementation, but should
> they both not be under the same rib library?
>
I planned to have a separate rib6 for IPv6.
Of course it is possible to make a universal abstraction for both v4 and v6.
But in that case there would be a universal rte_rib_node with a type (v4|v6)
and variable length, and keys would become a union of uint32_t for v4 and
uint8_t[16] for v6. I think that is an overcomplication.


> >
> > >
> > > > +
> > > > +/**
> > > > + * Macro to check if prefix1 {key1/depth1}
> > > > + * is covered by prefix2 {key2/depth2}
> > > > + */
> > > > +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2)
> > >      \
> > > > +     ((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) ==
> 0)\
> > > > +             && (depth1 > depth2))
> > > Neat check!
> > >
> > > Any particular reason for using UINT64_MAX here rather than UINT32_MAX?
> >
> > in case when depth2 = 0 UINT32_MAX shifted left by 32 bit will remain
> > UINT32_MAX because shift count will be masked to 5 bits.
> >
> > I think you can avoid the casting and have a slightly shorter mask by
> > > changing "(uint32_t)(UINT64_MAX << (32 - depth2)" to
> > > "~(UINT32_MAX >> depth2)"
> > > I'd also suggest for readability putting the second check first, and,
> > > for maintainability, using an inline function rather than a macro.
> > >
> >  Agree, it looks clearer
> >
> >
> > > > +
> > > > +/** @internal Macro to get next node in tree*/
> > > > +#define RTE_RIB_GET_NXT_NODE(node, key)
> > >       \
> > > > +     ((key & (1 << (31 - node->depth))) ? node->right : node->left)
> > > > +/** @internal Macro to check if node is right child*/
> > > > +#define RTE_RIB_IS_RIGHT_NODE(node)  (node->parent->right == node)
> > >
> > > Again, consider inline fns rather than macros.
> > >
> > Ok
> >
> > For the latter macro, rather than doing additional pointer derefs to
> > > parent, can you also get if it's a right node by using:
> > > "(node->key & (1 << (32 - node->depth)))"?
> > >
> > No, it is not possible. Decision whether node be left or right is made
> > using parent and child common depth.
> > Consider case with 10.0.0.0/8 and 10.128.0.0/24. In this way common
> depth
> > will be /8 and 10.128.0.0/24 will be right child.
> >
> >
> > > > +
> > > > +
> > > > +struct rte_rib_node {
> > > > +     struct rte_rib_node *left;
> > > > +     struct rte_rib_node *right;
> > > > +     struct rte_rib_node *parent;
> > > > +     uint32_t        key;
> > > > +     uint8_t         depth;
> > > > +     uint8_t         flag;
> > > > +     uint64_t        nh;
> > > > +     uint64_t        ext[0];
> > > > +};
> > > > +
> > > > +struct rte_rib;
> > > > +
> > > > +/** Type of FIB struct*/
> > > > +enum rte_rib_type {
> > > > +     RTE_RIB_DIR24_8_1B,
> > > > +     RTE_RIB_DIR24_8_2B,
> > > > +     RTE_RIB_DIR24_8_4B,
> > > > +     RTE_RIB_DIR24_8_8B,
> > > > +     RTE_RIB_TYPE_MAX
> > > > +};
> > >
> > > If the plan is to support multiple underlying fib types and algorithms
> > > under the rib library, would it not be better to separate out the
> > > algorithm part from the data storage part? So have the type just be
> > > DIR_24_8, and have the 1, 2, 4 or 8 specified separately.
> > >
> > Yes, we were talk about it in IRC, agree. Now I pass next hop size in
> > union rte_rib_fib_conf inside rte_rib_conf
> >
> >
> > >
> > > > +
> > > > +enum rte_rib_op {
> > > > +     RTE_RIB_ADD,
> > > > +     RTE_RIB_DEL
> > > > +};
> > > > +
> > > > +/** RIB nodes allocation type */
> > > > +enum rte_rib_alloc_type {
> > > > +     RTE_RIB_MALLOC,
> > > > +     RTE_RIB_MEMPOOL,
> > > > +     RTE_RIB_ALLOC_MAX
> > > > +};
> > >
> > > Not sure you need this any more. Malloc allocations and mempool
> > > allocations are now pretty much the same thing.
> > >
> > Actually I think to remove malloc. On performance tests with
> > adding/deleting huge amount of routes malloc is slower. Maybe because of
> > fragmentation.
> > What do you think?
> >
> Yes, definitely mempool allocations are the way to go!
>
> >
> > > > +
> > > > +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t
> key,
> > > > +     uint8_t depth, uint64_t next_hop, enum rte_rib_op op);
> > >
> > > Do you anticipate more ops in future than just add and delete? If not,
> > > why not just split this function into two and drop the op struct.
> > >
> > It is difficult question. I'm not ready to make decision at the moment.
> >
> >
> > >
> > > > +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t
> *ips,
> > > > +     uint64_t *next_hops, const unsigned n);
> > > > +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct
> rte_rib
> > > *rib);
> > > > +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib,
> > > > +     struct rte_rib_node *node);
> > > > +
> > > > +struct rte_rib {
> > > > +     char name[RTE_RIB_NAMESIZE];
> > > > +     /*pointer to rib trie*/
> > > > +     struct rte_rib_node     *trie;
> > > > +     /*pointer to dataplane struct*/
> > > > +     void    *fib;
> > > > +     /*prefix modification*/
> > > > +     rte_rib_modify_fn_t     modify;
> > > > +     /* Bulk lookup fn*/
> > > > +     rte_rib_tree_lookup_fn_t        lookup;
> > > > +     /*alloc trie element*/
> > > > +     rte_rib_alloc_node_fn_t alloc_node;
> > > > +     /*free trie element*/
> > > > +     rte_rib_free_node_fn_t  free_node;
> > > > +     struct rte_mempool      *node_pool;
> > > > +     uint32_t                cur_nodes;
> > > > +     uint32_t                cur_routes;
> > > > +     int                     max_nodes;
> > > > +     int                     node_sz;
> > > > +     enum rte_rib_type       type;
> > > > +     enum rte_rib_alloc_type alloc_type;
> > > > +};
> > > > +
> > > > +/** RIB configuration structure */
> > > > +struct rte_rib_conf {
> > > > +     enum rte_rib_type       type;
> > > > +     enum rte_rib_alloc_type alloc_type;
> > > > +     int     max_nodes;
> > > > +     size_t  node_sz;
> > > > +     uint64_t def_nh;
> > > > +};
> > > > +
> > > > +/**
> > > > + * Lookup an IP into the RIB structure
> > > > + *
> > > > + * @param rib
> > > > + *  RIB object handle
> > > > + * @param key
> > > > + *  IP to be looked up in the RIB
> > > > + * @return
> > > > + *  pointer to struct rte_rib_node on success,
> > > > + *  NULL otherwise
> > > > + */
> > > > +struct rte_rib_node *
> > > > +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key);
> > > > +
> > > > +/**
> > > > + * Lookup less specific route into the RIB structure
> > > > + *
> > > > + * @param ent
> > > > + *  Pointer to struct rte_rib_node that represents target route
> > > > + * @return
> > > > + *  pointer to struct rte_rib_node that represents
> > > > + *  less specific route on success,
> > > > + *  NULL otherwise
> > > > + */
> > > > +struct rte_rib_node *
> > > > +rte_rib_tree_lookup_parent(struct rte_rib_node *ent);
> > > > +
> > > > +/**
> > > > + * Lookup prefix into the RIB structure
> > > > + *
> > > > + * @param rib
> > > > + *  RIB object handle
> > > > + * @param key
> > > > + *  net to be looked up in the RIB
> > > > + * @param depth
> > > > + *  prefix length
> > > > + * @return
> > > > + *  pointer to struct rte_rib_node on success,
> > > > + *  NULL otherwise
> > > > + */
> > > > +struct rte_rib_node *
> > > > +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key,
> uint8_t
> > > depth);
> > >
> > > Can you explain the difference between this and regular lookup, and how
> > > they would be used. I don't think the names convey the differences
> > > sufficiently, and so we should look to rename one or both to be
> clearer.
> > >
> > Regular lookup (rte_rib_tree_lookup) will lookup for most specific node
> for
> > passed key.
> > rte_rib_tree_lookup_exact will lookup node contained key and depth equal
> to
> > passed in args. It used to find exact route.
> >
> So if there is no node exactly matching the parameters, it the lookup_exact
> returns failure? E.g. if you request a /24 node, it won't return a /8 node
> that would cover the /24?
>
Yes, it returns failure. Use rte_rib_tree_lookup (without depth) and, if you
want the less specific route, call rte_rib_tree_lookup_parent on the result.
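
To make the difference concrete, here is a small sketch over a flat array
instead of the real trie. The names (struct toy_route, toy_lookup,
toy_lookup_exact) are made up for this example and only mimic the behaviour
described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_route {
	uint32_t key;
	uint8_t depth;
};

/* Network mask for depth 0..32; depth 0 is special-cased to avoid a
 * 32-bit shift by 32. */
static uint32_t
toy_mask(uint8_t depth)
{
	assert(depth <= 32);
	return depth == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - depth));
}

/* Regular lookup: most specific route that contains ip, or NULL. */
static const struct toy_route *
toy_lookup(const struct toy_route *tbl, size_t n, uint32_t ip)
{
	const struct toy_route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (((ip ^ tbl[i].key) & toy_mask(tbl[i].depth)) != 0)
			continue;
		if (best == NULL || tbl[i].depth > best->depth)
			best = &tbl[i];
	}
	return best;
}

/* Exact lookup: key and depth must both match, no fallback to a cover. */
static const struct toy_route *
toy_lookup_exact(const struct toy_route *tbl, size_t n, uint32_t key,
	uint8_t depth)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].key == key && tbl[i].depth == depth)
			return &tbl[i];
	}
	return NULL;
}
```

With only 10.0.0.0/8 in the table, the exact lookup for 10.1.2.0/24 returns
NULL, while the regular lookup for 10.1.2.3 returns the /8 entry.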


> >
> > >
> > > > +
> > > > +/**
> > > > + * Retrieve next more specific prefix from the RIB
> > > s/more/most/
> > >
> >
> > > > + * that is covered by key/depth supernet
> > > > + *
> > > > + * @param rib
> > > > + *  RIB object handle
> > > > + * @param key
> > > > + *  net address of supernet prefix that covers returned more
> specific
> > > prefixes
> > > > + * @param depth
> > > > + *  supernet prefix length
> > > > + * @param cur
> > > > + *   pointer to the last returned prefix to get next prefix
> > > > + *   or
> > > > + *   NULL to get first more specific prefix
> > > > + * @param flag
> > > > + *  -RTE_RIB_GET_NXT_ALL
> > > > + *   get all prefixes from subtrie
> > >
> > > By all prefixes do you mean more specific, i.e. the final prefix?
> > >
> > What do you mean the final prefix?
> >
> The most specific one, or the longest prefix.

This function was created for a different task, not for LPM lookup. It
traverses the trie and retrieves the routes that fall under key/depth, so
the term "longest prefix" is irrelevant here.
Let me explain with an example. Imagine there are 10.0.0.0/8, 10.0.0.0/24,
10.0.0.10/32 and 10.64.0.0/10 in the routing table.
You have code like:

struct rte_rib_node *tmp = NULL;
do {
    /* retrieve all routes that belong to 10.0.0.0/8 */
    tmp = rte_rib_tree_get_nxt(rib, IPv4(10,0,0,0), 8, tmp,
                               RTE_RIB_GET_NXT_ALL);
    if (tmp)
          printf("%u/%u\n", tmp->key, tmp->depth);
} while (tmp);

In this case you will see all subprefixes, but not 10.0.0.0/8 itself:
10.0.0.0/24
10.0.0.10/32
10.64.0.0/10

If you want 10.0.0.0/8 as well, do
/* retrieve all routes that belong to 10.0.0.0/7 */
tmp = rte_rib_tree_get_nxt(rib, IPv4(10,0,0,0), 7, tmp, RTE_RIB_GET_NXT_ALL);

And if you call it with RTE_RIB_GET_NXT_COVER, like
tmp = rte_rib_tree_get_nxt(rib, IPv4(10,0,0,0), 8, tmp, RTE_RIB_GET_NXT_COVER);
you will get
10.0.0.0/24
10.64.0.0/10
but not
10.0.0.10/32
This is useful if you want to find the gaps in 10.0.0.0/8 that are not
covered by the existing routes.
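
The two flags can also be modelled outside the trie. The sketch below is only
an illustration under my own assumptions: toy_collect(), toy_is_under() and
the TOY_* names are hypothetical, a flat array replaces the trie, results come
back in table order, and the function returns everything in one call instead
of iterating with a cur pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical stand-ins for RTE_RIB_GET_NXT_ALL / RTE_RIB_GET_NXT_COVER. */
enum { TOY_GET_NXT_ALL, TOY_GET_NXT_COVER };

struct toy_route {
	uint32_t key;   /* network address, host byte order */
	uint8_t depth;  /* prefix length, 0..32 */
};

static uint32_t
toy_mask(uint8_t depth)
{
	assert(depth <= 32);
	return depth == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - depth));
}

/* True when r is strictly more specific than key/depth and lies inside it. */
static int
toy_is_under(const struct toy_route *r, uint32_t key, uint8_t depth)
{
	return r->depth > depth && (r->key & toy_mask(depth)) == key;
}

/*
 * Store in out[] every route under key/depth and return how many were stored.
 * In COVER mode a route is skipped when another route under the same
 * supernet already covers it, so only the top-level subprefixes remain.
 * out[] must have room for n pointers.
 */
static size_t
toy_collect(const struct toy_route *tbl, size_t n, uint32_t key,
	    uint8_t depth, int flag, const struct toy_route **out)
{
	size_t cnt = 0;

	for (size_t i = 0; i < n; i++) {
		int shadowed = 0;

		if (!toy_is_under(&tbl[i], key, depth))
			continue;
		for (size_t j = 0; flag == TOY_GET_NXT_COVER && j < n; j++)
			if (toy_is_under(&tbl[j], key, depth) &&
			    toy_is_under(&tbl[i], tbl[j].key, tbl[j].depth))
				shadowed = 1;
		if (!shadowed)
			out[cnt++] = &tbl[i];
	}
	return cnt;
}
```

For the four routes above and the supernet 10.0.0.0/8, the ALL flag yields
three routes and the COVER flag yields two, because 10.0.0.10/32 is already
covered by 10.0.0.0/24.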


>
> >
> > > > + *  -RTE_RIB_GET_NXT_COVER
> > > > + *   get only first more specific prefix even if it have more
> specifics
> > > > + * @return
> > > > + *  pointer to the next more specific prefix
> > > > + *  or
> > > > + *  NULL if there is no prefixes left
> > > > + */
> > > > +struct rte_rib_node *
> > > > +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t
> depth,
> > > > +     struct rte_rib_node *cur, int flag);
> > > > +
> > > > +/**
> > > > + * Remove prefix from the RIB
> > > > + *
> > > > + * @param rib
> > > > + *  RIB object handle
> > > > + * @param key
> > > > + *  net to be removed from the RIB
> > > > + * @param depth
> > > > + *  prefix length
> > > > + */
> > > > +void
> > > > +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t
> depth);
> > > > +
> > > > +/**
> > > > + * Insert prefix into the RIB
> > > > + *
> > > > + * @param rib
> > > > + *  RIB object handle
> > > > + * @param key
> > > > + *  net to be inserted to the RIB
> > > > + * @param depth
> > > > + *  prefix length
> > > > + * @return
> > > > + *  pointer to new rte_rib_node on success
> > > > + *  NULL otherwise
> > > > + */
> > > > +struct rte_rib_node *
> > > > +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t
> depth);
> > > > +
> > > > +/**
> > > > + * Create RIB
> > > > + *
> > > > + * @param name
> > > > + *  RIB name
> > > > + * @param socket_id
> > > > + *  NUMA socket ID for RIB table memory allocation
> > > > + * @param conf
> > > > + *  Structure containing the configuration
> > > > + * @return
> > > > + *  Handle to RIB object on success
> > > > + *  NULL otherwise with rte_errno set to an appropriate values.
> > > > + */
> > > > +struct rte_rib *
> > > > +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf
> > > *conf);
> > > > +
> > > > +/**
> > > > + * Find an existing RIB object and return a pointer to it.
> > > > + *
> > > > + * @param name
> > > > + *  Name of the rib object as passed to rte_rib_create()
> > > > + * @return
> > > > + *  Pointer to rib object or NULL if object not found with rte_errno
> > > > + *  set appropriately. Possible rte_errno values include:
> > > > + *   - ENOENT - required entry not available to return.
> > > > + */
> > > > +struct rte_rib *
> > > > +rte_rib_find_existing(const char *name);
> > > > +
> > > > +/**
> > > > + * Free an RIB object.
> > > > + *
> > > > + * @param rib
> > > > + *   RIB object handle
> > > > + * @return
> > > > + *   None
> > > > + */
> > > > +void
> > > > +rte_rib_free(struct rte_rib *rib);
> > > > +
> > > > +/**
> > > > + * Add a rule to the RIB.
> > > > + *
> > > > + * @param rib
> > > > + *   RIB object handle
> > > > + * @param ip
> > > > + *   IP of the rule to be added to the RIB
> > > > + * @param depth
> > > > + *   Depth of the rule to be added to the RIB
> > > > + * @param next_hop
> > > > + *   Next hop of the rule to be added to the RIB
> > > > + * @return
> > > > + *   0 on success, negative value otherwise
> > > > + */
> > > > +int
> > > > +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth,
> uint64_t
> > > next_hop);
> > > > +
> > > > +/**
> > > > + * Delete a rule from the RIB.
> > > > + *
> > > > + * @param rib
> > > > + *   RIB object handle
> > > > + * @param ip
> > > > + *   IP of the rule to be deleted from the RIB
> > > > + * @param depth
> > > > + *   Depth of the rule to be deleted from the RIB
> > > > + * @return
> > > > + *   0 on success, negative value otherwise
> > > > + */
> > > > +int
> > > > +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth);
> > > > +
> > > > +/**
> > > > + * Lookup multiple IP addresses in an FIB. This may be implemented
> as a
> > > > + * macro, so the address of the function should not be used.
> > > > + *
> > > > + * @param RIB
> > > > + *   RIB object handle
> > > > + * @param ips
> > > > + *   Array of IPs to be looked up in the FIB
> > > > + * @param next_hops
> > > > + *   Next hop of the most specific rule found for IP.
> > > > + *   This is an array of eight byte values.
> > > > + *   If the lookup for the given IP failed, then corresponding
> element
> > > would
> > > > + *   contain default value, see description of then next parameter.
> > > > + * @param n
> > > > + *   Number of elements in ips (and next_hops) array to lookup. This
> > > should be a
> > > > + *   compile time constant, and divisible by 8 for best performance.
> > > > + * @param defv
> > > > + *   Default value to populate into corresponding element of hop[]
> > > array,
> > > > + *   if lookup would fail.
> > > > + *  @return
> > > > + *   -EINVAL for incorrect arguments, otherwise 0
> > > > + */
> > > > +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n)      \
> > > > +     rib->lookup(rib->fib, ips, next_hops, n)
> > >
> > > My main thought here is whether this needs to be a function at all?
> > > Given that it takes a full burst of addresses in a single go, how much
> > > performance would actually be lost by making this a regular function in
> > > the C file?
> > > IF we do convert this to a regular function, then a lot of the
> structure
> > > definitions above - most importantly, the rib structure itself - can
> > > probably be moved to a private header file and not exposed to
> > > applications at all. This will make ABI compatibility a *lot* easier,
> as
> > > the structures can be changed without affecting the public ABI.
> > >
> > I didn't quite understand what you mean.
> >
> Sorry, by "needs to be a function" in first line read "needs to be a
> macro". Basically, the point is to not inline anything that doesn't need
> it. If a function works on a burst of packets, it probably will be fine
> being a regular function than a macro or inline function.
>
OK, got it after our conversation on IRC.
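
For the record, the idea as I understand it can be sketched roughly as below.
This is not the patch code: toy_rib, toy_rib_lookup_bulk() and
toy_dummy_lookup() are hypothetical names, and both the "public" and the
"private" part sit in one file only to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* ---- what a public header could expose ---- */
struct toy_rib;  /* opaque: the layout is not part of the public ABI */
int toy_rib_lookup_bulk(struct toy_rib *rib, const uint32_t *ips,
			uint64_t *next_hops, size_t n);

/* ---- what could live in a private header or in the .c file ---- */
typedef int (*toy_lookup_fn)(void *fib, const uint32_t *ips,
			     uint64_t *next_hops, size_t n);

struct toy_rib {
	toy_lookup_fn lookup;  /* lookup routine selected at create time */
	void *fib;             /* backend-specific table */
};

/* Regular function: one call per burst, so the call cost is amortized. */
int
toy_rib_lookup_bulk(struct toy_rib *rib, const uint32_t *ips,
		    uint64_t *next_hops, size_t n)
{
	if (rib == NULL || ips == NULL || next_hops == NULL)
		return -EINVAL;
	return rib->lookup(rib->fib, ips, next_hops, n);
}

/* Dummy backend used only to exercise the wrapper: next hop = ip + 1. */
static int
toy_dummy_lookup(void *fib, const uint32_t *ips, uint64_t *next_hops, size_t n)
{
	(void)fib;
	for (size_t i = 0; i < n; i++)
		next_hops[i] = (uint64_t)ips[i] + 1;
	return 0;
}
```

Since callers only ever hold a pointer to struct toy_rib, fields can be added
or reordered later without changing what applications were compiled against.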

>
> /Bruce
>



-- 
Regards,
Vladimir


* Re: [dpdk-dev] [PATCH] vhost: fix segfault as handle set_mem_table message
  2018-03-29 16:37  3%             ` Maxime Coquelin
@ 2018-03-29 18:09  0%               ` Wodkowski, PawelX
  0 siblings, 0 replies; 200+ results
From: Wodkowski, PawelX @ 2018-03-29 18:09 UTC (permalink / raw)
  To: Maxime Coquelin, Tan, Jianfeng, Victor Kaplansky
  Cc: dev, stable, Yang, Yi Y, Harris, James R, Yang, Ziye, Liu,
	Changpeng, Stojaczyk, DariuszX, Yuanhan Liu

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Thursday, March 29, 2018 6:38 PM
> To: Wodkowski, PawelX <pawelx.wodkowski@intel.com>; Tan, Jianfeng
> <jianfeng.tan@intel.com>; Victor Kaplansky <vkaplans@redhat.com>
> Cc: dev@dpdk.org; stable@dpdk.org; Yang, Yi Y <yi.y.yang@intel.com>;
> Harris, James R <james.r.harris@intel.com>; Yang, Ziye
> <ziye.yang@intel.com>; Liu, Changpeng <changpeng.liu@intel.com>;
> Stojaczyk, DariuszX <dariuszx.stojaczyk@intel.com>; Yuanhan Liu
> <yliu@fridaylinux.org>
> Subject: Re: [PATCH] vhost: fix segfault as handle set_mem_table message
> 
> Hi Pawel,
> 
> On 03/29/2018 02:57 PM, Wodkowski, PawelX wrote:
> >>>>>>> DPDK vhost-user handles this message rudely by unmap all existing
> >>>>>>> regions and map new ones. This might lead to segfault if there
> >>>>>>> is pmd thread just trying to touch those unmapped memory
> regions.
> >>>>>>>
> >>>>>>> But for most cases, except VM memory hotplug,
> >>>>
> >>>> FYI, Victor is working on implementing a lock-less protection
> mechanism
> >>>> to prevent crashes in such cases. It is intended first to protect
> >>>> log_base in case of multiqueue + live-migration, but would solve thi
> >>>> issue too.
> >>>
> >>> Bring this issue back for discussion.
> >>>
> >>> Reported by SPDK guys, even with per-queue lock, they could still run
> into
> >> crash as of memory hot plug or unplug.
> >>> In detail, you add the lock in an internal struct, vhost_queue which is,
> >> unfortunately, not visible to the external datapath, like vhost-scsi in SPDK.
> >>
> >> Yes, I agree current solution is not enough
> >>
> >>> The memory hot plug and unplug would be the main issue from SPDK
> side so
> >> far. For this specific issue, I think we can continue this patch to filter out
> the
> >> changed regions, and keep unchanged regions not remapped.
> >>>
> >>> But I know that the per-vq lock is not only to resolve the memory table
> issue,
> >> some other vhost-user messages could also lead to problems? If yes, shall
> we
> >> take a step back, to think about how to solve this kind of issues for
> backends,
> >> like vhost-scsi.
> >>
> >> Right, any message that can change the device or virtqueue states can be
> >> problematic.
> >>
> >>> Thoughts?
> >>
> >> In another thread, SPDK people proposed to destroy and re-create the
> >> device for every message. I think this is not acceptable.
> >
> > Backend must know which part of device is outdated (memory table, ring
> > position etc) so it can take right action here. I  don't insist on destroy/create
> > scheme but this solve most of those problems (if not all). if we will have
> another
> > working solution this is perfectly fine for me.
> >
> >>
> >> I proposed an alternative that I think would work:
> >> - external backends & applications implements the
> .vring_state_changed()
> >>     callback. On disable it stops processing the rings and only return
> >>     once all descriptor buffers are processed. On enable, they resume the
> >>     rings processing.
> >> - In vhost lib, we implement vhost_vring_pause and vhost_vring_resume
> >>     functions. In pause function, we save device state (enable or
> >>     disable) in a variable, and if the ring is enabled at device level it
> >>     calls .vring_state_changed() with disabled state. In resume, it checks
> >>     the vring state in the device and call .vring_state_changed() with
> >>     enable state if it was enabled in device state.
> >
> > This will be not enough. We need to know what exactly changed. As for
> ring
> > state it is straight forward to save/fetch new ring state but eg. for set mem
> > table we need to finish all IO, remove currently registered RDMA memory.
> > Then, when new memory table is available we need to register it again for
> > RDMA then resume IO.
> 
> Yes, that's what I meant when I said "only return once all desc buffers
> are processed".
> 
> These messages are quite rare, I don't think it is really a problem to
> finish all IO when it happens, and that's what happened with your
> initial patch.
> 

Sure, this workaround will be quite easy to use. We will treat disabling
*any ring* as a destroy-device event and enabling *all rings* as a
new-device event.

> I agree we must consider a solution as you propose below, but my
> proposal could easily be implemented for v18.05. Whereas your patch
> below is quite a big change, and I think it is a bit too late as
> integration deadline for v18.05 is April 6th.

Agreed, it is too late to do that for v18.05. So let's use your solution for
v18.05 and develop the final fix for the next release.

> 
> >
> >>
> >> So, I think that would work but I hadn't had a clear reply from SPDK
> >> people proving it wouldn't.
> >>
> >> They asked we revert Victor's patch, but I don't see the need as it does
> >> not hurt SPDK (but doesn't protect anything for them I agree), while it
> >> really fixes real issues with internal Net backend.
> >>
> >> What do you think of my proposal? Do you see other alternative?
> >>
> >
> > As Victor is already working on the solution, can you post some info about
> how
> > you plan to solve it? I was thinking about something like code bellow (sorry
> for
> > how this code look like but this is my work-in-progress  to see if this make
> any
> > sense here). This code allow to:
> > 1.  not introducing changes like http://dpdk.org/ml/archives/dev/2018-
> March/093922.html
> >       because backend will handle this by its own.
> 
> Right, but we may anyway have to declare the payload for these backend
> specific messages in vhost lib as it may be bigger than existing
> payloads.
> 
> > 2. virtio-net specific messages can be moved out of generic vhost_user.c
> file
> > 3. virtqueue locking stuff can be moved to virito-net specific backend.
> >
> > Pls let me know what you think.
> 
> Thanks for sharing, please find a few comments below:
> 
> >
> > ---
> >   lib/librte_vhost/Makefile         |   2 +-
> >   lib/librte_vhost/rte_vhost.h      |  60 ++++++++++++++++++-
> >   lib/librte_vhost/rte_vhost_user.h | 120
> ++++++++++++++++++++++++++++++++++++++
> >   lib/librte_vhost/vhost.h          |  14 -----
> >   lib/librte_vhost/vhost_user.c     |  30 ++++++++++
> >   lib/librte_vhost/vhost_user.h     |  88 ----------------------------
> >   6 files changed, 209 insertions(+), 105 deletions(-)
> >   create mode 100644 lib/librte_vhost/rte_vhost_user.h
> >
> > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > index 5d6c6abaed51..07439a186d91 100644
> > --- a/lib/librte_vhost/Makefile
> > +++ b/lib/librte_vhost/Makefile
> > @@ -25,6 +25,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c
> iotlb.c socket.c vhost.c \
> >   					vhost_user.c virtio_net.c
> >
> >   # install includes
> > -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> rte_vhost_user.h
> >
> >   include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> > index d33206997453..7b76952638dc 100644
> > --- a/lib/librte_vhost/rte_vhost.h
> > +++ b/lib/librte_vhost/rte_vhost.h
> > @@ -16,12 +16,13 @@
> >   #include <rte_memory.h>
> >   #include <rte_mempool.h>
> >
> > +#include <rte_vhost_user.h>
> > +
> >   #ifdef __cplusplus
> >   extern "C" {
> >   #endif
> >
> >   /* These are not C++-aware. */
> > -#include <linux/vhost.h>
> >   #include <linux/virtio_ring.h>
> >
> >   #define RTE_VHOST_USER_CLIENT		(1ULL << 0)
> > @@ -65,6 +66,51 @@ struct rte_vhost_vring {
> >   };
> >
> >   /**
> > + * Vhost library started processing given vhost user message.
> > + *
> > + * This state should be used eg. to stop rings processing in case of
> > + * SET_MEM_TABLE message.
> > + *
> > + * Backend is allowed to return any result of
> RTE_VHOST_USER_MESSAGE_RESULT_*.
> > + */
> > +#define RTE_VHOST_USER_MSG_START 0
> > +
> > +/**
> > + * Vhost library is finishing processing given vhost user message.
> > + * If backend have handled the message produced response is passed as
> message
> > + * parameter. If response is needed it will be send after returning.
> > + *
> > + * This state might be used to resume ring processing in case of
> SET_MEM_TABLE
> > + * message.
> > + *
> > + * Returning RTE_VHOST_USER_MSG_RESULT_FAILED will trigger failure
> action in
> > + * vhost library.
> > + */
> > +#define RTE_VHOST_USER_MSG_END 1
> > +
> > +/**
> > + * Backend understood the message but processing it failed for some
> reason.
> > + * vhost library will take the failure action - chance closing existing
> > + * connection.
> > + */
> > +#define RTE_VHOST_USER_MSG_RESULT_FAILED -1
> > +
> > +/**
> > + * Backend understood the message and handled it entirly. Backend is
> responsible
> > + * for filling message object with right response data.
> > + */
> > +#define RTE_VHOST_USER_MSG_RESULT_HANDLED 0
> > +
> > +/**
> > + * Backend ignored the message or understood and took some action. In
> either
> > + * case the message need to be further processed by vhost library.
> > + *
> > + * Backend is not allowed to change passed message.
> > + */
> > +#define RTE_VHOST_USER_MSG_RESULT_OK 1
> > +
> > +
> > +/**
> >    * Device and vring operations.
> >    */
> >   struct vhost_device_ops {
> > @@ -84,7 +130,17 @@ struct vhost_device_ops {
> >   	int (*new_connection)(int vid);
> >   	void (*destroy_connection)(int vid);
> >
> > -	void *reserved[2]; /**< Reserved for future extension */
> > +	/**
> > +	 * Backend callback for user message.
> > +	 *
> > +	 * @param vid id of vhost device
> > +	 * @param msg message object.
> > +	 * @param phase RTE_VHOST_USER_MSG_START or
> RTE_VHOST_USER_MSG_END
> > +	 * @return one of RTE_VHOST_USER_MESSAGE_RESULT_*
> > +	 */
> > +	int (*user_message_handler)(int vid, struct VhostUserMsg *msg, int
> phase);
> 
> I think it would deserve a dedicated struct for two reasons:
>   1. This is specific to backend implementation, whereas the above struct
> was introduced for the application using the backends.

Agreed, an additional dedicated structure can be useful for applications, but
that is not the case here.
As already shown in http://dpdk.org/dev/patchwork/patch/36582/,
there are, and will be, virtio extensions for particular backends
that vhost_user.c shouldn't really care about. Anyway, if there is no
other option we will use the API you proposed, but I prefer having this
handler.

>   2. There is not a lot room remaining in this struct before breaking the
> ABI.

True. But, on the other hand, what is the benefit of having "Reserved for
future extension" fields if we are afraid of using them when needed?
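
To illustrate the point, a rough sketch with made-up names (toy_ops_old,
toy_ops_new, toy_dispatch() and the TOY_* values are not the real
vhost_device_ops). It assumes function pointers and 'void *' have the same
size and alignment, which holds on the platforms I know of but is not
guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>

/* Layout before the change: two spare slots at the end. */
struct toy_ops_old {
	int (*new_connection)(int vid);
	void (*destroy_connection)(int vid);
	void *reserved[2];
};

/* Layout after the change: one spare slot became a callback. */
struct toy_ops_new {
	int (*new_connection)(int vid);
	void (*destroy_connection)(int vid);
	int (*user_message_handler)(int vid, void *msg, int phase);
	void *reserved[1];
};

/* The struct size and the offset of the reused slot must not move. */
_Static_assert(sizeof(struct toy_ops_old) == sizeof(struct toy_ops_new),
	       "consuming a reserved slot must not change the struct size");
_Static_assert(offsetof(struct toy_ops_old, reserved) ==
	       offsetof(struct toy_ops_new, user_message_handler),
	       "the new callback must occupy the first reserved slot");

#define TOY_MSG_RESULT_HANDLED 0
#define TOY_MSG_RESULT_OK      1

/* Sample handler that claims every message. */
static int
toy_sample_handler(int vid, void *msg, int phase)
{
	(void)vid; (void)msg; (void)phase;
	return TOY_MSG_RESULT_HANDLED;
}

/* Library side: a NULL slot (an old, zero-initialized struct) is skipped. */
static int
toy_dispatch(const struct toy_ops_new *ops, int vid, void *msg, int phase)
{
	assert(ops != NULL);
	if (ops->user_message_handler == NULL)
		return TOY_MSG_RESULT_OK;
	return ops->user_message_handler(vid, msg, phase);
}
```

An application built against the old layout that zeroed the reserved fields
keeps working, because the library finds NULL in the slot and falls through to
its own handling. The cost is that only one spare slot is left afterwards.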

>   3. (3 reasons in fact :) ) It is to handle vhost-user messages, so it
> would be better in rte_vhost_user.h.
> 

I'm sure I can do it better and make it more generic ;)

> > +
> > +	void *reserved[1]; /**< Reserved for future extension */
> >   };
> >
> >   /**
> > diff --git a/lib/librte_vhost/rte_vhost_user.h
> b/lib/librte_vhost/rte_vhost_user.h
> > new file mode 100644
> > index 000000000000..f7678d33acc3
> > --- /dev/null
> > +++ b/lib/librte_vhost/rte_vhost_user.h
> > @@ -0,0 +1,120 @@
> > +#ifndef _VHOST_RTE_VHOST_USER_H_
> > +#define _VHOST_RTE_VHOST_USER_H_
> > +
> > +#include <stdint.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/* These are not C++-aware. */
> > +#include <linux/vhost.h>
> > +
> > +/* refer to hw/virtio/vhost-user.c */
> > +
> > +struct vhost_iotlb_msg {
> > +	__u64 iova;
> > +	__u64 size;
> > +	__u64 uaddr;
> > +#define VHOST_ACCESS_RO      0x1
> > +#define VHOST_ACCESS_WO      0x2
> > +#define VHOST_ACCESS_RW      0x3
> > +	__u8 perm;
> > +#define VHOST_IOTLB_MISS           1
> > +#define VHOST_IOTLB_UPDATE         2
> > +#define VHOST_IOTLB_INVALIDATE     3
> > +#define VHOST_IOTLB_ACCESS_FAIL    4
> > +	__u8 type;
> > +};
> > +
> > +#define VHOST_MEMORY_MAX_NREGIONS 8
> > +
> > +#define VHOST_USER_PROTOCOL_F_MQ	0
> > +#define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
> > +#define VHOST_USER_PROTOCOL_F_RARP	2
> > +#define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
> > +#define VHOST_USER_PROTOCOL_F_NET_MTU 4
> > +#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
> > +
> > +typedef enum VhostUserRequest {
> > +	VHOST_USER_NONE = 0,
> > +	VHOST_USER_GET_FEATURES = 1,
> > +	VHOST_USER_SET_FEATURES = 2,
> > +	VHOST_USER_SET_OWNER = 3,
> > +	VHOST_USER_RESET_OWNER = 4,
> > +	VHOST_USER_SET_MEM_TABLE = 5,
> > +	VHOST_USER_SET_LOG_BASE = 6,
> > +	VHOST_USER_SET_LOG_FD = 7,
> > +	VHOST_USER_SET_VRING_NUM = 8,
> > +	VHOST_USER_SET_VRING_ADDR = 9,
> > +	VHOST_USER_SET_VRING_BASE = 10,
> > +	VHOST_USER_GET_VRING_BASE = 11,
> > +	VHOST_USER_SET_VRING_KICK = 12,
> > +	VHOST_USER_SET_VRING_CALL = 13,
> > +	VHOST_USER_SET_VRING_ERR = 14,
> > +	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
> > +	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
> > +	VHOST_USER_GET_QUEUE_NUM = 17,
> > +	VHOST_USER_SET_VRING_ENABLE = 18,
> > +	VHOST_USER_SEND_RARP = 19,
> > +	VHOST_USER_NET_SET_MTU = 20,
> > +	VHOST_USER_SET_SLAVE_REQ_FD = 21,
> > +	VHOST_USER_IOTLB_MSG = 22,
> > +	VHOST_USER_MAX
> > +} VhostUserRequest;
> > +
> > +typedef enum VhostUserSlaveRequest {
> > +	VHOST_USER_SLAVE_NONE = 0,
> > +	VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > +	VHOST_USER_SLAVE_MAX
> > +} VhostUserSlaveRequest;
> > +
> > +typedef struct VhostUserMemoryRegion {
> > +	uint64_t guest_phys_addr;
> > +	uint64_t memory_size;
> > +	uint64_t userspace_addr;
> > +	uint64_t mmap_offset;
> > +} VhostUserMemoryRegion;
> > +
> > +typedef struct VhostUserMemory {
> > +	uint32_t nregions;
> > +	uint32_t padding;
> > +	VhostUserMemoryRegion
> regions[VHOST_MEMORY_MAX_NREGIONS];
> > +} VhostUserMemory;
> > +
> > +typedef struct VhostUserLog {
> > +	uint64_t mmap_size;
> > +	uint64_t mmap_offset;
> > +} VhostUserLog;
> > +
> > +typedef struct VhostUserMsg {
> > +	union {
> > +		VhostUserRequest master;
> > +		VhostUserSlaveRequest slave;
> > +	} request;
> > +
> > +#define VHOST_USER_VERSION_MASK     0x3
> > +#define VHOST_USER_REPLY_MASK       (0x1 << 2)
> > +#define VHOST_USER_NEED_REPLY		(0x1 << 3)
> > +	uint32_t flags;
> > +	uint32_t size; /* the following payload size */
> > +	union {
> > +#define VHOST_USER_VRING_IDX_MASK   0xff
> > +#define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
> > +		uint64_t u64;
> > +		struct vhost_vring_state state;
> > +		struct vhost_vring_addr addr;
> > +		VhostUserMemory memory;
> > +		VhostUserLog    log;
> > +		struct vhost_iotlb_msg iotlb;
> > +	} payload;
> 
> We'll need the backend-specific payloads to be declared here so that we
> know the max message size for input validation.
> 

True, I did not consider this.
How about making the payload a 'void *'? Allocating the buffer dynamically
would allow us to hide those structures and decrease the size of this patch.
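
Something along these lines, as a sketch only. The names (toy_msg,
toy_msg_parse(), TOY_MAX_PAYLOAD) and the 512-byte limit are made up for the
example, and it parses from a memory buffer instead of the socket. The size
check before the allocation is what would keep the input validation you
mention.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_PAYLOAD 512u  /* hypothetical upper bound, not from the patch */

struct toy_msg {
	uint32_t request;
	uint32_t flags;
	uint32_t size;   /* payload size announced by the header */
	void *payload;   /* heap buffer owned by the caller; NULL if size == 0 */
};

/*
 * Parse one message from buf (header of three 32-bit words, then payload).
 * Returns 0 on success, -1 on truncated or oversized input or on allocation
 * failure. On failure msg->payload is NULL.
 */
static int
toy_msg_parse(struct toy_msg *msg, const uint8_t *buf, size_t len)
{
	const size_t hdr = 3 * sizeof(uint32_t);

	assert(msg != NULL && buf != NULL);
	memset(msg, 0, sizeof(*msg));
	if (len < hdr)
		return -1;

	memcpy(&msg->request, buf, sizeof(uint32_t));
	memcpy(&msg->flags, buf + sizeof(uint32_t), sizeof(uint32_t));
	memcpy(&msg->size, buf + 2 * sizeof(uint32_t), sizeof(uint32_t));

	/* Validate the announced size before allocating or copying anything. */
	if (msg->size > TOY_MAX_PAYLOAD || msg->size > len - hdr)
		return -1;
	if (msg->size == 0)
		return 0;

	msg->payload = malloc(msg->size);
	if (msg->payload == NULL)
		return -1;
	memcpy(msg->payload, buf + hdr, msg->size);
	return 0;
}
```

The caller frees msg.payload once the message has been handled. A backend
with larger messages would still need a way to raise the limit.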

> > +	int fds[VHOST_MEMORY_MAX_NREGIONS];
> > +} __attribute((packed)) VhostUserMsg;
> > +
> > +#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _VHOST_RTE_VHOST_USER_H_ */
> > diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> > index d947bc9e3b3f..42a1474095a3 100644
> > --- a/lib/librte_vhost/vhost.h
> > +++ b/lib/librte_vhost/vhost.h
> > @@ -141,20 +141,6 @@ struct vhost_virtqueue {
> >
> >   #define VIRTIO_F_IOMMU_PLATFORM 33
> >
> > -struct vhost_iotlb_msg {
> > -	__u64 iova;
> > -	__u64 size;
> > -	__u64 uaddr;
> > -#define VHOST_ACCESS_RO      0x1
> > -#define VHOST_ACCESS_WO      0x2
> > -#define VHOST_ACCESS_RW      0x3
> > -	__u8 perm;
> > -#define VHOST_IOTLB_MISS           1
> > -#define VHOST_IOTLB_UPDATE         2
> > -#define VHOST_IOTLB_INVALIDATE     3
> > -#define VHOST_IOTLB_ACCESS_FAIL    4
> > -	__u8 type;
> > -};
> >
> >   #define VHOST_IOTLB_MSG 0x1
> >
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index 90ed2112e0af..15532e182b58 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -1301,6 +1301,7 @@ vhost_user_msg_handler(int vid, int fd)
> >   	struct virtio_net *dev;
> >   	struct VhostUserMsg msg;
> >   	int ret;
> > +	int user_handler_result;
> >   	int unlock_required = 0;
> >
> >   	dev = get_device(vid);
> > @@ -1347,6 +1348,26 @@ vhost_user_msg_handler(int vid, int fd)
> >   		return -1;
> >   	}
> >
> > +
> > +	if (dev->notify_ops->user_message_handler) {
> > +		user_handler_result = dev->notify_ops-
> >user_message_handler(
> > +				dev->vid, &msg,
> RTE_VHOST_USER_MSG_START);
> > +
> > +		switch (user_handler_result) {
> > +		case RTE_VHOST_USER_MSG_RESULT_FAILED:
> > +			RTE_LOG(ERR, VHOST_CONFIG,
> > +				"User message handler failed\n");
> > +			return -1;
> > +		case RTE_VHOST_USER_MSG_RESULT_HANDLED:
> > +			RTE_LOG(DEBUG, VHOST_CONFIG,
> > +				"User message handled by backend\n");
> > +			goto msg_handled;
> > +		case RTE_VHOST_USER_MSG_RESULT_OK:
> > +			break;
> > +		}
> > +	}
> > +
> > +
> >   	/*
> >   	 * Note: we don't lock all queues on VHOST_USER_GET_VRING_BASE
> >   	 * and VHOST_USER_RESET_OWNER, since it is sent when virtio stops
> > @@ -1485,6 +1506,15 @@ vhost_user_msg_handler(int vid, int fd)
> >   	if (unlock_required)
> >   		vhost_user_unlock_all_queue_pairs(dev);
> >
> > +msg_handled:
> > +	if (dev->notify_ops->user_message_handler) {
> > +		user_handler_result = dev->notify_ops-
> >user_message_handler(
> > +				dev->vid, &msg,
> RTE_VHOST_USER_MSG_END);
> > +
> > +		if (user_handler_result ==
> RTE_VHOST_USER_MSG_RESULT_FAILED)
> > +			return -1;
> > +	}
> > +
> >   	if (msg.flags & VHOST_USER_NEED_REPLY) {
> >   		msg.payload.u64 = !!ret;
> >   		msg.size = sizeof(msg.payload.u64);
> > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> > index d4bd604b9d6b..cf3f0da0ec41 100644
> > --- a/lib/librte_vhost/vhost_user.h
> > +++ b/lib/librte_vhost/vhost_user.h
> > @@ -10,17 +10,6 @@
> >
> >   #include "rte_vhost.h"
> >
> > -/* refer to hw/virtio/vhost-user.c */
> > -
> > -#define VHOST_MEMORY_MAX_NREGIONS 8
> > -
> > -#define VHOST_USER_PROTOCOL_F_MQ	0
> > -#define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
> > -#define VHOST_USER_PROTOCOL_F_RARP	2
> > -#define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
> > -#define VHOST_USER_PROTOCOL_F_NET_MTU 4
> > -#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
> > -
> >   #define VHOST_USER_PROTOCOL_FEATURES	((1ULL <<
> VHOST_USER_PROTOCOL_F_MQ) | \
> >   					 (1ULL <<
> VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
> >   					 (1ULL <<
> VHOST_USER_PROTOCOL_F_RARP) | \
> > @@ -28,83 +17,6 @@
> >   					 (1ULL <<
> VHOST_USER_PROTOCOL_F_NET_MTU) | \
> >   					 (1ULL <<
> VHOST_USER_PROTOCOL_F_SLAVE_REQ))
> >
> > -typedef enum VhostUserRequest {
> > -	VHOST_USER_NONE = 0,
> > -	VHOST_USER_GET_FEATURES = 1,
> > -	VHOST_USER_SET_FEATURES = 2,
> > -	VHOST_USER_SET_OWNER = 3,
> > -	VHOST_USER_RESET_OWNER = 4,
> > -	VHOST_USER_SET_MEM_TABLE = 5,
> > -	VHOST_USER_SET_LOG_BASE = 6,
> > -	VHOST_USER_SET_LOG_FD = 7,
> > -	VHOST_USER_SET_VRING_NUM = 8,
> > -	VHOST_USER_SET_VRING_ADDR = 9,
> > -	VHOST_USER_SET_VRING_BASE = 10,
> > -	VHOST_USER_GET_VRING_BASE = 11,
> > -	VHOST_USER_SET_VRING_KICK = 12,
> > -	VHOST_USER_SET_VRING_CALL = 13,
> > -	VHOST_USER_SET_VRING_ERR = 14,
> > -	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
> > -	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
> > -	VHOST_USER_GET_QUEUE_NUM = 17,
> > -	VHOST_USER_SET_VRING_ENABLE = 18,
> > -	VHOST_USER_SEND_RARP = 19,
> > -	VHOST_USER_NET_SET_MTU = 20,
> > -	VHOST_USER_SET_SLAVE_REQ_FD = 21,
> > -	VHOST_USER_IOTLB_MSG = 22,
> > -	VHOST_USER_MAX
> > -} VhostUserRequest;
> > -
> > -typedef enum VhostUserSlaveRequest {
> > -	VHOST_USER_SLAVE_NONE = 0,
> > -	VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > -	VHOST_USER_SLAVE_MAX
> > -} VhostUserSlaveRequest;
> > -
> > -typedef struct VhostUserMemoryRegion {
> > -	uint64_t guest_phys_addr;
> > -	uint64_t memory_size;
> > -	uint64_t userspace_addr;
> > -	uint64_t mmap_offset;
> > -} VhostUserMemoryRegion;
> > -
> > -typedef struct VhostUserMemory {
> > -	uint32_t nregions;
> > -	uint32_t padding;
> > -	VhostUserMemoryRegion
> regions[VHOST_MEMORY_MAX_NREGIONS];
> > -} VhostUserMemory;
> > -
> > -typedef struct VhostUserLog {
> > -	uint64_t mmap_size;
> > -	uint64_t mmap_offset;
> > -} VhostUserLog;
> > -
> > -typedef struct VhostUserMsg {
> > -	union {
> > -		VhostUserRequest master;
> > -		VhostUserSlaveRequest slave;
> > -	} request;
> > -
> > -#define VHOST_USER_VERSION_MASK     0x3
> > -#define VHOST_USER_REPLY_MASK       (0x1 << 2)
> > -#define VHOST_USER_NEED_REPLY		(0x1 << 3)
> > -	uint32_t flags;
> > -	uint32_t size; /* the following payload size */
> > -	union {
> > -#define VHOST_USER_VRING_IDX_MASK   0xff
> > -#define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
> > -		uint64_t u64;
> > -		struct vhost_vring_state state;
> > -		struct vhost_vring_addr addr;
> > -		VhostUserMemory memory;
> > -		VhostUserLog    log;
> > -		struct vhost_iotlb_msg iotlb;
> > -	} payload;
> > -	int fds[VHOST_MEMORY_MAX_NREGIONS];
> > -} __attribute((packed)) VhostUserMsg;
> > -
> > -#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
> > -
> >   /* The version of the protocol we support */
> >   #define VHOST_USER_VERSION    0x1
> >
> >


* Re: [dpdk-dev] [PATCH] vhost: fix segfault as handle set_mem_table message
  @ 2018-03-29 16:37  3%             ` Maxime Coquelin
  2018-03-29 18:09  0%               ` Wodkowski, PawelX
  0 siblings, 1 reply; 200+ results
From: Maxime Coquelin @ 2018-03-29 16:37 UTC (permalink / raw)
  To: Wodkowski, PawelX, Tan, Jianfeng, Victor Kaplansky
  Cc: dev, stable, Yang, Yi Y, Harris, James R, Yang, Ziye, Liu,
	Changpeng, Stojaczyk, DariuszX, Yuanhan Liu

Hi Pawel,

On 03/29/2018 02:57 PM, Wodkowski, PawelX wrote:
>>>>>>> DPDK vhost-user handles this message rudely by unmap all existing
>>>>>>> regions and map new ones. This might lead to segfault if there
>>>>>>> is pmd thread just trying to touch those unmapped memory regions.
>>>>>>>
>>>>>>> But for most cases, except VM memory hotplug,
>>>>
>>>> FYI, Victor is working on implementing a lock-less protection mechanism
>>>> to prevent crashes in such cases. It is intended first to protect
>>>> log_base in case of multiqueue + live-migration, but would solve thi
>>>> issue too.
>>>
>>> Bring this issue back for discussion.
>>>
>>> Reported by SPDK guys, even with per-queue lock, they could still run into
>> crash as of memory hot plug or unplug.
>>> In detail, you add the lock in an internal struct, vhost_queue which is,
>> unfortunately, not visible to the external datapath, like vhost-scsi in SPDK.
>>
>> Yes, I agree current solution is not enough
>>
>>> The memory hot plug and unplug would be the main issue from SPDK side so
>> far. For this specific issue, I think we can continue this patch to filter out the
>> changed regions, and keep unchanged regions not remapped.
>>>
>>> But I know that the per-vq lock is not only to resolve the memory table issue,
>> some other vhost-user messages could also lead to problems? If yes, shall we
>> take a step back, to think about how to solve this kind of issues for backends,
>> like vhost-scsi.
>>
>> Right, any message that can change the device or virtqueue states can be
>> problematic.
>>
>>> Thoughts?
>>
>> In another thread, SPDK people proposed to destroy and re-create the
>> device for every message. I think this is not acceptable.
> 
> Backend must know which part of device is outdated (memory table, ring
> position etc) so it can take right action here. I  don't insist on destroy/create
> scheme but this solve most of those problems (if not all). if we will have another
> working solution this is perfectly fine for me.
> 
>>
>> I proposed an alternative that I think would work:
>> - external backends & applications implements the .vring_state_changed()
>>     callback. On disable it stops processing the rings and only return
>>     once all descriptor buffers are processed. On enable, they resume the
>>     rings processing.
>> - In vhost lib, we implement vhost_vring_pause and vhost_vring_resume
>>     functions. In pause function, we save device state (enable or
>>     disable) in a variable, and if the ring is enabled at device level it
>>     calls .vring_state_changed() with disabled state. In resume, it checks
>>     the vring state in the device and call .vring_state_changed() with
>>     enable state if it was enabled in device state.
> 
> This will be not enough. We need to know what exactly changed. As for ring
> state it is straight forward to save/fetch new ring state but eg. for set mem
> table we need to finish all IO, remove currently registered RDMA memory.
> Then, when new memory table is available we need to register it again for
> RDMA then resume IO.

Yes, that's what I meant when I said "only return once all desc buffers
are processed".

These messages are quite rare, I don't think it is really a problem to
finish all IO when it happens, and that's what happened with your
initial patch.

I agree we must consider a solution as you propose below, but my
proposal could easily be implemented for v18.05. Whereas your patch
below is quite a big change, and I think it is a bit too late as
integration deadline for v18.05 is April 6th.
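
A minimal sketch of the pause/resume idea quoted above, for illustration
only. All sketch_* names and the two-ring device are hypothetical stand-ins,
not the vhost library's real types; only the .vring_state_changed() callback
shape (vid, ring index, enable) follows the existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NB_VRINGS 2

/* Hypothetical, simplified stand-ins for the vhost library internals. */
struct sketch_vring {
	bool enabled;	/* state requested at device level */
	bool paused;	/* set while the library handles a message */
};

typedef int (*sketch_state_cb)(int vid, unsigned int idx, int enable);

struct sketch_dev {
	int vid;
	struct sketch_vring vrings[SKETCH_NB_VRINGS];
	sketch_state_cb vring_state_changed;
};

/* Stand-in backend callback: it only records the last state it was given. */
static int sketch_last_state[SKETCH_NB_VRINGS] = { -1, -1 };

static int
sketch_record_cb(int vid, unsigned int idx, int enable)
{
	(void)vid;
	sketch_last_state[idx] = enable;
	return 0;
}

/* Pause: tell the backend to stop, but only if the ring is enabled. */
static void
sketch_vring_pause(struct sketch_dev *dev, unsigned int idx)
{
	assert(idx < SKETCH_NB_VRINGS);
	struct sketch_vring *vr = &dev->vrings[idx];

	if (vr->paused)
		return;
	vr->paused = true;
	if (vr->enabled && dev->vring_state_changed != NULL)
		dev->vring_state_changed(dev->vid, idx, 0);
}

/* Resume: re-enable only rings that are enabled at device level. */
static void
sketch_vring_resume(struct sketch_dev *dev, unsigned int idx)
{
	assert(idx < SKETCH_NB_VRINGS);
	struct sketch_vring *vr = &dev->vrings[idx];

	if (!vr->paused)
		return;
	vr->paused = false;
	if (vr->enabled && dev->vring_state_changed != NULL)
		dev->vring_state_changed(dev->vid, idx, 1);
}
```

In this sketch a ring that the guest never enabled is left alone on both
pause and resume, so the backend does not see a spurious enable.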

> 
>>
>> So, I think that would work but I hadn't had a clear reply from SPDK
>> people proving it wouldn't.
>>
>> They asked we revert Victor's patch, but I don't see the need as it does
>> not hurt SPDK (but doesn't protect anything for them I agree), while it
>> really fixes real issues with internal Net backend.
>>
>> What do you think of my proposal? Do you see any other alternative?
>>
> 
> As Victor is already working on the solution, can you post some info about how
> you plan to solve it? I was thinking about something like the code below (sorry for
> how this code looks, but this is my work in progress to see if this makes any
> sense here). This code allows:
> 1.  not introducing changes like http://dpdk.org/ml/archives/dev/2018-March/093922.html
>       because the backend will handle this on its own.

Right, but we may anyway have to declare the payload for these backend
specific messages in vhost lib as it may be bigger than existing
payloads.

> 2. virtio-net specific messages can be moved out of generic vhost_user.c file
> 3. virtqueue locking stuff can be moved to the virtio-net specific backend.
> 
> Pls let me know what you think.

Thanks for sharing, please find a few comments below:

> 
> ---
>   lib/librte_vhost/Makefile         |   2 +-
>   lib/librte_vhost/rte_vhost.h      |  60 ++++++++++++++++++-
>   lib/librte_vhost/rte_vhost_user.h | 120 ++++++++++++++++++++++++++++++++++++++
>   lib/librte_vhost/vhost.h          |  14 -----
>   lib/librte_vhost/vhost_user.c     |  30 ++++++++++
>   lib/librte_vhost/vhost_user.h     |  88 ----------------------------
>   6 files changed, 209 insertions(+), 105 deletions(-)
>   create mode 100644 lib/librte_vhost/rte_vhost_user.h
> 
> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> index 5d6c6abaed51..07439a186d91 100644
> --- a/lib/librte_vhost/Makefile
> +++ b/lib/librte_vhost/Makefile
> @@ -25,6 +25,6 @@ SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
>   					vhost_user.c virtio_net.c
>   
>   # install includes
> -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h rte_vhost_user.h
>   
>   include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index d33206997453..7b76952638dc 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -16,12 +16,13 @@
>   #include <rte_memory.h>
>   #include <rte_mempool.h>
>   
> +#include <rte_vhost_user.h>
> +
>   #ifdef __cplusplus
>   extern "C" {
>   #endif
>   
>   /* These are not C++-aware. */
> -#include <linux/vhost.h>
>   #include <linux/virtio_ring.h>
>   
>   #define RTE_VHOST_USER_CLIENT		(1ULL << 0)
> @@ -65,6 +66,51 @@ struct rte_vhost_vring {
>   };
>   
>   /**
> + * Vhost library started processing given vhost user message.
> + *
> + * This state should be used eg. to stop rings processing in case of
> + * SET_MEM_TABLE message.
> + *
> + * Backend is allowed to return any result of RTE_VHOST_USER_MSG_RESULT_*.
> + */
> +#define RTE_VHOST_USER_MSG_START 0
> +
> +/**
> + * Vhost library is finishing processing given vhost user message.
> + * If the backend has handled the message, the produced response is passed as the
> + * message parameter. If a response is needed, it will be sent after returning.
> + *
> + * This state might be used to resume ring processing in case of SET_MEM_TABLE
> + * message.
> + *
> + * Returning RTE_VHOST_USER_MSG_RESULT_FAILED will trigger failure action in
> + * vhost library.
> + */
> +#define RTE_VHOST_USER_MSG_END 1
> +
> +/**
> + * Backend understood the message but processing it failed for some reason.
> + * vhost library will take the failure action - chance closing existing
> + * connection.
> + */
> +#define RTE_VHOST_USER_MSG_RESULT_FAILED -1
> +
> +/**
> + * Backend understood the message and handled it entirely. Backend is responsible
> + * for filling message object with right response data.
> + */
> +#define RTE_VHOST_USER_MSG_RESULT_HANDLED 0
> +
> +/**
> + * Backend ignored the message or understood and took some action. In either
> + * case the message need to be further processed by vhost library.
> + *
> + * Backend is not allowed to change passed message.
> + */
> +#define RTE_VHOST_USER_MSG_RESULT_OK 1
> +
> +
> +/**
>    * Device and vring operations.
>    */
>   struct vhost_device_ops {
> @@ -84,7 +130,17 @@ struct vhost_device_ops {
>   	int (*new_connection)(int vid);
>   	void (*destroy_connection)(int vid);
>   
> -	void *reserved[2]; /**< Reserved for future extension */
> +	/**
> +	 * Backend callback for user message.
> +	 *
> +	 * @param vid id of vhost device
> +	 * @param msg message object.
> +	 * @param phase RTE_VHOST_USER_MSG_START or RTE_VHOST_USER_MSG_END
> +	 * @return one of RTE_VHOST_USER_MSG_RESULT_*
> +	 */
> +	int (*user_message_handler)(int vid, struct VhostUserMsg *msg, int phase);

I think it would deserve a dedicated struct for two reasons:
  1. This is specific to backend implementation, whereas the above struct
was introduced for the application using the backends.
  2. There is not a lot of room remaining in this struct before breaking the
ABI.
  3. (3 reasons in fact :) ) It is to handle vhost-user messages, so it 
would be better in rte_vhost_user.h.
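
To illustrate the three points above, here is a minimal sketch of what a
dedicated backend ops struct could look like. Everything named sketch_* is
hypothetical (including the registration helper and the number of reserved
slots); the point is only that the application-facing struct keeps its layout
while the backend-facing callback gets its own home.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_user_msg;	/* opaque stand-in for struct VhostUserMsg */

/* Application-facing ops: layout untouched, both reserved slots still free. */
struct sketch_device_ops {
	int (*new_device)(int vid);
	void (*destroy_device)(int vid);
	void *reserved[2];
};

/* Hypothetical backend-facing ops, which would live in rte_vhost_user.h. */
struct sketch_backend_ops {
	int (*user_message_handler)(int vid, struct sketch_user_msg *msg,
			int phase);
	void *reserved[3];	/* assumed room for later backend callbacks */
};

/* Hypothetical single registration slot (one device, for brevity). */
static const struct sketch_backend_ops *sketch_registered_ops;

/* Example handler that accepts every message in either phase. */
static int
sketch_noop_handler(int vid, struct sketch_user_msg *msg, int phase)
{
	(void)vid;
	(void)msg;
	assert(phase == 0 || phase == 1);
	return 0;
}

/* Reject NULL ops or ops without a handler, otherwise remember them. */
static int
sketch_register_backend_ops(const struct sketch_backend_ops *ops)
{
	if (ops == NULL || ops->user_message_handler == NULL)
		return -1;
	sketch_registered_ops = ops;
	return 0;
}
```

With this split, new backend callbacks consume slots in the backend struct
only, so the application-facing ABI is not affected by them.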

> +
> +	void *reserved[1]; /**< Reserved for future extension */
>   };
>   
>   /**
> diff --git a/lib/librte_vhost/rte_vhost_user.h b/lib/librte_vhost/rte_vhost_user.h
> new file mode 100644
> index 000000000000..f7678d33acc3
> --- /dev/null
> +++ b/lib/librte_vhost/rte_vhost_user.h
> @@ -0,0 +1,120 @@
> +#ifndef _VHOST_RTE_VHOST_USER_H_
> +#define _VHOST_RTE_VHOST_USER_H_
> +
> +#include <stdint.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/* These are not C++-aware. */
> +#include <linux/vhost.h>
> +
> +/* refer to hw/virtio/vhost-user.c */
> +
> +struct vhost_iotlb_msg {
> +	__u64 iova;
> +	__u64 size;
> +	__u64 uaddr;
> +#define VHOST_ACCESS_RO      0x1
> +#define VHOST_ACCESS_WO      0x2
> +#define VHOST_ACCESS_RW      0x3
> +	__u8 perm;
> +#define VHOST_IOTLB_MISS           1
> +#define VHOST_IOTLB_UPDATE         2
> +#define VHOST_IOTLB_INVALIDATE     3
> +#define VHOST_IOTLB_ACCESS_FAIL    4
> +	__u8 type;
> +};
> +
> +#define VHOST_MEMORY_MAX_NREGIONS 8
> +
> +#define VHOST_USER_PROTOCOL_F_MQ	0
> +#define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
> +#define VHOST_USER_PROTOCOL_F_RARP	2
> +#define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
> +#define VHOST_USER_PROTOCOL_F_NET_MTU 4
> +#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
> +
> +typedef enum VhostUserRequest {
> +	VHOST_USER_NONE = 0,
> +	VHOST_USER_GET_FEATURES = 1,
> +	VHOST_USER_SET_FEATURES = 2,
> +	VHOST_USER_SET_OWNER = 3,
> +	VHOST_USER_RESET_OWNER = 4,
> +	VHOST_USER_SET_MEM_TABLE = 5,
> +	VHOST_USER_SET_LOG_BASE = 6,
> +	VHOST_USER_SET_LOG_FD = 7,
> +	VHOST_USER_SET_VRING_NUM = 8,
> +	VHOST_USER_SET_VRING_ADDR = 9,
> +	VHOST_USER_SET_VRING_BASE = 10,
> +	VHOST_USER_GET_VRING_BASE = 11,
> +	VHOST_USER_SET_VRING_KICK = 12,
> +	VHOST_USER_SET_VRING_CALL = 13,
> +	VHOST_USER_SET_VRING_ERR = 14,
> +	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
> +	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
> +	VHOST_USER_GET_QUEUE_NUM = 17,
> +	VHOST_USER_SET_VRING_ENABLE = 18,
> +	VHOST_USER_SEND_RARP = 19,
> +	VHOST_USER_NET_SET_MTU = 20,
> +	VHOST_USER_SET_SLAVE_REQ_FD = 21,
> +	VHOST_USER_IOTLB_MSG = 22,
> +	VHOST_USER_MAX
> +} VhostUserRequest;
> +
> +typedef enum VhostUserSlaveRequest {
> +	VHOST_USER_SLAVE_NONE = 0,
> +	VHOST_USER_SLAVE_IOTLB_MSG = 1,
> +	VHOST_USER_SLAVE_MAX
> +} VhostUserSlaveRequest;
> +
> +typedef struct VhostUserMemoryRegion {
> +	uint64_t guest_phys_addr;
> +	uint64_t memory_size;
> +	uint64_t userspace_addr;
> +	uint64_t mmap_offset;
> +} VhostUserMemoryRegion;
> +
> +typedef struct VhostUserMemory {
> +	uint32_t nregions;
> +	uint32_t padding;
> +	VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> +} VhostUserMemory;
> +
> +typedef struct VhostUserLog {
> +	uint64_t mmap_size;
> +	uint64_t mmap_offset;
> +} VhostUserLog;
> +
> +typedef struct VhostUserMsg {
> +	union {
> +		VhostUserRequest master;
> +		VhostUserSlaveRequest slave;
> +	} request;
> +
> +#define VHOST_USER_VERSION_MASK     0x3
> +#define VHOST_USER_REPLY_MASK       (0x1 << 2)
> +#define VHOST_USER_NEED_REPLY		(0x1 << 3)
> +	uint32_t flags;
> +	uint32_t size; /* the following payload size */
> +	union {
> +#define VHOST_USER_VRING_IDX_MASK   0xff
> +#define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
> +		uint64_t u64;
> +		struct vhost_vring_state state;
> +		struct vhost_vring_addr addr;
> +		VhostUserMemory memory;
> +		VhostUserLog    log;
> +		struct vhost_iotlb_msg iotlb;
> +	} payload;

We'll need the backend-specific payloads to be declared here so that we
know the max message size for input validation.
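
A rough sketch of why this matters, assuming a made-up backend payload: the
library can only bound the incoming size against the payload union, so the
union has to contain the largest backend-specific payload. The sketch_* types
below are simplified stand-ins, not the real VhostUserMsg definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for VhostUserMemory: header plus 8 regions of four u64 fields. */
struct sketch_memory {
	uint32_t nregions;
	uint32_t padding;
	uint64_t regions[8][4];
};

/* Hypothetical backend-specific payload, deliberately the largest member. */
struct sketch_backend_payload {
	uint8_t key[512];
	uint32_t key_len;
};

union sketch_payload {
	uint64_t u64;
	struct sketch_memory memory;
	struct sketch_backend_payload backend;
};

/* The union must be able to hold the backend payload. */
static_assert(sizeof(union sketch_payload) >=
		sizeof(struct sketch_backend_payload),
		"payload union too small for backend payload");

/* Input validation: reject any announced size larger than the union. */
static int
sketch_check_payload_size(uint32_t size)
{
	return size <= sizeof(union sketch_payload) ? 0 : -1;
}
```

If the backend payload were not declared in the union, a valid
backend-specific message could be rejected by this check, or the limit would
have to be relaxed for every message.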

> +	int fds[VHOST_MEMORY_MAX_NREGIONS];
> +} __attribute((packed)) VhostUserMsg;
> +
> +#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _VHOST_RTE_VHOST_USER_H_ */
> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index d947bc9e3b3f..42a1474095a3 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -141,20 +141,6 @@ struct vhost_virtqueue {
>   
>   #define VIRTIO_F_IOMMU_PLATFORM 33
>   
> -struct vhost_iotlb_msg {
> -	__u64 iova;
> -	__u64 size;
> -	__u64 uaddr;
> -#define VHOST_ACCESS_RO      0x1
> -#define VHOST_ACCESS_WO      0x2
> -#define VHOST_ACCESS_RW      0x3
> -	__u8 perm;
> -#define VHOST_IOTLB_MISS           1
> -#define VHOST_IOTLB_UPDATE         2
> -#define VHOST_IOTLB_INVALIDATE     3
> -#define VHOST_IOTLB_ACCESS_FAIL    4
> -	__u8 type;
> -};
>   
>   #define VHOST_IOTLB_MSG 0x1
>   
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 90ed2112e0af..15532e182b58 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1301,6 +1301,7 @@ vhost_user_msg_handler(int vid, int fd)
>   	struct virtio_net *dev;
>   	struct VhostUserMsg msg;
>   	int ret;
> +	int user_handler_result;
>   	int unlock_required = 0;
>   
>   	dev = get_device(vid);
> @@ -1347,6 +1348,26 @@ vhost_user_msg_handler(int vid, int fd)
>   		return -1;
>   	}
>   
> +
> +	if (dev->notify_ops->user_message_handler) {
> +		user_handler_result = dev->notify_ops->user_message_handler(
> +				dev->vid, &msg, RTE_VHOST_USER_MSG_START);
> +
> +		switch (user_handler_result) {
> +		case RTE_VHOST_USER_MSG_RESULT_FAILED:
> +			RTE_LOG(ERR, VHOST_CONFIG,
> +				"User message handler failed\n");
> +			return -1;
> +		case RTE_VHOST_USER_MSG_RESULT_HANDLED:
> +			RTE_LOG(DEBUG, VHOST_CONFIG,
> +				"User message handled by backend\n");
> +			goto msg_handled;
> +		case RTE_VHOST_USER_MSG_RESULT_OK:
> +			break;
> +		}
> +	}
> +
> +
>   	/*
>   	 * Note: we don't lock all queues on VHOST_USER_GET_VRING_BASE
>   	 * and VHOST_USER_RESET_OWNER, since it is sent when virtio stops
> @@ -1485,6 +1506,15 @@ vhost_user_msg_handler(int vid, int fd)
>   	if (unlock_required)
>   		vhost_user_unlock_all_queue_pairs(dev);
>   
> +msg_handled:
> +	if (dev->notify_ops->user_message_handler) {
> +		user_handler_result = dev->notify_ops->user_message_handler(
> +				dev->vid, &msg, RTE_VHOST_USER_MSG_END);
> +
> +		if (user_handler_result == RTE_VHOST_USER_MSG_RESULT_FAILED)
> +			return -1;
> +	}
> +
>   	if (msg.flags & VHOST_USER_NEED_REPLY) {
>   		msg.payload.u64 = !!ret;
>   		msg.size = sizeof(msg.payload.u64);
> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> index d4bd604b9d6b..cf3f0da0ec41 100644
> --- a/lib/librte_vhost/vhost_user.h
> +++ b/lib/librte_vhost/vhost_user.h
> @@ -10,17 +10,6 @@
>   
>   #include "rte_vhost.h"
>   
> -/* refer to hw/virtio/vhost-user.c */
> -
> -#define VHOST_MEMORY_MAX_NREGIONS 8
> -
> -#define VHOST_USER_PROTOCOL_F_MQ	0
> -#define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
> -#define VHOST_USER_PROTOCOL_F_RARP	2
> -#define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
> -#define VHOST_USER_PROTOCOL_F_NET_MTU 4
> -#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
> -
>   #define VHOST_USER_PROTOCOL_FEATURES	((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
>   					 (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
>   					 (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
> @@ -28,83 +17,6 @@
>   					 (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU) | \
>   					 (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ))
>   
> -typedef enum VhostUserRequest {
> -	VHOST_USER_NONE = 0,
> -	VHOST_USER_GET_FEATURES = 1,
> -	VHOST_USER_SET_FEATURES = 2,
> -	VHOST_USER_SET_OWNER = 3,
> -	VHOST_USER_RESET_OWNER = 4,
> -	VHOST_USER_SET_MEM_TABLE = 5,
> -	VHOST_USER_SET_LOG_BASE = 6,
> -	VHOST_USER_SET_LOG_FD = 7,
> -	VHOST_USER_SET_VRING_NUM = 8,
> -	VHOST_USER_SET_VRING_ADDR = 9,
> -	VHOST_USER_SET_VRING_BASE = 10,
> -	VHOST_USER_GET_VRING_BASE = 11,
> -	VHOST_USER_SET_VRING_KICK = 12,
> -	VHOST_USER_SET_VRING_CALL = 13,
> -	VHOST_USER_SET_VRING_ERR = 14,
> -	VHOST_USER_GET_PROTOCOL_FEATURES = 15,
> -	VHOST_USER_SET_PROTOCOL_FEATURES = 16,
> -	VHOST_USER_GET_QUEUE_NUM = 17,
> -	VHOST_USER_SET_VRING_ENABLE = 18,
> -	VHOST_USER_SEND_RARP = 19,
> -	VHOST_USER_NET_SET_MTU = 20,
> -	VHOST_USER_SET_SLAVE_REQ_FD = 21,
> -	VHOST_USER_IOTLB_MSG = 22,
> -	VHOST_USER_MAX
> -} VhostUserRequest;
> -
> -typedef enum VhostUserSlaveRequest {
> -	VHOST_USER_SLAVE_NONE = 0,
> -	VHOST_USER_SLAVE_IOTLB_MSG = 1,
> -	VHOST_USER_SLAVE_MAX
> -} VhostUserSlaveRequest;
> -
> -typedef struct VhostUserMemoryRegion {
> -	uint64_t guest_phys_addr;
> -	uint64_t memory_size;
> -	uint64_t userspace_addr;
> -	uint64_t mmap_offset;
> -} VhostUserMemoryRegion;
> -
> -typedef struct VhostUserMemory {
> -	uint32_t nregions;
> -	uint32_t padding;
> -	VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
> -} VhostUserMemory;
> -
> -typedef struct VhostUserLog {
> -	uint64_t mmap_size;
> -	uint64_t mmap_offset;
> -} VhostUserLog;
> -
> -typedef struct VhostUserMsg {
> -	union {
> -		VhostUserRequest master;
> -		VhostUserSlaveRequest slave;
> -	} request;
> -
> -#define VHOST_USER_VERSION_MASK     0x3
> -#define VHOST_USER_REPLY_MASK       (0x1 << 2)
> -#define VHOST_USER_NEED_REPLY		(0x1 << 3)
> -	uint32_t flags;
> -	uint32_t size; /* the following payload size */
> -	union {
> -#define VHOST_USER_VRING_IDX_MASK   0xff
> -#define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
> -		uint64_t u64;
> -		struct vhost_vring_state state;
> -		struct vhost_vring_addr addr;
> -		VhostUserMemory memory;
> -		VhostUserLog    log;
> -		struct vhost_iotlb_msg iotlb;
> -	} payload;
> -	int fds[VHOST_MEMORY_MAX_NREGIONS];
> -} __attribute((packed)) VhostUserMsg;
> -
> -#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
> -
>   /* The version of the protocol we support */
>   #define VHOST_USER_VERSION    0x1
>   
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag
  @ 2018-03-29 14:53  3%     ` Doherty, Declan
  2018-04-01  6:14  0%       ` Shahaf Shuler
  0 siblings, 1 reply; 200+ results
From: Doherty, Declan @ 2018-03-29 14:53 UTC (permalink / raw)
  To: Shahaf Shuler, dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Qi Zhang,
	Alejandro Lucero, Andrew Rybchenko, Mohammad Abdul Awal,
	Remy Horton, John McNamara, Rony Efraim, Wu, Jingjing, Lu,
	Wenzhuo, Vincent JArdin, Yuanhan Liu, Richardson, Bruce, Ananyev,
	Konstantin, Wang, Zhihong

On 29/03/2018 7:13 AM, Shahaf Shuler wrote:
> Wednesday, March 28, 2018 4:54 PM, Declan Doherty:
>> Subject: [dpdk-dev][PATCH v6 4/8] ethdev: Add port representor device flag
>>
>> Add new device flag to specify that ethdev port is a port representor.
>> Extend rte_eth_dev_info structure to expose device flags to user which
>> enable applications to discover if a port is a representor port.
>>
>> Signed-off-by: Declan Doherty <declan.doherty@intel.com>
>> ---
>>   lib/librte_ether/rte_ethdev.c             | 1 +
>>   lib/librte_ether/rte_ethdev.h             | 9 ++++++---
>>   lib/librte_ether/rte_ethdev_representor.h | 3 +++
>>   3 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
>> index c719f84a3..163246433 100644
>> --- a/lib/librte_ether/rte_ethdev.c
>> +++ b/lib/librte_ether/rte_ethdev.c
>> @@ -2399,6 +2399,7 @@ rte_eth_dev_info_get(uint16_t port_id, struct
>> rte_eth_dev_info *dev_info)
>>   	dev_info->nb_rx_queues = dev->data->nb_rx_queues;
>>   	dev_info->nb_tx_queues = dev->data->nb_tx_queues;
>>   	dev_info->switch_id = dev->data->switch_id;
>> +	dev_info->dev_flags = dev->data->dev_flags;
>>   }
>>
>>   int
>> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
>> index dced4fc41..226acc8b1 100644
>> --- a/lib/librte_ether/rte_ethdev.h
>> +++ b/lib/librte_ether/rte_ethdev.h
>> @@ -996,6 +996,7 @@ struct rte_eth_dev_info {
>>   	const char *driver_name; /**< Device Driver name. */
>>   	unsigned int if_index; /**< Index to bound host interface, or 0 if
>> none.
>>   		Use if_indextoname() to translate into an interface name. */
>> +	uint32_t dev_flags; /**< Device flags */
>>   	uint32_t min_rx_bufsize; /**< Minimum size of RX buffer. */
>>   	uint32_t max_rx_pktlen; /**< Maximum configurable length of RX
>> pkt. */
>>   	uint16_t max_rx_queues; /**< Maximum number of RX queues. */
>> @@ -1229,11 +1230,13 @@ struct rte_eth_dev_owner {  };
>>
>>   /** Device supports link state interrupt */
>> -#define RTE_ETH_DEV_INTR_LSC     0x0002
>> +#define RTE_ETH_DEV_INTR_LSC		0x0002
>>   /** Device is a bonded slave */
>> -#define RTE_ETH_DEV_BONDED_SLAVE 0x0004
>> +#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
>>   /** Device supports device removal interrupt */
>> -#define RTE_ETH_DEV_INTR_RMV     0x0008
>> +#define RTE_ETH_DEV_INTR_RMV		0x0008
>> +/** Device is port representor */
>> +#define RTE_ETH_DEV_REPRESENTOR		0x0010
> 
> Maybe it is a good time to make some order here.
> I understand the decision to use flags instead of bit-field. It is better.
> 
> However there is a mix here of device capabilities like : RTE_ETH_DEV_INTR_LSC   and RTE_ETH_DEV_INTR_RMV
> And device attributes like : RTE_ETH_DEV_BONDED_SLAVE and RTE_ETH_DEV_REPRESENTOR.
> I don't think they belong together under the generic name of dev_flags.
> 
> Moreover, I am not sure the fact device is bonded slave should be exposed to the application. It should be internal to ethdev and its port iterators.

That's a good point on the bonded slave flag, I'll look at fixing that 
for the next release. I don't think changing it should affect ABI but 
I'll need to have a closer look.

Do you think that we should have a separate device attributes field, 
which the representor flag is contained in?
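
As a rough illustration of the capability/attribute split being discussed,
the sketch below groups the existing flag bits with two masks. The flag
values are copied from the quoted patch so the example builds without DPDK
headers; the SKETCH_* masks and sketch_* helpers are hypothetical and not
part of ethdev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the quoted patch. */
#define RTE_ETH_DEV_INTR_LSC		0x0002
#define RTE_ETH_DEV_BONDED_SLAVE	0x0004
#define RTE_ETH_DEV_INTR_RMV		0x0008
#define RTE_ETH_DEV_REPRESENTOR		0x0010

/* Hypothetical grouping: what the device can do vs. what the device is. */
#define SKETCH_DEV_CAPA_MASK \
	(RTE_ETH_DEV_INTR_LSC | RTE_ETH_DEV_INTR_RMV)
#define SKETCH_DEV_ATTR_MASK \
	(RTE_ETH_DEV_BONDED_SLAVE | RTE_ETH_DEV_REPRESENTOR)

static_assert((SKETCH_DEV_CAPA_MASK & SKETCH_DEV_ATTR_MASK) == 0,
		"capability and attribute groups must not overlap");

/* True if the flags mark the port as a representor. */
static bool
sketch_is_representor(uint32_t dev_flags)
{
	return (dev_flags & RTE_ETH_DEV_REPRESENTOR) != 0;
}

/* Keep only the capability bits. */
static uint32_t
sketch_capabilities(uint32_t dev_flags)
{
	return dev_flags & SKETCH_DEV_CAPA_MASK;
}

/* Keep only the attribute bits. */
static uint32_t
sketch_attributes(uint32_t dev_flags)
{
	return dev_flags & SKETCH_DEV_ATTR_MASK;
}
```

An application would pass the dev_flags value it reads from
rte_eth_dev_info (as added by this patch) to these helpers.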

> 
> Finally I think representor port may need more info now (and in the future), for example the associated vf id.
> For that, I think it is better it to be exposed as a dedicated struct on device info.

I think a switch port id should suffice for that, for SR-IOV devices it 
would map to the vf_id.

> 
>>
>>   /**
>>    * @warning
>> diff --git a/lib/librte_ether/rte_ethdev_representor.h
>> b/lib/librte_ether/rte_ethdev_representor.h
>> index cbc1f2855..f3726d0ba 100644
>> --- a/lib/librte_ether/rte_ethdev_representor.h
>> +++ b/lib/librte_ether/rte_ethdev_representor.h
>> @@ -22,6 +22,9 @@ eth_dev_representor_port_init(struct rte_eth_dev
>> *ethdev, void *init_params)
>>   	/** representor inherits the switch id of it's base device */
>>   	ethdev->data->switch_id = base_ethdev->data->switch_id;
>>
>> +	/** Set device flags to specify that device is a representor port */
>> +	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
> 
> Should be set in the PMD, not in ethdev layer

As in the previous patch, this is just a generic port bus init function 
that meets the simplest use case of a representor port with a single 
switch domain. A PMD doesn't need to use it, but having it here saves 
duplicating the same code across multiple PMDs that only support 
the basic mode.

> 
>> +
>>   	return 0;
>>   }
>>
>> --
>> 2.14.3
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external backend support
  @ 2018-03-29 13:47  3%     ` Wodkowski, PawelX
  2018-04-01 19:53  0%       ` Zhang, Roy Fan
  0 siblings, 1 reply; 200+ results
From: Wodkowski, PawelX @ 2018-03-29 13:47 UTC (permalink / raw)
  To: Zhang, Roy Fan, dev; +Cc: maxime.coquelin, jianjay.zhou, Tan, Jianfeng

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Fan Zhang
> Sent: Thursday, March 29, 2018 2:53 PM
> To: dev@dpdk.org
> Cc: maxime.coquelin@redhat.com; jianjay.zhou@huawei.com; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external backend
> support
> 
> This patch adds external backend support to vhost library. The patch provides
> new APIs for the external backend to register pre and post vhost-user
> message
> handlers.
> 
> Signed-off-by: Fan Zhang <roy.fan.zhang@intel.com>
> ---
>  lib/librte_vhost/rte_vhost.h           | 64
> +++++++++++++++++++++++++++++++++-
>  lib/librte_vhost/rte_vhost_version.map |  6 ++++
>  lib/librte_vhost/vhost.c               | 17 ++++++++-
>  lib/librte_vhost/vhost.h               |  8 +++--
>  lib/librte_vhost/vhost_user.c          | 33 +++++++++++++++++-
>  5 files changed, 123 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index d332069..b902c44 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2010-2017 Intel Corporation
> + * Copyright(c) 2010-2018 Intel Corporation
>   */
> 
>  #ifndef _RTE_VHOST_H_
> @@ -88,6 +88,55 @@ struct vhost_device_ops {
>  };
> 
>  /**
> + * function prototype for the vhost backend to handle specific vhost user
> + * messages prior to the master message handling
> + *
> + * @param vid
> + *  vhost device id
> + * @param msg
> + *  Message pointer.
> + * @param payload
> + *  Message payload.

No payload parameter.

> + * @param require_reply
> + *  If the handler requires sending a reply, this variable shall be written 1,
> + *  otherwise 0.
> + * @param skip_master
> + *  If the handler requires skipping the master message handling, this
> variable
> + *  shall be written 1, otherwise 0.
> + * @return
> + *  0 on success, -1 on failure
> + */
> +typedef int (*rte_vhost_msg_pre_handle)(int vid, void *msg,
> +		uint32_t *require_reply, uint32_t *skip_master);
> +
> +/**
> + * function prototype for the vhost backend to handle specific vhost user
> + * messages after the master message handling is done
> + *
> + * @param vid
> + *  vhost device id
> + * @param msg
> + *  Message pointer.
> + * @param payload
> + *  Message payload.

No payload parameter :)

> + * @param require_reply
> + *  If the handler requires sending a reply, this variable shall be written 1,
> + *  otherwise 0.
> + * @return
> + *  0 on success, -1 on failure
> + */
> +typedef int (*rte_vhost_msg_post_handle)(int vid, void *msg,
> +		uint32_t *require_reply);
> +

What does 'Message pointer' mean? Is this const for us? Is this the payload? Making msg
'void *' is not the way to go here. Those pre and post handlers need to see exactly the
same structures as the vhost_user.c file. Otherwise we can get into trouble when the ABI
changes.

Also you can easily merge the pre and post handlers into one handler with one
parameter describing which phase of message processing we are in.
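
A minimal sketch of such a merged handler, under the assumption that the
message struct is visible to the backend. The sketch_* names, the request
number and the counters are all hypothetical; the library's real switch
statement is replaced by a counter so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_msg_phase { SKETCH_MSG_PRE, SKETCH_MSG_POST };

/* Simplified, typed stand-in for struct VhostUserMsg (no void pointer). */
struct sketch_user_msg {
	uint32_t request;
	uint32_t size;
	uint64_t u64;
};

#define SKETCH_REQ_BACKEND_SPECIFIC 100	/* made-up request number */

typedef int (*sketch_msg_handle)(int vid, struct sketch_user_msg *msg,
		enum sketch_msg_phase phase, uint32_t *skip_master);

static int sketch_master_handled;	/* times the library path ran */
static int sketch_pre_calls, sketch_post_calls;

/* Example backend: takes over one request and lets the others through. */
static int
sketch_backend_handler(int vid, struct sketch_user_msg *msg,
		enum sketch_msg_phase phase, uint32_t *skip_master)
{
	(void)vid;
	switch (phase) {
	case SKETCH_MSG_PRE:
		sketch_pre_calls++;
		*skip_master = (msg->request == SKETCH_REQ_BACKEND_SPECIFIC);
		return 0;
	case SKETCH_MSG_POST:
		sketch_post_calls++;
		return 0;
	}
	return -1;
}

/* Library side: one callback, invoked before and after master handling. */
static int
sketch_process_msg(int vid, struct sketch_user_msg *msg,
		sketch_msg_handle handler)
{
	uint32_t skip_master = 0;

	assert(msg != NULL);
	if (handler != NULL &&
	    handler(vid, msg, SKETCH_MSG_PRE, &skip_master) < 0)
		return -1;
	if (!skip_master)
		sketch_master_handled++;	/* stands in for the switch */
	if (handler != NULL &&
	    handler(vid, msg, SKETCH_MSG_POST, &skip_master) < 0)
		return -1;
	return 0;
}
```

Note that skip_master is initialised by the caller here, so a handler that
never writes it cannot cause the master handling to be skipped by accident.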

> +/**
> + * pre and post vhost user message handlers
> + */
> +struct vhost_user_extern_ops {
> +	rte_vhost_msg_pre_handle pre_msg_handle;
> +	rte_vhost_msg_post_handle post_msg_handle;
> +};
> +
> +/**
>   * Convert guest physical address to host virtual address
>   *
>   * @param mem
> @@ -434,6 +483,19 @@ int rte_vhost_vring_call(int vid, uint16_t vring_idx);
>   */
>  uint32_t rte_vhost_rx_queue_count(int vid, uint16_t qid);
> 
> +/**
> + * register external vhost backend
> + *
> + * @param vid
> + *  vhost device ID
> + * @param ops
> + *  ops that process external vhost user messages
> + * @return
> + *  0 on success, -1 on failure
> + */
> +int
> +rte_vhost_user_register_extern_ops(int vid, struct
> vhost_user_extern_ops *ops);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_vhost/rte_vhost_version.map
> b/lib/librte_vhost/rte_vhost_version.map
> index df01031..91bf9f0 100644
> --- a/lib/librte_vhost/rte_vhost_version.map
> +++ b/lib/librte_vhost/rte_vhost_version.map
> @@ -59,3 +59,9 @@ DPDK_18.02 {
>  	rte_vhost_vring_call;
> 
>  } DPDK_17.08;
> +
> +DPDK_18.05 {
> +	global:
> +
> +	rte_vhost_user_register_extern_ops;
> +} DPDK_18.02;
> diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
> index a407067..80af341 100644
> --- a/lib/librte_vhost/vhost.c
> +++ b/lib/librte_vhost/vhost.c
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2010-2016 Intel Corporation
> + * Copyright(c) 2010-2018 Intel Corporation
>   */
> 
>  #include <linux/vhost.h>
> @@ -627,3 +627,18 @@ rte_vhost_rx_queue_count(int vid, uint16_t qid)
> 
>  	return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
>  }
> +
> +int
> +rte_vhost_user_register_extern_ops(int vid, struct
> vhost_user_extern_ops *ops)
> +{
> +	struct virtio_net *dev;
> +
> +	dev = get_device(vid);
> +	if (dev == NULL)
> +		return -1;
> +
> +	if (ops)
> +		rte_memcpy(&dev->extern_ops, ops, sizeof(*ops));
> +
> +	return 0;
> +}

Why do we need this new "register" API? Why can't you use the 
(struct vhost_device_ops).reserved[0] field to put this callback there?
I think this is the right time to utilize this field.

Can you do something similar to 
http://dpdk.org/ml/archives/dev/2018-March/094213.html ?

> diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
> index d947bc9..2072b88 100644
> --- a/lib/librte_vhost/vhost.h
> +++ b/lib/librte_vhost/vhost.h
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2010-2018 Intel Corporation
>   */
> 
>  #ifndef _VHOST_NET_CDEV_H_
> @@ -241,8 +241,12 @@ struct virtio_net {
>  	struct guest_page       *guest_pages;
> 
>  	int			slave_req_fd;
> -} __rte_cache_aligned;
> 
> +	/* private data for external virtio device */
> +	void			*extern_data;
> +	/* pre and post vhost user message handlers for externel backend */
> +	struct vhost_user_extern_ops extern_ops;
> +} __rte_cache_aligned;
> 
>  #define VHOST_LOG_PAGE	4096
> 
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 90ed211..ede8a5e 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2010-2016 Intel Corporation
> + * Copyright(c) 2010-2018 Intel Corporation
>   */
> 
>  #include <stdint.h>
> @@ -50,6 +50,8 @@ static const char
> *vhost_message_str[VHOST_USER_MAX] = {
>  	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
>  	[VHOST_USER_SET_SLAVE_REQ_FD]  =
> "VHOST_USER_SET_SLAVE_REQ_FD",
>  	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> +	[VHOST_USER_CRYPTO_CREATE_SESS] =
> "VHOST_USER_CRYPTO_CREATE_SESS",
> +	[VHOST_USER_CRYPTO_CLOSE_SESS] =
> "VHOST_USER_CRYPTO_CLOSE_SESS",
>  };
> 
>  static uint64_t
> @@ -1302,6 +1304,7 @@ vhost_user_msg_handler(int vid, int fd)
>  	struct VhostUserMsg msg;
>  	int ret;
>  	int unlock_required = 0;
> +	uint32_t skip_master = 0;
> 
>  	dev = get_device(vid);
>  	if (dev == NULL)
> @@ -1379,6 +1382,21 @@ vhost_user_msg_handler(int vid, int fd)
> 
>  	}
> 
> +	if (dev->extern_ops.pre_msg_handle) {
> +		uint32_t need_reply;
> +
> +		ret = (*dev->extern_ops.pre_msg_handle)(dev->vid,
> +				(void *)&msg, &need_reply, &skip_master);
> +		if (ret < 0)
> +			goto skip_to_reply;
> +
> +		if (need_reply)
> +			send_vhost_reply(fd, &msg);
> +	}
> +
> +	if (skip_master)
> +		goto skip_to_post_handle;

This can be moved inside the if () { } block above.

> +
>  	switch (msg.request.master) {
>  	case VHOST_USER_GET_FEATURES:
>  		msg.payload.u64 = vhost_user_get_features(dev);
> @@ -1479,9 +1497,22 @@ vhost_user_msg_handler(int vid, int fd)
>  	default:
>  		ret = -1;
>  		break;
> +	}
> +
> +skip_to_post_handle:
> +	if (dev->extern_ops.post_msg_handle) {
> +		uint32_t need_reply;
> +
> +		ret = (*dev->extern_ops.post_msg_handle)(
> +				dev->vid, (void *)&msg, &need_reply);
> +		if (ret < 0)
> +			goto skip_to_reply;
> 
> +		if (need_reply)
> +			send_vhost_reply(fd, &msg);
>  	}
> 
> +skip_to_reply:
>  	if (unlock_required)
>  		vhost_user_unlock_all_queue_pairs(dev);
> 
> --
> 2.7.4

Overall, I think this is the direction we need to go in.

Pawel

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
    2018-03-14 11:09  4%   ` Bruce Richardson
@ 2018-03-29 10:27  3%   ` Bruce Richardson
  2018-03-29 20:11  0%     ` Vladimir Medvedkin
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2018-03-29 10:27 UTC (permalink / raw)
  To: Medvedkin Vladimir; +Cc: dev

On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> RIB is an alternative to current LPM library.
> It solves the following problems
>  - Increases the speed of control plane operations against lpm such as
>    adding/deleting routes
>  - Adds abstraction from dataplane algorithms, so it is possible to add
>    different ip route lookup algorithms such as DXR/poptrie/lpc-trie/etc
>    in addition to current dir24_8
>  - It is possible to keep user defined application specific additional
>    information in struct rte_rib_node which represents route entry.
>    It can be next hop/set of next hops (i.e. active and feasible),
>    pointers to link rte_rib_node based on some criteria (i.e. next_hop),
>    plenty of additional control plane information.
>  - For dir24_8 implementation it is possible to remove rte_lpm_tbl_entry.depth
>    field that helps to save 6 bits.
>  - Also new dir24_8 implementation supports different next_hop sizes
>    (1/2/4/8 bytes per next hop)
>  - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eliminate ternary operator.
>    Instead it returns special default value if there is no route.
> 
> Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> ---

Hi again,

some initial comments on the dir24_8 files below.

/Bruce

>  config/common_base                 |   6 +
>  doc/api/doxy-api.conf              |   1 +
>  lib/Makefile                       |   2 +
>  lib/librte_rib/Makefile            |  22 ++
>  lib/librte_rib/rte_dir24_8.c       | 482 +++++++++++++++++++++++++++++++++
>  lib/librte_rib/rte_dir24_8.h       | 116 ++++++++
>  lib/librte_rib/rte_rib.c           | 526 +++++++++++++++++++++++++++++++++++++
>  lib/librte_rib/rte_rib.h           | 322 +++++++++++++++++++++++
>  lib/librte_rib/rte_rib_version.map |  18 ++
>  mk/rte.app.mk                      |   1 +
>  10 files changed, 1496 insertions(+)
>  create mode 100644 lib/librte_rib/Makefile
>  create mode 100644 lib/librte_rib/rte_dir24_8.c
>  create mode 100644 lib/librte_rib/rte_dir24_8.h
>  create mode 100644 lib/librte_rib/rte_rib.c
>  create mode 100644 lib/librte_rib/rte_rib.h
>  create mode 100644 lib/librte_rib/rte_rib_version.map
> 

<snip>

> diff --git a/lib/librte_rib/rte_dir24_8.c b/lib/librte_rib/rte_dir24_8.c
> new file mode 100644
> index 0000000..a12f882
> --- /dev/null
> +++ b/lib/librte_rib/rte_dir24_8.c
> @@ -0,0 +1,482 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> + */
> +
> +#include <stdint.h>
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <rte_debug.h>
> +#include <rte_malloc.h>
> +#include <rte_prefetch.h>
> +#include <rte_errno.h>
> +
> +#include <inttypes.h>
> +
> +#include <rte_memory.h>
> +#include <rte_branch_prediction.h>
> +
> +#include <rte_rib.h>
> +#include <rte_dir24_8.h>
> +
> +#define BITMAP_SLAB_BIT_SIZE_LOG2	6
> +#define BITMAP_SLAB_BIT_SIZE		(1 << BITMAP_SLAB_BIT_SIZE_LOG2)
> +#define BITMAP_SLAB_BITMASK		(BITMAP_SLAB_BIT_SIZE - 1)
> +
> +#define ROUNDUP(x, y)	 RTE_ALIGN_CEIL(x, (1 << (32 - y)))
> +
> +static __rte_always_inline __attribute__((pure)) void *
> +get_tbl24_p(struct rte_dir24_8_tbl *fib, uint32_t ip)
> +{
> +	return (void *)&((uint8_t *)fib->tbl24)[(ip &
> +		RTE_DIR24_8_TBL24_MASK) >> (8 - fib->nh_sz)];
> +}
> +
> +#define LOOKUP_FUNC(suffix, type, bulk_prefetch)			\
> +int rte_dir24_8_lookup_bulk_##suffix(void *fib_p, const uint32_t *ips,	\
> +	uint64_t *next_hops, const unsigned n)				\
> +{									\
> +	struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;	\
> +	uint64_t tmp;							\
> +	uint32_t i;							\
> +	uint32_t prefetch_offset = RTE_MIN((unsigned)bulk_prefetch, n);	\
> +									\
> +	RTE_RIB_RETURN_IF_TRUE(((fib == NULL) || (ips == NULL) ||	\
> +		(next_hops == NULL)), -EINVAL);				\
> +									\
> +	for (i = 0; i < prefetch_offset; i++)				\
> +		rte_prefetch0(get_tbl24_p(fib, ips[i]));		\
> +	for (i = 0; i < (n - prefetch_offset); i++) {			\
> +		rte_prefetch0(get_tbl24_p(fib, ips[i + prefetch_offset])); \
> +		tmp = ((type *)fib->tbl24)[ips[i] >> 8];		\
> +		if (unlikely((tmp & RTE_DIR24_8_VALID_EXT_ENT) ==	\
> +			RTE_DIR24_8_VALID_EXT_ENT)) {			\
> +			tmp = ((type *)fib->tbl8)[(uint8_t)ips[i] +	\
> +				((tmp >> 1) * RTE_DIR24_8_TBL8_GRP_NUM_ENT)]; \
> +		}							\
> +		next_hops[i] = tmp >> 1;				\
> +	}								\
> +	for (; i < n; i++) {						\
> +		tmp = ((type *)fib->tbl24)[ips[i] >> 8];		\
> +		if (unlikely((tmp & RTE_DIR24_8_VALID_EXT_ENT) ==	\
> +			RTE_DIR24_8_VALID_EXT_ENT)) {			\
> +			tmp = ((type *)fib->tbl8)[(uint8_t)ips[i] +	\
> +				((tmp >> 1) * RTE_DIR24_8_TBL8_GRP_NUM_ENT)]; \
> +		}							\
> +		next_hops[i] = tmp >> 1;				\
> +	}								\
> +	return 0;							\
> +}									\

What is the advantage of doing this as a macro? Unless I'm missing
something, "suffix" is never actually used in the function at all, and you
reference the size of the data from fib->nh_sz. Therefore, as far as I can
see, there can be no performance benefit from having such a lookup function.

Therefore, if performance is ok, I suggest just making a single lookup_bulk
function that works with all sizes - as the inlined lookup function does in
the header. 

Alternatively, if you do want specific functions for each
entry size, you still don't need macros. Write a single function that takes
as a final parameter the entry-size and use that in calculations rather
than nh_sz.  Then wrap that function in the set of public ones, passing in
the final size parameter explicitly as "1", "2", "4" or "8". The compiler
will then know that as a compile-time constant and generate the correct
code for each size. However, for this path I suggest you check for any
resulting performance improvement, e.g. with l3fwd, as I think it's not
likely to be significant.
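
To illustrate the second option, here is a rough, standalone sketch. All
names (fib_sketch, read_entry, lookup_bulk_common and the lookup_bulk_*
wrappers) are hypothetical stand-ins rather than the patch's API, the table
layout is simplified, and prefetching is left out. The point is only that
the entry size reaches the inline helper as a constant, so the compiler has
the chance to resolve the switch per wrapper:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VALID_EXT_ENT  0x01u  /* low bit set: entry refers to a tbl8 group */
#define TBL8_GRP_ENTS  256u   /* entries per tbl8 group */

/* Hypothetical, simplified stand-in for struct rte_dir24_8_tbl. */
struct fib_sketch {
	const void *tbl24;
	const void *tbl8;
};

/* Read entry number idx from a table of entry_sz-byte entries. */
static inline uint64_t
read_entry(const void *tbl, uint64_t idx, size_t entry_sz)
{
	switch (entry_sz) {
	case 1: return ((const uint8_t *)tbl)[idx];
	case 2: return ((const uint16_t *)tbl)[idx];
	case 4: return ((const uint32_t *)tbl)[idx];
	case 8: return ((const uint64_t *)tbl)[idx];
	default:
		assert(0 && "unsupported entry size");
		return 0;
	}
}

/* Single implementation; entry_sz is a constant at every call site below. */
static inline int
lookup_bulk_common(const struct fib_sketch *fib, const uint32_t *ips,
	uint64_t *next_hops, unsigned int n, size_t entry_sz)
{
	unsigned int i;

	if (fib == NULL || ips == NULL || next_hops == NULL)
		return -EINVAL;

	for (i = 0; i < n; i++) {
		uint64_t tmp = read_entry(fib->tbl24, ips[i] >> 8, entry_sz);

		if (tmp & VALID_EXT_ENT)
			tmp = read_entry(fib->tbl8,
				(tmp >> 1) * TBL8_GRP_ENTS + (uint8_t)ips[i],
				entry_sz);
		next_hops[i] = tmp >> 1;
	}
	return 0;
}

/* Public wrappers: each passes its entry size explicitly. */
int
lookup_bulk_1b(const struct fib_sketch *fib, const uint32_t *ips,
	uint64_t *next_hops, unsigned int n)
{
	return lookup_bulk_common(fib, ips, next_hops, n, sizeof(uint8_t));
}

int
lookup_bulk_2b(const struct fib_sketch *fib, const uint32_t *ips,
	uint64_t *next_hops, unsigned int n)
{
	return lookup_bulk_common(fib, ips, next_hops, n, sizeof(uint16_t));
}

int
lookup_bulk_4b(const struct fib_sketch *fib, const uint32_t *ips,
	uint64_t *next_hops, unsigned int n)
{
	return lookup_bulk_common(fib, ips, next_hops, n, sizeof(uint32_t));
}

int
lookup_bulk_8b(const struct fib_sketch *fib, const uint32_t *ips,
	uint64_t *next_hops, unsigned int n)
{
	return lookup_bulk_common(fib, ips, next_hops, n, sizeof(uint64_t));
}
```

Whether the per-size wrappers actually beat one generic function is
something to measure rather than assume.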

> +
> +static void
> +write_to_fib(void *ptr, uint64_t val, enum rte_dir24_8_nh_sz size, int n)
> +{
> +	int i;
> +	uint8_t *ptr8 = (uint8_t *)ptr;
> +	uint16_t *ptr16 = (uint16_t *)ptr;
> +	uint32_t *ptr32 = (uint32_t *)ptr;
> +	uint64_t *ptr64 = (uint64_t *)ptr;
> +
> +	switch (size) {
> +	case RTE_DIR24_8_1B:
> +		for (i = 0; i < n; i++)
> +			ptr8[i] = (uint8_t)val;
> +		break;
> +	case RTE_DIR24_8_2B:
> +		for (i = 0; i < n; i++)
> +			ptr16[i] = (uint16_t)val;
> +		break;
> +	case RTE_DIR24_8_4B:
> +		for (i = 0; i < n; i++)
> +			ptr32[i] = (uint32_t)val;
> +		break;
> +	case RTE_DIR24_8_8B:
> +		for (i = 0; i < n; i++)
> +			ptr64[i] = (uint64_t)val;
> +		break;
> +	}
> +}
> +
> +static int
> +tbl8_get_idx(struct rte_dir24_8_tbl *fib)
> +{
> +	uint32_t i;
> +	int bit_idx;
> +
> +	for (i = 0; (i < (fib->number_tbl8s >> BITMAP_SLAB_BIT_SIZE_LOG2)) &&
> +		(fib->tbl8_idxes[i] == UINT64_MAX); i++)
> +		;
> +	if (i <= (fib->number_tbl8s >> BITMAP_SLAB_BIT_SIZE_LOG2)) {
> +		bit_idx = __builtin_ctzll(~fib->tbl8_idxes[i]);
> +		fib->tbl8_idxes[i] |= (1ULL << bit_idx);
> +		return (i << BITMAP_SLAB_BIT_SIZE_LOG2) + bit_idx;
> +	}
> +	return -ENOSPC;
> +}
> +
> +static inline void
> +tbl8_free_idx(struct rte_dir24_8_tbl *fib, int idx)
> +{
> +	fib->tbl8_idxes[idx >> BITMAP_SLAB_BIT_SIZE_LOG2] &=
> +		~(1ULL << (idx & BITMAP_SLAB_BITMASK));
> +}
> +
> +static int
> +tbl8_alloc(struct rte_dir24_8_tbl *fib, uint64_t nh)
> +{
> +	int	tbl8_idx;
> +	uint8_t	*tbl8_ptr;
> +
> +	tbl8_idx = tbl8_get_idx(fib);
> +	if (tbl8_idx < 0)
> +		return tbl8_idx;
> +	tbl8_ptr = (uint8_t *)fib->tbl8 +
> +		((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) <<
> +		fib->nh_sz);
> +	/*Init tbl8 entries with nexthop from tbl24*/
> +	write_to_fib((void *)tbl8_ptr, nh|
> +		RTE_DIR24_8_VALID_EXT_ENT, fib->nh_sz,
> +		RTE_DIR24_8_TBL8_GRP_NUM_ENT);
> +	return tbl8_idx;
> +}
> +
> +static void
> +tbl8_recycle(struct rte_dir24_8_tbl *fib, uint32_t ip, uint64_t tbl8_idx)
> +{
> +	int i;
> +	uint64_t nh;
> +	uint8_t *ptr8;
> +	uint16_t *ptr16;
> +	uint32_t *ptr32;
> +	uint64_t *ptr64;
> +
> +	switch (fib->nh_sz) {
> +	case RTE_DIR24_8_1B:
> +		ptr8 = &((uint8_t *)fib->tbl8)[tbl8_idx *
> +				RTE_DIR24_8_TBL8_GRP_NUM_ENT];
> +		nh = *ptr8;
> +		for (i = 1; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++) {
> +			if (nh != ptr8[i])
> +				return;
> +		}
> +		((uint8_t *)fib->tbl24)[ip >> 8] =
> +			nh & ~RTE_DIR24_8_VALID_EXT_ENT;
> +		for (i = 0; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++)
> +			ptr8[i] = 0;
> +		break;
> +	case RTE_DIR24_8_2B:
> +		ptr16 = &((uint16_t *)fib->tbl8)[tbl8_idx *
> +				RTE_DIR24_8_TBL8_GRP_NUM_ENT];
> +		nh = *ptr16;
> +		for (i = 1; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++) {
> +			if (nh != ptr16[i])
> +				return;
> +		}
> +		((uint16_t *)fib->tbl24)[ip >> 8] =
> +			nh & ~RTE_DIR24_8_VALID_EXT_ENT;
> +		for (i = 0; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++)
> +			ptr16[i] = 0;
> +		break;
> +	case RTE_DIR24_8_4B:
> +		ptr32 = &((uint32_t *)fib->tbl8)[tbl8_idx *
> +				RTE_DIR24_8_TBL8_GRP_NUM_ENT];
> +		nh = *ptr32;
> +		for (i = 1; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++) {
> +			if (nh != ptr32[i])
> +				return;
> +		}
> +		((uint32_t *)fib->tbl24)[ip >> 8] =
> +			nh & ~RTE_DIR24_8_VALID_EXT_ENT;
> +		for (i = 0; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++)
> +			ptr32[i] = 0;
> +		break;
> +	case RTE_DIR24_8_8B:
> +		ptr64 = &((uint64_t *)fib->tbl8)[tbl8_idx *
> +				RTE_DIR24_8_TBL8_GRP_NUM_ENT];
> +		nh = *ptr64;
> +		for (i = 1; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++) {
> +			if (nh != ptr64[i])
> +				return;
> +		}
> +		((uint64_t *)fib->tbl24)[ip >> 8] =
> +			nh & ~RTE_DIR24_8_VALID_EXT_ENT;
> +		for (i = 0; i < RTE_DIR24_8_TBL8_GRP_NUM_ENT; i++)
> +			ptr64[i] = 0;
> +		break;
> +	}
> +	tbl8_free_idx(fib, tbl8_idx);
> +}
> +
> +static int
> +install_to_fib(struct rte_dir24_8_tbl *fib, uint32_t ledge, uint32_t redge,
> +	uint64_t next_hop)
> +{
> +	uint64_t	tbl24_tmp;
> +	int	tbl8_idx;
> +	int tmp_tbl8_idx;
> +	uint8_t	*tbl8_ptr;
> +
> +	/*case for 0.0.0.0/0*/
> +	if (unlikely((ledge == 0) && (redge == 0))) {
> +		write_to_fib(fib->tbl24, next_hop << 1, fib->nh_sz, 1 << 24);
> +		return 0;
> +	}
> +	if (ROUNDUP(ledge, 24) <= redge) {
> +		if (ledge < ROUNDUP(ledge, 24)) {
> +			tbl24_tmp = RTE_DIR24_8_GET_TBL24(fib, ledge);
> +			if ((tbl24_tmp & RTE_DIR24_8_VALID_EXT_ENT) !=
> +				RTE_DIR24_8_VALID_EXT_ENT) {
> +				tbl8_idx = tbl8_alloc(fib, tbl24_tmp);
> +				tmp_tbl8_idx = tbl8_get_idx(fib);
> +				if ((tbl8_idx < 0) || (tmp_tbl8_idx < 0))
> +					return -ENOSPC;
> +				tbl8_free_idx(fib, tmp_tbl8_idx);
> +				/*update dir24 entry with tbl8 index*/
> +				write_to_fib(get_tbl24_p(fib, ledge),
> +					(tbl8_idx << 1)|
> +					RTE_DIR24_8_VALID_EXT_ENT,
> +					fib->nh_sz, 1);
> +			} else
> +				tbl8_idx = tbl24_tmp >> 1;
> +			tbl8_ptr = (uint8_t *)fib->tbl8 +
> +				(((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) +
> +				(ledge & ~RTE_DIR24_8_TBL24_MASK)) <<
> +				fib->nh_sz);
> +			/*update tbl8 with new next hop*/
> +			write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
> +				RTE_DIR24_8_VALID_EXT_ENT,
> +				fib->nh_sz, ROUNDUP(ledge, 24) - ledge);
> +			tbl8_recycle(fib, ledge, tbl8_idx);
> +		}
> +		if (ROUNDUP(ledge, 24) < (redge & RTE_DIR24_8_TBL24_MASK)) {
> +			write_to_fib(get_tbl24_p(fib, ROUNDUP(ledge, 24)),
> +				next_hop << 1, fib->nh_sz,
> +				((redge & RTE_DIR24_8_TBL24_MASK) -
> +				ROUNDUP(ledge, 24)) >> 8);
> +		}
> +		if (redge & ~RTE_DIR24_8_TBL24_MASK) {
> +			tbl24_tmp = RTE_DIR24_8_GET_TBL24(fib, redge);
> +			if ((tbl24_tmp & RTE_DIR24_8_VALID_EXT_ENT) !=
> +					RTE_DIR24_8_VALID_EXT_ENT) {
> +				tbl8_idx = tbl8_alloc(fib, tbl24_tmp);
> +				if (tbl8_idx < 0)
> +					return -ENOSPC;
> +				/*update dir24 entry with tbl8 index*/
> +				write_to_fib(get_tbl24_p(fib, redge),
> +					(tbl8_idx << 1)|
> +					RTE_DIR24_8_VALID_EXT_ENT,
> +					fib->nh_sz, 1);
> +			} else
> +				tbl8_idx = tbl24_tmp >> 1;
> +			tbl8_ptr = (uint8_t *)fib->tbl8 +
> +				((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) <<
> +				fib->nh_sz);
> +			/*update tbl8 with new next hop*/
> +			write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
> +				RTE_DIR24_8_VALID_EXT_ENT,
> +				fib->nh_sz, redge & ~RTE_DIR24_8_TBL24_MASK);
> +			tbl8_recycle(fib, redge, tbl8_idx);
> +		}
> +	} else {
> +		tbl24_tmp = RTE_DIR24_8_GET_TBL24(fib, ledge);
> +		if ((tbl24_tmp & RTE_DIR24_8_VALID_EXT_ENT) !=
> +			RTE_DIR24_8_VALID_EXT_ENT) {
> +			tbl8_idx = tbl8_alloc(fib, tbl24_tmp);
> +			if (tbl8_idx < 0)
> +				return -ENOSPC;
> +			/*update dir24 entry with tbl8 index*/
> +			write_to_fib(get_tbl24_p(fib, ledge),
> +				(tbl8_idx << 1)|
> +				RTE_DIR24_8_VALID_EXT_ENT,
> +				fib->nh_sz, 1);
> +		} else
> +			tbl8_idx = tbl24_tmp >> 1;
> +		tbl8_ptr = (uint8_t *)fib->tbl8 +
> +			(((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) +
> +			(ledge & ~RTE_DIR24_8_TBL24_MASK)) <<
> +			fib->nh_sz);
> +		/*update tbl8 with new next hop*/
> +		write_to_fib((void *)tbl8_ptr, (next_hop << 1)|
> +			RTE_DIR24_8_VALID_EXT_ENT,
> +			fib->nh_sz, redge - ledge);
> +		tbl8_recycle(fib, ledge, tbl8_idx);
> +	}
> +	return 0;
> +}
> +
> +static int
> +modify_fib(struct rte_rib *rib, uint32_t ip, uint8_t depth,
> +	uint64_t next_hop)
> +{
> +	struct rte_rib_node *tmp = NULL;
> +	struct rte_dir24_8_tbl *fib;
> +	uint32_t ledge, redge;
> +	int ret;
> +
> +	fib = rib->fib;
> +
> +	if (next_hop > DIR24_8_MAX_NH(fib))
> +		return -EINVAL;
> +
> +	ledge = ip;
> +	do {
> +		tmp = rte_rib_tree_get_nxt(rib, ip, depth, tmp,
> +			RTE_RIB_GET_NXT_COVER);
> +		if (tmp != NULL) {
> +			if (tmp->depth == depth)
> +				continue;
> +			redge = tmp->key;
> +			if (ledge == redge) {
> +				ledge = redge +
> +					(uint32_t)(1ULL << (32 - tmp->depth));
> +				continue;
> +			}
> +			ret = install_to_fib(fib, ledge, redge,
> +				next_hop);
> +			if (ret != 0)
> +				return ret;
> +			ledge = redge +
> +				(uint32_t)(1ULL << (32 - tmp->depth));
> +		} else {
> +			redge = ip + (uint32_t)(1ULL << (32 - depth));
> +			ret = install_to_fib(fib, ledge, redge,
> +				next_hop);
> +			if (ret != 0)
> +				return ret;
> +		}
> +	} while (tmp);
> +
> +	return 0;
> +}
> +
> +int
> +rte_dir24_8_modify(struct rte_rib *rib, uint32_t ip, uint8_t depth,
> +	uint64_t next_hop, enum rte_rib_op op)
> +{
> +	struct rte_dir24_8_tbl *fib;
> +	struct rte_rib_node *tmp = NULL;
> +	struct rte_rib_node *node;
> +	struct rte_rib_node *parent;
> +	int ret = 0;
> +
> +	if ((rib == NULL) || (depth > RTE_RIB_MAXDEPTH))
> +		return -EINVAL;
> +
> +	fib = rib->fib;
> +	RTE_ASSERT(fib);
> +
> +	ip &= (uint32_t)(UINT64_MAX << (32 - depth));
> +
> +	node = rte_rib_tree_lookup_exact(rib, ip, depth);
> +	switch (op) {
> +	case RTE_RIB_ADD:
> +		if (node != NULL) {
> +			if (node->nh == next_hop)
> +				return 0;
> +			ret = modify_fib(rib, ip, depth, next_hop);
> +			if (ret == 0)
> +				node->nh = next_hop;
> +			return 0;
> +		}
> +		if (depth > 24) {
> +			tmp = rte_rib_tree_get_nxt(rib, ip, 24, NULL,
> +				RTE_RIB_GET_NXT_COVER);
> +			if ((tmp == NULL) &&
> +				(fib->cur_tbl8s >= fib->number_tbl8s))
> +				return -ENOSPC;
> +
> +		}
> +		node = rte_rib_tree_insert(rib, ip, depth);
> +		if (node == NULL)
> +			return -rte_errno;
> +		node->nh = next_hop;
> +		parent = rte_rib_tree_lookup_parent(node);
> +		if ((parent != NULL) && (parent->nh == next_hop))
> +			return 0;
> +		ret = modify_fib(rib, ip, depth, next_hop);
> +		if (ret) {
> +			rte_rib_tree_remove(rib, ip, depth);
> +			return ret;
> +		}
> +		if ((depth > 24) && (tmp == NULL))
> +			fib->cur_tbl8s++;
> +		return 0;
> +	case RTE_RIB_DEL:
> +		if (node == NULL)
> +			return -ENOENT;
> +
> +		parent = rte_rib_tree_lookup_parent(node);
> +		if (parent != NULL) {
> +			if (parent->nh != node->nh)
> +				ret = modify_fib(rib, ip, depth, parent->nh);
> +		} else
> +			ret = modify_fib(rib, ip, depth, fib->def_nh);
> +		if (ret == 0) {
> +			rte_rib_tree_remove(rib, ip, depth);
> +			if (depth > 24) {
> +				tmp = rte_rib_tree_get_nxt(rib, ip, 24, NULL,
> +					RTE_RIB_GET_NXT_COVER);
> +				if (tmp == NULL)
> +					fib->cur_tbl8s--;
> +			}
> +		}
> +		return ret;
> +	default:
> +		break;
> +	}
> +	return -EINVAL;
> +}
> +
> +struct rte_dir24_8_tbl *rte_dir24_8_create(const char *name, int socket_id,
> +	enum rte_dir24_8_nh_sz nh_sz, uint64_t def_nh)
> +{
> +	char mem_name[RTE_RIB_NAMESIZE];
> +	struct rte_dir24_8_tbl *fib;
> +
> +	snprintf(mem_name, sizeof(mem_name), "FIB_%s", name);
> +	fib = rte_zmalloc_socket(name, sizeof(struct rte_dir24_8_tbl) +
> +		RTE_DIR24_8_TBL24_NUM_ENT * (1 << nh_sz), RTE_CACHE_LINE_SIZE,
> +		socket_id);
> +	if (fib == NULL)
> +		return fib;
> +
> +	snprintf(mem_name, sizeof(mem_name), "TBL8_%s", name);
> +	fib->tbl8 = rte_zmalloc_socket(mem_name, RTE_DIR24_8_TBL8_GRP_NUM_ENT *
> +			(1 << nh_sz) * RTE_DIR24_8_TBL8_NUM_GROUPS,
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (fib->tbl8 == NULL) {
> +		rte_free(fib);
> +		return NULL;
> +	}
> +	fib->def_nh = def_nh;
> +	fib->nh_sz = nh_sz;
> +	fib->number_tbl8s = RTE_MIN((uint32_t)RTE_DIR24_8_TBL8_NUM_GROUPS,
> +				DIR24_8_MAX_NH(fib));
> +
> +	snprintf(mem_name, sizeof(mem_name), "TBL8_idxes_%s", name);
> +	fib->tbl8_idxes = rte_zmalloc_socket(mem_name,
> +			RTE_ALIGN_CEIL(fib->number_tbl8s, 64) >> 3,
> +			RTE_CACHE_LINE_SIZE, socket_id);
> +	if (fib->tbl8_idxes == NULL) {
> +		rte_free(fib->tbl8);
> +		rte_free(fib);
> +		return NULL;
> +	}
> +
> +	return fib;
> +}
> +
> +void
> +rte_dir24_8_free(void *fib_p)
> +{
> +	struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
> +
> +	rte_free(fib->tbl8_idxes);
> +	rte_free(fib->tbl8);
> +	rte_free(fib);
> +}
> +
> +LOOKUP_FUNC(1b, uint8_t, 5)
> +LOOKUP_FUNC(2b, uint16_t, 6)
> +LOOKUP_FUNC(4b, uint32_t, 15)
> +LOOKUP_FUNC(8b, uint64_t, 12)
> diff --git a/lib/librte_rib/rte_dir24_8.h b/lib/librte_rib/rte_dir24_8.h
> new file mode 100644
> index 0000000..f779409
> --- /dev/null
> +++ b/lib/librte_rib/rte_dir24_8.h
> @@ -0,0 +1,116 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> + */
> +
> +#ifndef _RTE_DIR24_8_H_
> +#define _RTE_DIR24_8_H_
> +
> +/**
> + * @file
> + * RTE Longest Prefix Match (LPM)
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/** @internal Total number of tbl24 entries. */
> +#define RTE_DIR24_8_TBL24_NUM_ENT	(1 << 24)
> +
> +/** Maximum depth value possible for IPv4 LPM. */
> +#define RTE_DIR24_8_MAX_DEPTH		32
> +
> +/** @internal Number of entries in a tbl8 group. */
> +#define RTE_DIR24_8_TBL8_GRP_NUM_ENT	256
> +
> +/** @internal Total number of tbl8 groups in the tbl8. */
> +#define RTE_DIR24_8_TBL8_NUM_GROUPS	65536
> +
> +/** @internal bitmask with valid and valid_group fields set */
> +#define RTE_DIR24_8_VALID_EXT_ENT	0x01
> +
> +#define RTE_DIR24_8_TBL24_MASK		0xffffff00
> +
> +/** Size of nexthop (1 << nh_sz) bits */
> +enum rte_dir24_8_nh_sz {
> +	RTE_DIR24_8_1B,
> +	RTE_DIR24_8_2B,
> +	RTE_DIR24_8_4B,
> +	RTE_DIR24_8_8B
> +};
> +
> +
> +#define DIR24_8_BITS_IN_NH(fib)		(8 * (1 << fib->nh_sz))
> +#define DIR24_8_MAX_NH(fib)	((1ULL << (DIR24_8_BITS_IN_NH(fib) - 1)) - 1)
> +
> +#define DIR24_8_TBL_IDX(a, fib)		((a) >> (3 - fib->nh_sz))
> +#define DIR24_8_PSD_IDX(a, fib)		((a) & ((1 << (3 - fib->nh_sz)) - 1))
> +
> +#define DIR24_8_TBL24_VAL(ip)	(ip >> 8)
> +#define DIR24_8_TBL8_VAL(res, ip)					\
> +	((res >> 1) * RTE_DIR24_8_TBL8_GRP_NUM_ENT + (uint8_t)ip)	\
> +
> +#define DIR24_8_LOOKUP_MSK						\
> +	(((1ULL << ((1 << (fib->nh_sz + 3)) - 1)) << 1) - 1)		\
> +
> +#define RTE_DIR24_8_GET_TBL24(fib, ip)					\
> +	((fib->tbl24[DIR24_8_TBL_IDX(DIR24_8_TBL24_VAL(ip), fib)] >>	\
> +	(DIR24_8_PSD_IDX(DIR24_8_TBL24_VAL(ip), fib) *			\
> +	DIR24_8_BITS_IN_NH(fib))) & DIR24_8_LOOKUP_MSK)			\
> +
> +#define RTE_DIR24_8_GET_TBL8(fib, res, ip)				\
> +	((fib->tbl8[DIR24_8_TBL_IDX(DIR24_8_TBL8_VAL(res, ip), fib)] >>	\
> +	(DIR24_8_PSD_IDX(DIR24_8_TBL8_VAL(res, ip), fib) *		\
> +	DIR24_8_BITS_IN_NH(fib))) & DIR24_8_LOOKUP_MSK)			\
> 
I would strongly suggest making each of the above macros into inline
functions instead. That would improve readability, since you get
parameter types and can split things across lines more easily.
Some comments would be good too.
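
As a rough illustration of what I mean, the index, mask and tbl24 accessor
macros might become something like the following. The names (nh_sz_sketch,
bits_in_nh, tbl_word_idx, tbl_pos_in_word, lookup_mask, get_tbl24_entry)
are made up for this sketch, and the table is passed as a plain uint64_t
array instead of the fib structure to keep the example standalone:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of enum rte_dir24_8_nh_sz: log2 of the entry bytes. */
enum nh_sz_sketch { NH_1B, NH_2B, NH_4B, NH_8B };

/* Number of bits in one next-hop entry: 8, 16, 32 or 64. */
static inline unsigned int
bits_in_nh(enum nh_sz_sketch nh_sz)
{
	return 8u << nh_sz;
}

/* Index of the 64-bit word that holds entry number ent. */
static inline uint32_t
tbl_word_idx(uint32_t ent, enum nh_sz_sketch nh_sz)
{
	return ent >> (3 - nh_sz);
}

/* Position of entry number ent inside its 64-bit word. */
static inline uint32_t
tbl_pos_in_word(uint32_t ent, enum nh_sz_sketch nh_sz)
{
	return ent & ((1u << (3 - nh_sz)) - 1);
}

/* Mask covering exactly one entry; a 64-bit shift is avoided for 8B. */
static inline uint64_t
lookup_mask(enum nh_sz_sketch nh_sz)
{
	unsigned int bits = bits_in_nh(nh_sz);

	return bits == 64 ? UINT64_MAX : (1ULL << bits) - 1;
}

/* Fetch the tbl24 entry for ip from a table stored as 64-bit words. */
static inline uint64_t
get_tbl24_entry(const uint64_t *tbl24, enum nh_sz_sketch nh_sz, uint32_t ip)
{
	uint32_t ent = ip >> 8;	/* top 24 bits select the entry */
	unsigned int shift;

	assert(tbl24 != NULL);
	shift = tbl_pos_in_word(ent, nh_sz) * bits_in_nh(nh_sz);
	return (tbl24[tbl_word_idx(ent, nh_sz)] >> shift) & lookup_mask(nh_sz);
}
```

Each helper now has typed parameters and a one-line comment, which is most
of what the macros are missing.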

> +
> +
> +struct rte_dir24_8_tbl {
> +	uint32_t	number_tbl8s;	/**< Total number of tbl8s. */
> +	uint32_t	cur_tbl8s;	/**< Current cumber of tbl8s. */
> +	uint64_t	def_nh;
> +	enum rte_dir24_8_nh_sz	nh_sz;	/**< Size of nexthop entry */
> +	uint64_t	*tbl8;		/**< LPM tbl8 table. */
> +	uint64_t	*tbl8_idxes;
> +	uint64_t	tbl24[0] __rte_cache_aligned; /**< LPM tbl24 table. */
> +};
> +
> +struct rte_dir24_8_tbl *rte_dir24_8_create(const char *name, int socket_id,
> +	enum rte_dir24_8_nh_sz nh_sz, uint64_t def_nh);
> +void rte_dir24_8_free(void *fib_p);
> +int rte_dir24_8_modify(struct rte_rib *rib, uint32_t key,
> +	uint8_t depth, uint64_t next_hop, enum rte_rib_op op);
> +int rte_dir24_8_lookup_bulk_1b(void *fib_p, const uint32_t *ips,
> +	uint64_t *next_hops, const unsigned n);
> +int rte_dir24_8_lookup_bulk_2b(void *fib_p, const uint32_t *ips,
> +	uint64_t *next_hops, const unsigned n);
> +int rte_dir24_8_lookup_bulk_4b(void *fib_p, const uint32_t *ips,
> +	uint64_t *next_hops, const unsigned n);
> +int rte_dir24_8_lookup_bulk_8b(void *fib_p, const uint32_t *ips,
> +	uint64_t *next_hops, const unsigned n);
> +
> +
> +static inline int
> +rte_dir24_8_lookup(void *fib_p, uint32_t ip, uint64_t *next_hop)

Why use void * as the parameter type, when the proper type is defined just above?

> +{
> +	uint64_t res;
> +	struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
> +
> +	RTE_RIB_RETURN_IF_TRUE(((fib == NULL) || (ip == NULL) ||
> +		(next_hop == NULL)), -EINVAL);
> +
> +	res = RTE_DIR24_8_GET_TBL24(fib, ip);
> +	if (unlikely((res & RTE_DIR24_8_VALID_EXT_ENT) ==
> +		RTE_DIR24_8_VALID_EXT_ENT)) {
> +		res = RTE_DIR24_8_GET_TBL8(fib, res, ip);
> +	}
> +	*next_hop = res >> 1;
> +	return 0;
> +}

Do we need this static inline function? Can the bulk functions suffice on
their own? If we can remove this, we can move most of the header file
contents, especially the structures, out of the public header. That would
make the ABI much easier to maintain.
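
A minimal sketch of the layout I have in mind, shown as one file for
brevity. The names (fib_handle, fib_create, fib_free, fib_lookup_bulk) are
hypothetical and the lookup body is only a placeholder; the point is that
applications hold nothing but a pointer to an incomplete type:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- would remain in the public header (hypothetical names) ---- */
struct fib_handle;	/* opaque: applications never see the layout */

struct fib_handle *fib_create(uint64_t def_nh);
void fib_free(struct fib_handle *fib);
void fib_lookup_bulk(const struct fib_handle *fib, const uint32_t *ips,
	uint64_t *next_hops, unsigned int n);

/* ---- would move into the .c file ---- */
struct fib_handle {
	uint64_t def_nh;	/* fields can be added or reordered freely */
};

struct fib_handle *
fib_create(uint64_t def_nh)
{
	struct fib_handle *fib = calloc(1, sizeof(*fib));

	if (fib != NULL)
		fib->def_nh = def_nh;
	return fib;
}

void
fib_free(struct fib_handle *fib)
{
	free(fib);
}

/* Placeholder body: a real table walk would go here. */
void
fib_lookup_bulk(const struct fib_handle *fib, const uint32_t *ips,
	uint64_t *next_hops, unsigned int n)
{
	unsigned int i;

	assert(fib != NULL && ips != NULL && next_hops != NULL);
	(void)ips;	/* not consulted by this placeholder */
	for (i = 0; i < n; i++)
		next_hops[i] = fib->def_nh;
}
```

With only the forward declaration exported, changing the structure later
does not alter anything that applications were compiled against.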

> +
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DIR24_8_H_ */
> +


* Re: [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev
  2018-03-29  6:17  0% ` Tomasz Duszynski
@ 2018-03-29  9:20  0%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-03-29  9:20 UTC (permalink / raw)
  To: Tomasz Duszynski, dpdk-dev

On 3/29/2018 7:17 AM, Tomasz Duszynski wrote:
> On Tue, Mar 27, 2018 at 06:40:52PM +0100, Ferruh Yigit wrote:
>> Public struct rte_eth_dev_info has a "struct rte_pci_device" field in it
>> although it is common for all ethdev in all buses.
>>
>> Replacing pci specific struct with generic device struct and updating
>> places that are using pci device in a way to get this information from
>> generic device.
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> ---
>> Cc: Pablo de Lara <pablo.de.lara.guarch@intel.com>
>>
>> There is no deprecation notice sent for this update but in this release
>> ethdev info already updated and ABI already broken, it can be good
>> opportunity for this update.
>> ---
>>  app/test-pmd/config.c                     | 11 ++++++++++-
>>  app/test-pmd/testpmd.h                    | 24 ++++++++++++++++++------
>>  drivers/net/af_packet/rte_eth_af_packet.c |  1 +
>>  drivers/net/ark/ark_ethdev.c              |  4 +++-
>>  drivers/net/avf/avf_ethdev.c              |  2 +-
>>  drivers/net/avp/avp_ethdev.c              |  2 +-
>>  drivers/net/bnx2x/bnx2x_ethdev.c          |  2 +-
>>  drivers/net/bnxt/bnxt_ethdev.c            |  2 +-
>>  drivers/net/cxgbe/cxgbe_ethdev.c          |  2 +-
>>  drivers/net/dpaa/dpaa_ethdev.c            |  1 +
>>  drivers/net/dpaa2/dpaa2_ethdev.c          |  1 +
>>  drivers/net/e1000/em_ethdev.c             |  2 +-
>>  drivers/net/e1000/igb_ethdev.c            |  4 ++--
>>  drivers/net/ena/ena_ethdev.c              |  2 +-
>>  drivers/net/enic/enic_ethdev.c            |  2 +-
>>  drivers/net/fm10k/fm10k_ethdev.c          |  2 +-
>>  drivers/net/i40e/i40e_ethdev.c            |  2 +-
>>  drivers/net/i40e/i40e_ethdev_vf.c         |  2 +-
>>  drivers/net/ixgbe/ixgbe_ethdev.c          |  4 ++--
>>  drivers/net/kni/rte_eth_kni.c             |  2 +-
>>  drivers/net/liquidio/lio_ethdev.c         |  2 +-
>>  drivers/net/mlx4/mlx4_ethdev.c            |  2 +-
>>  drivers/net/mlx5/mlx5_ethdev.c            |  2 +-
>>  drivers/net/mrvl/mrvl_ethdev.c            |  2 ++
>>  drivers/net/nfp/nfp_net.c                 |  2 +-
>>  drivers/net/null/rte_eth_null.c           |  1 +
>>  drivers/net/octeontx/octeontx_ethdev.c    |  2 +-
>>  drivers/net/pcap/rte_eth_pcap.c           |  1 +
>>  drivers/net/qede/qede_ethdev.c            |  2 +-
>>  drivers/net/ring/rte_eth_ring.c           |  1 +
>>  drivers/net/sfc/sfc_ethdev.c              |  2 +-
>>  drivers/net/szedata2/rte_eth_szedata2.c   |  2 +-
>>  drivers/net/tap/rte_eth_tap.c             |  2 +-
>>  drivers/net/thunderx/nicvf_ethdev.c       |  2 +-
>>  drivers/net/virtio/virtio_ethdev.c        |  2 +-
>>  drivers/net/vmxnet3/vmxnet3_ethdev.c      |  2 +-
>>  examples/ethtool/lib/rte_ethtool.c        | 15 +++++++++------
>>  examples/ip_pipeline/init.c               | 10 ++++++++--
>>  examples/kni/main.c                       | 10 +++++++---
>>  lib/librte_ether/rte_ethdev.h             |  2 +-
>>  test/test/test_kni.c                      | 28 ++++++++++++++++++++++------
>>  41 files changed, 114 insertions(+), 54 deletions(-)
>>
> 
> [...]
> 
>> diff --git a/drivers/net/mrvl/mrvl_ethdev.c b/drivers/net/mrvl/mrvl_ethdev.c
>> index c0483b912..d46c65255 100644
>> --- a/drivers/net/mrvl/mrvl_ethdev.c
>> +++ b/drivers/net/mrvl/mrvl_ethdev.c
>> @@ -1314,6 +1314,8 @@ static void
>>  mrvl_dev_infos_get(struct rte_eth_dev *dev __rte_unused,
>>  		   struct rte_eth_dev_info *info)
>>  {
>> +	info->device = dev->device;
> 
> Since dev is used perhaps __rte_unused can be dropped.
> Besides that,

OK, I will send a new version.

(reduced cc list)


* Re: [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev
  2018-03-27 17:40  1% [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev Ferruh Yigit
                   ` (2 preceding siblings ...)
  2018-03-29  6:17  0% ` Tomasz Duszynski
@ 2018-03-29  8:01  0% ` santosh
  3 siblings, 0 replies; 200+ results
From: santosh @ 2018-03-29  8:01 UTC (permalink / raw)
  To: Ferruh Yigit, Jerin Jacob, Thomas Monjalon; +Cc: dev


On Tuesday 27 March 2018 11:10 PM, Ferruh Yigit wrote:
> Public struct rte_eth_dev_info has a "struct rte_pci_device" field in it
> although it is common for all ethdev in all buses.
>
> Replacing pci specific struct with generic device struct and updating
> places that are using pci device in a way to get this information from
> generic device.
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Pablo de Lara <pablo.de.lara.guarch@intel.com>
>
> There is no deprecation notice sent for this update but in this release
> ethdev info already updated and ABI already broken, it can be good
> opportunity for this update.
> ---
>  app/test-pmd/config.c                     | 11 ++++++++++-
>  app/test-pmd/testpmd.h                    | 24 ++++++++++++++++++------
>  drivers/net/af_packet/rte_eth_af_packet.c |  1 +
>  drivers/net/ark/ark_ethdev.c              |  4 +++-
>  drivers/net/avf/avf_ethdev.c              |  2 +-
>  drivers/net/avp/avp_ethdev.c              |  2 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c          |  2 +-
>  drivers/net/bnxt/bnxt_ethdev.c            |  2 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c          |  2 +-
>  drivers/net/dpaa/dpaa_ethdev.c            |  1 +
>  drivers/net/dpaa2/dpaa2_ethdev.c          |  1 +
>  drivers/net/e1000/em_ethdev.c             |  2 +-
>  drivers/net/e1000/igb_ethdev.c            |  4 ++--
>  drivers/net/ena/ena_ethdev.c              |  2 +-
>  drivers/net/enic/enic_ethdev.c            |  2 +-
>  drivers/net/fm10k/fm10k_ethdev.c          |  2 +-
>  drivers/net/i40e/i40e_ethdev.c            |  2 +-
>  drivers/net/i40e/i40e_ethdev_vf.c         |  2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c          |  4 ++--
>  drivers/net/kni/rte_eth_kni.c             |  2 +-
>  drivers/net/liquidio/lio_ethdev.c         |  2 +-
>  drivers/net/mlx4/mlx4_ethdev.c            |  2 +-
>  drivers/net/mlx5/mlx5_ethdev.c            |  2 +-
>  drivers/net/mrvl/mrvl_ethdev.c            |  2 ++
>  drivers/net/nfp/nfp_net.c                 |  2 +-
>  drivers/net/null/rte_eth_null.c           |  1 +
>  drivers/net/octeontx/octeontx_ethdev.c    |  2 +-

Resending, as mailman had issues sending to too many recipients.
Nit: the patch failed to apply on tip 20526313, so I applied it manually.
With that:
Acked-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>

[..]


* Re: [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev
  2018-03-27 17:40  1% [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev Ferruh Yigit
  2018-03-28  7:04  0% ` Shreyansh Jain
  2018-03-28 13:11  0% ` Legacy, Allain
@ 2018-03-29  6:17  0% ` Tomasz Duszynski
  2018-03-29  9:20  0%   ` Ferruh Yigit
  2018-03-29  8:01  0% ` santosh
  3 siblings, 1 reply; 200+ results
From: Tomasz Duszynski @ 2018-03-29  6:17 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Wenzhuo Lu, Jingjing Wu, John W. Linville, Shepard Siegel,
	Ed Czeck, John Miller, Allain Legacy, Matt Peters, Harish Patil,
	Rasesh Mody, Ajit Khaparde, Somnath Kotur, Rahul Lakkireddy,
	Hemant Agrawal, Shreyansh Jain, Marcin Wojtas, Michal Krawczyk,
	Guy Tzalik, Evgeny Schemeilin, John Daley, Hyong Youb Kim,
	Qi Zhang, Xiao Wang, Beilei Xing, Konstantin Ananyev,
	Shijith Thotton, Srisivasubramanian Srinivasan, Adrien Mazarguil,
	Nelio Laranjeiro, Yongseok Koh, Jacek Siuda, Tomasz Duszynski,
	Dmitri Epshtein, Natalie Samsonov, Jianbo Liu, Alejandro Lucero,
	Tetsuya Mukawa, Santosh Shukla, Jerin Jacob, Shahed Shaikh,
	Bruce Richardson, Andrew Rybchenko, Matej Vido, Pascal Mazon,
	Maciej Czekaj, Maxime Coquelin, Tiwei Bie, Shrikrishna Khare,
	Remy Horton, Ori Kam, Pablo de Lara, Radu Nicolau, Akhil Goyal,
	Tomasz Kantecki, Cristian Dumitrescu, Thomas Monjalon, dev

On Tue, Mar 27, 2018 at 06:40:52PM +0100, Ferruh Yigit wrote:
> Public struct rte_eth_dev_info has a "struct rte_pci_device" field in it
> although it is common for all ethdev in all buses.
>
> Replacing pci specific struct with generic device struct and updating
> places that are using pci device in a way to get this information from
> generic device.
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Pablo de Lara <pablo.de.lara.guarch@intel.com>
>
> There is no deprecation notice sent for this update but in this release
> ethdev info already updated and ABI already broken, it can be good
> opportunity for this update.
> ---
>  app/test-pmd/config.c                     | 11 ++++++++++-
>  app/test-pmd/testpmd.h                    | 24 ++++++++++++++++++------
>  drivers/net/af_packet/rte_eth_af_packet.c |  1 +
>  drivers/net/ark/ark_ethdev.c              |  4 +++-
>  drivers/net/avf/avf_ethdev.c              |  2 +-
>  drivers/net/avp/avp_ethdev.c              |  2 +-
>  drivers/net/bnx2x/bnx2x_ethdev.c          |  2 +-
>  drivers/net/bnxt/bnxt_ethdev.c            |  2 +-
>  drivers/net/cxgbe/cxgbe_ethdev.c          |  2 +-
>  drivers/net/dpaa/dpaa_ethdev.c            |  1 +
>  drivers/net/dpaa2/dpaa2_ethdev.c          |  1 +
>  drivers/net/e1000/em_ethdev.c             |  2 +-
>  drivers/net/e1000/igb_ethdev.c            |  4 ++--
>  drivers/net/ena/ena_ethdev.c              |  2 +-
>  drivers/net/enic/enic_ethdev.c            |  2 +-
>  drivers/net/fm10k/fm10k_ethdev.c          |  2 +-
>  drivers/net/i40e/i40e_ethdev.c            |  2 +-
>  drivers/net/i40e/i40e_ethdev_vf.c         |  2 +-
>  drivers/net/ixgbe/ixgbe_ethdev.c          |  4 ++--
>  drivers/net/kni/rte_eth_kni.c             |  2 +-
>  drivers/net/liquidio/lio_ethdev.c         |  2 +-
>  drivers/net/mlx4/mlx4_ethdev.c            |  2 +-
>  drivers/net/mlx5/mlx5_ethdev.c            |  2 +-
>  drivers/net/mrvl/mrvl_ethdev.c            |  2 ++
>  drivers/net/nfp/nfp_net.c                 |  2 +-
>  drivers/net/null/rte_eth_null.c           |  1 +
>  drivers/net/octeontx/octeontx_ethdev.c    |  2 +-
>  drivers/net/pcap/rte_eth_pcap.c           |  1 +
>  drivers/net/qede/qede_ethdev.c            |  2 +-
>  drivers/net/ring/rte_eth_ring.c           |  1 +
>  drivers/net/sfc/sfc_ethdev.c              |  2 +-
>  drivers/net/szedata2/rte_eth_szedata2.c   |  2 +-
>  drivers/net/tap/rte_eth_tap.c             |  2 +-
>  drivers/net/thunderx/nicvf_ethdev.c       |  2 +-
>  drivers/net/virtio/virtio_ethdev.c        |  2 +-
>  drivers/net/vmxnet3/vmxnet3_ethdev.c      |  2 +-
>  examples/ethtool/lib/rte_ethtool.c        | 15 +++++++++------
>  examples/ip_pipeline/init.c               | 10 ++++++++--
>  examples/kni/main.c                       | 10 +++++++---
>  lib/librte_ether/rte_ethdev.h             |  2 +-
>  test/test/test_kni.c                      | 28 ++++++++++++++++++++++------
>  41 files changed, 114 insertions(+), 54 deletions(-)
>

[...]

> diff --git a/drivers/net/mrvl/mrvl_ethdev.c b/drivers/net/mrvl/mrvl_ethdev.c
> index c0483b912..d46c65255 100644
> --- a/drivers/net/mrvl/mrvl_ethdev.c
> +++ b/drivers/net/mrvl/mrvl_ethdev.c
> @@ -1314,6 +1314,8 @@ static void
>  mrvl_dev_infos_get(struct rte_eth_dev *dev __rte_unused,
>  		   struct rte_eth_dev_info *info)
>  {
> +	info->device = dev->device;

Since dev is used perhaps __rte_unused can be dropped.
Besides that,

Acked-by: Tomasz Duszynski <tdu@semihalf.com>

> +
>  	info->speed_capa = ETH_LINK_SPEED_10M |
>  			   ETH_LINK_SPEED_100M |
>  			   ETH_LINK_SPEED_1G |
> diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
> index 8591c7de0..add00baf9 100644
> --- a/drivers/net/nfp/nfp_net.c
> +++ b/drivers/net/nfp/nfp_net.c
> @@ -1159,7 +1159,7 @@ nfp_net_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>
>  	hw = NFP_NET_DEV_PRIVATE_TO_HW(dev->data->dev_private);
>
> -	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> +	dev_info->device = dev->device;
>  	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
>  	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
>  	dev_info->min_rx_bufsize = ETHER_MIN_MTU;
> diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
> index 73fe8b04a..7506f77f6 100644
> --- a/drivers/net/null/rte_eth_null.c
> +++ b/drivers/net/null/rte_eth_null.c
> @@ -292,6 +292,7 @@ eth_dev_info(struct rte_eth_dev *dev,
>  		return;
>
>  	internals = dev->data->dev_private;
> +	dev_info->device = dev->device;
>  	dev_info->max_mac_addrs = 1;
>  	dev_info->max_rx_pktlen = (uint32_t)-1;
>  	dev_info->max_rx_queues = RTE_DIM(internals->rx_null_queues);
> diff --git a/drivers/net/octeontx/octeontx_ethdev.c b/drivers/net/octeontx/octeontx_ethdev.c
> index 90dd249a6..edd4dd3ff 100644
> --- a/drivers/net/octeontx/octeontx_ethdev.c
> +++ b/drivers/net/octeontx/octeontx_ethdev.c
> @@ -611,7 +611,7 @@ octeontx_dev_info(struct rte_eth_dev *dev,
>  	dev_info->max_rx_queues = 1;
>  	dev_info->max_tx_queues = PKO_MAX_NUM_DQ;
>  	dev_info->min_rx_bufsize = 0;
> -	dev_info->pci_dev = NULL;
> +	dev_info->device = NULL;
>
>  	dev_info->default_rxconf = (struct rte_eth_rxconf) {
>  		.rx_free_thresh = 0,
> diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
> index c1571e1fe..2e739a24e 100644
> --- a/drivers/net/pcap/rte_eth_pcap.c
> +++ b/drivers/net/pcap/rte_eth_pcap.c
> @@ -526,6 +526,7 @@ eth_dev_info(struct rte_eth_dev *dev,
>  {
>  	struct pmd_internals *internals = dev->data->dev_private;
>
> +	dev_info->device = dev->device;
>  	dev_info->if_index = internals->if_index;
>  	dev_info->max_mac_addrs = 1;
>  	dev_info->max_rx_pktlen = (uint32_t) -1;
> diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
> index a91f43683..59d604b78 100644
> --- a/drivers/net/qede/qede_ethdev.c
> +++ b/drivers/net/qede/qede_ethdev.c
> @@ -1515,7 +1515,7 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
>
>  	PMD_INIT_FUNC_TRACE(edev);
>
> -	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
> +	dev_info->device = eth_dev->device;
>  	dev_info->min_rx_bufsize = (uint32_t)QEDE_MIN_RX_BUFF_SIZE;
>  	dev_info->max_rx_pktlen = (uint32_t)ETH_TX_MAX_NON_LSO_PKT_LEN;
>  	dev_info->rx_desc_lim = qede_rx_desc_lim;
> diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
> index df13c44be..14274fa36 100644
> --- a/drivers/net/ring/rte_eth_ring.c
> +++ b/drivers/net/ring/rte_eth_ring.c
> @@ -153,6 +153,7 @@ eth_dev_info(struct rte_eth_dev *dev,
>  		struct rte_eth_dev_info *dev_info)
>  {
>  	struct pmd_internals *internals = dev->data->dev_private;
> +	dev_info->device = dev->device;
>  	dev_info->max_mac_addrs = 1;
>  	dev_info->max_rx_pktlen = (uint32_t)-1;
>  	dev_info->max_rx_queues = (uint16_t)internals->max_rx_queues;
> diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
> index f16d52081..2c0ad7ecf 100644
> --- a/drivers/net/sfc/sfc_ethdev.c
> +++ b/drivers/net/sfc/sfc_ethdev.c
> @@ -89,7 +89,7 @@ sfc_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>
>  	sfc_log_init(sa, "entry");
>
> -	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> +	dev_info->device = dev->device;
>  	dev_info->max_rx_pktlen = EFX_MAC_PDU_MAX;
>
>  	/* Autonegotiation may be disabled */
> diff --git a/drivers/net/szedata2/rte_eth_szedata2.c b/drivers/net/szedata2/rte_eth_szedata2.c
> index 1d02aee6f..4157cc88f 100644
> --- a/drivers/net/szedata2/rte_eth_szedata2.c
> +++ b/drivers/net/szedata2/rte_eth_szedata2.c
> @@ -1031,7 +1031,7 @@ eth_dev_info(struct rte_eth_dev *dev,
>  		struct rte_eth_dev_info *dev_info)
>  {
>  	struct pmd_internals *internals = dev->data->dev_private;
> -	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> +	dev_info->device = dev->device;
>  	dev_info->if_index = 0;
>  	dev_info->max_mac_addrs = 1;
>  	dev_info->max_rx_pktlen = (uint32_t)-1;
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index 67ed9d466..23843e32e 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -688,7 +688,7 @@ tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>  	dev_info->max_rx_queues = RTE_PMD_TAP_MAX_QUEUES;
>  	dev_info->max_tx_queues = RTE_PMD_TAP_MAX_QUEUES;
>  	dev_info->min_rx_bufsize = 0;
> -	dev_info->pci_dev = NULL;
> +	dev_info->device = NULL;
>  	dev_info->speed_capa = tap_dev_speed_capa();
>  	dev_info->rx_offload_capa = tap_rx_offload_get_port_capa();
>  	dev_info->tx_offload_capa = tap_tx_offload_get_port_capa();
> diff --git a/drivers/net/thunderx/nicvf_ethdev.c b/drivers/net/thunderx/nicvf_ethdev.c
> index 067f2243b..f9e4a5810 100644
> --- a/drivers/net/thunderx/nicvf_ethdev.c
> +++ b/drivers/net/thunderx/nicvf_ethdev.c
> @@ -1400,7 +1400,7 @@ nicvf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>
>  	PMD_INIT_FUNC_TRACE();
>
> -	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> +	dev_info->device = dev->device;
>
>  	/* Autonegotiation may be disabled */
>  	dev_info->speed_capa = ETH_LINK_SPEED_FIXED;
> diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> index 4dddb1c80..c623ce186 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -2057,7 +2057,7 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>
>  	dev_info->speed_capa = ETH_LINK_SPEED_10G; /* fake value */
>
> -	dev_info->pci_dev = dev->device ? RTE_ETH_DEV_TO_PCI(dev) : NULL;
> +	dev_info->device = dev->device;
>  	dev_info->max_rx_queues =
>  		RTE_MIN(hw->max_queue_pairs, VIRTIO_MAX_RX_QUEUES);
>  	dev_info->max_tx_queues =
> diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
> index 426008722..220668e19 100644
> --- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
> +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
> @@ -1025,7 +1025,7 @@ static void
>  vmxnet3_dev_info_get(struct rte_eth_dev *dev,
>  		     struct rte_eth_dev_info *dev_info)
>  {
> -	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
> +	dev_info->device = dev->device;
>
>  	dev_info->max_rx_queues = VMXNET3_MAX_RX_QUEUES;
>  	dev_info->max_tx_queues = VMXNET3_MAX_TX_QUEUES;
> diff --git a/examples/ethtool/lib/rte_ethtool.c b/examples/ethtool/lib/rte_ethtool.c
> index 90dfbb739..4c770ec6a 100644
> --- a/examples/ethtool/lib/rte_ethtool.c
> +++ b/examples/ethtool/lib/rte_ethtool.c
> @@ -22,6 +22,8 @@ rte_ethtool_get_drvinfo(uint16_t port_id, struct ethtool_drvinfo *drvinfo)
>  {
>  	struct rte_eth_dev_info dev_info;
>  	struct rte_dev_reg_info reg_info;
> +	const struct rte_pci_device *pci_dev;
> +	const struct rte_bus *bus;
>  	int n;
>  	int ret;
>
> @@ -46,15 +48,16 @@ rte_ethtool_get_drvinfo(uint16_t port_id, struct ethtool_drvinfo *drvinfo)
>  	snprintf(drvinfo->version, sizeof(drvinfo->version), "%s",
>  		rte_version());
>  	/* TODO: replace bus_info by rte_devargs.name */
> -	if (dev_info.pci_dev)
> +	bus = rte_bus_find_by_device(dev_info.device);
> +	if (bus && !strcmp(bus->name, "pci")) {
> +		pci_dev = RTE_DEV_TO_PCI(dev_info.device);
>  		snprintf(drvinfo->bus_info, sizeof(drvinfo->bus_info),
>  			"%04x:%02x:%02x.%x",
> -			dev_info.pci_dev->addr.domain,
> -			dev_info.pci_dev->addr.bus,
> -			dev_info.pci_dev->addr.devid,
> -			dev_info.pci_dev->addr.function);
> -	else
> +			pci_dev->addr.domain, pci_dev->addr.bus,
> +			pci_dev->addr.devid, pci_dev->addr.function);
> +	} else {
>  		snprintf(drvinfo->bus_info, sizeof(drvinfo->bus_info), "N/A");
> +	}
>
>  	memset(&reg_info, 0, sizeof(reg_info));
>  	rte_eth_dev_get_reg_info(port_id, &reg_info);
> diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
> index bb07efa13..f57236b7a 100644
> --- a/examples/ip_pipeline/init.c
> +++ b/examples/ip_pipeline/init.c
> @@ -1266,6 +1266,8 @@ app_init_kni(struct app_params *app) {
>  		struct rte_eth_dev_info dev_info;
>  		struct app_mempool_params *mempool_params;
>  		struct rte_mempool *mempool;
> +		const struct rte_pci_device *pci_dev;
> +		const struct rte_bus *bus;
>  		struct rte_kni_conf conf;
>  		struct rte_kni_ops ops;
>
> @@ -1297,8 +1299,12 @@ app_init_kni(struct app_params *app) {
>  		}
>  		conf.group_id = p_link->pmd_id;
>  		conf.mbuf_size = mempool_params->buffer_size;
> -		conf.addr = dev_info.pci_dev->addr;
> -		conf.id = dev_info.pci_dev->id;
> +		bus = rte_bus_find_by_device(dev_info.device);
> +		if (bus && !strcmp(bus->name, "pci")) {
> +			pci_dev = RTE_DEV_TO_PCI(dev_info.device);
> +			conf.addr = pci_dev->addr;
> +			conf.id = pci_dev->id;
> +		}
>
>  		memset(&ops, 0, sizeof(ops));
>  		ops.port_id = (uint8_t) p_link->pmd_id;
> diff --git a/examples/kni/main.c b/examples/kni/main.c
> index 0d9980ee1..06eb74f6f 100644
> --- a/examples/kni/main.c
> +++ b/examples/kni/main.c
> @@ -834,13 +834,17 @@ kni_alloc(uint16_t port_id)
>  		if (i == 0) {
>  			struct rte_kni_ops ops;
>  			struct rte_eth_dev_info dev_info;
> +			const struct rte_pci_device *pci_dev;
> +			const struct rte_bus *bus;
>
>  			memset(&dev_info, 0, sizeof(dev_info));
>  			rte_eth_dev_info_get(port_id, &dev_info);
>
> -			if (dev_info.pci_dev) {
> -				conf.addr = dev_info.pci_dev->addr;
> -				conf.id = dev_info.pci_dev->id;
> +			bus = rte_bus_find_by_device(dev_info.device);
> +			if (bus && !strcmp(bus->name, "pci")) {
> +				pci_dev = RTE_DEV_TO_PCI(dev_info.device);
> +				conf.addr = pci_dev->addr;
> +				conf.id = pci_dev->id;
>  			}
>  			/* Get the interface default mac address */
>  			rte_eth_macaddr_get(port_id,
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index ab1030d42..0ed903966 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -995,7 +995,7 @@ struct rte_pci_device;
>   * Ethernet device information
>   */
>  struct rte_eth_dev_info {
> -	struct rte_pci_device *pci_dev; /**< Device PCI information. */
> +	struct rte_device *device; /** Generic device information */
>  	const char *driver_name; /**< Device Driver name. */
>  	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
>  		Use if_indextoname() to translate into an interface name. */
> diff --git a/test/test/test_kni.c b/test/test/test_kni.c
> index e4839cdb7..e23eb0837 100644
> --- a/test/test/test_kni.c
> +++ b/test/test/test_kni.c
> @@ -357,6 +357,8 @@ test_kni_processing(uint16_t port_id, struct rte_mempool *mp)
>  	struct rte_kni_conf conf;
>  	struct rte_eth_dev_info info;
>  	struct rte_kni_ops ops;
> +	const struct rte_pci_device *pci_dev;
> +	const struct rte_bus *bus;
>
>  	if (!mp)
>  		return -1;
> @@ -366,8 +368,12 @@ test_kni_processing(uint16_t port_id, struct rte_mempool *mp)
>  	memset(&ops, 0, sizeof(ops));
>
>  	rte_eth_dev_info_get(port_id, &info);
> -	conf.addr = info.pci_dev->addr;
> -	conf.id = info.pci_dev->id;
> +	bus = rte_bus_find_by_device(info.device);
> +	if (bus && !strcmp(bus->name, "pci")) {
> +		pci_dev = RTE_DEV_TO_PCI(info.device);
> +		conf.addr = pci_dev->addr;
> +		conf.id = pci_dev->id;
> +	}
>  	snprintf(conf.name, sizeof(conf.name), TEST_KNI_PORT);
>
>  	/* core id 1 configured for kernel thread */
> @@ -465,6 +471,8 @@ test_kni(void)
>  	struct rte_kni_conf conf;
>  	struct rte_eth_dev_info info;
>  	struct rte_kni_ops ops;
> +	const struct rte_pci_device *pci_dev;
> +	const struct rte_bus *bus;
>
>  	/* Initialize KNI subsytem */
>  	rte_kni_init(KNI_TEST_MAX_PORTS);
> @@ -523,8 +531,12 @@ test_kni(void)
>  	memset(&conf, 0, sizeof(conf));
>  	memset(&ops, 0, sizeof(ops));
>  	rte_eth_dev_info_get(port_id, &info);
> -	conf.addr = info.pci_dev->addr;
> -	conf.id = info.pci_dev->id;
> +	bus = rte_bus_find_by_device(info.device);
> +	if (bus && !strcmp(bus->name, "pci")) {
> +		pci_dev = RTE_DEV_TO_PCI(info.device);
> +		conf.addr = pci_dev->addr;
> +		conf.id = pci_dev->id;
> +	}
>  	conf.group_id = port_id;
>  	conf.mbuf_size = MAX_PACKET_SZ;
>
> @@ -552,8 +564,12 @@ test_kni(void)
>  	memset(&info, 0, sizeof(info));
>  	memset(&ops, 0, sizeof(ops));
>  	rte_eth_dev_info_get(port_id, &info);
> -	conf.addr = info.pci_dev->addr;
> -	conf.id = info.pci_dev->id;
> +	bus = rte_bus_find_by_device(info.device);
> +	if (bus && !strcmp(bus->name, "pci")) {
> +		pci_dev = RTE_DEV_TO_PCI(info.device);
> +		conf.addr = pci_dev->addr;
> +		conf.id = pci_dev->id;
> +	}
>  	conf.group_id = port_id;
>  	conf.mbuf_size = MAX_PACKET_SZ;
>
> --
> 2.14.3
>

--
- Tomasz Duszyński

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev
  2018-03-27 17:40  1% [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev Ferruh Yigit
  2018-03-28  7:04  0% ` Shreyansh Jain
@ 2018-03-28 13:11  0% ` Legacy, Allain
  2018-03-29  6:17  0% ` Tomasz Duszynski
  2018-03-29  8:01  0% ` santosh
  3 siblings, 0 replies; 200+ results
From: Legacy, Allain @ 2018-03-28 13:11 UTC (permalink / raw)
  To: YIGIT, FERRUH, LU, WENZHUO, WU, JINGJING, John W. Linville,
	Shepard Siegel, Ed Czeck, John Miller, Peters, Matt,
	Harish Patil, Rasesh Mody, Ajit Khaparde, Somnath Kotur,
	Rahul Lakkireddy, Hemant Agrawal, Shreyansh Jain, Marcin Wojtas,
	Michal Krawczyk, Guy Tzalik, Evgeny Schemeilin, John Daley,
	Hyong Youb Kim, ZHANG, QI, WANG, XIAO, XING, BEILEI, ANANYEV,
	KONSTANTIN, Shijith Thotton, Srisivasubramanian Srinivasan,
	Adrien Mazarguil, Nelio Laranjeiro, Yongseok Koh, Jacek Siuda,
	Tomasz Duszynski, Dmitri Epshtein, Natalie Samsonov, Jianbo Liu,
	Alejandro Lucero, Tetsuya Mukawa, Santosh Shukla, Jerin Jacob,
	Shahed Shaikh, RICHARDSON, BRUCE, Andrew Rybchenko, Matej Vido,
	Pascal Mazon, Maciej Czekaj, Maxime Coquelin, BIE, TIWEI,
	Shrikrishna Khare, HORTON, REMY, Ori Kam, DE LARA GUARCH, PABLO,
	NICOLAU, RADU, Akhil Goyal, KANTECKI, TOMASZ, DUMITRESCU,
	CRISTIAN FLORIN, Thomas Monjalon
  Cc: dev, YIGIT, FERRUH

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Tuesday, March 27, 2018 1:41 PM
<...>
> Subject: [PATCH] ethdev: replace bus specific struct with generic dev
> 
> Public struct rte_eth_dev_info has a "struct rte_pci_device" field in it
> although it is common for all ethdev in all buses.
> 
> Replacing pci specific struct with generic device struct and updating places
> that are using pci device in a way to get this information from generic device.
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> 
> There is no deprecation notice sent for this update but in this release ethdev
> info already updated and ABI already broken, it can be good opportunity for
> this update.
> ---
>  app/test-pmd/config.c                     | 11 ++++++++++-
>  app/test-pmd/testpmd.h                    | 24 ++++++++++++++++++------
>  drivers/net/af_packet/rte_eth_af_packet.c |  1 +
>  drivers/net/ark/ark_ethdev.c              |  4 +++-
>  drivers/net/avf/avf_ethdev.c              |  2 +-
>  drivers/net/avp/avp_ethdev.c              |  2 +-

Acked-by:  Allain Legacy <allain.legacy@windriver.com>
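
For anyone following the pattern this patch applies in the examples and tests (find the bus, compare its name with "pci", then downcast the generic device), below is a minimal self-contained sketch in C. It does not use the real DPDK headers. The mock_* types, MOCK_DEV_TO_PCI() and mock_bus_info() are hypothetical stand-ins for struct rte_device, RTE_DEV_TO_PCI() and the ethtool snippet. Storing the bus pointer directly in the mock device (instead of calling rte_bus_find_by_device()) and the explicit NULL check are assumptions of the sketch, not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Mock stand-ins for the DPDK types; layouts are illustrative only. */
struct mock_bus {
	const char *name;
};

struct mock_device {
	const struct mock_bus *bus; /* simplification: bus kept in the device */
};

struct mock_pci_addr {
	unsigned int domain, bus, devid, function;
};

struct mock_pci_device {
	struct mock_device device; /* generic part embedded in the PCI struct */
	struct mock_pci_addr addr;
};

/* container_of-style downcast, analogous in spirit to RTE_DEV_TO_PCI(). */
#define MOCK_DEV_TO_PCI(ptr) \
	((const struct mock_pci_device *)((const char *)(ptr) - \
		offsetof(struct mock_pci_device, device)))

/*
 * Write the PCI address of dev into buf, or "N/A" when dev is NULL or
 * does not sit on a bus named "pci". The downcast is only performed
 * after the bus name has been checked.
 */
void
mock_bus_info(const struct mock_device *dev, char *buf, size_t len)
{
	if (dev != NULL && dev->bus != NULL &&
	    strcmp(dev->bus->name, "pci") == 0) {
		const struct mock_pci_device *pci = MOCK_DEV_TO_PCI(dev);

		snprintf(buf, len, "%04x:%02x:%02x.%x",
			 pci->addr.domain, pci->addr.bus,
			 pci->addr.devid, pci->addr.function);
	} else {
		snprintf(buf, len, "N/A");
	}
}
```

The point of the sketch is only that a non-PCI or missing device ends up in the "N/A" branch instead of being dereferenced as a PCI device.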

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  2018-03-28 10:23  0%             ` Liu, Changpeng
@ 2018-03-28 10:56  0%               ` Maxime Coquelin
  0 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2018-03-28 10:56 UTC (permalink / raw)
  To: Liu, Changpeng, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, Wodkowski, PawelX, dev, Tan, Jianfeng



On 03/28/2018 12:23 PM, Liu, Changpeng wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Wednesday, March 28, 2018 6:11 PM
>> To: Liu, Changpeng <changpeng.liu@intel.com>; Kulasek, TomaszX
>> <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>> <james.r.harris@intel.com>; Wodkowski, PawelX
>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Tan, Jianfeng
>> <jianfeng.tan@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
>> messages
>>
>>
>>
>> On 03/28/2018 12:03 PM, Liu, Changpeng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>> Sent: Wednesday, March 28, 2018 5:58 PM
>>>> To: Liu, Changpeng <changpeng.liu@intel.com>; Kulasek, TomaszX
>>>> <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>>>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>>>> <james.r.harris@intel.com>; Wodkowski, PawelX
>>>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Tan, Jianfeng
>>>> <jianfeng.tan@intel.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
>>>> messages
>>>>
>>>>
>>>>
>>>> On 03/28/2018 11:50 AM, Liu, Changpeng wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>>>> Sent: Wednesday, March 28, 2018 5:12 PM
>>>>>> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>>>>>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>>>>>> <james.r.harris@intel.com>; Wodkowski, PawelX
>>>>>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Liu, Changpeng
>>>>>> <changpeng.liu@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>>>>>> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
>>>>>> messages
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
>>>>>>> This patch adds new vhost user messages GET_CONFIG and SET_CONFIG
>>>> used
>>>>>>> for get/set virtio device's configuration space.
>>>>>>>
>>>>>>> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
>>>>>>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>>>>>>> ---
>>>>>>> Changes in v2:
>>>>>>>      - code cleanup
>>>>>>>
>>>>>>>      lib/librte_vhost/rte_vhost.h  |  4 ++++
>>>>>>>      lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
>>>>>>>      lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
>>>>>>>      3 files changed, 42 insertions(+)
>>>>>>>
>>>>>>> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
>>>>>>> index d332069..fe30518 100644
>>>>>>> --- a/lib/librte_vhost/rte_vhost.h
>>>>>>> +++ b/lib/librte_vhost/rte_vhost.h
>>>>>>> @@ -84,6 +84,10 @@ struct vhost_device_ops {
>>>>>>>      	int (*new_connection)(int vid);
>>>>>>>      	void (*destroy_connection)(int vid);
>>>>>>>
>>>>>>> +	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
>>>>>>> +	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
>>>>>>> +			uint32_t len, uint32_t flags);
>>>>>>> +
>>>>>>>      	void *reserved[2]; /**< Reserved for future extension */
>>>>>>
>>>>>> You are breaking the ABI, as you grow the size of the ops struct.
>>>>>>
>>>>>> Also, I'm wondering if we shouldn't have a different ops for external
>>>>>> backends. Here these ops are more intended to the application, we could
>>>>>> have a specific ops struct for external backends IMHO.
>>>>>>
>>>>>>>      };
>>>>>>>
>>>>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>>>>> index 90ed211..0ed6a5a 100644
>>>>>>> --- a/lib/librte_vhost/vhost_user.c
>>>>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>>>>> @@ -50,6 +50,8 @@ static const char
>>>> *vhost_message_str[VHOST_USER_MAX]
>>>>>> = {
>>>>>>>      	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
>>>>>>>      	[VHOST_USER_SET_SLAVE_REQ_FD]  =
>>>>>> "VHOST_USER_SET_SLAVE_REQ_FD",
>>>>>>>      	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
>>>>>>> +	[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG",
>>>>>>> +	[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG",
>>>>>>>      };
>>>>>>>
>>>>>>>      static uint64_t
>>>>>>> @@ -1355,6 +1357,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>>      	 * would cause a dead lock.
>>>>>>>      	 */
>>>>>>>      	switch (msg.request.master) {
>>>>>>> +	case VHOST_USER_SET_CONFIG:
>>>>>>
>>>>>> It seems VHOST_USER_GET_CONFIG is missing here.
>>>>>>
>>>>>>>      	case VHOST_USER_SET_FEATURES:
>>>>>>>      	case VHOST_USER_SET_PROTOCOL_FEATURES:
>>>>>>>      	case VHOST_USER_SET_OWNER:
>>>>>>> @@ -1380,6 +1383,25 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>>>      	}
>>>>>>>
>>>>>>>      	switch (msg.request.master) {
>>>>>>> +	case VHOST_USER_GET_CONFIG:
>>>>>>> +		if (dev->notify_ops->get_config(dev->vid,
>>>>>> Please check ->get_config is set before calling it.
>>>>>>
>>>>>>> +				msg.payload.config.region,
>>>>>>> +				msg.payload.config.size) != 0) {
>>>>>>> +			msg.size = sizeof(uint64_t);
>>>>>>> +		}
>>>>>>> +		send_vhost_reply(fd, &msg);
>>>>>>> +		break;
>>>>>>> +	case VHOST_USER_SET_CONFIG:
>>>>>>> +		if ((dev->notify_ops->set_config(dev->vid,
>>>>>> Ditto.
>>>>>>
>>>>>>> +				msg.payload.config.region,
>>>>>>> +				msg.payload.config.offset,
>>>>>>> +				msg.payload.config.size,
>>>>>>> +				msg.payload.config.flags)) != 0) {
>>>>>>> +			ret = 1;
>>>>>>> +		} else {
>>>>>>> +			ret = 0;
>>>>>>> +		}
>>>>>>
>>>>>> ret = dev->notify_ops->set_config instead?
>>>>>>> +		break;
>>>>>>>      	case VHOST_USER_GET_FEATURES:
>>>>>>>      		msg.payload.u64 = vhost_user_get_features(dev);
>>>>>>>      		msg.size = sizeof(msg.payload.u64);
>>>>>>> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
>>>>>>> index d4bd604..25cc026 100644
>>>>>>> --- a/lib/librte_vhost/vhost_user.h
>>>>>>> +++ b/lib/librte_vhost/vhost_user.h
>>>>>>> @@ -14,6 +14,11 @@
>>>>>>>
>>>>>>>      #define VHOST_MEMORY_MAX_NREGIONS 8
>>>>>>>
>>>>>>> +/*
>>>>>>> + * Maximum size of virtio device config space
>>>>>>> + */
>>>>>>> +#define VHOST_USER_MAX_CONFIG_SIZE 256
>>>>>>> +
>>>>>>>      #define VHOST_USER_PROTOCOL_F_MQ	0
>>>>>>>      #define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
>>>>>>>      #define VHOST_USER_PROTOCOL_F_RARP	2
>>>>>>
>>>>>> Shouldn't there be a protocol feature associated to these new messages?
>>>>>> Else how QEMU knows the backend supports it or not?
>>>>>>
>>>>>> I looked at QEMU code and indeed no protocol feature associated, that's
>>>>>> strange...
>>>>> Nice to have, for now not all the QEMU host driver need to get this
>>>> configuration space from slave backend
>>>>> when getting start. This message can be used for migration of vhost-user
>>>> devices.
>>>>
>>>> So if QEMU sends this message but the DPDK version does not support it
>>>> yet, vhost_user_msg_handler() will return an error ("vhost read
>>>> incorrect message") and the socket will be closed.
>>>>
>>>> How do we overcome this? I think we really need a spec update ASAP,
>>>> before QEMU v2.12 is out (-rc1 already).
>>>>
>>>> Do you have time to take care of this?
>>> For now there are no other users except us care about this message, :), it's no
>> hurry.
>>> I can take this after QEMU 2.12 release adding a new protocol feature bit.
>>
>> Are you sure?
>> If I understand the code correctly, as the guest writes in config regs
>> of a virtio-blk device, .set_config callback will be called.
> Exactly.
>>
>> If you have a vhost-user backend, it will receive the SET_CONFIG
>> request, no?
> For now this is only enabled for the QEMU vhost-user-blk driver; the QEMU virtio-blk driver doesn't have this issue.

Right.
But it will be really painful to manage, for example, cross-version
live migration. And when you want to use QEMU v2.13+ with a DPDK
v18.05 backend, the protocol feature won't be negotiated.

It is really important to get this right from the beginning.

Thanks,
Maxime
>>
>> Cheers,
>> Maxime
>>
>>>>
>>>> Thanks,
>>>> Maxime
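
To make the two review points above concrete (gate the new requests on a negotiated protocol feature, and check that the callback is set before calling it), here is a rough sketch in C. Everything in it is hypothetical: MOCK_PROTOCOL_F_CONFIG and its value are picked arbitrarily for the sketch (the thread notes that no such bit was defined at this point), and the mock_* types and functions only loosely mirror the ops struct and handler from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bit; value chosen arbitrarily for this sketch. */
#define MOCK_PROTOCOL_F_CONFIG 9

struct mock_ops {
	int (*get_config)(int vid, uint8_t *config, uint32_t len);
	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
			  uint32_t len, uint32_t flags);
};

struct mock_dev {
	int vid;
	uint64_t protocol_features; /* bits negotiated with the master */
	const struct mock_ops *ops;
};

enum mock_req { MOCK_GET_CONFIG, MOCK_SET_CONFIG };

/* Sample backend callback: reports an all-zero config space. */
int
mock_get_config_zero(int vid, uint8_t *config, uint32_t len)
{
	(void)vid;
	memset(config, 0, len);
	return 0;
}

/*
 * Returns -1 when the feature was not negotiated or the callback is
 * missing; otherwise returns the callback's own return value directly.
 */
int
mock_handle_config(struct mock_dev *dev, enum mock_req req, uint8_t *buf,
		   uint32_t offset, uint32_t len, uint32_t flags)
{
	if (!(dev->protocol_features & (1ULL << MOCK_PROTOCOL_F_CONFIG)))
		return -1;
	if (dev->ops == NULL)
		return -1;

	switch (req) {
	case MOCK_GET_CONFIG:
		if (dev->ops->get_config == NULL)
			return -1;
		return dev->ops->get_config(dev->vid, buf, len);
	case MOCK_SET_CONFIG:
		if (dev->ops->set_config == NULL)
			return -1;
		return dev->ops->set_config(dev->vid, buf, offset, len, flags);
	}
	return -1;
}
```

With a feature bit like this, a master could tell in advance whether the backend understands the requests, instead of finding out when the socket is closed on an unknown message.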

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  2018-03-28 10:11  0%           ` Maxime Coquelin
@ 2018-03-28 10:23  0%             ` Liu, Changpeng
  2018-03-28 10:56  0%               ` Maxime Coquelin
  0 siblings, 1 reply; 200+ results
From: Liu, Changpeng @ 2018-03-28 10:23 UTC (permalink / raw)
  To: Maxime Coquelin, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, Wodkowski, PawelX, dev, Tan, Jianfeng



> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Wednesday, March 28, 2018 6:11 PM
> To: Liu, Changpeng <changpeng.liu@intel.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> <james.r.harris@intel.com>; Wodkowski, PawelX
> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
> messages
> 
> 
> 
> On 03/28/2018 12:03 PM, Liu, Changpeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >> Sent: Wednesday, March 28, 2018 5:58 PM
> >> To: Liu, Changpeng <changpeng.liu@intel.com>; Kulasek, TomaszX
> >> <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> >> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> >> <james.r.harris@intel.com>; Wodkowski, PawelX
> >> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Tan, Jianfeng
> >> <jianfeng.tan@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
> >> messages
> >>
> >>
> >>
> >> On 03/28/2018 11:50 AM, Liu, Changpeng wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >>>> Sent: Wednesday, March 28, 2018 5:12 PM
> >>>> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> >>>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> >>>> <james.r.harris@intel.com>; Wodkowski, PawelX
> >>>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Liu, Changpeng
> >>>> <changpeng.liu@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> >>>> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
> >>>> messages
> >>>>
> >>>>
> >>>>
> >>>> On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
> >>>>> This patch adds new vhost user messages GET_CONFIG and SET_CONFIG
> >> used
> >>>>> for get/set virtio device's configuration space.
> >>>>>
> >>>>> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> >>>>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> >>>>> ---
> >>>>> Changes in v2:
> >>>>>     - code cleanup
> >>>>>
> >>>>>     lib/librte_vhost/rte_vhost.h  |  4 ++++
> >>>>>     lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
> >>>>>     lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
> >>>>>     3 files changed, 42 insertions(+)
> >>>>>
> >>>>> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> >>>>> index d332069..fe30518 100644
> >>>>> --- a/lib/librte_vhost/rte_vhost.h
> >>>>> +++ b/lib/librte_vhost/rte_vhost.h
> >>>>> @@ -84,6 +84,10 @@ struct vhost_device_ops {
> >>>>>     	int (*new_connection)(int vid);
> >>>>>     	void (*destroy_connection)(int vid);
> >>>>>
> >>>>> +	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
> >>>>> +	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
> >>>>> +			uint32_t len, uint32_t flags);
> >>>>> +
> >>>>>     	void *reserved[2]; /**< Reserved for future extension */
> >>>>
> >>>> You are breaking the ABI, as you grow the size of the ops struct.
> >>>>
> >>>> Also, I'm wondering if we shouldn't have a different ops for external
> >>>> backends. Here these ops are more intended to the application, we could
> >>>> have a specific ops struct for external backends IMHO.
> >>>>
> >>>>>     };
> >>>>>
> >>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >>>>> index 90ed211..0ed6a5a 100644
> >>>>> --- a/lib/librte_vhost/vhost_user.c
> >>>>> +++ b/lib/librte_vhost/vhost_user.c
> >>>>> @@ -50,6 +50,8 @@ static const char
> >> *vhost_message_str[VHOST_USER_MAX]
> >>>> = {
> >>>>>     	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
> >>>>>     	[VHOST_USER_SET_SLAVE_REQ_FD]  =
> >>>> "VHOST_USER_SET_SLAVE_REQ_FD",
> >>>>>     	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> >>>>> +	[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG",
> >>>>> +	[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG",
> >>>>>     };
> >>>>>
> >>>>>     static uint64_t
> >>>>> @@ -1355,6 +1357,7 @@ vhost_user_msg_handler(int vid, int fd)
> >>>>>     	 * would cause a dead lock.
> >>>>>     	 */
> >>>>>     	switch (msg.request.master) {
> >>>>> +	case VHOST_USER_SET_CONFIG:
> >>>>
> >>>> It seems VHOST_USER_GET_CONFIG is missing here.
> >>>>
> >>>>>     	case VHOST_USER_SET_FEATURES:
> >>>>>     	case VHOST_USER_SET_PROTOCOL_FEATURES:
> >>>>>     	case VHOST_USER_SET_OWNER:
> >>>>> @@ -1380,6 +1383,25 @@ vhost_user_msg_handler(int vid, int fd)
> >>>>>     	}
> >>>>>
> >>>>>     	switch (msg.request.master) {
> >>>>> +	case VHOST_USER_GET_CONFIG:
> >>>>> +		if (dev->notify_ops->get_config(dev->vid,
> >>>> Please check ->get_config is set before calling it.
> >>>>
> >>>>> +				msg.payload.config.region,
> >>>>> +				msg.payload.config.size) != 0) {
> >>>>> +			msg.size = sizeof(uint64_t);
> >>>>> +		}
> >>>>> +		send_vhost_reply(fd, &msg);
> >>>>> +		break;
> >>>>> +	case VHOST_USER_SET_CONFIG:
> >>>>> +		if ((dev->notify_ops->set_config(dev->vid,
> >>>> Ditto.
> >>>>
> >>>>> +				msg.payload.config.region,
> >>>>> +				msg.payload.config.offset,
> >>>>> +				msg.payload.config.size,
> >>>>> +				msg.payload.config.flags)) != 0) {
> >>>>> +			ret = 1;
> >>>>> +		} else {
> >>>>> +			ret = 0;
> >>>>> +		}
> >>>>
> >>>> ret = dev->notify_ops->set_config instead?
> >>>>> +		break;
> >>>>>     	case VHOST_USER_GET_FEATURES:
> >>>>>     		msg.payload.u64 = vhost_user_get_features(dev);
> >>>>>     		msg.size = sizeof(msg.payload.u64);
> >>>>> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> >>>>> index d4bd604..25cc026 100644
> >>>>> --- a/lib/librte_vhost/vhost_user.h
> >>>>> +++ b/lib/librte_vhost/vhost_user.h
> >>>>> @@ -14,6 +14,11 @@
> >>>>>
> >>>>>     #define VHOST_MEMORY_MAX_NREGIONS 8
> >>>>>
> >>>>> +/*
> >>>>> + * Maximum size of virtio device config space
> >>>>> + */
> >>>>> +#define VHOST_USER_MAX_CONFIG_SIZE 256
> >>>>> +
> >>>>>     #define VHOST_USER_PROTOCOL_F_MQ	0
> >>>>>     #define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
> >>>>>     #define VHOST_USER_PROTOCOL_F_RARP	2
> >>>>
> >>>> Shouldn't there be a protocol feature associated to these new messages?
> >>>> Else how QEMU knows the backend supports it or not?
> >>>>
> >>>> I looked at QEMU code and indeed no protocol feature associated, that's
> >>>> strange...
> >>> Nice to have, for now not all the QEMU host driver need to get this
> >> configuration space from slave backend
> >>> when getting start. This message can be used for migration of vhost-user
> >> devices.
> >>
> >> So if QEMU sends this message but the DPDK version does not support it
> >> yet, vhost_user_msg_handler() will return an error ("vhost read
> >> incorrect message") and the socket will be closed.
> >>
> >> How do we overcome this? I think we really need a spec update ASAP,
> >> before QEMU v2.12 is out (-rc1 already).
> >>
> >> Do you have time to take care of this?
> > For now there are no other users except us care about this message, :), it's no
> hurry.
> > I can take this after QEMU 2.12 release adding a new protocol feature bit.
> 
> Are you sure?
> If I understand the code correctly, as the guest writes in config regs
> of a virtio-blk device, .set_config callback will be called.
Exactly.
> 
> If you have a vhost-user backend, it will receive the SET_CONFIG
> request, no?
For now this is only enabled for the QEMU vhost-user-blk driver; the QEMU virtio-blk driver doesn't have this issue.
> 
> Cheers,
> Maxime
> 
> >>
> >> Thanks,
> >> Maxime
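
On the ABI remark quoted above ("you grow the size of the ops struct"), a small C sketch may help show what changes. The structs are reduced, hypothetical copies that keep only the members visible in the quoted hunk. Reusing the reserved slots is shown as one possible alternative, not as what this patch or the maintainers decided, and the size comparison assumes function pointers and data pointers have the same size, as on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced copy of the ops struct before the patch. */
struct mock_ops_v1 {
	int (*new_connection)(int vid);
	void (*destroy_connection)(int vid);
	void *reserved[2]; /* reserved for future extension */
};

/* As in the patch: new callbacks inserted ahead of reserved[2]. */
struct mock_ops_grown {
	int (*new_connection)(int vid);
	void (*destroy_connection)(int vid);
	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
			  uint32_t len, uint32_t flags);
	void *reserved[2];
};

/* Alternative: the new callbacks take over the two reserved slots. */
struct mock_ops_reuse {
	int (*new_connection)(int vid);
	void (*destroy_connection)(int vid);
	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
			  uint32_t len, uint32_t flags);
};
```

In the grown variant the struct is larger and reserved[] moves to a new offset, so a binary built against the old layout would hand the library a shorter struct than it expects. In the reuse variant the size and the offsets of the existing members stay the same, at the cost of using up the reserved space.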

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  2018-03-28 10:03  0%         ` Liu, Changpeng
@ 2018-03-28 10:11  0%           ` Maxime Coquelin
  2018-03-28 10:23  0%             ` Liu, Changpeng
  0 siblings, 1 reply; 200+ results
From: Maxime Coquelin @ 2018-03-28 10:11 UTC (permalink / raw)
  To: Liu, Changpeng, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, Wodkowski, PawelX, dev, Tan, Jianfeng



On 03/28/2018 12:03 PM, Liu, Changpeng wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Wednesday, March 28, 2018 5:58 PM
>> To: Liu, Changpeng <changpeng.liu@intel.com>; Kulasek, TomaszX
>> <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>> <james.r.harris@intel.com>; Wodkowski, PawelX
>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Tan, Jianfeng
>> <jianfeng.tan@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
>> messages
>>
>>
>>
>> On 03/28/2018 11:50 AM, Liu, Changpeng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>> Sent: Wednesday, March 28, 2018 5:12 PM
>>>> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>>>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>>>> <james.r.harris@intel.com>; Wodkowski, PawelX
>>>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Liu, Changpeng
>>>> <changpeng.liu@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>>>> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
>>>> messages
>>>>
>>>>
>>>>
>>>> On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
>>>>> This patch adds new vhost user messages GET_CONFIG and SET_CONFIG
>> used
>>>>> for get/set virtio device's configuration space.
>>>>>
>>>>> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
>>>>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>>>>> ---
>>>>> Changes in v2:
>>>>>     - code cleanup
>>>>>
>>>>>     lib/librte_vhost/rte_vhost.h  |  4 ++++
>>>>>     lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
>>>>>     lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
>>>>>     3 files changed, 42 insertions(+)
>>>>>
>>>>> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
>>>>> index d332069..fe30518 100644
>>>>> --- a/lib/librte_vhost/rte_vhost.h
>>>>> +++ b/lib/librte_vhost/rte_vhost.h
>>>>> @@ -84,6 +84,10 @@ struct vhost_device_ops {
>>>>>     	int (*new_connection)(int vid);
>>>>>     	void (*destroy_connection)(int vid);
>>>>>
>>>>> +	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
>>>>> +	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
>>>>> +			uint32_t len, uint32_t flags);
>>>>> +
>>>>>     	void *reserved[2]; /**< Reserved for future extension */
>>>>
>>>> You are breaking the ABI, as you grow the size of the ops struct.
>>>>
>>>> Also, I'm wondering if we shouldn't have a different ops for external
>>>> backends. Here these ops are more intended to the application, we could
>>>> have a specific ops struct for external backends IMHO.
>>>>
>>>>>     };
>>>>>
>>>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>>>> index 90ed211..0ed6a5a 100644
>>>>> --- a/lib/librte_vhost/vhost_user.c
>>>>> +++ b/lib/librte_vhost/vhost_user.c
>>>>> @@ -50,6 +50,8 @@ static const char
>> *vhost_message_str[VHOST_USER_MAX]
>>>> = {
>>>>>     	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
>>>>>     	[VHOST_USER_SET_SLAVE_REQ_FD]  =
>>>> "VHOST_USER_SET_SLAVE_REQ_FD",
>>>>>     	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
>>>>> +	[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG",
>>>>> +	[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG",
>>>>>     };
>>>>>
>>>>>     static uint64_t
>>>>> @@ -1355,6 +1357,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>     	 * would cause a dead lock.
>>>>>     	 */
>>>>>     	switch (msg.request.master) {
>>>>> +	case VHOST_USER_SET_CONFIG:
>>>>
>>>> It seems VHOST_USER_GET_CONFIG is missing here.
>>>>
>>>>>     	case VHOST_USER_SET_FEATURES:
>>>>>     	case VHOST_USER_SET_PROTOCOL_FEATURES:
>>>>>     	case VHOST_USER_SET_OWNER:
>>>>> @@ -1380,6 +1383,25 @@ vhost_user_msg_handler(int vid, int fd)
>>>>>     	}
>>>>>
>>>>>     	switch (msg.request.master) {
>>>>> +	case VHOST_USER_GET_CONFIG:
>>>>> +		if (dev->notify_ops->get_config(dev->vid,
>>>> Please check ->get_config is set before calling it.
>>>>
>>>>> +				msg.payload.config.region,
>>>>> +				msg.payload.config.size) != 0) {
>>>>> +			msg.size = sizeof(uint64_t);
>>>>> +		}
>>>>> +		send_vhost_reply(fd, &msg);
>>>>> +		break;
>>>>> +	case VHOST_USER_SET_CONFIG:
>>>>> +		if ((dev->notify_ops->set_config(dev->vid,
>>>> Ditto.
>>>>
>>>>> +				msg.payload.config.region,
>>>>> +				msg.payload.config.offset,
>>>>> +				msg.payload.config.size,
>>>>> +				msg.payload.config.flags)) != 0) {
>>>>> +			ret = 1;
>>>>> +		} else {
>>>>> +			ret = 0;
>>>>> +		}
>>>>
>>>> ret = dev->notify_ops->set_config instead?
>>>>> +		break;
>>>>>     	case VHOST_USER_GET_FEATURES:
>>>>>     		msg.payload.u64 = vhost_user_get_features(dev);
>>>>>     		msg.size = sizeof(msg.payload.u64);
>>>>> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
>>>>> index d4bd604..25cc026 100644
>>>>> --- a/lib/librte_vhost/vhost_user.h
>>>>> +++ b/lib/librte_vhost/vhost_user.h
>>>>> @@ -14,6 +14,11 @@
>>>>>
>>>>>     #define VHOST_MEMORY_MAX_NREGIONS 8
>>>>>
>>>>> +/*
>>>>> + * Maximum size of virtio device config space
>>>>> + */
>>>>> +#define VHOST_USER_MAX_CONFIG_SIZE 256
>>>>> +
>>>>>     #define VHOST_USER_PROTOCOL_F_MQ	0
>>>>>     #define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
>>>>>     #define VHOST_USER_PROTOCOL_F_RARP	2
>>>>
>>>> Shouldn't there be a protocol feature associated to these new messages?
>>>> Else how QEMU knows the backend supports it or not?
>>>>
>>>> I looked at QEMU code and indeed no protocol feature associated, that's
>>>> strange...
>>> Nice to have, for now not all the QEMU host driver need to get this
>> configuration space from slave backend
>>> when getting start. This message can be used for migration of vhost-user
>> devices.
>>
>> So if QEMU sends this message but the DPDK version does not support it
>> yet, vhost_user_msg_handler() will return an error ("vhost read
>> incorrect message") and the socket will be closed.
>>
>> How do we overcome this? I think we really need a spec update ASAP,
>> before QEMU v2.12 is out (-rc1 already).
>>
>> Do you have time to take care of this?
> For now there are no other users except us care about this message, :), it's no hurry.
> I can take this after QEMU 2.12 release adding a new protocol feature bit.

Are you sure?
If I understand the code correctly, when the guest writes to the config
registers of a virtio-blk device, the .set_config callback will be called.

If you have a vhost-user backend, it will receive the SET_CONFIG
request, no?
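
If a backend does receive SET_CONFIG, the handler also has to cope with an
ops table in which the callback was never set, as requested earlier in this
review. A minimal sketch of such a guarded dispatch follows. It is not the
librte_vhost code: struct config_ops, dispatch_set_config(),
sample_set_config() and EXAMPLE_MAX_CONFIG_SIZE are hypothetical names, and
only the callback signatures and the 256-byte limit are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as VHOST_USER_MAX_CONFIG_SIZE in the patch. */
#define EXAMPLE_MAX_CONFIG_SIZE 256

/* Hypothetical ops table holding only the two callbacks under review. */
struct config_ops {
	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
			uint32_t len, uint32_t flags);
};

/* Hypothetical backing store standing in for a device's config space. */
static uint8_t example_config_space[EXAMPLE_MAX_CONFIG_SIZE];

/* Example backend callback: rejects writes that run past the config space. */
static int
sample_set_config(int vid, uint8_t *config, uint32_t offset,
		uint32_t len, uint32_t flags)
{
	(void)vid;
	(void)flags;
	if (offset > EXAMPLE_MAX_CONFIG_SIZE ||
			len > EXAMPLE_MAX_CONFIG_SIZE - offset)
		return -1;
	memcpy(example_config_space + offset, config, len);
	return 0;
}

/*
 * Guarded dispatch: returns -1 when no callback is registered, otherwise
 * passes the callback's own return value straight through.
 */
static int
dispatch_set_config(const struct config_ops *ops, int vid, uint8_t *region,
		uint32_t offset, uint32_t size, uint32_t flags)
{
	if (ops == NULL || ops->set_config == NULL)
		return -1;
	return ops->set_config(vid, region, offset, size, flags);
}
```

Returning the callback's value directly, rather than mapping it to 0 or 1,
follows the "ret = dev->notify_ops->set_config" suggestion quoted above.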

Cheers,
Maxime

>>
>> Thanks,
>> Maxime

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  2018-03-28  9:57  0%       ` Maxime Coquelin
@ 2018-03-28 10:03  0%         ` Liu, Changpeng
  2018-03-28 10:11  0%           ` Maxime Coquelin
  0 siblings, 1 reply; 200+ results
From: Liu, Changpeng @ 2018-03-28 10:03 UTC (permalink / raw)
  To: Maxime Coquelin, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, Wodkowski, PawelX, dev, Tan, Jianfeng



> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Wednesday, March 28, 2018 5:58 PM
> To: Liu, Changpeng <changpeng.liu@intel.com>; Kulasek, TomaszX
> <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> <james.r.harris@intel.com>; Wodkowski, PawelX
> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Tan, Jianfeng
> <jianfeng.tan@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
> messages
> 
> 
> 
> On 03/28/2018 11:50 AM, Liu, Changpeng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >> Sent: Wednesday, March 28, 2018 5:12 PM
> >> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> >> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> >> <james.r.harris@intel.com>; Wodkowski, PawelX
> >> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Liu, Changpeng
> >> <changpeng.liu@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
> >> messages
> >>
> >>
> >>
> >> On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
> >>> This patch adds new vhost user messages GET_CONFIG and SET_CONFIG
> used
> >>> for get/set virtio device's configuration space.
> >>>
> >>> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> >>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> >>> ---
> >>> Changes in v2:
> >>>    - code cleanup
> >>>
> >>>    lib/librte_vhost/rte_vhost.h  |  4 ++++
> >>>    lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
> >>>    lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
> >>>    3 files changed, 42 insertions(+)
> >>>
> >>> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> >>> index d332069..fe30518 100644
> >>> --- a/lib/librte_vhost/rte_vhost.h
> >>> +++ b/lib/librte_vhost/rte_vhost.h
> >>> @@ -84,6 +84,10 @@ struct vhost_device_ops {
> >>>    	int (*new_connection)(int vid);
> >>>    	void (*destroy_connection)(int vid);
> >>>
> >>> +	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
> >>> +	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
> >>> +			uint32_t len, uint32_t flags);
> >>> +
> >>>    	void *reserved[2]; /**< Reserved for future extension */
> >>
> >> You are breaking the ABI, as you grow the size of the ops struct.
> >>
> >> Also, I'm wondering if we shouldn't have a different ops for external
> >> backends. Here these ops are more intended to the application, we could
> >> have a specific ops struct for external backends IMHO.
> >>
> >>>    };
> >>>
> >>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> >>> index 90ed211..0ed6a5a 100644
> >>> --- a/lib/librte_vhost/vhost_user.c
> >>> +++ b/lib/librte_vhost/vhost_user.c
> >>> @@ -50,6 +50,8 @@ static const char
> *vhost_message_str[VHOST_USER_MAX]
> >> = {
> >>>    	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
> >>>    	[VHOST_USER_SET_SLAVE_REQ_FD]  =
> >> "VHOST_USER_SET_SLAVE_REQ_FD",
> >>>    	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> >>> +	[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG",
> >>> +	[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG",
> >>>    };
> >>>
> >>>    static uint64_t
> >>> @@ -1355,6 +1357,7 @@ vhost_user_msg_handler(int vid, int fd)
> >>>    	 * would cause a dead lock.
> >>>    	 */
> >>>    	switch (msg.request.master) {
> >>> +	case VHOST_USER_SET_CONFIG:
> >>
> >> It seems VHOST_USER_GET_CONFIG is missing here.
> >>
> >>>    	case VHOST_USER_SET_FEATURES:
> >>>    	case VHOST_USER_SET_PROTOCOL_FEATURES:
> >>>    	case VHOST_USER_SET_OWNER:
> >>> @@ -1380,6 +1383,25 @@ vhost_user_msg_handler(int vid, int fd)
> >>>    	}
> >>>
> >>>    	switch (msg.request.master) {
> >>> +	case VHOST_USER_GET_CONFIG:
> >>> +		if (dev->notify_ops->get_config(dev->vid,
> >> Please check ->get_config is set before calling it.
> >>
> >>> +				msg.payload.config.region,
> >>> +				msg.payload.config.size) != 0) {
> >>> +			msg.size = sizeof(uint64_t);
> >>> +		}
> >>> +		send_vhost_reply(fd, &msg);
> >>> +		break;
> >>> +	case VHOST_USER_SET_CONFIG:
> >>> +		if ((dev->notify_ops->set_config(dev->vid,
> >> Ditto.
> >>
> >>> +				msg.payload.config.region,
> >>> +				msg.payload.config.offset,
> >>> +				msg.payload.config.size,
> >>> +				msg.payload.config.flags)) != 0) {
> >>> +			ret = 1;
> >>> +		} else {
> >>> +			ret = 0;
> >>> +		}
> >>
> >> ret = dev->notify_ops->set_config instead?
> >>> +		break;
> >>>    	case VHOST_USER_GET_FEATURES:
> >>>    		msg.payload.u64 = vhost_user_get_features(dev);
> >>>    		msg.size = sizeof(msg.payload.u64);
> >>> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> >>> index d4bd604..25cc026 100644
> >>> --- a/lib/librte_vhost/vhost_user.h
> >>> +++ b/lib/librte_vhost/vhost_user.h
> >>> @@ -14,6 +14,11 @@
> >>>
> >>>    #define VHOST_MEMORY_MAX_NREGIONS 8
> >>>
> >>> +/*
> >>> + * Maximum size of virtio device config space
> >>> + */
> >>> +#define VHOST_USER_MAX_CONFIG_SIZE 256
> >>> +
> >>>    #define VHOST_USER_PROTOCOL_F_MQ	0
> >>>    #define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
> >>>    #define VHOST_USER_PROTOCOL_F_RARP	2
> >>
> >> Shouldn't there be a protocol feature associated to these new messages?
> >> Else how QEMU knows the backend supports it or not?
> >>
> >> I looked at QEMU code and indeed no protocol feature associated, that's
> >> strange...
> > Nice to have, for now not all the QEMU host driver need to get this
> configuration space from slave backend
> > when getting start. This message can be used for migration of vhost-user
> devices.
> 
> So if QEMU sends this message but the DPDK version does not support it
> yet, vhost_user_msg_handler() will return an error ("vhost read
> incorrect message") and the socket will be closed.
> 
> How do we overcome this? I think we really need a spec update ASAP,
> before QEMU v2.12 is out (-rc1 already).
> 
> Do you have time to take care of this?
For now no users other than us care about this message :), so there is no hurry.
I can take care of this after the QEMU 2.12 release by adding a new protocol feature bit.
> 
> Thanks,
> Maxime

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  2018-03-28  9:50  0%     ` Liu, Changpeng
@ 2018-03-28  9:57  0%       ` Maxime Coquelin
  2018-03-28 10:03  0%         ` Liu, Changpeng
  0 siblings, 1 reply; 200+ results
From: Maxime Coquelin @ 2018-03-28  9:57 UTC (permalink / raw)
  To: Liu, Changpeng, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, Wodkowski, PawelX, dev, Tan, Jianfeng



On 03/28/2018 11:50 AM, Liu, Changpeng wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Wednesday, March 28, 2018 5:12 PM
>> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>> <james.r.harris@intel.com>; Wodkowski, PawelX
>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Liu, Changpeng
>> <changpeng.liu@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
>> messages
>>
>>
>>
>> On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
>>> This patch adds new vhost user messages GET_CONFIG and SET_CONFIG used
>>> for get/set virtio device's configuration space.
>>>
>>> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
>>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>>> ---
>>> Changes in v2:
>>>    - code cleanup
>>>
>>>    lib/librte_vhost/rte_vhost.h  |  4 ++++
>>>    lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
>>>    lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
>>>    3 files changed, 42 insertions(+)
>>>
>>> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
>>> index d332069..fe30518 100644
>>> --- a/lib/librte_vhost/rte_vhost.h
>>> +++ b/lib/librte_vhost/rte_vhost.h
>>> @@ -84,6 +84,10 @@ struct vhost_device_ops {
>>>    	int (*new_connection)(int vid);
>>>    	void (*destroy_connection)(int vid);
>>>
>>> +	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
>>> +	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
>>> +			uint32_t len, uint32_t flags);
>>> +
>>>    	void *reserved[2]; /**< Reserved for future extension */
>>
>> You are breaking the ABI, as you grow the size of the ops struct.
>>
>> Also, I'm wondering if we shouldn't have a different ops for external
>> backends. Here these ops are more intended to the application, we could
>> have a specific ops struct for external backends IMHO.
>>
>>>    };
>>>
>>> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
>>> index 90ed211..0ed6a5a 100644
>>> --- a/lib/librte_vhost/vhost_user.c
>>> +++ b/lib/librte_vhost/vhost_user.c
>>> @@ -50,6 +50,8 @@ static const char *vhost_message_str[VHOST_USER_MAX]
>> = {
>>>    	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
>>>    	[VHOST_USER_SET_SLAVE_REQ_FD]  =
>> "VHOST_USER_SET_SLAVE_REQ_FD",
>>>    	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
>>> +	[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG",
>>> +	[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG",
>>>    };
>>>
>>>    static uint64_t
>>> @@ -1355,6 +1357,7 @@ vhost_user_msg_handler(int vid, int fd)
>>>    	 * would cause a dead lock.
>>>    	 */
>>>    	switch (msg.request.master) {
>>> +	case VHOST_USER_SET_CONFIG:
>>
>> It seems VHOST_USER_GET_CONFIG is missing here.
>>
>>>    	case VHOST_USER_SET_FEATURES:
>>>    	case VHOST_USER_SET_PROTOCOL_FEATURES:
>>>    	case VHOST_USER_SET_OWNER:
>>> @@ -1380,6 +1383,25 @@ vhost_user_msg_handler(int vid, int fd)
>>>    	}
>>>
>>>    	switch (msg.request.master) {
>>> +	case VHOST_USER_GET_CONFIG:
>>> +		if (dev->notify_ops->get_config(dev->vid,
>> Please check ->get_config is set before calling it.
>>
>>> +				msg.payload.config.region,
>>> +				msg.payload.config.size) != 0) {
>>> +			msg.size = sizeof(uint64_t);
>>> +		}
>>> +		send_vhost_reply(fd, &msg);
>>> +		break;
>>> +	case VHOST_USER_SET_CONFIG:
>>> +		if ((dev->notify_ops->set_config(dev->vid,
>> Ditto.
>>
>>> +				msg.payload.config.region,
>>> +				msg.payload.config.offset,
>>> +				msg.payload.config.size,
>>> +				msg.payload.config.flags)) != 0) {
>>> +			ret = 1;
>>> +		} else {
>>> +			ret = 0;
>>> +		}
>>
>> ret = dev->notify_ops->set_config instead?
>>> +		break;
>>>    	case VHOST_USER_GET_FEATURES:
>>>    		msg.payload.u64 = vhost_user_get_features(dev);
>>>    		msg.size = sizeof(msg.payload.u64);
>>> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
>>> index d4bd604..25cc026 100644
>>> --- a/lib/librte_vhost/vhost_user.h
>>> +++ b/lib/librte_vhost/vhost_user.h
>>> @@ -14,6 +14,11 @@
>>>
>>>    #define VHOST_MEMORY_MAX_NREGIONS 8
>>>
>>> +/*
>>> + * Maximum size of virtio device config space
>>> + */
>>> +#define VHOST_USER_MAX_CONFIG_SIZE 256
>>> +
>>>    #define VHOST_USER_PROTOCOL_F_MQ	0
>>>    #define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
>>>    #define VHOST_USER_PROTOCOL_F_RARP	2
>>
>> Shouldn't there be a protocol feature associated to these new messages?
>> Else how QEMU knows the backend supports it or not?
>>
>> I looked at QEMU code and indeed no protocol feature associated, that's
>> strange...
> Nice to have, for now not all the QEMU host driver need to get this configuration space from slave backend
> when getting start. This message can be used for migration of vhost-user devices.

So if QEMU sends this message but the DPDK version does not support it
yet, vhost_user_msg_handler() will return an error ("vhost read
incorrect message") and the socket will be closed.

How do we overcome this? I think we really need a spec update ASAP, 
before QEMU v2.12 is out (-rc1 already).
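
To make the idea of a protocol feature bit concrete, here is a minimal sketch
of how such a bit could gate the new messages. Everything in it is an
assumption for illustration: the name EXAMPLE_PROTOCOL_F_CONFIG, its bit
number, and both helper functions are hypothetical and not taken from QEMU,
DPDK, or the vhost-user specification as discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bit; the bit number is a placeholder. */
#define EXAMPLE_PROTOCOL_F_CONFIG 9

/*
 * Backend side: advertise the bit only when config callbacks exist, so an
 * older or callback-less backend never claims support.
 */
static uint64_t
backend_protocol_features(int has_config_ops)
{
	uint64_t features = 0;

	if (has_config_ops)
		features |= 1ULL << EXAMPLE_PROTOCOL_F_CONFIG;
	return features;
}

/*
 * Master side: send GET_CONFIG/SET_CONFIG only if the bit was negotiated,
 * instead of sending a request the backend may reject by closing the socket.
 */
static int
master_may_send_config(uint64_t negotiated_features)
{
	return (negotiated_features &
		(1ULL << EXAMPLE_PROTOCOL_F_CONFIG)) != 0;
}
```

With a check like this on the master side, a backend that does not know the
messages would simply never be sent them.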

Do you have time to take care of this?

Thanks,
Maxime

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  2018-03-28  9:11  3%   ` Maxime Coquelin
  2018-03-28  9:19  0%     ` Wodkowski, PawelX
@ 2018-03-28  9:50  0%     ` Liu, Changpeng
  2018-03-28  9:57  0%       ` Maxime Coquelin
  1 sibling, 1 reply; 200+ results
From: Liu, Changpeng @ 2018-03-28  9:50 UTC (permalink / raw)
  To: Maxime Coquelin, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, Wodkowski, PawelX, dev, Tan, Jianfeng



> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Wednesday, March 28, 2018 5:12 PM
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> <james.r.harris@intel.com>; Wodkowski, PawelX
> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Liu, Changpeng
> <changpeng.liu@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
> messages
> 
> 
> 
> On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
> > This patch adds new vhost user messages GET_CONFIG and SET_CONFIG used
> > for get/set virtio device's configuration space.
> >
> > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > ---
> > Changes in v2:
> >   - code cleanup
> >
> >   lib/librte_vhost/rte_vhost.h  |  4 ++++
> >   lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
> >   lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
> >   3 files changed, 42 insertions(+)
> >
> > diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> > index d332069..fe30518 100644
> > --- a/lib/librte_vhost/rte_vhost.h
> > +++ b/lib/librte_vhost/rte_vhost.h
> > @@ -84,6 +84,10 @@ struct vhost_device_ops {
> >   	int (*new_connection)(int vid);
> >   	void (*destroy_connection)(int vid);
> >
> > +	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
> > +	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
> > +			uint32_t len, uint32_t flags);
> > +
> >   	void *reserved[2]; /**< Reserved for future extension */
> 
> You are breaking the ABI, as you grow the size of the ops struct.
> 
> Also, I'm wondering if we shouldn't have a different ops for external
> backends. Here these ops are more intended to the application, we could
> have a specific ops struct for external backends IMHO.
> 
> >   };
> >
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index 90ed211..0ed6a5a 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -50,6 +50,8 @@ static const char *vhost_message_str[VHOST_USER_MAX]
> = {
> >   	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
> >   	[VHOST_USER_SET_SLAVE_REQ_FD]  =
> "VHOST_USER_SET_SLAVE_REQ_FD",
> >   	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> > +	[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG",
> > +	[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG",
> >   };
> >
> >   static uint64_t
> > @@ -1355,6 +1357,7 @@ vhost_user_msg_handler(int vid, int fd)
> >   	 * would cause a dead lock.
> >   	 */
> >   	switch (msg.request.master) {
> > +	case VHOST_USER_SET_CONFIG:
> 
> It seems VHOST_USER_GET_CONFIG is missing here.
> 
> >   	case VHOST_USER_SET_FEATURES:
> >   	case VHOST_USER_SET_PROTOCOL_FEATURES:
> >   	case VHOST_USER_SET_OWNER:
> > @@ -1380,6 +1383,25 @@ vhost_user_msg_handler(int vid, int fd)
> >   	}
> >
> >   	switch (msg.request.master) {
> > +	case VHOST_USER_GET_CONFIG:
> > +		if (dev->notify_ops->get_config(dev->vid,
> Please check ->get_config is set before calling it.
> 
> > +				msg.payload.config.region,
> > +				msg.payload.config.size) != 0) {
> > +			msg.size = sizeof(uint64_t);
> > +		}
> > +		send_vhost_reply(fd, &msg);
> > +		break;
> > +	case VHOST_USER_SET_CONFIG:
> > +		if ((dev->notify_ops->set_config(dev->vid,
> Ditto.
> 
> > +				msg.payload.config.region,
> > +				msg.payload.config.offset,
> > +				msg.payload.config.size,
> > +				msg.payload.config.flags)) != 0) {
> > +			ret = 1;
> > +		} else {
> > +			ret = 0;
> > +		}
> 
> ret = dev->notify_ops->set_config instead?
> > +		break;
> >   	case VHOST_USER_GET_FEATURES:
> >   		msg.payload.u64 = vhost_user_get_features(dev);
> >   		msg.size = sizeof(msg.payload.u64);
> > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> > index d4bd604..25cc026 100644
> > --- a/lib/librte_vhost/vhost_user.h
> > +++ b/lib/librte_vhost/vhost_user.h
> > @@ -14,6 +14,11 @@
> >
> >   #define VHOST_MEMORY_MAX_NREGIONS 8
> >
> > +/*
> > + * Maximum size of virtio device config space
> > + */
> > +#define VHOST_USER_MAX_CONFIG_SIZE 256
> > +
> >   #define VHOST_USER_PROTOCOL_F_MQ	0
> >   #define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
> >   #define VHOST_USER_PROTOCOL_F_RARP	2
> 
> Shouldn't there be a protocol feature associated to these new messages?
> Else how QEMU knows the backend supports it or not?
> 
> I looked at QEMU code and indeed no protocol feature associated, that's
> strange...
It would be nice to have. For now, not all QEMU host drivers need to get this configuration space from the slave backend
at startup. This message can be used for migration of vhost-user devices.
> 
> > @@ -52,12 +57,15 @@ typedef enum VhostUserRequest {
> >   	VHOST_USER_NET_SET_MTU = 20,
> >   	VHOST_USER_SET_SLAVE_REQ_FD = 21,
> >   	VHOST_USER_IOTLB_MSG = 22,
> > +	VHOST_USER_GET_CONFIG = 24,
> > +	VHOST_USER_SET_CONFIG = 25,
> >   	VHOST_USER_MAX
> >   } VhostUserRequest;
> >
> >   typedef enum VhostUserSlaveRequest {
> >   	VHOST_USER_SLAVE_NONE = 0,
> >   	VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > +	VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> >   	VHOST_USER_SLAVE_MAX
> >   } VhostUserSlaveRequest;
> >
> > @@ -79,6 +87,13 @@ typedef struct VhostUserLog {
> >   	uint64_t mmap_offset;
> >   } VhostUserLog;
> >
> > +typedef struct VhostUserConfig {
> > +	uint32_t offset;
> > +	uint32_t size;
> > +	uint32_t flags;
> > +	uint8_t region[VHOST_USER_MAX_CONFIG_SIZE];
> > +} VhostUserConfig;
> > +
> >   typedef struct VhostUserMsg {
> >   	union {
> >   		VhostUserRequest master;
> > @@ -98,6 +113,7 @@ typedef struct VhostUserMsg {
> >   		struct vhost_vring_addr addr;
> >   		VhostUserMemory memory;
> >   		VhostUserLog    log;
> > +		VhostUserConfig config;
> >   		struct vhost_iotlb_msg iotlb;
> >   	} payload;
> >   	int fds[VHOST_MEMORY_MAX_NREGIONS];
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  2018-03-28  9:33  0%       ` Maxime Coquelin
@ 2018-03-28  9:48  0%         ` Maxime Coquelin
  0 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2018-03-28  9:48 UTC (permalink / raw)
  To: Wodkowski, PawelX, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, dev, Liu, Changpeng, Tan, Jianfeng



On 03/28/2018 11:33 AM, Maxime Coquelin wrote:
> 
> 
> On 03/28/2018 11:19 AM, Wodkowski, PawelX wrote:
>>> -----Original Message-----
>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>> Sent: Wednesday, March 28, 2018 11:12 AM
>>> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>>> <james.r.harris@intel.com>; Wodkowski, PawelX
>>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Liu, Changpeng
>>> <changpeng.liu@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>>> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
>>> messages
>>>
>>>
>>>
>>> On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
>>>> This patch adds new vhost user messages GET_CONFIG and SET_CONFIG used
>>>> for get/set virtio device's configuration space.
>>>>
>>>> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
>>>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>>>> ---
>>>> Changes in v2:
>>>>    - code cleanup
>>>>
>>>>    lib/librte_vhost/rte_vhost.h  |  4 ++++
>>>>    lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
>>>>    lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
>>>>    3 files changed, 42 insertions(+)
>>>>
>>>> diff --git a/lib/librte_vhost/rte_vhost.h 
>>>> b/lib/librte_vhost/rte_vhost.h
>>>> index d332069..fe30518 100644
>>>> --- a/lib/librte_vhost/rte_vhost.h
>>>> +++ b/lib/librte_vhost/rte_vhost.h
>>>> @@ -84,6 +84,10 @@ struct vhost_device_ops {
>>>>        int (*new_connection)(int vid);
>>>>        void (*destroy_connection)(int vid);
>>>>
>>>> +    int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
>>>> +    int (*set_config)(int vid, uint8_t *config, uint32_t offset,
>>>> +            uint32_t len, uint32_t flags);
>>>> +
>>>>        void *reserved[2]; /**< Reserved for future extension */
>>>
>>> You are breaking the ABI, as you grow the size of the ops struct.
>>>
>>> Also, I'm wondering if we shouldn't have a different ops for external
>>> backends. Here these ops are more intended to the application, we could
>>> have a specific ops struct for external backends IMHO.
>>
>> What do mean by "external backends" ?
> 
> Libs like SPDK or Crypto that implements their own ring processing,
> comparing to an application like DPDK that doesn't care of rings
> details.

Sorry, I meant compared to an application like OVS*

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  2018-03-28  9:19  0%     ` Wodkowski, PawelX
@ 2018-03-28  9:33  0%       ` Maxime Coquelin
  2018-03-28  9:48  0%         ` Maxime Coquelin
  0 siblings, 1 reply; 200+ results
From: Maxime Coquelin @ 2018-03-28  9:33 UTC (permalink / raw)
  To: Wodkowski, PawelX, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, dev, Liu, Changpeng, Tan, Jianfeng



On 03/28/2018 11:19 AM, Wodkowski, PawelX wrote:
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Wednesday, March 28, 2018 11:12 AM
>> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
>> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
>> <james.r.harris@intel.com>; Wodkowski, PawelX
>> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Liu, Changpeng
>> <changpeng.liu@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
>> messages
>>
>>
>>
>> On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
>>> This patch adds new vhost user messages GET_CONFIG and SET_CONFIG used
>>> for get/set virtio device's configuration space.
>>>
>>> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
>>> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>>> ---
>>> Changes in v2:
>>>    - code cleanup
>>>
>>>    lib/librte_vhost/rte_vhost.h  |  4 ++++
>>>    lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
>>>    lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
>>>    3 files changed, 42 insertions(+)
>>>
>>> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
>>> index d332069..fe30518 100644
>>> --- a/lib/librte_vhost/rte_vhost.h
>>> +++ b/lib/librte_vhost/rte_vhost.h
>>> @@ -84,6 +84,10 @@ struct vhost_device_ops {
>>>    	int (*new_connection)(int vid);
>>>    	void (*destroy_connection)(int vid);
>>>
>>> +	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
>>> +	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
>>> +			uint32_t len, uint32_t flags);
>>> +
>>>    	void *reserved[2]; /**< Reserved for future extension */
>>
>> You are breaking the ABI, as you grow the size of the ops struct.
>>
>> Also, I'm wondering if we shouldn't have a different ops for external
>> backends. Here these ops are more intended to the application, we could
>> have a specific ops struct for external backends IMHO.
> 
> What do mean by "external backends" ?

Libs like SPDK or Crypto that implement their own ring processing,
compared to an application like DPDK that doesn't care about ring
details.
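
The ABI concern quoted above can be illustrated with a small sketch. The
structs below are hypothetical, cut-down copies containing only the members
visible in the quoted diff context (the real struct vhost_device_ops has
more), and the "reuse" variant is one possible alternative, not something
proposed in this thread. It assumes function pointers and void * have the
same size, which holds on the usual Linux targets but is not guaranteed by
the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down view of the ops table before the patch. */
struct ops_before {
	int (*new_connection)(int vid);
	void (*destroy_connection)(int vid);
	void *reserved[2];
};

/* As in the patch: callbacks added and reserved[2] kept, so the struct grows. */
struct ops_patched {
	int (*new_connection)(int vid);
	void (*destroy_connection)(int vid);
	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
			uint32_t len, uint32_t flags);
	void *reserved[2];
};

/* Hypothetical alternative: the new callbacks take over the reserved slots. */
struct ops_reuse {
	int (*new_connection)(int vid);
	void (*destroy_connection)(int vid);
	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
			uint32_t len, uint32_t flags);
};

/* Compile-time check: the patched layout is larger than the old one. */
static_assert(sizeof(struct ops_patched) > sizeof(struct ops_before),
	"patched ops struct grows");

/* Number of bytes by which the patched struct exceeds the old one. */
static size_t
ops_growth(void)
{
	return sizeof(struct ops_patched) - sizeof(struct ops_before);
}

/* True when the reuse variant keeps the old size and slot offsets. */
static int
reuse_keeps_layout(void)
{
	return sizeof(struct ops_reuse) == sizeof(struct ops_before) &&
		offsetof(struct ops_reuse, get_config) ==
		offsetof(struct ops_before, reserved);
}
```

Reusing the reserved slots would only be safe if existing applications
zero-initialize them, since the library would otherwise read stale values
as callbacks.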

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  2018-03-28  9:11  3%   ` Maxime Coquelin
@ 2018-03-28  9:19  0%     ` Wodkowski, PawelX
  2018-03-28  9:33  0%       ` Maxime Coquelin
  2018-03-28  9:50  0%     ` Liu, Changpeng
  1 sibling, 1 reply; 200+ results
From: Wodkowski, PawelX @ 2018-03-28  9:19 UTC (permalink / raw)
  To: Maxime Coquelin, Kulasek, TomaszX, yliu
  Cc: Verkamp, Daniel, Harris, James R, dev, Liu, Changpeng, Tan, Jianfeng

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Wednesday, March 28, 2018 11:12 AM
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>; yliu@fridaylinux.org
> Cc: Verkamp, Daniel <daniel.verkamp@intel.com>; Harris, James R
> <james.r.harris@intel.com>; Wodkowski, PawelX
> <pawelx.wodkowski@intel.com>; dev@dpdk.org; Liu, Changpeng
> <changpeng.liu@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space
> messages
> 
> 
> 
> On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
> > This patch adds new vhost user messages GET_CONFIG and SET_CONFIG used
> > for get/set virtio device's configuration space.
> >
> > Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > ---
> > Changes in v2:
> >   - code cleanup
> >
> >   lib/librte_vhost/rte_vhost.h  |  4 ++++
> >   lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
> >   lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
> >   3 files changed, 42 insertions(+)
> >
> > diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> > index d332069..fe30518 100644
> > --- a/lib/librte_vhost/rte_vhost.h
> > +++ b/lib/librte_vhost/rte_vhost.h
> > @@ -84,6 +84,10 @@ struct vhost_device_ops {
> >   	int (*new_connection)(int vid);
> >   	void (*destroy_connection)(int vid);
> >
> > +	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
> > +	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
> > +			uint32_t len, uint32_t flags);
> > +
> >   	void *reserved[2]; /**< Reserved for future extension */
> 
> You are breaking the ABI, as you grow the size of the ops struct.
> 
> Also, I'm wondering if we shouldn't have a different ops struct for
> external backends. These ops are intended more for the application; we
> could have a specific ops struct for external backends IMHO.

What do you mean by "external backends"?
> 
> >   };
> >
> > diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> > index 90ed211..0ed6a5a 100644
> > --- a/lib/librte_vhost/vhost_user.c
> > +++ b/lib/librte_vhost/vhost_user.c
> > @@ -50,6 +50,8 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
> >   	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
> >   	[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
> >   	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> > +	[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG",
> > +	[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG",
> >   };
> >
> >   static uint64_t
> > @@ -1355,6 +1357,7 @@ vhost_user_msg_handler(int vid, int fd)
> >   	 * would cause a dead lock.
> >   	 */
> >   	switch (msg.request.master) {
> > +	case VHOST_USER_SET_CONFIG:
> 
> It seems VHOST_USER_GET_CONFIG is missing here.
> 
> >   	case VHOST_USER_SET_FEATURES:
> >   	case VHOST_USER_SET_PROTOCOL_FEATURES:
> >   	case VHOST_USER_SET_OWNER:
> > @@ -1380,6 +1383,25 @@ vhost_user_msg_handler(int vid, int fd)
> >   	}
> >
> >   	switch (msg.request.master) {
> > +	case VHOST_USER_GET_CONFIG:
> > +		if (dev->notify_ops->get_config(dev->vid,
> Please check ->get_config is set before calling it.
> 
> > +				msg.payload.config.region,
> > +				msg.payload.config.size) != 0) {
> > +			msg.size = sizeof(uint64_t);
> > +		}
> > +		send_vhost_reply(fd, &msg);
> > +		break;
> > +	case VHOST_USER_SET_CONFIG:
> > +		if ((dev->notify_ops->set_config(dev->vid,
> Ditto.
> 
> > +				msg.payload.config.region,
> > +				msg.payload.config.offset,
> > +				msg.payload.config.size,
> > +				msg.payload.config.flags)) != 0) {
> > +			ret = 1;
> > +		} else {
> > +			ret = 0;
> > +		}
> 
> ret = dev->notify_ops->set_config instead?
> > +		break;
> >   	case VHOST_USER_GET_FEATURES:
> >   		msg.payload.u64 = vhost_user_get_features(dev);
> >   		msg.size = sizeof(msg.payload.u64);
> > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> > index d4bd604..25cc026 100644
> > --- a/lib/librte_vhost/vhost_user.h
> > +++ b/lib/librte_vhost/vhost_user.h
> > @@ -14,6 +14,11 @@
> >
> >   #define VHOST_MEMORY_MAX_NREGIONS 8
> >
> > +/*
> > + * Maximum size of virtio device config space
> > + */
> > +#define VHOST_USER_MAX_CONFIG_SIZE 256
> > +
> >   #define VHOST_USER_PROTOCOL_F_MQ	0
> >   #define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
> >   #define VHOST_USER_PROTOCOL_F_RARP	2
> 
> Shouldn't there be a protocol feature associated with these new messages?
> Otherwise, how does QEMU know whether the backend supports them?
> 
> I looked at the QEMU code and indeed there is no protocol feature
> associated with them, which is strange...
> 
> > @@ -52,12 +57,15 @@ typedef enum VhostUserRequest {
> >   	VHOST_USER_NET_SET_MTU = 20,
> >   	VHOST_USER_SET_SLAVE_REQ_FD = 21,
> >   	VHOST_USER_IOTLB_MSG = 22,
> > +	VHOST_USER_GET_CONFIG = 24,
> > +	VHOST_USER_SET_CONFIG = 25,
> >   	VHOST_USER_MAX
> >   } VhostUserRequest;
> >
> >   typedef enum VhostUserSlaveRequest {
> >   	VHOST_USER_SLAVE_NONE = 0,
> >   	VHOST_USER_SLAVE_IOTLB_MSG = 1,
> > +	VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
> >   	VHOST_USER_SLAVE_MAX
> >   } VhostUserSlaveRequest;
> >
> > @@ -79,6 +87,13 @@ typedef struct VhostUserLog {
> >   	uint64_t mmap_offset;
> >   } VhostUserLog;
> >
> > +typedef struct VhostUserConfig {
> > +	uint32_t offset;
> > +	uint32_t size;
> > +	uint32_t flags;
> > +	uint8_t region[VHOST_USER_MAX_CONFIG_SIZE];
> > +} VhostUserConfig;
> > +
> >   typedef struct VhostUserMsg {
> >   	union {
> >   		VhostUserRequest master;
> > @@ -98,6 +113,7 @@ typedef struct VhostUserMsg {
> >   		struct vhost_vring_addr addr;
> >   		VhostUserMemory memory;
> >   		VhostUserLog    log;
> > +		VhostUserConfig config;
> >   		struct vhost_iotlb_msg iotlb;
> >   	} payload;
> >   	int fds[VHOST_MEMORY_MAX_NREGIONS];
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] vhost: add virtio configuration space messages
  @ 2018-03-28  9:11  3%   ` Maxime Coquelin
  2018-03-28  9:19  0%     ` Wodkowski, PawelX
  2018-03-28  9:50  0%     ` Liu, Changpeng
  0 siblings, 2 replies; 200+ results
From: Maxime Coquelin @ 2018-03-28  9:11 UTC (permalink / raw)
  To: Tomasz Kulasek, yliu
  Cc: daniel.verkamp, james.r.harris, pawelx.wodkowski, dev,
	Changpeng Liu, Jianfeng Tan



On 03/27/2018 05:35 PM, Tomasz Kulasek wrote:
> This patch adds new vhost user messages GET_CONFIG and SET_CONFIG used
> for get/set virtio device's configuration space.
> 
> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
> Changes in v2:
>   - code cleanup
> 
>   lib/librte_vhost/rte_vhost.h  |  4 ++++
>   lib/librte_vhost/vhost_user.c | 22 ++++++++++++++++++++++
>   lib/librte_vhost/vhost_user.h | 16 ++++++++++++++++
>   3 files changed, 42 insertions(+)
> 
> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
> index d332069..fe30518 100644
> --- a/lib/librte_vhost/rte_vhost.h
> +++ b/lib/librte_vhost/rte_vhost.h
> @@ -84,6 +84,10 @@ struct vhost_device_ops {
>   	int (*new_connection)(int vid);
>   	void (*destroy_connection)(int vid);
>   
> +	int (*get_config)(int vid, uint8_t *config, uint32_t config_len);
> +	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
> +			uint32_t len, uint32_t flags);
> +
>   	void *reserved[2]; /**< Reserved for future extension */

You are breaking the ABI, as you grow the size of the ops struct.

Also, I'm wondering if we shouldn't have a different ops struct for
external backends. These ops are intended more for the application; we
could have a specific ops struct for external backends IMHO.

>   };
>   
> diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
> index 90ed211..0ed6a5a 100644
> --- a/lib/librte_vhost/vhost_user.c
> +++ b/lib/librte_vhost/vhost_user.c
> @@ -50,6 +50,8 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
>   	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
>   	[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
>   	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
> +	[VHOST_USER_GET_CONFIG] = "VHOST_USER_GET_CONFIG",
> +	[VHOST_USER_SET_CONFIG] = "VHOST_USER_SET_CONFIG",
>   };
>   
>   static uint64_t
> @@ -1355,6 +1357,7 @@ vhost_user_msg_handler(int vid, int fd)
>   	 * would cause a dead lock.
>   	 */
>   	switch (msg.request.master) {
> +	case VHOST_USER_SET_CONFIG:

It seems VHOST_USER_GET_CONFIG is missing here.

>   	case VHOST_USER_SET_FEATURES:
>   	case VHOST_USER_SET_PROTOCOL_FEATURES:
>   	case VHOST_USER_SET_OWNER:
> @@ -1380,6 +1383,25 @@ vhost_user_msg_handler(int vid, int fd)
>   	}
>   
>   	switch (msg.request.master) {
> +	case VHOST_USER_GET_CONFIG:
> +		if (dev->notify_ops->get_config(dev->vid,
Please check ->get_config is set before calling it.

> +				msg.payload.config.region,
> +				msg.payload.config.size) != 0) {
> +			msg.size = sizeof(uint64_t);
> +		}
> +		send_vhost_reply(fd, &msg);
> +		break;
> +	case VHOST_USER_SET_CONFIG:
> +		if ((dev->notify_ops->set_config(dev->vid,
Ditto.

> +				msg.payload.config.region,
> +				msg.payload.config.offset,
> +				msg.payload.config.size,
> +				msg.payload.config.flags)) != 0) {
> +			ret = 1;
> +		} else {
> +			ret = 0;
> +		}

ret = dev->notify_ops->set_config instead?
> +		break;
>   	case VHOST_USER_GET_FEATURES:
>   		msg.payload.u64 = vhost_user_get_features(dev);
>   		msg.size = sizeof(msg.payload.u64);
> diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> index d4bd604..25cc026 100644
> --- a/lib/librte_vhost/vhost_user.h
> +++ b/lib/librte_vhost/vhost_user.h
> @@ -14,6 +14,11 @@
>   
>   #define VHOST_MEMORY_MAX_NREGIONS 8
>   
> +/*
> + * Maximum size of virtio device config space
> + */
> +#define VHOST_USER_MAX_CONFIG_SIZE 256
> +
>   #define VHOST_USER_PROTOCOL_F_MQ	0
>   #define VHOST_USER_PROTOCOL_F_LOG_SHMFD	1
>   #define VHOST_USER_PROTOCOL_F_RARP	2

Shouldn't there be a protocol feature associated with these new messages?
Otherwise, how does QEMU know whether the backend supports them?

I looked at the QEMU code and indeed there is no protocol feature
associated with them, which is strange...
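
As an illustration of the kind of gating meant here, a minimal sketch of
feature-bit negotiation follows. EXAMPLE_PROTOCOL_F_CONFIG and its bit
number are hypothetical and chosen only for the example, as are the two
helper functions; nothing here is taken from the vhost-user specification
or from QEMU.

```c
#include <stdint.h>

/* Hypothetical feature bit, for illustration only. */
#define EXAMPLE_PROTOCOL_F_CONFIG 9

/* Backend side: add a bit to the set of advertised protocol features. */
static uint64_t example_advertise(uint64_t features, unsigned int bit)
{
	return features | (UINT64_C(1) << bit);
}

/* Master side: send the new messages only if the bit was negotiated. */
static int example_feature_negotiated(uint64_t negotiated, unsigned int bit)
{
	return (int)((negotiated >> bit) & 1);
}
```

With such a bit, a master would skip GET_CONFIG/SET_CONFIG for backends
that never advertised it instead of sending a request they cannot handle.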

> @@ -52,12 +57,15 @@ typedef enum VhostUserRequest {
>   	VHOST_USER_NET_SET_MTU = 20,
>   	VHOST_USER_SET_SLAVE_REQ_FD = 21,
>   	VHOST_USER_IOTLB_MSG = 22,
> +	VHOST_USER_GET_CONFIG = 24,
> +	VHOST_USER_SET_CONFIG = 25,
>   	VHOST_USER_MAX
>   } VhostUserRequest;
>   
>   typedef enum VhostUserSlaveRequest {
>   	VHOST_USER_SLAVE_NONE = 0,
>   	VHOST_USER_SLAVE_IOTLB_MSG = 1,
> +	VHOST_USER_SLAVE_CONFIG_CHANGE_MSG = 2,
>   	VHOST_USER_SLAVE_MAX
>   } VhostUserSlaveRequest;
>   
> @@ -79,6 +87,13 @@ typedef struct VhostUserLog {
>   	uint64_t mmap_offset;
>   } VhostUserLog;
>   
> +typedef struct VhostUserConfig {
> +	uint32_t offset;
> +	uint32_t size;
> +	uint32_t flags;
> +	uint8_t region[VHOST_USER_MAX_CONFIG_SIZE];
> +} VhostUserConfig;
> +
>   typedef struct VhostUserMsg {
>   	union {
>   		VhostUserRequest master;
> @@ -98,6 +113,7 @@ typedef struct VhostUserMsg {
>   		struct vhost_vring_addr addr;
>   		VhostUserMemory memory;
>   		VhostUserLog    log;
> +		VhostUserConfig config;
>   		struct vhost_iotlb_msg iotlb;
>   	} payload;
>   	int fds[VHOST_MEMORY_MAX_NREGIONS];
>
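
The two callback comments above (check that the pointer is set before
calling it, and assign the result directly) could be combined roughly as
follows. This is a sketch with hypothetical stand-in names (example_ops,
example_handle_set_config and the two sample callbacks), not the vhost
library code; it keeps the patch's convention that ret is 0 on success and
1 on failure.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the callback table; only the member needed here. */
struct example_ops {
	int (*set_config)(int vid, uint8_t *config, uint32_t offset,
			uint32_t len, uint32_t flags);
};

/* Returns 0 on success, 1 if the callback is missing or reports failure. */
static int example_handle_set_config(const struct example_ops *ops, int vid,
		uint8_t *region, uint32_t offset, uint32_t size,
		uint32_t flags)
{
	if (ops == NULL || ops->set_config == NULL)
		return 1;

	return ops->set_config(vid, region, offset, size, flags) != 0;
}

/* Sample callbacks used to exercise the handler. */
static int example_set_ok(int vid, uint8_t *config, uint32_t offset,
		uint32_t len, uint32_t flags)
{
	(void)vid; (void)config; (void)offset; (void)len; (void)flags;
	return 0;
}

static int example_set_fail(int vid, uint8_t *config, uint32_t offset,
		uint32_t len, uint32_t flags)
{
	(void)vid; (void)config; (void)offset; (void)len; (void)flags;
	return -1;
}
```

An application that leaves set_config unset then gets a clean error reply
rather than a NULL pointer dereference in the message handler.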

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev
  2018-03-27 17:40  1% [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev Ferruh Yigit
@ 2018-03-28  7:04  0% ` Shreyansh Jain
  2018-03-28 13:11  0% ` Legacy, Allain
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-03-28  7:04 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Wenzhuo Lu, Jingjing Wu, John W. Linville, Shepard Siegel,
	Ed Czeck, John Miller, Allain Legacy, Matt Peters, Harish Patil,
	Rasesh Mody, Ajit Khaparde, Somnath Kotur, Rahul Lakkireddy,
	Hemant Agrawal, Shreyansh Jain, Marcin Wojtas, Michal Krawczyk,
	Guy Tzalik, Evgeny Schemeilin, John Daley, Hyong Youb Kim,
	Qi Zhang, Xiao Wang, Beilei Xing, Konstantin Ananyev,
	Shijith Thotton, Srisivasubramanian Srinivasan, Adrien Mazarguil,
	Nelio Laranjeiro, Yongseok Koh, Jacek Siuda, Tomasz Duszynski,
	Dmitri Epshtein, Natalie Samsonov, Jianbo Liu, Alejandro Lucero,
	Tetsuya Mukawa, Santosh Shukla, Jerin Jacob, Shahed Shaikh,
	Bruce Richardson, Andrew Rybchenko, Matej Vido, Pascal Mazon,
	Maciej Czekaj, Maxime Coquelin, Tiwei Bie, Shrikrishna Khare,
	Remy Horton, Ori Kam, Pablo de Lara, Radu Nicolau, Akhil Goyal,
	Tomasz Kantecki, Cristian Dumitrescu, Thomas Monjalon, dev

On 3/27/2018 11:10 PM, Ferruh Yigit wrote:
> The public struct rte_eth_dev_info has a "struct rte_pci_device" field,
> although the struct is common to all ethdevs on all buses.
> 
> Replace the PCI-specific struct with the generic device struct, and update
> the places that use the PCI device so that they get this information from
> the generic device.
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
> Cc: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> 
> No deprecation notice was sent for this update, but the ethdev info struct
> has already been updated and the ABI already broken in this release, so
> this can be a good opportunity for this update.
> ---
>   app/test-pmd/config.c                     | 11 ++++++++++-
>   app/test-pmd/testpmd.h                    | 24 ++++++++++++++++++------
>   drivers/net/af_packet/rte_eth_af_packet.c |  1 +
>   drivers/net/ark/ark_ethdev.c              |  4 +++-
>   drivers/net/avf/avf_ethdev.c              |  2 +-
>   drivers/net/avp/avp_ethdev.c              |  2 +-
>   drivers/net/bnx2x/bnx2x_ethdev.c          |  2 +-
>   drivers/net/bnxt/bnxt_ethdev.c            |  2 +-
>   drivers/net/cxgbe/cxgbe_ethdev.c          |  2 +-
>   drivers/net/dpaa/dpaa_ethdev.c            |  1 +
>   drivers/net/dpaa2/dpaa2_ethdev.c          |  1 +
>   drivers/net/e1000/em_ethdev.c             |  2 +-
>   drivers/net/e1000/igb_ethdev.c            |  4 ++--
>   drivers/net/ena/ena_ethdev.c              |  2 +-
>   drivers/net/enic/enic_ethdev.c            |  2 +-
>   drivers/net/fm10k/fm10k_ethdev.c          |  2 +-
>   drivers/net/i40e/i40e_ethdev.c            |  2 +-
>   drivers/net/i40e/i40e_ethdev_vf.c         |  2 +-
>   drivers/net/ixgbe/ixgbe_ethdev.c          |  4 ++--
>   drivers/net/kni/rte_eth_kni.c             |  2 +-
>   drivers/net/liquidio/lio_ethdev.c         |  2 +-
>   drivers/net/mlx4/mlx4_ethdev.c            |  2 +-
>   drivers/net/mlx5/mlx5_ethdev.c            |  2 +-
>   drivers/net/mrvl/mrvl_ethdev.c            |  2 ++
>   drivers/net/nfp/nfp_net.c                 |  2 +-
>   drivers/net/null/rte_eth_null.c           |  1 +
>   drivers/net/octeontx/octeontx_ethdev.c    |  2 +-
>   drivers/net/pcap/rte_eth_pcap.c           |  1 +
>   drivers/net/qede/qede_ethdev.c            |  2 +-
>   drivers/net/ring/rte_eth_ring.c           |  1 +
>   drivers/net/sfc/sfc_ethdev.c              |  2 +-
>   drivers/net/szedata2/rte_eth_szedata2.c   |  2 +-
>   drivers/net/tap/rte_eth_tap.c             |  2 +-
>   drivers/net/thunderx/nicvf_ethdev.c       |  2 +-
>   drivers/net/virtio/virtio_ethdev.c        |  2 +-
>   drivers/net/vmxnet3/vmxnet3_ethdev.c      |  2 +-
>   examples/ethtool/lib/rte_ethtool.c        | 15 +++++++++------
>   examples/ip_pipeline/init.c               | 10 ++++++++--
>   examples/kni/main.c                       | 10 +++++++---
>   lib/librte_ether/rte_ethdev.h             |  2 +-
>   test/test/test_kni.c                      | 28 ++++++++++++++++++++++------
>   41 files changed, 114 insertions(+), 54 deletions(-)
> 

[...]

> diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
> index 781d75cc2..ec3a024c6 100644
> --- a/drivers/net/cxgbe/cxgbe_ethdev.c
> +++ b/drivers/net/cxgbe/cxgbe_ethdev.c
> @@ -148,7 +148,7 @@ static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
>   		.nb_align = 1,
>   	};
>   
> -	device_info->pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
> +	device_info->device = eth_dev->device;
>   
>   	device_info->min_rx_bufsize = CXGBE_MIN_RX_BUFSIZE;
>   	device_info->max_rx_pktlen = CXGBE_MAX_RX_PKTLEN;
> diff --git a/drivers/net/dpaa/dpaa_ethdev.c b/drivers/net/dpaa/dpaa_ethdev.c
> index db493648a..29513158f 100644
> --- a/drivers/net/dpaa/dpaa_ethdev.c
> +++ b/drivers/net/dpaa/dpaa_ethdev.c
> @@ -245,6 +245,7 @@ static void dpaa_eth_dev_info(struct rte_eth_dev *dev,
>   
>   	PMD_INIT_FUNC_TRACE();
>   
> +	dev_info->device = dev->device;
>   	dev_info->max_rx_queues = dpaa_intf->nb_rx_queues;
>   	dev_info->max_tx_queues = dpaa_intf->nb_tx_queues;
>   	dev_info->min_rx_bufsize = DPAA_MIN_RX_BUF_SIZE;
> diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
> index 2fb7b2da7..7802067e8 100644
> --- a/drivers/net/dpaa2/dpaa2_ethdev.c
> +++ b/drivers/net/dpaa2/dpaa2_ethdev.c
> @@ -163,6 +163,7 @@ dpaa2_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
>   
>   	dev_info->if_index = priv->hw_id;
>   
> +	dev_info->device = dev->device;
>   	dev_info->max_mac_addrs = priv->max_mac_filters;
>   	dev_info->max_rx_pktlen = DPAA2_MAX_RX_PKT_LEN;
>   	dev_info->min_rx_bufsize = DPAA2_MIN_RX_BUF_SIZE;

[...]

For dpaa and dpaa2 specific change...

> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index ab1030d42..0ed903966 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -995,7 +995,7 @@ struct rte_pci_device;
>    * Ethernet device information
>    */
>   struct rte_eth_dev_info {
> -	struct rte_pci_device *pci_dev; /**< Device PCI information. */
> +	struct rte_device *device; /** Generic device information */
>   	const char *driver_name; /**< Device Driver name. */
>   	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
>   		Use if_indextoname() to translate into an interface name. */

[...]

And for the above change:

Acked-By: Shreyansh Jain <shreyansh.jain@nxp.com>

_
Shreyansh

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 0/4] ethdev: add per-PMD tuning of RxTx parmeters
  2018-03-21 14:27  3% ` [dpdk-dev] [PATCH v2 " Remy Horton
@ 2018-03-27 18:43  0%   ` Ferruh Yigit
  2018-03-30 10:34  0%     ` Ferruh Yigit
  2018-04-04 17:17  3%   ` [dpdk-dev] [PATCH v3 " Remy Horton
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-27 18:43 UTC (permalink / raw)
  To: Remy Horton, dev
  Cc: John McNamara, Wenzhuo Lu, Jingjing Wu, Qi Zhang, Beilei Xing,
	Shreyansh Jain, Thomas Monjalon

On 3/21/2018 2:27 PM, Remy Horton wrote:
> The optimal values of several transmission & reception related parameters,
> such as burst sizes, descriptor ring sizes, and number of queues, vary
> between different network interface devices. This patchset allows individual
> PMDs to specify their preferred parameter values, and if so indicated by an
> application, for them to be used automatically by the ethdev layer.
> 
> rte_eth_dev_configure() has been changed so that specifying zero for both
> nb_rx_q AND nb_tx_q causes it to use driver preferred values, and if these
> are not available, falls back to EAL defaults. Setting one (but not both)
> to zero does not cause the use of defaults, as having one of them zeroed is
> a valid setup.
> 
> This RFC/V1 includes per-PMD values for e1000 and i40e but it is expected
> that subsequent patchsets will cover other PMDs. A deprecation notice
> covering the API/ABI change is in place.
> 
> 
> Changes in v2:
> * Rebased to 
> * Removed fallback values from rte_eth_dev_info_get()
> * Added fallback values to rte_eth_[rt]x_queue_setup()
> * Added fallback values to rte_eth_dev_configure()
> * Corrected comment
> * Removed deprecation notice
> * Split Rx and Tx into separate structures
> * Changed parameter names
> 
> 
> Remy Horton (4):
>   ethdev: add support for PMD-tuned Tx/Rx parameters
>   net/e1000: add TxRx tuning parameters
>   net/i40e: add TxRx tuning parameters
>   testpmd: make use of per-PMD TxRx parameters

Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
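
A rough sketch of the queue-count rule described in the cover letter
follows. All names are hypothetical, and the fallback values are
placeholders, since the actual defaults are not stated in this thread.

```c
#include <stdint.h>

/* Placeholder fallback values; the real defaults are not given here. */
#define EXAMPLE_FALLBACK_RX_QUEUES 1
#define EXAMPLE_FALLBACK_TX_QUEUES 1

struct example_queue_counts {
	uint16_t nb_rx_q;
	uint16_t nb_tx_q;
};

/*
 * Driver-preferred values (pmd_rx, pmd_tx) are used only when the
 * application passes zero for BOTH counts; a preferred value of zero
 * means "none supplied" and selects the fallback. If only one count is
 * zero, the request is kept as-is because that is a valid setup.
 */
static struct example_queue_counts
example_resolve_queue_counts(uint16_t nb_rx_q, uint16_t nb_tx_q,
		uint16_t pmd_rx, uint16_t pmd_tx)
{
	struct example_queue_counts out = { nb_rx_q, nb_tx_q };

	if (nb_rx_q == 0 && nb_tx_q == 0) {
		out.nb_rx_q = pmd_rx != 0 ? pmd_rx : EXAMPLE_FALLBACK_RX_QUEUES;
		out.nb_tx_q = pmd_tx != 0 ? pmd_tx : EXAMPLE_FALLBACK_TX_QUEUES;
	}
	return out;
}
```

Requiring both counts to be zero keeps existing Rx-only or Tx-only
configurations working unchanged.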

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev
@ 2018-03-27 17:40  1% Ferruh Yigit
  2018-03-28  7:04  0% ` Shreyansh Jain
                   ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Ferruh Yigit @ 2018-03-27 17:40 UTC (permalink / raw)
  To: Wenzhuo Lu, Jingjing Wu, John W. Linville, Shepard Siegel,
	Ed Czeck, John Miller, Allain Legacy, Matt Peters, Harish Patil,
	Rasesh Mody, Ajit Khaparde, Somnath Kotur, Rahul Lakkireddy,
	Hemant Agrawal, Shreyansh Jain, Marcin Wojtas, Michal Krawczyk,
	Guy Tzalik, Evgeny Schemeilin, John Daley, Hyong Youb Kim,
	Qi Zhang, Xiao Wang, Beilei Xing, Konstantin Ananyev,
	Shijith Thotton, Srisivasubramanian Srinivasan, Adrien Mazarguil,
	Nelio Laranjeiro, Yongseok Koh, Jacek Siuda, Tomasz Duszynski,
	Dmitri Epshtein, Natalie Samsonov, Jianbo Liu, Alejandro Lucero,
	Tetsuya Mukawa, Santosh Shukla, Jerin Jacob, Shahed Shaikh,
	Bruce Richardson, Andrew Rybchenko, Matej Vido, Pascal Mazon,
	Maciej Czekaj, Maxime Coquelin, Tiwei Bie, Shrikrishna Khare,
	Remy Horton, Ori Kam, Pablo de Lara, Radu Nicolau, Akhil Goyal,
	Tomasz Kantecki, Cristian Dumitrescu, Thomas Monjalon
  Cc: dev, Ferruh Yigit

The public struct rte_eth_dev_info has a "struct rte_pci_device" field,
although the struct is common to all ethdevs on all buses.

Replace the PCI-specific struct with the generic device struct, and update
the places that use the PCI device so that they get this information from
the generic device.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
Cc: Pablo de Lara <pablo.de.lara.guarch@intel.com>

No deprecation notice was sent for this update, but the ethdev info struct
has already been updated and the ABI already broken in this release, so
this can be a good opportunity for this update.
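
For applications that still need PCI details, the pattern used in the
testpmd changes below can be sketched with simplified stand-in types. The
example_* names are hypothetical, and the bus pointer stored inside
example_device is a simplification for the sketch; the patch itself looks
the bus up with rte_bus_find_by_device() and converts with
RTE_DEV_TO_PCI().

```c
#include <stddef.h>
#include <string.h>

struct example_bus {
	const char *name;
};

/* Generic device; the bus pointer is a simplification for this sketch. */
struct example_device {
	const struct example_bus *bus;
};

struct example_pci_device {
	unsigned long bar0_len;       /* stand-in for mem_resource[0].len */
	struct example_device device; /* embedded generic device */
};

/* Returns the enclosing PCI device, or NULL if dev is not on the PCI bus. */
static struct example_pci_device *
example_dev_to_pci(struct example_device *dev)
{
	if (dev == NULL || dev->bus == NULL ||
	    strcmp(dev->bus->name, "pci") != 0)
		return NULL;

	/* container_of: step back from the member to the enclosing struct. */
	return (struct example_pci_device *)
		((char *)dev - offsetof(struct example_pci_device, device));
}

/* Returns 1 for a non-PCI device, a misaligned or out-of-range offset. */
static int
example_reg_off_is_invalid(struct example_device *dev, unsigned long reg_off)
{
	const struct example_pci_device *pci_dev = example_dev_to_pci(dev);

	if (pci_dev == NULL)
		return 1;
	if (reg_off & 0x3)
		return 1;
	return reg_off >= pci_dev->bar0_len;
}
```

The bus check matters because the pointer arithmetic is only valid when
the generic device really is embedded in a PCI device.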
---
 app/test-pmd/config.c                     | 11 ++++++++++-
 app/test-pmd/testpmd.h                    | 24 ++++++++++++++++++------
 drivers/net/af_packet/rte_eth_af_packet.c |  1 +
 drivers/net/ark/ark_ethdev.c              |  4 +++-
 drivers/net/avf/avf_ethdev.c              |  2 +-
 drivers/net/avp/avp_ethdev.c              |  2 +-
 drivers/net/bnx2x/bnx2x_ethdev.c          |  2 +-
 drivers/net/bnxt/bnxt_ethdev.c            |  2 +-
 drivers/net/cxgbe/cxgbe_ethdev.c          |  2 +-
 drivers/net/dpaa/dpaa_ethdev.c            |  1 +
 drivers/net/dpaa2/dpaa2_ethdev.c          |  1 +
 drivers/net/e1000/em_ethdev.c             |  2 +-
 drivers/net/e1000/igb_ethdev.c            |  4 ++--
 drivers/net/ena/ena_ethdev.c              |  2 +-
 drivers/net/enic/enic_ethdev.c            |  2 +-
 drivers/net/fm10k/fm10k_ethdev.c          |  2 +-
 drivers/net/i40e/i40e_ethdev.c            |  2 +-
 drivers/net/i40e/i40e_ethdev_vf.c         |  2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c          |  4 ++--
 drivers/net/kni/rte_eth_kni.c             |  2 +-
 drivers/net/liquidio/lio_ethdev.c         |  2 +-
 drivers/net/mlx4/mlx4_ethdev.c            |  2 +-
 drivers/net/mlx5/mlx5_ethdev.c            |  2 +-
 drivers/net/mrvl/mrvl_ethdev.c            |  2 ++
 drivers/net/nfp/nfp_net.c                 |  2 +-
 drivers/net/null/rte_eth_null.c           |  1 +
 drivers/net/octeontx/octeontx_ethdev.c    |  2 +-
 drivers/net/pcap/rte_eth_pcap.c           |  1 +
 drivers/net/qede/qede_ethdev.c            |  2 +-
 drivers/net/ring/rte_eth_ring.c           |  1 +
 drivers/net/sfc/sfc_ethdev.c              |  2 +-
 drivers/net/szedata2/rte_eth_szedata2.c   |  2 +-
 drivers/net/tap/rte_eth_tap.c             |  2 +-
 drivers/net/thunderx/nicvf_ethdev.c       |  2 +-
 drivers/net/virtio/virtio_ethdev.c        |  2 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c      |  2 +-
 examples/ethtool/lib/rte_ethtool.c        | 15 +++++++++------
 examples/ip_pipeline/init.c               | 10 ++++++++--
 examples/kni/main.c                       | 10 +++++++---
 lib/librte_ether/rte_ethdev.h             |  2 +-
 test/test/test_kni.c                      | 28 ++++++++++++++++++++++------
 41 files changed, 114 insertions(+), 54 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 51f725865..e5578a472 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -754,6 +754,8 @@ vlan_id_is_invalid(uint16_t vlan_id)
 static int
 port_reg_off_is_invalid(portid_t port_id, uint32_t reg_off)
 {
+	const struct rte_pci_device *pci_dev;
+	const struct rte_bus *bus;
 	uint64_t pci_len;
 
 	if (reg_off & 0x3) {
@@ -762,7 +764,14 @@ port_reg_off_is_invalid(portid_t port_id, uint32_t reg_off)
 		       (unsigned)reg_off);
 		return 1;
 	}
-	pci_len = ports[port_id].dev_info.pci_dev->mem_resource[0].len;
+
+	bus = rte_bus_find_by_device(ports[port_id].dev_info.device);
+	if (bus && !strcmp(bus->name, "pci"))
+		pci_dev = RTE_DEV_TO_PCI(ports[port_id].dev_info.device);
+	else
+		return 1;
+
+	pci_len = pci_dev->mem_resource[0].len;
 	if (reg_off >= pci_len) {
 		printf("Port %d: register offset %u (0x%X) out of port PCI "
 		       "resource (length=%"PRIu64")\n",
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 153abea05..a58fb4e70 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -500,12 +500,18 @@ mbuf_pool_find(unsigned int sock_id)
 static inline uint32_t
 port_pci_reg_read(struct rte_port *port, uint32_t reg_off)
 {
+	const struct rte_pci_device *pci_dev;
+	const struct rte_bus *bus;
 	void *reg_addr;
 	uint32_t reg_v;
 
-	reg_addr = (void *)
-		((char *)port->dev_info.pci_dev->mem_resource[0].addr +
-			reg_off);
+	bus = rte_bus_find_by_device(port->dev_info.device);
+	if (bus && !strcmp(bus->name, "pci"))
+		pci_dev = RTE_DEV_TO_PCI(port->dev_info.device);
+	else
+		return 0;
+
+	reg_addr = ((char *)pci_dev->mem_resource[0].addr + reg_off);
 	reg_v = *((volatile uint32_t *)reg_addr);
 	return rte_le_to_cpu_32(reg_v);
 }
@@ -516,11 +522,17 @@ port_pci_reg_read(struct rte_port *port, uint32_t reg_off)
 static inline void
 port_pci_reg_write(struct rte_port *port, uint32_t reg_off, uint32_t reg_v)
 {
+	const struct rte_pci_device *pci_dev;
+	const struct rte_bus *bus;
 	void *reg_addr;
 
-	reg_addr = (void *)
-		((char *)port->dev_info.pci_dev->mem_resource[0].addr +
-			reg_off);
+	bus = rte_bus_find_by_device(port->dev_info.device);
+	if (bus && !strcmp(bus->name, "pci"))
+		pci_dev = RTE_DEV_TO_PCI(port->dev_info.device);
+	else
+		return;
+
+	reg_addr = ((char *)pci_dev->mem_resource[0].addr + reg_off);
 	*((volatile uint32_t *)reg_addr) = rte_cpu_to_le_32(reg_v);
 }
 
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 57eccfd04..2dc5cf527 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -293,6 +293,7 @@ eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
 	struct pmd_internals *internals = dev->data->dev_private;
 
+	dev_info->device = dev->device;
 	dev_info->if_index = internals->if_index;
 	dev_info->max_mac_addrs = 1;
 	dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
index ff87c20e2..3d5412879 100644
--- a/drivers/net/ark/ark_ethdev.c
+++ b/drivers/net/ark/ark_ethdev.c
@@ -748,6 +748,8 @@ eth_ark_dev_info_get(struct rte_eth_dev *dev,
 	struct ark_mpu_t *rx_mpu = RTE_PTR_ADD(ark->bar0, ARK_MPU_RX_BASE);
 	uint16_t ports = ark->num_ports;
 
+	dev_info->device = dev->device;
+
 	dev_info->max_rx_pktlen = ARK_RX_MAX_PKT_LEN;
 	dev_info->min_rx_bufsize = ARK_RX_MIN_BUFSIZE;
 
@@ -771,7 +773,7 @@ eth_ark_dev_info_get(struct rte_eth_dev *dev,
 				ETH_LINK_SPEED_40G |
 				ETH_LINK_SPEED_50G |
 				ETH_LINK_SPEED_100G);
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 }
 
 static int
diff --git a/drivers/net/avf/avf_ethdev.c b/drivers/net/avf/avf_ethdev.c
index 4442c3cd8..81743c879 100644
--- a/drivers/net/avf/avf_ethdev.c
+++ b/drivers/net/avf/avf_ethdev.c
@@ -507,7 +507,7 @@ avf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	struct avf_info *vf = AVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
 
 	memset(dev_info, 0, sizeof(*dev_info));
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 	dev_info->max_rx_queues = vf->vsi_res->num_queue_pairs;
 	dev_info->max_tx_queues = vf->vsi_res->num_queue_pairs;
 	dev_info->min_rx_bufsize = AVF_BUF_SIZE_MIN;
diff --git a/drivers/net/avp/avp_ethdev.c b/drivers/net/avp/avp_ethdev.c
index dba99120f..0c1d8ae5d 100644
--- a/drivers/net/avp/avp_ethdev.c
+++ b/drivers/net/avp/avp_ethdev.c
@@ -2206,7 +2206,7 @@ avp_dev_info_get(struct rte_eth_dev *eth_dev,
 {
 	struct avp_dev *avp = AVP_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
+	dev_info->device = eth_dev->device;
 	dev_info->max_rx_queues = avp->max_rx_queues;
 	dev_info->max_tx_queues = avp->max_tx_queues;
 	dev_info->min_rx_bufsize = AVP_MIN_RX_BUFSIZE;
diff --git a/drivers/net/bnx2x/bnx2x_ethdev.c b/drivers/net/bnx2x/bnx2x_ethdev.c
index 483d5a17c..29636522e 100644
--- a/drivers/net/bnx2x/bnx2x_ethdev.c
+++ b/drivers/net/bnx2x/bnx2x_ethdev.c
@@ -447,7 +447,7 @@ static void
 bnx2x_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
 	struct bnx2x_softc *sc = dev->data->dev_private;
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 	dev_info->max_rx_queues  = sc->max_rx_queues;
 	dev_info->max_tx_queues  = sc->max_tx_queues;
 	dev_info->min_rx_bufsize = BNX2X_MIN_RX_BUF_SIZE;
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index 21c46f833..a1f8a7d63 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -407,7 +407,7 @@ static void bnxt_dev_info_get_op(struct rte_eth_dev *eth_dev,
 	uint16_t max_vnics, i, j, vpool, vrxq;
 	unsigned int max_rx_rings;
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
+	dev_info->device = eth_dev->device;
 
 	/* MAC Specifics */
 	dev_info->max_mac_addrs = bp->max_l2_ctx;
diff --git a/drivers/net/cxgbe/cxgbe_ethdev.c b/drivers/net/cxgbe/cxgbe_ethdev.c
index 781d75cc2..ec3a024c6 100644
--- a/drivers/net/cxgbe/cxgbe_ethdev.c
+++ b/drivers/net/cxgbe/cxgbe_ethdev.c
@@ -148,7 +148,7 @@ static void cxgbe_dev_info_get(struct rte_eth_dev *eth_dev,
 		.nb_align = 1,
 	};
 
-	device_info->pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
+	device_info->device = eth_dev->device;
 
 	device_info->min_rx_bufsize = CXGBE_MIN_RX_BUFSIZE;
 	device_info->max_rx_pktlen = CXGBE_MAX_RX_PKTLEN;
diff --git a/drivers/net/dpaa/dpaa_ethdev.c b/drivers/net/dpaa/dpaa_ethdev.c
index db493648a..29513158f 100644
--- a/drivers/net/dpaa/dpaa_ethdev.c
+++ b/drivers/net/dpaa/dpaa_ethdev.c
@@ -245,6 +245,7 @@ static void dpaa_eth_dev_info(struct rte_eth_dev *dev,
 
 	PMD_INIT_FUNC_TRACE();
 
+	dev_info->device = dev->device;
 	dev_info->max_rx_queues = dpaa_intf->nb_rx_queues;
 	dev_info->max_tx_queues = dpaa_intf->nb_tx_queues;
 	dev_info->min_rx_bufsize = DPAA_MIN_RX_BUF_SIZE;
diff --git a/drivers/net/dpaa2/dpaa2_ethdev.c b/drivers/net/dpaa2/dpaa2_ethdev.c
index 2fb7b2da7..7802067e8 100644
--- a/drivers/net/dpaa2/dpaa2_ethdev.c
+++ b/drivers/net/dpaa2/dpaa2_ethdev.c
@@ -163,6 +163,7 @@ dpaa2_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
 	dev_info->if_index = priv->hw_id;
 
+	dev_info->device = dev->device;
 	dev_info->max_mac_addrs = priv->max_mac_filters;
 	dev_info->max_rx_pktlen = DPAA2_MAX_RX_PKT_LEN;
 	dev_info->min_rx_bufsize = DPAA2_MIN_RX_BUF_SIZE;
diff --git a/drivers/net/e1000/em_ethdev.c b/drivers/net/e1000/em_ethdev.c
index 0358cbfa4..8ab6361a2 100644
--- a/drivers/net/e1000/em_ethdev.c
+++ b/drivers/net/e1000/em_ethdev.c
@@ -1048,7 +1048,7 @@ eth_em_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 	dev_info->min_rx_bufsize = 256; /* See BSIZE field of RCTL register. */
 	dev_info->max_rx_pktlen = em_get_max_pktlen(hw);
 	dev_info->max_mac_addrs = hw->mac.rar_entry_count;
diff --git a/drivers/net/e1000/igb_ethdev.c b/drivers/net/e1000/igb_ethdev.c
index d7eef9a6c..5504fc16c 100644
--- a/drivers/net/e1000/igb_ethdev.c
+++ b/drivers/net/e1000/igb_ethdev.c
@@ -2144,7 +2144,7 @@ eth_igb_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 	dev_info->min_rx_bufsize = 256; /* See BSIZE field of RCTL register. */
 	dev_info->max_rx_pktlen  = 0x3FFF; /* See RLPML register. */
 	dev_info->max_mac_addrs = hw->mac.rar_entry_count;
@@ -2273,7 +2273,7 @@ eth_igbvf_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 {
 	struct e1000_hw *hw = E1000_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 	dev_info->min_rx_bufsize = 256; /* See BSIZE field of RCTL register. */
 	dev_info->max_rx_pktlen  = 0x3FFF; /* See RLPML register. */
 	dev_info->max_mac_addrs = hw->mac.rar_entry_count;
diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index ad4e03dba..4806f2601 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -1527,7 +1527,7 @@ static void ena_infos_get(struct rte_eth_dev *dev,
 	ena_dev = &adapter->ena_dev;
 	ena_assert_msg(ena_dev != NULL, "Uninitialized device");
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 
 	dev_info->speed_capa =
 			ETH_LINK_SPEED_1G   |
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 03f0c2547..c4250db28 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -471,7 +471,7 @@ static void enicpmd_dev_info_get(struct rte_eth_dev *eth_dev,
 	struct enic *enic = pmd_priv(eth_dev);
 
 	ENICPMD_FUNC_TRACE();
-	device_info->pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
+	device_info->device = eth_dev->device;
 	/* Scattered Rx uses two receive queues per rx queue exposed to dpdk */
 	device_info->max_rx_queues = enic->conf_rq_count / 2;
 	device_info->max_tx_queues = enic->conf_wq_count;
diff --git a/drivers/net/fm10k/fm10k_ethdev.c b/drivers/net/fm10k/fm10k_ethdev.c
index cc1a773a7..116be6f13 100644
--- a/drivers/net/fm10k/fm10k_ethdev.c
+++ b/drivers/net/fm10k/fm10k_ethdev.c
@@ -1375,7 +1375,7 @@ fm10k_dev_infos_get(struct rte_eth_dev *dev,
 
 	PMD_INIT_FUNC_TRACE();
 
-	dev_info->pci_dev            = pdev;
+	dev_info->device             = dev->device;
 	dev_info->min_rx_bufsize     = FM10K_MIN_RX_BUF_SIZE;
 	dev_info->max_rx_pktlen      = FM10K_MAX_PKT_SIZE;
 	dev_info->max_rx_queues      = hw->mac.max_queues;
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index f9408b0e7..54c577170 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -3200,7 +3200,7 @@ i40e_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	struct i40e_vsi *vsi = pf->main_vsi;
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(dev);
 
-	dev_info->pci_dev = pci_dev;
+	dev_info->device = dev->device;
 	dev_info->max_rx_queues = vsi->nb_qps;
 	dev_info->max_tx_queues = vsi->nb_qps;
 	dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c
index c30a0e9c3..ce468c7b3 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -2165,7 +2165,7 @@ i40evf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
 
 	memset(dev_info, 0, sizeof(*dev_info));
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 	dev_info->max_rx_queues = vf->vsi_res->num_queue_pairs;
 	dev_info->max_tx_queues = vf->vsi_res->num_queue_pairs;
 	dev_info->min_rx_bufsize = I40E_BUF_SIZE_MIN;
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index b65fab736..f66c6c515 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -3572,7 +3572,7 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 	struct rte_eth_conf *dev_conf = &dev->data->dev_conf;
 
-	dev_info->pci_dev = pci_dev;
+	dev_info->device = dev->device;
 	dev_info->max_rx_queues = (uint16_t)hw->mac.max_rx_queues;
 	dev_info->max_tx_queues = (uint16_t)hw->mac.max_tx_queues;
 	if (RTE_ETH_DEV_SRIOV(dev).active == 0) {
@@ -3731,7 +3731,7 @@ ixgbevf_dev_info_get(struct rte_eth_dev *dev,
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(dev);
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 
-	dev_info->pci_dev = pci_dev;
+	dev_info->device = dev->device;
 	dev_info->max_rx_queues = (uint16_t)hw->mac.max_rx_queues;
 	dev_info->max_tx_queues = (uint16_t)hw->mac.max_tx_queues;
 	dev_info->min_rx_bufsize = 1024; /* cf BSIZEPACKET in SRRCTL reg */
diff --git a/drivers/net/kni/rte_eth_kni.c b/drivers/net/kni/rte_eth_kni.c
index dc4e65f5d..d35e2beb9 100644
--- a/drivers/net/kni/rte_eth_kni.c
+++ b/drivers/net/kni/rte_eth_kni.c
@@ -201,7 +201,7 @@ eth_kni_dev_info(struct rte_eth_dev *dev __rte_unused,
 	dev_info->max_rx_queues = KNI_MAX_QUEUE_PER_PORT;
 	dev_info->max_tx_queues = KNI_MAX_QUEUE_PER_PORT;
 	dev_info->min_rx_bufsize = 0;
-	dev_info->pci_dev = NULL;
+	dev_info->device = NULL;
 }
 
 static int
diff --git a/drivers/net/liquidio/lio_ethdev.c b/drivers/net/liquidio/lio_ethdev.c
index eeb8350e4..a135645bc 100644
--- a/drivers/net/liquidio/lio_ethdev.c
+++ b/drivers/net/liquidio/lio_ethdev.c
@@ -373,7 +373,7 @@ lio_dev_info_get(struct rte_eth_dev *eth_dev,
 	struct lio_device *lio_dev = LIO_DEV(eth_dev);
 	struct rte_pci_device *pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
 
-	devinfo->pci_dev = pci_dev;
+	devinfo->device = eth_dev->device;
 
 	switch (pci_dev->id.subsystem_device_id) {
 	/* CN23xx 10G cards */
diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index beecc53ba..577aab3d3 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -555,7 +555,7 @@ mlx4_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	unsigned int max;
 	char ifname[IF_NAMESIZE];
 
-	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	info->device = dev->device;
 	/* FIXME: we should ask the device for these values. */
 	info->min_rx_bufsize = 32;
 	info->max_rx_pktlen = 65536;
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index aa4118026..693755a74 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -403,7 +403,7 @@ mlx5_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *info)
 	unsigned int max;
 	char ifname[IF_NAMESIZE];
 
-	info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	info->device = dev->device;
 	/* FIXME: we should ask the device for these values. */
 	info->min_rx_bufsize = 32;
 	info->max_rx_pktlen = 65536;
diff --git a/drivers/net/mrvl/mrvl_ethdev.c b/drivers/net/mrvl/mrvl_ethdev.c
index c0483b912..d46c65255 100644
--- a/drivers/net/mrvl/mrvl_ethdev.c
+++ b/drivers/net/mrvl/mrvl_ethdev.c
@@ -1314,6 +1314,8 @@ static void
 mrvl_dev_infos_get(struct rte_eth_dev *dev __rte_unused,
 		   struct rte_eth_dev_info *info)
 {
+	info->device = dev->device;
+
 	info->speed_capa = ETH_LINK_SPEED_10M |
 			   ETH_LINK_SPEED_100M |
 			   ETH_LINK_SPEED_1G |
diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index 8591c7de0..add00baf9 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -1159,7 +1159,7 @@ nfp_net_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
 	hw = NFP_NET_DEV_PRIVATE_TO_HW(dev->data->dev_private);
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 	dev_info->max_rx_queues = (uint16_t)hw->max_rx_queues;
 	dev_info->max_tx_queues = (uint16_t)hw->max_tx_queues;
 	dev_info->min_rx_bufsize = ETHER_MIN_MTU;
diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index 73fe8b04a..7506f77f6 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -292,6 +292,7 @@ eth_dev_info(struct rte_eth_dev *dev,
 		return;
 
 	internals = dev->data->dev_private;
+	dev_info->device = dev->device;
 	dev_info->max_mac_addrs = 1;
 	dev_info->max_rx_pktlen = (uint32_t)-1;
 	dev_info->max_rx_queues = RTE_DIM(internals->rx_null_queues);
diff --git a/drivers/net/octeontx/octeontx_ethdev.c b/drivers/net/octeontx/octeontx_ethdev.c
index 90dd249a6..edd4dd3ff 100644
--- a/drivers/net/octeontx/octeontx_ethdev.c
+++ b/drivers/net/octeontx/octeontx_ethdev.c
@@ -611,7 +611,7 @@ octeontx_dev_info(struct rte_eth_dev *dev,
 	dev_info->max_rx_queues = 1;
 	dev_info->max_tx_queues = PKO_MAX_NUM_DQ;
 	dev_info->min_rx_bufsize = 0;
-	dev_info->pci_dev = NULL;
+	dev_info->device = NULL;
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
 		.rx_free_thresh = 0,
diff --git a/drivers/net/pcap/rte_eth_pcap.c b/drivers/net/pcap/rte_eth_pcap.c
index c1571e1fe..2e739a24e 100644
--- a/drivers/net/pcap/rte_eth_pcap.c
+++ b/drivers/net/pcap/rte_eth_pcap.c
@@ -526,6 +526,7 @@ eth_dev_info(struct rte_eth_dev *dev,
 {
 	struct pmd_internals *internals = dev->data->dev_private;
 
+	dev_info->device = dev->device;
 	dev_info->if_index = internals->if_index;
 	dev_info->max_mac_addrs = 1;
 	dev_info->max_rx_pktlen = (uint32_t) -1;
diff --git a/drivers/net/qede/qede_ethdev.c b/drivers/net/qede/qede_ethdev.c
index a91f43683..59d604b78 100644
--- a/drivers/net/qede/qede_ethdev.c
+++ b/drivers/net/qede/qede_ethdev.c
@@ -1515,7 +1515,7 @@ qede_dev_info_get(struct rte_eth_dev *eth_dev,
 
 	PMD_INIT_FUNC_TRACE(edev);
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(eth_dev);
+	dev_info->device = eth_dev->device;
 	dev_info->min_rx_bufsize = (uint32_t)QEDE_MIN_RX_BUFF_SIZE;
 	dev_info->max_rx_pktlen = (uint32_t)ETH_TX_MAX_NON_LSO_PKT_LEN;
 	dev_info->rx_desc_lim = qede_rx_desc_lim;
diff --git a/drivers/net/ring/rte_eth_ring.c b/drivers/net/ring/rte_eth_ring.c
index df13c44be..14274fa36 100644
--- a/drivers/net/ring/rte_eth_ring.c
+++ b/drivers/net/ring/rte_eth_ring.c
@@ -153,6 +153,7 @@ eth_dev_info(struct rte_eth_dev *dev,
 		struct rte_eth_dev_info *dev_info)
 {
 	struct pmd_internals *internals = dev->data->dev_private;
+	dev_info->device = dev->device;
 	dev_info->max_mac_addrs = 1;
 	dev_info->max_rx_pktlen = (uint32_t)-1;
 	dev_info->max_rx_queues = (uint16_t)internals->max_rx_queues;
diff --git a/drivers/net/sfc/sfc_ethdev.c b/drivers/net/sfc/sfc_ethdev.c
index f16d52081..2c0ad7ecf 100644
--- a/drivers/net/sfc/sfc_ethdev.c
+++ b/drivers/net/sfc/sfc_ethdev.c
@@ -89,7 +89,7 @@ sfc_dev_infos_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
 	sfc_log_init(sa, "entry");
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 	dev_info->max_rx_pktlen = EFX_MAC_PDU_MAX;
 
 	/* Autonegotiation may be disabled */
diff --git a/drivers/net/szedata2/rte_eth_szedata2.c b/drivers/net/szedata2/rte_eth_szedata2.c
index 1d02aee6f..4157cc88f 100644
--- a/drivers/net/szedata2/rte_eth_szedata2.c
+++ b/drivers/net/szedata2/rte_eth_szedata2.c
@@ -1031,7 +1031,7 @@ eth_dev_info(struct rte_eth_dev *dev,
 		struct rte_eth_dev_info *dev_info)
 {
 	struct pmd_internals *internals = dev->data->dev_private;
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 	dev_info->if_index = 0;
 	dev_info->max_mac_addrs = 1;
 	dev_info->max_rx_pktlen = (uint32_t)-1;
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 67ed9d466..23843e32e 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -688,7 +688,7 @@ tap_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->max_rx_queues = RTE_PMD_TAP_MAX_QUEUES;
 	dev_info->max_tx_queues = RTE_PMD_TAP_MAX_QUEUES;
 	dev_info->min_rx_bufsize = 0;
-	dev_info->pci_dev = NULL;
+	dev_info->device = NULL;
 	dev_info->speed_capa = tap_dev_speed_capa();
 	dev_info->rx_offload_capa = tap_rx_offload_get_port_capa();
 	dev_info->tx_offload_capa = tap_tx_offload_get_port_capa();
diff --git a/drivers/net/thunderx/nicvf_ethdev.c b/drivers/net/thunderx/nicvf_ethdev.c
index 067f2243b..f9e4a5810 100644
--- a/drivers/net/thunderx/nicvf_ethdev.c
+++ b/drivers/net/thunderx/nicvf_ethdev.c
@@ -1400,7 +1400,7 @@ nicvf_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
 	PMD_INIT_FUNC_TRACE();
 
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 
 	/* Autonegotiation may be disabled */
 	dev_info->speed_capa = ETH_LINK_SPEED_FIXED;
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 4dddb1c80..c623ce186 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -2057,7 +2057,7 @@ virtio_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 
 	dev_info->speed_capa = ETH_LINK_SPEED_10G; /* fake value */
 
-	dev_info->pci_dev = dev->device ? RTE_ETH_DEV_TO_PCI(dev) : NULL;
+	dev_info->device = dev->device;
 	dev_info->max_rx_queues =
 		RTE_MIN(hw->max_queue_pairs, VIRTIO_MAX_RX_QUEUES);
 	dev_info->max_tx_queues =
diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 426008722..220668e19 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -1025,7 +1025,7 @@ static void
 vmxnet3_dev_info_get(struct rte_eth_dev *dev,
 		     struct rte_eth_dev_info *dev_info)
 {
-	dev_info->pci_dev = RTE_ETH_DEV_TO_PCI(dev);
+	dev_info->device = dev->device;
 
 	dev_info->max_rx_queues = VMXNET3_MAX_RX_QUEUES;
 	dev_info->max_tx_queues = VMXNET3_MAX_TX_QUEUES;
diff --git a/examples/ethtool/lib/rte_ethtool.c b/examples/ethtool/lib/rte_ethtool.c
index 90dfbb739..4c770ec6a 100644
--- a/examples/ethtool/lib/rte_ethtool.c
+++ b/examples/ethtool/lib/rte_ethtool.c
@@ -22,6 +22,8 @@ rte_ethtool_get_drvinfo(uint16_t port_id, struct ethtool_drvinfo *drvinfo)
 {
 	struct rte_eth_dev_info dev_info;
 	struct rte_dev_reg_info reg_info;
+	const struct rte_pci_device *pci_dev;
+	const struct rte_bus *bus;
 	int n;
 	int ret;
 
@@ -46,15 +48,16 @@ rte_ethtool_get_drvinfo(uint16_t port_id, struct ethtool_drvinfo *drvinfo)
 	snprintf(drvinfo->version, sizeof(drvinfo->version), "%s",
 		rte_version());
 	/* TODO: replace bus_info by rte_devargs.name */
-	if (dev_info.pci_dev)
+	bus = rte_bus_find_by_device(dev_info.device);
+	if (bus && !strcmp(bus->name, "pci")) {
+		pci_dev = RTE_DEV_TO_PCI(dev_info.device);
 		snprintf(drvinfo->bus_info, sizeof(drvinfo->bus_info),
 			"%04x:%02x:%02x.%x",
-			dev_info.pci_dev->addr.domain,
-			dev_info.pci_dev->addr.bus,
-			dev_info.pci_dev->addr.devid,
-			dev_info.pci_dev->addr.function);
-	else
+			pci_dev->addr.domain, pci_dev->addr.bus,
+			pci_dev->addr.devid, pci_dev->addr.function);
+	} else {
 		snprintf(drvinfo->bus_info, sizeof(drvinfo->bus_info), "N/A");
+	}
 
 	memset(&reg_info, 0, sizeof(reg_info));
 	rte_eth_dev_get_reg_info(port_id, &reg_info);
diff --git a/examples/ip_pipeline/init.c b/examples/ip_pipeline/init.c
index bb07efa13..f57236b7a 100644
--- a/examples/ip_pipeline/init.c
+++ b/examples/ip_pipeline/init.c
@@ -1266,6 +1266,8 @@ app_init_kni(struct app_params *app) {
 		struct rte_eth_dev_info dev_info;
 		struct app_mempool_params *mempool_params;
 		struct rte_mempool *mempool;
+		const struct rte_pci_device *pci_dev;
+		const struct rte_bus *bus;
 		struct rte_kni_conf conf;
 		struct rte_kni_ops ops;
 
@@ -1297,8 +1299,12 @@ app_init_kni(struct app_params *app) {
 		}
 		conf.group_id = p_link->pmd_id;
 		conf.mbuf_size = mempool_params->buffer_size;
-		conf.addr = dev_info.pci_dev->addr;
-		conf.id = dev_info.pci_dev->id;
+		bus = rte_bus_find_by_device(dev_info.device);
+		if (bus && !strcmp(bus->name, "pci")) {
+			pci_dev = RTE_DEV_TO_PCI(dev_info.device);
+			conf.addr = pci_dev->addr;
+			conf.id = pci_dev->id;
+		}
 
 		memset(&ops, 0, sizeof(ops));
 		ops.port_id = (uint8_t) p_link->pmd_id;
diff --git a/examples/kni/main.c b/examples/kni/main.c
index 0d9980ee1..06eb74f6f 100644
--- a/examples/kni/main.c
+++ b/examples/kni/main.c
@@ -834,13 +834,17 @@ kni_alloc(uint16_t port_id)
 		if (i == 0) {
 			struct rte_kni_ops ops;
 			struct rte_eth_dev_info dev_info;
+			const struct rte_pci_device *pci_dev;
+			const struct rte_bus *bus;
 
 			memset(&dev_info, 0, sizeof(dev_info));
 			rte_eth_dev_info_get(port_id, &dev_info);
 
-			if (dev_info.pci_dev) {
-				conf.addr = dev_info.pci_dev->addr;
-				conf.id = dev_info.pci_dev->id;
+			bus = rte_bus_find_by_device(dev_info.device);
+			if (bus && !strcmp(bus->name, "pci")) {
+				pci_dev = RTE_DEV_TO_PCI(dev_info.device);
+				conf.addr = pci_dev->addr;
+				conf.id = pci_dev->id;
 			}
 			/* Get the interface default mac address */
 			rte_eth_macaddr_get(port_id,
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index ab1030d42..0ed903966 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -995,7 +995,7 @@ struct rte_pci_device;
  * Ethernet device information
  */
 struct rte_eth_dev_info {
-	struct rte_pci_device *pci_dev; /**< Device PCI information. */
+	struct rte_device *device; /** Generic device information */
 	const char *driver_name; /**< Device Driver name. */
 	unsigned int if_index; /**< Index to bound host interface, or 0 if none.
 		Use if_indextoname() to translate into an interface name. */
diff --git a/test/test/test_kni.c b/test/test/test_kni.c
index e4839cdb7..e23eb0837 100644
--- a/test/test/test_kni.c
+++ b/test/test/test_kni.c
@@ -357,6 +357,8 @@ test_kni_processing(uint16_t port_id, struct rte_mempool *mp)
 	struct rte_kni_conf conf;
 	struct rte_eth_dev_info info;
 	struct rte_kni_ops ops;
+	const struct rte_pci_device *pci_dev;
+	const struct rte_bus *bus;
 
 	if (!mp)
 		return -1;
@@ -366,8 +368,12 @@ test_kni_processing(uint16_t port_id, struct rte_mempool *mp)
 	memset(&ops, 0, sizeof(ops));
 
 	rte_eth_dev_info_get(port_id, &info);
-	conf.addr = info.pci_dev->addr;
-	conf.id = info.pci_dev->id;
+	bus = rte_bus_find_by_device(info.device);
+	if (bus && !strcmp(bus->name, "pci")) {
+		pci_dev = RTE_DEV_TO_PCI(info.device);
+		conf.addr = pci_dev->addr;
+		conf.id = pci_dev->id;
+	}
 	snprintf(conf.name, sizeof(conf.name), TEST_KNI_PORT);
 
 	/* core id 1 configured for kernel thread */
@@ -465,6 +471,8 @@ test_kni(void)
 	struct rte_kni_conf conf;
 	struct rte_eth_dev_info info;
 	struct rte_kni_ops ops;
+	const struct rte_pci_device *pci_dev;
+	const struct rte_bus *bus;
 
 	/* Initialize KNI subsytem */
 	rte_kni_init(KNI_TEST_MAX_PORTS);
@@ -523,8 +531,12 @@ test_kni(void)
 	memset(&conf, 0, sizeof(conf));
 	memset(&ops, 0, sizeof(ops));
 	rte_eth_dev_info_get(port_id, &info);
-	conf.addr = info.pci_dev->addr;
-	conf.id = info.pci_dev->id;
+	bus = rte_bus_find_by_device(info.device);
+	if (bus && !strcmp(bus->name, "pci")) {
+		pci_dev = RTE_DEV_TO_PCI(info.device);
+		conf.addr = pci_dev->addr;
+		conf.id = pci_dev->id;
+	}
 	conf.group_id = port_id;
 	conf.mbuf_size = MAX_PACKET_SZ;
 
@@ -552,8 +564,12 @@ test_kni(void)
 	memset(&info, 0, sizeof(info));
 	memset(&ops, 0, sizeof(ops));
 	rte_eth_dev_info_get(port_id, &info);
-	conf.addr = info.pci_dev->addr;
-	conf.id = info.pci_dev->id;
+	bus = rte_bus_find_by_device(info.device);
+	if (bus && !strcmp(bus->name, "pci")) {
+		pci_dev = RTE_DEV_TO_PCI(info.device);
+		conf.addr = pci_dev->addr;
+		conf.id = pci_dev->id;
+	}
 	conf.group_id = port_id;
 	conf.mbuf_size = MAX_PACKET_SZ;
 
-- 
2.14.3
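
The pattern the example and test code above switch to (check the bus name, then downcast the generic device to a PCI device) can be sketched in isolation. This is a minimal, self-contained illustration only: the `mock_*` types, `MOCK_DEV_TO_PCI` and `format_bus_info()` are hypothetical stand-ins invented for this sketch, not the real DPDK definitions, and the bus pointer stored in the mock device replaces the `rte_bus_find_by_device()` lookup used in the patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the DPDK types; NOT the real definitions. */
struct mock_bus {
	const char *name;
};

struct mock_device {
	const struct mock_bus *bus; /* assumption: device points at its bus */
	const char *name;
};

struct mock_pci_addr {
	unsigned int domain, bus, devid, function;
};

struct mock_pci_device {
	struct mock_pci_addr addr;
	struct mock_device device; /* generic part embedded in the PCI device */
};

/* container_of-style downcast from the embedded generic device. */
#define MOCK_DEV_TO_PCI(ptr) \
	((const struct mock_pci_device *)((const char *)(ptr) - \
		offsetof(struct mock_pci_device, device)))

/*
 * Write "dddd:bb:dd.f" into buf for a PCI device and return 1;
 * write "N/A" and return 0 for a NULL device or any other bus.
 */
static int
format_bus_info(const struct mock_device *dev, char *buf, size_t len)
{
	if (dev != NULL && dev->bus != NULL &&
	    strcmp(dev->bus->name, "pci") == 0) {
		const struct mock_pci_device *pci = MOCK_DEV_TO_PCI(dev);

		snprintf(buf, len, "%04x:%02x:%02x.%x",
			pci->addr.domain, pci->addr.bus,
			pci->addr.devid, pci->addr.function);
		return 1;
	}
	snprintf(buf, len, "N/A");
	return 0;
}
```

The downcast is only valid once the bus check has passed, which is why the sketch performs the name comparison before touching any PCI field. The explicit NULL check is a choice made for the sketch, since several virtual drivers in the patch set `device` to NULL.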

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v6] eal: provide API for querying valid socket id's
  2018-03-22 12:36  5%   ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
  2018-03-22 17:07  0%     ` gowrishankar muthukrishnan
@ 2018-03-27 16:24  3%     ` Thomas Monjalon
  2018-03-31 17:08  5%     ` [dpdk-dev] [PATCH v7] " Anatoly Burakov
  2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-03-27 16:24 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Bruce Richardson, chaozhu, gowrishankar.m

22/03/2018 13:36, Anatoly Burakov:
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -57,6 +57,9 @@ enum rte_proc_type_t {
>  struct rte_config {
>  	uint32_t master_lcore;       /**< Id of the master lcore */
>  	uint32_t lcore_count;        /**< Number of available logical cores. */
> +	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
> +	uint32_t numa_nodes[RTE_MAX_NUMA_NODES];
> +	/**< List of detected numa nodes. */

Please keep this comment on the same line if it's below 99 chars.


> --- a/lib/librte_eal/common/include/rte_lcore.h
> +++ b/lib/librte_eal/common/include/rte_lcore.h
> @@ -132,6 +132,36 @@ rte_lcore_index(int lcore_id)
>  unsigned rte_socket_id(void);
>  
>  /**
> + * Return number of physical sockets detected on the system.
> + *
> + * Note that number of nodes may not be correspondent to their physical id's:
> + * for example, a system may report two socket id's, but the actual socket id's
> + * may be 0 and 8.
> + *
> + * @return
> + *   the number of physical sockets as recognized by EAL
> + */
> +unsigned int __rte_experimental
> +rte_num_socket_ids(void);

I suggest rte_socket_count() as function name.


> +/**
> + * Return socket id with a particular index.
> + *
> + * This will return socket id at a particular position in list of all detected
> + * physical socket id's. For example, on a machine with sockets [0, 8], passing
> + * 1 as a parameter will return 8.
> + *
> + * @param idx
> + *   index of physical socket id to return
> + *
> + * @return
> + *   - physical socket id as recognized by EAL
> + *   - -1 on error, with errno set to EINVAL
> + */
> +int __rte_experimental
> +rte_socket_id_by_idx(unsigned int idx);

OK for this function.
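
The index-to-id mapping described in the quoted comment can be sketched as follows. This is only an illustration of the documented behaviour (sockets [0, 8], index 1 returns 8, -1 with errno set to EINVAL on a bad index): `struct sketch_config`, `SKETCH_MAX_NUMA_NODES` and both `sketch_*` functions are hypothetical names for this sketch, and the actual patch body is not shown in this mail.

```c
#include <errno.h>

#define SKETCH_MAX_NUMA_NODES 8 /* assumption: stand-in for RTE_MAX_NUMA_NODES */

/* Hypothetical reduced config holding only the two new fields. */
struct sketch_config {
	unsigned int numa_node_count;                    /* detected nodes */
	unsigned int numa_nodes[SKETCH_MAX_NUMA_NODES];  /* their socket ids */
};

/* Number of detected sockets; ids need not be 0..count-1. */
static unsigned int
sketch_socket_count(const struct sketch_config *cfg)
{
	return cfg->numa_node_count;
}

/* Socket id stored at position idx, or -1 with errno = EINVAL. */
static int
sketch_socket_id_by_idx(const struct sketch_config *cfg, unsigned int idx)
{
	if (idx >= cfg->numa_node_count) {
		errno = EINVAL;
		return -1;
	}
	return (int)cfg->numa_nodes[idx];
}
```

A caller would iterate idx from 0 to the count and translate each index, rather than assuming the ids are contiguous.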


> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> -LIBABIVER := 6
> +LIBABIVER := 7

When changing the ABI version, you need to update the release notes.

There is also a deprecation notice to remove.


> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -229,6 +229,8 @@ EXPERIMENTAL {
>  	rte_mp_request;
>  	rte_mp_request_async;
>  	rte_mp_reply;
> +	rte_num_socket_ids;
> +	rte_socket_id_by_idx;

This one is not in the alphabetical order.

>  	rte_service_attr_get;
>  	rte_service_attr_reset_all;
>  	rte_service_component_register;

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] eal: replace rte_panic instances to return an error value
  @ 2018-03-27 14:06  5%     ` Arnon Warshavsky
  0 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-03-27 14:06 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Thomas Monjalon, Burakov, Anatoly, wenzhuo.lu, declan.doherty,
	jerin.jacob, ferruh.yigit, dev

I now have a set of several patches that pass the ABI validation, one patch
to checkpatches.sh to prevent future panic instances,
and a set of ABI-breaking changes that are part of the init sequence.

1.
I assume it is best to bundle them all into a single patchset. Please correct me
if not.

2.
Trying to work around the ABI in this phase:
What is your take on adding a state to the init phase?

int rte_get_legacy_panic_state();
void rte_move_to_legacy_panic_state();

This would preserve the ABI for these few functions, replacing, for
example, the currently ABI-breaking

if (eal_thread_init_master())
    return -1;

with

eal_thread_init_master();
if (rte_get_legacy_panic_state())
    return -1;

while calling rte_move_to_legacy_panic_state() from within these void
functions where the panic takes place today.

This could also partly cover the cases where panic is called from within
an interrupt handler that has no application context to return to.
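
A minimal sketch of the flag-based idea, to make the proposal concrete. All names here are hypothetical and local to the sketch (the two state accessors mirror the proposed rte_get_legacy_panic_state() / rte_move_to_legacy_panic_state(), which do not exist yet), and `simulate_failure` only stands in for whatever condition triggers the panic today.

```c
/* 0 = healthy, non-zero = some void init step has failed. */
static int legacy_panic_state;

static int
get_legacy_panic_state(void)
{
	return legacy_panic_state;
}

static void
move_to_legacy_panic_state(void)
{
	legacy_panic_state = 1;
}

/* Keeps its void signature, so the exported prototype is unchanged. */
static void
sketch_thread_init_master(int simulate_failure)
{
	if (simulate_failure) {
		/* This is where the panic would be raised today. */
		move_to_legacy_panic_state();
		return;
	}
	/* Normal setup would go here. */
}

/* Caller checks the state instead of a return value. */
static int
sketch_eal_init(int simulate_failure)
{
	sketch_thread_init_master(simulate_failure);
	if (get_legacy_panic_state())
		return -1;
	return 0;
}
```

The state is sticky in this sketch: once set, every later check fails too, and a real version would need to decide whether and how it can be cleared.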

thanks
/Arnon

^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel
  2018-03-27  1:26  0%       ` Tan, Jianfeng
@ 2018-03-27  8:21  3%         ` Pattan, Reshma
  0 siblings, 0 replies; 200+ results
From: Pattan, Reshma @ 2018-03-27  8:21 UTC (permalink / raw)
  To: Tan, Jianfeng, dev

Hi,

> >
> > > > 1) I feel ABI breakage has to be addressed  first for change in
> > > rte_pdump_init() .
> >
> > So, you want to remove unnecessary socket  related code from
> > dpdk-pdump in future release itself?  Kind of making sense.
> > But dpdk-pdump  tool has socket path related command line options
> > which user still can pass on, isn't it kind of confusion we creating
> > w.r.t Internal design and usage?
> 
> AFAIK, these options do not affect anything with this patch even if they are set.
> How about printing a warning saying that these options will be deprecated
> and currently have no effect?

Fine, I guess. When the ABI notice to remove all socket path code is sent, you can remove the socket path CLI options at the same time.

Thanks,
Reshma

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel
  2018-03-21  9:55  3%     ` Pattan, Reshma
@ 2018-03-27  1:26  0%       ` Tan, Jianfeng
  2018-03-27  8:21  3%         ` Pattan, Reshma
  0 siblings, 1 reply; 200+ results
From: Tan, Jianfeng @ 2018-03-27  1:26 UTC (permalink / raw)
  To: Pattan, Reshma, dev



> -----Original Message-----
> From: Pattan, Reshma
> Sent: Wednesday, March 21, 2018 5:55 PM
> To: Tan, Jianfeng; dev@dpdk.org
> Subject: RE: [PATCH] pdump: change to use generic multi-process channel
> 
> Hi,
> 
> > -----Original Message-----
> > From: Tan, Jianfeng
> > Sent: Wednesday, March 21, 2018 2:28 AM
> > To: Pattan, Reshma <reshma.pattan@intel.com>; dev@dpdk.org
> > Subject: RE: [PATCH] pdump: change to use generic multi-process channel
> >
> 
> Hi,
> 
> > > 1) I feel ABI breakage has to be addressed  first for change in
> > rte_pdump_init() .
> > > 2)ABI notice for removal of the rte_pdump_set_socket_dir()  and then
> > remove it completely .
> >
> > This patch itself does not break any ABI. It just puts parameters of
> > rte_pdump_init() not used. And make rte_pdump_set_socket_dir() as a
> > dummy function.
> >
> 
> So, for current release you just mark parameters unused and functions set to
> dummy, in future release you announce
> ABI breakage by removing them completely? If that is agreed plan I don't
> have any issues.

Actually, as you commented, we can announce the deprecation with this patch.

> 
> > > 3)Need to do cleanup of the code app/dpdk-pdump.
> >
> > Yes, I understand it's a normal process to announce deprecation firstly, and
> > then do the change.
> >
> > But here is the thing, with generic mp introduced, we will not be
> compatible
> > with DPDK versions.
> > So we want to unify the use of generic mp channel in this release for vfio,
> > pdump, vdev, memory rework.
> > And in fact, ABI/API changes could be delayed to later releases.
> 
> So, you want to remove unnecessary socket  related code from dpdk-pdump
> in future release itself?  Kind of making sense.
> But dpdk-pdump  tool has socket path related command line options which
> user still can pass on, isn't it kind of confusion we creating w.r.t
> Internal design and usage?

AFAIK, these options do not affect anything with this patch even if they are set. How about printing a warning saying that these options will be deprecated and currently have no effect?

I'll send a v2 for your review.

Thanks,
Jianfeng

> 
> >
> > > 4)After all the changes we need to make sure dpdk-pdump works fine
> > > without breaking the functionality, validation team should be able to help.
> >
> > I have done a simple test of pdump. Can you suggest where can I get the
> > comprehensive test cases?
> >
> 
> Ok, if you have verified and observed packets are been captured successfully,
> that is good enough.
> 
> Thanks,
> Reshma

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v1 3/6] mempool: support block dequeue operation
  2018-03-26 16:12  3% ` [dpdk-dev] [PATCH v1 0/6] mempool: add bucket driver Andrew Rybchenko
@ 2018-03-26 16:12  4%   ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:12 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ, Artem V. Andreev

From: "Artem V. Andreev" <Artem.Andreev@oktetlabs.ru>

If the mempool manager supports object blocks (physically and virtually
contiguous sets of objects), it is sufficient to get only the first
object of each block, and the function avoids filling in
information about each block member.

Signed-off-by: Artem V. Andreev <Artem.Andreev@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
 doc/guides/rel_notes/deprecation.rst       |   7 --
 lib/librte_mempool/Makefile                |   1 +
 lib/librte_mempool/meson.build             |   2 +
 lib/librte_mempool/rte_mempool.c           |  39 ++++++++
 lib/librte_mempool/rte_mempool.h           | 151 ++++++++++++++++++++++++++++-
 lib/librte_mempool/rte_mempool_ops.c       |   1 +
 lib/librte_mempool/rte_mempool_version.map |   1 +
 7 files changed, 194 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 5301259..8249638 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -59,13 +59,6 @@ Deprecation Notices
 
   - ``rte_eal_mbuf_default_mempool_ops``
 
-* mempool: several API and ABI changes are planned in v18.05.
-
-  The following changes are planned:
-
-  - addition of new op to allocate contiguous
-    block of objects if underlying driver supports it.
-
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
   functions and macros are:
 
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 2c46fdd..62dd1a4 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -10,6 +10,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
 # Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
 # from earlier deprecated rte_mempool_populate_phys_tab()
 CFLAGS += -Wno-deprecated-declarations
+CFLAGS += -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 22e912a..8ef88e3 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
+allow_experimental_apis = true
+
 extra_flags = []
 
 # Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index c58bcc6..79f8429 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -1125,6 +1125,36 @@ void rte_mempool_check_cookies(const struct rte_mempool *mp,
 #endif
 }
 
+void
+rte_mempool_contig_blocks_check_cookies(const struct rte_mempool *mp,
+	void * const *first_obj_table_const, unsigned int n, int free)
+{
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+	struct rte_mempool_info info;
+	const size_t total_elt_sz =
+		mp->header_size + mp->elt_size + mp->trailer_size;
+	unsigned int i, j;
+
+	rte_mempool_ops_get_info(mp, &info);
+
+	for (i = 0; i < n; ++i) {
+		void *first_obj = first_obj_table_const[i];
+
+		for (j = 0; j < info.contig_block_size; ++j) {
+			void *obj;
+
+			obj = (void *)((uintptr_t)first_obj + j * total_elt_sz);
+			rte_mempool_check_cookies(mp, &obj, 1, free);
+		}
+	}
+#else
+	RTE_SET_USED(mp);
+	RTE_SET_USED(first_obj_table_const);
+	RTE_SET_USED(n);
+	RTE_SET_USED(free);
+#endif
+}
+
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
 static void
 mempool_obj_audit(struct rte_mempool *mp, __rte_unused void *opaque,
@@ -1190,6 +1220,7 @@ void
 rte_mempool_dump(FILE *f, struct rte_mempool *mp)
 {
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+	struct rte_mempool_info info;
 	struct rte_mempool_debug_stats sum;
 	unsigned lcore_id;
 #endif
@@ -1231,6 +1262,7 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
 
 	/* sum and dump statistics */
 #ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+	rte_mempool_ops_get_info(mp, &info);
 	memset(&sum, 0, sizeof(sum));
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
 		sum.put_bulk += mp->stats[lcore_id].put_bulk;
@@ -1239,6 +1271,8 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
 		sum.get_success_objs += mp->stats[lcore_id].get_success_objs;
 		sum.get_fail_bulk += mp->stats[lcore_id].get_fail_bulk;
 		sum.get_fail_objs += mp->stats[lcore_id].get_fail_objs;
+		sum.get_success_blks += mp->stats[lcore_id].get_success_blks;
+		sum.get_fail_blks += mp->stats[lcore_id].get_fail_blks;
 	}
 	fprintf(f, "  stats:\n");
 	fprintf(f, "    put_bulk=%"PRIu64"\n", sum.put_bulk);
@@ -1247,6 +1281,11 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
 	fprintf(f, "    get_success_objs=%"PRIu64"\n", sum.get_success_objs);
 	fprintf(f, "    get_fail_bulk=%"PRIu64"\n", sum.get_fail_bulk);
 	fprintf(f, "    get_fail_objs=%"PRIu64"\n", sum.get_fail_objs);
+	if (info.contig_block_size > 0) {
+		fprintf(f, "    get_success_blks=%"PRIu64"\n",
+			sum.get_success_blks);
+		fprintf(f, "    get_fail_blks=%"PRIu64"\n", sum.get_fail_blks);
+	}
 #else
 	fprintf(f, "  no statistics available\n");
 #endif
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 1ac2f57..3cab3a0 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -70,6 +70,10 @@ struct rte_mempool_debug_stats {
 	uint64_t get_success_objs; /**< Objects successfully allocated. */
 	uint64_t get_fail_bulk;    /**< Failed allocation number. */
 	uint64_t get_fail_objs;    /**< Objects that failed to be allocated. */
+	/** Successful allocation number of contiguous blocks. */
+	uint64_t get_success_blks;
+	/** Failed allocation number of contiguous blocks. */
+	uint64_t get_fail_blks;
 } __rte_cache_aligned;
 #endif
 
@@ -195,7 +199,10 @@ struct rte_mempool_memhdr {
  *
  * Additional information about the mempool
  */
-struct rte_mempool_info;
+struct rte_mempool_info {
+	/** Number of objects in the contiguous block */
+	unsigned int contig_block_size;
+};
 
 /**
  * The RTE mempool structure.
@@ -273,8 +280,16 @@ struct rte_mempool {
 			mp->stats[__lcore_id].name##_bulk += 1;	\
 		}                                               \
 	} while(0)
+#define __MEMPOOL_CONTIG_BLOCKS_STAT_ADD(mp, name, n) do {                    \
+		unsigned int __lcore_id = rte_lcore_id();       \
+		if (__lcore_id < RTE_MAX_LCORE) {               \
+			mp->stats[__lcore_id].name##_blks += n;	\
+			mp->stats[__lcore_id].name##_bulk += 1;	\
+		}                                               \
+	} while (0)
 #else
 #define __MEMPOOL_STAT_ADD(mp, name, n) do {} while(0)
+#define __MEMPOOL_CONTIG_BLOCKS_STAT_ADD(mp, name, n) do {} while (0)
 #endif
 
 /**
@@ -342,6 +357,38 @@ void rte_mempool_check_cookies(const struct rte_mempool *mp,
 #define __mempool_check_cookies(mp, obj_table_const, n, free) do {} while(0)
 #endif /* RTE_LIBRTE_MEMPOOL_DEBUG */
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @internal Check contiguous object blocks and update cookies or panic.
+ *
+ * @param mp
+ *   Pointer to the memory pool.
+ * @param first_obj_table_const
+ *   Pointer to a table of void * pointers (first object of the contiguous
+ *   object blocks).
+ * @param n
+ *   Number of contiguous object blocks.
+ * @param free
+ *   - 0: object is supposed to be allocated, mark it as free
+ *   - 1: object is supposed to be free, mark it as allocated
+ *   - 2: just check that cookie is valid (free or allocated)
+ */
+void rte_mempool_contig_blocks_check_cookies(const struct rte_mempool *mp,
+	void * const *first_obj_table_const, unsigned int n, int free);
+
+#ifdef RTE_LIBRTE_MEMPOOL_DEBUG
+#define __mempool_contig_blocks_check_cookies(mp, first_obj_table_const, n, \
+					      free) \
+	rte_mempool_contig_blocks_check_cookies(mp, first_obj_table_const, n, \
+						free)
+#else
+#define __mempool_contig_blocks_check_cookies(mp, first_obj_table_const, n, \
+					      free) \
+	do {} while (0)
+#endif /* RTE_LIBRTE_MEMPOOL_DEBUG */
+
 #define RTE_MEMPOOL_OPS_NAMESIZE 32 /**< Max length of ops struct name. */
 
 /**
@@ -374,6 +421,15 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 		void **obj_table, unsigned int n);
 
 /**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Dequeue a number of contiguous object blocks from the external pool.
+ */
+typedef int (*rte_mempool_dequeue_contig_blocks_t)(struct rte_mempool *mp,
+		 void **first_obj_table, unsigned int n);
+
+/**
  * Return the number of available objects in the external pool.
  */
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
@@ -539,6 +595,10 @@ struct rte_mempool_ops {
 	 * Get mempool info
 	 */
 	rte_mempool_get_info_t get_info;
+	/**
+	 * Dequeue a number of contiguous object blocks.
+	 */
+	rte_mempool_dequeue_contig_blocks_t dequeue_contig_blocks;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -617,6 +677,30 @@ rte_mempool_ops_dequeue_bulk(struct rte_mempool *mp,
 }
 
 /**
+ * @internal Wrapper for mempool_ops dequeue_contig_blocks callback.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[out] first_obj_table
+ *   Pointer to a table of void * pointers (first objects).
+ * @param[in] n
+ *   Number of blocks to get.
+ * @return
+ *   - 0: Success; got n objects.
+ *   - <0: Error; code of dequeue function.
+ */
+static inline int
+rte_mempool_ops_dequeue_contig_blocks(struct rte_mempool *mp,
+		void **first_obj_table, unsigned int n)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+	RTE_ASSERT(ops->dequeue_contig_blocks != NULL);
+	return ops->dequeue_contig_blocks(mp, first_obj_table, n);
+}
+
+/**
  * @internal wrapper for mempool_ops enqueue callback.
  *
  * @param mp
@@ -1531,6 +1615,71 @@ rte_mempool_get(struct rte_mempool *mp, void **obj_p)
 }
 
 /**
+ * @internal Get contiguous blocks of objects from the pool. Used internally.
+ * @param mp
+ *   A pointer to the mempool structure.
+ * @param first_obj_table
+ *   A pointer to a pointer to the first object in each block.
+ * @param n
+ *   A number of blocks to get.
+ * @return
+ *   - >0: Success
+ *   - <0: Error
+ */
+static __rte_always_inline int
+__mempool_generic_get_contig_blocks(struct rte_mempool *mp,
+				    void **first_obj_table, unsigned int n)
+{
+	int ret;
+
+	ret = rte_mempool_ops_dequeue_contig_blocks(mp, first_obj_table, n);
+	if (ret < 0)
+		__MEMPOOL_CONTIG_BLOCKS_STAT_ADD(mp, get_fail, n);
+	else
+		__MEMPOOL_CONTIG_BLOCKS_STAT_ADD(mp, get_success, n);
+
+	return ret;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get contiguous blocks of objects from the mempool.
+ *
+ * If cache is enabled, consider to flush it first, to reuse objects
+ * as soon as possible.
+ *
+ * The application should check that the driver supports the operation
+ * by calling rte_mempool_ops_get_info() and checking that `contig_block_size`
+ * is not zero.
+ *
+ * @param mp
+ *   A pointer to the mempool structure.
+ * @param first_obj_table
+ *   A pointer to a pointer to the first object in each block.
+ * @param n
+ *   The number of blocks to get from mempool.
+ * @return
+ *   - 0: Success; blocks taken.
+ *   - -ENOBUFS: Not enough entries in the mempool; no object is retrieved.
+ *   - -EOPNOTSUPP: The mempool driver does not support block dequeue
+ */
+static __rte_always_inline int
+__rte_experimental
+rte_mempool_get_contig_blocks(struct rte_mempool *mp,
+			      void **first_obj_table, unsigned int n)
+{
+	int ret;
+
+	ret = __mempool_generic_get_contig_blocks(mp, first_obj_table, n);
+	if (ret == 0)
+		__mempool_contig_blocks_check_cookies(mp, first_obj_table, n,
+						      1);
+	return ret;
+}
+
+/**
  * Return the number of entries in the mempool.
  *
  * When cache is enabled, this function has to browse the length of
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index efc1c08..a27e1fa 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -60,6 +60,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
 	ops->get_info = h->get_info;
+	ops->dequeue_contig_blocks = h->dequeue_contig_blocks;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index c9d16ec..1c406b5 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -53,6 +53,7 @@ DPDK_17.11 {
 DPDK_18.05 {
 	global:
 
+	rte_mempool_contig_blocks_check_cookies;
 	rte_mempool_op_calc_mem_size_default;
 	rte_mempool_op_populate_default;
 
-- 
2.7.4


* [dpdk-dev] [PATCH v1 0/6] mempool: add bucket driver
                     ` (2 preceding siblings ...)
  2018-03-26 16:09  2% ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
@ 2018-03-26 16:12  3% ` Andrew Rybchenko
  2018-03-26 16:12  4%   ` [dpdk-dev] [PATCH v1 3/6] mempool: support block dequeue operation Andrew Rybchenko
  3 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:12 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The initial patch series [1] (RFCv1 is [2]) has been split into two to
simplify processing. This is the second part, which relies on the first
one [3].

It should be applied on top of [4] and [3].

The patch series adds a bucket mempool driver, which allows allocation
of (both physically and virtually) contiguous blocks of objects, and
adds a mempool API to do so. The driver can still provide separate
objects, but it is definitely more heavyweight than the ring/stack
drivers. It will be used by future Solarflare driver enhancements,
which allow physically contiguous blocks to be used in the NIC firmware.

The target use case is to dequeue in blocks and enqueue separate objects
back (which are collected in buckets to be dequeued again). So, a memory
pool with the bucket driver is created by an application and provided to
a networking PMD receive queue. The bucket driver is chosen using
rte_eth_dev_pool_ops_supported(). A PMD that relies upon contiguous
block allocation should report the bucket driver as the only supported
and preferred one.
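
As a rough illustration of the consumer side: objects in a block are laid
out back to back, so object j sits at first_obj + j * total_elt_sz, which
is the same arithmetic rte_mempool_contig_blocks_check_cookies() uses in
patch 3/6. The sketch below works on a plain buffer rather than a real
mempool; block_obj() and block_walk() are hypothetical helper names, not
part of the mempool API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the mempool API): return the address of
 * object j within a contiguous block that starts at first_obj.
 * total_elt_sz stands for header_size + elt_size + trailer_size.
 */
static void *
block_obj(void *first_obj, size_t total_elt_sz, unsigned int j)
{
	return (void *)((uintptr_t)first_obj + j * total_elt_sz);
}

/*
 * Touch every object of one block, similar in spirit to the fake read
 * added to the performance test; returns the number of objects visited.
 */
static unsigned int
block_walk(void *first_obj, size_t total_elt_sz,
	   unsigned int contig_block_size)
{
	unsigned int j, visited = 0;

	assert(first_obj != NULL && total_elt_sz > 0);
	for (j = 0; j < contig_block_size; ++j) {
		volatile unsigned char *p =
			block_obj(first_obj, total_elt_sz, j);

		(void)*p;	/* fake read of the first byte of the object */
		++visited;
	}
	return visited;
}
```

In a real application first_obj would come from the first_obj_table filled
in by rte_mempool_get_contig_blocks(), and contig_block_size from
rte_mempool_ops_get_info().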

Introduction of the contiguous block dequeue operation is justified by
performance measurements using autotest with minor enhancements:
 - in the original test bulks are powers of two, which is unacceptable
   for us, so they are changed to multiples of contig_block_size;
 - the test code is duplicated to support plain dequeue and
   dequeue_contig_blocks;
 - all the extra test variations (with/without cache etc) are eliminated;
 - a fake read from the dequeued buffer is added (in both cases) to
   simulate mbuf access.

start performance test for bucket (without cache)
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Srate_persec=   111935488
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Srate_persec=   115290931
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Srate_persec=   353055539
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Srate_persec=   353330790
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Srate_persec=   224657407
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Srate_persec=   230411468
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Srate_persec=   706700902
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Srate_persec=   703673139
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Srate_persec=   425236887
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Srate_persec=   437295512
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Srate_persec=  1343409356
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Srate_persec=  1336567397
start performance test for bucket (without cache + contiguous dequeue)
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Crate_persec=   122945536
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Crate_persec=   126458265
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Crate_persec=   374262988
mempool_autotest cache=   0 cores= 1 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Crate_persec=   377316966
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Crate_persec=   244842496
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Crate_persec=   251618917
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Crate_persec=   751226060
mempool_autotest cache=   0 cores= 2 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Crate_persec=   756233010
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  30 Crate_persec=   462068120
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=   1 n_keep=  60 Crate_persec=   476997221
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  30 Crate_persec=  1432171313
mempool_autotest cache=   0 cores= 4 n_get_bulk=  15 n_put_bulk=  15 n_keep=  60 Crate_persec=  1438829771

The number of objects in the contiguous block is a function of bucket
memory size (.config option) and total element size. In the future, an
additional API that allows parameters to be passed on mempool allocation
may be added.
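
A minimal sketch of that relationship, assuming the bucket reserves a
fixed-size header and packs elements into the remaining space. The
function name objs_per_bucket() and the bucket_hdr_sz parameter are
hypothetical; the driver itself may account for its bookkeeping
differently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: number of objects that fit in one bucket.
 * bucket_hdr_sz is a hypothetical per-bucket header size.
 * total_elt_sz stands for header_size + elt_size + trailer_size.
 */
static unsigned int
objs_per_bucket(size_t bucket_mem_size, size_t bucket_hdr_sz,
		size_t total_elt_sz)
{
	assert(total_elt_sz > 0 && bucket_mem_size >= bucket_hdr_sz);
	return (unsigned int)((bucket_mem_size - bucket_hdr_sz) /
			      total_elt_sz);
}
```

With integer division, any space left over after the last whole element
is simply unused, so larger elements give fewer objects per block.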

The series breaks the ABI since it changes rte_mempool_ops. The ABI
version is already bumped in [4].


[1] https://dpdk.org/ml/archives/dev/2018-January/088698.html
[2] https://dpdk.org/ml/archives/dev/2017-November/082335.html
[3] https://dpdk.org/ml/archives/dev/2018-March/093807.html
[4] https://dpdk.org/ml/archives/dev/2018-March/093196.html


RFCv2 -> v1:
  - rebased on top of [3]
  - cleanup deprecation notice when it is done
  - mark a new API experimental
  - move contig blocks dequeue debug checks/processing to the library function
  - add contig blocks get stats
  - add release notes

RFCv1 -> RFCv2:
  - change info API to get from the driver the information an API user
    needs to know the contiguous block size
  - use SPDX tags
  - avoid all objects affinity to single lcore
  - fix bucket get_count
  - fix NO_CACHE_ALIGN case in bucket mempool


Andrew Rybchenko (1):
  doc: advertise bucket mempool driver

Artem V. Andreev (5):
  mempool/bucket: implement bucket mempool manager
  mempool: implement abstract mempool info API
  mempool: support block dequeue operation
  mempool/bucket: implement block dequeue operation
  mempool/bucket: do not allow one lcore to grab all buckets

 MAINTAINERS                                        |   9 +
 config/common_base                                 |   2 +
 doc/guides/rel_notes/deprecation.rst               |   7 -
 doc/guides/rel_notes/release_18_05.rst             |   9 +
 drivers/mempool/Makefile                           |   1 +
 drivers/mempool/bucket/Makefile                    |  27 +
 drivers/mempool/bucket/meson.build                 |   9 +
 drivers/mempool/bucket/rte_mempool_bucket.c        | 627 +++++++++++++++++++++
 .../mempool/bucket/rte_mempool_bucket_version.map  |   4 +
 lib/librte_mempool/Makefile                        |   1 +
 lib/librte_mempool/meson.build                     |   2 +
 lib/librte_mempool/rte_mempool.c                   |  39 ++
 lib/librte_mempool/rte_mempool.h                   | 190 +++++++
 lib/librte_mempool/rte_mempool_ops.c               |  16 +
 lib/librte_mempool/rte_mempool_version.map         |   8 +
 mk/rte.app.mk                                      |   1 +
 16 files changed, 945 insertions(+), 7 deletions(-)
 create mode 100644 drivers/mempool/bucket/Makefile
 create mode 100644 drivers/mempool/bucket/meson.build
 create mode 100644 drivers/mempool/bucket/rte_mempool_bucket.c
 create mode 100644 drivers/mempool/bucket/rte_mempool_bucket_version.map

-- 
2.7.4


* [dpdk-dev] [PATCH v3 10/11] mempool: remove callback to register memory area
  2018-03-26 16:09  2% ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
                     ` (3 preceding siblings ...)
  2018-03-26 16:09  4%   ` [dpdk-dev] [PATCH v3 07/11] mempool: deprecate xmem functions Andrew Rybchenko
@ 2018-03-26 16:09  8%   ` Andrew Rybchenko
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback is no longer required since the new callback to populate
objects using a provided memory area supplies the same information.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v2 -> v3:
 - none

v1 -> v2:
 - none

RFCv2 -> v1:
 - advertise ABI changes in release notes

 doc/guides/rel_notes/deprecation.rst       |  1 -
 doc/guides/rel_notes/release_18_05.rst     |  2 ++
 lib/librte_mempool/rte_mempool.c           |  5 -----
 lib/librte_mempool/rte_mempool.h           | 31 ------------------------------
 lib/librte_mempool/rte_mempool_ops.c       | 14 --------------
 lib/librte_mempool/rte_mempool_version.map |  1 -
 6 files changed, 2 insertions(+), 52 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 473330d..5301259 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -63,7 +63,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 6a8db54..016c4ed 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -108,6 +108,8 @@ ABI Changes
   Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
   since its features are covered by ``calc_mem_size`` and ``populate``
   callbacks.
+  Callback ``register_memory_area`` has been removed from ``rte_mempool_ops``
+  since the new callback ``populate`` may be used instead of it.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 8c3b0b1..c58bcc6 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -355,11 +355,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	if (ret != 0)
 		return ret;
 
-	/* Notify memory area to mempool */
-	ret = rte_mempool_ops_register_memory_area(mp, vaddr, iova, len);
-	if (ret != -ENOTSUP && ret < 0)
-		return ret;
-
 	/* mempool is already populated */
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 9107f5a..314f909 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -371,12 +371,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Notify new memory area to mempool.
- */
-typedef int (*rte_mempool_ops_register_memory_area_t)
-(const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * Calculate memory size required to store given number of objects.
  *
  * If mempool objects are not required to be IOVA-contiguous
@@ -514,10 +508,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Notify new memory area to mempool
-	 */
-	rte_mempool_ops_register_memory_area_t register_memory_area;
-	/**
 	 * Optional callback to calculate memory size required to
 	 * store specified number of objects.
 	 */
@@ -639,27 +629,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops register_memory_area callback.
- * API to notify the mempool handler when a new memory area is added to pool.
- *
- * @param mp
- *   Pointer to the memory pool.
- * @param vaddr
- *   Pointer to the buffer virtual address.
- * @param iova
- *   Pointer to the buffer IO address.
- * @param len
- *   Pool size.
- * @return
- *   - 0: Success;
- *   - -ENOTSUP - doesn't support register_memory_area ops (valid error case).
- *   - Otherwise, rte_mempool_populate_phys fails thus pool create fails.
- */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
-				char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * @internal wrapper for mempool_ops calc_mem_size callback.
  * API to calculate size of memory required to store specified number of
  * object.
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 6ac669a..ea9be1e 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
 
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 }
 
 /* wrapper to notify new memory area to external mempool */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
-					rte_iova_t iova, size_t len)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->register_memory_area, -ENOTSUP);
-	return ops->register_memory_area(mp, vaddr, iova, len);
-}
-
-/* wrapper to notify new memory area to external mempool */
 ssize_t
 rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				uint32_t obj_num, uint32_t pg_shift,
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 637f73f..cf375db 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
 
-- 
2.7.4


* [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated
  2018-03-26 16:09  2% ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
@ 2018-03-26 16:09  7%   ` Andrew Rybchenko
  2018-04-04 15:08  0%     ` santosh
  2018-03-26 16:09  6%   ` [dpdk-dev] [PATCH v3 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The size of the memory chunk required to populate mempool objects
depends on how objects are stored in memory. Different mempool drivers
may have different requirements, so a new operation allows a driver to
calculate the memory size in accordance with its own requirements and
to advertise its requirements on minimum memory chunk size and
alignment in a generic way.
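
For illustration, the default size calculation described in the header
comment can be sketched as follows. This is a simplified stand-alone
model, not the library code: calc_mem_size_sketch() is a hypothetical
name, it takes the total element size directly instead of a mempool
pointer, and it leaves out the min_chunk_size and align outputs as well
as the capability flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model of the default calculation:
 *  - pg_shift == 0: page boundaries are ignored, the size is the
 *    product of total element size and number of objects;
 *  - otherwise: the number of pages needed to store the objects
 *    without any object crossing a page boundary.
 */
static size_t
calc_mem_size_sketch(uint32_t obj_num, size_t total_elt_sz,
		     uint32_t pg_shift)
{
	size_t pg_sz, obj_per_page, pg_num;

	assert(total_elt_sz > 0);
	if (pg_shift == 0)
		return total_elt_sz * obj_num;

	pg_sz = (size_t)1 << pg_shift;
	obj_per_page = pg_sz / total_elt_sz;
	if (obj_per_page == 0) {
		/* object larger than a page: round it up to whole pages */
		size_t pages_per_obj = (total_elt_sz + pg_sz - 1) / pg_sz;

		return pages_per_obj * pg_sz * obj_num;
	}

	pg_num = (obj_num + obj_per_page - 1) / obj_per_page;
	return pg_num << pg_shift;
}
```

For example, ten 1000-byte elements on 4 KiB pages fit four per page and
therefore need three pages, whereas with pg_shift set to 0 the same
elements need exactly 10000 bytes.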

Bump ABI version since the patch breaks it.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v2 -> v3:
 - none

v1 -> v2:
 - clarify min_chunk_size meaning
 - rebase on top of patch series which fixes library version in meson
   build

RFCv2 -> v1:
 - move default calc_mem_size callback to rte_mempool_ops_default.c
 - add ABI changes to release notes
 - name default callback consistently: rte_mempool_op_<callback>_default()
 - bump ABI version since it is the first patch which breaks ABI
 - describe default callback behaviour in detail
 - avoid introduction of internal function to cope with deprecation
   (keep it to deprecation patch)
 - move cache-line or page boundary chunk alignment to default callback
 - highlight that min_chunk_size and align parameters are output only

 doc/guides/rel_notes/deprecation.rst         |  3 +-
 doc/guides/rel_notes/release_18_05.rst       |  7 ++-
 lib/librte_mempool/Makefile                  |  3 +-
 lib/librte_mempool/meson.build               |  5 +-
 lib/librte_mempool/rte_mempool.c             | 43 +++++++-------
 lib/librte_mempool/rte_mempool.h             | 86 +++++++++++++++++++++++++++-
 lib/librte_mempool/rte_mempool_ops.c         | 18 ++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 38 ++++++++++++
 lib/librte_mempool/rte_mempool_version.map   |  7 +++
 9 files changed, 182 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6594585..e02d4ca 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,8 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize required memory chunk calculation,
-    customize objects population and allocate contiguous
+  - addition of new ops to customize objects population and allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index f2525bb..59583ea 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -80,6 +80,11 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Changed rte_mempool_ops structure.**
+
+  A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
+  to allow to customize required memory size calculation.
+
 
 Removed Items
 -------------
@@ -152,7 +157,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_latencystats.so.1
      librte_lpm.so.2
      librte_mbuf.so.3
-     librte_mempool.so.3
+   + librte_mempool.so.4
    + librte_meter.so.2
      librte_metrics.so.1
      librte_net.so.1
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 24e735a..072740f 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
 
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 712720f..9e3b527 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-version = 3
-sources = files('rte_mempool.c', 'rte_mempool_ops.c')
+version = 4
+sources = files('rte_mempool.c', 'rte_mempool_ops.c',
+		'rte_mempool_ops_default.c')
 headers = files('rte_mempool.h')
 deps += ['ring']
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d8e3720..dd2d0fe 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -561,10 +561,10 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
-	size_t size, total_elt_sz, align, pg_sz, pg_shift;
+	ssize_t mem_size;
+	size_t align, pg_sz, pg_shift;
 	rte_iova_t iova;
 	unsigned mz_id, n;
-	unsigned int mp_flags;
 	int ret;
 
 	ret = mempool_ops_alloc_once(mp);
@@ -575,29 +575,23 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_flags;
-
 	if (rte_eal_has_hugepages()) {
 		pg_shift = 0; /* not needed, zone is physically contiguous */
 		pg_sz = 0;
-		align = RTE_CACHE_LINE_SIZE;
 	} else {
 		pg_sz = getpagesize();
 		pg_shift = rte_bsf32(pg_sz);
-		align = pg_sz;
 	}
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
-		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
-						mp->flags);
+		size_t min_chunk_size;
+
+		mem_size = rte_mempool_ops_calc_mem_size(mp, n, pg_shift,
+				&min_chunk_size, &align);
+		if (mem_size < 0) {
+			ret = mem_size;
+			goto fail;
+		}
 
 		ret = snprintf(mz_name, sizeof(mz_name),
 			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
@@ -606,7 +600,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
-		mz = rte_memzone_reserve_aligned(mz_name, size,
+		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 			mp->socket_id, mz_flags, align);
 		/* not enough memory, retry with the biggest zone we have */
 		if (mz == NULL)
@@ -617,6 +611,12 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
+		if (mz->len < min_chunk_size) {
+			rte_memzone_free(mz);
+			ret = -ENOMEM;
+			goto fail;
+		}
+
 		if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG)
 			iova = RTE_BAD_IOVA;
 		else
@@ -649,13 +649,14 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 static size_t
 get_anon_size(const struct rte_mempool *mp)
 {
-	size_t size, total_elt_sz, pg_sz, pg_shift;
+	size_t size, pg_sz, pg_shift;
+	size_t min_chunk_size;
+	size_t align;
 
 	pg_sz = getpagesize();
 	pg_shift = rte_bsf32(pg_sz);
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-	size = rte_mempool_xmem_size(mp->size, total_elt_sz, pg_shift,
-					mp->flags);
+	size = rte_mempool_ops_calc_mem_size(mp, mp->size, pg_shift,
+					     &min_chunk_size, &align);
 
 	return size;
 }
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index e531a15..191255d 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -400,6 +400,62 @@ typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
 typedef int (*rte_mempool_ops_register_memory_area_t)
 (const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
 
+/**
+ * Calculate memory size required to store given number of objects.
+ *
+ * If mempool objects are not required to be IOVA-contiguous
+ * (the flag MEMPOOL_F_NO_IOVA_CONTIG is set), min_chunk_size defines
+ * virtually contiguous chunk size. Otherwise, if mempool objects must
+ * be IOVA-contiguous (the flag MEMPOOL_F_NO_IOVA_CONTIG is clear),
+ * min_chunk_size defines IOVA-contiguous chunk size.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location for required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
+		uint32_t obj_num,  uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
+/**
+ * Default way to calculate memory size required to store given number of
+ * objects.
+ *
+ * If page boundaries may be ignored, it is just a product of total
+ * object size including header and trailer and number of objects.
+ * Otherwise, it is a number of pages required to store given number of
+ * objects without crossing page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * If mempool driver requires object addresses to be block size aligned
+ * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
+ * reserved to be able to meet the requirement.
+ *
+ * Minimum size of memory chunk is either all required space, if
+ * capabilities say that whole memory area must be physically contiguous
+ * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * element size.
+ *
+ * Required memory chunk alignment is a maximum of page size and cache
+ * line size.
+ */
+ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+		uint32_t obj_num, uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -416,6 +472,11 @@ struct rte_mempool_ops {
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
+	/**
+	 * Optional callback to calculate memory size required to
+	 * store specified number of objects.
+	 */
+	rte_mempool_calc_mem_size_t calc_mem_size;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -565,6 +626,29 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
 				char *vaddr, rte_iova_t iova, size_t len);
 
 /**
+ * @internal wrapper for mempool_ops calc_mem_size callback.
+ * API to calculate size of memory required to store specified number of
+ * objects.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location for required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				      uint32_t obj_num, uint32_t pg_shift,
+				      size_t *min_chunk_size, size_t *align);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
@@ -1534,7 +1618,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * of objects. Assume that the memory buffer will be aligned at page
  * boundary.
  *
- * Note that if object size is bigger then page size, then it assumes
+ * Note that if object size is bigger than page size, then it assumes
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 0732255..26908cc 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -59,6 +59,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_count = h->get_count;
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
+	ops->calc_mem_size = h->calc_mem_size;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -123,6 +124,23 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
 	return ops->register_memory_area(mp, vaddr, iova, len);
 }
 
+/* wrapper to calculate the memory size required to store objects */
+ssize_t
+rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				uint32_t obj_num, uint32_t pg_shift,
+				size_t *min_chunk_size, size_t *align)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->calc_mem_size == NULL)
+		return rte_mempool_op_calc_mem_size_default(mp, obj_num,
+				pg_shift, min_chunk_size, align);
+
+	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
new file mode 100644
index 0000000..57fe79b
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016 6WIND S.A.
+ * Copyright(c) 2018 Solarflare Communications Inc.
+ */
+
+#include <rte_mempool.h>
+
+ssize_t
+rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+				     uint32_t obj_num, uint32_t pg_shift,
+				     size_t *min_chunk_size, size_t *align)
+{
+	unsigned int mp_flags;
+	int ret;
+	size_t total_elt_sz;
+	size_t mem_size;
+
+	/* Get mempool capabilities */
+	mp_flags = 0;
+	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
+	if ((ret < 0) && (ret != -ENOTSUP))
+		return ret;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
+					 mp->flags | mp_flags);
+
+	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
+		*min_chunk_size = mem_size;
+	else
+		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+
+	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
+
+	return mem_size;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 62b76f9..cb38189 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -51,3 +51,10 @@ DPDK_17.11 {
 	rte_mempool_populate_iova_tab;
 
 } DPDK_16.07;
+
+DPDK_18.05 {
+	global:
+
+	rte_mempool_op_calc_mem_size_default;
+
+} DPDK_17.11;
-- 
2.7.4

^ permalink raw reply	[relevance 7%]

* [dpdk-dev] [PATCH v3 07/11] mempool: deprecate xmem functions
  2018-03-26 16:09  2% ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
                     ` (2 preceding siblings ...)
  2018-03-26 16:09  6%   ` [dpdk-dev] [PATCH v3 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
@ 2018-03-26 16:09  4%   ` Andrew Rybchenko
  2018-03-26 16:09  8%   ` [dpdk-dev] [PATCH v3 10/11] mempool: remove callback to register memory area Andrew Rybchenko
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ, Thomas Monjalon

Move rte_mempool_xmem_size() code to an internal helper function
since it is required in two places: the deprecated rte_mempool_xmem_size()
and the non-deprecated rte_mempool_op_calc_mem_size_default().
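
For illustration only, a minimal standalone sketch of this split. The
names calc_mem_size_helper() and xmem_size() are hypothetical stand-ins,
not the DPDK symbols, and the helper body is an assumption that follows
the documented behaviour (ignore page boundaries when pg_shift is 0,
otherwise count the pages needed so that no object crosses a page
boundary). The deprecated attribute is spelled in GCC/Clang syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal helper: carries the whole calculation. */
static size_t
calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift)
{
	size_t obj_per_page, pg_num, pg_sz;

	/* the shift must fit into size_t */
	assert(pg_shift < sizeof(size_t) * 8);

	if (total_elt_sz == 0)
		return 0;

	/* pg_shift == 0 means page boundaries are ignored */
	if (pg_shift == 0)
		return total_elt_sz * elt_num;

	pg_sz = (size_t)1 << pg_shift;
	obj_per_page = pg_sz / total_elt_sz;

	/* object bigger than a page: round each object up to whole pages */
	if (obj_per_page == 0)
		return ((total_elt_sz + pg_sz - 1) & ~(pg_sz - 1)) * elt_num;

	/* number of pages needed when objects never cross a page boundary */
	pg_num = (elt_num + obj_per_page - 1) / obj_per_page;
	return pg_num << pg_shift;
}

/* Hypothetical deprecated entry point: same signature, only delegates. */
__attribute__((deprecated))
size_t
xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
	  unsigned int flags)
{
	(void)flags; /* no longer influences the result */
	return calc_mem_size_helper(elt_num, total_elt_sz, pg_shift);
}
```

With this layout, non-deprecated callers use the helper directly and
build without deprecation warnings, while existing users of the old
entry point keep the same results.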

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v2 -> v3:
 - none

v1 -> v2:
 - deprecate rte_mempool_populate_iova_tab()
 - add -Wno-deprecated-declarations to fix build errors because of
   rte_mempool_populate_iova_tab() deprecation
 - add @deprecated to deprecated functions description

RFCv2 -> v1:
 - advertise deprecation in release notes
 - factor out default memory size calculation into non-deprecated
   internal function to avoid usage of deprecated function internally
 - remove test for deprecated functions to address build issue because
   of usage of deprecated functions (it is easy to allow usage of
   deprecated function in Makefile, but very complicated in meson)

 doc/guides/rel_notes/deprecation.rst         |  7 -------
 doc/guides/rel_notes/release_18_05.rst       | 11 ++++++++++
 lib/librte_mempool/Makefile                  |  3 +++
 lib/librte_mempool/meson.build               | 12 +++++++++++
 lib/librte_mempool/rte_mempool.c             | 19 ++++++++++++++---
 lib/librte_mempool/rte_mempool.h             | 30 +++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops_default.c |  4 ++--
 test/test/test_mempool.c                     | 31 ----------------------------
 8 files changed, 74 insertions(+), 43 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 4deed9a..473330d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -60,13 +60,6 @@ Deprecation Notices
   - ``rte_eal_mbuf_default_mempool_ops``
 
 * mempool: several API and ABI changes are planned in v18.05.
-  The following functions, introduced for Xen, which is not supported
-  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
-  Therefore they will be deprecated in v18.05 and removed in v18.08:
-
-  - ``rte_mempool_xmem_create``
-  - ``rte_mempool_xmem_size``
-  - ``rte_mempool_xmem_usage``
 
   The following changes are planned:
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index c50f26c..6a8db54 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -74,6 +74,17 @@ API Changes
   Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
   used to achieve it without specific knowledge in the generic code.
 
+* **Deprecated mempool xmem functions.**
+
+  The following functions, introduced for Xen, which is not supported
+  anymore since v17.11, are hard to use and not used anywhere else in DPDK.
+  Therefore they were deprecated in v18.05 and will be removed in v18.08:
+
+  - ``rte_mempool_xmem_create``
+  - ``rte_mempool_xmem_size``
+  - ``rte_mempool_xmem_usage``
+  - ``rte_mempool_populate_iova_tab``
+
 
 ABI Changes
 -----------
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 072740f..2c46fdd 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -7,6 +7,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_mempool.a
 
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
+# from earlier deprecated rte_mempool_populate_phys_tab()
+CFLAGS += -Wno-deprecated-declarations
 LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 9e3b527..22e912a 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,6 +1,18 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
+extra_flags = []
+
+# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
+# from earlier deprecated rte_mempool_populate_phys_tab()
+extra_flags += '-Wno-deprecated-declarations'
+
+foreach flag: extra_flags
+	if cc.has_argument(flag)
+		cflags += flag
+	endif
+endforeach
+
 version = 4
 sources = files('rte_mempool.c', 'rte_mempool_ops.c',
 		'rte_mempool_ops_default.c')
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 40eedde..8c3b0b1 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -204,11 +204,13 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 
 
 /*
- * Calculate maximum amount of memory required to store given number of objects.
+ * Internal function to calculate required memory chunk size shared
+ * by default implementation of the corresponding callback and
+ * deprecated external function.
  */
 size_t
-rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      __rte_unused unsigned int flags)
+rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+				 uint32_t pg_shift)
 {
 	size_t obj_per_page, pg_num, pg_sz;
 
@@ -228,6 +230,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 }
 
 /*
+ * Calculate maximum amount of memory required to store given number of objects.
+ */
+size_t
+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
+		      __rte_unused unsigned int flags)
+{
+	return rte_mempool_calc_mem_size_helper(elt_num, total_elt_sz,
+						pg_shift);
+}
+
+/*
  * Calculate how much memory would be actually required with the
  * given memory footprint to store required number of elements.
  */
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 0b83d5e..9107f5a 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -427,6 +427,28 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal Helper function to calculate memory size required to store
+ * specified number of objects, assuming that the memory buffer will
+ * be aligned at page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * @param elt_num
+ *   Number of elements.
+ * @param total_elt_sz
+ *   The size of each element, including header and trailer, as returned
+ *   by rte_mempool_calc_obj_size().
+ * @param pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+size_t rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+		uint32_t pg_shift);
+
+/**
  * Function to be called for each populated object.
  *
  * @param[in] mp
@@ -855,6 +877,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
 		   int socket_id, unsigned flags);
 
 /**
+ * @deprecated
  * Create a new mempool named *name* in memory.
  *
  * The pool contains n elements of elt_size. Its size is set to n.
@@ -912,6 +935,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
  *   The pointer to the new allocated mempool, on success. NULL on error
  *   with rte_errno set appropriately. See rte_mempool_create() for details.
  */
+__rte_deprecated
 struct rte_mempool *
 rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 		unsigned cache_size, unsigned private_data_size,
@@ -1008,6 +1032,7 @@ int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
 	void *opaque);
 
 /**
+ * @deprecated
  * Add physical memory for objects in the pool at init
  *
  * Add a virtually contiguous memory chunk in the pool where objects can
@@ -1033,6 +1058,7 @@ int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
  *   On error, the chunks are not added in the memory list of the
  *   mempool and a negative errno is returned.
  */
+__rte_deprecated
 int rte_mempool_populate_iova_tab(struct rte_mempool *mp, char *vaddr,
 	const rte_iova_t iova[], uint32_t pg_num, uint32_t pg_shift,
 	rte_mempool_memchunk_free_cb_t *free_cb, void *opaque);
@@ -1652,6 +1678,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 	struct rte_mempool_objsz *sz);
 
 /**
+ * @deprecated
  * Get the size of memory required to store mempool elements.
  *
  * Calculate the maximum amount of memory required to store given number
@@ -1674,10 +1701,12 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * @return
  *   Required memory size aligned at page boundary.
  */
+__rte_deprecated
 size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
 	uint32_t pg_shift, unsigned int flags);
 
 /**
+ * @deprecated
  * Get the size of memory required to store mempool elements.
  *
  * Calculate how much memory would be actually required with the given
@@ -1705,6 +1734,7 @@ size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
  *   buffer is too small, return a negative value whose absolute value
  *   is the actual number of elements that can be stored in that buffer.
  */
+__rte_deprecated
 ssize_t rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
 	uint32_t pg_shift, unsigned int flags);
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 3defc15..fd63ca1 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -16,8 +16,8 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
-	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags);
+	mem_size = rte_mempool_calc_mem_size_helper(obj_num, total_elt_sz,
+						    pg_shift);
 
 	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
diff --git a/test/test/test_mempool.c b/test/test/test_mempool.c
index 63f921e..8d29af2 100644
--- a/test/test/test_mempool.c
+++ b/test/test/test_mempool.c
@@ -444,34 +444,6 @@ test_mempool_same_name_twice_creation(void)
 	return 0;
 }
 
-/*
- * Basic test for mempool_xmem functions.
- */
-static int
-test_mempool_xmem_misc(void)
-{
-	uint32_t elt_num, total_size;
-	size_t sz;
-	ssize_t usz;
-
-	elt_num = MAX_KEEP;
-	total_size = rte_mempool_calc_obj_size(MEMPOOL_ELT_SIZE, 0, NULL);
-	sz = rte_mempool_xmem_size(elt_num, total_size, MEMPOOL_PG_SHIFT_MAX,
-					0);
-
-	usz = rte_mempool_xmem_usage(NULL, elt_num, total_size, 0, 1,
-		MEMPOOL_PG_SHIFT_MAX, 0);
-
-	if (sz != (size_t)usz)  {
-		printf("failure @ %s: rte_mempool_xmem_usage(%u, %u) "
-			"returns: %#zx, while expected: %#zx;\n",
-			__func__, elt_num, total_size, sz, (size_t)usz);
-		return -1;
-	}
-
-	return 0;
-}
-
 static void
 walk_cb(struct rte_mempool *mp, void *userdata __rte_unused)
 {
@@ -596,9 +568,6 @@ test_mempool(void)
 	if (test_mempool_same_name_twice_creation() < 0)
 		goto err;
 
-	if (test_mempool_xmem_misc() < 0)
-		goto err;
-
 	/* test the stack handler */
 	if (test_mempool_basic(mp_stack, 1) < 0)
 		goto err;
-- 
2.7.4

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v3 06/11] mempool: remove callback to get capabilities
  2018-03-26 16:09  2% ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-26 16:09  7%   ` [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
  2018-03-26 16:09  6%   ` [dpdk-dev] [PATCH v3 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
@ 2018-03-26 16:09  6%   ` Andrew Rybchenko
  2018-03-26 16:09  4%   ` [dpdk-dev] [PATCH v3 07/11] mempool: deprecate xmem functions Andrew Rybchenko
  2018-03-26 16:09  8%   ` [dpdk-dev] [PATCH v3 10/11] mempool: remove callback to register memory area Andrew Rybchenko
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ, Santosh Shukla, Jerin Jacob

The callback was introduced to let generic code know the octeontx
mempool driver requirements: use a single physically contiguous
memory chunk to store all objects and align object addresses to the
total object size. Now these requirements are met using new
callbacks to calculate the required memory chunk size and to populate
objects using the provided memory chunk.
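
As a rough standalone sketch of why space for one extra object is
enough to meet the alignment requirement: the helper names below are
hypothetical, only the offset arithmetic is taken from the octeontx
populate callback in this patch, and page rounding is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Offset to skip so that the first object starts at a multiple of
 * total_elt_sz. An already aligned address still skips one full
 * element, so the result is always in [1, total_elt_sz].
 */
static size_t
blk_align_offset(uintptr_t vaddr, size_t total_elt_sz)
{
	assert(total_elt_sz != 0);
	return total_elt_sz - (vaddr % total_elt_sz);
}

/* Number of whole objects that fit in the chunk after the gap. */
static size_t
objs_after_align(uintptr_t vaddr, size_t len, size_t total_elt_sz)
{
	size_t off = blk_align_offset(vaddr, total_elt_sz);

	if (len < off)
		return 0;
	return (len - off) / total_elt_sz;
}
```

Since the skipped offset never exceeds total_elt_sz, a chunk of
(obj_num + 1) * total_elt_sz bytes always leaves room for obj_num
aligned objects, whatever the start address is.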

These capability flags are not used anywhere else.

Restricting capabilities to flags is not generic and is likely to
be insufficient to describe mempool driver features. If required
in the future, an API which returns structured information may be
added.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v2 -> v3:
 - none

v1 -> v2:
 - fix typo
 - rebase on top of patch which renames MEMPOOL_F_NO_PHYS_CONTIG

RFCv2 -> v1:
 - squash mempool/octeontx patches to add calc_mem_size and populate
   callbacks to this one in order to avoid breakages in the middle of
   patchset
 - advertise API changes in release notes

 doc/guides/rel_notes/deprecation.rst            |  1 -
 doc/guides/rel_notes/release_18_05.rst          | 11 +++++
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 59 +++++++++++++++++++++----
 lib/librte_mempool/rte_mempool.c                | 44 ++----------------
 lib/librte_mempool/rte_mempool.h                | 52 +---------------------
 lib/librte_mempool/rte_mempool_ops.c            | 14 ------
 lib/librte_mempool/rte_mempool_ops_default.c    | 15 +------
 lib/librte_mempool/rte_mempool_version.map      |  1 -
 8 files changed, 68 insertions(+), 129 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index c06fc67..4deed9a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -70,7 +70,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index abaefe5..c50f26c 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -66,6 +66,14 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Removed mempool capability flags and related functions.**
+
+  Flags ``MEMPOOL_F_CAPA_PHYS_CONTIG`` and
+  ``MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS`` were used by octeontx mempool
+  driver to customize generic mempool library behaviour.
+  Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
+  used to achieve it without specific knowledge in the generic code.
+
 
 ABI Changes
 -----------
@@ -86,6 +94,9 @@ ABI Changes
   to allow to customize required memory size calculation.
   A new callback ``populate`` has been added to ``rte_mempool_ops``
   to allow to customize objects population.
+  Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
+  since its features are covered by ``calc_mem_size`` and ``populate``
+  callbacks.
 
 
 Removed Items
diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index d143d05..64ed528 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -126,14 +126,29 @@ octeontx_fpavf_get_count(const struct rte_mempool *mp)
 	return octeontx_fpa_bufpool_free_count(pool);
 }
 
-static int
-octeontx_fpavf_get_capabilities(const struct rte_mempool *mp,
-				unsigned int *flags)
+static ssize_t
+octeontx_fpavf_calc_mem_size(const struct rte_mempool *mp,
+			     uint32_t obj_num, uint32_t pg_shift,
+			     size_t *min_chunk_size, size_t *align)
 {
-	RTE_SET_USED(mp);
-	*flags |= (MEMPOOL_F_CAPA_PHYS_CONTIG |
-			MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS);
-	return 0;
+	ssize_t mem_size;
+
+	/*
+	 * Simply need space for one more object to be able to
+	 * fulfil alignment requirements.
+	 */
+	mem_size = rte_mempool_op_calc_mem_size_default(mp, obj_num + 1,
+							pg_shift,
+							min_chunk_size, align);
+	if (mem_size >= 0) {
+		/*
+		 * Memory area which contains objects must be physically
+		 * contiguous.
+		 */
+		*min_chunk_size = mem_size;
+	}
+
+	return mem_size;
 }
 
 static int
@@ -150,6 +165,33 @@ octeontx_fpavf_register_memory_area(const struct rte_mempool *mp,
 	return octeontx_fpavf_pool_set_range(pool_bar, len, vaddr, gpool);
 }
 
+static int
+octeontx_fpavf_populate(struct rte_mempool *mp, unsigned int max_objs,
+			void *vaddr, rte_iova_t iova, size_t len,
+			rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+
+	if (iova == RTE_BAD_IOVA)
+		return -EINVAL;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	/* align object start address to a multiple of total_elt_sz */
+	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
+
+	if (len < off)
+		return -EINVAL;
+
+	vaddr = (char *)vaddr + off;
+	iova += off;
+	len -= off;
+
+	return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len,
+					       obj_cb, obj_cb_arg);
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.name = "octeontx_fpavf",
 	.alloc = octeontx_fpavf_alloc,
@@ -157,8 +199,9 @@ static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.enqueue = octeontx_fpavf_enqueue,
 	.dequeue = octeontx_fpavf_dequeue,
 	.get_count = octeontx_fpavf_get_count,
-	.get_capabilities = octeontx_fpavf_get_capabilities,
 	.register_memory_area = octeontx_fpavf_register_memory_area,
+	.calc_mem_size = octeontx_fpavf_calc_mem_size,
+	.populate = octeontx_fpavf_populate,
 };
 
 MEMPOOL_REGISTER_OPS(octeontx_fpavf_ops);
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d917dc7..40eedde 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -208,15 +208,9 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  */
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      unsigned int flags)
+		      __rte_unused unsigned int flags)
 {
 	size_t obj_per_page, pg_num, pg_sz;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	if (total_elt_sz == 0)
 		return 0;
@@ -240,18 +234,12 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 ssize_t
 rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
-	uint32_t pg_shift, unsigned int flags)
+	uint32_t pg_shift, __rte_unused unsigned int flags)
 {
 	uint32_t elt_cnt = 0;
 	rte_iova_t start, end;
 	uint32_t iova_idx;
 	size_t pg_sz = (size_t)1 << pg_shift;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	/* if iova is NULL, assume contiguous memory */
 	if (iova == NULL) {
@@ -345,8 +333,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	rte_iova_t iova, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
 	void *opaque)
 {
-	unsigned total_elt_sz;
-	unsigned int mp_capa_flags;
 	unsigned i = 0;
 	size_t off;
 	struct rte_mempool_memhdr *memhdr;
@@ -365,27 +351,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-
-	/* Get mempool capabilities */
-	mp_capa_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_capa_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_capa_flags;
-
-	/* Detect pool area has sufficient space for elements */
-	if (mp_capa_flags & MEMPOOL_F_CAPA_PHYS_CONTIG) {
-		if (len < total_elt_sz * mp->size) {
-			RTE_LOG(ERR, MEMPOOL,
-				"pool area %" PRIx64 " not enough\n",
-				(uint64_t)len);
-			return -ENOSPC;
-		}
-	}
-
 	memhdr = rte_zmalloc("MEMPOOL_MEMHDR", sizeof(*memhdr), 0);
 	if (memhdr == NULL)
 		return -ENOMEM;
@@ -397,10 +362,7 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	memhdr->free_cb = free_cb;
 	memhdr->opaque = opaque;
 
-	if (mp_capa_flags & MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS)
-		/* align object start address to a multiple of total_elt_sz */
-		off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
-	else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
+	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
 		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 754261e..0b83d5e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -246,24 +246,6 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
-/**
- * This capability flag is advertised by a mempool handler, if the whole
- * memory area containing the objects must be physically contiguous.
- * Note: This flag should not be passed by application.
- */
-#define MEMPOOL_F_CAPA_PHYS_CONTIG 0x0040
-/**
- * This capability flag is advertised by a mempool handler. Used for a case
- * where mempool driver wants object start address(vaddr) aligned to block
- * size(/ total element size).
- *
- * Note:
- * - This flag should not be passed by application.
- *   Flag used for mempool driver only.
- * - Mempool driver must also set MEMPOOL_F_CAPA_PHYS_CONTIG flag along with
- *   MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS.
- */
-#define MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS 0x0080
 
 /**
  * @internal When debug is enabled, store some statistics.
@@ -389,12 +371,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Get the mempool capabilities.
- */
-typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
-		unsigned int *flags);
-
-/**
  * Notify new memory area to mempool.
  */
 typedef int (*rte_mempool_ops_register_memory_area_t)
@@ -440,13 +416,7 @@ typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
- * If mempool driver requires object addresses to be block size aligned
- * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
- * reserved to be able to meet the requirement.
- *
- * Minimum size of memory chunk is either all required space, if
- * capabilities say that whole memory area must be physically contiguous
- * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * Minimum size of memory chunk is a maximum of the page size and total
  * element size.
  *
  * Required memory chunk alignment is a maximum of page size and cache
@@ -522,10 +492,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Get the mempool capabilities
-	 */
-	rte_mempool_get_capabilities_t get_capabilities;
-	/**
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
@@ -651,22 +617,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops get_capabilities callback.
- *
- * @param mp [in]
- *   Pointer to the memory pool.
- * @param flags [out]
- *   Pointer to the mempool flags.
- * @return
- *   - 0: Success; The mempool driver has advertised his pool capabilities in
- *   flags param.
- *   - -ENOTSUP - doesn't support get_capabilities ops (valid case).
- *   - Otherwise, pool create fails.
- */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags);
-/**
  * @internal wrapper for mempool_ops register_memory_area callback.
  * API to notify the mempool handler when a new memory area is added to pool.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 1a7f39f..6ac669a 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 	return ops->get_count(mp);
 }
 
-/* wrapper to get external mempool capabilities. */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->get_capabilities, -ENOTSUP);
-	return ops->get_capabilities(mp, flags);
-}
-
 /* wrapper to notify new memory area to external mempool */
 int
 rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57295f7..3defc15 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -11,26 +11,15 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 				     uint32_t obj_num, uint32_t pg_shift,
 				     size_t *min_chunk_size, size_t *align)
 {
-	unsigned int mp_flags;
-	int ret;
 	size_t total_elt_sz;
 	size_t mem_size;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
 	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags | mp_flags);
+					 mp->flags);
 
-	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
-		*min_chunk_size = mem_size;
-	else
-		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
 	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
 
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 41a0b09..637f73f 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_get_capabilities;
 	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
-- 
2.7.4

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver
    2018-03-10 15:39  3% ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-25 16:20  2% ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
@ 2018-03-26 16:09  2% ` Andrew Rybchenko
  2018-03-26 16:09  7%   ` [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
                     ` (4 more replies)
  2018-03-26 16:12  3% ` [dpdk-dev] [PATCH v1 0/6] mempool: add bucket driver Andrew Rybchenko
  3 siblings, 5 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev
  Cc: Olivier MATZ, Thomas Monjalon, Anatoly Burakov, Santosh Shukla,
	Jerin Jacob, Hemant Agrawal, Shreyansh Jain

The patch series should be applied on top of [7].

The initial patch series [1] is split into two to simplify processing.
The second series relies on this one and will add bucket mempool driver
and related ops.

The patch series has generic enhancements suggested by Olivier.
Basically it adds driver callbacks to calculate the required memory size
and to populate objects using a provided memory area. This allows removing
the so-called capability flags used before to tell generic code how to
allocate and slice allocated memory into mempool objects.
The cleanup which removes get_capabilities and register_memory_area is
not strictly required, but I think it is the right thing to do.
Existing mempool drivers are updated.

rte_mempool_populate_iova_tab() is also deprecated in v2 as agreed in [2].
Unfortunately it requires addition of the -Wno-deprecated-declarations flag
to librte_mempool since the function is used by the previously deprecated
rte_mempool_populate_phys_tab(). If the latter may be removed in this
release, we can avoid adding the flag that allows usage of deprecated
functions.

One open question remains from previous review [3].

The patch series interferes with memory hotplug for DPDK [4] ([5] to be
precise). So, a rebase may be required.

A new patch is added to the series to rename MEMPOOL_F_NO_PHYS_CONTIG
as MEMPOOL_F_NO_IOVA_CONTIG as agreed in [6].
MEMPOOL_F_CAPA_PHYS_CONTIG is not renamed since it is removed in this
patchset.

It breaks the ABI since it changes rte_mempool_ops. It also removes
rte_mempool_ops_register_memory_area() and
rte_mempool_ops_get_capabilities() since the corresponding callbacks are
removed.

Internal global functions are not listed in the map file since they are
not part of the external API.

[1] https://dpdk.org/ml/archives/dev/2018-January/088698.html
[2] https://dpdk.org/ml/archives/dev/2018-March/093186.html
[3] https://dpdk.org/ml/archives/dev/2018-March/093329.html
[4] https://dpdk.org/ml/archives/dev/2018-March/092070.html
[5] https://dpdk.org/ml/archives/dev/2018-March/092088.html
[6] https://dpdk.org/ml/archives/dev/2018-March/093345.html
[7] https://dpdk.org/ml/archives/dev/2018-March/093196.html

v2 -> v3:
  - fix build error in mempool/dpaa: prepare to remove register memory area op

v1 -> v2:
  - deprecate rte_mempool_populate_iova_tab()
  - add patch to fix memory leak if no objects are populated
  - add patch to rename MEMPOOL_F_NO_PHYS_CONTIG
  - minor fixes (typos, blank line at the end of file)
  - highlight meaning of min_chunk_size (when it is virtual or
    physical contiguous)
  - make sure that mempool is initialized in rte_mempool_populate_anon()
  - move patch to ensure that mempool is initialized earlier in the series

RFCv2 -> v1:
  - split the series in two
  - squash octeontx patches which implement calc_mem_size and populate
    callbacks into the patch which removes get_capabilities since it is
    the easiest way to untangle the tangle of tightly related library
    functions and flags advertised by the driver
  - consistently name default callbacks
  - move default callbacks to dedicated file
  - see detailed description in patches

RFCv1 -> RFCv2:
  - add driver ops to calculate required memory size and populate
    mempool objects, remove extra flags which were required before
    to control it
  - transition of octeontx and dpaa drivers to the new callbacks
  - change info API to get information from driver required to
    API user to know contiguous block size
  - remove get_capabilities (not required any more and may be
    substituted with more in info get API)
  - remove register_memory_area since it is substituted with
    populate callback which can do more
  - use SPDX tags
  - avoid all objects affinity to single lcore
  - fix bucket get_count
  - deprecate XMEM API
  - avoid introduction of a new function to flush cache
  - fix NO_CACHE_ALIGN case in bucket mempool

Andrew Rybchenko (9):
  mempool: fix memhdr leak when no objects are populated
  mempool: rename flag to control IOVA-contiguous objects
  mempool: add op to calculate memory size to be allocated
  mempool: add op to populate objects using provided memory
  mempool: remove callback to get capabilities
  mempool: deprecate xmem functions
  mempool/octeontx: prepare to remove register memory area op
  mempool/dpaa: prepare to remove register memory area op
  mempool: remove callback to register memory area

Artem V. Andreev (2):
  mempool: ensure the mempool is initialized before populating
  mempool: support flushing the default cache of the mempool

 doc/guides/rel_notes/deprecation.rst            |  12 +-
 doc/guides/rel_notes/release_18_05.rst          |  33 ++-
 drivers/mempool/dpaa/dpaa_mempool.c             |  13 +-
 drivers/mempool/octeontx/rte_mempool_octeontx.c |  64 ++++--
 drivers/net/thunderx/nicvf_ethdev.c             |   2 +-
 lib/librte_mempool/Makefile                     |   6 +-
 lib/librte_mempool/meson.build                  |  17 +-
 lib/librte_mempool/rte_mempool.c                | 179 ++++++++-------
 lib/librte_mempool/rte_mempool.h                | 280 +++++++++++++++++-------
 lib/librte_mempool/rte_mempool_ops.c            |  37 ++--
 lib/librte_mempool/rte_mempool_ops_default.c    |  51 +++++
 lib/librte_mempool/rte_mempool_version.map      |  10 +-
 test/test/test_mempool.c                        |  31 ---
 13 files changed, 485 insertions(+), 250 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

-- 
2.7.4

^ permalink raw reply	[relevance 2%]
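
The callback scheme described in the cover letter above can be sketched
roughly as follows. This is a minimal, self-contained illustration, not
the DPDK code: struct sketch_pool, struct sketch_ops and all sketch_*
names are hypothetical stand-ins, the cache line size of 64 is an
assumption, and the total size is simplified to obj_num * element size
(the real default additionally accounts for page boundaries via
rte_mempool_xmem_size()). The NULL fallback mirrors the check visible in
rte_mempool_ops_populate() in patch 05/11.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE_SIZE 64 /* assumed cache line size */
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical stand-in for the few struct rte_mempool fields used here. */
struct sketch_pool {
	size_t header_size;
	size_t elt_size;
	size_t trailer_size;
};

typedef long long (*sketch_calc_mem_size_t)(const struct sketch_pool *mp,
		uint32_t obj_num, uint32_t pg_shift,
		size_t *min_chunk_size, size_t *align);

/* Hypothetical stand-in for struct rte_mempool_ops. */
struct sketch_ops {
	sketch_calc_mem_size_t calc_mem_size; /* optional, NULL means default */
};

/* Default: a chunk must hold at least one page or one whole element. */
static long long
sketch_calc_mem_size_default(const struct sketch_pool *mp, uint32_t obj_num,
		uint32_t pg_shift, size_t *min_chunk_size, size_t *align)
{
	size_t total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
	size_t pg_sz = (size_t)1 << pg_shift;

	*min_chunk_size = SKETCH_MAX(pg_sz, total_elt_sz);
	*align = SKETCH_MAX((size_t)SKETCH_CACHE_LINE_SIZE, pg_sz);

	/* Simplification: ignores padding lost at page boundaries. */
	return (long long)obj_num * (long long)total_elt_sz;
}

/* Wrapper: use the driver callback if set, else the default. */
static long long
sketch_ops_calc_mem_size(const struct sketch_ops *ops,
		const struct sketch_pool *mp, uint32_t obj_num,
		uint32_t pg_shift, size_t *min_chunk_size, size_t *align)
{
	if (ops->calc_mem_size == NULL)
		return sketch_calc_mem_size_default(mp, obj_num, pg_shift,
						    min_chunk_size, align);

	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size,
				  align);
}
```

With this shape, generic code no longer needs capability flags: a driver
with special layout requirements supplies its own callback, and every
other driver gets the default by leaving the pointer NULL.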

* [dpdk-dev] [PATCH v3 05/11] mempool: add op to populate objects using provided memory
  2018-03-26 16:09  2% ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-26 16:09  7%   ` [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
@ 2018-03-26 16:09  6%   ` Andrew Rybchenko
  2018-03-26 16:09  6%   ` [dpdk-dev] [PATCH v3 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-26 16:09 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback allows customizing how objects are stored in the
memory chunk. A default implementation of the callback, which simply
puts objects one by one, is available.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v2 -> v3:
 - none

v1 -> v2:
 - fix memory leak if off is bigger than len

RFCv2 -> v1:
 - advertise ABI changes in release notes
 - use consistent name for default callback:
   rte_mempool_op_<callback>_default()
 - add opaque data pointer to populated object callback
 - move default callback to dedicated file

 doc/guides/rel_notes/deprecation.rst         |  2 +-
 doc/guides/rel_notes/release_18_05.rst       |  2 +
 lib/librte_mempool/rte_mempool.c             | 23 ++++---
 lib/librte_mempool/rte_mempool.h             | 90 ++++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops.c         | 21 +++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 24 ++++++++
 lib/librte_mempool/rte_mempool_version.map   |  1 +
 7 files changed, 149 insertions(+), 14 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e02d4ca..c06fc67 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,7 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize objects population and allocate contiguous
+  - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 59583ea..abaefe5 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -84,6 +84,8 @@ ABI Changes
 
   A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
   to allow to customize required memory size calculation.
+  A new callback ``populate`` has been added to ``rte_mempool_ops``
+  to allow to customize objects population.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index dd2d0fe..d917dc7 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,7 +99,8 @@ static unsigned optimize_object_size(unsigned obj_size)
 }
 
 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
+mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque,
+		 void *obj, rte_iova_t iova)
 {
 	struct rte_mempool_objhdr *hdr;
 	struct rte_mempool_objtlr *tlr __rte_unused;
@@ -116,9 +117,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
 	tlr = __mempool_get_trailer(obj);
 	tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
 #endif
-
-	/* enqueue in ring */
-	rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
 }
 
 /* call obj_cb() for each mempool element */
@@ -407,17 +405,16 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
 
-	while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
-		off += mp->header_size;
-		if (iova == RTE_BAD_IOVA)
-			mempool_add_elem(mp, (char *)vaddr + off,
-				RTE_BAD_IOVA);
-		else
-			mempool_add_elem(mp, (char *)vaddr + off, iova + off);
-		off += mp->elt_size + mp->trailer_size;
-		i++;
+	if (off > len) {
+		ret = -EINVAL;
+		goto fail;
 	}
 
+	i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
+		(char *)vaddr + off,
+		(iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
+		len - off, mempool_add_elem, NULL);
+
 	/* not enough room to store one object */
 	if (i == 0) {
 		ret = -EINVAL;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 191255d..754261e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -456,6 +456,63 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		uint32_t obj_num, uint32_t pg_shift,
 		size_t *min_chunk_size, size_t *align);
 
+/**
+ * Function to be called for each populated object.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] opaque
+ *   An opaque pointer passed to iterator.
+ * @param[in] vaddr
+ *   Object virtual address.
+ * @param[in] iova
+ *   Input/output virtual address of the object or RTE_BAD_IOVA.
+ */
+typedef void (rte_mempool_populate_obj_cb_t)(struct rte_mempool *mp,
+		void *opaque, void *vaddr, rte_iova_t iova);
+
+/**
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * Populated objects should be enqueued to the pool, e.g. using
+ * rte_mempool_ops_enqueue_bulk().
+ *
+ * If the given IO address is unknown (iova = RTE_BAD_IOVA),
+ * the chunk doesn't need to be physically contiguous (only virtually),
+ * and allocated objects may span two pages.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+typedef int (*rte_mempool_populate_t)(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
+/**
+ * Default way to populate memory pool object using provided memory
+ * chunk: just slice objects one by one.
+ */
+int rte_mempool_op_populate_default(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -477,6 +534,11 @@ struct rte_mempool_ops {
 	 * store specified number of objects.
 	 */
 	rte_mempool_calc_mem_size_t calc_mem_size;
+	/**
+	 * Optional callback to populate mempool objects using
+	 * provided memory chunk.
+	 */
+	rte_mempool_populate_t populate;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -649,6 +711,34 @@ ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				      size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal wrapper for mempool_ops populate callback.
+ *
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+int rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+			     void *vaddr, rte_iova_t iova, size_t len,
+			     rte_mempool_populate_obj_cb_t *obj_cb,
+			     void *obj_cb_arg);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 26908cc..1a7f39f 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -60,6 +60,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
+	ops->populate = h->populate;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -141,6 +142,26 @@ rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
 }
 
+/* wrapper to populate memory pool objects using provided memory chunk */
+int
+rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+				void *vaddr, rte_iova_t iova, size_t len,
+				rte_mempool_populate_obj_cb_t *obj_cb,
+				void *obj_cb_arg)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->populate == NULL)
+		return rte_mempool_op_populate_default(mp, max_objs, vaddr,
+						       iova, len, obj_cb,
+						       obj_cb_arg);
+
+	return ops->populate(mp, max_objs, vaddr, iova, len, obj_cb,
+			     obj_cb_arg);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57fe79b..57295f7 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -36,3 +36,27 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	return mem_size;
 }
+
+int
+rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+	unsigned int i;
+	void *obj;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
+		off += mp->header_size;
+		obj = (char *)vaddr + off;
+		obj_cb(mp, obj_cb_arg, obj,
+		       (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off));
+		rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
+		off += mp->elt_size + mp->trailer_size;
+	}
+
+	return i;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index cb38189..41a0b09 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -56,5 +56,6 @@ DPDK_18.05 {
 	global:
 
 	rte_mempool_op_calc_mem_size_default;
+	rte_mempool_op_populate_default;
 
 } DPDK_17.11;
-- 
2.7.4

^ permalink raw reply	[relevance 6%]
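
The "slice objects one by one" behaviour of the default populate callback
in the patch above can be illustrated with a small stand-alone sketch.
The sketch_* names, struct sketch_pool, struct sketch_stats and
SKETCH_BAD_IOVA are hypothetical stand-ins (for struct rte_mempool and
RTE_BAD_IOVA), and the enqueue step is only marked by a comment, so this
shows the offset arithmetic rather than the real driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BAD_IOVA UINT64_MAX /* stand-in for RTE_BAD_IOVA */

/* Hypothetical stand-in for the struct rte_mempool fields used here. */
struct sketch_pool {
	size_t header_size;
	size_t elt_size;
	size_t trailer_size;
};

/* Called once per populated object, like rte_mempool_populate_obj_cb_t. */
typedef void (sketch_obj_cb_t)(struct sketch_pool *mp, void *opaque,
		void *vaddr, uint64_t iova);

/* Slice [vaddr, vaddr + len) into at most max_objs objects. */
static int
sketch_populate(struct sketch_pool *mp, unsigned int max_objs,
		void *vaddr, uint64_t iova, size_t len,
		sketch_obj_cb_t *obj_cb, void *obj_cb_arg)
{
	size_t total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
	size_t off;
	unsigned int i;

	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
		off += mp->header_size; /* object starts after its header */
		obj_cb(mp, obj_cb_arg, (char *)vaddr + off,
		       (iova == SKETCH_BAD_IOVA) ? SKETCH_BAD_IOVA : iova + off);
		/* a real driver would enqueue the object to the pool here */
		off += mp->elt_size + mp->trailer_size;
	}

	return (int)i;
}

/* Example callback state: object count and the last IO address seen. */
struct sketch_stats {
	unsigned int count;
	uint64_t last_iova;
};

static void
sketch_count_cb(struct sketch_pool *mp, void *opaque, void *vaddr,
		uint64_t iova)
{
	struct sketch_stats *st = opaque;

	(void)mp;
	(void)vaddr;
	st->count++;
	st->last_iova = iova;
}
```

The opaque pointer is what lets the caller carry its own state through
the driver callback, which is why the patch adds it to
mempool_add_elem() even though that function does not use it.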

* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
  2018-03-25 18:17  0%     ` Vladimir Medvedkin
@ 2018-03-26  9:50  0%       ` Bruce Richardson
  2018-03-29 19:59  0%         ` Vladimir Medvedkin
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2018-03-26  9:50 UTC (permalink / raw)
  To: Vladimir Medvedkin; +Cc: dev

On Sun, Mar 25, 2018 at 09:17:20PM +0300, Vladimir Medvedkin wrote:
> Hi,
> 
> 2018-03-14 14:09 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:
> 
> > On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> > > RIB is an alternative to current LPM library.
> > > It solves the following problems
> > >  - Increases the speed of control plane operations against lpm such as
> > >    adding/deleting routes
> > >  - Adds abstraction from dataplane algorithms, so it is possible to add
> > >    different ip route lookup algorithms such as DXR/poptrie/lpc-trie/etc
> > >    in addition to current dir24_8
> > >  - It is possible to keep user defined application specific additional
> > >    information in struct rte_rib_node which represents route entry.
> > >    It can be next hop/set of next hops (i.e. active and feasible),
> > >    pointers to link rte_rib_node based on some criteria (i.e. next_hop),
> > >    plenty of additional control plane information.
> > >  - For dir24_8 implementation it is possible to remove
> > rte_lpm_tbl_entry.depth
> > >    field that helps to save 6 bits.
> > >  - Also new dir24_8 implementation supports different next_hop sizes
> > >    (1/2/4/8 bytes per next hop)
> > >  - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eliminate ternary
> > operator.
> > >    Instead it returns special default value if there is no route.
> > >
> > > Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> > > ---
> > >  config/common_base                 |   6 +
> > >  doc/api/doxy-api.conf              |   1 +
> > >  lib/Makefile                       |   2 +
> > >  lib/librte_rib/Makefile            |  22 ++
> > >  lib/librte_rib/rte_dir24_8.c       | 482 ++++++++++++++++++++++++++++++
> > +++
> > >  lib/librte_rib/rte_dir24_8.h       | 116 ++++++++
> > >  lib/librte_rib/rte_rib.c           | 526 ++++++++++++++++++++++++++++++
> > +++++++
> > >  lib/librte_rib/rte_rib.h           | 322 +++++++++++++++++++++++
> > >  lib/librte_rib/rte_rib_version.map |  18 ++
> > >  mk/rte.app.mk                      |   1 +
> > >  10 files changed, 1496 insertions(+)
> > >  create mode 100644 lib/librte_rib/Makefile
> > >  create mode 100644 lib/librte_rib/rte_dir24_8.c
> > >  create mode 100644 lib/librte_rib/rte_dir24_8.h
> > >  create mode 100644 lib/librte_rib/rte_rib.c
> > >  create mode 100644 lib/librte_rib/rte_rib.h
> > >  create mode 100644 lib/librte_rib/rte_rib_version.map
> > >
> >
> > First pass review comments. For now just reviewed the main public header
> > file rte_rib.h. Later reviews will cover the other files as best I can.
> >
> > /Bruce
> >
> > <snip>
> > > diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
> > > new file mode 100644
> > > index 0000000..6eac8fb
> > > --- /dev/null
> > > +++ b/lib/librte_rib/rte_rib.h
> > > @@ -0,0 +1,322 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> > > + */
> > > +
> > > +#ifndef _RTE_RIB_H_
> > > +#define _RTE_RIB_H_
> > > +
> > > +/**
> > > + * @file
> > > + * Compressed trie implementation for Longest Prefix Match
> > > + */
> > > +
> > > +/** @internal Macro to enable/disable run-time checks. */
> > > +#if defined(RTE_LIBRTE_RIB_DEBUG)
> > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do {    \
> > > +     if (cond)                                       \
> > > +             return retval;                          \
> > > +} while (0)
> > > +#else
> > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval)
> > > +#endif
> >
> > use RTE_ASSERT?
> >
> it was done just like it was done in the LPM lib. But if you think it
> should be RTE_ASSERT so be it.
> 
> 
> >
> > > +
> > > +#define RTE_RIB_VALID_NODE   1
> >
> > should there be an INVALID_NODE macro?
> >
> No
> 
> 
> >
> > > +#define RTE_RIB_GET_NXT_ALL  0
> > > +#define RTE_RIB_GET_NXT_COVER        1
> > > +
> > > +#define RTE_RIB_INVALID_ROUTE        0
> > > +#define RTE_RIB_VALID_ROUTE  1
> > > +
> > > +/** Max number of characters in RIB name. */
> > > +#define RTE_RIB_NAMESIZE     64
> > > +
> > > +/** Maximum depth value possible for IPv4 RIB. */
> > > +#define RTE_RIB_MAXDEPTH     32
> >
> > I think we should have IPv4 in the name here. Will it not be extended to
> > support IPv6 in future?
> >
> I think there should be a separate implementation of the library for ipv6
> 
I can understand the need for a separate LPM implementation, but should
they both not be under the same rib library?

> 
> >
> > > +
> > > +/**
> > > + * Macro to check if prefix1 {key1/depth1}
> > > + * is covered by prefix2 {key2/depth2}
> > > + */
> > > +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2)
> >      \
> > > +     ((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\
> > > +             && (depth1 > depth2))
> > Neat check!
> >
> > Any particular reason for using UINT64_MAX here rather than UINT32_MAX?
> 
> in case when depth2 = 0 UINT32_MAX shifted left by 32 bit will remain
> UINT32_MAX because shift count will be masked to 5 bits.
> 
> I think you can avoid the casting and have a slightly shorter mask by
> > changing "(uint32_t)(UINT64_MAX << (32 - depth2)" to
> > "~(UINT32_MAX >> depth2)"
> > I'd also suggest for readability putting the second check first, and,
> > for maintainability, using an inline function rather than a macro.
> >
>  Agree, it looks clearer
> 
> 
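For reference, the inline-function form suggested above could look
roughly like the sketch below. The name sketch_rib_is_covered() is
hypothetical and this is not code from the patch. Doing the depth check
first means the shift count is at most 31 whenever the mask is computed
(given depth1 <= 32), so the 32-bit shift problem that motivated
UINT64_MAX does not arise.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if prefix1 {key1/depth1} is covered by prefix2 {key2/depth2},
 * i.e. prefix1 is strictly more specific and agrees with prefix2 on the
 * first depth2 bits.
 */
static inline bool
sketch_rib_is_covered(uint32_t key1, uint8_t depth1,
		      uint32_t key2, uint8_t depth2)
{
	/* Depth check first: below this line depth2 is at most 31. */
	if (depth1 > 32 || depth1 <= depth2)
		return false;

	/* Mask with the top depth2 bits set; depth2 == 0 gives mask 0. */
	return ((key1 ^ key2) & (uint32_t)~(UINT32_MAX >> depth2)) == 0;
}
```
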
> > > +
> > > +/** @internal Macro to get next node in tree*/
> > > +#define RTE_RIB_GET_NXT_NODE(node, key)
> >       \
> > > +     ((key & (1 << (31 - node->depth))) ? node->right : node->left)
> > > +/** @internal Macro to check if node is right child*/
> > > +#define RTE_RIB_IS_RIGHT_NODE(node)  (node->parent->right == node)
> >
> > Again, consider inline fns rather than macros.
> >
> Ok
> 
> For the latter macro, rather than doing additional pointer derefs to
> > parent, can you also get if it's a right node by using:
> > "(node->key & (1 << (32 - node->depth)))"?
> >
> No, it is not possible. Decision whether node be left or right is made
> using parent and child common depth.
> Consider case with 10.0.0.0/8 and 10.128.0.0/24. In this way common depth
> will be /8 and 10.128.0.0/24 will be right child.
> 
> 
> > > +
> > > +
> > > +struct rte_rib_node {
> > > +     struct rte_rib_node *left;
> > > +     struct rte_rib_node *right;
> > > +     struct rte_rib_node *parent;
> > > +     uint32_t        key;
> > > +     uint8_t         depth;
> > > +     uint8_t         flag;
> > > +     uint64_t        nh;
> > > +     uint64_t        ext[0];
> > > +};
> > > +
> > > +struct rte_rib;
> > > +
> > > +/** Type of FIB struct*/
> > > +enum rte_rib_type {
> > > +     RTE_RIB_DIR24_8_1B,
> > > +     RTE_RIB_DIR24_8_2B,
> > > +     RTE_RIB_DIR24_8_4B,
> > > +     RTE_RIB_DIR24_8_8B,
> > > +     RTE_RIB_TYPE_MAX
> > > +};
> >
> > If the plan is to support multiple underlying fib types and algorithms
> > under the rib library, would it not be better to separate out the
> > algorithm part from the data storage part? So have the type just be
> > DIR_24_8, and have the 1, 2, 4 or 8 specified separately.
> >
> Yes, we were talk about it in IRC, agree. Now I pass next hop size in
> union rte_rib_fib_conf inside rte_rib_conf
> 
> 
> >
> > > +
> > > +enum rte_rib_op {
> > > +     RTE_RIB_ADD,
> > > +     RTE_RIB_DEL
> > > +};
> > > +
> > > +/** RIB nodes allocation type */
> > > +enum rte_rib_alloc_type {
> > > +     RTE_RIB_MALLOC,
> > > +     RTE_RIB_MEMPOOL,
> > > +     RTE_RIB_ALLOC_MAX
> > > +};
> >
> > Not sure you need this any more. Malloc allocations and mempool
> > allocations are now pretty much the same thing.
> >
> Actually I think to remove malloc. On performance tests with
> adding/deleting huge amount of routes malloc is slower. Maybe because of
> fragmentation.
> What do you think?
> 
Yes, definitely mempool allocations are the way to go!

> 
> > > +
> > > +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key,
> > > +     uint8_t depth, uint64_t next_hop, enum rte_rib_op op);
> >
> > Do you anticipate more ops in future than just add and delete? If not,
> > why not just split this function into two and drop the op struct.
> >
> It is difficult question. I'm not ready to make decision at the moment.
> 
> 
> >
> > > +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t *ips,
> > > +     uint64_t *next_hops, const unsigned n);
> > > +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib
> > *rib);
> > > +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib,
> > > +     struct rte_rib_node *node);
> > > +
> > > +struct rte_rib {
> > > +     char name[RTE_RIB_NAMESIZE];
> > > +     /*pointer to rib trie*/
> > > +     struct rte_rib_node     *trie;
> > > +     /*pointer to dataplane struct*/
> > > +     void    *fib;
> > > +     /*prefix modification*/
> > > +     rte_rib_modify_fn_t     modify;
> > > +     /* Bulk lookup fn*/
> > > +     rte_rib_tree_lookup_fn_t        lookup;
> > > +     /*alloc trie element*/
> > > +     rte_rib_alloc_node_fn_t alloc_node;
> > > +     /*free trie element*/
> > > +     rte_rib_free_node_fn_t  free_node;
> > > +     struct rte_mempool      *node_pool;
> > > +     uint32_t                cur_nodes;
> > > +     uint32_t                cur_routes;
> > > +     int                     max_nodes;
> > > +     int                     node_sz;
> > > +     enum rte_rib_type       type;
> > > +     enum rte_rib_alloc_type alloc_type;
> > > +};
> > > +
> > > +/** RIB configuration structure */
> > > +struct rte_rib_conf {
> > > +     enum rte_rib_type       type;
> > > +     enum rte_rib_alloc_type alloc_type;
> > > +     int     max_nodes;
> > > +     size_t  node_sz;
> > > +     uint64_t def_nh;
> > > +};
> > > +
> > > +/**
> > > + * Lookup an IP into the RIB structure
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  IP to be looked up in the RIB
> > > + * @return
> > > + *  pointer to struct rte_rib_node on success,
> > > + *  NULL otherwise
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key);
> > > +
> > > +/**
> > > + * Lookup less specific route into the RIB structure
> > > + *
> > > + * @param ent
> > > + *  Pointer to struct rte_rib_node that represents target route
> > > + * @return
> > > + *  pointer to struct rte_rib_node that represents
> > > + *  less specific route on success,
> > > + *  NULL otherwise
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_lookup_parent(struct rte_rib_node *ent);
> > > +
> > > +/**
> > > + * Lookup prefix into the RIB structure
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  net to be looked up in the RIB
> > > + * @param depth
> > > + *  prefix length
> > > + * @return
> > > + *  pointer to struct rte_rib_node on success,
> > > + *  NULL otherwise
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t
> > depth);
> >
> > Can you explain the difference between this and regular lookup, and how
> > they would be used. I don't think the names convey the differences
> > sufficiently, and so we should look to rename one or both to be clearer.
> >
> Regular lookup (rte_rib_tree_lookup) will lookup for most specific node for
> passed key.
> rte_rib_tree_lookup_exact will lookup node contained key and depth equal to
> passed in args. It used to find exact route.
> 
So if there is no node exactly matching the parameters, lookup_exact
returns failure? E.g. if you request a /24 node, it won't return a /8 node
that would cover the /24?

> 
> >
> > > +
> > > +/**
> > > + * Retrieve next more specific prefix from the RIB
> > s/more/most/
> >
> 
> > > + * that is covered by key/depth supernet
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  net address of supernet prefix that covers returned more specific
> > prefixes
> > > + * @param depth
> > > + *  supernet prefix length
> > > + * @param cur
> > > + *   pointer to the last returned prefix to get next prefix
> > > + *   or
> > > + *   NULL to get first more specific prefix
> > > + * @param flag
> > > + *  -RTE_RIB_GET_NXT_ALL
> > > + *   get all prefixes from subtrie
> >
> > By all prefixes do you mean more specific, i.e. the final prefix?
> >
> What do you mean by the final prefix?
> 
The most specific one, or the longest prefix.

> 
> > > + *  -RTE_RIB_GET_NXT_COVER
> > > + *   get only first more specific prefix even if it have more specifics
> > > + * @return
> > > + *  pointer to the next more specific prefix
> > > + *  or
> > > + *  NULL if there is no prefixes left
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t depth,
> > > +     struct rte_rib_node *cur, int flag);
> > > +
> > > +/**
> > > + * Remove prefix from the RIB
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  net to be removed from the RIB
> > > + * @param depth
> > > + *  prefix length
> > > + */
> > > +void
> > > +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t depth);
> > > +
> > > +/**
> > > + * Insert prefix into the RIB
> > > + *
> > > + * @param rib
> > > + *  RIB object handle
> > > + * @param key
> > > + *  net to be inserted to the RIB
> > > + * @param depth
> > > + *  prefix length
> > > + * @return
> > > + *  pointer to new rte_rib_node on success
> > > + *  NULL otherwise
> > > + */
> > > +struct rte_rib_node *
> > > +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t depth);
> > > +
> > > +/**
> > > + * Create RIB
> > > + *
> > > + * @param name
> > > + *  RIB name
> > > + * @param socket_id
> > > + *  NUMA socket ID for RIB table memory allocation
> > > + * @param conf
> > > + *  Structure containing the configuration
> > > + * @return
> > > + *  Handle to RIB object on success
> > > + *  NULL otherwise with rte_errno set to an appropriate values.
> > > + */
> > > +struct rte_rib *
> > > +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf
> > *conf);
> > > +
> > > +/**
> > > + * Find an existing RIB object and return a pointer to it.
> > > + *
> > > + * @param name
> > > + *  Name of the rib object as passed to rte_rib_create()
> > > + * @return
> > > + *  Pointer to rib object or NULL if object not found with rte_errno
> > > + *  set appropriately. Possible rte_errno values include:
> > > + *   - ENOENT - required entry not available to return.
> > > + */
> > > +struct rte_rib *
> > > +rte_rib_find_existing(const char *name);
> > > +
> > > +/**
> > > + * Free an RIB object.
> > > + *
> > > + * @param rib
> > > + *   RIB object handle
> > > + * @return
> > > + *   None
> > > + */
> > > +void
> > > +rte_rib_free(struct rte_rib *rib);
> > > +
> > > +/**
> > > + * Add a rule to the RIB.
> > > + *
> > > + * @param rib
> > > + *   RIB object handle
> > > + * @param ip
> > > + *   IP of the rule to be added to the RIB
> > > + * @param depth
> > > + *   Depth of the rule to be added to the RIB
> > > + * @param next_hop
> > > + *   Next hop of the rule to be added to the RIB
> > > + * @return
> > > + *   0 on success, negative value otherwise
> > > + */
> > > +int
> > > +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth, uint64_t
> > next_hop);
> > > +
> > > +/**
> > > + * Delete a rule from the RIB.
> > > + *
> > > + * @param rib
> > > + *   RIB object handle
> > > + * @param ip
> > > + *   IP of the rule to be deleted from the RIB
> > > + * @param depth
> > > + *   Depth of the rule to be deleted from the RIB
> > > + * @return
> > > + *   0 on success, negative value otherwise
> > > + */
> > > +int
> > > +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth);
> > > +
> > > +/**
> > > + * Lookup multiple IP addresses in an FIB. This may be implemented as a
> > > + * macro, so the address of the function should not be used.
> > > + *
> > > + * @param RIB
> > > + *   RIB object handle
> > > + * @param ips
> > > + *   Array of IPs to be looked up in the FIB
> > > + * @param next_hops
> > > + *   Next hop of the most specific rule found for IP.
> > > + *   This is an array of eight byte values.
> > > + *   If the lookup for the given IP failed, then corresponding element
> > would
> > > + *   contain default value, see description of then next parameter.
> > > + * @param n
> > > + *   Number of elements in ips (and next_hops) array to lookup. This
> > should be a
> > > + *   compile time constant, and divisible by 8 for best performance.
> > > + * @param defv
> > > + *   Default value to populate into corresponding element of hop[]
> > array,
> > > + *   if lookup would fail.
> > > + *  @return
> > > + *   -EINVAL for incorrect arguments, otherwise 0
> > > + */
> > > +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n)      \
> > > +     rib->lookup(rib->fib, ips, next_hops, n)
> >
> > My main thought here is whether this needs to be a function at all?
> > Given that it takes a full burst of addresses in a single go, how much
> > performance would actually be lost by making this a regular function in
> > the C file?
> > IF we do convert this to a regular function, then a lot of the structure
> > definitions above - most importantly, the rib structure itself - can
> > probably be moved to a private header file and not exposed to
> > applications at all. This will make ABI compatibility a *lot* easier, as
> > the structures can be changed without affecting the public ABI.
> >
> I didn't quite understand what you mean.
> 
Sorry, by "needs to be a function" in the first line, read "needs to be a
macro". Basically, the point is to not inline anything that doesn't need
it. If a function works on a burst of packets, it will probably be fine
as a regular function rather than a macro or inline function.
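
A minimal sketch of that idea, assuming hypothetical names (example_rib,
example_rib_lookup_bulk, example_stub_lookup) that are not part of the
patch: the public header exposes only an opaque handle and a regular
function, while the structure layout stays in the C file.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* --- what a public header could expose: an opaque handle only --- */
struct example_rib;

int example_rib_lookup_bulk(struct example_rib *rib, const uint32_t *ips,
		uint64_t *next_hops, unsigned int n);

/* --- what would stay private to the library's C file --- */
typedef int (*example_lookup_fn_t)(void *fib, const uint32_t *ips,
		uint64_t *next_hops, unsigned int n);

struct example_rib {
	void *fib;                  /* dataplane structure */
	example_lookup_fn_t lookup; /* algorithm-specific bulk lookup */
};

/* One indirect call per burst, so its cost is spread over n addresses. */
int
example_rib_lookup_bulk(struct example_rib *rib, const uint32_t *ips,
		uint64_t *next_hops, unsigned int n)
{
	if (rib == NULL || ips == NULL || next_hops == NULL)
		return -EINVAL;
	return rib->lookup(rib->fib, ips, next_hops, n);
}

/* Hypothetical stand-in for a real lookup: every address gets one next hop. */
static int
example_stub_lookup(void *fib, const uint32_t *ips, uint64_t *next_hops,
		unsigned int n)
{
	uint64_t nh = *(const uint64_t *)fib;
	unsigned int i;

	(void)ips;
	for (i = 0; i < n; i++)
		next_hops[i] = nh;
	return 0;
}
```

With this split, fields can be added to or reordered in the private
structure without changing anything an application was compiled against.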

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
  2018-03-14 11:09  4%   ` Bruce Richardson
@ 2018-03-25 18:17  0%     ` Vladimir Medvedkin
  2018-03-26  9:50  0%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Vladimir Medvedkin @ 2018-03-25 18:17 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

Hi,

2018-03-14 14:09 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:

> On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> > RIB is an alternative to current LPM library.
> > It solves the following problems
> >  - Increases the speed of control plane operations against lpm such as
> >    adding/deleting routes
> >  - Adds abstraction from dataplane algorithms, so it is possible to add
> >    different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc
> >    in addition to current dir24_8
> >  - It is possible to keep user defined application specific additional
> >    information in struct rte_rib_node which represents route entry.
> >    It can be next hop/set of next hops (i.e. active and feasible),
> >    pointers to link rte_rib_node based on some criteria (i.e. next_hop),
> >    plenty of additional control plane information.
> >  - For dir24_8 implementation it is possible to remove
> rte_lpm_tbl_entry.depth
> >    field that helps to save 6 bits.
> >  - Also new dir24_8 implementation supports different next_hop sizes
> >    (1/2/4/8 bytes per next hop)
> >  - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate ternary
> operator.
> >    Instead it returns special default value if there is no route.
> >
> > Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> > ---
> >  config/common_base                 |   6 +
> >  doc/api/doxy-api.conf              |   1 +
> >  lib/Makefile                       |   2 +
> >  lib/librte_rib/Makefile            |  22 ++
> >  lib/librte_rib/rte_dir24_8.c       | 482 ++++++++++++++++++++++++++++++
> +++
> >  lib/librte_rib/rte_dir24_8.h       | 116 ++++++++
> >  lib/librte_rib/rte_rib.c           | 526 ++++++++++++++++++++++++++++++
> +++++++
> >  lib/librte_rib/rte_rib.h           | 322 +++++++++++++++++++++++
> >  lib/librte_rib/rte_rib_version.map |  18 ++
> >  mk/rte.app.mk                      |   1 +
> >  10 files changed, 1496 insertions(+)
> >  create mode 100644 lib/librte_rib/Makefile
> >  create mode 100644 lib/librte_rib/rte_dir24_8.c
> >  create mode 100644 lib/librte_rib/rte_dir24_8.h
> >  create mode 100644 lib/librte_rib/rte_rib.c
> >  create mode 100644 lib/librte_rib/rte_rib.h
> >  create mode 100644 lib/librte_rib/rte_rib_version.map
> >
>
> First pass review comments. For now just reviewed the main public header
> file rte_rib.h. Later reviews will cover the other files as best I can.
>
> /Bruce
>
> <snip>
> > diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
> > new file mode 100644
> > index 0000000..6eac8fb
> > --- /dev/null
> > +++ b/lib/librte_rib/rte_rib.h
> > @@ -0,0 +1,322 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> > + */
> > +
> > +#ifndef _RTE_RIB_H_
> > +#define _RTE_RIB_H_
> > +
> > +/**
> > + * @file
> > + * Compressed trie implementation for Longest Prefix Match
> > + */
> > +
> > +/** @internal Macro to enable/disable run-time checks. */
> > +#if defined(RTE_LIBRTE_RIB_DEBUG)
> > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do {    \
> > +     if (cond)                                       \
> > +             return retval;                          \
> > +} while (0)
> > +#else
> > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval)
> > +#endif
>
> use RTE_ASSERT?
>
It was done the same way as in the LPM lib. But if you think it
should be RTE_ASSERT, so be it.


>
> > +
> > +#define RTE_RIB_VALID_NODE   1
>
> should there be an INVALID_NODE macro?
>
No


>
> > +#define RTE_RIB_GET_NXT_ALL  0
> > +#define RTE_RIB_GET_NXT_COVER        1
> > +
> > +#define RTE_RIB_INVALID_ROUTE        0
> > +#define RTE_RIB_VALID_ROUTE  1
> > +
> > +/** Max number of characters in RIB name. */
> > +#define RTE_RIB_NAMESIZE     64
> > +
> > +/** Maximum depth value possible for IPv4 RIB. */
> > +#define RTE_RIB_MAXDEPTH     32
>
> I think we should have IPv4 in the name here. Will it not be extended to
> support IPv6 in future?
>
I think there should be a separate implementation of the library for IPv6.


>
> > +
> > +/**
> > + * Macro to check if prefix1 {key1/depth1}
> > + * is covered by prefix2 {key2/depth2}
> > + */
> > +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2)      \
> > +     ((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\
> > +             && (depth1 > depth2))
> Neat check!
>
> Any particular reason for using UINT64_MAX here rather than UINT32_MAX?

In the case when depth2 = 0, UINT32_MAX shifted left by 32 bits will remain
UINT32_MAX, because the shift count will be masked to 5 bits.

> I think you can avoid the casting and have a slightly shorter mask by
> changing "(uint32_t)(UINT64_MAX << (32 - depth2)" to
> "~(UINT32_MAX >> depth2)"
> I'd also suggest for readability putting the second check first, and,
> for maintainability, using an inline function rather than a macro.
>
Agreed, it looks clearer.
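
A possible shape for that inline function, as a sketch only; the name
example_rib_is_covered is hypothetical. In C, shifting a 32-bit value by
32 or more is undefined, so the depth check goes first and also keeps
depth2 below 32 before the shift is evaluated.

```c
#include <stdint.h>

/*
 * Return 1 if prefix1 {key1/depth1} is covered by prefix2 {key2/depth2},
 * 0 otherwise. Depths are IPv4 prefix lengths in the range 0..32.
 */
static inline int
example_rib_is_covered(uint32_t key1, uint8_t depth1,
		uint32_t key2, uint8_t depth2)
{
	/* Depth check first: past this point depth2 is at most 31. */
	if (depth1 > 32 || depth1 <= depth2)
		return 0;
	/* ~(UINT32_MAX >> depth2) keeps the top depth2 bits; 0 for depth2 == 0. */
	return ((key1 ^ key2) & ~(UINT32_MAX >> depth2)) == 0;
}
```

For depth2 = 0 the mask is zero, so any longer prefix is reported as
covered by the default route without needing a 64-bit intermediate.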


> > +
> > +/** @internal Macro to get next node in tree*/
> > +#define RTE_RIB_GET_NXT_NODE(node, key)      \
> > +     ((key & (1 << (31 - node->depth))) ? node->right : node->left)
> > +/** @internal Macro to check if node is right child*/
> > +#define RTE_RIB_IS_RIGHT_NODE(node)  (node->parent->right == node)
>
> Again, consider inline fns rather than macros.
>
Ok

> For the latter macro, rather than doing additional pointer derefs to
> parent, can you also get if it's a right node by using:
> "(node->key & (1 << (32 - node->depth)))"?
>
No, it is not possible. The decision whether a node is a left or right
child is made using the common depth of the parent and the child.
Consider the case with 10.0.0.0/8 and 10.128.0.0/24. Here the common depth
is /8, so 10.128.0.0/24 will be the right child.
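
The example can be checked with a small sketch. Both helper names are
hypothetical; the first follows the bit test used by RTE_RIB_GET_NXT_NODE
(bit 31 - depth of the parent), the second is the check proposed in the
review, which looks only at the node's own key and depth.

```c
#include <stdint.h>

/* Branch taken below a parent: first key bit after the parent's prefix.
 * parent_depth must be less than 32. */
static inline int
example_is_right_child(uint32_t child_key, uint8_t parent_depth)
{
	return (child_key & (UINT32_C(1) << (31 - parent_depth))) != 0;
}

/* Proposed alternative: test a bit chosen from the node's own depth.
 * depth must be in the range 1..32. */
static inline int
example_own_depth_bit(uint32_t key, uint8_t depth)
{
	return (key & (UINT32_C(1) << (32 - depth))) != 0;
}
```

For 10.128.0.0/24 (0x0A800000) under 10.0.0.0/8, the first helper tests
bit 23 and reports a right child, while the second tests bit 8 and
reports the opposite, so the node's own depth is not enough.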


> > +
> > +
> > +struct rte_rib_node {
> > +     struct rte_rib_node *left;
> > +     struct rte_rib_node *right;
> > +     struct rte_rib_node *parent;
> > +     uint32_t        key;
> > +     uint8_t         depth;
> > +     uint8_t         flag;
> > +     uint64_t        nh;
> > +     uint64_t        ext[0];
> > +};
> > +
> > +struct rte_rib;
> > +
> > +/** Type of FIB struct*/
> > +enum rte_rib_type {
> > +     RTE_RIB_DIR24_8_1B,
> > +     RTE_RIB_DIR24_8_2B,
> > +     RTE_RIB_DIR24_8_4B,
> > +     RTE_RIB_DIR24_8_8B,
> > +     RTE_RIB_TYPE_MAX
> > +};
>
> If the plan is to support multiple underlying fib types and algorithms
> under the rib library, would it not be better to separate out the
> algorithm part from the data storage part? So have the type just be
> DIR_24_8, and have the 1, 2, 4 or 8 specified separately.
>
Yes, we talked about it on IRC; agreed. Now I pass the next hop size in
union rte_rib_fib_conf inside rte_rib_conf.


>
> > +
> > +enum rte_rib_op {
> > +     RTE_RIB_ADD,
> > +     RTE_RIB_DEL
> > +};
> > +
> > +/** RIB nodes allocation type */
> > +enum rte_rib_alloc_type {
> > +     RTE_RIB_MALLOC,
> > +     RTE_RIB_MEMPOOL,
> > +     RTE_RIB_ALLOC_MAX
> > +};
>
> Not sure you need this any more. Malloc allocations and mempool
> allocations are now pretty much the same thing.
>
Actually, I am thinking of removing malloc. In performance tests that
add/delete a huge number of routes, malloc is slower, maybe because of
fragmentation.
What do you think?


> > +
> > +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key,
> > +     uint8_t depth, uint64_t next_hop, enum rte_rib_op op);
>
> Do you anticipate more ops in future than just add and delete? If not,
> why not just split this function into two and drop the op struct.
>
It is a difficult question. I'm not ready to make a decision at the moment.


>
> > +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t *ips,
> > +     uint64_t *next_hops, const unsigned n);
> > +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib
> *rib);
> > +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib,
> > +     struct rte_rib_node *node);
> > +
> > +struct rte_rib {
> > +     char name[RTE_RIB_NAMESIZE];
> > +     /*pointer to rib trie*/
> > +     struct rte_rib_node     *trie;
> > +     /*pointer to dataplane struct*/
> > +     void    *fib;
> > +     /*prefix modification*/
> > +     rte_rib_modify_fn_t     modify;
> > +     /* Bulk lookup fn*/
> > +     rte_rib_tree_lookup_fn_t        lookup;
> > +     /*alloc trie element*/
> > +     rte_rib_alloc_node_fn_t alloc_node;
> > +     /*free trie element*/
> > +     rte_rib_free_node_fn_t  free_node;
> > +     struct rte_mempool      *node_pool;
> > +     uint32_t                cur_nodes;
> > +     uint32_t                cur_routes;
> > +     int                     max_nodes;
> > +     int                     node_sz;
> > +     enum rte_rib_type       type;
> > +     enum rte_rib_alloc_type alloc_type;
> > +};
> > +
> > +/** RIB configuration structure */
> > +struct rte_rib_conf {
> > +     enum rte_rib_type       type;
> > +     enum rte_rib_alloc_type alloc_type;
> > +     int     max_nodes;
> > +     size_t  node_sz;
> > +     uint64_t def_nh;
> > +};
> > +
> > +/**
> > + * Lookup an IP into the RIB structure
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  IP to be looked up in the RIB
> > + * @return
> > + *  pointer to struct rte_rib_node on success,
> > + *  NULL otherwise
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key);
> > +
> > +/**
> > + * Lookup less specific route into the RIB structure
> > + *
> > + * @param ent
> > + *  Pointer to struct rte_rib_node that represents target route
> > + * @return
> > + *  pointer to struct rte_rib_node that represents
> > + *  less specific route on success,
> > + *  NULL otherwise
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_lookup_parent(struct rte_rib_node *ent);
> > +
> > +/**
> > + * Lookup prefix into the RIB structure
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  net to be looked up in the RIB
> > + * @param depth
> > + *  prefix length
> > + * @return
> > + *  pointer to struct rte_rib_node on success,
> > + *  NULL otherwise
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t
> depth);
>
> Can you explain the difference between this and regular lookup, and how
> they would be used. I don't think the names convey the differences
> sufficiently, and so we should look to rename one or both to be clearer.
>
Regular lookup (rte_rib_tree_lookup) will look up the most specific node for
the passed key.
rte_rib_tree_lookup_exact will look up the node whose key and depth are equal
to the passed args. It is used to find an exact route.
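
The difference can be illustrated with a linear scan over a tiny route
table. This is only a sketch of the semantics described above, not the
trie code from the patch; all names are hypothetical and keys are assumed
to be stored already masked to their depth.

```c
#include <stddef.h>
#include <stdint.h>

struct example_route {
	uint32_t key;  /* network address, already masked to depth */
	uint8_t depth; /* prefix length, 0..32 */
};

static uint32_t
example_mask(uint8_t depth)
{
	/* Avoid an undefined shift by 32 for the default route. */
	return depth == 0 ? 0 : UINT32_MAX << (32 - depth);
}

/* Longest prefix match: most specific route that contains ip. */
static const struct example_route *
example_lookup(const struct example_route *tbl, size_t n, uint32_t ip)
{
	const struct example_route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((ip & example_mask(tbl[i].depth)) != tbl[i].key)
			continue;
		if (best == NULL || tbl[i].depth > best->depth)
			best = &tbl[i];
	}
	return best;
}

/* Exact match: only a route with this very key and depth qualifies. */
static const struct example_route *
example_lookup_exact(const struct example_route *tbl, size_t n,
		uint32_t key, uint8_t depth)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tbl[i].key == key && tbl[i].depth == depth)
			return &tbl[i];
	return NULL;
}
```

With routes 10.0.0.0/8 and 10.1.0.0/16 in the table, a regular lookup of
10.1.2.3 returns the /16, whereas an exact lookup of 10.1.2.0/24 returns
NULL even though both routes cover that prefix.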


>
> > +
> > +/**
> > + * Retrieve next more specific prefix from the RIB
> s/more/most/
>

> > + * that is covered by key/depth supernet
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  net address of supernet prefix that covers returned more specific
> prefixes
> > + * @param depth
> > + *  supernet prefix length
> > + * @param cur
> > + *   pointer to the last returned prefix to get next prefix
> > + *   or
> > + *   NULL to get first more specific prefix
> > + * @param flag
> > + *  -RTE_RIB_GET_NXT_ALL
> > + *   get all prefixes from subtrie
>
> By all prefixes do you mean more specific, i.e. the final prefix?
>
What do you mean by the final prefix?


> > + *  -RTE_RIB_GET_NXT_COVER
> > + *   get only first more specific prefix even if it have more specifics
> > + * @return
> > + *  pointer to the next more specific prefix
> > + *  or
> > + *  NULL if there is no prefixes left
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t depth,
> > +     struct rte_rib_node *cur, int flag);
> > +
> > +/**
> > + * Remove prefix from the RIB
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  net to be removed from the RIB
> > + * @param depth
> > + *  prefix length
> > + */
> > +void
> > +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t depth);
> > +
> > +/**
> > + * Insert prefix into the RIB
> > + *
> > + * @param rib
> > + *  RIB object handle
> > + * @param key
> > + *  net to be inserted to the RIB
> > + * @param depth
> > + *  prefix length
> > + * @return
> > + *  pointer to new rte_rib_node on success
> > + *  NULL otherwise
> > + */
> > +struct rte_rib_node *
> > +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t depth);
> > +
> > +/**
> > + * Create RIB
> > + *
> > + * @param name
> > + *  RIB name
> > + * @param socket_id
> > + *  NUMA socket ID for RIB table memory allocation
> > + * @param conf
> > + *  Structure containing the configuration
> > + * @return
> > + *  Handle to RIB object on success
> > + *  NULL otherwise with rte_errno set to an appropriate values.
> > + */
> > +struct rte_rib *
> > +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf
> *conf);
> > +
> > +/**
> > + * Find an existing RIB object and return a pointer to it.
> > + *
> > + * @param name
> > + *  Name of the rib object as passed to rte_rib_create()
> > + * @return
> > + *  Pointer to rib object or NULL if object not found with rte_errno
> > + *  set appropriately. Possible rte_errno values include:
> > + *   - ENOENT - required entry not available to return.
> > + */
> > +struct rte_rib *
> > +rte_rib_find_existing(const char *name);
> > +
> > +/**
> > + * Free an RIB object.
> > + *
> > + * @param rib
> > + *   RIB object handle
> > + * @return
> > + *   None
> > + */
> > +void
> > +rte_rib_free(struct rte_rib *rib);
> > +
> > +/**
> > + * Add a rule to the RIB.
> > + *
> > + * @param rib
> > + *   RIB object handle
> > + * @param ip
> > + *   IP of the rule to be added to the RIB
> > + * @param depth
> > + *   Depth of the rule to be added to the RIB
> > + * @param next_hop
> > + *   Next hop of the rule to be added to the RIB
> > + * @return
> > + *   0 on success, negative value otherwise
> > + */
> > +int
> > +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth, uint64_t
> next_hop);
> > +
> > +/**
> > + * Delete a rule from the RIB.
> > + *
> > + * @param rib
> > + *   RIB object handle
> > + * @param ip
> > + *   IP of the rule to be deleted from the RIB
> > + * @param depth
> > + *   Depth of the rule to be deleted from the RIB
> > + * @return
> > + *   0 on success, negative value otherwise
> > + */
> > +int
> > +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth);
> > +
> > +/**
> > + * Lookup multiple IP addresses in an FIB. This may be implemented as a
> > + * macro, so the address of the function should not be used.
> > + *
> > + * @param RIB
> > + *   RIB object handle
> > + * @param ips
> > + *   Array of IPs to be looked up in the FIB
> > + * @param next_hops
> > + *   Next hop of the most specific rule found for IP.
> > + *   This is an array of eight byte values.
> > + *   If the lookup for the given IP failed, then corresponding element
> would
> > + *   contain default value, see description of then next parameter.
> > + * @param n
> > + *   Number of elements in ips (and next_hops) array to lookup. This
> should be a
> > + *   compile time constant, and divisible by 8 for best performance.
> > + * @param defv
> > + *   Default value to populate into corresponding element of hop[]
> array,
> > + *   if lookup would fail.
> > + *  @return
> > + *   -EINVAL for incorrect arguments, otherwise 0
> > + */
> > +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n)      \
> > +     rib->lookup(rib->fib, ips, next_hops, n)
>
> My main thought here is whether this needs to be a function at all?
> Given that it takes a full burst of addresses in a single go, how much
> performance would actually be lost by making this a regular function in
> the C file?
> IF we do convert this to a regular function, then a lot of the structure
> definitions above - most importantly, the rib structure itself - can
> probably be moved to a private header file and not exposed to
> applications at all. This will make ABI compatibility a *lot* easier, as
> the structures can be changed without affecting the public ABI.
>
I didn't quite understand what you mean.


> /Bruce
>
> > +
> > +#endif /* _RTE_RIB_H_ */
>



-- 
Regards,
Vladimir

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 06/11] mempool: remove callback to get capabilities
  2018-03-25 16:20  2% ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
  2018-03-25 16:20  7%   ` [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
  2018-03-25 16:20  6%   ` [dpdk-dev] [PATCH v2 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
@ 2018-03-25 16:20  6%   ` Andrew Rybchenko
  2018-03-25 16:20  4%   ` [dpdk-dev] [PATCH v2 07/11] mempool: deprecate xmem functions Andrew Rybchenko
  2018-03-25 16:20  8%   ` [dpdk-dev] [PATCH v2 10/11] mempool: remove callback to register memory area Andrew Rybchenko
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ, Santosh Shukla, Jerin Jacob

The callback was introduced to let generic code know the octeontx
mempool driver's requirements: use a single physically contiguous
memory chunk to store all objects, and align object addresses to the
total object size. These requirements are now met using the new
callbacks that calculate the required memory chunk size and populate
objects using the provided memory chunk.

These capability flags are not used anywhere else.

Restricting capabilities to flags is not generic and likely to
be insufficient to describe mempool driver features. If required
in the future, API which returns structured information may be
added.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - fix typo
 - rebase on top of patch which renames MEMPOOL_F_NO_PHYS_CONTIG

RFCv2 -> v1:
 - squash mempool/octeontx patches to add calc_mem_size and populate
   callbacks to this one in order to avoid breakages in the middle of
   patchset
 - advertise API changes in release notes

 doc/guides/rel_notes/deprecation.rst            |  1 -
 doc/guides/rel_notes/release_18_05.rst          | 11 +++++
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 59 +++++++++++++++++++++----
 lib/librte_mempool/rte_mempool.c                | 44 ++----------------
 lib/librte_mempool/rte_mempool.h                | 52 +---------------------
 lib/librte_mempool/rte_mempool_ops.c            | 14 ------
 lib/librte_mempool/rte_mempool_ops_default.c    | 15 +------
 lib/librte_mempool/rte_mempool_version.map      |  1 -
 8 files changed, 68 insertions(+), 129 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index c06fc67..4deed9a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -70,7 +70,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index abaefe5..c50f26c 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -66,6 +66,14 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Removed mempool capability flags and related functions.**
+
+  Flags ``MEMPOOL_F_CAPA_PHYS_CONTIG`` and
+  ``MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS`` were used by octeontx mempool
+  driver to customize generic mempool library behaviour.
+  Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
+  used to achieve it without specific knowledge in the generic code.
+
 
 ABI Changes
 -----------
@@ -86,6 +94,9 @@ ABI Changes
   to allow to customize required memory size calculation.
   A new callback ``populate`` has been added to ``rte_mempool_ops``
   to allow to customize objects population.
+  Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
+  since its features are covered by ``calc_mem_size`` and ``populate``
+  callbacks.
 
 
 Removed Items
diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index d143d05..64ed528 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -126,14 +126,29 @@ octeontx_fpavf_get_count(const struct rte_mempool *mp)
 	return octeontx_fpa_bufpool_free_count(pool);
 }
 
-static int
-octeontx_fpavf_get_capabilities(const struct rte_mempool *mp,
-				unsigned int *flags)
+static ssize_t
+octeontx_fpavf_calc_mem_size(const struct rte_mempool *mp,
+			     uint32_t obj_num, uint32_t pg_shift,
+			     size_t *min_chunk_size, size_t *align)
 {
-	RTE_SET_USED(mp);
-	*flags |= (MEMPOOL_F_CAPA_PHYS_CONTIG |
-			MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS);
-	return 0;
+	ssize_t mem_size;
+
+	/*
+	 * Simply need space for one more object to be able to
+	 * fulfil alignment requirements.
+	 */
+	mem_size = rte_mempool_op_calc_mem_size_default(mp, obj_num + 1,
+							pg_shift,
+							min_chunk_size, align);
+	if (mem_size >= 0) {
+		/*
+		 * Memory area which contains objects must be physically
+		 * contiguous.
+		 */
+		*min_chunk_size = mem_size;
+	}
+
+	return mem_size;
 }
 
 static int
@@ -150,6 +165,33 @@ octeontx_fpavf_register_memory_area(const struct rte_mempool *mp,
 	return octeontx_fpavf_pool_set_range(pool_bar, len, vaddr, gpool);
 }
 
+static int
+octeontx_fpavf_populate(struct rte_mempool *mp, unsigned int max_objs,
+			void *vaddr, rte_iova_t iova, size_t len,
+			rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+
+	if (iova == RTE_BAD_IOVA)
+		return -EINVAL;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	/* align object start address to a multiple of total_elt_sz */
+	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
+
+	if (len < off)
+		return -EINVAL;
+
+	vaddr = (char *)vaddr + off;
+	iova += off;
+	len -= off;
+
+	return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len,
+					       obj_cb, obj_cb_arg);
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.name = "octeontx_fpavf",
 	.alloc = octeontx_fpavf_alloc,
@@ -157,8 +199,9 @@ static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.enqueue = octeontx_fpavf_enqueue,
 	.dequeue = octeontx_fpavf_dequeue,
 	.get_count = octeontx_fpavf_get_count,
-	.get_capabilities = octeontx_fpavf_get_capabilities,
 	.register_memory_area = octeontx_fpavf_register_memory_area,
+	.calc_mem_size = octeontx_fpavf_calc_mem_size,
+	.populate = octeontx_fpavf_populate,
 };
 
 MEMPOOL_REGISTER_OPS(octeontx_fpavf_ops);
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d917dc7..40eedde 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -208,15 +208,9 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  */
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      unsigned int flags)
+		      __rte_unused unsigned int flags)
 {
 	size_t obj_per_page, pg_num, pg_sz;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	if (total_elt_sz == 0)
 		return 0;
@@ -240,18 +234,12 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 ssize_t
 rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
-	uint32_t pg_shift, unsigned int flags)
+	uint32_t pg_shift, __rte_unused unsigned int flags)
 {
 	uint32_t elt_cnt = 0;
 	rte_iova_t start, end;
 	uint32_t iova_idx;
 	size_t pg_sz = (size_t)1 << pg_shift;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	/* if iova is NULL, assume contiguous memory */
 	if (iova == NULL) {
@@ -345,8 +333,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	rte_iova_t iova, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
 	void *opaque)
 {
-	unsigned total_elt_sz;
-	unsigned int mp_capa_flags;
 	unsigned i = 0;
 	size_t off;
 	struct rte_mempool_memhdr *memhdr;
@@ -365,27 +351,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-
-	/* Get mempool capabilities */
-	mp_capa_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_capa_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_capa_flags;
-
-	/* Detect pool area has sufficient space for elements */
-	if (mp_capa_flags & MEMPOOL_F_CAPA_PHYS_CONTIG) {
-		if (len < total_elt_sz * mp->size) {
-			RTE_LOG(ERR, MEMPOOL,
-				"pool area %" PRIx64 " not enough\n",
-				(uint64_t)len);
-			return -ENOSPC;
-		}
-	}
-
 	memhdr = rte_zmalloc("MEMPOOL_MEMHDR", sizeof(*memhdr), 0);
 	if (memhdr == NULL)
 		return -ENOMEM;
@@ -397,10 +362,7 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	memhdr->free_cb = free_cb;
 	memhdr->opaque = opaque;
 
-	if (mp_capa_flags & MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS)
-		/* align object start address to a multiple of total_elt_sz */
-		off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
-	else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
+	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
 		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 754261e..0b83d5e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -246,24 +246,6 @@ struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_IOVA_CONTIG 0x0020 /**< Don't need IOVA contiguous objs. */
 #define MEMPOOL_F_NO_PHYS_CONTIG MEMPOOL_F_NO_IOVA_CONTIG /* deprecated */
-/**
- * This capability flag is advertised by a mempool handler, if the whole
- * memory area containing the objects must be physically contiguous.
- * Note: This flag should not be passed by application.
- */
-#define MEMPOOL_F_CAPA_PHYS_CONTIG 0x0040
-/**
- * This capability flag is advertised by a mempool handler. Used for a case
- * where mempool driver wants object start address(vaddr) aligned to block
- * size(/ total element size).
- *
- * Note:
- * - This flag should not be passed by application.
- *   Flag used for mempool driver only.
- * - Mempool driver must also set MEMPOOL_F_CAPA_PHYS_CONTIG flag along with
- *   MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS.
- */
-#define MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS 0x0080
 
 /**
  * @internal When debug is enabled, store some statistics.
@@ -389,12 +371,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Get the mempool capabilities.
- */
-typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
-		unsigned int *flags);
-
-/**
  * Notify new memory area to mempool.
  */
 typedef int (*rte_mempool_ops_register_memory_area_t)
@@ -440,13 +416,7 @@ typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
- * If mempool driver requires object addresses to be block size aligned
- * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
- * reserved to be able to meet the requirement.
- *
- * Minimum size of memory chunk is either all required space, if
- * capabilities say that whole memory area must be physically contiguous
- * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * Minimum size of memory chunk is a maximum of the page size and total
  * element size.
  *
  * Required memory chunk alignment is a maximum of page size and cache
@@ -522,10 +492,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Get the mempool capabilities
-	 */
-	rte_mempool_get_capabilities_t get_capabilities;
-	/**
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
@@ -651,22 +617,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops get_capabilities callback.
- *
- * @param mp [in]
- *   Pointer to the memory pool.
- * @param flags [out]
- *   Pointer to the mempool flags.
- * @return
- *   - 0: Success; The mempool driver has advertised his pool capabilities in
- *   flags param.
- *   - -ENOTSUP - doesn't support get_capabilities ops (valid case).
- *   - Otherwise, pool create fails.
- */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags);
-/**
  * @internal wrapper for mempool_ops register_memory_area callback.
  * API to notify the mempool handler when a new memory area is added to pool.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 1a7f39f..6ac669a 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 	return ops->get_count(mp);
 }
 
-/* wrapper to get external mempool capabilities. */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->get_capabilities, -ENOTSUP);
-	return ops->get_capabilities(mp, flags);
-}
-
 /* wrapper to notify new memory area to external mempool */
 int
 rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57295f7..3defc15 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -11,26 +11,15 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 				     uint32_t obj_num, uint32_t pg_shift,
 				     size_t *min_chunk_size, size_t *align)
 {
-	unsigned int mp_flags;
-	int ret;
 	size_t total_elt_sz;
 	size_t mem_size;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
 	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags | mp_flags);
+					 mp->flags);
 
-	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
-		*min_chunk_size = mem_size;
-	else
-		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
 	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
 
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 41a0b09..637f73f 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_get_capabilities;
 	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
-- 
2.7.4


* [dpdk-dev] [PATCH v2 10/11] mempool: remove callback to register memory area
  2018-03-25 16:20  2% ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
                     ` (3 preceding siblings ...)
  2018-03-25 16:20  4%   ` [dpdk-dev] [PATCH v2 07/11] mempool: deprecate xmem functions Andrew Rybchenko
@ 2018-03-25 16:20  8%   ` Andrew Rybchenko
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback is no longer required, since the new populate callback
receives the same memory area information (virtual address, IO address
and length).

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - none

RFCv2 -> v1:
 - advertise ABI changes in release notes
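
Illustration only, not part of the patch: a driver that used to rely on
register_memory_area could note the same vaddr/iova/len from inside its
populate op. The sketch below is standalone C and does not include DPDK
headers; demo_pool, demo_iova_t, demo_obj_cb_t and demo_populate are
hypothetical stand-ins that only mirror the shape of
rte_mempool_populate_t, and the real op is also expected to enqueue
each object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_iova_t and struct rte_mempool. */
typedef uint64_t demo_iova_t;

struct demo_pool {
	size_t total_elt_sz;	/* header + element + trailer */
	/* Last memory area seen by the driver; this is what the removed
	 * register_memory_area callback used to be told about. */
	void *area_vaddr;
	demo_iova_t area_iova;
	size_t area_len;
};

typedef void (demo_obj_cb_t)(struct demo_pool *mp, void *opaque,
		void *vaddr, demo_iova_t iova);

/* Hypothetical driver populate op: record the area, then slice it. */
static int
demo_populate(struct demo_pool *mp, unsigned int max_objs,
	      void *vaddr, demo_iova_t iova, size_t len,
	      demo_obj_cb_t *obj_cb, void *obj_cb_arg)
{
	size_t off = 0;
	unsigned int i = 0;

	assert(mp != NULL && mp->total_elt_sz != 0);

	/* Same information the removed callback received. */
	mp->area_vaddr = vaddr;
	mp->area_iova = iova;
	mp->area_len = len;

	while (i < max_objs && off + mp->total_elt_sz <= len) {
		/* A real driver would enqueue the object here. */
		if (obj_cb != NULL)
			obj_cb(mp, obj_cb_arg, (char *)vaddr + off,
			       iova + off);
		off += mp->total_elt_sz;
		i++;
	}

	return (int)i;	/* number of objects populated */
}
```

With a 256-byte area and 64-byte objects this sketch populates four
objects and leaves the area description in the pool structure.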

 doc/guides/rel_notes/deprecation.rst       |  1 -
 doc/guides/rel_notes/release_18_05.rst     |  2 ++
 lib/librte_mempool/rte_mempool.c           |  5 -----
 lib/librte_mempool/rte_mempool.h           | 31 ------------------------------
 lib/librte_mempool/rte_mempool_ops.c       | 14 --------------
 lib/librte_mempool/rte_mempool_version.map |  1 -
 6 files changed, 2 insertions(+), 52 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 473330d..5301259 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -63,7 +63,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 6a8db54..016c4ed 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -108,6 +108,8 @@ ABI Changes
   Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
   since its features are covered by ``calc_mem_size`` and ``populate``
   callbacks.
+  Callback ``register_memory_area`` has been removed from ``rte_mempool_ops``
+  since the new callback ``populate`` may be used instead of it.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 8c3b0b1..c58bcc6 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -355,11 +355,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	if (ret != 0)
 		return ret;
 
-	/* Notify memory area to mempool */
-	ret = rte_mempool_ops_register_memory_area(mp, vaddr, iova, len);
-	if (ret != -ENOTSUP && ret < 0)
-		return ret;
-
 	/* mempool is already populated */
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 9107f5a..314f909 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -371,12 +371,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Notify new memory area to mempool.
- */
-typedef int (*rte_mempool_ops_register_memory_area_t)
-(const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * Calculate memory size required to store given number of objects.
  *
  * If mempool objects are not required to be IOVA-contiguous
@@ -514,10 +508,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Notify new memory area to mempool
-	 */
-	rte_mempool_ops_register_memory_area_t register_memory_area;
-	/**
 	 * Optional callback to calculate memory size required to
 	 * store specified number of objects.
 	 */
@@ -639,27 +629,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops register_memory_area callback.
- * API to notify the mempool handler when a new memory area is added to pool.
- *
- * @param mp
- *   Pointer to the memory pool.
- * @param vaddr
- *   Pointer to the buffer virtual address.
- * @param iova
- *   Pointer to the buffer IO address.
- * @param len
- *   Pool size.
- * @return
- *   - 0: Success;
- *   - -ENOTSUP - doesn't support register_memory_area ops (valid error case).
- *   - Otherwise, rte_mempool_populate_phys fails thus pool create fails.
- */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
-				char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * @internal wrapper for mempool_ops calc_mem_size callback.
  * API to calculate size of memory required to store specified number of
  * object.
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 6ac669a..ea9be1e 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
 
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 }
 
 /* wrapper to notify new memory area to external mempool */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
-					rte_iova_t iova, size_t len)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->register_memory_area, -ENOTSUP);
-	return ops->register_memory_area(mp, vaddr, iova, len);
-}
-
-/* wrapper to notify new memory area to external mempool */
 ssize_t
 rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				uint32_t obj_num, uint32_t pg_shift,
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 637f73f..cf375db 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
 
-- 
2.7.4


* [dpdk-dev] [PATCH v2 07/11] mempool: deprecate xmem functions
  2018-03-25 16:20  2% ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
                     ` (2 preceding siblings ...)
  2018-03-25 16:20  6%   ` [dpdk-dev] [PATCH v2 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
@ 2018-03-25 16:20  4%   ` Andrew Rybchenko
  2018-03-25 16:20  8%   ` [dpdk-dev] [PATCH v2 10/11] mempool: remove callback to register memory area Andrew Rybchenko
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ, Thomas Monjalon

Move the rte_mempool_xmem_size() code to an internal helper function,
since it is required in two places: the deprecated
rte_mempool_xmem_size() and the non-deprecated
rte_mempool_op_calc_mem_size_default().

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - deprecate rte_mempool_populate_iova_tab()
 - add -Wno-deprecated-declarations to fix build errors because of
   rte_mempool_populate_iova_tab() deprecation
 - add @deprecated to deprecated functions description

RFCv2 -> v1:
 - advertise deprecation in release notes
 - factor out default memory size calculation into non-deprecated
   internal function to avoid usage of deprecated function internally
 - remove test for deprecated functions to address build issue because
   of usage of deprecated functions (it is easy to allow usage of
   deprecated function in Makefile, but very complicated in meson)
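
Illustration only, not part of the patch: a standalone sketch of the
size calculation that the helper and the default calc_mem_size callback
share. The min_chunk_size and align formulas follow the diff; the body
of the page arithmetic is not fully visible in the diff context, so
demo_calc_mem_size is an assumption written to match the helper's doc
comment (page-aligned result, pg_shift == 0 ignores page boundaries).
All demo_* names and DEMO_CACHE_LINE_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value; RTE_CACHE_LINE_SIZE depends on the target. */
#define DEMO_CACHE_LINE_SIZE ((size_t)64)

static size_t
demo_max(size_t a, size_t b)
{
	return a > b ? a : b;
}

/* Round v up to the next multiple of align (align != 0). */
static size_t
demo_align_ceil(size_t v, size_t align)
{
	return ((v + align - 1) / align) * align;
}

/* Assumed page arithmetic of the memory size helper. */
static size_t
demo_calc_mem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift)
{
	size_t obj_per_page, pg_num, pg_sz;

	if (total_elt_sz == 0)
		return 0;

	/* pg_shift == 0: ignore page boundaries. */
	if (pg_shift == 0)
		return total_elt_sz * elt_num;

	pg_sz = (size_t)1 << pg_shift;
	obj_per_page = pg_sz / total_elt_sz;

	/* Object bigger than a page: whole pages per object. */
	if (obj_per_page == 0)
		return demo_align_ceil(total_elt_sz, pg_sz) * elt_num;

	pg_num = (elt_num + obj_per_page - 1) / obj_per_page;
	return pg_num << pg_shift;
}

/* Minimum chunk: maximum of page size and total element size. */
static size_t
demo_min_chunk_size(size_t total_elt_sz, uint32_t pg_shift)
{
	return demo_max((size_t)1 << pg_shift, total_elt_sz);
}

/* Chunk alignment: maximum of cache line size and page size. */
static size_t
demo_chunk_align(uint32_t pg_shift)
{
	return demo_max(DEMO_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
}
```

For example, 100 objects of 64 bytes on 4 KiB pages (pg_shift = 12)
fit 64 per page, so the sketch asks for two pages.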

 doc/guides/rel_notes/deprecation.rst         |  7 -------
 doc/guides/rel_notes/release_18_05.rst       | 11 ++++++++++
 lib/librte_mempool/Makefile                  |  3 +++
 lib/librte_mempool/meson.build               | 12 +++++++++++
 lib/librte_mempool/rte_mempool.c             | 19 ++++++++++++++---
 lib/librte_mempool/rte_mempool.h             | 30 +++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops_default.c |  4 ++--
 test/test/test_mempool.c                     | 31 ----------------------------
 8 files changed, 74 insertions(+), 43 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 4deed9a..473330d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -60,13 +60,6 @@ Deprecation Notices
   - ``rte_eal_mbuf_default_mempool_ops``
 
 * mempool: several API and ABI changes are planned in v18.05.
-  The following functions, introduced for Xen, which is not supported
-  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
-  Therefore they will be deprecated in v18.05 and removed in v18.08:
-
-  - ``rte_mempool_xmem_create``
-  - ``rte_mempool_xmem_size``
-  - ``rte_mempool_xmem_usage``
 
   The following changes are planned:
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index c50f26c..6a8db54 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -74,6 +74,17 @@ API Changes
   Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
   used to achieve it without specific knowledge in the generic code.
 
+* **Deprecated mempool xmem functions.**
+
+  The following functions, introduced for Xen, which is not supported
+  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
+  Therefore they were deprecated in v18.05 and will be removed in v18.08:
+
+  - ``rte_mempool_xmem_create``
+  - ``rte_mempool_xmem_size``
+  - ``rte_mempool_xmem_usage``
+  - ``rte_mempool_populate_iova_tab``
+
 
 ABI Changes
 -----------
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 072740f..2c46fdd 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -7,6 +7,9 @@ include $(RTE_SDK)/mk/rte.vars.mk
 LIB = librte_mempool.a
 
 CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
+# from earlier deprecated rte_mempool_populate_phys_tab()
+CFLAGS += -Wno-deprecated-declarations
 LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 9e3b527..22e912a 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,6 +1,18 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
+extra_flags = []
+
+# Allow deprecated symbol to use deprecated rte_mempool_populate_iova_tab()
+# from earlier deprecated rte_mempool_populate_phys_tab()
+extra_flags += '-Wno-deprecated-declarations'
+
+foreach flag: extra_flags
+	if cc.has_argument(flag)
+		cflags += flag
+	endif
+endforeach
+
 version = 4
 sources = files('rte_mempool.c', 'rte_mempool_ops.c',
 		'rte_mempool_ops_default.c')
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 40eedde..8c3b0b1 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -204,11 +204,13 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 
 
 /*
- * Calculate maximum amount of memory required to store given number of objects.
+ * Internal function to calculate required memory chunk size shared
+ * by default implementation of the corresponding callback and
+ * deprecated external function.
  */
 size_t
-rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      __rte_unused unsigned int flags)
+rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+				 uint32_t pg_shift)
 {
 	size_t obj_per_page, pg_num, pg_sz;
 
@@ -228,6 +230,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 }
 
 /*
+ * Calculate maximum amount of memory required to store given number of objects.
+ */
+size_t
+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
+		      __rte_unused unsigned int flags)
+{
+	return rte_mempool_calc_mem_size_helper(elt_num, total_elt_sz,
+						pg_shift);
+}
+
+/*
  * Calculate how much memory would be actually required with the
  * given memory footprint to store required number of elements.
  */
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 0b83d5e..9107f5a 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -427,6 +427,28 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal Helper function to calculate memory size required to store
+ * specified number of objects in assumption that the memory buffer will
+ * be aligned at page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * @param elt_num
+ *   Number of elements.
+ * @param total_elt_sz
+ *   The size of each element, including header and trailer, as returned
+ *   by rte_mempool_calc_obj_size().
+ * @param pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+size_t rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+		uint32_t pg_shift);
+
+/**
  * Function to be called for each populated object.
  *
  * @param[in] mp
@@ -855,6 +877,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
 		   int socket_id, unsigned flags);
 
 /**
+ * @deprecated
  * Create a new mempool named *name* in memory.
  *
  * The pool contains n elements of elt_size. Its size is set to n.
@@ -912,6 +935,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
  *   The pointer to the new allocated mempool, on success. NULL on error
  *   with rte_errno set appropriately. See rte_mempool_create() for details.
  */
+__rte_deprecated
 struct rte_mempool *
 rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 		unsigned cache_size, unsigned private_data_size,
@@ -1008,6 +1032,7 @@ int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
 	void *opaque);
 
 /**
+ * @deprecated
  * Add physical memory for objects in the pool at init
  *
  * Add a virtually contiguous memory chunk in the pool where objects can
@@ -1033,6 +1058,7 @@ int rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
  *   On error, the chunks are not added in the memory list of the
  *   mempool and a negative errno is returned.
  */
+__rte_deprecated
 int rte_mempool_populate_iova_tab(struct rte_mempool *mp, char *vaddr,
 	const rte_iova_t iova[], uint32_t pg_num, uint32_t pg_shift,
 	rte_mempool_memchunk_free_cb_t *free_cb, void *opaque);
@@ -1652,6 +1678,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 	struct rte_mempool_objsz *sz);
 
 /**
+ * @deprecated
  * Get the size of memory required to store mempool elements.
  *
  * Calculate the maximum amount of memory required to store given number
@@ -1674,10 +1701,12 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * @return
  *   Required memory size aligned at page boundary.
  */
+__rte_deprecated
 size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
 	uint32_t pg_shift, unsigned int flags);
 
 /**
+ * @deprecated
  * Get the size of memory required to store mempool elements.
  *
  * Calculate how much memory would be actually required with the given
@@ -1705,6 +1734,7 @@ size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
  *   buffer is too small, return a negative value whose absolute value
  *   is the actual number of elements that can be stored in that buffer.
  */
+__rte_deprecated
 ssize_t rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
 	uint32_t pg_shift, unsigned int flags);
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 3defc15..fd63ca1 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -16,8 +16,8 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
-	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags);
+	mem_size = rte_mempool_calc_mem_size_helper(obj_num, total_elt_sz,
+						    pg_shift);
 
 	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
diff --git a/test/test/test_mempool.c b/test/test/test_mempool.c
index 63f921e..8d29af2 100644
--- a/test/test/test_mempool.c
+++ b/test/test/test_mempool.c
@@ -444,34 +444,6 @@ test_mempool_same_name_twice_creation(void)
 	return 0;
 }
 
-/*
- * Basic test for mempool_xmem functions.
- */
-static int
-test_mempool_xmem_misc(void)
-{
-	uint32_t elt_num, total_size;
-	size_t sz;
-	ssize_t usz;
-
-	elt_num = MAX_KEEP;
-	total_size = rte_mempool_calc_obj_size(MEMPOOL_ELT_SIZE, 0, NULL);
-	sz = rte_mempool_xmem_size(elt_num, total_size, MEMPOOL_PG_SHIFT_MAX,
-					0);
-
-	usz = rte_mempool_xmem_usage(NULL, elt_num, total_size, 0, 1,
-		MEMPOOL_PG_SHIFT_MAX, 0);
-
-	if (sz != (size_t)usz)  {
-		printf("failure @ %s: rte_mempool_xmem_usage(%u, %u) "
-			"returns: %#zx, while expected: %#zx;\n",
-			__func__, elt_num, total_size, sz, (size_t)usz);
-		return -1;
-	}
-
-	return 0;
-}
-
 static void
 walk_cb(struct rte_mempool *mp, void *userdata __rte_unused)
 {
@@ -596,9 +568,6 @@ test_mempool(void)
 	if (test_mempool_same_name_twice_creation() < 0)
 		goto err;
 
-	if (test_mempool_xmem_misc() < 0)
-		goto err;
-
 	/* test the stack handler */
 	if (test_mempool_basic(mp_stack, 1) < 0)
 		goto err;
-- 
2.7.4


* [dpdk-dev] [PATCH v2 05/11] mempool: add op to populate objects using provided memory
  2018-03-25 16:20  2% ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
  2018-03-25 16:20  7%   ` [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
@ 2018-03-25 16:20  6%   ` Andrew Rybchenko
  2018-03-25 16:20  6%   ` [dpdk-dev] [PATCH v2 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback allows customizing how objects are stored in the
memory chunk. A default implementation of the callback, which simply
lays out objects one after another, is available.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - fix memory leak if off is bigger than len

RFCv2 -> v1:
 - advertise ABI changes in release notes
 - use consistent name for default callback:
   rte_mempool_op_<callback>_default()
 - add opaque data pointer to populated object callback
 - move default callback to dedicated file
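
Illustration only, not part of the patch: a standalone sketch of the
"one by one" layout that the default populate callback is described as
using, following the loop this patch removes from
rte_mempool_populate_iova(). It does not include DPDK headers;
demo_layout, demo_iova_t, DEMO_BAD_IOVA, demo_trace and the demo_*
functions are hypothetical stand-ins, and enqueueing of each object
(e.g. via rte_mempool_ops_enqueue_bulk() in the real code) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_iova_t;
#define DEMO_BAD_IOVA ((demo_iova_t)-1)	/* stand-in for RTE_BAD_IOVA */

/* Hypothetical subset of the mempool object geometry. */
struct demo_layout {
	size_t header_size;
	size_t elt_size;
	size_t trailer_size;
};

typedef void (demo_obj_cb_t)(const struct demo_layout *lo, void *opaque,
		void *vaddr, demo_iova_t iova);

/* Slice the chunk into header/element/trailer triples, in order. */
static int
demo_populate_default(const struct demo_layout *lo, unsigned int max_objs,
		      void *vaddr, demo_iova_t iova, size_t len,
		      demo_obj_cb_t *obj_cb, void *obj_cb_arg)
{
	size_t total_elt_sz = lo->header_size + lo->elt_size +
			      lo->trailer_size;
	size_t off = 0;
	unsigned int i;

	assert(total_elt_sz != 0);

	for (i = 0; i < max_objs && off + total_elt_sz <= len; i++) {
		off += lo->header_size;	/* object starts after its header */
		/* The real op would enqueue the object to the pool here. */
		obj_cb(lo, obj_cb_arg, (char *)vaddr + off,
		       iova == DEMO_BAD_IOVA ? DEMO_BAD_IOVA : iova + off);
		off += lo->elt_size + lo->trailer_size;
	}

	return (int)i;	/* number of objects populated */
}

/* Example per-object callback: remember each object's offset. */
struct demo_trace {
	char *base;
	size_t offs[8];
	unsigned int n;
};

static void
demo_record_cb(const struct demo_layout *lo, void *opaque, void *vaddr,
	       demo_iova_t iova)
{
	struct demo_trace *t = opaque;

	(void)lo;
	(void)iova;
	if (t->n < 8)
		t->offs[t->n++] = (size_t)((char *)vaddr - t->base);
}
```

The opaque pointer is what lets the generic code pass its own state
(here a trace structure) through the driver-specific populate op
without the driver knowing about it.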

 doc/guides/rel_notes/deprecation.rst         |  2 +-
 doc/guides/rel_notes/release_18_05.rst       |  2 +
 lib/librte_mempool/rte_mempool.c             | 23 ++++---
 lib/librte_mempool/rte_mempool.h             | 90 ++++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops.c         | 21 +++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 24 ++++++++
 lib/librte_mempool/rte_mempool_version.map   |  1 +
 7 files changed, 149 insertions(+), 14 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e02d4ca..c06fc67 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,7 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize objects population and allocate contiguous
+  - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 59583ea..abaefe5 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -84,6 +84,8 @@ ABI Changes
 
   A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
   to allow to customize required memory size calculation.
+  A new callback ``populate`` has been added to ``rte_mempool_ops``
+  to allow to customize objects population.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index dd2d0fe..d917dc7 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,7 +99,8 @@ static unsigned optimize_object_size(unsigned obj_size)
 }
 
 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
+mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque,
+		 void *obj, rte_iova_t iova)
 {
 	struct rte_mempool_objhdr *hdr;
 	struct rte_mempool_objtlr *tlr __rte_unused;
@@ -116,9 +117,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
 	tlr = __mempool_get_trailer(obj);
 	tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
 #endif
-
-	/* enqueue in ring */
-	rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
 }
 
 /* call obj_cb() for each mempool element */
@@ -407,17 +405,16 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
 
-	while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
-		off += mp->header_size;
-		if (iova == RTE_BAD_IOVA)
-			mempool_add_elem(mp, (char *)vaddr + off,
-				RTE_BAD_IOVA);
-		else
-			mempool_add_elem(mp, (char *)vaddr + off, iova + off);
-		off += mp->elt_size + mp->trailer_size;
-		i++;
+	if (off > len) {
+		ret = -EINVAL;
+		goto fail;
 	}
 
+	i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
+		(char *)vaddr + off,
+		(iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
+		len - off, mempool_add_elem, NULL);
+
 	/* not enough room to store one object */
 	if (i == 0) {
 		ret = -EINVAL;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 191255d..754261e 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -456,6 +456,63 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		uint32_t obj_num, uint32_t pg_shift,
 		size_t *min_chunk_size, size_t *align);
 
+/**
+ * Function to be called for each populated object.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] opaque
+ *   An opaque pointer passed to iterator.
+ * @param[in] vaddr
+ *   Object virtual address.
+ * @param[in] iova
+ *   Input/output virtual address of the object or RTE_BAD_IOVA.
+ */
+typedef void (rte_mempool_populate_obj_cb_t)(struct rte_mempool *mp,
+		void *opaque, void *vaddr, rte_iova_t iova);
+
+/**
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * Populated objects should be enqueued to the pool, e.g. using
+ * rte_mempool_ops_enqueue_bulk().
+ *
+ * If the given IO address is unknown (iova = RTE_BAD_IOVA),
+ * the chunk doesn't need to be physically contiguous (only virtually),
+ * and allocated objects may span two pages.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+typedef int (*rte_mempool_populate_t)(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
+/**
+ * Default way to populate memory pool object using provided memory
+ * chunk: just slice objects one by one.
+ */
+int rte_mempool_op_populate_default(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -477,6 +534,11 @@ struct rte_mempool_ops {
 	 * store specified number of objects.
 	 */
 	rte_mempool_calc_mem_size_t calc_mem_size;
+	/**
+	 * Optional callback to populate mempool objects using
+	 * provided memory chunk.
+	 */
+	rte_mempool_populate_t populate;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -649,6 +711,34 @@ ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				      size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal wrapper for mempool_ops populate callback.
+ *
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+int rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+			     void *vaddr, rte_iova_t iova, size_t len,
+			     rte_mempool_populate_obj_cb_t *obj_cb,
+			     void *obj_cb_arg);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 26908cc..1a7f39f 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -60,6 +60,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
+	ops->populate = h->populate;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -141,6 +142,26 @@ rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
 }
 
+/* wrapper to populate memory pool objects using provided memory chunk */
+int
+rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+				void *vaddr, rte_iova_t iova, size_t len,
+				rte_mempool_populate_obj_cb_t *obj_cb,
+				void *obj_cb_arg)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->populate == NULL)
+		return rte_mempool_op_populate_default(mp, max_objs, vaddr,
+						       iova, len, obj_cb,
+						       obj_cb_arg);
+
+	return ops->populate(mp, max_objs, vaddr, iova, len, obj_cb,
+			     obj_cb_arg);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57fe79b..57295f7 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -36,3 +36,27 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	return mem_size;
 }
+
+int
+rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+	unsigned int i;
+	void *obj;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
+		off += mp->header_size;
+		obj = (char *)vaddr + off;
+		obj_cb(mp, obj_cb_arg, obj,
+		       (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off));
+		rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
+		off += mp->elt_size + mp->trailer_size;
+	}
+
+	return i;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index cb38189..41a0b09 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -56,5 +56,6 @@ DPDK_18.05 {
 	global:
 
 	rte_mempool_op_calc_mem_size_default;
+	rte_mempool_op_populate_default;
 
 } DPDK_17.11;
-- 
2.7.4

^ permalink raw reply	[relevance 6%]
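
A standalone model of the slicing loop in rte_mempool_op_populate_default()
above may help when reading the patch. It is only a sketch: the model_* names
and types are hypothetical stand-ins, not DPDK definitions, and the enqueue
into the pool that the real function performs for each object is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_iova_t and RTE_BAD_IOVA. */
typedef uint64_t model_iova_t;
#define MODEL_BAD_IOVA ((model_iova_t)-1)

/* Only the three sizes that the slicing loop needs. */
struct model_pool {
	size_t header_size;
	size_t elt_size;
	size_t trailer_size;
};

typedef void (model_obj_cb_t)(void *opaque, void *vaddr, model_iova_t iova);

/* Slice [vaddr, vaddr + len) into objects and report each one via obj_cb. */
static unsigned int
model_populate(const struct model_pool *mp, unsigned int max_objs,
	       void *vaddr, model_iova_t iova, size_t len,
	       model_obj_cb_t *obj_cb, void *obj_cb_arg)
{
	size_t total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
	size_t off = 0;
	unsigned int i;

	for (i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
		/* The object starts right after its header. */
		off += mp->header_size;
		obj_cb(obj_cb_arg, (char *)vaddr + off,
		       iova == MODEL_BAD_IOVA ? MODEL_BAD_IOVA : iova + off);
		/* Skip the element body and trailer to reach the next header. */
		off += mp->elt_size + mp->trailer_size;
	}

	return i;
}

/* Example callback state: remembers the last object reported. */
struct model_trace {
	char *base;
	size_t last_off;
	model_iova_t last_iova;
	unsigned int calls;
};

static void
model_trace_cb(void *opaque, void *vaddr, model_iova_t iova)
{
	struct model_trace *t = opaque;

	t->last_off = (size_t)((char *)vaddr - t->base);
	t->last_iova = iova;
	t->calls++;
}
```

With a 1000-byte chunk and header/element/trailer sizes of 64/128/64 bytes,
three objects fit and the callback sees them at offsets 64, 320 and 576. The
remaining 232 bytes are too small for a fourth object and stay unused.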

* [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated
  2018-03-25 16:20  2% ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
@ 2018-03-25 16:20  7%   ` Andrew Rybchenko
  2018-03-25 16:20  6%   ` [dpdk-dev] [PATCH v2 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The size of the memory chunk required to populate mempool objects
depends on how the objects are stored in memory. Different mempool
drivers may have different requirements, so a new operation is added
to calculate the memory size in accordance with driver requirements
and to advertise requirements on minimum memory chunk size and
alignment in a generic way.

Bump ABI version since the patch breaks it.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
v1 -> v2:
 - clarify min_chunk_size meaning
 - rebase on top of patch series which fixes library version in meson
   build

RFCv2 -> v1:
 - move default calc_mem_size callback to rte_mempool_ops_default.c
 - add ABI changes to release notes
 - name default callback consistently: rte_mempool_op_<callback>_default()
 - bump ABI version since it is the first patch which breaks ABI
 - describe default callback behaviour in detail
 - avoid introduction of internal function to cope with deprecation
   (keep it to deprecation patch)
 - move cache-line or page boundary chunk alignment to default callback
 - highlight that min_chunk_size and align parameters are output only

 doc/guides/rel_notes/deprecation.rst         |  3 +-
 doc/guides/rel_notes/release_18_05.rst       |  7 ++-
 lib/librte_mempool/Makefile                  |  3 +-
 lib/librte_mempool/meson.build               |  5 +-
 lib/librte_mempool/rte_mempool.c             | 43 +++++++-------
 lib/librte_mempool/rte_mempool.h             | 86 +++++++++++++++++++++++++++-
 lib/librte_mempool/rte_mempool_ops.c         | 18 ++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 38 ++++++++++++
 lib/librte_mempool/rte_mempool_version.map   |  7 +++
 9 files changed, 182 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6594585..e02d4ca 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,8 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize required memory chunk calculation,
-    customize objects population and allocate contiguous
+  - addition of new ops to customize objects population and allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index f2525bb..59583ea 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -80,6 +80,11 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Changed rte_mempool_ops structure.**
+
+  A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
+  to allow customization of the required memory size calculation.
+
 
 Removed Items
 -------------
@@ -152,7 +157,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_latencystats.so.1
      librte_lpm.so.2
      librte_mbuf.so.3
-     librte_mempool.so.3
+   + librte_mempool.so.4
    + librte_meter.so.2
      librte_metrics.so.1
      librte_net.so.1
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 24e735a..072740f 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
 
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 712720f..9e3b527 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-version = 3
-sources = files('rte_mempool.c', 'rte_mempool_ops.c')
+version = 4
+sources = files('rte_mempool.c', 'rte_mempool_ops.c',
+		'rte_mempool_ops_default.c')
 headers = files('rte_mempool.h')
 deps += ['ring']
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d8e3720..dd2d0fe 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -561,10 +561,10 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
-	size_t size, total_elt_sz, align, pg_sz, pg_shift;
+	ssize_t mem_size;
+	size_t align, pg_sz, pg_shift;
 	rte_iova_t iova;
 	unsigned mz_id, n;
-	unsigned int mp_flags;
 	int ret;
 
 	ret = mempool_ops_alloc_once(mp);
@@ -575,29 +575,23 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_flags;
-
 	if (rte_eal_has_hugepages()) {
 		pg_shift = 0; /* not needed, zone is physically contiguous */
 		pg_sz = 0;
-		align = RTE_CACHE_LINE_SIZE;
 	} else {
 		pg_sz = getpagesize();
 		pg_shift = rte_bsf32(pg_sz);
-		align = pg_sz;
 	}
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
-		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
-						mp->flags);
+		size_t min_chunk_size;
+
+		mem_size = rte_mempool_ops_calc_mem_size(mp, n, pg_shift,
+				&min_chunk_size, &align);
+		if (mem_size < 0) {
+			ret = mem_size;
+			goto fail;
+		}
 
 		ret = snprintf(mz_name, sizeof(mz_name),
 			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
@@ -606,7 +600,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
-		mz = rte_memzone_reserve_aligned(mz_name, size,
+		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 			mp->socket_id, mz_flags, align);
 		/* not enough memory, retry with the biggest zone we have */
 		if (mz == NULL)
@@ -617,6 +611,12 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
+		if (mz->len < min_chunk_size) {
+			rte_memzone_free(mz);
+			ret = -ENOMEM;
+			goto fail;
+		}
+
 		if (mp->flags & MEMPOOL_F_NO_IOVA_CONTIG)
 			iova = RTE_BAD_IOVA;
 		else
@@ -649,13 +649,14 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 static size_t
 get_anon_size(const struct rte_mempool *mp)
 {
-	size_t size, total_elt_sz, pg_sz, pg_shift;
+	size_t size, pg_sz, pg_shift;
+	size_t min_chunk_size;
+	size_t align;
 
 	pg_sz = getpagesize();
 	pg_shift = rte_bsf32(pg_sz);
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-	size = rte_mempool_xmem_size(mp->size, total_elt_sz, pg_shift,
-					mp->flags);
+	size = rte_mempool_ops_calc_mem_size(mp, mp->size, pg_shift,
+					     &min_chunk_size, &align);
 
 	return size;
 }
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index e531a15..191255d 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -400,6 +400,62 @@ typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
 typedef int (*rte_mempool_ops_register_memory_area_t)
 (const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
 
+/**
+ * Calculate memory size required to store given number of objects.
+ *
+ * If mempool objects are not required to be IOVA-contiguous
+ * (the flag MEMPOOL_F_NO_IOVA_CONTIG is set), min_chunk_size defines
+ * virtually contiguous chunk size. Otherwise, if mempool objects must
+ * be IOVA-contiguous (the flag MEMPOOL_F_NO_IOVA_CONTIG is clear),
+ * min_chunk_size defines IOVA-contiguous chunk size.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location for required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
+		uint32_t obj_num,  uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
+/**
+ * Default way to calculate memory size required to store given number of
+ * objects.
+ *
+ * If page boundaries may be ignored, it is just a product of total
+ * object size including header and trailer and number of objects.
+ * Otherwise, it is a number of pages required to store given number of
+ * objects without crossing page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * If mempool driver requires object addresses to be block size aligned
+ * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
+ * reserved to be able to meet the requirement.
+ *
+ * Minimum size of memory chunk is either all required space, if
+ * capabilities say that whole memory area must be physically contiguous
+ * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * element size.
+ *
+ * Required memory chunk alignment is a maximum of page size and cache
+ * line size.
+ */
+ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+		uint32_t obj_num, uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -416,6 +472,11 @@ struct rte_mempool_ops {
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
+	/**
+	 * Optional callback to calculate memory size required to
+	 * store specified number of objects.
+	 */
+	rte_mempool_calc_mem_size_t calc_mem_size;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -565,6 +626,29 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
 				char *vaddr, rte_iova_t iova, size_t len);
 
 /**
+ * @internal wrapper for mempool_ops calc_mem_size callback.
+ * API to calculate size of memory required to store specified number of
+ * objects.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location for required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				      uint32_t obj_num, uint32_t pg_shift,
+				      size_t *min_chunk_size, size_t *align);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
@@ -1534,7 +1618,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * of objects. Assume that the memory buffer will be aligned at page
  * boundary.
  *
- * Note that if object size is bigger then page size, then it assumes
+ * Note that if object size is bigger than page size, then it assumes
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 0732255..26908cc 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -59,6 +59,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_count = h->get_count;
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
+	ops->calc_mem_size = h->calc_mem_size;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -123,6 +124,23 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
 	return ops->register_memory_area(mp, vaddr, iova, len);
 }
 
+/* wrapper to calculate memory size required to store objects */
+ssize_t
+rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				uint32_t obj_num, uint32_t pg_shift,
+				size_t *min_chunk_size, size_t *align)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->calc_mem_size == NULL)
+		return rte_mempool_op_calc_mem_size_default(mp, obj_num,
+				pg_shift, min_chunk_size, align);
+
+	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
new file mode 100644
index 0000000..57fe79b
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016 6WIND S.A.
+ * Copyright(c) 2018 Solarflare Communications Inc.
+ */
+
+#include <rte_mempool.h>
+
+ssize_t
+rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+				     uint32_t obj_num, uint32_t pg_shift,
+				     size_t *min_chunk_size, size_t *align)
+{
+	unsigned int mp_flags;
+	int ret;
+	size_t total_elt_sz;
+	size_t mem_size;
+
+	/* Get mempool capabilities */
+	mp_flags = 0;
+	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
+	if ((ret < 0) && (ret != -ENOTSUP))
+		return ret;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
+					 mp->flags | mp_flags);
+
+	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
+		*min_chunk_size = mem_size;
+	else
+		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+
+	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
+
+	return mem_size;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 62b76f9..cb38189 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -51,3 +51,10 @@ DPDK_17.11 {
 	rte_mempool_populate_iova_tab;
 
 } DPDK_16.07;
+
+DPDK_18.05 {
+	global:
+
+	rte_mempool_op_calc_mem_size_default;
+
+} DPDK_17.11;
-- 
2.7.4

^ permalink raw reply	[relevance 7%]
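
The outputs of the default calc_mem_size callback in the patch above can be
modelled without DPDK. The sketch below is an approximation, not the library
code: the model_* names and the 64-byte cache line are assumptions, the total
size follows the doc comment on rte_mempool_op_calc_mem_size_default(), and
the branch for objects larger than a page is a guess at how whole pages are
reserved per object. The capability-flag cases
(MEMPOOL_F_CAPA_PHYS_CONTIG, MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS) are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_CACHE_LINE 64 /* assumed cache line size */

/* The three values the callback produces. */
struct model_req {
	size_t mem_size;
	size_t min_chunk_size;
	size_t align;
};

static size_t
model_max(size_t a, size_t b)
{
	return a > b ? a : b;
}

static struct model_req
model_calc_mem_size(size_t total_elt_sz, uint32_t obj_num, uint32_t pg_shift)
{
	struct model_req r;
	size_t pg_sz = (size_t)1 << pg_shift;

	if (total_elt_sz == 0) {
		r.mem_size = 0;
	} else if (pg_shift == 0) {
		/* Page boundaries are ignored: plain product. */
		r.mem_size = total_elt_sz * obj_num;
	} else if (total_elt_sz > pg_sz) {
		/* Assumption: each object takes a whole number of pages. */
		size_t pages = (total_elt_sz + pg_sz - 1) / pg_sz;

		r.mem_size = pages * pg_sz * obj_num;
	} else {
		/* Objects must not cross a page boundary. */
		size_t obj_per_page = pg_sz / total_elt_sz;
		size_t pg_num = (obj_num + obj_per_page - 1) / obj_per_page;

		r.mem_size = pg_num << pg_shift;
	}

	/* With pg_shift == 0, pg_sz is 1 and both maxima fall back
	 * to the element size and the cache line size. */
	r.min_chunk_size = model_max(pg_sz, total_elt_sz);
	r.align = model_max(MODEL_CACHE_LINE, pg_sz);

	return r;
}
```

For 100 objects of 256 bytes, pg_shift = 0 gives 25600 bytes with cache-line
alignment, while pg_shift = 12 gives 7 pages (28672 bytes) with page
alignment. These match the align values that rte_mempool_populate_default()
chose explicitly before the patch.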

* [dpdk-dev] [PATCH v2 00/11] mempool: prepare to add bucket driver
    2018-03-10 15:39  3% ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
@ 2018-03-25 16:20  2% ` Andrew Rybchenko
  2018-03-25 16:20  7%   ` [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
                     ` (4 more replies)
  2018-03-26 16:09  2% ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-26 16:12  3% ` [dpdk-dev] [PATCH v1 0/6] mempool: add bucket driver Andrew Rybchenko
  3 siblings, 5 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-25 16:20 UTC (permalink / raw)
  To: dev
  Cc: Olivier MATZ, Thomas Monjalon, Anatoly Burakov, Santosh Shukla,
	Jerin Jacob, Hemant Agrawal, Shreyansh Jain

The patch series should be applied on top of [7].

The initial patch series [1] is split into two to simplify processing.
The second series relies on this one and will add bucket mempool driver
and related ops.

The patch series has generic enhancements suggested by Olivier.
Basically it adds driver callbacks to calculate the required memory
size and to populate objects using a provided memory area. This makes
it possible to remove the so-called capability flags used before to
tell generic code how to allocate and slice allocated memory into
mempool objects.
The cleanup which removes get_capabilities and register_memory_area is
not strictly required, but I think it is the right thing to do.
Existing mempool drivers are updated.
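
The new driver callbacks are optional: the generic wrappers in the series
fall back to a default implementation when a driver leaves the pointer NULL.
A minimal sketch of that dispatch pattern follows. All model_* names and the
simplified signature are hypothetical and stand in for the real
rte_mempool_ops members.

```c
#include <stddef.h>

/* Hypothetical simplified signature for an optional size callback. */
typedef long (*model_calc_t)(unsigned int obj_num, size_t obj_size);

/* Miniature ops table: the callback may be left NULL by a driver. */
struct model_ops {
	model_calc_t calc_mem_size;
};

/* Default behaviour used when the driver provides nothing. */
static long
model_calc_default(unsigned int obj_num, size_t obj_size)
{
	return (long)(obj_num * obj_size);
}

/* Generic wrapper: prefer the driver callback, else use the default. */
static long
model_ops_calc(const struct model_ops *ops, unsigned int obj_num,
	       size_t obj_size)
{
	if (ops->calc_mem_size == NULL)
		return model_calc_default(obj_num, obj_size);

	return ops->calc_mem_size(obj_num, obj_size);
}

/* Example driver callback: reserve room for one extra object. */
static long
model_calc_padded(unsigned int obj_num, size_t obj_size)
{
	return (long)((obj_num + 1) * obj_size);
}
```

Keeping the fallback in the wrapper means existing drivers need no changes to
keep their current behaviour, and only drivers with special layout
requirements have to fill in the new members.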

rte_mempool_populate_iova_tab() is also deprecated in v2 as agreed in [2].
Unfortunately it requires addition of the -Wno-deprecated-declarations
flag to librte_mempool since the function is used by the previously
deprecated rte_mempool_populate_phys_tab(). If the latter can be removed
in this release, we can avoid adding the flag that allows usage of
deprecated functions.

One open question remains from previous review [3].

The patch series interferes with memory hotplug for DPDK [4] ([5] to be
precise). So, a rebase may be required.

A new patch is added to the series to rename MEMPOOL_F_NO_PHYS_CONTIG
as MEMPOOL_F_NO_IOVA_CONTIG as agreed in [6].
MEMPOOL_F_CAPA_PHYS_CONTIG is not renamed since it is removed in this
patchset.

It breaks ABI since it changes rte_mempool_ops. It also removes
rte_mempool_ops_register_memory_area() and
rte_mempool_ops_get_capabilities() since the corresponding callbacks
are removed.

Internal global functions are not listed in the map file since they are
not a part of the external API.

[1] https://dpdk.org/ml/archives/dev/2018-January/088698.html
[2] https://dpdk.org/ml/archives/dev/2018-March/093186.html
[3] https://dpdk.org/ml/archives/dev/2018-March/093329.html
[4] https://dpdk.org/ml/archives/dev/2018-March/092070.html
[5] https://dpdk.org/ml/archives/dev/2018-March/092088.html
[6] https://dpdk.org/ml/archives/dev/2018-March/093345.html
[7] https://dpdk.org/ml/archives/dev/2018-March/093196.html

v1 -> v2:
  - deprecate rte_mempool_populate_iova_tab()
  - add patch to fix memory leak if no objects are populated
  - add patch to rename MEMPOOL_F_NO_PHYS_CONTIG
  - minor fixes (typos, blank line at the end of file)
  - highlight meaning of min_chunk_size (when it is virtual or
    physical contiguous)
  - make sure that mempool is initialized in rte_mempool_populate_anon()
  - move patch to ensure that mempool is initialized earlier in the series

RFCv2 -> v1:
  - split the series in two
  - squash octeontx patches which implement calc_mem_size and populate
    callbacks into the patch which removes get_capabilities since it is
    the easiest way to untangle the tangle of tightly related library
    functions and flags advertised by the driver
  - consistently name default callbacks
  - move default callbacks to dedicated file
  - see detailed description in patches

RFCv1 -> RFCv2:
  - add driver ops to calculate required memory size and populate
    mempool objects, remove extra flags which were required before
    to control it
  - transition of octeontx and dpaa drivers to the new callbacks
  - change info API to get the information from the driver that the
    API user needs to know the contiguous block size
  - remove get_capabilities (not required any more and may be
    substituted with more in info get API)
  - remove register_memory_area since it is substituted with
    populate callback which can do more
  - use SPDX tags
  - avoid all objects affinity to single lcore
  - fix bucket get_count
  - deprecate XMEM API
  - avoid introduction of a new function to flush cache
  - fix NO_CACHE_ALIGN case in bucket mempool


Andrew Rybchenko (9):
  mempool: fix memhdr leak when no objects are populated
  mempool: rename flag to control IOVA-contiguous objects
  mempool: add op to calculate memory size to be allocated
  mempool: add op to populate objects using provided memory
  mempool: remove callback to get capabilities
  mempool: deprecate xmem functions
  mempool/octeontx: prepare to remove register memory area op
  mempool/dpaa: prepare to remove register memory area op
  mempool: remove callback to register memory area

Artem V. Andreev (2):
  mempool: ensure the mempool is initialized before populating
  mempool: support flushing the default cache of the mempool

 doc/guides/rel_notes/deprecation.rst            |  12 +-
 doc/guides/rel_notes/release_18_05.rst          |  33 ++-
 drivers/mempool/dpaa/dpaa_mempool.c             |  13 +-
 drivers/mempool/octeontx/rte_mempool_octeontx.c |  64 ++++--
 drivers/net/thunderx/nicvf_ethdev.c             |   2 +-
 lib/librte_mempool/Makefile                     |   6 +-
 lib/librte_mempool/meson.build                  |  17 +-
 lib/librte_mempool/rte_mempool.c                | 179 ++++++++-------
 lib/librte_mempool/rte_mempool.h                | 280 +++++++++++++++++-------
 lib/librte_mempool/rte_mempool_ops.c            |  37 ++--
 lib/librte_mempool/rte_mempool_ops_default.c    |  51 +++++
 lib/librte_mempool/rte_mempool_version.map      |  10 +-
 test/test/test_mempool.c                        |  31 ---
 13 files changed, 485 insertions(+), 250 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

-- 
2.7.4

^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH v3 1/7] crypto/virtio: add virtio related fundamental functions
  @ 2018-03-25  8:33  2% ` Jay Zhou
  0 siblings, 0 replies; 200+ results
From: Jay Zhou @ 2018-03-25  8:33 UTC (permalink / raw)
  To: dev
  Cc: pablo.de.lara.guarch, roy.fan.zhang, thomas, arei.gonglei,
	xin.zeng, weidong.huang, wangxinxin.wang, longpeng2,
	jianjay.zhou

Since there is no common virtio library, we have to put these files
here. They are basically the same as the virtio net related files,
with some minor changes.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
---
 config/common_base                  |  20 ++
 drivers/crypto/virtio/virtio_logs.h |  47 ++++
 drivers/crypto/virtio/virtio_pci.c  | 460 ++++++++++++++++++++++++++++++++++++
 drivers/crypto/virtio/virtio_pci.h  | 253 ++++++++++++++++++++
 drivers/crypto/virtio/virtio_ring.h | 137 +++++++++++
 drivers/crypto/virtio/virtqueue.c   |  43 ++++
 drivers/crypto/virtio/virtqueue.h   | 176 ++++++++++++++
 7 files changed, 1136 insertions(+)
 create mode 100644 drivers/crypto/virtio/virtio_logs.h
 create mode 100644 drivers/crypto/virtio/virtio_pci.c
 create mode 100644 drivers/crypto/virtio/virtio_pci.h
 create mode 100644 drivers/crypto/virtio/virtio_ring.h
 create mode 100644 drivers/crypto/virtio/virtqueue.c
 create mode 100644 drivers/crypto/virtio/virtqueue.h

diff --git a/config/common_base b/config/common_base
index ad03cf4..19d0cdd 100644
--- a/config/common_base
+++ b/config/common_base
@@ -482,6 +482,26 @@ CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_DRIVER=n
 CONFIG_RTE_QAT_PMD_MAX_NB_SESSIONS=2048
 
 #
+# Compile PMD for virtio crypto devices
+#
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_SESSION=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DRIVER=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DUMP=n
+#
+# Number of maximum virtio crypto devices
+#
+CONFIG_RTE_MAX_VIRTIO_CRYPTO=32
+#
+# Number of sessions to create in the session memory pool
+# on a single virtio crypto device.
+#
+CONFIG_RTE_VIRTIO_CRYPTO_PMD_MAX_NB_SESSIONS=1024
+
+#
 # Compile PMD for AESNI backed device
 #
 CONFIG_RTE_LIBRTE_PMD_AESNI_MB=n
diff --git a/drivers/crypto/virtio/virtio_logs.h b/drivers/crypto/virtio/virtio_logs.h
new file mode 100644
index 0000000..20582a4
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_logs.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_INIT
+#define PMD_INIT_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+#else
+#define PMD_INIT_LOG(level, fmt, args...) do { } while (0)
+#define PMD_INIT_FUNC_TRACE() do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_SESSION
+#define PMD_SESSION_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() session: " fmt "\n", __func__, ## args)
+#else
+#define PMD_SESSION_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_RX
+#define PMD_RX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() rx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_RX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_TX
+#define PMD_TX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() tx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_TX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DRIVER
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): driver " fmt "\n", __func__, ## args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
new file mode 100644
index 0000000..7aa5cdd
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -0,0 +1,460 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
+	 * and only accepts 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_crypto_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_crypto_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
+		uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	PMD_INIT_LOG(DEBUG, "queue %u addresses:", vq->vq_queue_index);
+	PMD_INIT_LOG(DEBUG, "\t desc_addr: %" PRIx64, desc_addr);
+	PMD_INIT_LOG(DEBUG, "\t avail_addr: %" PRIx64, avail_addr);
+	PMD_INIT_LOG(DEBUG, "\t used_addr: %" PRIx64, used_addr);
+	PMD_INIT_LOG(DEBUG, "\t notify addr: %p (notify offset: %u)",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
+		struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_crypto_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+void
+vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+		uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+vtpci_cryptodev_reset(struct virtio_crypto_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw)
+{
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+vtpci_cryptodev_isr(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		PMD_INIT_LOG(ERR, "invalid bar: %u", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		PMD_INIT_LOG(ERR, "offset(%u) + length(%u) overflows",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		PMD_INIT_LOG(ERR,
+			"invalid cap: overflows bar space: %u > %" PRIu64,
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		PMD_INIT_LOG(ERR, "bar %u base addr is NULL", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		PMD_INIT_LOG(DEBUG, "failed to map pci device!");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			PMD_INIT_LOG(ERR,
+				"failed to read pci cap at pos: %x", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			PMD_INIT_LOG(DEBUG,
+				"[%2x] skipping non VNDR cap id: %02x",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		PMD_INIT_LOG(DEBUG,
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		PMD_INIT_LOG(INFO, "no modern virtio pci device found.");
+		return -1;
+	}
+
+	PMD_INIT_LOG(INFO, "found modern virtio pci device.");
+
+	PMD_INIT_LOG(DEBUG, "common cfg mapped at: %p", hw->common_cfg);
+	PMD_INIT_LOG(DEBUG, "device cfg mapped at: %p", hw->dev_cfg);
+	PMD_INIT_LOG(DEBUG, "isr cfg mapped at: %p", hw->isr);
+	PMD_INIT_LOG(DEBUG, "notify base: %p, notify off multiplier: %u",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+/*
+ * Return 0 if the device exposes the modern (virtio 1.0) pci
+ *   capabilities and the modern ops were installed.
+ * Return -1 otherwise:
+ *   if mapping the device or reading its capability list fails.
+ *   if any of the common, notify, device or isr cfg is missing.
+ *   (legacy devices are not supported.)
+ */
+int
+vtpci_cryptodev_init(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	/*
+	 * Try to read the virtio pci caps, which exist only on
+	 * modern pci devices. There is no legacy fallback here;
+	 * see the comment before the final return.
+	 */
+	if (virtio_read_caps(dev, hw) == 0) {
+		PMD_INIT_LOG(INFO, "modern virtio pci detected.");
+		virtio_hw_internal[hw->dev_id].vtpci_ops =
+					&virtio_crypto_modern_ops;
+		hw->modern = 1;
+		return 0;
+	}
+
+	/*
+	 * virtio crypto conforms to virtio 1.0 and doesn't support
+	 * legacy mode
+	 */
+	return -1;
+}
diff --git a/drivers/crypto/virtio/virtio_pci.h b/drivers/crypto/virtio/virtio_pci.h
new file mode 100644
index 0000000..cd316a6
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.h
@@ -0,0 +1,253 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_CRYPTO_PCI_VENDORID 0x1AF4
+#define VIRTIO_CRYPTO_PCI_DEVICEID 0x1054
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR            19 /* interrupt status register, reading
+				      * also clears the register (8, RO)
+				      */
+/* Only if MSIX is enabled: */
+
+/* configuration change vector (16, RW) */
+#define VIRTIO_MSI_CONFIG_VECTOR  20
+/* vector for selected VQ notifications */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note that sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them?
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;	/* Generic PCI field: next ptr. */
+	uint8_t cap_len;	/* Generic PCI field: capability length */
+	uint8_t cfg_type;	/* Identifies the structure. */
+	uint8_t bar;		/* Where to find it. */
+	uint8_t padding[3];	/* Pad to full dword. */
+	uint32_t offset;	/* Offset within bar. */
+	uint32_t length;	/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_crypto_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_crypto_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_crypto_hw *hw);
+	void (*set_status)(struct virtio_crypto_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_crypto_hw *hw);
+	void (*set_features)(struct virtio_crypto_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_crypto_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_crypto_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_crypto_hw *hw,
+			struct virtqueue *vq, uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_crypto_hw *hw,
+			uint16_t queue_id);
+	int (*setup_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_crypto_hw {
+	/* control queue */
+	struct virtqueue *cvq;
+	uint16_t    dev_id;
+	uint16_t    max_dataqueues;
+	uint64_t    req_guest_features;
+	uint64_t    guest_features;
+	uint8_t	    use_msix;
+	uint8_t     modern;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	struct virtio_crypto_config *dev_cfg;
+	const struct rte_cryptodev_capabilities *virtio_dev_capabilities;
+};
+
+/*
+ * While virtio_crypto_hw is stored in shared memory, this structure stores
+ * information that may differ per process in the multi-process model.
+ * For example, the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+	struct rte_pci_ioport io;
+};
+
+#define VTPCI_OPS(hw)	(virtio_hw_internal[(hw)->dev_id].vtpci_ops)
+#define VTPCI_IO(hw)	(&virtio_hw_internal[(hw)->dev_id].io)
+
+extern struct virtio_hw_internal virtio_hw_internal[RTE_MAX_VIRTIO_CRYPTO];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+vtpci_with_feature(struct virtio_crypto_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Function declarations from virtio_pci.c
+ */
+int vtpci_cryptodev_init(struct rte_pci_device *dev,
+	struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_reset(struct virtio_crypto_hw *hw);
+
+void vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw);
+
+uint8_t vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status);
+
+uint64_t vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+	uint64_t host_features);
+
+void vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	const void *src, int length);
+
+void vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	void *dst, int length);
+
+uint8_t vtpci_cryptodev_isr(struct virtio_crypto_hw *hw);
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/crypto/virtio/virtio_ring.h b/drivers/crypto/virtio/virtio_ring.h
new file mode 100644
index 0000000..ee30674
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_ring.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers.
+ */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.
+ */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next".
+ */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	volatile uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline size_t
+vring_size(unsigned int num, unsigned long align)
+{
+	size_t size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other side, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
diff --git a/drivers/crypto/virtio/virtqueue.c b/drivers/crypto/virtio/virtqueue.c
new file mode 100644
index 0000000..fd8be58
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+#include <rte_malloc.h>
+
+#include "virtqueue.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
+	 * not to interrupt when it consumes packets
+	 * Note: this is only considered a hint to the host
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+void
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_crypto_op *cop = NULL;
+
+	int idx;
+
+	if (vq != NULL)
+		for (idx = 0; idx < vq->vq_nentries; idx++) {
+			cop = vq->vq_descx[idx].crypto_op;
+			if (cop) {
+				if (cop->sym->m_src)
+					rte_pktmbuf_free(cop->sym->m_src);
+				if (cop->sym->m_dst)
+					rte_pktmbuf_free(cop->sym->m_dst);
+				rte_crypto_op_free(cop);
+				vq->vq_descx[idx].crypto_op = NULL;
+			}
+		}
+}
diff --git a/drivers/crypto/virtio/virtqueue.h b/drivers/crypto/virtio/virtqueue.h
new file mode 100644
index 0000000..1bd0e89
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.h
@@ -0,0 +1,176 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+enum { VTCRYPTO_DATAQ = 0, VTCRYPTO_CTRLQ = 1 };
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void     *crypto_op;
+	void     *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	/**< virtio_crypto_hw structure pointer. */
+	struct virtio_crypto_hw *hw;
+	/**< mem zone to populate RX ring. */
+	const struct rte_memzone *mz;
+	/**< mempool to populate hdr and request. */
+	struct rte_mempool *mpool;
+	uint8_t     dev_id;              /**< Device identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring*/
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+
+	/* Statistics */
+	uint64_t	packets_sent_total;
+	uint64_t	packets_sent_failed;
+	uint64_t	packets_received_total;
+	uint64_t	packets_received_failed;
+
+	uint16_t  *notify_addr;
+
+	struct vq_desc_extra vq_descx[0];
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+
+/**
+ *  Get all mbufs to be freed.
+ */
+void virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) \
+	((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notification is through io port operation
+	 * which is a serialization instruction itself.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+/**
+ * Dump virtqueue internal structures, for debug purpose only.
+ */
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DUMP
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	PMD_INIT_LOG(DEBUG, \
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+#else
+#define VIRTQUEUE_DUMP(vq) do { } while (0)
+#endif
+
+#endif /* _VIRTQUEUE_H_ */
-- 
1.8.3.1
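
The wrap-around comparison in vring_need_event() (virtio_ring.h above) is
easy to misread, so here is a small standalone sketch of what it computes.
need_event() repeats the expression from the patch under a different name so
that it builds without DPDK headers; need_event_slow() is a hypothetical
reference helper that is not part of the patch and only exists to spell out
the intended meaning.

```c
#include <stdint.h>

/*
 * Same expression as vring_need_event() in the patch, copied under a
 * hypothetical name so that this sketch builds without DPDK headers.
 */
static inline int
need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/*
 * Hypothetical reference version: walk every index published by the
 * move from old to new_idx, i.e. old + 1 .. new_idx modulo 2^16, and
 * report whether event_idx + 1 is among them.
 */
static inline int
need_event_slow(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	uint16_t i;

	for (i = old; i != new_idx; ) {
		i++;
		if (i == (uint16_t)(event_idx + 1))
			return 1;
	}
	return 0;
}
```

For example, with event_idx = 65535, old = 65534 and new_idx = 1, the indices
published are 65535, 0 and 1. event_idx + 1 wraps to 0, which is among them,
so both helpers report that an event is needed.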

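The 16TB limit checked by check_vq_phys_addr_ok() (virtio_pci.c above)
follows from a 32-bit page frame number combined with a 12-bit shift:
2^(32 + 12) bytes = 16TB. A minimal sketch of the same arithmetic, assuming
plain uint64_t inputs instead of struct virtqueue; ring_fits_pfn32() and
QUEUE_ADDR_SHIFT are hypothetical names used only for this illustration:

```c
#include <stdint.h>

/* Mirrors VIRTIO_PCI_QUEUE_ADDR_SHIFT from the patch (assumed value 12). */
#define QUEUE_ADDR_SHIFT 12

/*
 * Return 1 if the last byte of the ring lies below 2^(32 + 12) bytes,
 * i.e. its page frame number fits in 32 bits, and 0 otherwise.
 * ring_size is assumed to be at least 1.
 */
static inline int
ring_fits_pfn32(uint64_t ring_mem, uint64_t ring_size)
{
	return ((ring_mem + ring_size - 1) >> (QUEUE_ADDR_SHIFT + 32)) == 0;
}
```

A ring that ends exactly at the 16TB boundary still passes, because the check
looks at the address of the last byte rather than at one past the end.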
^ permalink raw reply	[relevance 2%]

* [dpdk-dev] [PATCH 2/4] net/nfp: update PMD for using new CPP interface
    2018-03-23 17:35  1% ` [dpdk-dev] [PATCH 1/4] net/nfp: add NFP CPP support Alejandro Lucero
@ 2018-03-23 17:35  6% ` Alejandro Lucero
  1 sibling, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-03-23 17:35 UTC (permalink / raw)
  To: dev

PF PMD support was based on the NSPU interface. This patch changes the
PMD to use the new CPP user space interface, which gives more
flexibility for adding new functionality. Specifically, at this point,
it allows selecting the right firmware file, which requires knowing
which card the PMD is working with.

This patch only changes initialization; the datapath is unaffected.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 drivers/net/nfp/Makefile      |  17 ++-
 drivers/net/nfp/nfp_net.c     | 342 +++++++++++++++++++++++++++++-------------
 drivers/net/nfp/nfp_net_pmd.h |  16 +-
 3 files changed, 264 insertions(+), 111 deletions(-)

diff --git a/drivers/net/nfp/Makefile b/drivers/net/nfp/Makefile
index aa3b68a..ab4e0a7 100644
--- a/drivers/net/nfp/Makefile
+++ b/drivers/net/nfp/Makefile
@@ -20,11 +20,24 @@ EXPORT_MAP := rte_pmd_nfp_version.map
 
 LIBABIVER := 1
 
+VPATH += $(SRCDIR)/nfpcore
+
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_cppcore.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_cpp_pcie_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_mutex.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_resource.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_crc.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_mip.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nffw.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_hwinfo.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_rtsym.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nsp.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nsp_cmds.c
+SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nsp_eth.c
+
 #
 # all source are stored in SRCS-y
 #
 SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_net.c
-SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nfpu.c
-SRCS-$(CONFIG_RTE_LIBRTE_NFP_PMD) += nfp_nspu.c
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/nfp/nfp_net.c b/drivers/net/nfp/nfp_net.c
index e5bfde6..0657a23 100644
--- a/drivers/net/nfp/nfp_net.c
+++ b/drivers/net/nfp/nfp_net.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2014, 2015 Netronome Systems, Inc.
+ * Copyright (c) 2014-2018 Netronome Systems, Inc.
  * All rights reserved.
  *
  * Small portions derived from code Copyright(c) 2010-2015 Intel Corporation.
@@ -55,7 +55,13 @@
 #include <rte_alarm.h>
 #include <rte_spinlock.h>
 
-#include "nfp_nfpu.h"
+#include "nfpcore/nfp_cpp.h"
+#include "nfpcore/nfp_nffw.h"
+#include "nfpcore/nfp_hwinfo.h"
+#include "nfpcore/nfp_mip.h"
+#include "nfpcore/nfp_rtsym.h"
+#include "nfpcore/nfp_nsp.h"
+
 #include "nfp_net_pmd.h"
 #include "nfp_net_logs.h"
 #include "nfp_net_ctrl.h"
@@ -104,12 +110,8 @@ static int nfp_net_rss_reta_write(struct rte_eth_dev *dev,
 static int nfp_net_rss_hash_write(struct rte_eth_dev *dev,
 			struct rte_eth_rss_conf *rss_conf);
 
-/*
- * The offset of the queue controller queues in the PCIe Target. These
- * happen to be at the same offset on the NFP6000 and the NFP3200 so
- * we use a single macro here.
- */
-#define NFP_PCIE_QUEUE(_q)	(0x800 * ((_q) & 0xff))
+/* The offset of the queue controller queues in the PCIe Target */
+#define NFP_PCIE_QUEUE(_q) (0x80000 + (NFP_QCP_QUEUE_ADDR_SZ * ((_q) & 0xff)))
 
 /* Maximum value which can be added to a queue with one transaction */
 #define NFP_QCP_MAX_ADD	0x7f
@@ -625,47 +627,29 @@ enum nfp_qcp_ptr {
 #define ETH_ADDR_LEN	6
 
 static void
-nfp_eth_copy_mac_reverse(uint8_t *dst, const uint8_t *src)
+nfp_eth_copy_mac(uint8_t *dst, const uint8_t *src)
 {
 	int i;
 
 	for (i = 0; i < ETH_ADDR_LEN; i++)
-		dst[ETH_ADDR_LEN - i - 1] = src[i];
+		dst[i] = src[i];
 }
 
 static int
 nfp_net_pf_read_mac(struct nfp_net_hw *hw, int port)
 {
-	union eth_table_entry *entry;
-	int idx, i;
-
-	idx = port;
-	entry = hw->eth_table;
-
-	/* Reading NFP ethernet table obtained before */
-	for (i = 0; i < NSP_ETH_MAX_COUNT; i++) {
-		if (!(entry->port & NSP_ETH_PORT_LANES_MASK)) {
-			/* port not in use */
-			entry++;
-			continue;
-		}
-		if (idx == 0)
-			break;
-		idx--;
-		entry++;
-	}
-
-	if (i == NSP_ETH_MAX_COUNT)
-		return -EINVAL;
+	struct nfp_eth_table *nfp_eth_table;
 
+	nfp_eth_table = nfp_eth_read_ports(hw->cpp);
 	/*
 	 * hw points to port0 private data. We need hw now pointing to
 	 * right port.
 	 */
 	hw += port;
-	nfp_eth_copy_mac_reverse((uint8_t *)&hw->mac_addr,
-				 (uint8_t *)&entry->mac_addr);
+	nfp_eth_copy_mac((uint8_t *)&hw->mac_addr,
+			 (uint8_t *)&nfp_eth_table->ports[port].mac_addr);
 
+	free(nfp_eth_table);
 	return 0;
 }
 
@@ -831,7 +815,7 @@ enum nfp_qcp_ptr {
 
 	if (hw->is_pf)
 		/* Configure the physical port up */
-		nfp_nsp_eth_config(hw->nspu_desc, hw->pf_port_idx, 1);
+		nfp_eth_set_configured(hw->cpp, hw->pf_port_idx, 1);
 
 	hw->ctrl = new_ctrl;
 
@@ -882,7 +866,7 @@ enum nfp_qcp_ptr {
 
 	if (hw->is_pf)
 		/* Configure the physical port down */
-		nfp_nsp_eth_config(hw->nspu_desc, hw->pf_port_idx, 0);
+		nfp_eth_set_configured(hw->cpp, hw->pf_port_idx, 0);
 }
 
 /* Reset and stop device. The device can not be restarted. */
@@ -2734,10 +2718,8 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	uint64_t tx_bar_off = 0, rx_bar_off = 0;
 	uint32_t start_q;
 	int stride = 4;
-
-	nspu_desc_t *nspu_desc = NULL;
-	uint64_t bar_offset;
 	int port = 0;
+	int err;
 
 	PMD_INIT_FUNC_TRACE();
 
@@ -2758,7 +2740,6 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 
 		/* This points to the specific port private data */
 		hw = &hwport0[port];
-		hw->pf_port_idx = port;
 	} else {
 		hw = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 		hwport0 = 0;
@@ -2792,19 +2773,14 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	}
 
 	if (hw->is_pf && port == 0) {
-		nspu_desc = hw->nspu_desc;
-
-		if (nfp_nsp_map_ctrl_bar(nspu_desc, &bar_offset) != 0) {
-			/*
-			 * A firmware should be there after PF probe so this
-			 * should not happen.
-			 */
-			RTE_LOG(ERR, PMD, "PF BAR symbol resolution failed\n");
-			return -ENODEV;
+		hw->ctrl_bar = nfp_rtsym_map(hw->sym_tbl, "_pf0_net_bar0",
+					     hw->total_ports * 32768,
+					     &hw->ctrl_area);
+		if (!hw->ctrl_bar) {
+			printf("nfp_rtsym_map fails for _pf0_net_bar0\n");
+			return -EIO;
 		}
 
-		/* vNIC PF control BAR is a subset of PF PCI device BAR */
-		hw->ctrl_bar += bar_offset;
 		PMD_INIT_LOG(DEBUG, "ctrl bar: %p\n", hw->ctrl_bar);
 	}
 
@@ -2828,13 +2804,14 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	case PCI_DEVICE_ID_NFP6000_PF_NIC:
 	case PCI_DEVICE_ID_NFP6000_VF_NIC:
 		start_q = nn_cfg_readl(hw, NFP_NET_CFG_START_TXQ);
-		tx_bar_off = NFP_PCIE_QUEUE(start_q);
+		tx_bar_off = start_q * NFP_QCP_QUEUE_ADDR_SZ;
 		start_q = nn_cfg_readl(hw, NFP_NET_CFG_START_RXQ);
-		rx_bar_off = NFP_PCIE_QUEUE(start_q);
+		rx_bar_off = start_q * NFP_QCP_QUEUE_ADDR_SZ;
 		break;
 	default:
 		RTE_LOG(ERR, PMD, "nfp_net: no device ID matching\n");
-		return -ENODEV;
+		err = -ENODEV;
+		goto dev_err_ctrl_map;
 	}
 
 	PMD_INIT_LOG(DEBUG, "tx_bar_off: 0x%" PRIx64 "\n", tx_bar_off);
@@ -2842,17 +2819,19 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 
 	if (hw->is_pf && port == 0) {
 		/* configure access to tx/rx vNIC BARs */
-		nfp_nsp_map_queues_bar(nspu_desc, &bar_offset);
-		PMD_INIT_LOG(DEBUG, "tx/rx bar_offset: %" PRIx64 "\n",
-				    bar_offset);
-		hwport0->hw_queues = (uint8_t *)pci_dev->mem_resource[0].addr;
-
-		/* vNIC PF tx/rx BARs are a subset of PF PCI device */
-		hwport0->hw_queues += bar_offset;
+		hwport0->hw_queues = nfp_cpp_map_area(hw->cpp, 0, 0,
+						      NFP_PCIE_QUEUE(0),
+						      NFP_QCP_QUEUE_AREA_SZ,
+						      &hw->hwqueues_area);
+
+		if (!hwport0->hw_queues) {
+			printf("nfp_cpp_map_area fails for net.qc\n");
+			err = -EIO;
+			goto dev_err_ctrl_map;
+		}
 
-		/* Lets seize the chance to read eth table from hw */
-		if (nfp_nsp_eth_read_table(nspu_desc, &hw->eth_table))
-			return -ENODEV;
+		PMD_INIT_LOG(DEBUG, "tx/rx bar address: 0x%p\n",
+				    hwport0->hw_queues);
 	}
 
 	if (hw->is_pf) {
@@ -2912,7 +2891,8 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	eth_dev->data->mac_addrs = rte_zmalloc("mac_addr", ETHER_ADDR_LEN, 0);
 	if (eth_dev->data->mac_addrs == NULL) {
 		PMD_INIT_LOG(ERR, "Failed to space for MAC address");
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto dev_err_queues_map;
 	}
 
 	if (hw->is_pf) {
@@ -2923,6 +2903,8 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	}
 
 	if (!is_valid_assigned_ether_addr((struct ether_addr *)&hw->mac_addr)) {
+		PMD_INIT_LOG(INFO, "Using random mac address for port %d\n",
+				   port);
 		/* Using random mac addresses for VFs */
 		eth_random_addr(&hw->mac_addr[0]);
 		nfp_net_write_mac(hw, (uint8_t *)&hw->mac_addr);
@@ -2951,11 +2933,19 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	nfp_net_stats_reset(eth_dev);
 
 	return 0;
+
+dev_err_queues_map:
+		nfp_cpp_area_free(hw->hwqueues_area);
+dev_err_ctrl_map:
+		nfp_cpp_area_free(hw->ctrl_area);
+
+	return err;
 }
 
 static int
 nfp_pf_create_dev(struct rte_pci_device *dev, int port, int ports,
-		  nfpu_desc_t *nfpu_desc, void **priv)
+		  struct nfp_cpp *cpp, struct nfp_hwinfo *hwinfo,
+		  int phys_port, struct nfp_rtsym_table *sym_tbl, void **priv)
 {
 	struct rte_eth_dev *eth_dev;
 	struct nfp_net_hw *hw;
@@ -2993,12 +2983,16 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	 * Then dev_private is adjusted per port.
 	 */
 	hw = (struct nfp_net_hw *)(eth_dev->data->dev_private) + port;
-	hw->nspu_desc = nfpu_desc->nspu;
-	hw->nfpu_desc = nfpu_desc;
+	hw->cpp = cpp;
+	hw->hwinfo = hwinfo;
+	hw->sym_tbl = sym_tbl;
+	hw->pf_port_idx = phys_port;
 	hw->is_pf = 1;
 	if (ports > 1)
 		hw->pf_multiport_enabled = 1;
 
+	hw->total_ports = ports;
+
 	eth_dev->device = &dev->device;
 	rte_eth_copy_pci_info(eth_dev, dev);
 
@@ -3012,55 +3006,191 @@ uint32_t nfp_net_txq_full(struct nfp_net_txq *txq)
 	return ret;
 }
 
+#define DEFAULT_FW_PATH       "/lib/firmware/netronome"
+
+static int
+nfp_fw_upload(struct rte_pci_device *dev, struct nfp_nsp *nsp, char *card)
+{
+	struct nfp_cpp *cpp = nsp->cpp;
+	int fw_f;
+	char *fw_buf;
+	char fw_name[100];
+	char serial[100];
+	struct stat file_stat;
+	off_t fsize, bytes;
+
+	/* Looking for firmware file in order of priority */
+
+	/* First try to find a firmware image specific for this device */
+	sprintf(serial, "serial-%02x-%02x-%02x-%02x-%02x-%02x-%02hhx-%02hhx",
+		cpp->serial[0], cpp->serial[1], cpp->serial[2], cpp->serial[3],
+		cpp->serial[4], cpp->serial[5], cpp->interface >> 8,
+		cpp->interface & 0xff);
+
+	sprintf(fw_name, "%s/%s.nffw", DEFAULT_FW_PATH, serial);
+
+	RTE_LOG(DEBUG, PMD, "Trying with fw file: %s\n", fw_name);
+	fw_f = open(fw_name, O_RDONLY);
+	if (fw_f >= 0)
+		goto read_fw;
+
+	/* Then try the PCI name */
+	sprintf(fw_name, "%s/pci-%s.nffw", DEFAULT_FW_PATH, dev->device.name);
+
+	RTE_LOG(DEBUG, PMD, "Trying with fw file: %s\n", fw_name);
+	fw_f = open(fw_name, O_RDONLY);
+	if (fw_f >= 0)
+		goto read_fw;
+
+	/* Finally try the card type and media */
+	sprintf(fw_name, "%s/%s", DEFAULT_FW_PATH, card);
+	RTE_LOG(DEBUG, PMD, "Trying with fw file: %s\n", fw_name);
+	fw_f = open(fw_name, O_RDONLY);
+	if (fw_f < 0) {
+		RTE_LOG(INFO, PMD, "Firmware file %s not found.\n", fw_name);
+		return -ENOENT;
+	}
+
+read_fw:
+	if (fstat(fw_f, &file_stat) < 0) {
+		RTE_LOG(INFO, PMD, "Firmware file %s size is unknown", fw_name);
+		close(fw_f);
+		return -ENOENT;
+	}
+
+	fsize = file_stat.st_size;
+	RTE_LOG(INFO, PMD, "Firmware file found at %s with size: %" PRIu64 "\n",
+			    fw_name, (uint64_t)fsize);
+
+	fw_buf = malloc((size_t)fsize);
+	if (!fw_buf) {
+		RTE_LOG(INFO, PMD, "malloc failed for fw buffer\n");
+		close(fw_f);
+		return -ENOMEM;
+	}
+	memset(fw_buf, 0, fsize);
+
+	bytes = read(fw_f, fw_buf, fsize);
+	if (bytes != fsize) {
+		RTE_LOG(INFO, PMD, "Reading fw to buffer failed.\n"
+				   "Just %" PRIu64 " of %" PRIu64 " bytes read",
+				   (uint64_t)bytes, (uint64_t)fsize);
+		free(fw_buf);
+		close(fw_f);
+		return -EIO;
+	}
+
+	RTE_LOG(INFO, PMD, "Uploading the firmware ...");
+	nfp_nsp_load_fw(nsp, fw_buf, bytes);
+	RTE_LOG(INFO, PMD, "Done");
+
+	free(fw_buf);
+	close(fw_f);
+
+	return 0;
+}
+
+static int
+nfp_fw_setup(struct rte_pci_device *dev, struct nfp_cpp *cpp,
+	     struct nfp_eth_table *nfp_eth_table, struct nfp_hwinfo *hwinfo)
+{
+	struct nfp_nsp *nsp;
+	const char *nfp_fw_model;
+	char card_desc[100];
+	int err = 0;
+
+	nfp_fw_model = nfp_hwinfo_lookup(hwinfo, "assembly.partno");
+
+	if (nfp_fw_model) {
+		RTE_LOG(INFO, PMD, "firmware model found: %s\n", nfp_fw_model);
+	} else {
+		RTE_LOG(ERR, PMD, "firmware model NOT found\n");
+		return -EIO;
+	}
+
+	if (nfp_eth_table->count == 0 || nfp_eth_table->count > 8) {
+		RTE_LOG(ERR, PMD, "NFP ethernet table reports wrong ports: %u\n",
+		       nfp_eth_table->count);
+		return -EIO;
+	}
+
+	RTE_LOG(INFO, PMD, "NFP ethernet port table reports %u ports\n",
+			   nfp_eth_table->count);
+
+	RTE_LOG(INFO, PMD, "Port speed: %u\n", nfp_eth_table->ports[0].speed);
+
+	sprintf(card_desc, "nic_%s_%dx%d.nffw", nfp_fw_model,
+		nfp_eth_table->count, nfp_eth_table->ports[0].speed / 1000);
+
+	nsp = nfp_nsp_open(cpp);
+	if (!nsp) {
+		RTE_LOG(ERR, PMD, "NFP error when obtaining NSP handle\n");
+		return -EIO;
+	}
+
+	nfp_nsp_device_soft_reset(nsp);
+	err = nfp_fw_upload(dev, nsp, card_desc);
+
+	nfp_nsp_close(nsp);
+	return err;
+}
+
 static int nfp_pf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 			    struct rte_pci_device *dev)
 {
-	nfpu_desc_t *nfpu_desc;
-	nspu_desc_t *nspu_desc;
-	uint64_t offset_symbol;
-	uint8_t *bar_offset;
-	int major, minor;
+	struct nfp_cpp *cpp;
+	struct nfp_hwinfo *hwinfo;
+	struct nfp_rtsym_table *sym_tbl;
+	struct nfp_eth_table *nfp_eth_table = NULL;
 	int total_ports;
 	void *priv = 0;
 	int ret = -ENODEV;
+	int err;
 	int i;
 
 	if (!dev)
 		return ret;
 
-	nfpu_desc = rte_malloc("nfp nfpu", sizeof(nfpu_desc_t), 0);
-	if (!nfpu_desc)
-		return -ENOMEM;
-
-	if (nfpu_open(dev, nfpu_desc, 0) < 0) {
-		RTE_LOG(ERR, PMD,
-			"nfpu_open failed\n");
-		goto nfpu_error;
+	cpp = nfp_cpp_from_device_name(dev->device.name);
+	if (!cpp) {
+		RTE_LOG(ERR, PMD, "A CPP handle can not be obtained\n");
+		ret = -EIO;
+		goto error;
 	}
 
-	nspu_desc = nfpu_desc->nspu;
+	hwinfo = nfp_hwinfo_read(cpp);
+	if (!hwinfo) {
+		RTE_LOG(ERR, PMD, "Error reading hwinfo table\n");
+		return -EIO;
+	}
 
+	nfp_eth_table = nfp_eth_read_ports(cpp);
+	if (!nfp_eth_table) {
+		RTE_LOG(ERR, PMD, "Error reading NFP ethernet table\n");
+		return -EIO;
+	}
 
-	/* Check NSP ABI version */
-	if (nfp_nsp_get_abi_version(nspu_desc, &major, &minor) < 0) {
-		RTE_LOG(INFO, PMD, "NFP NSP not present\n");
+	if (nfp_fw_setup(dev, cpp, nfp_eth_table, hwinfo)) {
+		RTE_LOG(INFO, PMD, "Error when uploading firmware\n");
+		ret = -EIO;
 		goto error;
 	}
-	PMD_INIT_LOG(INFO, "nspu ABI version: %d.%d\n", major, minor);
 
-	if ((major == 0) && (minor < 20)) {
-		RTE_LOG(INFO, PMD, "NFP NSP ABI version too old. Required 0.20 or higher\n");
+	/* Now the symbol table should be there */
+	sym_tbl = nfp_rtsym_table_read(cpp);
+	if (!sym_tbl) {
+		RTE_LOG(ERR, PMD, "Something is wrong with the firmware"
+				" symbol table");
+		ret = -EIO;
 		goto error;
 	}
 
-	ret = nfp_nsp_fw_setup(nspu_desc, "nfd_cfg_pf0_num_ports",
-			       &offset_symbol);
-	if (ret)
+	total_ports = nfp_rtsym_read_le(sym_tbl, "nfd_cfg_pf0_num_ports", &err);
+	if (total_ports != (int)nfp_eth_table->count) {
+		RTE_LOG(ERR, PMD, "Inconsistent number of ports\n");
+		ret = -EIO;
 		goto error;
-
-	bar_offset = (uint8_t *)dev->mem_resource[0].addr;
-	bar_offset += offset_symbol;
-	total_ports = (uint32_t)*bar_offset;
+	}
 	PMD_INIT_LOG(INFO, "Total pf ports: %d\n", total_ports);
 
 	if (total_ports <= 0 || total_ports > 8) {
@@ -3070,18 +3200,15 @@ static int nfp_pf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
 	}
 
 	for (i = 0; i < total_ports; i++) {
-		ret = nfp_pf_create_dev(dev, i, total_ports, nfpu_desc, &priv);
+		ret = nfp_pf_create_dev(dev, i, total_ports, cpp, hwinfo,
+					nfp_eth_table->ports[i].index,
+					sym_tbl, &priv);
 		if (ret)
-			goto error;
+			break;
 	}
 
-	return 0;
-
 error:
-	nfpu_close(nfpu_desc);
-nfpu_error:
-	rte_free(nfpu_desc);
-
+	free(nfp_eth_table);
 	return ret;
 }
 
@@ -3129,8 +3256,19 @@ static int eth_nfp_pci_remove(struct rte_pci_device *pci_dev)
 	if ((pci_dev->id.device_id == PCI_DEVICE_ID_NFP4000_PF_NIC) ||
 	    (pci_dev->id.device_id == PCI_DEVICE_ID_NFP6000_PF_NIC)) {
 		port = get_pf_port_number(eth_dev->data->name);
+		/*
+		 * hotplug is not possible with multiport PF although freeing
+		 * data structures can be done for first port.
+		 */
+		if (port != 0)
+			return -ENOTSUP;
 		hwport0 = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 		hw = &hwport0[port];
+		nfp_cpp_area_free(hw->ctrl_area);
+		nfp_cpp_area_free(hw->hwqueues_area);
+		free(hw->hwinfo);
+		free(hw->sym_tbl);
+		nfp_cpp_free(hw->cpp);
 	} else {
 		hw = NFP_NET_DEV_PRIVATE_TO_HW(eth_dev->data->dev_private);
 	}
diff --git a/drivers/net/nfp/nfp_net_pmd.h b/drivers/net/nfp/nfp_net_pmd.h
index 1ae0ea6..097c871 100644
--- a/drivers/net/nfp/nfp_net_pmd.h
+++ b/drivers/net/nfp/nfp_net_pmd.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2014, 2015 Netronome Systems, Inc.
+ * Copyright (c) 2014-2018 Netronome Systems, Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -63,6 +63,7 @@
 #define NFP_NET_CRTL_BAR        0
 #define NFP_NET_TX_BAR          2
 #define NFP_NET_RX_BAR          2
+#define NFP_QCP_QUEUE_AREA_SZ			0x80000
 
 /* Macros for accessing the Queue Controller Peripheral 'CSRs' */
 #define NFP_QCP_QUEUE_OFF(_x)                 ((_x) * 0x800)
@@ -430,20 +431,21 @@ struct nfp_net_hw {
 	/* Records starting point for counters */
 	struct rte_eth_stats eth_stats_base;
 
-#ifdef NFP_NET_LIBNFP
 	struct nfp_cpp *cpp;
 	struct nfp_cpp_area *ctrl_area;
-	struct nfp_cpp_area *tx_area;
-	struct nfp_cpp_area *rx_area;
+	struct nfp_cpp_area *hwqueues_area;
 	struct nfp_cpp_area *msix_area;
-#endif
+
 	uint8_t *hw_queues;
 	uint8_t is_pf;
 	uint8_t pf_port_idx;
 	uint8_t pf_multiport_enabled;
+	uint8_t total_ports;
+
 	union eth_table_entry *eth_table;
-	nspu_desc_t *nspu_desc;
-	nfpu_desc_t *nfpu_desc;
+
+	struct nfp_hwinfo *hwinfo;
+	struct nfp_rtsym_table *sym_tbl;
 };
 
 struct nfp_net_adapter {
-- 
1.9.1

^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH 1/4] net/nfp: add NFP CPP support
  @ 2018-03-23 17:35  1% ` Alejandro Lucero
  2018-03-23 17:35  6% ` [dpdk-dev] [PATCH 2/4] net/nfp: update PMD for using new CPP interface Alejandro Lucero
  1 sibling, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-03-23 17:35 UTC (permalink / raw)
  To: dev

CPP refers to the internal NFP Command Push Pull bus. This patch allows
CPP commands to be created from user space, giving access to any part
of the chip.

This CPP interface is the basis for other functionality such as mutexes
for accessing specific chip components, chip resource management,
firmware upload, and use of the NSP, an embedded ARM processor which can
perform tasks on demand.

Previously, the NSP was the only way for the PMD to perform operations
on the chip, through an NSPU interface used for commands like firmware
upload or port link configuration. The CPP interface supersedes NSPU,
but it is still possible to use the NSP through CPP.

The CPP interface adds great flexibility for things like extended
stats, firmware debugging, or properly selecting the firmware file to
upload.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
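Note for readers: as a rough illustration of the address translation
helpers in nfp_cppat.h, the sketch below mirrors the mode 0 case of
_nfp6000_encode_basic() and _nfp6000_decode_basic(), where the 6-bit
island ID sits at bits <39:34> for 40-bit addresses or <31:26> for
32-bit addresses. The function names bit_mask64(), encode_island_mode0()
and decode_island_mode0() are hypothetical, and the VQDR special case
handled by the patch is left out.

```c
#include <stdint.h>

/*
 * Mask covering bits msb..lsb inclusive; a simplified take on
 * _nic_mask64(). Assumes 0 <= lsb <= msb <= 63.
 */
static uint64_t
bit_mask64(int msb, int lsb)
{
	int w = msb - lsb + 1;

	if (w >= 64)
		return ~(uint64_t)0;

	return ((UINT64_C(1) << w) - 1) << lsb;
}

/* Mode 0: overwrite the island ID field, leaving other address bits alone */
static uint64_t
encode_island_mode0(uint64_t addr, int island, int addr40)
{
	int iid_lsb = addr40 ? 34 : 26;
	uint64_t mask = bit_mask64(iid_lsb + 5, iid_lsb);

	return (addr & ~mask) | (((uint64_t)island << iid_lsb) & mask);
}

/* Mode 0: read the 6-bit island ID back out of the address */
static int
decode_island_mode0(uint64_t addr, int addr40)
{
	int iid_lsb = addr40 ? 34 : 26;

	return (int)((addr >> iid_lsb) & 0x3f);
}
```

Encoding followed by decoding returns the same island ID for any value
in the range 0 to 63, and the bits outside the island field pass through
unchanged.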
 drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h    | 748 +++++++++++++++++
 drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h |  62 ++
 drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h    | 620 ++++++++++++++
 drivers/net/nfp/nfpcore/nfp6000/nfp6000.h         |  68 ++
 drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h         |  54 ++
 drivers/net/nfp/nfpcore/nfp_cpp.h                 | 803 ++++++++++++++++++
 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c        | 962 ++++++++++++++++++++++
 drivers/net/nfp/nfpcore/nfp_cppcore.c             | 901 ++++++++++++++++++++
 drivers/net/nfp/nfpcore/nfp_crc.c                 |  75 ++
 drivers/net/nfp/nfpcore/nfp_crc.h                 |  45 +
 drivers/net/nfp/nfpcore/nfp_hwinfo.c              | 226 +++++
 drivers/net/nfp/nfpcore/nfp_hwinfo.h              | 111 +++
 drivers/net/nfp/nfpcore/nfp_mip.c                 | 180 ++++
 drivers/net/nfp/nfpcore/nfp_mip.h                 |  47 ++
 drivers/net/nfp/nfpcore/nfp_mutex.c               | 450 ++++++++++
 drivers/net/nfp/nfpcore/nfp_nffw.c                | 261 ++++++
 drivers/net/nfp/nfpcore/nfp_nffw.h                | 112 +++
 drivers/net/nfp/nfpcore/nfp_nsp.c                 | 453 ++++++++++
 drivers/net/nfp/nfpcore/nfp_nsp.h                 | 330 ++++++++
 drivers/net/nfp/nfpcore/nfp_nsp_cmds.c            | 135 +++
 drivers/net/nfp/nfpcore/nfp_nsp_eth.c             | 691 ++++++++++++++++
 drivers/net/nfp/nfpcore/nfp_resource.c            | 291 +++++++
 drivers/net/nfp/nfpcore/nfp_resource.h            |  78 ++
 drivers/net/nfp/nfpcore/nfp_rtsym.c               | 353 ++++++++
 drivers/net/nfp/nfpcore/nfp_rtsym.h               |  87 ++
 drivers/net/nfp/nfpcore/nfp_target.h              | 605 ++++++++++++++
 26 files changed, 8748 insertions(+)
 create mode 100644 drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp6000/nfp6000.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_cpp.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_cppcore.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_crc.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_crc.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_hwinfo.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_hwinfo.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_mip.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_mip.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_mutex.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nffw.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nffw.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nsp.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nsp.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_nsp_eth.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_resource.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_resource.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_rtsym.c
 create mode 100644 drivers/net/nfp/nfpcore/nfp_rtsym.h
 create mode 100644 drivers/net/nfp/nfpcore/nfp_target.h

diff --git a/drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h b/drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h
new file mode 100644
index 0000000..fbeec57
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp-common/nfp_cppat.h
@@ -0,0 +1,748 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_CPPAT_H__
+#define __NFP_CPPAT_H__
+
+#include "nfp_platform.h"
+#include "nfp_resid.h"
+
+/* This file contains helpers for creating CPP commands
+ *
+ * All magic NFP-6xxx IMB 'mode' numbers here are from:
+ * Databook (1 August 2013)
+ * - System Overview and Connectivity
+ * -- Internal Connectivity
+ * --- Distributed Switch Fabric - Command Push/Pull (DSF-CPP) Bus
+ * ---- CPP addressing
+ * ----- Table 3.6. CPP Address Translation Mode Commands
+ */
+
+#define _NIC_NFP6000_MU_LOCALITY_DIRECT 2
+
+static inline int
+_nfp6000_decode_basic(uint64_t addr, int *dest_island, int cpp_tgt, int mode,
+		      int addr40, int isld1, int isld0);
+
+static uint64_t
+_nic_mask64(int msb, int lsb, int at0)
+{
+	uint64_t v;
+	int w = msb - lsb + 1;
+
+	if (w == 64)
+		return ~(uint64_t)0;
+
+	if ((lsb + w) > 64)
+		return 0;
+
+	v = (UINT64_C(1) << w) - 1;
+
+	if (at0)
+		return v;
+
+	return v << lsb;
+}
+
+/* For VQDR, we may not modify the Channel bits, which might overlap
+ * with the Index bit. When it does, we need to ensure that isld0 == isld1.
+ */
+static inline int
+_nfp6000_encode_basic(uint64_t *addr, int dest_island, int cpp_tgt, int mode,
+		      int addr40, int isld1, int isld0)
+{
+	uint64_t _u64;
+	int iid_lsb, idx_lsb;
+	int i, v = 0;
+	int isld[2];
+
+	isld[0] = isld0;
+	isld[1] = isld1;
+
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_MU:
+		/* This function doesn't handle MU */
+		return NFP_ERRNO(EINVAL);
+	case NFP6000_CPPTGT_CTXPB:
+		/* This function doesn't handle CTXPB */
+		return NFP_ERRNO(EINVAL);
+	default:
+		break;
+	}
+
+	switch (mode) {
+	case 0:
+		if (cpp_tgt == NFP6000_CPPTGT_VQDR && !addr40) {
+			/*
+			 * In this specific mode we'd rather not modify the
+			 * address but we can verify if the existing contents
+			 * will point to a valid island.
+			 */
+			i = _nfp6000_decode_basic(*addr, &v, cpp_tgt, mode,
+						  addr40, isld1,
+						  isld0);
+			if (i != 0)
+				/* Full Island ID and channel bits overlap */
+				return i;
+
+			/*
+			 * If dest_island is invalid, the current address won't
+			 * go where expected.
+			 */
+			if (dest_island != -1 && dest_island != v)
+				return NFP_ERRNO(EINVAL);
+
+			/* If dest_island was -1, we don't care */
+			return 0;
+		}
+
+		iid_lsb = (addr40) ? 34 : 26;
+
+		/* <39:34> or <31:26> */
+		_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+		*addr &= ~_u64;
+		*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+		return 0;
+	case 1:
+		if (cpp_tgt == NFP6000_CPPTGT_VQDR && !addr40) {
+			i = _nfp6000_decode_basic(*addr, &v, cpp_tgt, mode,
+						  addr40, isld1, isld0);
+			if (i != 0)
+				/* Full Island ID and channel bits overlap */
+				return i;
+
+			/*
+			 * If dest_island is invalid, the current address won't
+			 * go where expected.
+			 */
+			if (dest_island != -1 && dest_island != v)
+				return NFP_ERRNO(EINVAL);
+
+			/* If dest_island was -1, we don't care */
+			return 0;
+		}
+
+		idx_lsb = (addr40) ? 39 : 31;
+		if (dest_island == isld0) {
+			/* Only need to clear the Index bit */
+			*addr &= ~_nic_mask64(idx_lsb, idx_lsb, 0);
+			return 0;
+		}
+
+		if (dest_island == isld1) {
+			/* Only need to set the Index bit */
+			*addr |= (UINT64_C(1) << idx_lsb);
+			return 0;
+		}
+
+		return NFP_ERRNO(ENODEV);
+	case 2:
+		if (cpp_tgt == NFP6000_CPPTGT_VQDR && !addr40) {
+			/* iid<0> = addr<30> = channel<0> */
+			/* channel<1> = addr<31> = Index */
+
+			/*
+			 * Special case where we allow channel bits to be set
+			 * before hand and with them select an island.
+			 * So we need to confirm that it's at least plausible.
+			 */
+			i = _nfp6000_decode_basic(*addr, &v, cpp_tgt, mode,
+						  addr40, isld1, isld0);
+			if (i != 0)
+				/* Full Island ID and channel bits overlap */
+				return i;
+
+			/*
+			 * If dest_island is invalid, the current address won't
+			 * go where expected.
+			 */
+			if (dest_island != -1 && dest_island != v)
+				return NFP_ERRNO(EINVAL);
+
+			/* If dest_island was -1, we don't care */
+			return 0;
+		}
+
+		/*
+		 * Make sure we compare against isldN values by clearing the
+		 * LSB. This is what the silicon does.
+		 **/
+		isld[0] &= ~1;
+		isld[1] &= ~1;
+
+		idx_lsb = (addr40) ? 39 : 31;
+		iid_lsb = idx_lsb - 1;
+
+		/*
+		 * Try each option, take first one that fits. Not sure if we
+		 * would want to do some smarter searching and prefer 0 or non-0
+		 * island IDs.
+		 */
+
+		for (i = 0; i < 2; i++) {
+			for (v = 0; v < 2; v++) {
+				if (dest_island != (isld[i] | v))
+					continue;
+				*addr &= ~_nic_mask64(idx_lsb, iid_lsb, 0);
+				*addr |= (((uint64_t)i) << idx_lsb);
+				*addr |= (((uint64_t)v) << iid_lsb);
+				return 0;
+			}
+		}
+
+		return NFP_ERRNO(ENODEV);
+	case 3:
+		if (cpp_tgt == NFP6000_CPPTGT_VQDR && !addr40) {
+			/*
+			 * iid<0> = addr<29> = data
+			 * iid<1> = addr<30> = channel<0>
+			 * channel<1> = addr<31> = Index
+			 */
+			i = _nfp6000_decode_basic(*addr, &v, cpp_tgt, mode,
+						  addr40, isld1, isld0);
+			if (i != 0)
+				/* Full Island ID and channel bits overlap */
+				return i;
+
+			if (dest_island != -1 && dest_island != v)
+				return NFP_ERRNO(EINVAL);
+
+			/* If dest_island was -1, we don't care */
+			return 0;
+		}
+
+		isld[0] &= ~3;
+		isld[1] &= ~3;
+
+		idx_lsb = (addr40) ? 39 : 31;
+		iid_lsb = idx_lsb - 2;
+
+		for (i = 0; i < 2; i++) {
+			for (v = 0; v < 4; v++) {
+				if (dest_island != (isld[i] | v))
+					continue;
+				*addr &= ~_nic_mask64(idx_lsb, iid_lsb, 0);
+				*addr |= (((uint64_t)i) << idx_lsb);
+				*addr |= (((uint64_t)v) << iid_lsb);
+				return 0;
+			}
+		}
+		return NFP_ERRNO(ENODEV);
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_decode_basic(uint64_t addr, int *dest_island, int cpp_tgt, int mode,
+		      int addr40, int isld1, int isld0)
+{
+	int iid_lsb, idx_lsb;
+
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_MU:
+		/* This function doesn't handle MU */
+		return NFP_ERRNO(EINVAL);
+	case NFP6000_CPPTGT_CTXPB:
+		/* This function doesn't handle CTXPB */
+		return NFP_ERRNO(EINVAL);
+	default:
+		break;
+	}
+
+	switch (mode) {
+	case 0:
+		/*
+		 * For VQDR, in this mode for 32-bit addressing it would be
+		 * islands 0, 16, 32 and 48 depending on channel and upper
+		 * address bits. Since those are not all valid islands, most
+		 * decode cases would result in bad island IDs, but we do them
+		 * anyway since this is decoding an address that is already
+		 * assumed to be used as-is to get to sram.
+		 */
+		iid_lsb = (addr40) ? 34 : 26;
+		*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+		return 0;
+	case 1:
+		/*
+		 * For VQDR 32-bit, this would decode as:
+		 *	Channel 0: island#0
+		 *	Channel 1: island#0
+		 *	Channel 2: island#1
+		 *	Channel 3: island#1
+		 *
+		 * That would be valid as long as both islands have VQDR.
+		 * Let's allow this.
+		 */
+
+		idx_lsb = (addr40) ? 39 : 31;
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1;
+		else
+			*dest_island = isld0;
+
+		return 0;
+	case 2:
+		/*
+		 * For VQDR 32-bit:
+		 *	Channel 0: (island#0 | 0)
+		 *	Channel 1: (island#0 | 1)
+		 *	Channel 2: (island#1 | 0)
+		 *	Channel 3: (island#1 | 1)
+		 *
+		 * Make sure we compare against isldN values by clearing the
+		 * LSB. This is what the silicon does.
+		 */
+		isld0 &= ~1;
+		isld1 &= ~1;
+
+		idx_lsb = (addr40) ? 39 : 31;
+		iid_lsb = idx_lsb - 1;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1 | (int)((addr >> iid_lsb) & 1);
+		else
+			*dest_island = isld0 | (int)((addr >> iid_lsb) & 1);
+
+		return 0;
+	case 3:
+		/*
+		 * In this mode the data address starts to affect the island ID
+		 * so rather not allow it. In some really specific case one
+		 * could use this to send the upper half of the VQDR channel to
+		 * another MU, but this is getting very specific. However, as
+		 * above for mode 0, this is the decoder and the caller should
+		 * validate the resulting IID. This blindly does what the
+		 * silicon would do.
+		 */
+
+		isld0 &= ~3;
+		isld1 &= ~3;
+
+		idx_lsb = (addr40) ? 39 : 31;
+		iid_lsb = idx_lsb - 2;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1 | (int)((addr >> iid_lsb) & 3);
+		else
+			*dest_island = isld0 | (int)((addr >> iid_lsb) & 3);
+
+		return 0;
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_cppat_mu_locality_lsb(int mode, int addr40)
+{
+	switch (mode) {
+	case 0:
+	case 1:
+	case 2:
+	case 3:
+		return (addr40) ? 38 : 30;
+	default:
+		break;
+	}
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_encode_mu(uint64_t *addr, int dest_island, int mode, int addr40,
+		   int isld1, int isld0)
+{
+	uint64_t _u64;
+	int iid_lsb, idx_lsb, locality_lsb;
+	int i, v;
+	int isld[2];
+	int da;
+
+	isld[0] = isld0;
+	isld[1] = isld1;
+	locality_lsb = _nfp6000_cppat_mu_locality_lsb(mode, addr40);
+
+	if (((*addr >> locality_lsb) & 3) == _NIC_NFP6000_MU_LOCALITY_DIRECT)
+		da = 1;
+	else
+		da = 0;
+
+	switch (mode) {
+	case 0:
+		iid_lsb = (addr40) ? 32 : 24;
+		_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+		*addr &= ~_u64;
+		*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+		return 0;
+	case 1:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+			*addr &= ~_u64;
+			*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+			return 0;
+		}
+
+		idx_lsb = (addr40) ? 37 : 29;
+		if (dest_island == isld0) {
+			*addr &= ~_nic_mask64(idx_lsb, idx_lsb, 0);
+			return 0;
+		}
+
+		if (dest_island == isld1) {
+			*addr |= (UINT64_C(1) << idx_lsb);
+			return 0;
+		}
+
+		return NFP_ERRNO(ENODEV);
+	case 2:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+			*addr &= ~_u64;
+			*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+			return 0;
+		}
+
+		/*
+		 * Make sure we compare against isldN values by clearing the
+		 * LSB. This is what the silicon does.
+		 */
+		isld[0] &= ~1;
+		isld[1] &= ~1;
+
+		idx_lsb = (addr40) ? 37 : 29;
+		iid_lsb = idx_lsb - 1;
+
+		/*
+		 * Try each option, take first one that fits. Not sure if we
+		 * would want to do some smarter searching and prefer 0 or
+		 * non-0 island IDs.
+		 */
+
+		for (i = 0; i < 2; i++) {
+			for (v = 0; v < 2; v++) {
+				if (dest_island != (isld[i] | v))
+					continue;
+				*addr &= ~_nic_mask64(idx_lsb, iid_lsb, 0);
+				*addr |= (((uint64_t)i) << idx_lsb);
+				*addr |= (((uint64_t)v) << iid_lsb);
+				return 0;
+			}
+		}
+		return NFP_ERRNO(ENODEV);
+	case 3:
+		/*
+		 * Only the EMU will use 40 bit addressing. Silently set the
+		 * direct locality bit for everyone else. The SDK toolchain
+		 * uses dest_island <= 0 to test for atypical address encodings
+		 * to support access to local-island CTM with a 32-bit address
+		 * (high-locality is effectively ignored and just used for
+		 * routing to island #0).
+		 */
+		if (dest_island > 0 &&
+		    (dest_island < 24 || dest_island > 26)) {
+			*addr |= ((uint64_t)_NIC_NFP6000_MU_LOCALITY_DIRECT)
+				 << locality_lsb;
+			da = 1;
+		}
+
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			_u64 = _nic_mask64((iid_lsb + 5), iid_lsb, 0);
+			*addr &= ~_u64;
+			*addr |= (((uint64_t)dest_island) << iid_lsb) & _u64;
+			return 0;
+		}
+
+		isld[0] &= ~3;
+		isld[1] &= ~3;
+
+		idx_lsb = (addr40) ? 37 : 29;
+		iid_lsb = idx_lsb - 2;
+
+		for (i = 0; i < 2; i++) {
+			for (v = 0; v < 4; v++) {
+				if (dest_island != (isld[i] | v))
+					continue;
+				*addr &= ~_nic_mask64(idx_lsb, iid_lsb, 0);
+				*addr |= (((uint64_t)i) << idx_lsb);
+				*addr |= (((uint64_t)v) << iid_lsb);
+				return 0;
+			}
+		}
+
+		return NFP_ERRNO(ENODEV);
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_decode_mu(uint64_t addr, int *dest_island, int mode, int addr40,
+		   int isld1, int isld0)
+{
+	int iid_lsb, idx_lsb, locality_lsb;
+	int da;
+
+	locality_lsb = _nfp6000_cppat_mu_locality_lsb(mode, addr40);
+
+	if (((addr >> locality_lsb) & 3) == _NIC_NFP6000_MU_LOCALITY_DIRECT)
+		da = 1;
+	else
+		da = 0;
+
+	switch (mode) {
+	case 0:
+		iid_lsb = (addr40) ? 32 : 24;
+		*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+		return 0;
+	case 1:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+			return 0;
+		}
+
+		idx_lsb = (addr40) ? 37 : 29;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1;
+		else
+			*dest_island = isld0;
+
+		return 0;
+	case 2:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+			return 0;
+		}
+		/*
+		 * Make sure we compare against isldN values by clearing the
+		 * LSB. This is what the silicon does.
+		 */
+		isld0 &= ~1;
+		isld1 &= ~1;
+
+		idx_lsb = (addr40) ? 37 : 29;
+		iid_lsb = idx_lsb - 1;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1 | (int)((addr >> iid_lsb) & 1);
+		else
+			*dest_island = isld0 | (int)((addr >> iid_lsb) & 1);
+
+		return 0;
+	case 3:
+		if (da) {
+			iid_lsb = (addr40) ? 32 : 24;
+			*dest_island = (int)(addr >> iid_lsb) & 0x3F;
+			return 0;
+		}
+
+		isld0 &= ~3;
+		isld1 &= ~3;
+
+		idx_lsb = (addr40) ? 37 : 29;
+		iid_lsb = idx_lsb - 2;
+
+		if (addr & _nic_mask64(idx_lsb, idx_lsb, 0))
+			*dest_island = isld1 | (int)((addr >> iid_lsb) & 3);
+		else
+			*dest_island = isld0 | (int)((addr >> iid_lsb) & 3);
+
+		return 0;
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_cppat_addr_encode(uint64_t *addr, int dest_island, int cpp_tgt,
+			   int mode, int addr40, int isld1, int isld0)
+{
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_NBI:
+	case NFP6000_CPPTGT_VQDR:
+	case NFP6000_CPPTGT_ILA:
+	case NFP6000_CPPTGT_PCIE:
+	case NFP6000_CPPTGT_ARM:
+	case NFP6000_CPPTGT_CRYPTO:
+	case NFP6000_CPPTGT_CLS:
+		return _nfp6000_encode_basic(addr, dest_island, cpp_tgt, mode,
+					     addr40, isld1, isld0);
+
+	case NFP6000_CPPTGT_MU:
+		return _nfp6000_encode_mu(addr, dest_island, mode, addr40,
+					  isld1, isld0);
+
+	case NFP6000_CPPTGT_CTXPB:
+		if (mode != 1 || addr40 != 0)
+			return NFP_ERRNO(EINVAL);
+
+		*addr &= ~_nic_mask64(29, 24, 0);
+		*addr |= (((uint64_t)dest_island) << 24) &
+			  _nic_mask64(29, 24, 0);
+		return 0;
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_cppat_addr_decode(uint64_t addr, int *dest_island, int cpp_tgt,
+			   int mode, int addr40, int isld1, int isld0)
+{
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_NBI:
+	case NFP6000_CPPTGT_VQDR:
+	case NFP6000_CPPTGT_ILA:
+	case NFP6000_CPPTGT_PCIE:
+	case NFP6000_CPPTGT_ARM:
+	case NFP6000_CPPTGT_CRYPTO:
+	case NFP6000_CPPTGT_CLS:
+		return _nfp6000_decode_basic(addr, dest_island, cpp_tgt, mode,
+					     addr40, isld1, isld0);
+
+	case NFP6000_CPPTGT_MU:
+		return _nfp6000_decode_mu(addr, dest_island, mode, addr40,
+					  isld1, isld0);
+
+	case NFP6000_CPPTGT_CTXPB:
+		if (mode != 1 || addr40 != 0)
+			return NFP_ERRNO(EINVAL);
+		*dest_island = (int)(addr >> 24) & 0x3F;
+		return 0;
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+static inline int
+_nfp6000_cppat_addr_iid_clear(uint64_t *addr, int cpp_tgt, int mode, int addr40)
+{
+	int iid_lsb, locality_lsb, da;
+
+	switch (cpp_tgt) {
+	case NFP6000_CPPTGT_NBI:
+	case NFP6000_CPPTGT_VQDR:
+	case NFP6000_CPPTGT_ILA:
+	case NFP6000_CPPTGT_PCIE:
+	case NFP6000_CPPTGT_ARM:
+	case NFP6000_CPPTGT_CRYPTO:
+	case NFP6000_CPPTGT_CLS:
+		switch (mode) {
+		case 0:
+			iid_lsb = (addr40) ? 34 : 26;
+			*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+			return 0;
+		case 1:
+			iid_lsb = (addr40) ? 39 : 31;
+			*addr &= ~_nic_mask64(iid_lsb, iid_lsb, 0);
+			return 0;
+		case 2:
+			iid_lsb = (addr40) ? 38 : 30;
+			*addr &= ~_nic_mask64(iid_lsb + 1, iid_lsb, 0);
+			return 0;
+		case 3:
+			iid_lsb = (addr40) ? 37 : 29;
+			*addr &= ~_nic_mask64(iid_lsb + 2, iid_lsb, 0);
+			return 0;
+		default:
+			return NFP_ERRNO(EINVAL); /* no fall-through to MU */
+		}
+	case NFP6000_CPPTGT_MU:
+		locality_lsb = _nfp6000_cppat_mu_locality_lsb(mode, addr40);
+		da = (((*addr >> locality_lsb) & 3) ==
+		      _NIC_NFP6000_MU_LOCALITY_DIRECT);
+		switch (mode) {
+		case 0:
+			iid_lsb = (addr40) ? 32 : 24;
+			*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+			return 0;
+		case 1:
+			if (da) {
+				iid_lsb = (addr40) ? 32 : 24;
+				*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+				return 0;
+			}
+			iid_lsb = (addr40) ? 37 : 29;
+			*addr &= ~_nic_mask64(iid_lsb, iid_lsb, 0);
+			return 0;
+		case 2:
+			if (da) {
+				iid_lsb = (addr40) ? 32 : 24;
+				*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+				return 0;
+			}
+
+			iid_lsb = (addr40) ? 36 : 28;
+			*addr &= ~_nic_mask64(iid_lsb + 1, iid_lsb, 0);
+			return 0;
+		case 3:
+			if (da) {
+				iid_lsb = (addr40) ? 32 : 24;
+				*addr &= ~(UINT64_C(0x3F) << iid_lsb);
+				return 0;
+			}
+
+			iid_lsb = (addr40) ? 35 : 27;
+			*addr &= ~_nic_mask64(iid_lsb + 2, iid_lsb, 0);
+			return 0;
+		default:
+			return NFP_ERRNO(EINVAL); /* no fall-through to CTXPB */
+		}
+	case NFP6000_CPPTGT_CTXPB:
+		if (mode != 1 || addr40 != 0)
+			return 0;
+		*addr &= ~(UINT64_C(0x3F) << 24);
+		return 0;
+	default:
+		break;
+	}
+
+	return NFP_ERRNO(EINVAL);
+}
+
+#endif /* __NFP_CPPAT_H__ */
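Note on the MU mode 0 path in the header above: the island ID sits in a
6-bit field whose LSB is bit 32 with 40-bit addressing and bit 24
otherwise, so an encode followed by a decode should return the same
island and leave the other address bits alone. The standalone sketch
below restates only that path. The sketch_* names are hypothetical, and
sketch_mask64() is an assumed stand-in for _nic_mask64(), whose
definition is not part of this hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for _nic_mask64(): set bits msb..lsb (inclusive). */
static uint64_t sketch_mask64(int msb, int lsb)
{
	uint64_t width = (uint64_t)(msb - lsb + 1);
	uint64_t ones = (width >= 64) ? ~UINT64_C(0)
				      : ((UINT64_C(1) << width) - 1);

	return ones << lsb;
}

/* Mode 0: overwrite the 6-bit island ID field, keep all other bits. */
static void sketch_mu_mode0_encode(uint64_t *addr, int dest_island,
				   int addr40)
{
	int iid_lsb = addr40 ? 32 : 24;
	uint64_t field = sketch_mask64(iid_lsb + 5, iid_lsb);

	*addr &= ~field;
	*addr |= ((uint64_t)dest_island << iid_lsb) & field;
}

/* Mode 0: read the 6-bit island ID field back out. */
static int sketch_mu_mode0_decode(uint64_t addr, int addr40)
{
	int iid_lsb = addr40 ? 32 : 24;

	return (int)(addr >> iid_lsb) & 0x3F;
}

/* Round-trip check for both address widths. */
static void sketch_mu_mode0_selfcheck(void)
{
	uint64_t addr = UINT64_C(0x1234);

	sketch_mu_mode0_encode(&addr, 24, 1);
	assert(sketch_mu_mode0_decode(addr, 1) == 24);
	assert((addr & UINT64_C(0xFFFFFFFF)) == UINT64_C(0x1234));

	addr = 0;
	sketch_mu_mode0_encode(&addr, 5, 0);
	assert(sketch_mu_mode0_decode(addr, 0) == 5);
}
```

Calling sketch_mu_mode0_selfcheck() from a test program exercises both
the 40-bit and the 32-bit layout. Modes 1 to 3 additionally depend on
the locality bits and on isld0/isld1, which this sketch leaves out.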
diff --git a/drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h b/drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h
new file mode 100644
index 0000000..bee2e94
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp-common/nfp_platform.h
@@ -0,0 +1,62 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_PLATFORM_H__
+#define __NFP_PLATFORM_H__
+
+#include <fcntl.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdlib.h>
+#include <ctype.h>
+#include <inttypes.h>
+#include <sys/cdefs.h>
+#include <sys/stat.h>
+#include <limits.h>
+#include <errno.h>
+
+#ifndef BIT_ULL
+#define BIT(x) (1 << (x))
+#define BIT_ULL(x) (1ULL << (x))
+#endif
+
+#ifndef ARRAY_SIZE
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#endif
+
+#define NFP_ERRNO(err) (errno = (err), -1)
+#define NFP_ERRNO_RET(err, ret) (errno = (err), (ret))
+#define NFP_NOERR(errv) (errno)
+#define NFP_ERRPTR(err) (errno = (err), NULL)
+#define NFP_PTRERR(errv) (errno)
+
+#endif /* __NFP_PLATFORM_H__ */
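Note on NFP_ERRNO() in the header above: the macro relies on the comma
operator, so the expression stores the error code in errno and then
evaluates to -1. A minimal sketch of how a caller would see that is
shown below. The macro is copied from the patch; sketch_check_mode() is
a hypothetical helper used only for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Copied from nfp_platform.h in this patch. */
#define NFP_ERRNO(err) (errno = (err), -1)

/* Hypothetical helper: accept only the address modes 0..3. */
static int sketch_check_mode(int mode)
{
	if (mode < 0 || mode > 3)
		return NFP_ERRNO(EINVAL); /* sets errno, yields -1 */

	return 0;
}

/* Show both outcomes: success leaves errno alone, failure sets it. */
static void sketch_errno_selfcheck(void)
{
	errno = 0;
	assert(sketch_check_mode(2) == 0);
	assert(errno == 0);

	assert(sketch_check_mode(7) == -1);
	assert(errno == EINVAL);
}
```

Callers following this convention test the return value for -1 and then
read errno, rather than comparing the return value against -EINVAL.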
diff --git a/drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h b/drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h
new file mode 100644
index 0000000..629b27e
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp-common/nfp_resid.h
@@ -0,0 +1,620 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Small portions derived from code Copyright(c) 2010-2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_RESID_H__
+#define __NFP_RESID_H__
+
+#if (!defined(_NFP_RESID_NO_C_FUNC) && \
+	(defined(__NFP_TOOL_NFCC) || defined(__NFP_TOOL_NFAS)))
+#define _NFP_RESID_NO_C_FUNC
+#endif
+
+#ifndef _NFP_RESID_NO_C_FUNC
+#include "nfp_platform.h"
+#endif
+
+/*
+ * NFP Chip Architectures
+ *
+ * These are semi-arbitrary values to indicate an NFP architecture.
+ * They serve as a software view of a group of chip families, not necessarily a
+ * direct mapping to actual hardware design.
+ */
+#define NFP_CHIP_ARCH_YD	1
+#define NFP_CHIP_ARCH_TH	2
+
+/*
+ * NFP Chip Families.
+ *
+ * These are not enums, because they need to be microcode compatible.
+ * They are also not maskable.
+ *
+ * Note: The NFP-4xxx family is handled as NFP-6xxx in most software
+ * components.
+ *
+ */
+#define NFP_CHIP_FAMILY_NFP6000 0x6000	/* ARCH_TH */
+
+/* NFP Microengine/Flow Processing Core Versions */
+#define NFP_CHIP_ME_VERSION_2_7 0x0207
+#define NFP_CHIP_ME_VERSION_2_8 0x0208
+#define NFP_CHIP_ME_VERSION_2_9 0x0209
+
+/* NFP Chip Base Revisions. Minor stepping can just be added to these */
+#define NFP_CHIP_REVISION_A0 0x00
+#define NFP_CHIP_REVISION_B0 0x10
+#define NFP_CHIP_REVISION_C0 0x20
+#define NFP_CHIP_REVISION_PF 0xff /* Maximum possible revision */
+
+/* CPP Targets for each chip architecture */
+#define NFP6000_CPPTGT_NBI 1
+#define NFP6000_CPPTGT_VQDR 2
+#define NFP6000_CPPTGT_ILA 6
+#define NFP6000_CPPTGT_MU 7
+#define NFP6000_CPPTGT_PCIE 9
+#define NFP6000_CPPTGT_ARM 10
+#define NFP6000_CPPTGT_CRYPTO 12
+#define NFP6000_CPPTGT_CTXPB 14
+#define NFP6000_CPPTGT_CLS 15
+
+/*
+ * Wildcard indicating a CPP read or write action
+ *
+ * The action used will be either read or write depending on whether a read or
+ * write instruction/call is performed on the NFP_CPP_ID. It is recommended that
+ * the RW action is used even if all actions to be performed on a NFP_CPP_ID are
+ * known to be only reads or writes. Doing so will in many cases save NFP CPP
+ * internal software resources.
+ */
+#define NFP_CPP_ACTION_RW 32
+
+#define NFP_CPP_TARGET_ID_MASK 0x1f
+
+/*
+ *  NFP_CPP_ID - pack target, token, and action into a CPP ID.
+ *
+ * Create a 32-bit CPP identifier representing the access to be made.
+ * These identifiers are used as parameters to other NFP CPP functions. Some
+ * CPP devices may allow wildcard identifiers to be specified.
+ *
+ * @param[in]	target	NFP CPP target id
+ * @param[in]	action	NFP CPP action id
+ * @param[in]	token	NFP CPP token id
+ * @return		NFP CPP ID
+ */
+#define NFP_CPP_ID(target, action, token)                   \
+	((((target) & 0x7f) << 24) | (((token) & 0xff) << 16) | \
+	 (((action) & 0xff) << 8))
+
+#define NFP_CPP_ISLAND_ID(target, action, token, island)    \
+	((((target) & 0x7f) << 24) | (((token) & 0xff) << 16) | \
+	 (((action) & 0xff) << 8) | (((island) & 0xff) << 0))
+
+#ifndef _NFP_RESID_NO_C_FUNC
+
+/**
+ * Return the NFP CPP target of a NFP CPP ID
+ * @param[in]	id	NFP CPP ID
+ * @return	NFP CPP target
+ */
+static inline uint8_t
+NFP_CPP_ID_TARGET_of(uint32_t id)
+{
+	return (id >> 24) & NFP_CPP_TARGET_ID_MASK;
+}
+
+/*
+ * Return the NFP CPP token of a NFP CPP ID
+ * @param[in]	id	NFP CPP ID
+ * @return	NFP CPP token
+ */
+static inline uint8_t
+NFP_CPP_ID_TOKEN_of(uint32_t id)
+{
+	return (id >> 16) & 0xff;
+}
+
+/*
+ * Return the NFP CPP action of a NFP CPP ID
+ * @param[in]	id	NFP CPP ID
+ * @return	NFP CPP action
+ */
+static inline uint8_t
+NFP_CPP_ID_ACTION_of(uint32_t id)
+{
+	return (id >> 8) & 0xff;
+}
+
+/*
+ * Return the NFP CPP island of a NFP CPP ID
+ * @param[in]	id	NFP CPP ID
+ * @return	NFP CPP island
+ */
+static inline uint8_t
+NFP_CPP_ID_ISLAND_of(uint32_t id)
+{
+	return (id) & 0xff;
+}
+
+#endif /* _NFP_RESID_NO_C_FUNC */
+
+/*
+ *  Check if @p chip_family is an ARCH_TH chip.
+ * @param chip_family One of NFP_CHIP_FAMILY_*
+ */
+#define NFP_FAMILY_IS_ARCH_TH(chip_family) \
+	((int)(chip_family) == (int)NFP_CHIP_FAMILY_NFP6000)
+
+/*
+ *  Get the NFP_CHIP_ARCH_* of @p chip_family.
+ * @param chip_family One of NFP_CHIP_FAMILY_*
+ */
+#define NFP_FAMILY_ARCH(x) \
+	(__extension__ ({ \
+		typeof(x) _x = (x); \
+		(NFP_FAMILY_IS_ARCH_TH(_x) ? NFP_CHIP_ARCH_TH : \
+		NFP_FAMILY_IS_ARCH_YD(_x) ? NFP_CHIP_ARCH_YD : -1) \
+	}))
+
+/*
+ *  Check if @p chip_family is an NFP-6xxx chip.
+ * @param chip_family One of NFP_CHIP_FAMILY_*
+ */
+#define NFP_FAMILY_IS_NFP6000(chip_family) \
+	((int)(chip_family) == (int)NFP_CHIP_FAMILY_NFP6000)
+
+/*
+ *  Make microengine ID for NFP-6xxx.
+ * @param island_id   Island ID.
+ * @param menum       ME number, 0 based, within island.
+ *
+ * NOTE: menum should really be unsigned - MSC compiler throws error (not
+ * warning) if a clause is always true i.e. menum >= 0 if cluster_num is type
+ * unsigned int hence the cast of the menum to an int in that particular clause
+ */
+#define NFP6000_MEID(a, b)                       \
+	(__extension__ ({ \
+		typeof(a) _a = (a); \
+		typeof(b) _b = (b); \
+		(((((int)(_a) & 0x3F) == (int)(_a)) &&   \
+		(((int)(_b) >= 0) && ((int)(_b) < 12))) ?    \
+		(int)(((_a) << 4) | ((_b) + 4)) : -1) \
+	}))
+
+/*
+ *  Do a general sanity check on the ME ID.
+ * The check is on the highest possible island ID for the chip family and the
+ * microengine number must be a master ID.
+ * @param meid      ME ID as created by NFP6000_MEID
+ */
+#define NFP6000_MEID_IS_VALID(meid) \
+	(__extension__ ({ \
+		typeof(meid) _a = (meid); \
+		((((_a) >> 4) < 64) && (((_a) >> 4) >= 0) && \
+		 (((_a) & 0xF) >= 4)) \
+	}))
+
+/*
+ *  Extract island ID from ME ID.
+ * @param meid   ME ID as created by NFP6000_MEID
+ */
+#define NFP6000_MEID_ISLAND_of(meid) (((meid) >> 4) & 0x3F)
+
+/*
+ * Extract microengine number (0 based) from ME ID.
+ * @param meid   ME ID as created by NFP6000_MEID
+ */
+#define NFP6000_MEID_MENUM_of(meid) (((meid) & 0xF) - 4)
+
+/*
+ * Extract microengine group number (0 based) from ME ID.
+ * The group is two code-sharing microengines, so group 0 refers to MEs 0,1,
+ * group 1 refers to MEs 2,3 etc.
+ * @param meid   ME ID as created by NFP6000_MEID
+ */
+#define NFP6000_MEID_MEGRP_of(meid) (NFP6000_MEID_MENUM_of(meid) >> 1)
+
+#ifndef _NFP_RESID_NO_C_FUNC
+
+/*
+ *  Convert a string to an ME ID.
+ *
+ * @param s       A string of format iX.meY
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the ME ID part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return     ME ID on success, -1 on error.
+ */
+int nfp6000_idstr2meid(const char *s, const char **endptr);
+
+/*
+ *  Extract island ID from string.
+ *
+ * Example:
+ * const char *c;
+ * int val = nfp6000_idstr2island("i32.me5", &c);
+ * // val == 32, c == "me5"
+ * val = nfp6000_idstr2island("i32", &c);
+ * // val == 32, c == ""
+ *
+ * @param s       A string of format "iX.anything" or "iX"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the island part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the island ID, -1 on error.
+ */
+int nfp6000_idstr2island(const char *s, const char **endptr);
+
+/*
+ *  Extract microengine number from string.
+ *
+ * Example:
+ * const char *c;
+ * int menum = nfp6000_idstr2menum("me5.anything", &c);
+ * // menum == 5, c == "anything"
+ * menum = nfp6000_idstr2menum("me5", &c);
+ * // menum == 5, c == ""
+ *
+ * @param s       A string of format "meX.anything" or "meX"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the ME number part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the ME number, -1 on error.
+ */
+int nfp6000_idstr2menum(const char *s, const char **endptr);
+
+/*
+ * Extract context number from string.
+ *
+ * Example:
+ * const char *c;
+ * int val = nfp6000_idstr2ctxnum("ctx5.anything", &c);
+ * // val == 5, c == "anything"
+ * val = nfp6000_idstr2ctxnum("ctx5", &c);
+ * // val == 5, c == ""
+ *
+ * @param s       A string of format "ctxN.anything" or "ctxN"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the context number part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the context number, -1 on error.
+ */
+int nfp6000_idstr2ctxnum(const char *s, const char **endptr);
+
+/*
+ * Extract microengine group number from string.
+ *
+ * Example:
+ * const char *c;
+ * int val = nfp6000_idstr2megrp("tg2.anything", &c);
+ * // val == 2, c == "anything"
+ * val = nfp6000_idstr2megrp("tg5", &c);
+ * // val == 5, c == ""
+ *
+ * @param s       A string of format "tgX.anything" or "tgX"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the ME group part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the ME group number, -1 on error.
+ */
+int nfp6000_idstr2megrp(const char *s, const char **endptr);
+
+/*
+ * Create ME ID string of format "iX[.meY]".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param meid   Microengine ID.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_meid2str(char *s, int meid);
+
+/*
+ * Create ME ID string of format "name[.meY]" or "iX[.meY]".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param meid   Microengine ID.
+ * @return       Pointer to "s" on success, NULL on error.
+ *
+ * Similar to nfp6000_meid2str() except use an alias instead of "iX"
+ * if one exists for the island.
+ */
+const char *nfp6000_meid2altstr(char *s, int meid);
+
+/*
+ * Create string of format "iX".
+ *
+ * @param s         Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                  The resulting string is output here.
+ * @param island_id Island ID.
+ * @return          Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_island2str(char *s, int island_id);
+
+/*
+ * Create string of format "name", an island alias.
+ *
+ * @param s         Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                  The resulting string is output here.
+ * @param island_id Island ID.
+ * @return          Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_island2altstr(char *s, int island_id);
+
+/*
+ * Create string of format "meY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param menum  Microengine number within island.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_menum2str(char *s, int menum);
+
+/*
+ * Create string of format "ctxY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param ctxnum Context number within microengine.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_ctxnum2str(char *s, int ctxnum);
+
+/*
+ * Create string of format "tgY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param megrp  Microengine group number within cluster.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp6000_megrp2str(char *s, int megrp);
+
+/*
+ * Convert a string to an ME ID.
+ *
+ * @param chip_family Chip family ID
+ * @param s           A string of format iX.meY (or clX.meY)
+ * @param endptr      If non-NULL, *endptr will point to the trailing
+ *                    string after the ME ID part of the string, which
+ *                    is either an empty string or the first character
+ *                    after the separating period.
+ * @return            ME ID on success, -1 on error.
+ */
+int nfp_idstr2meid(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Extract island ID from string.
+ *
+ * Example:
+ * const char *c;
+ * int val = nfp_idstr2island(chip, "i32.me5", &c);
+ * // val == 32, c == "me5"
+ * val = nfp_idstr2island(chip, "i32", &c);
+ * // val == 32, c == ""
+ *
+ * @param chip_family Chip family ID
+ * @param s           A string of format "iX.anything" or "iX"
+ * @param endptr      If non-NULL, *endptr will point to the trailing
+ *                    string after the island part of the string, which
+ *                    is either an empty string or the first character
+ *                    after the separating period.
+ * @return            The island ID on success, -1 on error.
+ */
+int nfp_idstr2island(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Extract microengine number from string.
+ *
+ * Example:
+ * const char *c;
+ * int menum = nfp_idstr2menum(chip, "me5.anything", &c);
+ * // menum == 5, c == "anything"
+ * menum = nfp_idstr2menum(chip, "me5", &c);
+ * // menum == 5, c == ""
+ *
+ * @param chip_family Chip family ID
+ * @param s           A string of format "meX.anything" or "meX"
+ * @param endptr      If non-NULL, *endptr will point to the trailing
+ *                    string after the ME number part of the string, which
+ *                    is either an empty string or the first character
+ *                    after the separating period.
+ * @return            The ME number on success, -1 on error.
+ */
+int nfp_idstr2menum(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Extract context number from string.
+ *
+ * Example:
+ * const char *c;
+ * int val = nfp_idstr2ctxnum(chip, "ctx5.anything", &c);
+ * // val == 5, c == "anything"
+ * val = nfp_idstr2ctxnum(chip, "ctx5", &c);
+ * // val == 5, c == ""
+ *
+ * @param s       A string of format "ctxN.anything" or "ctxN"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the context number part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the context number, -1 on error.
+ */
+int nfp_idstr2ctxnum(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Extract microengine group number from string.
+ *
+ * Example:
+ * const char *c;
+ * int val = nfp_idstr2megrp(chip, "tg2.anything", &c);
+ * // val == 2, c == "anything"
+ * val = nfp_idstr2megrp(chip, "tg5", &c);
+ * // val == 5, c == ""
+ *
+ * @param s       A string of format "tgX.anything" or "tgX"
+ * @param endptr  If non-NULL, *endptr will point to the trailing string
+ *                after the ME group part of the string, which is either
+ *                an empty string or the first character after the separating
+ *                period.
+ * @return        If successful, the ME group number, -1 on error.
+ */
+int nfp_idstr2megrp(int chip_family, const char *s, const char **endptr);
+
+/*
+ * Create ME ID string of format "iX[.meY]".
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param meid        Microengine ID.
+ * @return            Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_meid2str(int chip_family, char *s, int meid);
+
+/*
+ * Create ME ID string of format "name[.meY]" or "iX[.meY]".
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param meid        Microengine ID.
+ * @return            Pointer to "s" on success, NULL on error.
+ *
+ * Similar to nfp_meid2str() except use an alias instead of "iX"
+ * if one exists for the island.
+ */
+const char *nfp_meid2altstr(int chip_family, char *s, int meid);
+
+/*
+ * Create string of format "iX".
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param island_id   Island ID.
+ * @return            Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_island2str(int chip_family, char *s, int island_id);
+
+/*
+ * Create string of format "name", an island alias.
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param island_id   Island ID.
+ * @return            Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_island2altstr(int chip_family, char *s, int island_id);
+
+/*
+ * Create string of format "meY".
+ *
+ * @param chip_family Chip family ID
+ * @param s           Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *                    The resulting string is output here.
+ * @param menum       Microengine number within island.
+ * @return            Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_menum2str(int chip_family, char *s, int menum);
+
+/*
+ * Create string of format "ctxY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param ctxnum Context number within microengine.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_ctxnum2str(int chip_family, char *s, int ctxnum);
+
+/*
+ * Create string of format "tgY".
+ *
+ * @param s      Pointer to char buffer of size NFP_MEID_STR_SZ.
+ *               The resulting string is output here.
+ * @param megrp  Microengine group number within cluster.
+ * @return       Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_megrp2str(int chip_family, char *s, int megrp);
+
+/*
+ * Convert a two character string to revision number.
+ *
+ * Revision integer is 0x00 for A0, 0x11 for B1 etc.
+ *
+ * @param s     Two character string.
+ * @return      Revision number, -1 on error
+ */
+int nfp_idstr2rev(const char *s);
+
+/*
+ * Create string from revision number.
+ *
+ * String will be upper case.
+ *
+ * @param s     Pointer to char buffer with size of at least 3
+ *              for 2 characters and string terminator.
+ * @param rev   Revision number.
+ * @return      Pointer to "s" on success, NULL on error.
+ */
+const char *nfp_rev2str(char *s, int rev);
+
+/*
+ * Get the NFP CPP address from a string
+ *
+ * String is in the format [island@]target[:[action:[token:]]address]
+ *
+ * @param chip_family Chip family ID
+ * @param tid           Pointer to string to parse
+ * @param cpp_idp       Pointer to CPP ID
+ * @param cpp_addrp     Pointer to CPP address
+ * @return              0 on success, or -1 and errno
+ */
+int nfp_str2cpp(int chip_family,
+		const char *tid,
+		uint32_t *cpp_idp,
+		uint64_t *cpp_addrp);
+
+
+#endif /* _NFP_RESID_NO_C_FUNC */
+
+#endif /* __NFP_RESID_H__ */
diff --git a/drivers/net/nfp/nfpcore/nfp6000/nfp6000.h b/drivers/net/nfp/nfpcore/nfp6000/nfp6000.h
new file mode 100644
index 0000000..dc0a359
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp6000/nfp6000.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Small portions derived from code Copyright(c) 2010-2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_NFP6000_H__
+#define __NFP_NFP6000_H__
+
+/* CPP Target IDs */
+#define NFP_CPP_TARGET_INVALID          0
+#define NFP_CPP_TARGET_NBI              1
+#define NFP_CPP_TARGET_QDR              2
+#define NFP_CPP_TARGET_ILA              6
+#define NFP_CPP_TARGET_MU               7
+#define NFP_CPP_TARGET_PCIE             9
+#define NFP_CPP_TARGET_ARM              10
+#define NFP_CPP_TARGET_CRYPTO           12
+#define NFP_CPP_TARGET_ISLAND_XPB       14	/* Shared with CAP */
+#define NFP_CPP_TARGET_ISLAND_CAP       14	/* Shared with XPB */
+#define NFP_CPP_TARGET_CT_XPB           14
+#define NFP_CPP_TARGET_LOCAL_SCRATCH    15
+#define NFP_CPP_TARGET_CLS              NFP_CPP_TARGET_LOCAL_SCRATCH
+
+#define NFP_ISL_EMEM0                   24
+
+#define NFP_MU_ADDR_ACCESS_TYPE_MASK    3ULL
+#define NFP_MU_ADDR_ACCESS_TYPE_DIRECT  2ULL
+
+static inline int
+nfp_cppat_mu_locality_lsb(int mode, int addr40)
+{
+	switch (mode) {
+	case 0 ... 3:
+		return addr40 ? 38 : 30;
+	default:
+		return -EINVAL;
+	}
+}
+
+#endif /* __NFP_NFP6000_H__ */
diff --git a/drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h b/drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h
new file mode 100644
index 0000000..2a89519
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp6000/nfp_xpb.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Small portions derived from code Copyright(c) 2010-2015 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_XPB_H__
+#define __NFP_XPB_H__
+
+/*
+ * For use with NFP6000 Databook "XPB Addressing" section
+ */
+#define NFP_XPB_OVERLAY(island)  (((island) & 0x3f) << 24)
+
+#define NFP_XPB_ISLAND(island)   (NFP_XPB_OVERLAY(island) + 0x60000)
+
+#define NFP_XPB_ISLAND_of(offset) (((offset) >> 24) & 0x3F)
+
+/*
+ * For use with NFP6000 Databook "XPB Island and Device IDs" chapter
+ */
+#define NFP_XPB_DEVICE(island, slave, device) \
+				(NFP_XPB_OVERLAY(island) | \
+				 (((slave) & 3) << 22) | \
+				 (((device) & 0x3f) << 16))
+
+#endif /* __NFP_XPB_H__ */
diff --git a/drivers/net/nfp/nfpcore/nfp_cpp.h b/drivers/net/nfp/nfpcore/nfp_cpp.h
new file mode 100644
index 0000000..e1d8fda
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_cpp.h
@@ -0,0 +1,803 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+
+#ifndef __NFP_CPP_H__
+#define __NFP_CPP_H__
+
+#include "nfp-common/nfp_platform.h"
+#include "nfp-common/nfp_resid.h"
+
+struct nfp_cpp_mutex;
+
+/*
+ * NFP CPP handle
+ */
+struct nfp_cpp {
+	uint32_t model;
+	uint32_t interface;
+	uint8_t *serial;
+	int serial_len;
+	void *priv;
+
+	/* Mutex cache */
+	struct nfp_cpp_mutex *mutex_cache;
+	const struct nfp_cpp_operations *op;
+
+	/*
+	 * NFP-6xxx originating island IMB CPP Address Translation. CPP Target
+	 * ID is index into array. Values are obtained at runtime from local
+	 * island XPB CSRs.
+	 */
+	uint32_t imb_cat_table[16];
+};
+
+/*
+ * NFP CPP device area handle
+ */
+struct nfp_cpp_area {
+	struct nfp_cpp *cpp;
+	char *name;
+	unsigned long long offset;
+	unsigned long size;
+	/* Here follows the 'priv' part of nfp_cpp_area. */
+};
+
+/*
+ * NFP CPP operations structure
+ */
+struct nfp_cpp_operations {
+	/* Size of priv area in struct nfp_cpp_area */
+	size_t area_priv_size;
+
+	/* Instantiate an NFP CPP */
+	int (*init)(struct nfp_cpp *cpp, const char *devname);
+
+	/*
+	 * Free the bus.
+	 * Called only once, during nfp_cpp_unregister()
+	 */
+	void (*free)(struct nfp_cpp *cpp);
+
+	/*
+	 * Initialize a new NFP CPP area
+	 * NOTE: This is _not_ serialized
+	 */
+	int (*area_init)(struct nfp_cpp_area *area,
+			 uint32_t dest,
+			 unsigned long long address,
+			 unsigned long size);
+	/*
+	 * Clean up a NFP CPP area before it is freed
+	 * NOTE: This is _not_ serialized
+	 */
+	void (*area_cleanup)(struct nfp_cpp_area *area);
+
+	/*
+	 * Acquire resources for a NFP CPP area
+	 * Serialized
+	 */
+	int (*area_acquire)(struct nfp_cpp_area *area);
+	/*
+	 * Release resources for a NFP CPP area
+	 * Serialized
+	 */
+	void (*area_release)(struct nfp_cpp_area *area);
+	/*
+	 * Return a void IO pointer to a NFP CPP area
+	 * NOTE: This is _not_ serialized
+	 */
+
+	void *(*area_iomem)(struct nfp_cpp_area *area);
+
+	void *(*area_mapped)(struct nfp_cpp_area *area);
+	/*
+	 * Perform a read from a NFP CPP area
+	 * Serialized
+	 */
+	int (*area_read)(struct nfp_cpp_area *area,
+			 void *kernel_vaddr,
+			 unsigned long offset,
+			 unsigned int length);
+	/*
+	 * Perform a write to a NFP CPP area
+	 * Serialized
+	 */
+	int (*area_write)(struct nfp_cpp_area *area,
+			  const void *kernel_vaddr,
+			  unsigned long offset,
+			  unsigned int length);
+};
+
+/*
+ * This should be the only external function the transport
+ * module supplies
+ */
+const struct nfp_cpp_operations *nfp_cpp_transport_operations(void);
+
+/*
+ * Set the model id
+ *
+ * @param   cpp     NFP CPP handle
+ * @param   model   Model ID
+ */
+void nfp_cpp_model_set(struct nfp_cpp *cpp, uint32_t model);
+
+/*
+ * Set the interface ID of a nfp_cpp struct
+ *
+ * @param   cpp     NFP CPP handle
+ * @param   interface Interface ID
+ */
+void nfp_cpp_interface_set(struct nfp_cpp *cpp, uint32_t interface);
+
+/*
+ * Set the serial number of a nfp_cpp struct
+ *
+ * @param   cpp         NFP CPP handle
+ * @param   serial      NFP serial byte array
+ * @param   serial_len  Length of the serial byte array
+ */
+int nfp_cpp_serial_set(struct nfp_cpp *cpp, const uint8_t *serial,
+		       size_t serial_len);
+
+/*
+ * Set the private data of the nfp_cpp instance
+ *
+ * @param   cpp     NFP CPP handle
+ * @param   priv    Opaque device pointer
+ */
+void nfp_cpp_priv_set(struct nfp_cpp *cpp, void *priv);
+
+/*
+ * Return the private data of the nfp_cpp instance
+ *
+ * @param   cpp NFP CPP handle
+ * @return      Opaque device pointer
+ */
+void *nfp_cpp_priv(struct nfp_cpp *cpp);
+
+/*
+ * Get the privately allocated portion of a NFP CPP area handle
+ *
+ * @param   cpp_area    NFP CPP area handle
+ * @return          Pointer to the private area, or NULL on failure
+ */
+void *nfp_cpp_area_priv(struct nfp_cpp_area *cpp_area);
+
+uint32_t __nfp_cpp_model_autodetect(struct nfp_cpp *cpp);
+
+/*
+ * NFP CPP core interface for CPP clients.
+ */
+
+/*
+ * Open a NFP CPP handle to a CPP device
+ *
+ * @param[in]	devname	Name of the CPP device to open
+ *
+ * @return NFP CPP handle, or NULL on failure (and set errno accordingly).
+ */
+struct nfp_cpp *nfp_cpp_from_device_name(const char *devname);
+
+/*
+ * Free a NFP CPP handle
+ *
+ * @param[in]	cpp	NFP CPP handle
+ */
+void nfp_cpp_free(struct nfp_cpp *cpp);
+
+#define NFP_CPP_MODEL_INVALID   0xffffffff
+
+/*
+ * NFP_CPP_MODEL_CHIP_of - retrieve the chip ID from the model ID
+ *
+ * The chip ID is a 16-bit BCD+A-F encoding for the chip type.
+ *
+ * @param[in]   model   NFP CPP model id
+ * @return      NFP CPP chip id
+ */
+#define NFP_CPP_MODEL_CHIP_of(model)        (((model) >> 16) & 0xffff)
+
+/*
+ * NFP_CPP_MODEL_IS_6000 - Check for the NFP6000 family of devices
+ *
+ * NOTE: The NFP4000 series is considered as a NFP6000 series variant.
+ *
+ * @param[in]	model	NFP CPP model id
+ * @return		true if model is in the NFP6000 family, false otherwise.
+ */
+#define NFP_CPP_MODEL_IS_6000(model)		     \
+		((NFP_CPP_MODEL_CHIP_of(model) >= 0x4000) && \
+		(NFP_CPP_MODEL_CHIP_of(model) < 0x7000))
+
+/*
+ * nfp_cpp_model - Retrieve the Model ID of the NFP
+ *
+ * @param[in]	cpp	NFP CPP handle
+ * @return		NFP CPP Model ID
+ */
+uint32_t nfp_cpp_model(struct nfp_cpp *cpp);
+
+/*
+ * NFP Interface types - logical interface for this CPP connection. 4 bits are
+ * reserved for interface type.
+ */
+#define NFP_CPP_INTERFACE_TYPE_INVALID		0x0
+#define NFP_CPP_INTERFACE_TYPE_PCI		0x1
+#define NFP_CPP_INTERFACE_TYPE_ARM		0x2
+#define NFP_CPP_INTERFACE_TYPE_RPC		0x3
+#define NFP_CPP_INTERFACE_TYPE_ILA		0x4
+
+/*
+ * Construct a 16-bit NFP Interface ID
+ *
+ * Interface IDs consist of 4 bits of interface type, 4 bits of unit
+ * identifier, and 8 bits of channel identifier.
+ *
+ * The NFP Interface ID is used in the implementation of NFP CPP API mutexes,
+ * which use the MU Atomic CompareAndWrite operation - hence the limit to 16
+ * bits to be able to use the NFP Interface ID as a lock owner.
+ *
+ * @param[in]	type	NFP Interface Type
+ * @param[in]	unit	Unit identifier for the interface type
+ * @param[in]	channel	Channel identifier for the interface unit
+ * @return		Interface ID
+ */
+#define NFP_CPP_INTERFACE(type, unit, channel)	\
+	((((type) & 0xf) << 12) | \
+	 (((unit) & 0xf) <<  8) | \
+	 (((channel) & 0xff) << 0))
+
+/*
+ * Get the interface type of a NFP Interface ID
+ * @param[in]	interface	NFP Interface ID
+ * @return			NFP Interface ID's type
+ */
+#define NFP_CPP_INTERFACE_TYPE_of(interface)	(((interface) >> 12) & 0xf)
+
+/*
+ * Get the interface unit of a NFP Interface ID
+ * @param[in]	interface	NFP Interface ID
+ * @return			NFP Interface ID's unit
+ */
+#define NFP_CPP_INTERFACE_UNIT_of(interface)	(((interface) >>  8) & 0xf)
+
+/*
+ * Get the interface channel of a NFP Interface ID
+ * @param[in]	interface	NFP Interface ID
+ * @return			NFP Interface ID's channel
+ */
+#define NFP_CPP_INTERFACE_CHANNEL_of(interface)	(((interface) >>  0) & 0xff)
+
+/*
+ * Retrieve the Interface ID of the NFP
+ * @param[in]	cpp	NFP CPP handle
+ * @return		NFP CPP Interface ID
+ */
+uint16_t nfp_cpp_interface(struct nfp_cpp *cpp);
+
+/*
+ * Retrieve the NFP Serial Number (unique per NFP)
+ * @param[in]	cpp	NFP CPP handle
+ * @param[out]	serial	Pointer to reference the serial number array
+ *
+ * @return	size of the NFP6000 serial number, in bytes
+ */
+int nfp_cpp_serial(struct nfp_cpp *cpp, const uint8_t **serial);
+
+/*
+ * Allocate a NFP CPP area handle, as an offset into a CPP ID
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	size	Size of the area to reserve
+ *
+ * @return NFP CPP area handle, or NULL on failure (and set errno accordingly).
+ */
+struct nfp_cpp_area *nfp_cpp_area_alloc(struct nfp_cpp *cpp, uint32_t cpp_id,
+					unsigned long long address,
+					unsigned long size);
+
+/*
+ * Allocate a NFP CPP area handle, as an offset into a CPP ID, by a named owner
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	name	Name of owner of the area
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	size	Size of the area to reserve
+ *
+ * @return NFP CPP area handle, or NULL on failure (and set errno accordingly).
+ */
+struct nfp_cpp_area *nfp_cpp_area_alloc_with_name(struct nfp_cpp *cpp,
+						  uint32_t cpp_id,
+						  const char *name,
+						  unsigned long long address,
+						  unsigned long size);
+
+/*
+ * Free an allocated NFP CPP area handle
+ * @param[in]	area	NFP CPP area handle
+ */
+void nfp_cpp_area_free(struct nfp_cpp_area *area);
+
+/*
+ * Acquire the resources needed to access the NFP CPP area handle
+ *
+ * @param[in]	area	NFP CPP area handle
+ *
+ * @return 0 on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_area_acquire(struct nfp_cpp_area *area);
+
+/*
+ * Release the resources needed to access the NFP CPP area handle
+ *
+ * @param[in]	area	NFP CPP area handle
+ */
+void nfp_cpp_area_release(struct nfp_cpp_area *area);
+
+/*
+ * Allocate, then acquire the resources needed to access the NFP CPP area handle
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	size	Size of the area to reserve
+ *
+ * @return NFP CPP area handle, or NULL on failure (and set errno accordingly).
+ */
+struct nfp_cpp_area *nfp_cpp_area_alloc_acquire(struct nfp_cpp *cpp,
+						uint32_t cpp_id,
+						unsigned long long address,
+						unsigned long size);
+
+/*
+ * Release the resources, then free the NFP CPP area handle
+ * @param[in]	area	NFP CPP area handle
+ */
+void nfp_cpp_area_release_free(struct nfp_cpp_area *area);
+
+uint8_t *nfp_cpp_map_area(struct nfp_cpp *cpp, int domain, int target,
+			   uint64_t addr, unsigned long size,
+			   struct nfp_cpp_area **area);
+/*
+ * Return an IO pointer to the beginning of the NFP CPP area handle. The area
+ * must be acquired with 'nfp_cpp_area_acquire()' before calling this operation.
+ *
+ * @param[in]	area	NFP CPP area handle
+ *
+ * @return Pointer to IO memory, or NULL on failure (and set errno accordingly).
+ */
+void *nfp_cpp_area_mapped(struct nfp_cpp_area *area);
+
+/*
+ * Read from a NFP CPP area handle into a buffer. The area must be acquired with
+ * 'nfp_cpp_area_acquire()' before calling this operation.
+ *
+ * @param[in]	area	NFP CPP area handle
+ * @param[in]	offset	Offset into the area
+ * @param[in]	buffer	Location of buffer to receive the data
+ * @param[in]	length	Length of the data to read
+ *
+ * @return bytes read on success, -1 on failure (and set errno accordingly).
+ *
+ */
+int nfp_cpp_area_read(struct nfp_cpp_area *area, unsigned long offset,
+		      void *buffer, size_t length);
+
+/*
+ * Write to a NFP CPP area handle from a buffer. The area must be acquired with
+ * 'nfp_cpp_area_acquire()' before calling this operation.
+ *
+ * @param[in]	area	NFP CPP area handle
+ * @param[in]	offset	Offset into the area
+ * @param[in]	buffer	Location of buffer that holds the data
+ * @param[in]	length	Length of the data to write
+ *
+ * @return bytes written on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_area_write(struct nfp_cpp_area *area, unsigned long offset,
+		       const void *buffer, size_t length);
+
+/*
+ * nfp_cpp_area_iomem() - get IOMEM region for CPP area
+ * @area:       CPP area handle
+ *
+ * Returns an iomem pointer for use with readl()/writel() style operations.
+ *
+ * NOTE: Area must have been locked down with an 'acquire'.
+ *
+ * Return: pointer to the area, or NULL
+ */
+void *nfp_cpp_area_iomem(struct nfp_cpp_area *area);
+
+/*
+ * Verify that IO can be performed on an offset in an area
+ *
+ * @param[in]	area	NFP CPP area handle
+ * @param[in]	offset	Offset into the area
+ * @param[in]	size	Size of region to validate
+ *
+ * @return 0 on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_area_check_range(struct nfp_cpp_area *area,
+			     unsigned long long offset, unsigned long size);
+
+/*
+ * Get the NFP CPP handle that is the parent of a NFP CPP area handle
+ *
+ * @param	cpp_area	NFP CPP area handle
+ * @return			NFP CPP handle
+ */
+struct nfp_cpp *nfp_cpp_area_cpp(struct nfp_cpp_area *cpp_area);
+
+/*
+ * Get the name passed during allocation of the NFP CPP area handle
+ *
+ * @param	cpp_area	NFP CPP area handle
+ * @return			Pointer to the area's name
+ */
+const char *nfp_cpp_area_name(struct nfp_cpp_area *cpp_area);
+
+/*
+ * Read a block of data from a NFP CPP ID
+ *
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	kernel_vaddr	Buffer to copy read data to
+ * @param[in]	length	Number of bytes to read
+ *
+ * @return bytes read on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_read(struct nfp_cpp *cpp, uint32_t cpp_id,
+		 unsigned long long address, void *kernel_vaddr, size_t length);
+
+/*
+ * Write a block of data to a NFP CPP ID
+ *
+ * @param[in]	cpp	NFP CPP handle
+ * @param[in]	cpp_id	NFP CPP ID
+ * @param[in]	address	Offset into the NFP CPP ID address space
+ * @param[in]	kernel_vaddr	Buffer to copy write data from
+ * @param[in]	length	Number of bytes to write
+ *
+ * @return bytes written on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_write(struct nfp_cpp *cpp, uint32_t cpp_id,
+		  unsigned long long address, const void *kernel_vaddr,
+		  size_t length);
+
+
+
+/*
+ * Fill a NFP CPP area handle and offset with a value
+ *
+ * @param[in]	area	NFP CPP area handle
+ * @param[in]	offset	Offset into the NFP CPP area
+ * @param[in]	value	32-bit value to fill area with
+ * @param[in]	length	Number of bytes to fill
+ *
+ * @return bytes written on success, -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_area_fill(struct nfp_cpp_area *area, unsigned long offset,
+		      uint32_t value, size_t length);
+
+/*
+ * Read a single 32-bit value from a NFP CPP area handle
+ *
+ * @param area		NFP CPP area handle
+ * @param offset	offset into NFP CPP area handle
+ * @param value		output value
+ *
+ * The area must be acquired with 'nfp_cpp_area_acquire()' before calling this
+ * operation.
+ *
+ * NOTE: offset must be 32-bit aligned.
+ *
+ * @return 0 on success, or -1 on error (and set errno accordingly).
+ */
+int nfp_cpp_area_readl(struct nfp_cpp_area *area, unsigned long offset,
+		       uint32_t *value);
+
+/*
+ * Write a single 32-bit value to a NFP CPP area handle
+ *
+ * @param area		NFP CPP area handle
+ * @param offset	offset into NFP CPP area handle
+ * @param value		value to write
+ *
+ * The area must be acquired with 'nfp_cpp_area_acquire()' before calling this
+ * operation.
+ *
+ * NOTE: offset must be 32-bit aligned.
+ *
+ * @return 0 on success, or -1 on error (and set errno accordingly).
+ */
+int nfp_cpp_area_writel(struct nfp_cpp_area *area, unsigned long offset,
+			uint32_t value);
+
+/*
+ * Read a single 64-bit value from a NFP CPP area handle
+ *
+ * @param area		NFP CPP area handle
+ * @param offset	offset into NFP CPP area handle
+ * @param value		output value
+ *
+ * The area must be acquired with 'nfp_cpp_area_acquire()' before calling this
+ * operation.
+ *
+ * NOTE: offset must be 64-bit aligned.
+ *
+ * @return 0 on success, or -1 on error (and set errno accordingly).
+ */
+int nfp_cpp_area_readq(struct nfp_cpp_area *area, unsigned long offset,
+		       uint64_t *value);
+
+/*
+ * Write a single 64-bit value to a NFP CPP area handle
+ *
+ * @param area		NFP CPP area handle
+ * @param offset	offset into NFP CPP area handle
+ * @param value		value to write
+ *
+ * The area must be acquired with 'nfp_cpp_area_acquire()' before calling this
+ * operation.
+ *
+ * NOTE: offset must be 64-bit aligned.
+ *
+ * @return 0 on success, or -1 on error (and set errno accordingly).
+ */
+int nfp_cpp_area_writeq(struct nfp_cpp_area *area, unsigned long offset,
+			uint64_t value);
+
+/*
+ * Write a single 32-bit value on the XPB bus
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt	XPB target and address
+ * @param value         value to write
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_xpb_writel(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t value);
+
+/*
+ * Read a single 32-bit value from the XPB bus
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt	XPB target and address
+ * @param value         output value
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_xpb_readl(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t *value);
+
+/*
+ * Modify bits of a 32-bit value from the XPB bus
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt       XPB target and address
+ * @param mask          mask of bits to alter
+ * @param value         value to modify
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_xpb_writelm(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t mask,
+		    uint32_t value);
+
+/*
+ * Wait for masked bits of a 32-bit value on the XPB bus to match a value
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt       XPB target and address
+ * @param mask          mask of bits to monitor
+ * @param value         value to monitor for
+ * @param timeout_us    maximum number of us to wait (-1 for forever)
+ *
+ * @return >= 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_xpb_waitlm(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t mask,
+		   uint32_t value, int timeout_us);
+
+/*
+ * Read a 32-bit word from a NFP CPP ID
+ *
+ * @param cpp           NFP CPP handle
+ * @param cpp_id        NFP CPP ID
+ * @param address       offset into the NFP CPP ID address space
+ * @param value         output value
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_readl(struct nfp_cpp *cpp, uint32_t cpp_id,
+		  unsigned long long address, uint32_t *value);
+
+/*
+ * Write a 32-bit value to a NFP CPP ID
+ *
+ * @param cpp           NFP CPP handle
+ * @param cpp_id        NFP CPP ID
+ * @param address       offset into the NFP CPP ID address space
+ * @param value         value to write
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ *
+ */
+int nfp_cpp_writel(struct nfp_cpp *cpp, uint32_t cpp_id,
+		   unsigned long long address, uint32_t value);
+
+/*
+ * Read a 64-bit word from a NFP CPP ID
+ *
+ * @param cpp           NFP CPP handle
+ * @param cpp_id        NFP CPP ID
+ * @param address       offset into the NFP CPP ID address space
+ * @param value         output value
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_readq(struct nfp_cpp *cpp, uint32_t cpp_id,
+		  unsigned long long address, uint64_t *value);
+
+/*
+ * Write a 64-bit value to a NFP CPP ID
+ *
+ * @param cpp           NFP CPP handle
+ * @param cpp_id        NFP CPP ID
+ * @param address       offset into the NFP CPP ID address space
+ * @param value         value to write
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_writeq(struct nfp_cpp *cpp, uint32_t cpp_id,
+		   unsigned long long address, uint64_t value);
+
+/*
+ * Initialize a mutex location
+ *
+ * The CPP target:address must point to a 64-bit aligned location, and will
+ * initialize 64 bits of data at the location.
+ *
+ * This creates the initial mutex state, as locked by this nfp_cpp_interface().
+ *
+ * This function should only be called when setting up the initial lock state
+ * upon boot-up of the system.
+ *
+ * @param cpp		NFP CPP handle
+ * @param target	NFP CPP target ID
+ * @param address	Offset into the address space of the NFP CPP target ID
+ * @param key_id	Unique 32-bit value for this mutex
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_mutex_init(struct nfp_cpp *cpp, int target,
+		       unsigned long long address, uint32_t key_id);
+
+/*
+ * Create a mutex handle from an address controlled by a MU Atomic engine
+ *
+ * The CPP target:address must point to a 64-bit aligned location, and reserve
+ * 64 bits of data at the location for use by the handle.
+ *
+ * Only target/address pairs that point to entities that support the MU Atomic
+ * Engine's CmpAndSwap32 command are supported.
+ *
+ * @param cpp		NFP CPP handle
+ * @param target	NFP CPP target ID
+ * @param address	Offset into the address space of the NFP CPP target ID
+ * @param key_id	32-bit unique key (must match the key at this location)
+ *
+ * @return		A non-NULL struct nfp_cpp_mutex * on success, NULL on
+ *                      failure.
+ */
+struct nfp_cpp_mutex *nfp_cpp_mutex_alloc(struct nfp_cpp *cpp, int target,
+					  unsigned long long address,
+					  uint32_t key_id);
+
+/*
+ * Get the NFP CPP handle the mutex was created with
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          NFP CPP handle
+ */
+struct nfp_cpp *nfp_cpp_mutex_cpp(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Get the mutex key
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          Mutex key
+ */
+uint32_t nfp_cpp_mutex_key(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Get the mutex owner
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          Interface ID of the mutex owner
+ *
+ * NOTE: This is for debug purposes ONLY - the owner may change at any time,
+ * unless it has been locked by this NFP CPP handle.
+ */
+uint16_t nfp_cpp_mutex_owner(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Get the mutex target
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          Mutex CPP target (i.e. NFP_CPP_TARGET_MU)
+ */
+int nfp_cpp_mutex_target(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Get the mutex address
+ *
+ * @param   mutex   NFP mutex handle
+ * @return          Mutex CPP address
+ */
+uint64_t nfp_cpp_mutex_address(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Free a mutex handle - does not alter the lock state
+ *
+ * @param mutex		NFP CPP Mutex handle
+ */
+void nfp_cpp_mutex_free(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Lock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex		NFP CPP Mutex handle
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Unlock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex		NFP CPP Mutex handle
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int nfp_cpp_mutex_unlock(struct nfp_cpp_mutex *mutex);
+
+/*
+ * Attempt to lock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex		NFP CPP Mutex handle
+ * @return		0 if the lock succeeded, -1 on failure (and errno set
+ *			appropriately).
+ */
+int nfp_cpp_mutex_trylock(struct nfp_cpp_mutex *mutex);
+
+#endif /* !__NFP_CPP_H__ */
diff --git a/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
new file mode 100644
index 0000000..817e70f
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_cpp_pcie_ops.c
@@ -0,0 +1,962 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * nfp_cpp_pcie_ops.c
+ * Authors: Vinayak Tammineedi <vinayak.tammineedi@netronome.com>
+ *
+ * Multiplexes the NFP BARs between NFP internal resources and
+ * implements the PCIe specific interface for generic CPP bus access.
+ *
+ * The BARs are managed and allocated if they are available.
+ * The generic CPP bus abstraction builds upon this BAR interface.
+ */
+
+#include <assert.h>
+#include <stdio.h>
+#include <execinfo.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <fcntl.h>
+#include <string.h>
+#include <unistd.h>
+#include <errno.h>
+#include <dirent.h>
+#include <libgen.h>
+
+#include <sys/mman.h>
+#include <sys/file.h>
+#include <sys/stat.h>
+
+#include "nfp_cpp.h"
+#include "nfp_target.h"
+#include "nfp6000/nfp6000.h"
+
+#define NFP_PCIE_BAR(_pf)	(0x30000 + ((_pf) & 7) * 0xc0)
+
+#define NFP_PCIE_BAR_PCIE2CPP_ACTION_BASEADDRESS(_x)  (((_x) & 0x1f) << 16)
+#define NFP_PCIE_BAR_PCIE2CPP_BASEADDRESS(_x)         (((_x) & 0xffff) << 0)
+#define NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT(_x)        (((_x) & 0x3) << 27)
+#define NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_32BIT    0
+#define NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_64BIT    1
+#define NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_0BYTE    3
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE(_x)             (((_x) & 0x7) << 29)
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_OF(_x)          (((_x) >> 29) & 0x7)
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_FIXED         0
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_BULK          1
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_TARGET        2
+#define NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_GENERAL       3
+#define NFP_PCIE_BAR_PCIE2CPP_TARGET_BASEADDRESS(_x)  (((_x) & 0xf) << 23)
+#define NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(_x)   (((_x) & 0x3) << 21)
+
+/*
+ * Minimal size of the PCIe cfg memory we depend on being mapped;
+ * the queue controller and DMA controller don't have to be covered.
+ */
+#define NFP_PCI_MIN_MAP_SIZE				0x080000
+
+#define NFP_PCIE_P2C_FIXED_SIZE(bar)               (1 << (bar)->bitsize)
+#define NFP_PCIE_P2C_BULK_SIZE(bar)                (1 << (bar)->bitsize)
+#define NFP_PCIE_P2C_GENERAL_TARGET_OFFSET(bar, x) ((x) << ((bar)->bitsize - 2))
+#define NFP_PCIE_P2C_GENERAL_TOKEN_OFFSET(bar, x) ((x) << ((bar)->bitsize - 4))
+#define NFP_PCIE_P2C_GENERAL_SIZE(bar)             (1 << ((bar)->bitsize - 4))
+
+#define NFP_PCIE_CFG_BAR_PCIETOCPPEXPBAR(bar, slot) \
+	(NFP_PCIE_BAR(0) + ((bar) * 8 + (slot)) * 4)
+
+#define NFP_PCIE_CPP_BAR_PCIETOCPPEXPBAR(bar, slot) \
+	(((bar) * 8 + (slot)) * 4)
+
+/*
+ * Define DEBUG at build time to enable a bit more verbose debug output
+ * (see the #ifdef DEBUG blocks below).
+ */
+struct nfp_pcie_user;
+struct nfp6000_area_priv;
+
+/*
+ * struct nfp_bar - describes BAR configuration and usage
+ * @nfp:	backlink to owner
+ * @barcfg:	cached contents of BAR config CSR
+ * @base:	the BAR's base CPP offset
+ * @mask:       mask for the BAR aperture (read only)
+ * @bitsize:	bitsize of BAR aperture (read only)
+ * @index:	index of the BAR
+ * @lock:	lock to specify if bar is in use
+ * @csr:	mapped address of the BAR's config CSR
+ * @iomem:	mapped IO memory
+ */
+#define NFP_BAR_MAX 7
+struct nfp_bar {
+	struct nfp_pcie_user *nfp;
+	uint32_t barcfg;
+	uint64_t base;		/* CPP address base */
+	uint64_t mask;		/* Bit mask of the bar */
+	uint32_t bitsize;	/* Bit size of the bar */
+	int index;
+	int lock;
+
+	char *csr;
+	char *iomem;
+};
+
+#define BUSDEV_SZ	13
+struct nfp_pcie_user {
+	struct nfp_bar bar[NFP_BAR_MAX];
+
+	int device;
+	int lock;
+	char busdev[BUSDEV_SZ];
+	int barsz;
+	char *cfg;
+};
+
+static uint32_t
+nfp_bar_maptype(struct nfp_bar *bar)
+{
+	return NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_OF(bar->barcfg);
+}
+
+#define TARGET_WIDTH_32    4
+#define TARGET_WIDTH_64    8
+
+static int
+nfp_compute_bar(const struct nfp_bar *bar, uint32_t *bar_config,
+		uint64_t *bar_base, int tgt, int act, int tok,
+		uint64_t offset, size_t size, int width)
+{
+	uint32_t bitsize;
+	uint32_t newcfg;
+	uint64_t mask;
+
+	if (tgt >= 16)
+		return -EINVAL;
+
+	switch (width) {
+	case 8:
+		newcfg =
+		    NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT
+		    (NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_64BIT);
+		break;
+	case 4:
+		newcfg =
+		    NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT
+		    (NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_32BIT);
+		break;
+	case 0:
+		newcfg =
+		    NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT
+		    (NFP_PCIE_BAR_PCIE2CPP_LENGTHSELECT_0BYTE);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (act != NFP_CPP_ACTION_RW && act != 0) {
+		/* Fixed CPP mapping with specific action */
+		mask = ~(NFP_PCIE_P2C_FIXED_SIZE(bar) - 1);
+
+		newcfg |=
+		    NFP_PCIE_BAR_PCIE2CPP_MAPTYPE
+		    (NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_FIXED);
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TARGET_BASEADDRESS(tgt);
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_ACTION_BASEADDRESS(act);
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(tok);
+
+		if ((offset & mask) != ((offset + size - 1) & mask)) {
+			printf("BAR%d: Won't use for Fixed mapping\n",
+				bar->index);
+			printf("\t<%#llx,%#llx>, action=%d\n",
+				(unsigned long long)offset,
+				(unsigned long long)(offset + size), act);
+			printf("\tBAR too small (0x%llx).\n",
+				(unsigned long long)mask);
+			return -EINVAL;
+		}
+		offset &= mask;
+
+#ifdef DEBUG
+		printf("BAR%d: Created Fixed mapping\n", bar->index);
+		printf("\t%d:%d:%d:%#llx-%#llx\n", tgt, act, tok,
+			(unsigned long long)offset,
+			(unsigned long long)(offset + ~mask));
+#endif
+
+		bitsize = 40 - 16;
+	} else {
+		mask = ~(NFP_PCIE_P2C_BULK_SIZE(bar) - 1);
+
+		/* Bulk mapping */
+		newcfg |=
+		    NFP_PCIE_BAR_PCIE2CPP_MAPTYPE
+		    (NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_BULK);
+
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TARGET_BASEADDRESS(tgt);
+		newcfg |= NFP_PCIE_BAR_PCIE2CPP_TOKEN_BASEADDRESS(tok);
+
+		if ((offset & mask) != ((offset + size - 1) & mask)) {
+			printf("BAR%d: Won't use for bulk mapping\n",
+				bar->index);
+			printf("\t<%#llx,%#llx>\n", (unsigned long long)offset,
+				(unsigned long long)(offset + size));
+			printf("\ttarget=%d, token=%d\n", tgt, tok);
+			printf("\tBAR too small (%#llx) - (%#llx != %#llx).\n",
+				(unsigned long long)mask,
+				(unsigned long long)(offset & mask),
+				(unsigned long long)((offset + size - 1) & mask));
+
+			return -EINVAL;
+		}
+
+		offset &= mask;
+
+#ifdef DEBUG
+		printf("BAR%d: Created bulk mapping %d:x:%d:%#llx-%#llx\n",
+			bar->index, tgt, tok, (unsigned long long)offset,
+			(unsigned long long)(offset + ~mask));
+#endif
+
+		bitsize = 40 - 21;
+	}
+
+	if (bar->bitsize < bitsize) {
+		printf("BAR%d: Too small for %d:%d:%d\n", bar->index, tgt, act,
+			tok);
+		return -EINVAL;
+	}
+
+	newcfg |= offset >> bitsize;
+
+	if (bar_base)
+		*bar_base = offset;
+
+	if (bar_config)
+		*bar_config = newcfg;
+
+	return 0;
+}
+
+static int
+nfp_bar_write(struct nfp_pcie_user *nfp, struct nfp_bar *bar,
+		  uint32_t newcfg)
+{
+	int base, slot;
+
+	base = bar->index >> 3;
+	slot = bar->index & 7;
+
+	if (!nfp->cfg)
+		return (-ENOMEM);
+
+	bar->csr = nfp->cfg +
+		   NFP_PCIE_CFG_BAR_PCIETOCPPEXPBAR(base, slot);
+
+	*(uint32_t *)(bar->csr) = newcfg;
+
+	bar->barcfg = newcfg;
+#ifdef DEBUG
+	printf("BAR%d: updated to 0x%08x\n", bar->index, newcfg);
+#endif
+
+	return 0;
+}
+
+static int
+nfp_reconfigure_bar(struct nfp_pcie_user *nfp, struct nfp_bar *bar, int tgt,
+		int act, int tok, uint64_t offset, size_t size, int width)
+{
+	uint64_t newbase;
+	uint32_t newcfg;
+	int err;
+
+	err = nfp_compute_bar(bar, &newcfg, &newbase, tgt, act, tok, offset,
+			      size, width);
+	if (err)
+		return err;
+
+	bar->base = newbase;
+
+	return nfp_bar_write(nfp, bar, newcfg);
+}
+
+/*
+ * Map all PCI bars. We assume that the BAR with the PCIe config block is
+ * already mapped.
+ *
+ * BAR0.0: Reserved for General Mapping (for MSI-X access to PCIe SRAM)
+ */
+static int
+nfp_enable_bars(struct nfp_pcie_user *nfp)
+{
+	struct nfp_bar *bar;
+	int x;
+
+	for (x = ARRAY_SIZE(nfp->bar); x > 0; x--) {
+		bar = &nfp->bar[x - 1];
+		bar->barcfg = 0;
+		bar->nfp = nfp;
+		bar->index = x;
+		bar->mask = (1 << (nfp->barsz - 3)) - 1;
+		bar->bitsize = nfp->barsz - 3;
+		bar->base = 0;
+		bar->iomem = NULL;
+		bar->lock = 0;
+		bar->csr = nfp->cfg +
+			   NFP_PCIE_CFG_BAR_PCIETOCPPEXPBAR(bar->index >> 3,
+							   bar->index & 7);
+		bar->iomem =
+		    (char *)mmap(0, 1 << bar->bitsize, PROT_READ | PROT_WRITE,
+				 MAP_SHARED, nfp->device,
+				 bar->index << bar->bitsize);
+
+		if (bar->iomem == MAP_FAILED)
+			return (-ENOMEM);
+	}
+	return 0;
+}
+
+static struct nfp_bar *
+nfp_alloc_bar(struct nfp_pcie_user *nfp)
+{
+	struct nfp_bar *bar;
+	int x;
+
+	for (x = ARRAY_SIZE(nfp->bar); x > 0; x--) {
+		bar = &nfp->bar[x - 1];
+		if (!bar->lock) {
+			bar->lock = 1;
+			return bar;
+		}
+	}
+	return NULL;
+}
+
+static void
+nfp_disable_bars(struct nfp_pcie_user *nfp)
+{
+	struct nfp_bar *bar;
+	int x;
+
+	for (x = ARRAY_SIZE(nfp->bar); x > 0; x--) {
+		bar = &nfp->bar[x - 1];
+		if (bar->iomem) {
+			munmap(bar->iomem, 1 << (nfp->barsz - 3));
+			bar->iomem = NULL;
+			bar->lock = 0;
+		}
+	}
+}
+
+/*
+ * Generic CPP bus access interface.
+ */
+
+struct nfp6000_area_priv {
+	struct nfp_bar *bar;
+	uint32_t bar_offset;
+
+	uint32_t target;
+	uint32_t action;
+	uint32_t token;
+	uint64_t offset;
+	struct {
+		int read;
+		int write;
+		int bar;
+	} width;
+	size_t size;
+	char *iomem;
+};
+
+static int
+nfp6000_area_init(struct nfp_cpp_area *area, uint32_t dest,
+		  unsigned long long address, unsigned long size)
+{
+	struct nfp_pcie_user *nfp = nfp_cpp_priv(nfp_cpp_area_cpp(area));
+	struct nfp6000_area_priv *priv = nfp_cpp_area_priv(area);
+	uint32_t target = NFP_CPP_ID_TARGET_of(dest);
+	uint32_t action = NFP_CPP_ID_ACTION_of(dest);
+	uint32_t token = NFP_CPP_ID_TOKEN_of(dest);
+	int pp, ret = 0;
+
+	pp = nfp6000_target_pushpull(NFP_CPP_ID(target, action, token),
+				     address);
+	if (pp < 0)
+		return pp;
+
+	priv->width.read = PUSH_WIDTH(pp);
+	priv->width.write = PULL_WIDTH(pp);
+
+	if (priv->width.read > 0 &&
+	    priv->width.write > 0 && priv->width.read != priv->width.write)
+		return -EINVAL;
+
+	if (priv->width.read > 0)
+		priv->width.bar = priv->width.read;
+	else
+		priv->width.bar = priv->width.write;
+
+	priv->bar = nfp_alloc_bar(nfp);
+	if (priv->bar == NULL)
+		return -ENOMEM;
+
+	priv->target = target;
+	priv->action = action;
+	priv->token = token;
+	priv->offset = address;
+	priv->size = size;
+
+	ret = nfp_reconfigure_bar(nfp, priv->bar, priv->target, priv->action,
+				  priv->token, priv->offset, priv->size,
+				  priv->width.bar);
+
+	return ret;
+}
+
+static int
+nfp6000_area_acquire(struct nfp_cpp_area *area)
+{
+	struct nfp6000_area_priv *priv = nfp_cpp_area_priv(area);
+
+	/* Calculate offset into BAR. */
+	if (nfp_bar_maptype(priv->bar) ==
+	    NFP_PCIE_BAR_PCIE2CPP_MAPTYPE_GENERAL) {
+		priv->bar_offset = priv->offset &
+			(NFP_PCIE_P2C_GENERAL_SIZE(priv->bar) - 1);
+		priv->bar_offset +=
+			NFP_PCIE_P2C_GENERAL_TARGET_OFFSET(priv->bar,
+							   priv->target);
+		priv->bar_offset +=
+		    NFP_PCIE_P2C_GENERAL_TOKEN_OFFSET(priv->bar, priv->token);
+	} else {
+		priv->bar_offset = priv->offset & priv->bar->mask;
+	}
+
+	/* Must have been too big. Sub-allocate. */
+	if (!priv->bar->iomem)
+		return (-ENOMEM);
+
+	priv->iomem = priv->bar->iomem + priv->bar_offset;
+
+	return 0;
+}
+
+static void *
+nfp6000_area_mapped(struct nfp_cpp_area *area)
+{
+	struct nfp6000_area_priv *area_priv = nfp_cpp_area_priv(area);
+
+	if (!area_priv->iomem)
+		return NULL;
+
+	return area_priv->iomem;
+}
+
+static void
+nfp6000_area_release(struct nfp_cpp_area *area)
+{
+	struct nfp6000_area_priv *priv = nfp_cpp_area_priv(area);
+	priv->bar->lock = 0;
+	priv->bar = NULL;
+	priv->iomem = NULL;
+}
+
+static void *
+nfp6000_area_iomem(struct nfp_cpp_area *area)
+{
+	struct nfp6000_area_priv *priv = nfp_cpp_area_priv(area);
+	return priv->iomem;
+}
+
+static int
+nfp6000_area_read(struct nfp_cpp_area *area, void *kernel_vaddr,
+		  unsigned long offset, unsigned int length)
+{
+	uint64_t *wrptr64 = kernel_vaddr;
+	const volatile uint64_t *rdptr64;
+	struct nfp6000_area_priv *priv;
+	uint32_t *wrptr32 = kernel_vaddr;
+	const volatile uint32_t *rdptr32;
+	int width;
+	unsigned int n;
+	bool is_64;
+
+	priv = nfp_cpp_area_priv(area);
+	rdptr64 = (uint64_t *)(priv->iomem + offset);
+	rdptr32 = (uint32_t *)(priv->iomem + offset);
+
+	if (offset + length > priv->size)
+		return -EFAULT;
+
+	width = priv->width.read;
+
+	if (width <= 0)
+		return -EINVAL;
+
+	/* Unaligned accesses are not supported here; reject them */
+	if ((priv->offset + offset) & (width - 1)) {
+		printf("area_read unaligned!!!\n");
+		return -EINVAL;
+	}
+
+	is_64 = width == TARGET_WIDTH_64;
+
+	/* MU reads via a PCIe2CPP BAR support 32bit (and other) lengths */
+	if (priv->target == (NFP_CPP_TARGET_ID_MASK & NFP_CPP_TARGET_MU) &&
+	    priv->action == NFP_CPP_ACTION_RW) {
+		is_64 = false;
+	}
+
+	if (is_64) {
+		if (offset % sizeof(uint64_t) != 0 ||
+		    length % sizeof(uint64_t) != 0)
+			return -EINVAL;
+	} else {
+		if (offset % sizeof(uint32_t) != 0 ||
+		    length % sizeof(uint32_t) != 0)
+			return -EINVAL;
+	}
+
+	if (!priv->bar)
+		return -EFAULT;
+
+	if (is_64)
+		for (n = 0; n < length; n += sizeof(uint64_t)) {
+			*wrptr64 = *rdptr64;
+			wrptr64++;
+			rdptr64++;
+		}
+	else
+		for (n = 0; n < length; n += sizeof(uint32_t)) {
+			*wrptr32 = *rdptr32;
+			wrptr32++;
+			rdptr32++;
+		}
+
+	return n;
+}
+
+static int
+nfp6000_area_write(struct nfp_cpp_area *area, const void *kernel_vaddr,
+		   unsigned long offset, unsigned int length)
+{
+	const uint64_t *rdptr64 = kernel_vaddr;
+	uint64_t *wrptr64;
+	const uint32_t *rdptr32 = kernel_vaddr;
+	struct nfp6000_area_priv *priv;
+	uint32_t *wrptr32;
+	int width;
+	unsigned int n;
+	bool is_64;
+
+	priv = nfp_cpp_area_priv(area);
+	wrptr64 = (uint64_t *)(priv->iomem + offset);
+	wrptr32 = (uint32_t *)(priv->iomem + offset);
+
+	if (offset + length > priv->size)
+		return -EFAULT;
+
+	width = priv->width.write;
+
+	if (width <= 0)
+		return -EINVAL;
+
+	/* Unaligned accesses are not supported here; reject them */
+	if ((priv->offset + offset) & (width - 1))
+		return -EINVAL;
+
+	is_64 = width == TARGET_WIDTH_64;
+
+	/* MU writes via a PCIe2CPP BAR support 32bit (and other) lengths */
+	if (priv->target == (NFP_CPP_TARGET_ID_MASK & NFP_CPP_TARGET_MU) &&
+	    priv->action == NFP_CPP_ACTION_RW)
+		is_64 = false;
+
+	if (is_64) {
+		if (offset % sizeof(uint64_t) != 0 ||
+		    length % sizeof(uint64_t) != 0)
+			return -EINVAL;
+	} else {
+		if (offset % sizeof(uint32_t) != 0 ||
+		    length % sizeof(uint32_t) != 0)
+			return -EINVAL;
+	}
+
+	if (!priv->bar)
+		return -EFAULT;
+
+	if (is_64)
+		for (n = 0; n < length; n += sizeof(uint64_t)) {
+			*wrptr64 = *rdptr64;
+			wrptr64++;
+			rdptr64++;
+		}
+	else
+		for (n = 0; n < length; n += sizeof(uint32_t)) {
+			*wrptr32 = *rdptr32;
+			wrptr32++;
+			rdptr32++;
+		}
+
+	return n;
+}
+
+#define PCI_DEVICES "/sys/bus/pci/devices"
+
+static int
+nfp_acquire_process_lock(struct nfp_pcie_user *desc)
+{
+	int rc;
+	struct flock lock;
+	char lockname[30];
+
+	memset(&lock, 0, sizeof(lock));
+
+	snprintf(lockname, sizeof(lockname), "/var/lock/nfp_%s", desc->busdev);
+	desc->lock = open(lockname, O_RDWR | O_CREAT, 0666);
+	if (desc->lock < 0)
+		return desc->lock;
+
+	lock.l_type = F_WRLCK;
+	lock.l_whence = SEEK_SET;
+	rc = -1;
+	while (rc != 0) {
+		rc = fcntl(desc->lock, F_SETLKW, &lock);
+		if (rc < 0) {
+			if (errno != EAGAIN && errno != EACCES) {
+				close(desc->lock);
+				return rc;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int
+nfp6000_set_model(struct nfp_pcie_user *desc, struct nfp_cpp *cpp)
+{
+	char tmp_str[80];
+	uint32_t tmp;
+	int fp;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/config", PCI_DEVICES,
+		 desc->busdev);
+
+	fp = open(tmp_str, O_RDONLY);
+	if (fp < 0)
+		return -1;
+
+	lseek(fp, 0x2e, SEEK_SET);
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("Error reading config file for model\n");
+		return -1;
+	}
+
+	tmp = tmp << 16;
+
+	if (close(fp) == -1)
+		return -1;
+
+	nfp_cpp_model_set(cpp, tmp);
+
+	return 0;
+}
+
+static int
+nfp6000_set_interface(struct nfp_pcie_user *desc, struct nfp_cpp *cpp)
+{
+	char tmp_str[80];
+	uint16_t tmp;
+	int fp;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/config", PCI_DEVICES,
+		 desc->busdev);
+
+	fp = open(tmp_str, O_RDONLY);
+	if (fp < 0)
+		return -1;
+
+	lseek(fp, 0x154, SEEK_SET);
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("error reading config file for interface\n");
+		return -1;
+	}
+
+	if (close(fp) == -1)
+		return -1;
+
+	nfp_cpp_interface_set(cpp, tmp);
+
+	return 0;
+}
+
+#define PCI_CFG_SPACE_SIZE	256
+#define PCI_CFG_SPACE_EXP_SIZE	4096
+#define PCI_EXT_CAP_ID(header)		((int)((header) & 0x0000ffff))
+#define PCI_EXT_CAP_NEXT(header)	(((header) >> 20) & 0xffc)
+#define PCI_EXT_CAP_ID_DSN	0x03
+static int
+nfp_pci_find_next_ext_capability(int fp, int cap)
+{
+	uint32_t header;
+	int ttl;
+	int pos = PCI_CFG_SPACE_SIZE;
+
+	/* minimum 8 bytes per capability */
+	ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;
+
+	lseek(fp, pos, SEEK_SET);
+	if (read(fp, &header, sizeof(header)) != sizeof(header)) {
+		printf("error reading config file for serial\n");
+		return -1;
+	}
+
+	/*
+	 * If we have no capabilities, this is indicated by cap ID,
+	 * cap version and next pointer all being 0.
+	 */
+	if (header == 0)
+		return 0;
+
+	while (ttl-- > 0) {
+		if (PCI_EXT_CAP_ID(header) == cap)
+			return pos;
+
+		pos = PCI_EXT_CAP_NEXT(header);
+		if (pos < PCI_CFG_SPACE_SIZE)
+			break;
+
+		lseek(fp, pos, SEEK_SET);
+		if (read(fp, &header, sizeof(header)) != sizeof(header)) {
+			printf("error reading config file for serial\n");
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+static int
+nfp6000_set_serial(struct nfp_pcie_user *desc, struct nfp_cpp *cpp)
+{
+	char tmp_str[80];
+	uint16_t tmp;
+	uint8_t serial[6];
+	int serial_len = 6;
+	int fp, pos;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/config", PCI_DEVICES,
+		 desc->busdev);
+
+	fp = open(tmp_str, O_RDONLY);
+	if (fp < 0)
+		return -1;
+
+	pos = nfp_pci_find_next_ext_capability(fp, PCI_EXT_CAP_ID_DSN);
+	if (pos <= 0) {
+		printf("PCI_EXT_CAP_ID_DSN not found. Using default offset\n");
+		lseek(fp, 0x156, SEEK_SET);
+	} else {
+		lseek(fp, pos + 6, SEEK_SET);
+	}
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("error reading config file for serial\n");
+		return -1;
+	}
+
+	serial[4] = (uint8_t)((tmp >> 8) & 0xff);
+	serial[5] = (uint8_t)(tmp & 0xff);
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("error reading config file for serial\n");
+		return -1;
+	}
+
+	serial[2] = (uint8_t)((tmp >> 8) & 0xff);
+	serial[3] = (uint8_t)(tmp & 0xff);
+
+	if (read(fp, &tmp, sizeof(tmp)) != sizeof(tmp)) {
+		printf("error reading config file for serial\n");
+		return -1;
+	}
+
+	serial[0] = (uint8_t)((tmp >> 8) & 0xff);
+	serial[1] = (uint8_t)(tmp & 0xff);
+
+	if (close(fp) == -1)
+		return -1;
+
+	nfp_cpp_serial_set(cpp, serial, serial_len);
+
+	return 0;
+}
+
+static int
+nfp6000_set_barsz(struct nfp_pcie_user *desc)
+{
+	char tmp_str[80];
+	unsigned long start, end, flags, tmp;
+	int i;
+	FILE *fp;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/resource", PCI_DEVICES,
+		 desc->busdev);
+
+	fp = fopen(tmp_str, "r");
+	if (!fp)
+		return -1;
+
+	if (fscanf(fp, "0x%lx 0x%lx 0x%lx", &start, &end, &flags) != 3) {
+		printf("error reading resource file for bar size\n");
+		return -1;
+	}
+
+	if (fclose(fp) == -1)
+		return -1;
+
+	tmp = (end - start) + 1;
+	i = 0;
+	while (tmp >>= 1)
+		i++;
+	desc->barsz = i;
+	return 0;
+}
+
+static int
+nfp6000_init(struct nfp_cpp *cpp, const char *devname)
+{
+	char link[120];
+	char tmp_str[80];
+	ssize_t size;
+	int ret = 0;
+	uint32_t model;
+	struct nfp_pcie_user *desc;
+
+	desc = malloc(sizeof(*desc));
+	if (!desc)
+		return -1;
+
+
+	memset(desc->busdev, 0, BUSDEV_SZ);
+	strncpy(desc->busdev, devname, BUSDEV_SZ - 1);
+
+	ret = nfp_acquire_process_lock(desc);
+	if (ret)
+		return -1;
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/driver", PCI_DEVICES,
+		 desc->busdev);
+
+	size = readlink(tmp_str, link, sizeof(link));
+
+	if (size == -1)
+		tmp_str[0] = '\0';
+
+	if (size == sizeof(link))
+		tmp_str[0] = '\0';
+
+	snprintf(tmp_str, sizeof(tmp_str), "%s/%s/resource0", PCI_DEVICES,
+		 desc->busdev);
+
+	desc->device = open(tmp_str, O_RDWR);
+	if (desc->device == -1)
+		return -1;
+
+	if (nfp6000_set_model(desc, cpp) < 0)
+		return -1;
+	if (nfp6000_set_interface(desc, cpp) < 0)
+		return -1;
+	if (nfp6000_set_serial(desc, cpp) < 0)
+		return -1;
+	if (nfp6000_set_barsz(desc) < 0)
+		return -1;
+
+	desc->cfg = (char *)mmap(0, 1 << (desc->barsz - 3),
+				 PROT_READ | PROT_WRITE,
+				 MAP_SHARED, desc->device, 0);
+
+	if (desc->cfg == MAP_FAILED)
+		return -1;
+
+	nfp_enable_bars(desc);
+
+	nfp_cpp_priv_set(cpp, desc);
+
+	model = __nfp_cpp_model_autodetect(cpp);
+	nfp_cpp_model_set(cpp, model);
+
+	return ret;
+}
+
+static void
+nfp6000_free(struct nfp_cpp *cpp)
+{
+	struct nfp_pcie_user *desc = nfp_cpp_priv(cpp);
+	int x;
+
+	/* Unmapping may cause problems if there are pending transactions */
+	nfp_disable_bars(desc);
+	munmap(desc->cfg, 1 << (desc->barsz - 3));
+
+	for (x = ARRAY_SIZE(desc->bar); x > 0; x--) {
+		if (desc->bar[x - 1].iomem)
+			munmap(desc->bar[x - 1].iomem, 1 << (desc->barsz - 3));
+	}
+	close(desc->lock);
+	close(desc->device);
+	free(desc);
+}
+
+static const struct nfp_cpp_operations nfp6000_pcie_ops = {
+	.init = nfp6000_init,
+	.free = nfp6000_free,
+
+	.area_priv_size = sizeof(struct nfp6000_area_priv),
+	.area_init = nfp6000_area_init,
+	.area_acquire = nfp6000_area_acquire,
+	.area_release = nfp6000_area_release,
+	.area_mapped = nfp6000_area_mapped,
+	.area_read = nfp6000_area_read,
+	.area_write = nfp6000_area_write,
+	.area_iomem = nfp6000_area_iomem,
+};
+
+const struct nfp_cpp_operations *
+nfp_cpp_transport_operations(void)
+{
+	return &nfp6000_pcie_ops;
+}
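
For readers following the bulk-mapping branch of nfp_compute_bar() in the
file above, here is a minimal standalone sketch of that arithmetic. It is
not part of the patch. The SK_* macros mirror the field encoders in the
patch; the function name sketch_bulk_bar_cfg(), the fixed 32-bit length
select, the -1 return value (the patch returns -EINVAL), and the extra
guards on size and on the shift width are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field encoders, mirroring the macros in nfp_cpp_pcie_ops.c above. */
#define SK_LENGTHSELECT(x)	(((uint32_t)(x) & 0x3) << 27)
#define SK_MAPTYPE(x)		(((uint32_t)(x) & 0x7) << 29)
#define SK_TARGET(x)		(((uint32_t)(x) & 0xf) << 23)
#define SK_TOKEN(x)		(((uint32_t)(x) & 0x3) << 21)
#define SK_LENGTHSELECT_32BIT	0
#define SK_MAPTYPE_BULK		1
#define SK_BULK_SHIFT		(40 - 21)	/* same value the patch uses */

/*
 * Hypothetical helper: compute the config word and the window base for a
 * bulk mapping through a BAR aperture of (1 << bar_bitsize) bytes.
 * Returns 0 on success, -1 if the request cannot be served.
 */
static inline int
sketch_bulk_bar_cfg(uint32_t bar_bitsize, int tgt, int tok, uint64_t offset,
		    size_t size, uint32_t *cfg, uint64_t *base)
{
	uint64_t mask;

	assert(cfg != NULL && base != NULL);

	/* The size and upper bitsize guards are additions of this sketch. */
	if (tgt < 0 || tgt >= 16 || size == 0)
		return -1;
	if (bar_bitsize < SK_BULK_SHIFT || bar_bitsize >= 64)
		return -1;

	/* Mask that selects the aperture-aligned window of an address. */
	mask = ~((UINT64_C(1) << bar_bitsize) - 1);

	/* First and last byte must fall inside the same window. */
	if ((offset & mask) != ((offset + size - 1) & mask))
		return -1;

	offset &= mask;
	*base = offset;
	*cfg = SK_LENGTHSELECT(SK_LENGTHSELECT_32BIT) |
	       SK_MAPTYPE(SK_MAPTYPE_BULK) |
	       SK_TARGET(tgt) | SK_TOKEN(tok) |
	       (uint32_t)(offset >> SK_BULK_SHIFT);

	return 0;
}
```

With a 1 MiB aperture (bar_bitsize = 20), a request that straddles a 1 MiB
boundary is rejected, which corresponds to the "Won't use for bulk mapping"
path in the patch.
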
diff --git a/drivers/net/nfp/nfpcore/nfp_cppcore.c b/drivers/net/nfp/nfpcore/nfp_cppcore.c
new file mode 100644
index 0000000..f110b8f
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_cppcore.c
@@ -0,0 +1,901 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <sys/types.h>
+
+#include <rte_byteorder.h>
+
+#include "nfp_cpp.h"
+#include "nfp_target.h"
+#include "nfp6000/nfp6000.h"
+#include "nfp6000/nfp_xpb.h"
+#include "nfp_nffw.h"
+
+#define NFP_PL_DEVICE_ID                        0x00000004
+#define NFP_PL_DEVICE_ID_MASK                   0xff
+
+#define NFP6000_ARM_GCSR_SOFTMODEL0             0x00400144
+
+void
+nfp_cpp_priv_set(struct nfp_cpp *cpp, void *priv)
+{
+	cpp->priv = priv;
+}
+
+void *
+nfp_cpp_priv(struct nfp_cpp *cpp)
+{
+	return cpp->priv;
+}
+
+void
+nfp_cpp_model_set(struct nfp_cpp *cpp, uint32_t model)
+{
+	cpp->model = model;
+}
+
+uint32_t
+nfp_cpp_model(struct nfp_cpp *cpp)
+{
+	if (!cpp)
+		return NFP_CPP_MODEL_INVALID;
+
+	if (cpp->model == 0)
+		cpp->model = __nfp_cpp_model_autodetect(cpp);
+
+	return cpp->model;
+}
+
+void
+nfp_cpp_interface_set(struct nfp_cpp *cpp, uint32_t interface)
+{
+	cpp->interface = interface;
+}
+
+int
+nfp_cpp_serial(struct nfp_cpp *cpp, const uint8_t **serial)
+{
+	*serial = cpp->serial;
+	return cpp->serial_len;
+}
+
+int
+nfp_cpp_serial_set(struct nfp_cpp *cpp, const uint8_t *serial,
+		   size_t serial_len)
+{
+	if (cpp->serial_len)
+		free(cpp->serial);
+
+	cpp->serial = malloc(serial_len);
+	if (!cpp->serial)
+		return -1;
+
+	memcpy(cpp->serial, serial, serial_len);
+	cpp->serial_len = serial_len;
+
+	return 0;
+}
+
+uint16_t
+nfp_cpp_interface(struct nfp_cpp *cpp)
+{
+	if (!cpp)
+		return NFP_CPP_INTERFACE(NFP_CPP_INTERFACE_TYPE_INVALID, 0, 0);
+
+	return cpp->interface;
+}
+
+void *
+nfp_cpp_area_priv(struct nfp_cpp_area *cpp_area)
+{
+	return &cpp_area[1];
+}
+
+struct nfp_cpp *
+nfp_cpp_area_cpp(struct nfp_cpp_area *cpp_area)
+{
+	return cpp_area->cpp;
+}
+
+const char *
+nfp_cpp_area_name(struct nfp_cpp_area *cpp_area)
+{
+	return cpp_area->name;
+}
+
+/*
+ * nfp_cpp_area_alloc_with_name - allocate a new, named CPP area
+ * @cpp:    CPP handle
+ * @dest:   CPP id
+ * @address:    start address on CPP target
+ * @size:   size of area in bytes
+ *
+ * Allocate and initialize a CPP area structure.  The area must later
+ * be locked down with an 'acquire' before it can be safely accessed.
+ *
+ * NOTE: @address and @size must be 32-bit aligned values.
+ */
+struct nfp_cpp_area *
+nfp_cpp_area_alloc_with_name(struct nfp_cpp *cpp, uint32_t dest,
+			      const char *name, unsigned long long address,
+			      unsigned long size)
+{
+	struct nfp_cpp_area *area;
+	uint64_t tmp64 = (uint64_t)address;
+	int tmp, err;
+
+	if (!cpp)
+		return NULL;
+
+	/* CPP bus uses only a 40-bit address */
+	if ((address + size) > (1ULL << 40))
+		return NFP_ERRPTR(EFAULT);
+
+	/* Remap from cpp_island to cpp_target */
+	err = nfp_target_cpp(dest, tmp64, &dest, &tmp64, cpp->imb_cat_table);
+	if (err < 0)
+		return NULL;
+
+	address = (unsigned long long)tmp64;
+
+	if (!name)
+		name = "";
+
+	area = calloc(1, sizeof(*area) + cpp->op->area_priv_size +
+		      strlen(name) + 1);
+	if (!area)
+		return NULL;
+
+	area->cpp = cpp;
+	area->name = ((char *)area) + sizeof(*area) + cpp->op->area_priv_size;
+	memcpy(area->name, name, strlen(name) + 1);
+
+	/*
+	 * Preserve errno around the call to area_init, since most
+	 * implementations will blindly call nfp_target_action_width() for
+	 * both read and write modes, and that will set errno to EINVAL.
+	 */
+	tmp = errno;
+
+	err = cpp->op->area_init(area, dest, address, size);
+	if (err < 0) {
+		free(area);
+		return NULL;
+	}
+
+	/* Restore errno */
+	errno = tmp;
+
+	area->offset = address;
+	area->size = size;
+
+	return area;
+}
+
+struct nfp_cpp_area *
+nfp_cpp_area_alloc(struct nfp_cpp *cpp, uint32_t dest,
+		    unsigned long long address, unsigned long size)
+{
+	return nfp_cpp_area_alloc_with_name(cpp, dest, NULL, address, size);
+}
+
+/*
+ * nfp_cpp_area_alloc_acquire - allocate a new CPP area and lock it down
+ *
+ * @cpp:    CPP handle
+ * @dest:   CPP id
+ * @address:    start address on CPP target
+ * @size:   size of area
+ *
+ * Allocate and initialize a CPP area structure, and lock it down so
+ * that it can be accessed directly.
+ *
+ * NOTE: @address and @size must be 32-bit aligned values.
+ *
+ * NOTE: The area must also be 'released' when the structure is freed.
+ */
+struct nfp_cpp_area *
+nfp_cpp_area_alloc_acquire(struct nfp_cpp *cpp, uint32_t destination,
+			    unsigned long long address, unsigned long size)
+{
+	struct nfp_cpp_area *area;
+
+	area = nfp_cpp_area_alloc(cpp, destination, address, size);
+	if (!area)
+		return NULL;
+
+	if (nfp_cpp_area_acquire(area)) {
+		nfp_cpp_area_free(area);
+		return NULL;
+	}
+
+	return area;
+}
+
+/*
+ * nfp_cpp_area_free - free up the CPP area
+ * @area:   CPP area handle
+ *
+ * Frees up memory resources held by the CPP area.
+ */
+void
+nfp_cpp_area_free(struct nfp_cpp_area *area)
+{
+	if (area->cpp->op->area_cleanup)
+		area->cpp->op->area_cleanup(area);
+	free(area);
+}
+
+/*
+ * nfp_cpp_area_release_free - release CPP area and free it
+ * @area:   CPP area handle
+ *
+ * Releases CPP area and frees up memory resources held by it.
+ */
+void
+nfp_cpp_area_release_free(struct nfp_cpp_area *area)
+{
+	nfp_cpp_area_release(area);
+	nfp_cpp_area_free(area);
+}
+
+/*
+ * nfp_cpp_area_acquire - lock down a CPP area for access
+ * @area:   CPP area handle
+ *
+ * Locks down the CPP area for a potential long term activity.  Area
+ * must always be locked down before being accessed.
+ */
+int
+nfp_cpp_area_acquire(struct nfp_cpp_area *area)
+{
+	if (area->cpp->op->area_acquire) {
+		int err = area->cpp->op->area_acquire(area);
+
+		if (err < 0)
+			return -1;
+	}
+
+	return 0;
+}
+
+/*
+ * nfp_cpp_area_release - release a locked down CPP area
+ * @area:   CPP area handle
+ *
+ * Releases a previously locked down CPP area.
+ */
+void
+nfp_cpp_area_release(struct nfp_cpp_area *area)
+{
+	if (area->cpp->op->area_release)
+		area->cpp->op->area_release(area);
+}
+
+/*
+ * nfp_cpp_area_iomem() - get IOMEM region for CPP area
+ *
+ * @area:       CPP area handle
+ *
+ * Returns an iomem pointer for use with readl()/writel() style operations.
+ *
+ * NOTE: Area must have been locked down with an 'acquire'.
+ *
+ * Return: pointer to the area, or NULL
+ */
+void *
+nfp_cpp_area_iomem(struct nfp_cpp_area *area)
+{
+	void *iomem = NULL;
+
+	if (area->cpp->op->area_iomem)
+		iomem = area->cpp->op->area_iomem(area);
+
+	return iomem;
+}
+
+/*
+ * nfp_cpp_area_read - read data from CPP area
+ *
+ * @area:       CPP area handle
+ * @offset:     offset into CPP area
+ * @kernel_vaddr:   kernel address to put data into
+ * @length:     number of bytes to read
+ *
+ * Read data from indicated CPP region.
+ *
+ * NOTE: @offset and @length must be 32-bit aligned values.
+ *
+ * NOTE: Area must have been locked down with an 'acquire'.
+ */
+int
+nfp_cpp_area_read(struct nfp_cpp_area *area, unsigned long offset,
+		  void *kernel_vaddr, size_t length)
+{
+	if ((offset + length) > area->size)
+		return NFP_ERRNO(EFAULT);
+
+	return area->cpp->op->area_read(area, kernel_vaddr, offset, length);
+}
+
+/*
+ * nfp_cpp_area_write - write data to CPP area
+ *
+ * @area:       CPP area handle
+ * @offset:     offset into CPP area
+ * @kernel_vaddr:   kernel address to read data from
+ * @length:     number of bytes to write
+ *
+ * Write data to indicated CPP region.
+ *
+ * NOTE: @offset and @length must be 32-bit aligned values.
+ *
+ * NOTE: Area must have been locked down with an 'acquire'.
+ */
+int
+nfp_cpp_area_write(struct nfp_cpp_area *area, unsigned long offset,
+		   const void *kernel_vaddr, size_t length)
+{
+	if ((offset + length) > area->size)
+		return NFP_ERRNO(EFAULT);
+
+	return area->cpp->op->area_write(area, kernel_vaddr, offset, length);
+}
+
+void *
+nfp_cpp_area_mapped(struct nfp_cpp_area *area)
+{
+	if (area->cpp->op->area_mapped)
+		return area->cpp->op->area_mapped(area);
+	return NULL;
+}
+
+/*
+ * nfp_cpp_area_check_range - check if address range fits in CPP area
+ *
+ * @area:   CPP area handle
+ * @offset: offset into CPP area
+ * @length: size of address range in bytes
+ *
+ * Check if address range fits within CPP area.  Return 0 if area fits
+ * or -1 on error.
+ */
+int
+nfp_cpp_area_check_range(struct nfp_cpp_area *area, unsigned long long offset,
+			 unsigned long length)
+{
+	if (((offset + length) > area->size))
+		return NFP_ERRNO(EFAULT);
+
+	return 0;
+}
+
+/*
+ * Return the correct CPP address, and fixup xpb_addr as needed,
+ * based upon NFP model.
+ */
+static uint32_t
+nfp_xpb_to_cpp(struct nfp_cpp *cpp, uint32_t *xpb_addr)
+{
+	uint32_t xpb;
+	int island;
+
+	if (!NFP_CPP_MODEL_IS_6000(cpp->model))
+		return 0;
+
+	xpb = NFP_CPP_ID(14, NFP_CPP_ACTION_RW, 0);
+
+	/*
+	 * Ensure that non-local XPB accesses go out through the
+	 * global XPBM bus.
+	 */
+	island = ((*xpb_addr) >> 24) & 0x3f;
+
+	if (!island)
+		return xpb;
+
+	if (island == 1) {
+		/*
+		 * Accesses to the ARM Island overlay use Island 0
+		 * Global Bit
+		 */
+		(*xpb_addr) &= ~0x7f000000;
+		if (*xpb_addr < 0x60000)
+			*xpb_addr |= (1 << 30);
+		else
+			/* And only non-ARM interfaces use island id = 1 */
+			if (NFP_CPP_INTERFACE_TYPE_of(nfp_cpp_interface(cpp)) !=
+			    NFP_CPP_INTERFACE_TYPE_ARM)
+				*xpb_addr |= (1 << 24);
+	} else {
+		(*xpb_addr) |= (1 << 30);
+	}
+
+	return xpb;
+}
+
+int
+nfp_cpp_area_readl(struct nfp_cpp_area *area, unsigned long offset,
+		   uint32_t *value)
+{
+	int sz;
+	uint32_t tmp;
+
+	sz = nfp_cpp_area_read(area, offset, &tmp, sizeof(tmp));
+	*value = rte_le_to_cpu_32(tmp);
+
+	return (sz == sizeof(*value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_area_writel(struct nfp_cpp_area *area, unsigned long offset,
+		    uint32_t value)
+{
+	int sz;
+
+	value = rte_cpu_to_le_32(value);
+	sz = nfp_cpp_area_write(area, offset, &value, sizeof(value));
+	return (sz == sizeof(value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_area_readq(struct nfp_cpp_area *area, unsigned long offset,
+		   uint64_t *value)
+{
+	int sz;
+	uint64_t tmp;
+
+	sz = nfp_cpp_area_read(area, offset, &tmp, sizeof(tmp));
+	*value = rte_le_to_cpu_64(tmp);
+
+	return (sz == sizeof(*value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_area_writeq(struct nfp_cpp_area *area, unsigned long offset,
+		    uint64_t value)
+{
+	int sz;
+
+	value = rte_cpu_to_le_64(value);
+	sz = nfp_cpp_area_write(area, offset, &value, sizeof(value));
+
+	return (sz == sizeof(value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_readl(struct nfp_cpp *cpp, uint32_t cpp_id, unsigned long long address,
+	      uint32_t *value)
+{
+	int sz;
+	uint32_t tmp;
+
+	sz = nfp_cpp_read(cpp, cpp_id, address, &tmp, sizeof(tmp));
+	*value = rte_le_to_cpu_32(tmp);
+
+	return (sz == sizeof(*value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_writel(struct nfp_cpp *cpp, uint32_t cpp_id, unsigned long long address,
+	       uint32_t value)
+{
+	int sz;
+
+	value = rte_cpu_to_le_32(value);
+	sz = nfp_cpp_write(cpp, cpp_id, address, &value, sizeof(value));
+
+	return (sz == sizeof(value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_readq(struct nfp_cpp *cpp, uint32_t cpp_id, unsigned long long address,
+	      uint64_t *value)
+{
+	int sz;
+	uint64_t tmp;
+
+	sz = nfp_cpp_read(cpp, cpp_id, address, &tmp, sizeof(tmp));
+	*value = rte_le_to_cpu_64(tmp);
+
+	return (sz == sizeof(*value)) ? 0 : -1;
+}
+
+int
+nfp_cpp_writeq(struct nfp_cpp *cpp, uint32_t cpp_id, unsigned long long address,
+	       uint64_t value)
+{
+	int sz;
+
+	value = rte_cpu_to_le_64(value);
+	sz = nfp_cpp_write(cpp, cpp_id, address, &value, sizeof(value));
+
+	return (sz == sizeof(value)) ? 0 : -1;
+}
+
+int
+nfp_xpb_writel(struct nfp_cpp *cpp, uint32_t xpb_addr, uint32_t value)
+{
+	uint32_t cpp_dest;
+
+	cpp_dest = nfp_xpb_to_cpp(cpp, &xpb_addr);
+
+	return nfp_cpp_writel(cpp, cpp_dest, xpb_addr, value);
+}
+
+int
+nfp_xpb_readl(struct nfp_cpp *cpp, uint32_t xpb_addr, uint32_t *value)
+{
+	uint32_t cpp_dest;
+
+	cpp_dest = nfp_xpb_to_cpp(cpp, &xpb_addr);
+
+	return nfp_cpp_readl(cpp, cpp_dest, xpb_addr, value);
+}
+
+static struct nfp_cpp *
+nfp_cpp_alloc(const char *devname)
+{
+	const struct nfp_cpp_operations *ops;
+	struct nfp_cpp *cpp;
+	int err;
+
+	ops = nfp_cpp_transport_operations();
+
+	if (!ops || !ops->init)
+		return NFP_ERRPTR(EINVAL);
+
+	cpp = calloc(1, sizeof(*cpp));
+	if (!cpp)
+		return NULL;
+
+	cpp->op = ops;
+
+	if (cpp->op->init) {
+		err = cpp->op->init(cpp, devname);
+		if (err < 0) {
+			free(cpp);
+			return NULL;
+		}
+	}
+
+	if (NFP_CPP_MODEL_IS_6000(nfp_cpp_model(cpp))) {
+		uint32_t xpbaddr;
+		size_t tgt;
+
+		for (tgt = 0; tgt < ARRAY_SIZE(cpp->imb_cat_table); tgt++) {
+			/* Hardcoded XPB IMB Base, island 0 */
+			xpbaddr = 0x000a0000 + (tgt * 4);
+			err = nfp_xpb_readl(cpp, xpbaddr,
+				(uint32_t *)&cpp->imb_cat_table[tgt]);
+			if (err < 0) {
+				free(cpp);
+				return NULL;
+			}
+		}
+	}
+
+	return cpp;
+}
+
+/*
+ * nfp_cpp_free - free the CPP handle
+ * @cpp:    CPP handle
+ */
+void
+nfp_cpp_free(struct nfp_cpp *cpp)
+{
+	if (cpp->op && cpp->op->free)
+		cpp->op->free(cpp);
+
+	if (cpp->serial_len)
+		free(cpp->serial);
+
+	free(cpp);
+}
+
+struct nfp_cpp *
+nfp_cpp_from_device_name(const char *devname)
+{
+	return nfp_cpp_alloc(devname);
+}
+
+/*
+ * Modify bits of a 32-bit value on the XPB bus (read-modify-write)
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt       XPB target and address
+ * @param mask          mask of bits to alter
+ * @param value         new value for the bits selected by mask
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_xpb_writelm(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t mask,
+		uint32_t value)
+{
+	int err;
+	uint32_t tmp;
+
+	err = nfp_xpb_readl(cpp, xpb_tgt, &tmp);
+	if (err < 0)
+		return err;
+
+	tmp &= ~mask;
+	tmp |= (mask & value);
+	return nfp_xpb_writel(cpp, xpb_tgt, tmp);
+}
+
+/*
+ * Wait for masked bits of a 32-bit XPB register to match a value
+ *
+ * @param cpp           NFP CPP device handle
+ * @param xpb_tgt       XPB target and address
+ * @param mask          mask of bits to compare
+ * @param value         value to monitor for
+ * @param timeout_us    maximum number of us to wait (-1 for forever)
+ *
+ * @return >= 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_xpb_waitlm(struct nfp_cpp *cpp, uint32_t xpb_tgt, uint32_t mask,
+	       uint32_t value, int timeout_us)
+{
+	uint32_t tmp;
+	int err;
+
+	do {
+		err = nfp_xpb_readl(cpp, xpb_tgt, &tmp);
+		if (err < 0)
+			goto exit;
+
+		if ((tmp & mask) == (value & mask)) {
+			if (timeout_us < 0)
+				timeout_us = 0;
+			break;
+		}
+
+		if (timeout_us < 0)
+			continue;
+
+		timeout_us -= 100;
+		usleep(100);
+	} while (timeout_us >= 0);
+
+	if (timeout_us < 0)
+		err = NFP_ERRNO(ETIMEDOUT);
+	else
+		err = timeout_us;
+
+exit:
+	return err;
+}
+
+/*
+ * nfp_cpp_read - read from CPP target
+ * @cpp:        CPP handle
+ * @destination:    CPP id
+ * @address:        offset into CPP target
+ * @kernel_vaddr:   kernel buffer for result
+ * @length:     number of bytes to read
+ */
+int
+nfp_cpp_read(struct nfp_cpp *cpp, uint32_t destination,
+	     unsigned long long address, void *kernel_vaddr, size_t length)
+{
+	struct nfp_cpp_area *area;
+	int err;
+
+	area = nfp_cpp_area_alloc_acquire(cpp, destination, address, length);
+	if (!area) {
+		printf("Area allocation/acquire failed\n");
+		return -1;
+	}
+
+	err = nfp_cpp_area_read(area, 0, kernel_vaddr, length);
+
+	nfp_cpp_area_release_free(area);
+	return err;
+}
+
+/*
+ * nfp_cpp_write - write to CPP target
+ * @cpp:        CPP handle
+ * @destination:    CPP id
+ * @address:        offset into CPP target
+ * @kernel_vaddr:   kernel buffer to read from
+ * @length:     number of bytes to write
+ */
+int
+nfp_cpp_write(struct nfp_cpp *cpp, uint32_t destination,
+	      unsigned long long address, const void *kernel_vaddr,
+	      size_t length)
+{
+	struct nfp_cpp_area *area;
+	int err;
+
+	area = nfp_cpp_area_alloc_acquire(cpp, destination, address, length);
+	if (!area)
+		return -1;
+
+	err = nfp_cpp_area_write(area, 0, kernel_vaddr, length);
+
+	nfp_cpp_area_release_free(area);
+	return err;
+}
+
+/*
+ * nfp_cpp_area_fill - fill a CPP area with a value
+ * @area:       CPP area
+ * @offset:     offset into CPP area
+ * @value:      value to fill with
+ * @length:     length of area to fill
+ */
+int
+nfp_cpp_area_fill(struct nfp_cpp_area *area, unsigned long offset,
+		  uint32_t value, size_t length)
+{
+	int err;
+	size_t i;
+	uint64_t value64;
+
+	value = rte_cpu_to_le_32(value);
+	value64 = ((uint64_t)value << 32) | value;
+
+	if ((offset + length) > area->size)
+		return NFP_ERRNO(EINVAL);
+
+	if ((area->offset + offset) & 3)
+		return NFP_ERRNO(EINVAL);
+
+	if (((area->offset + offset) & 7) == 4 && length >= 4) {
+		err = nfp_cpp_area_write(area, offset, &value, sizeof(value));
+		if (err < 0)
+			return err;
+		if (err != sizeof(value))
+			return NFP_ERRNO(ENOSPC);
+		offset += sizeof(value);
+		length -= sizeof(value);
+	}
+
+	for (i = 0; (i + sizeof(value)) < length; i += sizeof(value64)) {
+		err =
+		    nfp_cpp_area_write(area, offset + i, &value64,
+				       sizeof(value64));
+		if (err < 0)
+			return err;
+		if (err != sizeof(value64))
+			return NFP_ERRNO(ENOSPC);
+	}
+
+	if ((i + sizeof(value)) <= length) {
+		err =
+		    nfp_cpp_area_write(area, offset + i, &value, sizeof(value));
+		if (err < 0)
+			return err;
+		if (err != sizeof(value))
+			return NFP_ERRNO(ENOSPC);
+		i += sizeof(value);
+	}
+
+	return (int)i;
+}
+
+static inline uint8_t
+__nfp_bytemask_of(int width, uint64_t addr)
+{
+	uint8_t byte_mask;
+
+	if (width == 8)
+		byte_mask = 0xff;
+	else if (width == 4)
+		byte_mask = 0x0f << (addr & 4);
+	else if (width == 2)
+		byte_mask = 0x03 << (addr & 6);
+	else if (width == 1)
+		byte_mask = 0x01 << (addr & 7);
+	else
+		byte_mask = 0;
+
+	return byte_mask;
+}
+
+/*
+ * NOTE: This code should not use nfp_xpb_* functions,
+ * as those are model-specific
+ */
+uint32_t
+__nfp_cpp_model_autodetect(struct nfp_cpp *cpp)
+{
+	uint32_t arm_id = NFP_CPP_ID(NFP_CPP_TARGET_ARM, 0, 0);
+	uint32_t model = 0;
+
+	nfp_cpp_readl(cpp, arm_id, NFP6000_ARM_GCSR_SOFTMODEL0, &model);
+
+	if (NFP_CPP_MODEL_IS_6000(model)) {
+		uint32_t tmp;
+
+		nfp_cpp_model_set(cpp, model);
+
+		/* The PL's PluDeviceID revision code is authoritative */
+		model &= ~0xff;
+		nfp_xpb_readl(cpp, NFP_XPB_DEVICE(1, 1, 16) +
+				   NFP_PL_DEVICE_ID, &tmp);
+		model |= (NFP_PL_DEVICE_ID_MASK & tmp) - 0x10;
+	}
+
+	return model;
+}
+
+/*
+ * nfp_cpp_map_area() - Helper function to map an area
+ * @cpp:    NFP CPP handle
+ * @domain: CPP domain
+ * @target: CPP target
+ * @addr:   CPP address
+ * @size:   Size of the area
+ * @area:   Area handle (output)
+ *
+ * Map an area of IOMEM access.  To undo the effect of this function call
+ * @nfp_cpp_area_release_free(*area).
+ *
+ * Return: Pointer to memory mapped area or NULL
+ */
+uint8_t *
+nfp_cpp_map_area(struct nfp_cpp *cpp, int domain, int target, uint64_t addr,
+		 unsigned long size, struct nfp_cpp_area **area)
+{
+	uint8_t *res;
+	uint32_t dest;
+
+	dest = NFP_CPP_ISLAND_ID(target, NFP_CPP_ACTION_RW, 0, domain);
+
+	*area = nfp_cpp_area_alloc_acquire(cpp, dest, addr, size);
+	if (!*area)
+		goto err_eio;
+
+	res = nfp_cpp_area_iomem(*area);
+	if (!res)
+		goto err_release_free;
+
+	return res;
+
+err_release_free:
+	nfp_cpp_area_release_free(*area);
+err_eio:
+	return NULL;
+}
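Reviewer note on nfp_xpb_writelm() and nfp_xpb_waitlm(): both reduce to plain bit arithmetic around nfp_xpb_readl(). A minimal sketch of that arithmetic follows, with the register access left out. The helper names merge_bits() and bits_match() are hypothetical and not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: keep the bits of old that lie outside mask and
 * take the bits inside mask from value (the read-modify-write step).
 */
static uint32_t
merge_bits(uint32_t old, uint32_t mask, uint32_t value)
{
	return (old & ~mask) | (value & mask);
}

/*
 * Hypothetical helper: non-zero when the masked bits of reg equal the
 * masked bits of value (the condition a polling loop waits for).
 */
static int
bits_match(uint32_t reg, uint32_t mask, uint32_t value)
{
	return (reg & mask) == (value & mask);
}
```

A register holding the result of merge_bits(old, mask, value) always satisfies bits_match() for the same mask and value, which is why the two functions pair naturally as "write, then wait".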
diff --git a/drivers/net/nfp/nfpcore/nfp_crc.c b/drivers/net/nfp/nfpcore/nfp_crc.c
new file mode 100644
index 0000000..a3c0e92
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_crc.c
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <inttypes.h>
+
+#include "nfp_crc.h"
+
+static inline uint32_t
+nfp_crc32_be_generic(uint32_t crc, unsigned char const *p, size_t len,
+		 uint32_t polynomial)
+{
+	int i;
+	while (len--) {
+		crc ^= *p++ << 24;
+		for (i = 0; i < 8; i++)
+			crc = (crc << 1) ^ ((crc & 0x80000000) ? polynomial :
+					  0);
+	}
+	return crc;
+}
+
+static inline uint32_t
+nfp_crc32_be(uint32_t crc, unsigned char const *p, size_t len)
+{
+	return nfp_crc32_be_generic(crc, p, len, CRCPOLY_BE);
+}
+
+static uint32_t
+nfp_crc32_posix_end(uint32_t crc, size_t total_len)
+{
+	/* Extend with the length of the string. */
+	while (total_len != 0) {
+		uint8_t c = total_len & 0xff;
+
+		crc = nfp_crc32_be(crc, &c, 1);
+		total_len >>= 8;
+	}
+
+	return ~crc;
+}
+
+uint32_t
+nfp_crc32_posix(const void *buff, size_t len)
+{
+	return nfp_crc32_posix_end(nfp_crc32_be(0, buff, len), len);
+}
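Reviewer note on nfp_crc.c: the checksum is the POSIX cksum-style CRC, i.e. an MSB-first CRC-32 over the data, extended with the length bytes (least significant first) and then complemented. The standalone sketch below mirrors that structure so it can be checked against the two reference values quoted in the nfp_hwinfo.h comment. The sketch_ names are hypothetical, and the cast before the shift is an addition in this sketch to keep the shift unsigned.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CRCPOLY_BE 0x04c11db7u

/* MSB-first (non-reflected) CRC-32 update, one bit at a time. */
static uint32_t
sketch_crc32_be(uint32_t crc, const unsigned char *p, size_t len)
{
	int i;

	while (len--) {
		crc ^= (uint32_t)*p++ << 24;
		for (i = 0; i < 8; i++)
			crc = (crc << 1) ^
			      ((crc & 0x80000000u) ? SKETCH_CRCPOLY_BE : 0);
	}
	return crc;
}

/* CRC of the data, then of the length bytes, then complemented. */
static uint32_t
sketch_crc32_posix(const void *buf, size_t len)
{
	uint32_t crc = sketch_crc32_be(0, buf, len);
	size_t n = len;

	while (n != 0) {
		unsigned char c = n & 0xff;

		crc = sketch_crc32_be(crc, &c, 1);
		n >>= 8;
	}
	return ~crc;
}
```

With an empty input no length bytes are fed in, so the result is simply the complement of zero.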
diff --git a/drivers/net/nfp/nfpcore/nfp_crc.h b/drivers/net/nfp/nfpcore/nfp_crc.h
new file mode 100644
index 0000000..f8a112b
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_crc.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_CRC_H__
+#define __NFP_CRC_H__
+
+/*
+ * There are multiple 16-bit CRC polynomials in common use, but this is
+ * *the* standard CRC-32 polynomial, first popularized by Ethernet.
+ * x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x^1+x^0
+ */
+#define CRCPOLY_LE 0xedb88320
+#define CRCPOLY_BE 0x04c11db7
+
+uint32_t nfp_crc32_posix(const void *buff, size_t len);
+
+#endif
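Reviewer note on the two polynomial constants: CRCPOLY_LE and CRCPOLY_BE describe the same polynomial, written with the bit order reversed. A small sketch that demonstrates the relationship, using a hypothetical bit_reverse32() helper that is not part of the patch:

```c
#include <stdint.h>

/* Hypothetical helper: reverse the order of the 32 bits in v. */
static uint32_t
bit_reverse32(uint32_t v)
{
	uint32_t r = 0;
	int i;

	for (i = 0; i < 32; i++) {
		r = (r << 1) | (v & 1);	/* move the lowest bit of v into r */
		v >>= 1;
	}
	return r;
}
```

Only the big-endian form appears to be used by nfp_crc.c in this patch.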
diff --git a/drivers/net/nfp/nfpcore/nfp_hwinfo.c b/drivers/net/nfp/nfpcore/nfp_hwinfo.c
new file mode 100644
index 0000000..b8d6400
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_hwinfo.c
@@ -0,0 +1,226 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* Parse the hwinfo table that the ARM firmware builds in the ARM scratch SRAM
+ * after chip reset.
+ *
+ * Examples of the fields:
+ *   me.count = 40
+ *   me.mask = 0x7f_ffff_ffff
+ *
+ *   me.count is the total number of MEs on the system.
+ *   me.mask is the bitmask of MEs that are available for application usage.
+ *
+ *   (i.e., in this example, ME 39 has been reserved by boardconfig.)
+ */
+
+#include <stdio.h>
+#include <time.h>
+#include <zlib.h>
+
+#include "nfp_cpp.h"
+#include "nfp6000/nfp6000.h"
+#include "nfp_resource.h"
+#include "nfp_hwinfo.h"
+#include "nfp_crc.h"
+
+static int
+nfp_hwinfo_is_updating(struct nfp_hwinfo *hwinfo)
+{
+	return hwinfo->version & NFP_HWINFO_VERSION_UPDATING;
+}
+
+static int
+nfp_hwinfo_db_walk(struct nfp_hwinfo *hwinfo, uint32_t size)
+{
+	const char *key, *val, *end = hwinfo->data + size;
+
+	for (key = hwinfo->data; key < end && *key;
+	     key = val + strlen(val) + 1) {
+		val = key + strlen(key) + 1;
+		if (val >= end) {
+			printf("Bad HWINFO - overflowing key\n");
+			return -EINVAL;
+		}
+
+		if (val + strlen(val) + 1 > end) {
+			printf("Bad HWINFO - overflowing value\n");
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int
+nfp_hwinfo_db_validate(struct nfp_hwinfo *db, uint32_t len)
+{
+	uint32_t size, new_crc, *crc;
+
+	size = db->size;
+	if (size > len) {
+		printf("Unsupported hwinfo size %u > %u\n", size, len);
+		return -EINVAL;
+	}
+
+	size -= sizeof(uint32_t);
+	new_crc = nfp_crc32_posix((char *)db, size);
+	crc = (uint32_t *)(db->start + size);
+	if (new_crc != *crc) {
+		printf("Corrupt hwinfo table (CRC mismatch)\n");
+		printf("\tcalculated 0x%x, expected 0x%x\n", new_crc, *crc);
+		return -EINVAL;
+	}
+
+	return nfp_hwinfo_db_walk(db, size);
+}
+
+static struct nfp_hwinfo *
+nfp_hwinfo_try_fetch(struct nfp_cpp *cpp, size_t *cpp_size)
+{
+	struct nfp_hwinfo *header;
+	void *res;
+	uint64_t cpp_addr;
+	uint32_t cpp_id;
+	int err;
+	uint8_t *db;
+
+	res = nfp_resource_acquire(cpp, NFP_RESOURCE_NFP_HWINFO);
+	if (res) {
+		cpp_id = nfp_resource_cpp_id(res);
+		cpp_addr = nfp_resource_address(res);
+		*cpp_size = nfp_resource_size(res);
+
+		nfp_resource_release(res);
+
+		if (*cpp_size < HWINFO_SIZE_MIN)
+			return NULL;
+	} else {
+		return NULL;
+	}
+
+	db = malloc(*cpp_size + 1);
+	if (!db)
+		return NULL;
+
+	err = nfp_cpp_read(cpp, cpp_id, cpp_addr, db, *cpp_size);
+	if (err != (int)*cpp_size)
+		goto exit_free;
+
+	header = (void *)db;
+	printf("NFP HWINFO header: %08x\n", *(uint32_t *)header);
+	if (nfp_hwinfo_is_updating(header))
+		goto exit_free;
+
+	if (header->version != NFP_HWINFO_VERSION_2) {
+		printf("Unknown HWInfo version: 0x%08x\n",
+			header->version);
+		goto exit_free;
+	}
+
+	/* NULL-terminate for safety */
+	db[*cpp_size] = '\0';
+
+	return (void *)db;
+exit_free:
+	free(db);
+	return NULL;
+}
+
+static struct nfp_hwinfo *
+nfp_hwinfo_fetch(struct nfp_cpp *cpp, size_t *hwdb_size)
+{
+	struct timespec wait;
+	struct nfp_hwinfo *db;
+	int count;
+
+	wait.tv_sec = 0;
+	wait.tv_nsec = 10000000;
+	count = 0;
+
+	for (;;) {
+		db = nfp_hwinfo_try_fetch(cpp, hwdb_size);
+		if (db)
+			return db;
+
+		nanosleep(&wait, NULL);
+		if (count++ > 200) {
+			printf("NFP access error\n");
+			return NULL;
+		}
+	}
+}
+
+struct nfp_hwinfo *
+nfp_hwinfo_read(struct nfp_cpp *cpp)
+{
+	struct nfp_hwinfo *db;
+	size_t hwdb_size = 0;
+	int err;
+
+	db = nfp_hwinfo_fetch(cpp, &hwdb_size);
+	if (!db)
+		return NULL;
+
+	err = nfp_hwinfo_db_validate(db, hwdb_size);
+	if (err) {
+		free(db);
+		return NULL;
+	}
+	return db;
+}
+
+/*
+ * nfp_hwinfo_lookup() - Find a value in the HWInfo table by name
+ * @hwinfo:	NFP HWinfo table
+ * @lookup:	HWInfo name to search for
+ *
+ * Return: Value of the HWInfo name, or NULL
+ */
+const char *
+nfp_hwinfo_lookup(struct nfp_hwinfo *hwinfo, const char *lookup)
+{
+	const char *key, *val, *end;
+
+	if (!hwinfo || !lookup)
+		return NULL;
+
+	end = hwinfo->data + hwinfo->size - sizeof(uint32_t);
+
+	for (key = hwinfo->data; key < end && *key;
+	     key = val + strlen(val) + 1) {
+		val = key + strlen(key) + 1;
+
+		if (strcmp(key, lookup) == 0)
+			return val;
+	}
+
+	return NULL;
+}
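Reviewer note on the v2 key/value walk: the table is a run of packed "key\0value\0" pairs ended by an empty key. The sketch below shows one way to look up a key in such a buffer with every read bounded by the buffer size. It is an illustration under assumptions, not the patch's code: kv_lookup() is a hypothetical name, and memchr() is used here instead of strlen() so that a missing terminator cannot carry the scan past the end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find lookup in a packed "key\0value\0..." buffer. */
static const char *
kv_lookup(const char *data, size_t size, const char *lookup)
{
	const char *key = data;
	const char *end = data + size;

	/* Check the bound first, then look for the empty-key end marker. */
	while (key < end && *key != '\0') {
		const char *key_nul = memchr(key, '\0', (size_t)(end - key));
		const char *val, *val_nul;

		if (key_nul == NULL)
			return NULL;	/* key is not terminated in bounds */
		val = key_nul + 1;
		if (val >= end)
			return NULL;	/* no room left for a value */
		val_nul = memchr(val, '\0', (size_t)(end - val));
		if (val_nul == NULL)
			return NULL;	/* value is not terminated in bounds */

		if (strcmp(key, lookup) == 0)
			return val;
		key = val_nul + 1;
	}
	return NULL;
}
```

A truncated buffer makes the function return NULL rather than a pointer to a partial string.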
diff --git a/drivers/net/nfp/nfpcore/nfp_hwinfo.h b/drivers/net/nfp/nfpcore/nfp_hwinfo.h
new file mode 100644
index 0000000..e9a9b49
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_hwinfo.h
@@ -0,0 +1,111 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_HWINFO_H__
+#define __NFP_HWINFO_H__
+
+#include <inttypes.h>
+
+#define HWINFO_SIZE_MIN	0x100
+
+/*
+ * The Hardware Info Table defines the properties of the system.
+ *
+ * HWInfo v1 Table (fixed size)
+ *
+ * 0x0000: uint32_t version	        Hardware Info Table version (1.0)
+ * 0x0004: uint32_t size	        Total size of the table, including the
+ *					CRC32 (IEEE 802.3)
+ * 0x0008: uint32_t jumptab	        Offset of key/value table
+ * 0x000c: uint32_t keys	        Total number of keys in the key/value
+ *					table
+ * NNNNNN:				Key/value jump table and string data
+ * (size - 4): uint32_t crc32	CRC32 (same as IEEE 802.3, POSIX csum, etc)
+ *				CRC32("",0) = ~0, CRC32("a",1) = 0x48C279FE
+ *
+ * HWInfo v2 Table (variable size)
+ *
+ * 0x0000: uint32_t version	        Hardware Info Table version (2.0)
+ * 0x0004: uint32_t size	        Current size of the data area, excluding
+ *					CRC32
+ * 0x0008: uint32_t limit	        Maximum size of the table
+ * 0x000c: uint32_t reserved	        Unused, set to zero
+ * NNNNNN:			Key/value data
+ * (size - 4): uint32_t crc32	CRC32 (same as IEEE 802.3, POSIX csum, etc)
+ *				CRC32("",0) = ~0, CRC32("a",1) = 0x48C279FE
+ *
+ * If the HWInfo table is in the process of being updated, the low bit of
+ * version will be set.
+ *
+ * HWInfo v1 Key/Value Table
+ * -------------------------
+ *
+ *  The key/value table is a set of offsets to ASCIIZ strings which have
+ *  been strcmp(3) sorted (yes, please use bsearch(3) on the table).
+ *
+ *  All keys are guaranteed to be unique.
+ *
+ * N+0:	uint32_t key_1		Offset to the first key
+ * N+4:	uint32_t val_1		Offset to the first value
+ * N+8: uint32_t key_2		Offset to the second key
+ * N+c: uint32_t val_2		Offset to the second value
+ * ...
+ *
+ * HWInfo v2 Key/Value Table
+ * -------------------------
+ *
+ * Packed UTF8Z strings, i.e. 'key1\000value1\000key2\000value2\000'
+ *
+ * Unsorted.
+ */
+
+#define NFP_HWINFO_VERSION_1 ('H' << 24 | 'I' << 16 | 1 << 8 | 0 << 1 | 0)
+#define NFP_HWINFO_VERSION_2 ('H' << 24 | 'I' << 16 | 2 << 8 | 0 << 1 | 0)
+#define NFP_HWINFO_VERSION_UPDATING	BIT(0)
+
+struct nfp_hwinfo {
+	uint8_t start[0];
+
+	uint32_t version;
+	uint32_t size;
+
+	/* v2 specific fields */
+	uint32_t limit;
+	uint32_t resv;
+
+	char data[];
+};
+
+struct nfp_hwinfo *nfp_hwinfo_read(struct nfp_cpp *cpp);
+
+const char *nfp_hwinfo_lookup(struct nfp_hwinfo *hwinfo, const char *lookup);
+
+#endif
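Reviewer note on the version word: it packs the ASCII characters 'H' and 'I' into the top two bytes, the major version into the next byte, and uses bit 0 as the "update in progress" flag. A minimal sketch of decoding it, assuming an ASCII execution character set. The SKETCH_ macros and hwinfo_ helpers are hypothetical names, not part of the patch.

```c
#include <stdint.h>

/* Same layout as NFP_HWINFO_VERSION_n, built with unsigned arithmetic. */
#define SKETCH_HWINFO_VERSION(major) \
	((uint32_t)'H' << 24 | (uint32_t)'I' << 16 | (uint32_t)(major) << 8)
#define SKETCH_HWINFO_UPDATING 0x1u

/* Non-zero while the firmware is rewriting the table. */
static int
hwinfo_is_updating(uint32_t version)
{
	return (version & SKETCH_HWINFO_UPDATING) != 0;
}

/* Major version stored in bits 15:8. */
static unsigned int
hwinfo_major(uint32_t version)
{
	return (version >> 8) & 0xff;
}

/* Non-zero when the top two bytes spell "HI". */
static int
hwinfo_has_magic(uint32_t version)
{
	return (version >> 16) == (uint32_t)(('H' << 8) | 'I');
}
```

Because the updating flag shares the word with the version, an exact comparison against the version constant only succeeds when the flag is clear.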
diff --git a/drivers/net/nfp/nfpcore/nfp_mip.c b/drivers/net/nfp/nfpcore/nfp_mip.c
new file mode 100644
index 0000000..6e14d06
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_mip.c
@@ -0,0 +1,180 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <rte_byteorder.h>
+
+#include "nfp_cpp.h"
+#include "nfp_mip.h"
+#include "nfp_nffw.h"
+
+#define NFP_MIP_SIGNATURE	rte_cpu_to_le_32(0x0050494d)  /* "MIP\0" */
+#define NFP_MIP_VERSION		rte_cpu_to_le_32(1)
+#define NFP_MIP_MAX_OFFSET	(256 * 1024)
+
+struct nfp_mip {
+	uint32_t signature;
+	uint32_t mip_version;
+	uint32_t mip_size;
+	uint32_t first_entry;
+
+	uint32_t version;
+	uint32_t buildnum;
+	uint32_t buildtime;
+	uint32_t loadtime;
+
+	uint32_t symtab_addr;
+	uint32_t symtab_size;
+	uint32_t strtab_addr;
+	uint32_t strtab_size;
+
+	char name[16];
+	char toolchain[32];
+};
+
+/* Read memory and check if it could be a valid MIP */
+static int
+nfp_mip_try_read(struct nfp_cpp *cpp, uint32_t cpp_id, uint64_t addr,
+		 struct nfp_mip *mip)
+{
+	int ret;
+
+	ret = nfp_cpp_read(cpp, cpp_id, addr, mip, sizeof(*mip));
+	if (ret != sizeof(*mip)) {
+		printf("Failed to read MIP data (%d, %zu)\n",
+			ret, sizeof(*mip));
+		return -EIO;
+	}
+	if (mip->signature != NFP_MIP_SIGNATURE) {
+		printf("Incorrect MIP signature (0x%08x)\n",
+			 rte_le_to_cpu_32(mip->signature));
+		return -EINVAL;
+	}
+	if (mip->mip_version != NFP_MIP_VERSION) {
+		printf("Unsupported MIP version (%d)\n",
+			 rte_le_to_cpu_32(mip->mip_version));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/* Try to locate MIP using the resource table */
+static int
+nfp_mip_read_resource(struct nfp_cpp *cpp, struct nfp_mip *mip)
+{
+	struct nfp_nffw_info *nffw_info;
+	uint32_t cpp_id;
+	uint64_t addr;
+	int err;
+
+	nffw_info = nfp_nffw_info_open(cpp);
+	if (!nffw_info)
+		return -ENODEV;
+
+	err = nfp_nffw_info_mip_first(nffw_info, &cpp_id, &addr);
+	if (err)
+		goto exit_close_nffw;
+
+	err = nfp_mip_try_read(cpp, cpp_id, addr, mip);
+exit_close_nffw:
+	nfp_nffw_info_close(nffw_info);
+	return err;
+}
+
+/*
+ * nfp_mip_open() - Get device MIP structure
+ * @cpp:	NFP CPP Handle
+ *
+ * Copy MIP structure from NFP device and return it.  The returned
+ * structure is handled internally by the library and should be
+ * freed by calling nfp_mip_close().
+ *
+ * Return: pointer to mip, NULL on failure.
+ */
+struct nfp_mip *
+nfp_mip_open(struct nfp_cpp *cpp)
+{
+	struct nfp_mip *mip;
+	int err;
+
+	mip = malloc(sizeof(*mip));
+	if (!mip)
+		return NULL;
+
+	err = nfp_mip_read_resource(cpp, mip);
+	if (err) {
+		free(mip);
+		return NULL;
+	}
+
+	mip->name[sizeof(mip->name) - 1] = 0;
+
+	return mip;
+}
+
+void
+nfp_mip_close(struct nfp_mip *mip)
+{
+	free(mip);
+}
+
+const char *
+nfp_mip_name(const struct nfp_mip *mip)
+{
+	return mip->name;
+}
+
+/*
+ * nfp_mip_symtab() - Get the address and size of the MIP symbol table
+ * @mip:	MIP handle
+ * @addr:	Location for NFP DDR address of MIP symbol table
+ * @size:	Location for size of MIP symbol table
+ */
+void
+nfp_mip_symtab(const struct nfp_mip *mip, uint32_t *addr, uint32_t *size)
+{
+	*addr = rte_le_to_cpu_32(mip->symtab_addr);
+	*size = rte_le_to_cpu_32(mip->symtab_size);
+}
+
+/*
+ * nfp_mip_strtab() - Get the address and size of the MIP symbol name table
+ * @mip:	MIP handle
+ * @addr:	Location for NFP DDR address of MIP symbol name table
+ * @size:	Location for size of MIP symbol name table
+ */
+void
+nfp_mip_strtab(const struct nfp_mip *mip, uint32_t *addr, uint32_t *size)
+{
+	*addr = rte_le_to_cpu_32(mip->strtab_addr);
+	*size = rte_le_to_cpu_32(mip->strtab_size);
+}
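Reviewer note on the MIP header check: the signature 0x0050494d is the byte sequence "MIP\0" read as a little-endian 32-bit word, followed by a little-endian version word that must be 1. The sketch below validates a raw byte buffer without relying on host byte order. load_le32() and looks_like_mip() are hypothetical helpers, and the 8-byte minimum covers only the two fields inspected here, not the whole struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble a little-endian 32-bit value from bytes. */
static uint32_t
load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: check the signature and version words only. */
static int
looks_like_mip(const uint8_t *buf, size_t len)
{
	if (len < 8)
		return 0;
	return load_le32(buf) == 0x0050494du && load_le32(buf + 4) == 1u;
}
```

The patch itself takes the other route and converts the expected constants with rte_cpu_to_le_32() before comparing them to the raw fields.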
diff --git a/drivers/net/nfp/nfpcore/nfp_mip.h b/drivers/net/nfp/nfpcore/nfp_mip.h
new file mode 100644
index 0000000..48d8ef3
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_mip.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_MIP_H__
+#define __NFP_MIP_H__
+
+#include "nfp_nffw.h"
+
+struct nfp_mip;
+
+struct nfp_mip *nfp_mip_open(struct nfp_cpp *cpp);
+void nfp_mip_close(struct nfp_mip *mip);
+
+const char *nfp_mip_name(const struct nfp_mip *mip);
+void nfp_mip_symtab(const struct nfp_mip *mip, uint32_t *addr, uint32_t *size);
+void nfp_mip_strtab(const struct nfp_mip *mip, uint32_t *addr, uint32_t *size);
+int nfp_nffw_info_mip_first(struct nfp_nffw_info *state, uint32_t *cpp_id,
+			    uint64_t *off);
+#endif
diff --git a/drivers/net/nfp/nfpcore/nfp_mutex.c b/drivers/net/nfp/nfpcore/nfp_mutex.c
new file mode 100644
index 0000000..76dc143
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_mutex.c
@@ -0,0 +1,450 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <errno.h>
+
+#include <malloc.h>
+#include <time.h>
+#include <sched.h>
+
+#include "nfp_cpp.h"
+#include "nfp6000/nfp6000.h"
+
+#define MUTEX_LOCKED(interface)  ((((uint32_t)(interface)) << 16) | 0x000f)
+#define MUTEX_UNLOCK(interface)  (0                               | 0x0000)
+
+#define MUTEX_IS_LOCKED(value)   (((value) & 0xffff) == 0x000f)
+#define MUTEX_IS_UNLOCKED(value) (((value) & 0xffff) == 0x0000)
+#define MUTEX_INTERFACE(value)   (((value) >> 16) & 0xffff)
+
+/*
+ * If you need more than 65535 recursive locks, please
+ * rethink your code.
+ */
+#define MUTEX_DEPTH_MAX         0xffff
+
+struct nfp_cpp_mutex {
+	struct nfp_cpp *cpp;
+	uint8_t target;
+	uint16_t depth;
+	unsigned long long address;
+	uint32_t key;
+	unsigned int usage;
+	struct nfp_cpp_mutex *prev, *next;
+};
+
+static int
+_nfp_cpp_mutex_validate(uint32_t model, int *target, unsigned long long address)
+{
+	/* Address must be 64-bit aligned */
+	if (address & 7)
+		return NFP_ERRNO(EINVAL);
+
+	if (NFP_CPP_MODEL_IS_6000(model)) {
+		if (*target != NFP_CPP_TARGET_MU)
+			return NFP_ERRNO(EINVAL);
+	} else {
+		return NFP_ERRNO(EINVAL);
+	}
+
+	return 0;
+}
+
+/*
+ * Initialize a mutex location
+ *
+ * The CPP target:address must point to a 64-bit aligned location; this
+ * function initializes 64 bits of data at that location.
+ *
+ * This creates the initial mutex state, as locked by this
+ * nfp_cpp_interface().
+ *
+ * This function should only be called when setting up
+ * the initial lock state upon boot-up of the system.
+ *
+ * @param cpp       NFP CPP handle
+ * @param target    NFP CPP target ID (ie NFP_CPP_TARGET_CLS or
+ *		    NFP_CPP_TARGET_MU)
+ * @param address   Offset into the address space of the NFP CPP target ID
+ * @param key       Unique 32-bit value for this mutex
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_cpp_mutex_init(struct nfp_cpp *cpp, int target, unsigned long long address,
+		   uint32_t key)
+{
+	uint32_t model = nfp_cpp_model(cpp);
+	uint32_t muw = NFP_CPP_ID(target, 4, 0);	/* atomic_write */
+	int err;
+
+	err = _nfp_cpp_mutex_validate(model, &target, address);
+	if (err < 0)
+		return err;
+
+	err = nfp_cpp_writel(cpp, muw, address + 4, key);
+	if (err < 0)
+		return err;
+
+	err =
+	    nfp_cpp_writel(cpp, muw, address + 0,
+			   MUTEX_LOCKED(nfp_cpp_interface(cpp)));
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
+/*
+ * Create a mutex handle from an address controlled by a MU Atomic engine
+ *
+ * The CPP target:address must point to a 64-bit aligned location, and
+ * reserve 64 bits of data at the location for use by the handle.
+ *
+ * Only target/address pairs that point to entities that support the
+ * MU Atomic Engine are supported.
+ *
+ * @param cpp       NFP CPP handle
+ * @param target    NFP CPP target ID (ie NFP_CPP_TARGET_CLS or
+ *		    NFP_CPP_TARGET_MU)
+ * @param address   Offset into the address space of the NFP CPP target ID
+ * @param key       32-bit unique key (must match the key at this location)
+ *
+ * @return      A non-NULL struct nfp_cpp_mutex * on success, NULL on failure.
+ */
+struct nfp_cpp_mutex *
+nfp_cpp_mutex_alloc(struct nfp_cpp *cpp, int target,
+		     unsigned long long address, uint32_t key)
+{
+	uint32_t model = nfp_cpp_model(cpp);
+	struct nfp_cpp_mutex *mutex;
+	uint32_t mur = NFP_CPP_ID(target, 3, 0);	/* atomic_read */
+	int err;
+	uint32_t tmp;
+
+	/* Look for cached mutex */
+	for (mutex = cpp->mutex_cache; mutex; mutex = mutex->next) {
+		if (mutex->target == target && mutex->address == address)
+			break;
+	}
+
+	if (mutex) {
+		if (mutex->key == key) {
+			mutex->usage++;
+			return mutex;
+		}
+
+		/* If the key doesn't match... */
+		return NFP_ERRPTR(EEXIST);
+	}
+
+	err = _nfp_cpp_mutex_validate(model, &target, address);
+	if (err < 0)
+		return NULL;
+
+	err = nfp_cpp_readl(cpp, mur, address + 4, &tmp);
+	if (err < 0)
+		return NULL;
+
+	if (tmp != key)
+		return NFP_ERRPTR(EEXIST);
+
+	mutex = calloc(1, sizeof(*mutex));
+	if (!mutex)
+		return NFP_ERRPTR(ENOMEM);
+
+	mutex->cpp = cpp;
+	mutex->target = target;
+	mutex->address = address;
+	mutex->key = key;
+	mutex->depth = 0;
+	mutex->usage = 1;
+
+	/* Add mutex to the cache */
+	if (cpp->mutex_cache) {
+		cpp->mutex_cache->prev = mutex;
+		mutex->next = cpp->mutex_cache;
+		cpp->mutex_cache = mutex;
+	} else {
+		cpp->mutex_cache = mutex;
+	}
+
+	return mutex;
+}
+
+struct nfp_cpp *
+nfp_cpp_mutex_cpp(struct nfp_cpp_mutex *mutex)
+{
+	return mutex->cpp;
+}
+
+uint32_t
+nfp_cpp_mutex_key(struct nfp_cpp_mutex *mutex)
+{
+	return mutex->key;
+}
+
+uint16_t
+nfp_cpp_mutex_owner(struct nfp_cpp_mutex *mutex)
+{
+	uint32_t mur = NFP_CPP_ID(mutex->target, 3, 0);	/* atomic_read */
+	uint32_t value, key;
+	int err;
+
+	err = nfp_cpp_readl(mutex->cpp, mur, mutex->address, &value);
+	if (err < 0)
+		return err;
+
+	err = nfp_cpp_readl(mutex->cpp, mur, mutex->address + 4, &key);
+	if (err < 0)
+		return err;
+
+	if (key != mutex->key)
+		return NFP_ERRNO(EPERM);
+
+	if (!MUTEX_IS_LOCKED(value))
+		return 0;
+
+	return MUTEX_INTERFACE(value);
+}
+
+int
+nfp_cpp_mutex_target(struct nfp_cpp_mutex *mutex)
+{
+	return mutex->target;
+}
+
+uint64_t
+nfp_cpp_mutex_address(struct nfp_cpp_mutex *mutex)
+{
+	return mutex->address;
+}
+
+/*
+ * Free a mutex handle - does not alter the lock state
+ *
+ * @param mutex     NFP CPP Mutex handle
+ */
+void
+nfp_cpp_mutex_free(struct nfp_cpp_mutex *mutex)
+{
+	mutex->usage--;
+	if (mutex->usage > 0)
+		return;
+
+	/* Remove mutex from the cache */
+	if (mutex->next)
+		mutex->next->prev = mutex->prev;
+	if (mutex->prev)
+		mutex->prev->next = mutex->next;
+
+	/* If mutex->cpp == NULL, something broke */
+	if (mutex->cpp && mutex == mutex->cpp->mutex_cache)
+		mutex->cpp->mutex_cache = mutex->next;
+
+	free(mutex);
+}
+
+/*
+ * Lock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex     NFP CPP Mutex handle
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex)
+{
+	int err;
+	time_t warn_at = time(NULL) + 15;
+
+	while ((err = nfp_cpp_mutex_trylock(mutex)) != 0) {
+		/* If errno != EBUSY, then the lock was damaged */
+		if (err < 0 && errno != EBUSY)
+			return err;
+		if (time(NULL) >= warn_at) {
+			printf("Warning: waiting for NFP mutex\n");
+			printf("\tusage:%u\n", mutex->usage);
+			printf("\tdepth:%hu\n", mutex->depth);
+			printf("\ttarget:%d\n", mutex->target);
+			printf("\taddr:%llx\n", mutex->address);
+			printf("\tkey:%08x\n", mutex->key);
+			warn_at = time(NULL) + 60;
+		}
+		sched_yield();
+	}
+	return 0;
+}
+
+/*
+ * Unlock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * @param mutex     NFP CPP Mutex handle
+ *
+ * @return 0 on success, or -1 on failure (and set errno accordingly).
+ */
+int
+nfp_cpp_mutex_unlock(struct nfp_cpp_mutex *mutex)
+{
+	uint32_t muw = NFP_CPP_ID(mutex->target, 4, 0);	/* atomic_write */
+	uint32_t mur = NFP_CPP_ID(mutex->target, 3, 0);	/* atomic_read */
+	struct nfp_cpp *cpp = mutex->cpp;
+	uint32_t key, value;
+	uint16_t interface = nfp_cpp_interface(cpp);
+	int err;
+
+	if (mutex->depth > 1) {
+		mutex->depth--;
+		return 0;
+	}
+
+	err = nfp_cpp_readl(mutex->cpp, mur, mutex->address, &value);
+	if (err < 0)
+		goto exit;
+
+	err = nfp_cpp_readl(mutex->cpp, mur, mutex->address + 4, &key);
+	if (err < 0)
+		goto exit;
+
+	if (key != mutex->key) {
+		err = NFP_ERRNO(EPERM);
+		goto exit;
+	}
+
+	if (value != MUTEX_LOCKED(interface)) {
+		err = NFP_ERRNO(EACCES);
+		goto exit;
+	}
+
+	err = nfp_cpp_writel(cpp, muw, mutex->address, MUTEX_UNLOCK(interface));
+	if (err < 0)
+		goto exit;
+
+	mutex->depth = 0;
+
+exit:
+	return err;
+}
+
+/*
+ * Attempt to lock a mutex handle, using the NFP MU Atomic Engine
+ *
+ * Valid lock states:
+ *
+ *      0x....0000      - Unlocked
+ *      0x....000f      - Locked
+ *
+ * @param mutex     NFP CPP Mutex handle
+ * @return      0 if the lock succeeded, -1 on failure (and errno set
+ *		appropriately).
+ */
+int
+nfp_cpp_mutex_trylock(struct nfp_cpp_mutex *mutex)
+{
+	uint32_t mur = NFP_CPP_ID(mutex->target, 3, 0);	/* atomic_read */
+	uint32_t muw = NFP_CPP_ID(mutex->target, 4, 0);	/* atomic_write */
+	uint32_t mus = NFP_CPP_ID(mutex->target, 5, 3);	/* test_set_imm */
+	uint32_t key, value, tmp;
+	struct nfp_cpp *cpp = mutex->cpp;
+	int err;
+
+	if (mutex->depth > 0) {
+		if (mutex->depth == MUTEX_DEPTH_MAX)
+			return NFP_ERRNO(E2BIG);
+
+		mutex->depth++;
+		return 0;
+	}
+
+	/* Verify that the lock marker is not damaged */
+	err = nfp_cpp_readl(cpp, mur, mutex->address + 4, &key);
+	if (err < 0)
+		goto exit;
+
+	if (key != mutex->key) {
+		err = NFP_ERRNO(EPERM);
+		goto exit;
+	}
+
+	/*
+	 * Compare against the unlocked state, and if true,
+	 * write the interface id into the top 16 bits, and
+	 * mark as locked.
+	 */
+	value = MUTEX_LOCKED(nfp_cpp_interface(cpp));
+
+	/*
+	 * We use test_set_imm here, as it implies a read
+	 * of the current state, and sets the bits in the
+	 * bytemask of the command to 1s. Since the mutex
+	 * is guaranteed to be 64-bit aligned, the bytemask
+	 * of this 32-bit command is ensured to be 8'b00001111,
+	 * which implies that the lower 4 bits will be set to
+	 * ones regardless of the initial state.
+	 *
+	 * Since this is a 'Readback' operation, with no Pull
+	 * data, we can treat this as a normal Push (read)
+	 * atomic, which returns the original value.
+	 */
+	err = nfp_cpp_readl(cpp, mus, mutex->address, &tmp);
+	if (err < 0)
+		goto exit;
+
+	/* Was it unlocked? */
+	if (MUTEX_IS_UNLOCKED(tmp)) {
+		/*
+		 * The read value can only be 0x....0000 in the unlocked state.
+		 * If another party already held this lock, then
+		 * the lock state would be 0x....000f
+		 *
+		 * Write our owner ID into the lock
+		 * While not strictly necessary, this helps with
+		 * debug and bookkeeping.
+		 */
+		err = nfp_cpp_writel(cpp, muw, mutex->address, value);
+		if (err < 0)
+			goto exit;
+
+		mutex->depth = 1;
+		goto exit;
+	}
+
+	/* Already locked by us? Success! */
+	if (tmp == value) {
+		mutex->depth = 1;
+		goto exit;
+	}
+
+	err = NFP_ERRNO(MUTEX_IS_LOCKED(tmp) ? EBUSY : EINVAL);
+
+exit:
+	return err;
+}
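
The lock-word encoding and the test-and-set step of nfp_cpp_mutex_trylock()
above can be sketched in plain host memory. This is only an illustration
under stated assumptions: sim_test_set() and sim_trylock() are hypothetical
names, an ordinary uint32_t stands in for the MU atomic location, and the key
check and the recursion depth are left out. The macros are copied from
nfp_mutex.c.

```c
#include <stdint.h>

/* Same encoding as the macros in nfp_mutex.c. */
#define MUTEX_LOCKED(interface)  ((((uint32_t)(interface)) << 16) | 0x000f)
#define MUTEX_IS_LOCKED(value)   (((value) & 0xffff) == 0x000f)
#define MUTEX_IS_UNLOCKED(value) (((value) & 0xffff) == 0x0000)
#define MUTEX_INTERFACE(value)   (((value) >> 16) & 0xffff)

/*
 * Hypothetical stand-in for the test_set_imm atomic: return the previous
 * word and set the low 4 bits, as the comment in the patch describes.
 * A real device does this atomically; this sketch does not.
 */
static uint32_t
sim_test_set(uint32_t *word)
{
	uint32_t old = *word;

	*word |= 0xf;
	return old;
}

/* Return 0 if the lock is now held by 'interface', -1 if someone else has it. */
static int
sim_trylock(uint32_t *word, uint16_t interface)
{
	uint32_t old = sim_test_set(word);

	if (MUTEX_IS_UNLOCKED(old)) {
		/* We won the race; record the owner in the top 16 bits. */
		*word = MUTEX_LOCKED(interface);
		return 0;
	}

	if (old == MUTEX_LOCKED(interface))
		return 0;	/* already locked by us */

	return -1;		/* held by another interface */
}
```

Because the test-and-set leaves an already locked word unchanged, a losing
contender never disturbs the owner ID stored by the winner.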
diff --git a/drivers/net/nfp/nfpcore/nfp_nffw.c b/drivers/net/nfp/nfpcore/nfp_nffw.c
new file mode 100644
index 0000000..b8e5331
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nffw.c
@@ -0,0 +1,261 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "nfp_cpp.h"
+#include "nfp_nffw.h"
+#include "nfp_mip.h"
+#include "nfp6000/nfp6000.h"
+#include "nfp_resource.h"
+
+/*
+ * flg_info_version = flags[0]<27:16>
+ * This is a small version counter intended only to detect if the current
+ * implementation can read the current struct. Struct changes should be very
+ * rare and as such a 12-bit counter should cover large spans of time. By the
+ * time it wraps around, we don't expect to have 4096 versions of this struct
+ * to be in use at the same time.
+ */
+static uint32_t
+nffw_res_info_version_get(const struct nfp_nffw_info_data *res)
+{
+	return (res->flags[0] >> 16) & 0xfff;
+}
+
+/* flg_init = flags[0]<0> */
+static uint32_t
+nffw_res_flg_init_get(const struct nfp_nffw_info_data *res)
+{
+	return (res->flags[0] >> 0) & 1;
+}
+
+/* loaded = loaded__mu_da__mip_off_hi<31:31> */
+static uint32_t
+nffw_fwinfo_loaded_get(const struct nffw_fwinfo *fi)
+{
+	return (fi->loaded__mu_da__mip_off_hi >> 31) & 1;
+}
+
+/* mip_cppid = mip_cppid */
+static uint32_t
+nffw_fwinfo_mip_cppid_get(const struct nffw_fwinfo *fi)
+{
+	return fi->mip_cppid;
+}
+
+/* mu_da = loaded__mu_da__mip_off_hi<8:8> */
+static uint32_t
+nffw_fwinfo_mip_mu_da_get(const struct nffw_fwinfo *fi)
+{
+	return (fi->loaded__mu_da__mip_off_hi >> 8) & 1;
+}
+
+/* mip_offset = (loaded__mu_da__mip_off_hi<7:0> << 32) | mip_offset_lo */
+static uint64_t
+nffw_fwinfo_mip_offset_get(const struct nffw_fwinfo *fi)
+{
+	uint64_t mip_off_hi = fi->loaded__mu_da__mip_off_hi;
+
+	return (mip_off_hi & 0xFF) << 32 | fi->mip_offset_lo;
+}
+
+#define NFP_IMB_TGTADDRESSMODECFG_MODE_of(_x)		(((_x) >> 13) & 0x7)
+#define NFP_IMB_TGTADDRESSMODECFG_ADDRMODE		BIT(12)
+#define   NFP_IMB_TGTADDRESSMODECFG_ADDRMODE_32_BIT	0
+#define   NFP_IMB_TGTADDRESSMODECFG_ADDRMODE_40_BIT	BIT(12)
+
+static int
+nfp_mip_mu_locality_lsb(struct nfp_cpp *cpp)
+{
+	unsigned int mode, addr40;
+	uint32_t xpbaddr, imbcppat;
+	int err;
+
+	/* Hardcoded XPB IMB Base, island 0 */
+	xpbaddr = 0x000a0000 + NFP_CPP_TARGET_MU * 4;
+	err = nfp_xpb_readl(cpp, xpbaddr, &imbcppat);
+	if (err < 0)
+		return err;
+
+	mode = NFP_IMB_TGTADDRESSMODECFG_MODE_of(imbcppat);
+	addr40 = !!(imbcppat & NFP_IMB_TGTADDRESSMODECFG_ADDRMODE);
+
+	return nfp_cppat_mu_locality_lsb(mode, addr40);
+}
+
+static unsigned int
+nffw_res_fwinfos(struct nfp_nffw_info_data *fwinf, struct nffw_fwinfo **arr)
+{
+	/*
+	 * For this code, version 0 is most likely to be version 1 in this
+	 * case. Since the kernel driver does not take responsibility for
+	 * initialising the nfp.nffw resource, any previous code (CA firmware or
+	 * userspace) that left the version 0 and did set the init flag is going
+	 * to be version 1.
+	 */
+	switch (nffw_res_info_version_get(fwinf)) {
+	case 0:
+	case 1:
+		*arr = &fwinf->info.v1.fwinfo[0];
+		return NFFW_FWINFO_CNT_V1;
+	case 2:
+		*arr = &fwinf->info.v2.fwinfo[0];
+		return NFFW_FWINFO_CNT_V2;
+	default:
+		*arr = NULL;
+		return 0;
+	}
+}
+
+/*
+ * nfp_nffw_info_open() - Acquire the lock on the NFFW table
+ * @cpp:	NFP CPP handle
+ *
+ * Return: pointer to the NFFW info state, or NULL on failure
+ */
+struct nfp_nffw_info *
+nfp_nffw_info_open(struct nfp_cpp *cpp)
+{
+	struct nfp_nffw_info_data *fwinf;
+	struct nfp_nffw_info *state;
+	uint32_t info_ver;
+	int err;
+
+	state = malloc(sizeof(*state));
+	if (!state)
+		return NULL;
+
+	memset(state, 0, sizeof(*state));
+
+	state->res = nfp_resource_acquire(cpp, NFP_RESOURCE_NFP_NFFW);
+	if (!state->res)
+		goto err_free;
+
+	fwinf = &state->fwinf;
+
+	if (sizeof(*fwinf) > nfp_resource_size(state->res))
+		goto err_release;
+
+	err = nfp_cpp_read(cpp, nfp_resource_cpp_id(state->res),
+			   nfp_resource_address(state->res),
+			   fwinf, sizeof(*fwinf));
+	if (err < (int)sizeof(*fwinf))
+		goto err_release;
+
+	if (!nffw_res_flg_init_get(fwinf))
+		goto err_release;
+
+	info_ver = nffw_res_info_version_get(fwinf);
+	if (info_ver > NFFW_INFO_VERSION_CURRENT)
+		goto err_release;
+
+	state->cpp = cpp;
+	return state;
+
+err_release:
+	nfp_resource_release(state->res);
+err_free:
+	free(state);
+	return NULL;
+}
+
+/*
+ * nfp_nffw_info_close() - Release the lock on the NFFW table
+ * @state:	NFP FW info state
+ *
+ * Frees @state; the handle must not be used after this call.
+ */
+void
+nfp_nffw_info_close(struct nfp_nffw_info *state)
+{
+	nfp_resource_release(state->res);
+	free(state);
+}
+
+/*
+ * nfp_nffw_info_fwid_first() - Return the first firmware ID in the NFFW
+ * @state:	NFP FW info state
+ *
+ * Return: First NFFW firmware info, NULL on failure
+ */
+static struct nffw_fwinfo *
+nfp_nffw_info_fwid_first(struct nfp_nffw_info *state)
+{
+	struct nffw_fwinfo *fwinfo;
+	unsigned int cnt, i;
+
+	cnt = nffw_res_fwinfos(&state->fwinf, &fwinfo);
+	if (!cnt)
+		return NULL;
+
+	for (i = 0; i < cnt; i++)
+		if (nffw_fwinfo_loaded_get(&fwinfo[i]))
+			return &fwinfo[i];
+
+	return NULL;
+}
+
+/*
+ * nfp_nffw_info_mip_first() - Retrieve the location of the first FW's MIP
+ * @state:	NFP FW info state
+ * @cpp_id:	Pointer to the CPP ID of the MIP
+ * @off:	Pointer to the CPP Address of the MIP
+ *
+ * Return: 0, or -ERRNO
+ */
+int
+nfp_nffw_info_mip_first(struct nfp_nffw_info *state, uint32_t *cpp_id,
+			uint64_t *off)
+{
+	struct nffw_fwinfo *fwinfo;
+
+	fwinfo = nfp_nffw_info_fwid_first(state);
+	if (!fwinfo)
+		return -EINVAL;
+
+	*cpp_id = nffw_fwinfo_mip_cppid_get(fwinfo);
+	*off = nffw_fwinfo_mip_offset_get(fwinfo);
+
+	if (nffw_fwinfo_mip_mu_da_get(fwinfo)) {
+		int locality_off;
+
+		if (NFP_CPP_ID_TARGET_of(*cpp_id) != NFP_CPP_TARGET_MU)
+			return 0;
+
+		locality_off = nfp_mip_mu_locality_lsb(state->cpp);
+		if (locality_off < 0)
+			return locality_off;
+
+		*off &= ~(NFP_MU_ADDR_ACCESS_TYPE_MASK << locality_off);
+		*off |= NFP_MU_ADDR_ACCESS_TYPE_DIRECT << locality_off;
+	}
+
+	return 0;
+}
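
The packed nffw_fwinfo word decoded by the getters above can be illustrated
standalone. This is a minimal sketch, assuming only the bit positions used by
the code in this patch (loaded in bit 31, mu_da in bit 8, the upper 8 bits of
the MIP offset in bits 7:0). The struct fwinfo_words and the fw_*() helpers
are hypothetical names for illustration, not part of the driver.

```c
#include <stdint.h>

/* Hypothetical mirror of struct nffw_fwinfo with shorter field names. */
struct fwinfo_words {
	uint32_t hi;	/* loaded__mu_da__mip_off_hi */
	uint32_t cppid;	/* mip_cppid */
	uint32_t lo;	/* mip_offset_lo */
};

/* loaded = hi<31:31> */
static uint32_t
fw_loaded(const struct fwinfo_words *fi)
{
	return (fi->hi >> 31) & 1;
}

/* mu_da = hi<8:8> */
static uint32_t
fw_mu_da(const struct fwinfo_words *fi)
{
	return (fi->hi >> 8) & 1;
}

/* mip_offset = (hi<7:0> << 32) | lo, giving a 40-bit offset */
static uint64_t
fw_mip_offset(const struct fwinfo_words *fi)
{
	uint64_t off_hi = fi->hi & 0xff;

	return (off_hi << 32) | fi->lo;
}
```

Widening the high byte to 64 bits before shifting matters: shifting a 32-bit
value left by 32 would be undefined behaviour in C.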
diff --git a/drivers/net/nfp/nfpcore/nfp_nffw.h b/drivers/net/nfp/nfpcore/nfp_nffw.h
new file mode 100644
index 0000000..926ec67
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nffw.h
@@ -0,0 +1,112 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_NFFW_H__
+#define __NFP_NFFW_H__
+
+#include "nfp-common/nfp_platform.h"
+#include "nfp_cpp.h"
+
+/*
+ * Init-CSR owner IDs for firmware map to firmware IDs which start at 4.
+ * Lower IDs are reserved for target and loader IDs.
+ */
+#define NFFW_FWID_EXT   3	/* For active MEs that we didn't load. */
+#define NFFW_FWID_BASE  4
+
+#define NFFW_FWID_ALL   255
+
+/* Init-CSR owner IDs for firmware map to firmware IDs which start at 4.
+ * Lower IDs are reserved for target and loader IDs.
+ */
+#define NFFW_FWID_EXT   3 /* For active MEs that we didn't load. */
+#define NFFW_FWID_BASE  4
+
+#define NFFW_FWID_ALL   255
+
+/**
+ * NFFW_INFO_VERSION history:
+ * 0: This was never actually used (before versioning), but it refers to
+ *    the previous struct which had FWINFO_CNT = MEINFO_CNT = 120 that later
+ *    changed to 200.
+ * 1: First versioned struct, with
+ *     FWINFO_CNT = 120
+ *     MEINFO_CNT = 120
+ * 2:  FWINFO_CNT = 200
+ *     MEINFO_CNT = 200
+ */
+#define NFFW_INFO_VERSION_CURRENT 2
+
+/* Enough for all current chip families */
+#define NFFW_MEINFO_CNT_V1 120
+#define NFFW_FWINFO_CNT_V1 120
+#define NFFW_MEINFO_CNT_V2 200
+#define NFFW_FWINFO_CNT_V2 200
+
+struct nffw_meinfo {
+	uint32_t ctxmask__fwid__meid;
+};
+
+struct nffw_fwinfo {
+	uint32_t loaded__mu_da__mip_off_hi;
+	uint32_t mip_cppid; /* 0 means no MIP */
+	uint32_t mip_offset_lo;
+};
+
+struct nfp_nffw_info_v1 {
+	struct nffw_meinfo meinfo[NFFW_MEINFO_CNT_V1];
+	struct nffw_fwinfo fwinfo[NFFW_FWINFO_CNT_V1];
+};
+
+struct nfp_nffw_info_v2 {
+	struct nffw_meinfo meinfo[NFFW_MEINFO_CNT_V2];
+	struct nffw_fwinfo fwinfo[NFFW_FWINFO_CNT_V2];
+};
+
+struct nfp_nffw_info_data {
+	uint32_t flags[2];
+	union {
+		struct nfp_nffw_info_v1 v1;
+		struct nfp_nffw_info_v2 v2;
+	} info;
+};
+
+struct nfp_nffw_info {
+	struct nfp_cpp *cpp;
+	struct nfp_resource *res;
+
+	struct nfp_nffw_info_data fwinf;
+};
+
+struct nfp_nffw_info *nfp_nffw_info_open(struct nfp_cpp *cpp);
+void nfp_nffw_info_close(struct nfp_nffw_info *state);
+
+#endif
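
The version history in this header and the flags[0] layout used by
nfp_nffw.c combine as in the sketch below. It is an illustration only: the
helper names are hypothetical, and the field positions (init flag in bit 0,
12-bit version in bits 27:16) and the entry counts are taken from the code
and defines in this patch.

```c
#include <stdint.h>

#define FWINFO_CNT_V1 120	/* same value as NFFW_FWINFO_CNT_V1 */
#define FWINFO_CNT_V2 200	/* same value as NFFW_FWINFO_CNT_V2 */

/* flg_info_version = flags[0]<27:16> */
static uint32_t
info_version(uint32_t flags0)
{
	return (flags0 >> 16) & 0xfff;
}

/* flg_init = flags[0]<0> */
static uint32_t
info_init(uint32_t flags0)
{
	return flags0 & 1;
}

/*
 * Number of fwinfo entries for the table version, or 0 if the version is
 * unknown. Version 0 is treated as version 1, as in nffw_res_fwinfos().
 */
static unsigned int
fwinfo_count(uint32_t flags0)
{
	switch (info_version(flags0)) {
	case 0:
	case 1:
		return FWINFO_CNT_V1;
	case 2:
		return FWINFO_CNT_V2;
	default:
		return 0;
	}
}
```

A caller would check info_init() first and reject the table when
fwinfo_count() returns 0, which mirrors the checks in nfp_nffw_info_open().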
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp.c b/drivers/net/nfp/nfpcore/nfp_nsp.c
new file mode 100644
index 0000000..6254f9a
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nsp.c
@@ -0,0 +1,453 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define NFP_SUBSYS "nfp_nsp"
+
+#include <stdio.h>
+#include <time.h>
+
+#include <rte_common.h>
+
+#include "nfp_cpp.h"
+#include "nfp_nsp.h"
+#include "nfp_resource.h"
+
+int
+nfp_nsp_config_modified(struct nfp_nsp *state)
+{
+	return state->modified;
+}
+
+void
+nfp_nsp_config_set_modified(struct nfp_nsp *state, int modified)
+{
+	state->modified = modified;
+}
+
+void *
+nfp_nsp_config_entries(struct nfp_nsp *state)
+{
+	return state->entries;
+}
+
+unsigned int
+nfp_nsp_config_idx(struct nfp_nsp *state)
+{
+	return state->idx;
+}
+
+void
+nfp_nsp_config_set_state(struct nfp_nsp *state, void *entries, unsigned int idx)
+{
+	state->entries = entries;
+	state->idx = idx;
+}
+
+void
+nfp_nsp_config_clear_state(struct nfp_nsp *state)
+{
+	state->entries = NULL;
+	state->idx = 0;
+}
+
+static void
+nfp_nsp_print_extended_error(uint32_t ret_val)
+{
+	int i;
+
+	if (!ret_val)
+		return;
+
+	for (i = 0; i < (int)ARRAY_SIZE(nsp_errors); i++)
+		if (ret_val == (uint32_t)nsp_errors[i].code)
+			printf("err msg: %s\n", nsp_errors[i].msg);
+}
+
+static int
+nfp_nsp_check(struct nfp_nsp *state)
+{
+	struct nfp_cpp *cpp = state->cpp;
+	uint64_t nsp_status, reg;
+	uint32_t nsp_cpp;
+	int err;
+
+	nsp_cpp = nfp_resource_cpp_id(state->res);
+	nsp_status = nfp_resource_address(state->res) + NSP_STATUS;
+
+	err = nfp_cpp_readq(cpp, nsp_cpp, nsp_status, &reg);
+	if (err < 0)
+		return err;
+
+	if (FIELD_GET(NSP_STATUS_MAGIC, reg) != NSP_MAGIC) {
+		printf("Cannot detect NFP Service Processor\n");
+		return -ENODEV;
+	}
+
+	state->ver.major = FIELD_GET(NSP_STATUS_MAJOR, reg);
+	state->ver.minor = FIELD_GET(NSP_STATUS_MINOR, reg);
+
+	if (state->ver.major != NSP_MAJOR || state->ver.minor < NSP_MINOR) {
+		printf("Unsupported ABI %hu.%hu\n", state->ver.major,
+						    state->ver.minor);
+		return -EINVAL;
+	}
+
+	if (reg & NSP_STATUS_BUSY) {
+		printf("Service processor busy!\n");
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+/*
+ * nfp_nsp_open() - Prepare for communication and lock the NSP resource.
+ * @cpp:	NFP CPP Handle
+ */
+struct nfp_nsp *
+nfp_nsp_open(struct nfp_cpp *cpp)
+{
+	struct nfp_resource *res;
+	struct nfp_nsp *state;
+	int err;
+
+	res = nfp_resource_acquire(cpp, NFP_RESOURCE_NSP);
+	if (!res)
+		return NULL;
+
+	state = malloc(sizeof(*state));
+	if (!state) {
+		nfp_resource_release(res);
+		return NULL;
+	}
+	memset(state, 0, sizeof(*state));
+	state->cpp = cpp;
+	state->res = res;
+
+	err = nfp_nsp_check(state);
+	if (err) {
+		nfp_nsp_close(state);
+		return NULL;
+	}
+
+	return state;
+}
+
+/*
+ * nfp_nsp_close() - Clean up and unlock the NSP resource.
+ * @state:	NFP SP state
+ */
+void
+nfp_nsp_close(struct nfp_nsp *state)
+{
+	nfp_resource_release(state->res);
+	free(state);
+}
+
+uint16_t
+nfp_nsp_get_abi_ver_major(struct nfp_nsp *state)
+{
+	return state->ver.major;
+}
+
+uint16_t
+nfp_nsp_get_abi_ver_minor(struct nfp_nsp *state)
+{
+	return state->ver.minor;
+}
+
+static int
+nfp_nsp_wait_reg(struct nfp_cpp *cpp, uint64_t *reg, uint32_t nsp_cpp,
+		 uint64_t addr, uint64_t mask, uint64_t val)
+{
+	struct timespec wait;
+	int count;
+	int err;
+
+	wait.tv_sec = 0;
+	wait.tv_nsec = 25000000;
+	count = 0;
+
+	for (;;) {
+		err = nfp_cpp_readq(cpp, nsp_cpp, addr, reg);
+		if (err < 0)
+			return err;
+
+		if ((*reg & mask) == val)
+			return 0;
+
+		nanosleep(&wait, 0);
+		if (count++ > 1000)
+			return -ETIMEDOUT;
+	}
+}
+
+/*
+ * nfp_nsp_command() - Execute a command on the NFP Service Processor
+ * @state:	NFP SP state
+ * @code:	NFP SP Command Code
+ * @option:	NFP SP Command Argument
+ * @buff_cpp:	NFP SP Buffer CPP Address info
+ * @buff_addr:	NFP SP Buffer Host address
+ *
+ * Return: 0 for success with no result
+ *
+ *	 positive value for NSP completion with a result code
+ *
+ *	-EAGAIN if the NSP is not yet present
+ *	-ENODEV if the NSP is not a supported model
+ *	-EBUSY if the NSP is stuck
+ *	-EINTR if interrupted while waiting for completion
+ *	-ETIMEDOUT if a wait exceeded its limit (about 1000 polls, 25 ms apart)
+ */
+static int
+nfp_nsp_command(struct nfp_nsp *state, uint16_t code, uint32_t option,
+		uint32_t buff_cpp, uint64_t buff_addr)
+{
+	uint64_t reg, ret_val, nsp_base, nsp_buffer, nsp_status, nsp_command;
+	struct nfp_cpp *cpp = state->cpp;
+	uint32_t nsp_cpp;
+	int err;
+
+	nsp_cpp = nfp_resource_cpp_id(state->res);
+	nsp_base = nfp_resource_address(state->res);
+	nsp_status = nsp_base + NSP_STATUS;
+	nsp_command = nsp_base + NSP_COMMAND;
+	nsp_buffer = nsp_base + NSP_BUFFER;
+
+	err = nfp_nsp_check(state);
+	if (err)
+		return err;
+
+	if (!FIELD_FIT(NSP_BUFFER_CPP, buff_cpp >> 8) ||
+	    !FIELD_FIT(NSP_BUFFER_ADDRESS, buff_addr)) {
+		printf("Host buffer out of reach %08x %" PRIx64 "\n",
+			buff_cpp, buff_addr);
+		return -EINVAL;
+	}
+
+	err = nfp_cpp_writeq(cpp, nsp_cpp, nsp_buffer,
+			     FIELD_PREP(NSP_BUFFER_CPP, buff_cpp >> 8) |
+			     FIELD_PREP(NSP_BUFFER_ADDRESS, buff_addr));
+	if (err < 0)
+		return err;
+
+	err = nfp_cpp_writeq(cpp, nsp_cpp, nsp_command,
+			     FIELD_PREP(NSP_COMMAND_OPTION, option) |
+			     FIELD_PREP(NSP_COMMAND_CODE, code) |
+			     FIELD_PREP(NSP_COMMAND_START, 1));
+	if (err < 0)
+		return err;
+
+	/* Wait for NSP_COMMAND_START to go to 0 */
+	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_command,
+			       NSP_COMMAND_START, 0);
+	if (err) {
+		printf("Error %d waiting for code 0x%04x to start\n",
+			err, code);
+		return err;
+	}
+
+	/* Wait for NSP_STATUS_BUSY to go to 0 */
+	err = nfp_nsp_wait_reg(cpp, &reg, nsp_cpp, nsp_status, NSP_STATUS_BUSY,
+			       0);
+	if (err) {
+		printf("Error %d waiting for code 0x%04x to complete\n",
+			err, code);
+		return err;
+	}
+
+	err = nfp_cpp_readq(cpp, nsp_cpp, nsp_command, &ret_val);
+	if (err < 0)
+		return err;
+	ret_val = FIELD_GET(NSP_COMMAND_OPTION, ret_val);
+
+	err = FIELD_GET(NSP_STATUS_RESULT, reg);
+	if (err) {
+		printf("Result (error) code set: %d (%d) command: %d\n",
+			 -err, (int)ret_val, code);
+		nfp_nsp_print_extended_error(ret_val);
+		return -err;
+	}
+
+	return ret_val;
+}
+
+#define SZ_1M 0x00100000
+
+static int
+nfp_nsp_command_buf(struct nfp_nsp *nsp, uint16_t code, uint32_t option,
+		    const void *in_buf, unsigned int in_size, void *out_buf,
+		    unsigned int out_size)
+{
+	struct nfp_cpp *cpp = nsp->cpp;
+	unsigned int max_size;
+	uint64_t reg, cpp_buf;
+	int ret, err;
+	uint32_t cpp_id;
+
+	if (nsp->ver.minor < 13) {
+		printf("NSP: Code 0x%04x with buffer not supported\n", code);
+		printf("\t(ABI %hu.%hu)\n", nsp->ver.major, nsp->ver.minor);
+		return -EOPNOTSUPP;
+	}
+
+	err = nfp_cpp_readq(cpp, nfp_resource_cpp_id(nsp->res),
+			    nfp_resource_address(nsp->res) +
+			    NSP_DFLT_BUFFER_CONFIG,
+			    &reg);
+	if (err < 0)
+		return err;
+
+	max_size = RTE_MAX(in_size, out_size);
+	if (FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M < max_size) {
+		printf("NSP: default buffer too small for command 0x%04x\n",
+		       code);
+		printf("\t(%llu < %u)\n",
+		       FIELD_GET(NSP_DFLT_BUFFER_SIZE_MB, reg) * SZ_1M,
+		       max_size);
+		return -EINVAL;
+	}
+
+	err = nfp_cpp_readq(cpp, nfp_resource_cpp_id(nsp->res),
+			    nfp_resource_address(nsp->res) +
+			    NSP_DFLT_BUFFER,
+			    &reg);
+	if (err < 0)
+		return err;
+
+	cpp_id = FIELD_GET(NSP_BUFFER_CPP, reg) << 8;
+	cpp_buf = FIELD_GET(NSP_BUFFER_ADDRESS, reg);
+
+	if (in_buf && in_size) {
+		err = nfp_cpp_write(cpp, cpp_id, cpp_buf, in_buf, in_size);
+		if (err < 0)
+			return err;
+	}
+	/* Zero out remaining part of the buffer */
+	if (out_buf && out_size && out_size > in_size) {
+		memset(out_buf, 0, out_size - in_size);
+		err = nfp_cpp_write(cpp, cpp_id, cpp_buf + in_size, out_buf,
+				    out_size - in_size);
+		if (err < 0)
+			return err;
+	}
+
+	ret = nfp_nsp_command(nsp, code, option, cpp_id, cpp_buf);
+	if (ret < 0)
+		return ret;
+
+	if (out_buf && out_size) {
+		err = nfp_cpp_read(cpp, cpp_id, cpp_buf, out_buf, out_size);
+		if (err < 0)
+			return err;
+	}
+
+	return ret;
+}
+
+int
+nfp_nsp_wait(struct nfp_nsp *state)
+{
+	struct timespec wait;
+	int count;
+	int err;
+
+	wait.tv_sec = 0;
+	wait.tv_nsec = 25000000;
+	count = 0;
+
+	for (;;) {
+		err = nfp_nsp_command(state, SPCODE_NOOP, 0, 0, 0);
+		if (err != -EAGAIN)
+			break;
+
+		nanosleep(&wait, 0);
+
+		if (count++ > 1000) {
+			err = -ETIMEDOUT;
+			break;
+		}
+	}
+	if (err)
+		printf("NSP failed to respond %d\n", err);
+
+	return err;
+}
+
+int
+nfp_nsp_device_soft_reset(struct nfp_nsp *state)
+{
+	return nfp_nsp_command(state, SPCODE_SOFT_RESET, 0, 0, 0);
+}
+
+int
+nfp_nsp_mac_reinit(struct nfp_nsp *state)
+{
+	return nfp_nsp_command(state, SPCODE_MAC_INIT, 0, 0, 0);
+}
+
+int
+nfp_nsp_load_fw(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_FW_LOAD, size, buf, size,
+				   NULL, 0);
+}
+
+int
+nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_ETH_RESCAN, size, NULL, 0,
+				   buf, size);
+}
+
+int
+nfp_nsp_write_eth_table(struct nfp_nsp *state, const void *buf,
+			unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_ETH_CONTROL, size, buf, size,
+				   NULL, 0);
+}
+
+int
+nfp_nsp_read_identify(struct nfp_nsp *state, void *buf, unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_NSP_IDENTIFY, size, NULL, 0,
+				   buf, size);
+}
+
+int
+nfp_nsp_read_sensors(struct nfp_nsp *state, unsigned int sensor_mask, void *buf,
+		     unsigned int size)
+{
+	return nfp_nsp_command_buf(state, SPCODE_NSP_SENSORS, sensor_mask, NULL,
+				   0, buf, size);
+}
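Note for readers: nfp_nsp_wait_reg() and nfp_nsp_wait() above share one pattern: read, test, sleep for a fixed interval, and give up after a bounded number of attempts. The following is a minimal standalone sketch of that pattern, not code from the patch. The names poll_reg(), reg_read_fn, struct fake_reg and fake_reg_read() are hypothetical, and the loop bound is written as an explicit max_polls argument rather than the patch's count++ > 1000 test.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() under strict ISO C modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical register reader: 0 on success, negative errno on failure. */
typedef int (*reg_read_fn)(void *ctx, uint64_t *out);

/*
 * Poll until (*reg & mask) == val, sleeping interval_ns between reads.
 * Returns 0 on match, a negative read error unchanged, or -ETIMEDOUT
 * after max_polls reads without a match.
 */
static int
poll_reg(reg_read_fn rd, void *ctx, uint64_t *reg, uint64_t mask,
	 uint64_t val, long interval_ns, unsigned int max_polls)
{
	struct timespec wait = { .tv_sec = 0, .tv_nsec = interval_ns };
	unsigned int polls;
	int err;

	assert(rd && reg);

	for (polls = 0; polls < max_polls; polls++) {
		err = rd(ctx, reg);
		if (err < 0)
			return err;

		if ((*reg & mask) == val)
			return 0;

		nanosleep(&wait, NULL);
	}

	return -ETIMEDOUT;
}

/* Toy register source: reports bit 0 set for the first busy_reads reads. */
struct fake_reg {
	unsigned int busy_reads;
};

static int
fake_reg_read(void *ctx, uint64_t *out)
{
	struct fake_reg *f = ctx;

	if (f->busy_reads > 0) {
		f->busy_reads--;
		*out = 1;
	} else {
		*out = 0;
	}
	return 0;
}
```

With interval_ns = 25000000 and max_polls = 1000 this gives roughly the 25 second bound used in the patch. Passing the bound as a parameter makes the worst-case wait easy to read off at the call site.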
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp.h b/drivers/net/nfp/nfpcore/nfp_nsp.h
new file mode 100644
index 0000000..37c861a
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nsp.h
@@ -0,0 +1,330 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef NSP_NSP_H
+#define NSP_NSP_H 1
+
+#include "nfp_cpp.h"
+#include "nfp_nsp.h"
+
+#define GENMASK_ULL(h, l) \
+	(((~0ULL) - (1ULL << (l)) + 1) & \
+	 (~0ULL >> (64 - 1 - (h))))
+
+#define __bf_shf(x) (__builtin_ffsll(x) - 1)
+
+#define FIELD_GET(_mask, _reg)	\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		(typeof(_x))(((_reg) & (_x)) >> __bf_shf(_x));	\
+	}))
+
+#define FIELD_FIT(_mask, _val)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		!((((typeof(_x))_val) << __bf_shf(_x)) & ~(_x)); \
+	}))
+
+#define FIELD_PREP(_mask, _val)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		((typeof(_x))(_val) << __bf_shf(_x)) & (_x);	\
+	}))
+
+/* Offsets relative to the CSR base */
+#define NSP_STATUS		0x00
+#define   NSP_STATUS_MAGIC	GENMASK_ULL(63, 48)
+#define   NSP_STATUS_MAJOR	GENMASK_ULL(47, 44)
+#define   NSP_STATUS_MINOR	GENMASK_ULL(43, 32)
+#define   NSP_STATUS_CODE	GENMASK_ULL(31, 16)
+#define   NSP_STATUS_RESULT	GENMASK_ULL(15, 8)
+#define   NSP_STATUS_BUSY	BIT_ULL(0)
+
+#define NSP_COMMAND		0x08
+#define   NSP_COMMAND_OPTION	GENMASK_ULL(63, 32)
+#define   NSP_COMMAND_CODE	GENMASK_ULL(31, 16)
+#define   NSP_COMMAND_START	BIT_ULL(0)
+
+/* CPP address to retrieve the data from */
+#define NSP_BUFFER		0x10
+#define   NSP_BUFFER_CPP	GENMASK_ULL(63, 40)
+#define   NSP_BUFFER_PCIE	GENMASK_ULL(39, 38)
+#define   NSP_BUFFER_ADDRESS	GENMASK_ULL(37, 0)
+
+#define NSP_DFLT_BUFFER		0x18
+
+#define NSP_DFLT_BUFFER_CONFIG	0x20
+#define   NSP_DFLT_BUFFER_SIZE_MB	GENMASK_ULL(7, 0)
+
+#define NSP_MAGIC		0xab10
+#define NSP_MAJOR		0
+#define NSP_MINOR		8
+
+#define NSP_CODE_MAJOR		GENMASK(15, 12)
+#define NSP_CODE_MINOR		GENMASK(11, 0)
+
+enum nfp_nsp_cmd {
+	SPCODE_NOOP		= 0, /* No operation */
+	SPCODE_SOFT_RESET	= 1, /* Soft reset the NFP */
+	SPCODE_FW_DEFAULT	= 2, /* Load default (UNDI) FW */
+	SPCODE_PHY_INIT		= 3, /* Initialize the PHY */
+	SPCODE_MAC_INIT		= 4, /* Initialize the MAC */
+	SPCODE_PHY_RXADAPT	= 5, /* Re-run PHY RX Adaptation */
+	SPCODE_FW_LOAD		= 6, /* Load fw from buffer, len in option */
+	SPCODE_ETH_RESCAN	= 7, /* Rescan ETHs, write ETH_TABLE to buf */
+	SPCODE_ETH_CONTROL	= 8, /* Update media config from buffer */
+	SPCODE_NSP_SENSORS	= 12, /* Read NSP sensor(s) */
+	SPCODE_NSP_IDENTIFY	= 13, /* Read NSP version */
+};
+
+static const struct {
+	int code;
+	const char *msg;
+} nsp_errors[] = {
+	{ 6010, "could not map to phy for port" },
+	{ 6011, "not an allowed rate/lanes for port" },
+	{ 6012, "not an allowed rate/lanes for port" },
+	{ 6013, "high/low error, change other port first" },
+	{ 6014, "config not found in flash" },
+};
+
+struct nfp_nsp {
+	struct nfp_cpp *cpp;
+	struct nfp_resource *res;
+	struct {
+		uint16_t major;
+		uint16_t minor;
+	} ver;
+
+	/* Eth table config state */
+	int modified;
+	unsigned int idx;
+	void *entries;
+};
+
+struct nfp_nsp *nfp_nsp_open(struct nfp_cpp *cpp);
+void nfp_nsp_close(struct nfp_nsp *state);
+uint16_t nfp_nsp_get_abi_ver_major(struct nfp_nsp *state);
+uint16_t nfp_nsp_get_abi_ver_minor(struct nfp_nsp *state);
+int nfp_nsp_wait(struct nfp_nsp *state);
+int nfp_nsp_device_soft_reset(struct nfp_nsp *state);
+int nfp_nsp_load_fw(struct nfp_nsp *state, void *buf, unsigned int size);
+int nfp_nsp_mac_reinit(struct nfp_nsp *state);
+int nfp_nsp_read_identify(struct nfp_nsp *state, void *buf, unsigned int size);
+int nfp_nsp_read_sensors(struct nfp_nsp *state, unsigned int sensor_mask,
+			 void *buf, unsigned int size);
+
+static inline int nfp_nsp_has_mac_reinit(struct nfp_nsp *state)
+{
+	return nfp_nsp_get_abi_ver_minor(state) > 20;
+}
+
+enum nfp_eth_interface {
+	NFP_INTERFACE_NONE	= 0,
+	NFP_INTERFACE_SFP	= 1,
+	NFP_INTERFACE_SFPP	= 10,
+	NFP_INTERFACE_SFP28	= 28,
+	NFP_INTERFACE_QSFP	= 40,
+	NFP_INTERFACE_CXP	= 100,
+	NFP_INTERFACE_QSFP28	= 112,
+};
+
+enum nfp_eth_media {
+	NFP_MEDIA_DAC_PASSIVE = 0,
+	NFP_MEDIA_DAC_ACTIVE,
+	NFP_MEDIA_FIBRE,
+};
+
+enum nfp_eth_aneg {
+	NFP_ANEG_AUTO = 0,
+	NFP_ANEG_SEARCH,
+	NFP_ANEG_25G_CONSORTIUM,
+	NFP_ANEG_25G_IEEE,
+	NFP_ANEG_DISABLED,
+};
+
+enum nfp_eth_fec {
+	NFP_FEC_AUTO_BIT = 0,
+	NFP_FEC_BASER_BIT,
+	NFP_FEC_REED_SOLOMON_BIT,
+	NFP_FEC_DISABLED_BIT,
+};
+
+#define NFP_FEC_AUTO		BIT(NFP_FEC_AUTO_BIT)
+#define NFP_FEC_BASER		BIT(NFP_FEC_BASER_BIT)
+#define NFP_FEC_REED_SOLOMON	BIT(NFP_FEC_REED_SOLOMON_BIT)
+#define NFP_FEC_DISABLED	BIT(NFP_FEC_DISABLED_BIT)
+
+#define ETH_ALEN	6
+
+/**
+ * struct nfp_eth_table - ETH table information
+ * @count:	number of table entries
+ * @max_index:	max of @index fields of all @ports
+ * @ports:	table of ports
+ *
+ * @eth_index:	port index according to legacy ethX numbering
+ * @index:	chip-wide first channel index
+ * @nbi:	NBI index
+ * @base:	first channel index (within NBI)
+ * @lanes:	number of channels
+ * @speed:	interface speed (in Mbps)
+ * @interface:	interface (module) plugged in
+ * @media:	media type of the @interface
+ * @fec:	forward error correction mode
+ * @aneg:	auto negotiation mode
+ * @mac_addr:	interface MAC address
+ * @label_port:	port id
+ * @label_subport:  id of interface within port (for split ports)
+ * @enabled:	is enabled?
+ * @tx_enabled:	is TX enabled?
+ * @rx_enabled:	is RX enabled?
+ * @override_changed: is media reconfig pending?
+ *
+ * @port_type:	one of %PORT_* defines for ethtool
+ * @port_lanes:	total number of lanes on the port (sum of lanes of all subports)
+ * @is_split:	is interface part of a split port
+ * @fec_modes_supported:	bitmap of FEC modes supported
+ */
+struct nfp_eth_table {
+	unsigned int count;
+	unsigned int max_index;
+	struct nfp_eth_table_port {
+		unsigned int eth_index;
+		unsigned int index;
+		unsigned int nbi;
+		unsigned int base;
+		unsigned int lanes;
+		unsigned int speed;
+
+		unsigned int interface;
+		enum nfp_eth_media media;
+
+		enum nfp_eth_fec fec;
+		enum nfp_eth_aneg aneg;
+
+		uint8_t mac_addr[ETH_ALEN];
+
+		uint8_t label_port;
+		uint8_t label_subport;
+
+		int enabled;
+		int tx_enabled;
+		int rx_enabled;
+
+		int override_changed;
+
+		/* Computed fields */
+		uint8_t port_type;
+
+		unsigned int port_lanes;
+
+		int is_split;
+
+		unsigned int fec_modes_supported;
+	} ports[0];
+};
+
+struct nfp_eth_table *nfp_eth_read_ports(struct nfp_cpp *cpp);
+
+int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, int enable);
+int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx,
+			   int configed);
+int
+nfp_eth_set_fec(struct nfp_cpp *cpp, unsigned int idx, enum nfp_eth_fec mode);
+
+int nfp_nsp_read_eth_table(struct nfp_nsp *state, void *buf, unsigned int size);
+int nfp_nsp_write_eth_table(struct nfp_nsp *state, const void *buf,
+			    unsigned int size);
+void nfp_nsp_config_set_state(struct nfp_nsp *state, void *entries,
+			      unsigned int idx);
+void nfp_nsp_config_clear_state(struct nfp_nsp *state);
+void nfp_nsp_config_set_modified(struct nfp_nsp *state, int modified);
+void *nfp_nsp_config_entries(struct nfp_nsp *state);
+int nfp_nsp_config_modified(struct nfp_nsp *state);
+unsigned int nfp_nsp_config_idx(struct nfp_nsp *state);
+
+static inline int nfp_eth_can_support_fec(struct nfp_eth_table_port *eth_port)
+{
+	return !!eth_port->fec_modes_supported;
+}
+
+static inline unsigned int
+nfp_eth_supported_fec_modes(struct nfp_eth_table_port *eth_port)
+{
+	return eth_port->fec_modes_supported;
+}
+
+struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx);
+int nfp_eth_config_commit_end(struct nfp_nsp *nsp);
+void nfp_eth_config_cleanup_end(struct nfp_nsp *nsp);
+
+int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode);
+int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed);
+int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes);
+
+/**
+ * struct nfp_nsp_identify - NSP static information
+ * @version:      opaque version string
+ * @flags:        version flags
+ * @br_primary:   branch id of primary bootloader
+ * @br_secondary: branch id of secondary bootloader
+ * @br_nsp:       branch id of NSP
+ * @primary:      version id of primary bootloader
+ * @secondary:    version id of secondary bootloader
+ * @nsp:          version id of NSP
+ * @sensor_mask:  mask of present sensors available on NIC
+ */
+struct nfp_nsp_identify {
+	char version[40];
+	uint8_t flags;
+	uint8_t br_primary;
+	uint8_t br_secondary;
+	uint8_t br_nsp;
+	uint16_t primary;
+	uint16_t secondary;
+	uint16_t nsp;
+	uint64_t sensor_mask;
+};
+
+struct nfp_nsp_identify *__nfp_nsp_identify(struct nfp_nsp *nsp);
+
+enum nfp_nsp_sensor_id {
+	NFP_SENSOR_CHIP_TEMPERATURE,
+	NFP_SENSOR_ASSEMBLY_POWER,
+	NFP_SENSOR_ASSEMBLY_12V_POWER,
+	NFP_SENSOR_ASSEMBLY_3V3_POWER,
+};
+
+int nfp_hwmon_read_sensor(struct nfp_cpp *cpp, enum nfp_nsp_sensor_id id,
+			  long *val);
+
+#endif
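Note for readers: the GENMASK_ULL/FIELD_GET/FIELD_PREP/FIELD_FIT macros in this header describe register fields by mask alone and derive the shift from the lowest set bit. The sketch below restates the same idea as plain functions so it can be tried outside the driver. It is an illustration, not the patch's code: mask64(), mask_shift(), field_get(), field_prep() and field_fit() are hypothetical names, and field_fit() uses a round-trip check instead of the macro's mask test.

```c
#include <assert.h>
#include <stdint.h>

/* Contiguous 64-bit mask covering bits l..h inclusive (0 <= l <= h <= 63). */
static uint64_t
mask64(unsigned int h, unsigned int l)
{
	return (~0ULL << l) & (~0ULL >> (63 - h));
}

/* Position of the lowest set bit of a non-zero mask. */
static unsigned int
mask_shift(uint64_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/* Extract the field selected by mask from reg, shifted down to bit 0. */
static uint64_t
field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> mask_shift(mask);
}

/* Place val into the field selected by mask; excess bits are dropped. */
static uint64_t
field_prep(uint64_t mask, uint64_t val)
{
	return (val << mask_shift(mask)) & mask;
}

/* True when val survives a prep/get round trip, i.e. it fits the field. */
static int
field_fit(uint64_t mask, uint64_t val)
{
	return field_get(mask, field_prep(mask, val)) == val;
}
```

For example, with the NSP_STATUS layout from the header (magic in bits 63:48, major in 47:44, minor in 43:32), preparing magic 0xab10, major 0 and minor 21 and OR-ing the results yields 0xAB10001500000000, and field_get() with the minor mask returns 21 again.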
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c b/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
new file mode 100644
index 0000000..d9d6013
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nsp_cmds.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <rte_byteorder.h>
+#include "nfp_cpp.h"
+#include "nfp_nsp.h"
+#include "nfp_nffw.h"
+
+struct nsp_identify {
+	uint8_t version[40];
+	uint8_t flags;
+	uint8_t br_primary;
+	uint8_t br_secondary;
+	uint8_t br_nsp;
+	uint16_t primary;
+	uint16_t secondary;
+	uint16_t nsp;
+	uint8_t reserved[6];
+	uint64_t sensor_mask;
+};
+
+struct nfp_nsp_identify *
+__nfp_nsp_identify(struct nfp_nsp *nsp)
+{
+	struct nfp_nsp_identify *nspi = NULL;
+	struct nsp_identify *ni;
+	int ret;
+
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 15)
+		return NULL;
+
+	ni = malloc(sizeof(*ni));
+	if (!ni)
+		return NULL;
+
+	memset(ni, 0, sizeof(*ni));
+	ret = nfp_nsp_read_identify(nsp, ni, sizeof(*ni));
+	if (ret < 0) {
+		printf("reading bsp version failed %d\n",
+			ret);
+		goto exit_free;
+	}
+
+	nspi = malloc(sizeof(*nspi));
+	if (!nspi)
+		goto exit_free;
+
+	memset(nspi, 0, sizeof(*nspi));
+	memcpy(nspi->version, ni->version, sizeof(nspi->version));
+	nspi->version[sizeof(nspi->version) - 1] = '\0';
+	nspi->flags = ni->flags;
+	nspi->br_primary = ni->br_primary;
+	nspi->br_secondary = ni->br_secondary;
+	nspi->br_nsp = ni->br_nsp;
+	nspi->primary = rte_le_to_cpu_16(ni->primary);
+	nspi->secondary = rte_le_to_cpu_16(ni->secondary);
+	nspi->nsp = rte_le_to_cpu_16(ni->nsp);
+	nspi->sensor_mask = rte_le_to_cpu_64(ni->sensor_mask);
+
+exit_free:
+	free(ni);
+	return nspi;
+}
+
+struct nfp_sensors {
+	uint32_t chip_temp;
+	uint32_t assembly_power;
+	uint32_t assembly_12v_power;
+	uint32_t assembly_3v3_power;
+};
+
+int
+nfp_hwmon_read_sensor(struct nfp_cpp *cpp, enum nfp_nsp_sensor_id id, long *val)
+{
+	struct nfp_sensors s;
+	struct nfp_nsp *nsp;
+	int ret;
+
+	nsp = nfp_nsp_open(cpp);
+	if (!nsp)
+		return -EIO;
+
+	ret = nfp_nsp_read_sensors(nsp, BIT(id), &s, sizeof(s));
+	nfp_nsp_close(nsp);
+
+	if (ret < 0)
+		return ret;
+
+	switch (id) {
+	case NFP_SENSOR_CHIP_TEMPERATURE:
+		*val = rte_le_to_cpu_32(s.chip_temp);
+		break;
+	case NFP_SENSOR_ASSEMBLY_POWER:
+		*val = rte_le_to_cpu_32(s.assembly_power);
+		break;
+	case NFP_SENSOR_ASSEMBLY_12V_POWER:
+		*val = rte_le_to_cpu_32(s.assembly_12v_power);
+		break;
+	case NFP_SENSOR_ASSEMBLY_3V3_POWER:
+		*val = rte_le_to_cpu_32(s.assembly_3v3_power);
+		break;
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
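Note for readers: nfp_hwmon_read_sensor() above treats the NSP reply as consecutive little-endian 32-bit values and picks one by sensor id. A minimal sketch of that decoding step without DPDK's rte_le_to_cpu_32() follows. The names le32_decode() and sensor_slot_read() are hypothetical, and the assumption that slot order equals enum order is taken from struct nfp_sensors in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian 32-bit value from four bytes, on any host. */
static uint32_t
le32_decode(const uint8_t *p)
{
	assert(p != NULL);
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Pick the id-th 32-bit little-endian slot out of a raw reply buffer.
 * Returns 0 and stores the value in *val, or -1 if the slot lies
 * outside the buffer.
 */
static int
sensor_slot_read(const uint8_t *buf, size_t len, unsigned int id, long *val)
{
	size_t off = (size_t)id * sizeof(uint32_t);

	if (off + sizeof(uint32_t) > len)
		return -1;

	*val = (long)le32_decode(buf + off);
	return 0;
}
```

Decoding byte by byte avoids any dependence on host endianness or struct padding. The bounds check plays the role of the switch statement's default case in the patch.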
diff --git a/drivers/net/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
new file mode 100644
index 0000000..409578c
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_nsp_eth.c
@@ -0,0 +1,691 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include "nfp_cpp.h"
+#include "nfp_nsp.h"
+#include "nfp6000/nfp6000.h"
+
+#define GENMASK_ULL(h, l) \
+	(((~0ULL) - (1ULL << (l)) + 1) & \
+	 (~0ULL >> (64 - 1 - (h))))
+
+#define __bf_shf(x) (__builtin_ffsll(x) - 1)
+
+#define FIELD_GET(_mask, _reg)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		(typeof(_x))(((_reg) & (_x)) >> __bf_shf(_x));	\
+	}))
+
+#define FIELD_FIT(_mask, _val)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		!((((typeof(_x))_val) << __bf_shf(_x)) & ~(_x)); \
+	}))
+
+#define FIELD_PREP(_mask, _val)						\
+	(__extension__ ({ \
+		typeof(_mask) _x = (_mask); \
+		((typeof(_x))(_val) << __bf_shf(_x)) & (_x);	\
+	}))
+
+#define NSP_ETH_NBI_PORT_COUNT		24
+#define NSP_ETH_MAX_COUNT		(2 * NSP_ETH_NBI_PORT_COUNT)
+#define NSP_ETH_TABLE_SIZE		(NSP_ETH_MAX_COUNT *		\
+					 sizeof(union eth_table_entry))
+
+#define NSP_ETH_PORT_LANES		GENMASK_ULL(3, 0)
+#define NSP_ETH_PORT_INDEX		GENMASK_ULL(15, 8)
+#define NSP_ETH_PORT_LABEL		GENMASK_ULL(53, 48)
+#define NSP_ETH_PORT_PHYLABEL		GENMASK_ULL(59, 54)
+#define NSP_ETH_PORT_FEC_SUPP_BASER	BIT_ULL(60)
+#define NSP_ETH_PORT_FEC_SUPP_RS	BIT_ULL(61)
+
+#define NSP_ETH_PORT_LANES_MASK		rte_cpu_to_le_64(NSP_ETH_PORT_LANES)
+
+#define NSP_ETH_STATE_CONFIGURED	BIT_ULL(0)
+#define NSP_ETH_STATE_ENABLED		BIT_ULL(1)
+#define NSP_ETH_STATE_TX_ENABLED	BIT_ULL(2)
+#define NSP_ETH_STATE_RX_ENABLED	BIT_ULL(3)
+#define NSP_ETH_STATE_RATE		GENMASK_ULL(11, 8)
+#define NSP_ETH_STATE_INTERFACE		GENMASK_ULL(19, 12)
+#define NSP_ETH_STATE_MEDIA		GENMASK_ULL(21, 20)
+#define NSP_ETH_STATE_OVRD_CHNG		BIT_ULL(22)
+#define NSP_ETH_STATE_ANEG		GENMASK_ULL(25, 23)
+#define NSP_ETH_STATE_FEC		GENMASK_ULL(27, 26)
+
+#define NSP_ETH_CTRL_CONFIGURED		BIT_ULL(0)
+#define NSP_ETH_CTRL_ENABLED		BIT_ULL(1)
+#define NSP_ETH_CTRL_TX_ENABLED		BIT_ULL(2)
+#define NSP_ETH_CTRL_RX_ENABLED		BIT_ULL(3)
+#define NSP_ETH_CTRL_SET_RATE		BIT_ULL(4)
+#define NSP_ETH_CTRL_SET_LANES		BIT_ULL(5)
+#define NSP_ETH_CTRL_SET_ANEG		BIT_ULL(6)
+#define NSP_ETH_CTRL_SET_FEC		BIT_ULL(7)
+
+/* Which connector port. */
+#define PORT_TP			0x00
+#define PORT_AUI		0x01
+#define PORT_MII		0x02
+#define PORT_FIBRE		0x03
+#define PORT_BNC		0x04
+#define PORT_DA			0x05
+#define PORT_NONE		0xef
+#define PORT_OTHER		0xff
+
+#define SPEED_10		10
+#define SPEED_100		100
+#define SPEED_1000		1000
+#define SPEED_2500		2500
+#define SPEED_5000		5000
+#define SPEED_10000		10000
+#define SPEED_14000		14000
+#define SPEED_20000		20000
+#define SPEED_25000		25000
+#define SPEED_40000		40000
+#define SPEED_50000		50000
+#define SPEED_56000		56000
+#define SPEED_100000		100000
+
+enum nfp_eth_raw {
+	NSP_ETH_RAW_PORT = 0,
+	NSP_ETH_RAW_STATE,
+	NSP_ETH_RAW_MAC,
+	NSP_ETH_RAW_CONTROL,
+
+	NSP_ETH_NUM_RAW
+};
+
+enum nfp_eth_rate {
+	RATE_INVALID = 0,
+	RATE_10M,
+	RATE_100M,
+	RATE_1G,
+	RATE_10G,
+	RATE_25G,
+};
+
+union eth_table_entry {
+	struct {
+		uint64_t port;
+		uint64_t state;
+		uint8_t mac_addr[6];
+		uint8_t resv[2];
+		uint64_t control;
+	};
+	uint64_t raw[NSP_ETH_NUM_RAW];
+};
+
+static const struct {
+	enum nfp_eth_rate rate;
+	unsigned int speed;
+} nsp_eth_rate_tbl[] = {
+	{ RATE_INVALID,	0, },
+	{ RATE_10M,	SPEED_10, },
+	{ RATE_100M,	SPEED_100, },
+	{ RATE_1G,	SPEED_1000, },
+	{ RATE_10G,	SPEED_10000, },
+	{ RATE_25G,	SPEED_25000, },
+};
+
+static unsigned int
+nfp_eth_rate2speed(enum nfp_eth_rate rate)
+{
+	int i;
+
+	for (i = 0; i < (int)ARRAY_SIZE(nsp_eth_rate_tbl); i++)
+		if (nsp_eth_rate_tbl[i].rate == rate)
+			return nsp_eth_rate_tbl[i].speed;
+
+	return 0;
+}
+
+static unsigned int
+nfp_eth_speed2rate(unsigned int speed)
+{
+	int i;
+
+	for (i = 0; i < (int)ARRAY_SIZE(nsp_eth_rate_tbl); i++)
+		if (nsp_eth_rate_tbl[i].speed == speed)
+			return nsp_eth_rate_tbl[i].rate;
+
+	return RATE_INVALID;
+}
+
+static void
+nfp_eth_copy_mac_reverse(uint8_t *dst, const uint8_t *src)
+{
+	int i;
+
+	for (i = 0; i < (int)ETH_ALEN; i++)
+		dst[ETH_ALEN - i - 1] = src[i];
+}
+
+static void
+nfp_eth_port_translate(struct nfp_nsp *nsp, const union eth_table_entry *src,
+		       unsigned int index, struct nfp_eth_table_port *dst)
+{
+	unsigned int rate;
+	unsigned int fec;
+	uint64_t port, state;
+
+	port = rte_le_to_cpu_64(src->port);
+	state = rte_le_to_cpu_64(src->state);
+
+	dst->eth_index = FIELD_GET(NSP_ETH_PORT_INDEX, port);
+	dst->index = index;
+	dst->nbi = index / NSP_ETH_NBI_PORT_COUNT;
+	dst->base = index % NSP_ETH_NBI_PORT_COUNT;
+	dst->lanes = FIELD_GET(NSP_ETH_PORT_LANES, port);
+
+	dst->enabled = FIELD_GET(NSP_ETH_STATE_ENABLED, state);
+	dst->tx_enabled = FIELD_GET(NSP_ETH_STATE_TX_ENABLED, state);
+	dst->rx_enabled = FIELD_GET(NSP_ETH_STATE_RX_ENABLED, state);
+
+	rate = nfp_eth_rate2speed(FIELD_GET(NSP_ETH_STATE_RATE, state));
+	dst->speed = dst->lanes * rate;
+
+	dst->interface = FIELD_GET(NSP_ETH_STATE_INTERFACE, state);
+	dst->media = FIELD_GET(NSP_ETH_STATE_MEDIA, state);
+
+	nfp_eth_copy_mac_reverse(dst->mac_addr, src->mac_addr);
+
+	dst->label_port = FIELD_GET(NSP_ETH_PORT_PHYLABEL, port);
+	dst->label_subport = FIELD_GET(NSP_ETH_PORT_LABEL, port);
+
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 17)
+		return;
+
+	dst->override_changed = FIELD_GET(NSP_ETH_STATE_OVRD_CHNG, state);
+	dst->aneg = FIELD_GET(NSP_ETH_STATE_ANEG, state);
+
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 22)
+		return;
+
+	fec = FIELD_GET(NSP_ETH_PORT_FEC_SUPP_BASER, port);
+	dst->fec_modes_supported |= fec << NFP_FEC_BASER_BIT;
+	fec = FIELD_GET(NSP_ETH_PORT_FEC_SUPP_RS, port);
+	dst->fec_modes_supported |= fec << NFP_FEC_REED_SOLOMON_BIT;
+	if (dst->fec_modes_supported)
+		dst->fec_modes_supported |= NFP_FEC_AUTO | NFP_FEC_DISABLED;
+
+	dst->fec = 1 << FIELD_GET(NSP_ETH_STATE_FEC, state);
+}
+
+static void
+nfp_eth_calc_port_geometry(struct nfp_eth_table *table)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < table->count; i++) {
+		table->max_index = RTE_MAX(table->max_index,
+					   table->ports[i].index);
+
+		for (j = 0; j < table->count; j++) {
+			if (table->ports[i].label_port !=
+			    table->ports[j].label_port)
+				continue;
+			table->ports[i].port_lanes += table->ports[j].lanes;
+
+			if (i == j)
+				continue;
+			if (table->ports[i].label_subport ==
+			    table->ports[j].label_subport)
+				printf("Port %d subport %d is a duplicate\n",
+					 table->ports[i].label_port,
+					 table->ports[i].label_subport);
+
+			table->ports[i].is_split = 1;
+		}
+	}
+}
+
+static void
+nfp_eth_calc_port_type(struct nfp_eth_table_port *entry)
+{
+	if (entry->interface == NFP_INTERFACE_NONE) {
+		entry->port_type = PORT_NONE;
+		return;
+	}
+
+	if (entry->media == NFP_MEDIA_FIBRE)
+		entry->port_type = PORT_FIBRE;
+	else
+		entry->port_type = PORT_DA;
+}
+
+static struct nfp_eth_table *
+__nfp_eth_read_ports(struct nfp_nsp *nsp)
+{
+	union eth_table_entry *entries;
+	struct nfp_eth_table *table;
+	uint32_t table_sz;
+	int i, j, ret, cnt = 0;
+
+	entries = malloc(NSP_ETH_TABLE_SIZE);
+	if (!entries)
+		return NULL;
+
+	memset(entries, 0, NSP_ETH_TABLE_SIZE);
+	ret = nfp_nsp_read_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
+	if (ret < 0) {
+		printf("reading port table failed %d\n", ret);
+		goto err;
+	}
+
+	for (i = 0; i < NSP_ETH_MAX_COUNT; i++)
+		if (entries[i].port & NSP_ETH_PORT_LANES_MASK)
+			cnt++;
+
+	/* Some versions of flash will give us 0 instead of port count. For
+	 * those that give a port count, verify it against the value calculated
+	 * above.
+	 */
+	if (ret && ret != cnt) {
+		printf("table entry count (%d) does not match entries present (%d)\n",
+		       ret, cnt);
+		goto err;
+	}
+
+	table_sz = sizeof(*table) + sizeof(struct nfp_eth_table_port) * cnt;
+	table = malloc(table_sz);
+	if (!table)
+		goto err;
+
+	memset(table, 0, table_sz);
+	table->count = cnt;
+	for (i = 0, j = 0; i < NSP_ETH_MAX_COUNT; i++)
+		if (entries[i].port & NSP_ETH_PORT_LANES_MASK)
+			nfp_eth_port_translate(nsp, &entries[i], i,
+					       &table->ports[j++]);
+
+	nfp_eth_calc_port_geometry(table);
+	for (i = 0; i < (int)table->count; i++)
+		nfp_eth_calc_port_type(&table->ports[i]);
+
+	free(entries);
+
+	return table;
+
+err:
+	free(entries);
+	return NULL;
+}
+
+/*
+ * nfp_eth_read_ports() - retrieve port information
+ * @cpp:	NFP CPP handle
+ *
+ * Read the port information from the device.  Returned structure should
+ * be freed with free() once no longer needed.
+ *
+ * Return: populated ETH table or NULL on error.
+ */
+struct nfp_eth_table *
+nfp_eth_read_ports(struct nfp_cpp *cpp)
+{
+	struct nfp_eth_table *ret;
+	struct nfp_nsp *nsp;
+
+	nsp = nfp_nsp_open(cpp);
+	if (!nsp)
+		return NULL;
+
+	ret = __nfp_eth_read_ports(nsp);
+	nfp_nsp_close(nsp);
+
+	return ret;
+}
+
+struct nfp_nsp *
+nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx)
+{
+	union eth_table_entry *entries;
+	struct nfp_nsp *nsp;
+	int ret;
+
+	entries = malloc(NSP_ETH_TABLE_SIZE);
+	if (!entries)
+		return NULL;
+
+	memset(entries, 0, NSP_ETH_TABLE_SIZE);
+	nsp = nfp_nsp_open(cpp);
+	if (!nsp) {
+		free(entries);
+		return nsp;
+	}
+
+	ret = nfp_nsp_read_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
+	if (ret < 0) {
+		printf("reading port table failed %d\n", ret);
+		goto err;
+	}
+
+	if (!(entries[idx].port & NSP_ETH_PORT_LANES_MASK)) {
+		printf("trying to set port state on disabled port %d\n", idx);
+		goto err;
+	}
+
+	nfp_nsp_config_set_state(nsp, entries, idx);
+	return nsp;
+
+err:
+	nfp_nsp_close(nsp);
+	free(entries);
+	return NULL;
+}
+
+void
+nfp_eth_config_cleanup_end(struct nfp_nsp *nsp)
+{
+	union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
+
+	nfp_nsp_config_set_modified(nsp, 0);
+	nfp_nsp_config_clear_state(nsp);
+	nfp_nsp_close(nsp);
+	free(entries);
+}
+
+/*
+ * nfp_eth_config_commit_end() - perform recorded configuration changes
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ *
+ * Perform the configuration which was requested with __nfp_eth_set_*()
+ * helpers and recorded in @nsp state.  If device was already configured
+ * as requested or no __nfp_eth_set_*() operations were made no NSP command
+ * will be performed.
+ *
+ * Return:
+ * 0 - configuration successful;
+ * 1 - no changes were needed;
+ * -ERRNO - configuration failed.
+ */
+int
+nfp_eth_config_commit_end(struct nfp_nsp *nsp)
+{
+	union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
+	int ret = 1;
+
+	if (nfp_nsp_config_modified(nsp)) {
+		ret = nfp_nsp_write_eth_table(nsp, entries, NSP_ETH_TABLE_SIZE);
+		ret = ret < 0 ? ret : 0;
+	}
+
+	nfp_eth_config_cleanup_end(nsp);
+
+	return ret;
+}
+
+/*
+ * nfp_eth_set_mod_enable() - set PHY module enable control bit
+ * @cpp:	NFP CPP handle
+ * @idx:	NFP chip-wide port index
+ * @enable:	Desired state
+ *
+ * Enable or disable PHY module (this usually means setting the TX lanes
+ * disable bits).
+ *
+ * Return:
+ * 0 - configuration successful;
+ * 1 - no changes were needed;
+ * -ERRNO - configuration failed.
+ */
+int
+nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, int enable)
+{
+	union eth_table_entry *entries;
+	struct nfp_nsp *nsp;
+	uint64_t reg;
+
+	nsp = nfp_eth_config_start(cpp, idx);
+	if (!nsp)
+		return -EIO;
+
+	entries = nfp_nsp_config_entries(nsp);
+
+	/* Check if we are already in requested state */
+	reg = rte_le_to_cpu_64(entries[idx].state);
+	if (enable != (int)FIELD_GET(NSP_ETH_CTRL_ENABLED, reg)) {
+		reg = rte_le_to_cpu_64(entries[idx].control);
+		reg &= ~NSP_ETH_CTRL_ENABLED;
+		reg |= FIELD_PREP(NSP_ETH_CTRL_ENABLED, enable);
+		entries[idx].control = rte_cpu_to_le_64(reg);
+
+		nfp_nsp_config_set_modified(nsp, 1);
+	}
+
+	return nfp_eth_config_commit_end(nsp);
+}
+
+/*
+ * nfp_eth_set_configured() - set PHY module configured control bit
+ * @cpp:	NFP CPP handle
+ * @idx:	NFP chip-wide port index
+ * @configed:	Desired state
+ *
+ * Set the ifup/ifdown state on the PHY.
+ *
+ * Return:
+ * 0 - configuration successful;
+ * 1 - no changes were needed;
+ * -ERRNO - configuration failed.
+ */
+int
+nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx, int configed)
+{
+	union eth_table_entry *entries;
+	struct nfp_nsp *nsp;
+	uint64_t reg;
+
+	nsp = nfp_eth_config_start(cpp, idx);
+	if (!nsp)
+		return -EIO;
+
+	/*
+	 * Older ABI versions did support this feature, however this has only
+	 * been reliable since ABI 20.
+	 */
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 20) {
+		nfp_eth_config_cleanup_end(nsp);
+		return -EOPNOTSUPP;
+	}
+
+	entries = nfp_nsp_config_entries(nsp);
+
+	/* Check if we are already in requested state */
+	reg = rte_le_to_cpu_64(entries[idx].state);
+	if (configed != (int)FIELD_GET(NSP_ETH_STATE_CONFIGURED, reg)) {
+		reg = rte_le_to_cpu_64(entries[idx].control);
+		reg &= ~NSP_ETH_CTRL_CONFIGURED;
+		reg |= FIELD_PREP(NSP_ETH_CTRL_CONFIGURED, configed);
+		entries[idx].control = rte_cpu_to_le_64(reg);
+
+		nfp_nsp_config_set_modified(nsp, 1);
+	}
+
+	return nfp_eth_config_commit_end(nsp);
+}
+
+static int
+nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
+		       const uint64_t mask, const unsigned int shift,
+		       unsigned int val, const uint64_t ctrl_bit)
+{
+	union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
+	unsigned int idx = nfp_nsp_config_idx(nsp);
+	uint64_t reg;
+
+	/*
+	 * Note: set features were added in ABI 0.14 but the error
+	 *	 codes were initially not populated correctly.
+	 */
+	if (nfp_nsp_get_abi_ver_minor(nsp) < 17) {
+		printf("set operations not supported, please update flash\n");
+		return -EOPNOTSUPP;
+	}
+
+	/* Check if we are already in requested state */
+	reg = rte_le_to_cpu_64(entries[idx].raw[raw_idx]);
+	if (val == (reg & mask) >> shift)
+		return 0;
+
+	reg &= ~mask;
+	reg |= ((uint64_t)val << shift) & mask;
+	entries[idx].raw[raw_idx] = rte_cpu_to_le_64(reg);
+
+	entries[idx].control |= rte_cpu_to_le_64(ctrl_bit);
+
+	nfp_nsp_config_set_modified(nsp, 1);
+
+	return 0;
+}
+
+#define NFP_ETH_SET_BIT_CONFIG(nsp, raw_idx, mask, val, ctrl_bit)	\
+	(__extension__ ({ \
+		__typeof__(mask) _x = (mask); \
+		nfp_eth_set_bit_config(nsp, raw_idx, _x, __bf_shf(_x), \
+				       val, ctrl_bit);			\
+	}))
+
+/*
+ * __nfp_eth_set_aneg() - set PHY autonegotiation control bit
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ * @mode:	Desired autonegotiation mode
+ *
+ * Allow/disallow PHY module to advertise/perform autonegotiation.
+ * Will write to hwinfo overrides in the flash (persistent config).
+ *
+ * Return: 0 or -ERRNO.
+ */
+int
+__nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode)
+{
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
+				      NSP_ETH_STATE_ANEG, mode,
+				      NSP_ETH_CTRL_SET_ANEG);
+}
+
+/*
+ * __nfp_eth_set_fec() - set PHY forward error correction control bit
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ * @mode:	Desired FEC mode
+ *
+ * Set the PHY module forward error correction mode.
+ * Will write to hwinfo overrides in the flash (persistent config).
+ *
+ * Return: 0 or -ERRNO.
+ */
+static int
+__nfp_eth_set_fec(struct nfp_nsp *nsp, enum nfp_eth_fec mode)
+{
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
+				      NSP_ETH_STATE_FEC, mode,
+				      NSP_ETH_CTRL_SET_FEC);
+}
+
+/*
+ * nfp_eth_set_fec() - set PHY forward error correction control mode
+ * @cpp:	NFP CPP handle
+ * @idx:	NFP chip-wide port index
+ * @mode:	Desired FEC mode
+ *
+ * Return:
+ * 0 - configuration successful;
+ * 1 - no changes were needed;
+ * -ERRNO - configuration failed.
+ */
+int
+nfp_eth_set_fec(struct nfp_cpp *cpp, unsigned int idx, enum nfp_eth_fec mode)
+{
+	struct nfp_nsp *nsp;
+	int err;
+
+	nsp = nfp_eth_config_start(cpp, idx);
+	if (!nsp)
+		return -EIO;
+
+	err = __nfp_eth_set_fec(nsp, mode);
+	if (err) {
+		nfp_eth_config_cleanup_end(nsp);
+		return err;
+	}
+
+	return nfp_eth_config_commit_end(nsp);
+}
+
+/*
+ * __nfp_eth_set_speed() - set interface speed/rate
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ * @speed:	Desired speed (per lane)
+ *
+ * Set lane speed.  Provided @speed value should be subport speed divided
+ * by number of lanes this subport is spanning (e.g. 10000 for 40G, 25000 for
+ * 50G).
+ * Will write to hwinfo overrides in the flash (persistent config).
+ *
+ * Return: 0 or -ERRNO.
+ */
+int
+__nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
+{
+	enum nfp_eth_rate rate;
+
+	rate = nfp_eth_speed2rate(speed);
+	if (rate == RATE_INVALID) {
+		printf("could not find matching lane rate for speed %u\n",
+			 speed);
+		return -EINVAL;
+	}
+
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
+				      NSP_ETH_STATE_RATE, rate,
+				      NSP_ETH_CTRL_SET_RATE);
+}
+
+/*
+ * __nfp_eth_set_split() - set interface lane split
+ * @nsp:	NFP NSP handle returned from nfp_eth_config_start()
+ * @lanes:	Desired lanes per port
+ *
+ * Set number of lanes in the port.
+ * Will write to hwinfo overrides in the flash (persistent config).
+ *
+ * Return: 0 or -ERRNO.
+ */
+int
+__nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes)
+{
+	return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
+				      lanes, NSP_ETH_CTRL_SET_LANES);
+}
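
Editorial note on the read-modify-write in nfp_eth_set_bit_config() above:
the following is a minimal standalone sketch of the same masked-field update,
not part of the patch. The names mask_shift() and field_update() are
hypothetical; mask_shift() is only a stand-in for what __bf_shf() is assumed
to compute (the index of the lowest set bit of the mask). Unlike the driver
function, which returns 0 in both cases and records the change through
nfp_nsp_config_set_modified(), this sketch returns 1 when the register was
changed so the effect is visible to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __bf_shf(): index of the lowest set bit. */
static unsigned int
mask_shift(uint64_t mask)
{
	unsigned int shift = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		shift++;
	}
	return shift;
}

/*
 * Set the field selected by mask in *reg to val.
 * Returns 1 if *reg was modified, 0 if the field already held val.
 */
static int
field_update(uint64_t *reg, uint64_t mask, unsigned int val)
{
	unsigned int shift = mask_shift(mask);

	/* Already in the requested state: nothing to write. */
	if (val == ((*reg & mask) >> shift))
		return 0;

	*reg &= ~mask;
	/* Widen before shifting so fields above bit 31 are not truncated. */
	*reg |= ((uint64_t)val << shift) & mask;
	return 1;
}
```

Calling field_update() twice with the same value changes the register only
the first time, which is the property the commit path relies on to skip the
NSP command when nothing was modified.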
diff --git a/drivers/net/nfp/nfpcore/nfp_resource.c b/drivers/net/nfp/nfpcore/nfp_resource.c
new file mode 100644
index 0000000..e4f1efe
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_resource.c
@@ -0,0 +1,291 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <time.h>
+#include <zlib.h>
+#include <endian.h>
+
+#include "nfp_cpp.h"
+#include "nfp6000/nfp6000.h"
+#include "nfp_resource.h"
+#include "nfp_crc.h"
+
+#define NFP_RESOURCE_TBL_TARGET		NFP_CPP_TARGET_MU
+#define NFP_RESOURCE_TBL_BASE		0x8100000000ULL
+
+/* NFP Resource Table self-identifier */
+#define NFP_RESOURCE_TBL_NAME		"nfp.res"
+#define NFP_RESOURCE_TBL_KEY		0x00000000 /* Special key for entry 0 */
+
+#define NFP_RESOURCE_ENTRY_NAME_SZ	8
+
+/*
+ * struct nfp_resource_entry - Resource table entry
+ * @owner:		NFP CPP Lock, interface owner
+ * @key:		NFP CPP Lock, posix_crc32(name, 8)
+ * @region:		Memory region descriptor
+ * @name:		ASCII, zero padded name
+ * @reserved:		Reserved
+ * @cpp_action:		CPP Action
+ * @cpp_token:		CPP Token
+ * @cpp_target:		CPP Target ID
+ * @page_offset:	256-byte page offset into target's CPP address
+ * @page_size:		size, in 256-byte pages
+ */
+struct nfp_resource_entry {
+	struct nfp_resource_entry_mutex {
+		uint32_t owner;
+		uint32_t key;
+	} mutex;
+	struct nfp_resource_entry_region {
+		uint8_t  name[NFP_RESOURCE_ENTRY_NAME_SZ];
+		uint8_t  reserved[5];
+		uint8_t  cpp_action;
+		uint8_t  cpp_token;
+		uint8_t  cpp_target;
+		uint32_t page_offset;
+		uint32_t page_size;
+	} region;
+};
+
+#define NFP_RESOURCE_TBL_SIZE		4096
+#define NFP_RESOURCE_TBL_ENTRIES	(int)(NFP_RESOURCE_TBL_SIZE /	\
+					 sizeof(struct nfp_resource_entry))
+
+struct nfp_resource {
+	char name[NFP_RESOURCE_ENTRY_NAME_SZ + 1];
+	uint32_t cpp_id;
+	uint64_t addr;
+	uint64_t size;
+	struct nfp_cpp_mutex *mutex;
+};
+
+static int
+nfp_cpp_resource_find(struct nfp_cpp *cpp, struct nfp_resource *res)
+{
+	char name_pad[NFP_RESOURCE_ENTRY_NAME_SZ] = {};
+	struct nfp_resource_entry entry;
+	uint32_t cpp_id, key;
+	int ret, i;
+
+	cpp_id = NFP_CPP_ID(NFP_RESOURCE_TBL_TARGET, 3, 0);  /* Atomic read */
+
+	memset(name_pad, 0, NFP_RESOURCE_ENTRY_NAME_SZ);
+	strncpy(name_pad, res->name, sizeof(name_pad));
+
+	/* Search for a matching entry */
+	if (!memcmp(name_pad, NFP_RESOURCE_TBL_NAME "\0\0\0\0\0\0\0\0", 8)) {
+		printf("Grabbing device lock not supported\n");
+		return -EOPNOTSUPP;
+	}
+	key = nfp_crc32_posix(name_pad, sizeof(name_pad));
+
+	for (i = 0; i < NFP_RESOURCE_TBL_ENTRIES; i++) {
+		uint64_t addr = NFP_RESOURCE_TBL_BASE +
+			sizeof(struct nfp_resource_entry) * i;
+
+		ret = nfp_cpp_read(cpp, cpp_id, addr, &entry, sizeof(entry));
+		if (ret != sizeof(entry))
+			return -EIO;
+
+		if (entry.mutex.key != key)
+			continue;
+
+		/* Found key! */
+		res->mutex =
+			nfp_cpp_mutex_alloc(cpp,
+					    NFP_RESOURCE_TBL_TARGET, addr, key);
+		res->cpp_id = NFP_CPP_ID(entry.region.cpp_target,
+					 entry.region.cpp_action,
+					 entry.region.cpp_token);
+		res->addr = ((uint64_t)entry.region.page_offset) << 8;
+		res->size = (uint64_t)entry.region.page_size << 8;
+		return 0;
+	}
+
+	return -ENOENT;
+}
+
+static int
+nfp_resource_try_acquire(struct nfp_cpp *cpp, struct nfp_resource *res,
+			 struct nfp_cpp_mutex *dev_mutex)
+{
+	int err;
+
+	if (nfp_cpp_mutex_lock(dev_mutex))
+		return -EINVAL;
+
+	err = nfp_cpp_resource_find(cpp, res);
+	if (err)
+		goto err_unlock_dev;
+
+	err = nfp_cpp_mutex_trylock(res->mutex);
+	if (err)
+		goto err_res_mutex_free;
+
+	nfp_cpp_mutex_unlock(dev_mutex);
+
+	return 0;
+
+err_res_mutex_free:
+	nfp_cpp_mutex_free(res->mutex);
+err_unlock_dev:
+	nfp_cpp_mutex_unlock(dev_mutex);
+
+	return err;
+}
+
+/*
+ * nfp_resource_acquire() - Acquire a resource handle
+ * @cpp:	NFP CPP handle
+ * @name:	Name of the resource
+ *
+ * NOTE: This function locks the acquired resource
+ *
+ * Return: NFP Resource handle, or NULL on failure
+ */
+struct nfp_resource *
+nfp_resource_acquire(struct nfp_cpp *cpp, const char *name)
+{
+	struct nfp_cpp_mutex *dev_mutex;
+	struct nfp_resource *res;
+	int err;
+	struct timespec wait;
+	int count;
+
+	res = malloc(sizeof(*res));
+	if (!res)
+		return NULL;
+
+	memset(res, 0, sizeof(*res));
+
+	strncpy(res->name, name, NFP_RESOURCE_ENTRY_NAME_SZ);
+
+	dev_mutex = nfp_cpp_mutex_alloc(cpp, NFP_RESOURCE_TBL_TARGET,
+					NFP_RESOURCE_TBL_BASE,
+					NFP_RESOURCE_TBL_KEY);
+	if (!dev_mutex) {
+		free(res);
+		return NULL;
+	}
+
+	wait.tv_sec = 0;
+	wait.tv_nsec = 1000000;
+	count = 0;
+
+	for (;;) {
+		err = nfp_resource_try_acquire(cpp, res, dev_mutex);
+		if (!err)
+			break;
+		if (err != -EBUSY)
+			goto err_free;
+
+		if (count++ > 1000) {
+			printf("Error: resource %s timed out\n", name);
+			err = -EBUSY;
+			goto err_free;
+		}
+
+		nanosleep(&wait, NULL);
+	}
+
+	nfp_cpp_mutex_free(dev_mutex);
+
+	return res;
+
+err_free:
+	nfp_cpp_mutex_free(dev_mutex);
+	free(res);
+	return NULL;
+}
+
+/*
+ * nfp_resource_release() - Release a NFP Resource handle
+ * @res:	NFP Resource handle
+ *
+ * NOTE: This function implicitly unlocks the resource handle
+ */
+void
+nfp_resource_release(struct nfp_resource *res)
+{
+	nfp_cpp_mutex_unlock(res->mutex);
+	nfp_cpp_mutex_free(res->mutex);
+	free(res);
+}
+
+/*
+ * nfp_resource_cpp_id() - Return the cpp_id of a resource handle
+ * @res:        NFP Resource handle
+ *
+ * Return: NFP CPP ID
+ */
+uint32_t
+nfp_resource_cpp_id(const struct nfp_resource *res)
+{
+	return res->cpp_id;
+}
+
+/*
+ * nfp_resource_name() - Return the name of a resource handle
+ * @res:        NFP Resource handle
+ *
+ * Return: const char pointer to the name of the resource
+ */
+const char *
+nfp_resource_name(const struct nfp_resource *res)
+{
+	return res->name;
+}
+
+/*
+ * nfp_resource_address() - Return the address of a resource handle
+ * @res:        NFP Resource handle
+ *
+ * Return: Address of the resource
+ */
+uint64_t
+nfp_resource_address(const struct nfp_resource *res)
+{
+	return res->addr;
+}
+
+/*
+ * nfp_resource_size() - Return the size in bytes of a resource handle
+ * @res:        NFP Resource handle
+ *
+ * Return: Size of the resource in bytes
+ */
+uint64_t
+nfp_resource_size(const struct nfp_resource *res)
+{
+	return res->size;
+}
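
Editorial note on the retry loop in nfp_resource_acquire() above: the sketch
below isolates the "retry while -EBUSY, sleep 1 ms, give up after a bound"
pattern so it can be exercised without hardware. It is not part of the patch.
retry_while_busy() and busy_n_times() are hypothetical names, and the retry
bound is a parameter here rather than the fixed count used by the driver.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <errno.h>
#include <time.h>

typedef int (*try_fn)(void *ctx);

/*
 * Call try_once(ctx) until it returns something other than -EBUSY.
 * Sleeps 1 ms between attempts and gives up with -EBUSY after
 * max_retries retries. Any other result is passed through unchanged.
 */
static int
retry_while_busy(try_fn try_once, void *ctx, int max_retries)
{
	struct timespec wait = { .tv_sec = 0, .tv_nsec = 1000000 };
	int count = 0;
	int err;

	assert(max_retries >= 0);

	for (;;) {
		err = try_once(ctx);
		if (err != -EBUSY)
			return err;
		if (count++ >= max_retries)
			return -EBUSY;
		nanosleep(&wait, NULL);
	}
}

/* Hypothetical operation: reports busy *ctx times, then succeeds. */
static int
busy_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EBUSY;
	}
	return 0;
}
```

Keeping the bound and the sleep in one helper makes the worst-case wait easy
to reason about: roughly max_retries milliseconds plus the cost of the
attempts themselves.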
diff --git a/drivers/net/nfp/nfpcore/nfp_resource.h b/drivers/net/nfp/nfpcore/nfp_resource.h
new file mode 100644
index 0000000..2f71d31
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_resource.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef NFP_RESOURCE_H
+#define NFP_RESOURCE_H
+
+#include "nfp_cpp.h"
+
+#define NFP_RESOURCE_NFP_NFFW           "nfp.nffw"
+#define NFP_RESOURCE_NFP_HWINFO         "nfp.info"
+#define NFP_RESOURCE_NSP		"nfp.sp"
+
+/**
+ * Opaque handle to a NFP Resource
+ */
+struct nfp_resource;
+
+struct nfp_resource *nfp_resource_acquire(struct nfp_cpp *cpp,
+					  const char *name);
+
+/**
+ * Release a NFP Resource, and free the handle
+ * @param[in]   res     NFP Resource handle
+ */
+void nfp_resource_release(struct nfp_resource *res);
+
+/**
+ * Return the CPP ID of a NFP Resource
+ * @param[in]   res     NFP Resource handle
+ * @return      CPP ID of the NFP Resource
+ */
+uint32_t nfp_resource_cpp_id(const struct nfp_resource *res);
+
+/**
+ * Return the name of a NFP Resource
+ * @param[in]   res     NFP Resource handle
+ * @return      Name of the NFP Resource
+ */
+const char *nfp_resource_name(const struct nfp_resource *res);
+
+/**
+ * Return the target address of a NFP Resource
+ * @param[in]   res     NFP Resource handle
+ * @return      Address of the NFP Resource
+ */
+uint64_t nfp_resource_address(const struct nfp_resource *res);
+
+uint64_t nfp_resource_size(const struct nfp_resource *res);
+
+#endif /* NFP_RESOURCE_H */
diff --git a/drivers/net/nfp/nfpcore/nfp_rtsym.c b/drivers/net/nfp/nfpcore/nfp_rtsym.c
new file mode 100644
index 0000000..1622395
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_rtsym.c
@@ -0,0 +1,353 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * nfp_rtsym.c
+ * Interface for accessing run-time symbol table
+ */
+
+#include <stdio.h>
+#include <rte_byteorder.h>
+#include "nfp_cpp.h"
+#include "nfp_mip.h"
+#include "nfp_rtsym.h"
+#include "nfp6000/nfp6000.h"
+
+/* These need to match the linker */
+#define SYM_TGT_LMEM		0
+#define SYM_TGT_EMU_CACHE	0x17
+
+struct nfp_rtsym_entry {
+	uint8_t	type;
+	uint8_t	target;
+	uint8_t	island;
+	uint8_t	addr_hi;
+	uint32_t addr_lo;
+	uint16_t name;
+	uint8_t	menum;
+	uint8_t	size_hi;
+	uint32_t size_lo;
+};
+
+struct nfp_rtsym_table {
+	struct nfp_cpp *cpp;
+	int num;
+	char *strtab;
+	struct nfp_rtsym symtab[];
+};
+
+static int
+nfp_meid(uint8_t island_id, uint8_t menum)
+{
+	return (island_id & 0x3F) == island_id && menum < 12 ?
+		(island_id << 4) | (menum + 4) : -1;
+}
+
+static void
+nfp_rtsym_sw_entry_init(struct nfp_rtsym_table *cache, uint32_t strtab_size,
+			struct nfp_rtsym *sw, struct nfp_rtsym_entry *fw)
+{
+	sw->type = fw->type;
+	sw->name = cache->strtab + rte_le_to_cpu_16(fw->name) % strtab_size;
+	sw->addr = ((uint64_t)fw->addr_hi << 32) |
+		   rte_le_to_cpu_32(fw->addr_lo);
+	sw->size = ((uint64_t)fw->size_hi << 32) |
+		   rte_le_to_cpu_32(fw->size_lo);
+
+#ifdef DEBUG
+	printf("rtsym_entry_init\n");
+	printf("\tname=%s, addr=%" PRIx64 ", size=%" PRIu64 ",target=%d\n",
+		sw->name, sw->addr, sw->size, fw->target);
+#endif
+	switch (fw->target) {
+	case SYM_TGT_LMEM:
+		sw->target = NFP_RTSYM_TARGET_LMEM;
+		break;
+	case SYM_TGT_EMU_CACHE:
+		sw->target = NFP_RTSYM_TARGET_EMU_CACHE;
+		break;
+	default:
+		sw->target = fw->target;
+		break;
+	}
+
+	if (fw->menum != 0xff)
+		sw->domain = nfp_meid(fw->island, fw->menum);
+	else if (fw->island != 0xff)
+		sw->domain = fw->island;
+	else
+		sw->domain = -1;
+}
+
+struct nfp_rtsym_table *
+nfp_rtsym_table_read(struct nfp_cpp *cpp)
+{
+	struct nfp_rtsym_table *rtbl;
+	struct nfp_mip *mip;
+
+	mip = nfp_mip_open(cpp);
+	rtbl = __nfp_rtsym_table_read(cpp, mip);
+	nfp_mip_close(mip);
+
+	return rtbl;
+}
+
+/*
+ * This looks more complex than it should be. But we need to get the type for
+ * the ~ right in round_down (it needs to be as wide as the result!), and we
+ * want to evaluate the macro arguments just once each.
+ */
+#define __round_mask(x, y) ((__typeof__(x))((y) - 1))
+
+#define round_up(x, y) \
+	(__extension__ ({ \
+		__typeof__(x) _x = (x); \
+		((((_x) - 1) | __round_mask(_x, y)) + 1); \
+	}))
+
+#define round_down(x, y) \
+	(__extension__ ({ \
+		__typeof__(x) _x = (x); \
+		((_x) & ~__round_mask(_x, y)); \
+	}))
+
+struct nfp_rtsym_table *
+__nfp_rtsym_table_read(struct nfp_cpp *cpp, const struct nfp_mip *mip)
+{
+	uint32_t strtab_addr, symtab_addr, strtab_size, symtab_size;
+	struct nfp_rtsym_entry *rtsymtab;
+	struct nfp_rtsym_table *cache;
+	const uint32_t dram =
+		NFP_CPP_ID(NFP_CPP_TARGET_MU, NFP_CPP_ACTION_RW, 0) |
+		NFP_ISL_EMEM0;
+	int err, n, size;
+
+	if (!mip)
+		return NULL;
+
+	nfp_mip_strtab(mip, &strtab_addr, &strtab_size);
+	nfp_mip_symtab(mip, &symtab_addr, &symtab_size);
+
+	if (!symtab_size || !strtab_size || symtab_size % sizeof(*rtsymtab))
+		return NULL;
+
+	/* Align to 64 bits */
+	symtab_size = round_up(symtab_size, 8);
+	strtab_size = round_up(strtab_size, 8);
+
+	rtsymtab = malloc(symtab_size);
+	if (!rtsymtab)
+		return NULL;
+
+	size = sizeof(*cache);
+	size += symtab_size / sizeof(*rtsymtab) * sizeof(struct nfp_rtsym);
+	size +=	strtab_size + 1;
+	cache = malloc(size);
+	if (!cache)
+		goto exit_free_rtsym_raw;
+
+	cache->cpp = cpp;
+	cache->num = symtab_size / sizeof(*rtsymtab);
+	cache->strtab = (void *)&cache->symtab[cache->num];
+
+	err = nfp_cpp_read(cpp, dram, symtab_addr, rtsymtab, symtab_size);
+	if (err != (int)symtab_size)
+		goto exit_free_cache;
+
+	err = nfp_cpp_read(cpp, dram, strtab_addr, cache->strtab, strtab_size);
+	if (err != (int)strtab_size)
+		goto exit_free_cache;
+	cache->strtab[strtab_size] = '\0';
+
+	for (n = 0; n < cache->num; n++)
+		nfp_rtsym_sw_entry_init(cache, strtab_size,
+					&cache->symtab[n], &rtsymtab[n]);
+
+	free(rtsymtab);
+
+	return cache;
+
+exit_free_cache:
+	free(cache);
+exit_free_rtsym_raw:
+	free(rtsymtab);
+	return NULL;
+}
+
+/*
+ * nfp_rtsym_count() - Get the number of RTSYM descriptors
+ * @rtbl:	NFP RTsym table
+ *
+ * Return: Number of RTSYM descriptors
+ */
+int
+nfp_rtsym_count(struct nfp_rtsym_table *rtbl)
+{
+	if (!rtbl)
+		return -EINVAL;
+
+	return rtbl->num;
+}
+
+/*
+ * nfp_rtsym_get() - Get the Nth RTSYM descriptor
+ * @rtbl:	NFP RTsym table
+ * @idx:	Index (0-based) of the RTSYM descriptor
+ *
+ * Return: const pointer to a struct nfp_rtsym descriptor, or NULL
+ */
+const struct nfp_rtsym *
+nfp_rtsym_get(struct nfp_rtsym_table *rtbl, int idx)
+{
+	if (!rtbl)
+		return NULL;
+
+	if (idx < 0 || idx >= rtbl->num)
+		return NULL;
+
+	return &rtbl->symtab[idx];
+}
+
+/*
+ * nfp_rtsym_lookup() - Return the RTSYM descriptor for a symbol name
+ * @rtbl:	NFP RTsym table
+ * @name:	Symbol name
+ *
+ * Return: const pointer to a struct nfp_rtsym descriptor, or NULL
+ */
+const struct nfp_rtsym *
+nfp_rtsym_lookup(struct nfp_rtsym_table *rtbl, const char *name)
+{
+	int n;
+
+	if (!rtbl)
+		return NULL;
+
+	for (n = 0; n < rtbl->num; n++)
+		if (strcmp(name, rtbl->symtab[n].name) == 0)
+			return &rtbl->symtab[n];
+
+	return NULL;
+}
+
+/*
+ * nfp_rtsym_read_le() - Read a simple unsigned scalar value from symbol
+ * @rtbl:	NFP RTsym table
+ * @name:	Symbol name
+ * @error:	Pointer to error code (optional)
+ *
+ * Look up a symbol, map, read it and return its value. Value of the symbol
+ * will be interpreted as a simple little-endian unsigned value. Symbol can
+ * be 4 or 8 bytes in size.
+ *
+ * Return: value read, on error sets the error and returns ~0ULL.
+ */
+uint64_t
+nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name, int *error)
+{
+	const struct nfp_rtsym *sym;
+	uint32_t val32, id;
+	uint64_t val;
+	int err;
+
+	sym = nfp_rtsym_lookup(rtbl, name);
+	if (!sym) {
+		err = -ENOENT;
+		goto exit;
+	}
+
+	id = NFP_CPP_ISLAND_ID(sym->target, NFP_CPP_ACTION_RW, 0, sym->domain);
+
+#ifdef DEBUG
+	printf("Reading symbol %s with size %" PRIu64 " at %" PRIx64 "\n",
+		name, sym->size, sym->addr);
+#endif
+	switch (sym->size) {
+	case 4:
+		err = nfp_cpp_readl(rtbl->cpp, id, sym->addr, &val32);
+		val = val32;
+		break;
+	case 8:
+		err = nfp_cpp_readq(rtbl->cpp, id, sym->addr, &val);
+		break;
+	default:
+		printf("rtsym '%s' unsupported size: %" PRIu64 "\n",
+			name, sym->size);
+		err = -EINVAL;
+		break;
+	}
+
+	if (err && err != -EINVAL)
+		err = -EIO;
+exit:
+	if (error)
+		*error = err;
+
+	if (err)
+		return ~0ULL;
+
+	return val;
+}
+
+uint8_t *
+nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name,
+	      unsigned int min_size, struct nfp_cpp_area **area)
+{
+	const struct nfp_rtsym *sym;
+	uint8_t *mem;
+
+#ifdef DEBUG
+	printf("mapping symbol %s\n", name);
+#endif
+	sym = nfp_rtsym_lookup(rtbl, name);
+	if (!sym) {
+		printf("symbol lookup fails for %s\n", name);
+		return NULL;
+	}
+
+	if (sym->size < min_size) {
+		printf("Symbol %s too small (%" PRIu64 " < %u)\n", name,
+			sym->size, min_size);
+		return NULL;
+	}
+
+	mem = nfp_cpp_map_area(rtbl->cpp, sym->domain, sym->target, sym->addr,
+			       sym->size, area);
+	if (!mem) {
+		printf("Failed to map symbol %s\n", name);
+		return NULL;
+	}
+#ifdef DEBUG
+	printf("symbol %s with address %p\n", name, mem);
+#endif
+
+	return mem;
+}
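
Editorial note on the round_up()/round_down() macros in nfp_rtsym.c above:
the sketch below restates the same power-of-two alignment arithmetic as plain
inline functions, which avoids the GNU statement-expression and typeof
extensions at the cost of fixing the operand type. It is not part of the
patch; round_up_pow2() and round_down_pow2() are hypothetical names, and the
uint32_t operand type is an assumption chosen to match symtab_size and
strtab_size.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of 2). */
static inline uint32_t
round_up_pow2(uint32_t x, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	/* Same arithmetic as the macro: set the low bits, then add one. */
	return ((x - 1) | (align - 1)) + 1;
}

/* Round x down to a multiple of align (align must be a power of 2). */
static inline uint32_t
round_down_pow2(uint32_t x, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}
```

With align = 8 this is the "align to 64 bits" step applied to the table
sizes before they are read; a value that is already a multiple of 8 is
returned unchanged.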
diff --git a/drivers/net/nfp/nfpcore/nfp_rtsym.h b/drivers/net/nfp/nfpcore/nfp_rtsym.h
new file mode 100644
index 0000000..7941ecc
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_rtsym.h
@@ -0,0 +1,87 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __NFP_RTSYM_H__
+#define __NFP_RTSYM_H__
+
+#define NFP_RTSYM_TYPE_NONE             0
+#define NFP_RTSYM_TYPE_OBJECT           1
+#define NFP_RTSYM_TYPE_FUNCTION         2
+#define NFP_RTSYM_TYPE_ABS              3
+
+#define NFP_RTSYM_TARGET_NONE           0
+#define NFP_RTSYM_TARGET_LMEM           (-1)
+#define NFP_RTSYM_TARGET_EMU_CACHE      (-7)
+
+/*
+ * Structure describing a run-time NFP symbol.
+ *
+ * The memory target of the symbol is generally the CPP target number and can be
+ * used directly by the nfp_cpp API calls.  However, in some cases (i.e., for
+ * local memory or control store) the target is encoded using a negative number.
+ *
+ * When the target type can not be used to fully describe the location of a
+ * symbol the domain field is used to further specify the location (i.e., the
+ * specific ME or island number).
+ *
+ * For ME target resources, 'domain' is an MEID.
+ * For Island target resources, 'domain' is an island ID, with the one exception
+ * of "sram" symbols for backward compatibility, which are viewed as global.
+ */
+struct nfp_rtsym {
+	const char *name;
+	uint64_t addr;
+	uint64_t size;
+	int type;
+	int target;
+	int domain;
+};
+
+struct nfp_rtsym_table;
+
+struct nfp_rtsym_table *nfp_rtsym_table_read(struct nfp_cpp *cpp);
+
+struct nfp_rtsym_table *
+__nfp_rtsym_table_read(struct nfp_cpp *cpp, const struct nfp_mip *mip);
+
+int nfp_rtsym_count(struct nfp_rtsym_table *rtbl);
+
+const struct nfp_rtsym *nfp_rtsym_get(struct nfp_rtsym_table *rtbl, int idx);
+
+const struct nfp_rtsym *
+nfp_rtsym_lookup(struct nfp_rtsym_table *rtbl, const char *name);
+
+uint64_t nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name,
+			   int *error);
+uint8_t *
+nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name,
+	      unsigned int min_size, struct nfp_cpp_area **area);
+#endif
diff --git a/drivers/net/nfp/nfpcore/nfp_target.h b/drivers/net/nfp/nfpcore/nfp_target.h
new file mode 100644
index 0000000..3b8c1db
--- /dev/null
+++ b/drivers/net/nfp/nfpcore/nfp_target.h
@@ -0,0 +1,605 @@
+/*
+ * Copyright (c) 2018 Netronome Systems, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ *  this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution
+ *
+ * 3. Neither the name of the copyright holder nor the names of its
+ *  contributors may be used to endorse or promote products derived from this
+ *  software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef NFP_TARGET_H
+#define NFP_TARGET_H
+
+#include "nfp-common/nfp_resid.h"
+#include "nfp-common/nfp_cppat.h"
+#include "nfp-common/nfp_platform.h"
+#include "nfp_cpp.h"
+
+#define P32 1
+#define P64 2
+
+#define PUSHPULL(_pull, _push) (((_pull) << 4) | ((_push) << 0))
+
+#ifndef NFP_ERRNO
+#include <errno.h>
+#define NFP_ERRNO(x)    (errno = (x), -1)
+#endif
+
+static inline int
+pushpull_width(int pp)
+{
+	pp &= 0xf;
+
+	if (pp == 0)
+		return NFP_ERRNO(EINVAL);
+	return (2 << pp);
+}
+
+#define PUSH_WIDTH(_pushpull)      pushpull_width((_pushpull) >> 0)
+#define PULL_WIDTH(_pushpull)      pushpull_width((_pushpull) >> 4)
+
+static inline int
+target_rw(uint32_t cpp_id, int pp, int start, int len)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < start || island > (start + len)))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0):
+		return PUSHPULL(0, pp);
+	case NFP_CPP_ID(0, 1, 0):
+		return PUSHPULL(pp, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(pp, pp);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi_dma(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0): /* ReadNbiDma */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0): /* WriteNbiDma */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(P64, P64);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi_stats(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0): /* ReadNbiStats */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0): /* WriteNbiStats */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(P64, P64);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi_tm(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0): /* ReadNbiTM */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0):  /* WriteNbiTM */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(P64, P64);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi_ppc(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 0): /* ReadNbiPreclassifier */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0): /* WriteNbiPreclassifier */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0):
+		return PUSHPULL(P64, P64);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_nbi(uint32_t cpp_id, uint64_t address)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+	uint64_t rel_addr = address & 0x3fFFFF;
+
+	if (island && (island < 8 || island > 9))
+		return NFP_ERRNO(EINVAL);
+
+	if (rel_addr < (1 << 20))
+		return nfp6000_nbi_dma(cpp_id);
+	if (rel_addr < (2 << 20))
+		return nfp6000_nbi_stats(cpp_id);
+	if (rel_addr < (3 << 20))
+		return nfp6000_nbi_tm(cpp_id);
+	return nfp6000_nbi_ppc(cpp_id);
+}
+
+/*
+ * This function ONLY lists actions that can be done with a read or write of
+ * 32-bit or 64-bit words. All others are not listed.
+ */
+static inline int
+nfp6000_mu_common(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 0): /* read_be/write_be */
+		return PUSHPULL(P64, P64);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 1): /* read_le/write_le */
+		return PUSHPULL(P64, P64);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 2): /* {read/write}_swap_be */
+		return PUSHPULL(P64, P64);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 3): /* {read/write}_swap_le */
+		return PUSHPULL(P64, P64);
+	case NFP_CPP_ID(0, 0, 0): /* read_be */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 0, 1): /* read_le */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 0, 2): /* read_swap_be */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 0, 3): /* read_swap_le */
+		return PUSHPULL(0, P64);
+	case NFP_CPP_ID(0, 1, 0): /* write_be */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, 1, 1): /* write_le */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, 1, 2): /* write_swap_be */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, 1, 3): /* write_swap_le */
+		return PUSHPULL(P64, 0);
+	case NFP_CPP_ID(0, 3, 0): /* atomic_read */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 3, 2): /* mask_compare_write */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 4, 0): /* atomic_write */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 4, 2): /* atomic_write_imm */
+		return PUSHPULL(0, 0);
+	case NFP_CPP_ID(0, 4, 3): /* swap_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 5, 0): /* set */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 5, 3): /* test_set_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 6, 0): /* clr */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 6, 3): /* test_clr_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 7, 0): /* add */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 7, 3): /* test_add_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 8, 0): /* addsat */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 8, 3): /* test_addsat_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 9, 0): /* sub */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 9, 3): /* test_sub_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 10, 0): /* subsat */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 10, 3): /* test_subsat_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 13, 0): /* microq128_get */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 13, 1): /* microq128_pop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 13, 2): /* microq128_put */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 15, 0): /* xor */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 15, 3): /* test_xor_imm */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 28, 0): /* read32_be */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 28, 1): /* read32_le */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 28, 2): /* read32_swap_be */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 28, 3): /* read32_swap_le */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 31, 0): /* write32_be */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 31, 1): /* write32_le */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 31, 2): /* write32_swap_be */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 31, 3): /* write32_swap_le */
+		return PUSHPULL(P32, 0);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp6000_mu_ctm(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 16, 1): /* packet_read_packet_status */
+		return PUSHPULL(0, P32);
+	default:
+		return nfp6000_mu_common(cpp_id);
+	}
+}
+
+static inline int
+nfp6000_mu_emu(uint32_t cpp_id)
+{
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 18, 0): /* read_queue */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 18, 1): /* read_queue_ring */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 18, 2): /* write_queue */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 18, 3): /* write_queue_ring */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 20, 2): /* journal */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 21, 0): /* get */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 21, 1): /* get_eop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 21, 2): /* get_freely */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 22, 0): /* pop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 22, 1): /* pop_eop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 22, 2): /* pop_freely */
+		return PUSHPULL(0, P32);
+	default:
+		return nfp6000_mu_common(cpp_id);
+	}
+}
+
+static inline int
+nfp6000_mu_imu(uint32_t cpp_id)
+{
+	return nfp6000_mu_common(cpp_id);
+}
+
+static inline int
+nfp6000_mu(uint32_t cpp_id, uint64_t address)
+{
+	int pp;
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island == 0) {
+		if (address < 0x2000000000ULL)
+			pp = nfp6000_mu_ctm(cpp_id);
+		else if (address < 0x8000000000ULL)
+			pp = nfp6000_mu_emu(cpp_id);
+		else if (address < 0x9800000000ULL)
+			pp = nfp6000_mu_ctm(cpp_id);
+		else if (address < 0x9C00000000ULL)
+			pp = nfp6000_mu_emu(cpp_id);
+		else if (address < 0xA000000000ULL)
+			pp = nfp6000_mu_imu(cpp_id);
+		else
+			pp = nfp6000_mu_ctm(cpp_id);
+	} else if (island >= 24 && island <= 27) {
+		pp = nfp6000_mu_emu(cpp_id);
+	} else if (island >= 28 && island <= 31) {
+		pp = nfp6000_mu_imu(cpp_id);
+	} else if (island == 1 ||
+		   (island >= 4 && island <= 7) ||
+		   (island >= 12 && island <= 13) ||
+		   (island >= 32 && island <= 47) ||
+		   (island >= 48 && island <= 51)) {
+		pp = nfp6000_mu_ctm(cpp_id);
+	} else {
+		pp = NFP_ERRNO(EINVAL);
+	}
+
+	return pp;
+}
+
+static inline int
+nfp6000_ila(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 48 || island > 51))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 1): /* read_check_error */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 2, 0): /* read_int */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 3, 0): /* write_int */
+		return PUSHPULL(P32, 0);
+	default:
+		return target_rw(cpp_id, P32, 48, 4);
+	}
+}
+
+static inline int
+nfp6000_pci(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 4 || island > 7))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 2, 0):
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 3, 0):
+		return PUSHPULL(P32, 0);
+	default:
+		return target_rw(cpp_id, P32, 4, 4);
+	}
+}
+
+static inline int
+nfp6000_crypto(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 12 || island > 15))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 2, 0):
+		return PUSHPULL(P64, 0);
+	default:
+		return target_rw(cpp_id, P64, 12, 4);
+	}
+}
+
+static inline int
+nfp6000_cap_xpb(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 1 || island > 63))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 1): /* RingGet */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 0, 2): /* Interthread Signal */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 1, 1): /* RingPut */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 1, 2): /* CTNNWr */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 2, 0): /* ReflectRd, signal none */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 2, 1): /* ReflectRd, signal self */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 2, 2): /* ReflectRd, signal remote */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 2, 3): /* ReflectRd, signal both */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 3, 0): /* ReflectWr, signal none */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 3, 1): /* ReflectWr, signal self */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 3, 2): /* ReflectWr, signal remote */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 3, 3): /* ReflectWr, signal both */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, NFP_CPP_ACTION_RW, 1):
+		return PUSHPULL(P32, P32);
+	default:
+		return target_rw(cpp_id, P32, 1, 63);
+	}
+}
+
+static inline int
+nfp6000_cls(uint32_t cpp_id)
+{
+	int island = NFP_CPP_ID_ISLAND_of(cpp_id);
+
+	if (island && (island < 1 || island > 63))
+		return NFP_ERRNO(EINVAL);
+
+	switch (cpp_id & NFP_CPP_ID(0, ~0, ~0)) {
+	case NFP_CPP_ID(0, 0, 3): /* xor */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 2, 0): /* set */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 2, 1): /* clr */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 4, 0): /* add */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 4, 1): /* add64 */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 6, 0): /* sub */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 6, 1): /* sub64 */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 6, 2): /* subsat */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 8, 2): /* hash_mask */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 8, 3): /* hash_clear */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 9, 0): /* ring_get */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 9, 1): /* ring_pop */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 9, 2): /* ring_get_freely */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 9, 3): /* ring_pop_freely */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 10, 0): /* ring_put */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 10, 2): /* ring_journal */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 14, 0): /* reflect_write_sig_local */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 15, 1):  /* reflect_read_sig_local */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 17, 2): /* statistic */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 24, 0): /* ring_read */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 24, 1): /* ring_write */
+		return PUSHPULL(P32, 0);
+	case NFP_CPP_ID(0, 25, 0): /* ring_workq_add_thread */
+		return PUSHPULL(0, P32);
+	case NFP_CPP_ID(0, 25, 1): /* ring_workq_add_work */
+		return PUSHPULL(P32, 0);
+	default:
+		return target_rw(cpp_id, P32, 0, 64);
+	}
+}
+
+static inline int
+nfp6000_target_pushpull(uint32_t cpp_id, uint64_t address)
+{
+	switch (NFP_CPP_ID_TARGET_of(cpp_id)) {
+	case NFP6000_CPPTGT_NBI:
+		return nfp6000_nbi(cpp_id, address);
+	case NFP6000_CPPTGT_VQDR:
+		return target_rw(cpp_id, P32, 24, 4);
+	case NFP6000_CPPTGT_ILA:
+		return nfp6000_ila(cpp_id);
+	case NFP6000_CPPTGT_MU:
+		return nfp6000_mu(cpp_id, address);
+	case NFP6000_CPPTGT_PCIE:
+		return nfp6000_pci(cpp_id);
+	case NFP6000_CPPTGT_ARM:
+		if (address < 0x10000)
+			return target_rw(cpp_id, P64, 1, 1);
+		else
+			return target_rw(cpp_id, P32, 1, 1);
+	case NFP6000_CPPTGT_CRYPTO:
+		return nfp6000_crypto(cpp_id);
+	case NFP6000_CPPTGT_CTXPB:
+		return nfp6000_cap_xpb(cpp_id);
+	case NFP6000_CPPTGT_CLS:
+		return nfp6000_cls(cpp_id);
+	case 0:
+		return target_rw(cpp_id, P32, 4, 4);
+	default:
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp_target_pushpull_width(int pp, int write_not_read)
+{
+	if (pp < 0)
+		return pp;
+
+	if (write_not_read)
+		return PULL_WIDTH(pp);
+	else
+		return PUSH_WIDTH(pp);
+}
+
+static inline int
+nfp6000_target_action_width(uint32_t cpp_id, uint64_t address,
+			    int write_not_read)
+{
+	int pp;
+
+	pp = nfp6000_target_pushpull(cpp_id, address);
+
+	return nfp_target_pushpull_width(pp, write_not_read);
+}
+
+static inline int
+nfp_target_action_width(uint32_t model, uint32_t cpp_id, uint64_t address,
+			int write_not_read)
+{
+	if (NFP_CPP_MODEL_IS_6000(model)) {
+		return nfp6000_target_action_width(cpp_id, address,
+						   write_not_read);
+	} else {
+		return NFP_ERRNO(EINVAL);
+	}
+}
+
+static inline int
+nfp_target_cpp(uint32_t cpp_island_id, uint64_t cpp_island_address,
+	       uint32_t *cpp_target_id, uint64_t *cpp_target_address,
+	       const uint32_t *imb_table)
+{
+	int err;
+	int island = NFP_CPP_ID_ISLAND_of(cpp_island_id);
+	int target = NFP_CPP_ID_TARGET_of(cpp_island_id);
+	uint32_t imb;
+
+	if (target < 0 || target >= 16)
+		return NFP_ERRNO(EINVAL);
+
+	if (island == 0) {
+		/* Already translated */
+		*cpp_target_id = cpp_island_id;
+		*cpp_target_address = cpp_island_address;
+		return 0;
+	}
+
+	if (!imb_table) {
+		/* CPP + Island only allowed on systems with IMB tables */
+		return NFP_ERRNO(EINVAL);
+	}
+
+	imb = imb_table[target];
+
+	*cpp_target_address = cpp_island_address;
+	err = _nfp6000_cppat_addr_encode(cpp_target_address, island, target,
+					 ((imb >> 13) & 7),
+					 ((imb >> 12) & 1),
+					 ((imb >> 6) & 0x3f),
+					 ((imb >> 0) & 0x3f));
+	if (err == 0) {
+		*cpp_target_id =
+		    NFP_CPP_ID(target, NFP_CPP_ID_ACTION_of(cpp_island_id),
+			       NFP_CPP_ID_TOKEN_of(cpp_island_id));
+	}
+
+	return err;
+}
+
+#endif /* NFP_TARGET_H */
-- 
1.9.1

^ permalink raw reply	[relevance 1%]
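
A note on the PUSHPULL encoding used throughout nfp_target.h above: each
value packs two 4-bit width codes, the pull code in bits 7:4 and the push
code in bits 3:0, and a code of 0 means that direction is not supported.
The following is a minimal standalone sketch of that encoding, assuming
the decoded widths (4 and 8) are byte counts for 32-bit and 64-bit
transfers. The helper names pushpull_pack, width_bytes, push_bytes and
pull_bytes are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Width codes, copied from the patch. */
#define P32 1
#define P64 2

/* Pull code goes in bits 7:4, push code in bits 3:0 (as in the patch). */
#define PUSHPULL(_pull, _push) (((_pull) << 4) | ((_push) << 0))

/* Hypothetical wrapper: pack two codes, checking each fits in 4 bits. */
static int
pushpull_pack(int pull, int push)
{
	assert((pull & ~0xf) == 0 && (push & ~0xf) == 0);
	return PUSHPULL(pull, push);
}

/* Decode one 4-bit code; a code of 0 yields -1 with errno = EINVAL. */
static int
width_bytes(int code)
{
	code &= 0xf;
	if (code == 0) {
		errno = EINVAL;
		return -1;
	}
	return 2 << code;	/* P32 -> 4, P64 -> 8 */
}

/* Hypothetical equivalents of PUSH_WIDTH() and PULL_WIDTH(). */
static int
push_bytes(int pp)
{
	return width_bytes(pp >> 0);
}

static int
pull_bytes(int pp)
{
	return width_bytes(pp >> 4);
}
```

With this sketch, a read-only action encoded as PUSHPULL(0, P64) decodes
to a push width of 8 and an invalid pull width, which mirrors how
nfp_target_pushpull_width() picks PULL_WIDTH for writes and PUSH_WIDTH
for reads.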

* [dpdk-dev] [PATCH v1 9/9] ethdev: fix ABI version in meson build
  2018-03-23 12:58  3% [dpdk-dev] [PATCH v1 0/9] Bunch of flow API-related fixes Adrien Mazarguil
@ 2018-03-23 12:58  4% ` Adrien Mazarguil
  2018-04-04 14:57  3% ` [dpdk-dev] [PATCH v2 00/13] Bunch of flow API-related fixes Adrien Mazarguil
  1 sibling, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-03-23 12:58 UTC (permalink / raw)
  To: dev; +Cc: Kirill Rybalchenko

Must remain synchronized with its Makefile counterpart.

Fixes: a4b0b30723b2 ("ethdev: remove versioning of ethdev filter control function")
Cc: Kirill Rybalchenko <kirill.rybalchenko@intel.com>

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 lib/librte_ether/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_ether/meson.build b/lib/librte_ether/meson.build
index 7fed86056..12bdb6b61 100644
--- a/lib/librte_ether/meson.build
+++ b/lib/librte_ether/meson.build
@@ -2,7 +2,7 @@
 # Copyright(c) 2017 Intel Corporation
 
 name = 'ethdev'
-version = 8
+version = 9
 allow_experimental_apis = true
 sources = files('ethdev_profile.c',
 	'rte_ethdev.c',
-- 
2.11.0

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v1 0/9] Bunch of flow API-related fixes
@ 2018-03-23 12:58  3% Adrien Mazarguil
  2018-03-23 12:58  4% ` [dpdk-dev] [PATCH v1 9/9] ethdev: fix ABI version in meson build Adrien Mazarguil
  2018-04-04 14:57  3% ` [dpdk-dev] [PATCH v2 00/13] Bunch of flow API-related fixes Adrien Mazarguil
  0 siblings, 2 replies; 200+ results
From: Adrien Mazarguil @ 2018-03-23 12:58 UTC (permalink / raw)
  To: dev

This series contains several fixes for rte_flow and its implementation in
mlx4 and testpmd. Upcoming work on the flow API depends on it.

Adrien Mazarguil (9):
  net/mlx4: fix RSS resource leak in case of error
  net/mlx4: fix ignored RSS hash types
  app/testpmd: fix flow completion for RSS queues
  app/testpmd: fix lack of flow action configuration
  app/testpmd: fix RSS flow action configuration
  app/testpmd: fix missing RSS fields in flow action
  ethdev: fix shallow copy of flow API RSS action
  ethdev: fix missing boolean values in flow command
  ethdev: fix ABI version in meson build

 app/test-pmd/cmdline_flow.c                 | 255 ++++++++++++++++++++---
 app/test-pmd/config.c                       | 161 +++++++++-----
 app/test-pmd/testpmd.h                      |  13 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   8 +
 drivers/net/mlx4/mlx4_flow.c                |  17 +-
 lib/librte_ether/meson.build                |   2 +-
 lib/librte_ether/rte_flow.c                 | 145 +++++++++----
 7 files changed, 477 insertions(+), 124 deletions(-)

-- 
2.11.0

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6] eal: provide API for querying valid socket id's
  2018-03-22 12:36  5%   ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
@ 2018-03-22 17:07  0%     ` gowrishankar muthukrishnan
  2018-03-27 16:24  3%     ` Thomas Monjalon
  2018-03-31 17:08  5%     ` [dpdk-dev] [PATCH v7] " Anatoly Burakov
  2 siblings, 0 replies; 200+ results
From: gowrishankar muthukrishnan @ 2018-03-22 17:07 UTC (permalink / raw)
  To: Anatoly Burakov, dev; +Cc: Bruce Richardson, thomas, chaozhu

On Thursday 22 March 2018 06:06 PM, Anatoly Burakov wrote:
> During lcore scan, find all socket ID's and store them, and
> provide public API to query valid socket id's. This will break
> the ABI, so bump ABI version.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>

Thanks,
Gowrishankar

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v6] eal: provide API for querying valid socket id's
  2018-03-22 10:58  4% ` [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's Anatoly Burakov
  2018-03-22 11:45  3%   ` Burakov, Anatoly
@ 2018-03-22 12:36  5%   ` Anatoly Burakov
  2018-03-22 17:07  0%     ` gowrishankar muthukrishnan
                       ` (2 more replies)
  1 sibling, 3 replies; 200+ results
From: Anatoly Burakov @ 2018-03-22 12:36 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, thomas, chaozhu, gowrishankar.m

During lcore scan, find all socket ID's and store them, and
provide public API to query valid socket id's. This will break
the ABI, so bump ABI version.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v6:
    - Fixed meson ABI version header
    
    v5:
    - Move API to experimental
    - Store list of valid socket id's instead of simply
      recording the biggest one
    
    v4:
    - Remove backwards ABI compatibility, bump ABI instead
    
    v3:
    - Added ABI compatibility
    
    v2:
    - checkpatch changes
    - check socket before deciding if the core is not to be used

 lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
 lib/librte_eal/common/eal_common_lcore.c  | 75 ++++++++++++++++++++++++++-----
 lib/librte_eal/common/include/rte_eal.h   |  3 ++
 lib/librte_eal/common/include/rte_lcore.h | 30 +++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
 lib/librte_eal/meson.build                |  2 +-
 lib/librte_eal/rte_eal_version.map        |  2 +
 7 files changed, 101 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..ed1d17b 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 7724fa4..50d9f82 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -7,6 +7,7 @@
 #include <string.h>
 #include <dirent.h>
 
+#include <rte_errno.h>
 #include <rte_log.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
@@ -16,6 +17,19 @@
 #include "eal_private.h"
 #include "eal_thread.h"
 
+static int
+socket_id_cmp(const void *a, const void *b)
+{
+	const int *lcore_id_a = a;
+	const int *lcore_id_b = b;
+
+	if (*lcore_id_a < *lcore_id_b)
+		return -1;
+	if (*lcore_id_a > *lcore_id_b)
+		return 1;
+	return 0;
+}
+
 /*
  * Parse /sys/devices/system/cpu to get the number of physical and logical
  * processors on the machine. The function will fill the cpu_info
@@ -28,6 +42,8 @@ rte_eal_cpu_init(void)
 	struct rte_config *config = rte_eal_get_configuration();
 	unsigned lcore_id;
 	unsigned count = 0;
+	unsigned int socket_id, prev_socket_id;
+	int lcore_to_socket_id[RTE_MAX_LCORE];
 
 	/*
 	 * Parse the maximum set of logical cores, detect the subset of running
@@ -39,6 +55,19 @@ rte_eal_cpu_init(void)
 		/* init cpuset for per lcore config */
 		CPU_ZERO(&lcore_config[lcore_id].cpuset);
 
+		/* find socket first */
+		socket_id = eal_cpu_socket_id(lcore_id);
+		if (socket_id >= RTE_MAX_NUMA_NODES) {
+#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
+			socket_id = 0;
+#else
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
+					socket_id, RTE_MAX_NUMA_NODES);
+			return -1;
+#endif
+		}
+		lcore_to_socket_id[lcore_id] = socket_id;
+
 		/* in 1:1 mapping, record related cpu detected state */
 		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
 		if (lcore_config[lcore_id].detected == 0) {
@@ -54,18 +83,7 @@ rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_role = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
-#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
-			lcore_config[lcore_id].socket_id = 0;
-#else
-			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
-				"RTE_MAX_NUMA_NODES (%d)\n",
-				lcore_config[lcore_id].socket_id,
-				RTE_MAX_NUMA_NODES);
-			return -1;
-#endif
-		}
+		lcore_config[lcore_id].socket_id = socket_id;
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
@@ -79,5 +97,38 @@ rte_eal_cpu_init(void)
 		RTE_MAX_LCORE);
 	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+	/* sort all socket id's in ascending order */
+	qsort(lcore_to_socket_id, RTE_DIM(lcore_to_socket_id),
+			sizeof(lcore_to_socket_id[0]), socket_id_cmp);
+
+	prev_socket_id = -1;
+	config->numa_node_count = 0;
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		socket_id = lcore_to_socket_id[lcore_id];
+		if (socket_id != prev_socket_id)
+			config->numa_nodes[config->numa_node_count++] =
+					socket_id;
+		prev_socket_id = socket_id;
+	}
+	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
 	return 0;
 }
+
+unsigned int __rte_experimental
+rte_num_socket_ids(void)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	return config->numa_node_count;
+}
+
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	if (idx >= config->numa_node_count) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	return config->numa_nodes[idx];
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 93ca4cc..6109472 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -57,6 +57,9 @@ enum rte_proc_type_t {
 struct rte_config {
 	uint32_t master_lcore;       /**< Id of the master lcore */
 	uint32_t lcore_count;        /**< Number of available logical cores. */
+	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
+	uint32_t numa_nodes[RTE_MAX_NUMA_NODES];
+	/**< List of detected numa nodes. */
 	uint32_t service_lcore_count;/**< Number of available service cores. */
 	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index 0472220..c6511a9 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -132,6 +132,36 @@ rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets detected on the system.
+ *
+ * Note that the number of sockets does not determine their physical id's:
+ * for example, a system may report two socket id's, but the actual socket id's
+ * may be 0 and 8.
+ *
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ */
+unsigned int __rte_experimental
+rte_num_socket_ids(void);
+
+/**
+ * Return socket id with a particular index.
+ *
+ * This will return socket id at a particular position in list of all detected
+ * physical socket id's. For example, on a machine with sockets [0, 8], passing
+ * 1 as a parameter will return 8.
+ *
+ * @param idx
+ *   index of physical socket id to return
+ *
+ * @return
+ *   - physical socket id as recognized by EAL
+ *   - -1 on error, with rte_errno set to EINVAL
+ */
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..b9c7727 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/meson.build b/lib/librte_eal/meson.build
index d9ba385..8f0ce1b 100644
--- a/lib/librte_eal/meson.build
+++ b/lib/librte_eal/meson.build
@@ -43,7 +43,7 @@ else
 	error('unsupported system type @0@'.format(hostmachine.system()))
 endif
 
-version = 6  # the version of the EAL API
+version = 7  # the version of the EAL API
 allow_experimental_apis = true
 deps += 'compat'
 cflags += '-D_GNU_SOURCE'
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 1d88437..6a4c355 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -229,6 +229,8 @@ EXPERIMENTAL {
 	rte_mp_request;
 	rte_mp_request_async;
 	rte_mp_reply;
+	rte_num_socket_ids;
+	rte_socket_id_by_idx;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]
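
The NUMA node detection in the v6 patch above comes down to sorting the
per-lcore socket ids and keeping one copy of each distinct value. A
minimal standalone sketch of that step follows, assuming plain int
arrays in place of struct rte_config. MAX_NODES, cmp_int and
unique_socket_ids are hypothetical names standing in for
RTE_MAX_NUMA_NODES, socket_id_cmp and the loop in rte_eal_cpu_init().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODES 8	/* hypothetical stand-in for RTE_MAX_NUMA_NODES */

/* qsort() comparator for ascending int order. */
static int
cmp_int(const void *a, const void *b)
{
	const int *x = a;
	const int *y = b;

	return (*x > *y) - (*x < *y);
}

/*
 * Sort the per-lcore socket ids in place, then copy each distinct value
 * into nodes[]. Returns the number of distinct ids found.
 */
static unsigned int
unique_socket_ids(int *ids, size_t n, int nodes[MAX_NODES])
{
	unsigned int count = 0;
	size_t i;

	qsort(ids, n, sizeof(ids[0]), cmp_int);
	for (i = 0; i < n; i++) {
		if (i > 0 && ids[i] == ids[i - 1])
			continue;	/* same socket as the previous entry */
		assert(count < MAX_NODES);
		nodes[count++] = ids[i];
	}
	return count;
}
```

For the example given in the rte_lcore.h comment, lcores on sockets 0 and
8 give a count of 2 with nodes[1] == 8, which matches what
rte_socket_id_by_idx(1) is documented to return on such a machine.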

* Re: [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's
  2018-03-22 10:58  4% ` [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's Anatoly Burakov
@ 2018-03-22 11:45  3%   ` Burakov, Anatoly
  2018-03-22 12:36  5%   ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
  1 sibling, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-03-22 11:45 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, thomas, chaozhu, gowrishankar.m

On 22-Mar-18 10:58 AM, Anatoly Burakov wrote:
> During lcore scan, find all socket ID's and store them, and
> provide public API to query valid socket id's. This will break
> the ABI, so bump ABI version.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---

Left out meson ABI version, will respin.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's
    @ 2018-03-22 10:58  4% ` Anatoly Burakov
  2018-03-22 11:45  3%   ` Burakov, Anatoly
  2018-03-22 12:36  5%   ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
  1 sibling, 2 replies; 200+ results
From: Anatoly Burakov @ 2018-03-22 10:58 UTC (permalink / raw)
  To: dev; +Cc: Bruce Richardson, thomas, chaozhu, gowrishankar.m

During lcore scan, find all socket ID's and store them, and
provide public API to query valid socket id's. This will break
the ABI, so bump ABI version.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v5:
    - Move API to experimental
    - Store list of valid socket id's instead of simply
      recording the biggest one
    
    v4:
    - Remove backwards ABI compatibility, bump ABI instead
    
    v3:
    - Added ABI compatibility
    
    v2:
    - checkpatch changes
    - check socket before deciding if the core is not to be used

 lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
 lib/librte_eal/common/eal_common_lcore.c  | 75 ++++++++++++++++++++++++++-----
 lib/librte_eal/common/include/rte_eal.h   |  3 ++
 lib/librte_eal/common/include/rte_lcore.h | 30 +++++++++++++
 lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
 lib/librte_eal/rte_eal_version.map        |  2 +
 6 files changed, 100 insertions(+), 14 deletions(-)

diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
index dd455e6..ed1d17b 100644
--- a/lib/librte_eal/bsdapp/eal/Makefile
+++ b/lib/librte_eal/bsdapp/eal/Makefile
@@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
 
 EXPORT_MAP := ../../rte_eal_version.map
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 # specific to bsdapp exec-env
 SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
index 7724fa4..50d9f82 100644
--- a/lib/librte_eal/common/eal_common_lcore.c
+++ b/lib/librte_eal/common/eal_common_lcore.c
@@ -7,6 +7,7 @@
 #include <string.h>
 #include <dirent.h>
 
+#include <rte_errno.h>
 #include <rte_log.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
@@ -16,6 +17,19 @@
 #include "eal_private.h"
 #include "eal_thread.h"
 
+static int
+socket_id_cmp(const void *a, const void *b)
+{
+	const int *lcore_id_a = a;
+	const int *lcore_id_b = b;
+
+	if (*lcore_id_a < *lcore_id_b)
+		return -1;
+	if (*lcore_id_a > *lcore_id_b)
+		return 1;
+	return 0;
+}
+
 /*
  * Parse /sys/devices/system/cpu to get the number of physical and logical
  * processors on the machine. The function will fill the cpu_info
@@ -28,6 +42,8 @@ rte_eal_cpu_init(void)
 	struct rte_config *config = rte_eal_get_configuration();
 	unsigned lcore_id;
 	unsigned count = 0;
+	unsigned int socket_id, prev_socket_id;
+	int lcore_to_socket_id[RTE_MAX_LCORE];
 
 	/*
 	 * Parse the maximum set of logical cores, detect the subset of running
@@ -39,6 +55,19 @@ rte_eal_cpu_init(void)
 		/* init cpuset for per lcore config */
 		CPU_ZERO(&lcore_config[lcore_id].cpuset);
 
+		/* find socket first */
+		socket_id = eal_cpu_socket_id(lcore_id);
+		if (socket_id >= RTE_MAX_NUMA_NODES) {
+#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
+			socket_id = 0;
+#else
+			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
+					socket_id, RTE_MAX_NUMA_NODES);
+			return -1;
+#endif
+		}
+		lcore_to_socket_id[lcore_id] = socket_id;
+
 		/* in 1:1 mapping, record related cpu detected state */
 		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
 		if (lcore_config[lcore_id].detected == 0) {
@@ -54,18 +83,7 @@ rte_eal_cpu_init(void)
 		config->lcore_role[lcore_id] = ROLE_RTE;
 		lcore_config[lcore_id].core_role = ROLE_RTE;
 		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
-		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
-		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
-#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
-			lcore_config[lcore_id].socket_id = 0;
-#else
-			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
-				"RTE_MAX_NUMA_NODES (%d)\n",
-				lcore_config[lcore_id].socket_id,
-				RTE_MAX_NUMA_NODES);
-			return -1;
-#endif
-		}
+		lcore_config[lcore_id].socket_id = socket_id;
 		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
 				"core %u on socket %u\n",
 				lcore_id, lcore_config[lcore_id].core_id,
@@ -79,5 +97,38 @@ rte_eal_cpu_init(void)
 		RTE_MAX_LCORE);
 	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
 
+	/* sort all socket id's in ascending order */
+	qsort(lcore_to_socket_id, RTE_DIM(lcore_to_socket_id),
+			sizeof(lcore_to_socket_id[0]), socket_id_cmp);
+
+	prev_socket_id = -1;
+	config->numa_node_count = 0;
+	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+		socket_id = lcore_to_socket_id[lcore_id];
+		if (socket_id != prev_socket_id)
+			config->numa_nodes[config->numa_node_count++] =
+					socket_id;
+		prev_socket_id = socket_id;
+	}
+	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);
+
 	return 0;
 }
+
+unsigned int __rte_experimental
+rte_num_socket_ids(void)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	return config->numa_node_count;
+}
+
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx)
+{
+	const struct rte_config *config = rte_eal_get_configuration();
+	if (idx >= config->numa_node_count) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	return config->numa_nodes[idx];
+}
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index 93ca4cc..6109472 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -57,6 +57,9 @@ enum rte_proc_type_t {
 struct rte_config {
 	uint32_t master_lcore;       /**< Id of the master lcore */
 	uint32_t lcore_count;        /**< Number of available logical cores. */
+	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
+	uint32_t numa_nodes[RTE_MAX_NUMA_NODES];
+	/**< List of detected numa nodes. */
 	uint32_t service_lcore_count;/**< Number of available service cores. */
 	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
 
diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
index 0472220..c6511a9 100644
--- a/lib/librte_eal/common/include/rte_lcore.h
+++ b/lib/librte_eal/common/include/rte_lcore.h
@@ -132,6 +132,36 @@ rte_lcore_index(int lcore_id)
 unsigned rte_socket_id(void);
 
 /**
+ * Return number of physical sockets detected on the system.
+ *
+ * Note that number of nodes may not be correspondent to their physical id's:
+ * for example, a system may report two socket id's, but the actual socket id's
+ * may be 0 and 8.
+ *
+ * @return
+ *   the number of physical sockets as recognized by EAL
+ */
+unsigned int __rte_experimental
+rte_num_socket_ids(void);
+
+/**
+ * Return socket id with a particular index.
+ *
+ * This will return socket id at a particular position in list of all detected
+ * physical socket id's. For example, on a machine with sockets [0, 8], passing
+ * 1 as a parameter will return 8.
+ *
+ * @param idx
+ *   index of physical socket id to return
+ *
+ * @return
+ *   - physical socket id as recognized by EAL
+ *   - -1 on error, with errno set to EINVAL
+ */
+int __rte_experimental
+rte_socket_id_by_idx(unsigned int idx);
+
+/**
  * Get the ID of the physical socket of the specified lcore
  *
  * @param lcore_id
diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 7e5bbe8..b9c7727 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
 EXPORT_MAP := ../../rte_eal_version.map
 VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
 
-LIBABIVER := 6
+LIBABIVER := 7
 
 VPATH += $(RTE_SDK)/lib/librte_eal/common
 
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index 1d88437..6a4c355 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -229,6 +229,8 @@ EXPERIMENTAL {
 	rte_mp_request;
 	rte_mp_request_async;
 	rte_mp_reply;
+	rte_num_socket_ids;
+	rte_socket_id_by_idx;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;
-- 
2.7.4
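
For readers who want to see how a caller might walk the detected sockets, here is a minimal sketch that models the two accessors added by this patch. It does not link against DPDK: the `model_*` names, the fixed [0, 8] socket list, and the use of plain `errno` in place of `rte_errno` are all assumptions made for illustration.

```c
#include <errno.h>

#define MODEL_MAX_NUMA_NODES 8 /* hypothetical stand-in for RTE_MAX_NUMA_NODES */

/* Hypothetical stand-in for the two fields the patch adds to struct rte_config. */
struct model_config {
	unsigned int numa_node_count;
	unsigned int numa_nodes[MODEL_MAX_NUMA_NODES];
};

/* Example machine from the doc comment: two sockets with ids 0 and 8. */
static struct model_config model_cfg = { 2, { 0, 8 } };

/* Models rte_num_socket_ids(): number of detected sockets. */
static unsigned int
model_num_socket_ids(void)
{
	return model_cfg.numa_node_count;
}

/* Models rte_socket_id_by_idx(): -1 and EINVAL when idx is out of range. */
static int
model_socket_id_by_idx(unsigned int idx)
{
	if (idx >= model_cfg.numa_node_count) {
		errno = EINVAL; /* the patch sets rte_errno instead */
		return -1;
	}
	return (int)model_cfg.numa_nodes[idx];
}

/* Iterate by index, never by assuming ids are contiguous. */
static int
model_max_socket_id(void)
{
	unsigned int i;
	int max_id = -1;

	for (i = 0; i < model_num_socket_ids(); i++) {
		int id = model_socket_id_by_idx(i);

		if (id > max_id)
			max_id = id;
	}
	return max_id;
}
```

The point of the index-based loop is that the count (2) and the largest id (8) are unrelated, so a caller should not loop from 0 to the count and treat the loop variable as a socket id.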

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3 2/5] vhost: support selective datapath
  2018-03-22  7:55  3%       ` Wang, Zhihong
@ 2018-03-22  8:31  0%         ` Maxime Coquelin
  0 siblings, 0 replies; 200+ results
From: Maxime Coquelin @ 2018-03-22  8:31 UTC (permalink / raw)
  To: Wang, Zhihong, dev
  Cc: Tan, Jianfeng, Bie, Tiwei, yliu, Liang, Cunming, Wang, Xiao W, Daly, Dan



On 03/22/2018 08:55 AM, Wang, Zhihong wrote:
> 
> 
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Thursday, March 22, 2018 5:06 AM
>> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
>> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Bie, Tiwei
>> <tiwei.bie@intel.com>; yliu@fridaylinux.org; Liang, Cunming
>> <cunming.liang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>; Daly,
>> Dan <dan.daly@intel.com>
>> Subject: Re: [PATCH v3 2/5] vhost: support selective datapath
>>
>>
>>
>> On 02/27/2018 11:13 AM, Zhihong Wang wrote:
>>> This patch introduces support for selective datapath in DPDK vhost-user lib
>>> to enable various types of virtio-compatible devices to do data transfer
>>> with virtio driver directly to enable acceleration. The default datapath is
>>> the existing software implementation, more options will be available when
>>> new engines are registered.
>>>
>>> An engine is a group of virtio-compatible devices under a single address.
>>> The engine driver includes:
>>>
>>>    1. A set of engine ops is defined in rte_vdpa_eng_ops to perform engine
>>>       init, uninit, and attributes reporting.
>>>
>>>    2. A set of device ops is defined in rte_vdpa_dev_ops for virtio devices
>>>       in the engine to do device specific operations:
>>>
>>>        a. dev_conf: Called to configure the actual device when the virtio
>>>           device becomes ready.
>>>
>>>        b. dev_close: Called to close the actual device when the virtio device
>>>           is stopped.
>>>
>>>        c. vring_state_set: Called to change the state of the vring in the
>>>           actual device when vring state changes.
>>>
>>>        d. feature_set: Called to set the negotiated features to device.
>>>
>>>        e. migration_done: Called to allow the device to response to RARP
>>>           sending.
>>>
>>>        f. get_vfio_group_fd: Called to get the VFIO group fd of the device.
>>>
>>>        g. get_vfio_device_fd: Called to get the VFIO device fd of the device.
>>>
>>>        h. get_notify_area: Called to get the notify area info of the queue.
>>>
>>> Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
>>> ---
>>> Changes in v2:
>>>
>>>    1. Add VFIO related vDPA device ops.
>>>
>>>    lib/librte_vhost/Makefile              |   4 +-
>>>    lib/librte_vhost/rte_vdpa.h            | 126
>> +++++++++++++++++++++++++++++++++
>>>    lib/librte_vhost/rte_vhost_version.map |   8 +++
>>>    lib/librte_vhost/vdpa.c                | 124
>> ++++++++++++++++++++++++++++++++
>>>    4 files changed, 260 insertions(+), 2 deletions(-)
>>>    create mode 100644 lib/librte_vhost/rte_vdpa.h
>>>    create mode 100644 lib/librte_vhost/vdpa.c
>>>
>>> diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
>>> index 5d6c6abae..37044ac03 100644
>>> --- a/lib/librte_vhost/Makefile
>>> +++ b/lib/librte_vhost/Makefile
>>> @@ -22,9 +22,9 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -
>> lrte_ethdev -lrte_net
>>>
>>>    # all source are stored in SRCS-y
>>>    SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c
>> \
>>> -					vhost_user.c virtio_net.c
>>> +					vhost_user.c virtio_net.c vdpa.c
>>>
>>>    # install includes
>>> -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
>>> +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
>> rte_vdpa.h
>>>
>>>    include $(RTE_SDK)/mk/rte.lib.mk
>>> diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
>>> new file mode 100644
>>> index 000000000..23fb471be
>>> --- /dev/null
>>> +++ b/lib/librte_vhost/rte_vdpa.h
>>> @@ -0,0 +1,126 @@
>>> +/* SPDX-License-Identifier: BSD-3-Clause
>>> + * Copyright(c) 2018 Intel Corporation
>>> + */
>>> +
>>> +#ifndef _RTE_VDPA_H_
>>> +#define _RTE_VDPA_H_
>>> +
>>> +/**
>>> + * @file
>>> + *
>>> + * Device specific vhost lib
>>> + */
>>> +
>>> +#include <rte_pci.h>
>>> +#include "rte_vhost.h"
>>> +
>>> +#define MAX_VDPA_ENGINE_NUM 128
>>> +#define MAX_VDPA_NAME_LEN 128
>>> +
>>> +struct rte_vdpa_eng_addr {
>>> +	union {
>>> +		uint8_t __dummy[64];
>>> +		struct rte_pci_addr pci_addr;
>> I think we should not only support PCI, but any type of buses.
>> At least in the API.
> 
> Exactly, so we defined a 64 bytes union so any bus types can be added
> without breaking the ABI.

Oh right, I missed that. That's good to me.

> But there is one place that may be impacted is the is_same_eng() function.
> Maybe comparing all the bytes in __dummy[64] is a better way. What do you
> think?

I think it's fine for now to keep comparing PCI addresses, as we
don't support any bus other than PCI. My concern was about the API,
and I didn't notice you had already taken care of it.

Thanks!
Maxime

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3 2/5] vhost: support selective datapath
  @ 2018-03-22  7:55  3%       ` Wang, Zhihong
  2018-03-22  8:31  0%         ` Maxime Coquelin
  0 siblings, 1 reply; 200+ results
From: Wang, Zhihong @ 2018-03-22  7:55 UTC (permalink / raw)
  To: Maxime Coquelin, dev
  Cc: Tan, Jianfeng, Bie, Tiwei, yliu, Liang, Cunming, Wang, Xiao W, Daly, Dan



> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Thursday, March 22, 2018 5:06 AM
> To: Wang, Zhihong <zhihong.wang@intel.com>; dev@dpdk.org
> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Bie, Tiwei
> <tiwei.bie@intel.com>; yliu@fridaylinux.org; Liang, Cunming
> <cunming.liang@intel.com>; Wang, Xiao W <xiao.w.wang@intel.com>; Daly,
> Dan <dan.daly@intel.com>
> Subject: Re: [PATCH v3 2/5] vhost: support selective datapath
> 
> 
> 
> On 02/27/2018 11:13 AM, Zhihong Wang wrote:
> > This patch introduces support for selective datapath in DPDK vhost-user lib
> > to enable various types of virtio-compatible devices to do data transfer
> > with virtio driver directly to enable acceleration. The default datapath is
> > the existing software implementation, more options will be available when
> > new engines are registered.
> >
> > An engine is a group of virtio-compatible devices under a single address.
> > The engine driver includes:
> >
> >   1. A set of engine ops is defined in rte_vdpa_eng_ops to perform engine
> >      init, uninit, and attributes reporting.
> >
> >   2. A set of device ops is defined in rte_vdpa_dev_ops for virtio devices
> >      in the engine to do device specific operations:
> >
> >       a. dev_conf: Called to configure the actual device when the virtio
> >          device becomes ready.
> >
> >       b. dev_close: Called to close the actual device when the virtio device
> >          is stopped.
> >
> >       c. vring_state_set: Called to change the state of the vring in the
> >          actual device when vring state changes.
> >
> >       d. feature_set: Called to set the negotiated features to device.
> >
> >       e. migration_done: Called to allow the device to response to RARP
> >          sending.
> >
> >       f. get_vfio_group_fd: Called to get the VFIO group fd of the device.
> >
> >       g. get_vfio_device_fd: Called to get the VFIO device fd of the device.
> >
> >       h. get_notify_area: Called to get the notify area info of the queue.
> >
> > Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
> > ---
> > Changes in v2:
> >
> >   1. Add VFIO related vDPA device ops.
> >
> >   lib/librte_vhost/Makefile              |   4 +-
> >   lib/librte_vhost/rte_vdpa.h            | 126
> +++++++++++++++++++++++++++++++++
> >   lib/librte_vhost/rte_vhost_version.map |   8 +++
> >   lib/librte_vhost/vdpa.c                | 124
> ++++++++++++++++++++++++++++++++
> >   4 files changed, 260 insertions(+), 2 deletions(-)
> >   create mode 100644 lib/librte_vhost/rte_vdpa.h
> >   create mode 100644 lib/librte_vhost/vdpa.c
> >
> > diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
> > index 5d6c6abae..37044ac03 100644
> > --- a/lib/librte_vhost/Makefile
> > +++ b/lib/librte_vhost/Makefile
> > @@ -22,9 +22,9 @@ LDLIBS += -lrte_eal -lrte_mempool -lrte_mbuf -
> lrte_ethdev -lrte_net
> >
> >   # all source are stored in SRCS-y
> >   SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c
> \
> > -					vhost_user.c virtio_net.c
> > +					vhost_user.c virtio_net.c vdpa.c
> >
> >   # install includes
> > -SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
> rte_vdpa.h
> >
> >   include $(RTE_SDK)/mk/rte.lib.mk
> > diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h
> > new file mode 100644
> > index 000000000..23fb471be
> > --- /dev/null
> > +++ b/lib/librte_vhost/rte_vdpa.h
> > @@ -0,0 +1,126 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Intel Corporation
> > + */
> > +
> > +#ifndef _RTE_VDPA_H_
> > +#define _RTE_VDPA_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * Device specific vhost lib
> > + */
> > +
> > +#include <rte_pci.h>
> > +#include "rte_vhost.h"
> > +
> > +#define MAX_VDPA_ENGINE_NUM 128
> > +#define MAX_VDPA_NAME_LEN 128
> > +
> > +struct rte_vdpa_eng_addr {
> > +	union {
> > +		uint8_t __dummy[64];
> > +		struct rte_pci_addr pci_addr;
> I think we should not only support PCI, but any type of buses.
> At least in the API.

Exactly, so we defined a 64-byte union so that any bus type can be added
without breaking the ABI.

But one place that may be impacted is the is_same_eng() function.
Maybe comparing all the bytes in __dummy[64] is a better way. What do you
think?
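
To make the two ideas above concrete, here is a rough sketch of a fixed-size address union and a byte-wise comparison over it. It is not the code from the patch: `struct pci_addr` is a hypothetical stand-in for `struct rte_pci_addr`, and the `eng_addr_*` helpers are invented for illustration. It also assumes the whole 64 bytes are zeroed before any member is written, since a byte-wise compare is only meaningful if the unused bytes are known.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_pci_addr (no DPDK headers needed). */
struct pci_addr {
	uint32_t domain;
	uint8_t bus;
	uint8_t devid;
	uint8_t function;
};

struct eng_addr {
	union {
		uint8_t pad[64];          /* pins the size so new bus types fit later */
		struct pci_addr pci_addr; /* the only bus type used today */
	};
};

/* The ABI argument rests on the size never changing. */
_Static_assert(sizeof(struct eng_addr) == 64, "address must stay 64 bytes");

/* Zero everything first, then fill fields one by one so no stray bytes remain. */
static void
eng_addr_set_pci(struct eng_addr *addr, const struct pci_addr *pci)
{
	memset(addr, 0, sizeof(*addr));
	addr->pci_addr.domain = pci->domain;
	addr->pci_addr.bus = pci->bus;
	addr->pci_addr.devid = pci->devid;
	addr->pci_addr.function = pci->function;
}

/* Bus-agnostic comparison: all 64 bytes, regardless of which member is in use. */
static int
eng_addr_is_same(const struct eng_addr *a, const struct eng_addr *b)
{
	return memcmp(a->pad, b->pad, sizeof(a->pad)) == 0;
}
```

The trade-off is that the byte-wise compare needs no per-bus code, but it puts the burden on every producer of an address to zero the structure first.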

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 0/4] ethdev: add per-PMD tuning of RxTx parmeters
  2018-03-07 12:08  3% [dpdk-dev] [RFC PATCH v1 0/4] ethdev: add per-PMD tuning of RxTx parmeters Remy Horton
@ 2018-03-21 14:27  3% ` Remy Horton
  2018-03-27 18:43  0%   ` Ferruh Yigit
  2018-04-04 17:17  3%   ` [dpdk-dev] [PATCH v3 " Remy Horton
  0 siblings, 2 replies; 200+ results
From: Remy Horton @ 2018-03-21 14:27 UTC (permalink / raw)
  To: dev
  Cc: John McNamara, Wenzhuo Lu, Jingjing Wu, Qi Zhang, Beilei Xing,
	Shreyansh Jain, Thomas Monjalon

The optimal values of several transmission & reception related parameters,
such as burst sizes, descriptor ring sizes, and number of queues, vary
between different network interface devices. This patchset allows individual
PMDs to specify their preferred parameter values, and if so indicated by an
application, for them to be used automatically by the ethdev layer.

rte_eth_dev_configure() has been changed so that specifying zero for both
nb_rx_q AND nb_tx_q causes it to use driver preferred values, and if these
are not available, falls back to EAL defaults. Setting one (but not both)
to zero does not cause the use of defaults, as having one of them zeroed is
a valid setup.
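
As an illustration of the rule just described, the following sketch resolves the two queue counts. It is not the ethdev implementation: `struct preferred_queues`, `resolve_queue_counts()` and the fallback value of 1 are assumptions chosen only to show the both-zero condition and the two-level fallback.

```c
#include <stdint.h>

/* Hypothetical container for the values a PMD reports as preferred. */
struct preferred_queues {
	uint16_t nb_rx_q; /* 0 means the driver expressed no preference */
	uint16_t nb_tx_q;
};

/* Assumed library-level fallback; the real value is not given in this cover letter. */
#define FALLBACK_NB_QUEUES 1

/* Driver preference if present, otherwise the library fallback. */
static uint16_t
pick_queue_count(uint16_t preferred)
{
	return preferred != 0 ? preferred : FALLBACK_NB_QUEUES;
}

/*
 * Resolve the requested queue counts in place.
 * Defaults apply only when BOTH counts are zero; a single zero is a
 * valid configuration (e.g. Tx-only) and is left untouched.
 */
static void
resolve_queue_counts(uint16_t *nb_rx_q, uint16_t *nb_tx_q,
		const struct preferred_queues *pref)
{
	if (*nb_rx_q != 0 || *nb_tx_q != 0)
		return;
	*nb_rx_q = pick_queue_count(pref->nb_rx_q);
	*nb_tx_q = pick_queue_count(pref->nb_tx_q);
}
```

With a driver that prefers 4 Rx queues and states nothing for Tx, a request of (0, 0) would resolve to (4, 1) under these assumptions, while a request of (0, 2) stays as it is.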

This RFC/V1 includes per-PMD values for e1000 and i40e but it is expected
that subsequent patchsets will cover other PMDs. A deprecation notice
covering the API/ABI change is in place.

Changes in v2:
* Rebased to 
* Removed fallback values from rte_eth_dev_info_get()
* Added fallback values to rte_eth_[rt]x_queue_setup()
* Added fallback values to rte_eth_dev_configure()
* Corrected comment
* Removed deprecation notice
* Split Rx and Tx into separate structures
* Changed parameter names

Remy Horton (4):
  ethdev: add support for PMD-tuned Tx/Rx parameters
  net/e1000: add TxRx tuning parameters
  net/i40e: add TxRx tuning parameters
  testpmd: make use of per-PMD TxRx parameters

 app/test-pmd/testpmd.c               |  5 ++--
 doc/guides/rel_notes/deprecation.rst | 13 -----------
 drivers/net/e1000/em_ethdev.c        |  6 +++++
 drivers/net/i40e/i40e_ethdev.c       | 33 ++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.c        | 44 ++++++++++++++++++++++++++++--------
 lib/librte_ether/rte_ethdev.h        | 23 +++++++++++++++++++
 6 files changed, 97 insertions(+), 27 deletions(-)

-- 
2.9.5

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC PATCH v1 1/4] ethdev: add support for PMD-tuned Tx/Rx parameters
  2018-03-21 10:02  3%                             ` Ferruh Yigit
@ 2018-03-21 10:45  0%                               ` Shreyansh Jain
  0 siblings, 0 replies; 200+ results
From: Shreyansh Jain @ 2018-03-21 10:45 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Bruce Richardson, Horton, Remy, Ananyev, Konstantin, dev, Lu,
	Wenzhuo, Wu, Jingjing, Zhang, Qi Z, Xing, Beilei,
	Thomas Monjalon

> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit@intel.com]
> Sent: Wednesday, March 21, 2018 3:33 PM
> To: Shreyansh Jain <shreyansh.jain@nxp.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>; Horton, Remy
> <remy.horton@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; dev@dpdk.org; Lu, Wenzhuo
> <wenzhuo.lu@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Zhang, Qi
> Z <qi.z.zhang@intel.com>; Xing, Beilei <beilei.xing@intel.com>; Thomas
> Monjalon <thomas@monjalon.net>
> Subject: Re: [dpdk-dev] [RFC PATCH v1 1/4] ethdev: add support for PMD-
> tuned Tx/Rx parameters
> 
> On 3/21/2018 6:51 AM, Shreyansh Jain wrote:
> > On Tue, Mar 20, 2018 at 8:24 PM, Ferruh Yigit <ferruh.yigit@intel.com>
> wrote:
> >> On 3/16/2018 1:54 PM, Shreyansh Jain wrote:
> >>> On Thu, Mar 15, 2018 at 8:27 PM, Ferruh Yigit
> <ferruh.yigit@intel.com> wrote:
> >
> > [...]
> >

[...]

> >>>
> >>> C) Unlike the original proposal, this would add two separate members
> >>> to rte_eth_dev_info - one each for Rx and Tx. They both are still
> >>> expected to be populated through the info_get() implementation but
> not
> >>> by lib_eal.
> >>> IMO, doesn't matter.
> >>
> >> There won't be new members, which ones are you talking about?
> >
> > original proposal: (ignore change of names, please)
> >
> >  rte_eth_dev_preferred_info {
> >      rx_burst_size
> >      tx_burst_size
> >      rx_ring_size
> >      tx_ring_size
> >      ...
> >   }
> >
> > And this is what I think last few comments intended:
> >
> >  rte_eth_rxpreferred {
> >    ...
> >    rx_burst_size
> >    rx_ring_size
> >    ...
> >  }
> >
> >  rte_eth_txpreferred {
> >    ...
> >    tx_burst_size
> >    tx_ring_size
> >    ...
> >  }
> >
> > both the above added rte_eth_dev_info{}
> >
> > This is what I meant when I stated "...this would add two separate
> > members to rte_eth_dev_info - one each for Rx and Tx..."
> 
> Got it. I don't have any strong opinion on adding single struct or two
> (one for
> Rx and one for Tx).
> Since these will be public structs, do you think will there be any
> difference
> from ABI stability point of view?

No. It was just an observation. To me, it doesn't matter which approach is selected.

-
Shreyansh

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  2018-03-21  4:59  0%     ` gowrishankar muthukrishnan
@ 2018-03-21 10:24  0%       ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-03-21 10:24 UTC (permalink / raw)
  To: gowrishankar muthukrishnan; +Cc: dev, Bruce Richardson, Chao Zhu

On 21-Mar-18 4:59 AM, gowrishankar muthukrishnan wrote:
> On Wednesday 07 February 2018 03:28 PM, Anatoly Burakov wrote:
>> During lcore scan, find maximum socket ID and store it. This will
>> break the ABI, so bump ABI version.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>>      v4:
>>      - Remove backwards ABI compatibility, bump ABI instead
>>      v3:
>>      - Added ABI compatibility
>>      v2:
>>      - checkpatch changes
>>      - check socket before deciding if the core is not to be used
>>
>>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>>   lib/librte_eal/common/eal_common_lcore.c  | 37 
>> +++++++++++++++++++++----------
>>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>>   6 files changed, 44 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/librte_eal/bsdapp/eal/Makefile 
>> b/lib/librte_eal/bsdapp/eal/Makefile
>> index dd455e6..ed1d17b 100644
>> --- a/lib/librte_eal/bsdapp/eal/Makefile
>> +++ b/lib/librte_eal/bsdapp/eal/Makefile
>> @@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
>>
>>   EXPORT_MAP := ../../rte_eal_version.map
>>
>> -LIBABIVER := 6
>> +LIBABIVER := 7
>>
>>   # specific to bsdapp exec-env
>>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
>> diff --git a/lib/librte_eal/common/eal_common_lcore.c 
>> b/lib/librte_eal/common/eal_common_lcore.c
>> index 7724fa4..827ddeb 100644
>> --- a/lib/librte_eal/common/eal_common_lcore.c
>> +++ b/lib/librte_eal/common/eal_common_lcore.c
>> @@ -28,6 +28,7 @@ rte_eal_cpu_init(void)
>>       struct rte_config *config = rte_eal_get_configuration();
>>       unsigned lcore_id;
>>       unsigned count = 0;
>> +    unsigned int socket_id, max_socket_id = 0;
>>
>>       /*
>>        * Parse the maximum set of logical cores, detect the subset of 
>> running
>> @@ -39,6 +40,19 @@ rte_eal_cpu_init(void)
>>           /* init cpuset for per lcore config */
>>           CPU_ZERO(&lcore_config[lcore_id].cpuset);
>>
>> +        /* find socket first */
>> +        socket_id = eal_cpu_socket_id(lcore_id);
>> +        if (socket_id >= RTE_MAX_NUMA_NODES) {
>> +#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
>> +            socket_id = 0;
>> +#else
>> +            RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than 
>> RTE_MAX_NUMA_NODES (%d)\n",
>> +                    socket_id, RTE_MAX_NUMA_NODES);
>> +            return -1;
>> +#endif
>> +        }
>> +        max_socket_id = RTE_MAX(max_socket_id, socket_id);
>> +
>>           /* in 1:1 mapping, record related cpu detected state */
>>           lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
>>           if (lcore_config[lcore_id].detected == 0) {
>> @@ -54,18 +68,7 @@ rte_eal_cpu_init(void)
>>           config->lcore_role[lcore_id] = ROLE_RTE;
>>           lcore_config[lcore_id].core_role = ROLE_RTE;
>>           lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
>> -        lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
>> -        if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
>> -#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
>> -            lcore_config[lcore_id].socket_id = 0;
>> -#else
>> -            RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
>> -                "RTE_MAX_NUMA_NODES (%d)\n",
>> -                lcore_config[lcore_id].socket_id,
>> -                RTE_MAX_NUMA_NODES);
>> -            return -1;
>> -#endif
>> -        }
>> +        lcore_config[lcore_id].socket_id = socket_id;
>>           RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
>>                   "core %u on socket %u\n",
>>                   lcore_id, lcore_config[lcore_id].core_id,
>> @@ -79,5 +82,15 @@ rte_eal_cpu_init(void)
>>           RTE_MAX_LCORE);
>>       RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
>>
>> +    config->numa_node_count = max_socket_id + 1;
> 
> In some IBM servers, socket ID number does not seem to be in sequence. 
> For an instance, 0 and 8 for a 2 node server.
> 
> In this case, numa_node_count would mislead users if wrongly understood 
> by its variable name IMO (see below)
>> +    RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", 
>> config->numa_node_count);
> 
> For an instance, reading above message would tell 'EAL detected 8 nodes' 
> in my server, but actually there are only two nodes.
> 
> Could its name better be 'numa_node_id_max' ?. Also, we store in actual 
> count of numa nodes in _count variable.
> 
> Also, there could be a case when there is no local memory available to a 
> numa node too.
> 
> Thanks,
> Gowrishankar

The point of this patchset is to (pre)allocate memory only on existing 
sockets.

If we don't know how many sockets there are, we are forced to 
preallocate VA space per each *possible* NUMA node - that is, reserve 
e.g. 8x128G of memory, 6 of which will go unused on a 2-socket system. 
We can't know if there is no memory on socket in advance, but we can at 
least avoid preallocating VA space for sockets that don't exist in the 
first place.

How about we store all possible socket id's instead? e.g. something like:

static int numa_node_ids[MAX_NUMA_NODES];
<...>
int rte_eal_cpu_init() {
	int sockets[RTE_MAX_LCORE];
	<...>
	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
		sockets[lcore_id] = socket;
	}
	<...>
	qsort(sockets);
	<...>
	// store all unique sockets in numa_node_ids in ascending order
}
<...>

on a 2 socket system we then get:

rte_num_sockets() => return 2
rte_get_socket_id(int idx) => return numa_node_ids[idx]

Would that be suitable?
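
A compilable version of the sort-and-dedupe step might look roughly like this. It is only a sketch of the idea above, not the eventual patch: `unique_socket_ids()` and `cmp_int()` are names made up here, and the per-lcore socket array is passed in by the caller instead of being filled from the lcore scan.

```c
#include <stdlib.h>

/* Ascending comparison of two ints for qsort(). */
static int
cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Sort the per-lcore socket ids in place, then copy each distinct id
 * to out[] in ascending order. Returns the number of ids written,
 * which is capped at out_cap.
 */
static unsigned int
unique_socket_ids(int *sockets, unsigned int n, int *out, unsigned int out_cap)
{
	unsigned int i, count = 0;

	qsort(sockets, n, sizeof(sockets[0]), cmp_int);
	for (i = 0; i < n; i++) {
		if (i > 0 && sockets[i] == sockets[i - 1])
			continue; /* same socket as the previous lcore */
		if (count == out_cap)
			break;
		out[count++] = sockets[i];
	}
	return count;
}
```

On the 2-node machine with ids 0 and 8, the returned count is 2 and the output array holds {0, 8}, which is what the two accessors sketched above would then expose.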

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC PATCH v1 1/4] ethdev: add support for PMD-tuned Tx/Rx parameters
  @ 2018-03-21 10:02  3%                             ` Ferruh Yigit
  2018-03-21 10:45  0%                               ` Shreyansh Jain
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-21 10:02 UTC (permalink / raw)
  To: Shreyansh Jain
  Cc: Bruce Richardson, Horton, Remy, Ananyev, Konstantin, dev, Lu,
	Wenzhuo, Wu, Jingjing, Zhang, Qi Z, Xing, Beilei,
	Thomas Monjalon

On 3/21/2018 6:51 AM, Shreyansh Jain wrote:
> On Tue, Mar 20, 2018 at 8:24 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>> On 3/16/2018 1:54 PM, Shreyansh Jain wrote:
>>> On Thu, Mar 15, 2018 at 8:27 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> 
> [...]
> 
>>>> Hi Remy, Shreyansh,
>>>>
>>>> What do you think about using a variable name consistent with existing
>>>> "default_[rt]xconf" in dev_info?
>>>
>>> It just turned out to be much more complex than I initially thought :)
>>> Is this what the above conversation merging at (for Rx, as example):
>>>
>>> 1. 'default_rx_size_conf' is added in rte_eth_dev_info (and this
>>> includes I/O  params like burst size, besides configure time nb_queue,
>>> nb_desc etc). Driver would return these values filled in when
>>> info_get() is called.
>>>
>>> 2a. If an application needs the defaults, it would perform info_get()
>>> and get the values. then, use the values in configuration APIs
>>> (rx_queue_setup for nb_rx_desc, eth_dev_dev_configure for
>>> nb_rx_queues).
>>> For rx_burst calls, it would use the burst_size fields obtained from info_get().
>>> This is good enough for configuration and datapath (rx_burst).
>>>
>>> OR, another case
>>>
>>> 2b. Application wants to use default vaules provided by driver without
>>> calling info_get. In which case, it would call
>>> rx_queue_setup(nb_rx_desc=0..) or eth_dev_configure(nb_rx_queue=0,
>>> nb_tx_queue=0). The implementation would query the value from
>>> 'default_rx_size_conf' through info_get() and use those values.
>>> Though, in this case, rte_eth_rx_burst(burst=0) might not work for
>>> picking up the default within rte_ethdev.h.
>>
>> In Bruce's suggestion where ethdev keep defaults is changed.
>> Initial suggestion was rte_eth_dev_info_get() filling default data, now defaults
>> will be defined in functions like rte_eth_rx_queue_setup().
>>
>> This is a little different from filling defaults in rte_eth_dev_info_get():
>> - Application can know where the defaults are coming from because dev_info
>> fields are only modified by PMD. Application still prefer to use ethdev defaults.
>>
>> - The default values in ethdev library provided in function related to that
>> data, instead of separate rte_eth_dev_info_get() function.
> 
> It seems we both are on same page (almost) - just that I couldn't
> articulate my comments properly earlier, maybe.
> 
> rte_eth_dev_info_get is only a method to get defaults set by PMDs.
> dev_info_get is not setting defaults by itself. I get this.
> 
>>
>>
>> What application can do:
>> -  Application can call rte_eth_dev_info_get() and can learn if PMD provided
>> defaults or not.
>> - If PMD doesn't provided any default values application can prefer to use
>> application defined values. This may be an option for the application looking
>> for most optimized values.
>> - Although PMD doesn't provide any defaults, application still can use defaults
>> provided by ethdev by providing '0' as arguments.
> 
> Yes, agree - and only comment I added previously in this case is that
> this is not applicable for burst APIs. So, optimal [rt]x burst size
> cannot be 'defaulted' to EAL layer. Other values like ring size, queue
> count can be delegated to EAL for overwriting if passed as '0'.

Yes you are right.

> 
>>
>>
>> So how related ethdev functions will be updated:
>> if argument != 0
>>   use argument
>> else
>>   dev_info_get()
>>     if dev_info->argument != 0
>>       use dev_info->argument
>>     else
>>       use function_prov
> 
> Perfect, but only for eth_dev_configure and eth_[rt]x_queue_setup functions -
> and that is OK with me.
> 
>>
>>>
>>> :Four observations:
>>> A). For burst size (or any other I/O time value added in future),
>>> values would have to be explicitly used by application - always. If
>>> value reported by info_get() is '0' (see (B) below), application to
>>> use its own judgement. No default override by lib_eal.
>>> IMO, This is good enough assumption.
>>
>> This is no more true after Bruce's comment.
>> If application provides any values it will overwrite everything else,
>> application has the final word.
>> But application may prefer to use provided default values.
> 
> I am not sure what has changed with Bruce's comment - but I agree with
> what you are stating.
> 
>>
>>>
>>> B). '0' as an indicator for 'no-default-value-available-from-driver'
>>> is still an open point. It is good enough for current proposed
>>> parameters, but may be a valid numerical value in future.
>>> IMO, this can be ignored for now.
>>
>> Agree that we can ignore it for now.
>>
>>>
>>> C) Unlike the original proposal, this would add two separate members
>>> to rte_eth_dev_info - one each for Rx and Tx. They both are still
>>> expected to be populated through the info_get() implementation but not
>>> by lib_eal.
>>> IMO, doesn't matter.
>>
>> There won't be new members, which ones are you talking about?
> 
> original proposal: (ignore change of names, please)
> 
>  rte_eth_dev_preferred_info {
>      rx_burst_size
>      tx_burst_size
>      rx_ring_size
>      tx_ring_size
>      ...
>   }
> 
> And this is what I think last few comments intended:
> 
>  rte_eth_rxpreferred {
>    ...
>    rx_burst_size
>    rx_ring_size
>    ...
>  }
> 
>  rte_eth_txpreferred {
>    ...
>    tx_burst_size
>    tx_ring_size
>    ...
>  }
> 
> both of the above added to rte_eth_dev_info{}
> 
> This is what I meant when I stated "...this would add two separate
> members to rte_eth_dev_info - one each for Rx and Tx..."

Got it. I don't have any strong opinion on adding a single struct or two (one for
Rx and one for Tx).
Since these will be public structs, do you think there will be any difference
from an ABI stability point of view?
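
As a rough illustration of the precedence agreed above for eth_dev_configure and the queue setup functions (application value first, then the PMD default, then a library fallback), here is a minimal sketch in C. The function name, the macro and the value 512 are made up for this example; they are not ethdev APIs or real defaults. Per the discussion, burst sizes would not go through this path.

```c
#include <stdint.h>

/* Assumed library-wide fallback, chosen only for illustration. */
#define EX_FALLBACK_RING_SIZE 512

/*
 * Hypothetical helper: pick a ring size.
 *  app_value: what the application passed (0 means "use a default")
 *  pmd_value: what the PMD reported via dev_info (0 means "no default")
 */
static uint16_t
ex_pick_ring_size(uint16_t app_value, uint16_t pmd_value)
{
	if (app_value != 0)
		return app_value;      /* application has the final word */
	if (pmd_value != 0)
		return pmd_value;      /* PMD-provided default */
	return EX_FALLBACK_RING_SIZE;  /* library-provided default */
}
```

With this shape, passing '0' always means "let someone else decide", which is also why '0' cannot be a valid value for any parameter handled this way.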

> 
> [...]
> 
> -
> Shreyansh
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel
  2018-03-21  2:28  4%   ` Tan, Jianfeng
@ 2018-03-21  9:55  3%     ` Pattan, Reshma
  2018-03-27  1:26  0%       ` Tan, Jianfeng
  0 siblings, 1 reply; 200+ results
From: Pattan, Reshma @ 2018-03-21  9:55 UTC (permalink / raw)
  To: Tan, Jianfeng, dev

Hi,

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Wednesday, March 21, 2018 2:28 AM
> To: Pattan, Reshma <reshma.pattan@intel.com>; dev@dpdk.org
> Subject: RE: [PATCH] pdump: change to use generic multi-process channel
> 

Hi,

> > 1) I feel ABI breakage has to be addressed  first for change in
> rte_pdump_init() .
> > 2)ABI notice for removal of the rte_pdump_set_socket_dir()  and then
> remove it completely .
> 
> This patch itself does not break any ABI. It just leaves the parameters of
> rte_pdump_init() unused and makes rte_pdump_set_socket_dir() a
> dummy function.
> 

So, for the current release you just mark the parameters unused and make the functions dummies, and in a future release you announce
the ABI breakage and remove them completely? If that is the agreed plan, I don't have any issues.

> > 3)Need to do cleanup of the code app/dpdk-pdump.
> 
> Yes, I understand it's a normal process to announce deprecation first, and
> then do the change.
> 
> But here is the thing, with generic mp introduced, we will not be compatible
> with older DPDK versions.
> So we want to unify the use of generic mp channel in this release for vfio,
> pdump, vdev, memory rework.
> And in fact, ABI/API changes could be delayed to later releases.

So, you want to remove the unnecessary socket-related code from dpdk-pdump in a future release? That kind of makes sense.
But the dpdk-pdump tool has socket-path command line options which the user can still pass; aren't we creating confusion between
the internal design and the usage?

> 
> > 4)After all the changes we need to make sure dpdk-pdump works fine
> > without breaking the functionality, validation team should be able to help.
> 
> I have done a simple test of pdump. Can you suggest where I can get
> comprehensive test cases?
> 

Ok, if you have verified and observed that packets are being captured successfully, that is good enough.

Thanks,
Reshma

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-21  8:21  0%         ` Thomas Monjalon
@ 2018-03-21  8:47  0%           ` Arnon Warshavsky
  0 siblings, 0 replies; 200+ results
From: Arnon Warshavsky @ 2018-03-21  8:47 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

I don't know what needs to be modified.

> I think the first step is to clearly identify the different kinds
> of changes by splitting your patch.
> Then we will decide how to integrate them.
>
>
> Ok. Will split the commit with no ABI handling and continue from there

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-20 23:04  0%       ` Arnon Warshavsky
@ 2018-03-21  8:21  0%         ` Thomas Monjalon
  2018-03-21  8:47  0%           ` Arnon Warshavsky
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-21  8:21 UTC (permalink / raw)
  To: Arnon Warshavsky
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

21/03/2018 00:04, Arnon Warshavsky:
> >
> > You are talking about API, and I agree the old applications can keep
> > considering the functions as void.
> > But I was talking about ABI, meaning: can we use an old application
> > without recompiling and update only the DPDK (in .so file)?
> >
> >
> > You are right of course. Once again I mixed the two..
> I will modify accordingly

I don't know what needs to be modified.
I think the first step is to clearly identify the different kinds
of changes by splitting your patch.
Then we will decide how to integrate them.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
    2018-03-08 12:12  3%     ` Bruce Richardson
@ 2018-03-21  4:59  0%     ` gowrishankar muthukrishnan
  2018-03-21 10:24  0%       ` Burakov, Anatoly
  1 sibling, 1 reply; 200+ results
From: gowrishankar muthukrishnan @ 2018-03-21  4:59 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev, Bruce Richardson, Chao Zhu

On Wednesday 07 February 2018 03:28 PM, Anatoly Burakov wrote:
> During lcore scan, find maximum socket ID and store it. This will
> break the ABI, so bump ABI version.
>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>
> Notes:
>      v4:
>      - Remove backwards ABI compatibility, bump ABI instead
>      
>      v3:
>      - Added ABI compatibility
>      
>      v2:
>      - checkpatch changes
>      - check socket before deciding if the core is not to be used
>
>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>   6 files changed, 44 insertions(+), 15 deletions(-)
>
> diff --git a/lib/librte_eal/bsdapp/eal/Makefile b/lib/librte_eal/bsdapp/eal/Makefile
> index dd455e6..ed1d17b 100644
> --- a/lib/librte_eal/bsdapp/eal/Makefile
> +++ b/lib/librte_eal/bsdapp/eal/Makefile
> @@ -21,7 +21,7 @@ LDLIBS += -lgcc_s
>
>   EXPORT_MAP := ../../rte_eal_version.map
>
> -LIBABIVER := 6
> +LIBABIVER := 7
>
>   # specific to bsdapp exec-env
>   SRCS-$(CONFIG_RTE_EXEC_ENV_BSDAPP) := eal.c
> diff --git a/lib/librte_eal/common/eal_common_lcore.c b/lib/librte_eal/common/eal_common_lcore.c
> index 7724fa4..827ddeb 100644
> --- a/lib/librte_eal/common/eal_common_lcore.c
> +++ b/lib/librte_eal/common/eal_common_lcore.c
> @@ -28,6 +28,7 @@ rte_eal_cpu_init(void)
>   	struct rte_config *config = rte_eal_get_configuration();
>   	unsigned lcore_id;
>   	unsigned count = 0;
> +	unsigned int socket_id, max_socket_id = 0;
>
>   	/*
>   	 * Parse the maximum set of logical cores, detect the subset of running
> @@ -39,6 +40,19 @@ rte_eal_cpu_init(void)
>   		/* init cpuset for per lcore config */
>   		CPU_ZERO(&lcore_config[lcore_id].cpuset);
>
> +		/* find socket first */
> +		socket_id = eal_cpu_socket_id(lcore_id);
> +		if (socket_id >= RTE_MAX_NUMA_NODES) {
> +#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
> +			socket_id = 0;
> +#else
> +			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than RTE_MAX_NUMA_NODES (%d)\n",
> +					socket_id, RTE_MAX_NUMA_NODES);
> +			return -1;
> +#endif
> +		}
> +		max_socket_id = RTE_MAX(max_socket_id, socket_id);
> +
>   		/* in 1:1 mapping, record related cpu detected state */
>   		lcore_config[lcore_id].detected = eal_cpu_detected(lcore_id);
>   		if (lcore_config[lcore_id].detected == 0) {
> @@ -54,18 +68,7 @@ rte_eal_cpu_init(void)
>   		config->lcore_role[lcore_id] = ROLE_RTE;
>   		lcore_config[lcore_id].core_role = ROLE_RTE;
>   		lcore_config[lcore_id].core_id = eal_cpu_core_id(lcore_id);
> -		lcore_config[lcore_id].socket_id = eal_cpu_socket_id(lcore_id);
> -		if (lcore_config[lcore_id].socket_id >= RTE_MAX_NUMA_NODES) {
> -#ifdef RTE_EAL_ALLOW_INV_SOCKET_ID
> -			lcore_config[lcore_id].socket_id = 0;
> -#else
> -			RTE_LOG(ERR, EAL, "Socket ID (%u) is greater than "
> -				"RTE_MAX_NUMA_NODES (%d)\n",
> -				lcore_config[lcore_id].socket_id,
> -				RTE_MAX_NUMA_NODES);
> -			return -1;
> -#endif
> -		}
> +		lcore_config[lcore_id].socket_id = socket_id;
>   		RTE_LOG(DEBUG, EAL, "Detected lcore %u as "
>   				"core %u on socket %u\n",
>   				lcore_id, lcore_config[lcore_id].core_id,
> @@ -79,5 +82,15 @@ rte_eal_cpu_init(void)
>   		RTE_MAX_LCORE);
>   	RTE_LOG(INFO, EAL, "Detected %u lcore(s)\n", config->lcore_count);
>
> +	config->numa_node_count = max_socket_id + 1;

In some IBM servers, socket ID numbers do not seem to be sequential.
For instance, 0 and 8 for a 2-node server.

In this case, numa_node_count would mislead users if wrongly understood 
by its variable name IMO (see below)
> +	RTE_LOG(INFO, EAL, "Detected %u NUMA nodes\n", config->numa_node_count);

For instance, the above message would report 9 NUMA nodes on my server
(highest socket ID 8, plus one), but there are actually only two nodes.

Could it be better named 'numa_node_id_max'? We could then also store the actual
count of NUMA nodes in the _count variable.

Also, there could be a case where a NUMA node has no local memory
available.
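
To make the difference concrete, below is a small sketch in C that contrasts the two quantities for non-contiguous socket IDs: "highest ID plus one" (what the patch stores) and "number of distinct IDs". The function names and the EX_MAX_NUMA_NODES bound are hypothetical and are not part of EAL.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_MAX_NUMA_NODES 16 /* assumed bound, for illustration only */

/* Highest socket ID seen, plus one (the value the patch computes). */
static unsigned int
ex_max_socket_id_plus_one(const unsigned int *ids, size_t n)
{
	unsigned int max_id = 0;

	for (size_t i = 0; i < n; i++)
		if (ids[i] > max_id)
			max_id = ids[i];
	return max_id + 1;
}

/* Number of distinct socket IDs; IDs beyond the bound are skipped. */
static unsigned int
ex_distinct_socket_count(const unsigned int *ids, size_t n)
{
	bool seen[EX_MAX_NUMA_NODES] = { false };
	unsigned int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (ids[i] >= EX_MAX_NUMA_NODES || seen[ids[i]])
			continue;
		seen[ids[i]] = true;
		count++;
	}
	return count;
}
```

For per-lcore socket IDs {0, 0, 8, 8}, the first function yields 9 and the second yields 2. The first is still useful as an array bound when indexing by socket ID, which may be why a name along the lines of 'numa_node_id_max' reads better for it.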

Thanks,
Gowrishankar
> +
>   	return 0;
>   }
> +
> +unsigned int
> +rte_num_sockets(void)
> +{
> +	const struct rte_config *config = rte_eal_get_configuration();
> +	return config->numa_node_count;
> +}
> diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
> index 08c6637..63fcc2e 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -57,6 +57,7 @@ enum rte_proc_type_t {
>   struct rte_config {
>   	uint32_t master_lcore;       /**< Id of the master lcore */
>   	uint32_t lcore_count;        /**< Number of available logical cores. */
> +	uint32_t numa_node_count;    /**< Number of detected NUMA nodes. */
>   	uint32_t service_lcore_count;/**< Number of available service cores. */
>   	enum rte_lcore_role_t lcore_role[RTE_MAX_LCORE]; /**< State of cores. */
>
> diff --git a/lib/librte_eal/common/include/rte_lcore.h b/lib/librte_eal/common/include/rte_lcore.h
> index d84bcff..ddf4c64 100644
> --- a/lib/librte_eal/common/include/rte_lcore.h
> +++ b/lib/librte_eal/common/include/rte_lcore.h
> @@ -120,6 +120,14 @@ rte_lcore_index(int lcore_id)
>   unsigned rte_socket_id(void);
>
>   /**
> + * Return number of physical sockets on the system.
> + * @return
> + *   the number of physical sockets as recognized by EAL
> + *
> + */
> +unsigned int rte_num_sockets(void);
> +
> +/**
>    * Get the ID of the physical socket of the specified lcore
>    *
>    * @param lcore_id
> diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
> index 7e5bbe8..b9c7727 100644
> --- a/lib/librte_eal/linuxapp/eal/Makefile
> +++ b/lib/librte_eal/linuxapp/eal/Makefile
> @@ -10,7 +10,7 @@ ARCH_DIR ?= $(RTE_ARCH)
>   EXPORT_MAP := ../../rte_eal_version.map
>   VPATH += $(RTE_SDK)/lib/librte_eal/common/arch/$(ARCH_DIR)
>
> -LIBABIVER := 6
> +LIBABIVER := 7
>
>   VPATH += $(RTE_SDK)/lib/librte_eal/common
>
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index 4146907..fc83e74 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -211,6 +211,13 @@ DPDK_18.02 {
>
>   }  DPDK_17.11;
>
> +DPDK_18.05 {
> +	global:
> +
> +	rte_num_sockets;
> +
> +} DPDK_18.02;
> +
>   EXPERIMENTAL {
>   	global:
>
> @@ -255,4 +262,4 @@ EXPERIMENTAL {
>   	rte_service_set_stats_enable;
>   	rte_service_start_with_defaults;
>
> -} DPDK_18.02;
> +} DPDK_18.05;

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel
  2018-03-20 16:37  4% ` Pattan, Reshma
@ 2018-03-21  2:28  4%   ` Tan, Jianfeng
  2018-03-21  9:55  3%     ` Pattan, Reshma
  0 siblings, 1 reply; 200+ results
From: Tan, Jianfeng @ 2018-03-21  2:28 UTC (permalink / raw)
  To: Pattan, Reshma, dev

Thank you for the comments!

> -----Original Message-----
> From: Pattan, Reshma
> Sent: Wednesday, March 21, 2018 12:38 AM
> To: Tan, Jianfeng; dev@dpdk.org
> Subject: RE: [PATCH] pdump: change to use generic multi-process channel
> 
> Hi,
> 
> > -----Original Message-----
> > From: Tan, Jianfeng
> > Sent: Sunday, March 4, 2018 3:04 PM
> > To: dev@dpdk.org
> > Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Pattan, Reshma
> > <reshma.pattan@intel.com>
> > Subject: [PATCH] pdump: change to use generic multi-process channel
> >
> > The original code relies on the private channel for primary and secondary
> > communication. Change to use the generic multi-process channel.
> >
> > Note with this change, dpdk-pdump will not be compatible with old version
> > DPDK applications.
> >
> 
> Is this the correct time to make this change? As far as I can see, the rte_mp APIs
> are still in the experimental stage.
> If you wish to change them,
> 
> 1) I feel the ABI breakage has to be addressed first for the change in rte_pdump_init().
> 2) An ABI notice for the removal of rte_pdump_set_socket_dir(), and then remove it completely.

This patch itself does not break any ABI. It just leaves the parameters of rte_pdump_init() unused and makes rte_pdump_set_socket_dir() a dummy function.

> 3)Need to do cleanup of the code app/dpdk-pdump.

Yes, I understand the normal process is to announce the deprecation first and then make the change.

But here is the thing: with the generic mp channel introduced, we will not be compatible with older DPDK versions.
So we want to unify the use of the generic mp channel in this release for vfio, pdump, vdev and the memory rework.
The ABI/API changes themselves could be delayed to later releases.
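
To illustrate what "parameters not used" and "dummy function" mean in practice, here is a minimal sketch in C. The names prefixed with ex_ are hypothetical stand-ins and the prototypes are assumed for the example; they are not copied from rte_pdump.h.

```c
#include <stddef.h>

/* Hypothetical socket-type enum, for illustration only. */
enum ex_socktype { EX_SOCKET_SERVER = 1, EX_SOCKET_CLIENT };

static int ex_initialized;

/*
 * The path argument stays in the signature so existing callers still
 * compile and link, but it no longer has any effect.
 */
int
ex_pdump_init(const char *path)
{
	(void)path; /* unused: the generic channel needs no socket path */
	ex_initialized = 1;
	return 0;
}

/* Kept as a no-op so old binaries can still resolve the symbol. */
int
ex_pdump_set_socket_dir(const char *path, enum ex_socktype type)
{
	(void)path;
	(void)type;
	return 0;
}
```

The symbols and their signatures are unchanged, so nothing breaks at link time; only the behaviour behind them changes. Removing the parameters or the function later is the step that would need the deprecation notice.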

> 4)After all the changes we need to make sure dpdk-pdump works fine
> without breaking the functionality, validation team should be able to help.

I have done a simple test of pdump. Can you suggest where I can get comprehensive test cases?

> 5)Replace strcpy  with snprintf.

Will do.

Thanks,
Jianfeng

> 
> Thanks,
> Reshma
> 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-20 22:49  3%     ` Thomas Monjalon
@ 2018-03-20 23:04  0%       ` Arnon Warshavsky
  2018-03-21  8:21  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Arnon Warshavsky @ 2018-03-20 23:04 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

>
> You are talking about API, and I agree the old applications can keep
> considering the functions as void.
> But I was talking about ABI, meaning: can we use an old application
> without recompiling and update only the DPDK (in .so file)?
>
>
> You are right of course. Once again I mixed the two..
I will modify accordingly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-20 22:42  0%   ` Arnon Warshavsky
@ 2018-03-20 22:49  3%     ` Thomas Monjalon
  2018-03-20 23:04  0%       ` Arnon Warshavsky
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-20 22:49 UTC (permalink / raw)
  To: Arnon Warshavsky
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

20/03/2018 23:42, Arnon Warshavsky:
> > Thanks for working on this important topic.
> 
> With pleasure :)
> 
> > My feeling is that we could replace most of them by a log + return.
> > I did not think you would add a new macro. Why did you choose this way?
> 
> This was meant to keep the code shorter, and imply to the reader that this
> return is actually meant to be fatal
> >
> > >   I would like to define a device health state that can be monitored from
> > >   the side, and this will be an independent patch.
> >
> > You mean when a device becomes unusable?
> 
> Yes. Obviously not a simple issue, but essential for refraining from panic
> in the interrupt/data-path context,
> while allowing to detect and execute on the slow/management path.
> 
> > > - Some previously panicking void functions were changed to return a
> > > value, with callers modified accordingly.
> >
> > If the function is exposed to the application, I think it is an ABI change
> > and should follow the deprecation process.
> 
> In this case I thought there would be no actual change for the user as the
> transition is from returning void to int,
> and existing calls should continue to behave as before (except for not
> crashing)

You are talking about API, and I agree the old applications can keep
considering the functions as void.
But I was talking about ABI, meaning: can we use an old application
without recompiling and update only the DPDK (in .so file)?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  2018-03-20 22:04  3% ` Thomas Monjalon
@ 2018-03-20 22:42  0%   ` Arnon Warshavsky
  2018-03-20 22:49  3%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Arnon Warshavsky @ 2018-03-20 22:42 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Burakov, Anatoly, wenzhuo.lu, declan.doherty, jerin.jacob,
	Bruce Richardson, ferruh.yigit, dev

> Thanks for working on this important topic.
> With pleasure :)
>
 ...

> My feeling is that we could replace most of them by a log + return.
> I did not think you would add a new macro. Why did you choose this way?
> This was meant to keep the code shorter, and imply to the reader that this
> return is actually meant to be fatal
>
...
> >   I would like to define a device health state that can be monitored from
> >   the side, and this will be an independent patch.
>
> You mean when a device becomes unusable?
>
Yes. Obviously not a simple issue, but essential for refraining from panic
in the interrupt/data-path context,
while allowing to detect and execute on the slow/management path.

>
> ..
> > - Some previously panicking void functions were changed to return a
> value,
> >   with callers modified accordingly.
>
> If the function is exposed to the application, I think it is an ABI change
> and should follow the deprecation process.
>
In this case I thought there would be no actual change for the user as the
transition is from returning void to int,
and existing calls should continue to behave as before (except for not
crashing)
Following references to these functions, they all seem to comply with this
rule, but I guess that's what reviews are for :)


>
>
> /Arnon

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value
  @ 2018-03-20 22:04  3% ` Thomas Monjalon
  2018-03-20 22:42  0%   ` Arnon Warshavsky
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-20 22:04 UTC (permalink / raw)
  To: Arnon Warshavsky
  Cc: anatoly.burakov, wenzhuo.lu, declan.doherty, jerin.jacob,
	bruce.richardson, ferruh.yigit, dev

Hi,

20/03/2018 22:28, Arnon Warshavsky:
> The purpose of this patch is to cleanup the library code
> from paths that end up aborting the process,
> and move to checking error values, in order to allow the running process
> perform an orderly teardown or other mitigation of the event.

Thanks for working on this important topic.

> This patch modifies the majority of rte_panic calls under lib and drivers,
> and replaces them with a new variation of rte_panic macro
> that does not abort and returns an error value
> that can be propagated up the call stack.

My feeling is that we could replace most of them by a log + return.
I did not think you would add a new macro. Why did you choose this way?

> - Focus was given to the dpdk initialization path
> - Some of the panic calls within drivers were left in place where
>   the call is from within an interrupt or calls that are on the data path,
>   where there is no simple applicative route to propagate
>   the error to termination.
>   These should be handled by the driver maintainers.

Yes, better to let driver maintainers decide if you are not sure.

>   I would like to define a device health state that can be monitored from
>   the side, and this will be an independent patch.

You mean when a device becomes unusable?

> - No change took place in example and test files

Yes, panic/exit is allowed in applications.

> - No change took place for debug assertions calling panic

Yes, debug assert is a special case.

> - Some previously panicking void functions were changed to return a value,
>   with callers modified accordingly.

If the function is exposed to the application, I think it is an ABI change
and should follow the deprecation process.

> An additional independent patch to devtools/checkpatches.sh
> will be submitted in order to prevent new additions of calls to rte_panic
> under lib and drivers.

Yes please! +1 for an automatic check.

> Keep calm and don't panic.

Sure :)
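
For reference, the "log + return" alternative suggested above could look roughly like the sketch below. It is plain C with made-up names (ex_alloc_table, ex_init) and uses fprintf in place of RTE_LOG so that it stands alone; it is not taken from the patch under review.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical init step that previously would have aborted on failure. */
static int
ex_alloc_table(size_t n, int **out)
{
	int *t;

	if (n == 0) {
		fprintf(stderr, "ex_alloc_table: size must be non-zero\n");
		return -EINVAL;
	}
	t = calloc(n, sizeof(*t));
	if (t == NULL) {
		fprintf(stderr, "ex_alloc_table: cannot allocate %zu entries\n", n);
		return -ENOMEM;
	}
	*out = t;
	return 0;
}

/* The caller propagates the error instead of terminating the process. */
static int
ex_init(size_t n, int **table)
{
	int ret = ex_alloc_table(n, table);

	if (ret < 0)
		return ret; /* application decides how to tear down */
	return 0;
}
```

Changing a previously void function to return int in this way is the case discussed above: source-level callers keep working, but for exported functions it is still a change to the binary interface.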

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] doc: announce ethdev CRC strip flag deprecation
  @ 2018-03-20 17:23  3%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-03-20 17:23 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Neil Horman, John McNamara, Marko Kovacevic, dev

On 3/20/2018 11:35 AM, Thomas Monjalon wrote:
> 20/03/2018 12:26, Ferruh Yigit:
>> --- a/doc/guides/rel_notes/deprecation.rst
>> +++ b/doc/guides/rel_notes/deprecation.rst
>> +* ethdev: Make CRC stript default behavior without any flag required and add a
> 
> s/stript/stripping/
> 
>> +  new offload flag to let application request for keeping CRC if PMD reports
>> +  capability for it.
>> +  ``DEV_RX_OFFLOAD_CRC_STRIP`` flag will be removed.
>> +  ``DEV_RX_OFFLOAD_KEETP_CRC`` will be added.
> 
> s/KEETP/KEEP/
> 
> I think we should introduce the new flag without removing the old one
> for one release.
> Setting both flags would be an error.
> Setting no flag would mean stripping.
> So the CRC_STRIP flag would be just ignored by PMDs.
> 
> Opinions?

Introducing KEEP_CRC in this release is OK; since no PMD announces this
capability, it should be safe.

Since many PMDs' offloading patches are not applied yet, deciding now that "setting
no flag means stripping" helps them get this right in the first place.
The only issue is that this is an ABI break, but I am not aware of any PMD that
implements the not-stripping-CRC case, so I hope this is acceptable.

If there is no objection, I will update the notice to only remove the flag in the next release.
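
The transitional semantics proposed above (both flags set is an error, no flag means stripping, CRC_STRIP alone is ignored) can be sketched as follows. The EX_ macros use illustrative bit positions that are assumptions for this example and are not the values ethdev uses, and ex_resolve_crc is a hypothetical helper.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not the values used by ethdev. */
#define EX_RX_OFFLOAD_CRC_STRIP (UINT64_C(1) << 0)
#define EX_RX_OFFLOAD_KEEP_CRC  (UINT64_C(1) << 1)

/*
 * Resolve the requested CRC behaviour:
 *  - both flags set -> -EINVAL
 *  - KEEP_CRC only  -> *keep_crc = true
 *  - otherwise      -> *keep_crc = false (CRC_STRIP is ignored)
 */
static int
ex_resolve_crc(uint64_t offloads, bool *keep_crc)
{
	bool strip = (offloads & EX_RX_OFFLOAD_CRC_STRIP) != 0;
	bool keep = (offloads & EX_RX_OFFLOAD_KEEP_CRC) != 0;

	if (strip && keep)
		return -EINVAL;
	*keep_crc = keep;
	return 0;
}
```

Under this scheme an application that still sets only the old strip flag gets the same result as one that sets nothing, which is what allows the old flag to be removed a release later.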

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel
  @ 2018-03-20 16:37  4% ` Pattan, Reshma
  2018-03-21  2:28  4%   ` Tan, Jianfeng
  0 siblings, 1 reply; 200+ results
From: Pattan, Reshma @ 2018-03-20 16:37 UTC (permalink / raw)
  To: Tan, Jianfeng, dev

Hi,

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Sunday, March 4, 2018 3:04 PM
> To: dev@dpdk.org
> Cc: Tan, Jianfeng <jianfeng.tan@intel.com>; Pattan, Reshma
> <reshma.pattan@intel.com>
> Subject: [PATCH] pdump: change to use generic multi-process channel
> 
> The original code relies on the private channel for primary and secondary
> communication. Change to use the generic multi-process channel.
> 
> Note with this change, dpdk-pdump will not be compatible with old version
> DPDK applications.
> 

Is this the correct time to make this change? As far as I can see, the rte_mp APIs are still in the experimental stage.
If you wish to change them:

1) I feel the ABI breakage has to be addressed first for the change in rte_pdump_init().
2) An ABI notice is needed for the removal of rte_pdump_set_socket_dir(); then remove it completely.
3) The code in app/dpdk-pdump needs to be cleaned up.
4) After all the changes we need to make sure dpdk-pdump works fine without breaking functionality; the validation team should be able to help.
5) Replace strcpy with snprintf.

Thanks,
Reshma

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
  2018-03-09 19:06  0%           ` Neil Horman
@ 2018-03-20 15:51  0%             ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-03-20 15:51 UTC (permalink / raw)
  To: Neil Horman; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On 3/9/2018 7:06 PM, Neil Horman wrote:
> On Fri, Mar 09, 2018 at 03:45:49PM +0000, Ferruh Yigit wrote:
>> On 3/9/2018 3:16 PM, Neil Horman wrote:
>>> On Fri, Mar 09, 2018 at 01:00:35PM +0000, Ferruh Yigit wrote:
>>>> On 3/9/2018 12:36 PM, Neil Horman wrote:
>>>>> On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
>>>>>> "struct rte_eth_rxtx_callback" is defined as internal data structure and
>>>>>> used as named opaque type.
>>>>>>
>>>>>> So the functions that are adding callbacks can return objects in this
>>>>>> type instead of void pointer.
>>>>>>
>>>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>>>> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>>> ---
>>>>>> v2:
>>>>>> * keep using struct * in parameters, instead add callback functions
>>>>>> return struct rte_eth_rxtx_callback pointer.
>>>>>>
>>>>>> v4:
>>>>>> * Remove deprecation notice. LIBABIVER already increased in this release
>>>>>> ---
>>>>>>  doc/guides/rel_notes/deprecation.rst |  7 -------
>>>>>>  lib/librte_ether/rte_ethdev.c        |  6 +++---
>>>>>>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
>>>>>>  3 files changed, 11 insertions(+), 15 deletions(-)
>>>>>>
>>>>> This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
>>>>> internal data structure, then it shouldn't be used as part of the prototype for
>>>>> an exported function, as the structure will then no longer be a internal data
>>>>> structure, but rather part of the public ABI.
>>>>
>>>> "struct rte_eth_rxtx_callback" is internal data structure. And application
>>>> should not access elements of this structure.
>>>>
>>>> "struct rte_eth_rxtx_callback;" is defined in the public header, so applications
>>>> can use it as opaque type.
>>>>
>>>> It is possible that both "add" and "remove" APIs use "void *" and API itself can
>>>> cast it. But the inconsistency was "add" related APIs return "void *" and
>>>> "remove" related APIs require a parameter in "struct rte_eth_rxtx_callback *" type.
>>>>
>>>> While unifying the usage, "struct rte_eth_rxtx_callback *" preferred against
>>>> "void *", because named opaque type documents intention/usage better.
>>>>
>>>> Thanks,
>>>> ferruh
>>>>
>>> I get what you're saying about rte_eth_rxtx_callback being an internals
>>> structure (or its intent is to be an internal structure), but it doesn't seem to
>>> hold up to the header file layout.  rte_eth_rxtx_callback is defined in
>>> rte_ethdev_core.h which according to the makefile, is listed as a symlinked
>>> file, and therefore available for external applications to include.  This
>>> negates the intended opaque nature of the struct.  I think before you do this,
>>> you want to rectify that.
>>
>> Intention is to make "struct rte_eth_rxtx_callback" internal, but as you said it
>> is available to applications. This is same for all data structures in
>> rte_ethdev_core.h
>>
> Well...yes.  Thats what I said
> 
>> Unfortunately it can't be actual internal because of inline functions in public
>> header uses them. And we can't change inline functions because of performance
>> concerns.
>>
> I'm sorry, thats not ok with me.  Just declaring a data structure to be
> internal-only without enforcing that is asking for applications to mangle
> internal data, and theres no reason it can't be fixed (and done without
> sacrificing performance).

Currently that is what is blocking us, mainly the rte_eth_[rt]x_burst() inline
functions. Is there a way to fix it without losing performance?

> 
>> Since we can't make the structure real internal, we can't really prevent
>> applications to access the internals, this same if you use "void *".
>>
> Just typedef a void pointer to some rte_ethdev_cb_handle_t type and pass that
> back and forth instead.  That at least hides the fact that you are using a non
> opaque structure from user applications without some intentional casting.  You
> can further lock the call down by declaring the handles const so that no one
> tries to dereference or modify them without generating a warning.

Even a handle won't work if the user really wants to update it. It may highlight the
intention, but I believe moving the struct to ethdev_core.h already highlights the
intention.

Regarding the const qualifier, I was wondering whether the remove functions need to
update the struct, but that doesn't seem to be the case, so I will add them.
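
For readers following the opaque-type discussion, a minimal sketch of the pattern in C is shown below: the public header would only carry the forward declaration, and the add/remove functions pass a const pointer to it. All ex_ names are hypothetical. The sketch also assumes the definition can be kept private, which, as noted above, is not currently possible for ethdev because the inline burst functions need it.

```c
#include <stdlib.h>

/* Public view: an incomplete type, so callers cannot dereference it. */
struct ex_callback;

/*
 * Private definition. In a real library this would live in a .c file
 * or an internal header, not in this same translation unit.
 */
struct ex_callback {
	void (*fn)(void *arg);
	void *arg;
};

/* Returns a named opaque handle instead of a bare void pointer. */
const struct ex_callback *
ex_add_callback(void (*fn)(void *arg), void *arg)
{
	struct ex_callback *cb = malloc(sizeof(*cb));

	if (cb == NULL)
		return NULL;
	cb->fn = fn;
	cb->arg = arg;
	return cb;
}

/* Takes the same handle type back; const signals "do not modify". */
int
ex_remove_callback(const struct ex_callback *cb)
{
	if (cb == NULL)
		return -1;
	free((void *)cb); /* only the library drops const and frees */
	return 0;
}
```

Compared with returning "void *", the named type lets the compiler warn when an unrelated pointer is passed to the remove function.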

> 
> Neil
> 
>>>
>>> Neil
>>>
>>>>>
>>>>> Neil
>>>>>
>>>>
>>>>
>>
>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-19 17:03  0%     ` Olivier Matz
@ 2018-03-20 14:41  0%       ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-03-20 14:41 UTC (permalink / raw)
  To: Olivier Matz; +Cc: Andrew Rybchenko, dev

On Mon, Mar 19, 2018 at 06:03:52PM +0100, Olivier Matz wrote:
> On Sat, Mar 10, 2018 at 03:39:34PM +0000, Andrew Rybchenko wrote:
> > Size of memory chunk required to populate mempool objects depends
> > on how objects are stored in the memory. Different mempool drivers
> > may have different requirements and a new operation allows to
> > calculate memory size in accordance with driver requirements and
> > advertise requirements on minimum memory chunk size and alignment
> > in a generic way.
> > 
> > Bump ABI version since the patch breaks it.
> > 
> > Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> Looks good to me. Just see below for few minor comments.
> 
> > ---
> > RFCv2 -> v1:
> >  - move default calc_mem_size callback to rte_mempool_ops_default.c
> >  - add ABI changes to release notes
> >  - name default callback consistently: rte_mempool_op_<callback>_default()
> >  - bump ABI version since it is the first patch which breaks ABI
> >  - describe default callback behaviour in details
> >  - avoid introduction of internal function to cope with depration
> 
> typo (depration)
> 
> >    (keep it to deprecation patch)
> >  - move cache-line or page boundary chunk alignment to default callback
> >  - highlight that min_chunk_size and align parameters are output only
> 
> [...]
> 
> > --- a/lib/librte_mempool/Makefile
> > +++ b/lib/librte_mempool/Makefile
> > @@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
> >  
> >  EXPORT_MAP := rte_mempool_version.map
> >  
> > -LIBABIVER := 3
> > +LIBABIVER := 4
> >  
> >  # all source are stored in SRCS-y
> >  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
> >  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
> > +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
> >  # install includes
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
> >  
> > diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
> > index 7a4f3da..9e3b527 100644
> > --- a/lib/librte_mempool/meson.build
> > +++ b/lib/librte_mempool/meson.build
> > @@ -1,7 +1,8 @@
> >  # SPDX-License-Identifier: BSD-3-Clause
> >  # Copyright(c) 2017 Intel Corporation
> >  
> > -version = 2
> > -sources = files('rte_mempool.c', 'rte_mempool_ops.c')
> > +version = 4
> > +sources = files('rte_mempool.c', 'rte_mempool_ops.c',
> > +		'rte_mempool_ops_default.c')
> >  headers = files('rte_mempool.h')
> >  deps += ['ring']
> 
> It's strange to see that meson does not have the same
> .so version than the legacy build system.
> 
> +CC Bruce in case he wants to fix this issue separately.
>
The so version drift occurred during the development of the next-build
tree, sadly. While initially all versions were correct, as the patches
flowed into mainline I wasn't able to keep up with all the version changes.
:-(
Since nobody is actually using meson for packaging (yet), I'm not sure this
is critical, so I don't mind whether it's fixed in a separate patch or not.

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 13/41] eal: replace memseg with memseg lists
  2018-03-19 17:39  3%   ` Olivier Matz
@ 2018-03-20  9:47  4%     ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-03-20  9:47 UTC (permalink / raw)
  To: Olivier Matz
  Cc: dev, Thomas Monjalon, Yuanhan Liu, Maxime Coquelin, Tiwei Bie,
	keith.wiles, jianfeng.tan, andras.kovacs, laszlo.vadkeri,
	benjamin.walker, bruce.richardson, konstantin.ananyev,
	kuralamudhan.ramakrishnan, louise.m.daly, nelio.laranjeiro,
	yskoh, pepperjo, jerin.jacob, hemant.agrawal

On 19-Mar-18 5:39 PM, Olivier Matz wrote:
> On Sat, Mar 03, 2018 at 01:46:01PM +0000, Anatoly Burakov wrote:
> 
> [...]
> 
>> --- a/config/common_base
>> +++ b/config/common_base
>> @@ -61,7 +61,20 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
>>   CONFIG_RTE_LIBRTE_EAL=y
>>   CONFIG_RTE_MAX_LCORE=128
>>   CONFIG_RTE_MAX_NUMA_NODES=8
>> -CONFIG_RTE_MAX_MEMSEG=256
>> +CONFIG_RTE_MAX_MEMSEG_LISTS=32
>> +# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
>> +# or RTE_MAX_MEM_PER_LIST gigabytes worth of memory, whichever is the smallest
>> +CONFIG_RTE_MAX_MEMSEG_PER_LIST=8192
>> +CONFIG_RTE_MAX_MEM_PER_LIST=32
>> +# a "type" is a combination of page size and NUMA node. total number of memseg
>> +# lists per type will be limited to either RTE_MAX_MEMSEG_PER_TYPE pages (split
>> +# over multiple lists of RTE_MAX_MEMSEG_PER_LIST pages), or RTE_MAX_MEM_PER_TYPE
>> +# gigabytes of memory (split over multiple lists of RTE_MAX_MEM_PER_LIST),
>> +# whichever is the smallest
>> +CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
>> +CONFIG_RTE_MAX_MEM_PER_TYPE=128
>> +# legacy mem mode only
>> +CONFIG_RTE_MAX_LEGACY_MEMSEG=256
> 
> Would it be possible to suffix CONFIG_RTE_MAX_MEM_PER_LIST and
> CONFIG_RTE_MAX_MEM_PER_TYPE with _GB? It's not that obvious that is it
> gigabytes.

Sure, will add this.

> 
> What is the impact of changing one of these values on the ABI?

Some of them will change the ABI, some won't. MAX_MEMSEG_LISTS will 
change the ABI because it's in the rte_eal_memconfig, but other values 
are not and are only used during init (and LEGACY_MEMSEG is already 
removed in GitHub code).
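
As a rough illustration of why an array bound inside a shared structure is
ABI-relevant, here is a minimal sketch. The structures below are hypothetical
stand-ins and do not reproduce the real rte_eal_memconfig layout; they only
show that rebuilding with a larger list count moves every member placed after
the array and changes the structure size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one memseg list entry (not the real EAL type). */
struct demo_memseg_list {
	uint64_t base_va;
	uint64_t page_sz;
};

/* Hypothetical shared config as built with a list count of 32. */
struct demo_mem_config_32 {
	uint32_t version;
	struct demo_memseg_list memsegs[32];
	uint32_t field_after; /* any member located after the array */
};

/* The same config rebuilt with a list count of 64. */
struct demo_mem_config_64 {
	uint32_t version;
	struct demo_memseg_list memsegs[64];
	uint32_t field_after;
};

/* The two builds disagree on the structure size, so they cannot share it. */
static_assert(sizeof(struct demo_mem_config_64) >
	      sizeof(struct demo_mem_config_32),
	      "a larger array bound must grow the structure");

/* Number of bytes by which field_after moves between the two builds. */
static size_t
demo_field_shift(void)
{
	return offsetof(struct demo_mem_config_64, field_after) -
	       offsetof(struct demo_mem_config_32, field_after);
}
```

A binary compiled against the 32-entry layout would read field_after at the
old offset, which is why a value like this cannot be changed without an ABI
bump, while values used only during init can.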

> And what would be the impact on performance?

Depends on what you mean by performance. Generally, no impact on 
performance will be noticeable because we're not really doing anything 
differently - a page is a page, no matter how or when it is mapped. 
These changes might also speed up some lookup operations on memseg lists 
themselves.

> The underlying question is: shall we increase these values to avoid changing them later?
> 

I do plan to increase the MAX_MEMSEG_LISTS value to at least 64.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 13/41] eal: replace memseg with memseg lists
  @ 2018-03-19 17:39  3%   ` Olivier Matz
  2018-03-20  9:47  4%     ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Olivier Matz @ 2018-03-19 17:39 UTC (permalink / raw)
  To: Anatoly Burakov
  Cc: dev, Thomas Monjalon, Yuanhan Liu, Maxime Coquelin, Tiwei Bie,
	keith.wiles, jianfeng.tan, andras.kovacs, laszlo.vadkeri,
	benjamin.walker, bruce.richardson, konstantin.ananyev,
	kuralamudhan.ramakrishnan, louise.m.daly, nelio.laranjeiro,
	yskoh, pepperjo, jerin.jacob, hemant.agrawal

On Sat, Mar 03, 2018 at 01:46:01PM +0000, Anatoly Burakov wrote:

[...]

> --- a/config/common_base
> +++ b/config/common_base
> @@ -61,7 +61,20 @@ CONFIG_RTE_CACHE_LINE_SIZE=64
>  CONFIG_RTE_LIBRTE_EAL=y
>  CONFIG_RTE_MAX_LCORE=128
>  CONFIG_RTE_MAX_NUMA_NODES=8
> -CONFIG_RTE_MAX_MEMSEG=256
> +CONFIG_RTE_MAX_MEMSEG_LISTS=32
> +# each memseg list will be limited to either RTE_MAX_MEMSEG_PER_LIST pages
> +# or RTE_MAX_MEM_PER_LIST gigabytes worth of memory, whichever is the smallest
> +CONFIG_RTE_MAX_MEMSEG_PER_LIST=8192
> +CONFIG_RTE_MAX_MEM_PER_LIST=32
> +# a "type" is a combination of page size and NUMA node. total number of memseg
> +# lists per type will be limited to either RTE_MAX_MEMSEG_PER_TYPE pages (split
> +# over multiple lists of RTE_MAX_MEMSEG_PER_LIST pages), or RTE_MAX_MEM_PER_TYPE
> +# gigabytes of memory (split over multiple lists of RTE_MAX_MEM_PER_LIST),
> +# whichever is the smallest
> +CONFIG_RTE_MAX_MEMSEG_PER_TYPE=32768
> +CONFIG_RTE_MAX_MEM_PER_TYPE=128
> +# legacy mem mode only
> +CONFIG_RTE_MAX_LEGACY_MEMSEG=256

Would it be possible to suffix CONFIG_RTE_MAX_MEM_PER_LIST and
CONFIG_RTE_MAX_MEM_PER_TYPE with _GB? It's not that obvious that it is
gigabytes.

What is the impact of changing one of these values on the ABI? And what
would be the impact on performance? The underlying question is: shall we
increase these values to avoid changing them later?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-10 15:39  7%   ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
  2018-03-11 12:51  0%     ` santosh
@ 2018-03-19 17:03  0%     ` Olivier Matz
  2018-03-20 14:41  0%       ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: Olivier Matz @ 2018-03-19 17:03 UTC (permalink / raw)
  To: Andrew Rybchenko; +Cc: dev, Bruce Richardson

On Sat, Mar 10, 2018 at 03:39:34PM +0000, Andrew Rybchenko wrote:
> Size of memory chunk required to populate mempool objects depends
> on how objects are stored in the memory. Different mempool drivers
> may have different requirements and a new operation allows to
> calculate memory size in accordance with driver requirements and
> advertise requirements on minimum memory chunk size and alignment
> in a generic way.
> 
> Bump ABI version since the patch breaks it.
> 
> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>

Looks good to me. Just see below for a few minor comments.

> ---
> RFCv2 -> v1:
>  - move default calc_mem_size callback to rte_mempool_ops_default.c
>  - add ABI changes to release notes
>  - name default callback consistently: rte_mempool_op_<callback>_default()
>  - bump ABI version since it is the first patch which breaks ABI
>  - describe default callback behaviour in details
>  - avoid introduction of internal function to cope with depration

typo (depration)

>    (keep it to deprecation patch)
>  - move cache-line or page boundary chunk alignment to default callback
>  - highlight that min_chunk_size and align parameters are output only

[...]

> --- a/lib/librte_mempool/Makefile
> +++ b/lib/librte_mempool/Makefile
> @@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
>  
>  EXPORT_MAP := rte_mempool_version.map
>  
> -LIBABIVER := 3
> +LIBABIVER := 4
>  
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
>  SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
> +SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
>  # install includes
>  SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
>  
> diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
> index 7a4f3da..9e3b527 100644
> --- a/lib/librte_mempool/meson.build
> +++ b/lib/librte_mempool/meson.build
> @@ -1,7 +1,8 @@
>  # SPDX-License-Identifier: BSD-3-Clause
>  # Copyright(c) 2017 Intel Corporation
>  
> -version = 2
> -sources = files('rte_mempool.c', 'rte_mempool_ops.c')
> +version = 4
> +sources = files('rte_mempool.c', 'rte_mempool_ops.c',
> +		'rte_mempool_ops_default.c')
>  headers = files('rte_mempool.h')
>  deps += ['ring']

It's strange to see that meson does not have the same
.so version as the legacy build system.

+CC Bruce in case he wants to fix this issue separately.

[...]

> --- a/lib/librte_mempool/rte_mempool_version.map
> +++ b/lib/librte_mempool/rte_mempool_version.map
> @@ -51,3 +51,11 @@ DPDK_17.11 {
>  	rte_mempool_populate_iova_tab;
>  
>  } DPDK_16.07;
> +
> +DPDK_18.05 {
> +	global:
> +
> +	rte_mempool_op_calc_mem_size_default;
> +
> +} DPDK_17.11;
> +

Another minor comment. When applying the patch with git am:

Applying: mempool: add op to calculate memory size to be allocated
.git/rebase-apply/patch:399: new blank line at EOF.
+
warning: 1 line adds whitespace errors.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver
  2018-03-10 15:39  3% ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
                     ` (4 preceding siblings ...)
  2018-03-10 15:39  8%   ` [dpdk-dev] [PATCH v1 7/9] mempool: remove callback to register memory area Andrew Rybchenko
@ 2018-03-19 17:03  0%   ` Olivier Matz
  5 siblings, 0 replies; 200+ results
From: Olivier Matz @ 2018-03-19 17:03 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, Santosh Shukla, Jerin Jacob, Hemant Agrawal, Shreyansh Jain

Hi Andrew,

Thank you for this nice rework.
Globally, the patchset looks good to me. I'm sending some comments
as reply to specific patches.

On Sat, Mar 10, 2018 at 03:39:33PM +0000, Andrew Rybchenko wrote:
> The initial patch series [1] is split into two to simplify processing.
> The second series relies on this one and will add bucket mempool driver
> and related ops.
> 
> The patch series has generic enhancements suggested by Olivier.
> Basically it adds driver callbacks to calculate required memory size and
> to populate objects using provided memory area. It allows to remove
> so-called capability flags used before to tell generic code how to
> allocate and slice allocated memory into mempool objects.
> Clean up which removes get_capabilities and register_memory_area is
> not strictly required, but I think right thing to do.
> Existing mempool drivers are updated.
> 
> I've kept rte_mempool_populate_iova_tab() intact since it seems to
> be not directly related XMEM API functions.

The function rte_mempool_populate_iova_tab() (actually, it was
rte_mempool_populate_phys_tab()) was introduced to support XMEM
API. In my opinion, it can also be deprecated.

> It breaks ABI since changes rte_mempool_ops. Also it removes
> rte_mempool_ops_register_memory_area() and
> rte_mempool_ops_get_capabilities() since corresponding callbacks are
> removed.
> 
> Internal global functions are not listed in map file since it is not
> a part of external API.
> 
> [1] http://dpdk.org/ml/archives/dev/2018-January/088698.html
> 
> RFCv1 -> RFCv2:
>   - add driver ops to calculate required memory size and populate
>     mempool objects, remove extra flags which were required before
>     to control it
>   - transition of octeontx and dpaa drivers to the new callbacks
>   - change info API to get information from driver required to
>     API user to know contiguous block size
>   - remove get_capabilities (not required any more and may be
>     substituted with more in info get API)
>   - remove register_memory_area since it is substituted with
>     populate callback which can do more
>   - use SPDX tags
>   - avoid all objects affinity to single lcore
>   - fix bucket get_count
>   - deprecate XMEM API
>   - avoid introduction of a new function to flush cache
>   - fix NO_CACHE_ALIGN case in bucket mempool
> 
> RFCv2 -> v1:
>   - split the series in two
>   - squash octeontx patches which implement calc_mem_size and populate
>     callbacks into the patch which removes get_capabilities since it is
>     the easiest way to untangle the tangle of tightly related library
>     functions and flags advertised by the driver
>   - consistently name default callbacks
>   - move default callbacks to dedicated file
>   - see detailed description in patches
> 
> Andrew Rybchenko (7):
>   mempool: add op to calculate memory size to be allocated
>   mempool: add op to populate objects using provided memory
>   mempool: remove callback to get capabilities
>   mempool: deprecate xmem functions
>   mempool/octeontx: prepare to remove register memory area op
>   mempool/dpaa: prepare to remove register memory area op
>   mempool: remove callback to register memory area
> 
> Artem V. Andreev (2):
>   mempool: ensure the mempool is initialized before populating
>   mempool: support flushing the default cache of the mempool
> 
>  doc/guides/rel_notes/deprecation.rst            |  12 +-
>  doc/guides/rel_notes/release_18_05.rst          |  32 ++-
>  drivers/mempool/dpaa/dpaa_mempool.c             |  13 +-
>  drivers/mempool/octeontx/rte_mempool_octeontx.c |  64 ++++--
>  lib/librte_mempool/Makefile                     |   3 +-
>  lib/librte_mempool/meson.build                  |   5 +-
>  lib/librte_mempool/rte_mempool.c                | 159 +++++++--------
>  lib/librte_mempool/rte_mempool.h                | 260 +++++++++++++++++-------
>  lib/librte_mempool/rte_mempool_ops.c            |  37 ++--
>  lib/librte_mempool/rte_mempool_ops_default.c    |  51 +++++
>  lib/librte_mempool/rte_mempool_version.map      |  11 +-
>  test/test/test_mempool.c                        |  31 ---
>  12 files changed, 437 insertions(+), 241 deletions(-)
>  create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c
> 
> -- 
> 2.7.4
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  2018-03-15 15:08  0%           ` Zhang, Qi Z
@ 2018-03-15 15:38  0%             ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2018-03-15 15:38 UTC (permalink / raw)
  To: Zhang, Qi Z, thomas; +Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo



> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Thursday, March 15, 2018 3:09 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
> Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Thursday, March 15, 2018 9:17 PM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> >
> > Hi Qi,
> >
> > > -----Original Message-----
> > > From: Zhang, Qi Z
> > > Sent: Thursday, March 15, 2018 3:14 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > > thomas@monjalon.net
> > > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> > > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue
> > > setup
> > >
> > > Hi Konstantin:
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Wednesday, March 14, 2018 8:32 PM
> > > > To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> > > > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > > > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang,
> > > > Qi Z <qi.z.zhang@intel.com>
> > > > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue
> > > > setup
> > > >
> > > > Hi Qi,
> > > >
> > > > >
> > > > > The patch let etherdev driver expose the capability flag through
> > > > > rte_eth_dev_info_get when it support deferred queue configuraiton,
> > > > > then base on the flag rte_eth_[rx|tx]_queue_setup could decide
> > > > > continue to setup the queue or just return fail when device
> > > > > already started.
> > > > >
> > > > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > > > ---
> > > > >  doc/guides/nics/features.rst  |  8 ++++++++
> > > > > lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
> > > > > lib/librte_ether/rte_ethdev.h | 11 +++++++++++
> > > > >  3 files changed, 37 insertions(+), 12 deletions(-)
> > > > >
> > > > > diff --git a/doc/guides/nics/features.rst
> > > > > b/doc/guides/nics/features.rst index 1b4fb979f..36ad21a1f 100644
> > > > > --- a/doc/guides/nics/features.rst
> > > > > +++ b/doc/guides/nics/features.rst
> > > > > @@ -892,7 +892,15 @@ Documentation describes performance
> > values.
> > > > >
> > > > >  See ``dpdk.org/doc/perf/*``.
> > > > >
> > > > > +.. _nic_features_queue_deferred_setup_capabilities:
> > > > >
> > > > > +Queue deferred setup capabilities
> > > > > +---------------------------------
> > > > > +
> > > > > +Supports queue setup / release after device started.
> > > > > +
> > > > > +* **[provides] rte_eth_dev_info**: ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFERRED_TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELEASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> > > > > +* **[related]  API**: ``rte_eth_dev_info_get()``.
> > > > >
> > > > >  .. _nic_features_other:
> > > > >
> > > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > > b/lib/librte_ether/rte_ethdev.c index a6ce2a5ba..6c906c4df 100644
> > > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > > @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> > > > uint16_t rx_queue_id,
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > -	if (dev->data->dev_started) {
> > > > > -		RTE_PMD_DEBUG_TRACE(
> > > > > -		    "port %d must be stopped to allow configuration\n",
> > port_id);
> > > > > -		return -EBUSY;
> > > > > -	}
> > > > > -
> > > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> > > > -ENOTSUP);
> > > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup,
> > > > -ENOTSUP);
> > > > >
> > > > > @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t
> > port_id,
> > > > uint16_t rx_queue_id,
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > +	if (dev->data->dev_started &&
> > > > > +		!(dev_info.deferred_queue_config_capa &
> > > > > +			DEV_DEFERRED_RX_QUEUE_SETUP))
> > > > > +		return -EINVAL;
> > > > > +
> > > >
> > > > I think now you have to check here that the queue is stopped.
> > > > Otherwise you might attempt to reconfigure running queue.
> > >
> > > I'm not sure if it's necessary to let application use different API sequence
> > for a deferred configure and deferred re-configure.
> > > Can we just call dev_ops->rx_queue_stop before rx_queue_release here
> >
> > I don't follow you here.
> > Let say now inside queue_start() we do check:
> >
> > if (dev->data->rx_queue_state[rx_queue_id] !=
> > RTE_ETH_QUEUE_STATE_STOPPED)
> >
> > Right now it is not possible to call queue_setup() without dev_stop() before
> > it - that's why we have check if (dev->data->dev_started) in queue_setup()
> > right now.
> > Though with your patch it not the case anymore - user is able to call
> > queue_setup() without stopping the whole device.
> > But he still has to stop the queue.
> 
> >
> > >
> > > >
> > > >
> > > > >  	rxq = dev->data->rx_queues;
> > > > >  	if (rxq[rx_queue_id]) {
> > > > >
> > 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> > > > >  					-ENOTSUP);
> > > >
> > > > I don't think it is *that* straightforward.
> > > > rx_queue_setup() parameters can imply different rx function (and
> > > > related device settings) that are already setuped by previous
> > > > queue_setup()/dev_start.
> > > > So I think you need to do one of 2 things:
> > > > 1. rework ethdev layer to introduce a separate rx function (and
> > > > related
> > > > settings) for each queue.
> > > > 2. at rx_queue_setup() if it is invoked after dev_start - check that
> > > > given queue settings wouldn't contradict with current device
> > > > settings  (rx function, etc.).
> > > > If they do - return an error.
> > > Yes, I think what we have is option 2 here, the
> > > dev_ops->rx_queue_setup will return fail if conflict with previous
> > > setting
> >
> > Hmm and what makes you think that?
> > As I know it is not the case  right now.
> > Let say I do:
> >     ....
> >    rx_queue_setup(port=0,queue=0, mp=mb_size_2048);
> >    dev_start(port=0);
> >    ...
> >    rx_queue_setup(port=0,queue=1,mp=mb_size_1024);
> >
> >  If current rx function doesn't support multi-segs then second
> > rx_queue_setup() should fail.
> >  Though I don't think that would happen with the current implementation.
> 
> Why you think that would not happen? dev_ops->rx_queue_setup can fail, right?
> I mean it's the responsibility of low level driver (i40e) to check the conflict with current implementation.

Yes, it is the responsibility of the PMD, because only it knows its own logic of rx/tx function selection.
But I don't see such changes in i40e in your patch series.
Probably I missed them?
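
A minimal sketch of the kind of check meant here, using hypothetical names
(nothing below is taken from i40e or from the patch series). It assumes the
rx function is chosen at dev_start and that a queue whose mbuf data size is
smaller than the maximum packet length would need a multi-segment (scattered)
rx function:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state; field names are illustrative only. */
struct demo_port {
	bool started;            /* dev_start() has already been called */
	bool rx_scatter_in_use;  /* active rx function handles multi-seg packets */
	uint32_t max_rx_pkt_len; /* largest packet the port is configured for */
};

/*
 * Validate a queue setup request against the rx function already selected
 * for the port. Returns 0 if the request is compatible, -EINVAL if the new
 * queue would need scattered rx but the active rx function cannot do it.
 */
static int
demo_rx_queue_setup_check(const struct demo_port *port,
			  uint32_t mbuf_data_size)
{
	bool needs_scatter = port->max_rx_pkt_len > mbuf_data_size;

	/* Before dev_start the rx function is still to be selected. */
	if (!port->started)
		return 0;

	if (needs_scatter && !port->rx_scatter_in_use)
		return -EINVAL;

	return 0;
}
```

With a started port configured for 1518-byte packets and a non-scattered rx
function, a queue backed by 2048-byte buffers passes this check and one
backed by 1024-byte buffers is rejected, which mirrors the two-queue example
quoted below.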

> >
> > Same story for TX offloads, though it probably not that critical, as for most
> > Intel PMDs HW TX offloads will become per port in 18.05.
> >
> > As I can see you do have either of these options implemented right  now -
> > that's the problem.
> >
> > > I'm also thinking about option 1, the idea is to move per queue rx/tx
> > function into driver layer, so it will not break existing API.
> > >
> > > 1. driver can expose the capability like per_queue_rx or per_queue_tx
> > > 2. application can enable this capability by dev_config with
> > > rte_eth_conf 3, if per_queue_rx is not enable, nothing change, so we
> > > are at option 2 4. if per_queue_rx is enabled, driver will set
> > > rx_pkt_burst with a hook function which redirect to an function ptr in
> > > a per queue rx function tables ( I guess performance is impacted
> > > somehow, but this is the cost if you want different offload for
> > > different queue)
> >
> > I don't think we need to overcomplicate things here.
> > It should be transparent to the user - user just calls queue_setup() - based on
> > its input parameters PMD selects a function that fits best.
> > Pretty much what we have right now, just possibly have an array of functions
> > (one per queue).
> 
> If we don't introduce a new capability or something like, but just take per queue functions as default way,
> does that mean, we need to change all drivers to adapt this?
> Or do you mean below?
> 
> If (dev->rx_pkt_burst)
> 	/* default way */
> else
> 	/* per queue function */

For me either way seems ok.
The second one is probably a bit easier, as no changes in the PMDs are required.
But again, maybe even the rte_ethdev layer can fill the queue's rx_pkt_burst[] array
for the drivers that don't support it, just by copying dev->rx_pkt_burst into it.
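
Roughly, the copy step could look like the sketch below. All names are
hypothetical and the burst signature is simplified (the real one takes
rte_mbuf pointers); it only shows the idea of keeping per-queue functions a
driver has set and defaulting the rest to the device-wide one.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_QUEUES 4

/* Simplified rx burst signature, for illustration only. */
typedef uint16_t (*demo_rx_burst_t)(void *rxq, void **pkts, uint16_t nb_pkts);

/* Hypothetical device with a device-wide and a per-queue rx function. */
struct demo_eth_dev {
	demo_rx_burst_t rx_pkt_burst;
	demo_rx_burst_t queue_rx_burst[DEMO_MAX_QUEUES];
};

/* Placeholder burst functions, so the sketch has something to point at. */
static uint16_t
demo_default_rx(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq; (void)pkts; (void)nb_pkts;
	return 0;
}

static uint16_t
demo_special_rx(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq; (void)pkts;
	return nb_pkts;
}

/*
 * Fill every per-queue slot the driver left empty with the device-wide
 * function; slots the driver already set are left untouched.
 */
static void
demo_fill_queue_rx_burst(struct demo_eth_dev *dev)
{
	for (size_t q = 0; q < DEMO_MAX_QUEUES; q++) {
		if (dev->queue_rx_burst[q] == NULL)
			dev->queue_rx_burst[q] = dev->rx_pkt_burst;
	}
}
```

Drivers that know nothing about per-queue functions would then keep working
unchanged, at the price of one extra indirection through the per-queue table.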
Konstantin 

> 
> Regards
> Qi
> 
> >
> > >
> > > >
> > > > From my perspective - 1) is a better choice though it required more
> > > > work, and possibly ABI breakage.
> > > > I did some work in that direction as RFC:
> > > > http://dpdk.org/dev/patchwork/patch/31866/
> > >
> > > I will learn this, thanks for the heads up.
> > > >
> > > > 2) might be also possible, but looks a bit clumsy as
> > > > rx_queue_setup() might now fail even with valid parameters - all
> > > > depends on previous queue configurations.
> > > >
> > > > Same story applies for TX.
> > > >
> > > >
> > > > > +		if (dev->data->dev_started &&
> > > > > +			!(dev_info.deferred_queue_config_capa &
> > > > > +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> > > > > +			return -EINVAL;
> > > > >  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > > > >  		rxq[rx_queue_id] = NULL;
> > > > >  	}
> > > > > @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id,
> > > > uint16_t tx_queue_id,
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > -	if (dev->data->dev_started) {
> > > > > -		RTE_PMD_DEBUG_TRACE(
> > > > > -		    "port %d must be stopped to allow configuration\n",
> > port_id);
> > > > > -		return -EBUSY;
> > > > > -	}
> > > > > -
> > > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> > > > -ENOTSUP);
> > > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup,
> > > > -ENOTSUP);
> > > > >
> > > > > @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t
> > port_id,
> > > > uint16_t tx_queue_id,
> > > > >  		return -EINVAL;
> > > > >  	}
> > > > >
> > > > > +	if (dev->data->dev_started &&
> > > > > +		!(dev_info.deferred_queue_config_capa &
> > > > > +			DEV_DEFERRED_TX_QUEUE_SETUP))
> > > > > +		return -EINVAL;
> > > > > +
> > > > >  	txq = dev->data->tx_queues;
> > > > >  	if (txq[tx_queue_id]) {
> > > > >
> > 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> > > > >  					-ENOTSUP);
> > > > > +		if (dev->data->dev_started &&
> > > > > +			!(dev_info.deferred_queue_config_capa &
> > > > > +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> > > > > +			return -EINVAL;
> > > > >  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > > > >  		txq[tx_queue_id] = NULL;
> > > > >  	}
> > > > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > > > b/lib/librte_ether/rte_ethdev.h index 036153306..410e58c50 100644
> > > > > --- a/lib/librte_ether/rte_ethdev.h
> > > > > +++ b/lib/librte_ether/rte_ethdev.h
> > > > > @@ -981,6 +981,15 @@ struct rte_eth_conf {
> > > > >   */
> > > > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > > > >
> > > > > +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001 /**< Deferred setup rx queue */
> > > > > +#define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002 /**< Deferred setup tx queue */
> > > > > +#define DEV_DEFERRED_RX_QUEUE_RELEASE 0x00000004 /**< Deferred release rx queue */
> > > > > +#define DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008 /**< Deferred release tx queue */
> > > > > +
> > > >
> > > > I don't think we do need flags for both setup a and release.
> > > > If runtime setup is supported - surely dynamic release should be
> > > > supported too.
> > > > Also probably RUNTIME_RX_QUEUE_SETUP sounds a bit better.
> > >
> > > Agree
> > >
> > > Thanks
> > > Qi
> > >
> > > >
> > > > Konstantin
> > > >
> > > > >  /*
> > > > >   * If new Tx offload capabilities are defined, they also must be
> > > > >   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> > > > > @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
> > > > >  	/** Configured number of rx/tx queues */
> > > > >  	uint16_t nb_rx_queues; /**< Number of RX queues. */
> > > > >  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> > > > > +	uint64_t deferred_queue_config_capa;
> > > > > +	/**< queues can be setup/release after dev_start (DEV_DEFERRED_). */
> > > > >  };
> > > > >
> > > > >  /**
> > > > > --
> > > > > 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  2018-03-15 13:16  0%         ` Ananyev, Konstantin
@ 2018-03-15 15:08  0%           ` Zhang, Qi Z
  2018-03-15 15:38  0%             ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Zhang, Qi Z @ 2018-03-15 15:08 UTC (permalink / raw)
  To: Ananyev, Konstantin, thomas; +Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Thursday, March 15, 2018 9:17 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> 
> Hi Qi,
> 
> > -----Original Message-----
> > From: Zhang, Qi Z
> > Sent: Thursday, March 15, 2018 3:14 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>;
> > thomas@monjalon.net
> > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue
> > setup
> >
> > Hi Konstantin:
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Wednesday, March 14, 2018 8:32 PM
> > > To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> > > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang,
> > > Qi Z <qi.z.zhang@intel.com>
> > > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue
> > > setup
> > >
> > > Hi Qi,
> > >
> > > >
> > > > The patch let etherdev driver expose the capability flag through
> > > > rte_eth_dev_info_get when it support deferred queue configuraiton,
> > > > then base on the flag rte_eth_[rx|tx]_queue_setup could decide
> > > > continue to setup the queue or just return fail when device
> > > > already started.
> > > >
> > > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > > ---
> > > >  doc/guides/nics/features.rst  |  8 ++++++++
> > > > lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
> > > > lib/librte_ether/rte_ethdev.h | 11 +++++++++++
> > > >  3 files changed, 37 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/doc/guides/nics/features.rst
> > > > b/doc/guides/nics/features.rst index 1b4fb979f..36ad21a1f 100644
> > > > --- a/doc/guides/nics/features.rst
> > > > +++ b/doc/guides/nics/features.rst
> > > > @@ -892,7 +892,15 @@ Documentation describes performance
> values.
> > > >
> > > >  See ``dpdk.org/doc/perf/*``.
> > > >
> > > > +.. _nic_features_queue_deferred_setup_capabilities:
> > > >
> > > > +Queue deferred setup capabilities
> > > > +---------------------------------
> > > > +
> > > > +Supports queue setup / release after device started.
> > > > +
> > > > +* **[provides] rte_eth_dev_info**: ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFERRED_TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELEASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> > > > +* **[related]  API**: ``rte_eth_dev_info_get()``.
> > > >
> > > >  .. _nic_features_other:
> > > >
> > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > b/lib/librte_ether/rte_ethdev.c index a6ce2a5ba..6c906c4df 100644
> > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id,
> > > uint16_t rx_queue_id,
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > -	if (dev->data->dev_started) {
> > > > -		RTE_PMD_DEBUG_TRACE(
> > > > -		    "port %d must be stopped to allow configuration\n",
> port_id);
> > > > -		return -EBUSY;
> > > > -	}
> > > > -
> > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get,
> > > -ENOTSUP);
> > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup,
> > > -ENOTSUP);
> > > >
> > > > @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t
> port_id,
> > > uint16_t rx_queue_id,
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > +	if (dev->data->dev_started &&
> > > > +		!(dev_info.deferred_queue_config_capa &
> > > > +			DEV_DEFERRED_RX_QUEUE_SETUP))
> > > > +		return -EINVAL;
> > > > +
> > >
> > > I think now you have to check here that the queue is stopped.
> > > Otherwise you might attempt to reconfigure running queue.
> >
> > I'm not sure if it's necessary to let application use different API sequence
> for a deferred configure and deferred re-configure.
> > Can we just call dev_ops->rx_queue_stop before rx_queue_release here
> 
> I don't follow you here.
> Let say now inside queue_start() we do check:
> 
> if (dev->data->rx_queue_state[rx_queue_id] !=
> RTE_ETH_QUEUE_STATE_STOPPED)
> 
> Right now it is not possible to call queue_setup() without dev_stop() before
> it - that's why we have check if (dev->data->dev_started) in queue_setup()
> right now.
> Though with your patch it not the case anymore - user is able to call
> queue_setup() without stopping the whole device.
> But he still has to stop the queue.

> 
> >
> > >
> > >
> > > >  	rxq = dev->data->rx_queues;
> > > >  	if (rxq[rx_queue_id]) {
> > > >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> > > >  					-ENOTSUP);
> > >
> > > I don't think it is *that* straightforward.
> > > rx_queue_setup() parameters can imply a different rx function (and
> > > related device settings) than those already set up by a previous
> > > queue_setup()/dev_start.
> > > So I think you need to do one of 2 things:
> > > 1. rework ethdev layer to introduce a separate rx function (and
> > > related
> > > settings) for each queue.
> > > 2. at rx_queue_setup() if it is invoked after dev_start - check that
> > > given queue settings wouldn't contradict with current device
> > > settings  (rx function, etc.).
> > > If they do - return an error.
> > Yes, I think what we have is option 2 here, the
> > dev_ops->rx_queue_setup will return fail if conflict with previous
> > setting
> 
> Hmm and what makes you think that?
> As I know it is not the case  right now.
> Let say I do:
>     ....
>    rx_queue_setup(port=0,queue=0, mp=mb_size_2048);
>    dev_start(port=0);
>    ...
>    rx_queue_setup(port=0,queue=1,mp=mb_size_1024);
> 
>  If current rx function doesn't support multi-segs then second
> rx_queue_setup() should fail.
>  Though I don't think that would happen with the current implementation.

Why do you think that would not happen? dev_ops->rx_queue_setup can fail, right?
I mean it's the responsibility of the low-level driver (i40e) to check for a conflict with the current implementation.
> 
> Same story for TX offloads, though it probably not that critical, as for most
> Intel PMDs HW TX offloads will become per port in 18.05.
> 
> As I can see you do have either of these options implemented right  now -
> that's the problem.
> 
> > I'm also thinking about option 1, the idea is to move per queue rx/tx function into driver layer, so it will not break existing API.
> >
> > 1. driver can expose the capability like per_queue_rx or per_queue_tx
> > 2. application can enable this capability by dev_config with rte_eth_conf
> > 3. if per_queue_rx is not enable, nothing change, so we are at option 2
> > 4. if per_queue_rx is enabled, driver will set rx_pkt_burst with a hook function which redirect to an function ptr in a per queue rx function tables ( I guess performance is impacted somehow, but this is the cost if you want different offload for different queue)
> 
> I don't think we need to overcomplicate things here.
> It should be transparent to the user - user just calls queue_setup() - based on
> its input parameters PMD selects a function that fits best.
> Pretty much what we have right now, just possibly have an array of functions
> (one per queue).

If we don't introduce a new capability or something like it, but just make per-queue functions the default way,
does that mean we need to change all drivers to adapt to this?
Or do you mean the below?

if (dev->rx_pkt_burst)
	/* default way */
else
	/* per queue function */

Regards
Qi

> 
> >
> > >
> > > From my perspective - 1) is a better choice though it required more
> > > work, and possibly ABI breakage.
> > > I did some work in that direction as RFC:
> > > http://dpdk.org/dev/patchwork/patch/31866/
> >
> > I will learn this, thanks for the heads up.
> > >
> > > 2) might be also possible, but looks a bit clumsy as
> > > rx_queue_setup() might now fail even with valid parameters - all
> > > depends on previous queue configurations.
> > >
> > > Same story applies for TX.
> > >
> > >
> > > > +		if (dev->data->dev_started &&
> > > > +			!(dev_info.deferred_queue_config_capa &
> > > > +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> > > > +			return -EINVAL;
> > > >  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > > >  		rxq[rx_queue_id] = NULL;
> > > >  	}
> > > > @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > -	if (dev->data->dev_started) {
> > > > -		RTE_PMD_DEBUG_TRACE(
> > > > -		    "port %d must be stopped to allow configuration\n", port_id);
> > > > -		return -EBUSY;
> > > > -	}
> > > > -
> > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> > > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup, -ENOTSUP);
> > > >
> > > > @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> > > >  		return -EINVAL;
> > > >  	}
> > > >
> > > > +	if (dev->data->dev_started &&
> > > > +		!(dev_info.deferred_queue_config_capa &
> > > > +			DEV_DEFERRED_TX_QUEUE_SETUP))
> > > > +		return -EINVAL;
> > > > +
> > > >  	txq = dev->data->tx_queues;
> > > >  	if (txq[tx_queue_id]) {
> > > >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> > > >  					-ENOTSUP);
> > > > +		if (dev->data->dev_started &&
> > > > +			!(dev_info.deferred_queue_config_capa &
> > > > +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> > > > +			return -EINVAL;
> > > >  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > > >  		txq[tx_queue_id] = NULL;
> > > >  	}
> > > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > > b/lib/librte_ether/rte_ethdev.h index 036153306..410e58c50 100644
> > > > --- a/lib/librte_ether/rte_ethdev.h
> > > > +++ b/lib/librte_ether/rte_ethdev.h
> > > > @@ -981,6 +981,15 @@ struct rte_eth_conf {
> > > >   */
> > > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > > >
> > > > +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001
> > > > +/**< Deferred setup rx queue */
> > > > +#define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002
> > > > +/**< Deferred setup tx queue */
> > > > +#define DEV_DEFERRED_RX_QUEUE_RELEASE 0x00000004
> > > > +/**< Deferred release rx queue */
> > > > +#define DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008
> > > > +/**< Deferred release tx queue */
> > > > +
> > >
> > > I don't think we do need flags for both setup a and release.
> > > If runtime setup is supported - surely dynamic release should be
> > > supported too.
> > > Also probably RUNTIME_RX_QUEUE_SETUP sounds a bit better.
> >
> > Agree
> >
> > Thanks
> > Qi
> >
> > >
> > > Konstantin
> > >
> > > >  /*
> > > >   * If new Tx offload capabilities are defined, they also must be
> > > >   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> > > > @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
> > > >  	/** Configured number of rx/tx queues */
> > > >  	uint16_t nb_rx_queues; /**< Number of RX queues. */
> > > >  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> > > > +	uint64_t deferred_queue_config_capa;
> > > > +	/**< queues can be setup/release after dev_start (DEV_DEFERRED_). */
> > > >  };
> > > >
> > > >  /**
> > > > --
> > > > 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  2018-03-15  3:13  0%       ` Zhang, Qi Z
@ 2018-03-15 13:16  0%         ` Ananyev, Konstantin
  2018-03-15 15:08  0%           ` Zhang, Qi Z
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2018-03-15 13:16 UTC (permalink / raw)
  To: Zhang, Qi Z, thomas; +Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo

Hi Qi,

> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Thursday, March 15, 2018 3:14 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; thomas@monjalon.net
> Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> 
> Hi Konstantin:
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Wednesday, March 14, 2018 8:32 PM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> > Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>
> > Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> >
> > Hi Qi,
> >
> > >
> > > The patch lets an ethdev driver expose a capability flag through
> > > rte_eth_dev_info_get() when it supports deferred queue configuration.
> > > Based on that flag, rte_eth_[rx|tx]_queue_setup() decides whether to
> > > continue setting up the queue or to return an error when the device
> > > is already started.
> > >
> > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > ---
> > >  doc/guides/nics/features.rst  |  8 ++++++++
> > > lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
> > > lib/librte_ether/rte_ethdev.h | 11 +++++++++++
> > >  3 files changed, 37 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/doc/guides/nics/features.rst
> > > b/doc/guides/nics/features.rst index 1b4fb979f..36ad21a1f 100644
> > > --- a/doc/guides/nics/features.rst
> > > +++ b/doc/guides/nics/features.rst
> > > @@ -892,7 +892,15 @@ Documentation describes performance values.
> > >
> > >  See ``dpdk.org/doc/perf/*``.
> > >
> > > +.. _nic_features_queue_deferred_setup_capabilities:
> > >
> > > +Queue deferred setup capabilities
> > > +---------------------------------
> > > +
> > > +Supports queue setup / release after device started.
> > > +
> > > +* **[provides] rte_eth_dev_info**: ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFERRED_TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELEASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> > > +* **[related]  API**: ``rte_eth_dev_info_get()``.
> > >
> > >  .. _nic_features_other:
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > b/lib/librte_ether/rte_ethdev.c index a6ce2a5ba..6c906c4df 100644
> > > --- a/lib/librte_ether/rte_ethdev.c
> > > +++ b/lib/librte_ether/rte_ethdev.c
> > > @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> > >  		return -EINVAL;
> > >  	}
> > >
> > > -	if (dev->data->dev_started) {
> > > -		RTE_PMD_DEBUG_TRACE(
> > > -		    "port %d must be stopped to allow configuration\n", port_id);
> > > -		return -EBUSY;
> > > -	}
> > > -
> > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
> > >
> > > @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> > >  		return -EINVAL;
> > >  	}
> > >
> > > +	if (dev->data->dev_started &&
> > > +		!(dev_info.deferred_queue_config_capa &
> > > +			DEV_DEFERRED_RX_QUEUE_SETUP))
> > > +		return -EINVAL;
> > > +
> >
> > I think now you have to check here that the queue is stopped.
> > Otherwise you might attempt to reconfigure running queue.
> 
> I'm not sure if it's necessary to let application use different API sequence for a deferred configure and deferred re-configure.
> Can we just call dev_ops->rx_queue_stop before rx_queue_release here

I don't follow you here.
Let's say that inside queue_start() we do the check:

if (dev->data->rx_queue_state[rx_queue_id] != RTE_ETH_QUEUE_STATE_STOPPED)

Right now it is not possible to call queue_setup() without calling dev_stop() before it -
that's why we currently have the if (dev->data->dev_started) check in queue_setup().
With your patch that is no longer the case - the user is able to call queue_setup()
without stopping the whole device.
But the user still has to stop the queue.

> 
> >
> >
> > >  	rxq = dev->data->rx_queues;
> > >  	if (rxq[rx_queue_id]) {
> > >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> > >  					-ENOTSUP);
> >
> > I don't think it is *that* straightforward.
> > rx_queue_setup() parameters can imply a different rx function (and related device
> > settings) than those already set up by a previous queue_setup()/dev_start.
> > So I think you need to do one of 2 things:
> > 1. rework ethdev layer to introduce a separate rx function (and related
> > settings) for each queue.
> > 2. at rx_queue_setup() if it is invoked after dev_start - check that given
> > queue settings wouldn't contradict with current device settings  (rx function,
> > etc.).
> > If they do - return an error.
> Yes, I think what we have is option 2 here, the dev_ops->rx_queue_setup will return fail if conflict with previous setting

Hmm, and what makes you think that?
As far as I know, that is not the case right now.
Let's say I do:
    ....
   rx_queue_setup(port=0,queue=0, mp=mb_size_2048);
   dev_start(port=0);
   ...
   rx_queue_setup(port=0,queue=1,mp=mb_size_1024);
   
 If the current rx function doesn't support multi-segs, then the second rx_queue_setup() should fail.
 Though I don't think that would happen with the current implementation.

The same goes for TX offloads, though it is probably not that critical, as for most Intel PMDs HW TX offloads will become per port in 18.05.

As far as I can see, you don't have either of these options implemented right now - that's the problem.

> I'm also thinking about option 1, the idea is to move per queue rx/tx function into driver layer, so it will not break existing API.
> 
> 1. driver can expose the capability like per_queue_rx or per_queue_tx
> 2. application can enable this capability by dev_config with rte_eth_conf
> 3, if per_queue_rx is not enable, nothing change, so we are at option 2
> 4. if per_queue_rx is enabled, driver will set rx_pkt_burst with a hook function which redirect to an function ptr in a per queue rx function
> tables ( I guess performance is impacted somehow, but this is the cost if you want different offload for different queue)

I don't think we need to overcomplicate things here.
It should be transparent to the user - the user just calls queue_setup(), and based on its input parameters
the PMD selects the function that fits best.
That is pretty much what we have right now, except possibly with an array of functions (one per queue).

> 
> >
> > From my perspective - 1) is a better choice though it required more work,
> > and possibly ABI breakage.
> > I did some work in that direction as RFC:
> > http://dpdk.org/dev/patchwork/patch/31866/
> 
> I will learn this, thanks for the heads up.
> >
> > 2) might be also possible, but looks a bit clumsy as rx_queue_setup() might
> > now fail even with valid parameters - all depends on previous queue
> > configurations.
> >
> > Same story applies for TX.
> >
> >
> > > +		if (dev->data->dev_started &&
> > > +			!(dev_info.deferred_queue_config_capa &
> > > +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> > > +			return -EINVAL;
> > >  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> > >  		rxq[rx_queue_id] = NULL;
> > >  	}
> > > @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> > >  		return -EINVAL;
> > >  	}
> > >
> > > -	if (dev->data->dev_started) {
> > > -		RTE_PMD_DEBUG_TRACE(
> > > -		    "port %d must be stopped to allow configuration\n", port_id);
> > > -		return -EBUSY;
> > > -	}
> > > -
> > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> > >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup, -ENOTSUP);
> > >
> > > @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> > >  		return -EINVAL;
> > >  	}
> > >
> > > +	if (dev->data->dev_started &&
> > > +		!(dev_info.deferred_queue_config_capa &
> > > +			DEV_DEFERRED_TX_QUEUE_SETUP))
> > > +		return -EINVAL;
> > > +
> > >  	txq = dev->data->tx_queues;
> > >  	if (txq[tx_queue_id]) {
> > >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> > >  					-ENOTSUP);
> > > +		if (dev->data->dev_started &&
> > > +			!(dev_info.deferred_queue_config_capa &
> > > +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> > > +			return -EINVAL;
> > >  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> > >  		txq[tx_queue_id] = NULL;
> > >  	}
> > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > b/lib/librte_ether/rte_ethdev.h index 036153306..410e58c50 100644
> > > --- a/lib/librte_ether/rte_ethdev.h
> > > +++ b/lib/librte_ether/rte_ethdev.h
> > > @@ -981,6 +981,15 @@ struct rte_eth_conf {
> > >   */
> > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > >
> > > +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001
> > > +/**< Deferred setup rx queue */
> > > +#define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002
> > > +/**< Deferred setup tx queue */
> > > +#define DEV_DEFERRED_RX_QUEUE_RELEASE 0x00000004
> > > +/**< Deferred release rx queue */
> > > +#define DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008
> > > +/**< Deferred release tx queue */
> > > +
> >
> > I don't think we do need flags for both setup a and release.
> > If runtime setup is supported - surely dynamic release should be supported
> > too.
> > Also probably RUNTIME_RX_QUEUE_SETUP sounds a bit better.
> 
> Agree
> 
> Thanks
> Qi
> 
> >
> > Konstantin
> >
> > >  /*
> > >   * If new Tx offload capabilities are defined, they also must be
> > >   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> > > @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
> > >  	/** Configured number of rx/tx queues */
> > >  	uint16_t nb_rx_queues; /**< Number of RX queues. */
> > >  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> > > +	uint64_t deferred_queue_config_capa;
> > > +	/**< queues can be setup/release after dev_start (DEV_DEFERRED_). */
> > >  };
> > >
> > >  /**
> > > --
> > > 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  2018-03-14 12:31  0%     ` Ananyev, Konstantin
@ 2018-03-15  3:13  0%       ` Zhang, Qi Z
  2018-03-15 13:16  0%         ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Zhang, Qi Z @ 2018-03-15  3:13 UTC (permalink / raw)
  To: Ananyev, Konstantin, thomas; +Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo

Hi Konstantin:

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Wednesday, March 14, 2018 8:32 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net
> Cc: dev@dpdk.org; Xing, Beilei <beilei.xing@intel.com>; Wu, Jingjing
> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>
> Subject: RE: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
> 
> Hi Qi,
> 
> >
> > The patch lets an ethdev driver expose a capability flag through
> > rte_eth_dev_info_get() when it supports deferred queue configuration.
> > Based on that flag, rte_eth_[rx|tx]_queue_setup() decides whether to
> > continue setting up the queue or to return an error when the device
> > is already started.
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > ---
> >  doc/guides/nics/features.rst  |  8 ++++++++
> > lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
> > lib/librte_ether/rte_ethdev.h | 11 +++++++++++
> >  3 files changed, 37 insertions(+), 12 deletions(-)
> >
> > diff --git a/doc/guides/nics/features.rst
> > b/doc/guides/nics/features.rst index 1b4fb979f..36ad21a1f 100644
> > --- a/doc/guides/nics/features.rst
> > +++ b/doc/guides/nics/features.rst
> > @@ -892,7 +892,15 @@ Documentation describes performance values.
> >
> >  See ``dpdk.org/doc/perf/*``.
> >
> > +.. _nic_features_queue_deferred_setup_capabilities:
> >
> > +Queue deferred setup capabilities
> > +---------------------------------
> > +
> > +Supports queue setup / release after device started.
> > +
> > +* **[provides] rte_eth_dev_info**: ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFERRED_TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELEASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> > +* **[related]  API**: ``rte_eth_dev_info_get()``.
> >
> >  .. _nic_features_other:
> >
> > diff --git a/lib/librte_ether/rte_ethdev.c
> > b/lib/librte_ether/rte_ethdev.c index a6ce2a5ba..6c906c4df 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >  		return -EINVAL;
> >  	}
> >
> > -	if (dev->data->dev_started) {
> > -		RTE_PMD_DEBUG_TRACE(
> > -		    "port %d must be stopped to allow configuration\n", port_id);
> > -		return -EBUSY;
> > -	}
> > -
> >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
> >
> > @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
> >  		return -EINVAL;
> >  	}
> >
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.deferred_queue_config_capa &
> > +			DEV_DEFERRED_RX_QUEUE_SETUP))
> > +		return -EINVAL;
> > +
> 
> I think now you have to check here that the queue is stopped.
> Otherwise you might attempt to reconfigure running queue.

I'm not sure it's necessary to make the application use a different API sequence for a deferred configure and a deferred re-configure.
Can we just call dev_ops->rx_queue_stop before rx_queue_release here?

> 
> 
> >  	rxq = dev->data->rx_queues;
> >  	if (rxq[rx_queue_id]) {
> >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
> >  					-ENOTSUP);
> 
> I don't think it is *that* straightforward.
> rx_queue_setup() parameters can imply a different rx function (and related device
> settings) than those already set up by a previous queue_setup()/dev_start.
> So I think you need to do one of 2 things:
> 1. rework ethdev layer to introduce a separate rx function (and related
> settings) for each queue.
> 2. at rx_queue_setup() if it is invoked after dev_start - check that given
> queue settings wouldn't contradict with current device settings  (rx function,
> etc.).
> If they do - return an error.
Yes, I think what we have here is option 2: dev_ops->rx_queue_setup will return failure if it conflicts with a previous setting.
I'm also thinking about option 1; the idea is to move the per-queue rx/tx functions into the driver layer, so it will not break the existing API.

1. the driver can expose a capability like per_queue_rx or per_queue_tx
2. the application can enable this capability by dev_config with rte_eth_conf
3. if per_queue_rx is not enabled, nothing changes, so we are at option 2
4. if per_queue_rx is enabled, the driver will set rx_pkt_burst to a hook function which redirects to a function pointer in a per-queue rx function table (I guess performance is impacted somehow, but this is the cost if you want different offloads for different queues)
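
Step 4 could look roughly like the sketch below. It is only an illustration with simplified stand-in types: toy_dev, toy_rxq, the toy_rx_* functions and toy_enable_per_queue_rx() are hypothetical names and not part of the ethdev API. The extra indirect call in the hook is where the performance cost mentioned above would come from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_QUEUES 4

/* Hypothetical burst function type: returns the number of packets received. */
typedef uint16_t (*toy_rx_burst_t)(void *rxq, void **pkts, uint16_t nb_pkts);

/* Hypothetical device: one device-level entry plus a per-queue table. */
struct toy_dev {
	toy_rx_burst_t rx_pkt_burst;
	toy_rx_burst_t rx_burst_tbl[TOY_MAX_QUEUES];
};

/* Hypothetical queue handle that knows its index and owning device. */
struct toy_rxq {
	uint16_t queue_id;
	struct toy_dev *dev;
};

/* Toy rx function: delivers at most one packet per call. */
static uint16_t
toy_rx_single(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	return nb_pkts > 0 ? 1 : 0;
}

/* Toy rx function: pretends the whole burst was filled. */
static uint16_t
toy_rx_bulk(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	return nb_pkts;
}

/* Hook installed as rx_pkt_burst: redirects to the queue's own function. */
static uint16_t
toy_rx_hook(void *rxq, void **pkts, uint16_t nb_pkts)
{
	struct toy_rxq *q = rxq;

	assert(q != NULL && q->dev != NULL && q->queue_id < TOY_MAX_QUEUES);
	return q->dev->rx_burst_tbl[q->queue_id](rxq, pkts, nb_pkts);
}

/* Enabling per_queue_rx replaces the device-level entry with the hook. */
static void
toy_enable_per_queue_rx(struct toy_dev *dev)
{
	dev->rx_pkt_burst = toy_rx_hook;
}
```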

> 
> From my perspective - 1) is a better choice though it required more work,
> and possibly ABI breakage.
> I did some work in that direction as RFC:
> http://dpdk.org/dev/patchwork/patch/31866/

I will learn this, thanks for the heads up.
> 
> 2) might be also possible, but looks a bit clumsy as rx_queue_setup() might
> now fail even with valid parameters - all depends on previous queue
> configurations.
> 
> Same story applies for TX.
> 
> 
> > +		if (dev->data->dev_started &&
> > +			!(dev_info.deferred_queue_config_capa &
> > +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> > +			return -EINVAL;
> >  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
> >  		rxq[rx_queue_id] = NULL;
> >  	}
> > @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >  		return -EINVAL;
> >  	}
> >
> > -	if (dev->data->dev_started) {
> > -		RTE_PMD_DEBUG_TRACE(
> > -		    "port %d must be stopped to allow configuration\n", port_id);
> > -		return -EBUSY;
> > -	}
> > -
> >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
> >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup, -ENOTSUP);
> >
> > @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
> >  		return -EINVAL;
> >  	}
> >
> > +	if (dev->data->dev_started &&
> > +		!(dev_info.deferred_queue_config_capa &
> > +			DEV_DEFERRED_TX_QUEUE_SETUP))
> > +		return -EINVAL;
> > +
> >  	txq = dev->data->tx_queues;
> >  	if (txq[tx_queue_id]) {
> >  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
> >  					-ENOTSUP);
> > +		if (dev->data->dev_started &&
> > +			!(dev_info.deferred_queue_config_capa &
> > +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> > +			return -EINVAL;
> >  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
> >  		txq[tx_queue_id] = NULL;
> >  	}
> > diff --git a/lib/librte_ether/rte_ethdev.h
> > b/lib/librte_ether/rte_ethdev.h index 036153306..410e58c50 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -981,6 +981,15 @@ struct rte_eth_conf {
> >   */
> >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> >
> > +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001
> > +/**< Deferred setup rx queue */
> > +#define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002
> > +/**< Deferred setup tx queue */
> > +#define DEV_DEFERRED_RX_QUEUE_RELEASE 0x00000004
> > +/**< Deferred release rx queue */
> > +#define DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008
> > +/**< Deferred release tx queue */
> > +
> 
> I don't think we do need flags for both setup a and release.
> If runtime setup is supported - surely dynamic release should be supported
> too.
> Also probably RUNTIME_RX_QUEUE_SETUP sounds a bit better.

Agree

Thanks
Qi

> 
> Konstantin
> 
> >  /*
> >   * If new Tx offload capabilities are defined, they also must be
> >   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> > @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
> >  	/** Configured number of rx/tx queues */
> >  	uint16_t nb_rx_queues; /**< Number of RX queues. */
> >  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> > +	uint64_t deferred_queue_config_capa;
> > +	/**< queues can be setup/release after dev_start (DEV_DEFERRED_). */
> >  };
> >
> >  /**
> > --
> > 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/4] ether: support deferred queue setup
  @ 2018-03-14 12:31  0%     ` Ananyev, Konstantin
  2018-03-15  3:13  0%       ` Zhang, Qi Z
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2018-03-14 12:31 UTC (permalink / raw)
  To: Zhang, Qi Z, thomas
  Cc: dev, Xing, Beilei, Wu, Jingjing, Lu, Wenzhuo, Zhang, Qi Z

Hi Qi,

> 
> The patch lets an ethdev driver expose a capability flag through
> rte_eth_dev_info_get() when it supports deferred queue configuration.
> Based on that flag, rte_eth_[rx|tx]_queue_setup() decides whether to
> continue setting up the queue or to return an error when the device
> is already started.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
>  doc/guides/nics/features.rst  |  8 ++++++++
>  lib/librte_ether/rte_ethdev.c | 30 ++++++++++++++++++------------
>  lib/librte_ether/rte_ethdev.h | 11 +++++++++++
>  3 files changed, 37 insertions(+), 12 deletions(-)
> 
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 1b4fb979f..36ad21a1f 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -892,7 +892,15 @@ Documentation describes performance values.
> 
>  See ``dpdk.org/doc/perf/*``.
> 
> +.. _nic_features_queue_deferred_setup_capabilities:
> 
> +Queue deferred setup capabilities
> +---------------------------------
> +
> +Supports queue setup / release after device started.
> +
> +* **[provides] rte_eth_dev_info**: ``deferred_queue_config_capa:DEV_DEFERRED_RX_QUEUE_SETUP,DEV_DEFERRED_TX_QUEUE_SETUP,DEV_DEFERRED_RX_QUEUE_RELEASE,DEV_DEFERRED_TX_QUEUE_RELEASE``.
> +* **[related]  API**: ``rte_eth_dev_info_get()``.
> 
>  .. _nic_features_other:
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index a6ce2a5ba..6c906c4df 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -1425,12 +1425,6 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  		return -EINVAL;
>  	}
> 
> -	if (dev->data->dev_started) {
> -		RTE_PMD_DEBUG_TRACE(
> -		    "port %d must be stopped to allow configuration\n", port_id);
> -		return -EBUSY;
> -	}
> -
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_setup, -ENOTSUP);
> 
> @@ -1474,10 +1468,19 @@ rte_eth_rx_queue_setup(uint16_t port_id, uint16_t rx_queue_id,
>  		return -EINVAL;
>  	}
> 
> +	if (dev->data->dev_started &&
> +		!(dev_info.deferred_queue_config_capa &
> +			DEV_DEFERRED_RX_QUEUE_SETUP))
> +		return -EINVAL;
> +

I think you now have to check here that the queue is stopped.
Otherwise you might attempt to reconfigure a running queue.


>  	rxq = dev->data->rx_queues;
>  	if (rxq[rx_queue_id]) {
>  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->rx_queue_release,
>  					-ENOTSUP);

I don't think it is *that* straightforward.
rx_queue_setup() parameters can imply a different rx function (and related device settings)
than those already set up by a previous queue_setup()/dev_start.
So I think you need to do one of 2 things:
1. rework the ethdev layer to introduce a separate rx function (and related settings) for each queue.
2. at rx_queue_setup(), if it is invoked after dev_start, check that the given queue settings wouldn't
contradict the current device settings (rx function, etc.).
If they do - return an error.

From my perspective, 1) is the better choice, though it requires more work and possibly ABI breakage.
I did some work in that direction as an RFC:
http://dpdk.org/dev/patchwork/patch/31866/

2) might also be possible, but it looks a bit clumsy, as rx_queue_setup() might now fail even with
valid parameters - it all depends on previous queue configurations.

Same story applies for TX.


> +		if (dev->data->dev_started &&
> +			!(dev_info.deferred_queue_config_capa &
> +				DEV_DEFERRED_RX_QUEUE_RELEASE))
> +			return -EINVAL;
>  		(*dev->dev_ops->rx_queue_release)(rxq[rx_queue_id]);
>  		rxq[rx_queue_id] = NULL;
>  	}
> @@ -1573,12 +1576,6 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>  		return -EINVAL;
>  	}
> 
> -	if (dev->data->dev_started) {
> -		RTE_PMD_DEBUG_TRACE(
> -		    "port %d must be stopped to allow configuration\n", port_id);
> -		return -EBUSY;
> -	}
> -
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_infos_get, -ENOTSUP);
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_setup, -ENOTSUP);
> 
> @@ -1596,10 +1593,19 @@ rte_eth_tx_queue_setup(uint16_t port_id, uint16_t tx_queue_id,
>  		return -EINVAL;
>  	}
> 
> +	if (dev->data->dev_started &&
> +		!(dev_info.deferred_queue_config_capa &
> +			DEV_DEFERRED_TX_QUEUE_SETUP))
> +		return -EINVAL;
> +
>  	txq = dev->data->tx_queues;
>  	if (txq[tx_queue_id]) {
>  		RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->tx_queue_release,
>  					-ENOTSUP);
> +		if (dev->data->dev_started &&
> +			!(dev_info.deferred_queue_config_capa &
> +				DEV_DEFERRED_TX_QUEUE_RELEASE))
> +			return -EINVAL;
>  		(*dev->dev_ops->tx_queue_release)(txq[tx_queue_id]);
>  		txq[tx_queue_id] = NULL;
>  	}
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 036153306..410e58c50 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -981,6 +981,15 @@ struct rte_eth_conf {
>   */
>  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> 
> +#define DEV_DEFERRED_RX_QUEUE_SETUP 0x00000001
> +/**< Deferred setup rx queue */
> +#define DEV_DEFERRED_TX_QUEUE_SETUP 0x00000002
> +/**< Deferred setup tx queue */
> +#define DEV_DEFERRED_RX_QUEUE_RELEASE 0x00000004
> +/**< Deferred release rx queue */
> +#define DEV_DEFERRED_TX_QUEUE_RELEASE 0x00000008
> +/**< Deferred release tx queue */
> +

I don't think we need flags for both setup and release.
If runtime setup is supported, surely dynamic release should be supported too.
Also, RUNTIME_RX_QUEUE_SETUP probably sounds a bit better.
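
A rough sketch of how a single flag per direction could gate both operations is shown below. The TOY_RUNTIME_* names and toy_runtime_queue_op_allowed() are hypothetical and only illustrate the idea; they are not existing ethdev definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names following the RUNTIME_* naming suggestion. */
#define TOY_RUNTIME_RX_QUEUE_SETUP 0x00000001ULL
#define TOY_RUNTIME_TX_QUEUE_SETUP 0x00000002ULL

/*
 * One flag per direction covers both setup and release: returns non-zero
 * if the queue may be set up or released in the current device state.
 */
static int
toy_runtime_queue_op_allowed(int dev_started, uint64_t capa, uint64_t flag)
{
	assert(flag == TOY_RUNTIME_RX_QUEUE_SETUP ||
	       flag == TOY_RUNTIME_TX_QUEUE_SETUP);

	/* Before dev_start every queue operation is allowed. */
	if (!dev_started)
		return 1;

	return (capa & flag) != 0;
}
```

The same helper would be called from both the setup path and the release path, so a driver only has to advertise one bit for rx and one for tx.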

Konstantin

>  /*
>   * If new Tx offload capabilities are defined, they also must be
>   * mentioned in rte_tx_offload_names in rte_ethdev.c file.
> @@ -1029,6 +1038,8 @@ struct rte_eth_dev_info {
>  	/** Configured number of rx/tx queues */
>  	uint16_t nb_rx_queues; /**< Number of RX queues. */
>  	uint16_t nb_tx_queues; /**< Number of TX queues. */
> +	uint64_t deferred_queue_config_capa;
> +	/**< queues can be setup/release after dev_start (DEV_DEFERRED_). */
>  };
> 
>  /**
> --
> 2.13.6

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
  @ 2018-03-14 11:09  4%   ` Bruce Richardson
  2018-03-25 18:17  0%     ` Vladimir Medvedkin
  2018-03-29 10:27  3%   ` Bruce Richardson
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2018-03-14 11:09 UTC (permalink / raw)
  To: Medvedkin Vladimir; +Cc: dev

On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> RIB is an alternative to current LPM library.
> It solves the following problems
>  - Increases the speed of control plane operations against lpm such as
>    adding/deleting routes
>  - Adds abstraction from dataplane algorithms, so it is possible to add
>    different ip route lookup algorithms such as DXR/poptrie/lpc-trie/etc
>    in addition to current dir24_8
>  - It is possible to keep user defined application specific additional
>    information in struct rte_rib_node which represents route entry.
>    It can be next hop/set of next hops (i.e. active and feasible),
>    pointers to link rte_rib_node based on some criteria (i.e. next_hop),
>    plenty of additional control plane information.
>  - For dir24_8 implementation it is possible to remove rte_lpm_tbl_entry.depth
>    field that helps to save 6 bits.
>  - Also new dir24_8 implementation supports different next_hop sizes
>    (1/2/4/8 bytes per next hop)
>  - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eliminate ternary operator.
>    Instead it returns special default value if there is no route.
> 
> Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> ---
>  config/common_base                 |   6 +
>  doc/api/doxy-api.conf              |   1 +
>  lib/Makefile                       |   2 +
>  lib/librte_rib/Makefile            |  22 ++
>  lib/librte_rib/rte_dir24_8.c       | 482 +++++++++++++++++++++++++++++++++
>  lib/librte_rib/rte_dir24_8.h       | 116 ++++++++
>  lib/librte_rib/rte_rib.c           | 526 +++++++++++++++++++++++++++++++++++++
>  lib/librte_rib/rte_rib.h           | 322 +++++++++++++++++++++++
>  lib/librte_rib/rte_rib_version.map |  18 ++
>  mk/rte.app.mk                      |   1 +
>  10 files changed, 1496 insertions(+)
>  create mode 100644 lib/librte_rib/Makefile
>  create mode 100644 lib/librte_rib/rte_dir24_8.c
>  create mode 100644 lib/librte_rib/rte_dir24_8.h
>  create mode 100644 lib/librte_rib/rte_rib.c
>  create mode 100644 lib/librte_rib/rte_rib.h
>  create mode 100644 lib/librte_rib/rte_rib_version.map
> 

First pass review comments. For now just reviewed the main public header
file rte_rib.h. Later reviews will cover the other files as best I can.

/Bruce

<snip>
> diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
> new file mode 100644
> index 0000000..6eac8fb
> --- /dev/null
> +++ b/lib/librte_rib/rte_rib.h
> @@ -0,0 +1,322 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> + */
> +
> +#ifndef _RTE_RIB_H_
> +#define _RTE_RIB_H_
> +
> +/**
> + * @file
> + * Compressed trie implementation for Longest Prefix Match
> + */
> +
> +/** @internal Macro to enable/disable run-time checks. */
> +#if defined(RTE_LIBRTE_RIB_DEBUG)
> +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do {	\
> +	if (cond)					\
> +		return retval;				\
> +} while (0)
> +#else
> +#define RTE_RIB_RETURN_IF_TRUE(cond, retval)
> +#endif

use RTE_ASSERT?

> +
> +#define RTE_RIB_VALID_NODE	1

should there be an INVALID_NODE macro?

> +#define RTE_RIB_GET_NXT_ALL	0
> +#define RTE_RIB_GET_NXT_COVER	1
> +
> +#define RTE_RIB_INVALID_ROUTE	0
> +#define RTE_RIB_VALID_ROUTE	1
> +
> +/** Max number of characters in RIB name. */
> +#define RTE_RIB_NAMESIZE	64
> +
> +/** Maximum depth value possible for IPv4 RIB. */
> +#define RTE_RIB_MAXDEPTH	32

I think we should have IPv4 in the name here. Will it not be extended to
support IPv6 in future?

> +
> +/**
> + * Macro to check if prefix1 {key1/depth1}
> + * is covered by prefix2 {key2/depth2}
> + */
> +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2)			\
> +	((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\
> +		&& (depth1 > depth2))
Neat check!

Any particular reason for using UINT64_MAX here rather than UINT32_MAX?
I think you can avoid the casting and have a slightly shorter mask by
changing "(uint32_t)(UINT64_MAX << (32 - depth2))" to
"~(UINT32_MAX >> depth2)".
I'd also suggest, for readability, putting the second check first and,
for maintainability, using an inline function rather than a macro.
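
Something along these lines, as an untested sketch (the function name is a
placeholder, and it assumes depths are always in the range 0..32). Doing the
depth comparison first also guarantees depth2 <= 31 by the time the mask is
built, so the 32-bit shift never reaches the full width of the type:

```c
#include <stdint.h>

/*
 * Return non-zero if prefix1 {key1/depth1} is covered by prefix2
 * {key2/depth2}. Depths are assumed to be in the range 0..32.
 */
static inline int
sketch_rib_is_covered(uint32_t key1, uint8_t depth1,
		      uint32_t key2, uint8_t depth2)
{
	uint32_t mask;

	/* Depth check first: once it passes, depth2 <= 31 and the shift is defined. */
	if (depth1 <= depth2)
		return 0;
	/* depth2 leading one bits; all zeros when depth2 == 0. */
	mask = (uint32_t)~(UINT32_MAX >> depth2);
	return ((key1 ^ key2) & mask) == 0;
}
```

For example, 10.1.0.0/16 against 10.0.0.0/8 passes the depth check and the
keys agree in the top 8 bits, so it counts as covered.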

> +
> +/** @internal Macro to get next node in tree*/
> +#define RTE_RIB_GET_NXT_NODE(node, key)					\
> +	((key & (1 << (31 - node->depth))) ? node->right : node->left)
> +/** @internal Macro to check if node is right child*/
> +#define RTE_RIB_IS_RIGHT_NODE(node)	(node->parent->right == node)

Again, consider inline fns rather than macros.
For the latter macro, rather than doing additional pointer derefs to
parent, can you also tell whether it's a right node by using:
"(node->key & (1 << (32 - node->depth)))"?

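As a rough illustration of the inline-function form of the two macros (the
struct below is a cut-down stand-in for the node layout in the patch, and
the sketch_ names are placeholders). It keeps the parent-pointer test as
posted; the key-bit shortcut would only be equivalent if a child always sits
exactly one level below its parent, which this sketch does not assume:

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the node layout in the patch; not the real struct. */
struct sketch_rib_node {
	struct sketch_rib_node *left;
	struct sketch_rib_node *right;
	struct sketch_rib_node *parent;
	uint32_t key;
	uint8_t depth;
};

/* Pick the child selected by the first key bit below node->depth (depth < 32). */
static inline struct sketch_rib_node *
sketch_rib_get_nxt_node(const struct sketch_rib_node *node, uint32_t key)
{
	/* Unsigned constant so that depth 0 does not shift into the sign bit. */
	return (key & (UINT32_C(1) << (31 - node->depth))) ?
		node->right : node->left;
}

/* Non-zero if node is the right child of its (non-NULL) parent. */
static inline int
sketch_rib_is_right_node(const struct sketch_rib_node *node)
{
	return node->parent->right == node;
}
```
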
> +
> +
> +struct rte_rib_node {
> +	struct rte_rib_node *left;
> +	struct rte_rib_node *right;
> +	struct rte_rib_node *parent;
> +	uint32_t	key;
> +	uint8_t		depth;
> +	uint8_t		flag;
> +	uint64_t	nh;
> +	uint64_t	ext[0];
> +};
> +
> +struct rte_rib;
> +
> +/** Type of FIB struct*/
> +enum rte_rib_type {
> +	RTE_RIB_DIR24_8_1B,
> +	RTE_RIB_DIR24_8_2B,
> +	RTE_RIB_DIR24_8_4B,
> +	RTE_RIB_DIR24_8_8B,
> +	RTE_RIB_TYPE_MAX
> +};

If the plan is to support multiple underlying fib types and algorithms
under the rib library, would it not be better to separate out the
algorithm part from the data storage part? So have the type just be
DIR_24_8, and have the 1, 2, 4 or 8 specified separately.
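
For illustration only, one way the split could look. All names here are
invented for the sketch and the conf struct is trimmed to the two fields
that matter:

```c
#include <stddef.h>
#include <stdint.h>

/* Algorithm selection only (hypothetical names). */
enum sketch_rib_type {
	SKETCH_RIB_DIR24_8,
	SKETCH_RIB_TYPE_MAX
};

/* Next hop storage size, specified independently of the algorithm. */
enum sketch_rib_nh_sz {
	SKETCH_RIB_NH_1B,
	SKETCH_RIB_NH_2B,
	SKETCH_RIB_NH_4B,
	SKETCH_RIB_NH_8B
};

struct sketch_rib_conf {
	enum sketch_rib_type type;
	enum sketch_rib_nh_sz nh_sz;
};

/* Bytes per next hop entry: 1, 2, 4 or 8. */
static inline size_t
sketch_rib_nh_bytes(enum sketch_rib_nh_sz sz)
{
	return (size_t)1 << sz;
}
```

A new algorithm would then add one enum value rather than one per next hop
size.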

> +
> +enum rte_rib_op {
> +	RTE_RIB_ADD,
> +	RTE_RIB_DEL
> +};
> +
> +/** RIB nodes allocation type */
> +enum rte_rib_alloc_type {
> +	RTE_RIB_MALLOC,
> +	RTE_RIB_MEMPOOL,
> +	RTE_RIB_ALLOC_MAX
> +};

Not sure you need this any more. Malloc allocations and mempool
allocations are now pretty much the same thing.

> +
> +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key,
> +	uint8_t depth, uint64_t next_hop, enum rte_rib_op op);

Do you anticipate more ops in future than just add and delete? If not,
why not just split this function into two and drop the op enum?

> +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t *ips,
> +	uint64_t *next_hops, const unsigned n);
> +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib *rib);
> +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib,
> +	struct rte_rib_node *node);
> +
> +struct rte_rib {
> +	char name[RTE_RIB_NAMESIZE];
> +	/*pointer to rib trie*/
> +	struct rte_rib_node	*trie;
> +	/*pointer to dataplane struct*/
> +	void	*fib;
> +	/*prefix modification*/
> +	rte_rib_modify_fn_t	modify;
> +	/* Bulk lookup fn*/
> +	rte_rib_tree_lookup_fn_t	lookup;
> +	/*alloc trie element*/
> +	rte_rib_alloc_node_fn_t	alloc_node;
> +	/*free trie element*/
> +	rte_rib_free_node_fn_t	free_node;
> +	struct rte_mempool	*node_pool;
> +	uint32_t		cur_nodes;
> +	uint32_t		cur_routes;
> +	int			max_nodes;
> +	int			node_sz;
> +	enum rte_rib_type	type;
> +	enum rte_rib_alloc_type	alloc_type;
> +};
> +
> +/** RIB configuration structure */
> +struct rte_rib_conf {
> +	enum rte_rib_type	type;
> +	enum rte_rib_alloc_type	alloc_type;
> +	int	max_nodes;
> +	size_t	node_sz;
> +	uint64_t def_nh;
> +};
> +
> +/**
> + * Lookup an IP into the RIB structure
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  IP to be looked up in the RIB
> + * @return
> + *  pointer to struct rte_rib_node on success,
> + *  NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key);
> +
> +/**
> + * Lookup less specific route into the RIB structure
> + *
> + * @param ent
> + *  Pointer to struct rte_rib_node that represents target route
> + * @return
> + *  pointer to struct rte_rib_node that represents
> + *  less specific route on success,
> + *  NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_lookup_parent(struct rte_rib_node *ent);
> +
> +/**
> + * Lookup prefix into the RIB structure
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  net to be looked up in the RIB
> + * @param depth
> + *  prefix length
> + * @return
> + *  pointer to struct rte_rib_node on success,
> + *  NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t depth);

Can you explain the difference between this and regular lookup, and how
they would be used. I don't think the names convey the differences
sufficiently, and so we should look to rename one or both to be clearer.

> +
> +/**
> + * Retrieve next more specific prefix from the RIB
s/more/most/

> + * that is covered by key/depth supernet
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  net address of supernet prefix that covers returned more specific prefixes
> + * @param depth
> + *  supernet prefix length
> + * @param cur
> + *   pointer to the last returned prefix to get next prefix
> + *   or
> + *   NULL to get first more specific prefix
> + * @param flag
> + *  -RTE_RIB_GET_NXT_ALL
> + *   get all prefixes from subtrie

By all prefixes do you mean more specific, i.e. the final prefix?

> + *  -RTE_RIB_GET_NXT_COVER
> + *   get only first more specific prefix even if it have more specifics
> + * @return
> + *  pointer to the next more specific prefix
> + *  or
> + *  NULL if there is no prefixes left
> + */
> +struct rte_rib_node *
> +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t depth,
> +	struct rte_rib_node *cur, int flag);
> +
> +/**
> + * Remove prefix from the RIB
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  net to be removed from the RIB
> + * @param depth
> + *  prefix length
> + */
> +void
> +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t depth);
> +
> +/**
> + * Insert prefix into the RIB
> + *
> + * @param rib
> + *  RIB object handle
> + * @param key
> + *  net to be inserted to the RIB
> + * @param depth
> + *  prefix length
> + * @return
> + *  pointer to new rte_rib_node on success
> + *  NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t depth);
> +
> +/**
> + * Create RIB
> + *
> + * @param name
> + *  RIB name
> + * @param socket_id
> + *  NUMA socket ID for RIB table memory allocation
> + * @param conf
> + *  Structure containing the configuration
> + * @return
> + *  Handle to RIB object on success
> + *  NULL otherwise with rte_errno set to an appropriate values.
> + */
> +struct rte_rib *
> +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf *conf);
> +
> +/**
> + * Find an existing RIB object and return a pointer to it.
> + *
> + * @param name
> + *  Name of the rib object as passed to rte_rib_create()
> + * @return
> + *  Pointer to rib object or NULL if object not found with rte_errno
> + *  set appropriately. Possible rte_errno values include:
> + *   - ENOENT - required entry not available to return.
> + */
> +struct rte_rib *
> +rte_rib_find_existing(const char *name);
> +
> +/**
> + * Free an RIB object.
> + *
> + * @param rib
> + *   RIB object handle
> + * @return
> + *   None
> + */
> +void
> +rte_rib_free(struct rte_rib *rib);
> +
> +/**
> + * Add a rule to the RIB.
> + *
> + * @param rib
> + *   RIB object handle
> + * @param ip
> + *   IP of the rule to be added to the RIB
> + * @param depth
> + *   Depth of the rule to be added to the RIB
> + * @param next_hop
> + *   Next hop of the rule to be added to the RIB
> + * @return
> + *   0 on success, negative value otherwise
> + */
> +int
> +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth, uint64_t next_hop);
> +
> +/**
> + * Delete a rule from the RIB.
> + *
> + * @param rib
> + *   RIB object handle
> + * @param ip
> + *   IP of the rule to be deleted from the RIB
> + * @param depth
> + *   Depth of the rule to be deleted from the RIB
> + * @return
> + *   0 on success, negative value otherwise
> + */
> +int
> +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth);
> +
> +/**
> + * Lookup multiple IP addresses in an FIB. This may be implemented as a
> + * macro, so the address of the function should not be used.
> + *
> + * @param RIB
> + *   RIB object handle
> + * @param ips
> + *   Array of IPs to be looked up in the FIB
> + * @param next_hops
> + *   Next hop of the most specific rule found for IP.
> + *   This is an array of eight byte values.
> + *   If the lookup for the given IP failed, then corresponding element would
> + *   contain default value, see description of then next parameter.
> + * @param n
> + *   Number of elements in ips (and next_hops) array to lookup. This should be a
> + *   compile time constant, and divisible by 8 for best performance.
> + * @param defv
> + *   Default value to populate into corresponding element of hop[] array,
> + *   if lookup would fail.
> + *  @return
> + *   -EINVAL for incorrect arguments, otherwise 0
> + */
> +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n)	\
> +	rib->lookup(rib->fib, ips, next_hops, n)

My main thought here is whether this needs to be a macro at all.
Given that it takes a full burst of addresses in a single go, how much
performance would actually be lost by making this a regular function in
the C file?
If we do convert this to a regular function, then a lot of the structure
definitions above - most importantly, the rib structure itself - can
probably be moved to a private header file and not exposed to
applications at all. This will make ABI compatibility a *lot* easier, as
the structures can be changed without affecting the public ABI.
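
A minimal sketch of what I mean, collapsed into one file for brevity. The
sketch_ names, the trimmed struct and the stub dataplane are placeholders,
not code from the patch:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* --- what the public header would expose --- */
struct sketch_rib;	/* opaque: the layout is not part of the ABI */

int sketch_rib_lookup_bulk(struct sketch_rib *rib, const uint32_t *ips,
			   uint64_t *next_hops, unsigned int n);

/* --- what would move to a private header / the C file --- */
typedef int (*sketch_lookup_fn_t)(void *fib, const uint32_t *ips,
				  uint64_t *next_hops, unsigned int n);

struct sketch_rib {
	void *fib;			/* dataplane struct */
	sketch_lookup_fn_t lookup;	/* bulk lookup callback */
};

/* Stand-in dataplane: every lookup misses and yields the default next hop. */
static int
sketch_miss_lookup(void *fib, const uint32_t *ips, uint64_t *next_hops,
		   unsigned int n)
{
	const uint64_t def_nh = *(const uint64_t *)fib;
	unsigned int i;

	(void)ips;
	for (i = 0; i < n; i++)
		next_hops[i] = def_nh;
	return 0;
}

/* Regular function: one extra call per burst, not per address. */
int
sketch_rib_lookup_bulk(struct sketch_rib *rib, const uint32_t *ips,
		       uint64_t *next_hops, unsigned int n)
{
	if (rib == NULL || ips == NULL || next_hops == NULL)
		return -EINVAL;
	return rib->lookup(rib->fib, ips, next_hops, n);
}
```

Applications would only ever see the forward declaration and the prototype,
so fields could be added to or removed from the struct freely.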

/Bruce

> +
> +#endif /* _RTE_RIB_H_ */

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [RFC] Switch device offload with DPDK
@ 2018-03-12 15:55  1% Adrien Mazarguil
  0 siblings, 0 replies; 200+ results
From: Adrien Mazarguil @ 2018-03-12 15:55 UTC (permalink / raw)
  To: dev
  Cc: Alex Rosenbaum, Ferruh Yigit, Thomas Monjalon, Shahaf Shuler,
	Doherty, Declan, Qi Zhang, Alejandro Lucero

Hi All,

(I tried to CC key people part of related discussions but likely missed
many, feel free to add them back)

The following RFC formalizes what has been discussed so far regarding switch
offload control using rte_flow through port/VF representors. It doesn't
bring new ideas besides the "transfer" attribute used internally to
implement port representors themselves and reuses some from past and current
threads or RFCs (see related anchors [2][3][4][5][7][8][9][10]).

It is meant to be converted without much hassle as proper DPDK documentation
hence the verbosity, however it all boils down to this:

- Port representors shall exist as additional ethdevs [2] created by PMDs
  according to devargs [4] discussed elsewhere; out of scope for this RFC.

- Besides enabling basic management of these ethdevs (e.g. MAC
  configuration), they should support rte_flow rules.

- Flow rules can target a different DPDK port ID [7] in order for matching
  traffic to be automatically forwarded without going through software.

- Forwarding may involve transformation such as VXLAN encap/decap [8][9].

---

===============================
Switch device offload with DPDK
===============================

Rationale
=========

Network adapters with multiple physical ports and/or SR-IOV capabilities
usually support the offload of traffic steering rules between their virtual
functions (VFs), physical functions (PFs) and ports.

Like for standard Ethernet switches, this involves a combination of
automatic MAC learning and manual configuration. For most purposes it is
managed by the host system and fully transparent to users and applications.

On the other hand, applications typically found on hypervisors that process
layer 2 (L2) traffic (such as OVS) need to steer traffic themselves
according to their own criteria.

Without a standard software interface to manage traffic steering rules
between VFs, PFs and the various physical ports of a given device,
applications cannot take advantage of these offloads; software processing is
mandatory even for traffic which ends up re-injected into the device it
originates from.

This document describes how such steering rules can be configured through
the DPDK flow API (**rte_flow**), with emphasis on the SR-IOV use case
(PF/VF steering) using a single physical port for clarity, however the same
logic applies to any number of ports without necessarily involving SR-IOV.

Port representors
=================

In many cases, traffic steering rules cannot be determined in advance;
applications usually have to process a bit of traffic in software before
thinking about offloading specific flows to hardware.

Applications therefore need the ability to receive and inject traffic to
various device endpoints (other VFs, PFs or physical ports) before
connecting them together. Device drivers must provide means to hook the
"other end" of these endpoints and to refer to them when configuring flow
rules.

This role is left to so-called "port representors" (also known as "VF
representors" in the specific context of VFs), which are to DPDK what the
Ethernet switch device driver model (**switchdev**) [1]_ is to Linux, and
which can be thought of as a software "patch panel" front-end for
applications.

- DPDK port representors are implemented as additional virtual Ethernet
  device (**ethdev**) instances [2]_, spawned on an as-needed basis through
  configuration parameters [3]_ [4]_ by the driver of the underlying
  device.

- As virtual devices, they may be more limited than their physical
  counterparts, for instance by exposing only a subset of device
  configuration callbacks and/or by not necessarily having Rx/Tx capability.

- Among other things, they can be used to assign MAC addresses to the
  resource they represent.

- Applications can tell port representors apart by checking their device
  information structure which contains dedicated fields [5]_ describing
  parent/child device or group relationship (exact API remains to be
  defined).

.. [1] `Ethernet switch device driver model (switchdev)
       <https://www.kernel.org/doc/Documentation/networking/switchdev.txt>`_

.. [2] `[RFC 0/5] Port Representor for control and monitoring of VF devices
       <http://dpdk.org/ml/archives/dev/2017-December/084639.html>`_

.. [3] `[PATCH v4 0/5] lib: add Port Representors
       <http://dpdk.org/ml/archives/dev/2018-January/086598.html>`_

.. [4] `doc: document the new devargs syntax
       <http://dpdk.org/ml/archives/dev/2018-January/087416.html>`_

.. [5] `doc: announce ABI change to support VF representors
       <http://dpdk.org/ml/archives/dev/2018-February/090958.html>`_

Basic SR-IOV
============

"Basic" in the sense that it is not managed by applications, which
nonetheless expect traffic to flow between the various endpoints and the
outside as if everything was linked by an Ethernet hub.

The following diagram pictures a setup involving a device with one PF, two
VFs and one shared physical port::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+----------'                 `----------+--' `--+----------'
       |                                       |       |
 .-----+-----.                                 |       |
 | port_id 3 |                                 |       |
 `-----+-----'                                 |       |
       |                                       |       |
     .-+--.                                .---+--. .--+---.
     | PF |                                | VF 1 | | VF 2 |
     `-+--'                                `---+--' `--+---'
       |                                       |       |
       `---------.     .-----------------------'       |
                 |     |     .-------------------------'
                 |     |     |
              .--+-----+-----+--.
              | interconnection |
              `--------+--------'
                       |
                  .----+-----.
                  | physical |
                  |  port 0  |
                  `----------'

- A DPDK application running on the hypervisor owns the PF device, which is
  arbitrarily assigned port index 3.

- Both VFs are assigned to VMs and used by unknown applications; they may be
  DPDK-based or anything else.

- Interconnection is not necessarily done through a true Ethernet switch and
  may not even exist as a separate entity. The role of this block is to show
  that something brings PF, VFs and physical ports together and enables
  communication between them, with a number of built-in restrictions.

Subsequent sections in this document describe means for DPDK applications
running on the hypervisor to freely assign specific flows between PF, VFs
and physical ports based on traffic properties, by managing this
interconnection.

Controlled SR-IOV
=================

Initialization
--------------

When a DPDK application gets assigned a PF device and is deliberately not
started in `basic SR-IOV`_ mode, any traffic coming from physical ports is
received by PF according to default rules, while VFs remain isolated::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+----------'                 `----------+--' `--+----------'
       |                                       |       |
 .-----+-----.                                 |       |
 | port_id 3 |                                 |       |
 `-----+-----'                                 |       |
       |                                       |       |
     .-+--.                                .---+--. .--+---.
     | PF |                                | VF 1 | | VF 2 |
     `-+--'                                `------' `------'
       |
       `-----.
             |
          .--+----------------------.
          | managed interconnection |
          `------------+------------'
                       |
                  .----+-----.
                  | physical |
                  |  port 0  |
                  `----------'

In this mode, interconnection must be configured by the application to
enable VF communication, for instance by explicitly directing traffic with a
given destination MAC address to VF 1 and allowing that with the same source
MAC address to come out of it.

For this to work, hypervisor applications need a way to refer to either VF 1
or VF 2 in addition to the PF. This is addressed by `VF representors`_.

VF representors
---------------

VF representors are virtual but standard DPDK network devices (albeit with
limited capabilities) created by PMDs when managing a PF device.

Since they represent VF instances used by other applications, configuring
them (e.g. assigning a MAC address or setting up promiscuous mode) affects
interconnection accordingly. If supported, they may also be used as two-way
communication ports with VFs (assuming **switchdev** topology)::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+---+---+--'                 `----------+--' `--+----------'
       |   |   |                               |       |
       |   |   `-------------------.           |       |
       |   `---------.             |           |       |
       |             |             |           |       |
 .-----+-----. .-----+-----. .-----+-----.     |       |
 | port_id 3 | | port_id 4 | | port_id 5 |     |       |
 `-----+-----' `-----+-----' `-----+-----'     |       |
       |             |             |           |       |
     .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
     | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
     `-+--'    `-----+-----' `-----+-----' `---+--' `--+---'
       |             |             |           |       |
       |             |   .---------'           |       |
       `-----.       |   |   .-----------------'       |
             |       |   |   |   .---------------------'
             |       |   |   |   |
          .--+-------+---+---+---+--.
          | managed interconnection |
          `------------+------------'
                       |
                  .----+-----.
                  | physical |
                  |  port 0  |
                  `----------'

- VF representors are assigned arbitrary port indices 4 and 5 in the
  hypervisor application and are respectively associated with VF 1 and VF 2.

- They can't be dissociated; even if VF 1 and VF 2 were not connected,
  representors could still be used for configuration.

- In this context, port index 3 can be thought of as a representor for
  physical port 0.

As previously described, the "interconnection" block represents a logical
concept. Interconnection occurs when hardware configuration enables traffic
flows from one place to another (e.g. physical port 0 to VF 1) according to
some criteria.

This is discussed in more detail in `traffic steering`_.

Traffic steering
----------------

In the following diagram, each meaningful traffic origin or endpoint as seen
by the hypervisor application is tagged with a unique letter from A to F::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+---+---+--'                 `----------+--' `--+----------'
       |   |   |                               |       |
       |   |   `-------------------.           |       |
       |   `---------.             |           |       |
       |             |             |           |       |
 .----(A)----. .----(B)----. .----(C)----.     |       |
 | port_id 3 | | port_id 4 | | port_id 5 |     |       |
 `-----+-----' `-----+-----' `-----+-----'     |       |
       |             |             |           |       |
     .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
     | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
     `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
       |             |             |           |       |
       |             |   .---------'           |       |
       `-----.       |   |   .-----------------'       |
             |       |   |   |   .---------------------'
             |       |   |   |   |
          .--+-------+---+---+---+--.
          | managed interconnection |
          `------------+------------'
                       |
                  .---(F)----.
                  | physical |
                  |  port 0  |
                  `----------'

- **A**: PF device.
- **B**: port representor for VF 1.
- **C**: port representor for VF 2.
- **D**: VF 1 proper.
- **E**: VF 2 proper.
- **F**: physical port.

Although uncommon, some devices do not enforce a one to one mapping between
PF and physical ports. For instance, by default all ports of **mlx4**
adapters are available to all their PF/VF instances, in which case
additional ports appear next to **F** in the above diagram.

Assuming no interconnection is provided by default in this mode, setting up
a `basic SR-IOV`_ configuration involving physical port 0 could be broken
down as:

PF:

- **A to F**: let everything through.
- **F to A**: PF MAC as destination.

VF 1:

- **A to D**, **E to D** and **F to D**: VF 1 MAC as destination.
- **D to A**: VF 1 MAC as source and PF MAC as destination.
- **D to E**: VF 1 MAC as source and VF 2 MAC as destination.
- **D to F**: VF 1 MAC as source.

VF 2:

- **A to E**, **D to E** and **F to E**: VF 2 MAC as destination.
- **E to A**: VF 2 MAC as source and PF MAC as destination.
- **E to D**: VF 2 MAC as source and VF 1 MAC as destination.
- **E to F**: VF 2 MAC as source.
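
The PF, VF 1 and VF 2 rules above condense into two checks: traffic leaving
a VF must carry that VF's MAC as source, and traffic entering a PF or VF
must carry its MAC as destination. The following C model is only a sketch of
that reading; the endpoint enum, the helper names and the use of plain
integers for MAC addresses are assumptions made for illustration and are not
part of **rte_flow**:

```c
#include <stdint.h>

/* Endpoints A, D, E and F of the diagram (hypothetical names). */
enum sketch_endpoint {
	SKETCH_PF,	/* A */
	SKETCH_VF1,	/* D */
	SKETCH_VF2,	/* E */
	SKETCH_PHY,	/* F */
	SKETCH_EP_MAX
};

static int
sketch_is_vf(enum sketch_endpoint ep)
{
	return ep == SKETCH_VF1 || ep == SKETCH_VF2;
}

/*
 * Return non-zero if a frame with the given source/destination MAC may go
 * from src to dst under the basic SR-IOV rules listed above.
 * mac[] holds the MAC assigned to each endpoint; mac[SKETCH_PHY] is unused.
 */
static int
sketch_basic_sriov_allows(const uint64_t mac[SKETCH_EP_MAX],
			  enum sketch_endpoint src, enum sketch_endpoint dst,
			  uint64_t smac, uint64_t dmac)
{
	if (src == dst)
		return 0;
	/* Traffic leaving a VF must use that VF's MAC as source. */
	if (sketch_is_vf(src) && smac != mac[src])
		return 0;
	/* Traffic entering the PF or a VF must be addressed to it. */
	if (dst != SKETCH_PHY && dmac != mac[dst])
		return 0;
	return 1;
}
```

For instance, **D to E** is accepted only when the frame carries VF 1 MAC as
source and VF 2 MAC as destination, while **A to F** is accepted regardless
of addresses.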

Devices may additionally support advanced matching criteria such as
IPv4/IPv6 addresses or TCP/UDP ports.

The combination of matching criteria with target endpoints fits well with
**rte_flow** [6]_, which expresses flow rules as combinations of patterns
and actions.

Enhancing **rte_flow** with the ability to make flow rules match and target
these endpoints provides a standard interface to manage their
interconnection without introducing new concepts and a whole new API to
implement them. This is described in `flow API (rte_flow)`_.

.. [6] `Generic flow API (rte_flow)
       <http://dpdk.org/doc/guides/prog_guide/rte_flow.html>`_

Flow API (rte_flow)
===================

Extensions
----------

Compared to creating a brand new dedicated interface, **rte_flow** was
deemed flexible enough to manage representor traffic only with minor
extensions:

- Using physical ports, PF, VF or port representors as targets.

- Affecting traffic that is not necessarily addressed to the DPDK port ID a
  flow rule is associated with (e.g. forcing VF traffic redirection to PF).

For advanced uses:

- Rule-based packet counters.

- The ability to combine several identical actions for traffic duplication
  (e.g. VF representor in addition to a physical port).

- Dedicated actions for traffic encapsulation / decapsulation before
  reaching an endpoint.

The extensions described in the following sections follow up on Qi Zhang's
original RFC [7]_.

.. [7] `rte_flow extension for vSwitch acceleration
       <http://dpdk.org/ml/archives/dev/2017-December/084598.html>`_

Traffic direction
-----------------

From an application standpoint, "ingress" and "egress" flow rule attributes
apply to the DPDK port ID they are associated with. They select a traffic
direction for matching patterns, but have no impact on actions.

When matching traffic coming from or going to a different place than the
immediate port ID a flow rule is associated with, these attributes keep
their meaning while applying to the chosen origin, as highlighted by the
following diagram::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+---+---+--'                 `----------+--' `--+----------'
       |   |   |                               |       |
       |   |   `-------------------.           |       |
       |   `---------.             |           |       |
       | ^           | ^           | ^         |       |
       | | ingress   | | ingress   | | ingress |       |
       | | egress    | | egress    | | egress  |       |
       | v           | v           | v         |       |
 .----(A)----. .----(B)----. .----(C)----.     |       |
 | port_id 3 | | port_id 4 | | port_id 5 |     |       |
 `-----+-----' `-----+-----' `-----+-----'     |       |
       |             |             |           |       |
     .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
     | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
     `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
       |             |             |         ^ |       | ^
       |             |             |  egress | |       | | egress
       |             |             | ingress | |       | | ingress
       |             |   .---------'         v |       | v
       `-----.       |   |   .-----------------'       |
             |       |   |   |   .---------------------'
             |       |   |   |   |
          .--+-------+---+---+---+--.
          | managed interconnection |
          `------------+------------'
                     ^ |
             ingress | |
              egress | |
                     v |
                  .---(F)----.
                  | physical |
                  |  port 0  |
                  `----------'

Ingress and egress are defined as relative to the application creating the
flow rule.

For instance, matching traffic sent by VM 2 would be done through an ingress
flow rule on VF 2 (**E**). Likewise for incoming traffic on physical port
(**F**). This also applies to **C** and **A** respectively.

Transferring traffic
--------------------

Without port representors
~~~~~~~~~~~~~~~~~~~~~~~~~

`Traffic direction`_ describes how an application could match traffic coming
from or going to a specific place reachable from a DPDK port ID. This makes
sense when the traffic in question is normally seen (i.e. sent or received)
by the application creating the flow rule (e.g. as in "redirect all traffic
coming from VF 1 to local queue 6").

However this does not force such traffic to take a specific route. Creating
a flow rule on **A** matching traffic coming from **D** is only meaningful
if it can be received by **A** in the first place, otherwise doing so simply
has no effect.

A new flow rule attribute named "transfer" is necessary for that. Combining
it with "ingress" or "egress" and a specific origin requests a flow rule to
be applied at the lowest level::

          ingress only           :       ingress + transfer
                                 :
 .-------------. .-------------. : .-------------. .-------------.
 | hypervisor  | |    VM 1     | : | hypervisor  | |    VM 1     |
 | application | | application | : | application | | application |
 `------+------' `--+----------' : `------+------' `--+----------'
        |           | | traffic  :        |           | | traffic
  .----(A)----.     | v          :  .----(A)----.     | v
  | port_id 3 |     |            :  | port_id 3 |     |
  `-----+-----'     |            :  `-----+-----'     |
        |           |            :        | ^         |
        |           |            :        | | traffic |
      .-+--.    .---+--.         :      .-+--.    .---+--.
      | PF |    | VF 1 |         :      | PF |    | VF 1 |
      `-+--'    `--(D)-'         :      `-+--'    `--(D)-'
        |           | | traffic  :        | ^         | | traffic
        |           | v          :        | | traffic | v
     .--+-----------+--.         :     .--+-----------+--.
     | interconnection |         :     | interconnection |
     `--------+--------'         :     `--------+--------'
              | | traffic        :              |
              | v                :              |
         .---(F)----.            :         .---(F)----.
         | physical |            :         | physical |
         |  port 0  |            :         |  port 0  |
         `----------'            :         `----------'

With "ingress" only, traffic is matched on **A** thus still goes to physical
port **F** by default::

 testpmd> flow create 3 ingress pattern vf id is 1 / end
              actions queue index 6 / end

With "ingress + transfer", traffic is matched on **D** and is therefore
successfully assigned to queue 6 on **A**::

 testpmd> flow create 3 ingress transfer pattern vf id is 1 / end
              actions queue index 6 / end
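
The difference between the two rules above can be modelled as a minimal
sketch. It assumes that "transfer" is a one-bit attribute next to "ingress"
and "egress"; ``struct flow_attr``, ``enum match_point`` and
``rule_match_point()`` are hypothetical names used for illustration only and
are not the DPDK definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of flow rule attributes; not the DPDK definition. */
struct flow_attr {
	unsigned int ingress:1;
	unsigned int egress:1;
	unsigned int transfer:1; /* attribute proposed in this section */
};

/* Where a rule is matched, using the letters from the diagram above. */
enum match_point {
	MATCH_AT_PORT,   /* e.g. A: the port the rule is created on */
	MATCH_AT_ORIGIN, /* e.g. D: the origin named in the pattern */
};

static enum match_point
rule_match_point(const struct flow_attr *attr, bool pattern_names_origin)
{
	assert(attr != NULL);
	/* "transfer" only changes anything when an origin is specified. */
	if (attr->transfer && pattern_names_origin)
		return MATCH_AT_ORIGIN;
	return MATCH_AT_PORT;
}
```

In this model the first testpmd rule (``ingress`` with ``vf id is 1``) yields
``MATCH_AT_PORT``, while the second (``ingress transfer``) yields
``MATCH_AT_ORIGIN``.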

With port representors
~~~~~~~~~~~~~~~~~~~~~~

When port representors exist, implicit flow rules with the "transfer"
attribute (described in `without port representors`_) are assumed to
exist between them and their represented resources. These may be immutable.

In this case, traffic is received by default through the representor and
neither the "transfer" attribute nor traffic origin in flow rule patterns
are necessary. They simply have to be created on the representor port
directly and may target a different representor as described in `PORT_ID
action`_.

Implicit traffic flow with port representor::

    .-------------.   .-------------.
    | hypervisor  |   |    VM 1     |
    | application |   | application |
    `--+-------+--'   `----------+--'
       |       | ^               | | traffic
       |       | | traffic       | v
       |       `-----.           |
       |             |           |
 .----(A)----. .----(B)----.     |
 | port_id 3 | | port_id 4 |     |
 `-----+-----' `-----+-----'     |
       |             |           |
     .-+--.    .-----+-----. .---+--.
     | PF |    | VF 1 rep. | | VF 1 |
     `-+--'    `-----+-----' `--(D)-'
       |             |           |
    .--|-------------|-----------|--.
    |  |             |           |  |
    |  |             `-----------'  |
    |  |              <-- traffic   |
    `--|----------------------------'
       |
  .---(F)----.
  | physical |
  |  port 0  |
  `----------'

Pattern items and actions
-------------------------

``PORT`` pattern item
~~~~~~~~~~~~~~~~~~~~~

Matches traffic originating from (ingress) or going to (egress) a physical
port of the underlying device.

Using this pattern item without specifying a port index matches the physical
port associated with the current DPDK port ID by default. As described in
`traffic steering`_, specifying it should be rarely needed.

- Matches **F** in `traffic steering`_.

``PORT`` action
~~~~~~~~~~~~~~~

Directs matching traffic to a given physical port index.

- Targets **F** in `traffic steering`_.

``PORT_ID`` pattern item
~~~~~~~~~~~~~~~~~~~~~~~~

Matches traffic originating from (ingress) or going to (egress) a given DPDK
port ID.

Normally only supported if the port ID in question is known by the
underlying PMD and related to the device the flow rule is created against.

This must not be confused with the `PORT pattern item`_ which refers to the
physical port of a device. ``PORT_ID`` refers to a ``struct rte_eth_dev``
object on the application side (also known as "port representor" depending
on the kind of underlying device).

- Matches **A**, **B** or **C** in `traffic steering`_.

``PORT_ID`` action
~~~~~~~~~~~~~~~~~~

Directs matching traffic to a given DPDK port ID.

Same restrictions as `PORT_ID pattern item`_.

- Targets **A**, **B** or **C** in `traffic steering`_.

``PF`` pattern item
~~~~~~~~~~~~~~~~~~~

Matches traffic originating from (ingress) or going to (egress) the physical
function of the current device.

If supported, should work even if the physical function is not managed by
the application and thus not associated with a DPDK port ID. Its behavior is
otherwise similar to `PORT_ID pattern item`_ using PF port ID.

- Matches **A** in `traffic steering`_.

``PF`` action
~~~~~~~~~~~~~

Directs matching traffic to the physical function of the current device.

Same restrictions as `PF pattern item`_.

- Targets **A** in `traffic steering`_.

``VF`` pattern item
~~~~~~~~~~~~~~~~~~~

Matches traffic originating from (ingress) or going to (egress) a given
virtual function of the current device.

If supported, should work even if the virtual function is not managed by
the application and thus not associated with a DPDK port ID. Its behavior is
otherwise similar to `PORT_ID pattern item`_ using VF port ID.

Note this pattern item does not match VF representors traffic which, as
separate entities, should be addressed through their own port IDs.

- Matches **D** or **E** in `traffic steering`_.

``VF`` action
~~~~~~~~~~~~~

Directs matching traffic to a given virtual function of the current device.

Same restrictions as `VF pattern item`_.

- Targets **D** or **E** in `traffic steering`_.

``*_ENCAP`` actions
~~~~~~~~~~~~~~~~~~~

These actions are named according to the protocol they encapsulate traffic
with (e.g. ``VXLAN_ENCAP``) and using specific parameters (e.g. VNI for
VXLAN).

While they modify traffic and can be used multiple times (order matters),
unlike `PORT_ID action`_ and friends, they have no impact on steering.

As described in `actions order and repetition`_, this means they are useless
if used alone in an action list: the resulting traffic gets dropped unless
combined with either ``PASSTHRU`` or other endpoint-targeting actions.

All these are under discussion in the context of adding support for tunnel
endpoint (TEP) [8]_ [9]_.

.. [8] `[RFC] tunnel endpoint hw acceleration enablement
       <http://dpdk.org/ml/archives/dev/2017-December/084676.html>`_

.. [9] `ethdev: Additions to rte_flows to support vTEP encap/decap offload
       <http://dpdk.org/ml/archives/dev/2018-March/092378.html>`_

``*_DECAP`` actions
~~~~~~~~~~~~~~~~~~~

They perform the reverse of `*_ENCAP actions`_ by popping protocol headers
from traffic instead of pushing them. They can be used multiple times as
well.

Note that using these actions on non-matching traffic results in undefined
behavior. It is recommended to match the protocol headers to decapsulate on
the pattern side of a flow rule in order to use these actions or otherwise
make sure only matching traffic goes through.

Actions order and repetition
----------------------------

Flow rules are currently restricted to at most a single action of each
supported type, performed in an unpredictable order (or all at once). To
repeat actions in a predictable fashion, applications have to make rules
pass-through and use priority levels.

It's now clear that PMD support for chaining multiple non-terminating flow
rules of varying priority levels is prohibitively difficult to implement
compared to simply allowing multiple identical actions performed in a
defined order by a single flow rule.

- This change is required to support protocol encapsulation offloads and the
  ability to perform them multiple times (e.g. VLAN then VXLAN).

- It makes the ``DUP`` action redundant since multiple ``QUEUE`` actions can
  be combined for duplication.

- The (non-)terminating property of actions must be discarded. Instead, flow
  rules themselves must be considered terminating by default (i.e. dropping
  traffic if there is no specific target) unless a ``PASSTHRU`` action is
  also specified.

This change was announced [10]_ on the DPDK mailing list.

.. [10] `doc: announce API change for flow actions
	<http://dpdk.org/ml/archives/dev/2018-February/090989.html>`_
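
The proposed semantics can be illustrated with a minimal sketch. It assumes
an END-terminated action list walked front to back; the action names and
``summarize_actions()`` are hypothetical stand-ins, not the rte_flow
definitions. A rule is treated as dropping traffic when it has no
endpoint-targeting action and no ``PASSTHRU``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action types for illustration; not the rte_flow enum. */
enum action_type {
	ACTION_END,
	ACTION_PASSTHRU,
	ACTION_QUEUE,       /* targets an endpoint */
	ACTION_PORT_ID,     /* targets an endpoint */
	ACTION_VLAN_ENCAP,  /* modifies traffic only */
	ACTION_VXLAN_ENCAP, /* modifies traffic only */
};

struct action_summary {
	unsigned int targets; /* endpoint-targeting actions seen */
	unsigned int encaps;  /* encapsulations seen, in list order */
	bool passthru;
	bool dropped;         /* no target and no PASSTHRU */
};

static struct action_summary
summarize_actions(const enum action_type *actions)
{
	struct action_summary s = { 0, 0, false, false };
	size_t i;

	assert(actions != NULL);
	/* Walk the list in the order given; repeated actions are allowed. */
	for (i = 0; actions[i] != ACTION_END; i++) {
		switch (actions[i]) {
		case ACTION_QUEUE:
		case ACTION_PORT_ID:
			s.targets++;
			break;
		case ACTION_VLAN_ENCAP:
		case ACTION_VXLAN_ENCAP:
			s.encaps++;
			break;
		case ACTION_PASSTHRU:
			s.passthru = true;
			break;
		default:
			break;
		}
	}
	/* Rules are terminating by default: nothing to deliver to => drop. */
	s.dropped = (s.targets == 0 && !s.passthru);
	return s;
}
```

Under this model, two ``QUEUE`` actions in one list give ``targets == 2``,
which is how duplication would be expressed without ``DUP``.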

Examples
--------

This section provides practical examples based on the established Testpmd
flow command syntax [11]_, in the context described in `traffic steering`_::

    .-------------.                 .-------------. .-------------.
    | hypervisor  |                 |    VM 1     | |    VM 2     |
    | application |                 | application | | application |
    `--+---+---+--'                 `----------+--' `--+----------'
       |   |   |                               |       |
       |   |   `-------------------.           |       |
       |   `---------.             |           |       |
       |             |             |           |       |
 .----(A)----. .----(B)----. .----(C)----.     |       |
 | port_id 3 | | port_id 4 | | port_id 5 |     |       |
 `-----+-----' `-----+-----' `-----+-----'     |       |
       |             |             |           |       |
     .-+--.    .-----+-----. .-----+-----. .---+--. .--+---.
     | PF |    | VF 1 rep. | | VF 2 rep. | | VF 1 | | VF 2 |
     `-+--'    `-----+-----' `-----+-----' `--(D)-' `-(E)--'
       |             |             |           |       |
       |             |   .---------'           |       |
       `-----.       |   |   .-----------------'       |
             |       |   |   |   .---------------------'
             |       |   |   |   |
          .--|-------|---|---|---|--.
          |  |       |   `---|---'  |
          |  |       `-------'      |
          |  `---------.            |
          `------------|------------'
                       |
                  .---(F)----.
                  | physical |
                  |  port 0  |
                  `----------'

By default, PF (**A**) can communicate with the physical port it is
associated with (**F**), while VF 1 (**D**) and VF 2 (**E**) are isolated
and restricted to communicate with the hypervisor application through their
respective representors (**B** and **C**) if supported.

Examples in subsequent sections apply to hypervisor applications only and
are based on port representors **A**, **B** and **C**.

.. [11] `Flow syntax
	<http://dpdk.org/doc/guides/testpmd_app_ug/testpmd_funcs.html#flow-syntax>`_

Associating VF 1 with physical port 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Assign all port traffic (**F**) to VF 1 (**D**) indiscriminately through
their representors::

 flow create 3 ingress pattern / end actions port_id id 4 / end
 flow create 4 ingress pattern / end actions port_id id 3 / end

More practical example with MAC address restrictions::

 flow create 3 ingress
     pattern eth dst is {VF 1 MAC} / end
     actions port_id id 4 / end
 flow create 4 ingress
     pattern eth src is {VF 1 MAC} / end
     actions port_id id 3 / end

Sharing broadcasts
~~~~~~~~~~~~~~~~~~

From outside to PF and VFs::

 flow create 3 ingress
     pattern eth dst is ff:ff:ff:ff:ff:ff / end
     actions port_id id 3 / port_id id 4 / port_id id 5 / end

Note ``port_id id 3`` is necessary otherwise only VFs would receive matching
traffic.

From PF to outside and VFs::

 flow create 3 egress
     pattern eth dst is ff:ff:ff:ff:ff:ff / end
     actions port / port_id id 4 / port_id id 5 / end

From VFs to outside and PF::

 flow create 4 ingress
     pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 1 MAC} / end
     actions port_id id 3 / port_id id 5 / end
 flow create 5 ingress
     pattern eth dst is ff:ff:ff:ff:ff:ff src is {VF 2 MAC} / end
     actions port_id id 3 / port_id id 4 / end

Similar ``33:33:*`` rules based on known MAC addresses should be added for
IPv6 traffic.

Encapsulating VF 2 traffic in VXLAN
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Assuming pass-through flow rules are supported::

 flow create 5 ingress
     pattern eth / end
     actions vxlan_encap vni 42 / passthru / end
 flow create 5 egress
     pattern vxlan vni is 42 / end
     actions vxlan_decap / passthru / end

Here ``passthru`` is needed since as described in `actions order and
repetition`_, flow rules are otherwise terminating; if supported, a rule
without a target endpoint will drop traffic.

Without pass-through support, ingress encapsulation on the destination
endpoint might not be supported and the action list must provide one::

 flow create 5 ingress
      pattern eth src is {VF 2 MAC} / end
      actions vxlan_encap vni 42 / port_id id 3 / end
 flow create 3 ingress
      pattern vxlan vni is 42 / end
      actions vxlan_decap / port_id id 5 / end

-- 
Adrien Mazarguil
6WIND

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] [PATCH v2 1/2] eventdev: add device stop flush callback
  2018-03-12 14:30  3%     ` Eads, Gage
@ 2018-03-12 14:38  0%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2018-03-12 14:38 UTC (permalink / raw)
  To: Eads, Gage
  Cc: dev, Van Haaren, Harry, hemant.agrawal, Richardson, Bruce,
	santosh.shukla, nipun.gupta

-----Original Message-----
> Date: Mon, 12 Mar 2018 14:30:49 +0000
> From: "Eads, Gage" <gage.eads@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>, "Van Haaren, Harry"
>  <harry.van.haaren@intel.com>, "hemant.agrawal@nxp.com"
>  <hemant.agrawal@nxp.com>, "Richardson, Bruce"
>  <bruce.richardson@intel.com>, "santosh.shukla@caviumnetworks.com"
>  <santosh.shukla@caviumnetworks.com>, "nipun.gupta@nxp.com"
>  <nipun.gupta@nxp.com>
> Subject: RE: [PATCH v2 1/2] eventdev: add device stop flush callback
> 
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Monday, March 12, 2018 1:25 AM
> > To: Eads, Gage <gage.eads@intel.com>
> > Cc: dev@dpdk.org; Van Haaren, Harry <harry.van.haaren@intel.com>;
> > hemant.agrawal@nxp.com; Richardson, Bruce <bruce.richardson@intel.com>;
> > santosh.shukla@caviumnetworks.com; nipun.gupta@nxp.com
> > Subject: Re: [PATCH v2 1/2] eventdev: add device stop flush callback
> > 
> > -----Original Message-----
> > > When an event device is stopped, it drains all event queues. These
> > > events may contain pointers, so to prevent memory leaks eventdev now
> > > supports a user-provided flush callback that is called during the queue drain
> > process.
> > > This callback is stored in process memory, so the callback must be
> > > registered by any process that may call rte_event_dev_stop().
> > >
> > > This commit also clarifies the behavior of rte_event_dev_stop().
> > >
> > > This follows this mailing list discussion:
> > > http://dpdk.org/ml/archives/dev/2018-January/087484.html
> > >
> > > Signed-off-by: Gage Eads <gage.eads@intel.com>
> > > ---
> > > v2: allow a NULL callback pointer to unregister the callback
> > >
> > >  lib/librte_eventdev/rte_eventdev.c           | 17 +++++++++
> > >  lib/librte_eventdev/rte_eventdev.h           | 55
> > +++++++++++++++++++++++++++-
> > >  lib/librte_eventdev/rte_eventdev_version.map |  6 +++
> > >  3 files changed, 76 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/lib/librte_eventdev/rte_eventdev.c
> > > b/lib/librte_eventdev/rte_eventdev.c
> > > index 851a119..1aacb7b 100644
> > > --- a/lib/librte_eventdev/rte_eventdev.c
> > > +++ b/lib/librte_eventdev/rte_eventdev.c
> > > @@ -1123,6 +1123,23 @@ rte_event_dev_start(uint8_t dev_id)
> > >  	return 0;
> > >  }
> > >
> > > +typedef void (*eventdev_stop_flush_t)(uint8_t dev_id, struct rte_event event,
> > > +		void *arg);
> > > +/**< Callback function called during rte_event_dev_stop(), invoked
> > > +once per
> > > + * flushed event.
> > > + */
> > > +
> > >  #define RTE_EVENTDEV_NAME_MAX_LEN	(64)
> > >  /**< @internal Max length of name of event PMD */
> > >
> > > @@ -1176,6 +1194,11 @@ struct rte_eventdev {
> > >  	event_dequeue_burst_t dequeue_burst;
> > >  	/**< Pointer to PMD dequeue burst function. */
> > >
> > > +	eventdev_stop_flush_t dev_stop_flush;
> > > +	/**< Optional, user-provided event flush function */
> > > +	void *dev_stop_flush_arg;
> > > +	/**< User-provided argument for event flush function */
> > > +
> > 
> > I think, we can move this additions to the internal rte_eventdev_data structure.
> > Reasons are
> > 1) Changes to "struct rte_eventdev" would call for ABI change
> > 2) We can keep "struct rte_eventdev" only for fast path functions, slow path
> > functions can have additional redirection.
> > 
> 
> Good points -- I hadn't considered the ABI impact of modifying rte_eventdev. rte_eventdev_data is in shared memory, though, so it's not multi-process friendly for function pointers. How about putting it in rte_eventdev_ops?

Yes, it makes sense to move it to rte_eventdev_ops. But we need to take care
of updating those function pointers in the secondary process.
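
A minimal sketch of the concern, assuming the callback is kept in a
process-local table rather than in shared memory. The names below
(flush_cbs, stop_flush_callback_register, flush_events, struct event,
MAX_DEVS) are hypothetical stand-ins and not the eventdev API; the -EINVAL
and NULL-unregister behaviour follows the patch description.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS 16 /* hypothetical limit for this sketch */

struct event { uint64_t u64; }; /* stand-in for struct rte_event */

typedef void (*stop_flush_t)(uint8_t dev_id, struct event ev, void *arg);

/*
 * Process-local table: each process has its own copy, so the function
 * pointer is always valid in the address space that calls it. A pointer
 * stored in shared memory would only be valid in the process that wrote it.
 */
static struct {
	stop_flush_t cb;
	void *arg;
} flush_cbs[MAX_DEVS];

static int
stop_flush_callback_register(uint8_t dev_id, stop_flush_t cb, void *arg)
{
	if (dev_id >= MAX_DEVS)
		return -EINVAL;
	/* A NULL callback unregisters. */
	flush_cbs[dev_id].cb = cb;
	flush_cbs[dev_id].arg = (cb != NULL) ? arg : NULL;
	return 0;
}

/* Invoke the callback once per drained event; returns the call count. */
static unsigned int
flush_events(uint8_t dev_id, const struct event *evs, unsigned int n)
{
	unsigned int i;

	assert(dev_id < MAX_DEVS);
	if (flush_cbs[dev_id].cb == NULL)
		return 0;
	for (i = 0; i < n; i++)
		flush_cbs[dev_id].cb(dev_id, evs[i], flush_cbs[dev_id].arg);
	return n;
}

/* Example callback: counts flushed events through the user argument. */
static void
count_flushed(uint8_t dev_id, struct event ev, void *arg)
{
	(void)dev_id;
	(void)ev;
	(*(unsigned int *)arg)++;
}
```

Because the table is per process, a secondary process that may stop the
device has to perform its own registration.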

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] eventdev: add device stop flush callback
  2018-03-12  6:25  3%   ` Jerin Jacob
@ 2018-03-12 14:30  3%     ` Eads, Gage
  2018-03-12 14:38  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Eads, Gage @ 2018-03-12 14:30 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, Van Haaren, Harry, hemant.agrawal, Richardson, Bruce,
	santosh.shukla, nipun.gupta



> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Monday, March 12, 2018 1:25 AM
> To: Eads, Gage <gage.eads@intel.com>
> Cc: dev@dpdk.org; Van Haaren, Harry <harry.van.haaren@intel.com>;
> hemant.agrawal@nxp.com; Richardson, Bruce <bruce.richardson@intel.com>;
> santosh.shukla@caviumnetworks.com; nipun.gupta@nxp.com
> Subject: Re: [PATCH v2 1/2] eventdev: add device stop flush callback
> 
> -----Original Message-----
> > When an event device is stopped, it drains all event queues. These
> > events may contain pointers, so to prevent memory leaks eventdev now
> > supports a user-provided flush callback that is called during the queue drain
> process.
> > This callback is stored in process memory, so the callback must be
> > registered by any process that may call rte_event_dev_stop().
> >
> > This commit also clarifies the behavior of rte_event_dev_stop().
> >
> > This follows this mailing list discussion:
> > http://dpdk.org/ml/archives/dev/2018-January/087484.html
> >
> > Signed-off-by: Gage Eads <gage.eads@intel.com>
> > ---
> > v2: allow a NULL callback pointer to unregister the callback
> >
> >  lib/librte_eventdev/rte_eventdev.c           | 17 +++++++++
> >  lib/librte_eventdev/rte_eventdev.h           | 55
> +++++++++++++++++++++++++++-
> >  lib/librte_eventdev/rte_eventdev_version.map |  6 +++
> >  3 files changed, 76 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_eventdev/rte_eventdev.c
> > b/lib/librte_eventdev/rte_eventdev.c
> > index 851a119..1aacb7b 100644
> > --- a/lib/librte_eventdev/rte_eventdev.c
> > +++ b/lib/librte_eventdev/rte_eventdev.c
> > @@ -1123,6 +1123,23 @@ rte_event_dev_start(uint8_t dev_id)
> >  	return 0;
> >  }
> >
> > +typedef void (*eventdev_stop_flush_t)(uint8_t dev_id, struct rte_event event,
> > +		void *arg);
> > +/**< Callback function called during rte_event_dev_stop(), invoked
> > +once per
> > + * flushed event.
> > + */
> > +
> >  #define RTE_EVENTDEV_NAME_MAX_LEN	(64)
> >  /**< @internal Max length of name of event PMD */
> >
> > @@ -1176,6 +1194,11 @@ struct rte_eventdev {
> >  	event_dequeue_burst_t dequeue_burst;
> >  	/**< Pointer to PMD dequeue burst function. */
> >
> > +	eventdev_stop_flush_t dev_stop_flush;
> > +	/**< Optional, user-provided event flush function */
> > +	void *dev_stop_flush_arg;
> > +	/**< User-provided argument for event flush function */
> > +
> 
> I think, we can move this additions to the internal rte_eventdev_data structure.
> Reasons are
> 1) Changes to "struct rte_eventdev" would call for ABI change
> 2) We can keep "struct rte_eventdev" only for fast path functions, slow path
> functions can have additional redirection.
> 

Good points -- I hadn't considered the ABI impact of modifying rte_eventdev. rte_eventdev_data is in shared memory, though, so it's not multi-process friendly for function pointers. How about putting it in rte_eventdev_ops?

> >  	struct rte_eventdev_data *data;
> >  	/**< Pointer to device data */
> >  	const struct rte_eventdev_ops *dev_ops; @@ -1822,6 +1845,34 @@
> > rte_event_dev_xstats_reset(uint8_t dev_id,
> >   */
> >  int rte_event_dev_selftest(uint8_t dev_id);
> >
> > +/**
> > + * Registers a callback function to be invoked during
> > +rte_event_dev_stop() for
> > + * each flushed event. This function can be used to properly dispose
> > +of queued
> > + * events, for example events containing memory pointers.
> > + *
> > + * The callback function is only registered for the calling process.
> > +The
> > + * callback function must be registered in every process that can
> > +call
> > + * rte_event_dev_stop().
> > + *
> > + * To unregister a callback, call this function with a NULL callback pointer.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param callback
> > + *   Callback function invoked once per flushed event.
> > + * @param userdata
> > + *   Argument supplied to callback.
> > + *
> > + * @return
> > + *  - 0 on success.
> > + *  - -EINVAL if *dev_id* is invalid
> > + *
> > + * @see rte_event_dev_stop()
> > + */
> > +int
> > +rte_event_dev_stop_flush_callback_register(uint8_t dev_id,
> > +		eventdev_stop_flush_t callback, void *userdata);
> > +
> IMO, It would be better if we place this function near to rte_event_dev_stop().
> 
> Other than above minor changes, It looks good to me.

Ok, will address in v3.

Thanks,
Gage

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 3/6] net/mlx5: add a function to rdma-core glue
  @ 2018-03-12  9:13  3%   ` Nélio Laranjeiro
  0 siblings, 0 replies; 200+ results
From: Nélio Laranjeiro @ 2018-03-12  9:13 UTC (permalink / raw)
  To: Yongseok Koh; +Cc: wenzhuo.lu, jingjing.wu, adrien.mazarguil, olivier.matz, dev

On Fri, Mar 09, 2018 at 05:25:29PM -0800, Yongseok Koh wrote:
> mlx5dv_create_wq() is added.
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_glue.c | 9 +++++++++
>  drivers/net/mlx5/mlx5_glue.h | 4 ++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_glue.c b/drivers/net/mlx5/mlx5_glue.c
> index 1c4396ada..e33fc76b5 100644
> --- a/drivers/net/mlx5/mlx5_glue.c
> +++ b/drivers/net/mlx5/mlx5_glue.c
> @@ -287,6 +287,14 @@ mlx5_glue_dv_create_cq(struct ibv_context *context,
>  	return mlx5dv_create_cq(context, cq_attr, mlx5_cq_attr);
>  }
>  
> +static struct ibv_wq *
> +mlx5_glue_dv_create_wq(struct ibv_context *context,
> +		       struct ibv_wq_init_attr *wq_attr,
> +		       struct mlx5dv_wq_init_attr *mlx5_wq_attr)
> +{
> +	return mlx5dv_create_wq(context, wq_attr, mlx5_wq_attr);
> +}
> +
>  static int
>  mlx5_glue_dv_query_device(struct ibv_context *ctx,
>  			  struct mlx5dv_context *attrs_out)
> @@ -347,6 +355,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue){
>  	.port_state_str = mlx5_glue_port_state_str,
>  	.cq_ex_to_cq = mlx5_glue_cq_ex_to_cq,
>  	.dv_create_cq = mlx5_glue_dv_create_cq,
> +	.dv_create_wq = mlx5_glue_dv_create_wq,
>  	.dv_query_device = mlx5_glue_dv_query_device,
>  	.dv_set_context_attr = mlx5_glue_dv_set_context_attr,
>  	.dv_init_obj = mlx5_glue_dv_init_obj,
> diff --git a/drivers/net/mlx5/mlx5_glue.h b/drivers/net/mlx5/mlx5_glue.h
> index b5efee3b6..21a713961 100644
> --- a/drivers/net/mlx5/mlx5_glue.h
> +++ b/drivers/net/mlx5/mlx5_glue.h
> @@ -100,6 +100,10 @@ struct mlx5_glue {
>  		(struct ibv_context *context,
>  		 struct ibv_cq_init_attr_ex *cq_attr,
>  		 struct mlx5dv_cq_init_attr *mlx5_cq_attr);
> +	struct ibv_wq *(*dv_create_wq)
> +		(struct ibv_context *context,
> +		 struct ibv_wq_init_attr *wq_attr,
> +		 struct mlx5dv_wq_init_attr *mlx5_wq_attr);
>  	int (*dv_query_device)(struct ibv_context *ctx_in,
>  			       struct mlx5dv_context *attrs_out);
>  	int (*dv_set_context_attr)(struct ibv_context *ibv_ctx,
> -- 
> 2.11.0
 
You forgot to change the GLUE ABI version; it must be updated.
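
A minimal sketch of why the bump matters, assuming the glue is a table of
function pointers filled in by a separately built library and checked by the
PMD through a version string. All names and the version value below are
hypothetical stand-ins, not the mlx5 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical version string; bumped whenever struct glue changes. */
#define GLUE_VERSION "18.05.0"

struct glue {
	const char *version;         /* checked before any other member */
	int (*create_cq)(int depth);
	int (*create_wq)(int depth); /* new member: the layout changed */
};

/* Stub implementations so the sketch is self-contained. */
static int stub_create_cq(int depth) { return depth; }
static int stub_create_wq(int depth) { return depth * 2; }

static const struct glue glue_impl = {
	.version = GLUE_VERSION,
	.create_cq = stub_create_cq,
	.create_wq = stub_create_wq,
};

/* The caller refuses a glue built against a different table layout. */
static bool
glue_is_compatible(const struct glue *g, const char *expected)
{
	assert(g != NULL && expected != NULL);
	return g->version != NULL && strcmp(g->version, expected) == 0;
}
```

Without the bump, a caller built against the old layout would accept the new
table (or the reverse) and could call through the wrong slot.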

Regards,

-- 
Nélio Laranjeiro
6WIND

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-11 12:51  0%     ` santosh
@ 2018-03-12  6:53  0%       ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-12  6:53 UTC (permalink / raw)
  To: santosh, dev; +Cc: Olivier MATZ

On 03/11/2018 03:51 PM, santosh wrote:
> Hi Andrew,
>
>
> On Saturday 10 March 2018 09:09 PM, Andrew Rybchenko wrote:
>> Size of memory chunk required to populate mempool objects depends
>> on how objects are stored in the memory. Different mempool drivers
>> may have different requirements and a new operation allows to
>> calculate memory size in accordance with driver requirements and
>> advertise requirements on minimum memory chunk size and alignment
>> in a generic way.
>>
>> Bump ABI version since the patch breaks it.
>>
>> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
>> ---
>> RFCv2 -> v1:
>>   - move default calc_mem_size callback to rte_mempool_ops_default.c
>>   - add ABI changes to release notes
>>   - name default callback consistently: rte_mempool_op_<callback>_default()
>>   - bump ABI version since it is the first patch which breaks ABI
>>   - describe default callback behaviour in details
>>   - avoid introduction of internal function to cope with deprecation
>>     (keep it to deprecation patch)
>>   - move cache-line or page boundary chunk alignment to default callback
>>   - highlight that min_chunk_size and align parameters are output only
>>
> [...]
>
>> diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
>> new file mode 100644
>> index 0000000..57fe79b
>> --- /dev/null
>> +++ b/lib/librte_mempool/rte_mempool_ops_default.c
>> @@ -0,0 +1,38 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2016 Intel Corporation.
>> + * Copyright(c) 2016 6WIND S.A.
>> + * Copyright(c) 2018 Solarflare Communications Inc.
>> + */
>> +
>> +#include <rte_mempool.h>
>> +
>> +ssize_t
>> +rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
>> +				     uint32_t obj_num, uint32_t pg_shift,
>> +				     size_t *min_chunk_size, size_t *align)
>> +{
>> +	unsigned int mp_flags;
>> +	int ret;
>> +	size_t total_elt_sz;
>> +	size_t mem_size;
>> +
>> +	/* Get mempool capabilities */
>> +	mp_flags = 0;
>> +	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
>> +	if ((ret < 0) && (ret != -ENOTSUP))
>> +		return ret;
>> +
>> +	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>> +
>> +	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
>> +					 mp->flags | mp_flags);
>> +
> Looks ok to me except a nit:
> (mp->flags | mp_flags) style expression is to differentiate that
> mp_flags holds driver specific flag like BLK_ALIGN and mp->flags
> has appl specific flags.. is it so? If not then why not simply
> do like:
> mp->flags |= mp_flags.

In fact it does not matter much, since the code is removed in patch 3.
Here it is required just for consistency. Also, the mp argument is const,
which does not allow changing its members.
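
A minimal sketch of the const point, with hypothetical stand-ins (struct
pool, the flag values and effective_flags() are not the mempool API):

```c
#include <assert.h>
#include <stddef.h>

struct pool { unsigned int flags; }; /* stand-in for struct rte_mempool */

/* Hypothetical flag values for illustration. */
#define POOL_F_APP_FLAG  0x1u   /* set by the application */
#define POOL_F_DRV_ALIGN 0x100u /* reported by the driver */

static unsigned int
effective_flags(const struct pool *mp, unsigned int drv_flags)
{
	assert(mp != NULL);
	/*
	 * "mp->flags |= drv_flags;" would not compile here because *mp is
	 * const, so the combined value is computed without touching the pool.
	 */
	return mp->flags | drv_flags;
}
```

The pool keeps only the application flags; the driver flags are merged into
a temporary value for the size calculation.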

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2 1/2] eventdev: add device stop flush callback
  @ 2018-03-12  6:25  3%   ` Jerin Jacob
  2018-03-12 14:30  3%     ` Eads, Gage
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2018-03-12  6:25 UTC (permalink / raw)
  To: Gage Eads
  Cc: dev, harry.van.haaren, hemant.agrawal, bruce.richardson,
	santosh.shukla, nipun.gupta

-----Original Message-----
> When an event device is stopped, it drains all event queues. These events
> may contain pointers, so to prevent memory leaks eventdev now supports a
> user-provided flush callback that is called during the queue drain process.
> This callback is stored in process memory, so the callback must be
> registered by any process that may call rte_event_dev_stop().
> 
> This commit also clarifies the behavior of rte_event_dev_stop().
> 
> This follows this mailing list discussion:
> http://dpdk.org/ml/archives/dev/2018-January/087484.html
> 
> Signed-off-by: Gage Eads <gage.eads@intel.com>
> ---
> v2: allow a NULL callback pointer to unregister the callback
> 
>  lib/librte_eventdev/rte_eventdev.c           | 17 +++++++++
>  lib/librte_eventdev/rte_eventdev.h           | 55 +++++++++++++++++++++++++++-
>  lib/librte_eventdev/rte_eventdev_version.map |  6 +++
>  3 files changed, 76 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_eventdev/rte_eventdev.c b/lib/librte_eventdev/rte_eventdev.c
> index 851a119..1aacb7b 100644
> --- a/lib/librte_eventdev/rte_eventdev.c
> +++ b/lib/librte_eventdev/rte_eventdev.c
> @@ -1123,6 +1123,23 @@ rte_event_dev_start(uint8_t dev_id)
>  	return 0;
>  }
>  
> +typedef void (*eventdev_stop_flush_t)(uint8_t dev_id, struct rte_event event,
> +		void *arg);
> +/**< Callback function called during rte_event_dev_stop(), invoked once per
> + * flushed event.
> + */
> +
>  #define RTE_EVENTDEV_NAME_MAX_LEN	(64)
>  /**< @internal Max length of name of event PMD */
>  
> @@ -1176,6 +1194,11 @@ struct rte_eventdev {
>  	event_dequeue_burst_t dequeue_burst;
>  	/**< Pointer to PMD dequeue burst function. */
>  
> +	eventdev_stop_flush_t dev_stop_flush;
> +	/**< Optional, user-provided event flush function */
> +	void *dev_stop_flush_arg;
> +	/**< User-provided argument for event flush function */
> +

I think we can move these additions to the internal rte_eventdev_data structure. Reasons are:
1) Changes to "struct rte_eventdev" would call for ABI change
2) We can keep "struct rte_eventdev" only for fast path functions,
slow path functions can have additional redirection.

>  	struct rte_eventdev_data *data;
>  	/**< Pointer to device data */
>  	const struct rte_eventdev_ops *dev_ops;
> @@ -1822,6 +1845,34 @@ rte_event_dev_xstats_reset(uint8_t dev_id,
>   */
>  int rte_event_dev_selftest(uint8_t dev_id);
>  
> +/**
> + * Registers a callback function to be invoked during rte_event_dev_stop() for
> + * each flushed event. This function can be used to properly dispose of queued
> + * events, for example events containing memory pointers.
> + *
> + * The callback function is only registered for the calling process. The
> + * callback function must be registered in every process that can call
> + * rte_event_dev_stop().
> + *
> + * To unregister a callback, call this function with a NULL callback pointer.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param callback
> + *   Callback function invoked once per flushed event.
> + * @param userdata
> + *   Argument supplied to callback.
> + *
> + * @return
> + *  - 0 on success.
> + *  - -EINVAL if *dev_id* is invalid
> + *
> + * @see rte_event_dev_stop()
> + */
> +int
> +rte_event_dev_stop_flush_callback_register(uint8_t dev_id,
> +		eventdev_stop_flush_t callback, void *userdata);
> +
IMO, it would be better to place this function near rte_event_dev_stop().

Other than the above minor changes, it looks good to me.

^ permalink raw reply	[relevance 3%]
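
For readers following the review above, here is a minimal, self-contained
sketch of the callback semantics the quoted patch describes: one callback per
device, a NULL pointer unregisters it, and stop invokes it once per flushed
event. Only the shape of the callback typedef follows the patch. struct
toy_event, struct toy_dev and every toy_* function are hypothetical stand-ins,
not the eventdev implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_DEVS 4
#define TOY_MAX_EVENTS 8

/* Hypothetical stand-in for struct rte_event: it only carries a pointer. */
struct toy_event {
	void *ptr;
};

/* Same shape as eventdev_stop_flush_t in the patch, with the toy event. */
typedef void (*toy_stop_flush_t)(uint8_t dev_id, struct toy_event event,
		void *arg);

/* Hypothetical per-process device state. */
struct toy_dev {
	toy_stop_flush_t stop_flush;
	void *stop_flush_arg;
	struct toy_event queued[TOY_MAX_EVENTS];
	unsigned int nb_queued;
};

static struct toy_dev toy_devs[TOY_MAX_DEVS];

/* Register the flush callback, or unregister it with a NULL callback. */
static int
toy_stop_flush_callback_register(uint8_t dev_id, toy_stop_flush_t callback,
		void *userdata)
{
	if (dev_id >= TOY_MAX_DEVS)
		return -EINVAL;

	toy_devs[dev_id].stop_flush = callback;
	toy_devs[dev_id].stop_flush_arg = userdata;
	return 0;
}

/* Drain the queue, handing each event to the callback if one is set. */
static void
toy_dev_stop(uint8_t dev_id)
{
	struct toy_dev *dev;
	unsigned int i;

	assert(dev_id < TOY_MAX_DEVS);
	dev = &toy_devs[dev_id];

	for (i = 0; i < dev->nb_queued; i++) {
		if (dev->stop_flush != NULL)
			dev->stop_flush(dev_id, dev->queued[i],
					dev->stop_flush_arg);
	}
	dev->nb_queued = 0;
}

/* Example callback: free the buffer the event points to and count it. */
static void
toy_free_event(uint8_t dev_id, struct toy_event event, void *arg)
{
	unsigned int *freed = arg;

	(void)dev_id;
	free(event.ptr);
	(*freed)++;
}
```

In this model the callback pointer lives in ordinary process memory, which
matches the patch's note that every process able to call stop has to register
its own callback.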

* Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-10 15:39  7%   ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
@ 2018-03-11 12:51  0%     ` santosh
  2018-03-12  6:53  0%       ` Andrew Rybchenko
  2018-03-19 17:03  0%     ` Olivier Matz
  1 sibling, 1 reply; 200+ results
From: santosh @ 2018-03-11 12:51 UTC (permalink / raw)
  To: Andrew Rybchenko, dev; +Cc: Olivier MATZ

Hi Andrew,


On Saturday 10 March 2018 09:09 PM, Andrew Rybchenko wrote:
> Size of memory chunk required to populate mempool objects depends
> on how objects are stored in the memory. Different mempool drivers
> may have different requirements and a new operation allows to
> calculate memory size in accordance with driver requirements and
> advertise requirements on minimum memory chunk size and alignment
> in a generic way.
>
> Bump ABI version since the patch breaks it.
>
> Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
> ---
> RFCv2 -> v1:
>  - move default calc_mem_size callback to rte_mempool_ops_default.c
>  - add ABI changes to release notes
>  - name default callback consistently: rte_mempool_op_<callback>_default()
>  - bump ABI version since it is the first patch which breaks ABI
>  - describe default callback behaviour in details
>  - avoid introduction of internal function to cope with deprecation
>    (keep it to deprecation patch)
>  - move cache-line or page boundary chunk alignment to default callback
>  - highlight that min_chunk_size and align parameters are output only
>
[...]

> diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
> new file mode 100644
> index 0000000..57fe79b
> --- /dev/null
> +++ b/lib/librte_mempool/rte_mempool_ops_default.c
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2016 Intel Corporation.
> + * Copyright(c) 2016 6WIND S.A.
> + * Copyright(c) 2018 Solarflare Communications Inc.
> + */
> +
> +#include <rte_mempool.h>
> +
> +ssize_t
> +rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
> +				     uint32_t obj_num, uint32_t pg_shift,
> +				     size_t *min_chunk_size, size_t *align)
> +{
> +	unsigned int mp_flags;
> +	int ret;
> +	size_t total_elt_sz;
> +	size_t mem_size;
> +
> +	/* Get mempool capabilities */
> +	mp_flags = 0;
> +	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
> +	if ((ret < 0) && (ret != -ENOTSUP))
> +		return ret;
> +
> +	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> +
> +	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
> +					 mp->flags | mp_flags);
> +

Looks OK to me except for a nit:
Is the (mp->flags | mp_flags) expression meant to distinguish
driver-specific flags such as BLK_ALIGN (held in mp_flags) from
application-specific flags (held in mp->flags)? If not, why not
simply write:
mp->flags |= mp_flags.

Thanks.

^ permalink raw reply	[relevance 0%]
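
One point relevant to the nit above: in the quoted hunk, mp is declared as
const struct rte_mempool *, so an in-place mp->flags |= mp_flags would not
compile there without casting away const, and it would turn a size query into
something that modifies the pool. A small sketch of the difference follows.
struct toy_pool, the TOY_F_* values and both functions are hypothetical and
only model the two expressions; they are not the rte_mempool definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define TOY_F_APP_FLAG    0x0001u /* set by the application */
#define TOY_F_DRIVER_CAPA 0x0040u /* advertised by the driver */

/* Hypothetical stand-in for struct rte_mempool. */
struct toy_pool {
	unsigned int flags;
};

/* Combine flags for one calculation; the pool itself is left untouched. */
static unsigned int
toy_effective_flags(const struct toy_pool *mp, unsigned int driver_flags)
{
	/* "mp->flags |= driver_flags;" would be rejected here: *mp is const. */
	return mp->flags | driver_flags;
}

/* In-place variant: needs a non-const pool and has a lasting side effect. */
static void
toy_merge_flags(struct toy_pool *mp, unsigned int driver_flags)
{
	assert(mp != NULL);
	mp->flags |= driver_flags;
}
```

The first form keeps driver capabilities local to the calculation; the second
stores them in the pool, where later code would see them as if the application
had set them.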

* [dpdk-dev] [PATCH v1 7/9] mempool: remove callback to register memory area
  2018-03-10 15:39  3% ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
                     ` (3 preceding siblings ...)
  2018-03-10 15:39  5%   ` [dpdk-dev] [PATCH v1 4/9] mempool: deprecate xmem functions Andrew Rybchenko
@ 2018-03-10 15:39  8%   ` Andrew Rybchenko
  2018-03-19 17:03  0%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Olivier Matz
  5 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback is not required any more since there is a new callback
to populate objects using provided memory area which provides
the same information.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - advertise ABI changes in release notes

 doc/guides/rel_notes/deprecation.rst       |  1 -
 doc/guides/rel_notes/release_18_05.rst     |  2 ++
 lib/librte_mempool/rte_mempool.c           |  5 -----
 lib/librte_mempool/rte_mempool.h           | 31 ------------------------------
 lib/librte_mempool/rte_mempool_ops.c       | 14 --------------
 lib/librte_mempool/rte_mempool_version.map |  1 -
 6 files changed, 2 insertions(+), 52 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 473330d..5301259 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -63,7 +63,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 0244f91..9d40db1 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -107,6 +107,8 @@ ABI Changes
   Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
   since its features are covered by ``calc_mem_size`` and ``populate``
   callbacks.
+  Callback ``register_memory_area`` has been removed from ``rte_mempool_ops``
+  since the new callback ``populate`` may be used instead of it.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index b57ba2a..844d907 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -344,11 +344,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 		mp->flags |= MEMPOOL_F_POOL_CREATED;
 	}
 
-	/* Notify memory area to mempool */
-	ret = rte_mempool_ops_register_memory_area(mp, vaddr, iova, len);
-	if (ret != -ENOTSUP && ret < 0)
-		return ret;
-
 	/* mempool is already populated */
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index ebfc95c..5f63f86 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -370,12 +370,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Notify new memory area to mempool.
- */
-typedef int (*rte_mempool_ops_register_memory_area_t)
-(const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * Calculate memory size required to store given number of objects.
  *
  * @param[in] mp
@@ -507,10 +501,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Notify new memory area to mempool
-	 */
-	rte_mempool_ops_register_memory_area_t register_memory_area;
-	/**
 	 * Optional callback to calculate memory size required to
 	 * store specified number of objects.
 	 */
@@ -632,27 +622,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops register_memory_area callback.
- * API to notify the mempool handler when a new memory area is added to pool.
- *
- * @param mp
- *   Pointer to the memory pool.
- * @param vaddr
- *   Pointer to the buffer virtual address.
- * @param iova
- *   Pointer to the buffer IO address.
- * @param len
- *   Pool size.
- * @return
- *   - 0: Success;
- *   - -ENOTSUP - doesn't support register_memory_area ops (valid error case).
- *   - Otherwise, rte_mempool_populate_phys fails thus pool create fails.
- */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
-				char *vaddr, rte_iova_t iova, size_t len);
-
-/**
  * @internal wrapper for mempool_ops calc_mem_size callback.
  * API to calculate size of memory required to store specified number of
  * object.
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 6ac669a..ea9be1e 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
 
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 }
 
 /* wrapper to notify new memory area to external mempool */
-int
-rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
-					rte_iova_t iova, size_t len)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->register_memory_area, -ENOTSUP);
-	return ops->register_memory_area(mp, vaddr, iova, len);
-}
-
-/* wrapper to notify new memory area to external mempool */
 ssize_t
 rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				uint32_t obj_num, uint32_t pg_shift,
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 42ca4df..f539a5a 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
 
-- 
2.7.4

^ permalink raw reply	[relevance 8%]
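
To illustrate the commit message above: a populate-style callback receives the
same vaddr, iova and len that register_memory_area used to receive, so a driver
can do its range bookkeeping there and then delegate to a default routine. The
following is a rough, self-contained model under that assumption. struct
toy_pool, toy_populate_default() and toy_driver_populate() are hypothetical
names, not the mempool API, and the object layout is simplified to
back-to-back elements.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_iova_t; /* hypothetical stand-in for rte_iova_t */

/* Hypothetical pool: element size plus what the driver has learned. */
struct toy_pool {
	size_t elt_size;
	unsigned int populated;
	/* Range bookkeeping that a register_memory_area hook would have done. */
	void *range_vaddr;
	toy_iova_t range_iova;
	size_t range_len;
};

/* Simplified default: lay objects out back to back in the given area. */
static int
toy_populate_default(struct toy_pool *mp, unsigned int max_objs,
		void *vaddr, toy_iova_t iova, size_t len)
{
	unsigned int n = 0;
	size_t off = 0;

	(void)vaddr;
	(void)iova;
	if (mp->elt_size == 0)
		return -EINVAL;

	while (n < max_objs && off + mp->elt_size <= len) {
		off += mp->elt_size;
		n++;
	}
	mp->populated += n;
	return (int)n;
}

/* Driver callback: record the area, then reuse the default layout. */
static int
toy_driver_populate(struct toy_pool *mp, unsigned int max_objs,
		void *vaddr, toy_iova_t iova, size_t len)
{
	assert(mp != NULL);

	mp->range_vaddr = vaddr;
	mp->range_iova = iova;
	mp->range_len = len;

	return toy_populate_default(mp, max_objs, vaddr, iova, len);
}
```

With this shape the generic code needs no separate notification step: the
driver sees each memory area exactly when objects are placed into it.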

* [dpdk-dev] [PATCH v1 4/9] mempool: deprecate xmem functions
  2018-03-10 15:39  3% ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
                     ` (2 preceding siblings ...)
  2018-03-10 15:39  6%   ` [dpdk-dev] [PATCH v1 3/9] mempool: remove callback to get capabilities Andrew Rybchenko
@ 2018-03-10 15:39  5%   ` Andrew Rybchenko
  2018-03-10 15:39  8%   ` [dpdk-dev] [PATCH v1 7/9] mempool: remove callback to register memory area Andrew Rybchenko
  2018-03-19 17:03  0%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Olivier Matz
  5 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

Move rte_mempool_xmem_size() code to internal helper function
since it is required in two places: deprecated rte_mempool_xmem_size()
and non-deprecated rte_mempool_op_calc_mem_size_default().

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - advertise deprecation in release notes
 - factor out default memory size calculation into non-deprecated
   internal function to avoid usage of deprecated function internally
 - remove test for deprecated functions to address build issue because
   of usage of deprecated functions (it is easy to allow usage of
   deprecated function in Makefile, but very complicated in meson)

 doc/guides/rel_notes/deprecation.rst         |  7 -------
 doc/guides/rel_notes/release_18_05.rst       | 10 +++++++++
 lib/librte_mempool/rte_mempool.c             | 19 ++++++++++++++---
 lib/librte_mempool/rte_mempool.h             | 25 ++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops_default.c |  4 ++--
 test/test/test_mempool.c                     | 31 ----------------------------
 6 files changed, 53 insertions(+), 43 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 4deed9a..473330d 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -60,13 +60,6 @@ Deprecation Notices
   - ``rte_eal_mbuf_default_mempool_ops``
 
 * mempool: several API and ABI changes are planned in v18.05.
-  The following functions, introduced for Xen, which is not supported
-  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
-  Therefore they will be deprecated in v18.05 and removed in v18.08:
-
-  - ``rte_mempool_xmem_create``
-  - ``rte_mempool_xmem_size``
-  - ``rte_mempool_xmem_usage``
 
   The following changes are planned:
 
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index c50f26c..0244f91 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -74,6 +74,16 @@ API Changes
   Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
   used to achieve it without specific knowledge in the generic code.
 
+* **Deprecated mempool xmem functions.**
+
+  The following functions, introduced for Xen, which is not supported
+  anymore since v17.11, are hard to use, not used anywhere else in DPDK.
+  Therefore they were deprecated in v18.05 and will be removed in v18.08:
+
+  - ``rte_mempool_xmem_create``
+  - ``rte_mempool_xmem_size``
+  - ``rte_mempool_xmem_usage``
+
 
 ABI Changes
 -----------
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index fdcda45..b57ba2a 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -204,11 +204,13 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
 
 
 /*
- * Calculate maximum amount of memory required to store given number of objects.
+ * Internal function to calculate required memory chunk size shared
+ * by default implementation of the corresponding callback and
+ * deprecated external function.
  */
 size_t
-rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      __rte_unused unsigned int flags)
+rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+				 uint32_t pg_shift)
 {
 	size_t obj_per_page, pg_num, pg_sz;
 
@@ -228,6 +230,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 }
 
 /*
+ * Calculate maximum amount of memory required to store given number of objects.
+ */
+size_t
+rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
+		      __rte_unused unsigned int flags)
+{
+	return rte_mempool_calc_mem_size_helper(elt_num, total_elt_sz,
+						pg_shift);
+}
+
+/*
  * Calculate how much memory would be actually required with the
  * given memory footprint to store required number of elements.
  */
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index cd3b229..ebfc95c 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -420,6 +420,28 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal Helper function to calculate memory size required to store
+ * specified number of objects in assumption that the memory buffer will
+ * be aligned at page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * @param elt_num
+ *   Number of elements.
+ * @param total_elt_sz
+ *   The size of each element, including header and trailer, as returned
+ *   by rte_mempool_calc_obj_size().
+ * @param pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+size_t rte_mempool_calc_mem_size_helper(uint32_t elt_num, size_t total_elt_sz,
+		uint32_t pg_shift);
+
+/**
  * Function to be called for each populated object.
  *
  * @param[in] mp
@@ -905,6 +927,7 @@ rte_mempool_create(const char *name, unsigned n, unsigned elt_size,
  *   The pointer to the new allocated mempool, on success. NULL on error
  *   with rte_errno set appropriately. See rte_mempool_create() for details.
  */
+__rte_deprecated
 struct rte_mempool *
 rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
 		unsigned cache_size, unsigned private_data_size,
@@ -1667,6 +1690,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * @return
  *   Required memory size aligned at page boundary.
  */
+__rte_deprecated
 size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
 	uint32_t pg_shift, unsigned int flags);
 
@@ -1698,6 +1722,7 @@ size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
  *   buffer is too small, return a negative value whose absolute value
  *   is the actual number of elements that can be stored in that buffer.
  */
+__rte_deprecated
 ssize_t rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
 	uint32_t pg_shift, unsigned int flags);
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 3defc15..fd63ca1 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -16,8 +16,8 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
-	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags);
+	mem_size = rte_mempool_calc_mem_size_helper(obj_num, total_elt_sz,
+						    pg_shift);
 
 	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
diff --git a/test/test/test_mempool.c b/test/test/test_mempool.c
index 63f921e..8d29af2 100644
--- a/test/test/test_mempool.c
+++ b/test/test/test_mempool.c
@@ -444,34 +444,6 @@ test_mempool_same_name_twice_creation(void)
 	return 0;
 }
 
-/*
- * Basic test for mempool_xmem functions.
- */
-static int
-test_mempool_xmem_misc(void)
-{
-	uint32_t elt_num, total_size;
-	size_t sz;
-	ssize_t usz;
-
-	elt_num = MAX_KEEP;
-	total_size = rte_mempool_calc_obj_size(MEMPOOL_ELT_SIZE, 0, NULL);
-	sz = rte_mempool_xmem_size(elt_num, total_size, MEMPOOL_PG_SHIFT_MAX,
-					0);
-
-	usz = rte_mempool_xmem_usage(NULL, elt_num, total_size, 0, 1,
-		MEMPOOL_PG_SHIFT_MAX, 0);
-
-	if (sz != (size_t)usz)  {
-		printf("failure @ %s: rte_mempool_xmem_usage(%u, %u) "
-			"returns: %#zx, while expected: %#zx;\n",
-			__func__, elt_num, total_size, sz, (size_t)usz);
-		return -1;
-	}
-
-	return 0;
-}
-
 static void
 walk_cb(struct rte_mempool *mp, void *userdata __rte_unused)
 {
@@ -596,9 +568,6 @@ test_mempool(void)
 	if (test_mempool_same_name_twice_creation() < 0)
 		goto err;
 
-	if (test_mempool_xmem_misc() < 0)
-		goto err;
-
 	/* test the stack handler */
 	if (test_mempool_basic(mp_stack, 1) < 0)
 		goto err;
-- 
2.7.4

^ permalink raw reply	[relevance 5%]
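
As a companion to the helper factored out in the patch above, here is a
self-contained sketch of the kind of page-aware size calculation its doc
comment describes. The patch shows only the early return for a zero element
size and the obj_per_page, pg_num and pg_sz locals; the remaining branches
below are an assumption based on the doc comment (page boundaries are ignored
when pg_shift is 0, and objects bigger than a page get whole pages).
toy_calc_mem_size() and toy_align_ceil() are hypothetical names, not the
library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round v up to a multiple of align (align must be non-zero). */
static size_t
toy_align_ceil(size_t v, size_t align)
{
	assert(align != 0);
	return ((v + align - 1) / align) * align;
}

/* Memory needed for elt_num objects of total_elt_sz bytes each. */
static size_t
toy_calc_mem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift)
{
	size_t obj_per_page, pg_num, pg_sz;

	if (total_elt_sz == 0)
		return 0;

	/* pg_shift == 0 means "ignore page boundaries". */
	if (pg_shift == 0)
		return total_elt_sz * elt_num;

	pg_sz = (size_t)1 << pg_shift;
	obj_per_page = pg_sz / total_elt_sz;

	/* Object larger than a page: give each object whole pages. */
	if (obj_per_page == 0)
		return toy_align_ceil(total_elt_sz, pg_sz) * elt_num;

	/* Otherwise pack whole objects per page and round up to pages. */
	pg_num = ((size_t)elt_num + obj_per_page - 1) / obj_per_page;
	return pg_num << pg_shift;
}
```

For example, with 4 KiB pages (pg_shift of 12) and 1000-byte elements, four
objects fit per page, so 100 objects need 25 pages.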

* [dpdk-dev] [PATCH v1 3/9] mempool: remove callback to get capabilities
  2018-03-10 15:39  3% ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-10 15:39  7%   ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
  2018-03-10 15:39  6%   ` [dpdk-dev] [PATCH v1 2/9] mempool: add op to populate objects using provided memory Andrew Rybchenko
@ 2018-03-10 15:39  6%   ` Andrew Rybchenko
  2018-03-10 15:39  5%   ` [dpdk-dev] [PATCH v1 4/9] mempool: deprecate xmem functions Andrew Rybchenko
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback was introduced to let generic code know the octeontx
mempool driver's requirements: a single physically contiguous
memory chunk to store all objects, and object addresses aligned to
the total object size. These requirements are now met by the new
callbacks that calculate the required memory chunk size and populate
objects using the provided memory chunk.

These capability flags are not used anywhere else.

Restricting capabilities to flags is not generic and likely to
be insufficient to describe mempool driver features. If required
in the future, API which returns structured information may be
added.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - squash mempool/octeontx patches to add calc_mem_size and populate
   callbacks to this one in order to avoid breakages in the middle of
   patchset
 - advertise API changes in release notes

 doc/guides/rel_notes/deprecation.rst            |  1 -
 doc/guides/rel_notes/release_18_05.rst          | 11 +++++
 drivers/mempool/octeontx/rte_mempool_octeontx.c | 59 +++++++++++++++++++++----
 lib/librte_mempool/rte_mempool.c                | 44 ++----------------
 lib/librte_mempool/rte_mempool.h                | 52 +---------------------
 lib/librte_mempool/rte_mempool_ops.c            | 14 ------
 lib/librte_mempool/rte_mempool_ops_default.c    | 15 +------
 lib/librte_mempool/rte_mempool_version.map      |  1 -
 8 files changed, 68 insertions(+), 129 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index c06fc67..4deed9a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -70,7 +70,6 @@ Deprecation Notices
 
   The following changes are planned:
 
-  - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
   - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index abaefe5..c50f26c 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -66,6 +66,14 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Removed mempool capability flags and related functions.**
+
+  Flags ``MEMPOOL_F_CAPA_PHYS_CONTIG`` and
+  ``MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS`` were used by octeontx mempool
+  driver to customize generic mempool library behaviour.
+  Now the new driver callbacks ``calc_mem_size`` and ``populate`` may be
+  used to achieve it without specific knowledge in the generic code.
+
 
 ABI Changes
 -----------
@@ -86,6 +94,9 @@ ABI Changes
   to allow to customize required memory size calculation.
   A new callback ``populate`` has been added to ``rte_mempool_ops``
   to allow to customize objects population.
+  Callback ``get_capabilities`` has been removed from ``rte_mempool_ops``
+  since its features are covered by ``calc_mem_size`` and ``populate``
+  callbacks.
 
 
 Removed Items
diff --git a/drivers/mempool/octeontx/rte_mempool_octeontx.c b/drivers/mempool/octeontx/rte_mempool_octeontx.c
index d143d05..f2c4f6a 100644
--- a/drivers/mempool/octeontx/rte_mempool_octeontx.c
+++ b/drivers/mempool/octeontx/rte_mempool_octeontx.c
@@ -126,14 +126,29 @@ octeontx_fpavf_get_count(const struct rte_mempool *mp)
 	return octeontx_fpa_bufpool_free_count(pool);
 }
 
-static int
-octeontx_fpavf_get_capabilities(const struct rte_mempool *mp,
-				unsigned int *flags)
+static ssize_t
+octeontx_fpavf_calc_mem_size(const struct rte_mempool *mp,
+			     uint32_t obj_num, uint32_t pg_shift,
+			     size_t *min_chunk_size, size_t *align)
 {
-	RTE_SET_USED(mp);
-	*flags |= (MEMPOOL_F_CAPA_PHYS_CONTIG |
-			MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS);
-	return 0;
+	ssize_t mem_size;
+
+	/*
+	 * Simply need space for one more object to be able to
+	 * fulfil alignment requirements.
+	 */
+	mem_size = rte_mempool_op_calc_mem_size_default(mp, obj_num + 1,
+							pg_shift,
+							min_chunk_size, align);
+	if (mem_size >= 0) {
+		/*
+		 * Memory area which contains objects must be physically
+		 * contiguous.
+		 */
+		*min_chunk_size = mem_size;
+	}
+
+	return mem_size;
 }
 
 static int
@@ -150,6 +165,33 @@ octeontx_fpavf_register_memory_area(const struct rte_mempool *mp,
 	return octeontx_fpavf_pool_set_range(pool_bar, len, vaddr, gpool);
 }
 
+static int
+octeontx_fpavf_populate(struct rte_mempool *mp, unsigned int max_objs,
+			void *vaddr, rte_iova_t iova, size_t len,
+			rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+
+	if (iova == RTE_BAD_IOVA)
+		return -EINVAL;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	/* align object start address to a multiple of total_elt_sz */
+	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
+
+	if (len < off)
+		return -EINVAL;
+
+	vaddr = (char *)vaddr + off;
+	iova += off;
+	len -= off;
+
+	return rte_mempool_op_populate_default(mp, max_objs, vaddr, iova, len,
+					       obj_cb, obj_cb_arg);
+}
+
 static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.name = "octeontx_fpavf",
 	.alloc = octeontx_fpavf_alloc,
@@ -157,8 +199,9 @@ static struct rte_mempool_ops octeontx_fpavf_ops = {
 	.enqueue = octeontx_fpavf_enqueue,
 	.dequeue = octeontx_fpavf_dequeue,
 	.get_count = octeontx_fpavf_get_count,
-	.get_capabilities = octeontx_fpavf_get_capabilities,
 	.register_memory_area = octeontx_fpavf_register_memory_area,
+	.calc_mem_size = octeontx_fpavf_calc_mem_size,
+	.populate = octeontx_fpavf_populate,
 };
 
 MEMPOOL_REGISTER_OPS(octeontx_fpavf_ops);
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index ed0e982..fdcda45 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -208,15 +208,9 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  */
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      unsigned int flags)
+		      __rte_unused unsigned int flags)
 {
 	size_t obj_per_page, pg_num, pg_sz;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	if (total_elt_sz == 0)
 		return 0;
@@ -240,18 +234,12 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 ssize_t
 rte_mempool_xmem_usage(__rte_unused void *vaddr, uint32_t elt_num,
 	size_t total_elt_sz, const rte_iova_t iova[], uint32_t pg_num,
-	uint32_t pg_shift, unsigned int flags)
+	uint32_t pg_shift, __rte_unused unsigned int flags)
 {
 	uint32_t elt_cnt = 0;
 	rte_iova_t start, end;
 	uint32_t iova_idx;
 	size_t pg_sz = (size_t)1 << pg_shift;
-	unsigned int mask;
-
-	mask = MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS | MEMPOOL_F_CAPA_PHYS_CONTIG;
-	if ((flags & mask) == mask)
-		/* alignment need one additional object */
-		elt_num += 1;
 
 	/* if iova is NULL, assume contiguous memory */
 	if (iova == NULL) {
@@ -330,8 +318,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	rte_iova_t iova, size_t len, rte_mempool_memchunk_free_cb_t *free_cb,
 	void *opaque)
 {
-	unsigned total_elt_sz;
-	unsigned int mp_capa_flags;
 	unsigned i = 0;
 	size_t off;
 	struct rte_mempool_memhdr *memhdr;
@@ -354,27 +340,6 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	if (mp->populated_size >= mp->size)
 		return -ENOSPC;
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-
-	/* Get mempool capabilities */
-	mp_capa_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_capa_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_capa_flags;
-
-	/* Detect pool area has sufficient space for elements */
-	if (mp_capa_flags & MEMPOOL_F_CAPA_PHYS_CONTIG) {
-		if (len < total_elt_sz * mp->size) {
-			RTE_LOG(ERR, MEMPOOL,
-				"pool area %" PRIx64 " not enough\n",
-				(uint64_t)len);
-			return -ENOSPC;
-		}
-	}
-
 	memhdr = rte_zmalloc("MEMPOOL_MEMHDR", sizeof(*memhdr), 0);
 	if (memhdr == NULL)
 		return -ENOMEM;
@@ -386,10 +351,7 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	memhdr->free_cb = free_cb;
 	memhdr->opaque = opaque;
 
-	if (mp_capa_flags & MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS)
-		/* align object start address to a multiple of total_elt_sz */
-		off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
-	else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
+	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
 		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 49083bd..cd3b229 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -245,24 +245,6 @@ struct rte_mempool {
 #define MEMPOOL_F_SC_GET         0x0008 /**< Default get is "single-consumer".*/
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
-/**
- * This capability flag is advertised by a mempool handler, if the whole
- * memory area containing the objects must be physically contiguous.
- * Note: This flag should not be passed by application.
- */
-#define MEMPOOL_F_CAPA_PHYS_CONTIG 0x0040
-/**
- * This capability flag is advertised by a mempool handler. Used for a case
- * where mempool driver wants object start address(vaddr) aligned to block
- * size(/ total element size).
- *
- * Note:
- * - This flag should not be passed by application.
- *   Flag used for mempool driver only.
- * - Mempool driver must also set MEMPOOL_F_CAPA_PHYS_CONTIG flag along with
- *   MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS.
- */
-#define MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS 0x0080
 
 /**
  * @internal When debug is enabled, store some statistics.
@@ -388,12 +370,6 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
 typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
 
 /**
- * Get the mempool capabilities.
- */
-typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
-		unsigned int *flags);
-
-/**
  * Notify new memory area to mempool.
  */
 typedef int (*rte_mempool_ops_register_memory_area_t)
@@ -433,13 +409,7 @@ typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
- * If mempool driver requires object addresses to be block size aligned
- * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
- * reserved to be able to meet the requirement.
- *
- * Minimum size of memory chunk is either all required space, if
- * capabilities say that whole memory area must be physically contiguous
- * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * Minimum size of memory chunk is a maximum of the page size and total
  * element size.
  *
  * Required memory chunk alignment is a maximum of page size and cache
@@ -515,10 +485,6 @@ struct rte_mempool_ops {
 	rte_mempool_dequeue_t dequeue;   /**< Dequeue an object. */
 	rte_mempool_get_count get_count; /**< Get qty of available objs. */
 	/**
-	 * Get the mempool capabilities
-	 */
-	rte_mempool_get_capabilities_t get_capabilities;
-	/**
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
@@ -644,22 +610,6 @@ unsigned
 rte_mempool_ops_get_count(const struct rte_mempool *mp);
 
 /**
- * @internal wrapper for mempool_ops get_capabilities callback.
- *
- * @param mp [in]
- *   Pointer to the memory pool.
- * @param flags [out]
- *   Pointer to the mempool flags.
- * @return
- *   - 0: Success; The mempool driver has advertised his pool capabilities in
- *   flags param.
- *   - -ENOTSUP - doesn't support get_capabilities ops (valid case).
- *   - Otherwise, pool create fails.
- */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags);
-/**
  * @internal wrapper for mempool_ops register_memory_area callback.
  * API to notify the mempool handler when a new memory area is added to pool.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 1a7f39f..6ac669a 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -57,7 +57,6 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->enqueue = h->enqueue;
 	ops->dequeue = h->dequeue;
 	ops->get_count = h->get_count;
-	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
 	ops->populate = h->populate;
@@ -99,19 +98,6 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
 	return ops->get_count(mp);
 }
 
-/* wrapper to get external mempool capabilities. */
-int
-rte_mempool_ops_get_capabilities(const struct rte_mempool *mp,
-					unsigned int *flags)
-{
-	struct rte_mempool_ops *ops;
-
-	ops = rte_mempool_get_ops(mp->ops_index);
-
-	RTE_FUNC_PTR_OR_ERR_RET(ops->get_capabilities, -ENOTSUP);
-	return ops->get_capabilities(mp, flags);
-}
-
 /* wrapper to notify new memory area to external mempool */
 int
 rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57295f7..3defc15 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -11,26 +11,15 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 				     uint32_t obj_num, uint32_t pg_shift,
 				     size_t *min_chunk_size, size_t *align)
 {
-	unsigned int mp_flags;
-	int ret;
 	size_t total_elt_sz;
 	size_t mem_size;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 
 	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
-					 mp->flags | mp_flags);
+					 mp->flags);
 
-	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
-		*min_chunk_size = mem_size;
-	else
-		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+	*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
 
 	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
 
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 90e79ec..42ca4df 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -45,7 +45,6 @@ DPDK_16.07 {
 DPDK_17.11 {
 	global:
 
-	rte_mempool_ops_get_capabilities;
 	rte_mempool_ops_register_memory_area;
 	rte_mempool_populate_iova;
 	rte_mempool_populate_iova_tab;
-- 
2.7.4

* [dpdk-dev] [PATCH v1 2/9] mempool: add op to populate objects using provided memory
  2018-03-10 15:39  3% ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
  2018-03-10 15:39  7%   ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
@ 2018-03-10 15:39  6%   ` Andrew Rybchenko
  2018-03-10 15:39  6%   ` [dpdk-dev] [PATCH v1 3/9] mempool: remove callback to get capabilities Andrew Rybchenko
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The callback allows customizing how objects are stored in the
memory chunk. A default implementation of the callback, which simply
places objects one after another, is available.
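
As a rough illustration of the "one by one" default described above, here
is a simplified, self-contained sketch. It is not the code from this patch:
struct sketch_pool, sketch_populate() and sketch_count_cb() are hypothetical
stand-ins for the mempool types, and IOVA handling as well as the enqueue
into the pool done by the real default callback are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct rte_mempool. */
struct sketch_pool {
	size_t header_size;
	size_t elt_size;
	size_t trailer_size;
};

/* Hypothetical per-object callback type (no IOVA in this sketch). */
typedef void (sketch_obj_cb_t)(struct sketch_pool *mp, void *opaque,
		void *obj);

/*
 * Slice the chunk [vaddr, vaddr + len) into objects placed back to back.
 * Returns the number of objects for which the callback was invoked.
 */
int
sketch_populate(struct sketch_pool *mp, unsigned int max_objs,
		void *vaddr, size_t len,
		sketch_obj_cb_t *obj_cb, void *obj_cb_arg)
{
	size_t total_elt_sz;
	size_t off = 0;
	unsigned int i;

	assert(obj_cb != NULL);

	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;

	for (i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
		/* the object itself starts right after its header */
		off += mp->header_size;
		obj_cb(mp, obj_cb_arg, (char *)vaddr + off);
		/* skip the element body and trailer to reach the next header */
		off += mp->elt_size + mp->trailer_size;
	}

	return (int)i;
}

/* Example callback: count the objects handed out via the opaque pointer. */
void
sketch_count_cb(struct sketch_pool *mp, void *opaque, void *obj)
{
	(void)mp;
	(void)obj;
	(*(unsigned int *)opaque)++;
}
```

A driver with a different layout (for example, objects grouped in buckets)
would supply its own slicing loop in place of the one above while keeping
the same per-object callback contract.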

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - advertise ABI changes in release notes
 - use consistent name for default callback:
   rte_mempool_op_<callback>_default()
 - add opaque data pointer to populated object callback
 - move default callback to dedicated file

 doc/guides/rel_notes/deprecation.rst         |  2 +-
 doc/guides/rel_notes/release_18_05.rst       |  2 +
 lib/librte_mempool/rte_mempool.c             | 23 +++----
 lib/librte_mempool/rte_mempool.h             | 90 ++++++++++++++++++++++++++++
 lib/librte_mempool/rte_mempool_ops.c         | 21 +++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 24 ++++++++
 lib/librte_mempool/rte_mempool_version.map   |  1 +
 7 files changed, 148 insertions(+), 15 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index e02d4ca..c06fc67 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,7 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize objects population and allocate contiguous
+  - addition of new op to allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 59583ea..abaefe5 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -84,6 +84,8 @@ ABI Changes
 
   A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
   to allow to customize required memory size calculation.
+  A new callback ``populate`` has been added to ``rte_mempool_ops``
+  to allow to customize objects population.
 
 
 Removed Items
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 3bfb36e..ed0e982 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -99,7 +99,8 @@ static unsigned optimize_object_size(unsigned obj_size)
 }
 
 static void
-mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
+mempool_add_elem(struct rte_mempool *mp, __rte_unused void *opaque,
+		 void *obj, rte_iova_t iova)
 {
 	struct rte_mempool_objhdr *hdr;
 	struct rte_mempool_objtlr *tlr __rte_unused;
@@ -116,9 +117,6 @@ mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
 	tlr = __mempool_get_trailer(obj);
 	tlr->cookie = RTE_MEMPOOL_TRAILER_COOKIE;
 #endif
-
-	/* enqueue in ring */
-	rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
 }
 
 /* call obj_cb() for each mempool element */
@@ -396,16 +394,13 @@ rte_mempool_populate_iova(struct rte_mempool *mp, char *vaddr,
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
 
-	while (off + total_elt_sz <= len && mp->populated_size < mp->size) {
-		off += mp->header_size;
-		if (iova == RTE_BAD_IOVA)
-			mempool_add_elem(mp, (char *)vaddr + off,
-				RTE_BAD_IOVA);
-		else
-			mempool_add_elem(mp, (char *)vaddr + off, iova + off);
-		off += mp->elt_size + mp->trailer_size;
-		i++;
-	}
+	if (off > len)
+		return -EINVAL;
+
+	i = rte_mempool_ops_populate(mp, mp->size - mp->populated_size,
+		(char *)vaddr + off,
+		(iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off),
+		len - off, mempool_add_elem, NULL);
 
 	/* not enough room to store one object */
 	if (i == 0)
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 0151f6c..49083bd 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -449,6 +449,63 @@ ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 		uint32_t obj_num, uint32_t pg_shift,
 		size_t *min_chunk_size, size_t *align);
 
+/**
+ * Function to be called for each populated object.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] opaque
+ *   An opaque pointer passed to iterator.
+ * @param[in] vaddr
+ *   Object virtual address.
+ * @param[in] iova
+ *   Input/output virtual address of the object or RTE_BAD_IOVA.
+ */
+typedef void (rte_mempool_populate_obj_cb_t)(struct rte_mempool *mp,
+		void *opaque, void *vaddr, rte_iova_t iova);
+
+/**
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * Populated objects should be enqueued to the pool, e.g. using
+ * rte_mempool_ops_enqueue_bulk().
+ *
+ * If the given IO address is unknown (iova = RTE_BAD_IOVA),
+ * the chunk doesn't need to be physically contiguous (only virtually),
+ * and allocated objects may span two pages.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+typedef int (*rte_mempool_populate_t)(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
+/**
+ * Default way to populate memory pool object using provided memory
+ * chunk: just slice objects one by one.
+ */
+int rte_mempool_op_populate_default(struct rte_mempool *mp,
+		unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -470,6 +527,11 @@ struct rte_mempool_ops {
 	 * store specified number of objects.
 	 */
 	rte_mempool_calc_mem_size_t calc_mem_size;
+	/**
+	 * Optional callback to populate mempool objects using
+	 * provided memory chunk.
+	 */
+	rte_mempool_populate_t populate;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -642,6 +704,34 @@ ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 				      size_t *min_chunk_size, size_t *align);
 
 /**
+ * @internal wrapper for mempool_ops populate callback.
+ *
+ * Populate memory pool objects using provided memory chunk.
+ *
+ * @param[in] mp
+ *   A pointer to the mempool structure.
+ * @param[in] max_objs
+ *   Maximum number of objects to be populated.
+ * @param[in] vaddr
+ *   The virtual address of memory that should be used to store objects.
+ * @param[in] iova
+ *   The IO address
+ * @param[in] len
+ *   The length of memory in bytes.
+ * @param[in] obj_cb
+ *   Callback function to be executed for each populated object.
+ * @param[in] obj_cb_arg
+ *   An opaque pointer passed to the callback function.
+ * @return
+ *   The number of objects added on success.
+ *   On error, no objects are populated and a negative errno is returned.
+ */
+int rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+			     void *vaddr, rte_iova_t iova, size_t len,
+			     rte_mempool_populate_obj_cb_t *obj_cb,
+			     void *obj_cb_arg);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 26908cc..1a7f39f 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -60,6 +60,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
 	ops->calc_mem_size = h->calc_mem_size;
+	ops->populate = h->populate;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -141,6 +142,26 @@ rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
 	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
 }
 
+/* wrapper to populate memory pool objects using provided memory chunk */
+int
+rte_mempool_ops_populate(struct rte_mempool *mp, unsigned int max_objs,
+				void *vaddr, rte_iova_t iova, size_t len,
+				rte_mempool_populate_obj_cb_t *obj_cb,
+				void *obj_cb_arg)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->populate == NULL)
+		return rte_mempool_op_populate_default(mp, max_objs, vaddr,
+						       iova, len, obj_cb,
+						       obj_cb_arg);
+
+	return ops->populate(mp, max_objs, vaddr, iova, len, obj_cb,
+			     obj_cb_arg);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
index 57fe79b..57295f7 100644
--- a/lib/librte_mempool/rte_mempool_ops_default.c
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -36,3 +36,27 @@ rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
 
 	return mem_size;
 }
+
+int
+rte_mempool_op_populate_default(struct rte_mempool *mp, unsigned int max_objs,
+		void *vaddr, rte_iova_t iova, size_t len,
+		rte_mempool_populate_obj_cb_t *obj_cb, void *obj_cb_arg)
+{
+	size_t total_elt_sz;
+	size_t off;
+	unsigned int i;
+	void *obj;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	for (off = 0, i = 0; off + total_elt_sz <= len && i < max_objs; i++) {
+		off += mp->header_size;
+		obj = (char *)vaddr + off;
+		obj_cb(mp, obj_cb_arg, obj,
+		       (iova == RTE_BAD_IOVA) ? RTE_BAD_IOVA : (iova + off));
+		rte_mempool_ops_enqueue_bulk(mp, &obj, 1);
+		off += mp->elt_size + mp->trailer_size;
+	}
+
+	return i;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index e2a054b..90e79ec 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -56,6 +56,7 @@ DPDK_18.05 {
 	global:
 
 	rte_mempool_op_calc_mem_size_default;
+	rte_mempool_op_populate_default;
 
 } DPDK_17.11;
 
-- 
2.7.4

* [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
  2018-03-10 15:39  3% ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
@ 2018-03-10 15:39  7%   ` Andrew Rybchenko
  2018-03-11 12:51  0%     ` santosh
  2018-03-19 17:03  0%     ` Olivier Matz
  2018-03-10 15:39  6%   ` [dpdk-dev] [PATCH v1 2/9] mempool: add op to populate objects using provided memory Andrew Rybchenko
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev; +Cc: Olivier MATZ

The size of the memory chunk required to populate mempool objects
depends on how objects are stored in memory. Different mempool drivers
may have different requirements, so a new operation allows calculating
the memory size in accordance with driver requirements and advertising
requirements on minimum memory chunk size and alignment in a generic
way.
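
To make the rules for the default calculation concrete, here is a
self-contained sketch of what the doc comment in this patch describes,
without the capability-flag handling. It is an approximation under stated
assumptions, not the library code: sketch_calc_mem_size() and
SKETCH_CACHE_LINE_SIZE are hypothetical names, the cache line size is
assumed to be 64 bytes, and the total element size is passed in directly
instead of being read from the mempool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h> /* ssize_t */

/* Assumption: stand-in for RTE_CACHE_LINE_SIZE. */
#define SKETCH_CACHE_LINE_SIZE 64

static inline size_t
sketch_max(size_t a, size_t b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical stand-in for the default calc_mem_size callback.
 * pg_shift == 0 means "ignore page boundaries".
 */
ssize_t
sketch_calc_mem_size(size_t total_elt_sz, uint32_t obj_num,
		     uint32_t pg_shift,
		     size_t *min_chunk_size, size_t *align)
{
	size_t pg_sz = (size_t)1 << pg_shift;
	size_t mem_size;

	assert(total_elt_sz > 0);

	if (pg_shift == 0) {
		/* plain product of object size and object count */
		mem_size = total_elt_sz * obj_num;
	} else {
		size_t obj_per_page = pg_sz / total_elt_sz;

		if (obj_per_page == 0) {
			/* object bigger than a page: whole pages per object */
			size_t pg_per_obj = (total_elt_sz + pg_sz - 1) / pg_sz;

			mem_size = pg_per_obj * pg_sz * obj_num;
		} else {
			/* pages needed so that no object crosses a page */
			size_t pg_num = (obj_num + obj_per_page - 1) /
					obj_per_page;

			mem_size = pg_num << pg_shift;
		}
	}

	/* outputs only: maximum of page size and total element size ... */
	*min_chunk_size = sketch_max(pg_sz, total_elt_sz);
	/* ... and maximum of cache line size and page size */
	*align = sketch_max(SKETCH_CACHE_LINE_SIZE, pg_sz);

	return (ssize_t)mem_size;
}
```

For instance, with 100-byte objects and 4 KiB pages, 40 objects fit per
page, so 100 objects need 3 pages in this sketch.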

Bump ABI version since the patch breaks it.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
---
RFCv2 -> v1:
 - move default calc_mem_size callback to rte_mempool_ops_default.c
 - add ABI changes to release notes
 - name default callback consistently: rte_mempool_op_<callback>_default()
 - bump ABI version since it is the first patch which breaks ABI
 - describe default callback behaviour in detail
 - avoid introduction of internal function to cope with deprecation
   (keep it to deprecation patch)
 - move cache-line or page boundary chunk alignment to default callback
 - highlight that min_chunk_size and align parameters are output only

 doc/guides/rel_notes/deprecation.rst         |  3 +-
 doc/guides/rel_notes/release_18_05.rst       |  7 ++-
 lib/librte_mempool/Makefile                  |  3 +-
 lib/librte_mempool/meson.build               |  5 +-
 lib/librte_mempool/rte_mempool.c             | 43 +++++++--------
 lib/librte_mempool/rte_mempool.h             | 80 +++++++++++++++++++++++++++-
 lib/librte_mempool/rte_mempool_ops.c         | 18 +++++++
 lib/librte_mempool/rte_mempool_ops_default.c | 38 +++++++++++++
 lib/librte_mempool/rte_mempool_version.map   |  8 +++
 9 files changed, 177 insertions(+), 28 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 6594585..e02d4ca 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -72,8 +72,7 @@ Deprecation Notices
 
   - removal of ``get_capabilities`` mempool ops and related flags.
   - substitute ``register_memory_area`` with ``populate`` ops.
-  - addition of new ops to customize required memory chunk calculation,
-    customize objects population and allocate contiguous
+  - addition of new ops to customize objects population and allocate contiguous
     block of objects if underlying driver supports it.
 
 * mbuf: The control mbuf API will be removed in v18.05. The impacted
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index f2525bb..59583ea 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -80,6 +80,11 @@ ABI Changes
    Also, make sure to start the actual text at the margin.
    =========================================================
 
+* **Changed rte_mempool_ops structure.**
+
+  A new callback ``calc_mem_size`` has been added to ``rte_mempool_ops``
+  to allow to customize required memory size calculation.
+
 
 Removed Items
 -------------
@@ -152,7 +157,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_latencystats.so.1
      librte_lpm.so.2
      librte_mbuf.so.3
-     librte_mempool.so.3
+   + librte_mempool.so.4
    + librte_meter.so.2
      librte_metrics.so.1
      librte_net.so.1
diff --git a/lib/librte_mempool/Makefile b/lib/librte_mempool/Makefile
index 24e735a..072740f 100644
--- a/lib/librte_mempool/Makefile
+++ b/lib/librte_mempool/Makefile
@@ -11,11 +11,12 @@ LDLIBS += -lrte_eal -lrte_ring
 
 EXPORT_MAP := rte_mempool_version.map
 
-LIBABIVER := 3
+LIBABIVER := 4
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool.c
 SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops.c
+SRCS-$(CONFIG_RTE_LIBRTE_MEMPOOL) +=  rte_mempool_ops_default.c
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_MEMPOOL)-include := rte_mempool.h
 
diff --git a/lib/librte_mempool/meson.build b/lib/librte_mempool/meson.build
index 7a4f3da..9e3b527 100644
--- a/lib/librte_mempool/meson.build
+++ b/lib/librte_mempool/meson.build
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
-version = 2
-sources = files('rte_mempool.c', 'rte_mempool_ops.c')
+version = 4
+sources = files('rte_mempool.c', 'rte_mempool_ops.c',
+		'rte_mempool_ops_default.c')
 headers = files('rte_mempool.h')
 deps += ['ring']
diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 54f7f4b..3bfb36e 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -544,39 +544,33 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 	unsigned int mz_flags = RTE_MEMZONE_1GB|RTE_MEMZONE_SIZE_HINT_ONLY;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
-	size_t size, total_elt_sz, align, pg_sz, pg_shift;
+	ssize_t mem_size;
+	size_t align, pg_sz, pg_shift;
 	rte_iova_t iova;
 	unsigned mz_id, n;
-	unsigned int mp_flags;
 	int ret;
 
 	/* mempool must not be populated */
 	if (mp->nb_mem_chunks != 0)
 		return -EEXIST;
 
-	/* Get mempool capabilities */
-	mp_flags = 0;
-	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
-	if ((ret < 0) && (ret != -ENOTSUP))
-		return ret;
-
-	/* update mempool capabilities */
-	mp->flags |= mp_flags;
-
 	if (rte_eal_has_hugepages()) {
 		pg_shift = 0; /* not needed, zone is physically contiguous */
 		pg_sz = 0;
-		align = RTE_CACHE_LINE_SIZE;
 	} else {
 		pg_sz = getpagesize();
 		pg_shift = rte_bsf32(pg_sz);
-		align = pg_sz;
 	}
 
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
 	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
-		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift,
-						mp->flags);
+		size_t min_chunk_size;
+
+		mem_size = rte_mempool_ops_calc_mem_size(mp, n, pg_shift,
+				&min_chunk_size, &align);
+		if (mem_size < 0) {
+			ret = mem_size;
+			goto fail;
+		}
 
 		ret = snprintf(mz_name, sizeof(mz_name),
 			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
@@ -585,7 +579,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
-		mz = rte_memzone_reserve_aligned(mz_name, size,
+		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
 			mp->socket_id, mz_flags, align);
 		/* not enough memory, retry with the biggest zone we have */
 		if (mz == NULL)
@@ -596,6 +590,12 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 			goto fail;
 		}
 
+		if (mz->len < min_chunk_size) {
+			rte_memzone_free(mz);
+			ret = -ENOMEM;
+			goto fail;
+		}
+
 		if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
 			iova = RTE_BAD_IOVA;
 		else
@@ -628,13 +628,14 @@ rte_mempool_populate_default(struct rte_mempool *mp)
 static size_t
 get_anon_size(const struct rte_mempool *mp)
 {
-	size_t size, total_elt_sz, pg_sz, pg_shift;
+	size_t size, pg_sz, pg_shift;
+	size_t min_chunk_size;
+	size_t align;
 
 	pg_sz = getpagesize();
 	pg_shift = rte_bsf32(pg_sz);
-	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
-	size = rte_mempool_xmem_size(mp->size, total_elt_sz, pg_shift,
-					mp->flags);
+	size = rte_mempool_ops_calc_mem_size(mp, mp->size, pg_shift,
+					     &min_chunk_size, &align);
 
 	return size;
 }
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8b1b7f7..0151f6c 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -399,6 +399,56 @@ typedef int (*rte_mempool_get_capabilities_t)(const struct rte_mempool *mp,
 typedef int (*rte_mempool_ops_register_memory_area_t)
 (const struct rte_mempool *mp, char *vaddr, rte_iova_t iova, size_t len);
 
+/**
+ * Calculate memory size required to store given number of objects.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location with required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
+		uint32_t obj_num,  uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
+/**
+ * Default way to calculate memory size required to store given number of
+ * objects.
+ *
+ * If page boundaries may be ignored, it is just a product of total
+ * object size including header and trailer and number of objects.
+ * Otherwise, it is a number of pages required to store given number of
+ * objects without crossing page boundary.
+ *
+ * Note that if object size is bigger than page size, then it assumes
+ * that pages are grouped in subsets of physically continuous pages big
+ * enough to store at least one object.
+ *
+ * If mempool driver requires object addresses to be block size aligned
+ * (MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS), space for one extra element is
+ * reserved to be able to meet the requirement.
+ *
+ * Minimum size of memory chunk is either all required space, if
+ * capabilities say that whole memory area must be physically contiguous
+ * (MEMPOOL_F_CAPA_PHYS_CONTIG), or a maximum of the page size and total
+ * element size.
+ *
+ * Required memory chunk alignment is a maximum of page size and cache
+ * line size.
+ */
+ssize_t rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+		uint32_t obj_num, uint32_t pg_shift,
+		size_t *min_chunk_size, size_t *align);
+
 /** Structure defining mempool operations structure */
 struct rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
@@ -415,6 +465,11 @@ struct rte_mempool_ops {
 	 * Notify new memory area to mempool
 	 */
 	rte_mempool_ops_register_memory_area_t register_memory_area;
+	/**
+	 * Optional callback to calculate memory size required to
+	 * store specified number of objects.
+	 */
+	rte_mempool_calc_mem_size_t calc_mem_size;
 } __rte_cache_aligned;
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
@@ -564,6 +619,29 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp,
 				char *vaddr, rte_iova_t iova, size_t len);
 
 /**
+ * @internal wrapper for mempool_ops calc_mem_size callback.
+ * API to calculate size of memory required to store specified number of
+ * object.
+ *
+ * @param[in] mp
+ *   Pointer to the memory pool.
+ * @param[in] obj_num
+ *   Number of objects.
+ * @param[in] pg_shift
+ *   LOG2 of the physical pages size. If set to 0, ignore page boundaries.
+ * @param[out] min_chunk_size
+ *   Location for minimum size of the memory chunk which may be used to
+ *   store memory pool objects.
+ * @param[out] align
+ *   Location with required memory chunk alignment.
+ * @return
+ *   Required memory size aligned at page boundary.
+ */
+ssize_t rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				      uint32_t obj_num, uint32_t pg_shift,
+				      size_t *min_chunk_size, size_t *align);
+
+/**
  * @internal wrapper for mempool_ops free callback.
  *
  * @param mp
@@ -1533,7 +1611,7 @@ uint32_t rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  * of objects. Assume that the memory buffer will be aligned at page
  * boundary.
  *
- * Note that if object size is bigger then page size, then it assumes
+ * Note that if object size is bigger than page size, then it assumes
  * that pages are grouped in subsets of physically continuous pages big
  * enough to store at least one object.
  *
diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
index 0732255..26908cc 100644
--- a/lib/librte_mempool/rte_mempool_ops.c
+++ b/lib/librte_mempool/rte_mempool_ops.c
@@ -59,6 +59,7 @@ rte_mempool_register_ops(const struct rte_mempool_ops *h)
 	ops->get_count = h->get_count;
 	ops->get_capabilities = h->get_capabilities;
 	ops->register_memory_area = h->register_memory_area;
+	ops->calc_mem_size = h->calc_mem_size;
 
 	rte_spinlock_unlock(&rte_mempool_ops_table.sl);
 
@@ -123,6 +124,23 @@ rte_mempool_ops_register_memory_area(const struct rte_mempool *mp, char *vaddr,
 	return ops->register_memory_area(mp, vaddr, iova, len);
 }
 
+/* wrapper to calculate memory size required to store mempool objects */
+ssize_t
+rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
+				uint32_t obj_num, uint32_t pg_shift,
+				size_t *min_chunk_size, size_t *align)
+{
+	struct rte_mempool_ops *ops;
+
+	ops = rte_mempool_get_ops(mp->ops_index);
+
+	if (ops->calc_mem_size == NULL)
+		return rte_mempool_op_calc_mem_size_default(mp, obj_num,
+				pg_shift, min_chunk_size, align);
+
+	return ops->calc_mem_size(mp, obj_num, pg_shift, min_chunk_size, align);
+}
+
 /* sets mempool ops previously registered by rte_mempool_register_ops. */
 int
 rte_mempool_set_ops_byname(struct rte_mempool *mp, const char *name,
diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c
new file mode 100644
index 0000000..57fe79b
--- /dev/null
+++ b/lib/librte_mempool/rte_mempool_ops_default.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016 6WIND S.A.
+ * Copyright(c) 2018 Solarflare Communications Inc.
+ */
+
+#include <rte_mempool.h>
+
+ssize_t
+rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp,
+				     uint32_t obj_num, uint32_t pg_shift,
+				     size_t *min_chunk_size, size_t *align)
+{
+	unsigned int mp_flags;
+	int ret;
+	size_t total_elt_sz;
+	size_t mem_size;
+
+	/* Get mempool capabilities */
+	mp_flags = 0;
+	ret = rte_mempool_ops_get_capabilities(mp, &mp_flags);
+	if ((ret < 0) && (ret != -ENOTSUP))
+		return ret;
+
+	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
+	mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift,
+					 mp->flags | mp_flags);
+
+	if (mp_flags & MEMPOOL_F_CAPA_PHYS_CONTIG)
+		*min_chunk_size = mem_size;
+	else
+		*min_chunk_size = RTE_MAX((size_t)1 << pg_shift, total_elt_sz);
+
+	*align = RTE_MAX((size_t)RTE_CACHE_LINE_SIZE, (size_t)1 << pg_shift);
+
+	return mem_size;
+}
diff --git a/lib/librte_mempool/rte_mempool_version.map b/lib/librte_mempool/rte_mempool_version.map
index 62b76f9..e2a054b 100644
--- a/lib/librte_mempool/rte_mempool_version.map
+++ b/lib/librte_mempool/rte_mempool_version.map
@@ -51,3 +51,11 @@ DPDK_17.11 {
 	rte_mempool_populate_iova_tab;
 
 } DPDK_16.07;
+
+DPDK_18.05 {
+	global:
+
+	rte_mempool_op_calc_mem_size_default;
+
+} DPDK_17.11;
+
-- 
2.7.4

* [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver
  @ 2018-03-10 15:39  3% ` Andrew Rybchenko
  2018-03-10 15:39  7%   ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
                     ` (5 more replies)
  2018-03-25 16:20  2% ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
                   ` (2 subsequent siblings)
  3 siblings, 6 replies; 200+ results
From: Andrew Rybchenko @ 2018-03-10 15:39 UTC (permalink / raw)
  To: dev
  Cc: Olivier MATZ, Santosh Shukla, Jerin Jacob, Hemant Agrawal,
	Shreyansh Jain

The initial patch series [1] is split into two to simplify processing.
The second series relies on this one and will add the bucket mempool
driver and related ops.

The patch series contains generic enhancements suggested by Olivier.
Basically, it adds driver callbacks to calculate the required memory size
and to populate objects using the provided memory area. This makes it
possible to remove the so-called capability flags previously used to tell
generic code how to allocate memory and slice it into mempool objects.
The clean-up which removes get_capabilities and register_memory_area is
not strictly required, but I think it is the right thing to do.
Existing mempool drivers are updated.
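
The idea of replacing capability flags with optional driver callbacks can
be pictured with the small sketch below. It is only an illustration with
hypothetical names (struct sketch_ops, sketch_calc_default(),
sketch_calc_extra_elt(), sketch_ops_calc()), and the size formulas are
deliberately trivial: a driver that leaves the callback NULL gets the
generic behaviour, while a driver with special needs (here, room for one
extra element) expresses them in code rather than through a flag that
generic code must interpret.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: memory needed for n objects of elt_sz bytes. */
typedef size_t (*sketch_calc_t)(size_t elt_sz, unsigned int n);

/* Hypothetical, reduced ops table with a single optional callback. */
struct sketch_ops {
	const char *name;
	sketch_calc_t calc_mem_size; /* optional, NULL selects the default */
};

/* Generic behaviour used when the driver provides no callback. */
size_t
sketch_calc_default(size_t elt_sz, unsigned int n)
{
	return elt_sz * n;
}

/* Driver-specific behaviour: reserve space for one extra element. */
size_t
sketch_calc_extra_elt(size_t elt_sz, unsigned int n)
{
	return elt_sz * ((size_t)n + 1);
}

/* Wrapper: dispatch to the driver callback or fall back to the default. */
size_t
sketch_ops_calc(const struct sketch_ops *ops, size_t elt_sz, unsigned int n)
{
	assert(ops != NULL);

	if (ops->calc_mem_size == NULL)
		return sketch_calc_default(elt_sz, n);

	return ops->calc_mem_size(elt_sz, n);
}
```

With this shape, generic code no longer needs to know why a driver wants
more memory; it only asks the driver how much.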

I've kept rte_mempool_populate_iova_tab() intact since it does not seem
to be directly related to the XMEM API functions.

It breaks the ABI since it changes rte_mempool_ops. It also removes
rte_mempool_ops_register_memory_area() and
rte_mempool_ops_get_capabilities() since the corresponding callbacks are
removed.

Internal global functions are not listed in the map file since they are
not part of the external API.

[1] http://dpdk.org/ml/archives/dev/2018-January/088698.html

RFCv1 -> RFCv2:
  - add driver ops to calculate required memory size and populate
    mempool objects, remove extra flags which were required before
    to control it
  - transition of octeontx and dpaa drivers to the new callbacks
  - change info API to get from the driver the information an API
    user needs to know the contiguous block size
  - remove get_capabilities (no longer required and may be replaced
    by additional fields in the info get API)
  - remove register_memory_area since it is substituted with
    populate callback which can do more
  - use SPDX tags
  - avoid affinity of all objects to a single lcore
  - fix bucket get_count
  - deprecate XMEM API
  - avoid introduction of a new function to flush cache
  - fix NO_CACHE_ALIGN case in bucket mempool

RFCv2 -> v1:
  - split the series in two
  - squash octeontx patches which implement calc_mem_size and populate
    callbacks into the patch which removes get_capabilities since it is
    the easiest way to untangle the tangle of tightly related library
    functions and flags advertised by the driver
  - consistently name default callbacks
  - move default callbacks to dedicated file
  - see detailed description in patches

Andrew Rybchenko (7):
  mempool: add op to calculate memory size to be allocated
  mempool: add op to populate objects using provided memory
  mempool: remove callback to get capabilities
  mempool: deprecate xmem functions
  mempool/octeontx: prepare to remove register memory area op
  mempool/dpaa: prepare to remove register memory area op
  mempool: remove callback to register memory area

Artem V. Andreev (2):
  mempool: ensure the mempool is initialized before populating
  mempool: support flushing the default cache of the mempool

 doc/guides/rel_notes/deprecation.rst            |  12 +-
 doc/guides/rel_notes/release_18_05.rst          |  32 ++-
 drivers/mempool/dpaa/dpaa_mempool.c             |  13 +-
 drivers/mempool/octeontx/rte_mempool_octeontx.c |  64 ++++--
 lib/librte_mempool/Makefile                     |   3 +-
 lib/librte_mempool/meson.build                  |   5 +-
 lib/librte_mempool/rte_mempool.c                | 159 +++++++--------
 lib/librte_mempool/rte_mempool.h                | 260 +++++++++++++++++-------
 lib/librte_mempool/rte_mempool_ops.c            |  37 ++--
 lib/librte_mempool/rte_mempool_ops_default.c    |  51 +++++
 lib/librte_mempool/rte_mempool_version.map      |  11 +-
 test/test/test_mempool.c                        |  31 ---
 12 files changed, 437 insertions(+), 241 deletions(-)
 create mode 100644 lib/librte_mempool/rte_mempool_ops_default.c

-- 
2.7.4


* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
  2018-03-09 15:45  0%         ` Ferruh Yigit
@ 2018-03-09 19:06  0%           ` Neil Horman
  2018-03-20 15:51  0%             ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-03-09 19:06 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On Fri, Mar 09, 2018 at 03:45:49PM +0000, Ferruh Yigit wrote:
> On 3/9/2018 3:16 PM, Neil Horman wrote:
> > On Fri, Mar 09, 2018 at 01:00:35PM +0000, Ferruh Yigit wrote:
> >> On 3/9/2018 12:36 PM, Neil Horman wrote:
> >>> On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
> >>>> "struct rte_eth_rxtx_callback" is defined as internal data structure and
> >>>> used as named opaque type.
> >>>>
> >>>> So the functions that are adding callbacks can return objects in this
> >>>> type instead of void pointer.
> >>>>
> >>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >>>> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> >>>> ---
> >>>> v2:
> >>>> * keep using struct * in parameters, instead add callback functions
> >>>> return struct rte_eth_rxtx_callback pointer.
> >>>>
> >>>> v4:
> >>>> * Remove deprecation notice. LIBABIVER already increased in this release
> >>>> ---
> >>>>  doc/guides/rel_notes/deprecation.rst |  7 -------
> >>>>  lib/librte_ether/rte_ethdev.c        |  6 +++---
> >>>>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
> >>>>  3 files changed, 11 insertions(+), 15 deletions(-)
> >>>>
> >>> This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
> >>> internal data structure, then it shouldn't be used as part of the prototype for
> >>> an exported function, as the structure will then no longer be a internal data
> >>> structure, but rather part of the public ABI.
> >>
> >> "struct rte_eth_rxtx_callback" is internal data structure. And application
> >> should not access elements of this structure.
> >>
> >> "struct rte_eth_rxtx_callback;" is defined in the public header, so applications
> >> can use it as opaque type.
> >>
> >> It is possible that both "add" and "remove" APIs use "void *" and API itself can
> >> cast it. But the inconsistency was "add" related APIs return "void *" and
> >> "remove" related APIs require a parameter in "struct rte_eth_rxtx_callback *" type.
> >>
> >> While unifying the usage, "struct rte_eth_rxtx_callback *" preferred against
> >> "void *", because named opaque type documents intention/usage better.
> >>
> >> Thanks,
> >> ferruh
> >>
> > I get what you're saying about rte_eth_rxtx_callback being an internals
> > structure (or its intent is to be an internal structure), but it doesn't seem to
> > hold up to the header file layout.  rte_eth_rxtx_callback is defined in
> > rte_ethdev_core.h which according to the makefile, is listed as a symlinked
> > file, and therefore available for external applications to include.  This
> > negates the intended opaque nature of the struct.  I think before you do this,
> > you want to rectify that.
> 
> Intention is to make "struct rte_eth_rxtx_callback" internal, but as you said it
> is available to applications. This is same for all data structures in
> rte_ethdev_core.h
> 
Well...yes.  That's what I said.

> Unfortunately it can't be actual internal because of inline functions in public
> header uses them. And we can't change inline functions because of performance
> concerns.
> 
I'm sorry, that's not ok with me.  Just declaring a data structure to be
internal-only without enforcing that is asking for applications to mangle
internal data, and there's no reason it can't be fixed (and done without
sacrificing performance).

> Since we can't make the structure real internal, we can't really prevent
> applications to access the internals, this same if you use "void *".
> 
Just typedef a void pointer to some rte_ethdev_cb_handle_t type and pass that
back and forth instead.  That at least hides the fact that you are using a
non-opaque structure from user applications without some intentional casting.
You can further lock the call down by declaring the handles const so that no
one tries to dereference or modify them without generating a warning.
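
A minimal sketch of the handle idea, for illustration only.  None of the
names below (cb_handle_t, cb_add, cb_remove, struct cb_entry) exist in
ethdev; they are hypothetical stand-ins showing how a const void pointer
typedef keeps the real structure out of the application's reach unless it
casts deliberately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Public view: applications only ever hold this handle (hypothetical name). */
typedef const void *cb_handle_t;

/* Private view: the real layout, known only to the library. */
struct cb_entry {
	int (*fn)(int);
	struct cb_entry *next;
};

static struct cb_entry *cb_list;

/* Register a callback; returns an opaque handle, or NULL on failure. */
cb_handle_t
cb_add(int (*fn)(int))
{
	struct cb_entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return NULL;
	e->fn = fn;
	e->next = cb_list;
	cb_list = e;
	return e;
}

/* Unregister by handle; returns 0 on success, -1 if it is not registered. */
int
cb_remove(cb_handle_t h)
{
	struct cb_entry **pp;

	for (pp = &cb_list; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == h) {
			struct cb_entry *e = *pp;

			*pp = e->next;
			free(e);
			return 0;
		}
	}
	return -1;
}

/* Number of registered callbacks. */
size_t
cb_count(void)
{
	const struct cb_entry *e;
	size_t n = 0;

	for (e = cb_list; e != NULL; e = e->next)
		n++;
	return n;
}

/* Trivial callback used only to exercise the sketch. */
int
example_cb(int x)
{
	return x + 1;
}
```

Because the handle is a const void pointer, application code cannot
dereference it at all, and casting it back to a writable structure pointer
takes an explicit cast that discards the const qualifier.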

Neil

> > 
> > Neil
> > 
> >>>
> >>> Neil
> >>>
> >>
> >>
> 
> 


* [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework
  @ 2018-03-09 16:42  2% ` Konstantin Ananyev
  2018-03-30 17:32  2% ` [dpdk-dev] [PATCH v2 2/7] " Konstantin Ananyev
  1 sibling, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-03-09 16:42 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space DPDK-based applications.
It supports a basic set of features from the eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).

Features not currently supported:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb

It also adds a dependency on libelf.
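
To give a feel for what "execute eBPF bytecode" means here, below is a toy
switch-based interpreter in the same spirit as the executor in this patch.
It is a sketch under stated assumptions, not the library code: struct
toy_insn is a simplified, hypothetical instruction layout (no bit-fields,
no offset field), only four opcodes are handled, and the opcode values are
modeled on the usual eBPF encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values modeled on eBPF encodings (class | operation | source). */
enum {
	TOY_ADD64_IMM = 0x07,	/* dst += imm */
	TOY_ADD64_REG = 0x0f,	/* dst += src */
	TOY_MOV64_IMM = 0xb7,	/* dst = imm */
	TOY_EXIT = 0x95,	/* return r0 */
};

#define TOY_NB_REG 11

/* Hypothetical, simplified instruction; not struct bpf_insn. */
struct toy_insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int32_t imm;
};

/* Run the program with ctx in r1; returns r0, or 0 on any error. */
uint64_t
toy_exec(const struct toy_insn *ins, size_t nb_ins, uint64_t ctx)
{
	uint64_t reg[TOY_NB_REG] = {0};
	size_t pc;

	assert(ins != NULL);
	reg[1] = ctx;

	for (pc = 0; pc < nb_ins; pc++) {
		const struct toy_insn *i = &ins[pc];

		/* reject out-of-range register numbers */
		if (i->dst_reg >= TOY_NB_REG || i->src_reg >= TOY_NB_REG)
			return 0;

		switch (i->code) {
		case TOY_MOV64_IMM:
			/* the 32-bit immediate is sign-extended to 64 bits */
			reg[i->dst_reg] = (uint64_t)(int64_t)i->imm;
			break;
		case TOY_ADD64_IMM:
			reg[i->dst_reg] += (uint64_t)(int64_t)i->imm;
			break;
		case TOY_ADD64_REG:
			reg[i->dst_reg] += reg[i->src_reg];
			break;
		case TOY_EXIT:
			return reg[0];
		default:
			/* unknown opcode */
			return 0;
		}
	}

	/* ran off the end of the program without an exit instruction */
	return 0;
}

/* r0 = 40; r0 += 2; r0 += r1; exit */
uint64_t
toy_example(uint64_t ctx)
{
	static const struct toy_insn prog[] = {
		{ TOY_MOV64_IMM, 0, 0, 40 },
		{ TOY_ADD64_IMM, 0, 0, 2 },
		{ TOY_ADD64_REG, 0, 1, 0 },
		{ TOY_EXIT, 0, 0, 0 },
	};

	return toy_exec(prog, sizeof(prog) / sizeof(prog[0]), ctx);
}
```

Unlike this toy, the executor in the patch relies on a separate validation
step at load time and does not bound its loop by an instruction count.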

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                 |   5 +
 config/common_linuxapp             |   1 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  48 ++++
 lib/librte_bpf/bpf_exec.c          | 452 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  37 +++
 lib/librte_bpf/bpf_load.c          | 380 +++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/rte_bpf.h           | 158 +++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 mk/rte.app.mk                      |   2 +
 12 files changed, 1182 insertions(+)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index ad03cf433..2205b684f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -823,3 +823,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ff98f2355..7b4a0ce7d 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -10,6 +10,7 @@ CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=y
 CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
+CONFIG_RTE_LIBRTE_BPF=y
 CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..e0f434e77
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..4727d2251
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+
+	if (rc != 0)
+		RTE_LOG(WARNING, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..f1c1d3be3
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,452 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define BPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define BPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_LOG(ERR, USER1, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG])
+{
+	const struct bpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3],
+				reg[BPF_REG_4], reg[BPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			return reg[BPF_REG_0];
+		default:
+			RTE_LOG(ERR, USER1,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[MAX_BPF_REG];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[BPF_REG_1] = (uintptr_t)ctx[i];
+		reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..f09417088
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..6ced9c640
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,380 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+static uint32_t
+bpf_find_xsym(const char *sn, enum rte_bpf_xtype type,
+	const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == type && strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+/*
+ * update BPF code at offset *ofs* with a proper address(index) for external
+ * symbol *sn*
+ */
+static int
+resolve_xsym(const char *sn, size_t ofs, struct bpf_insn *ins, size_t ins_sz,
+	const struct rte_bpf_prm *prm)
+{
+	uint32_t idx, fidx;
+	enum rte_bpf_xtype type;
+
+	if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+		return -EINVAL;
+
+	idx = ofs / sizeof(ins[0]);
+	if (ins[idx].code == (BPF_JMP | BPF_CALL))
+		type = RTE_BPF_XTYPE_FUNC;
+	else if (ins[idx].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			ofs < ins_sz - sizeof(ins[idx]))
+		type = RTE_BPF_XTYPE_VAR;
+	else
+		return -EINVAL;
+
+	fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym);
+	if (fidx == UINT32_MAX)
+		return -ENOENT;
+
+	/* for function we just need an index in our xsym table */
+	if (type == RTE_BPF_XTYPE_FUNC)
+		ins[idx].imm = fidx;
+	/* for variable we need to store its absolute address */
+	else {
+		ins[idx].imm = (uintptr_t)prm->xsym[fidx].var;
+		ins[idx + 1].imm = (uintptr_t)prm->xsym[fidx].var >> 32;
+	}
+
+	return 0;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr * eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_LOG(ERR, USER1, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct bpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	int32_t rc;
+	uint32_t i, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		rc = resolve_xsym(sn, ofs, ins, ins_sz, prm);
+		if (rc != 0) {
+			RTE_LOG(ERR, USER1,
+				"resolve_xsym(%s, %zu) error code: %d\n",
+				sn, ofs, rc);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct bpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_LOG(ERR, USER1, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_LOG(ERR, USER1,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_LOG(INFO, USER1, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p(jit={.func=%p,.sz=%zu});\n",
+		__func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..7c1267cbd
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy one for now, need more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct bpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == BPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_LOG(ERR, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..efee35ad4
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_VAR, /**< variable */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	union {
+		uint64_t (*func)(uint64_t, uint64_t, uint64_t,
+				uint64_t, uint64_t);
+		void *var;
+	}; /**< value */
+};
+
+/**
+ * Possible BPF program types.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about eBPF code compiled into the native ISA.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *);
+	size_t sz;
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *  Pathname for an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compiled code for given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3eb41d176..fb41c77d2 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6
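
For readers following the stack accounting in bpf_validate() above, here is
a minimal standalone sketch of the same idea. It assumes a simplified,
hypothetical instruction record (struct demo_store) and a hypothetical limit
(DEMO_MAX_STACK); neither is part of librte_bpf. Every store through the
frame pointer has its offset extended by sizeof(uint64_t), the lowest such
value is tracked, and its negation becomes the required stack size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_STACK 512	/* hypothetical limit, for illustration */

/* Hypothetical, simplified view of one instruction. */
struct demo_store {
	int is_stack_store;	/* non-zero: store relative to frame pointer */
	int16_t off;		/* signed offset from the frame pointer */
};

/*
 * Scan all instructions and report the stack size they need.
 * Returns 0 and fills *out on success, -ERANGE if the limit is exceeded.
 */
int
demo_stack_size(const struct demo_store *st, size_t n, int32_t *out)
{
	int32_t lowest, ofs;
	size_t i;

	assert(out != NULL);

	lowest = 0;
	for (i = 0; i != n; i++) {
		if (!st[i].is_stack_store)
			continue;
		/* same adjustment as in the patch: offset minus 8 bytes */
		ofs = st[i].off - (int32_t)sizeof(uint64_t);
		if (ofs < lowest)
			lowest = ofs;
	}

	if (-lowest > DEMO_MAX_STACK)
		return -ERANGE;

	*out = -lowest;
	return 0;
}
```

With stores at offsets -8 and -24 this reports 32 bytes; a single store at
-512 is rejected, because the adjusted value (520) exceeds the assumed limit.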

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  2018-03-08 14:38  0%       ` Burakov, Anatoly
@ 2018-03-09 16:32  0%         ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2018-03-09 16:32 UTC (permalink / raw)
  To: Burakov, Anatoly; +Cc: dev

On Thu, Mar 08, 2018 at 02:38:37PM +0000, Burakov, Anatoly wrote:
> On 08-Mar-18 12:12 PM, Bruce Richardson wrote:
> > On Wed, Feb 07, 2018 at 09:58:36AM +0000, Anatoly Burakov wrote:
> > > During lcore scan, find maximum socket ID and store it. This will
> > > break the ABI, so bump ABI version.
> > > 
> > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > ---
> > > 
> > > Notes:
> > >      v4:
> > >      - Remove backwards ABI compatibility, bump ABI instead
> > >      v3:
> > >      - Added ABI compatibility
> > >      v2:
> > >      - checkpatch changes
> > >      - check socket before deciding if the core is not to be used
> > > 
> > >   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
> > >   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
> > >   lib/librte_eal/common/include/rte_eal.h   |  1 +
> > >   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
> > >   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
> > >   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
> > >   6 files changed, 44 insertions(+), 15 deletions(-)
> > > 
> > Breaking the ABI is the best way to implement this change, and given the
> > deprecation was previously announced I'm ok with that.
> > 
> > Question: we are ok assuming that the socket numbers are sequential, or
> > nearly so, and knowing the maximum socket number seen is a good
> > approximation of the actual physical sockets? I know in terms of cores
> > on a system, the core id's often jump - are there systems where the
> > socket numbers do too?
> > 
> > /Bruce
> > 
> 
> I am not aware of any system that would jump sockets like that. I'm open to
> corrections, however :)
> 
> -- 
In the absence of any corrections, I think this is fine to have.
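
To make the assumption under discussion concrete, here is a minimal sketch
of the "maximum socket ID seen, plus one" approach. The function name
demo_socket_count() and its array input are hypothetical and are not the
EAL implementation; the sketch only shows how a gap in socket IDs would
inflate the result.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the socket ID of every detected lcore,
 * approximate the number of sockets as (highest ID seen + 1).
 */
unsigned int
demo_socket_count(const unsigned int *socket_ids, size_t n)
{
	unsigned int max_id;
	size_t i;

	assert(socket_ids != NULL || n == 0);

	if (n == 0)
		return 0;

	max_id = 0;
	for (i = 0; i != n; i++) {
		if (socket_ids[i] > max_id)
			max_id = socket_ids[i];
	}

	return max_id + 1;
}
```

For lcores on sockets {0, 0, 1, 1} this gives 2, which matches the physical
count. For a system that numbered its sockets {0, 2} it would give 3, which
is the over-count the question above is about.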

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
  2018-03-09 15:16  0%       ` Neil Horman
@ 2018-03-09 15:45  0%         ` Ferruh Yigit
  2018-03-09 19:06  0%           ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-09 15:45 UTC (permalink / raw)
  To: Neil Horman; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On 3/9/2018 3:16 PM, Neil Horman wrote:
> On Fri, Mar 09, 2018 at 01:00:35PM +0000, Ferruh Yigit wrote:
>> On 3/9/2018 12:36 PM, Neil Horman wrote:
>>> On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
>>>> "struct rte_eth_rxtx_callback" is defined as internal data structure and
>>>> used as named opaque type.
>>>>
>>>> So the functions that are adding callbacks can return objects in this
>>>> type instead of void pointer.
>>>>
>>>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>>>> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
>>>> ---
>>>> v2:
>>>> * keep using struct * in parameters, instead add callback functions
>>>> return struct rte_eth_rxtx_callback pointer.
>>>>
>>>> v4:
>>>> * Remove deprecation notice. LIBABIVER already increased in this release
>>>> ---
>>>>  doc/guides/rel_notes/deprecation.rst |  7 -------
>>>>  lib/librte_ether/rte_ethdev.c        |  6 +++---
>>>>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
>>>>  3 files changed, 11 insertions(+), 15 deletions(-)
>>>>
>>> This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
>>> internal data structure, then it shouldn't be used as part of the prototype for
>>> an exported function, as the structure will then no longer be a internal data
>>> structure, but rather part of the public ABI.
>>
>> "struct rte_eth_rxtx_callback" is internal data structure. And application
>> should not access elements of this structure.
>>
>> "struct rte_eth_rxtx_callback;" is defined in the public header, so applications
>> can use it as opaque type.
>>
>> It is possible that both "add" and "remove" APIs use "void *" and API itself can
>> cast it. But the inconsistency was "add" related APIs return "void *" and
>> "remove" related APIs require a parameter in "struct rte_eth_rxtx_callback *" type.
>>
>> While unifying the usage, "struct rte_eth_rxtx_callback *" preferred against
>> "void *", because named opaque type documents intention/usage better.
>>
>> Thanks,
>> ferruh
>>
> I get what you're saying about rte_eth_rxtx_callback being an internals
> structure (or its intent is to be an internal structure), but it doesn't seem to
> hold up to the header file layout.  rte_eth_rxtx_callback is defined in
> rte_ethdev_core.h which according to the makefile, is listed as a symlinked
> file, and therefore available for external applications to include.  This
> negates the intended opaque nature of the struct.  I think before you do this,
> you want to rectify that.

The intention is to make "struct rte_eth_rxtx_callback" internal, but as you
said it is available to applications. This is the same for all data structures
in rte_ethdev_core.h.

Unfortunately it can't be truly internal because inline functions in the public
header use them, and we can't change the inline functions because of
performance concerns.

Since we can't make the structure truly internal, we can't really prevent
applications from accessing the internals; this is the same if you use "void *".

> 
> Neil
> 
>>>
>>> Neil
>>>
>>
>>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
  2018-03-09 13:00  0%     ` Ferruh Yigit
@ 2018-03-09 15:16  0%       ` Neil Horman
  2018-03-09 15:45  0%         ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-03-09 15:16 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On Fri, Mar 09, 2018 at 01:00:35PM +0000, Ferruh Yigit wrote:
> On 3/9/2018 12:36 PM, Neil Horman wrote:
> > On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
> >> "struct rte_eth_rxtx_callback" is defined as internal data structure and
> >> used as named opaque type.
> >>
> >> So the functions that are adding callbacks can return objects in this
> >> type instead of void pointer.
> >>
> >> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> >> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> >> ---
> >> v2:
> >> * keep using struct * in parameters, instead add callback functions
> >> return struct rte_eth_rxtx_callback pointer.
> >>
> >> v4:
> >> * Remove deprecation notice. LIBABIVER already increased in this release
> >> ---
> >>  doc/guides/rel_notes/deprecation.rst |  7 -------
> >>  lib/librte_ether/rte_ethdev.c        |  6 +++---
> >>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
> >>  3 files changed, 11 insertions(+), 15 deletions(-)
> >>
> > This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
> > internal data structure, then it shouldn't be used as part of the prototype for
> > an exported function, as the structure will then no longer be a internal data
> > structure, but rather part of the public ABI.
> 
> "struct rte_eth_rxtx_callback" is internal data structure. And application
> should not access elements of this structure.
> 
> "struct rte_eth_rxtx_callback;" is defined in the public header, so applications
> can use it as opaque type.
> 
> It is possible that both "add" and "remove" APIs use "void *" and API itself can
> cast it. But the inconsistency was "add" related APIs return "void *" and
> "remove" related APIs require a parameter in "struct rte_eth_rxtx_callback *" type.
> 
> While unifying the usage, "struct rte_eth_rxtx_callback *" preferred against
> "void *", because named opaque type documents intention/usage better.
> 
> Thanks,
> ferruh
> 
I get what you're saying about rte_eth_rxtx_callback being an internal
structure (or its intent is to be an internal structure), but it doesn't seem to
hold up to the header file layout.  rte_eth_rxtx_callback is defined in
rte_ethdev_core.h which according to the makefile, is listed as a symlinked
file, and therefore available for external applications to include.  This
negates the intended opaque nature of the struct.  I think before you do this,
you want to rectify that.

Neil

> > 
> > Neil
> > 
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v4] ethdev: return named opaque type instead of void pointer
       [not found]       ` <20180309123651.GB19004@hmswarspite.think-freely.org>
@ 2018-03-09 13:00  0%     ` Ferruh Yigit
  2018-03-09 15:16  0%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-09 13:00 UTC (permalink / raw)
  To: Neil Horman; +Cc: John McNamara, Marko Kovacevic, Thomas Monjalon, dev

On 3/9/2018 12:36 PM, Neil Horman wrote:
> On Fri, Mar 09, 2018 at 11:25:31AM +0000, Ferruh Yigit wrote:
>> "struct rte_eth_rxtx_callback" is defined as internal data structure and
>> used as named opaque type.
>>
>> So the functions that are adding callbacks can return objects in this
>> type instead of void pointer.
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
>> ---
>> v2:
>> * keep using struct * in parameters, instead add callback functions
>> return struct rte_eth_rxtx_callback pointer.
>>
>> v4:
>> * Remove deprecation notice. LIBABIVER already increased in this release
>> ---
>>  doc/guides/rel_notes/deprecation.rst |  7 -------
>>  lib/librte_ether/rte_ethdev.c        |  6 +++---
>>  lib/librte_ether/rte_ethdev.h        | 13 ++++++++-----
>>  3 files changed, 11 insertions(+), 15 deletions(-)
>>
> This doesn't quite make sense to me.  If rte_eth_rxtx_callback is defined as an
> internal data structure, then it shouldn't be used as part of the prototype for
> an exported function, as the structure will then no longer be a internal data
> structure, but rather part of the public ABI.

"struct rte_eth_rxtx_callback" is internal data structure. And application
should not access elements of this structure.

"struct rte_eth_rxtx_callback;" is defined in the public header, so applications
can use it as opaque type.

It is possible for both the "add" and "remove" APIs to use "void *" and for the
API itself to cast it. But the inconsistency was that the "add" related APIs
return "void *" while the "remove" related APIs require a parameter of type
"struct rte_eth_rxtx_callback *".

While unifying the usage, "struct rte_eth_rxtx_callback *" was preferred over
"void *", because a named opaque type documents the intention/usage better.

Thanks,
ferruh
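
As a minimal sketch of the difference between a named opaque type and
"void *" (all demo_* names below are hypothetical and are not the ethdev
API): the public part only declares the struct name, so callers can hold
and pass the handle but cannot see its members, and the compiler rejects
an unrelated pointer type passed to "remove".

```c
#include <assert.h>
#include <stdlib.h>

/* "Public header" part: the name is visible, the layout is not. */
struct demo_callback;

struct demo_callback *demo_add_callback(int (*fn)(int));
int demo_invoke(const struct demo_callback *cb, int arg);
int demo_remove_callback(struct demo_callback *cb);

/* "Private" part: only the library sees the definition. */
struct demo_callback {
	int (*fn)(int);
};

struct demo_callback *
demo_add_callback(int (*fn)(int))
{
	struct demo_callback *cb;

	if (fn == NULL)
		return NULL;

	cb = malloc(sizeof(*cb));
	if (cb != NULL)
		cb->fn = fn;
	return cb;
}

int
demo_invoke(const struct demo_callback *cb, int arg)
{
	assert(cb != NULL);
	return cb->fn(arg);
}

int
demo_remove_callback(struct demo_callback *cb)
{
	if (cb == NULL)
		return -1;
	free(cb);
	return 0;
}

/* Sample user callback, used only to exercise the sketch. */
int
demo_double(int x)
{
	return 2 * x;
}
```

The handle returned by demo_add_callback() goes straight into
demo_remove_callback() without a cast, which is the consistency the patch
is after. The sketch keeps the definition in one file for brevity; in a
real library it would live in a source file that applications do not
include.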

> 
> Neil
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 21:34  4%             ` Thomas Monjalon
@ 2018-03-09  0:18  4%               ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2018-03-09  0:18 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

On Thu, Mar 08, 2018 at 10:34:14PM +0100, Thomas Monjalon wrote:
> 08/03/2018 20:40, Neil Horman:
> > On Thu, Mar 08, 2018 at 05:04:01PM +0100, Thomas Monjalon wrote:
> > > 08/03/2018 16:35, Neil Horman:
> > > > On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> > > > > 08/03/2018 12:43, Ferruh Yigit:
> > > > > > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > > > > > 07/03/2018 18:44, Ferruh Yigit:
> > > > > > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > > > > > >> config and process which has similar targets?
> > > > > > > 
> > > > > > > They are different targets.
> > > > > > > Experimental API is always enabled but may be avoided by applications.
> > > > > > > Next ABI can be used to break ABI without notice and disabled to keep
> > > > > > > old ABI compatibility. It is almost never used because it is preferred
> > > > > > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > > > > > period after notice.
> > > > > > 
> > > > > > OK, I see.
> > > > > > 
> > > > > > Shouldn't we disable it by default at least? Otherwise who is not paying
> > > > > > attention to this config option will get and ABI/API break.
> > > > > 
> > > > > Yes I think you are right, it can be disabled by default.
> > > > > 
> > > > I would agree, there seems to be overlap here, and the experimental tagging can
> > > > cover what the NEXT_API flag is meant to do.  It can be removed I think.
> > > 
> > > It is not NEXT_API but NEXT_ABI.
> > Sorry, typo, though I'm sure you got that, since the former doesn't exist,
> > right?
> > > Why do you think it overlaps experimental API tagging?
> > 
> > I assert that because the compat lib has macros to map common symbols to version
> > specific ones.  That is to say, if you change a data structure, you can setup
> > the API calls that use said structure such that version 1 or the symbol maps to
> > an internal function that uses the old structure, while version 2 maps to an
> > internal function that uses the new symbol
> > 
> > That is to say, if you're planning on introducing ABI changes, the experimental
> > API tagging can be used to implement what the NEXT_ABI macro does.
> 
> It is a different usage.
> Experimental API tagging is for new functions.
> rte_compat is used to avoid breaking the ABI when changing old code.
> NEXT_ABI has been used in the past to disable an ABI breakage, which was
> not possible to mitigate with rte_compat because impacting too many functions.
> 
That's not entirely true.  It _is_ used to manage ABI changes when backwards
compatibility needs to be preserved. It _can_be_ used for experimental ABI
management.  That is to say, if you want to modify an existing ABI symbol, you
can do so by writing a new function, and then exporting the new function as the
old symbol with the @EXPERIMENTAL version.  Not saying we have to do that, but
we certainly can, and can eliminate NEXT_ABI in the process.

> I am not saying that I like NEXT_ABI, but it could be useful exceptionnally.
> 
Well, if the consensus is that it should be kept, it's no skin off my nose, but
the discussion was around removing NEXT_ABI, and I was copied, so I thought I'd
add my $0.02
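
For anyone reading along without the history, here is a minimal sketch of
the kind of config-gated change that RTE_NEXT_ABI permits, as described
earlier in the thread. struct demo_stats and its helpers are hypothetical
and do not exist in DPDK; the point is only that the new layout is compiled
in when the option is set and the old layout is kept otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure, for illustration only. */
struct demo_stats {
	uint64_t packets;
#ifdef RTE_NEXT_ABI
	uint64_t bytes;	/* new field: changes the size, so breaks the ABI */
#endif
};

/* Lets a caller observe which layout was compiled in. */
size_t
demo_stats_size(void)
{
	return sizeof(struct demo_stats);
}

/* Code touching the new field has to be gated the same way. */
void
demo_stats_add(struct demo_stats *s, uint32_t pkt_len)
{
	assert(s != NULL);

	s->packets++;
#ifdef RTE_NEXT_ABI
	s->bytes += pkt_len;
#else
	(void)pkt_len;
#endif
}
```

Every function that touches the changed structure needs the same #ifdef,
which is the cost of this approach when a change affects many functions.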

Neil

> 
> 

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v7 2/7] eventtimer: add common code
  @ 2018-03-08 21:54  2%   ` Erik Gabriel Carrillo
    1 sibling, 0 replies; 200+ results
From: Erik Gabriel Carrillo @ 2018-03-08 21:54 UTC (permalink / raw)
  To: pbhagavatula; +Cc: dev, jerin.jacob, nipun.gupta, hemant.agrawal

This commit adds the logic that is shared by all event timer adapter
drivers; the common code handles instance allocation and some
initialization.

Signed-off-by: Erik Gabriel Carrillo <erik.g.carrillo@intel.com>
---
 config/common_base                                |   1 +
 drivers/event/sw/sw_evdev.c                       |  18 +
 lib/librte_eventdev/Makefile                      |   2 +
 lib/librte_eventdev/rte_event_timer_adapter.c     | 459 ++++++++++++++++++++++
 lib/librte_eventdev/rte_event_timer_adapter_pmd.h | 150 +++++++
 lib/librte_eventdev/rte_eventdev.h                |   3 +
 lib/librte_eventdev/rte_eventdev_pmd.h            |  35 ++
 lib/librte_eventdev/rte_eventdev_version.map      |  20 +
 8 files changed, 688 insertions(+)
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter.c
 create mode 100644 lib/librte_eventdev/rte_event_timer_adapter_pmd.h

diff --git a/config/common_base b/config/common_base
index ad03cf4..286df74 100644
--- a/config/common_base
+++ b/config/common_base
@@ -546,6 +546,7 @@ CONFIG_RTE_LIBRTE_EVENTDEV=y
 CONFIG_RTE_LIBRTE_EVENTDEV_DEBUG=n
 CONFIG_RTE_EVENT_MAX_DEVS=16
 CONFIG_RTE_EVENT_MAX_QUEUES_PER_DEV=64
+CONFIG_RTE_EVENT_TIMER_ADAPTER_NUM_MAX=32
 
 #
 # Compile PMD for skeleton event device
diff --git a/drivers/event/sw/sw_evdev.c b/drivers/event/sw/sw_evdev.c
index 6672fd8..0847547 100644
--- a/drivers/event/sw/sw_evdev.c
+++ b/drivers/event/sw/sw_evdev.c
@@ -464,6 +464,22 @@ sw_eth_rx_adapter_caps_get(const struct rte_eventdev *dev,
 	return 0;
 }
 
+static int
+sw_timer_adapter_caps_get(const struct rte_eventdev *dev,
+			  uint64_t flags,
+			  uint32_t *caps,
+			  const struct rte_event_timer_adapter_ops **ops)
+{
+	RTE_SET_USED(dev);
+	RTE_SET_USED(flags);
+	*caps = 0;
+
+	/* Use default SW ops */
+	*ops = NULL;
+
+	return 0;
+}
+
 static void
 sw_info_get(struct rte_eventdev *dev, struct rte_event_dev_info *info)
 {
@@ -791,6 +807,8 @@ sw_probe(struct rte_vdev_device *vdev)
 
 			.eth_rx_adapter_caps_get = sw_eth_rx_adapter_caps_get,
 
+			.timer_adapter_caps_get = sw_timer_adapter_caps_get,
+
 			.xstats_get = sw_xstats_get,
 			.xstats_get_names = sw_xstats_get_names,
 			.xstats_get_by_name = sw_xstats_get_by_name,
diff --git a/lib/librte_eventdev/Makefile b/lib/librte_eventdev/Makefile
index 549b182..8b16e3f 100644
--- a/lib/librte_eventdev/Makefile
+++ b/lib/librte_eventdev/Makefile
@@ -20,6 +20,7 @@ LDLIBS += -lrte_eal -lrte_ring -lrte_ethdev -lrte_hash
 SRCS-y += rte_eventdev.c
 SRCS-y += rte_event_ring.c
 SRCS-y += rte_event_eth_rx_adapter.c
+SRCS-y += rte_event_timer_adapter.c
 
 # export include files
 SYMLINK-y-include += rte_eventdev.h
@@ -29,6 +30,7 @@ SYMLINK-y-include += rte_eventdev_pmd_vdev.h
 SYMLINK-y-include += rte_event_ring.h
 SYMLINK-y-include += rte_event_eth_rx_adapter.h
 SYMLINK-y-include += rte_event_timer_adapter.h
+SYMLINK-y-include += rte_event_timer_adapter_pmd.h
 
 # versioning export map
 EXPORT_MAP := rte_eventdev_version.map
diff --git a/lib/librte_eventdev/rte_event_timer_adapter.c b/lib/librte_eventdev/rte_event_timer_adapter.c
new file mode 100644
index 0000000..711d6b9
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter.c
@@ -0,0 +1,459 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#include <string.h>
+
+#include <rte_memzone.h>
+#include <rte_memory.h>
+#include <rte_dev.h>
+#include <rte_errno.h>
+
+#include "rte_eventdev.h"
+#include "rte_eventdev_pmd.h"
+#include "rte_event_timer_adapter.h"
+#include "rte_event_timer_adapter_pmd.h"
+
+#define DATA_MZ_NAME_MAX_LEN 64
+#define DATA_MZ_NAME_FORMAT "rte_event_timer_adapter_data_%d"
+
+static int evtim_logtype;
+
+static struct rte_event_timer_adapter adapters[RTE_EVENT_TIMER_ADAPTER_NUM_MAX];
+
+static inline int
+adapter_valid(const struct rte_event_timer_adapter *adapter)
+{
+	return adapter != NULL && adapter->allocated == 1;
+}
+
+#define EVTIM_LOG(level, logtype, ...) \
+	rte_log(RTE_LOG_ ## level, logtype, \
+		RTE_FMT("EVTIMER: %s() line %u: " RTE_FMT_HEAD(__VA_ARGS__,) \
+			"\n", __func__, __LINE__, RTE_FMT_TAIL(__VA_ARGS__,)))
+
+#define EVTIM_LOG_ERR(...) EVTIM_LOG(ERR, evtim_logtype, __VA_ARGS__)
+
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+#define EVTIM_LOG_DBG(...) \
+	EVTIM_LOG(DEBUG, evtim_logtype, __VA_ARGS__)
+#else
+#define EVTIM_LOG_DBG(...) (void)0
+#endif
+
+#define ADAPTER_VALID_OR_ERR_RET(adapter, retval) do { \
+	if (!adapter_valid(adapter))		       \
+		return retval;			       \
+} while (0)
+
+#define FUNC_PTR_OR_ERR_RET(func, errval) do { \
+	if ((func) == NULL)		       \
+		return errval;		       \
+} while (0)
+
+#define FUNC_PTR_OR_NULL_RET_WITH_ERRNO(func, errval) do { \
+	if ((func) == NULL) {				   \
+		rte_errno = errval;			   \
+		return NULL;				   \
+	}						   \
+} while (0)
+
+static int
+default_port_conf_cb(uint16_t id, uint8_t event_dev_id, uint8_t *event_port_id,
+		     void *conf_arg)
+{
+	struct rte_event_timer_adapter *adapter;
+	struct rte_eventdev *dev;
+	struct rte_event_dev_config dev_conf;
+	struct rte_event_port_conf *port_conf, def_port_conf = {0};
+	int started;
+	uint8_t port_id;
+	uint8_t dev_id;
+	int ret;
+
+	RTE_SET_USED(event_dev_id);
+
+	adapter = &adapters[id];
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+	dev_id = dev->data->dev_id;
+	dev_conf = dev->data->dev_conf;
+
+	started = dev->data->dev_started;
+	if (started)
+		rte_event_dev_stop(dev_id);
+
+	port_id = dev_conf.nb_event_ports;
+	dev_conf.nb_event_ports += 1;
+	ret = rte_event_dev_configure(dev_id, &dev_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to configure event dev %u\n", dev_id);
+		if (started)
+			if (rte_event_dev_start(dev_id))
+				return -EIO;
+
+		return ret;
+	}
+
+	if (conf_arg != NULL)
+		port_conf = conf_arg;
+	else {
+		port_conf = &def_port_conf;
+		ret = rte_event_port_default_conf_get(dev_id, port_id,
+						      port_conf);
+		if (ret < 0)
+			return ret;
+	}
+
+	ret = rte_event_port_setup(dev_id, port_id, port_conf);
+	if (ret < 0) {
+		EVTIM_LOG_ERR("failed to setup event port %u on event dev %u\n",
+			      port_id, dev_id);
+		return ret;
+	}
+
+	*event_port_id = port_id;
+
+	if (started)
+		ret = rte_event_dev_start(dev_id);
+
+	return ret;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create(const struct rte_event_timer_adapter_conf *conf)
+{
+	return rte_event_timer_adapter_create_ext(conf, default_port_conf_cb,
+						  NULL);
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_create_ext(
+		const struct rte_event_timer_adapter_conf *conf,
+		rte_event_timer_adapter_port_conf_cb_t conf_cb,
+		void *conf_arg)
+{
+	uint16_t adapter_id;
+	struct rte_event_timer_adapter *adapter;
+	const struct rte_memzone *mz;
+	char mz_name[DATA_MZ_NAME_MAX_LEN];
+	int n, ret;
+	struct rte_eventdev *dev;
+
+	if (conf == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check eventdev ID */
+	if (!rte_event_pmd_is_valid_dev(conf->event_dev_id)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	dev = &rte_eventdevs[conf->event_dev_id];
+
+	adapter_id = conf->timer_adapter_id;
+
+	/* Check that adapter_id is in range */
+	if (adapter_id >= RTE_EVENT_TIMER_ADAPTER_NUM_MAX) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Check adapter ID not already allocated */
+	adapter = &adapters[adapter_id];
+	if (adapter->allocated) {
+		rte_errno = EEXIST;
+		return NULL;
+	}
+
+	/* Create shared data area. */
+	n = snprintf(mz_name, sizeof(mz_name), DATA_MZ_NAME_FORMAT, adapter_id);
+	if (n >= (int)sizeof(mz_name)) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+	mz = rte_memzone_reserve(mz_name,
+				 sizeof(struct rte_event_timer_adapter_data),
+				 conf->socket_id, 0);
+	if (mz == NULL)
+		/* rte_errno set by rte_memzone_reserve */
+		return NULL;
+
+	adapter->data = mz->addr;
+	memset(adapter->data, 0, sizeof(struct rte_event_timer_adapter_data));
+
+	adapter->data->mz = mz;
+	adapter->data->event_dev_id = conf->event_dev_id;
+	adapter->data->id = adapter_id;
+	adapter->data->socket_id = conf->socket_id;
+	adapter->data->conf = *conf;  /* copy conf structure */
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = -ret;
+		goto free_memzone;
+	}
+
+	if (!(adapter->data->caps &
+	      RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT)) {
+		FUNC_PTR_OR_NULL_RET_WITH_ERRNO(conf_cb, EINVAL);
+		ret = conf_cb(adapter->data->id, adapter->data->event_dev_id,
+			      &adapter->data->event_port_id, conf_arg);
+		if (ret < 0) {
+			rte_errno = -ret;
+			goto free_memzone;
+		}
+	}
+
+	/* Allow driver to do some setup */
+	FUNC_PTR_OR_NULL_RET_WITH_ERRNO(adapter->ops->init, ENOTSUP);
+	ret = adapter->ops->init(adapter);
+	if (ret < 0) {
+		rte_errno = -ret;
+		goto free_memzone;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+
+free_memzone:
+	rte_memzone_free(adapter->data->mz);
+	return NULL;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_get_info(const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->ops->get_info)
+		/* let driver set values it knows */
+		adapter->ops->get_info(adapter, adapter_info);
+
+	/* Set common values */
+	adapter_info->conf = adapter->data->conf;
+	adapter_info->event_dev_port_id = adapter->data->event_port_id;
+	adapter_info->caps = adapter->data->caps;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_start(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->start, -EINVAL);
+
+	ret = adapter->ops->start(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 1;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stop(const struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stop, -EINVAL);
+
+	if (adapter->data->started == 0) {
+		EVTIM_LOG_ERR("event timer adapter %hu already stopped",
+			      adapter->data->id);
+		return 0;
+	}
+
+	ret = adapter->ops->stop(adapter);
+	if (ret < 0)
+		return ret;
+
+	adapter->data->started = 0;
+
+	return 0;
+}
+
+struct rte_event_timer_adapter * __rte_experimental
+rte_event_timer_adapter_lookup(uint16_t adapter_id)
+{
+	char name[DATA_MZ_NAME_MAX_LEN];
+	const struct rte_memzone *mz;
+	struct rte_event_timer_adapter_data *data;
+	struct rte_event_timer_adapter *adapter;
+	int ret;
+	struct rte_eventdev *dev;
+
+	if (adapters[adapter_id].allocated)
+		return &adapters[adapter_id]; /* Adapter is already loaded */
+
+	snprintf(name, DATA_MZ_NAME_MAX_LEN, DATA_MZ_NAME_FORMAT, adapter_id);
+	mz = rte_memzone_lookup(name);
+	if (mz == NULL) {
+		rte_errno = ENOENT;
+		return NULL;
+	}
+
+	data = mz->addr;
+
+	adapter = &adapters[data->id];
+	adapter->data = data;
+
+	dev = &rte_eventdevs[adapter->data->event_dev_id];
+
+	/* Query eventdev PMD for timer adapter capabilities and ops */
+	ret = dev->dev_ops->timer_adapter_caps_get(dev,
+						   adapter->data->conf.flags,
+						   &adapter->data->caps,
+						   &adapter->ops);
+	if (ret < 0) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	/* Set fast-path function pointers */
+	adapter->arm_burst = adapter->ops->arm_burst;
+	adapter->arm_tmo_tick_burst = adapter->ops->arm_tmo_tick_burst;
+	adapter->cancel_burst = adapter->ops->cancel_burst;
+
+	adapter->allocated = 1;
+
+	return adapter;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_free(struct rte_event_timer_adapter *adapter)
+{
+	int ret;
+
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->uninit, -EINVAL);
+
+	if (adapter->data->started == 1) {
+		EVTIM_LOG_ERR("event timer adapter %hu must be stopped "
+			      "before freeing", adapter->data->id);
+		return -EBUSY;
+	}
+
+	/* free impl priv data */
+	ret = adapter->ops->uninit(adapter);
+	if (ret < 0)
+		return ret;
+
+	/* free shared data area */
+	ret = rte_memzone_free(adapter->data->mz);
+	if (ret < 0)
+		return ret;
+
+	adapter->data = NULL;
+	adapter->allocated = 0;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_service_id_get(struct rte_event_timer_adapter *adapter,
+				       uint32_t *service_id)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+
+	if (adapter->data->service_inited && service_id != NULL)
+		*service_id = adapter->data->service_id;
+
+	return adapter->data->service_inited ? 0 : -ESRCH;
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_get(struct rte_event_timer_adapter *adapter,
+				  struct rte_event_timer_adapter_stats *stats)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_get, -EINVAL);
+	if (stats == NULL)
+		return -EINVAL;
+
+	return adapter->ops->stats_get(adapter, stats);
+}
+
+int __rte_experimental
+rte_event_timer_adapter_stats_reset(struct rte_event_timer_adapter *adapter)
+{
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->ops->stats_reset, -EINVAL);
+	return adapter->ops->stats_reset(adapter);
+}
+
+void __rte_experimental
+rte_event_timer_init(struct rte_event_timer *evtim)
+{
+	evtim->ev.op = RTE_EVENT_OP_NEW;
+	evtim->ev.event_type = RTE_EVENT_TYPE_TIMER;
+	evtim->state = RTE_EVENT_TIMER_NOT_ARMED;
+}
+
+int __rte_experimental
+rte_event_timer_arm_burst(const struct rte_event_timer_adapter *adapter,
+			  struct rte_event_timer **evtims,
+			  uint16_t nb_evtims)
+{
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->arm_burst, -EINVAL);
+#endif
+
+	return adapter->arm_burst(adapter, evtims, nb_evtims);
+}
+
+int __rte_experimental
+rte_event_timer_arm_tmo_tick_burst(
+			const struct rte_event_timer_adapter *adapter,
+			struct rte_event_timer **evtims,
+			const uint64_t timeout_ticks,
+			const uint16_t nb_evtims)
+{
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->arm_tmo_tick_burst, -EINVAL);
+#endif
+
+	return adapter->arm_tmo_tick_burst(adapter, evtims, timeout_ticks,
+					   nb_evtims);
+}
+
+int __rte_experimental
+rte_event_timer_cancel_burst(const struct rte_event_timer_adapter *adapter,
+			     struct rte_event_timer **evtims,
+			     uint16_t nb_evtims)
+{
+#ifdef RTE_LIBRTE_EVENTDEV_DEBUG
+	ADAPTER_VALID_OR_ERR_RET(adapter, -EINVAL);
+	FUNC_PTR_OR_ERR_RET(adapter->cancel_burst, -EINVAL);
+#endif
+
+	return adapter->cancel_burst(adapter, evtims, nb_evtims);
+}
+
+RTE_INIT(event_timer_adapter_init_log);
+static void
+event_timer_adapter_init_log(void)
+{
+	evtim_logtype = rte_log_register("lib.eventdev.adapter.timer");
+	if (evtim_logtype >= 0)
+		rte_log_set_level(evtim_logtype, RTE_LOG_NOTICE);
+}
diff --git a/lib/librte_eventdev/rte_event_timer_adapter_pmd.h b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
new file mode 100644
index 0000000..db044c8
--- /dev/null
+++ b/lib/librte_eventdev/rte_event_timer_adapter_pmd.h
@@ -0,0 +1,150 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2017-2018 Intel Corporation.
+ * All rights reserved.
+ */
+
+#ifndef __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+#define __RTE_EVENT_TIMER_ADAPTER_PMD_H__
+
+/**
+ * @file
+ * RTE Event Timer Adapter API (PMD Side)
+ *
+ * @note
+ * This file provides implementation helpers for internal use by PMDs.  They
+ * are not intended to be exposed to applications and are not subject to ABI
+ * versioning.
+ *
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include "rte_event_timer_adapter.h"
+
+/*
+ * Definitions of functions exported by an event timer adapter implementation
+ * through *rte_event_timer_adapter_ops* structure supplied in the
+ * *rte_event_timer_adapter* structure associated with an event timer adapter.
+ */
+
+typedef int (*rte_event_timer_adapter_init_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation setup */
+typedef int (*rte_event_timer_adapter_uninit_t)(
+		struct rte_event_timer_adapter *adapter);
+/**< @internal Event timer adapter implementation teardown */
+typedef int (*rte_event_timer_adapter_start_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Start running event timer adapter */
+typedef int (*rte_event_timer_adapter_stop_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Stop running event timer adapter */
+typedef void (*rte_event_timer_adapter_get_info_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_info *adapter_info);
+/**< @internal Get contextual information for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_get_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer_adapter_stats *stats);
+/**< @internal Get statistics for event timer adapter */
+typedef int (*rte_event_timer_adapter_stats_reset_t)(
+		const struct rte_event_timer_adapter *adapter);
+/**< @internal Reset statistics for event timer adapter */
+typedef int (*rte_event_timer_arm_burst_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **tims,
+		uint16_t nb_tims);
+/**< @internal Enable event timers to enqueue timer events upon expiry */
+typedef int (*rte_event_timer_arm_tmo_tick_burst_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **tims,
+		uint64_t timeout_tick,
+		uint16_t nb_tims);
+/**< @internal Enable event timers with common expiration time */
+typedef int (*rte_event_timer_cancel_burst_t)(
+		const struct rte_event_timer_adapter *adapter,
+		struct rte_event_timer **tims,
+		uint16_t nb_tims);
+/**< @internal Prevent event timers from enqueuing timer events */
+
+/**
+ * @internal Structure containing the functions exported by an event timer
+ * adapter implementation.
+ */
+struct rte_event_timer_adapter_ops {
+	rte_event_timer_adapter_init_t		init;  /**< Set up adapter */
+	rte_event_timer_adapter_uninit_t	uninit;/**< Tear down adapter */
+	rte_event_timer_adapter_start_t		start; /**< Start adapter */
+	rte_event_timer_adapter_stop_t		stop;  /**< Stop adapter */
+	rte_event_timer_adapter_get_info_t	get_info;
+	/**< Get info from driver */
+	rte_event_timer_adapter_stats_get_t	stats_get;
+	/**< Get adapter statistics */
+	rte_event_timer_adapter_stats_reset_t	stats_reset;
+	/**< Reset adapter statistics */
+	rte_event_timer_arm_burst_t		arm_burst;
+	/**< Arm one or more event timers */
+	rte_event_timer_arm_tmo_tick_burst_t	arm_tmo_tick_burst;
+	/**< Arm event timers with same expiration time */
+	rte_event_timer_cancel_burst_t		cancel_burst;
+	/**< Cancel one or more event timers */
+};
+
+/**
+ * @internal Adapter data; structure to be placed in shared memory to be
+ * accessible by various processes in a multi-process configuration.
+ */
+struct rte_event_timer_adapter_data {
+	uint8_t id;
+	/**< Event timer adapter ID */
+	uint8_t event_dev_id;
+	/**< Event device ID */
+	uint32_t socket_id;
+	/**< Socket ID where memory is allocated */
+	uint8_t event_port_id;
+	/**< Optional: event port ID used when the inbuilt port is absent */
+	const struct rte_memzone *mz;
+	/**< Event timer adapter memzone pointer */
+	struct rte_event_timer_adapter_conf conf;
+	/**< Configuration used to configure the adapter. */
+	uint32_t caps;
+	/**< Adapter capabilities */
+	void *adapter_priv;
+	/**< Timer adapter private data */
+	uint8_t service_inited;
+	/**< Service initialization state */
+	uint32_t service_id;
+	/**< Service ID */
+
+	RTE_STD_C11
+	uint8_t started : 1;
+	/**< Flag to indicate adapter started. */
+} __rte_cache_aligned;
+
+/**
+ * @internal Data structure associated with each event timer adapter.
+ */
+struct rte_event_timer_adapter {
+	rte_event_timer_arm_burst_t arm_burst;
+	/**< Pointer to driver arm_burst function. */
+	rte_event_timer_arm_tmo_tick_burst_t arm_tmo_tick_burst;
+	/**< Pointer to driver arm_tmo_tick_burst function. */
+	rte_event_timer_cancel_burst_t cancel_burst;
+	/**< Pointer to driver cancel function. */
+	struct rte_event_timer_adapter_data *data;
+	/**< Pointer to shared adapter data */
+	const struct rte_event_timer_adapter_ops *ops;
+	/**< Functions exported by adapter driver */
+
+	RTE_STD_C11
+	uint8_t allocated : 1;
+	/**< Flag to indicate that this adapter has been allocated */
+} __rte_cache_aligned;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* __RTE_EVENT_TIMER_ADAPTER_PMD_H__ */
diff --git a/lib/librte_eventdev/rte_eventdev.h b/lib/librte_eventdev/rte_eventdev.h
index f9ad71e..888bcf1 100644
--- a/lib/librte_eventdev/rte_eventdev.h
+++ b/lib/librte_eventdev/rte_eventdev.h
@@ -1046,6 +1046,9 @@ struct rte_event {
  * @see struct rte_event_eth_rx_adapter_queue_conf::rx_queue_flags
  */
 
+#define RTE_EVENT_TIMER_ADAPTER_CAP_INTERNAL_PORT (1ULL << 1)
+/**< This flag is set when the timer mechanism is in HW. */
+
 /**
  * Retrieve the event device's ethdev Rx adapter capabilities for the
  * specified ethernet port
diff --git a/lib/librte_eventdev/rte_eventdev_pmd.h b/lib/librte_eventdev/rte_eventdev_pmd.h
index 31343b5..0e37f1c 100644
--- a/lib/librte_eventdev/rte_eventdev_pmd.h
+++ b/lib/librte_eventdev/rte_eventdev_pmd.h
@@ -26,6 +26,7 @@ extern "C" {
 #include <rte_malloc.h>
 
 #include "rte_eventdev.h"
+#include "rte_event_timer_adapter_pmd.h"
 
 /* Logging Macros */
 #define RTE_EDEV_LOG_ERR(...) \
@@ -449,6 +450,37 @@ typedef int (*eventdev_eth_rx_adapter_caps_get_t)
 struct rte_event_eth_rx_adapter_queue_conf *queue_conf;
 
 /**
+ * Retrieve the event device's timer adapter capabilities, as well as the ops
+ * structure that an event timer adapter should call through to enter the
+ * driver
+ *
+ * @param dev
+ *   Event device pointer
+ *
+ * @param flags
+ *   Flags that can be used to determine how to select an event timer
+ *   adapter ops structure
+ *
+ * @param[out] caps
+ *   A pointer to memory filled with event timer adapter capabilities.
+ *
+ * @param[out] ops
+ *   A pointer to the ops pointer to set with the address of the desired ops
+ *   structure
+ *
+ * @return
+ *   - 0: Success, driver provides event timer adapter capabilities and
+ *	the ops structure.
+ *   - <0: Error code returned by the driver function.
+ *
+ */
+typedef int (*eventdev_timer_adapter_caps_get_t)(
+				const struct rte_eventdev *dev,
+				uint64_t flags,
+				uint32_t *caps,
+				const struct rte_event_timer_adapter_ops **ops);
+
+/**
  * Add ethernet Rx queues to event device. This callback is invoked if
  * the caps returned from rte_eventdev_eth_rx_adapter_caps_get(, eth_port_id)
  * has RTE_EVENT_ETH_RX_ADAPTER_CAP_INTERNAL_PORT set.
@@ -640,6 +672,9 @@ struct rte_eventdev_ops {
 	eventdev_eth_rx_adapter_stats_reset eth_rx_adapter_stats_reset;
 	/**< Reset ethernet Rx stats */
 
+	eventdev_timer_adapter_caps_get_t timer_adapter_caps_get;
+	/**< Get timer adapter capabilities */
+
 	eventdev_selftest dev_selftest;
 	/**< Start eventdev Selftest */
 };
diff --git a/lib/librte_eventdev/rte_eventdev_version.map b/lib/librte_eventdev/rte_eventdev_version.map
index 2aef470..345b0b1 100644
--- a/lib/librte_eventdev/rte_eventdev_version.map
+++ b/lib/librte_eventdev/rte_eventdev_version.map
@@ -74,3 +74,22 @@ DPDK_18.02 {
 
 	rte_event_dev_selftest;
 } DPDK_17.11;
+
+EXPERIMENTAL {
+	global:
+
+	rte_event_timer_adapter_create;
+	rte_event_timer_adapter_create_ext;
+	rte_event_timer_adapter_free;
+	rte_event_timer_adapter_get_info;
+	rte_event_timer_adapter_lookup;
+	rte_event_timer_adapter_service_id_get;
+	rte_event_timer_adapter_start;
+	rte_event_timer_adapter_stats_get;
+	rte_event_timer_adapter_stats_reset;
+	rte_event_timer_adapter_stop;
+	rte_event_timer_init;
+	rte_event_timer_arm_burst;
+	rte_event_timer_arm_tmo_tick_burst;
+	rte_event_timer_cancel_burst;
+} DPDK_18.02;
-- 
2.6.4

^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 19:40  3%           ` Neil Horman
@ 2018-03-08 21:34  4%             ` Thomas Monjalon
  2018-03-09  0:18  4%               ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-08 21:34 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

08/03/2018 20:40, Neil Horman:
> On Thu, Mar 08, 2018 at 05:04:01PM +0100, Thomas Monjalon wrote:
> > 08/03/2018 16:35, Neil Horman:
> > > On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> > > > 08/03/2018 12:43, Ferruh Yigit:
> > > > > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > > > > 07/03/2018 18:44, Ferruh Yigit:
> > > > > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > > > > >> config and process which has similar targets?
> > > > > > 
> > > > > > They are different targets.
> > > > > > Experimental API is always enabled but may be avoided by applications.
> > > > > > Next ABI can be used to break ABI without notice and disabled to keep
> > > > > > old ABI compatibility. It is almost never used because it is preferred
> > > > > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > > > > period after notice.
> > > > > 
> > > > > OK, I see.
> > > > > 
> > > > > Shouldn't we disable it by default at least? Otherwise who is not paying
> > > > > attention to this config option will get an ABI/API break.
> > > > 
> > > > Yes I think you are right, it can be disabled by default.
> > > > 
> > > I would agree, there seems to be overlap here, and the experimental tagging can
> > > cover what the NEXT_API flag is meant to do.  It can be removed I think.
> > 
> > It is not NEXT_API but NEXT_ABI.
> Sorry, typo, though I'm sure you got that, since the former doesn't exist,
> right?
> > Why do you think it overlaps experimental API tagging?
> 
> I assert that because the compat lib has macros to map common symbols to version
> specific ones.  That is to say, if you change a data structure, you can setup
> the API calls that use said structure such that version 1 of the symbol maps to
> an internal function that uses the old structure, while version 2 maps to an
> internal function that uses the new symbol
> 
> That is to say, if you're planning on introducing ABI changes, the experimental
> API tagging can be used to implement what the NEXT_ABI macro does.

It is a different usage.
Experimental API tagging is for new functions.
rte_compat is used to avoid breaking the ABI when changing old code.
NEXT_ABI has been used in the past to disable an ABI breakage that could
not be mitigated with rte_compat because it impacted too many functions.

I am not saying that I like NEXT_ABI, but it could be useful in exceptional cases.

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 16:04  0%         ` Thomas Monjalon
@ 2018-03-08 19:40  3%           ` Neil Horman
  2018-03-08 21:34  4%             ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-03-08 19:40 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

On Thu, Mar 08, 2018 at 05:04:01PM +0100, Thomas Monjalon wrote:
> 08/03/2018 16:35, Neil Horman:
> > On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> > > 08/03/2018 12:43, Ferruh Yigit:
> > > > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > > > 07/03/2018 18:44, Ferruh Yigit:
> > > > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > > > >> config and process which has similar targets?
> > > > > 
> > > > > They are different targets.
> > > > > Experimental API is always enabled but may be avoided by applications.
> > > > > Next ABI can be used to break ABI without notice and disabled to keep
> > > > > old ABI compatibility. It is almost never used because it is preferred
> > > > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > > > period after notice.
> > > > 
> > > > OK, I see.
> > > > 
> > > > Shouldn't we disable it by default at least? Otherwise who is not paying
> > > > attention to this config option will get an ABI/API break.
> > > 
> > > Yes I think you are right, it can be disabled by default.
> > > 
> > I would agree, there seems to be overlap here, and the experimental tagging can
> > cover what the NEXT_API flag is meant to do.  It can be removed I think.
> 
> It is not NEXT_API but NEXT_ABI.
Sorry, typo, though I'm sure you got that, since the former doesn't exist,
right?
> Why do you think it overlaps experimental API tagging?

I assert that because the compat lib has macros to map common symbols to version
specific ones.  That is to say, if you change a data structure, you can setup
the API calls that use said structure such that version 1 of the symbol maps to
an internal function that uses the old structure, while version 2 maps to an
internal function that uses the new symbol

That is to say, if you're planning on introducing ABI changes, the experimental
API tagging can be used to implement what the NEXT_ABI macro does.
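
The mapping described above can be sketched in plain C. This is only an
illustration of the idea, not how DPDK does it: the real mechanism is ELF
symbol versioning driven by the rte_compat.h macros (VERSION_SYMBOL,
BIND_DEFAULT_SYMBOL) and the library version map. Every name below
(cfg_v1, cfg_v2, cfg_sum, "ABI_1", "ABI_2") is hypothetical, and the
lookup table merely stands in for what the dynamic linker would resolve.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old and new layouts of the same logical structure (both hypothetical). */
struct cfg_v1 { int nb_queues; };
struct cfg_v2 { int nb_queues; int nb_timers; };

/* Internal implementation bound to the old structure layout. */
static int
cfg_sum_v1(const void *p)
{
	const struct cfg_v1 *c = p;

	return c->nb_queues;
}

/* Internal implementation bound to the new structure layout. */
static int
cfg_sum_v2(const void *p)
{
	const struct cfg_v2 *c = p;

	return c->nb_queues + c->nb_timers;
}

/* Table standing in for the linker's versioned-symbol lookup. */
struct versioned_sym {
	const char *version;
	int (*fn)(const void *);
};

static const struct versioned_sym cfg_sum_versions[] = {
	{ "ABI_1", cfg_sum_v1 },
	{ "ABI_2", cfg_sum_v2 },
};

/* Dispatch one public name to the implementation for the given version. */
static int
cfg_sum(const char *abi_version, const void *cfg)
{
	size_t i;
	size_t n = sizeof(cfg_sum_versions) / sizeof(cfg_sum_versions[0]);

	assert(abi_version != NULL && cfg != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(cfg_sum_versions[i].version, abi_version) == 0)
			return cfg_sum_versions[i].fn(cfg);
	return -1; /* unknown version: no matching symbol */
}
```

An application built against the old headers would keep reaching
cfg_sum_v1 with its smaller structure, while newly built applications
would reach cfg_sum_v2.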

Neil

> 
> 
> 

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 15:35  0%       ` Neil Horman
@ 2018-03-08 16:04  0%         ` Thomas Monjalon
  2018-03-08 19:40  3%           ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-08 16:04 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

08/03/2018 16:35, Neil Horman:
> On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> > 08/03/2018 12:43, Ferruh Yigit:
> > > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > > 07/03/2018 18:44, Ferruh Yigit:
> > > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > > >> config and process which has similar targets?
> > > > 
> > > > They are different targets.
> > > > Experimental API is always enabled but may be avoided by applications.
> > > > Next ABI can be used to break ABI without notice and disabled to keep
> > > > old ABI compatibility. It is almost never used because it is preferred
> > > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > > period after notice.
> > > 
> > > OK, I see.
> > > 
> > > Shouldn't we disable it by default at least? Otherwise who is not paying
> > > attention to this config option will get an ABI/API break.
> > 
> > Yes I think you are right, it can be disabled by default.
> > 
> I would agree, there seems to be overlap here, and the experimental tagging can
> cover what the NEXT_API flag is meant to do.  It can be removed I think.

It is not NEXT_API but NEXT_ABI.
Why do you think it overlaps experimental API tagging?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 15:17  0%     ` Thomas Monjalon
@ 2018-03-08 15:35  0%       ` Neil Horman
  2018-03-08 16:04  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Neil Horman @ 2018-03-08 15:35 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ferruh Yigit, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

On Thu, Mar 08, 2018 at 04:17:00PM +0100, Thomas Monjalon wrote:
> 08/03/2018 12:43, Ferruh Yigit:
> > On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > > 07/03/2018 18:44, Ferruh Yigit:
> > >> After experimental API process defined do we still need RTE_NEXT_ABI
> > >> config and process which has similar targets?
> > > 
> > > They are different targets.
> > > Experimental API is always enabled but may be avoided by applications.
> > > Next ABI can be used to break ABI without notice and disabled to keep
> > > old ABI compatibility. It is almost never used because it is preferred
> > > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > > period after notice.
> > 
> > OK, I see.
> > 
> > Shouldn't we disable it by default at least? Otherwise who is not paying
> > attention to this config option will get an ABI/API break.
> 
> Yes I think you are right, it can be disabled by default.
> 
I would agree, there seems to be overlap here, and the experimental tagging can
cover what the NEXT_API flag is meant to do.  It can be removed I think.
Neil

> 
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08 11:43  3%   ` Ferruh Yigit
@ 2018-03-08 15:17  0%     ` Thomas Monjalon
  2018-03-08 15:35  0%       ` Neil Horman
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-08 15:17 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Neil Horman, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

08/03/2018 12:43, Ferruh Yigit:
> On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> > 07/03/2018 18:44, Ferruh Yigit:
> >> After experimental API process defined do we still need RTE_NEXT_ABI
> >> config and process which has similar targets?
> > 
> > They are different targets.
> > Experimental API is always enabled but may be avoided by applications.
> > Next ABI can be used to break ABI without notice and disabled to keep
> > old ABI compatibility. It is almost never used because it is preferred
> > to keep ABI compatibility with rte_compat macros, or wait a deprecation
> > period after notice.
> 
> OK, I see.
> 
> Shouldn't we disable it by default at least? Otherwise who is not paying
> attention to this config option will get an ABI/API break.

Yes I think you are right, it can be disabled by default.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  2018-03-08 12:12  3%     ` Bruce Richardson
@ 2018-03-08 14:38  0%       ` Burakov, Anatoly
  2018-03-09 16:32  0%         ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-03-08 14:38 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

On 08-Mar-18 12:12 PM, Bruce Richardson wrote:
> On Wed, Feb 07, 2018 at 09:58:36AM +0000, Anatoly Burakov wrote:
>> During lcore scan, find maximum socket ID and store it. This will
>> break the ABI, so bump ABI version.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>> ---
>>
>> Notes:
>>      v4:
>>      - Remove backwards ABI compatibility, bump ABI instead
>>      
>>      v3:
>>      - Added ABI compatibility
>>      
>>      v2:
>>      - checkpatch changes
>>      - check socket before deciding if the core is not to be used
>>
>>   lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>>   lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>>   lib/librte_eal/common/include/rte_eal.h   |  1 +
>>   lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>>   lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>>   lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>>   6 files changed, 44 insertions(+), 15 deletions(-)
>>
> Breaking the ABI is the best way to implement this change, and given the
> deprecation was previously announced I'm ok with that.
> 
> Question: we are ok assuming that the socket numbers are sequential, or
> nearly so, and knowing the maximum socket number seen is a good
> approximation of the actual physical sockets? I know in terms of cores
> on a system, the core id's often jump - are there systems where the
> socket numbers do too?
> 
> /Bruce
> 

I am not aware of any system that would jump sockets like that. I'm open 
to corrections, however :)

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 18.05 v4] eal: add function to return number of detected sockets
  @ 2018-03-08 12:12  3%     ` Bruce Richardson
  2018-03-08 14:38  0%       ` Burakov, Anatoly
  2018-03-21  4:59  0%     ` gowrishankar muthukrishnan
  1 sibling, 1 reply; 200+ results
From: Bruce Richardson @ 2018-03-08 12:12 UTC (permalink / raw)
  To: Anatoly Burakov; +Cc: dev

On Wed, Feb 07, 2018 at 09:58:36AM +0000, Anatoly Burakov wrote:
> During lcore scan, find maximum socket ID and store it. This will
> break the ABI, so bump ABI version.
> 
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
> 
> Notes:
>     v4:
>     - Remove backwards ABI compatibility, bump ABI instead
>     
>     v3:
>     - Added ABI compatibility
>     
>     v2:
>     - checkpatch changes
>     - check socket before deciding if the core is not to be used
> 
>  lib/librte_eal/bsdapp/eal/Makefile        |  2 +-
>  lib/librte_eal/common/eal_common_lcore.c  | 37 +++++++++++++++++++++----------
>  lib/librte_eal/common/include/rte_eal.h   |  1 +
>  lib/librte_eal/common/include/rte_lcore.h |  8 +++++++
>  lib/librte_eal/linuxapp/eal/Makefile      |  2 +-
>  lib/librte_eal/rte_eal_version.map        |  9 +++++++-
>  6 files changed, 44 insertions(+), 15 deletions(-)
> 
Breaking the ABI is the best way to implement this change, and given the
deprecation was previously announced I'm ok with that.

Question: we are ok assuming that the socket numbers are sequential, or
nearly so, and knowing the maximum socket number seen is a good
approximation of the actual physical sockets? I know in terms of cores
on a system, the core id's often jump - are there systems where the
socket numbers do too?
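
To make the question concrete, here is a minimal sketch comparing the two
ways of counting. It is not the code from the patch: the function names,
the MAX_SOCKETS bound and the lcore-to-socket arrays are assumptions made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOCKETS 64 /* hypothetical upper bound for this sketch */

/* Estimate used when only the highest socket ID is stored: max ID + 1. */
static unsigned int
socket_count_by_max_id(const unsigned int *lcore_socket, size_t n)
{
	unsigned int max_id = 0;
	size_t i;

	assert(lcore_socket != NULL && n > 0);
	for (i = 0; i < n; i++)
		if (lcore_socket[i] > max_id)
			max_id = lcore_socket[i];
	return max_id + 1;
}

/* Number of distinct socket IDs that actually appear. */
static unsigned int
socket_count_distinct(const unsigned int *lcore_socket, size_t n)
{
	unsigned char seen[MAX_SOCKETS] = {0};
	unsigned int count = 0;
	size_t i;

	assert(lcore_socket != NULL && n > 0);
	for (i = 0; i < n; i++) {
		assert(lcore_socket[i] < MAX_SOCKETS);
		if (!seen[lcore_socket[i]]) {
			seen[lcore_socket[i]] = 1;
			count++;
		}
	}
	return count;
}
```

With lcores on sockets {0, 0, 1, 1} both functions agree on 2. With a gap,
for example {0, 0, 2, 2}, the max-ID estimate gives 3 while only 2 sockets
are populated, which is the case the question is asking about.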

/Bruce

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-08  8:05  5% ` Thomas Monjalon
@ 2018-03-08 11:43  3%   ` Ferruh Yigit
  2018-03-08 15:17  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-08 11:43 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Neil Horman, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

On 3/8/2018 8:05 AM, Thomas Monjalon wrote:
> 07/03/2018 18:44, Ferruh Yigit:
>> After experimental API process defined do we still need RTE_NEXT_ABI
>> config and process which has similar targets?
> 
> They are different targets.
> Experimental API is always enabled but may be avoided by applications.
> Next ABI can be used to break ABI without notice and disabled to keep
> old ABI compatibility. It is almost never used because it is preferred
> to keep ABI compatibility with rte_compat macros, or wait a deprecation
> period after notice.

OK, I see.

Shouldn't we disable it by default at least? Otherwise who is not paying
attention to this config option will get an ABI/API break.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-07 17:44 23% [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI Ferruh Yigit
  2018-03-07 18:06  0% ` Luca Boccassi
@ 2018-03-08  8:05  5% ` Thomas Monjalon
  2018-03-08 11:43  3%   ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-08  8:05 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Neil Horman, John McNamara, Marko Kovacevic, dev, Luca Boccassi,
	Christian Ehrhardt

07/03/2018 18:44, Ferruh Yigit:
> After experimental API process defined do we still need RTE_NEXT_ABI
> config and process which has similar targets?

They are different targets.
Experimental API is always enabled but may be avoided by applications.
Next ABI can be used to break ABI without notice and disabled to keep
old ABI compatibility. It is almost never used because it is preferred
to keep ABI compatibility with rte_compat macros, or wait a deprecation
period after notice.
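
As a rough sketch of the NEXT_ABI pattern described above: an ABI-breaking
change is wrapped in a compile-time switch, so building with the option
disabled keeps the old layout. The structure and accessor below are
hypothetical; only the RTE_NEXT_ABI macro name comes from the DPDK config.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure; the second field exists only in the next ABI. */
struct pkt_meta {
	uint32_t len;
#ifdef RTE_NEXT_ABI
	uint32_t flags; /* new field: changes sizeof(struct pkt_meta) */
#endif
};

/* Accessor that compiles under both settings. */
static uint32_t
pkt_meta_flags(const struct pkt_meta *m)
{
	assert(m != NULL);
#ifdef RTE_NEXT_ABI
	return m->flags;
#else
	return 0; /* old ABI: the field does not exist */
#endif
}
```

Because the structure size depends on the switch, the library and every
application using it have to be built with the same setting, which is why
the default value of the option matters.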

^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [RFC PATCH 1/5] bpf: add BPF loading and execution framework
  @ 2018-03-08  1:29  2% ` Konstantin Ananyev
    1 sibling, 0 replies; 200+ results
From: Konstantin Ananyev @ 2018-03-08  1:29 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space dpdk based applications.

Not currently supported features:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                 |   5 +
 config/common_linuxapp             |   1 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  48 ++++
 lib/librte_bpf/bpf_exec.c          | 453 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  37 +++
 lib/librte_bpf/bpf_load.c          | 344 ++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/rte_bpf.h           | 154 +++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 mk/rte.app.mk                      |   2 +
 12 files changed, 1143 insertions(+)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index ad03cf433..2205b684f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -823,3 +823,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ff98f2355..7b4a0ce7d 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -10,6 +10,7 @@ CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=y
 CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
+CONFIG_RTE_LIBRTE_BPF=y
 CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..e0f434e77
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all sources are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..4727d2251
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+
+	if (rc != 0)
+		RTE_LOG(WARNING, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..4bad0cc9e
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,453 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define BPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define BPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_LOG(ERR, USER1, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG])
+{
+	const struct bpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3],
+				reg[BPF_REG_4], reg[BPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			return reg[BPF_REG_0];
+		default:
+			RTE_LOG(ERR, USER1,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[MAX_BPF_REG];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[BPF_REG_1] = (uintptr_t)ctx[i];
+		reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
+
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..f09417088
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..84c6b9417
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,344 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+static uint32_t
+bpf_find_func(const char *sn, const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == RTE_BPF_XTYPE_FUNC &&
+				strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr *eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_LOG(ERR, USER1, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct bpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	uint32_t i, idx, fidx, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+		if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+			return -EINVAL;
+
+		idx = ofs / sizeof(ins[0]);
+		if (ins[idx].code != (BPF_JMP | BPF_CALL))
+			return -EINVAL;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		fidx = bpf_find_func(sn, prm->xsym, prm->nb_xsym);
+		if (fidx == UINT32_MAX)
+			return -EINVAL;
+
+		ins[idx].imm = fidx;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct bpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_LOG(ERR, USER1, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_LOG(ERR, USER1,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_LOG(INFO, USER1, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p;\n",
+		__func__, fname, sname, bpf);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..7c1267cbd
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy implementation for now, needs more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct bpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == BPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_LOG(ERR, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..45f622818
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	uint64_t (*func)(uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);
+	/**< value */
+};
+
+/**
+ * Possible BPF program types.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about eBPF code compiled into native ISA.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *);
+	size_t sz;
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *  Pathname for an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compiled code for given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3eb41d176..fb41c77d2 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6
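
One detail of the interpreter in this patch that is easy to get wrong is
the (BPF_LD | BPF_IMM | BPF_DW) case, which assembles a 64-bit constant
from the imm fields of two consecutive instructions. Below is a small
sketch of why each half goes through uint32_t before being widened. The
demo_* helpers are hypothetical and exist only for this illustration; the
second one shows the sign-extension mistake for comparison.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combine two signed 32-bit immediates into one 64-bit value.
 * Casting to uint32_t first prevents sign extension of a negative
 * low half from overwriting the high half.
 */
static uint64_t
demo_imm64(int32_t lo, int32_t hi)
{
	return (uint64_t)(uint32_t)lo | (uint64_t)(uint32_t)hi << 32;
}

/*
 * Buggy variant, for comparison only: the low half is sign-extended
 * to 64 bits, so a negative 'lo' sets all of the upper 32 bits.
 */
static uint64_t
demo_imm64_sign_extended(int32_t lo, int32_t hi)
{
	return (uint64_t)(int64_t)lo | (uint64_t)(uint32_t)hi << 32;
}
```

With lo = -1 and hi = 0, demo_imm64() yields 0x00000000ffffffff, while the
sign-extended variant yields 0xffffffffffffffff.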


* Re: [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
  2018-03-07 17:44 23% [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI Ferruh Yigit
@ 2018-03-07 18:06  0% ` Luca Boccassi
  2018-03-08  8:05  5% ` Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Luca Boccassi @ 2018-03-07 18:06 UTC (permalink / raw)
  To: Ferruh Yigit, Thomas Monjalon, Neil Horman, John McNamara,
	Marko Kovacevic
  Cc: dev, Christian Ehrhardt

On Wed, 2018-03-07 at 17:44 +0000, Ferruh Yigit wrote:
> After the experimental API process was defined, do we still need the
> RTE_NEXT_ABI config and process, which have similar targets?
> 
> Do distros disable experimental APIs when delivering DPDK? And is there
> any config required to control this, as RTE_NEXT_ABI was intended to do?

I tried to tinker with not exporting experimental APIs, but the
problem is intra-project dependencies. In other words: librte_foo has a
foo_experimental API that librte_bar uses, so if the librte_foo
foo_experimental symbol is not available, everything breaks down. I need
to spend a bit more time on this problem but -ENOTIME

> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>
> 
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
>  config/common_base                     |  5 -----
>  devtools/test-build.sh                 |  2 --
>  devtools/validate-abi.sh               |  1 -
>  doc/guides/contributing/versioning.rst | 10 ----------
>  mk/rte.lib.mk                          |  5 -----
>  pkg/dpdk.spec                          |  1 -
>  6 files changed, 24 deletions(-)
> 
> diff --git a/config/common_base b/config/common_base
> index ad03cf433..6b867f6a9 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -41,11 +41,6 @@ CONFIG_RTE_ARCH_STRICT_ALIGN=n
>  CONFIG_RTE_BUILD_SHARED_LIB=n
>  
>  #
> -# Use newest code breaking previous ABI
> -#
> -CONFIG_RTE_NEXT_ABI=y
> -
> -#
>  # Major ABI to overwrite library specific LIBABIVER
>  #
>  CONFIG_RTE_MAJOR_ABI=
> diff --git a/devtools/test-build.sh b/devtools/test-build.sh
> index 3362edcc5..22b4e1a98 100755
> --- a/devtools/test-build.sh
> +++ b/devtools/test-build.sh
> @@ -154,8 +154,6 @@ config () # <directory> <target> <options>
>  		# Built-in options (lowercase)
>  		! echo $3 | grep -q '+default' || \
>  		sed -ri 's,(RTE_MACHINE=")native,\1default,'
> $1/.config
> -		echo $3 | grep -q '+next' || \
> -		sed -ri           's,(NEXT_ABI=)y,\1n,' $1/.config
>  		! echo $3 | grep -q '+shared' || \
>  		sed -ri         's,(SHARED_LIB=)n,\1y,' $1/.config
>  		! echo $3 | grep -q '+debug' || ( \
> diff --git a/devtools/validate-abi.sh b/devtools/validate-abi.sh
> index 138436d93..a64edf92f 100755
> --- a/devtools/validate-abi.sh
> +++ b/devtools/validate-abi.sh
> @@ -105,7 +105,6 @@ set_log_file() {
>  fixup_config() {
>  	local conf=config/defconfig_$target
>  	cmd sed -i -e"$ a\CONFIG_RTE_BUILD_SHARED_LIB=y" $conf
> -	cmd sed -i -e"$ a\CONFIG_RTE_NEXT_ABI=n" $conf
>  	cmd sed -i -e"$ a\CONFIG_RTE_EAL_IGB_UIO=n" $conf
>  	cmd sed -i -e"$ a\CONFIG_RTE_LIBRTE_KNI=n" $conf
>  	cmd sed -i -e"$ a\CONFIG_RTE_KNI_KMOD=n" $conf
> diff --git a/doc/guides/contributing/versioning.rst
> b/doc/guides/contributing/versioning.rst
> index c495294db..59ff0e8b7 100644
> --- a/doc/guides/contributing/versioning.rst
> +++ b/doc/guides/contributing/versioning.rst
> @@ -91,19 +91,9 @@ being provided. The requirements for doing so are:
>       interest" be sought for each deprecation, for example: from NIC
> vendors,
>       CPU vendors, end-users, etc.
>  
> -#. The changes (including an alternative map file) must be gated
> with
> -   the ``RTE_NEXT_ABI`` option, and provided with a deprecation
> notice at the
> -   same time.
> -   It will become the default ABI in the next release.
> -
>  #. A full deprecation cycle, as explained above, must be made to
> offer
>     downstream consumers sufficient warning of the change.
>  
> -#. At the beginning of the next release cycle, every
> ``RTE_NEXT_ABI``
> -   conditions will be removed, the ``LIBABIVER`` variable in the
> makefile(s)
> -   where the ABI is changed will be incremented, and the map files
> will
> -   be updated.
> -
>  Note that the above process for ABI deprecation should not be
> undertaken
>  lightly. ABI stability is extremely important for downstream
> consumers of the
>  DPDK, especially when distributed in shared object form. Every
> effort should
> diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
> index c696a2174..8ac26face 100644
> --- a/mk/rte.lib.mk
> +++ b/mk/rte.lib.mk
> @@ -20,11 +20,6 @@ endif
>  ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
>  LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
>  ifeq ($(EXTLIB_BUILD),n)
> -ifeq ($(CONFIG_RTE_MAJOR_ABI),)
> -ifeq ($(CONFIG_RTE_NEXT_ABI),y)
> -LIB := $(LIB).1
> -endif
> -endif
>  CPU_LDFLAGS += --version-script=$(SRCDIR)/$(EXPORT_MAP)
>  endif
>  endif
> diff --git a/pkg/dpdk.spec b/pkg/dpdk.spec
> index 4d3b5745c..d118f0463 100644
> --- a/pkg/dpdk.spec
> +++ b/pkg/dpdk.spec
> @@ -84,7 +84,6 @@ make O=%{target} T=%{config} config
>  sed -ri 's,(RTE_MACHINE=).*,\1%{machine},' %{target}/.config
>  sed -ri 's,(RTE_APP_TEST=).*,\1n,'         %{target}/.config
>  sed -ri 's,(RTE_BUILD_SHARED_LIB=).*,\1y,' %{target}/.config
> -sed -ri 's,(RTE_NEXT_ABI=).*,\1n,'         %{target}/.config
>  sed -ri 's,(LIBRTE_VHOST=).*,\1y,'         %{target}/.config
>  sed -ri 's,(LIBRTE_PMD_PCAP=).*,\1y,'      %{target}/.config
>  make O=%{target} %{?_smp_mflags}

-- 
Kind regards,
Luca Boccassi

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] ethdev: remove versioning of ethdev filter control function
  2018-03-07 17:17  0%   ` Ferruh Yigit
@ 2018-03-07 17:47  0%     ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2018-03-07 17:47 UTC (permalink / raw)
  To: Kirill Rybalchenko, dev; +Cc: andrey.chilikin, thomas

On 3/7/2018 5:17 PM, Ferruh Yigit wrote:
> On 2/27/2018 2:18 PM, Kirill Rybalchenko wrote:
>> In the 18.02 release the ABI of the ethdev component was changed.
>> To keep compatibility with previous versions of the library,
>> versioning of the rte_eth_dev_filter_ctrl function was implemented.
>> Since a deprecation notice was issued in the 18.02 release, there is
>> no longer any need to keep compatibility with previous versions.
>> Remove the versioning of the rte_eth_dev_filter_ctrl function.
>>
>> v2:
>> Modify map file, increment library version,
>> remove deprecation notice
>>
>> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> 
> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied to dpdk-next-net/master, thanks.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI
@ 2018-03-07 17:44 23% Ferruh Yigit
  2018-03-07 18:06  0% ` Luca Boccassi
  2018-03-08  8:05  5% ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2018-03-07 17:44 UTC (permalink / raw)
  To: Thomas Monjalon, Neil Horman, John McNamara, Marko Kovacevic
  Cc: dev, Ferruh Yigit, Luca Boccassi, Christian Ehrhardt

Now that the experimental API process is defined, do we still need the
RTE_NEXT_ABI config and process, which has similar targets?

Do distros disable experimental APIs when delivering DPDK? And is there
any config required to control this, as RTE_NEXT_ABI was intended to do?

Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Thomas Monjalon <thomas@monjalon.net>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com>

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 config/common_base                     |  5 -----
 devtools/test-build.sh                 |  2 --
 devtools/validate-abi.sh               |  1 -
 doc/guides/contributing/versioning.rst | 10 ----------
 mk/rte.lib.mk                          |  5 -----
 pkg/dpdk.spec                          |  1 -
 6 files changed, 24 deletions(-)

diff --git a/config/common_base b/config/common_base
index ad03cf433..6b867f6a9 100644
--- a/config/common_base
+++ b/config/common_base
@@ -41,11 +41,6 @@ CONFIG_RTE_ARCH_STRICT_ALIGN=n
 CONFIG_RTE_BUILD_SHARED_LIB=n
 
 #
-# Use newest code breaking previous ABI
-#
-CONFIG_RTE_NEXT_ABI=y
-
-#
 # Major ABI to overwrite library specific LIBABIVER
 #
 CONFIG_RTE_MAJOR_ABI=
diff --git a/devtools/test-build.sh b/devtools/test-build.sh
index 3362edcc5..22b4e1a98 100755
--- a/devtools/test-build.sh
+++ b/devtools/test-build.sh
@@ -154,8 +154,6 @@ config () # <directory> <target> <options>
 		# Built-in options (lowercase)
 		! echo $3 | grep -q '+default' || \
 		sed -ri 's,(RTE_MACHINE=")native,\1default,' $1/.config
-		echo $3 | grep -q '+next' || \
-		sed -ri           's,(NEXT_ABI=)y,\1n,' $1/.config
 		! echo $3 | grep -q '+shared' || \
 		sed -ri         's,(SHARED_LIB=)n,\1y,' $1/.config
 		! echo $3 | grep -q '+debug' || ( \
diff --git a/devtools/validate-abi.sh b/devtools/validate-abi.sh
index 138436d93..a64edf92f 100755
--- a/devtools/validate-abi.sh
+++ b/devtools/validate-abi.sh
@@ -105,7 +105,6 @@ set_log_file() {
 fixup_config() {
 	local conf=config/defconfig_$target
 	cmd sed -i -e"$ a\CONFIG_RTE_BUILD_SHARED_LIB=y" $conf
-	cmd sed -i -e"$ a\CONFIG_RTE_NEXT_ABI=n" $conf
 	cmd sed -i -e"$ a\CONFIG_RTE_EAL_IGB_UIO=n" $conf
 	cmd sed -i -e"$ a\CONFIG_RTE_LIBRTE_KNI=n" $conf
 	cmd sed -i -e"$ a\CONFIG_RTE_KNI_KMOD=n" $conf
diff --git a/doc/guides/contributing/versioning.rst b/doc/guides/contributing/versioning.rst
index c495294db..59ff0e8b7 100644
--- a/doc/guides/contributing/versioning.rst
+++ b/doc/guides/contributing/versioning.rst
@@ -91,19 +91,9 @@ being provided. The requirements for doing so are:
      interest" be sought for each deprecation, for example: from NIC vendors,
      CPU vendors, end-users, etc.
 
-#. The changes (including an alternative map file) must be gated with
-   the ``RTE_NEXT_ABI`` option, and provided with a deprecation notice at the
-   same time.
-   It will become the default ABI in the next release.
-
 #. A full deprecation cycle, as explained above, must be made to offer
    downstream consumers sufficient warning of the change.
 
-#. At the beginning of the next release cycle, every ``RTE_NEXT_ABI``
-   conditions will be removed, the ``LIBABIVER`` variable in the makefile(s)
-   where the ABI is changed will be incremented, and the map files will
-   be updated.
-
 Note that the above process for ABI deprecation should not be undertaken
 lightly. ABI stability is extremely important for downstream consumers of the
 DPDK, especially when distributed in shared object form. Every effort should
diff --git a/mk/rte.lib.mk b/mk/rte.lib.mk
index c696a2174..8ac26face 100644
--- a/mk/rte.lib.mk
+++ b/mk/rte.lib.mk
@@ -20,11 +20,6 @@ endif
 ifeq ($(CONFIG_RTE_BUILD_SHARED_LIB),y)
 LIB := $(patsubst %.a,%.so.$(LIBABIVER),$(LIB))
 ifeq ($(EXTLIB_BUILD),n)
-ifeq ($(CONFIG_RTE_MAJOR_ABI),)
-ifeq ($(CONFIG_RTE_NEXT_ABI),y)
-LIB := $(LIB).1
-endif
-endif
 CPU_LDFLAGS += --version-script=$(SRCDIR)/$(EXPORT_MAP)
 endif
 endif
diff --git a/pkg/dpdk.spec b/pkg/dpdk.spec
index 4d3b5745c..d118f0463 100644
--- a/pkg/dpdk.spec
+++ b/pkg/dpdk.spec
@@ -84,7 +84,6 @@ make O=%{target} T=%{config} config
 sed -ri 's,(RTE_MACHINE=).*,\1%{machine},' %{target}/.config
 sed -ri 's,(RTE_APP_TEST=).*,\1n,'         %{target}/.config
 sed -ri 's,(RTE_BUILD_SHARED_LIB=).*,\1y,' %{target}/.config
-sed -ri 's,(RTE_NEXT_ABI=).*,\1n,'         %{target}/.config
 sed -ri 's,(LIBRTE_VHOST=).*,\1y,'         %{target}/.config
 sed -ri 's,(LIBRTE_PMD_PCAP=).*,\1y,'      %{target}/.config
 make O=%{target} %{?_smp_mflags}
-- 
2.13.6

^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH v2] ethdev: remove versioning of ethdev filter control function
  2018-02-27 14:18  7% ` [dpdk-dev] [PATCH v2] " Kirill Rybalchenko
@ 2018-03-07 17:17  0%   ` Ferruh Yigit
  2018-03-07 17:47  0%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-03-07 17:17 UTC (permalink / raw)
  To: Kirill Rybalchenko, dev; +Cc: andrey.chilikin, thomas

On 2/27/2018 2:18 PM, Kirill Rybalchenko wrote:
> In the 18.02 release the ABI of the ethdev component was changed.
> To keep compatibility with previous versions of the library,
> versioning of the rte_eth_dev_filter_ctrl function was implemented.
> Since a deprecation notice was issued in the 18.02 release, there is
> no longer any need to keep compatibility with previous versions.
> Remove the versioning of the rte_eth_dev_filter_ctrl function.
> 
> v2:
> Modify map file, increment library version,
> remove deprecation notice
> 
> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>

Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[relevance 0%]
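
As an illustration only (not taken from the thread): the compatibility
function removed by this patch translated between an old structure layout
with 32-bit masks and the current layout with 64-bit masks. A minimal
sketch of that kind of shim follows. All demo_* names and the mask value
are hypothetical, and the real code additionally relied on DPDK's symbol
versioning macros, which are not shown here.

```c
#include <stdint.h>

/* Hypothetical old layout: 32-bit mask, as seen by old binaries. */
struct demo_info_old {
	uint32_t flow_types_mask;
};

/* Hypothetical new layout: mask widened to 64 bits. */
struct demo_info_new {
	uint64_t flow_types_mask;
};

/* Current implementation; it only knows the new layout. */
static int demo_get_info(struct demo_info_new *info)
{
	/* Bit 32 is set to show what the old layout cannot carry. */
	info->flow_types_mask = 0x100000003ULL;
	return 0;
}

/* Compatibility entry point for callers built against the old layout. */
static int demo_get_info_old(struct demo_info_old *info)
{
	struct demo_info_new tmp;
	int ret = demo_get_info(&tmp);

	/* Truncation: bits above 31 are lost for old callers. */
	info->flow_types_mask = (uint32_t)tmp.flow_types_mask;
	return ret;
}
```

Once old binaries no longer need to be supported, only demo_get_info()
has to remain, which is the simplification the patch makes.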

* [dpdk-dev] [RFC PATCH v1 0/4] ethdev: add per-PMD tuning of RxTx parmeters
@ 2018-03-07 12:08  3% Remy Horton
  2018-03-21 14:27  3% ` [dpdk-dev] [PATCH v2 " Remy Horton
  0 siblings, 1 reply; 200+ results
From: Remy Horton @ 2018-03-07 12:08 UTC (permalink / raw)
  To: dev
  Cc: Wenzhuo Lu, Jingjing Wu, Qi Zhang, Beilei Xing, Shreyansh Jain,
	Thomas Monjalon

The optimal values of several transmission & reception related parameters,
such as burst sizes, descriptor ring sizes, and number of queues, vary
between different network interface devices. This patchset allows individual
PMDs to specify their preferred parameter values, and if so indicated by an
application, for them to be used automatically by the ethdev layer.

This RFC/V1 includes per-PMD values for e1000 and i40e but it is expected
that subsequent patchsets will cover other PMDs. A deprecation notice
covering the API/ABI change is in place.

Remy Horton (4):
  ethdev: add support for PMD-tuned Tx/Rx parameters
  net/e1000: add TxRx tuning parameters
  net/i40e: add TxRx tuning parameters
  testpmd: make use of per-PMD TxRx parameters

 app/test-pmd/testpmd.c         |  5 +++--
 drivers/net/e1000/em_ethdev.c  |  8 ++++++++
 drivers/net/i40e/i40e_ethdev.c | 35 ++++++++++++++++++++++++++++++++---
 lib/librte_ether/rte_ethdev.c  | 18 ++++++++++++++++++
 lib/librte_ether/rte_ethdev.h  | 15 +++++++++++++++
 5 files changed, 76 insertions(+), 5 deletions(-)

-- 
2.9.5

^ permalink raw reply	[relevance 3%]
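
As an illustration only (not taken from the patchset): one way a driver
preference could be applied when the application indicates it has no
value of its own. The sketch assumes that a requested size of 0 is the
signal to use the driver's preference; the cover letter does not say
which signal is used. All demo_* names and the fallback value are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-driver preferences; 0 means "no preference". */
struct demo_pmd_prefs {
	uint16_t rx_ring_size;
};

/* Assumed generic default when neither side expresses a value. */
#define DEMO_FALLBACK_RX_RING_SIZE 512

static uint16_t
demo_pick_rx_ring_size(uint16_t requested, const struct demo_pmd_prefs *prefs)
{
	if (requested != 0)
		return requested; /* the application chose explicitly */
	if (prefs != NULL && prefs->rx_ring_size != 0)
		return prefs->rx_ring_size; /* driver-tuned value */
	return DEMO_FALLBACK_RX_RING_SIZE;
}
```

An explicit request always wins in this sketch, so existing applications
that pass their own sizes would see no change in behaviour.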

* Re: [dpdk-dev] [PATCH] eal: register rte_panic user callback
  2018-03-07  9:59  0%     ` Thomas Monjalon
@ 2018-03-07 11:29  0%       ` Burakov, Anatoly
  0 siblings, 0 replies; 200+ results
From: Burakov, Anatoly @ 2018-03-07 11:29 UTC (permalink / raw)
  To: Thomas Monjalon, Arnon Warshavsky; +Cc: bruce.richardson, dev

On 07-Mar-18 9:59 AM, Thomas Monjalon wrote:
> 07/03/2018 10:05, Burakov, Anatoly:
>> On 07-Mar-18 8:32 AM, Thomas Monjalon wrote:
>>> Hi,
>>>
>>> 06/03/2018 19:28, Arnon Warshavsky:
>>>> The use case addressed here is dpdk environment init
>>>> aborting the process due to panic,
>>>> preventing the calling process from running its own tear-down actions.
>>>
>>> Thank you for working on this long standing issue.
>>>
>>>> A preferred, though ABI breaking solution would be
>>>> to have the environment init always return a value
>>>> rather than abort upon distress.
>>>
>>> Yes, it is the preferred solution.
>>> We should not use exit (panic & co) inside a library.
>>> It is important enough to break the API.
>>
>> +1, panic exists mostly for historical reasons AFAIK. it's a pity i
>> didn't think of it at the time of submitting the memory hotplug RFC,
>> because i now hit the same issue with the v1 - we might panic while
>> holding a lock, and didn't realize that it was an API break to change
>> this behavior.
>>
>> Can this really go into current release without deprecation notices?
> 
> If such an exception is made, it must be approved by the technical board.
> We need to check a few criteria:
> 	- which functions need to be changed
> 	- how the application is impacted
> 	- what is the urgency
> 
> If a panic is removed and the application is not already checking some
> error code, the execution will continue without considering the error.
> 
> Some rte_panic calls could probably be removed without any impact on
> applications.
> Some rte_panic calls could wait for 18.08 with a notice in 18.05.
> If some rte_panic calls cannot wait, they must be discussed specifically.
> 

Can we add a compile warning for adding new rte_panic's into code? It's 
a nice tool while debugging, but it probably shouldn't be in any new 
production code.

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: register rte_panic user callback
  2018-03-07  9:05  0%   ` Burakov, Anatoly
@ 2018-03-07  9:59  0%     ` Thomas Monjalon
  2018-03-07 11:29  0%       ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-07  9:59 UTC (permalink / raw)
  To: Burakov, Anatoly, Arnon Warshavsky; +Cc: bruce.richardson, dev

07/03/2018 10:05, Burakov, Anatoly:
> On 07-Mar-18 8:32 AM, Thomas Monjalon wrote:
> > Hi,
> > 
> > 06/03/2018 19:28, Arnon Warshavsky:
> >> The use case addressed here is dpdk environment init
> >> aborting the process due to panic,
> >> preventing the calling process from running its own tear-down actions.
> > 
> > Thank you for working on this long standing issue.
> > 
> >> A preferred, though ABI breaking solution would be
> >> to have the environment init always return a value
> >> rather than abort upon distress.
> > 
> > Yes, it is the preferred solution.
> > We should not use exit (panic & co) inside a library.
> > It is important enough to break the API.
> 
> +1, panic exists mostly for historical reasons AFAIK. it's a pity i 
> didn't think of it at the time of submitting the memory hotplug RFC, 
> because i now hit the same issue with the v1 - we might panic while 
> holding a lock, and didn't realize that it was an API break to change 
> this behavior.
> 
> Can this really go into current release without deprecation notices?

If such an exception is made, it must be approved by the technical board.
We need to check a few criteria:
	- which functions need to be changed
	- how the application is impacted
	- what is the urgency

If a panic is removed and the application is not already checking some
error code, the execution will continue without considering the error.

Some rte_panic calls could probably be removed without any impact on
applications.
Some rte_panic calls could wait for 18.08 with a notice in 18.05.
If some rte_panic calls cannot wait, they must be discussed specifically.

^ permalink raw reply	[relevance 0%]
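
As an illustration only (not taken from the thread): the risk described
above is that once a library function returns an error instead of
aborting, a caller that never checked the return value carries on as if
nothing happened. A minimal sketch, in which every demo_* and app_* name
is hypothetical:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical init that reports failure instead of aborting. */
static int demo_lib_init(int simulate_failure)
{
	if (simulate_failure)
		return -ENOMEM;
	return 0;
}

/* Caller written when init could only succeed or abort the process. */
static int app_start_unchecked(int simulate_failure)
{
	demo_lib_init(simulate_failure); /* return value ignored */
	return 1; /* reports "running" even though init failed */
}

/* Caller updated to handle the new error return. */
static int app_start_checked(int simulate_failure)
{
	int ret = demo_lib_init(simulate_failure);

	if (ret < 0) {
		fprintf(stderr, "init failed: %d\n", ret);
		return 0; /* not running */
	}
	return 1;
}
```

With a simulated failure, app_start_unchecked() still reports success
while app_start_checked() does not, which is why each removed panic needs
a look at how applications are affected.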

* Re: [dpdk-dev] [PATCH] eal: register rte_panic user callback
  2018-03-07  8:32  0% ` Thomas Monjalon
@ 2018-03-07  9:05  0%   ` Burakov, Anatoly
  2018-03-07  9:59  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Burakov, Anatoly @ 2018-03-07  9:05 UTC (permalink / raw)
  To: Thomas Monjalon, Arnon Warshavsky; +Cc: bruce.richardson, dev

On 07-Mar-18 8:32 AM, Thomas Monjalon wrote:
> Hi,
> 
> 06/03/2018 19:28, Arnon Warshavsky:
>> The use case addressed here is dpdk environment init
>> aborting the process due to panic,
>> preventing the calling process from running its own tear-down actions.
> 
> Thank you for working on this long standing issue.
> 
>> A preferred, though ABI breaking solution would be
>> to have the environment init always return a value
>> rather than abort upon distress.
> 
> Yes, it is the preferred solution.
> We should not use exit (panic & co) inside a library.
> It is important enough to break the API.

+1, panic exists mostly for historical reasons AFAIK. it's a pity i 
didn't think of it at the time of submitting the memory hotplug RFC, 
because i now hit the same issue with the v1 - we might panic while 
holding a lock, and didn't realize that it was an API break to change 
this behavior.

Can this really go into current release without deprecation notices?

-- 
Thanks,
Anatoly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] eal: register rte_panic user callback
  2018-03-06 18:28  3% [dpdk-dev] [PATCH] eal: register rte_panic user callback Arnon Warshavsky
@ 2018-03-07  8:32  0% ` Thomas Monjalon
  2018-03-07  9:05  0%   ` Burakov, Anatoly
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-03-07  8:32 UTC (permalink / raw)
  To: Arnon Warshavsky; +Cc: bruce.richardson, dev

Hi,

06/03/2018 19:28, Arnon Warshavsky:
> The use case addressed here is dpdk environment init
> aborting the process due to panic,
> preventing the calling process from running its own tear-down actions.

Thank you for working on this long standing issue.

> A preferred, though ABI breaking solution would be
> to have the environment init always return a value
> rather than abort upon distress.

Yes, it is the preferred solution.
We should not use exit (panic & co) inside a library.
It is important enough to break the API.
I would be in favor of accepting such breakage in 18.05.

> This patch defines a couple of callback registration functions,
> one for panic and one for exit
> in case one wishes to distinguish between these events.
> Once a callback is set and panic takes place,
> it will be called prior to calling abort.
> 
> Maiden voyage patch for Qwilt and myself.

Are you OK to visit the other side of the solution?

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] eal: register rte_panic user callback
@ 2018-03-06 18:28  3% Arnon Warshavsky
  2018-03-07  8:32  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Arnon Warshavsky @ 2018-03-06 18:28 UTC (permalink / raw)
  To: thomas, bruce.richardson; +Cc: dev, Arnon Warshavsky

The use case addressed here is dpdk environment init
aborting the process due to panic,
preventing the calling process from running its own tear-down actions.
A preferred, though ABI breaking solution would be
to have the environment init always return a value
rather than abort upon distress.

This patch defines a couple of callback registration functions,
one for panic and one for exit
in case one wishes to distinguish between these events.
Once a callback is set and panic takes place,
it will be called prior to calling abort.

Maiden voyage patch for Qwilt and myself.

Signed-off-by: Arnon Warshavsky <arnon@qwilt.com>
---
 lib/librte_eal/bsdapp/eal/eal_debug.c     | 37 ++++++++++++++++++++++++++++++
 lib/librte_eal/common/include/rte_debug.h | 24 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal_debug.c   | 38 +++++++++++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map        |  2 ++
 4 files changed, 101 insertions(+)

diff --git a/lib/librte_eal/bsdapp/eal/eal_debug.c b/lib/librte_eal/bsdapp/eal/eal_debug.c
index 5d92500..010859d 100644
--- a/lib/librte_eal/bsdapp/eal/eal_debug.c
+++ b/lib/librte_eal/bsdapp/eal/eal_debug.c
@@ -18,6 +18,39 @@
 
 #define BACKTRACE_SIZE 256
 
+/*
+ * User function pointer that, when assigned, is called
+ * during rte_exit()
+ */
+static rte_user_abort_callback_t *exit_user_callback;
+
+/*
+ * User function pointer that, when assigned, is called
+ * during rte_panic()
+ */
+static rte_user_abort_callback_t *panic_user_callback;
+
+/**
+ * Register user callback function to be called during rte_panic()
+ * Deregistration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_panic_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	panic_user_callback = cb;
+}
+
+/**
+ * Register user callback function to be called during rte_exit()
+ * Deregistration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_exit_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	exit_user_callback = cb;
+}
+
+
 /* dump the stack of the calling core */
 void rte_dump_stack(void)
 {
@@ -59,6 +92,8 @@ void __rte_panic(const char *funcname, const char *format, ...)
 	va_end(ap);
 	rte_dump_stack();
 	rte_dump_registers();
+	if (panic_user_callback)
+		(*panic_user_callback)();
 	abort();
 }
 
@@ -78,6 +113,8 @@ rte_exit(int exit_code, const char *format, ...)
 	va_start(ap, format);
 	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
 	va_end(ap);
+	if (exit_user_callback)
+		(*exit_user_callback)();
 
 #ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
 	if (rte_eal_cleanup() != 0)
diff --git a/lib/librte_eal/common/include/rte_debug.h b/lib/librte_eal/common/include/rte_debug.h
index 272df49..7e3d0a2 100644
--- a/lib/librte_eal/common/include/rte_debug.h
+++ b/lib/librte_eal/common/include/rte_debug.h
@@ -16,11 +16,35 @@
 
 #include "rte_log.h"
 #include "rte_branch_prediction.h"
+#include <rte_compat.h>
 
 #ifdef __cplusplus
 extern "C" {
 #endif
 
+
+/*
+ * Definition of user function pointer type to be called during
+ * the execution of rte_panic() or rte_exit()
+ */
+
+typedef void (*rte_user_abort_callback_t)(void);
+/**< Callback invoked before the process aborts or exits. */
+
+/**
+ * Register user callback function to be called during rte_panic()
+ * Deregistration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_panic_user_callback_register(rte_user_abort_callback_t *cb);
+
+/**
+ * Register user callback function to be called during rte_exit()
+ * Deregistration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_exit_user_callback_register(rte_user_abort_callback_t *cb);
+
 /**
  * Dump the stack of the calling core to the console.
  */
diff --git a/lib/librte_eal/linuxapp/eal/eal_debug.c b/lib/librte_eal/linuxapp/eal/eal_debug.c
index 5d92500..b1748b8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_debug.c
+++ b/lib/librte_eal/linuxapp/eal/eal_debug.c
@@ -16,8 +16,42 @@
 #include <rte_common.h>
 #include <rte_eal.h>
 
+
 #define BACKTRACE_SIZE 256
 
+/*
+ * User function pointer that, when assigned, is called
+ * during rte_exit()
+ */
+static rte_user_abort_callback_t *exit_user_callback;
+
+/*
+ * User function pointer that, when assigned, is called
+ * during rte_panic()
+ */
+static rte_user_abort_callback_t *panic_user_callback;
+
+/**
+ * Register user callback function to be called during rte_panic()
+ * Deregistration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_panic_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	panic_user_callback = cb;
+}
+
+/**
+ * Register user callback function to be called during rte_exit()
+ * Deregistration is by passing NULL as the parameter
+ */
+void __rte_experimental
+rte_exit_user_callback_register(rte_user_abort_callback_t *cb)
+{
+	exit_user_callback = cb;
+}
+
+
 /* dump the stack of the calling core */
 void rte_dump_stack(void)
 {
@@ -59,6 +93,8 @@ void __rte_panic(const char *funcname, const char *format, ...)
 	va_end(ap);
 	rte_dump_stack();
 	rte_dump_registers();
+	if (panic_user_callback)
+		(*panic_user_callback)();
 	abort();
 }
 
@@ -78,6 +114,8 @@ rte_exit(int exit_code, const char *format, ...)
 	va_start(ap, format);
 	rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
 	va_end(ap);
+	if (exit_user_callback)
+		(*exit_user_callback)();
 
 #ifndef RTE_EAL_ALWAYS_PANIC_ON_ERROR
 	if (rte_eal_cleanup() != 0)
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d123602..7b8f55d 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -221,11 +221,13 @@ EXPERIMENTAL {
 	rte_eal_hotplug_add;
 	rte_eal_hotplug_remove;
 	rte_eal_mbuf_user_pool_ops;
+	rte_exit_user_callback_register;
 	rte_mp_action_register;
 	rte_mp_action_unregister;
 	rte_mp_sendmsg;
 	rte_mp_request;
 	rte_mp_reply;
+	rte_panic_user_callback_register;
 	rte_service_attr_get;
 	rte_service_attr_reset_all;
 	rte_service_component_register;
-- 
2.7.4

^ permalink raw reply	[relevance 3%]
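
As an illustration only (not part of the patch): the mechanism described
in the commit message, a single registered hook that runs just before the
process aborts, can be modelled without DPDK as below. All demo_* and
app_* names are hypothetical. Unlike the patch, this sketch stores the
function pointer directly rather than a pointer to it, and it omits the
final abort() so that the hook can be exercised safely.

```c
#include <stddef.h>

/* Hypothetical callback type: no arguments, no return value. */
typedef void (*demo_abort_cb_t)(void);

/* Currently registered hook; NULL means none is registered. */
static demo_abort_cb_t demo_panic_cb;

/* Register a hook, or deregister by passing NULL. */
static void demo_panic_cb_register(demo_abort_cb_t cb)
{
	demo_panic_cb = cb;
}

/*
 * Run the hook the way a panic path would just before abort().
 * Returns 1 if a hook was called, 0 otherwise.
 */
static int demo_run_panic_hook(void)
{
	if (demo_panic_cb == NULL)
		return 0;
	demo_panic_cb();
	return 1;
}

/* Example application tear-down that counts its invocations. */
static int teardown_calls;

static void app_teardown(void)
{
	teardown_calls++;
}
```

A real hook would be limited in what it can safely do, since a panic can
happen at any point, for example while a lock is held.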

* [dpdk-dev] [PATCH v2] ethdev: remove versioning of ethdev filter control function
  2018-02-27 10:29  3% [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function Kirill Rybalchenko
  2018-02-27 11:01  0% ` Ferruh Yigit
@ 2018-02-27 14:18  7% ` Kirill Rybalchenko
  2018-03-07 17:17  0%   ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Kirill Rybalchenko @ 2018-02-27 14:18 UTC (permalink / raw)
  To: dev; +Cc: kirill.rybalchenko, andrey.chilikin, thomas, ferruh.yigit

In the 18.02 release the ABI of the ethdev component was changed.
To keep compatibility with previous versions of the library,
versioning of the rte_eth_dev_filter_ctrl function was implemented.
Since a deprecation notice was issued in the 18.02 release, there is
no longer any need to keep compatibility with previous versions.
Remove the versioning of the rte_eth_dev_filter_ctrl function.

v2:
Modify map file, increment library version,
remove deprecation notice

Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
---
 doc/guides/rel_notes/deprecation.rst    |   6 --
 doc/guides/rel_notes/release_18_05.rst  |   2 +-
 lib/librte_ether/Makefile               |   2 +-
 lib/librte_ether/rte_ethdev.c           | 155 +-------------------------------
 lib/librte_ether/rte_ethdev_version.map |   1 -
 5 files changed, 4 insertions(+), 162 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 74c18ed..6594585 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -149,12 +149,6 @@ Deprecation Notices
   as parameter. For consistency functions adding callback will return
   ``struct rte_eth_rxtx_callback \*`` instead of ``void \*``.
 
-* ethdev: The size of variables ``flow_types_mask`` in
-  ``rte_eth_fdir_info structure``, ``sym_hash_enable_mask`` and
-  ``valid_bit_mask`` in ``rte_eth_hash_global_conf`` structure
-  will be increased from 32 to 64 bits to fulfill hardware requirements.
-  This change will break existing ABI as size of the structures will increase.
-
 * ethdev: ``rte_eth_dev_get_sec_ctx()`` fix port id storage
   ``rte_eth_dev_get_sec_ctx()`` is using ``uint8_t`` for ``port_id``,
   which should be ``uint16_t``.
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 3923dc2..22da411 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -128,7 +128,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_cryptodev.so.4
      librte_distributor.so.1
      librte_eal.so.6
-     librte_ethdev.so.8
+   + librte_ethdev.so.9
      librte_eventdev.so.3
      librte_flow_classify.so.1
      librte_gro.so.1
diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile
index 3ca5782..c2f2f7d 100644
--- a/lib/librte_ether/Makefile
+++ b/lib/librte_ether/Makefile
@@ -16,7 +16,7 @@ LDLIBS += -lrte_mbuf
 
 EXPORT_MAP := rte_ethdev_version.map
 
-LIBABIVER := 8
+LIBABIVER := 9
 
 SRCS-y += rte_ethdev.c
 SRCS-y += rte_flow.c
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0590f0c..78b8376 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,7 +34,6 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
-#include <rte_compat.h>
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
@@ -3490,153 +3489,8 @@ rte_eth_dev_filter_supported(uint16_t port_id,
 }
 
 int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-			    enum rte_filter_type filter_type,
-			    enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-			    enum rte_filter_type filter_type,
-			    enum rte_filter_op filter_op, void *arg)
-{
-	struct rte_eth_fdir_info_v22 {
-		enum rte_fdir_mode mode;
-		struct rte_eth_fdir_masks mask;
-		struct rte_eth_fdir_flex_conf flex_conf;
-		uint32_t guarant_spc;
-		uint32_t best_spc;
-		uint32_t flow_types_mask[1];
-		uint32_t max_flexpayload;
-		uint32_t flex_payload_unit;
-		uint32_t max_flex_payload_segment_num;
-		uint16_t flex_payload_limit;
-		uint32_t flex_bitmask_unit;
-		uint32_t max_flex_bitmask_num;
-	};
-
-	struct rte_eth_hash_global_conf_v22 {
-		enum rte_eth_hash_function hash_func;
-		uint32_t sym_hash_enable_mask[1];
-		uint32_t valid_bit_mask[1];
-	};
-
-	struct rte_eth_hash_filter_info_v22 {
-		enum rte_eth_hash_filter_info_type info_type;
-		union {
-			uint8_t enable;
-			struct rte_eth_hash_global_conf_v22 global_conf;
-			struct rte_eth_input_set_conf input_set_conf;
-		} info;
-	};
-
-	struct rte_eth_dev *dev;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->filter_ctrl, -ENOTSUP);
-	if (filter_op == RTE_ETH_FILTER_INFO) {
-		int retval;
-		struct rte_eth_fdir_info_v22 *fdir_info_v22;
-		struct rte_eth_fdir_info fdir_info;
-
-		fdir_info_v22 = (struct rte_eth_fdir_info_v22 *)arg;
-
-		retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
-			  filter_op, (void *)&fdir_info);
-		fdir_info_v22->mode = fdir_info.mode;
-		fdir_info_v22->mask = fdir_info.mask;
-		fdir_info_v22->flex_conf = fdir_info.flex_conf;
-		fdir_info_v22->guarant_spc = fdir_info.guarant_spc;
-		fdir_info_v22->best_spc = fdir_info.best_spc;
-		fdir_info_v22->flow_types_mask[0] =
-			(uint32_t)fdir_info.flow_types_mask[0];
-		fdir_info_v22->max_flexpayload = fdir_info.max_flexpayload;
-		fdir_info_v22->flex_payload_unit = fdir_info.flex_payload_unit;
-		fdir_info_v22->max_flex_payload_segment_num =
-			fdir_info.max_flex_payload_segment_num;
-		fdir_info_v22->flex_payload_limit =
-			fdir_info.flex_payload_limit;
-		fdir_info_v22->flex_bitmask_unit = fdir_info.flex_bitmask_unit;
-		fdir_info_v22->max_flex_bitmask_num =
-			fdir_info.max_flex_bitmask_num;
-		return retval;
-	} else if (filter_op == RTE_ETH_FILTER_GET) {
-		int retval;
-		struct rte_eth_hash_filter_info f_info;
-		struct rte_eth_hash_filter_info_v22 *f_info_v22 =
-			(struct rte_eth_hash_filter_info_v22 *)arg;
-
-		f_info.info_type = f_info_v22->info_type;
-		retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
-			  filter_op, (void *)&f_info);
-
-		switch (f_info_v22->info_type) {
-		case RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT:
-			f_info_v22->info.enable = f_info.info.enable;
-			break;
-		case RTE_ETH_HASH_FILTER_GLOBAL_CONFIG:
-			f_info_v22->info.global_conf.hash_func =
-				f_info.info.global_conf.hash_func;
-			f_info_v22->info.global_conf.sym_hash_enable_mask[0] =
-				(uint32_t)
-				f_info.info.global_conf.sym_hash_enable_mask[0];
-			f_info_v22->info.global_conf.valid_bit_mask[0] =
-				(uint32_t)
-				f_info.info.global_conf.valid_bit_mask[0];
-			break;
-		case RTE_ETH_HASH_FILTER_INPUT_SET_SELECT:
-			f_info_v22->info.input_set_conf =
-				f_info.info.input_set_conf;
-			break;
-		default:
-			break;
-		}
-		return retval;
-	} else if (filter_op == RTE_ETH_FILTER_SET) {
-		struct rte_eth_hash_filter_info f_info;
-		struct rte_eth_hash_filter_info_v22 *f_v22 =
-			(struct rte_eth_hash_filter_info_v22 *)arg;
-
-		f_info.info_type = f_v22->info_type;
-		switch (f_v22->info_type) {
-		case RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT:
-			f_info.info.enable = f_v22->info.enable;
-			break;
-		case RTE_ETH_HASH_FILTER_GLOBAL_CONFIG:
-			f_info.info.global_conf.hash_func =
-				f_v22->info.global_conf.hash_func;
-			f_info.info.global_conf.sym_hash_enable_mask[0] =
-				(uint32_t)
-				f_v22->info.global_conf.sym_hash_enable_mask[0];
-			f_info.info.global_conf.valid_bit_mask[0] =
-				(uint32_t)
-				f_v22->info.global_conf.valid_bit_mask[0];
-			break;
-		case RTE_ETH_HASH_FILTER_INPUT_SET_SELECT:
-			f_info.info.input_set_conf =
-				f_v22->info.input_set_conf;
-			break;
-		default:
-			break;
-		}
-		return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op,
-						    (void *)&f_info);
-	} else
-		return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op,
-						    arg);
-}
-VERSION_SYMBOL(rte_eth_dev_filter_ctrl, _v22, 2.2);
-
-int
-rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
-			      enum rte_filter_type filter_type,
-			      enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
-			      enum rte_filter_type filter_type,
-			      enum rte_filter_op filter_op, void *arg)
+rte_eth_dev_filter_ctrl(uint16_t port_id, enum rte_filter_type filter_type,
+			enum rte_filter_op filter_op, void *arg)
 {
 	struct rte_eth_dev *dev;
 
@@ -3647,11 +3501,6 @@ rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
 	return eth_err(port_id, (*dev->dev_ops->filter_ctrl)(dev, filter_type,
 							     filter_op, arg));
 }
-BIND_DEFAULT_SYMBOL(rte_eth_dev_filter_ctrl, _v1802, 18.02);
-MAP_STATIC_SYMBOL(int rte_eth_dev_filter_ctrl(uint16_t port_id,
-		  enum rte_filter_type filter_type,
-		  enum rte_filter_op filter_op, void *arg),
-		  rte_eth_dev_filter_ctrl_v1802);
 
 void *
 rte_eth_add_rx_callback(uint16_t port_id, uint16_t queue_id,
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index 87f02fb..34df6c8 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -16,7 +16,6 @@ DPDK_2.2 {
 	rte_eth_dev_count;
 	rte_eth_dev_default_mac_addr_set;
 	rte_eth_dev_detach;
-	rte_eth_dev_filter_ctrl;
 	rte_eth_dev_filter_supported;
 	rte_eth_dev_flow_ctrl_get;
 	rte_eth_dev_flow_ctrl_set;
-- 
2.5.5

* Re: [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function
  2018-02-27 11:01  0% ` Ferruh Yigit
@ 2018-02-27 13:45  3%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-27 13:45 UTC (permalink / raw)
  To: Ferruh Yigit, Kirill Rybalchenko; +Cc: dev, andrey.chilikin

27/02/2018 12:01, Ferruh Yigit:
> On 2/27/2018 10:29 AM, Kirill Rybalchenko wrote:
> > In the 18.02 release the ABI of the ethdev component was changed.
> > To keep compatibility with previous versions of the library,
> > versioning of the rte_eth_dev_filter_ctrl function was implemented.
> > Since a deprecation notice was issued in the 18.02 release, there is
> > no longer any need to keep compatibility with previous versions.
> > Remove the versioning of the rte_eth_dev_filter_ctrl function.
> > 
> > Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> > ---
> >  lib/librte_ether/rte_ethdev.c | 155 +-----------------------------------------
> 
> Hi Kirill,
> 
> You need to update the .map file and remove the deprecation notice in this patch.

And bump the ABI version in the Makefile and the release notes.

* Re: [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function
  2018-02-27 10:29  3% [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function Kirill Rybalchenko
@ 2018-02-27 11:01  0% ` Ferruh Yigit
  2018-02-27 13:45  3%   ` Thomas Monjalon
  2018-02-27 14:18  7% ` [dpdk-dev] [PATCH v2] " Kirill Rybalchenko
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2018-02-27 11:01 UTC (permalink / raw)
  To: Kirill Rybalchenko, dev; +Cc: andrey.chilikin, thomas

On 2/27/2018 10:29 AM, Kirill Rybalchenko wrote:
> In the 18.02 release the ABI of the ethdev component was changed.
> To keep compatibility with previous versions of the library,
> versioning of the rte_eth_dev_filter_ctrl function was implemented.
> Since a deprecation notice was issued in the 18.02 release, there is
> no longer any need to keep compatibility with previous versions.
> Remove the versioning of the rte_eth_dev_filter_ctrl function.
> 
> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> ---
>  lib/librte_ether/rte_ethdev.c | 155 +-----------------------------------------

Hi Kirill,

You need to update the .map file and remove the deprecation notice in this patch.

Thanks,
ferruh

* [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function
@ 2018-02-27 10:29  3% Kirill Rybalchenko
  2018-02-27 11:01  0% ` Ferruh Yigit
  2018-02-27 14:18  7% ` [dpdk-dev] [PATCH v2] " Kirill Rybalchenko
  0 siblings, 2 replies; 200+ results
From: Kirill Rybalchenko @ 2018-02-27 10:29 UTC (permalink / raw)
  To: dev; +Cc: kirill.rybalchenko, andrey.chilikin, thomas, ferruh.yigit

In the 18.02 release the ABI of the ethdev component was changed.
To keep compatibility with previous versions of the library,
versioning of the rte_eth_dev_filter_ctrl function was implemented.
Since a deprecation notice was issued in the 18.02 release, there is
no longer any need to keep compatibility with previous versions.
Remove the versioning of the rte_eth_dev_filter_ctrl function.

Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
---
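Note for reviewers: the removed _v22 entry point is a layout-conversion
shim. It calls the driver with the current structures and then copies the
result field by field into the structures an old application was built
against. The following is only a minimal sketch of that pattern, not DPDK
code. The names info_v22, info_new, get_info_new and get_info_v22 are
hypothetical, and the 64-bit width of the new mask is an assumption inferred
from the (uint32_t) casts in the removed code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old layout: what a binary built against the old ABI passes in. */
struct info_v22 {
	uint32_t guarant_spc;
	uint32_t flow_types_mask[1];
	uint32_t max_flexpayload;
};

/* Hypothetical new layout: the mask is wider, so later offsets move. */
struct info_new {
	uint32_t guarant_spc;
	uint64_t flow_types_mask[1];
	uint32_t max_flexpayload;
};

/* Stand-in for the driver callback, which only knows the new layout. */
static int
get_info_new(struct info_new *out)
{
	out->guarant_spc = 512;
	out->flow_types_mask[0] = 0x1234ABCDULL | (1ULL << 40);
	out->max_flexpayload = 16;
	return 0;
}

/*
 * Compat entry point: run the new implementation on a local struct,
 * then copy each field into the caller's old-layout struct. The mask
 * is truncated to the 32 bits the old layout can hold.
 */
static int
get_info_v22(struct info_v22 *out)
{
	struct info_new tmp;
	int ret;

	memset(&tmp, 0, sizeof(tmp));
	ret = get_info_new(&tmp);
	out->guarant_spc = tmp.guarant_spc;
	out->flow_types_mask[0] = (uint32_t)tmp.flow_types_mask[0];
	out->max_flexpayload = tmp.max_flexpayload;
	return ret;
}
```

Passing the old struct straight to the new implementation would write past
the fields the caller allocated, which is why the shim copies through a local
variable. Once the old ABI no longer has to be served, the shim and its
symbol version can go.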
 lib/librte_ether/rte_ethdev.c | 155 +-----------------------------------------
 1 file changed, 2 insertions(+), 153 deletions(-)

diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 0590f0c..78b8376 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -34,7 +34,6 @@
 #include <rte_errno.h>
 #include <rte_spinlock.h>
 #include <rte_string_fns.h>
-#include <rte_compat.h>
 
 #include "rte_ether.h"
 #include "rte_ethdev.h"
@@ -3490,153 +3489,8 @@ rte_eth_dev_filter_supported(uint16_t port_id,
 }
 
 int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-			    enum rte_filter_type filter_type,
-			    enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v22(uint16_t port_id,
-			    enum rte_filter_type filter_type,
-			    enum rte_filter_op filter_op, void *arg)
-{
-	struct rte_eth_fdir_info_v22 {
-		enum rte_fdir_mode mode;
-		struct rte_eth_fdir_masks mask;
-		struct rte_eth_fdir_flex_conf flex_conf;
-		uint32_t guarant_spc;
-		uint32_t best_spc;
-		uint32_t flow_types_mask[1];
-		uint32_t max_flexpayload;
-		uint32_t flex_payload_unit;
-		uint32_t max_flex_payload_segment_num;
-		uint16_t flex_payload_limit;
-		uint32_t flex_bitmask_unit;
-		uint32_t max_flex_bitmask_num;
-	};
-
-	struct rte_eth_hash_global_conf_v22 {
-		enum rte_eth_hash_function hash_func;
-		uint32_t sym_hash_enable_mask[1];
-		uint32_t valid_bit_mask[1];
-	};
-
-	struct rte_eth_hash_filter_info_v22 {
-		enum rte_eth_hash_filter_info_type info_type;
-		union {
-			uint8_t enable;
-			struct rte_eth_hash_global_conf_v22 global_conf;
-			struct rte_eth_input_set_conf input_set_conf;
-		} info;
-	};
-
-	struct rte_eth_dev *dev;
-
-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-
-	dev = &rte_eth_devices[port_id];
-	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->filter_ctrl, -ENOTSUP);
-	if (filter_op == RTE_ETH_FILTER_INFO) {
-		int retval;
-		struct rte_eth_fdir_info_v22 *fdir_info_v22;
-		struct rte_eth_fdir_info fdir_info;
-
-		fdir_info_v22 = (struct rte_eth_fdir_info_v22 *)arg;
-
-		retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
-			  filter_op, (void *)&fdir_info);
-		fdir_info_v22->mode = fdir_info.mode;
-		fdir_info_v22->mask = fdir_info.mask;
-		fdir_info_v22->flex_conf = fdir_info.flex_conf;
-		fdir_info_v22->guarant_spc = fdir_info.guarant_spc;
-		fdir_info_v22->best_spc = fdir_info.best_spc;
-		fdir_info_v22->flow_types_mask[0] =
-			(uint32_t)fdir_info.flow_types_mask[0];
-		fdir_info_v22->max_flexpayload = fdir_info.max_flexpayload;
-		fdir_info_v22->flex_payload_unit = fdir_info.flex_payload_unit;
-		fdir_info_v22->max_flex_payload_segment_num =
-			fdir_info.max_flex_payload_segment_num;
-		fdir_info_v22->flex_payload_limit =
-			fdir_info.flex_payload_limit;
-		fdir_info_v22->flex_bitmask_unit = fdir_info.flex_bitmask_unit;
-		fdir_info_v22->max_flex_bitmask_num =
-			fdir_info.max_flex_bitmask_num;
-		return retval;
-	} else if (filter_op == RTE_ETH_FILTER_GET) {
-		int retval;
-		struct rte_eth_hash_filter_info f_info;
-		struct rte_eth_hash_filter_info_v22 *f_info_v22 =
-			(struct rte_eth_hash_filter_info_v22 *)arg;
-
-		f_info.info_type = f_info_v22->info_type;
-		retval = (*dev->dev_ops->filter_ctrl)(dev, filter_type,
-			  filter_op, (void *)&f_info);
-
-		switch (f_info_v22->info_type) {
-		case RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT:
-			f_info_v22->info.enable = f_info.info.enable;
-			break;
-		case RTE_ETH_HASH_FILTER_GLOBAL_CONFIG:
-			f_info_v22->info.global_conf.hash_func =
-				f_info.info.global_conf.hash_func;
-			f_info_v22->info.global_conf.sym_hash_enable_mask[0] =
-				(uint32_t)
-				f_info.info.global_conf.sym_hash_enable_mask[0];
-			f_info_v22->info.global_conf.valid_bit_mask[0] =
-				(uint32_t)
-				f_info.info.global_conf.valid_bit_mask[0];
-			break;
-		case RTE_ETH_HASH_FILTER_INPUT_SET_SELECT:
-			f_info_v22->info.input_set_conf =
-				f_info.info.input_set_conf;
-			break;
-		default:
-			break;
-		}
-		return retval;
-	} else if (filter_op == RTE_ETH_FILTER_SET) {
-		struct rte_eth_hash_filter_info f_info;
-		struct rte_eth_hash_filter_info_v22 *f_v22 =
-			(struct rte_eth_hash_filter_info_v22 *)arg;
-
-		f_info.info_type = f_v22->info_type;
-		switch (f_v22->info_type) {
-		case RTE_ETH_HASH_FILTER_SYM_HASH_ENA_PER_PORT:
-			f_info.info.enable = f_v22->info.enable;
-			break;
-		case RTE_ETH_HASH_FILTER_GLOBAL_CONFIG:
-			f_info.info.global_conf.hash_func =
-				f_v22->info.global_conf.hash_func;
-			f_info.info.global_conf.sym_hash_enable_mask[0] =
-				(uint32_t)
-				f_v22->info.global_conf.sym_hash_enable_mask[0];
-			f_info.info.global_conf.valid_bit_mask[0] =
-				(uint32_t)
-				f_v22->info.global_conf.valid_bit_mask[0];
-			break;
-		case RTE_ETH_HASH_FILTER_INPUT_SET_SELECT:
-			f_info.info.input_set_conf =
-				f_v22->info.input_set_conf;
-			break;
-		default:
-			break;
-		}
-		return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op,
-						    (void *)&f_info);
-	} else
-		return (*dev->dev_ops->filter_ctrl)(dev, filter_type, filter_op,
-						    arg);
-}
-VERSION_SYMBOL(rte_eth_dev_filter_ctrl, _v22, 2.2);
-
-int
-rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
-			      enum rte_filter_type filter_type,
-			      enum rte_filter_op filter_op, void *arg);
-
-int
-rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
-			      enum rte_filter_type filter_type,
-			      enum rte_filter_op filter_op, void *arg)
+rte_eth_dev_filter_ctrl(uint16_t port_id, enum rte_filter_type filter_type,
+			enum rte_filter_op filter_op, void *arg)
 {
 	struct rte_eth_dev *dev;
 
@@ -3647,11 +3501,6 @@ rte_eth_dev_filter_ctrl_v1802(uint16_t port_id,
 	return eth_err(port_id, (*dev->dev_ops->filter_ctrl)(dev, filter_type,
 							     filter_op, arg));
 }
-BIND_DEFAULT_SYMBOL(rte_eth_dev_filter_ctrl, _v1802, 18.02);
-MAP_STATIC_SYMBOL(int rte_eth_dev_filter_ctrl(uint16_t port_id,
-		  enum rte_filter_type filter_type,
-		  enum rte_filter_op filter_op, void *arg),
-		  rte_eth_dev_filter_ctrl_v1802);
 
 void *
 rte_eth_add_rx_callback(uint16_t port_id, uint16_t queue_id,
-- 
2.5.5

* [dpdk-dev] [PATCH v2 1/7] crypto/virtio: add virtio related fundamental functions
  @ 2018-02-24 13:14  2% ` Jay Zhou
  0 siblings, 0 replies; 200+ results
From: Jay Zhou @ 2018-02-24 13:14 UTC (permalink / raw)
  To: dev
  Cc: pablo.de.lara.guarch, roy.fan.zhang, thomas, arei.gonglei,
	xin.zeng, weidong.huang, wangxinxin.wang, longpeng2,
	jianjay.zhou

Since there is no common virtio library, we have to put these files
here. They are basically the same as the virtio net related files,
with some minor changes.

Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
---
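Note for reviewers: two pieces of arithmetic in virtio_pci.c can be checked
in isolation, namely the 16TB limit in check_vq_phys_addr_ok() and the lo/hi
split in io_write64_twopart(). The following is a minimal standalone sketch,
not the driver code. The helper names ring_fits_32bit_pfn and split64 are
hypothetical, and the shift of 12 (4KB pages) is an assumption about
VIRTIO_PCI_QUEUE_ADDR_SHIFT, which is defined outside the lines quoted here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4KB pages, so a page frame number is the address >> 12. */
#define QUEUE_ADDR_SHIFT 12

/*
 * Return 1 if the last byte of the ring has a page frame number that
 * fits in 32 bits, 0 otherwise. With a shift of 12 the limit is
 * 2^(12 + 32) bytes, i.e. 16TB.
 */
static int
ring_fits_32bit_pfn(uint64_t ring_mem, uint64_t ring_size)
{
	return ((ring_mem + ring_size - 1) >> (QUEUE_ADDR_SHIFT + 32)) == 0;
}

/*
 * Split a 64-bit value into the two 32-bit halves that would be
 * written to a pair of *_lo / *_hi registers.
 */
static void
split64(uint64_t val, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)(val & ((1ULL << 32) - 1));
	*hi = (uint32_t)(val >> 32);
}
```

A ring ending exactly at the 16TB boundary still passes the check, and one
byte more fails it, because the check tests the address of the last byte
rather than the end address.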
 config/common_base                  |  20 ++
 drivers/crypto/virtio/virtio_logs.h |  47 ++++
 drivers/crypto/virtio/virtio_pci.c  | 460 ++++++++++++++++++++++++++++++++++++
 drivers/crypto/virtio/virtio_pci.h  | 252 ++++++++++++++++++++
 drivers/crypto/virtio/virtio_ring.h | 137 +++++++++++
 drivers/crypto/virtio/virtqueue.c   |  43 ++++
 drivers/crypto/virtio/virtqueue.h   | 176 ++++++++++++++
 7 files changed, 1135 insertions(+)
 create mode 100644 drivers/crypto/virtio/virtio_logs.h
 create mode 100644 drivers/crypto/virtio/virtio_pci.c
 create mode 100644 drivers/crypto/virtio/virtio_pci.h
 create mode 100644 drivers/crypto/virtio/virtio_ring.h
 create mode 100644 drivers/crypto/virtio/virtqueue.c
 create mode 100644 drivers/crypto/virtio/virtqueue.h

diff --git a/config/common_base b/config/common_base
index ad03cf4..19d0cdd 100644
--- a/config/common_base
+++ b/config/common_base
@@ -482,6 +482,26 @@ CONFIG_RTE_LIBRTE_PMD_QAT_DEBUG_DRIVER=n
 CONFIG_RTE_QAT_PMD_MAX_NB_SESSIONS=2048
 
 #
+# Compile PMD for virtio crypto devices
+#
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_INIT=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_SESSION=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_TX=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_RX=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DRIVER=n
+CONFIG_RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DUMP=n
+#
+# Number of maximum virtio crypto devices
+#
+CONFIG_RTE_MAX_VIRTIO_CRYPTO=32
+#
+# Number of sessions to create in the session memory pool
+# on a single virtio crypto device.
+#
+CONFIG_RTE_VIRTIO_CRYPTO_PMD_MAX_NB_SESSIONS=1024
+
+#
 # Compile PMD for AESNI backed device
 #
 CONFIG_RTE_LIBRTE_PMD_AESNI_MB=n
diff --git a/drivers/crypto/virtio/virtio_logs.h b/drivers/crypto/virtio/virtio_logs.h
new file mode 100644
index 0000000..20582a4
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_logs.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_LOGS_H_
+#define _VIRTIO_LOGS_H_
+
+#include <rte_log.h>
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_INIT
+#define PMD_INIT_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): " fmt "\n", __func__, ## args)
+#define PMD_INIT_FUNC_TRACE() PMD_INIT_LOG(DEBUG, " >>")
+#else
+#define PMD_INIT_LOG(level, fmt, args...) do { } while (0)
+#define PMD_INIT_FUNC_TRACE() do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_SESSION
+#define PMD_SESSION_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() session: " fmt "\n", __func__, ## args)
+#else
+#define PMD_SESSION_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_RX
+#define PMD_RX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() rx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_RX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_TX
+#define PMD_TX_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s() tx: " fmt "\n", __func__, ## args)
+#else
+#define PMD_TX_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DRIVER
+#define PMD_DRV_LOG(level, fmt, args...) \
+	RTE_LOG(level, PMD, "%s(): driver " fmt "\n", __func__, ## args)
+#else
+#define PMD_DRV_LOG(level, fmt, args...) do { } while (0)
+#endif
+
+#endif /* _VIRTIO_LOGS_H_ */
diff --git a/drivers/crypto/virtio/virtio_pci.c b/drivers/crypto/virtio/virtio_pci.c
new file mode 100644
index 0000000..7aa5cdd
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.c
@@ -0,0 +1,460 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#ifdef RTE_EXEC_ENV_LINUXAPP
+ #include <dirent.h>
+ #include <fcntl.h>
+#endif
+
+#include <rte_io.h>
+#include <rte_bus.h>
+
+#include "virtio_pci.h"
+#include "virtqueue.h"
+
+/*
+ * Following macros are derived from linux/pci_regs.h, however,
+ * we can't simply include that header here, as there is no such
+ * file for non-Linux platform.
+ */
+#define PCI_CAPABILITY_LIST	0x34
+#define PCI_CAP_ID_VNDR		0x09
+#define PCI_CAP_ID_MSIX		0x11
+
+/*
+ * The remaining space is defined by each driver as the per-driver
+ * configuration space.
+ */
+#define VIRTIO_PCI_CONFIG(hw) \
+		(((hw)->use_msix == VIRTIO_MSIX_ENABLED) ? 24 : 20)
+
+static inline int
+check_vq_phys_addr_ok(struct virtqueue *vq)
+{
+	/* Virtio PCI device VIRTIO_PCI_QUEUE_PFN register is 32bit,
+	 * and only accepts 32 bit page frame number.
+	 * Check if the allocated physical memory exceeds 16TB.
+	 */
+	if ((vq->vq_ring_mem + vq->vq_ring_size - 1) >>
+			(VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
+		PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
+		return 0;
+	}
+
+	return 1;
+}
+
+static inline void
+io_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
+{
+	rte_write32(val & ((1ULL << 32) - 1), lo);
+	rte_write32(val >> 32,		     hi);
+}
+
+static void
+modern_read_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+		       void *dst, int length)
+{
+	int i;
+	uint8_t *p;
+	uint8_t old_gen, new_gen;
+
+	do {
+		old_gen = rte_read8(&hw->common_cfg->config_generation);
+
+		p = dst;
+		for (i = 0;  i < length; i++)
+			*p++ = rte_read8((uint8_t *)hw->dev_cfg + offset + i);
+
+		new_gen = rte_read8(&hw->common_cfg->config_generation);
+	} while (old_gen != new_gen);
+}
+
+static void
+modern_write_dev_config(struct virtio_crypto_hw *hw, size_t offset,
+			const void *src, int length)
+{
+	int i;
+	const uint8_t *p = src;
+
+	for (i = 0;  i < length; i++)
+		rte_write8((*p++), (((uint8_t *)hw->dev_cfg) + offset + i));
+}
+
+static uint64_t
+modern_get_features(struct virtio_crypto_hw *hw)
+{
+	uint32_t features_lo, features_hi;
+
+	rte_write32(0, &hw->common_cfg->device_feature_select);
+	features_lo = rte_read32(&hw->common_cfg->device_feature);
+
+	rte_write32(1, &hw->common_cfg->device_feature_select);
+	features_hi = rte_read32(&hw->common_cfg->device_feature);
+
+	return ((uint64_t)features_hi << 32) | features_lo;
+}
+
+static void
+modern_set_features(struct virtio_crypto_hw *hw, uint64_t features)
+{
+	rte_write32(0, &hw->common_cfg->guest_feature_select);
+	rte_write32(features & ((1ULL << 32) - 1),
+		    &hw->common_cfg->guest_feature);
+
+	rte_write32(1, &hw->common_cfg->guest_feature_select);
+	rte_write32(features >> 32,
+		    &hw->common_cfg->guest_feature);
+}
+
+static uint8_t
+modern_get_status(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(&hw->common_cfg->device_status);
+}
+
+static void
+modern_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	rte_write8(status, &hw->common_cfg->device_status);
+}
+
+static void
+modern_reset(struct virtio_crypto_hw *hw)
+{
+	modern_set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	modern_get_status(hw);
+}
+
+static uint8_t
+modern_get_isr(struct virtio_crypto_hw *hw)
+{
+	return rte_read8(hw->isr);
+}
+
+static uint16_t
+modern_set_config_irq(struct virtio_crypto_hw *hw, uint16_t vec)
+{
+	rte_write16(vec, &hw->common_cfg->msix_config);
+	return rte_read16(&hw->common_cfg->msix_config);
+}
+
+static uint16_t
+modern_set_queue_irq(struct virtio_crypto_hw *hw, struct virtqueue *vq,
+		uint16_t vec)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+	rte_write16(vec, &hw->common_cfg->queue_msix_vector);
+	return rte_read16(&hw->common_cfg->queue_msix_vector);
+}
+
+static uint16_t
+modern_get_queue_num(struct virtio_crypto_hw *hw, uint16_t queue_id)
+{
+	rte_write16(queue_id, &hw->common_cfg->queue_select);
+	return rte_read16(&hw->common_cfg->queue_size);
+}
+
+static int
+modern_setup_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	uint64_t desc_addr, avail_addr, used_addr;
+	uint16_t notify_off;
+
+	if (!check_vq_phys_addr_ok(vq))
+		return -1;
+
+	desc_addr = vq->vq_ring_mem;
+	avail_addr = desc_addr + vq->vq_nentries * sizeof(struct vring_desc);
+	used_addr = RTE_ALIGN_CEIL(avail_addr + offsetof(struct vring_avail,
+							 ring[vq->vq_nentries]),
+				   VIRTIO_PCI_VRING_ALIGN);
+
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(desc_addr, &hw->common_cfg->queue_desc_lo,
+				      &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(avail_addr, &hw->common_cfg->queue_avail_lo,
+				       &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(used_addr, &hw->common_cfg->queue_used_lo,
+				      &hw->common_cfg->queue_used_hi);
+
+	notify_off = rte_read16(&hw->common_cfg->queue_notify_off);
+	vq->notify_addr = (void *)((uint8_t *)hw->notify_base +
+				notify_off * hw->notify_off_multiplier);
+
+	rte_write16(1, &hw->common_cfg->queue_enable);
+
+	PMD_INIT_LOG(DEBUG, "queue %u addresses:", vq->vq_queue_index);
+	PMD_INIT_LOG(DEBUG, "\t desc_addr: %" PRIx64, desc_addr);
+	PMD_INIT_LOG(DEBUG, "\t avail_addr: %" PRIx64, avail_addr);
+	PMD_INIT_LOG(DEBUG, "\t used_addr: %" PRIx64, used_addr);
+	PMD_INIT_LOG(DEBUG, "\t notify addr: %p (notify offset: %u)",
+		vq->notify_addr, notify_off);
+
+	return 0;
+}
+
+static void
+modern_del_queue(struct virtio_crypto_hw *hw, struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, &hw->common_cfg->queue_select);
+
+	io_write64_twopart(0, &hw->common_cfg->queue_desc_lo,
+				  &hw->common_cfg->queue_desc_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_avail_lo,
+				  &hw->common_cfg->queue_avail_hi);
+	io_write64_twopart(0, &hw->common_cfg->queue_used_lo,
+				  &hw->common_cfg->queue_used_hi);
+
+	rte_write16(0, &hw->common_cfg->queue_enable);
+}
+
+static void
+modern_notify_queue(struct virtio_crypto_hw *hw __rte_unused,
+		struct virtqueue *vq)
+{
+	rte_write16(vq->vq_queue_index, vq->notify_addr);
+}
+
+const struct virtio_pci_ops virtio_crypto_modern_ops = {
+	.read_dev_cfg	= modern_read_dev_config,
+	.write_dev_cfg	= modern_write_dev_config,
+	.reset		= modern_reset,
+	.get_status	= modern_get_status,
+	.set_status	= modern_set_status,
+	.get_features	= modern_get_features,
+	.set_features	= modern_set_features,
+	.get_isr	= modern_get_isr,
+	.set_config_irq	= modern_set_config_irq,
+	.set_queue_irq  = modern_set_queue_irq,
+	.get_queue_num	= modern_get_queue_num,
+	.setup_queue	= modern_setup_queue,
+	.del_queue	= modern_del_queue,
+	.notify_queue	= modern_notify_queue,
+};
+
+void
+vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		void *dst, int length)
+{
+	VTPCI_OPS(hw)->read_dev_cfg(hw, offset, dst, length);
+}
+
+void
+vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+		const void *src, int length)
+{
+	VTPCI_OPS(hw)->write_dev_cfg(hw, offset, src, length);
+}
+
+uint64_t
+vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+		uint64_t host_features)
+{
+	uint64_t features;
+
+	/*
+	 * Limit negotiated features to what the driver, virtqueue, and
+	 * host all support.
+	 */
+	features = host_features & hw->guest_features;
+	VTPCI_OPS(hw)->set_features(hw, features);
+
+	return features;
+}
+
+void
+vtpci_cryptodev_reset(struct virtio_crypto_hw *hw)
+{
+	VTPCI_OPS(hw)->set_status(hw, VIRTIO_CONFIG_STATUS_RESET);
+	/* flush status write */
+	VTPCI_OPS(hw)->get_status(hw);
+}
+
+void
+vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw)
+{
+	vtpci_cryptodev_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER_OK);
+}
+
+void
+vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status)
+{
+	if (status != VIRTIO_CONFIG_STATUS_RESET)
+		status |= VTPCI_OPS(hw)->get_status(hw);
+
+	VTPCI_OPS(hw)->set_status(hw, status);
+}
+
+uint8_t
+vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_status(hw);
+}
+
+uint8_t
+vtpci_cryptodev_isr(struct virtio_crypto_hw *hw)
+{
+	return VTPCI_OPS(hw)->get_isr(hw);
+}
+
+static void *
+get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
+{
+	uint8_t  bar    = cap->bar;
+	uint32_t length = cap->length;
+	uint32_t offset = cap->offset;
+	uint8_t *base;
+
+	if (bar >= PCI_MAX_RESOURCE) {
+		PMD_INIT_LOG(ERR, "invalid bar: %u", bar);
+		return NULL;
+	}
+
+	if (offset + length < offset) {
+		PMD_INIT_LOG(ERR, "offset(%u) + length(%u) overflows",
+			offset, length);
+		return NULL;
+	}
+
+	if (offset + length > dev->mem_resource[bar].len) {
+		PMD_INIT_LOG(ERR,
+			"invalid cap: overflows bar space: %u > %" PRIu64,
+			offset + length, dev->mem_resource[bar].len);
+		return NULL;
+	}
+
+	base = dev->mem_resource[bar].addr;
+	if (base == NULL) {
+		PMD_INIT_LOG(ERR, "bar %u base addr is NULL", bar);
+		return NULL;
+	}
+
+	return base + offset;
+}
+
+#define PCI_MSIX_ENABLE 0x8000
+
+static int
+virtio_read_caps(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	uint8_t pos;
+	struct virtio_pci_cap cap;
+	int ret;
+
+	if (rte_pci_map_device(dev)) {
+		PMD_INIT_LOG(DEBUG, "failed to map pci device!");
+		return -1;
+	}
+
+	ret = rte_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
+	if (ret < 0) {
+		PMD_INIT_LOG(DEBUG, "failed to read pci capability list");
+		return -1;
+	}
+
+	while (pos) {
+		ret = rte_pci_read_config(dev, &cap, sizeof(cap), pos);
+		if (ret < 0) {
+			PMD_INIT_LOG(ERR,
+				"failed to read pci cap at pos: %x", pos);
+			break;
+		}
+
+		if (cap.cap_vndr == PCI_CAP_ID_MSIX) {
+			/* Transitional devices would also have this capability,
+			 * that's why we also check if msix is enabled.
+			 * 1st byte is cap ID; 2nd byte is the position of next
+			 * cap; next two bytes are the flags.
+			 */
+			uint16_t flags = ((uint16_t *)&cap)[1];
+
+			if (flags & PCI_MSIX_ENABLE)
+				hw->use_msix = VIRTIO_MSIX_ENABLED;
+			else
+				hw->use_msix = VIRTIO_MSIX_DISABLED;
+		}
+
+		if (cap.cap_vndr != PCI_CAP_ID_VNDR) {
+			PMD_INIT_LOG(DEBUG,
+				"[%2x] skipping non VNDR cap id: %02x",
+				pos, cap.cap_vndr);
+			goto next;
+		}
+
+		PMD_INIT_LOG(DEBUG,
+			"[%2x] cfg type: %u, bar: %u, offset: %04x, len: %u",
+			pos, cap.cfg_type, cap.bar, cap.offset, cap.length);
+
+		switch (cap.cfg_type) {
+		case VIRTIO_PCI_CAP_COMMON_CFG:
+			hw->common_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_NOTIFY_CFG:
+			rte_pci_read_config(dev, &hw->notify_off_multiplier,
+					4, pos + sizeof(cap));
+			hw->notify_base = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_DEVICE_CFG:
+			hw->dev_cfg = get_cfg_addr(dev, &cap);
+			break;
+		case VIRTIO_PCI_CAP_ISR_CFG:
+			hw->isr = get_cfg_addr(dev, &cap);
+			break;
+		}
+
+next:
+		pos = cap.cap_next;
+	}
+
+	if (hw->common_cfg == NULL || hw->notify_base == NULL ||
+	    hw->dev_cfg == NULL    || hw->isr == NULL) {
+		PMD_INIT_LOG(INFO, "no modern virtio pci device found.");
+		return -1;
+	}
+
+	PMD_INIT_LOG(INFO, "found modern virtio pci device.");
+
+	PMD_INIT_LOG(DEBUG, "common cfg mapped at: %p", hw->common_cfg);
+	PMD_INIT_LOG(DEBUG, "device cfg mapped at: %p", hw->dev_cfg);
+	PMD_INIT_LOG(DEBUG, "isr cfg mapped at: %p", hw->isr);
+	PMD_INIT_LOG(DEBUG, "notify base: %p, notify off multiplier: %u",
+		hw->notify_base, hw->notify_off_multiplier);
+
+	return 0;
+}
+
+/*
+ * Return -1:
+ *   if there is an error mapping the device with VFIO/UIO.
+ *   if the modern virtio pci capabilities cannot be read
+ *   (virtio crypto does not support legacy mode).
+ *
+ * Return 0 on success.
+ */
+int
+vtpci_cryptodev_init(struct rte_pci_device *dev, struct virtio_crypto_hw *hw)
+{
+	/*
+	 * Try to read the virtio pci caps, which exist only on
+	 * modern pci devices. Virtio crypto has no legacy fallback,
+	 * so a failure here is reported to the caller.
+	 */
+	if (virtio_read_caps(dev, hw) == 0) {
+		PMD_INIT_LOG(INFO, "modern virtio pci detected.");
+		virtio_hw_internal[hw->dev_id].vtpci_ops =
+					&virtio_crypto_modern_ops;
+		hw->modern = 1;
+		return 0;
+	}
+
+	/*
+	 * virtio crypto conforms to virtio 1.0 and doesn't support
+	 * legacy mode
+	 */
+	return -1;
+}
diff --git a/drivers/crypto/virtio/virtio_pci.h b/drivers/crypto/virtio/virtio_pci.h
new file mode 100644
index 0000000..a469ea3
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_pci.h
@@ -0,0 +1,252 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_PCI_H_
+#define _VIRTIO_PCI_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_pci.h>
+#include <rte_bus_pci.h>
+#include <rte_cryptodev.h>
+
+struct virtqueue;
+
+/* VirtIO PCI vendor/device ID. */
+#define VIRTIO_CRYPTO_PCI_VENDORID 0x1AF4
+#define VIRTIO_CRYPTO_PCI_DEVICEID 0x1054
+
+/* VirtIO ABI version, this must match exactly. */
+#define VIRTIO_PCI_ABI_VERSION 0
+
+/*
+ * VirtIO Header, located in BAR 0.
+ */
+#define VIRTIO_PCI_HOST_FEATURES  0  /* host's supported features (32bit, RO)*/
+#define VIRTIO_PCI_GUEST_FEATURES 4  /* guest's supported features (32, RW) */
+#define VIRTIO_PCI_QUEUE_PFN      8  /* physical address of VQ (32, RW) */
+#define VIRTIO_PCI_QUEUE_NUM      12 /* number of ring entries (16, RO) */
+#define VIRTIO_PCI_QUEUE_SEL      14 /* current VQ selection (16, RW) */
+#define VIRTIO_PCI_QUEUE_NOTIFY   16 /* notify host regarding VQ (16, RW) */
+#define VIRTIO_PCI_STATUS         18 /* device status register (8, RW) */
+#define VIRTIO_PCI_ISR            19 /* interrupt status register, reading
+				      * also clears the register (8, RO)
+				      */
+/* Only if MSIX is enabled: */
+
+/* configuration change vector (16, RW) */
+#define VIRTIO_MSI_CONFIG_VECTOR  20
+/* vector for selected VQ notifications */
+#define VIRTIO_MSI_QUEUE_VECTOR	  22
+
+/* The bit of the ISR which indicates a device has an interrupt. */
+#define VIRTIO_PCI_ISR_INTR   0x1
+/* The bit of the ISR which indicates a device configuration change. */
+#define VIRTIO_PCI_ISR_CONFIG 0x2
+/* Vector value used to disable MSI for queue. */
+#define VIRTIO_MSI_NO_VECTOR 0xFFFF
+
+/* Status byte for guest to report progress. */
+#define VIRTIO_CONFIG_STATUS_RESET     0x00
+#define VIRTIO_CONFIG_STATUS_ACK       0x01
+#define VIRTIO_CONFIG_STATUS_DRIVER    0x02
+#define VIRTIO_CONFIG_STATUS_DRIVER_OK 0x04
+#define VIRTIO_CONFIG_STATUS_FEATURES_OK 0x08
+#define VIRTIO_CONFIG_STATUS_FAILED    0x80
+
+/*
+ * Each virtqueue indirect descriptor list must be physically contiguous.
+ * To allow us to malloc(9) each list individually, limit the number
+ * supported to what will fit in one page. With 4KB pages, this is a limit
+ * of 256 descriptors. If there is ever a need for more, we can switch to
+ * contigmalloc(9) for the larger allocations, similar to what
+ * bus_dmamem_alloc(9) does.
+ *
+ * Note the sizeof(struct vring_desc) is 16 bytes.
+ */
+#define VIRTIO_MAX_INDIRECT ((int) (PAGE_SIZE / 16))
+
+/* Do we get callbacks when the ring is completely used, even if we've
+ * suppressed them?
+ */
+#define VIRTIO_F_NOTIFY_ON_EMPTY	24
+
+/* Can the device handle any descriptor layout? */
+#define VIRTIO_F_ANY_LAYOUT		27
+
+/* We support indirect buffer descriptors */
+#define VIRTIO_RING_F_INDIRECT_DESC	28
+
+#define VIRTIO_F_VERSION_1		32
+#define VIRTIO_F_IOMMU_PLATFORM	33
+
+/* The Guest publishes the used index for which it expects an interrupt
+ * at the end of the avail ring. Host should ignore the avail->flags field.
+ */
+/* The Host publishes the avail index for which it expects a kick
+ * at the end of the used ring. Guest should ignore the used->flags field.
+ */
+#define VIRTIO_RING_F_EVENT_IDX		29
+
+/* Common configuration */
+#define VIRTIO_PCI_CAP_COMMON_CFG	1
+/* Notifications */
+#define VIRTIO_PCI_CAP_NOTIFY_CFG	2
+/* ISR Status */
+#define VIRTIO_PCI_CAP_ISR_CFG		3
+/* Device specific configuration */
+#define VIRTIO_PCI_CAP_DEVICE_CFG	4
+/* PCI configuration access */
+#define VIRTIO_PCI_CAP_PCI_CFG		5
+
+/* This is the PCI capability header: */
+struct virtio_pci_cap {
+	uint8_t cap_vndr;	/* Generic PCI field: PCI_CAP_ID_VNDR */
+	uint8_t cap_next;	/* Generic PCI field: next ptr. */
+	uint8_t cap_len;	/* Generic PCI field: capability length */
+	uint8_t cfg_type;	/* Identifies the structure. */
+	uint8_t bar;		/* Where to find it. */
+	uint8_t padding[3];	/* Pad to full dword. */
+	uint32_t offset;	/* Offset within bar. */
+	uint32_t length;	/* Length of the structure, in bytes. */
+};
+
+struct virtio_pci_notify_cap {
+	struct virtio_pci_cap cap;
+	uint32_t notify_off_multiplier;	/* Multiplier for queue_notify_off. */
+};
+
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+	/* About the whole device. */
+	uint32_t device_feature_select;	/* read-write */
+	uint32_t device_feature;	/* read-only */
+	uint32_t guest_feature_select;	/* read-write */
+	uint32_t guest_feature;		/* read-write */
+	uint16_t msix_config;		/* read-write */
+	uint16_t num_queues;		/* read-only */
+	uint8_t device_status;		/* read-write */
+	uint8_t config_generation;	/* read-only */
+
+	/* About a specific virtqueue. */
+	uint16_t queue_select;		/* read-write */
+	uint16_t queue_size;		/* read-write, power of 2. */
+	uint16_t queue_msix_vector;	/* read-write */
+	uint16_t queue_enable;		/* read-write */
+	uint16_t queue_notify_off;	/* read-only */
+	uint32_t queue_desc_lo;		/* read-write */
+	uint32_t queue_desc_hi;		/* read-write */
+	uint32_t queue_avail_lo;	/* read-write */
+	uint32_t queue_avail_hi;	/* read-write */
+	uint32_t queue_used_lo;		/* read-write */
+	uint32_t queue_used_hi;		/* read-write */
+};
+
+struct virtio_crypto_hw;
+
+struct virtio_pci_ops {
+	void (*read_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			     void *dst, int len);
+	void (*write_dev_cfg)(struct virtio_crypto_hw *hw, size_t offset,
+			      const void *src, int len);
+	void (*reset)(struct virtio_crypto_hw *hw);
+
+	uint8_t (*get_status)(struct virtio_crypto_hw *hw);
+	void (*set_status)(struct virtio_crypto_hw *hw, uint8_t status);
+
+	uint64_t (*get_features)(struct virtio_crypto_hw *hw);
+	void (*set_features)(struct virtio_crypto_hw *hw, uint64_t features);
+
+	uint8_t (*get_isr)(struct virtio_crypto_hw *hw);
+
+	uint16_t (*set_config_irq)(struct virtio_crypto_hw *hw, uint16_t vec);
+
+	uint16_t (*set_queue_irq)(struct virtio_crypto_hw *hw,
+			struct virtqueue *vq, uint16_t vec);
+
+	uint16_t (*get_queue_num)(struct virtio_crypto_hw *hw,
+			uint16_t queue_id);
+	int (*setup_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*del_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+	void (*notify_queue)(struct virtio_crypto_hw *hw, struct virtqueue *vq);
+};
+
+struct virtio_crypto_hw {
+	/* control queue */
+	struct virtqueue *cvq;
+	uint16_t    dev_id;
+	uint16_t    max_dataqueues;
+	uint64_t    req_guest_features;
+	uint64_t    guest_features;
+	uint8_t	    use_msix;
+	uint8_t     modern;
+	uint32_t    notify_off_multiplier;
+	uint8_t     *isr;
+	uint16_t    *notify_base;
+	struct virtio_pci_common_cfg *common_cfg;
+	struct virtio_crypto_config *dev_cfg;
+};
+
+/*
+ * While virtio_crypto_hw is stored in shared memory, this structure stores
+ * process-local information that may vary in the multi-process model.
+ * For example, the vtpci_ops pointer.
+ */
+struct virtio_hw_internal {
+	const struct virtio_pci_ops *vtpci_ops;
+	struct rte_pci_ioport io;
+};
+
+#define VTPCI_OPS(hw)	(virtio_hw_internal[(hw)->dev_id].vtpci_ops)
+#define VTPCI_IO(hw)	(&virtio_hw_internal[(hw)->dev_id].io)
+
+extern struct virtio_hw_internal virtio_hw_internal[RTE_MAX_VIRTIO_CRYPTO];
+
+/*
+ * How many bits to shift physical queue address written to QUEUE_PFN.
+ * 12 is historical, and due to x86 page size.
+ */
+#define VIRTIO_PCI_QUEUE_ADDR_SHIFT 12
+
+/* The alignment to use between consumer and producer parts of vring. */
+#define VIRTIO_PCI_VRING_ALIGN 4096
+
+enum virtio_msix_status {
+	VIRTIO_MSIX_NONE = 0,
+	VIRTIO_MSIX_DISABLED = 1,
+	VIRTIO_MSIX_ENABLED = 2
+};
+
+static inline int
+vtpci_with_feature(struct virtio_crypto_hw *hw, uint64_t bit)
+{
+	return (hw->guest_features & (1ULL << bit)) != 0;
+}
+
+/*
+ * Function declarations from virtio_pci.c
+ */
+int vtpci_cryptodev_init(struct rte_pci_device *dev,
+	struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_reset(struct virtio_crypto_hw *hw);
+
+void vtpci_cryptodev_reinit_complete(struct virtio_crypto_hw *hw);
+
+uint8_t vtpci_cryptodev_get_status(struct virtio_crypto_hw *hw);
+void vtpci_cryptodev_set_status(struct virtio_crypto_hw *hw, uint8_t status);
+
+uint64_t vtpci_cryptodev_negotiate_features(struct virtio_crypto_hw *hw,
+	uint64_t host_features);
+
+void vtpci_write_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	const void *src, int length);
+
+void vtpci_read_cryptodev_config(struct virtio_crypto_hw *hw, size_t offset,
+	void *dst, int length);
+
+uint8_t vtpci_cryptodev_isr(struct virtio_crypto_hw *hw);
+
+#endif /* _VIRTIO_PCI_H_ */
diff --git a/drivers/crypto/virtio/virtio_ring.h b/drivers/crypto/virtio/virtio_ring.h
new file mode 100644
index 0000000..ee30674
--- /dev/null
+++ b/drivers/crypto/virtio/virtio_ring.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTIO_RING_H_
+#define _VIRTIO_RING_H_
+
+#include <stdint.h>
+
+#include <rte_common.h>
+
+/* This marks a buffer as continuing via the next field. */
+#define VRING_DESC_F_NEXT       1
+/* This marks a buffer as write-only (otherwise read-only). */
+#define VRING_DESC_F_WRITE      2
+/* This means the buffer contains a list of buffer descriptors. */
+#define VRING_DESC_F_INDIRECT   4
+
+/* The Host uses this in used->flags to advise the Guest: don't kick me
+ * when you add a buffer.  It's unreliable, so it's simply an
+ * optimization.  Guest will still kick if it's out of buffers.
+ */
+#define VRING_USED_F_NO_NOTIFY  1
+/* The Guest uses this in avail->flags to advise the Host: don't
+ * interrupt me when you consume a buffer.  It's unreliable, so it's
+ * simply an optimization.
+ */
+#define VRING_AVAIL_F_NO_INTERRUPT  1
+
+/* VirtIO ring descriptors: 16 bytes.
+ * These can chain together via "next".
+ */
+struct vring_desc {
+	uint64_t addr;  /*  Address (guest-physical). */
+	uint32_t len;   /* Length. */
+	uint16_t flags; /* The flags as indicated above. */
+	uint16_t next;  /* We chain unused descriptors via this. */
+};
+
+struct vring_avail {
+	uint16_t flags;
+	uint16_t idx;
+	uint16_t ring[0];
+};
+
+/* id is a 16bit index. uint32_t is used here for ids for padding reasons. */
+struct vring_used_elem {
+	/* Index of start of used descriptor chain. */
+	uint32_t id;
+	/* Total length of the descriptor chain which was written to. */
+	uint32_t len;
+};
+
+struct vring_used {
+	uint16_t flags;
+	volatile uint16_t idx;
+	struct vring_used_elem ring[0];
+};
+
+struct vring {
+	unsigned int num;
+	struct vring_desc  *desc;
+	struct vring_avail *avail;
+	struct vring_used  *used;
+};
+
+/* The standard layout for the ring is a continuous chunk of memory which
+ * looks like this.  We assume num is a power of 2.
+ *
+ * struct vring {
+ *      // The actual descriptors (16 bytes each)
+ *      struct vring_desc desc[num];
+ *
+ *      // A ring of available descriptor heads with free-running index.
+ *      __u16 avail_flags;
+ *      __u16 avail_idx;
+ *      __u16 available[num];
+ *      __u16 used_event_idx;
+ *
+ *      // Padding to the next align boundary.
+ *      char pad[];
+ *
+ *      // A ring of used descriptor heads with free-running index.
+ *      __u16 used_flags;
+ *      __u16 used_idx;
+ *      struct vring_used_elem used[num];
+ *      __u16 avail_event_idx;
+ * };
+ *
+ * NOTE: for VirtIO PCI, align is 4096.
+ */
+
+/*
+ * We publish the used event index at the end of the available ring, and vice
+ * versa. They are at the end for backwards compatibility.
+ */
+#define vring_used_event(vr)  ((vr)->avail->ring[(vr)->num])
+#define vring_avail_event(vr) (*(uint16_t *)&(vr)->used->ring[(vr)->num])
+
+static inline size_t
+vring_size(unsigned int num, unsigned long align)
+{
+	size_t size;
+
+	size = num * sizeof(struct vring_desc);
+	size += sizeof(struct vring_avail) + (num * sizeof(uint16_t));
+	size = RTE_ALIGN_CEIL(size, align);
+	size += sizeof(struct vring_used) +
+		(num * sizeof(struct vring_used_elem));
+	return size;
+}
+
+static inline void
+vring_init(struct vring *vr, unsigned int num, uint8_t *p,
+	unsigned long align)
+{
+	vr->num = num;
+	vr->desc = (struct vring_desc *) p;
+	vr->avail = (struct vring_avail *) (p +
+		num * sizeof(struct vring_desc));
+	vr->used = (void *)
+		RTE_ALIGN_CEIL((uintptr_t)(&vr->avail->ring[num]), align);
+}
+
+/*
+ * The following is used with VIRTIO_RING_F_EVENT_IDX.
+ * Assuming a given event_idx value from the other side, if we have
+ * just incremented index from old to new_idx, should we trigger an
+ * event?
+ */
+static inline int
+vring_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
+{
+	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
+}
+
+#endif /* _VIRTIO_RING_H_ */
diff --git a/drivers/crypto/virtio/virtqueue.c b/drivers/crypto/virtio/virtqueue.c
new file mode 100644
index 0000000..fd8be58
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.c
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#include <stdint.h>
+
+#include <rte_mbuf.h>
+#include <rte_crypto.h>
+#include <rte_malloc.h>
+
+#include "virtqueue.h"
+
+void
+virtqueue_disable_intr(struct virtqueue *vq)
+{
+	/*
+	 * Set VRING_AVAIL_F_NO_INTERRUPT to hint host
+	 * not to interrupt when it consumes packets
+	 * Note: this is only considered a hint to the host
+	 */
+	vq->vq_ring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
+}
+
+void
+virtqueue_detatch_unused(struct virtqueue *vq)
+{
+	struct rte_crypto_op *cop = NULL;
+
+	int idx;
+
+	if (vq != NULL)
+		for (idx = 0; idx < vq->vq_nentries; idx++) {
+			cop = vq->vq_descx[idx].crypto_op;
+			if (cop) {
+				if (cop->sym->m_src)
+					rte_pktmbuf_free(cop->sym->m_src);
+				if (cop->sym->m_dst)
+					rte_pktmbuf_free(cop->sym->m_dst);
+				rte_crypto_op_free(cop);
+				vq->vq_descx[idx].crypto_op = NULL;
+			}
+		}
+}
diff --git a/drivers/crypto/virtio/virtqueue.h b/drivers/crypto/virtio/virtqueue.h
new file mode 100644
index 0000000..1bd0e89
--- /dev/null
+++ b/drivers/crypto/virtio/virtqueue.h
@@ -0,0 +1,176 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 HUAWEI TECHNOLOGIES CO., LTD.
+ */
+
+#ifndef _VIRTQUEUE_H_
+#define _VIRTQUEUE_H_
+
+#include <linux/virtio_crypto.h>
+
+#include <stdint.h>
+
+#include <rte_atomic.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+
+#include "virtio_pci.h"
+#include "virtio_ring.h"
+#include "virtio_logs.h"
+
+struct rte_mbuf;
+
+/*
+ * Per virtio_config.h in Linux.
+ *     For virtio_pci on SMP, we don't need to order with respect to MMIO
+ *     accesses through relaxed memory I/O windows, so smp_mb() et al are
+ *     sufficient.
+ *
+ */
+#define virtio_mb()	rte_smp_mb()
+#define virtio_rmb()	rte_smp_rmb()
+#define virtio_wmb()	rte_smp_wmb()
+
+#define VIRTQUEUE_MAX_NAME_SZ 32
+
+enum { VTCRYPTO_DATAQ = 0, VTCRYPTO_CTRLQ = 1 };
+
+/**
+ * The maximum virtqueue size is 2^15. Use that value as the end of
+ * descriptor chain terminator since it will never be a valid index
+ * in the descriptor table. This is used to verify we are correctly
+ * handling vq_free_cnt.
+ */
+#define VQ_RING_DESC_CHAIN_END 32768
+
+struct vq_desc_extra {
+	void     *crypto_op;
+	void     *cookie;
+	uint16_t ndescs;
+};
+
+struct virtqueue {
+	/**< virtio_crypto_hw structure pointer. */
+	struct virtio_crypto_hw *hw;
+	/**< mem zone to populate RX ring. */
+	const struct rte_memzone *mz;
+	/**< mempool to populate hdr and request. */
+	struct rte_mempool *mpool;
+	uint8_t     dev_id;              /**< Device identifier. */
+	uint16_t    vq_queue_index;       /**< PCI queue index */
+
+	void        *vq_ring_virt_mem;    /**< linear address of vring*/
+	unsigned int vq_ring_size;
+	phys_addr_t vq_ring_mem;          /**< physical address of vring */
+
+	struct vring vq_ring;    /**< vring keeping desc, used and avail */
+	uint16_t    vq_free_cnt; /**< num of desc available */
+	uint16_t    vq_nentries; /**< vring desc numbers */
+
+	/**
+	 * Head of the free chain in the descriptor table. If
+	 * there are no free descriptors, this will be set to
+	 * VQ_RING_DESC_CHAIN_END.
+	 */
+	uint16_t  vq_desc_head_idx;
+	uint16_t  vq_desc_tail_idx;
+	/**
+	 * Last consumed descriptor in the used table,
+	 * trails vq_ring.used->idx.
+	 */
+	uint16_t vq_used_cons_idx;
+	uint16_t vq_avail_idx;
+
+	/* Statistics */
+	uint64_t	packets_sent_total;
+	uint64_t	packets_sent_failed;
+	uint64_t	packets_received_total;
+	uint64_t	packets_received_failed;
+
+	uint16_t  *notify_addr;
+
+	struct vq_desc_extra vq_descx[0];
+};
+
+/**
+ * Tell the backend not to interrupt us.
+ */
+void virtqueue_disable_intr(struct virtqueue *vq);
+
+/**
+ *  Get all mbufs to be freed.
+ */
+void virtqueue_detatch_unused(struct virtqueue *vq);
+
+static inline int
+virtqueue_full(const struct virtqueue *vq)
+{
+	return vq->vq_free_cnt == 0;
+}
+
+#define VIRTQUEUE_NUSED(vq) \
+	((uint16_t)((vq)->vq_ring.used->idx - (vq)->vq_used_cons_idx))
+
+static inline void
+vq_update_avail_idx(struct virtqueue *vq)
+{
+	virtio_wmb();
+	vq->vq_ring.avail->idx = vq->vq_avail_idx;
+}
+
+static inline void
+vq_update_avail_ring(struct virtqueue *vq, uint16_t desc_idx)
+{
+	uint16_t avail_idx;
+	/*
+	 * Place the head of the descriptor chain into the next slot and make
+	 * it usable to the host. The chain is made available now rather than
+	 * deferring to virtqueue_notify() in the hopes that if the host is
+	 * currently running on another CPU, we can keep it processing the new
+	 * descriptor.
+	 */
+	avail_idx = (uint16_t)(vq->vq_avail_idx & (vq->vq_nentries - 1));
+	if (unlikely(vq->vq_ring.avail->ring[avail_idx] != desc_idx))
+		vq->vq_ring.avail->ring[avail_idx] = desc_idx;
+	vq->vq_avail_idx++;
+}
+
+static inline int
+virtqueue_kick_prepare(struct virtqueue *vq)
+{
+	return !(vq->vq_ring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+static inline void
+virtqueue_notify(struct virtqueue *vq)
+{
+	/*
+	 * Ensure updated avail->idx is visible to host.
+	 * For virtio on IA, the notification is done through an I/O port
+	 * operation, which is itself a serializing instruction.
+	 */
+	VTPCI_OPS(vq->hw)->notify_queue(vq->hw, vq);
+}
+
+/**
+ * Dump virtqueue internal structures, for debug purpose only.
+ */
+#ifdef RTE_LIBRTE_PMD_VIRTIO_CRYPTO_DEBUG_DUMP
+#define VIRTQUEUE_DUMP(vq) do { \
+	uint16_t used_idx, nused; \
+	used_idx = (vq)->vq_ring.used->idx; \
+	nused = (uint16_t)(used_idx - (vq)->vq_used_cons_idx); \
+	PMD_INIT_LOG(DEBUG, \
+	  "VQ: - size=%d; free=%d; used=%d; desc_head_idx=%d;" \
+	  " avail.idx=%d; used_cons_idx=%d; used.idx=%d;" \
+	  " avail.flags=0x%x; used.flags=0x%x", \
+	  (vq)->vq_nentries, (vq)->vq_free_cnt, nused, \
+	  (vq)->vq_desc_head_idx, (vq)->vq_ring.avail->idx, \
+	  (vq)->vq_used_cons_idx, (vq)->vq_ring.used->idx, \
+	  (vq)->vq_ring.avail->flags, (vq)->vq_ring.used->flags); \
+} while (0)
+#else
+#define VIRTQUEUE_DUMP(vq) do { } while (0)
+#endif
+
+#endif /* _VIRTQUEUE_H_ */
-- 
1.8.3.1

^ permalink raw reply	[relevance 2%]
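
The ring helpers in the patch above (vring_need_event(), VIRTQUEUE_NUSED()
and the masking in vq_update_avail_ring()) all rely on free-running 16-bit
indices that are allowed to wrap. The following is a minimal sketch of that
arithmetic, not part of the patch: the helper names need_event, nused and
ring_slot are hypothetical stand-ins that copy the expressions from the
headers so the wrap-around cases can be checked in isolation.

```c
#include <stdint.h>

/*
 * Same expression as vring_need_event() in virtio_ring.h: true when
 * event_idx lies in the window [old, new_idx) of indices just published.
 * The casts to uint16_t make the comparison correct across wrap-around.
 */
static inline int
need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/*
 * Same expression as VIRTQUEUE_NUSED(): number of used entries the
 * driver has not consumed yet (hypothetical helper name).
 */
static inline uint16_t
nused(uint16_t used_idx, uint16_t cons_idx)
{
	return (uint16_t)(used_idx - cons_idx);
}

/*
 * Ring slot for a free-running index, as computed in
 * vq_update_avail_ring(). nentries is assumed to be a power of 2.
 */
static inline uint16_t
ring_slot(uint16_t idx, uint16_t nentries)
{
	return (uint16_t)(idx & (nentries - 1));
}
```

For instance, with old = 65534 and new_idx = 2 the index has wrapped, yet an
event_idx of 65535 or 0 still falls inside the window, while 65533 (already
passed) and 2 (not yet reached) do not.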

* [dpdk-dev] [PATCH] doc: fixing grammar
@ 2018-02-22 12:15  8% Alejandro Lucero
  0 siblings, 0 replies; 200+ results
From: Alejandro Lucero @ 2018-02-22 12:15 UTC (permalink / raw)
  To: dev; +Cc: stable

My English is far worse than that of the marketing team.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
---
 doc/guides/nics/nfp.rst | 43 ++++++++++++++++++++++---------------------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/doc/guides/nics/nfp.rst b/doc/guides/nics/nfp.rst
index 99a3b76..67e574e 100644
--- a/doc/guides/nics/nfp.rst
+++ b/doc/guides/nics/nfp.rst
@@ -34,14 +34,14 @@ NFP poll mode driver library
 Netronome's sixth generation of flow processors pack 216 programmable
 cores and over 100 hardware accelerators that uniquely combine packet,
 flow, security and content processing in a single device that scales
-up to 400 Gbps.
+up to 400-Gb/s.
 
 This document explains how to use DPDK with the Netronome Poll Mode
 Driver (PMD) supporting Netronome's Network Flow Processor 6xxx
 (NFP-6xxx) and Netronome's Flow Processor 4xxx (NFP-4xxx).
 
 NFP is a SRIOV capable device and the PMD driver supports the physical
-function (PF) and virtual functions (VFs).
+function (PF) and the virtual functions (VFs).
 
 Dependencies
 ------------
@@ -49,17 +49,18 @@ Dependencies
 Before using the Netronome's DPDK PMD some NFP configuration,
 which is not related to DPDK, is required. The system requires
 installation of **Netronome's BSP (Board Support Package)** along
-with some specific NFP firmware application. Netronome's NSP ABI
+with a specific NFP firmware application. Netronome's NSP ABI
 version should be 0.20 or higher.
 
 If you have a NFP device you should already have the code and
-documentation for doing all this configuration. Contact
+documentation for this configuration. Contact
 **support@netronome.com** to obtain the latest available firmware.
 
-The NFP Linux netdev kernel driver for VFs is part of vanilla kernel
-since kernel version 4.5, and support for the PF since kernel version
-4.11. Support for older kernels can be obtained on Github at
-**https://github.com/Netronome/nfp-drv-kmods** along with build
+The NFP Linux netdev kernel driver for VFs has been a part of the
+vanilla kernel since kernel version 4.5, and support for the PF
+since kernel version 4.11. Support for older kernels can be obtained
+on Github at
+**https://github.com/Netronome/nfp-drv-kmods** along with the build
 instructions.
 
 NFP PMD needs to be used along with UIO ``igb_uio`` or VFIO (``vfio-pci``)
@@ -70,15 +71,15 @@ Building the software
 
 Netronome's PMD code is provided in the **drivers/net/nfp** directory.
 Although NFP PMD has Netronome´s BSP dependencies, it is possible to
-compile it along with other DPDK PMDs even if no BSP was installed before.
+compile it along with other DPDK PMDs even if no BSP was installed previously.
 Of course, a DPDK app will require such a BSP installed for using the
 NFP PMD, along with a specific NFP firmware application.
 
-Default PMD configuration is at **common_linuxapp configuration** file:
+Default PMD configuration is at the **common_linuxapp configuration** file:
 
 - **CONFIG_RTE_LIBRTE_NFP_PMD=y**
 
-Once DPDK is built all the DPDK apps and examples include support for
+Once the DPDK is built all the DPDK apps and examples include support for
 the NFP PMD.
 
 
@@ -91,18 +92,18 @@ for details.
 Using the PF
 ------------
 
-NFP PMD has support for using the NFP PF as another DPDK port, but it does not
+NFP PMD supports using the NFP PF as another DPDK port, but it does not
 have any functionality for controlling VFs. In fact, it is not possible to use
 the PMD with the VFs if the PF is being used by DPDK, that is, with the NFP PF
-bound to ``igb_uio`` or ``vfio-pci`` kernel drivers. Future DPDK version will
+bound to ``igb_uio`` or ``vfio-pci`` kernel drivers. Future DPDK versions will
 have a PMD able to work with the PF and VFs at the same time and with the PF
 implementing VF management along with other PF-only functionalities/offloads.
 
-The PMD PF has extra work to do which will delay the DPDK app initialization
-like checking if a firmware is already available in the device, uploading the
-firmware if necessary, and configure the Link state properly when starting or
-stopping a PF port. Note that firmware upload is not always necessary which is
-the main delay for NFP PF PMD initialization.
+The PMD PF has extra work to do which will delay the DPDK app initialization.
+This additional effort could be checking if a firmware is already available in
+the device, uploading the firmware if necessary or configuring the Link state
+properly when starting or stopping a PF port. Note that firmware upload is not
+always necessary which is the main delay for NFP PF PMD initialization.
 
 Depending on the Netronome product installed in the system, firmware files
 should be available under ``/lib/firmware/netronome``. DPDK PMD supporting the
@@ -114,14 +115,14 @@ PF multiport support
 --------------------
 
 Some NFP cards support several physical ports with just one single PCI device.
-DPDK core is designed with the 1:1 relationship between PCI devices and DPDK
+The DPDK core is designed with a 1:1 relationship between PCI devices and DPDK
 ports, so NFP PMD PF support requires handling the multiport case specifically.
 During NFP PF initialization, the PMD will extract the information about the
 number of PF ports from the firmware and will create as many DPDK ports as
 needed.
 
 Because the unusual relationship between a single PCI device and several DPDK
-ports, there are some limitations when using more than one PF DPDK ports: there
+ports, there are some limitations when using more than one PF DPDK port: there
 is no support for RX interrupts and it is not possible either to use those PF
 ports with the device hotplug functionality.
 
@@ -136,7 +137,7 @@ System configuration
    get the drivers from the above Github repository and follow the instructions
    for building and installing it.
 
-   Virtual Functions need to be enabled before they can be used with the PMD.
+   VFs need to be enabled before they can be used with the PMD.
    Before enabling the VFs it is useful to obtain information about the
    current NFP PCI device detected by the system:
 
-- 
1.9.1

^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [PATCH v4] lib/librte_meter: add meter configuration profile
  @ 2018-02-19 21:12  3%   ` Thomas Monjalon
  2018-04-05 10:12  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2018-02-19 21:12 UTC (permalink / raw)
  To: Jasvinder Singh, cristian.dumitrescu; +Cc: dev

08/01/2018 16:43, Jasvinder Singh:
> From: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> 
> This patch adds support for meter configuration profiles.
> Benefits: simplified configuration procedure, improved performance.
> 
> Q1: What is the configuration profile and why does it make sense?
> A1: The configuration profile represents the set of configuration
>     parameters for a given meter object, such as the rates and sizes for
>     the token buckets. The configuration profile concept makes sense when
>     many meter objects share the same configuration, which is the typical
>     usage model: thousands of traffic flows are each individually metered
>     according to just a few service levels (i.e. profiles).
> 
> Q2: How does the configuration profile improve performance?
> A2: The performance improvement is achieved by reducing the memory
>     footprint of a meter object, which results in better cache utilization
>     for the typical case when large arrays of meter objects are used. The
>     internal data structures stored for each meter object contain:
>        a) Constant fields: Low level translation of the configuration
>           parameters that does not change post-configuration. This is
>           really duplicated for all meters that use the same
>           configuration. This is the configuration profile data that is
>           moved away from the meter object. Current size (implementation
>           dependent): srTCM = 32 bytes, trTCM = 32 bytes.
>        b) Variable fields: Time stamps and running counters that change
>           during the on-going traffic metering process. Current size
>           (implementation dependent): srTCM = 24 bytes, trTCM = 32 bytes.
>           Therefore, by moving the constant fields to a separate profile
>           data structure shared by all the meters with the same
>           configuration, the size of the meter object is reduced by ~50%.
> 
> Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
> Signed-off-by: Jasvinder Singh <jasvinder.singh@intel.com>

Applied for 18.05 (was postponed to preserve 18.02 ABI), thanks.

^ permalink raw reply	[relevance 3%]
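
The memory argument in A2 above can be sketched in a few lines of C. This
is an illustration only, not the librte_meter code: the struct and function
names (demo_profile, demo_meter, footprint_shared, ...) and the individual
fields are hypothetical, chosen so that the sizes match the srTCM figures
quoted in the commit message (32 bytes constant, 24 bytes variable).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant part: derived from the configuration once. */
struct demo_profile {
	uint64_t cbs;                  /* committed burst size, in bytes */
	uint64_t ebs;                  /* excess burst size, in bytes */
	uint64_t cir_period;           /* clock cycles per token update */
	uint64_t cir_bytes_per_period; /* bytes credited per update */
};

/* Hypothetical variable part: updated while traffic is metered. */
struct demo_meter {
	uint64_t time; /* time of the last bucket update */
	uint64_t tc;   /* tokens in the committed bucket */
	uint64_t te;   /* tokens in the excess bucket */
};

/* Layout without profiles: every meter carries its own constants. */
struct demo_meter_embedded {
	struct demo_profile cfg;
	struct demo_meter state;
};

/* Bytes needed when each meter embeds its own configuration. */
static size_t
footprint_embedded(size_t n_meters)
{
	return n_meters * sizeof(struct demo_meter_embedded);
}

/* Bytes needed when n_meters meters share n_profiles profiles. */
static size_t
footprint_shared(size_t n_meters, size_t n_profiles)
{
	return n_meters * sizeof(struct demo_meter) +
		n_profiles * sizeof(struct demo_profile);
}
```

Under these assumptions, 1000 meters using 4 service levels need 56000
bytes with embedded configuration and 24128 bytes with shared profiles,
which is in line with the "~50%" reduction claimed above.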

* Re: [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05
  2018-02-15 13:04  6% [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05 John McNamara
@ 2018-02-16 22:54  0% ` Carrillo, Erik G
  0 siblings, 0 replies; 200+ results
From: Carrillo, Erik G @ 2018-02-16 22:54 UTC (permalink / raw)
  To: Mcnamara, John, dev; +Cc: Mcnamara, John

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of John McNamara
> Sent: Thursday, February 15, 2018 7:04 AM
> To: dev@dpdk.org
> Cc: Mcnamara, John <john.mcnamara@intel.com>
> Subject: [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05
> 
> Add template release notes for DPDK 18.05 with inline comments and
> explanations of the various sections.
> 
> Signed-off-by: John McNamara <john.mcnamara@intel.com>
> ---
>  doc/guides/rel_notes/release_18_05.rst | 187 +++++++++++++++++++++++++++++++++
>  1 file changed, 187 insertions(+)
>  create mode 100644 doc/guides/rel_notes/release_18_05.rst
> 
> diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
> new file mode 100644
> index 0000000..85f4dc5
> --- /dev/null
> +++ b/doc/guides/rel_notes/release_18_05.rst
> @@ -0,0 +1,187 @@
> +DPDK Release 18.05
> +==================
> +
> +.. **Read this first.**
> +
> +   The text in the sections below explains how to update the release notes.
> +
> +   Use proper spelling, capitalization and punctuation in all sections.
> +
> +   Variable and config names should be quoted as fixed width text:
> +   ``LIKE_THIS``.
> +
> +   Build the docs and view the output file to ensure the changes are correct::
> +
> +      make doc-guides-html
> +
> +      xdg-open build/doc/html/guides/rel_notes/release_18_05.html
> +
> +
> +New Features
> +------------
> +
> +.. This section should contain new features added in this release. Sample
> +   format:
> +
> +   * **Add a title in the past tense with a full stop.**
> +
> +     Add a short 1-2 sentence description in the past tense. The description
> +     should be enough to allow someone scanning the release notes to
> +     understand the new feature.
> +
> +     If the feature adds a lot of sub-features you can use a bullet list like
> +     this:
> +
> +     * Added feature foo to do something.
> +     * Enhanced feature bar to do something else.
> +
> +     Refer to the previous release notes for examples.
> +
> +     This section is a comment. Do not overwrite or remove it.
> +     Also, make sure to start the actual text at the margin.
> +   =========================================================
> +
> +
> +API Changes
> +-----------
> +
> +.. This section should contain API changes. Sample format:
> +
> +   * Add a short 1-2 sentence description of the API change. Use fixed width
> +     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past
> +     tense.
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +   =========================================================
> +
> +
> +ABI Changes
> +-----------
> +
> +.. This section should contain ABI changes. Sample format:
> +
> +   * Add a short 1-2 sentence description of the ABI change that was announced
> +     in the previous releases and made in this release. Use fixed width quotes
> +     for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +   =========================================================
> +
> +
> +Removed Items
> +-------------
> +
> +.. This section should contain removed items in this release. Sample format:
> +
> +   * Add a short 1-2 sentence description of the removed item in the past
> +     tense.
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +   =========================================================
> +
> +
> +Known Issues
> +------------
> +
> +.. This section should contain new known issues in this release. Sample format:
> +
> +   * **Add title in present tense with full stop.**
> +
> +     Add a short 1-2 sentence description of the known issue in the present
> +     tense. Add information on any known workarounds.
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +   =========================================================
> +
> +
> +Shared Library Versions
> +-----------------------
> +
> +.. Update any library version updated in this release and prepend with a ``+``
> +   sign, like this:
> +
> +     librte_acl.so.2
> +   + librte_cfgfile.so.2
> +     librte_cmdline.so.2
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   =========================================================
> +
> +
> +The libraries prepended with a plus sign were incremented in this version.
> +
> +.. code-block:: diff
> +
> +     librte_acl.so.2
> +     librte_bbdev.so.1
> +     librte_bitratestats.so.2
> +     librte_bus_dpaa.so.1
> +     librte_bus_fslmc.so.1
> +     librte_bus_pci.so.1
> +     librte_bus_vdev.so.1
> +     librte_cfgfile.so.2
> +     librte_cmdline.so.2
> +     librte_cryptodev.so.4
> +     librte_distributor.so.1
> +     librte_eal.so.6
> +     librte_ethdev.so.8
> +     librte_eventdev.so.3
> +     librte_flow_classify.so.1
> +     librte_gro.so.1
> +     librte_gso.so.1
> +     librte_hash.so.2
> +     librte_ip_frag.so.1
> +     librte_jobstats.so.1
> +     librte_kni.so.2
> +     librte_kvargs.so.1
> +     librte_latencystats.so.1
> +     librte_lpm.so.2
> +     librte_mbuf.so.3
> +     librte_mempool.so.3
> +     librte_meter.so.1
> +     librte_metrics.so.1
> +     librte_net.so.1
> +     librte_pci.so.1
> +     librte_pdump.so.2
> +     librte_pipeline.so.3
> +     librte_pmd_bnxt.so.2
> +     librte_pmd_bond.so.2
> +     librte_pmd_i40e.so.2
> +     librte_pmd_ixgbe.so.2
> +     librte_pmd_ring.so.2
> +     librte_pmd_softnic.so.1
> +     librte_pmd_vhost.so.2
> +     librte_port.so.3
> +     librte_power.so.1
> +     librte_rawdev.so.1
> +     librte_reorder.so.1
> +     librte_ring.so.1
> +     librte_sched.so.1
> +     librte_security.so.1
> +     librte_table.so.3
> +     librte_timer.so.1
> +     librte_vhost.so.3
> +
> +
> +Tested Platforms
> +----------------
> +
> +.. This section should contain a list of platforms that were tested with this
> +   release.
> +
> +   The format is:
> +
> +   * <vendor> platform with <vendor> <type of devices> combinations
> +
> +     * List of CPU
> +     * List of OS
> +     * List of devices
> +     * Other relevant details...
> +
> +   This section is a comment. Do not overwrite or remove it.
> +   Also, make sure to start the actual text at the margin.
> +   =========================================================
> --
> 2.7.5

Acked-by:  Erik Gabriel Carrillo <erik.g.carrillo@intel.com>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] net/tap: add CRC stripping capability
  2018-02-15 21:55  3%   ` Stephen Hemminger
@ 2018-02-16 13:00  0%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-16 13:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ophir Munk, dev, Pascal Mazon, Olga Shern

15/02/2018 22:55, Stephen Hemminger:
> On Tue, 13 Feb 2018 17:35:20 +0100
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > 13/02/2018 09:14, Ophir Munk:
> > > CRC stripping is executed in the kernel outside of TAP PMD scope.
> > > Nothing prevents the TAP PMD from reporting the Rx CRC
> > > stripping capability.
> > > In the faulty code, TAP PMD did not report this capability.
> > > The fix enables TAP PMD to report that Rx CRC stripping is supported.
> > > 
> > > Fixes: 02f96a0a82d1 ("net/tap: add TUN/TAP device PMD")
> > > Cc: stable@dpdk.org
> > > 
> > > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>  
> > 
> > Applied, thanks
> > 
> 
> The whole CRC strip flag notion is backwards. It really should have been
> a bit set if the driver allows preserving the CRC.
> 
> Since changing the ABI is not possible right now,
> the ethdev core ought to log a warning whenever a driver is registered
> without the CRC_STRIP flag.
> 
> Or does the lack of CRC_STRIP in the offload flags imply that the driver
> can both strip and not strip?

I agree we should change the API.
Let's open a new thread to discuss it with a wider audience.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] net/tap: add CRC stripping capability
  @ 2018-02-15 21:55  3%   ` Stephen Hemminger
  2018-02-16 13:00  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2018-02-15 21:55 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Ophir Munk, dev, Pascal Mazon, Olga Shern, stable

On Tue, 13 Feb 2018 17:35:20 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> 13/02/2018 09:14, Ophir Munk:
> > CRC stripping is executed in the kernel outside of TAP PMD scope.
> > Nothing prevents the TAP PMD from reporting the Rx CRC
> > stripping capability.
> > In the faulty code, TAP PMD did not report this capability.
> > The fix enables TAP PMD to report that Rx CRC stripping is supported.
> > 
> > Fixes: 02f96a0a82d1 ("net/tap: add TUN/TAP device PMD")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>  
> 
> Applied, thanks
> 

The whole CRC strip flag notion is backwards. It really should have been
a bit set if the driver allows preserving the CRC.

Since changing the ABI is not possible right now,
the ethdev core ought to log a warning whenever a driver is registered
without the CRC_STRIP flag.

Or does the lack of CRC_STRIP in the offload flags imply that the driver
can both strip and not strip?

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05
@ 2018-02-15 13:04  6% John McNamara
  2018-02-16 22:54  0% ` Carrillo, Erik G
  0 siblings, 1 reply; 200+ results
From: John McNamara @ 2018-02-15 13:04 UTC (permalink / raw)
  To: dev; +Cc: John McNamara

Add template release notes for DPDK 18.05 with inline
comments and explanations of the various sections.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/rel_notes/release_18_05.rst | 187 +++++++++++++++++++++++++++++++++
 1 file changed, 187 insertions(+)
 create mode 100644 doc/guides/rel_notes/release_18_05.rst

diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
new file mode 100644
index 0000000..85f4dc5
--- /dev/null
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -0,0 +1,187 @@
+DPDK Release 18.05
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      make doc-guides-html
+
+      xdg-open build/doc/html/guides/rel_notes/release_18_05.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release. Sample
+   format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense. The description
+     should be enough to allow someone scanning the release notes to
+     understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list like
+     this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     This section is a comment. Do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =========================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * Add a short 1-2 sentence description of the API change. Use fixed width
+     quotes for ``rte_function_names`` or ``rte_struct_names``. Use the past
+     tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * Add a short 1-2 sentence description of the ABI change that was announced
+     in the previous releases and made in this release. Use fixed width quotes
+     for ``rte_function_names`` or ``rte_struct_names``. Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item in the past
+     tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue in the present
+     tense. Add information on any known workarounds.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
+
+
+Shared Library Versions
+-----------------------
+
+.. Update any library version updated in this release and prepend with a ``+``
+   sign, like this:
+
+     librte_acl.so.2
+   + librte_cfgfile.so.2
+     librte_cmdline.so.2
+
+   This section is a comment. Do not overwrite or remove it.
+   =========================================================
+
+
+The libraries prepended with a plus sign were incremented in this version.
+
+.. code-block:: diff
+
+     librte_acl.so.2
+     librte_bbdev.so.1
+     librte_bitratestats.so.2
+     librte_bus_dpaa.so.1
+     librte_bus_fslmc.so.1
+     librte_bus_pci.so.1
+     librte_bus_vdev.so.1
+     librte_cfgfile.so.2
+     librte_cmdline.so.2
+     librte_cryptodev.so.4
+     librte_distributor.so.1
+     librte_eal.so.6
+     librte_ethdev.so.8
+     librte_eventdev.so.3
+     librte_flow_classify.so.1
+     librte_gro.so.1
+     librte_gso.so.1
+     librte_hash.so.2
+     librte_ip_frag.so.1
+     librte_jobstats.so.1
+     librte_kni.so.2
+     librte_kvargs.so.1
+     librte_latencystats.so.1
+     librte_lpm.so.2
+     librte_mbuf.so.3
+     librte_mempool.so.3
+     librte_meter.so.1
+     librte_metrics.so.1
+     librte_net.so.1
+     librte_pci.so.1
+     librte_pdump.so.2
+     librte_pipeline.so.3
+     librte_pmd_bnxt.so.2
+     librte_pmd_bond.so.2
+     librte_pmd_i40e.so.2
+     librte_pmd_ixgbe.so.2
+     librte_pmd_ring.so.2
+     librte_pmd_softnic.so.1
+     librte_pmd_vhost.so.2
+     librte_port.so.3
+     librte_power.so.1
+     librte_rawdev.so.1
+     librte_reorder.so.1
+     librte_ring.so.1
+     librte_sched.so.1
+     librte_security.so.1
+     librte_table.so.3
+     librte_timer.so.1
+     librte_vhost.so.3
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested with this
+   release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =========================================================
-- 
2.7.5

^ permalink raw reply	[relevance 6%]
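
The "Shared Library Versions" convention in the template above (a leading
``+`` marks a library whose version was incremented) lends itself to a quick
check. The following is a minimal sketch, not part of the patch: the function
name list_bumped_libs is hypothetical, and it assumes the real list is the
indented block that follows the ".. code-block:: diff" directive, so that the
"+" example inside the template comment is ignored.

```shell
#!/bin/sh
# list_bumped_libs (hypothetical helper): read a release notes file on
# stdin and print every library marked with a leading "+" inside the
# ".. code-block:: diff" section.
list_bumped_libs()
{
	awk '
		# Only the real list follows the code-block directive; the
		# template comment above it also contains a "+" example.
		/^\.\. code-block:: diff/ { in_block = 1; next }

		# A non-empty line starting at the margin ends the block.
		in_block && /^[^ \t]/ { in_block = 0 }

		# Inside the block, "+ librte_xxx.so.N" marks a bumped library.
		in_block && /^[ \t]*[+][ \t]+librte_/ { print $2 }
	'
}
```

Usage would look like `list_bumped_libs < doc/guides/rel_notes/release_18_05.rst`.
With the template exactly as posted it prints nothing, because no entry
carries a plus sign yet.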

* [dpdk-dev] [PATCH v6] checkpatches.sh: Add checks for ABI symbol addition
  @ 2018-02-14 19:19  6% ` Neil Horman
  0 siblings, 0 replies; 200+ results
From: Neil Horman @ 2018-02-14 19:19 UTC (permalink / raw)
  To: dev
  Cc: Neil Horman, thomas, john.mcnamara, bruce.richardson,
	Ferruh Yigit, Stephen Hemminger

Recently, some additional patches were added to allow for programmatic
marking of C symbols as experimental.  These markers depend on the
manual addition of exported symbols to the EXPERIMENTAL section of the
corresponding library's version map file.  The consensus on review is
that, besides mandating that new symbols go into the EXPERIMENTAL
version in the map, we need a mechanism to enforce that documented
process when symbols are introduced.  To that end, I am proposing this
change.  It is an addition to the checkpatches script, which scans
incoming patches for additions and removals of symbols in the map file
and warns the user appropriately.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: thomas@monjalon.net
CC: john.mcnamara@intel.com
CC: bruce.richardson@intel.com
CC: Ferruh Yigit <ferruh.yigit@intel.com>
CC: Stephen Hemminger <stephen@networkplumber.org>

---
Change notes

v2)
 * Cleaned up and documented awk script (shemminger)
 * fixed sort/uniq usage (shemminger)
 * moved checking to new script (tmonjalon)
 * added maintainer entry (tmonjalon)
 * added license (tmonjalon)

v3)
 * Changed symbol check script name (tmonjalon)
 * Trapped exit to clean temp file (tmonjalon)
 * Honored verbose command (tmonjalon)
 * Cleaned left over debug bits (tmonjalon)
 * Updated location in MAINTAINERS file (tmonjalon)

v4)
 * Updated maintainers file (tmonjalon)

v5)
 * Undo v4 (tmonjalon)

v6)
 * Cleaning up more nits (tmonjalon)
 * Combining some lines (tmonjalon)
 * Fixing error print condition (tmonjalon)
 * Redirect stdin to a file to allow rewinding for
   multiple passes by tools (nhorman)
---
 MAINTAINERS                     |   1 +
 devtools/check-symbol-change.sh | 146 ++++++++++++++++++++++++++++++++++++++++
 devtools/checkpatches.sh        |  46 +++++++++++--
 3 files changed, 188 insertions(+), 5 deletions(-)
 create mode 100755 devtools/check-symbol-change.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index a646ca3e1..f83b9ab33 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -87,6 +87,7 @@ M: Neil Horman <nhorman@tuxdriver.com>
 F: lib/librte_compat/
 F: doc/guides/rel_notes/deprecation.rst
 F: devtools/validate-abi.sh
+F: devtools/check-symbol-change.sh
 F: buildtools/check-experimental-syms.sh
 
 Driver information
diff --git a/devtools/check-symbol-change.sh b/devtools/check-symbol-change.sh
new file mode 100755
index 000000000..22b17e6f2
--- /dev/null
+++ b/devtools/check-symbol-change.sh
@@ -0,0 +1,146 @@
+#!/bin/sh
+
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Neil Horman <nhorman@tuxdriver.com>
+
+build_map_changes()
+{
+	local fname=$1
+	local mapdb=$2
+
+	cat "$fname" | filterdiff -i '*.map' | awk '
+		# Initialize our variables
+		BEGIN {map="";sym="";ar="";sec=""; in_sec=0}
+
+		# Anything that starts with + or -, followed by a/
+		# and ends in the string .map is the name of our map file
+		# This may appear multiple times in a patch if multiple
+		# map files are altered, and all section/symbol names
+		# appearing between a triggering of this rule and the
+		# next trigger of this rule are associated with this file
+		/[-+] a\/.*\.map/ {map=$2}
+
+		# Triggering this rule, which starts a line with a + and ends it
+		# with a { identifies a versioned section.  The section name is
+		# the rest of the line with the + and { symbols removed.
+		# Triggering this rule sets in_sec to 1, which activates the
+		# symbol rule below
+		/+.*{/ {gsub("+","");sec=$1; in_sec=1}
+
+		# This rule identifies the end of a section, and disables the
+		# symbol rule
+		/.*}/ {in_sec=0}
+
+		# This rule matches on a + followed by any characters except a :
+		# (which denotes a global vs local segment), and ends with a ;.
+		# The semicolon is removed and the symbol is printed with its
+		# associated file name and version section, along with an
+		# indicator that the symbol is a new addition.  Note this rule
+		# only works if we have found a version section in the rule
+		# above (hence the in_sec check).  Otherwise we flag it as an
+		# unknown section
+		/^+[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " add"
+			} else {
+				print map " " sym " unknown add"
+			}
+		}
+
+		# This is the same rule as above, but the rule matches on a
+		# leading - rather than a +, denoting that the symbol is being
+		# removed.
+		/^-[^}].*[^:*];/ {gsub(";","");sym=$2;
+			if (in_sec == 1) {
+				print map " " sym " " sec " del"
+			} else {
+				print map " " sym " unknown del"
+			}
+		}' > ./$mapdb
+
+		sort -u $mapdb > ./$mapdb.2
+		mv -f $mapdb.2 $mapdb
+
+}
+
+check_for_rule_violations()
+{
+	local mapdb=$1
+	local mname
+	local symname
+	local secname
+	local ar
+	local ret=0
+
+	while read mname symname secname ar
+	do
+		if [ "$ar" = "add" ]
+		then
+
+			if [ "$secname" = "unknown" ]
+			then
+				# Just inform the user of this occurrence, but
+				# don't flag it as an error
+				echo -n "INFO: symbol $symname is added but "
+				echo -n "patch has insufficient context "
+				echo -n "to determine the section name, "
+				echo -n "please ensure the version is "
+				echo "EXPERIMENTAL"
+				continue
+			fi
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Symbols that are getting added in a section
+				# other than the experimental section need
+				# to be moving from an already supported
+				# section or it's a violation
+				grep "$mname $symname .* del\$" $mapdb | \
+				grep -qv " EXPERIMENTAL del\$"
+				if [ $? -ne 0 ]
+				then
+					echo -n "ERROR: symbol $symname "
+					echo -n "is added in a section "
+					echo -n "other than the EXPERIMENTAL "
+					echo "section of the version map"
+					ret=1
+				fi
+			fi
+		else
+
+			if [ "$secname" != "EXPERIMENTAL" ]
+			then
+				# Just inform users that non-experimental
+				# symbols need to go through a deprecation
+				# process
+				echo -n "INFO: symbol $symname is being "
+				echo -n "removed, ensure that it has "
+				echo "gone through the deprecation process"
+			fi
+		fi
+	done < $mapdb
+
+	return $ret
+}
+
+trap clean_and_exit_on_sig EXIT
+
+mapfile=`mktemp mapdb.XXXXXX`
+patch=$1
+exit_code=1
+
+clean_and_exit_on_sig()
+{
+	rm -f $mapfile
+	exit $exit_code
+}
+
+build_map_changes $patch $mapfile
+check_for_rule_violations $mapfile
+exit_code=$?
+
+rm -f $mapfile
+
+exit $exit_code
+
+
diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index 7676a6b50..fa36b0d98 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -35,6 +35,10 @@
 # - DPDK_CHECKPATCH_LINE_LENGTH
 . $(dirname $(readlink -e $0))/load-devel-config
 
+trap 'rm -f "$TMPINPUT"' INT
+
+VALIDATE_NEW_API=$(dirname $(readlink -e $0))/check-symbol-change.sh
+
 length=${DPDK_CHECKPATCH_LINE_LENGTH:-80}
 
 # override default Linux options
@@ -61,6 +65,7 @@ print_usage () {
 	END_OF_HELP
 }
 
+
 number=0
 quiet=false
 verbose=false
@@ -86,19 +91,50 @@ total=0
 status=0
 
 check () { # <patch> <commit> <title>
+	local ret=0
+	TMPINPUT=`mktemp checkpatches.XXXXXX`
+
 	total=$(($total + 1))
 	! $verbose || printf '\n### %s\n\n' "$3"
 	if [ -n "$1" ] ; then
 		report=$($DPDK_CHECKPATCH_PATH $options "$1" 2>/dev/null)
 	elif [ -n "$2" ] ; then
-		report=$(git format-patch --find-renames --no-stat --stdout -1 $commit |
+		git format-patch --find-renames --no-stat --stdout -1 $commit > ./$TMPINPUT
+		report=$(cat ./$TMPINPUT |
 			$DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
 	else
-		report=$($DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
+		cat > ./$TMPINPUT
+		report=$(cat ./$TMPINPUT | $DPDK_CHECKPATCH_PATH $options - 2>/dev/null)
+	fi
+	if [ $? -ne 0 ]
+	then
+		$verbose || printf '\n### %s\n\n' "$3"
+		printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
+		ret=1
+	fi
+
+	! $verbose || printf '\nChecking API additions/removals:\n'
+
+	if [ -n "$1" ] ; then
+		report=$($VALIDATE_NEW_API "$1")
+	elif [ -n "$2" ] ; then
+		report=$( cat ./$TMPINPUT | 
+			$VALIDATE_NEW_API -)
+	else
+		report=$(cat ./$TMPINPUT | $VALIDATE_NEW_API -)
+	fi
+
+	if [ $? -ne 0 ]
+	then
+		printf '%s\n' "$report"
+		ret=1
+	fi
+
+	rm -f ./$TMPINPUT
+	if [ $ret -eq 0 ]
+	then
+		return 0
 	fi
-	[ $? -ne 0 ] || return 0
-	$verbose || printf '\n### %s\n\n' "$3"
-	printf '%s\n' "$report" | sed -n '1,/^total:.*lines checked$/p'
 	status=$(($status + 1))
 }
 
-- 
2.14.3

^ permalink raw reply	[relevance 6%]
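
The rule the patch enforces (a symbol newly added to a version map belongs
in the EXPERIMENTAL section) can be illustrated with a much smaller sketch.
This is not the script from the patch: the function name report_new_symbols
is hypothetical, it expects a unified diff that touches only .map files on
stdin, and unlike the patch it also tracks sections from context lines and
from the section hint that git may repeat after the second "@@" of a hunk
header. It does not handle symbols moved between sections.

```shell
#!/bin/sh
# report_new_symbols (hypothetical helper): read a unified diff of a
# version map on stdin and print "<section> <symbol>" for every symbol
# added outside the EXPERIMENTAL section.
report_new_symbols()
{
	awk '
		# File header lines ("--- a/..." and "+++ b/...") carry no symbols.
		/^([+][+][+]|---) / { next }

		# A hunk header resets the section. If the header ends with a
		# "SECTION {" hint, use it as the current section.
		/^@@/ {
			sec = ""
			if (match($0, /[A-Za-z0-9_.]+[ \t]*[{][ \t]*$/)) {
				sec = substr($0, RSTART, RLENGTH)
				gsub(/[ \t{]/, "", sec)
			}
			next
		}

		# A context or added line ending in "{" opens a section.
		/^[ +].*[{][ \t]*$/ {
			sec = substr($0, 2)
			gsub(/[ \t{]/, "", sec)
			next
		}

		# A context or added closing brace ends the section.
		/^[ +][ \t]*[}]/ { sec = ""; next }

		# An added "name;" line is a new symbol.
		/^[+][ \t]*[A-Za-z_][A-Za-z0-9_]*;[ \t]*$/ {
			sym = substr($0, 2)
			gsub(/[ \t;]/, "", sym)
			where = (sec == "") ? "unknown" : sec
			if (where != "EXPERIMENTAL")
				print where, sym
		}
	'
}
```

One way to try it is `git diff HEAD~1 -- '*.map' | report_new_symbols`. An
empty result means every added symbol landed in EXPERIMENTAL; a line starting
with "unknown" means the hunk did not carry enough context to tell.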

* [dpdk-dev] [dpdk-announce] DPDK 18.02 released
@ 2018-02-14 19:11  3% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 19:11 UTC (permalink / raw)
  To: announce

A new major release is available:
	http://fast.dpdk.org/rel/dpdk-18.02.tar.xz

Special attention was paid to not break the ABI in this release.
It means 18.02 could replace 17.11 without rebuilding the applications.
However it is advised to keep using 17.11 LTS for long term deployments.

Some highlights:
	- new license header (SPDX tag)
	- bbdev (Wireless Base Band) device class
	- rawdev device class
	- ethdev probe notifications and port ownership
	- Hyper-V platform driver
	- AVF (Adaptive Virtual Function) ethdev driver
	- IPsec offload in DPAA
	- DPAA eventdev driver
	- OPDL (Ordered Packet Distribution Library) eventdev driver
	- experimental tags and automatic check
	- meson build system (beta)

More details in the release notes:
	http://dpdk.org/doc/guides/rel_notes/release_18_02.html

The statistics are similar to previous release:
	1315 patches from 145 authors
	2316 files changed, 100569 insertions(+), 77209 deletions(-)

There are 46 new contributors
(including authors, reviewers and testers):
Thanks to Aleksey Baulin, Amr Mokhtar, Andrea Grandi, Andrew Jackson,
Anoob Joseph, Avi Kivity, Bao-Long Tran, Bharat Mota, Cheryl Houser,
Ciara Power, David Coyle, Dustin Lundquist, Erik Gabriel Carrillo,
George Wilkie, Georgios Katsikas, Gong Deli, Hyong Youb Kim,
Jerry Lilijun, Jun Yang, Junjie Chen, Kefu Chai, Kevin Laatz,
Laszlo Ersek, Liang Ma, Mallesh Koujalagi, Martin Klozik,
Matthew Smith, Michael McConville, Natalie Samsonov, Nikhil Agarwal,
Peter Mccarthy, Prashant Bhole, Rafal Kozik, Rosen Xu, Roy Franz,
Sharmila Podury, Stefan Hajnoczi, Sunil Kumar Kori, Thomas Speier,
Tomasz Jozwiak, Vijay Srivastava, Wisam Jaddo, Xin Long, Yang Zhang,
Yanglong Wu and Zhike Wang.

Below is the number of patches per company (accuracy not perfect):
    463     Intel (57)
    213     Mellanox (11)
    132     NXP (7)
    131     Cavium (9)
    102     6WIND (8)
     83     Solarflare (6)
     27     Broadcom (2)
     24     RedHat (5)
     21     Semihalf (3)
     20     Microsoft (2)
     17     Cisco (3)
     16     OKTET Labs (2)
      9     AT&T (4)
      6     Marvell (1)
      5     Netronome (1)
      5     IBM (2)
      4     ZTE (1)
      4     Linaro (1)
      4     HXT Semiconductor (1)
      4     ARM (2)

The new features for 18.05 must be submitted before the next month,
in order to be reviewed and integrated during March.
The next release is expected to happen at the beginning of May.

Thanks everyone

PS: Like last year, this release falls on Valentine's Day.
It is an opportunity to stop working and offer a day to your Valentine!

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] doc: ethdev ABI change deprecation notice
  @ 2018-02-14 17:18  4%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 17:18 UTC (permalink / raw)
  To: Kirill Rybalchenko
  Cc: dev, Olivier Matz, Ferruh Yigit, Neil Horman, andrey.chilikin,
	adrien.mazarguil

14/02/2018 01:14, Thomas Monjalon:
> > > >> Signed-off-by: Kirill Rybalchenko <kirill.rybalchenko@intel.com>
> > > >>
> > > >> Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
> > > >> ---
> > > >> +* ethdev: announce ABI change
> > > >> +  The size of variables flow_types_mask in rte_eth_fdir_info structure,
> > > >> +  sym_hash_enable_mask and valid_bit_mask in rte_eth_hash_global_conf structure
> > > >> +  will be increased from 32 to 64 bits to fulfill hardware requirements.
> > > >> +  This change will break existing ABI as size of the structures will increase.
> > > >> +
> > > > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > > 
> > > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> > 
> > Acked-by: Olivier Matz <olivier.matz@6wind.com>
> 
> Acked-by: Thomas Monjalon <thomas@monjalon.net>
> 
> I would prefer you drop the legacy code to keep only rte_flow.

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors
  @ 2018-02-14 16:50  4%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 16:50 UTC (permalink / raw)
  To: shahafs
  Cc: dev, Jerin Jacob, Boccassi, Luca, nhorman, remy.horton,
	mohammad.abdul.awal, declan.doherty, ferruh.yigit

> > > This is following the RFC being discussed and targets 18.05
> > > 
> > > http://dpdk.org/ml/archives/dev/2018-January/085716.html
> > > 
> > > Cc: declan.doherty@intel.com
> > > Cc: mohammad.abdul.awal@intel.com
> > > Cc: ferruh.yigit@intel.com
> > > Cc: remy.horton@intel.com
> > > 
> > > Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> > > ---
> > > +* ethdev: Work is being planned for 18.05 to expose VF port representors
> > > +  as a means to perform control and data path operations on the different VFs.
> > > +  As a VF representor is an ethdev port, new fields are needed in order to map
> > > +  between the VF representor and the VF or the parent PF. Those new fields
> > > +  are to be included in the ``rte_eth_dev_info`` struct.
> > 
> > Acked-by: Luca Boccassi <luca.boccassi@intl.att.com>
> > Acked-by: Alex Zelezniak <alexz@att.com>
> 
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Applied

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure
  @ 2018-02-14 16:28  4%     ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2018-02-14 16:28 UTC (permalink / raw)
  To: Xueming(Steven) Li
  Cc: dev, Jerin Jacob, Ferruh Yigit, Shahaf Shuler, Neil Horman

> > >> Update deprecation notice for the new rss_level field of rte_eth_rss_conf.
> > >>
> > >> Link: http://www.dpdk.org/dev/patchwork/patch/31891
> > >>
> > >> Signed-off-by: Xueming Li <xuemingl@mellanox.com>
> > >> ---
> > >> +* ethdev: A new rss level field is planned in 18.05.
> > >> +  The new API adds an rss_level field to ``rte_eth_rss_conf`` to enable a choice
> > >> +  of RSS hash calculation on the outer or inner header of a tunneled packet.
> > > 
> > > Acked-By: Shahaf Shuler <shahafs@mellanox.com>
> > 
> > Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>

Applied

^ permalink raw reply	[relevance 4%]


-- links below jump to the message on this page --
2017-06-30 14:26     [dpdk-dev] [RFC] ring: relax alignment constraint on ring structure Olivier Matz
2018-04-03 13:26  9% ` [dpdk-dev] [PATCH] " Olivier Matz
2018-04-03 15:07       ` Jerin Jacob
2018-04-03 15:25         ` Olivier Matz
2018-04-03 15:37           ` Jerin Jacob
2018-04-03 15:56  3%         ` Olivier Matz
2018-04-03 16:42  3%           ` Jerin Jacob
2018-04-04 23:38  0%             ` Ananyev, Konstantin
2017-11-15 11:41     [dpdk-dev] [PATCH] vhost: fix segfault as handle set_mem_table message Jianfeng Tan
2017-11-28 12:09     ` Maxime Coquelin
2017-12-05 14:19       ` Yuanhan Liu
2017-12-05 14:28         ` Maxime Coquelin
2018-03-29  7:01           ` Tan, Jianfeng
2018-03-29  7:35             ` Maxime Coquelin
2018-03-29 12:57               ` Wodkowski, PawelX
2018-03-29 16:37  3%             ` Maxime Coquelin
2018-03-29 18:09  0%               ` Wodkowski, PawelX
2017-12-08 15:49     [dpdk-dev] [RFC] mbuf: remove control mbuf Olivier Matz
2018-04-03 13:39  3% ` [dpdk-dev] [PATCH] " Olivier Matz
2018-01-08 10:00     [dpdk-dev] [PATCH v3] lib/librte_meter: add meter configuration profile Jasvinder Singh
2018-01-08 15:43     ` [dpdk-dev] [PATCH v4] " Jasvinder Singh
2018-02-19 21:12  3%   ` Thomas Monjalon
2018-04-05 10:12  0%     ` Thomas Monjalon
2018-04-05 11:00  0%       ` Dumitrescu, Cristian
2018-01-11  0:20     [dpdk-dev] [PATCH v6 00/23] eventtimer: introduce event timer adapter Erik Gabriel Carrillo
2018-03-08 21:53     ` [dpdk-dev] [PATCH v7 0/7] " Erik Gabriel Carrillo
2018-03-08 21:54  2%   ` [dpdk-dev] [PATCH v7 2/7] eventtimer: add common code Erik Gabriel Carrillo
2018-03-29 21:27       ` [dpdk-dev] [PATCH v8 0/9] eventtimer: introduce event timer adapter Erik Gabriel Carrillo
2018-03-29 21:27  3%     ` [dpdk-dev] [PATCH v8 3/9] eventtimer: add common code Erik Gabriel Carrillo
2018-04-02 19:39         ` [dpdk-dev] [PATCH v9 0/9] eventtimer: introduce event timer adapter Erik Gabriel Carrillo
2018-04-02 19:39  3%       ` [dpdk-dev] [PATCH v9 3/9] eventtimer: add common code Erik Gabriel Carrillo
2018-04-03 21:44           ` [dpdk-dev] [PATCH v10 0/9] eventtimer: introduce event timer adapter Erik Gabriel Carrillo
2018-04-03 21:44  3%         ` [dpdk-dev] [PATCH v10 3/9] eventtimer: add common code Erik Gabriel Carrillo
2018-04-04 21:51             ` [dpdk-dev] [PATCH v11 0/9] eventtimer: introduce event timer adapter Erik Gabriel Carrillo
2018-04-04 21:51  3%           ` [dpdk-dev] [PATCH v11 3/9] eventtimer: add common code Erik Gabriel Carrillo
2018-01-12 10:27     [dpdk-dev] [PATCH v2] doc: ethdev ABI change deprecation notice Kirill Rybalchenko
2018-02-13 13:21     ` [dpdk-dev] [PATCH v3] " Olivier Matz
2018-02-14  0:14       ` Thomas Monjalon
2018-02-14 17:18  4%     ` Thomas Monjalon
2018-01-15 19:05     [dpdk-dev] [PATCH] checkpatches.sh: Add checks for ABI symbol addition Neil Horman
2018-02-14 19:19  6% ` [dpdk-dev] [PATCH v6] " Neil Horman
2018-01-17 21:57     [dpdk-dev] [PATCH v3 2/6] ethdev: return named opaque type instead of void pointer Ferruh Yigit
2018-03-09 11:25     ` [dpdk-dev] [PATCH v4] " Ferruh Yigit
     [not found]       ` <20180309123651.GB19004@hmswarspite.think-freely.org>
2018-03-09 13:00  0%     ` Ferruh Yigit
2018-03-09 15:16  0%       ` Neil Horman
2018-03-09 15:45  0%         ` Ferruh Yigit
2018-03-09 19:06  0%           ` Neil Horman
2018-03-20 15:51  0%             ` Ferruh Yigit
2018-01-23 13:15     [dpdk-dev] [RFC v2 00/17] mempool: add bucket mempool driver Andrew Rybchenko
2018-03-10 15:39  3% ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Andrew Rybchenko
2018-03-10 15:39  7%   ` [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
2018-03-11 12:51  0%     ` santosh
2018-03-12  6:53  0%       ` Andrew Rybchenko
2018-03-19 17:03  0%     ` Olivier Matz
2018-03-20 14:41  0%       ` Bruce Richardson
2018-03-10 15:39  6%   ` [dpdk-dev] [PATCH v1 2/9] mempool: add op to populate objects using provided memory Andrew Rybchenko
2018-03-10 15:39  6%   ` [dpdk-dev] [PATCH v1 3/9] mempool: remove callback to get capabilities Andrew Rybchenko
2018-03-10 15:39  5%   ` [dpdk-dev] [PATCH v1 4/9] mempool: deprecate xmem functions Andrew Rybchenko
2018-03-10 15:39  8%   ` [dpdk-dev] [PATCH v1 7/9] mempool: remove callback to register memory area Andrew Rybchenko
2018-03-19 17:03  0%   ` [dpdk-dev] [PATCH v1 0/9] mempool: prepare to add bucket driver Olivier Matz
2018-03-25 16:20  2% ` [dpdk-dev] [PATCH v2 00/11] " Andrew Rybchenko
2018-03-25 16:20  7%   ` [dpdk-dev] [PATCH v2 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
2018-03-25 16:20  6%   ` [dpdk-dev] [PATCH v2 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
2018-03-25 16:20  6%   ` [dpdk-dev] [PATCH v2 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
2018-03-25 16:20  4%   ` [dpdk-dev] [PATCH v2 07/11] mempool: deprecate xmem functions Andrew Rybchenko
2018-03-25 16:20  8%   ` [dpdk-dev] [PATCH v2 10/11] mempool: remove callback to register memory area Andrew Rybchenko
2018-03-26 16:09  2% ` [dpdk-dev] [PATCH v3 00/11] mempool: prepare to add bucket driver Andrew Rybchenko
2018-03-26 16:09  7%   ` [dpdk-dev] [PATCH v3 04/11] mempool: add op to calculate memory size to be allocated Andrew Rybchenko
2018-04-04 15:08  0%     ` santosh
2018-03-26 16:09  6%   ` [dpdk-dev] [PATCH v3 05/11] mempool: add op to populate objects using provided memory Andrew Rybchenko
2018-03-26 16:09  6%   ` [dpdk-dev] [PATCH v3 06/11] mempool: remove callback to get capabilities Andrew Rybchenko
2018-03-26 16:09  4%   ` [dpdk-dev] [PATCH v3 07/11] mempool: deprecate xmem functions Andrew Rybchenko
2018-03-26 16:09  8%   ` [dpdk-dev] [PATCH v3 10/11] mempool: remove callback to register memory area Andrew Rybchenko
2018-03-26 16:12  3% ` [dpdk-dev] [PATCH v1 0/6] mempool: add bucket driver Andrew Rybchenko
2018-03-26 16:12  4%   ` [dpdk-dev] [PATCH v1 3/6] mempool: support block dequeue operation Andrew Rybchenko
2018-02-02 23:28     [dpdk-dev] [PATCH 0/7] vhost: support selective datapath Zhihong Wang
2018-03-19 10:12     ` [dpdk-dev] [PATCH v3 0/5] " Zhihong Wang
2018-03-19 10:12       ` [dpdk-dev] [PATCH v3 2/5] " Zhihong Wang
2018-03-21 21:05         ` Maxime Coquelin
2018-03-22  7:55  3%       ` Wang, Zhihong
2018-03-22  8:31  0%         ` Maxime Coquelin
2018-03-30 10:00     ` [dpdk-dev] [PATCH v4 0/5] " Zhihong Wang
2018-03-30 10:01       ` [dpdk-dev] [PATCH v4 2/5] " Zhihong Wang
2018-03-31  6:10  3%     ` Maxime Coquelin
2018-04-02  1:58  0%       ` Wang, Zhihong
2018-02-04  7:24     [dpdk-dev] [PATCH] doc: annouce ABI change for RSS configuraiton structure Xueming Li
2018-02-13 11:27     ` [dpdk-dev] [PATCH v2] doc: announce ABI change for RSS configuration structure Ferruh Yigit
2018-02-13 12:10       ` Jerin Jacob
2018-02-14 16:28  4%     ` Thomas Monjalon
2018-02-05 16:37     [dpdk-dev] [PATCH v3] eal: add function to return number of detected sockets Anatoly Burakov
2018-02-07  9:58     ` [dpdk-dev] [PATCH 18.05 v4] Add " Anatoly Burakov
2018-02-07  9:58       ` [dpdk-dev] [PATCH 18.05 v4] eal: add " Anatoly Burakov
2018-03-08 12:12  3%     ` Bruce Richardson
2018-03-08 14:38  0%       ` Burakov, Anatoly
2018-03-09 16:32  0%         ` Bruce Richardson
2018-03-21  4:59  0%     ` gowrishankar muthukrishnan
2018-03-21 10:24  0%       ` Burakov, Anatoly
2018-03-22 10:58  4% ` [dpdk-dev] [PATCH v5] eal: provide API for querying valid socket id's Anatoly Burakov
2018-03-22 11:45  3%   ` Burakov, Anatoly
2018-03-22 12:36  5%   ` [dpdk-dev] [PATCH v6] " Anatoly Burakov
2018-03-22 17:07  0%     ` gowrishankar muthukrishnan
2018-03-27 16:24  3%     ` Thomas Monjalon
2018-03-31 17:08  5%     ` [dpdk-dev] [PATCH v7] " Anatoly Burakov
2018-04-04 22:31  3%       ` Thomas Monjalon
2018-02-12  4:53     [dpdk-dev] [PATCH 0/4] deferred queue setup Qi Zhang
2018-03-02  4:13     ` [dpdk-dev] [PATCH v2 " Qi Zhang
2018-03-02  4:13       ` [dpdk-dev] [PATCH v2 1/4] ether: support " Qi Zhang
2018-03-14 12:31  0%     ` Ananyev, Konstantin
2018-03-15  3:13  0%       ` Zhang, Qi Z
2018-03-15 13:16  0%         ` Ananyev, Konstantin
2018-03-15 15:08  0%           ` Zhang, Qi Z
2018-03-15 15:38  0%             ` Ananyev, Konstantin
2018-02-13  8:14     [dpdk-dev] [PATCH v2] net/tap: add CRC stripping capability Ophir Munk
2018-02-13 16:35     ` Thomas Monjalon
2018-02-15 21:55  3%   ` Stephen Hemminger
2018-02-16 13:00  0%     ` Thomas Monjalon
2018-02-14 12:32     [dpdk-dev] [PATCH] doc: announce ABI change to support VF representors Shahaf Shuler
2018-02-14 15:27     ` Boccassi, Luca
2018-02-14 15:54       ` Jerin Jacob
2018-02-14 16:50  4%     ` Thomas Monjalon
2018-02-14 19:11  3% [dpdk-dev] [dpdk-announce] DPDK 18.02 released Thomas Monjalon
2018-02-15 13:04  6% [dpdk-dev] [PATCH v1] doc: add template release notes for 18.05 John McNamara
2018-02-16 22:54  0% ` Carrillo, Erik G
2018-02-17 10:49     [dpdk-dev] [PATCH 1/2] eal: add API to align integer to previous power of 2 Pavan Nikhilesh
2018-04-04 10:16     ` [dpdk-dev] [PATCH v3 " Pavan Nikhilesh
2018-04-04 16:10       ` Matan Azrad
2018-04-04 16:42         ` Pavan Nikhilesh
2018-04-04 17:11           ` Matan Azrad
2018-04-04 17:51             ` Pavan Nikhilesh
2018-04-04 18:10               ` Matan Azrad
2018-04-04 18:15                 ` Pavan Nikhilesh
2018-04-04 18:23                   ` Matan Azrad
2018-04-04 18:36  3%                 ` Pavan Nikhilesh
2018-04-04 19:41  3%                   ` Matan Azrad
2018-02-21 21:44     [dpdk-dev] [PATCH v2 0/2] lib/rib: Add Routing Information Base library Medvedkin Vladimir
2018-02-21 21:44     ` [dpdk-dev] [PATCH v2 1/2] Add RIB library Medvedkin Vladimir
2018-03-14 11:09  4%   ` Bruce Richardson
2018-03-25 18:17  0%     ` Vladimir Medvedkin
2018-03-26  9:50  0%       ` Bruce Richardson
2018-03-29 19:59  0%         ` Vladimir Medvedkin
2018-03-29 10:27  3%   ` Bruce Richardson
2018-03-29 20:11  0%     ` Vladimir Medvedkin
2018-03-29 20:41  3%       ` Bruce Richardson
2018-02-22 12:15  8% [dpdk-dev] [PATCH] doc: fixing grammar Alejandro Lucero
2018-02-24 13:14     [dpdk-dev] [PATCH v2 0/7] crypto: add virtio poll mode driver Jay Zhou
2018-02-24 13:14  2% ` [dpdk-dev] [PATCH v2 1/7] crypto/virtio: add virtio related fundamental functions Jay Zhou
2018-02-27 10:29  3% [dpdk-dev] [PATCH] ethdev: remove versioning of ethdev filter control function Kirill Rybalchenko
2018-02-27 11:01  0% ` Ferruh Yigit
2018-02-27 13:45  3%   ` Thomas Monjalon
2018-02-27 14:18  7% ` [dpdk-dev] [PATCH v2] " Kirill Rybalchenko
2018-03-07 17:17  0%   ` Ferruh Yigit
2018-03-07 17:47  0%     ` Ferruh Yigit
2018-03-03 13:45     [dpdk-dev] [PATCH 00/41] Memory Hotplug for DPDK Anatoly Burakov
2018-03-03 13:46     ` [dpdk-dev] [PATCH 13/41] eal: replace memseg with memseg lists Anatoly Burakov
2018-03-19 17:39  3%   ` Olivier Matz
2018-03-20  9:47  4%     ` Burakov, Anatoly
2018-03-04 15:04     [dpdk-dev] [PATCH] pdump: change to use generic multi-process channel Jianfeng Tan
2018-03-20 16:37  4% ` Pattan, Reshma
2018-03-21  2:28  4%   ` Tan, Jianfeng
2018-03-21  9:55  3%     ` Pattan, Reshma
2018-03-27  1:26  0%       ` Tan, Jianfeng
2018-03-27  8:21  3%         ` Pattan, Reshma
2018-03-05 23:01     [dpdk-dev] [PATCH 1/2] eventdev: add device stop flush callback Gage Eads
2018-03-08 23:10     ` [dpdk-dev] [PATCH v2 " Gage Eads
2018-03-12  6:25  3%   ` Jerin Jacob
2018-03-12 14:30  3%     ` Eads, Gage
2018-03-12 14:38  0%       ` Jerin Jacob
2018-03-06 18:28  3% [dpdk-dev] [PATCH] eal: register rte_panic user callback Arnon Warshavsky
2018-03-07  8:32  0% ` Thomas Monjalon
2018-03-07  9:05  0%   ` Burakov, Anatoly
2018-03-07  9:59  0%     ` Thomas Monjalon
2018-03-07 11:29  0%       ` Burakov, Anatoly
2018-03-07 12:08  3% [dpdk-dev] [RFC PATCH v1 0/4] ethdev: add per-PMD tuning of RxTx parmeters Remy Horton
2018-03-21 14:27  3% ` [dpdk-dev] [PATCH v2 " Remy Horton
2018-03-27 18:43  0%   ` Ferruh Yigit
2018-03-30 10:34  0%     ` Ferruh Yigit
2018-03-31  0:05  0%       ` Thomas Monjalon
2018-04-04 17:17  3%   ` [dpdk-dev] [PATCH v3 " Remy Horton
2018-04-04 17:17         ` [dpdk-dev] [PATCH v3 1/4] ethdev: add support for PMD-tuned Tx/Rx parameters Remy Horton
2018-04-04 18:56  3%       ` De Lara Guarch, Pablo
2018-04-05 10:16  0%         ` Thomas Monjalon
2018-03-07 12:08     [dpdk-dev] [RFC PATCH v1 " Remy Horton
2018-03-14 14:43     ` Ferruh Yigit
2018-03-14 15:48       ` Remy Horton
2018-03-14 16:42         ` Ferruh Yigit
2018-03-14 17:23           ` Shreyansh Jain
2018-03-14 17:52             ` Ferruh Yigit
2018-03-14 18:53               ` Ananyev, Konstantin
2018-03-14 21:02                 ` Ferruh Yigit
2018-03-14 21:36                   ` Bruce Richardson
2018-03-15 13:57                     ` Ferruh Yigit
2018-03-15 14:39                       ` Bruce Richardson
2018-03-15 14:57                         ` Ferruh Yigit
2018-03-16 13:54                           ` Shreyansh Jain
2018-03-20 14:54                             ` Ferruh Yigit
2018-03-21  6:51                               ` Shreyansh Jain
2018-03-21 10:02  3%                             ` Ferruh Yigit
2018-03-21 10:45  0%                               ` Shreyansh Jain
2018-03-07 17:44 23% [dpdk-dev] [RFC] config: remove RTE_NEXT_ABI Ferruh Yigit
2018-03-07 18:06  0% ` Luca Boccassi
2018-03-08  8:05  5% ` Thomas Monjalon
2018-03-08 11:43  3%   ` Ferruh Yigit
2018-03-08 15:17  0%     ` Thomas Monjalon
2018-03-08 15:35  0%       ` Neil Horman
2018-03-08 16:04  0%         ` Thomas Monjalon
2018-03-08 19:40  3%           ` Neil Horman
2018-03-08 21:34  4%             ` Thomas Monjalon
2018-03-09  0:18  4%               ` Neil Horman
2018-03-08  1:29     [dpdk-dev] [RFC PATCH 0/5] add framework to load and execute BPF code Konstantin Ananyev
2018-03-08  1:29  2% ` [dpdk-dev] [RFC PATCH 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-03-08  1:30     ` [dpdk-dev] [RFC PATCH 5/5] test: add few eBPF samples Konstantin Ananyev
2018-03-13 14:01       ` Jerin Jacob
2018-03-13 18:14         ` Ananyev, Konstantin
2018-03-30 17:42           ` Ananyev, Konstantin
2018-04-02 22:26  3%         ` Jerin Jacob
2018-03-09 16:42     [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
2018-03-09 16:42  2% ` [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-03-30 17:32  2% ` [dpdk-dev] [PATCH v2 2/7] " Konstantin Ananyev
2018-03-10  1:25     [dpdk-dev] [PATCH v1 0/6] net/mlx5: add Multi-Packet Rx support Yongseok Koh
2018-03-10  1:25     ` [dpdk-dev] [PATCH v1 3/6] net/mlx5: add a function to rdma-core glue Yongseok Koh
2018-03-12  9:13  3%   ` Nélio Laranjeiro
2018-03-12 15:55  1% [dpdk-dev] [RFC] Switch device offload with DPDK Adrien Mazarguil
2018-03-20 11:26     [dpdk-dev] [PATCH] doc: announce ethdev CRC strip flag deprecation Ferruh Yigit
2018-03-20 11:35     ` Thomas Monjalon
2018-03-20 17:23  3%   ` Ferruh Yigit
2018-03-20 21:28     [dpdk-dev] [PATCH] eal: replace rte_panic instances to return an error value Arnon Warshavsky
2018-03-20 22:04  3% ` Thomas Monjalon
2018-03-20 22:42  0%   ` Arnon Warshavsky
2018-03-20 22:49  3%     ` Thomas Monjalon
2018-03-20 23:04  0%       ` Arnon Warshavsky
2018-03-21  8:21  0%         ` Thomas Monjalon
2018-03-21  8:47  0%           ` Arnon Warshavsky
2018-03-20 21:53     [dpdk-dev] [PATCH v3] " Arnon Warshavsky
2018-03-21  9:11     ` Bruce Richardson
2018-03-21  9:17       ` Arnon Warshavsky
2018-03-27 14:06  5%     ` Arnon Warshavsky
2018-03-23 12:58  3% [dpdk-dev] [PATCH v1 0/9] Bunch of flow API-related fixes Adrien Mazarguil
2018-03-23 12:58  4% ` [dpdk-dev] [PATCH v1 9/9] ethdev: fix ABI version in meson build Adrien Mazarguil
2018-04-04 14:57  3% ` [dpdk-dev] [PATCH v2 00/13] Bunch of flow API-related fixes Adrien Mazarguil
2018-04-04 14:58  4%   ` [dpdk-dev] [PATCH v2 12/13] ethdev: fix ABI version in meson build Adrien Mazarguil
2018-03-23 17:35     [dpdk-dev] [PATCH 0/4] NFP PF support based on new CPP interface Alejandro Lucero
2018-03-23 17:35  1% ` [dpdk-dev] [PATCH 1/4] net/nfp: add NFP CPP support Alejandro Lucero
2018-03-23 17:35  6% ` [dpdk-dev] [PATCH 2/4] net/nfp: update PMD for using new CPP interface Alejandro Lucero
2018-03-25  8:33     [dpdk-dev] [PATCH v3 0/7] crypto: add virtio poll mode driver Jay Zhou
2018-03-25  8:33  2% ` [dpdk-dev] [PATCH v3 1/7] crypto/virtio: add virtio related fundamental functions Jay Zhou
2018-03-26  9:51     [dpdk-dev] [PATCH v3 00/10] lib/librte_vhost: introduce new vhost user crypto backend support Fan Zhang
2018-03-29 12:52     ` [dpdk-dev] [PATCH v4 0/8] vhost: intdroduce vhost user crypto backend Fan Zhang
2018-03-29 12:52       ` [dpdk-dev] [PATCH v4 1/8] lib/librte_vhost: add external backend support Fan Zhang
2018-03-29 13:47  3%     ` Wodkowski, PawelX
2018-04-01 19:53  0%       ` Zhang, Roy Fan
2018-04-03 13:44  0%         ` Maxime Coquelin
2018-04-03 13:55  0%           ` Zhang, Roy Fan
2018-04-03 14:42  0%           ` Tan, Jianfeng
2018-04-03 14:48  0%             ` Wodkowski, PawelX
2018-03-27 15:17     [dpdk-dev] [PATCH] vhost: add virtio configuration space messages Tomasz Kulasek
2018-03-27 15:35     ` [dpdk-dev] [PATCH v2] " Tomasz Kulasek
2018-03-28  9:11  3%   ` Maxime Coquelin
2018-03-28  9:19  0%     ` Wodkowski, PawelX
2018-03-28  9:33  0%       ` Maxime Coquelin
2018-03-28  9:48  0%         ` Maxime Coquelin
2018-03-28  9:50  0%     ` Liu, Changpeng
2018-03-28  9:57  0%       ` Maxime Coquelin
2018-03-28 10:03  0%         ` Liu, Changpeng
2018-03-28 10:11  0%           ` Maxime Coquelin
2018-03-28 10:23  0%             ` Liu, Changpeng
2018-03-28 10:56  0%               ` Maxime Coquelin
2018-03-27 17:40  1% [dpdk-dev] [PATCH] ethdev: replace bus specific struct with generic dev Ferruh Yigit
2018-03-28  7:04  0% ` Shreyansh Jain
2018-03-28 13:11  0% ` Legacy, Allain
2018-03-29  6:17  0% ` Tomasz Duszynski
2018-03-29  9:20  0%   ` Ferruh Yigit
2018-03-29  8:01  0% ` santosh
2018-03-28 13:54     [dpdk-dev] [PATCH v6 0/7] switching device representation Declan Doherty
2018-03-28 13:54     ` [dpdk-dev] [PATCH v6 4/8] ethdev: Add port representor device flag Declan Doherty
2018-03-29  6:13       ` Shahaf Shuler
2018-03-29 14:53  3%     ` Doherty, Declan
2018-04-01  6:14  0%       ` Shahaf Shuler
2018-03-29 17:05     [dpdk-dev] [PATCH v3 0/2] gcc-8 build fixes Stephen Hemminger
2018-04-03  9:23     ` Ferruh Yigit
2018-04-03 15:10  3%   ` Stephen Hemminger
2018-03-29 17:52     [dpdk-dev] [PATCH v3] ethdev: replace bus specific struct with generic dev Ferruh Yigit
2018-03-30 15:17     ` [dpdk-dev] [PATCH v4] " Ferruh Yigit
2018-03-30 15:29       ` David Marchand
2018-04-02 16:13         ` santosh
2018-04-03  9:06           ` David Marchand
2018-04-03  9:50             ` Ferruh Yigit
2018-04-04 17:57  3%           ` De Lara Guarch, Pablo
2018-04-05  9:19  0%             ` Ferruh Yigit
2018-03-31  7:49     [dpdk-dev] [PATCH v4 0/7] crypto: add virtio poll mode driver Jay Zhou
2018-03-31  7:49  2% ` [dpdk-dev] [PATCH v4 1/7] crypto/virtio: add virtio related fundamental functions Jay Zhou
2018-03-31  9:18     [dpdk-dev] [PATCH v5 0/7] crypto: add virtio poll mode driver Jay Zhou
2018-03-31  9:18  2% ` [dpdk-dev] [PATCH v5 1/7] crypto/virtio: add virtio related fundamental functions Jay Zhou
2018-04-02  8:36     [dpdk-dev] [PATCH v2] eal/vfio: export internal vfio functions Hemant Agrawal
2018-04-03  8:28  4% ` [dpdk-dev] [PATCH v3 1/2] doc: add vfio api support Hemant Agrawal
2018-04-03 10:16  0%   ` Thomas Monjalon
2018-04-03  9:43     [dpdk-dev] [PATCH v6 00/10] crypto: add virtio poll mode driver Jay Zhou
2018-04-03  9:43  1% ` [dpdk-dev] [PATCH v6 02/10] crypto/virtio: support virtio device init Jay Zhou
2018-04-04 17:03     ` [dpdk-dev] [PATCH v7 00/10] crypto: add virtio poll mode driver Jay Zhou
2018-04-04 17:03  1%   ` [dpdk-dev] [PATCH v7 02/10] crypto/virtio: support virtio device init Jay Zhou
2018-04-03 23:21     [dpdk-dev] [PATCH v3 00/68] Memory Hotplug for DPDK Anatoly Burakov
2018-03-07 16:56     ` [dpdk-dev] [PATCH v2 00/41] " Anatoly Burakov
2018-04-03 23:21  3%   ` [dpdk-dev] [PATCH v3 24/68] mempool: add support for the new allocation methods Anatoly Burakov
2018-04-04 11:27  3% [dpdk-dev] eal: replace calls to rte_panic and refrain from new instances Arnon Warshavsky
2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 02/13] bond: replace rte_panic instances in bonding driver Arnon Warshavsky
2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 03/13] e1000: replace rte_panic instances in e1000 driver Arnon Warshavsky
2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 04/13] ixgbe: replace rte_panic instances in ixgbe driver Arnon Warshavsky
2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 06/13] kni: replace rte_panic instances in kni Arnon Warshavsky
2018-04-04 11:27  3% ` [dpdk-dev] [PATCH 11/13] eal: replace rte_panic instances in ethdev Arnon Warshavsky
2018-04-04 11:27  2% ` [dpdk-dev] [PATCH 12/13] eal: replace rte_panic instances in init sequence Arnon Warshavsky
2018-04-04 15:56  4% [dpdk-dev] [PATCH v1 00/16] Flow API overhaul for switch offloads Adrien Mazarguil
2018-04-04 15:56  7% ` [dpdk-dev] [PATCH v1 01/16] ethdev: update ABI for flow API functions Adrien Mazarguil
2018-04-05 10:06  4%   ` Thomas Monjalon
2018-04-05 12:44  9%     ` Adrien Mazarguil
2018-04-04 15:56  3% ` [dpdk-dev] [PATCH v1 05/16] ethdev: remove DUP action from flow API Adrien Mazarguil
2018-04-04 15:56  2% ` [dpdk-dev] [PATCH v1 10/16] ethdev: add encap level to RSS flow API action Adrien Mazarguil
2018-04-04 22:01  3% [dpdk-dev] [PATCH v2 00/13] eal: replace calls to rte_panic and refrain from new instances Arnon Warshavsky
2018-04-05 11:49  4% [dpdk-dev] [PATCH] doc: add meter API change to release notes Jasvinder Singh
2018-04-05 12:03  0% ` Dumitrescu, Cristian
2018-04-05 13:15  9% [dpdk-dev] [PATCH] eal/service: remove experimental tags Harry van Haaren